LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 08-24-2010, 10:52 AM   #46
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453

Quote:
Originally Posted by ghostdog74
First, tell me why you need to do that. And why is it related to text parsing at all?
Because this is how normal programming is done: input files are parsed, and the result of parsing is a data structure.

Then processing is performed on the data structure.

For modularity and extensibility, data structures are exported and then imported by the next consumers in the data processing chain.

I've dealt with huge amounts of data - be it VLSI design, static timing analysis, VLSI verification, ASIC standard library cell characterization, acoustic modeling, whatever - and the approach with data structures always works and is the textbook approach.
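
A minimal sketch of the parse-then-process pattern described above, written in Python as a neutral illustration (it is not code posted in the thread). The sample text, the field names, and the functions `parse` and `total_charge` are all hypothetical; the real program output under discussion is not shown here.

```python
import re

# Made-up sample input, for illustration only.
SAMPLE = """\
ATOM C  CHARGE = 6.0
ATOM H  CHARGE = 1.0
ATOM H  CHARGE = 1.0
TOTAL ENERGY = -40.1987
"""

def parse(text):
    """Parse the text into a nested dict (the data structure)."""
    result = {"atoms": [], "scalars": {}}
    for line in text.splitlines():
        # Lines such as "ATOM C  CHARGE = 6.0" become one record per atom.
        m = re.match(r"ATOM\s+(\w+)\s+CHARGE\s*=\s*(\S+)", line)
        if m:
            result["atoms"].append(
                {"symbol": m.group(1), "charge": float(m.group(2))}
            )
            continue
        # Any other "NAME = number" line is stored as a named scalar.
        m = re.match(r"(.+?)\s*=\s*(\S+)$", line)
        if m:
            result["scalars"][m.group(1).strip()] = float(m.group(2))
    return result

def total_charge(data):
    """Processing step: works on the structure, not on the raw text."""
    return sum(atom["charge"] for atom in data["atoms"])

if __name__ == "__main__":
    data = parse(SAMPLE)
    print(total_charge(data), data["scalars"])
```

The point of the split is that `total_charge` never touches the text format, so the parser can change without affecting later stages.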
 
Old 08-24-2010, 10:57 AM   #47
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by Feynman
Funny you found that disorganized. This is a pretty highly regarded software ...
Windows 95/98 was also once highly regarded SW.

For SW to be good, one needs competition, as everywhere else. I do not think quantum chemistry SW is widely used, so I do not expect competition in the field.

There are well known and highly regarded data formats/approaches used in scientific calculations, for example, HDF: http://www.hdfgroup.org/ .
 
Old 08-24-2010, 11:00 AM   #48
Feynman
Member
 
Registered: Aug 2010
Distribution: Gentoo
Posts: 62

Original Poster
Rep: Reputation: 15
Ok, I will rephrase that "easiest to learn" comment:
Which language has commands/functions that are most naturally suited to performing these tasks? For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,
then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.
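
To make the two wished-for operations concrete, here is a minimal sketch in Python, used as a neutral illustration rather than as either poster's awk or Perl solution. The function names `first_word_after` and `text_between` are hypothetical stand-ins for the commands named above, and "word" is assumed to mean a whitespace-delimited token.

```python
def first_word_after(text, marker):
    """Return the first whitespace-delimited word after `marker`, or None."""
    idx = text.find(marker)
    if idx == -1:
        return None
    words = text[idx + len(marker):].split()
    return words[0] if words else None

def text_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    i = text.find(start)
    if i == -1:
        return None
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return None
    return text[i:j]

if __name__ == "__main__":
    line = "TOTAL ENERGY = -40.2 HARTREE"
    print(first_word_after(line, "ENERGY ="))
    print(text_between("a [b c] d", "[", "]"))
```

Both operations reduce to a substring search plus a slice, which is why most scripting languages can express them in a few lines.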
 
Old 08-24-2010, 11:03 AM   #49
Feynman
Member
 
Registered: Aug 2010
Distribution: Gentoo
Posts: 62

Original Poster
Rep: Reputation: 15
I am having trouble keeping up. Give me a moment to review all the posts. I missed one directly referring to GAMESS with a link to some kind of cookbook.
 
Old 08-24-2010, 11:05 AM   #50
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,696
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by Feynman
Ok, I will rephrase that:
Which language has commands/functions that are most naturally implemented to perform these tasks. For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,
Both have these functions; I have shown you how it's done with awk.

Quote:
then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.
Not true: awk parsing can be as fast as, if not faster than, Perl/Python/Ruby. And no, I am not disputing the fact that one can use Perl/Python for the job; what I disagree with is the comment that an "underlanguage" should not be learned.
 
Old 08-24-2010, 11:06 AM   #51
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by Feynman
Ok, I will rephrase that:
Which language has commands/functions that are most naturally implemented to perform these tasks. For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,
then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.
I am reiterating what I've said - you seem to be asking the wrong question.

Though Perl can do anything 'awk' can do.

The correct questions emanate from the understanding of the whole data parsing and processing mission. My whole experience tells me that 'awk' is insufficient for this. Or, in other words, relying on tools of limited capability (like 'awk') perpetuates data mess.

Another issue to consider - there are more than 15,000 Perl modules (8,368 authors, 18,244 modules) available at http://www.cpan.org/ -> http://search.cpan.org/ .

I.e. pretty much every standard programming task is already implemented in some kind of Perl module.
 
Old 08-24-2010, 11:10 AM   #52
Feynman
Member
 
Registered: Aug 2010
Distribution: Gentoo
Posts: 62

Original Poster
Rep: Reputation: 15
Ok, those perl scripts are indeed the type of thing I am looking for. That is not to say that the previously mentioned awk scripts would not work as well. I was going to put these scripts in separate files anyway so the user would be able to invoke them at his/her convenience. Therefore, there is nothing stopping me from writing one command, perl_getafterstring, and another, awk_getafterstring. I can test both--although I suspect that performance will vary from case to case and that the average performance of each will be very close. I am guessing this will come down to personal preference and case-by-case problems.
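
One way to test both commands is to time them from a small harness. The following is a rough sketch in Python, assuming each script can be run as a separate command; `time_command` is a hypothetical helper, and the placeholder command only exists so the sketch runs on any machine. Wall-clock timings of short-lived processes are noisy, so the best of several runs is reported.

```python
import subprocess
import sys
import time

def time_command(cmd, repeats=5):
    """Run `cmd` (a list of arguments) `repeats` times.

    Returns the best wall-clock time in seconds.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises if the command exits with a non-zero status.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    # Placeholder command. Substitute the real invocations, for example
    # ["./awk_getafterstring", "output.log"] (hypothetical paths).
    print(time_command([sys.executable, "-c", "pass"]))
```

For a fair comparison, both commands should be given the same input file and the same search string.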
 
Old 08-24-2010, 11:11 AM   #53
Feynman
Member
 
Registered: Aug 2010
Distribution: Gentoo
Posts: 62

Original Poster
Rep: Reputation: 15
Thank you both very much for your input. I will try to put both in my software.
 
Old 08-24-2010, 11:18 AM   #54
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,696
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by Sergei Steshenko
My whole experience tells me that 'awk' is insufficient for this. Or, in other words, relying on tools of limited capability (like 'awk') perpetuates data mess.
You obviously do not have enough experience with awk.
Please don't cloud the newbie mind with blatant lies. Awk is perfectly sufficient for what he is doing.

Last edited by ghostdog74; 08-24-2010 at 11:19 AM.
 
Old 08-24-2010, 11:20 AM   #55
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by ghostdog74
Please don't cloud the newbie mind with blatant lies. Awk is perfectly sufficient for what he is doing.
If you show me how to export data structures using 'awk' and then to import them back, then 'awk' might be sufficient. Otherwise 'awk' is DOA.
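
For readers unfamiliar with what exporting and importing a data structure means here, the following is a minimal sketch in Python, chosen as a neutral illustration (the thread itself argues about awk and Perl, and this is neither poster's code). It writes a nested structure to a JSON file and reads it back. The helper names `export_data`, `import_data`, and `round_trip` are hypothetical.

```python
import json
import os
import tempfile

def export_data(data, path):
    """Write a nested structure (dicts, lists, numbers, strings) to `path`."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, sort_keys=True)

def import_data(path):
    """Read the structure back for the next consumer in the chain."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)

def round_trip(data):
    """Export to a temporary file, import again, and return the result."""
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)
    try:
        export_data(data, path)
        return import_data(path)
    finally:
        os.remove(path)

if __name__ == "__main__":
    parsed = {"atoms": [{"symbol": "C", "charge": 6.0}], "scalars": {"E": -40.2}}
    print(round_trip(parsed) == parsed)
```

JSON only preserves dicts with string keys, lists, strings, numbers, booleans, and null; tuples come back as lists and non-string keys come back as strings, so a structure using those will not compare equal after the round trip.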
 
Old 08-24-2010, 11:27 AM   #56
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,696
Blog Entries: 5

Rep: Reputation: 241
Quote:
Originally Posted by Sergei Steshenko
If you show me how to export data structures using 'awk' and then to import them back, then 'awk' might be sufficient. Otherwise 'awk' is DOA.
Why don't you show us how you solve his problem in Perl, and I will show you mine with awk? Then let him decide which one is simpler, more readable, and works.

Last edited by ghostdog74; 08-24-2010 at 11:28 AM.
 
Old 08-24-2010, 11:40 AM   #57
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,561

Rep: Reputation: 1939
Hi Feynman - try to ignore any bickering. May I ask if you are happy to progress on your own now or do you still require help?

I had a look at the file you attached. I am assuming this is only the input data? (I didn't read all of it just skimmed)

If you are still working on a solution that requires help, maybe using the data from this file you could give an example output that satisfies what you are looking for?

If not required anymore, good luck.
 
Old 08-24-2010, 11:42 AM   #58
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by ghostdog74
Why don't you show us how you solve his problem in Perl ...
I have no interest in reconsidering all the circumstances which led to the creation of Perl as a replacement for sh/sed/awk and the later transition from Perl 4 to Perl 5, with its introduction of references and hierarchical data structures.

Because for me the considerations are obvious.

I do not care that 'awk' can in some cases be faster than Perl because in the grand scheme of things (WRT data parsing and subsequent data processing) it's not an issue.
 
Old 08-24-2010, 12:02 PM   #59
Feynman
Member
 
Registered: Aug 2010
Distribution: Gentoo
Posts: 62

Original Poster
Rep: Reputation: 15
Thank you, grail. Well, with the given awk commands, I have 4 of the 5 tasks covered. My description of the unsolved task was admittedly vague, so I rewrote it in an earlier post. I will work on, copy-paste from that cookbook site, or get some help with perl programs that do the equivalent. I will try to have both available as separate commands for my program. I assume that if I am using more than one cpu and have the right software installed, these text searches will be automatically distributed across my other cpus. I really do not have any experience in parallelization, but I do have access to more than one cpu.
 
Old 08-24-2010, 02:11 PM   #60
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 453
Quote:
Originally Posted by Feynman
Thank you, grail. Well, with the given awk commands, I have 4 of the 5 tasks covered. ...
So, then, what is the output and what are you doing with it?
 
  



Tags
data, file, parse, string, text


All times are GMT -5.
