[SOLVED] Methods for extracting data strings from output files

Sergei Steshenko · 08-24-2010, 10:52 AM

Quote:

Originally Posted by ghostdog74

First, tell me why you need to do that. And why is it related to text parsing at all?

Because this is how normal programming is done. I.e. input files are parsed, and the result of parsing is a data structure.

Then processing is performed on the data structure.

For modularity/extensibility data structures are exported and imported by next consumers in the data processing chain.

I've dealt with huge amounts of data - be it VLSI design, static timing analysis, VLSI verification, ASIC standard library cells characterization, acoustic modeling, whatever - the approach with data structures always works and is the book approach.
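The parse, process, export, import chain described in this post can be sketched roughly as follows. This is only an illustration in Python, not the Perl approach the post has in mind: the "NAME = VALUE" line format, the function names, and the choice of JSON as the exchange format are all assumptions made for the example.

```python
import json


def parse_records(lines):
    """Parse 'NAME = VALUE' lines into a dict; lines without '=' are ignored."""
    data = {}
    for line in lines:
        name, sep, value = line.partition("=")
        if not sep:
            continue  # not a NAME = VALUE line (assumed format)
        name, value = name.strip(), value.strip()
        try:
            data[name] = float(value)  # keep numbers as numbers
        except ValueError:
            data[name] = value  # anything else stays a string
    return data


def export_data(data) -> str:
    """Serialize the parsed structure so the next tool can read it."""
    return json.dumps(data, indent=2, sort_keys=True)


def import_data(blob: str):
    """Load a structure that an earlier step exported."""
    return json.loads(blob)
```

A later step in the chain would call import_data() on the exported text and work on the resulting dict instead of re-parsing the original output file.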

Sergei Steshenko · 08-24-2010, 10:57 AM

Quote:

Originally Posted by Feynman

Funny you found that disorganized. This is a pretty highly regarded software ...

Windows95/98 was also once considered highly regarded SW.

For SW to be good one needs competition - as everywhere else. I do not think quantum chemistry SW is widely used, so I do not expect competition in the field.

There are well known and highly regarded data formats/approaches used in scientific calculations, for example, HDF: http://www.hdfgroup.org/ .

Feynman · 08-24-2010, 11:00 AM

Ok, I will rephrase that "easiest to learn" comment:
Which language has commands/functions that most naturally perform these tasks? For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,
then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.
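To make the two wished-for commands concrete, here is a minimal sketch of what they might look like. It is written in Python purely for illustration (neither awk nor perl ships functions under these names), and the function names and the sample line in the test are hypothetical, not taken from any real output file.

```python
from typing import Optional


def first_word_after(text: str, marker: str) -> Optional[str]:
    """Return the first whitespace-delimited word after marker, or None."""
    _, found, rest = text.partition(marker)
    if not found:
        return None  # marker does not occur in text
    words = rest.split(maxsplit=1)
    return words[0] if words else None


def text_between(text: str, start: str, end: str) -> Optional[str]:
    """Return the text between the first start and the next end, or None."""
    _, found_start, rest = text.partition(start)
    if not found_start:
        return None  # start string does not occur
    middle, found_end, _ = rest.partition(end)
    return middle if found_end else None  # None if end never follows start
```

Both helpers work on a string already read into memory, so for a large output file one would likely apply them line by line or to a selected block.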

Feynman · 08-24-2010, 11:03 AM

I am having trouble keeping up. Give me a moment to review all the posts. I missed one directly referring to GAMESS with a link to some kind of cookbook.

ghostdog74 · 08-24-2010, 11:05 AM

Quote:

Originally Posted by Feynman

Ok, I will rephrase that:
Which language has commands/functions that most naturally perform these tasks? For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,

Both have these functions; I have shown you how it's done with awk.

Quote:

then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.

Not true: awk parsing can be as fast as, if not faster than, Perl/Python/Ruby. And no, I am not disputing the fact that one can use Perl/Python for the job; what I don't agree with is the comment that the "underlanguage" should not be learned.

Sergei Steshenko · 08-24-2010, 11:06 AM

Quote:

Originally Posted by Feynman

Ok, I will rephrase that:
Which language has commands/functions that most naturally perform these tasks? For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,
then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.

I am reiterating what I've said - you seem to be asking the wrong question.

Though Perl can do anything 'awk' can do.

The correct questions emanate from the understanding of the whole data parsing and processing mission. My whole experience tells me that 'awk' is insufficient for this. Or, in other words, relying on tools of limited capability (like 'awk') perpetuates data mess.

Another issue to consider - there are more than 15000 Perl modules (8368 authors, 18244 modules) available at http://www.cpan.org/ -> http://search.cpan.org/ .

I.e. pretty much every standard programming task is already implemented in some kind of Perl module.

Feynman · 08-24-2010, 11:10 AM

Ok, those perl scripts are indeed the type of thing I am looking for. That is not to say that the previously mentioned awk scripts would not work too. I was going to put these scripts in separate files anyway so the user would be able to invoke them at his/her convenience. Therefore, there is nothing stopping me from writing one command perl_getafterstring, and another awk_getafterstring. I can test both--although I suspect the performance will vary from case to case and the average performance of each will be very close. I am guessing this will come down to personal preference and case by case problems.

Feynman · 08-24-2010, 11:11 AM

Thank you both very much for your input. I will try to put both in my software.

ghostdog74 · 08-24-2010, 11:18 AM

Quote:

Originally Posted by Sergei Steshenko

My whole experience tells me that 'awk' is insufficient for this. Or, in other words, relying on tools of limited capability (like 'awk') perpetuates data mess.

You obviously do not have enough experience with awk.
Please don't cloud the newbie mind with blatant lies. Awk is perfectly sufficient for what he is doing.

Sergei Steshenko · 08-24-2010, 11:20 AM

Quote:

Originally Posted by ghostdog74

please don't cloud the newbie mind with blatant lies. Awk is perfectly sufficient for what he is doing.

If you show me how to export data structures using 'awk' and then to import them back, then 'awk' might be sufficient. Otherwise 'awk' is DOA.

ghostdog74 · 08-24-2010, 11:27 AM

Quote:

Originally Posted by Sergei Steshenko

If you show me how to export data structures using 'awk' and then to import them back, then 'awk' might be sufficient. Otherwise 'awk' is DOA.

Why don't you show us how you solve his problem in Perl, and I will show you mine with awk. Then let him decide which one is simpler, more readable and works.

grail · 08-24-2010, 11:40 AM

Hi Feynman - try to ignore any bickering. May I ask if you are happy to progress on your own now or do you still require help?

I had a look at the file you attached. I am assuming this is only the input data? (I didn't read all of it just skimmed)

If you are still working on a solution that requires help, maybe using the data from this file you could give an example output that satisfies what you are looking for?

If it is not required anymore, good luck.

Sergei Steshenko · 08-24-2010, 11:42 AM

Quote:

Originally Posted by ghostdog74

why don't you show us how you solve his problem in Perl ...

I have no interest in reconsidering all the circumstances which led to the creation of Perl as a replacement for sh/sed/awk and the later transition from Perl 4 to Perl 5 with the introduction of references and hierarchical data structures.

Because for me the considerations are obvious.

I do not care that 'awk' can in some cases be faster than Perl because in the grand scheme of things (WRT data parsing and consequent data processing) it's not an issue.

Feynman · 08-24-2010, 12:02 PM

Thank you grail. Well, with the given awk commands, I have 4/5 tasks covered. My description of the unsolved task was admittedly vague so I rewrote it in an earlier post. I will work on/copy-paste from that cookbook site/get some help with perl programs that do the equivalent. I will try to have both available as separate commands for my program. I assume that if I am using more than one cpu and have the right software installed, these text searches will be automatically redistributed across my other cpus. I really do not have any experience in parallelization, but I do have access to more than one cpu.
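A note on the parallel assumption: as far as I know, a single awk or perl process normally runs on one cpu, so the searches usually have to be split up explicitly, for example one output file per worker. A minimal sketch of that idea follows, in Python rather than awk or perl; the function names and the one-file-per-task split are assumptions for illustration only.

```python
from concurrent.futures import ProcessPoolExecutor


def count_matches(path: str, needle: str) -> int:
    """Count the lines in one file that contain needle."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if needle in line)


def search_files(paths, needle, executor_cls=ProcessPoolExecutor, workers=None):
    """Search each file as a separate task and return {path: match count}."""
    paths = list(paths)
    with executor_cls(max_workers=workers) as pool:
        # One task per file; the pool decides which worker runs which task.
        counts = pool.map(count_matches, paths, [needle] * len(paths))
        return dict(zip(paths, counts))
```

When using the default process pool on a platform that starts workers by spawning a fresh interpreter, the call to search_files() should sit under an `if __name__ == "__main__":` guard. Whether this is faster than a single pass depends on file sizes and disk speed, so it would need to be measured.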

Sergei Steshenko · 08-24-2010, 02:11 PM

Quote:

Originally Posted by Feynman

Thank you grail. Well, with the given awk commands, I have 4/5 tasks covered. ...

So, then what is the output and what are you doing with it?