[SOLVED] Methods for extracting data strings from output files

ghostdog74 · 08-24-2010, 10:16 AM

Quote:

Originally Posted by Sergei Steshenko

It's you who started using "holy grail" - I was talking about "overall optimization" WRT languages one invests his/her time in.

If your memory is failing you, may I please redirect you to post #7. You are the one who started it all by saying it's a waste of time learning an "underlanguage" (which, strangely, is still undefined till now). Then you mentioned one must go for Perl/Python/Ruby because it's "one language fits all". So isn't that your "holy grail" mentality taking effect? In my posts, I have never once said the OP definitely has to use awk. I just said the OP can use awk as well to solve his problem, and I showed him how.

And then there's my question, which you have consistently avoided. If Perl/Python/Ruby are one day going to be called "underlanguages", are you going to advise people not to learn them? Still no answer from you?
This answer will decide whether you are spouting crap or not.

In your last few posts, you mentioned about embedded systems and that "underlanguages" are only used in those systems. So now i ask you, is learning "underlanguages" that worthless now?

Feynman · 08-24-2010, 10:18 AM

Ignore this post. See my next post with the attachment

Sergei Steshenko · 08-24-2010, 10:20 AM

Quote:

Originally Posted by ghostdog74

... Then you mentioned one must go for Perl/Python/Ruby because it's "one language fits all". So isn't that your "holy grail" mentality taking effect?
...

No, it isn't one language fits all. One may still need C/C++/OCaml/AnotherFastLanguage.

In the category of tightly coupled text parsing and related data processing Perl/Python/Ruby are clear winners over 'awk'.

ghostdog74 · 08-24-2010, 10:29 AM

Quote:

Originally Posted by Sergei Steshenko

In the category of tightly coupled text parsing and related data processing Perl/Python/Ruby are clear winners over 'awk'.

Again, another baseless assumption. Show some proof of those "winners" regarding text parsing and I will believe you. Otherwise, stop spouting your nonsense. Note, I am not an awk advocate. I like Perl as much as you do, and I use Python whenever I need to. I am only refuting your baseless comment that one should not waste time learning awk (or other "underlanguages", as you defined them), because I do believe they are still needed in various other environments, like the embedded systems you mentioned.

Feynman · 08-24-2010, 10:30 AM

Here is a typical output of a quantum chemistry package (GAMESS in this case). This is actually going to be the subject of my first study.

Sergei Steshenko · 08-24-2010, 10:31 AM

Quote:

Originally Posted by ghostdog74

... and i will believe you. ...

I don't care actually.

But if you wanna think, think for starters about exporting data structures from 'awk' and importing them into 'awk'.

Sergei Steshenko · 08-24-2010, 10:33 AM

Quote:

Originally Posted by Feynman

Here is a typical output of a quantum chemistry package (GAMESS in this case). This is actually going to be the subject of my first study.

You will quite likely need something like this:

http://docstore.mik.ua/orelly/perl/cookbook/ch06_09.htm

GrapefruiTgirl · 08-24-2010, 10:33 AM

Hmm.. Why is this thread marked [SOLVED] - I didn't note any particular solution, and the OP is still providing information and sample files recently. Are the arguing parties still trying to help the OP here? Perhaps the debate should be pruned off to another thread, and assisting the OP can resume (assuming the thread is not actually SOLVED - is it?)

Feynman · 08-24-2010, 10:34 AM

Might I inquire which is easiest to learn?

Feynman · 08-24-2010, 10:35 AM

Sorry, I marked it as solved earlier--before awk was mentioned. I figured my question was too vague to be answered thoroughly.

Sergei Steshenko · 08-24-2010, 10:37 AM

Quote:

Originally Posted by Feynman

Might I inquire which is easiest to learn?

You might. But the true question is how to minimize:

easiest_to_learn * number_of_different_easiest_to_learn

product.

...

I looked at your data and it looks way too disorganized to me. I.e., my sensation is that the data is generated by quite a number of ad-hoc solutions with no clear architecture.

ghostdog74 · 08-24-2010, 10:38 AM

Quote:

Originally Posted by Sergei Steshenko

I don't care actually.

But if you wanna think, think for starters about exporting data structures from 'awk' and importing them into 'awk'.

First, you tell me: why do you need to do that? And how is it related to text parsing at all?

ghostdog74 · 08-24-2010, 10:41 AM

Quote:

Originally Posted by Feynman

Might I inquire which is easiest to learn?

easiest to learn in what sense? This is very vague. In terms of language syntax? In terms of number of libraries you can use? In terms of ?

ghostdog74 · 08-24-2010, 10:44 AM

Quote:

Originally Posted by GrapefruiTgirl

Are the arguing parties still trying to help the OP here?

I have done my part. See post #6. All the crap starts at post #7

Feynman · 08-24-2010, 10:51 AM

Funny you found that disorganized. This is pretty highly regarded software, and every quantum chemistry package I have used outputs data in this kind of way.

The .dat file (also produced from a calculation) is more condensed, but it is essentially a chunk of the log file. I figured I would sift through the log file by default just in case I want to find something that is not in the dat file.

Anyway, the key here is that there are landmarks in that gibberish. For instance, if I want the total energy of the molecule, I want the number immediately following the phrase "FINAL RHF ENERGY IS". And you will notice that the file is broken up into these chunks of data. Each has its own grammar/syntax (I am mostly self-taught, so forgive me if I use some terms incorrectly) and some unique landmark denoting where it starts, and if you look carefully, there is a "-------" that comes before and after each chunk of data. Looking for things like this would be a typical task in sifting through the results of a quantum chemistry package.
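To make the landmark idea concrete, here is a minimal Python sketch (not code posted by anyone in this thread) that returns the first number following a given phrase. The function name `value_after` and the sample text are made up for illustration; the exact spacing and wording in a real GAMESS log may differ, so treat the pattern as a starting point only.

```python
import re

# Signed decimal number, optionally with an exponent (e.g. -76.0107 or 1.5E-03).
NUMBER = r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?"

def value_after(text, landmark):
    """Return the first number that follows `landmark` in `text`, or None."""
    pattern = re.escape(landmark) + r"\s*(" + NUMBER + r")"
    match = re.search(pattern, text)
    return float(match.group(1)) if match else None

# Hypothetical sample, loosely modeled on a log line; not real program output.
sample = """
 some header text
 FINAL RHF ENERGY IS      -76.0107465155 AFTER  12 ITERATIONS
 more text
"""

energy = value_after(sample, "FINAL RHF ENERGY IS")
print(energy)
```

Passing the landmark as an argument, rather than hard-coding it, is what would let the same function be reused for other phrases and other packages.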

I wanted to keep the scripts general so they would work not just for GAMESS, but most standard quantum chemistry packages. They all do this chunking thing. At this point, it seems if I have scripts that can perform those five tasks (actually, only four of them are needed--the last one is just a generalization of the third one), I should be able to extract just about any portion of any output generated by these packages. There are probably exceptions I have not thought of, but this would be an excellent start.
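The "chunking" described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: a separator is taken to be a line made up entirely of at least five dashes, and the names `split_chunks` and `find_chunk` as well as the sample text are hypothetical. Real output files may frame their sections differently, so the separator pattern would likely need adjusting per package.

```python
import re

# Assumption: a separator is a line containing only dashes (five or more),
# ignoring leading and trailing whitespace.
SEPARATOR = re.compile(r"^\s*-{5,}\s*$")

def split_chunks(text):
    """Split `text` into chunks (lists of stripped, non-blank lines) at separator lines."""
    chunks, current = [], []
    for line in text.splitlines():
        if SEPARATOR.match(line):
            if current:              # close the chunk collected so far
                chunks.append(current)
            current = []
        elif line.strip():           # skip blank lines
            current.append(line.strip())
    if current:                      # trailing chunk without a closing separator
        chunks.append(current)
    return chunks

def find_chunk(chunks, landmark):
    """Return the first chunk with a line containing `landmark`, or None."""
    for chunk in chunks:
        if any(landmark in line for line in chunk):
            return chunk
    return None

# Hypothetical sample; not real program output.
sample = """
 -------
 ENERGY COMPONENTS
 TOTAL ENERGY = -76.01
 -------
 -------
 MULLIKEN CHARGES
 O -0.8
 H 0.4
 -------
"""

chunks = split_chunks(sample)
charges = find_chunk(chunks, "MULLIKEN CHARGES")
print(len(chunks), charges)
```

Each chunk could then be handed to a small parser specific to that section, since each section has its own layout.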