LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Programming (https://www.linuxquestions.org/questions/programming-9/)
-   -   Methods for extracting data strings from output files (https://www.linuxquestions.org/questions/programming-9/methods-for-extracting-data-strings-from-output-files-828088/)

Feynman 08-23-2010 07:29 PM

Methods for extracting data strings from output files
 
I am trying to develop a method of reading files generated by other programs, and I am trying to find the most versatile approach. I have been trying bash and have been making good progress with sed; however, I was wondering whether there is a "standard" approach to this sort of thing.

The main features I would like to implement concern finding strings based on various forms of context and storing them to variables and/or arrays. Here are the most general tasks:
a) Read the first word (or floating-point number) that comes after a given string (solved in another thread)
b) Read the nth line after a given string
c) Read all text between two given strings
d) Save the output of task a), task b) or task c) (above) into an array if the "given string(s)" is/are not unique.
e) Read text between two non-unique strings, i.e. text between the nth occurrence of string1 and the mth occurrence of string2

As far as I can tell, scripts covering those five tasks should be able to parse just about any text pattern.

Does anyone have any suggestions for approaches (perl, sed, bash, etc.)? I am by no means fluent in these languages, but I could use a starting point. My main concern is speed. I intend to use these scripts in a program that reads and writes hundreds of input and output files--each with a different value of some parameter(s). The files will most likely be no more than a few dozen lines, but I can think of some applications that could generate a few hundred lines. I have the input file generator down pretty well. Parsing the output is quite a bit trickier.

And, of course, the option for parallelization will be very desirable for many practical applications.

And if anyone cares to take a crack at writing a script that performs these tasks, please share!

wje_lq 08-23-2010 08:14 PM

Quote:

Originally Posted by Feynman (Post 4075563)
I am trying to develop a method of reading files generated by other programs. I am trying to find the most versatile approach.

Oh. I thought your main concern was speed. Oh, wait, it is:
Quote:

Originally Posted by Feynman (Post 4075563)
My main concern is speed. I intend to use these scripts in a program that reads and writes hundreds of input and output files--each with a different value of some parameter(s). The files will most likely be no more than a few dozen lines, but I can think of some applications that could generate a few hundred lines.

If most of your files are small, and your main concern is speed, it could very well end up that your main speed bottleneck is the time it takes to open each file. If this is the case, then it doesn't matter what language or approach you use.

The second most probable speed bottleneck is the time it takes to parse each file. If your main concern is speed, it might well be that you want to use lex and yacc (or, on Linux systems, flex and bison). To use these, you will be learning C or C++.

Learning C or C++ probably represents a development effort that is an order of magnitude higher than you want to invest. If that's the case, perhaps your main concern, at least at the beginning, is not speed, but effort required to develop the software.
Quote:

Originally Posted by Feynman (Post 4075563)
And if anyone cares to take a crack at writing a script that preforms these tasks please share!

"Anyone", in this case, would be Feynman. We normally don't do that sort of stuff around here. Show us some code, tell us precisely why it's not working, and we'll probably be glad to comment.

Hope this helps.

Sergei Steshenko 08-23-2010 08:15 PM

Quote:

Originally Posted by Feynman (Post 4075563)
...
As far as I can tell, those five scripts should be able to parse just about any text pattern.

Does anyone have any suggestions for approaches (perl, sed, bash, etc)?
...

There is no such thing as being "able to parse just about any text pattern" - you have to define and implement your input language. So it may be better to do some standardization of the output formats.

Yes, I recommend Perl.

Wherever I could define the standards, I made my programs output data as Perl hierarchical data structures, so no parsing was necessary in the first place; Perl itself served as the parser.

Feynman 08-23-2010 08:45 PM

I guess I'll give Perl a shot. I will post a new thread if/when I hit a roadblock. Thanks for the long responses to my admittedly vague questions!

Sergei Steshenko 08-23-2010 08:51 PM

Perl's regular-expression engine is quite well optimized, so you can expect good speed.

ghostdog74 08-23-2010 09:17 PM

Quote:

Originally Posted by Feynman (Post 4075613)
I guess I'll give Perl a shot.

You can do all five tasks you mention with (g)awk as well. See my sig to learn about gawk.

Quote:

a) Read the first word(or floating point) that comes after a given string (solved in another thread)
Code:

awk '/pattern/{f=1}f{for(i=1;i<=NF;i++) {if($i ~/[0-9]+\.[0-9]+/) print $i} }' file
Quote:

b) Read the nth line after a given string
Code:

awk 'c&&c--;/pattern/{c=2}' file
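
A note on that idiom, as I read it: with c=2 it prints both of the next two lines after the match; flipping the test to `c&&!--c` prints only the nth line. A quick sketch (the file name, marker and sample lines are made up for the demo):

```shell
#!/bin/sh
# The c&&c-- one-liner prints every line up to the 2nd after the match;
# the c&&!--c variant prints only the nth line. Sample input is invented.
cat > /tmp/demo_after.txt <<'EOF'
header
MARKER
first
second
third
EOF

# Print only the 2nd line after MARKER:
awk 'c&&!--c;/MARKER/{c=2}' /tmp/demo_after.txt
```

With the original `c&&c--` form, the same input prints both `first` and `second`.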
Quote:

c) Read all text between two given strings
Code:

awk -vRS="string2" '/string1/{gsub(/.*string1/,"");print}' file
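
For what it's worth, here is how that record-separator trick behaves on a small made-up file (START/STOP stand in for string1/string2; note that a multi-character RS is a gawk/mawk extension - strict POSIX awk uses only the first character of RS):

```shell
#!/bin/sh
# Demo of the RS trick above: set the record separator to the end string,
# then strip everything up to the start string. Sample data is invented.
cat > /tmp/between.txt <<'EOF'
junk before START first kept line
second kept line STOP junk after
EOF

awk -v RS="STOP" '/START/{gsub(/.*START/, ""); print}' /tmp/between.txt
```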
Quote:

d) Save the output of task a), task b) or task c) (above) into an array if the "given string(s)" is/are not unique.
I don't understand this one, but I am 100% sure it is easy to do as well.

Quote:

e)Read text between two non-unique strings i.e. text between the nth occurrence of string1 and the mth occurrence of string2
One way:
Code:

awk 'FNR==NR{
  for(i=1;i<=NF;i++) {
    if($i == "string1"){
        tm++
        if( tm==3 ){ linetm=FNR} #3rd occurrence
    };
    if($i =="string2") { 
      sm++
      if( sm==2) { linesm=FNR} #2nd occurrence
    }
  }
  next
}
FNR > linesm && FNR < linetm{
  print
} ' file file

not perfect, but you get the drift. And don't worry about awk's speed. It can be as fast as, and sometimes faster than, Perl/Python.
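
For e), a single-pass alternative sketch (my own variant, not the two-pass version above; it counts lines containing the strings rather than word occurrences, and n, m and the sample file are chosen just for the demo):

```shell
#!/bin/sh
# Print the text after the nth line containing string1 and before the mth
# line containing string2, in one pass. Sample input is invented.
cat > /tmp/occ.txt <<'EOF'
x
string1
y
string1
between one
between two
string2
z
EOF

awk -v n=2 -v m=1 '
  /string1/ && ++a == n { p = 1; next }   # start printing after the nth string1
  /string2/ && ++b == m { p = 0 }         # stop at the mth string2
  p
' /tmp/occ.txt
```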

Sergei Steshenko 08-23-2010 09:25 PM

Quote:

Originally Posted by ghostdog74 (Post 4075640)
you can do all those 5 tasks you mention with (g)awk as well. See my sig to learn about gawk

Still, 'awk' is an "underlanguage".

Sergei Steshenko 08-23-2010 09:27 PM

Quote:

Originally Posted by Feynman (Post 4075613)
I guess I'll give Perl a shot. ...

http://perldoc.perl.org/

ghostdog74 08-23-2010 09:43 PM

Quote:

Originally Posted by Sergei Steshenko (Post 4075648)
Still, 'awk' is an "underlanguage".

awk is the grandfather of Perl, and for his purpose, i.e. parsing files, awk is enough, and sometimes even faster than Perl. You are hereby advised to look at awk.info, especially this

Sergei Steshenko 08-23-2010 09:52 PM

Quote:

Originally Posted by ghostdog74 (Post 4075658)
awk is the grandfather of Perl, and for his purpose, i.e. parsing files, awk is enough, and sometimes even faster than Perl. You are hereby advised to look at awk.info, especially this

I know that 'awk' is the grandfather of Perl. Languages like Perl/Python/Ruby were invented to avoid having to deal with the (un)holy sh/sed/awk trinity.

ghostdog74 08-23-2010 09:59 PM

Quote:

Originally Posted by Sergei Steshenko (Post 4075660)
I know that 'awk' is the grandfather of Perl. Languages like Perl/Python/Ruby were invented to avoid having to deal with the (un)holy sh/sed/awk trinity.

Back up your point about the (un)holy sh/sed/awk trinity with facts and figures pertaining to the OP's question. In other words, tell us why you think awk is not recommended (by you) in this case. Otherwise, your comment does not hold any weight.

Sergei Steshenko 08-23-2010 10:12 PM

Quote:

Originally Posted by ghostdog74 (Post 4075666)
...with ... figures as pertaining to OP's question. ...

One language instead of many.

ghostdog74 08-23-2010 10:22 PM

Quote:

Originally Posted by Sergei Steshenko (Post 4075673)
One language instead of many.

Wrong. Awk is a programming language. There is no need to use other tools to parse files; awk alone is enough. So where are the "many languages" you are talking about? The point is this: you can get the job done with awk, Perl/Python/Ruby, whatever, for his case. Your argument that awk is an "underlanguage" and not suitable for the job, because you think that only Perl/Python/Ruby can do the job, is flawed and weak. I have shown that awk can also do it in my reply to the OP's particular tasks.

Sergei Steshenko 08-23-2010 10:32 PM

Quote:

Originally Posted by ghostdog74 (Post 4075677)
Wrong. Awk is a programming language. There is no need to use other tools to parse files; awk alone is enough. So where are the "many languages" you are talking about? The point is this: you can get the job done with awk, Perl/Python/Ruby, whatever, for his case. Your argument that awk is an "underlanguage" and not suitable for the job, because you think that only Perl/Python/Ruby can do the job, is flawed and weak. I have shown that awk can also do it in my reply to the OP's particular tasks.

I know that dealing with massive amounts of scientific data will reveal needs other than parsing.

ghostdog74 08-23-2010 10:50 PM

Quote:

Originally Posted by Sergei Steshenko (Post 4075685)
I know that dealing with massive amounts of scientific data will reveal needs other than parsing.

So do you think awk has no capability to handle scientific data? If you can back up this comment with some facts/examples, I will believe what you say. Other than that, you are just reading too many assumptions into the original problem (question) and saying something that has no concrete proof (that awk is not suitable for his tasks - yes, read the key words: his tasks).

Sergei Steshenko 08-23-2010 11:15 PM

Quote:

Originally Posted by ghostdog74 (Post 4075695)
So do you think awk has no capability to handle scientific data? If you can back up this comment with some facts/examples, I will believe what you say. Other than that, you are just reading too many assumptions into the original problem (question) and saying something that has no concrete proof (that awk is not suitable for his tasks - yes, read the key words: his tasks).

I am saying what I've said: 'awk' is an "underlanguage". I haven't said it's not suitable for text parsing. I am saying it is not worth learning in the grand scheme of things.

Feynman 08-23-2010 11:15 PM

WOW! Thank you so much! I cannot blame you for not fully understanding task d. I can give a simple but general example for the task a) implementation:
text file reads:

blah blah
blah add this word to the list: 1234.56 blah blah
blah blah
blah now don't forget to add this word to the list: PINAPPLE blah blah
And for bonus points,
it would be nice to know that the script
would be able to add this word to the list: 1!@#$%^&*()[]{};:'",<.>/?asdf blah blah
blah blah

As the file implies, save words that come after "add this word to the list:" to a list.

Thanks again.
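
A sketch of task d) against exactly this sample is below (GNU awk or any POSIX-ish awk assumed; the file names and the `list` array name are my own choices for the demo):

```shell
#!/bin/sh
# Collect every word that follows the phrase "add this word to the list:".
# The sample file reproduces the example data from the post above.
cat > /tmp/sample.txt <<'EOF'
blah blah
blah add this word to the list: 1234.56 blah blah
blah blah
blah now don't forget to add this word to the list: PINAPPLE blah blah
would be able to add this word to the list: 1!@#$%^&*()[]{};:'",<.>/?asdf blah blah
blah blah
EOF

awk '{
  s = $0
  while (match(s, /add this word to the list:[ \t]*/)) {
    s = substr(s, RSTART + RLENGTH)   # drop text through the phrase
    w = s
    sub(/[ \t].*/, "", w)             # keep only the next word
    list[++k] = w                     # store in an array...
    print w                           # ...and show it
  }
}' /tmp/sample.txt > /tmp/words.out

cat /tmp/words.out
```

The `while` loop handles more than one occurrence of the phrase on a single line, so the results naturally accumulate into an array even when the "given string" is not unique.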

ghostdog74 08-24-2010 12:00 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4075715)
I am saying what I've said: 'awk' is an "underlanguage". I haven't said it's not suitable for text parsing. I am saying it is not worth learning in the grand scheme of things.

In your first reply to my post, you did say that "awk is an underlanguage", but you did not say that it's not suitable for parsing. That gives the impression that you meant awk is not suitable for the task.

So what's the definition of "grand scheme of things"? Are you saying that whenever one has a task to solve, he has to turn to Python/Perl/Ruby for the solution? Or what? I am not disputing your notion of turning to these three well-known languages for solving problems, but more often than not, the "grand scheme of things" depends on the environment and what tools one has at his disposal. Note: I did say that you can do it with awk as well (the key words are "as well"), but I did not say awk is the only thing that can solve the problem.

Sergei Steshenko 08-24-2010 12:15 AM

Quote:

Originally Posted by ghostdog74 (Post 4075742)
...
So what's the definition of "grand scheme of things" ? Are you saying that whenever one has a task to solve, he has to turn to Python/Perl/Ruby for the solution?
...

Essentially yes. The grand scheme of things is that one shouldn't spend his/her time learning underlanguages.

ghostdog74 08-24-2010 12:29 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4075755)
Essentially yes. The grand scheme of things is that one shouldn't spend his/her time learning underlanguages.

By that argument, you are suggesting that people should not learn DOS batch, VBScript etc., right? So if one day a better language comes about and takes over from Perl/Python/Ruby (remember, languages do evolve), and Perl/Python/Ruby become the "underlanguages", are you going to change your point of view? In that case, should we not spend time learning Perl/Python/Ruby because they are now "underlanguages"?

You should stop imprinting that kind of "holy grail" thinking onto other people unaware of what's going on.

By the way, I am curious: how do you actually measure and categorize "underlanguages"? It appears to me that it's a scientific and proven technique.

Sergei Steshenko 08-24-2010 07:44 AM

Quote:

Originally Posted by ghostdog74 (Post 4075766)
By that argument, you are suggesting that people should not learn DOS batch, VBScript etc., right? So if one day a better language comes about and takes over from Perl/Python/Ruby (remember, languages do evolve), and Perl/Python/Ruby become the "underlanguages", are you going to change your point of view? In that case, should we not spend time learning Perl/Python/Ruby because they are now "underlanguages"?

You should stop imprinting that kind of "holy grail" thinking onto other people unaware of what's going on.

By the way, I am curious: how do you actually measure and categorize "underlanguages"? It appears to me that it's a scientific and proven technique.

My technique is purely subjective: if, after making an overview of the easily available languages on a platform, I come to the conclusion that there are both underlanguages and normal languages, I choose the latter.

And yes, Perl/Python/Ruby can become underlanguages.

DOS batch language is definitely an underlanguage, and even though many, many years ago I knew it somewhat, now I wouldn't consider learning it. For example, for Windows there is portable "Strawberry Perl", so if I need to do massive scripting under Windows, I'll use that Perl instead of the DOS batch language.

ghostdog74 08-24-2010 08:33 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4076079)
My technique is purely subjective: if, after making an overview of the easily available languages on a platform, I come to the conclusion that there are both underlanguages and normal languages, I choose the latter.

Since it's subjective, it will apply to anyone else as well. Sometimes one doesn't need "normal languages". You have also not defined how you "measure" "underlanguages", whatever that means.

Quote:

And yes, Perl/Python/Ruby can become underlanguages.
So, what's your conclusion? Will you advise people, once Perl/Python/Ruby have become "underlanguages", that they should not waste their time learning them?


Quote:

DOS batch language is definitely an underlanguage, and even though many many years ago I knew it somewhat, now I wouldn't consider learning it.
Yes, but that doesn't mean it doesn't have any uses, right? In situations where you can't install anything on a Win32 machine, one has to use what's available.

Coming back to the main point of the argument: you mentioned awk is an "underlanguage", and later you mentioned you did not say it's not suitable for parsing. I take it that you agree awk can do the job for this task (even though it falls under YOUR definition of "underlanguage"). So we can stop this useless argument already, right?

Sergei Steshenko 08-24-2010 08:54 AM

Quote:

Originally Posted by ghostdog74 (Post 4076115)
Since it's subjective, it will apply to anyone else as well. Sometimes one doesn't need "normal languages". You have also not defined how you "measure" "underlanguages", whatever that means.


So, what's your conclusion? Will you advise people, once Perl/Python/Ruby have become "underlanguages", that they should not waste their time learning them?



Yes, but that doesn't mean it doesn't have any uses, right? In situations where you can't install anything on a Win32 machine, one has to use what's available.

Coming back to the main point of the argument: you mentioned awk is an "underlanguage", and later you mentioned you did not say it's not suitable for parsing. I take it that you agree awk can do the job for this task (even though it falls under YOUR definition of "underlanguage"). So we can stop this useless argument already, right?

The argument is that it is senseless to learn 'awk' with massive scientific data on the horizon. And in general it is senseless to learn a sea of underlanguages.

The only place for underlanguages is systems with limited resources, like tiny embedded ones - not the case here.

ghostdog74 08-24-2010 09:17 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4076139)
The argument is that it is senseless to learn 'awk' in case of massive scientific data on the horizon.

Baseless assumptions from one isolated case. You have not provided data to back up your claim that awk is not able to do massive scientific tasks. Again, you have assumed that the OP has massive scientific data to process. Also, you keep avoiding my question about Perl/Python/Ruby being called "underlanguages" in the future. Is it senseless to learn them if they ever become "underlanguages"? All your comments up till now are crap if you can't answer that truthfully.

Quote:

And in general it is senseless to learn a sea of underlanguages.
So you agree, and would advise people not to use Perl/Python/Ruby if they ever become underlanguages? Correct? What's your definition of an "underlanguage"? You have not told us that either.

Quote:

The only place for underlanguages is systems with limited resources, like tiny embedded ones - not the case here.
It is also not the case here that "underlanguages" like awk, AS YOU DEFINED IT, cannot do the job the OP asked about. So for this example, do you still think it's useless to learn "underlanguages"?

grail 08-24-2010 09:26 AM

@ Sergei & ghostdog - guys, I realise that you both believe passionately in what you have to say, but it seems that although loosely based on
this question, you are more arguing with each other than helping the OP. Far be it from me to complain against either of you, as I respect both of
you in your given strengths and always read the solutions that both of you post.

Please let us just present the solutions we feel will work and then as with all things on LQ let the OP decide which option they prefer to follow :)
If they are clever, they will give both the due merit as I know this is how I have been learning whilst participating in the forum.

Cheers
Grail

grail 08-24-2010 09:30 AM

Feynman - I know you provided in the first post the things you would like to achieve, and in post #17 you provided some data. Perhaps you could show, using the data provided, what your output for each and/or all steps would be?

ghostdog74 08-24-2010 09:35 AM

Quote:

Originally Posted by grail (Post 4076175)
Please let us just present the solutions we feel will work and then as with all things on LQ let the OP decide which option they prefer to follow :)
Cheers
Grail

Please note that I have already presented my solutions. The tasks can be solved using awk as well. His solution is just to look for the "holy grail", which is non-existent.

Sergei Steshenko 08-24-2010 10:02 AM

Quote:

Originally Posted by ghostdog74 (Post 4076185)
Please note that I have already presented my solutions. The tasks can be solved using awk as well. His solution is just to look for the "holy grail", which is non-existent.

It's you who started using "holy grail" - I was talking about "overall optimization" WRT languages one invests his/her time in.

Feynman 08-24-2010 10:12 AM

Well, I do not know how to attach files. Please tell me how. In any case, I do not have very large files at this point. Actually, I was hoping in part I could use these scripts to feed the output of smaller files into the input of other programs--so each output would contain more information to sift through.

[Bit of background here]
For my purposes, I might calculate the properties of a few small molecules in parallel, have the scripts grab some portion of the data (which would be easily identifiable based on the structure of the output file the chemistry program generates) and concatenate it into a new input file that asks for information about how they would interact. Automating this process would be wonderfully useful. I suspect "professionals" already have these scripts at hand, along with a strong knowledge of whatever language they were written in, but I am still an undergrad and have much to learn about my computational resources. I was hoping to put the final product on a website for free download and GNU usage. I suspect others like me will find it quite useful.

Anyway, I can certainly copy and paste some example input files I was starting out with (these came as tests for one of the chemistry packages I am working with). Give me a second to boot up my virtual Debian. I will post it in the next reply.

Sergei Steshenko 08-24-2010 10:14 AM

Quote:

Originally Posted by Feynman (Post 4076210)
Well, I do not know how to attach files. Please tell me how. ...

When you press the "Quote" button answering a post, in the lower left part of your browser screen a "Manage Attachments" button should appear.

ghostdog74 08-24-2010 10:16 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4076204)
It's you who started using "holy grail" - I was talking about "overall optimization" WRT languages one invests his/her time in.

If your memory is failing you, may I please redirect you to post #7. You are the one who started it all by saying it's a waste of time learning an "underlanguage" (which, strangely, is still undefined). Then you mentioned one must go for Perl/Python/Ruby because it's "one language fits all". So isn't that your "holy grail" mentality taking effect? In my posts, I have never once said the OP definitely has to use awk. I just said the OP can use awk as well to solve his problem, which I did show him how to do.

And then there's my question, which you have consistently avoided: if Perl/Python/Ruby are one day going to be called "underlanguages", are you going to advise people not to learn them? Still no answer from you?
This answer will decide whether you are spouting crap or not.

In your last few posts, you mentioned embedded systems and said that "underlanguages" are only used in those systems. So now I ask you: is learning "underlanguages" that worthless now?

Feynman 08-24-2010 10:18 AM

Ignore this post. See my next post with the attachment

Sergei Steshenko 08-24-2010 10:20 AM

Quote:

Originally Posted by ghostdog74 (Post 4076214)
... Then you mentioned one must go for Perl/Python/Ruby because it's "one language fits all". So isn't that your "holy grail" mentality taking effect?
...

No, it isn't one language fits all. One may still need C/C++/OCaml/AnotherFastLanguage.

In the category of tightly coupled text parsing and related data processing Perl/Python/Ruby are clear winners over 'awk'.

ghostdog74 08-24-2010 10:29 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4076217)
In the category of tightly coupled text parsing and related data processing Perl/Python/Ruby are clear winners over 'awk'.

Again, another baseless assumption. Show some proof of those "winners" regarding text parsing and I will believe you. Otherwise, stop spouting your nonsense. Note, I am not an awk advocate. I like Perl as much as you do, and I use Python whenever I need to. I am only refuting your baseless comment that one should not waste time learning awk (or other underlanguages, as you defined them), because I do believe they are still needed in various other environments, like the embedded systems you mentioned.

Feynman 08-24-2010 10:30 AM

1 Attachment(s)
Here is a typical output of a quantum chemistry package (GAMESS in this case). This is actually going to be the subject of my first study.

Sergei Steshenko 08-24-2010 10:31 AM

Quote:

Originally Posted by ghostdog74 (Post 4076225)
... and I will believe you. ...

I don't care actually.

But if you wanna think, think for starters about exporting data structures from 'awk' and importing them into 'awk'.

Sergei Steshenko 08-24-2010 10:33 AM

Quote:

Originally Posted by Feynman (Post 4076226)
Here is a typical output of a quantum chemistry package (GAMESS in this case). This is actually going to be the subject of my first study.

You will quite likely need something like this:

http://docstore.mik.ua/orelly/perl/cookbook/ch06_09.htm

GrapefruiTgirl 08-24-2010 10:33 AM

Hmm.. Why is this thread marked [SOLVED] - I didn't note any particular solution, and the OP is still providing information and sample files recently. Are the arguing parties still trying to help the OP here? Perhaps the debate should be pruned off to another thread, and assisting the OP can resume (assuming the thread is not actually SOLVED - is it?)

Feynman 08-24-2010 10:34 AM

Might I inquire which is easiest to learn?

Feynman 08-24-2010 10:35 AM

Sorry, I marked it as solved earlier--before awk was mentioned. I figured my question was too vague to be answered thoroughly.

Sergei Steshenko 08-24-2010 10:37 AM

Quote:

Originally Posted by Feynman (Post 4076235)
Might I inquire which is easiest to learn?

You might :). But the true question is how to minimize the product:

easiest_to_learn * number_of_different_easiest_to_learn

...

I looked at your data and it looks way too disorganized to me. I.e. my impression is that the data is generated by quite a number of ad-hoc solutions with no clear architecture.

ghostdog74 08-24-2010 10:38 AM

Quote:

Originally Posted by Sergei Steshenko (Post 4076228)
I don't care actually.

But if you wanna think, think for starters about exporting data structures from 'awk' and importing them into 'awk'.

First, tell me why you need to do that. And why is it related to text parsing at all?

ghostdog74 08-24-2010 10:41 AM

Quote:

Originally Posted by Feynman (Post 4076235)
Might I inquire which is easiest to learn?

Easiest to learn in what sense? This is very vague. In terms of language syntax? In terms of the number of libraries you can use? In terms of what?

ghostdog74 08-24-2010 10:44 AM

Quote:

Originally Posted by GrapefruiTgirl (Post 4076234)
Are the arguing parties still trying to help the OP here?

I have done my part. See post #6. All the crap starts at post #7.

Feynman 08-24-2010 10:51 AM

Funny you found that disorganized. This is pretty highly regarded software, and every quantum chemistry package I have used outputs data in this type of way.

The .dat file (also produced from a calculation) is more condensed, but it is essentially a chunk of the log file. I figured I would sift through the log file by default just in case I want to find something that is not in the dat file.

Anyway, the key here is that there are landmarks in that gibberish. For instance, if I want the total energy of the molecule, I want the number immediately following the phrase "FINAL RHF ENERGY IS". And you will notice that the file is broken up into these chunks of data. Each has its own grammar/syntax (I am mostly self-taught, so forgive me if I use some terms incorrectly) and some unique landmark denoting where it starts, and if you look carefully, there is a "-------" that comes before and after each chunk of data. Looking for things like this would be a typical task in sifting through the results of a quantum chemistry package.

I wanted to keep the scripts general so they would work not just for GAMESS, but for most standard quantum chemistry packages. They all do this chunking thing. At this point, it seems that if I have scripts that can perform those five tasks (actually, only four of them are needed--the last one is just a generalization of the third), I should be able to extract just about any portion of any output generated by these packages. There are probably exceptions I have not thought of, but this would be an excellent start.
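
As a concrete sketch of the "landmark" idea, here is one way to grab the number after "FINAL RHF ENERGY IS" (the log content below is a fabricated stand-in shaped like a GAMESS line, not taken from a real run):

```shell
#!/bin/sh
# Pull the number immediately following "FINAL RHF ENERGY IS" from a log.
# The sample log is invented to match the phrase described in the post.
cat > /tmp/gamess.log <<'EOF'
 SOME OTHER SECTION
 FINAL RHF ENERGY IS      -76.0098170639 AFTER  12 ITERATIONS
 MORE OUTPUT FOLLOWS
EOF

awk '/FINAL RHF ENERGY IS/ {
  for (i = 1; i <= NF; i++)
    if ($i == "IS") { print $(i + 1); exit }
}' /tmp/gamess.log
```

Scanning fields rather than fixed columns keeps the script robust against the variable whitespace these packages tend to emit.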

Sergei Steshenko 08-24-2010 10:52 AM

Quote:

Originally Posted by ghostdog74 (Post 4076240)
first you tell me why do you need to do that? And why is it related to text parsing at all?

Because this is how normal programming is done. I.e. input files are parsed, and the result of parsing is a data structure.

Then processing is performed on the data structure.

For modularity/extensibility data structures are exported and imported by next consumers in the data processing chain.

I've dealt with huge amounts of data - be it VLSI design, static timing analysis, VLSI verification, ASIC standard-library cell characterization, acoustic modeling, whatever - the approach with data structures always works and is the textbook approach.

Sergei Steshenko 08-24-2010 10:57 AM

Quote:

Originally Posted by Feynman (Post 4076249)
Funny you found that disorganized. This is a pretty highly regarded software ...

Windows95/98 was also once considered highly regarded SW.

For SW to be good one needs competition - as everywhere else. I do not think quantum chemistry SW is widely used, so I do not expect competition in the field.

There are well known and highly regarded data formats/approaches used in scientific calculations, for example, HDF: http://www.hdfgroup.org/ .

Feynman 08-24-2010 11:00 AM

Ok, I will rephrase that "easiest to learn" comment.
Which language has commands/functions that are most naturally implemented to perform these tasks? For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,
then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.

Feynman 08-24-2010 11:03 AM

I am having trouble keeping up. Give me a moment to review all the posts. I missed one directly referring to GAMESS with a link to some kind of cookbook.

ghostdog74 08-24-2010 11:05 AM

Quote:

Originally Posted by Feynman (Post 4076255)
Ok, I will rephrase that:
Which language has commands/functions that are most naturally implemented to perform these tasks. For example:
If awk has a find_the_first_word_after_this_string("Insert string here") command, or
If perl has a grab_text_between_these_two_strings("string1", "string2") command,

Both can do these things; I have shown you how it's done with awk.

Quote:

then it is quite easy to decide which language is best suited for which task. I am ignoring performance because it seems that no consensus is coming any time soon regarding that. In any case, the fact that two senior members cannot reach a consensus about it means to me that awk and perl have only marginal differences in performance. Hence I place my main priority on implementation.
Not true: awk parsing can be as fast as, if not faster than, Perl/Python/Ruby. And no, I am not disputing the fact that one can use Perl/Python for the job; what I don't agree with is the comment that an "underlanguage" should not be learned.

