grep+awk+sed+paste+sort in one script?
Hi people, new user here. Hope I can get some help.
I have to extract data pairwise from two lines in one big (4MB+) text file. (The script is applied to a series of files, but the data always comes pairwise from the same file.) I did kind of solve the problem by using a csh script to call an awk script (which generates 4 temp files), but I'd like a more elegant way of doing it. What I have:
getroots.csh
root2pdb.awk
I can provide source and target data if it would help, but I didn't want to make an unnecessarily long post. Thanks! |
Providing data (before & after) would surely help. And chances
are it can be done with awk alone =) Cheers, Tink |
Thanks for the quick answer, Tinkster.
Source: (in *.dlg file)
Target:
ATOM 1 O UNK L 0 7.718 2.274 -6.002 1.00 -8.36 O
- the energies (e.g. -3.15 for ATOM 141) must match the coordinates for each run (that's the purpose of the entire exercise!)
- the atom I want to extract for each run is always the "root" (always ATOM 1 in each run)
- there are thousands of runs per file, resulting in one output line for each
- the coordinates I am interested in are always on a line that starts with DOCKED: ATOM
- I wish to sort the output by lowest energy
- the original "ranking" is completely random (just by run #; here I gave run 30 as an example)
- the output must follow exactly that format (including the number of blank spaces)
- I'd like this to work under Cygwin as well (just in case that's a limitation)
- bonus question: can I extract the run number as well (appended at the right end of each output line)?
Thanks, Martin |
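A minimal one-pass sketch of the pairing, without temp files. The real .dlg sample did not survive in this thread, so the line patterns are assumptions: within each run, a `DOCKED: MODEL <run>` line, then a `DOCKED: USER ... Free Energy of Binding = <energy> ...` line, then the root atom as `DOCKED: ATOM 1 ...`. The function name `extract_roots` is hypothetical. It keeps the current run and energy in awk variables and prints one tab-separated line per run: energy, run number, x, y, z.

```shell
#!/bin/sh
# Hypothetical sketch: pair each run's energy with its root atom in one awk
# pass, without temp files. The .dlg line patterns are assumptions:
#   DOCKED: MODEL        30
#   DOCKED: USER    Estimated Free Energy of Binding    =   -3.15 kcal/mol
#   DOCKED: ATOM      1  O   UNK L   0       7.718   2.274  -6.002 ...
# Output: one tab-separated line per run: energy, run, x, y, z.
extract_roots() {
    awk '
    BEGIN { OFS = "\t" }
    # A new run starts: remember its number, forget the previous energy.
    $1 == "DOCKED:" && $2 == "MODEL" { run = $3; energy = "" }
    # Keep the value that follows "=" on the energy line of this run.
    $1 == "DOCKED:" && /Free Energy of Binding/ {
        for (k = 1; k <= NF; k++) if ($k == "=") energy = $(k + 1)
    }
    # The root is always ATOM 1 (fields 8-10 are x, y, z when split on blanks).
    $1 == "DOCKED:" && $2 == "ATOM" && $3 == 1 { print energy, run, $8, $9, $10 }
    ' "$@"
}
```

Piping the result through `LC_ALL=C sort -k1,1n` puts the lowest energy first (`LC_ALL=C` keeps the decimal point handling independent of the locale). Splitting on blanks assumes the coordinates never run together; in fixed-column PDB-style records a value that needs all 8 characters (e.g. `-100.123`) would merge with its neighbour, in which case `substr()` on fixed columns is safer. Only `sh`, `awk` and `sort` are needed, all of which Cygwin provides.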
Hmmm ... can you provide a slightly larger sample (one that would produce more than one row of output)? Maybe 2 or 3 output lines? Also, those sections with (variable length) ... are they actually separated by blank lines, or is the data stream not blank-line separated? |
I have a rule I follow: if it's quick and the data is well-formed, use sed; if it's manipulating data for re-display, use awk; when it gets too complex or ugly with either of them, use perl.
So ... use perl ;) |
OK... guess I should have attached a larger text sample to begin with. Sorry, my bad.
I am attaching an (extensively truncated) example source file here. Note: please rename .txt to .dlg. The parts where I removed stuff are indicated by the line
>TRUNCATED HERE
ATOM 1 O UNK L 0 8.544 24.334 -2.603 1.00 -3.46 O
But ideally, I would like this (the last number corresponding to the run / MODEL #):
ATOM 1 O UNK L 0 8.544 24.334 -2.603 1.00 -3.46 O 256
The point is that I do know how to get the data I need using the scripts in the OP. I would just like to combine all this into one script, whether it's awk, csh or perl. |
Does this work for you? =o)
#!/bin/awk -f
Cheers, Tink |
Thanks for the script, Tink!
It works *almost* perfectly, except that the atoms are not serially numbered. Using the example file, with your script I get:
ATOM 10 O UNK L 0 8.544 24.334 -2.603 1.00 -3.46 O 256
ATOM 1 O UNK L 0 8.544 24.334 -2.603 1.00 -3.46 O 256
It seems like it doubles the run number (because you increase i twice?), but only up to a point... Can you (or anybody else on this forum! :-)) fix it? Thanks for your help! |
Uhm... anybody?
|
The problem here is that the numbering of column 2 needs to be applied after the sort on the last one ... which means we have to insert those numbers *after* the sort, so it will again become a two-step process, where the numeric IDs get *created* after the awk. I'll have a think about how to work this out. Sorry for the late response, I had a mini-holiday of 3.5 days =} |
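A sketch of that two-step idea, assuming a first awk pass has already emitted one `energy run x y z` line per run: sort numerically on the energy, then let a second awk build the ATOM record and use `NR` (the line number of the *sorted* stream) as the serial, so it cannot be thrown off by any counter in the first pass. The atom name, residue, chain, occupancy and element are hard-coded from the sample line in this thread, and the column widths follow the standard PDB ATOM layout (x in columns 31-38, the energy in the B-factor slot at 61-66); both are assumptions to check against the real target file. `number_sorted` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical sketch: sort first, number afterwards.
# Input on stdin: one line per run, "energy run x y z" (blank separated).
number_sorted() {
    LC_ALL=C sort -k1,1n |
    awk '{
        # NR is the line number of the sorted stream, so serials run 1, 2, 3 ...
        # Atom name, residue, chain, occupancy and element are copied from the
        # sample line in this thread (an assumption); the run number is appended.
        printf "ATOM  %5d  %-3s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f          %2s %s\n",
               NR, "O", "UNK", "L", 0, $3, $4, $5, 1.00, $1, "O", $2
    }'
}
```

Numbering in a separate stage after `sort` keeps the serial independent of how the records were collected, which is the point made in the post above; the cost is one extra process in the pipeline.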
Any progress with my problem yet?
Maybe something involving a counter mechanism for the new file (i.e. just counting the line numbers of the output) that is completely independent of the i++ increment already applied? PS: Thanks for keeping me updated, Tink. I really do appreciate your help. |
sort in the output inside the loop of the END processing ... I'd need to see whether I can use awk's built-in sort functions (asort or asorti) to do the sorting instead, and then do the increment count in there. The other (quick, but less elegant) way I see for now (but I'm kind of preoccupied with other stuff) is to put something bogus in the position of the counter, and then pipe that through a second awk that just replaces the bogus value with a padded line number ... Can you try to work with those (English) pseudo-instructions yourself? I guess it would only take me half an hour, but that's currently hard to dig up :) Cheers, Tink |
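The first option above (sort inside END, count while printing) can be sketched in a single awk. Instead of gawk's `asort`/`asorti` it uses a plain insertion sort, so it also runs under awks that lack those extensions; the price is O(n^2) comparisons, which is usually acceptable for a few thousand runs per file. The .dlg line patterns (`DOCKED: MODEL`, `Free Energy of Binding`, `DOCKED: ATOM 1`), the atom fields hard-coded from the sample line, and the name `roots_sorted` are all assumptions.

```shell
#!/bin/sh
# Hypothetical single-pass sketch: collect one record per run, sort by energy
# in END with an insertion sort (portable alternative to gawk's asort/asorti),
# and use the output position as the serial number.
roots_sorted() {
    awk '
    $1 == "DOCKED:" && $2 == "MODEL" { run = $3 }
    $1 == "DOCKED:" && /Free Energy of Binding/ {
        for (k = 1; k <= NF; k++) if ($k == "=") energy = $(k + 1)
    }
    $1 == "DOCKED:" && $2 == "ATOM" && $3 == 1 {
        n++
        e[n] = energy + 0              # numeric copy used as the sort key
        # Coordinates, occupancy and energy in PDB columns 31-66.
        line[n] = sprintf("%8.3f%8.3f%8.3f%6.2f%6.2f", $8, $9, $10, 1.00, energy)
        r[n] = run
    }
    END {
        # Insertion sort of the indices 1..n by ascending energy.
        for (i = 1; i <= n; i++) idx[i] = i
        for (i = 2; i <= n; i++) {
            k = idx[i]; j = i - 1
            while (j >= 1 && e[idx[j]] > e[k]) { idx[j + 1] = idx[j]; j-- }
            idx[j + 1] = k
        }
        # The serial is the output position, independent of any input counter.
        for (i = 1; i <= n; i++) {
            k = idx[i]
            printf "ATOM  %5d  %-3s %3s %1s%4d    %s          %2s %s\n",
                   i, "O", "UNK", "L", 0, line[k], "O", r[k]
        }
    }' "$@"
}
```

Compared with a separate `sort | awk` stage this keeps everything in one process, but all records are held in memory until END.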
Thanks Tink, I finally figured out something that works using your suggestions.
|
Glad to hear it's done, thanks for coming back with the feedback
and sorry I couldn't be of more assistance. Cheers, Tink |