LinuxQuestions.org


mzh 04-04-2011 06:03 PM

cat from between keywords
 
Dear all, I'm new to the forum and hope to have some good discussions here in the future.
I have a question.
Say I have a bunch of files with content like this:
Code:

more data
H            1.    4.2601909824E-06    1.1192576765E-05    2.6002448261E-05
H            1.    1.9632681605E-06    1.8570567186E-05    2.1741537960E-05
H            1.  -2.3238557216E-05    1.1226266025E-05  -2.3988312764E-05
H            1.  -5.1073885199E-06    1.9570853306E-06  -1.6892426793E-05
H            1.    1.3239551190E-06    1.3856376369E-06    3.9292096947E-06
 $END
 $HESS
ENERGY IS      -68.5097726137 E(NUC) IS      315.1972078453
 1  1 3.38518644E-01-2.59345972E-02-2.20431415E-01 4.35677008E-02-1.73975464E-02
 1  2 6.60042430E-02 7.22497090E-03 3.95542762E-03-2.04969589E-02-1.45090031E-02
 1  3-7.56719858E-03-1.27150433E-02-2.50241943E-02 4.61303420E-02 2.11752109E-02
 1  4-3.58623392E-02-1.39002233E-02-1.77647824E-02-1.57901109E-03-1.60179364E-03
 1  5-2.15259537E-03-1.43752716E-03-1.31201163E-03-4.27208217E-03 4.76034162E-02
 1  6 2.24833277E-03-1.11268353E-02-9.74181047E-02-2.91626903E-02 8.74736584E-03
 1  7-2.15400285E-02-3.40971482E-02 1.33311058E-02-7.77108704E-03-2.23749856E-03
 1  8 1.63696613E-03-3.47066683E-03-4.05719853E-04-2.31927657E-03-2.53871570E-03
 1  9 7.21058903E-06-6.35504509E-04-5.88477013E-04-8.31049017E-05 3.06961569E-04
 1 10 5.15566010E-05-2.76245027E-04 1.04519661E-04-1.61330769E-04-2.85330037E-04
 1 11 4.76965767E-04 1.65799220E-04-4.95788049E-04 4.01372143E-04-8.33988757E-05
 1 12 3.52109490E-04-9.55040014E-05-1.80314085E-03-5.89478903E-03 3.75899269E-03
 1 13 5.33075211E-04-5.54100430E-05 6.06786318E-05-7.18074009E-04 4.05424301E-03
 1 14 3.88196151E-03 1.57004496E-04 3.12619608E-04 1.74230814E-04-1.23218799E-01
 1 15 1.15067982E-01 4.50432329E-02-9.90256847E-02-3.18111872E-02 1.22994763E-01
 1 16-7.57634741E-04 1.22495204E-03 4.19373998E-03
 $END
more data

What I need is a way to extract exactly the data between (and including) the $HESS keyword and the next $END keyword (and then cat it to another file, but I think I know how to do that). The $HESS and $END keywords each sit on a line of their own, but at different line numbers in each file ($HESS appears only once per file), so I need a way to locate the keywords themselves. I know about grep -A, but I don't know how many lines of context to give it to reach the next $END keyword.
I hope my question is clear enough; any help on this would be very much appreciated.
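To answer the grep -A part of the question directly, one option is to compute how far the next $END lies from $HESS and then print exactly that many lines. This is a minimal sketch, assuming $HESS occurs once per file; the function name hess_by_line_numbers is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print from the first line containing $HESS through
# the first line containing $END that follows it.
hess_by_line_numbers() {
    local file=$1 start len
    # grep -n prefixes each match with "N:", so cut keeps just the number.
    start=$(grep -n '\$HESS' "$file" | head -n 1 | cut -d: -f1)
    [ -n "$start" ] || return 1
    # Count from the $HESS line so an earlier $END (as in the sample) is skipped.
    len=$(tail -n +"$start" "$file" | grep -n '\$END' | head -n 1 | cut -d: -f1)
    [ -n "$len" ] || return 1
    # Print exactly len lines, starting at the $HESS line.
    tail -n +"$start" "$file" | head -n "$len"
}
```

For example, hess_by_line_numbers hess.dat > new-file.inp. Once len is known, len - 1 is also the context that grep -A would need: grep -A "$((len - 1))" '\$HESS' hess.dat prints the same lines.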

kurumi 04-04-2011 07:15 PM

Ruby (1.9+):

Code:

$ ruby -ne 'print if /\$HESS/../\$END/' file

Telengard 04-04-2011 09:18 PM

Code:

foo$ f=0; while read l; do [[ $l =~ \$HESS ]] && f=1; [[ f -eq 1 ]] && echo $l; [[ $l =~ \$END ]] && f=0; done < file
$HESS
ENERGY IS -68.5097726137 E(NUC) IS 315.1972078453
1 1 3.38518644E-01-2.59345972E-02-2.20431415E-01 4.35677008E-02-1.73975464E-02
1 2 6.60042430E-02 7.22497090E-03 3.95542762E-03-2.04969589E-02-1.45090031E-02
1 3-7.56719858E-03-1.27150433E-02-2.50241943E-02 4.61303420E-02 2.11752109E-02
1 4-3.58623392E-02-1.39002233E-02-1.77647824E-02-1.57901109E-03-1.60179364E-03
1 5-2.15259537E-03-1.43752716E-03-1.31201163E-03-4.27208217E-03 4.76034162E-02
1 6 2.24833277E-03-1.11268353E-02-9.74181047E-02-2.91626903E-02 8.74736584E-03
1 7-2.15400285E-02-3.40971482E-02 1.33311058E-02-7.77108704E-03-2.23749856E-03
1 8 1.63696613E-03-3.47066683E-03-4.05719853E-04-2.31927657E-03-2.53871570E-03
1 9 7.21058903E-06-6.35504509E-04-5.88477013E-04-8.31049017E-05 3.06961569E-04
1 10 5.15566010E-05-2.76245027E-04 1.04519661E-04-1.61330769E-04-2.85330037E-04
1 11 4.76965767E-04 1.65799220E-04-4.95788049E-04 4.01372143E-04-8.33988757E-05
1 12 3.52109490E-04-9.55040014E-05-1.80314085E-03-5.89478903E-03 3.75899269E-03
1 13 5.33075211E-04-5.54100430E-05 6.06786318E-05-7.18074009E-04 4.05424301E-03
1 14 3.88196151E-03 1.57004496E-04 3.12619608E-04 1.74230814E-04-1.23218799E-01
1 15 1.15067982E-01 4.50432329E-02-9.90256847E-02-3.18111872E-02 1.22994763E-01
1 16-7.57634741E-04 1.22495204E-03 4.19373998E-03
$END
foo$


mzh 04-05-2011 05:52 AM

Thanks a lot, guys; it's great to get this kind of support in such a short time.
Apparently there's also an AWK solution to this:
Code:

~/shell $ awk '/ \$HESS/,/^ \$END/' hess.dat >> new-file.inp
@Telengard: would you mind giving me a short explanation of how your solution works? I have to say, I don't know what the =~ operator does.

H_TeXMeX_H 04-05-2011 10:49 AM

There's also a sed solution:

Code:

sed -n '/$HESS/,/$END/ p' input > output
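Since the original question mentions a whole bunch of files, the sed range command can be wrapped in a loop. This is a sketch under assumptions: the function name extract_all_hess and the convention of writing each block to NAME.hess next to its source are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for every file given, write its $HESS ... $END
# block to a sibling file with ".hess" appended to the name.
extract_all_hess() {
    local f
    for f in "$@"; do
        # \$ makes the dollar sign explicitly literal; it is also literal in
        # a POSIX basic regex whenever it is not the last character.
        sed -n '/\$HESS/,/\$END/p' "$f" > "$f.hess"
    done
}
```

Usage, assuming the inputs end in .dat: extract_all_hess *.dat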

Telengard 04-05-2011 01:09 PM

Quote:

Originally Posted by mzh (Post 4314405)
@Telengard: would you mind giving me a short explanation of how your solution works? I have to say, I don't know what the =~ operator does.

Sure, np.

Code:

f=0; while read l; do [[ $l =~ \$HESS ]] && f=1; [[ f -eq 1 ]] && echo $l; [[ $l =~ \$END ]] && f=0; done < file
  • f=0: This clears a flag which I use to determine whether or not to print the current line.
  • while read l: Opens a Bash loop construct which will read from standard input, storing each line in the variable l as it goes.
  • do: Introduces the body of the loop construct. (Required by the loop syntax.)
  • [[ ... ]]: Encloses a Bash conditional construct which returns a status of either 0 or 1 depending on the result of evaluating the EXPRESSION inside it. Do not confuse this with the [ (test) command.
  • $l =~ \$HESS: Matches the expanded value of $l against the regular expression \$HESS. The backslash stops Bash from expanding $HESS as a variable, so the $ is matched as a literal dollar sign. =~ is an additional binary operator in Bash which performs regex matches in conditional expressions.
  • && f=1: Sets the flag for printing lines only if the previous command returns a status of 0. && is the AND operator in Bash command lists.
  • [[ f -eq 1 ]] && echo $l: Print the current line only if the flag is set. -eq is the Bash conditional operator which checks for numeric equality within conditional expressions; inside [[ ]] its operands are evaluated as arithmetic expressions, so the bare name f refers to the value of the variable f. (See above.)
  • [[ $l =~ \$END ]] && f=0: Clears the flag for printing lines only if the current line matches the regex \$END. (See above.)
  • done: Closes the Bash loop construct. (See above.)
  • < file: Redirects standard input from the file named file for the duration of the loop.
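One side effect of plain read and unquoted echo $l is visible in the output earlier in the thread: runs of spaces are collapsed (e.g. "ENERGY IS -68.5097726137"). If the block must be copied exactly, a variant of the same flag loop could look like this; it is a sketch, and the function name print_hess is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical variant of the flag loop that keeps every line intact.
print_hess() {
    local f=0 l
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes, and the
    # || test still processes a last line that has no trailing newline.
    while IFS= read -r l || [[ -n $l ]]; do
        if [[ $l =~ \$HESS ]]; then f=1; fi                 # switch on at $HESS
        if [[ $f -eq 1 ]]; then printf '%s\n' "$l"; fi      # quoted: no word splitting
        if [[ $l =~ \$END ]]; then f=0; fi                  # switch off after printing $END
    done < "$1"
}
```

Usage: print_hess hess.dat > new-file.inp. The if statements replace the && chains so the function does not end with a non-zero status just because the last line was not $END.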

HTH

mzh 04-06-2011 02:24 AM

@telengard:
Thanks a lot. If I understand correctly, your solution runs over the whole file; when it meets $HESS, it sets f=1 (as a kind of switch), then it prints everything until it meets the first $END, where it sets f back to 0 and stops printing the subsequent lines. Cool.
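The same on/off switch can be written in awk, with the three rules in the same order as the bash loop. Unlike the awk range posted earlier, these patterns do not depend on a leading space before the keywords. A sketch only; the function name hess_awk is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the flag technique expressed as three awk rules.
hess_awk() {
    awk '/\$HESS/ { f = 1 }   # switch on at $HESS
         f        { print }   # print while the switch is on
         /\$END/  { f = 0 }   # switch off after the $END line is printed
        ' "$1"
}
```

Usage: hess_awk hess.dat >> new-file.inp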

H_TeXMeX_H 04-06-2011 05:02 AM

I must say that that is one of the most complicated solutions I've seen in a while. But it works, so you can use it.

Telengard 04-06-2011 10:23 AM

Quote:

Originally Posted by H_TeXMeX_H (Post 4315572)
I must say that that is one of the most complicated solutions I've seen in a while. But it works, so you can use it.

Not at all; I could have made it much more convoluted ;)

IMHO providing multiple solutions is not a bad thing. The OP gets to choose the one he understands best and can most easily maintain for his own needs.

I must confess, though, your sed solution is far more graceful and efficient.
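Because $HESS appears only once per file, the sed range can also be told to stop reading as soon as the block has been printed, rather than scanning to the end of a large file. This is a possible refinement, not something posted in the thread; the function name hess_sed_quit is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sed range that quits right after printing $END.
hess_sed_quit() {
    # Inside the range: print the line, and if it is the $END line, quit.
    # With -n, q does not print the pattern space a second time.
    sed -n '/\$HESS/,/\$END/{p;/\$END/q;}' "$1"
}
```

Usage: hess_sed_quit input > output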
:hattip:

Telengard 04-06-2011 10:33 AM

Quote:

Originally Posted by mzh (Post 4315420)
@telengard:
Thanks a lot. If I understand correctly, your solution runs over the whole file; when it meets $HESS, it sets f=1 (as a kind of switch), then it prints everything until it meets the first $END, where it sets f back to 0 and stops printing the subsequent lines.

That's basically it, yep.

I'm pleased to know you found my post helpful. If you feel your question has been adequately answered then please consider using the thread tools to mark this thread solved.


All times are GMT -5.