LinuxQuestions.org


willinusf 02-29-2008 02:01 PM

awk help
 
I have a file which is a catalog of molecules with information about those molecules. It is structured as follows:

blah
molecule1
info
blah
molecule2
info


The number of lines of info for each molecule varies. The header "blah" stays constant. I want to extract "blah" through the info for each molecule and place the extracted data into a file named after that molecule. So,

blah
molecule1
info

would go into a file named molecule1 with the extension mol2 (molecule1.mol2). All files should have this extension. I'm new to programming/scripting and would appreciate any help or comments. I've done this:

awk '/^molecule/,/blah/' file

But that, of course, leaves out the initial header "blah", and I have no idea how to loop this. Thanks.
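A minimal sketch of a portable alternative (not posted in this thread): instead of a range pattern, process the catalog one line at a time, treat the line right after each "blah" header as the molecule name, and switch output files there. It assumes the header line is exactly "blah", that the name line is safe to use as a file name, and that the input file is called catalog; the printf line only creates hypothetical sample data. Because it doesn't rely on a multi-character RS, it should run with any POSIX awk.

```shell
#!/bin/sh
# Hypothetical sample data; use your real catalog instead.
printf '%s\n' blah molecule1 info1a info1b blah molecule2 info2 > catalog

# Line-by-line split: works with any POSIX awk (no multi-character RS).
awk '
  $0 == "blah" { seen_header = 1; next }   # header: the next line names the molecule
  seen_header {                            # molecule-name line
      if (out != "") close(out)            # finish the previous file
      out = $0 ".mol2"
      print "blah" > out                   # re-emit the header into the new file
      print > out                          # then the molecule name itself
      seen_header = 0
      next
  }
  out != "" { print > out }                # info lines go to the current file
' catalog
```

Closing each file before opening the next keeps only one output file open at a time, so a catalog with thousands of molecules won't run into the open-file limit.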

Will

angrybanana 02-29-2008 02:30 PM

Code:

awk -F'\n' 'NR>1{f=$1".mol2"; print "blah" > f; print substr($0, 1, length($0)-1) > f; close(f)}' RS='blah\n' catalog
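To see why the one-liner above skips NR==1 and trims the last character with substr(), it can help to print the records awk actually builds when RS is set to 'blah\n'. This is a small demo on hypothetical sample data (the file name catalog is assumed); it needs an awk that treats a multi-character RS as a regular expression, such as GNU awk or mawk. You should see an empty first record (the text before the first "blah") and a trailing newline at the end of each remaining record.

```shell
#!/bin/sh
# Hypothetical sample data.
printf '%s\n' blah molecule1 info1 blah molecule2 info2 > catalog

# Print every record between brackets so the separators are visible.
# RS='blah\n' is treated as a regular expression by GNU awk and mawk.
awk -v RS='blah\n' '{ printf "record %d: [%s]\n", NR, $0 }' catalog
```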

radoulov 02-29-2008 03:32 PM

Another one (GNU Awk):

Code:

awk 'NF{close(f);print RS $0>(f=$1".mol2")}' ORS= RS="blah" catalog
If you won't run into the limit on open files,
you could drop the close() call:

Code:

awk 'NF{print RS $0>($1".mol2")}' ORS= RS="blah" catalog
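Because each piece is written as RS followed by the record, with ORS set to empty, concatenating the output files in catalog order should give back the original file byte for byte. Here is a quick sanity check under those assumptions: GNU awk, hypothetical sample data, and molecule names that happen to be known, so the two files can be listed by hand in catalog order. The NF guard is an addition: it skips the empty record before the first "blah", which would otherwise be written to a file named just .mol2.

```shell
#!/bin/sh
# Hypothetical sample data.
printf '%s\n' blah molecule1 info1a info1b blah molecule2 info2 > catalog

# Split: every output file gets RS ("blah") followed by its record; ORS is empty.
# NF skips the empty record that precedes the first "blah".
awk 'NF{close(f);print RS $0>(f=$1".mol2")}' ORS= RS="blah" catalog

# Rebuild the catalog from the pieces (in catalog order) and compare.
cat molecule1.mol2 molecule2.mol2 > rebuilt
if cmp -s catalog rebuilt; then
  echo "split is lossless"
else
  echo "pieces differ from the catalog"
fi
```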

ghostdog74 02-29-2008 09:06 PM


Here's an "algorithm" for that, so you can use it in other languages too.
Code:

i=0
while IFS= read -r line
do
 case $line in
  blah )
        i=$(( i+1 ))                 # increment the file counter
        file="molecule${i}.mol2"     # build the new file name
        printf '%s\n' "$line" > "$file" ;;   # start the new file with the header (> truncates)
  *)    printf '%s\n' "$line" >> "$file" ;;  # append the rest of the lines
 esac
done < "file"
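The loop above numbers the files (molecule1.mol2, molecule2.mol2, ...) rather than naming them after the molecule. Below is a minimal sketch of one way to adapt the same read/case idea so each file is named after the line that follows the header. It assumes that line is the molecule name and is safe to use as a file name; the sample names moleculeA and moleculeB are hypothetical, chosen so the file names visibly come from the data rather than from a counter.

```shell
#!/bin/sh
# Hypothetical sample data; the names are placeholders.
printf '%s\n' blah moleculeA info-a info-b blah moleculeB info-c > catalog

file=""     # current output file; empty until the first molecule name is read
pending=""  # the header line, held until the molecule name is known
while IFS= read -r line
do
  case $line in
    blah)
      pending=$line
      ;;
    *)
      if [ -n "$pending" ]; then
        file="${line}.mol2"                         # line after the header = molecule name
        printf '%s\n' "$pending" "$line" > "$file"  # ">" starts a fresh file
        pending=""
      elif [ -n "$file" ]; then
        printf '%s\n' "$line" >> "$file"            # info lines are appended
      fi
      ;;
  esac
done < catalog
```

Holding the header in a variable until the name line arrives is what lets the header go into a file whose name isn't known yet when the header is read.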



All times are GMT -5.