awk help

willinusf · 02-29-2008, 02:01 PM

I have a file which is a catalog of molecules with information about those molecules. It is structured as follows:

blah
molecule1
info
blah
molecule2
info

The number of lines of info for each molecule varies. The header "blah" stays constant. I want to extract "blah" through info for each molecule and place the extracted data into a file named after that molecule. So,

blah
molecule1
info

Would go into a file named molecule1 with the extension mol2 (molecule1.mol2). All files would have this extension. I'm new to programming/scripting and would appreciate any help/comments. I've done this:

awk '/^molecule/,/blah/' file

But, that of course leaves out the initial header "blah" and I have no idea how to loop this. Thanks.

Will

angrybanana · 02-29-2008, 02:30 PM

Code:

awk -F'\n' 'NR>1{f=$1".mol2"; printf "blah\n%s", $0 > f; close(f)}' RS='blah\n' catalog

radoulov · 02-29-2008, 03:32 PM

Another one (GNU Awk):

Code:

awk 'NF{close(f);print RS $0>(f=$1".mol2")}' ORS= RS="blah" catalog

If you don't run into the limit on open files,
you could simplify the code to:

Code:

awk 'NF{print RS $0>($1".mol2")}' ORS= RS="blah" catalog
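The one-liners above rely on a multi-character RS, which gawk and mawk accept but POSIX awk does not guarantee. As a rough sketch for other awks, the same split can be done line by line. The variable names (out, want_name) and the sample info lines are made up for illustration, and it assumes the header is a line containing exactly "blah" with the molecule name on the very next line:

```shell
# Sample input in the layout described in the question (contents are invented).
printf '%s\n' blah molecule1 info1a info1b blah molecule2 info2 > catalog

# Line-by-line split: no multi-character RS needed.
awk '
  $0 == "blah" {                   # header line: a new molecule starts here
      if (out != "") close(out)    # close the previous output file
      want_name = 1                # the next line holds the molecule name
      next
  }
  want_name {                      # first line after the header
      out = $0 ".mol2"             # name the file after the molecule
      print "blah" > out           # write the header first
      want_name = 0
  }
  out != "" { print > out }        # name line and info lines go to the current file
' catalog
```

Closing each file before the next one is opened keeps only one output file open at a time, so the open-file limit should not matter here.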

ghostdog74 · 02-29-2008, 09:06 PM

Quote:

Originally Posted by willinusf

I have a file which is a catalog of molecules with information about those molecules. It is structured as follows:

blah
molecule1
info
blah
molecule2
info

The number of lines of info for each molecule varies. The header "blah" stays constant. I want to extract "blah" through info for each molecule and place the extracted data into a file named after that molecule. So,

blah
molecule1
info

Would go into a file named molecule1 with the extension mol2 (molecule1.mol2).

There's a general "algorithm" for this, so you can use it in any other language.

Code:

file=
while IFS= read -r line
do
 case $line in
  blah)
        IFS= read -r name            # the line after the header is the molecule name
        file="${name}.mol2"          # name the output file after the molecule
        printf '%s\n%s\n' "$line" "$name" > "$file";;  # start the file: header, then name
  *)    [ -n "$file" ] && printf '%s\n' "$line" >> "$file";;  # append the info lines
 esac
done < "file"