AWK: split a file into multiple files, and a request for an explanation of existing code
Dear Experts,
I have a file that looks like this:
Code:
input_wez
The output files should be named with the three-letter code after "input_" plus a sequence number, like:
Code:
wez_1.txt
Code:
....
The output should contain neither the "input_***" line nor the empty line between the "input_***" line and the content. I modified someone else's code and can now get close to that result with:
Code:
awk -F_ '/input/{ f=$2; n++; next} f{print > f "_" n ".pdb"} /END/{close(f);f=x}' INPUTFILE
Code:
#EMPTY LINE APPEARED HERE
My questions are:
1. How can I eliminate the empty line with the simplest modification to the awk code above?
2. In the awk code above, what is the meaning of the f in front of the braces, that is, the difference between
Code:
f{print > f "_" n ".pdb"}
and
Code:
{print > f "_" n ".pdb"}
Is this a general method when writing to files? What is the general usage and purpose of
Code:
f{....}
3. At the end of my awk code, I close the file with
Code:
{close(f);f=x}
If you understand the code better than I do, could you explain these two parts of it in a bit more detail?
I know these questions may be annoying, but I am trying very hard to understand AWK and I hope to use it more freely. For that I need a better and deeper understanding. I hope these questions do not disturb you too much; if you don't like them, please just ignore them. I would thank you all the same!
Hi all,
I found an answer to the 1st question, though it may not be the simplest method:
Code:
awk 'BEGIN {FS = "_"} /input/{ f=$2; n++;next} f{if (NF > 0) print > f "_" n ".txt"} /END/{close(f);f=x}' INPUTFILE
Thanks!
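To see why the `NF > 0` test drops the blank line, here is a minimal sketch. The three-line input is made up for illustration (the thread does not show the real file), and `nf_report` is just a hypothetical variable name. It prints the field count awk sees on each line when the field separator is "_":

```shell
# Assumed input: a header line, an empty line, and one data line.
nf_report=$(printf '%s\n' 'input_wez' '' 'ATOM 1' |
            awk -F_ '{ printf "NF=%d [%s]\n", NF, $0 }')
printf '%s\n' "$nf_report"
# NF=2 [input_wez]
# NF=0 []
# NF=1 [ATOM 1]
```

Only the truly empty line has NF equal to 0, so `NF > 0` (or just `NF` used as a condition) filters it out. One caveat: with "_" as the separator, a line that contains only spaces still counts as one field, so it would not be filtered by this test.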
Hi, cristalp.
Try this:
Code:
awk -F_ '/input/{ f=$2; n++; m=0; next;} {m++} m>1&&f{print > f "_" n ".pdb"} /end/{close(f "_" n ".pdb"); f=0}' test.txt
1. To eliminate the empty line (if you mean the line after input_*), you can use an additional counter `m', which counts the lines after input_*, and print only lines with m > 1. See the example above.
2, 3. In the code
Code:
f{print > f "_" n ".pdb"}
`f' is used as a pattern: the action in braces is executed only when `f' is true, that is, non-empty and non-zero. Resetting f to x means resetting f to the empty string (because the variable `x' is never set), so that f is logically false. Note that I reset `f' to zero, which has the same effect. If you remove `f' and use just {print > f "_" n ".pdb"}, you get not only wez_1.pdb etc., but also _1.pdb etc. The _n.pdb files contain what you called 'useless content', which follows the n-th input_***...end record. This happens because you then print every line regardless of the value of `f', and f=="" for the useless content.
Note also that close() must be given the same name that was used after `>', so I close f "_" n ".pdb" rather than just f.
Note that /END/ in your code should read /end/ (if you use `end' in the input file).
Hope this helps. I apologize for my poor English.
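A minimal sketch of the difference the `f` guard makes, run side by side on a made-up input. The thread never shows the complete file, so the record layout, the `sample.txt` name, and the `guarded/` and `unguarded/` directories are all assumptions for illustration:

```shell
cd "$(mktemp -d)" || exit 1   # scratch directory so the demo writes nothing elsewhere

# Assumed input layout: header, empty line, data, END, then a stray line.
printf '%s\n' 'input_wez' '' 'ATOM 1' 'END' 'junk between records' \
              'input_abc' '' 'ATOM 2' 'END' > sample.txt

mkdir guarded unguarded

# Guarded: the action runs only while f is non-empty, i.e. inside a record.
# NF skips the empty line; close() gets the same name that was used after ">".
awk -F_ '/^input_/ { f = $2; n++; next }
         f && NF   { print > ("guarded/" f "_" n ".pdb") }
         /END/     { close("guarded/" f "_" n ".pdb"); f = "" }' sample.txt

# Unguarded: every non-empty, non-header line is printed, so the stray line
# is written to "_1.pdb" because f is empty when it is read.
awk -F_ '/^input_/ { f = $2; n++; next }
         NF        { print > ("unguarded/" f "_" n ".pdb") }
         /END/     { close("unguarded/" f "_" n ".pdb"); f = "" }' sample.txt

ls guarded unguarded
```

With this input, `guarded/` ends up with only wez_1.pdb and abc_2.pdb, while `unguarded/` also contains _1.pdb holding the stray line. The output name is parenthesized after `>` because not every awk implementation parses an unparenthesized concatenation there in the same way.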