LinuxQuestions.org


Tauro 07-21-2011 05:49 AM

Awk to extract patterns till it hits blank line (in for loop)
 
I have a list of patterns in file1
Code:

10047134
10047140
100816392
100913026
100913028
100913192
...
..

file2
Code:

>gi|10047134|ref|
MMWQCHLSAQDYRYYPVDGYSLLKRFPLHPLTGPRCPVQTVGQWLESIGLPQYENHLMANGFDNVQFMGSNVMEDQDLLE
HRKRILASLGLRPPNEATASTPVQYWQHHPEKLIFQSCDYKAFYLGSMLIKELRGTESTQDACAKMRANCQKSTEQMKKVPTIILSVSYKGVKFIDATNKNIIAEHEIRNISCAAQDPEDLSTFAYITKDLK
SNHHYCHVF

>gi|10047140|ref
MESEMETQSARAEEGFTQVTRKGGRRAKKRQAEQLSAAGEGGDAGRMDTEEARPAKRPVFPPLCGDGLLSGKEETRKIPV
PANRYTPLKENWMKIFTPIVEHLGLQIRFNLKSRNVEIRTCKETKDVSALTKAADFVKAFILGFQVEDALALIRLDDLFL
ESFEITDVKPLKGDHL

>gi|100913028|ref|
MEVAEKLQLLNHRPVTAVEIQLMVEESEERLTEEQIEALLHTVTSILPAEPEAEQKKNTNSNVAMDEEDPA

What I want to do is: for each pattern in file1, pull out the line in file2 containing that pattern, plus the lines below it, until a blank line is reached.

The awk one-liner I used works given a single pattern:
Code:

awk 'BEGIN{RS=ORS="\n\n"; FS="\n"}/pattern/' infile
But this loop:
Code:

for i in `cat file1`
> do
> awk 'BEGIN{RS=ORS="\n\n"; FS="\n"}/$i/' file2 >>Outfile
> done

...gives an empty Outfile.
Can you suggest a better way?
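Why the loop produces nothing: inside single quotes the shell never expands $i, so every awk run receives the literal regex /$i/, which means "end of record followed by the letter i" and cannot match anything. A small sketch showing what awk would actually see (the ID is just one value from file1):

```shell
#!/bin/sh
# Sketch: what the shell hands to awk with single vs double quotes.
i=10047134

# Single quotes: no expansion, awk gets the literal text /$i/.
single='/$i/'

# Double quotes: $i is expanded by the shell before awk runs.
double="/$i/"

printf '%s\n' "$single" "$double"
```

Switching to double quotes would work here, but splicing shell variables into awk source is fragile; passing the value with awk's -v option avoids the quoting problem entirely.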

grail 07-21-2011 09:34 AM

Well, since it is a number of varying length, I presume you are defining the pattern with the pipes as delimiters?
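To illustrate the delimiter point: a bare /10047134/ also matches any longer ID that contains it, and searching for "|10047134|" would miss the second sample header, which has no trailing pipe (>gi|10047140|ref). Comparing the field between the first two pipes avoids both problems. A sketch, assuming the ID always sits in that position; the ID 110047134 and its sequence line are hypothetical, added only to show the clash:

```shell
#!/bin/sh
# Sketch: regex match vs exact comparison of the pipe-delimited ID field.
# Assumption: >gi|110047134|ref| and HYPOTHETICAL are made-up data for the demo.
cd "$(mktemp -d)" || exit 1

cat > file2 <<'EOF'
>gi|110047134|ref|
HYPOTHETICAL

>gi|10047134|ref|
SNHHYCHVF

>gi|10047140|ref
ESFEITDVKPLKGDHL
EOF

id=10047134

# Regex match: selects both 110047134 and 10047134.
awk -v id="$id" 'BEGIN { RS = ""; FS = "\n" } $0 ~ id { print $1 }' file2 > regex.txt

# Exact match: split the header line on "|" and compare the second piece.
awk -v id="$id" 'BEGIN { RS = ""; FS = "\n" }
  { split($1, h, "|") }
  h[2] == id { print $1 }' file2 > exact.txt

cat regex.txt exact.txt
```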

As for reading two files: read file1 inside your BEGIN block (with getline) and store the individual numbers in an array (normally you could do this in the main body of the script instead, but your change to RS would read all of the first file as a single record), then check the number in each record of the second file against the array.
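A minimal sketch of that BEGIN-block approach, assuming trimmed sample files in a temporary directory and POSIX paragraph mode (RS="") rather than the multi-character RS from the question:

```shell
#!/bin/sh
# Sketch: load the IDs from file1 in BEGIN, then switch RS for file2.
# Assumption: trimmed sample data; file names follow the thread.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 10047134 100913028 > file1

cat > file2 <<'EOF'
>gi|10047134|ref|
SNHHYCHVF

>gi|10047140|ref
ESFEITDVKPLKGDHL

>gi|100913028|ref|
MEVAEKLQLLNHRPVTAVEIQLMVEESEERLTEEQIEALLHTVTSILPAEPEAEQKKNTNSNVAMDEEDPA
EOF

awk 'BEGIN {
       # RS is still the default newline here, so getline reads one ID per line.
       while ((getline line < "file1") > 0) ids[line] = 1
       close("file1")
       # Only now switch to paragraph mode for the records of file2.
       RS = ""; ORS = "\n\n"; FS = "\n"
     }
     { split($1, h, "|") }   # h[2] is the ID on the header line
     h[2] in ids' file2 > Outfile

cat Outfile
```

Because file1 is consumed entirely inside BEGIN, the RS change never affects how it is read.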

The alternative is to keep the loop you started with and simply use the -v option to assign $i to an awk variable. The cost, of course, is that awk is executed once for every pattern.
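A sketch of the -v loop, assuming trimmed sample data; it also swaps the `for i in `cat file1`` construct for a while/read loop and redirects the whole loop once instead of appending with >>:

```shell
#!/bin/sh
# Sketch: one awk run per ID, with the ID passed in via -v.
# Assumption: trimmed sample data; file names follow the thread.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 10047140 100913028 > file1

cat > file2 <<'EOF'
>gi|10047134|ref|
SNHHYCHVF

>gi|10047140|ref
ESFEITDVKPLKGDHL

>gi|100913028|ref|
MEVAEKLQLLNHRPVTAVEIQLMVEESEERLTEEQIEALLHTVTSILPAEPEAEQKKNTNSNVAMDEEDPA
EOF

while IFS= read -r i; do
  # -v makes the shell value available to awk as the variable id.
  awk -v id="$i" 'BEGIN { RS = ""; ORS = "\n\n"; FS = "\n" }
    { split($1, h, "|") }
    h[2] == id' file2
done < file1 > Outfile

cat Outfile
```

Output follows the order of file1, which may matter if you want the results in the same order as your pattern list.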

Tauro 07-21-2011 09:55 AM

Putting in an array and matching records gives me the line containing the pattern.
How do I go about printing the lines below it? :|

colucix 07-21-2011 10:23 AM

Quote:

Originally Posted by Tauro (Post 4421288)
How do I go about printing the lines below it? :|

Try getline inside a while loop, e.g.
Code:

awk -F"|" 'FNR == NR { pattern[$1]++; next } FNR < NR { if ( $2 in pattern ) { while ( $0 !~ /^$/ ) { print; if ( (getline) <= 0 ) break } print "" } }' file1 file2
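One caveat with a getline loop: getline returns 0 at end of file and leaves $0 unchanged, so a loop that only tests $0 never ends if the last record is not followed by a blank line. A sketch that checks the return value, run on trimmed sample data whose final record deliberately has no trailing blank line:

```shell
#!/bin/sh
# Sketch: getline-based extraction that stops cleanly at end of file.
# Assumption: trimmed sample data; the last record has no blank line after it.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 10047134 100913028 > file1
printf '%s\n' '>gi|10047134|ref|' 'SNHHYCHVF' '' \
  '>gi|10047140|ref' 'ESFEITDVKPLKGDHL' '' \
  '>gi|100913028|ref|' \
  'MEVAEKLQLLNHRPVTAVEIQLMVEESEERLTEEQIEALLHTVTSILPAEPEAEQKKNTNSNVAMDEEDPA' > file2

awk -F'|' '
  FNR == NR { pattern[$1] = 1; next }   # file1: remember each ID
  $2 in pattern {                       # file2: header whose ID was listed
    while ($0 !~ /^$/) {
      print
      # getline returns 0 at EOF and leaves $0 as is, so stop here.
      if ((getline) <= 0) break
    }
    print ""
  }' file1 file2 > Outfile

cat Outfile
```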

grail 07-21-2011 12:46 PM

Another alternative:
Code:

awk 'BEGIN{RS = ""; ORS = "\n\n"; FS = "\n"}FNR == NR{while(++i <= NF)a[$i]++;FS="|";next}$2 in a' file1 file2
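Why the while loop over NF works: with RS="" a file containing no blank lines, like file1, is read as one single record, and FS="\n" turns each of its lines into a field. The FS="|" assignment only takes effect from the next record, which is the first record of file2, where $2 is the ID between the first two pipes. A small check of the first half, assuming a three-line file1:

```shell
#!/bin/sh
# Sketch: how paragraph mode sees a file1 that has no blank lines.
cd "$(mktemp -d)" || exit 1

printf '%s\n' 10047134 10047140 100816392 > file1

# Prints the record number and two of the fields: one record, one field per line.
awk 'BEGIN { RS = ""; FS = "\n" } { print NR, $1, $3 }' file1 > fields.txt
cat fields.txt
```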

Tauro 07-21-2011 11:20 PM

Got it...!!! :D
Thanks a ton, grail and colucix :)


All times are GMT -5.