LinuxQuestions.org

divyashree 06-02-2012 01:53 PM

awk power
 
I have two files (attached).

f1 contains a lot of lines like this:

Quote:

319030003003~319030084005~1~0070~EA~
399056084020~319030084006~1~0080~EA~
319056003001~399056084019~1~0010~EA~
319030084005~399056084020~1~0020~EA~
319030084006~319056084001~1~0030~EA~
319030003089~319030084005~1~0070~EA~
399056084029~319030084006~1~0080~EA~

f2 contains:

Quote:

319030084006~
319056003001~
319030084005~

Using awk, I want to delete the lines of f1 whose first field is one of the numbers listed in f2, and save the result in f3.

I want f3 to contain:

Quote:

319030003003~319030084005~1~0070~EA~
399056084020~319030084006~1~0080~EA~
319030003089~319030084005~1~0070~EA~
399056084029~319030084006~1~0080~EA~

How can I do this using awk?

colucix 06-02-2012 02:08 PM

Code:

awk -F~ 'FNR == NR { _[$1]++ } FNR < NR && !_[$1]' f2 f1 > f3

This relies on how awk interprets values as TRUE or FALSE. While reading f2 (FNR == NR is only true for the first file), it increments the element of the array _ indexed by $1. If a number appeared in f2, _[$1] is greater than or equal to 1, which is TRUE, so its negation is FALSE and the matching record in f1 is not printed.

Conversely, when awk reads a line of f1 (FNR < NR) whose first field was not listed in f2, _[$1] evaluates to an uninitialized value (an empty string, i.e. FALSE), and the negation makes it TRUE. Since a pattern with no action prints the record, these are exactly the lines that end up in f3, which is what we want.
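A tiny sketch (not from the thread) that isolates the truth test described above: the element a["x"] is incremented once, while a["y"] is never assigned. The array name a and its keys are arbitrary choices for illustration.

```shell
# a["x"]++ leaves a["x"] == 1, which is TRUE, so !a["x"] is 0.
# a["y"] was never assigned, so it is uninitialized (FALSE) and !a["y"] is 1.
out=$(awk 'BEGIN {
    a["x"]++
    print !a["x"]
    print !a["y"]
}')
printf '%s\n' "$out"   # prints 0, then 1
```

Note that merely referencing a["y"] creates that element with an empty value. That is harmless here, but it is one reason the `in` operator is often preferred for pure membership tests.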

Hope this helps.
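A common variant of the same two-file idiom, sketched here as an alternative rather than as colucix's code: `next` skips the rest of the program while reading f2, so the second condition only ever sees f1, and `!($1 in seen)` tests membership without creating array entries. The array name `seen` is hypothetical, and the sketch works in a scratch directory so it does not overwrite real files.

```shell
# Work in a scratch directory so existing f1/f2/f3 are not touched.
cd "$(mktemp -d)" || exit 1

# Sample data from the question.
printf '%s\n' \
  '319030003003~319030084005~1~0070~EA~' \
  '399056084020~319030084006~1~0080~EA~' \
  '319056003001~399056084019~1~0010~EA~' \
  '319030084005~399056084020~1~0020~EA~' \
  '319030084006~319056084001~1~0030~EA~' \
  '319030003089~319030084005~1~0070~EA~' \
  '399056084029~319030084006~1~0080~EA~' > f1
printf '%s\n' '319030084006~' '319056003001~' '319030084005~' > f2

# First file (NR == FNR): store each first field as a key, go to next line.
# Second file: print lines whose first field is not a key of "seen".
awk -F'~' 'NR == FNR { seen[$1]; next } !($1 in seen)' f2 f1 > f3

cat f3
```

Both this form and the original share one caveat: if f2 is empty, NR == FNR is also true while reading f1, so f3 ends up empty instead of containing all of f1.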

David the H. 06-02-2012 02:15 PM

Just for fun, here's a quick-and-dirty solution using grep, sed, and bash's process substitution.

Code:

grep -v -f <( sed 's/^/^/' f2 ) f1
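
The sed process substitution turns each line of f2 into a regex anchored with ^, so it only matches at the beginning of a line. Without the anchor, an entry like 319030084005~ would also match the second field of lines such as the first one in f1, and those lines would be removed too.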
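A small sketch showing why the anchor matters, using a trimmed-down version of the sample data: 319030084005~ appears in the second field of the first line and in the first field of the second line. To run under plain sh, it writes the anchored patterns to a file (the name `anchored_patterns` is hypothetical) instead of using bash's process substitution. This only works unescaped because the numbers in f2 contain no regex metacharacters; entries containing characters such as `.` or `*` would need escaping.

```shell
# Scratch directory so real files are not overwritten.
cd "$(mktemp -d)" || exit 1

# 319030084005 is in field 2 of the first line and field 1 of the second.
printf '%s\n' \
  '319030003003~319030084005~1~0070~EA~' \
  '319030084005~399056084020~1~0020~EA~' > f1
printf '%s\n' '319030084005~' > f2

# Unanchored: the pattern matches anywhere in the line, so both lines go.
# grep exits with status 1 when it selects nothing, hence "|| true".
unanchored=$(grep -v -f f2 f1 || true)

# Anchored: prefix every pattern with ^ so it only matches at line start.
sed 's/^/^/' f2 > anchored_patterns
anchored=$(grep -v -f anchored_patterns f1)

printf 'unanchored: [%s]\n' "$unanchored"
printf 'anchored:   [%s]\n' "$anchored"
```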

Reuti 06-08-2012 06:09 AM

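Although the question asked for awk, join can do this as well (note that the output comes out in sorted order, not in the original order of f1):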
Code:

$ join -v 1 -t "~" <(sort f1) <(sort f2)
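A sketch of the same join approach, assuming GNU coreutils `sort` and `join`. It sorts on the first `~`-separated field only (`-t '~' -k1,1`) and runs both commands with `LC_ALL=C`, because join expects its inputs sorted in the same collation it uses to compare keys. Instead of process substitution it writes sorted copies to hypothetical files f1.sorted and f2.sorted, so it also runs under plain sh.

```shell
# Scratch directory so real files are not overwritten.
cd "$(mktemp -d)" || exit 1

printf '%s\n' \
  '319030003003~319030084005~1~0070~EA~' \
  '399056084020~319030084006~1~0080~EA~' \
  '319056003001~399056084019~1~0010~EA~' \
  '319030084005~399056084020~1~0020~EA~' \
  '319030084006~319056084001~1~0030~EA~' \
  '319030003089~319030084005~1~0070~EA~' \
  '399056084029~319030084006~1~0080~EA~' > f1
printf '%s\n' '319030084006~' '319056003001~' '319030084005~' > f2

# Sort both inputs on field 1 in the C locale, the same locale join uses.
LC_ALL=C sort -t '~' -k1,1 f1 > f1.sorted
LC_ALL=C sort -t '~' -k1,1 f2 > f2.sorted

# -v 1: print only the lines of the first file that have no match in the second.
LC_ALL=C join -v 1 -t '~' f1.sorted f2.sorted > f3

cat f3
```

The result holds the same four lines the question asks for, ordered by their first field rather than as they appear in f1.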


All times are GMT -5.