I found this awk code in a forum, but the author offered no explanation of how it works. It prints the lines of b.txt that match lines from a.txt. For example:
a.txt
one two three
cat dog bird
a b c
b.txt
one two three
cat dog bird
c b a
123 456 789
Executing the script, I get:
./awktest
one two three
cat dog bird
c b a
Code:
#!/bin/bash
awk '
NR==FNR {
    a[$0]
    next
}
{
    for (i in a) {
        m = split(i, b, " ")
        for (j=1; ($0 ~ b[j]) && j<=m; j++);
        if (j > m) {
            print
            next
        }
    }
}
' a.txt b.txt
What do the lines in red mean? More importantly, what does the purple line mean? I appreciate any feedback. Thanks.
The first thing I would ask is what are you expecting it to do?
To answer your specific questions, the lines in red test each line in b.txt to see if it contains each word of some line in a.txt (all of which have been entered into an array named a[]).
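To see how the lines of a.txt end up in a[] before any line of b.txt is tested, here is a minimal sketch. It assumes the sample data from the first post (b.txt is shortened); the helper name show_counters and the printed labels are made up for illustration and are not part of the original script.

```shell
#!/bin/sh
# Sketch only: show_counters is a hypothetical helper, not from the thread.
show_counters() {
    dir=$(mktemp -d) || return 1
    printf 'one two three\ncat dog bird\na b c\n' > "$dir/a.txt"
    printf 'one two three\nc b a\n' > "$dir/b.txt"
    awk '
    NR == FNR {                 # only true while the first file is being read
        a[$0]                   # referencing a[$0] creates the index
        print "store NR=" NR " FNR=" FNR ": " $0
        next                    # skip the second rule for this line
    }
    {                           # reached only for lines of the second file
        tag = ($0 in a) ? "stored" : "not stored"
        print "check NR=" NR " FNR=" FNR ": " $0 " -> " tag
    }
    ' "$dir/a.txt" "$dir/b.txt"
    rm -rf "$dir"
}

show_counters
```

The three lines of a.txt are labelled store because NR and FNR are equal only while the first file is read. FNR restarts at 1 for b.txt while NR keeps counting, so the second rule takes over from there.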
The part in purple:
Code:
($0 ~ b[j]) && j<=m;
... is the loop test which returns true (1) if both conditions are true:
Code:
($0 ~ b[j])     Tests whether b[j] (i.e. one word from a line stored in a[])
                matches somewhere in the current line from b.txt (i.e. $0)
j<=m            Tests that j has not yet gone past m, the number of words
                that split() found in the line from a.txt
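The effect of the loop test can be checked by looking at j once the loop has ended. The following is only a sketch: final_j is a hypothetical helper that takes the words of one a.txt line and one b.txt line as arguments instead of reading the files.

```shell
#!/bin/sh
# Sketch only: final_j is a hypothetical helper, not from the thread.
# $1 = words of one line from a.txt, $2 = one line from b.txt
final_j() {
    awk -v words="$1" -v line="$2" 'BEGIN {
        m = split(words, b, " ")
        # Same test as in the thread; the loop body is empty on purpose.
        for (j = 1; (line ~ b[j]) && j <= m; j++);
        verdict = (j > m) ? "all words matched" : "stopped at " b[j]
        print "m=" m " j=" j " " verdict
    }'
}

final_j "a b c" "c b a"                 # prints: m=3 j=4 all words matched
final_j "a b c" "123 456 789"           # prints: m=3 j=1 stopped at a
final_j "cat dog bird" "cat dog fish"   # prints: m=3 j=3 stopped at bird
```

On the last pass of a full match j is m+1, so b[j] is an empty string, which matches any line; the loop then ends because j<=m is false. Writing the test as j<=m && ($0 ~ b[j]) would avoid that extra lookup.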
Not all awks are created equal. I prefer GNU awk (gawk) for the extensions it implements. This may lead to writing non-POSIX code, but for me that's not a concern. I find the gawk guide, Effective AWK Programming, very handy - download it here.
It is very extensive, not just a reference manual for the language.
I understand it a little better now, but I need to do more research on it.
Which awk search terms should I use to understand the author's code better?
Thanks in advance.
You are welcome!
As already noted by others, terms for further search would include NR, FNR, next and split. You should add regular expressions and awk operators to that list.
The GNU gawk programming guide linked above is an excellent and authoritative reference!
Although not the best tutorial to start with, once you understand the basics of how awk iterates over a file, the man and info pages, man awk and info awk, provide a very complete quick reference.
For example, to understand the ~ operator (regular expression match), in man awk you will find this:
Code:
Operators
The operators in AWK, in order of decreasing precedence, are:
...
~ !~        Regular expression match, negated match. NOTE: Do not use a constant regular
            expression (/foo/) on the left-hand side of a ~ or !~. Only use one on the
            right-hand side. The expression /foo/ ~ exp has the same meaning as
            (($0 ~ /foo/) ~ exp). This is usually not what you want.
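As a small illustration of the ~ operator with a string on the right-hand side, which is how the script in this thread uses it, here is a sketch. The helper name match_demo and the sample patterns are assumptions made up for the example.

```shell
#!/bin/sh
# Sketch only: match_demo is a hypothetical helper, not from the thread.
# $1 = text to search, $2 = pattern string used on the right-hand side of ~
match_demo() {
    awk -v text="$1" -v pat="$2" 'BEGIN {
        if (text ~ pat) print "match"; else print "no match"
    }'
}

match_demo "cat dog bird" "dog"     # prints: match
match_demo "cat dog bird" "fish"    # prints: no match
match_demo "cat dog bird" "d.g"     # prints: match (. stands for any character)
match_demo "cat dog bird" "^dog"    # prints: no match (text does not start with dog)
```

The string is treated as a regular expression, so characters such as . and ^ keep their special meaning even though no /.../ constant is involved.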
awk loops over the lines of the given input files; the awk code runs on each input line.
NR==FNR
true while the first file (a.txt) is being read
a[$0]
stores the input line (from a.txt) as an index of array a (no value is assigned)
next
skips the rest of the code and continues with the next input line
The following code runs for each line from the remaining file (b.txt)
for (i in a)
loops thru the indexes of array a
each i is one line from a.txt
m=split (i, b, " ")
splits string i into elements that are stored in array b (as values)
m is the number of array members
for (j=1; ($0 ~ b[j]) && j<=m; j++)
loops thru the array b, but also stops early as soon as the first condition is not met
j runs from 1 to m (3 for the sample lines) and is m+1 when the loop ends after every word matched
b[j] is the value
($0 ~ b[j])
true if value is in the current line (matched)
if (j > m)
true if every item matched
Note that the ~ operator matches anywhere in the line, not just whole words. For example,
"b" matches in "c bx a"
If you want an exact field match, add another loop that cycles through the current input fields $1...$NF and compares each one with the == operator.
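One possible way to write that exact-field variant is sketched below. It is not the original author's code: the helper name exact_match, the variables found and k, and the extra b.txt line "c bx a" are assumptions added for the example.

```shell
#!/bin/sh
# Sketch only: exact_match is a hypothetical helper, not from the thread.
exact_match() {
    dir=$(mktemp -d) || return 1
    printf 'one two three\ncat dog bird\na b c\n' > "$dir/a.txt"
    printf 'one two three\ncat dog bird\nc b a\nc bx a\n123 456 789\n' > "$dir/b.txt"
    awk '
    NR == FNR { a[$0]; next }
    {
        for (i in a) {
            m = split(i, b, " ")
            for (j = 1; j <= m; j++) {
                found = 0
                # compare the word with every field of the current line
                for (k = 1; k <= NF; k++)
                    if ($k == b[j]) { found = 1; break }
                if (!found) break      # this word is missing, try the next a.txt line
            }
            if (j > m) { print; next } # every word was found as a whole field
        }
    }
    ' "$dir/a.txt" "$dir/b.txt"
    rm -rf "$dir"
}

exact_match
```

With this data the line "c bx a" is not printed, whereas the ~ version would print it because "b" matches inside "bx". Keep in mind that == compares numeric-looking fields as numbers, so 1.0 and 1 would count as equal.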
for loops and ifs behave the same as in most other languages
Hope that helps!
Thank you all. It's much clearer now. Everyone who replied is awesome.