LinuxQuestions.org
Forums > Non-*NIX Forums > Programming
04-22-2022, 05:59 PM   #1
noob555
LQ Newbie
Registered: Apr 2022
Posts: 13
Need help with awk code explanation


hello,

I found this awk code in a forum, but the author offered no explanation of how it works. The code prints the lines that match between files a.txt and b.txt. For example:

a.txt
one two three
cat dog bird
a b c

b.txt
one two three
cat dog bird
c b a
123 456 789

Executing the script, I get:

./awktest
one two three
cat dog bird
c b a




Code:
#!/bin/bash

awk '
NR==FNR         {a[$0]
                 next
                }
                {for (i in a)   {m=split (i, b, " ")
                                 for (j=1; ($0 ~ b[j]) && j<=m; j++);
                                 if (j > m)     {print
                                                 next
                                                }
                                }
                }
' a.txt b.txt
What do the lines in red mean? More importantly, what does the purple line mean? I appreciate any feedback. Thanks

Last edited by noob555; 04-22-2022 at 06:04 PM.
 
04-22-2022, 07:06 PM   #2
astrogeek
Moderator
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,276
Blog Entries: 24
Welcome to LQ and the Programming forum!

The first thing I would ask is: what do you expect it to do?

To answer your specific questions, the lines in red test each line in b.txt to see if it contains each word of some line in a.txt (all of which have been entered into an array named a[]).

The part in purple:

Code:
($0 ~ b[j]) && j<=m;
... is the loop test which returns true (1) if both conditions are true:

Code:
($0 ~ b[j]) Tests if b[j] (i.e. some word from a line in a[])
            is also in the current line from b.txt (i.e. $0)
j<=m        Tests that j has not yet run past m, the number of
            words in the line from a.txt
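To see the loop test in action, here is a small trace. It is a sketch, assuming a POSIX awk such as gawk or mawk; the variable name trace and the choice of sample lines are only for illustration. It runs the question's inner loop for the a.txt line "a b c" against two lines from b.txt and prints where j ends up.

```shell
#!/bin/sh
# Trace the inner loop from the question for the a.txt line "a b c".
# The input lines are samples from b.txt; "trace" is an illustrative name.
trace=$(printf '%s\n' 'c b a' '123 456 789' | awk '{
    m = split("a b c", b, " ")               # b[1]="a", b[2]="b", b[3]="c"; m = 3
    for (j = 1; ($0 ~ b[j]) && j <= m; j++); # empty body ";": all work is in the test
    # j > m means the loop ran past the last word, i.e. every word matched
    print "m=" m, "j=" j, (j > m ? "all words matched" : "stopped at word " j)
}')
echo "$trace"
```

When every word matched, j ends at m+1. At that point b[j] is empty, and an empty regular expression matches any line, so it is the j<=m half of the test that actually ends the loop.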
 
04-22-2022, 07:54 PM   #3
Skaperen
Senior Member
Registered: May 2009
Location: center of singularity
Distribution: Xubuntu, Ubuntu, Slackware, Amazon Linux, OpenBSD, LFS (on Sparc_32 and i386)
Posts: 2,689
Blog Entries: 31
hmmm. i see no purple. the code that @astrogeek quoted shows up as very slightly brighter red.
 
04-22-2022, 10:00 PM   #4
noob555
LQ Newbie
Registered: Apr 2022
Posts: 13
Original Poster
@astrogeek

Hi, astrogeek. Thanks for the warm welcome.

I understand it a little better now, but I need to do more research on it.


What awk terms should I search for to understand the author's code better?

Thanks in advance.
 
04-22-2022, 10:55 PM   #5
syg00
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,154
Not all awk are created equal - I prefer GNU awk (gawk) for the extensions it implements. This may lead to one writing non-POSIX code - but for me that's not a concern. I find the gawk manual, "GAWK: Effective AWK Programming", very handy - download it here.

Very comprehensive - not just a reference manual for the language.
 
04-22-2022, 11:31 PM   #6
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011
The (g)awk specific items to look up would be:

NR
FNR
next
split

for loops and ifs behave the same as in most other languages
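A quick way to get a feel for NR, FNR, next and split is to print them. This is a minimal sketch with two hypothetical throwaway files, f1 and f2, created in a temporary directory:

```shell
#!/bin/sh
# Show how split, NR, FNR and next behave over two input files.
# f1 and f2 are hypothetical throwaway files in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' x y > "$tmp/f1"
printf '%s\n' p q r > "$tmp/f2"
out=$(awk '
# split returns the number of pieces and fills array w with them
BEGIN { n = split("one two three", w, " "); print "split returned", n, "first word:", w[1] }
# NR counts lines across all files, FNR restarts at 1 for each file,
# so NR == FNR holds only while the first file is being read
NR == FNR { print "first file:", $0, "NR=" NR, "FNR=" FNR; next }
# reached only for lines of the second file, because next skipped it for f1
          { print "later file:", $0, "NR=" NR, "FNR=" FNR }
' "$tmp/f1" "$tmp/f2")
rm -r "$tmp"
echo "$out"
```

One caveat with the NR==FNR idiom: if the first file is empty, NR equals FNR for every line of the second file too, so those lines would be treated as if they came from the first file.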
 
04-23-2022, 12:38 AM   #7
astrogeek
Moderator
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,276
Blog Entries: 24
Quote:
Originally Posted by noob555
What search terms of awk should I use for research to understand the author's code better?
You are welcome!

As already noted by others, terms for further search would include NR, FNR, next and split. You should add regular expressions and awk operators to that list.

The GNU gawk programming guide linked above is an excellent and authoritative reference!

Although not the best tutorial to start with, once you understand the basics of how awk iterates over a file, the man and info pages, man awk and info awk, provide a very complete quick reference.

For example, to understand the ~ operator (regular expression match), in man awk you will find this:

Code:
   Operators
       The operators in AWK, in order of decreasing precedence, are:

       ...

       ~ !~        Regular expression match, negated match.  NOTE: Do not use a constant regular
                   expression (/foo/) on the left-hand side of a ~ or !~.  Only use one on the right-hand side.
                   The expression /foo/ ~ exp has the same meaning as (($0 ~ /foo/) ~ exp).  This is
                   usually not what you want.
Hope that helps!
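In the script from the question, the right-hand side of ~ is not a constant like /foo/ but a string, b[j], which awk treats as a dynamic regular expression. A small sketch (the strings are made up for illustration) of what that implies:

```shell
#!/bin/sh
# A string on the right-hand side of ~ is used as a dynamic regular expression.
out=$(awk 'BEGIN {
    word = "b"
    print ("c bx a" ~ word)    # 1: "b" is found inside "bx"
    print ("c d a" ~ word)     # 0: no "b" anywhere in the string
    word = "a.c"
    print ("abc" ~ word)       # 1: "." is a regex metacharacter matching any character
    print ("abc" ~ /a\.c/)     # 0: with the dot escaped, only a literal "a.c" matches
}')
echo "$out"
```

So a word from a.txt can match inside a longer word, and any regex metacharacters it contains (., *, [ and so on) are interpreted rather than matched literally.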
 
04-24-2022, 08:06 AM   #8
MadeInGermany
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,836
awk loops over the lines of the given input files; the awk code runs on each input line.

NR==FNR
true while the first file (a.txt) is being read.
a[$0]
stores the input line (from a.txt) in an array as an index (with no value)
next
skips the following code (continue with the next input cycle)
The following code runs for each line from the remaining file (b.txt)
for (i in a)
loops thru the indexes of array a
each i is one line from a.txt
m=split (i, b, " ")
splits string i into elements that are stored in array b (as values)
m is the number of array members
for (j=1; ($0 ~ b[j]) && j<=m; j++)
loops thru the array b but can also stop if the first condition is not met
j runs from 1 up to m (3 in this example); it ends at m+1 only if every word matched
b[j] is the value
($0 ~ b[j])
true if value is in the current line (matched)
if (j > m)
true if every item matched
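Putting those notes together, here is the script from the question with the explanations written as awk comments, run against the sample a.txt and b.txt from the first post. The temporary directory is only there to make the example self-contained; the awk logic is the original author's, unchanged apart from spacing and comments.

```shell
#!/bin/sh
# Recreate the sample files from the question in a throwaway directory.
tmp=$(mktemp -d)
printf '%s\n' 'one two three' 'cat dog bird' 'a b c' > "$tmp/a.txt"
printf '%s\n' 'one two three' 'cat dog bird' 'c b a' '123 456 789' > "$tmp/b.txt"

out=$(awk '
NR == FNR {            # true only while reading the first file (a.txt)
    a[$0]              # remember the whole line as an array index, no value
    next               # skip the rest of the program for a.txt lines
}
{                      # from here on, $0 is a line of b.txt
    for (i in a) {                    # i is one stored line of a.txt
        m = split(i, b, " ")          # b[1..m] = the words of that line
        for (j = 1; ($0 ~ b[j]) && j <= m; j++);  # advance while word j occurs in $0
        if (j > m) {                  # loop ran past the last word: all matched
            print                     # print the b.txt line once...
            next                      # ...and move on to the next b.txt line
        }
    }
}
' "$tmp/a.txt" "$tmp/b.txt")
rm -r "$tmp"
echo "$out"
```

Because for (i in a) visits the indexes in an unspecified order, the next after print is what keeps a b.txt line from being printed twice when it matches more than one a.txt line. Here "cat dog bird" matches both the a.txt line "cat dog bird" and, through substrings, "a b c".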

Note that the ~ operator matches anywhere in the line, even inside another word. For example
"b" matches in "c bx a"
If you want an exact field match, add another loop that cycles through the current input fields $1...$NF and compares each with the == operator.
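A minimal sketch of the exact-match approach described above, assuming words are separated by whitespace. The variables found and k, and the extra b.txt line "c bx a", are illustrative additions rather than part of the original script:

```shell
#!/bin/sh
# Exact-word variant: each word of an a.txt line must equal some field of the
# b.txt line, instead of merely occurring somewhere inside it.
# "c bx a" is an extra (illustrative) b.txt line to show the difference.
tmp=$(mktemp -d)
printf '%s\n' 'one two three' 'cat dog bird' 'a b c' > "$tmp/a.txt"
printf '%s\n' 'one two three' 'cat dog bird' 'c b a' '123 456 789' 'c bx a' > "$tmp/b.txt"

out=$(awk '
NR == FNR { a[$0]; next }
{
    for (i in a) {
        m = split(i, b, " ")
        for (j = 1; j <= m; j++) {          # check each word of the a.txt line
            found = 0
            for (k = 1; k <= NF; k++)       # compare against every field $1..$NF
                if ($k == b[j]) { found = 1; break }
            if (!found) break               # word j is not a field: give up on line i
        }
        if (j > m) { print; next }          # every word was found as a whole field
    }
}
' "$tmp/a.txt" "$tmp/b.txt")
rm -r "$tmp"
echo "$out"
```

With the original ~ version, "c bx a" would also be printed, because "b" occurs inside "bx"; here it is not, since no field of that line is exactly "b".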
 
04-24-2022, 10:50 PM   #9
noob555
LQ Newbie
Registered: Apr 2022
Posts: 13
Original Poster
Thank you all. It's much clearer now. Everyone who replied is awesome.

Last edited by noob555; 04-24-2022 at 10:52 PM.
 
  


All times are GMT -5.