02-20-2021, 10:38 PM
#1
LQ Newbie
Registered: Jan 2021
Posts: 18
How can I use awk to count matched lines
Greetings,
Someone posted this code on Stack Overflow. It works, but I need to know how to add a count of the matched lines and how to report which line in a.txt produced each match.
Contents of a.txt
01 02 03 04
11 12 13 14
04 02 03 01
19 20 21 25
11 15 39 01
Contents of b.txt
01 02 03 04 05
11 15 33 31 12
01 02 05 25 17
03 02 01 04 21
The Code
Code:
#!/bin/bash
awk 'NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
     { for (i=1; i<=n; ++i)
         if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
           print; break }
     }' a.txt b.txt
The output of script
Code:
$ awk-testmatch
01 02 03 04 05
03 02 01 04 21
How can I count the matches made (2 in the example above) and show which line in a.txt produced each match?
I appreciate any help or suggestions. Thanks.
02-20-2021, 11:12 PM
#2
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153
Perhaps you should make some effort yourself. We're not here to write your code for you - we may help if you run into problems you can't solve.
02-20-2021, 11:47 PM
#3
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011
I agree with syg00 that you should at least give it a try, but here is a hint: the number of the matching line is the FNR value, and an easy counter is cnt++.
And to look up more information:
http://www.gnu.org/software/gawk/man...ode/index.html
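A minimal sketch of that hint, assuming two small made-up files (f1.txt and f2.txt are hypothetical, not from this thread). It shows that NR keeps counting across files, FNR restarts at 1 for each file, and cnt++ is an ordinary counter:

```shell
# Hypothetical demo files; f1.txt and f2.txt are not from the thread.
tmp=$(mktemp -d)
printf 'x\ny\n'    > "$tmp/f1.txt"
printf 'p\nq\nr\n' > "$tmp/f2.txt"

out=$(cd "$tmp" && awk '
{
    cnt++                                      # one increment per input line
    print FILENAME ": NR=" NR ", FNR=" FNR     # NR keeps counting, FNR restarts per file
}
END { print cnt, "lines read" }
' f1.txt f2.txt)

printf '%s\n' "$out"
rm -rf "$tmp"
```

This is also why the script's NR==FNR test is only true while the first file is being read.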
02-21-2021, 12:27 AM
#4
LQ Newbie
Registered: Jan 2021
Posts: 18
Original Poster
I tried this, but I don't know where to put the FNR and cnt++.
Code:
awk 'NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
{ for (i=1; i<=n; ++i)
if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) { cnt++ } {
print FNR, cnt; print; break }
}' a.txt b.txt
The output of script
awk-testmatch
awk: cmd. line:4: error: `break' is not allowed outside a loop or switch
awk: cmd. line:4: error: `break' is not allowed outside a loop or switch
I removed the break statement and ran the script again. Below is the output
Code:
$ awk-testmatch
1 2
01 02 03 04 05
2 2
11 15 33 31 12
3 2
01 02 05 25 17
4 4
03 02 01 04 21
The lines 01 02 05 25 17 and 11 15 33 31 12 should not be in the output.
I did not write this script, and I know hardly anything about awk. Please tell me how. Thanks.
02-21-2021, 01:41 AM
#5
Member
Registered: Jul 2004
Location: Chennai, India
Posts: 952
Look carefully at all the curly brackets that you ADDED.
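A small sketch of what those extra brackets do, using made-up input and hypothetical counter names. In the attempt in post #4, the if body ends at the first closing bracket, so the second block is a separate statement that sits outside both the if and the for loop. That would explain the break error and why every line of b.txt was printed:

```shell
out=$(printf 'yes\nno\nyes\n' | awk '
{
    # Shape of the posted attempt: the if body ends at the first closing
    # bracket, so the second block runs for every input line.
    if ($0 == "yes") { split_hits++ } { split_prints++ }

    # Shape of the fix: a single block, so both statements depend on the test.
    if ($0 == "yes") { joined_hits++; joined_prints++ }
}
END { print split_hits, split_prints, joined_hits, joined_prints }
')
printf '%s\n' "$out"
```

The second counter reaches 3 although only two lines say "yes", because its block is unconditional.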
02-21-2021, 01:49 AM
#6
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,832
You want to count how many lines are printed? Then have the cnt++ or ++cnt near the print.
Immediate print:
Code:
awk '
NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
{ for (i=1; i<=n; ++i)
    if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
      print FNR, ++cnt; print; break
    }
}
' a.txt b.txt
Summary print at the very end:
Code:
awk '
NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
{ for (i=1; i<=n; ++i)
    if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
      print FNR; ++cnt; print; break
    }
}
END { print cnt, "lines matched." }
' a.txt b.txt
02-21-2021, 04:36 AM
#7
Senior Member
Registered: Mar 2020
Posts: 3,686
Quote:
Originally Posted by
DV100
which line in a.txt file did the match?
That line number is in the variable i. OTOH, FNR shows which line in b.txt did the match.
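A sketch that prints both numbers, using the a.txt and b.txt contents from post #1. The temporary directory and the wording of the messages are assumptions made for illustration:

```shell
tmp=$(mktemp -d)
printf '%s\n' '01 02 03 04' '11 12 13 14' '04 02 03 01' '19 20 21 25' '11 15 39 01' > "$tmp/a.txt"
printf '%s\n' '01 02 03 04 05' '11 15 33 31 12' '01 02 05 25 17' '03 02 01 04 21' > "$tmp/b.txt"

out=$(awk '
NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
{ for (i=1; i<=n; ++i)
    if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
      # FNR is the line number in b.txt, i is the line number in a.txt
      print "b.txt line " FNR " matched a.txt line " i ": " $0
      ++cnt; break
    }
}
END { print cnt+0, "lines matched." }
' "$tmp/a.txt" "$tmp/b.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Because of the break, only the first matching a.txt line is reported. Line 4 of b.txt also contains all four numbers of a.txt line 3, but the loop stops at i = 1.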
Last edited by shruggy; 02-21-2021 at 09:08 AM .
02-21-2021, 02:48 PM
#8
LQ Newbie
Registered: Jan 2021
Posts: 18
Original Poster
Quote:
Originally Posted by
MadeInGermany
You want to count how many lines are printed? Then have the cnt++ or ++cnt near the print.
Thanks a bunch for the help. I didn't know about using END to summarize.
Code:
awk '
NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
{ for (i=1; i<=n; ++i)
if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
print ++cnt; print; break
}
}
END { print a[i],b[i],c[i], "and", d[i], "match", cnt, "lines." }
' a.txt b.txt
Code:
$ awk-testmatch
1
01 02 03 04 05
2
03 02 01 04 21
01 02 03 and 04 match 2 lines.
Not sure why the 1 and 2 are printed. But hey, the END code does what I want!
Last edited by DV100; 02-21-2021 at 02:49 PM .
02-21-2021, 02:53 PM
#9
LQ Newbie
Registered: Jan 2021
Posts: 18
Original Poster
Quote:
Originally Posted by
shruggy
That line number is in the variable i . OTOH, FNR shows which line in b.txt did the match.
Thanks for that 'i' tip. After reading up on for loops and arrays, I tried
Code:
END { print a[i],b[i],c[i], "and", d[i], "match", cnt, "lines." }
' a.txt b.txt
and it works.
Thanks a bunch too.
02-21-2021, 04:41 PM
#10
Senior Member
Registered: Dec 2011
Location: Simplicity
Posts: 2,832
print ++cnt prints the incremented cnt, whereas ++cnt; on its own only increments cnt.
I think your output of a[i],b[i],c[i],d[i] is not always correct.
Perhaps you should count per matching pattern.
Code:
awk '
# loop over input lines
NR==FNR {
  # true for 1st file (a.txt)
  # store patterns
  a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
}
# other file(s)
{
  # loop over patterns
  for (i=1; i<=n; ++i)
    if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
      # found pattern in line
      # print line, count per pattern; break the loop
      print; cnt[i]++; break
    }
}
END {
  # loop over matched patterns
  for (i in cnt)
    print a[i],b[i],c[i],d[i], "match", cnt[i], "lines."
}
' a.txt b.txt
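A sketch of why a[i] in the END block of post #8 is not always correct. It uses the thread's a.txt and b.txt plus one made-up, non-matching last line in b.txt (99 98 97 96 95 is an assumption added for illustration). In END, i holds whatever the loop left behind for the last line read:

```shell
tmp=$(mktemp -d)
printf '%s\n' '01 02 03 04' '11 12 13 14' '04 02 03 01' '19 20 21 25' '11 15 39 01' > "$tmp/a.txt"
# b.txt from the thread plus one hypothetical last line that matches nothing
printf '%s\n' '01 02 03 04 05' '11 15 33 31 12' '01 02 05 25 17' '03 02 01 04 21' '99 98 97 96 95' > "$tmp/b.txt"

out=$(awk '
NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
{ for (i=1; i<=n; ++i)
    if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) { ++cnt; break }
}
# the last line matched nothing, so the loop ran to the end and i is n+1
END { printf "cnt=%d i=%d n=%d a[i]=[%s]\n", cnt, i, n, a[i] }
' "$tmp/a.txt" "$tmp/b.txt")

printf '%s\n' "$out"
rm -rf "$tmp"
```

Here i ends up one past the last pattern, so a[i] is empty even though two lines matched. Counting per pattern with cnt[i], as in the script above, avoids relying on that leftover value.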
Last edited by MadeInGermany; 02-21-2021 at 04:45 PM .
02-21-2021, 09:47 PM
#11
LQ Newbie
Registered: Jan 2021
Posts: 18
Original Poster
Thanks a bunch MadeInGermany. The script was great.
BTW, I'm going to learn more about awk.
It's such a powerful command worth learning.
02-22-2021, 04:13 PM
#12
LQ Newbie
Registered: Jan 2021
Posts: 18
Original Poster
I have a question. Anyone can answer if they know.
Why does the 'a' array have ++n, while b, c and d just have 'n' without the pluses?
Quote:
a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
Thanks
02-22-2021, 06:01 PM
#13
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153
Did you look at the documentation grail linked to? Check the section "Increment and Decrement Operators".
02-22-2021, 10:34 PM
#14
Senior Member
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,882
@OP
You could replace it with this:
Code:
++n; a[n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
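A short sketch of the difference, with made-up input and hypothetical variable names (k, x, y). The BEGIN block contrasts pre-increment and post-increment. The main rule shows that ++n advances the index once per line, so b[n] reuses the index that a[++n] just set:

```shell
out=$(printf 'p q\nr s\n' | awk '
BEGIN {
    k = 0
    x = ++k     # pre-increment: k becomes 1, then x is assigned 1
    y = k++     # post-increment: y is assigned 1, then k becomes 2
    print x, y, k
}
# ++n bumps the row index once per line; b[n] then reuses that same index
{ a[++n] = $1; b[n] = $2 }
END { print n, a[1], b[1], a[2], b[2] }
')
printf '%s\n' "$out"
```

If every array used ++n, each assignment would land in a different row, which is why only the first one carries the increment.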
02-23-2021, 02:21 PM
#15
LQ Newbie
Registered: Jan 2021
Posts: 18
Original Poster
Quote:
Originally Posted by
NevemTeve
@OP
You could replace it with this:
Code:
++n; a[n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
Thanks. I'm trying to learn as much as possible. I saw some books on Amazon about awk, and they run to 100 pages or more. I think I might get one of those Kindle books and start from the beginning.
All times are GMT -5.