LinuxQuestions.org
Old 02-20-2021, 10:38 PM   #1
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Rep: Reputation: Disabled
How can I use awk to count matched lines


Greetings,

Someone wrote this code on Stack Overflow. The code works, but I need to know how to add a count of the matched lines and how to show which line in a.txt produced each match.

Contents of a.txt

01 02 03 04
11 12 13 14
04 02 03 01
19 20 21 25
11 15 39 01

Contents of b.txt

01 02 03 04 05
11 15 33 31 12
01 02 05 25 17
03 02 01 04 21

The Code

Code:
#!/bin/bash

awk 'NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            print; break }
    }' a.txt b.txt
The output of the script
Code:
$ awk-testmatch
01 02 03 04 05
03 02 01 04 21

How can I count the matches made (e.g. 2 in the output above), and find out which line in the a.txt file did the match?

I appreciate any help or suggestions. Thanks.
 
Old 02-20-2021, 11:12 PM   #2
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Perhaps you should make some effort yourself. We're not here to write your code for you - we may help if you run into problems you can't solve.
 
Old 02-20-2021, 11:47 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,011

Rep: Reputation: 3194
I agree with syg00 that you should at least give it a try, but I will give you a hint: the line number of the matching line is available as FNR, and an easy counter is cnt++.

And to look up more information: http://www.gnu.org/software/gawk/man...ode/index.html
 
Old 02-21-2021, 12:27 AM   #4
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Original Poster
Rep: Reputation: Disabled
I tried this, but I don't know where to put the FNR and cnt++.

Code:
awk 'NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) { cnt++ } {
            print FNR, cnt; print; break }
    }' a.txt b.txt
The output of the script

awk-testmatch
awk: cmd. line:4: error: `break' is not allowed outside a loop or switch
awk: cmd. line:4: error: `break' is not allowed outside a loop or switch


I removed the break statement and ran the script again. Below is the output
Code:
$ awk-testmatch
1 2
01 02 03 04 05
2 2
11 15 33 31 12
3 2
01 02 05 25 17
4 4
03 02 01 04 21

The lines 01 02 05 25 17 and 11 15 33 31 12 should not be in the output.

I did not write this script, and I know hardly anything about awk. Please tell me how to fix it. Thanks.
 
Old 02-21-2021, 01:41 AM   #5
AnanthaP
Member
 
Registered: Jul 2004
Location: Chennai, India
Posts: 952

Rep: Reputation: 217
Look carefully at all the curly brackets that you ADDED.
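
To illustrate the point about the added braces: in awk, the braces right after `if (...)` delimit its body, so `{ cnt++ }` ends the conditional and the following `{ ... }` block no longer depends on the match. In the attempt above that second block also ends up outside the `for` loop, which is why awk rejected the `break`. The following is only a minimal sketch on made-up input (not the files from this thread), with hypothetical variable names:

```shell
# Hypothetical three-line input; only the line "yes" should be reported.
input=$(printf 'no\nyes\nno\n')

# Broken: the body of the if is just { cnt++ }; the second block always runs.
broken=$(printf '%s\n' "$input" | awk '{ if ($0 == "yes") { cnt++ } { print FNR, $0 } }')

# Fixed: one pair of braces encloses everything that depends on the match.
fixed=$(printf '%s\n' "$input" | awk '{ if ($0 == "yes") { cnt++; print FNR, $0 } }')

printf 'broken:\n%s\nfixed:\n%s\n' "$broken" "$fixed"
```

The broken variant prints all three lines, while the fixed one prints only `2 yes`.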
 
Old 02-21-2021, 01:49 AM   #6
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,832

Rep: Reputation: 1219
You want to count how many lines are printed? Then have the cnt++ or ++cnt near the print.
Immediate print:
Code:
awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            print FNR, ++cnt; print; break
        }
    }
' a.txt b.txt
Summary print at the very end:
Code:
awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            print FNR; ++cnt; print; break
        }
    }
    END { print cnt, "lines matched." }
' a.txt b.txt
 
Old 02-21-2021, 04:36 AM   #7
shruggy
Senior Member
 
Registered: Mar 2020
Posts: 3,686

Rep: Reputation: Disabled
Quote:
Originally Posted by DV100 View Post
which line in a.txt file did the match?
That line number is in the variable i. OTOH, FNR shows which line in b.txt did the match.
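
As a sketch of how both numbers could be reported together, the script from this thread can print `i` (the a.txt line) next to `FNR` (the b.txt line). The scratch directory and the wording of the message are assumptions made for illustration; the file contents are the samples from the first post.

```shell
# Recreate the sample files from the first post in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' '01 02 03 04' '11 12 13 14' '04 02 03 01' '19 20 21 25' '11 15 39 01' > "$dir/a.txt"
printf '%s\n' '01 02 03 04 05' '11 15 33 31 12' '01 02 05 25 17' '03 02 01 04 21' > "$dir/b.txt"

report=$(awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            # i is the line number in a.txt, FNR the line number in b.txt
            print "a.txt line " i " matched b.txt line " FNR ": " $0
            ++cnt; break
        }
    }
    END { print cnt+0, "lines matched." }
' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$report"
```

Because of the `break`, only the first matching a.txt line is reported for each line of b.txt. Lines 1 and 3 of a.txt hold the same four numbers, so line 1 always wins here.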

Last edited by shruggy; 02-21-2021 at 09:08 AM.
 
Old 02-21-2021, 02:48 PM   #8
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by MadeInGermany View Post
You want to count how many lines are printed? Then have the cnt++ or ++cnt near the print.
Immediate print:
Code:
awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            print FNR, ++cnt; print; break
        }
    }
' a.txt b.txt
Summary print at the very end:
Code:
awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            print FNR; ++cnt; print; break
        }
    }
    END { print cnt, "lines matched." }
' a.txt b.txt
Thanks a bunch for the help. I didn't know about using END to summarize.

Code:
awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
            print ++cnt; print; break
        }
    }
    END { print a[i],b[i],c[i], "and", d[i], "match", cnt, "lines." }
' a.txt b.txt

Code:
$ awk-testmatch
1
01 02 03 04 05
2
03 02 01 04 21
01 02 03 and 04 match 2 lines.
I'm not sure why the 1 and 2 in bold are printed. But hey, the END code does what I want!

Last edited by DV100; 02-21-2021 at 02:49 PM.
 
Old 02-21-2021, 02:53 PM   #9
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by shruggy View Post
That line number is in the variable i. OTOH, FNR shows which line in b.txt did the match.
Thanks for that 'i' tip. After researching for loops over arrays, I tried
Code:
END { print a[i],b[i],c[i], "and", d[i], "match", cnt, "lines." }
' a.txt b.txt
and it works.

Thanks a bunch too.
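
One caveat with reading `i` in the END block: it only holds whatever the loop left behind for the last line of the second file. The sketch below uses a hypothetical `b2.txt` (an assumption, not a file from this thread) whose last line matches nothing, so the loop runs to completion and `i` ends up one past the last pattern.

```shell
dir=$(mktemp -d)
printf '%s\n' '01 02 03 04' '11 12 13 14' '04 02 03 01' '19 20 21 25' '11 15 39 01' > "$dir/a.txt"
# Hypothetical variant of b.txt whose last line matches nothing in a.txt.
printf '%s\n' '01 02 03 04 05' '01 02 05 25 17' > "$dir/b2.txt"

leftover=$(awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) { ++cnt; break }
    }
    # In END, i still holds the value from the loop over the LAST input line.
    END { print "cnt=" cnt, "i=" i, "n=" n, "a[i]=\"" a[i] "\"" }
' "$dir/a.txt" "$dir/b2.txt")
printf '%s\n' "$leftover"
```

Here one line matched, but `i` is 6 while only 5 patterns exist, so `a[i]` is empty. With the original b.txt the last line happens to match pattern 1, which is why the summary looked right.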

 
Old 02-21-2021, 04:41 PM   #10
MadeInGermany
Senior Member
 
Registered: Dec 2011
Location: Simplicity
Posts: 2,832

Rep: Reputation: 1219
Code:
print ++cnt
prints the incremented cnt.
While
Code:
++cnt
only increments cnt, without printing it.
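
A small sketch of the difference, run on two made-up input lines (the variable names are hypothetical):

```shell
# print ++cnt increments and prints the new value; a bare ++cnt only increments.
with_print=$(printf 'x\ny\n' | awk '{ print ++cnt }')
without_print=$(printf 'x\ny\n' | awk '{ ++cnt } END { print cnt }')

printf 'print ++cnt:\n%s\nbare ++cnt, printed once in END:\n%s\n' "$with_print" "$without_print"
```

The first command prints a running count (1, then 2), which is where the extra 1 and 2 in the earlier output came from. The second prints only the final total.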

I think your output of a[i],b[i],c[i],d[i] is not always correct: in the END block, i is just the value left over from the loop over the last input line.
Perhaps you should count per matching pattern.
Code:
awk '
# loop over input lines
    NR==FNR {
# true for 1st file (a.txt)
# store patterns
       a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
    }
# other file(s)
    {
# loop over patterns
        for (i=1; i<=n; ++i)
            if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) {
# found pattern in line
# print line, count per pattern; break the loop
                print; cnt[i]++; break
            }
    }
    END {
# loop over matched patterns
        for (i in cnt)
              print a[i],b[i],c[i],d[i], "match", cnt[i], "lines."
    }
' a.txt b.txt
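
One thing to keep in mind with `for (i in cnt)` is that awk does not guarantee the order in which the indices are visited. If the summary should follow the order of a.txt, a possible variation (a sketch only, with an assumed message format and scratch directory) is to walk the indices 1 to n and test membership with `in`:

```shell
dir=$(mktemp -d)
printf '%s\n' '01 02 03 04' '11 12 13 14' '04 02 03 01' '19 20 21 25' '11 15 39 01' > "$dir/a.txt"
printf '%s\n' '01 02 03 04 05' '11 15 33 31 12' '01 02 05 25 17' '03 02 01 04 21' > "$dir/b.txt"

summary=$(awk '
    NR==FNR { a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next }
    { for (i=1; i<=n; ++i)
        if ($0 ~ a[i] && $0 ~ b[i] && $0 ~ c[i] && $0 ~ d[i]) { print; cnt[i]++; break }
    }
    END {
        # Walk the patterns in a.txt order instead of relying on "for (i in cnt)".
        for (i=1; i<=n; ++i)
            if (i in cnt)
                printf "a.txt line %d (%s %s %s %s) matched %d lines.\n", i, a[i], b[i], c[i], d[i], cnt[i]
    }
' "$dir/a.txt" "$dir/b.txt")
printf '%s\n' "$summary"
```

Using `i in cnt` rather than testing `cnt[i]` directly avoids creating empty array elements for patterns that never matched.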

Last edited by MadeInGermany; 02-21-2021 at 04:45 PM.
 
Old 02-21-2021, 09:47 PM   #11
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Original Poster
Rep: Reputation: Disabled
Thanks a bunch MadeInGermany. The script was great.

BTW, I'm going to learn more about awk.

It's such a powerful command worth learning.
 
Old 02-22-2021, 04:13 PM   #12
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Original Poster
Rep: Reputation: Disabled
I have a question. Anyone can answer if they know.

Why does the 'a' array have ++n, while b, c, and d just have 'n' without the pluses?

Quote:
a[++n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
Thanks
 
Old 02-22-2021, 06:01 PM   #13
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,153

Rep: Reputation: 4125
Did you look at the docs grail alluded to? Check the section "Increment and Decrement Operators".
 
Old 02-22-2021, 10:34 PM   #14
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 4,882
Blog Entries: 1

Rep: Reputation: 1871
@OP
You could replace it with this:

Code:
++n; a[n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
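
To see why the pre-increment form matters, here is a small sketch on two made-up input lines comparing `a[++n]` with `a[n++]` (the post-increment variant is shown only for contrast and is not something suggested in this thread):

```shell
# Pre-increment (++n) bumps n first, then uses the new value as the index,
# so a[] and b[] are filled at the same position for each line.
pre=$(printf 'p q\nr s\n' | awk '{ a[++n] = $1; b[n] = $2 } END { print a[1], b[1] }')

# Post-increment (n++) uses the OLD value as the index for a[] and only then
# bumps n, so b[n] lands one slot further along and the arrays fall out of step.
post=$(printf 'p q\nr s\n' | awk '{ a[n++] = $1; b[n] = $2 } END { print a[1], b[1] }')

printf 'pre:  %s\npost: %s\n' "$pre" "$post"
```

With `++n`, index 1 holds both fields of the first line (`p q`). With `n++`, index 1 mixes the first field of the second line with the second field of the first line (`r q`).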
 
Old 02-23-2021, 02:21 PM   #15
DV100
LQ Newbie
 
Registered: Jan 2021
Posts: 18

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by NevemTeve View Post
@OP
You could replace it with this:

Code:
++n; a[n] = $1; b[n] = $2; c[n] = $3; d[n] = $4; next
Thanks. I'm trying to learn as much as possible. I saw some books on Amazon about awk, and they run to 100 pages or more. I think I might get one of those Kindle books and start from the beginning.
 
  

