LinuxQuestions.org
Old 10-30-2013, 05:24 AM   #1
byran cheung
Member
 
Registered: Sep 2013
Posts: 321

find the number in the file


On my system, text files are generated monthly. Each file name begins with xxx, followed by the year and month (for example, xxxxx201310.txt means Oct 2013).

I have the command below to count how many abc there are in a month, but it only counts the number for this month.

NUMBER=$(cat xxxxx201310.txt |grep -c -s "abc" )

Could you advise how to produce the following report, which shows the number of abc for each month and the total number of abc? Thanks very much.

no of abc in Oct 2013 = 111
no of abc in Nov 2013 = 222
no of abc in Dec 2013 = 333
"
"
"
Total no of abc = 999
 
Old 10-30-2013, 05:31 AM   #2
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,848

First, you do not need cat: grep -c -s abc xxxxx201310.txt is enough.
Next, grep can handle several files at once, just try: grep -c -s abc xxxxx*.txt
Finally, you need a small script to sum up the numbers and print formatted output.
Can you do that yourself?
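
A minimal sketch of such a small script, assuming all monthly files match xxxxx*.txt. When grep -c is given several files it prints file:count pairs, so awk can split on the colon and add up the last field. The function name count_lines_report is made up for illustration, and like grep -c it counts matching lines, not every occurrence.

```shell
# Hypothetical helper: per-file counts of matching lines plus a grand total.
# Usage: count_lines_report PATTERN FILE...
count_lines_report() {
    pattern=$1
    shift
    # grep -c prints "file:count" when given several files (just "count"
    # for a single file); either way the count is the last ":" field.
    grep -c -s -- "$pattern" "$@" |
        awk -F: '{ print; total += $NF } END { print "Total = " (total + 0) }'
}
```

It could be called as: count_lines_report abc xxxxx*.txt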
 
Old 10-30-2013, 10:55 AM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

This will get you started ...

With this InFile ...
Code:
abc201309.txt
def201309.txt
pqr201311.txt
abc201311.txt
abc201309.txt
ghi201310.txt
ghi201310.txt
abc201310.txt
abc201311.txt
mno201311.txt
def201311.txt
abc201311.txt
jkl201310.txt
abc201310.txt
abc201311.txt
... this code ...
Code:
awk '{print substr($0,4,6)substr($0,1,3)}'  $InFile  \
|sort  \
|sed '$ a\**End Of File**'  \
|awk 'BEGIN{getline; prev=$0; c=1}   # the line read here already counts once
    {if (prev==$0) c++
   else {print "In year",substr(prev,1,4),
               "month",  substr(prev,5,2),
               "string", substr(prev,7,3),
               "was seen",c,"times"; prev=$0; c=1}}' >$OutFile
... produced this OutFile ...
Code:
In year 2013 month 09 string abc was seen 2 times
In year 2013 month 09 string def was seen 1 times
In year 2013 month 10 string abc was seen 2 times
In year 2013 month 10 string ghi was seen 2 times
In year 2013 month 10 string jkl was seen 1 times
In year 2013 month 11 string abc was seen 4 times
In year 2013 month 11 string def was seen 1 times
In year 2013 month 11 string mno was seen 1 times
In year 2013 month 11 string pqr was seen 1 times
Note: I tried (but failed) to do this with an awk one-liner, using an array and asorti. I'd love to see how this could be done. Gurus?

Daniel B. Martin

Last edited by danielbmartin; 10-30-2013 at 11:20 AM. Reason: Improved code
 
Old 10-30-2013, 10:48 PM   #4
byran cheung
Member
 
Registered: Sep 2013
Posts: 321

Original Poster
Hi Daniel,

Thanks for the reply. What I would like to do is find how many times the string "abc" occurs, not abc, def, ghi ...

Thanks.
 
Old 10-31-2013, 01:36 AM   #5
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Assuming the file names have a fixed layout like xxxx201310.txt (a four-character prefix, then year and month):
Code:
MyString="abc"
for File in *.txt;do
    echo "No. of \"${MyString}\" in $(date "+%b %Y" --date="${File:4:6}01") = $(grep -c "${MyString}" "$File")"
done
NOTE: here grep only counts matching lines, not every occurrence.
Since you did not provide a sample, I have no idea if that is what you want/need.


Not great, but I fudged an awk to count each occurrence:
Code:
MyString="abc"
for File in *.txt;do
    echo "No. of \"${MyString}\" in $(date "+%b %Y" --date="${File:4:6}01") = $(awk 'BEGIN{C=0}{for (i=1;i<=NF;i++) if ( $i ~ '${MyString}' ){C++}} END{print C}' "$File")"
done
Be careful:

"the" and "then" are two different words, but both contain "the", so a plain pattern match counts both.
 
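If only whole words should count, one possible sketch is to let grep do the word check. It assumes a grep that supports the -o and -w extensions (GNU and BSD grep have both; POSIX does not require them). The helper name count_words is hypothetical.

```shell
# Hypothetical helper: count occurrences of WORD as a whole word in FILE.
# Assumes a grep with the -o and -w extensions (GNU and BSD grep have both).
# Usage: count_words WORD FILE
count_words() {
    # -o prints every match on its own line,
    # -w rejects matches that sit inside a longer word,
    # -F treats WORD as a fixed string rather than a regex.
    grep -o -w -F -- "$1" "$2" | wc -l | tr -d ' '
}
```

With this, count_words the somefile.txt would skip "then" and "other", while the same pipeline without -w would count them.
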
Old 10-31-2013, 03:03 AM   #6
byran cheung
Member
 
Registered: Sep 2013
Posts: 321

Original Poster
Rep: Reputation: Disabled
Thanks Firerat,

It works for counting the number. Could you advise how to calculate the total number for all months?

Thanks
 
Old 10-31-2013, 03:09 AM   #7
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,848

Firerat, why do you use an awk script instead of grep -c "pattern" filename?
byran cheung, what is the problem with my tip?
Code:
SUM=0
for File in *.txt;do
    C=$(grep -c "abc" "$File")   # number of matching lines in this file
    SUM=$((SUM+C))
    echo "$File: $C"
done
echo "$SUM"
(or something like this will do the job)
(not tested)
 
Old 10-31-2013, 03:34 AM   #8
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

grep -c only returns the number of lines that contain a match.
The awk I gave counts every matching field (word), so several hits on one line are all counted.

I thought I had explained that in my post.
 
Old 10-31-2013, 03:38 AM   #9
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,848

based on the original post grep is sufficient, but I'm not really sure about that
 
Old 10-31-2013, 03:39 AM   #10
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Quote:
Originally Posted by byran cheung
Thanks Firerat,

It works for counting the number. Could you advise how to calculate the total number for all months?

Thanks
adapt a little

Code:
Total=0
MyString="abc"
for File in *.txt;do
    Count=$(awk 'BEGIN{C=0}{for (i=1;i<=NF;i++) if ( $i ~ '${MyString}' ){C++}} END{print C}' "$File")
    echo "No. of \"${MyString}\" in $(date "+%b %Y" --date="${File:4:6}01") = $Count"
    Total=$(( $Count + $Total ))
done
echo "Total No. of \"${MyString}\" = $Total"
 
Old 10-31-2013, 03:41 AM   #11
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

Quote:
Originally Posted by pan64
based on the original post grep is sufficient, but I'm not really sure about that
hence :

Quote:
Originally Posted by Firerat
...
NOTE: here grep only counts matching lines, not every occurrence.
Since you did not provide a sample, I have no idea if that is what you want/need.


Not great, but I fudged an awk to count each occurrence:
...
 
Old 10-31-2013, 09:29 AM   #12
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

With this InFile ...
Code:
abc201309.txt
def201309.txt
pqr201311.txt
abc201311.txt
abc201309.txt
ghi201310.txt
ghi201310.txt
abc201310.txt
abc201311.txt
mno201311.txt
def201311.txt
abc201311.txt
jkl201310.txt
abc201310.txt
abc201311.txt
... this awk ...
Code:
awk '{if (substr($0,1,3)=="abc") c[substr($0,4,6)]++}
   END{n=asorti(c,d); for (j=1;j<=n;j++)
       print "In year",substr(d[j],1,4),
             "month",  substr(d[j],5,2),
             "string abc",
             "was seen",c[d[j]],"times"}' $InFile >$OutFile
... produced this OutFile ...
Code:
In year 2013 month 09 string abc was seen 2 times
In year 2013 month 10 string abc was seen 2 times
In year 2013 month 11 string abc was seen 4 times
Daniel B. Martin
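
Note that asorti is a gawk extension, so the one-liner above will not run under mawk or a strictly POSIX awk. A portable sketch, under the same assumption that InFile holds one name per line with a three-letter prefix: let awk tally in an array and leave the ordering to sort. The function name count_by_month is hypothetical, and the output is simplified to "YYYYMM count".

```shell
# Hypothetical helper: count list entries per YYYYMM for a 3-letter prefix.
# Usage: count_by_month PREFIX LISTFILE
count_by_month() {
    # awk tallies per year-month key; "for (k in c)" has no defined order,
    # so the result is piped through sort instead of using gawk's asorti.
    awk -v p="$1" 'substr($0, 1, 3) == p { c[substr($0, 4, 6)]++ }
        END { for (k in c) print k, c[k] }' "$2" |
        sort
}
```

For the InFile shown above, count_by_month abc "$InFile" should list 201309, 201310 and 201311 with their counts.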
 
Old 11-01-2013, 02:18 AM   #13
byran cheung
Member
 
Registered: Sep 2013
Posts: 321

Original Poster
Quote:
Originally Posted by Firerat
adapt a little

Code:
Total=0
MyString="abc"
for File in *.txt;do
    Count=$(awk 'BEGIN{C=0}{for (i=1;i<=NF;i++) if ( $i ~ '${MyString}' ){C++}} END{print C}' "$File")
    echo "No. of \"${MyString}\" in $(date "+%b %Y" --date="${File:4:6}01") = $Count"
    Total=$(( $Count + $Total ))
done
echo "Total No. of \"${MyString}\" = $Total"
Hi,

I tested the script. It seems to count the number of lines rather than the occurrences of "abc". Would you please check? Thanks very much.
 
Old 11-01-2013, 06:02 AM   #14
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683



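Oops, I should have tested it.

The problem was

if ( $i ~ '${MyString}' )

should be

if ( $i ~ /'${MyString}'/ )

Without the slashes, awk treats abc as an uninitialized variable, so the pattern is empty and matches every field.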

Code:
Total=0
MyString="abc"
for File in *.txt;do
    Count=$(awk 'BEGIN{C=0}{for (i=1;i<=NF;i++) if ( $i ~ /'${MyString}'/ ){C++}} END{print C}' "$File")
    echo "No. of \"${MyString}\" in $(date "+%b %Y" --date="${File:4:6}01") = $Count"
    Total=$(( $Count + $Total ))
done
echo "Total No. of \"${MyString}\" = $Total"

I used this to test

Code:
In my system , there are text files will be generated monthly , the file name begins with xxx , then year , month ( for example xxxxx201310.txt means Oct 2013 )

I have below command to count how many abc in the month , but it only count the number in this month .

NUMBER=$(cat xxxxx201310.txt |grep -c -s "abc" )

Could advise if I would like to have the following report , it shows the no. of abc for each month , and the total no. of abc , how to make it ? very thanks

no of abc in Oct 2013 = 111 no of abc in Nov 2013 = 222 no of abc in Dec 2013 = 333
"
"
"
Total no of abc = 999
the awk gives you 8, whereas grep -c results in 5
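
Splicing the shell variable into the awk program like this works for a simple string such as abc, but it breaks if the string contains a slash or a space. A possible alternative, sketched here with the hypothetical name count_fields, is to pass the value with awk -v and match against that variable. It keeps the same behaviour of counting matching fields (so "abcabc" still counts once), and note that awk processes backslash escapes in -v values.

```shell
# Hypothetical helper: count whitespace-separated fields in FILE matching PATTERN.
# Usage: count_fields PATTERN FILE
count_fields() {
    # -v hands the shell value to awk as the variable pat, so the awk
    # program can stay inside a single pair of single quotes.
    awk -v pat="$1" '
        { for (i = 1; i <= NF; i++) if ($i ~ pat) c++ }
        END { print c + 0 }
    ' "$2"
}
```

Inside the loop above this would become: Count=$(count_fields "$MyString" "$File")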
 
Old 11-01-2013, 06:41 AM   #15
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian sid
Posts: 2,683

actually, a much simpler count would be

Code:
Count=$( grep -o "${MyString}"  "$File" | wc -w )
grep -o prints only the matching parts, one per line, and wc -w counts 'words'

actually, wc -w or wc -l ( lines ) will do

output from test file
Code:
grep -o abc xxxx201301.txt 
abc
abc
abc
abc
abc
abc
abc
abc
It will even split "abcabc" into two matches, which the awk fails to do.
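
Putting the pieces of this thread together, the report from the first post might be sketched as below. The names month_label and abc_report are hypothetical. The sketch assumes the six digits directly before ".txt" are the year and month (so the prefix length does not matter) and a grep that supports -o. The month names come from a case statement rather than GNU date, so it does not depend on date --date or on the locale.

```shell
# Sketch of the full report; month_label and abc_report are hypothetical names.

# Turn "201310" into "Oct 2013".
month_label() {
    year=${1%??}      # drop the last two characters  -> 2013
    month=${1#????}   # drop the first four characters -> 10
    case $month in
        01) name=Jan ;; 02) name=Feb ;; 03) name=Mar ;; 04) name=Apr ;;
        05) name=May ;; 06) name=Jun ;; 07) name=Jul ;; 08) name=Aug ;;
        09) name=Sep ;; 10) name=Oct ;; 11) name=Nov ;; 12) name=Dec ;;
        *)  name=$month ;;
    esac
    printf '%s %s\n' "$name" "$year"
}

# Usage: abc_report STRING FILE...
abc_report() {
    string=$1
    shift
    total=0
    for file in "$@"; do
        # Assumption: the six digits in front of ".txt" are YYYYMM.
        yyyymm=$(printf '%s\n' "$file" | sed 's/.*\([0-9]\{6\}\)\.txt$/\1/')
        # grep -o prints one match per line, so wc -l counts occurrences.
        count=$(grep -o -F -- "$string" "$file" | wc -l | tr -d ' ')
        printf 'no of %s in %s = %s\n' "$string" "$(month_label "$yyyymm")" "$count"
        total=$((total + count))
    done
    printf 'Total no of %s = %s\n' "$string" "$total"
}
```

It could be run as: abc_report abc xxxxx*.txt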
 
  


All times are GMT -5.
