LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
09-11-2009, 05:14 PM #1
dbasch
LQ Newbie
 
Registered: Sep 2009
Posts: 5

Rep: Reputation: 0
Find/grep/wc command to find matching files, print filename and word count


Hi all,

I am trying to build a find/grep/wc command that finds matching files and prints each filename followed by the number of times a specific pattern occurs in that file.

Here is my best (non-working) attempt so far:

wc `find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -H HeightResizableList {}` \;

Thanks!
Derek

Last edited by dbasch; 09-11-2009 at 05:17 PM.
 
09-11-2009, 05:20 PM #2
EricTRA
Guru
 
Registered: May 2009
Location: Gibraltar, Gibraltar
Distribution: Fedora 20 with Awesome WM
Posts: 6,805
Blog Entries: 1

Rep: Reputation: 1290
Hello and Welcome to LinuxQuestions.org,

First run the find command and pipe its output to grep and wc. That's one way. By the way, is this homework? If so, please read the LQ Rules.

Kind regards,

Eric
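A minimal sketch of the find-then-pipe approach described above, assuming bash and a find that supports -print0 (GNU and BSD find both do). The function name count_occurrences is hypothetical. One design point: grep -c counts matching lines, whereas piping grep -o into wc -l counts every occurrence, which is closer to the per-file "word count" asked for in the question.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_occurrences PATTERN [DIR]
# Prints "file<TAB>count" for every *.as / *.mxml file under DIR.
count_occurrences() {
  local pattern=$1 dir=${2:-.}
  local file n
  # -print0 with read -d '' keeps filenames containing spaces intact
  find "$dir" -type f \( -name '*.as' -o -name '*.mxml' \) -print0 |
    while IFS= read -r -d '' file; do
      # grep -o prints each match on its own line, so wc -l counts
      # occurrences rather than matching lines
      n=$(grep -o -- "$pattern" "$file" | wc -l)
      # $((n)) drops the leading blanks some wc implementations emit
      printf '%s\t%d\n' "$file" "$((n))"
    done
}

# Usage: count_occurrences HeightResizableList .
```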
 
09-11-2009, 05:34 PM #3
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Also: wc isn't needed, grep alone can do all you want. Take a look at the grep manpage for the appropriate option (No straight up answer, just in case this is homework....).
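For reference, the grep option hinted at here is -c, which prints the number of matching lines in each file. A sketch, assuming bash and a grep that supports -H (GNU and BSD grep do); count_lines is a hypothetical name. Using -exec ... {} + instead of {} \; hands many files to a single grep run, and -H keeps the filename in front of each count even when find passes grep only one file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_lines PATTERN [DIR]
# Prints "file:count" (count = matching lines) for each *.as / *.mxml file.
count_lines() {
  local pattern=$1 dir=${2:-.}
  # "{} +" batches many files into one grep call; -H forces the filename prefix
  find "$dir" -type f \( -name '*.as' -o -name '*.mxml' \) \
    -exec grep -Hc -- "$pattern" {} +
}

# Usage: count_lines HeightResizableList .
```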
 
09-11-2009, 06:07 PM #4
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 101
How about this:

Code:
pattern="this"

path="/path/to/files"

find "$path" | egrep "\.(html?|php)$" | while IFS= read -r filepath
do
   # egrep -c counts matching lines, not individual occurrences
   count=`cat "$filepath" | egrep -c "$pattern"`
   echo "There are $count examples of \"$pattern\" in $filepath."
done
1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?
 
09-14-2009, 12:50 PM #5
dbasch
LQ Newbie
 
Registered: Sep 2009
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by druuna
Also: wc isn't needed, grep alone can do all you want. Take a look at the grep manpage for the appropriate option (No straight up answer, just in case this is homework....).
No homework, just trying to figure it out. Here is my second attempt:

Code:
$ find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -hc HeightResizableList {} \;
Unfortunately, it outputs the matching line count for every file, not just the files with more than 0 matches.

I see I can use -m NUM to stop reading a file after NUM matching lines. However, I need the opposite: I need to suppress the output for files with fewer than a certain number of matches. As far as I can tell, that would need the -v option, which inverts the match itself and so won't work in my case, as I still want to match a certain phrase.

Thanks for the help!

Last edited by dbasch; 09-14-2009 at 12:55 PM.
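One way to drop the zero counts is to filter grep's own output with a second command. A sketch, assuming bash and GNU or BSD grep; count_nonzero is a hypothetical name. The second grep -v ':0$' removes only lines whose count is exactly 0 (a count of 10 ends in ":10" and survives). Filenames containing newlines would confuse this line-based filter. If only the names of files with at least one match are needed, grep -l lists exactly those.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_nonzero PATTERN [DIR]
# Like "grep -Hc", but only prints files with at least one matching line.
count_nonzero() {
  local pattern=$1 dir=${2:-.}
  find "$dir" -type f \( -name '*.as' -o -name '*.mxml' \) \
    -exec grep -Hc -- "$pattern" {} + |
    grep -v ':0$'   # drop "file:0" lines; "file:10" does not match ':0$'
}

# Usage: count_nonzero HeightResizableList .
```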
 
09-14-2009, 12:55 PM #6
dbasch
LQ Newbie
 
Registered: Sep 2009
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by lutusp
1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?
Yes, that is what I had in mind.

I looked at it for quite a while. I don't see how an HTML file versus any other text file type would create a problem. Perhaps you can enlighten me.

Thanks!

Derek
 
09-14-2009, 01:06 PM #7
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2373
Quote:
Originally Posted by dbasch
Code:
$ find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -hc HeightResizableList {} \;
Unfortunately, it outputs the matching line count for every file, not just the files with more than 0 matches.
That wasn't part of the original description of the problem.

If the file(s) with 0 hits need to be excluded, you cannot do it with grep alone.

I see you already have a working solution, focus on that one.
 
09-14-2009, 01:28 PM #8
Kode
LQ Newbie
 
Registered: Dec 2004
Location: Barrie, Ontario
Distribution: Fedora 11, Prime GNU/Linux
Posts: 2

Rep: Reputation: 0
Just a fine point: there's no need to pipe cat into grep/egrep, because both can read a file directly. In

Code:
count=`cat "$filepath" | egrep -c "$pattern"`
the cat is redundant; this does the same thing:

Code:
count=`egrep -c "$pattern" "$filepath"`
 
09-14-2009, 02:58 PM #9
dbasch
LQ Newbie
 
Registered: Sep 2009
Posts: 5

Original Poster
Rep: Reputation: 0
OK, here is my working example

Thanks everyone!

Code:
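patterns=(DashBorder HeightResizableList HeightResizableListRenderer ListDropIndicator MultiLineButton PreviewListPanel TwoColorVBox)

path="."

for pattern in "${patterns[@]}"
  do

    echo "$pattern :" >> stats.txt

    # keep only files ending in .as or .mxml
    find "$path" -nowarn | egrep "\.(as|mxml)$" | while IFS= read -r filepath

    do
      # -c counts matching lines (not occurrences); -w stops HeightResizableList
      # from also matching lines that only contain HeightResizableListRenderer
      count=`egrep -wc "$pattern" "$filepath"`

      if [ "$count" -gt "0" ]
        then
          printf '%s\t%s\n' "$filepath" "$count" >> stats.txt
      fi

    done

    # blank separator line between patterns in stats.txt
    echo "" >> stats.txt

  done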
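A possible variant of the script above, sketched under assumptions: bash 4.4 or later (for mapfile -d ''), a grep with -H, -c and -w, and a hypothetical function name write_stats. It runs find once instead of once per pattern, gives grep the whole file list for each pattern, and uses awk to drop zero counts and print file<TAB>count lines. Whole-word matching (-w) is an addition in this sketch, so HeightResizableList is not also counted on lines that only mention HeightResizableListRenderer. Unlike the loop above, it overwrites the output file instead of appending.

```shell
#!/usr/bin/env bash
# Requires bash 4.4+ for mapfile -d ''.
patterns=(DashBorder HeightResizableList HeightResizableListRenderer ListDropIndicator MultiLineButton PreviewListPanel TwoColorVBox)

# Hypothetical helper: write_stats SEARCH_DIR OUTPUT_FILE
write_stats() {
  local path=$1 out=$2 pattern
  local -a files
  # Run find once; NUL-delimited names survive spaces and newlines
  mapfile -d '' files < <(find "$path" -type f \( -name '*.as' -o -name '*.mxml' \) -print0)

  for pattern in "${patterns[@]}"; do
    echo "$pattern :"
    # Skip grep when there are no files, otherwise it would read stdin
    if (( ${#files[@]} > 0 )); then
      # -H: always print the filename, -c: count matching lines,
      # -w: whole-word matches only (an addition in this sketch)
      # awk keeps nonzero counts and rewrites "file:count" as "file<TAB>count"
      grep -Hcw -- "$pattern" "${files[@]}" |
        awk -F: '$NF > 0 { n = $NF; sub(/:[0-9]+$/, ""); print $0 "\t" n }'
    fi
    # Blank separator line between patterns
    echo
  done > "$out"
}

# Usage: write_stats . stats.txt
```

For a very large tree, the expanded file list could exceed the system's argument-length limit; feeding the NUL-delimited list to grep through xargs -0 would avoid that.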
 
09-14-2009, 05:46 PM #10
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 101
Quote:
Originally Posted by dbasch
I looked at it for quite a while. I don't see how an HTML file versus any other text file type would create a problem. Perhaps you can enlighten me.
HTML files have formatting tags as well as text, and the search algorithm will detect the search pattern in the tags as well as the text visible in a browser. There are ways to avoid this problem, but first one must recognize that it exists.
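A crude way to approach this is to delete anything that looks like a tag before counting. This is a naive sketch assuming bash, sed and grep; count_visible is a hypothetical name. It only removes <...> runs that open and close on the same line and does nothing about comments, <script>/<style> contents or character entities, so accurate results would need a real HTML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count_visible PATTERN FILE
# Naive: strips <...> runs that open and close on the same line, then
# counts the lines that still match. Not a real HTML parser.
count_visible() {
  local pattern=$1 file=$2
  sed 's/<[^>]*>//g' < "$file" | grep -c -- "$pattern"
}

# Usage: count_visible this page.html
```

With a line such as <div class="this">this text</div>, plain grep -c counts the match in the class attribute as well, while count_visible only sees "this text".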
 
09-14-2009, 05:55 PM #11
dbasch
LQ Newbie
 
Registered: Sep 2009
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by lutusp
HTML files have formatting tags as well as text, and the search algorithm will detect the search pattern in the tags as well as the text visible in a browser. There are ways to avoid this problem, but first one must recognize that it exists.
I thought that was what you were alluding to. That level of filtering is beyond what I need, but thanks for the tip.
All times are GMT -5.