Find/grep/wc command to find matching files, print filename and word count

dbasch · 09-11-2009, 05:14 PM

Hi all,

I am trying to do a find/grep/wc command to find matching files, print the filename and then the word count of a specific pattern per file.

Here is my best (non-working) attempt so far:

wc `find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -H HeightResizableList {}` \;

Thanks!
Derek

EricTRA · 09-11-2009, 05:20 PM

Hello and Welcome to LinuxQuestions.org,

One way is to run the find command first and pipe its output to grep and wc. By the way, is this homework? If so, please read the LQ Rules.

Kind regards,

Eric
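Eric's pipe suggestion, read literally, can count occurrences rather than just matching lines: `grep -o` prints each match on its own line, and `wc -l` counts those lines. Below is a minimal sketch, assuming bash and a grep with `-o` (GNU and BSD grep both have it). The function name `count_occurrences` and its arguments are hypothetical, not from the thread.

```shell
#!/bin/bash
# Hypothetical helper: print "<file>: <n>" for every .as/.mxml file under a
# directory, where <n> is the number of occurrences of the pattern.
count_occurrences() {
    local pattern=$1 dir=${2:-.} file n
    # -print0 and read -d '' keep file names with spaces or newlines intact
    find "$dir" -type f \( -name '*.as' -o -name '*.mxml' \) -print0 |
    while IFS= read -r -d '' file; do
        # grep -o prints every match on its own line; wc -l counts them
        n=$(grep -o -- "$pattern" "$file" | wc -l)
        # Some wc implementations pad the number with spaces; $((n)) strips them
        echo "$file: $((n))"
    done
}
```

For example, `count_occurrences HeightResizableList .` prints one `file: n` line per .as/.mxml file, including files with 0 matches.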

druuna · 09-11-2009, 05:34 PM

Also: wc isn't needed; grep alone can do what you want. Take a look at the grep man page for the appropriate option (no straight answer, just in case this is homework...).
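The grep option druuna is most likely hinting at is `-c`, which prints the number of matching lines per file. A sketch of the single-command version, assuming `-H` (print the file name even when grep gets a single file; a GNU/BSD extension rather than POSIX) and find's `-exec ... {} +` form, which hands many files to one grep run. The wrapper function `count_lines` is hypothetical:

```shell
#!/bin/bash
# Hypothetical wrapper: print "file:count" for every .as/.mxml file under a
# directory, where count is the number of lines matching the pattern.
count_lines() {
    local pattern=$1 dir=${2:-.}
    # -H forces the file name even when find passes grep a single file;
    # "{} +" runs grep on many files at once instead of once per file.
    find "$dir" -type f \( -name '*.as' -o -name '*.mxml' \) \
        -exec grep -H -c -- "$pattern" {} +
}
```

Note that `-c` counts lines, so a line containing the pattern twice still counts once.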

lutusp · 09-11-2009, 06:07 PM

How about this:

Code:

pattern="this"

path="/path/to/files"

find "$path" | egrep "\.(html?|php)$" | while IFS= read -r filepath
do
   count=`cat "$filepath" | egrep -c "$pattern"`
   echo "There are $count examples of \"$pattern\" in $filepath."
done

1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?
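The loop above reads file names one line at a time, so names containing spaces, backslashes, or newlines can trip it up. A sketch of the same idea that passes names NUL-separated via `find -print0` and bash's `read -d ''`. The function name `report_counts` is hypothetical, and find's `-name` tests stand in for the `egrep` filter so find does the selection itself:

```shell
#!/bin/bash
# Hypothetical variant of the loop above that is safe for any file name.
report_counts() {
    local pattern=$1 path=$2 filepath count
    find "$path" -type f \( -name '*.htm' -o -name '*.html' -o -name '*.php' \) -print0 |
    while IFS= read -r -d '' filepath; do
        # grep -c exits 1 when the count is 0; "|| true" keeps that from
        # aborting scripts that run under "set -e" (grep still prints 0)
        count=$(grep -E -c -- "$pattern" "$filepath" || true)
        echo "There are $count examples of \"$pattern\" in $filepath."
    done
}
```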

dbasch · 09-14-2009, 12:50 PM

Quote:

Originally Posted by druuna

Also: wc isn't needed, grep alone can do all you want. Take a look at the grep manpage for the appropriate option (No straight up answer, just in case this is homework....).

Not homework, just trying to figure it out. Here is my second attempt:

Code:

$ find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -hc HeightResizableList {} \;

Unfortunately, it outputs the matching line count for every file, not just the files with more than 0 matches.

I see I can use -m NUM to make grep stop reading a file after NUM matching lines, which caps the count from above. I need the opposite: to suppress files whose count is below a certain number (here, 0). The only inversion grep offers is -v, which inverts the match itself, so it won't work in my case because I still want to match a specific phrase.

Thanks for the help!
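One way to print only the files with at least one match while staying with find and grep: `-exec ... \;` is itself a test in find, true only when the command exits with status 0, and `grep -q` exits 0 only when it finds a match. So a first `-exec grep -q` can gate a second `-exec` that prints the count. A sketch (the function name `nonzero_counts` is hypothetical; grep runs twice per matching file, which should be acceptable for a modest source tree):

```shell
#!/bin/bash
# Hypothetical helper: print "file:count" only for .as/.mxml files that
# contain at least one line matching the pattern.
nonzero_counts() {
    local pattern=$1 dir=${2:-.}
    # The first -exec acts as a filter: grep -q exits 0 only on a match,
    # and find evaluates the second -exec only when the first one is true.
    find "$dir" -type f \( -name '*.as' -o -name '*.mxml' \) \
        -exec grep -q -- "$pattern" {} \; \
        -exec grep -H -c -- "$pattern" {} \;
}
```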

dbasch · 09-14-2009, 12:55 PM

Quote:

Originally Posted by lutusp

1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?

Yes, that is what I had in mind.

I looked at it for quite a while. I don't see how an HTML file versus any other text file type would create a problem. Perhaps you can enlighten me.

Thanks!

Derek

druuna · 09-14-2009, 01:06 PM

Quote:

Originally Posted by dbasch

Code:

$ find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -hc HeightResizableList {} \;

Unfortunately, it outputs the matching line count for every file, not just the files with more than 0 matches.

That wasn't part of the original description of the problem.

If the files with 0 hits need to be excluded, you cannot do it with grep alone.

I see you already have a working solution, focus on that one.
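Since grep alone will not drop the zero counts, one option is a second stage after `grep -H -c`. A sketch using awk that keeps a line only when its last colon-separated field (the count) is greater than zero; using the last field means a colon inside a file name does not throw it off. The function name `drop_zero_counts` is hypothetical:

```shell
#!/bin/bash
# Hypothetical filter: read "file:count" lines (as printed by grep -H -c)
# on stdin and print only those whose count is greater than zero.
drop_zero_counts() {
    # $NF is the last ":"-separated field, i.e. the count
    awk -F: '$NF > 0'
}
```

For example: `find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -H -c HeightResizableList {} + | drop_zero_counts`.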

Kode · 09-14-2009, 01:28 PM

Just a fine point: there's no need to pipe cat into grep/egrep, since both read the files named on their command line.

Code:

count=`cat $filepath | egrep -c "$pattern"`

is actually redundant...

Code:

count=`egrep -c "$pattern" "$filepath"`

dbasch · 09-14-2009, 02:58 PM

OK, here is my working example.

Thanks everyone!

Code:

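#!/bin/bash
patterns=(DashBorder HeightResizableList HeightResizableListRenderer ListDropIndicator MultiLineButton PreviewListPanel TwoColorVBox)

path="."

for pattern in "${patterns[@]}"
  do

    echo "$pattern :" >> stats.txt

    # Keep only regular files ending in .as or .mxml
    find "$path" -nowarn -type f | egrep "\.(as|mxml)$" | while IFS= read -r filepath

    do
      # -w counts whole-word matches only, so HeightResizableList is not
      # also counted on lines that only mention HeightResizableListRenderer
      count=`egrep -c -w "$pattern" "$filepath"`

      if [ "$count" -gt "0" ]
        then
          printf '%s\t%s\n' "$filepath" "$count" >> stats.txt
      fi
  
    done

    # Blank line between pattern sections in stats.txt
    echo "" >> stats.txt
  
  done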
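The working script runs find once per pattern. A sketch of a variant that builds the file list once into a bash array (NUL-separated, so unusual file names are safe) and then loops over the patterns, writing the same stats.txt layout. The function name `write_stats` and its argument order are hypothetical, and `-w` (whole-word matching) is an assumption about what should count as a match:

```shell
#!/bin/bash
# Hypothetical variant: find the files once, then count each pattern.
# Usage: write_stats <dir> <output file> <pattern>...
write_stats() {
    local path=$1 out=$2
    shift 2
    local files=() f pattern count
    # Collect every .as/.mxml file once, NUL-separated
    while IFS= read -r -d '' f; do
        files+=("$f")
    done < <(find "$path" -type f \( -name '*.as' -o -name '*.mxml' \) -print0)

    for pattern in "$@"; do
        echo "$pattern :" >> "$out"
        for f in "${files[@]}"; do
            # -c counts matching lines; -w skips partial-word matches;
            # "|| true" because grep -c exits 1 when the count is 0
            count=$(grep -c -w -- "$pattern" "$f" || true)
            if [ "$count" -gt 0 ]; then
                printf '%s\t%s\n' "$f" "$count" >> "$out"
            fi
        done
        # Blank line between pattern sections
        echo "" >> "$out"
    done
}
```

For example: `write_stats . stats.txt DashBorder HeightResizableList TwoColorVBox`.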

lutusp · 09-14-2009, 05:46 PM

Quote:

Originally Posted by dbasch

Yes, that is what I had in mind.

I looked at it for quite a while. I don't see how an HTML file versus any other text file type would create a problem. Perhaps you can enlighten me.

Thanks!

Derek

HTML files have formatting tags as well as text, and the search algorithm will detect the search pattern in the tags as well as the text visible in a browser. There are ways to avoid this problem, but first one must recognize that it exists.
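One crude way to ignore the markup, assuming tags never span more than one line, is to delete everything between `<` and `>` with sed before counting. It is only an approximation: multi-line tags, comments, `<script>` and `<style>` contents, and `>` inside attribute values are not handled, and an actual HTML parser is the robust route. The function name `count_visible` is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: count lines of an HTML file that match the pattern
# after single-line tags have been stripped out. Approximate only.
count_visible() {
    local pattern=$1 file=$2
    # Remove anything that looks like <...> on one line, then count matches;
    # "|| true" because grep -c exits 1 when the count is 0
    sed 's/<[^>]*>//g' "$file" | grep -E -c -- "$pattern" || true
}
```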

dbasch · 09-14-2009, 05:55 PM

Quote:

Originally Posted by lutusp

HTML files have formatting tags as well as text, and the search algorithm will detect the search pattern in the tags as well as the text visible in a browser. There are ways to avoid this problem, but first one must recognize that it exists.

I thought that might be what you were alluding to. That level of filtering is beyond what I need, but thanks for the tip.