LinuxQuestions.org

dbasch 09-11-2009 05:14 PM

Find/grep/wc command to find matching files, print filename and word count
 
Hi all,

I am trying to do a find/grep/wc command to find matching files, print the filename and then the word count of a specific pattern per file.

Here is my best (non-working) attempt so far:

wc `find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -H HeightResizableList {}` \;

Thanks!,
Derek

EricTRA 09-11-2009 05:20 PM

Hello and Welcome to LinuxQuestions.org,

First run the find command, then pipe its output to grep and wc. That's one way. By the way, is this homework? If so, please read the LQ Rules.

Kind regards,

Eric

druuna 09-11-2009 05:34 PM

Also: wc isn't needed, grep alone can do all you want. Take a look at the grep manpage for the appropriate option (No straight up answer, just in case this is homework....).

lutusp 09-11-2009 06:07 PM

Quote:

Originally Posted by dbasch (Post 3679310)
Hi all,

I am trying to do a find/grep/wc command to find matching files, print the filename and then the word count of a specific pattern per file.

Here is my best (non-working) attempt so far:

wc `find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -H HeightResizableList {}` \;

Thanks!,
Derek

How about this:

Code:

pattern="this"

path="/path/to/files"

find $path | egrep "\.(html?|php)$" | while read filepath
do
  count=`cat $filepath | egrep -c "$pattern"`
  echo "There are $count examples of \"$pattern\" in $filepath."
done

1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?

dbasch 09-14-2009 12:50 PM

Quote:

Originally Posted by druuna (Post 3679325)
Also: wc isn't needed, grep alone can do all you want. Take a look at the grep manpage for the appropriate option (No straight up answer, just in case this is homework....).

No homework, just trying to figure it out. Here is my second attempt:

Code:

$ find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -hc HeightResizableList {} \;

Unfortunately, it outputs the matching line count for every file, not just the files with more than 0 matches.

I see I can use -m NUM to stop after NUM matching lines, which caps the count. However, I need the opposite: to suppress files whose count is below a certain number. The only related option I found is -v (invert match), which won't work in my case because I still want to match a specific phrase.

Thanks for the help!
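
A side note on what is being counted: grep -c reports the number of matching lines, not the number of occurrences, so a line containing the pattern twice still counts once. If occurrences are what is wanted, a common approach is to print each match on its own line with -o and count those lines. The sketch below assumes a grep that supports -o (GNU and BSD grep do; it is not in POSIX), and the demo file is made up for illustration.

```shell
# Hypothetical demo file; the contents are invented for this example.
demo=$(mktemp)
printf '%s\n' 'foo foo bar' 'no match here' 'foo' > "$demo"

# -c counts matching LINES: two of the three lines contain "foo".
line_count=$(grep -c 'foo' "$demo")

# -o prints every match on its own line, so wc -l counts OCCURRENCES: three.
match_count=$(grep -o 'foo' "$demo" | wc -l)

echo "lines: $line_count, occurrences: $match_count"
rm -f "$demo"
```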

dbasch 09-14-2009 12:55 PM

Quote:

Originally Posted by lutusp (Post 3679355)
How about this:

Code:

pattern="this"

path="/path/to/files"

find $path | egrep "\.(html?|php)$" | while read filepath
do
  count=`cat $filepath | egrep -c "$pattern"`
  echo "There are $count examples of \"$pattern\" in $filepath."
done

1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?

Yes, that is what I had in mind.

I looked at it for quite a while. I don't see how an HTML file versus any other text file type would create a problem. Perhaps you can enlighten me.

Thanks!

Derek

druuna 09-14-2009 01:06 PM

Quote:

Originally Posted by dbasch (Post 3682507)
Code:

$ find . \( -name "*.as" -o -name "*.mxml" \) -exec grep -hc HeightResizableList {} \;

Unfortunately, it outputs the matching line count for every file, not just the files with more than 0 matches.

That wasn't part of the original description of the problem ;)

If the file(s) with 0 hits need to be excluded you cannot do it with grep alone.

I see you already have a working solution, focus on that one.
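
For what it's worth, a single grep -c cannot drop the zero-count files, but its output can be filtered afterwards. A minimal sketch, assuming grep supports -H (GNU and BSD grep do) and that no file name itself ends in ":0"; the sandbox directory and file contents are hypothetical:

```shell
# Hypothetical sandbox so the example is self-contained.
dir=$(mktemp -d)
printf 'HeightResizableList\nother\nHeightResizableList\n' > "$dir/a.as"
printf 'nothing relevant\n' > "$dir/b.mxml"

# grep -Hc prints "file:count" for every file it is given;
# the second grep then removes the lines whose count is 0.
result=$(find "$dir" \( -name '*.as' -o -name '*.mxml' \) \
           -exec grep -Hc 'HeightResizableList' {} + | grep -v ':0$')

echo "$result"
rm -rf "$dir"
```

Using -H instead of -h keeps the file name next to each count, which the second attempt above had suppressed.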

Kode 09-14-2009 01:28 PM

Just a fine point: you don't need to pipe cat into grep/egrep...

Code:

count=`cat $filepath | egrep -c "$pattern"`
is actually redundant...

Code:

count=`egrep -c "$pattern" $filepath`
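
Two related refinements, offered as a sketch rather than a required change: quoting "$filepath" keeps paths containing spaces intact, and grep -E is the current spelling of egrep (recent GNU grep releases warn that egrep is obsolescent). The directory and file name below are hypothetical.

```shell
dir=$(mktemp -d)
filepath="$dir/my file.as"        # hypothetical name containing a space
printf 'this\nthat\nthis again\n' > "$filepath"
pattern="this"

# The quoted expansion reaches grep as one argument despite the space.
count=$(grep -E -c "$pattern" "$filepath")

echo "There are $count examples of \"$pattern\" in $filepath."
rm -rf "$dir"
```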

dbasch 09-14-2009 02:58 PM

Quote:

Originally Posted by lutusp (Post 3679355)
How about this:

Code:

pattern="this"

path="/path/to/files"

find $path | egrep "\.(html?|php)$" | while read filepath
do
  count=`cat $filepath | egrep -c "$pattern"`
  echo "There are $count examples of \"$pattern\" in $filepath."
done

1. Is this what you had in mind?
2. Can you foresee a glaring problem with this script when used with HTML files?

OK, here is my working example :)

Thanks everyone!

Code:

patterns=(DashBorder HeightResizableList HeightResizableListRenderer ListDropIndicator MultiLineButton PreviewListPanel TwoColorVBox)

path="."

for pattern in "${patterns[@]}"
  do

    echo "$pattern :" >> stats.txt

    # keep only paths ending in .as or .mxml ("as?" would also match ".a")
    find "$path" -nowarn | egrep "\.(as|mxml)$" | while IFS= read -r filepath

    do
      # count the lines in this file that match the pattern
      count=`egrep -c "$pattern" "$filepath"`

      if [ "$count" -gt "0" ]
        then
          echo "$filepath        $count" >> stats.txt
      fi

    done

    echo " "

  done


lutusp 09-14-2009 05:46 PM

Quote:

Originally Posted by dbasch (Post 3682512)
Yes, that is what I had in mind.

I looked at it for quite a while. I don't see how an HTML file versus any other text file type would create a problem. Perhaps you can enlighten me.

Thanks!

Derek

HTML files have formatting tags as well as text, and the search algorithm will detect the search pattern in the tags as well as the text visible in a browser. There are ways to avoid this problem, but first one must recognize that it exists.
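
One rough way to sidestep this is to strip the tags before counting. The sketch below is a crude approximation, not a real HTML parser: the sed expression only removes tags that open and close on the same line, and it ignores comments, scripts, and entities. The sample page is made up for illustration.

```shell
# Hypothetical page: "list" appears once in a tag attribute and once in visible text.
page=$(mktemp)
cat > "$page" <<'EOF'
<div class="list">
<p>A list of items</p>
<p>Nothing else</p>
</div>
EOF
pattern="list"

# Raw count: also matches the class attribute inside the <div> tag.
raw_count=$(grep -c "$pattern" "$page")

# Remove anything of the form <...> first, then count matching lines.
text_count=$(sed -e 's/<[^>]*>//g' "$page" | grep -c "$pattern")

echo "raw: $raw_count, visible text only: $text_count"
rm -f "$page"
```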

dbasch 09-14-2009 05:55 PM

Quote:

Originally Posted by lutusp (Post 3682773)
HTML files have formatting tags as well as text, and the search algorithm will detect the search pattern in the tags as well as the text visible in a browser. There are ways to avoid this problem, but first one must recognize that it exists.

I thought that is what you might be alluding to. I would consider that level of filtering to be above what I needed. Thanks for the tip though.


All times are GMT -5.