LinuxQuestions.org


aggressivebloodcell 06-26-2007 05:57 PM

grep help
 
Hey all,

I need some help using grep. How would I do a word count for specific words in a file? Let's say I am searching for apples, bananas and grapes in a text file and need to output the frequency of each of those words. I don't want to grep for those words one by one.

Much Appreciated,

abc

jschiwal 06-26-2007 06:15 PM

I think it would be easier to use sed or tr to replace whitespace with newlines, sort the output, use grep -f wordlist to filter out the words you don't want, and use uniq to count the occurrences of each word.

#break up text into word list (using tr) |
#filter out unwanted words (using grep -f wordlist) |
#sort |
#count the words (using uniq -c)

You may need to use sed somewhere in the pipeline to reassemble words hyphenated across a line break, like "them-\nselves" -> "themselves". It depends on the format of the text file. You may see patterns like "my-\n\tself" or "my-\n self" that need to be fixed as well. If you use "tr" to replace newlines and all other whitespace with single spaces, you could pipe the output through sed to remove the resulting "- " patterns, then run it through 'tr' again to change all of the spaces to newlines.
Also, when building the word list from the text, you will want to strip punctuation such as periods, so that you don't have separate entries for "book" and "book." for example.

Examine the output of each part of the pipe work flow to make sure that the output is what you expect. A lot of the tweaking is adjusting the options.
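The steps above could be sketched roughly as follows. This is only one possible arrangement, not a tested recipe for every text format: the function name count_words and its two arguments (the text file and a word list with one word per line) are made up for illustration, the sed step assumes hyphenated line breaks always look like "- " once newlines have become spaces, and stripping all punctuation will also eat apostrophes and hyphens inside words.

```shell
# count_words TEXTFILE WORDLIST  (hypothetical helper, names are assumptions)
count_words() {
    tr -s '[:space:]' ' ' < "$1" |   # turn newlines/tabs/repeated blanks into single spaces
        sed 's/- //g' |              # rejoin words split as "them-" + "selves"
        tr ' ' '\n' |                # put one word on each line
        tr -d '[:punct:]' |          # drop punctuation so "book." becomes "book"
        grep -F -x -f "$2" |         # keep only lines that exactly equal a listed word
        sort |
        uniq -c                      # count each remaining word
}
```

Usage would be something like count_words somefile.txt wordlist, where wordlist holds apples, bananas and grapes on separate lines. Matching is case-sensitive here, so "Apples" is not counted unless you lowercase the text first.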

macemoneta 06-26-2007 06:16 PM

Here's one way:

Code:

grep -o "apples\|bananas\|grapes" somefile.txt | sort | uniq -c
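One thing to watch with the one-liner above: it matches substrings, so "grapes" inside "grapeseed" would be counted too, and "Apples" with a capital would not. A possible variant, assuming a grep that supports -o, -w and -i (GNU and BSD grep do, but they are not all required by POSIX), wrapped in a function whose name count_listed is just made up for this example:

```shell
# count_listed FILE  (hypothetical helper name)
count_listed() {
    # -o: print each match on its own line, -w: whole words only,
    # -i: ignore case, -E: extended regex so | needs no backslash
    grep -o -w -i -E 'apples|bananas|grapes' "$1" |
        tr '[:upper:]' '[:lower:]' |   # fold case so "Apples" and "apples" are counted together
        sort |
        uniq -c
}
```

Call it as count_listed somefile.txt.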

pixellany 06-26-2007 06:18 PM

grep -o personnel tmpfile|wc -w
finds the word "personnel" and counts the occurrences. If you want to count several different words in one pass, I think you have to write a small script.

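(Rough sketch; substitute your own words and file name)
for i in apples bananas grapes; do
    count=$(grep -o "$i" somefile.txt | wc -l)   # one line per match, so wc -l counts matches
    echo "$i: $count"
done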


man grep for more on how grep works

AwesomeMachine 06-26-2007 07:21 PM

tr ' ' '\n' < file.txt | sort | uniq -c

Tinkster 06-26-2007 07:34 PM

And, as (almost) always, an awk version:
Code:

awk 'BEGIN{RS="[[:space:]]+"} /word1/ || /word2/ || /word3/ {word[$1]++} END{for (i in word){print i" : "word[i]}}' file
Cheers, Tink

aggressivebloodcell 06-27-2007 04:43 PM

Thanks all.. you guys are very fast at replying.

-abc


All times are GMT -5.