Linux - Newbie This Linux forum is for members that are new to Linux.
Old 06-26-2007, 05:57 PM   #1
LQ Newbie
Registered: Jun 2007
Posts: 3

grep help

Hey all,

I need some help using grep. How would I do a word count for specific words in a file? Let's say I am searching for apples, bananas and grapes in a text file and need to output the frequency of each of those words. I don't want to grep for those words one by one.

Much Appreciated,

Old 06-26-2007, 06:15 PM   #2
LQ Guru
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

I think it would be easier to use sed or tr to replace whitespace with newlines; sort the output; use grep -f wordlist to filter out the words you don't want; and use uniq -c to count the occurrences of each word.

#break up text into word list (using tr) |
#filter out unwanted words (using grep -f wordlist) |
#sort |
#count the words (using uniq -c)

You may need to use sed somewhere in the pipeline to rejoin words that are hyphenated across line breaks, like "them-\nselves" -> "themselves". It depends on the format of the text file. You may see patterns like "my-\n\tself" or "my-\n self" that need to be fixed as well. If you use "tr" to replace newlines and all other whitespace with single spaces, you could pipe the output through sed to remove the resulting "- " patterns, then run it through "tr" again to change all of the spaces to newlines.
Also, in the word list, you will want to remove punctuation such as periods, so that you don't have separate entries for "book" and "book." for example.

Examine the output of each part of the pipe work flow to make sure that the output is what you expect. A lot of the tweaking is adjusting the options.
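
A rough sketch of the pipeline described above, in case it helps to see it assembled. The function name word_freq and its two arguments (the text file and a wordlist file with one wanted word per line) are made up for illustration. It assumes a sed that copes with one very long input line (GNU sed does), and the "- " removal will also eat a dash that is followed by a space in ordinary text, so check the intermediate output on your own file.

```shell
#!/bin/sh
# word_freq TEXTFILE WORDLIST   (hypothetical helper, not from the thread)
# Prints "count word" for every word in WORDLIST that occurs in TEXTFILE.
word_freq() {
    # 1. tr -s:   squeeze every run of whitespace (incl. newlines) to one space
    # 2. sed:     drop "- " so "them-\nselves" becomes "themselves"
    # 3. tr -cs:  turn every run of non-letters into one newline,
    #             giving one word per line and stripping punctuation
    # 4. grep:    keep only lines that exactly equal a wordlist entry
    # 5. sort | uniq -c: count each remaining word
    tr -s '[:space:]' ' ' < "$1" |
        sed 's/- //g' |
        tr -cs '[:alpha:]' '[\n*]' |
        grep -x -F -f "$2" |
        sort |
        uniq -c
}
```

Called as word_freq somefile.txt wordlist, it prints one line per matched word with the count in front. Matching is case-sensitive, so "Apples" and "apples" are different words unless you add a tr '[:upper:]' '[:lower:]' stage.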

Last edited by jschiwal; 06-26-2007 at 06:20 PM.
Old 06-26-2007, 06:16 PM   #3
Senior Member
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Here's one way:

grep -o "apples\|bananas\|grapes" somefile.txt | sort | uniq -c
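
One thing to watch with the one-liner above is that it also counts "apples" inside "pineapples". Here is a small sketch that wraps the same idea in a function and adds -w so only whole words match. The name count_words is made up for illustration, and -o and -w are GNU/BSD grep options rather than something every grep is guaranteed to have.

```shell
#!/bin/sh
# count_words FILE WORD...   (hypothetical helper, not from the thread)
# Prints "count word" for each WORD that occurs in FILE as a whole word.
count_words() {
    file=$1
    shift
    # Join the remaining arguments with "|" to build an
    # extended-regex alternation such as apples|bananas|grapes.
    pattern=$(printf '%s|' "$@")
    pattern=${pattern%|}    # strip the trailing "|"
    # -o: print each match on its own line
    # -w: match whole words only
    # -E: extended regex, so "|" needs no backslash
    grep -o -w -E "$pattern" "$file" | sort | uniq -c
}
```

Usage would be count_words somefile.txt apples bananas grapes. Words that never occur are simply missing from the output, the same as with the original command.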
Old 06-26-2007, 06:18 PM   #4
LQ Veteran
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

grep -o personnel tmpfile | wc -w
finds the word "personnel" and counts the occurrences. If you want to find several different words in 1 pass, I think you have to make a small script.

for i in apples bananas grapes; do
    count=$(grep -o "$i" tmpfile | wc -l)
    echo "$i: $count"
done

man grep for more on how grep works
Old 06-26-2007, 07:21 PM   #5
Senior Member
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 3,031

tr ' ' '\n' < file.txt | sort | uniq -c
Old 06-26-2007, 07:34 PM   #6
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

And, as (almost) always, an awk version:
 awk 'BEGIN{RS="[[:space:]]+"} /word1/ || /word2/ || /word3/ {word[$1]++} END{for (i in word){print i" : "word[i]}}' file
Cheers, Tink
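
A regex in RS needs gawk or mawk, so here is a sketch of the same counting idea that only relies on ordinary field splitting and should run in any POSIX awk. The function name awk_count is made up for illustration. It matches whole whitespace-separated fields exactly, so "apples." with a trailing period is not counted as "apples", and the words themselves must not contain spaces.

```shell
#!/bin/sh
# awk_count FILE WORD...   (hypothetical helper, not from the thread)
# Prints "word : count" for every WORD, in the order given, including zeros.
awk_count() {
    file=$1
    shift
    # "$*" joins the wanted words with spaces; awk splits them again.
    awk -v wanted="$*" '
        BEGIN {
            n = split(wanted, w, " ")
            for (k = 1; k <= n; k++) want[w[k]] = 1
        }
        {
            # Check every whitespace-separated field on the line.
            for (i = 1; i <= NF; i++)
                if ($i in want) count[$i]++
        }
        END {
            # "+ 0" turns a never-seen word into 0 instead of an empty string.
            for (k = 1; k <= n; k++) print w[k] " : " (count[w[k]] + 0)
        }
    ' "$file"
}
```

For example, awk_count somefile.txt apples bananas grapes prints one "word : count" line per word. Unlike the grep versions, the output order follows the argument order rather than sort order.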
Old 06-27-2007, 04:43 PM   #7
LQ Newbie
Registered: Jun 2007
Posts: 3

Original Poster
Thanks all.. you guys are very fast at replying.


