LinuxQuestions.org
Download your favorite Linux distribution at LQ ISO.
Go Back   LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
User Name
Password
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Notices


Reply
  Search this Thread
Old 06-26-2007, 05:57 PM   #1
aggressivebloodcell
LQ Newbie
 
Registered: Jun 2007
Posts: 3

Rep: Reputation: 0
grep help


Hey all,

I need some help using grep. How would I do a word count for specific words from a file. Lets say I am searching for apples, bannanas and grapes from a text file and need to output the frequency of those words. I don't want to do a grep of those words one by one.

Much Appreciated,

abc
 
Old 06-26-2007, 06:15 PM   #2
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 671Reputation: 671Reputation: 671Reputation: 671Reputation: 671Reputation: 671
I think it would be easier to use sed or tr to replace whitespace with newlines; sort the output; use grep -f wordlist to filter out the words you don't want; use uniq to count the occurances of each word.

#break up text into word list (using tr) |
#filter out unwanted words (using grep -f wordlist |
#sort |
#count the words (using uniq -c)

You may need to use sed somewhere in the pipeline to assemble patterns like "them-\nselves" -> "themselves". It depends on the format of the text file. You may see patterns like "my-\n\tself" or "my-\n self" that need to be fixed as well. If you use "tr" to replace returns and all whitespace with single spaces, you could pipe the output through sed to remove " -" patterns; then run through 'tr' again to change all of the spaces to returns.
Also, in the word list, you will want to remove punctuation for periods, so that you don't have seperate entries for "book" and "book." for example.

Examine the output of each part of the pipe work flow to make sure that the output is what you expect. A lot of the tweaking is adjusting the options.

Last edited by jschiwal; 06-26-2007 at 06:20 PM.
 
Old 06-26-2007, 06:16 PM   #3
macemoneta
Senior Member
 
Registered: Jan 2005
Location: Manalapan, NJ
Distribution: Fedora x86 and x86_64, Debian PPC and ARM, Android
Posts: 4,593
Blog Entries: 2

Rep: Reputation: 332Reputation: 332Reputation: 332Reputation: 332
Here's one way:

Code:
grep -o "apples\|bananas\|grapes" somefile.txt | sort | uniq -c
 
Old 06-26-2007, 06:18 PM   #4
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738Reputation: 738
grep -o personnel tmpfile|wc -w
finds the word "personnel" and counts the occurences. If you want to find several different words in 1 pass, I think you have to make a small script.

(Pseudo-code)
for i in <list of words>
count=grep $i <filename>|wc
printf (or echo) $1, $count


man grep for more on how grep works
 
Old 06-26-2007, 07:21 PM   #5
AwesomeMachine
Senior Member
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora
Posts: 2,003

Rep: Reputation: 299Reputation: 299Reputation: 299
tr ' ' '\n' < file.txt | sort | uniq -c
 
Old 06-26-2007, 07:34 PM   #6
Tinkster
Moderator
 
Registered: Apr 2002
Location: in a fallen world
Distribution: slackware by choice, others too :} ... android.
Posts: 23,067
Blog Entries: 11

Rep: Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910Reputation: 910
And, as (almost) always, an awk version:
Code:
 awk 'BEGIN{RS=" +"} /word1/ || /word2/ || /word3/ {word[$1]++}{for (i in word){print i" : "word[i]}}' file
Cheers, Tink
 
Old 06-27-2007, 04:43 PM   #7
aggressivebloodcell
LQ Newbie
 
Registered: Jun 2007
Posts: 3

Original Poster
Rep: Reputation: 0
Thanks all.. you guys are very fast at replying.

-abc
 
  


Reply


Thread Tools Search this Thread
Search this Thread:

Advanced Search

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off



Similar Threads
Thread Thread Starter Forum Replies Last Post
grep output on stdout and grep output to file don't match xnomad Linux - General 3 01-13-2007 04:56 AM
grep: grep for two substrings? eur0dad Linux - General 2 08-17-2006 04:03 PM
bash script with grep and sed: sed getting filenames from grep odysseus.lost Programming 1 07-17-2006 11:36 AM
grep ?? can grep us variables? DaFrEQ Linux - Software 4 09-14-2005 12:22 PM
ps -ef|grep -v root|grep apache<<result maelstrombob Linux - Newbie 1 09-24-2003 11:38 AM


All times are GMT -5. The time now is 07:42 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration