Linux - NewbieThis Linux forum is for members that are new to Linux.
Just starting out and have a question?
If it is not in the man pages or the how-to's this is the place!
Notices
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
You are currently viewing LQ as a guest. By joining our community you will have the ability to post topics, receive our newsletter, use the advanced search, subscribe to threads and access many other special features. Registration is quick, simple and absolutely free. Join our community today!
Note that registered members see fewer ads, and ContentLink is completely disabled once you log in.
If you have any problems with the registration process or your account login, please contact us. If you need to reset your password, click here.
Having a problem logging in? Please visit this page to clear all LQ-related cookies.
Get a virtual cloud desktop with the Linux distro that you want in less than five minutes with Shells! With over 10 pre-installed distros to choose from, the worry-free installation life is here! Whether you are a digital nomad or just looking for flexibility, Shells can put your Linux machine on the device that you want to use.
Exclusive for LQ members, get up to 45% off per month. Click here for more info.
I need some help using grep. How would I do a word count for specific words from a file. Lets say I am searching for apples, bannanas and grapes from a text file and need to output the frequency of those words. I don't want to do a grep of those words one by one.
I think it would be easier to use sed or tr to replace whitespace with newlines; sort the output; use grep -f wordlist to filter out the words you don't want; use uniq to count the occurances of each word.
#break up text into word list (using tr) |
#filter out unwanted words (using grep -f wordlist |
#sort |
#count the words (using uniq -c)
You may need to use sed somewhere in the pipeline to assemble patterns like "them-\nselves" -> "themselves". It depends on the format of the text file. You may see patterns like "my-\n\tself" or "my-\n self" that need to be fixed as well. If you use "tr" to replace returns and all whitespace with single spaces, you could pipe the output through sed to remove " -" patterns; then run through 'tr' again to change all of the spaces to returns.
Also, in the word list, you will want to remove punctuation for periods, so that you don't have seperate entries for "book" and "book." for example.
Examine the output of each part of the pipe work flow to make sure that the output is what you expect. A lot of the tweaking is adjusting the options.
grep -o personnel tmpfile|wc -w
finds the word "personnel" and counts the occurences. If you want to find several different words in 1 pass, I think you have to make a small script.
(Pseudo-code) for i in <list of words>
count=grep $i <filename>|wc
printf (or echo) $1, $count
LinuxQuestions.org is looking for people interested in writing
Editorials, Articles, Reviews, and more. If you'd like to contribute
content, let us know.