Old 11-27-2007, 03:29 AM   #1
George2
Member
 
Registered: Oct 2003
Posts: 354

Rep: Reputation: 30
word count issue


Hello everyone,


What is the command to count how many times the word *FOO* appears in a given file (e.g. goo.txt)?


thanks in advance,
George
 
Old 11-27-2007, 03:47 AM   #2
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
You have to combine some commands. If you can split the file into one word per line, then you can pipe that into grep -c.

The tr command can be used to do that splitting. You could also use awk, perl or sed or some other tools, but tr is probably the smallest program to invoke, so it's a good choice. In this example we split on spaces or tabs. It is simple to add other word splitting characters if you wish.

Code:
tr ' \t' '\n\n' input_file |grep -c '^foo$'
 
Old 11-27-2007, 04:44 AM   #3
George2
Member
 
Registered: Oct 2003
Posts: 354

Original Poster
Rep: Reputation: 30
Thanks matthewg42,


Why does the following command split on spaces, tabs, or \n?

tr ' \t' '\n\n'


regards,
George
 
Old 11-27-2007, 04:56 AM   #4
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Assuming GNU grep and no embedded newlines in FOO:

Code:
grep -o '\bFOO\b' goo.txt|wc -l
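grep -c counts matching lines, not matches, so FOO FOO on the same line would count only once.
That's why I use -o, which prints each match on its own line, and count those lines with wc -l.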
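A quick way to see the difference between counting lines and counting matches is the sketch below. It assumes GNU grep (for -o and \b); the sample file contents and the function names count_lines and count_matches are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: FOO occurs 3 times as a whole word, on 2 lines.
sample=$(mktemp)
printf '%s\n' 'FOO bar FOO' 'xFOOx' 'baz FOO' > "$sample"

# Count the lines that contain FOO as a whole word.
count_lines() { grep -c '\bFOO\b' "$1"; }

# Count every whole-word occurrence: -o prints one match per output line.
count_matches() { grep -o '\bFOO\b' "$1" | wc -l; }

echo "matching lines: $(count_lines "$sample")"
echo "total matches:  $(count_matches "$sample")"
rm -f "$sample"
```

With this sample the first count is 2 and the second is 3, because the first line holds two matches.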
 
Old 11-27-2007, 06:31 AM   #5
George2
Member
 
Registered: Oct 2003
Posts: 354

Original Poster
Rep: Reputation: 30
Thanks radoulov,


Why do you add \b before and after FOO?


regards,
George
 
Old 11-27-2007, 06:49 AM   #6
radoulov
Member
 
Registered: Apr 2007
Location: Milano, Italia/Варна, България
Distribution: Ubuntu, Open SUSE
Posts: 212

Rep: Reputation: 35
Quote:
Originally Posted by George2
Why do you add \b before and after FOO?
\b matches a word boundary.
Consider this:

Code:
$ print 'FOO,FOO
FOOFOO
"FOO"
(FOO)
xFOOx'|grep FOO    
FOO,FOO
FOOFOO
"FOO"
(FOO)
xFOOx
$ print 'FOO,FOO
FOOFOO
"FOO"
(FOO)
xFOOx'|grep  '^FOO$'  
$ 
$ print 'FOO,FOO
FOOFOO
"FOO"
(FOO)
xFOOx'|grep '\bFOO\b'
FOO,FOO
"FOO"
(FOO)
For more information, look up word boundaries in the regular expression documentation.

Edit: GNU grep has the -w option with the same meaning.
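
As a rough sketch of the -w variant, the count could be wrapped in a small function. This assumes a grep that supports -o and -w (GNU grep does); the name count_word is hypothetical, and -F is my addition so that the word is treated as a literal string rather than a regular expression.

```shell
# count_word WORD FILE: print how many times WORD occurs as a whole word.
# Assumes a grep with -o and -w, such as GNU grep.
count_word() {
    # -o: one match per output line; -w: whole words only; -F: literal string
    grep -o -w -F -e "$1" -- "$2" | wc -l
}
```

For the original question this would be called as count_word FOO goo.txt.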

Last edited by radoulov; 11-27-2007 at 08:53 AM.
 
Old 11-27-2007, 07:11 AM   #7
matthewg42
Senior Member
 
Registered: Oct 2003
Location: UK
Distribution: Kubuntu 12.10 (using awesome wm though)
Posts: 3,530

Rep: Reputation: 63
Quote:
Originally Posted by George2
Why does the following command split on spaces, tabs, or \n?

tr ' \t' '\n\n'
tr reads two lists of characters, then goes through all the input and translates each instance of a character in the first list into the corresponding character in the second list.

Consider these examples:
Code:
% echo "This is my input string" | tr 'itp' 'IT_'
tr replaces every 'i' with 'I', every 't' with 'T', and every 'p' with '_'. Thus the output is:
Code:
ThIs Is my In_uT sTrIng
You can have as many characters as you like in the two parameters. You can also use ranges of characters, as long as the ranges are the same length:
Code:
% echo "This is my input string" | tr 'a-j' '0-9'
T78s 8s my 8nput str8n6
% echo "This is my input string" | tr 'abcdefghik' 'xxxxxxxxxx'
Txxs xs my xnput strxnx
A space is represented simply with a space character. Tabs are represented with '\t', new lines with '\n'. Hence the behaviour of the original tr command:
Code:
% echo "This is my input string" | tr ' \t' '\n\n'
This
is
my
input
string
Once the output is one word per line like that, you can use grep -c to count the lines that match the pattern. Since you want to match the whole word, you can anchor the pattern with ^ and $. Alternatively, you could use grep's -x option to match the whole line, so a slightly different version of the originally suggested command looks like this:
Code:
tr ' \t' '\n\n' < input_file |grep -cx 'foo'
I made a small mistake in the original command, thinking tr takes an optional third parameter naming an input file. It does not; tr reads only standard input, so I used input redirection with the < operator.
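
If you also want punctuation to act as a word separator, something along these lines might do. It is only a sketch: the function name count_word_tr is made up, the POSIX classes [:space:] and [:punct:] are my choice of separators, and it relies on tr (as GNU tr does) padding the shorter second list by repeating its last character.

```shell
# count_word_tr WORD FILE: hypothetical helper built on the tr approach.
count_word_tr() {
    # Turn every run of whitespace or punctuation into a single newline
    # (-s squeezes repeated newlines), leaving one word per line, then
    # count the lines that are exactly WORD (-x whole line, -F literal).
    tr -s '[:space:][:punct:]' '\n' < "$2" | grep -c -x -F -e "$1"
}
```

Called as count_word_tr foo input_file, this counts foo in text such as "(foo), foo" as two words, while foofoo is still not counted.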
 
  

