|
 |
01-23-2017, 10:16 AM
|
#1
|
Member
Registered: Sep 2016
Location: Webster MA USA
Posts: 243
Rep: 
|
Count number of times ONE punctuation mark occurs in a file
My "add keywords" script uses both caret ("^") and comma (",") as delimiters. The input text files should have one of each, but they often contain multiple periods ("."), and I want to check the files beforehand from the command line to make sure the period occurs no more than once.
Code:
grep -o . foo | wc -l
in one file returns 137, even though there is one and only one period in the file.
Code:
cat foo |echo $x | tr -d -c '.' | wc -m
returns 0.
I know I must be doing something wrong, but my question is: what am I doing wrong? This is one of those instances which proves Google is entirely useless; if it weren't, I'd have found a solution there and wouldn't be asking this question.
Please help.
Carver
|
|
|
01-23-2017, 10:24 AM
|
#2
|
Senior Member
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278
|
Gives the answer 3.
Code:
echo "this. and. that." | tr -cd "\." | wc -c
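Pipe your file through and it should count the instances of the period. In a regex a period signifies 'any character', so it has to be escaped there; tr works on a literal character set rather than a regex, so the backslash is not needed here.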
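Applied to a file rather than an echoed string, the same idea might look like the sketch below. The helper name count_char and the sample file contents are made up for illustration; only tr and wc come from the thread.

```shell
#!/bin/sh
# count_char CHAR FILE: print how many times CHAR occurs in FILE.
# (count_char is a hypothetical helper name, not something from the thread.)
count_char() {
    # tr -cd deletes every character NOT in the given set, so only CHAR
    # survives; wc -c then counts what is left.
    tr -cd "$1" < "$2" | wc -c
}

# Made-up sample input standing in for one of the keyword files.
sample=$(mktemp)
printf 'title^keyword one,keyword two. extra. text\n' > "$sample"

# $(( ... )) strips the padding some wc implementations add around the number.
n=$(($(count_char . "$sample")))
if [ "$n" -gt 1 ]; then
    echo "$sample: $n periods (expected at most 1)"
fi
rm -f "$sample"
```

As for the pipeline in the first post: echo does not read standard input, so the contents of foo never reach tr. With $x unset, echo emits only a newline, tr deletes it, and wc reports 0.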
Last edited by szboardstretcher; 01-23-2017 at 10:26 AM.
|
|
|
01-23-2017, 11:06 AM
|
#3
|
LQ Guru
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573
|
Quote:
Originally Posted by L_Carver
Code:
grep -o . foo | wc -l
in one file returns 137, even though there is one and only one period in the file.
|
Grep match strings use regular expressions. In a regex, '.' matches any single character (much like '?' in a shell glob). To match a literal '.' you need to escape it:
Code:
grep -o '\.' foo | wc -l
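A small side-by-side run can make the difference visible. This is only a sketch: the sample line is invented, and the -F variant (fixed-string matching) is an alternative not mentioned elsewhere in the thread.

```shell
#!/bin/sh
# Made-up sample: 8 characters plus a newline, exactly one period.
sample=$(mktemp)
printf 'one. two\n' > "$sample"

grep -o .    "$sample" | wc -l   # unescaped: every character matches
grep -o '\.' "$sample" | wc -l   # escaped: literal periods only
grep -oF .   "$sample" | wc -l   # -F: pattern is a fixed string, not a regex

rm -f "$sample"
```

With this input the three counts should be 8, 1 and 1, which is the same effect as the 137 reported for the file with a single period.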
|
|
|
01-23-2017, 11:48 AM
|
#4
|
Member
Registered: Dec 2016
Posts: 61
Rep:
|
If you guys don't mind a related question, the output of 'wc' baffles me.
For example running 'wc' without any options,
Code:
$ echo "this. and. that." | grep -o '\.' | wc
3 3 6
I would have predicted just one newline (from the 'echo' command).
Where does the newline number come from in this example?
Thanks,
Dave
[No sooner did I post this than it occurred to me that the newlines probably come from the three instances of grep finding the three periods. Sorry for the clutter.]
Last edited by dlb101010; 01-23-2017 at 11:52 AM.
|
|
|
01-23-2017, 11:58 AM
|
#5
|
LQ Veteran
Registered: Jul 2006
Location: London
Distribution: PCLinuxOS, Salix
Posts: 6,213
|
@ Dave
The output from "grep -o" is (to quote) "Print the matched parts of a matching line, with each such part on a separate output line" so in this case you get three lines with "." on them.
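To see where each of the three numbers comes from, the wc columns can be requested one at a time. A minimal sketch, where dots is just a hypothetical wrapper around the pipeline from Dave's post:

```shell
#!/bin/sh
# dots: reproduce the grep -o output from the example above.
# (dots is a hypothetical helper name used only for this illustration.)
dots() {
    printf 'this. and. that.\n' | grep -o '\.'
}

dots | wc -l   # lines: one per match
dots | wc -w   # words: each "." on its own line counts as a word
dots | wc -c   # bytes: three periods plus three newlines
```

That should give 3, 3 and 6 in turn, matching the "3 3 6" shown above.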
|
|
|
01-23-2017, 12:36 PM
|
#6
|
Senior Member
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 4,278
|
My 'tr' example will extract only matching characters and then use 'wc -c' to count the characters. If you use 'grep -o' you will end up with characters + '\n' on separate lines and you will have to count with 'wc -l'.
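The two approaches can be put next to each other as below. This is a sketch with hypothetical function names; the third function is included only as a caution, since 'grep -c' counts matching lines rather than individual matches.

```shell
#!/bin/sh
# All three helpers read standard input; the names are made up for this sketch.
count_tr()    { tr -cd '.' | wc -c; }      # keep only periods, count characters
count_grep()  { grep -o '\.' | wc -l; }    # one match per line, count lines
count_lines() { grep -c '\.'; }            # counts LINES containing a period

text='this. and. that.'
printf '%s\n' "$text" | count_tr
printf '%s\n' "$text" | count_grep
printf '%s\n' "$text" | count_lines
```

The first two should both report 3, while the last reports 1 because all three periods sit on a single line.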
|
|
|
02-21-2017, 09:22 PM
|
#7
|
Member
Registered: Sep 2016
Location: Webster MA USA
Posts: 243
Original Poster
Rep: 
|
Quote:
Originally Posted by szboardstretcher
Gives the answer 3.
Code:
echo "this. and. that." | tr -cd "\." | wc -c
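Pipe your file through and it should count the instances of the period. In a regex a period signifies 'any character', so it has to be escaped there; tr works on a literal character set rather than a regex, so the backslash is not needed here.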
|
Reading this and the other replies, I like this method the best, since it's done the job the few times I've applied it to the task I meant to find a method for. Sorry if that sounds loop-y and redundant.
Carver
|
|
|