Count number of times ONE punctuation mark occurs in a file
My "add keywords" script uses both caret ("^") and comma (",") as delimiters. The text files for input should have one of each, but often there are multiple occurrences of periods (".") in them, and I want to check the files beforehand from the command line to make sure the period "." occurs no more than once.
Code:
grep -o . foo | wc -l
in one file returns 137, even though there is one and only one period in the file.
Code:
cat foo |echo $x | tr -d -c '.' | wc -m
returns 0.
I know I must be doing something wrong, but my question is, what am I doing wrong? This is one of those instances which proves Google is entirely useless; if it weren't I'd have found a solution there and wouldn't be asking this question.
in one file returns 137, even though there is one and only one period in the file.
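Grep match strings use regular expressions. In a regex, '.' matches any single character (much as '?' does in a shell glob), so your command counted every character on every line. To match a literal '.' you need to escape it with a backslash ('\.') or put it in a bracket expression ('[.]').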
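A minimal sketch of the difference, assuming a throwaway sample file; the helper name count_literal_dots and the file contents are made up for illustration and are not from the original post:

```shell
# Hypothetical helper: count literal periods in the file named by $1.
count_literal_dots() {
    # '\.' matches only a real period; -o prints each match on its own line,
    # so the line count equals the number of periods.
    grep -o '\.' "$1" | wc -l
}

# Illustrative sample file containing exactly one period.
sample=$(mktemp)
printf 'keyword^one,two.\n' > "$sample"

count_literal_dots "$sample"     # escaped: counts only the period
grep -o . "$sample" | wc -l      # unescaped: counts every character on the line

rm -f "$sample"
```

The second command shows the same effect as the 137 in the question: an unescaped '.' matches each character, not just periods.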
I would have predicted just one newline (from the 'echo' command).
Where does the newline number come from in this example?
Thanks,
Dave
[No sooner did I post this than it occurred to me that the newlines probably come from the three instances of grep finding the three periods. Sorry for the clutter.]
@ Dave
The output from "grep -o" is (to quote) "Print the matched parts of a matching line, with each such part on a separate output line" so in this case you get three lines with "." on them.
My 'tr' example will extract only matching characters and then use 'wc -c' to count the characters. If you use 'grep -o' you will end up with characters + '\n' on separate lines and you will have to count with 'wc -l'.
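The two approaches can be compared side by side. This is only a sketch on made-up sample text with three periods; the variable names are illustrative:

```shell
# Sample text with three periods spread over two lines (illustrative only).
text='one. two.
three.'

# tr: delete (-d) the complement (-c) of '.', which leaves just the periods
# with no newlines, then count the remaining bytes.
tr_count=$(printf '%s\n' "$text" | tr -cd '.' | wc -c)

# grep -o: prints each matched '.' on its own line, so count lines instead.
grep_count=$(printf '%s\n' "$text" | grep -o '\.' | wc -l)

echo "tr: $tr_count  grep: $grep_count"
```

Both counts should agree; the only difference is whether the periods arrive as one run of characters (count with 'wc -c') or one per line (count with 'wc -l').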
Pipe your file through and it should count the instances of the period. In a regex a period signifies 'any character', so it has to be escaped.
Reading this and the other replies, I like this method the best, since it's done the job the few times I've applied it to the task I meant to find a method for. Sorry if that sounds loop-y and redundant.
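For the original goal of checking several files before running the script, the count can be wrapped in a small test. This is a sketch only, assuming the escaped-grep count; the function name has_at_most_one_dot is hypothetical:

```shell
# Hypothetical helper: succeed only if the file in $1 has at most one period.
has_at_most_one_dot() {
    # Count literal periods, then compare the count against 1.
    [ "$(grep -o '\.' "$1" | wc -l)" -le 1 ]
}

# Report every file named on the command line that fails the check.
for f in "$@"; do
    has_at_most_one_dot "$f" || echo "$f: more than one period"
done
```

Saved as a script, it would be run with the input files as arguments and print only the names of files that need fixing.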