LinuxQuestions.org


L_Carver 01-23-2017 09:16 AM

Count number of times ONE punctuation mark occurs in a file
 
My "add keywords" script uses both caret ("^") and comma (",") as delimiters. The text files for input should have one of each, but often there are multiple occurrences of periods (".") in them, and I want to check the files beforehand from the command line to make sure the period "." occurs no more than once.

Code:

grep -o . foo | wc -l
in one file returns 137, even though there is one and only one period in the file.

Code:

cat foo |echo $x | tr -d -c '.' | wc -m
returns 0.

I know I must be doing something wrong, but my question is, what am I doing wrong? This is one of those instances which proves Google is entirely useless; if it weren't I'd have found a solution there and wouldn't be asking this question.

Please help.

Carver

szboardstretcher 01-23-2017 09:24 AM

Gives the answer 3.

Code:

echo "this. and. that." | tr -cd "\." | wc -c
Pipe your file through and it should count the instances of the period. In regex a period signifies 'any character' so it has to be escaped.
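A side note on the two commands, as a minimal sketch rather than a definitive diagnosis. tr takes a set of literal characters rather than a regex, so a bare '.' should work there as well. The 0 from the original second pipeline has a different cause: echo does not read standard input, so the contents of the file never reach tr. The sample file below is made up for the demonstration, and x is assumed to be empty or unset, as it would be in a fresh shell.

```shell
# Hypothetical sample file, created only for this demonstration.
sample=$(mktemp)
printf 'this. and. that.\n' > "$sample"

# tr works on a literal character set, not a regex, so '.' needs no escape.
# -c complements the set and -d deletes: everything except '.' is removed.
count=$(tr -cd '.' < "$sample" | wc -c)
echo "periods: $((count))"

# echo ignores its standard input, so the file contents are discarded here.
# With x empty (an assumption), echo prints only a newline, which tr deletes.
x=''
broken=$(cat "$sample" | echo $x | tr -cd '.' | wc -m)
echo "broken pipeline: $((broken))"

rm -f "$sample"
```

The `$((count))` form is only there to strip the leading spaces that some versions of wc print before the number.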

suicidaleggroll 01-23-2017 10:06 AM

Quote:

Originally Posted by L_Carver (Post 5659042)
Code:

grep -o . foo | wc -l
in one file returns 137, even though there is one and only one period in the file.

Grep match strings use regular expressions. In a regex, '.' plays the role that '?' plays in a shell glob: it matches any single character. To match a literal '.' you need to escape it:
Code:

grep -o '\.' foo | wc -l
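For comparison, here is a small sketch of the unescaped pattern next to three ways of asking grep for a literal period: the backslash escape, a bracket expression, and -F for fixed strings. The sample line is the one used earlier in the thread, and the variable names are only for illustration.

```shell
line='this. and. that.'

# Unescaped: '.' is a regex wildcard, so every character on the line matches.
any=$(printf '%s\n' "$line" | grep -o . | wc -l)

# Three ways to match a literal period instead.
escaped=$(printf '%s\n' "$line" | grep -o '\.' | wc -l)
bracket=$(printf '%s\n' "$line" | grep -o '[.]' | wc -l)
fixed=$(printf '%s\n' "$line" | grep -oF '.' | wc -l)

echo "any=$((any)) escaped=$((escaped)) bracket=$((bracket)) fixed=$((fixed))"
# prints: any=16 escaped=3 bracket=3 fixed=3
```

The count of 16 for the unescaped pattern is the length of the line, which is the same effect that produced 137 for the file in the original post.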

dlb101010 01-23-2017 10:48 AM

If you guys don't mind a related question, the output of 'wc' baffles me.
For example running 'wc' without any options,
Code:

$ echo "this. and. that." | grep -o '\.' | wc 
      3      3      6

I would have predicted just one newline (from the 'echo' command).

Where does the newline number come from in this example?

Thanks,
Dave

[No sooner did I post this than it occurred to me that the newlines probably come from the three instances of grep finding the three periods. Sorry for the clutter.]

DavidMcCann 01-23-2017 10:58 AM

@ Dave
The output from "grep -o" is (to quote) "Print the matched parts of a matching line, with each such part on a separate output line" so in this case you get three lines with "." on them.
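A related point that is easy to trip over: grep -c counts matching lines, not individual matches, so it is not a substitute for "grep -o ... | wc -l" when a line can hold more than one period. A minimal sketch with made-up input:

```shell
# Hypothetical three-line input: three periods, none, then one.
text='this. and. that.
no period here
one more.'

# grep -c counts the lines that contain at least one match.
lines_with_period=$(printf '%s\n' "$text" | grep -c '\.')

# grep -o prints every match on its own line, so wc -l counts the matches.
total_periods=$(printf '%s\n' "$text" | grep -o '\.' | wc -l)

echo "lines with a period: $((lines_with_period))"
echo "total periods: $((total_periods))"
# prints 2 for the first count and 4 for the second
```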

szboardstretcher 01-23-2017 11:36 AM

My 'tr' example will extract only matching characters and then use 'wc -c' to count the characters. If you use 'grep -o' you will end up with characters + '\n' on separate lines and you will have to count with 'wc -l'.
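Tying this back to the original goal of checking files before running the script, the tr approach could be wrapped up roughly as follows. This is only a sketch: count_periods is a hypothetical helper name, and the two sample files and their contents are invented for the demonstration, not taken from the real input files.

```shell
# count_periods is a hypothetical helper, not something from the thread.
count_periods() {
    # Delete everything except '.', then count the bytes that remain.
    n=$(tr -cd '.' < "$1" | wc -c)
    echo "$((n))"
}

# Hypothetical sample files in a scratch directory.
dir=$(mktemp -d)
printf 'name^keyword one, keyword two.\n' > "$dir/good.txt"
printf 'one. two. three.\n' > "$dir/bad.txt"

# Report every file that has more than one period.
for f in "$dir"/*.txt; do
    if [ "$(count_periods "$f")" -gt 1 ]; then
        echo "too many periods: ${f##*/}"
    fi
done
# Remove the scratch directory afterwards with: rm -rf "$dir"
```

With these two files, only bad.txt should be reported. Pointing the loop at the real input directory instead of the scratch one would give the pre-flight check described in the first post.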

L_Carver 02-21-2017 08:22 PM

Quote:

Originally Posted by szboardstretcher (Post 5659045)
Gives the answer 3.

Code:

echo "this. and. that." | tr -cd "\." | wc -c
Pipe your file through and it should count the instances of the period. In regex a period signifies 'any character' so it has to be escaped.

Reading this and the other replies, I like this method the best, since it's done the job the few times I've applied it to the task I meant to find a method for. Sorry if that sounds loop-y and redundant.

Carver


All times are GMT -5.