Character counting with sed/wc not working as expected.
Hello.
I am trying to count the letters in a string that contains a mix of letters and digits. This is part of a program that parses usernames, UIDs, group names, and GIDs. My requirements are that it must handle arbitrary input (a UID or a username) and must use Bash. No problem, I thought: use sed with a regex to strip out the non-alphabetic characters, then pipe to wc and count either characters or bytes (I went with characters, but switching doesn't change my output). Perhaps my machine is hosed (I just restored the VM to its initial build; the kernel is 2.6.18-128.el5, running RHEL 5.3). Here is the weirdness:
Code:
echo "abc12" | sed 's/[^a-z][^A-Z]//g' | wc -m
Thanks, SO
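If, as the question describes, the letter count is only used to tell a numeric UID/GID apart from a user or group name, a Bash regex test can make that decision directly without counting anything. This is a minimal sketch: the function name classify_id and its id/name output are hypothetical, and it assumes that an all-digit string should be treated as an ID.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether an argument looks like a numeric ID
# (UID or GID) or a name. Assumption: an all-digit string is an ID.
classify_id() {
    if [[ $1 =~ ^[0-9]+$ ]]; then
        echo id
    else
        echo name
    fi
}

classify_id 500      # prints: id
classify_id abc12    # prints: name
```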
Is there a reason you chose the -m option for wc? -c counts bytes rather than characters, but for plain ASCII input like this the two give the same count.
The echo command adds a trailing newline. Use echo's -n option to suppress it. The regex should be [^a-zA-Z] or [^[:alpha:]].
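Putting those two fixes together gives something like the following minimal sketch. The function name count_alpha is hypothetical, and the count of 3 assumes GNU sed (which does not add a newline to a final line that lacked one) and GNU wc; other wc implementations may pad the number with spaces.

```bash
#!/usr/bin/env bash
# Count the letters in a string using the two suggested fixes:
#   - echo -n, so echo's trailing newline never reaches wc
#   - one bracket expression, [^[:alpha:]], so every non-letter is removed
count_alpha() {
    echo -n "$1" | sed 's/[^[:alpha:]]//g' | wc -m
}

count_alpha "abc12"    # 3 with GNU sed and GNU wc
```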
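The problem is with the regex:
Code:
sed 's/[^a-z][^A-Z]//g'
That is two bracket expressions, so it only matches a pair of characters: a non-lowercase character followed by a non-uppercase one. Use a single bracket expression instead.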
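Because [^a-z][^A-Z] only removes characters two at a time, running it next to a single bracket expression on a few inputs shows where it goes wrong. A sketch only: the helper names strip_pairs and strip_single are hypothetical, and LC_ALL=C is set as an assumption so that [a-z] and [A-Z] are plain ASCII ranges (in some older locales these ranges can interleave upper- and lowercase letters).

```bash
#!/usr/bin/env bash
# Compare the question's two-bracket pattern with a single bracket expression.
# Assumption: the C locale, so [a-z] and [A-Z] are plain ASCII ranges.
export LC_ALL=C

strip_pairs()  { printf '%s\n' "$1" | sed 's/[^a-z][^A-Z]//g'; }  # original pattern
strip_single() { printf '%s\n' "$1" | sed 's/[^a-zA-Z]//g'; }     # corrected pattern

for s in abc12 abc123 a1b2; do
    printf '%-7s pairs: %-5s single: %s\n' "$s" "$(strip_pairs "$s")" "$(strip_single "$s")"
done
```

For abc12 both patterns happen to leave abc, but the pair pattern leaves abc3 for abc123 and a2 for a1b2, while the single bracket expression gives abc and ab.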
See if this enlightens you at all: replace wc -m with od -c.
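For example, sending the sed output to od -c instead of wc -m makes echo's invisible newline show up as \n. This sketch assumes GNU sed and GNU od; the column spacing of od's output may differ on other systems.

```bash
#!/usr/bin/env bash
# Dump the bytes sed produces, with and without echo's trailing newline.
# od -c prints one entry per byte; a newline byte is shown as \n.
echo "abc12"    | sed 's/[^a-zA-Z]//g' | od -c   # last byte shown is \n
echo -n "abc12" | sed 's/[^a-zA-Z]//g' | od -c   # only a, b and c
```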
The improved regex pattern fixed the issue. I had not realized that the newline was adding the extra character, but thanks for pointing that out as well. The script runs very nicely now.
Well, I would mention another option:
Code:
sed 's/[^[:alpha:]]//g'
sed is overkill here; use tr. You might also try printf instead of echo.
Code:
printf '%s' 'abcde12345' | tr -cd '[:alpha:]' | wc -c
Code:
text='abcde12345'
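The second snippet above stops after the assignment, so how it continued is not known. As a separate, pure-Bash sketch (which fits the requirement in the question to use Bash), parameter expansion can delete the non-letters and ${#var} gives the length of what remains, with no sed, tr or wc involved. The function name count_alpha_bash is hypothetical.

```bash
#!/usr/bin/env bash
# Pure-Bash letter count: no external commands, so no trailing newline to worry about.
count_alpha_bash() {
    local text=$1
    # ${text//pattern/} deletes every match; [^[:alpha:]] matches one non-letter.
    local letters=${text//[^[:alpha:]]/}
    # ${#letters} is the length of the remaining string.
    echo "${#letters}"
}

count_alpha_bash 'abcde12345'   # prints 5
```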