I am trying to count the letters in a string containing a mix of letters and digits (part of a program that parses usernames/UIDs/groupnames/GIDs). My requirements are that it must handle arbitrary input (UID vs. username) and must use Bash.
No problem, I thought: use sed with a regex to strip out the non-alphabetic characters, then pipe to wc and count either characters or bytes (I went with characters, but it doesn't change my output).
Perhaps my machine is hosed (I just restored the VM to its initial build; the kernel is 2.6.18-128.el5, running RHEL 5.3).
Here is the weirdness:
echo "abc12" | sed 's/[^a-z][^A-Z]//g' | wc -m
echo "abc123" | sed 's/[^a-z][^A-Z]//g' | wc -m
echo "abc1234" | sed 's/[^a-z][^A-Z]//g' | wc -m
echo "abc12345" | sed 's/[^a-z][^A-Z]//g' | wc -m
For whatever reason, the result is always at least 1 higher than it should be. No problem there; it is easy to subtract one. The weirdness is this: if the number of digits is odd, wc finds an additional character; if the number of digits is even, it doesn't. I don't *see* anything wrong with the regex, but am frankly baffled as to what is going on at this point. Any ideas? Does anyone else see the same behavior with sed/wc?
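One way to investigate, sketched here under assumptions that are not in the post itself, is to look at what sed hands to wc before counting anything. Since `[^a-z][^A-Z]` is two bracket expressions in a row, each match consumes two characters, which would leave one digit behind whenever the run of digits is odd. Separately, echo appends a newline, and `wc -m` counts it. The helper name `count_letters` below is hypothetical, and it assumes only ASCII letters should be counted.

```shell
# Show what wc actually receives: this pattern removes characters in pairs,
# so the third digit has no partner and survives.
echo "abc123" | sed 's/[^a-z][^A-Z]//g'    # prints: abc3

# Hypothetical helper (not from the original post): count ASCII letters in $1.
count_letters() {
    # printf '%s' emits no trailing newline, unlike echo.
    # tr -cd deletes every character that is NOT in A-Z or a-z.
    # The final tr strips padding that some wc implementations add.
    printf '%s' "$1" | tr -cd 'A-Za-z' | wc -m | tr -d ' '
}

count_letters "abc123"     # prints: 3
count_letters "abc1234"    # prints: 3
```

If sed has to stay in the pipeline, a single bracket expression such as `[^a-zA-Z]` would be the analogous pattern, though that is a guess at the intent rather than a confirmed fix.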