Published at LXer:
When cleaning UTF-8 text files I sometimes come across invisible characters that I call 'gremlins'. These aren't the usual non-printing characters, like space and (horizontal) tab, which I expect to find in the plain text files I work with. Gremlins are weird things like 'vertical tab', 'device control 2' and 'soft hyphen'. I don't know how they got into the files, but I want to get rid of them.
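As a rough illustration of the idea, here is a minimal Python sketch for locating and removing such characters. It is not the method from the full article. It assumes a 'gremlin' is any character in the Unicode categories Cc (control) or Cf (format) other than tab, line feed and carriage return; that definition, and the names `is_gremlin`, `find_gremlins` and `strip_gremlins`, are assumptions made for this example.

```python
import unicodedata

# Assumption: these control characters are expected in plain text files.
ALLOWED = {"\t", "\n", "\r"}


def is_gremlin(ch):
    """Return True for control (Cc) or format (Cf) characters we don't expect."""
    if ch in ALLOWED:
        return False
    return unicodedata.category(ch) in {"Cc", "Cf"}


def find_gremlins(text):
    """List (line number, column, code point) for each gremlin, 1-based."""
    hits = []
    # Split on "\n" only: str.splitlines() also breaks on vertical tab
    # and similar characters, which would hide the gremlins we are hunting.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if is_gremlin(ch):
                hits.append((line_no, col, f"U+{ord(ch):04X}"))
    return hits


def strip_gremlins(text):
    """Return the text with every gremlin deleted."""
    return "".join(ch for ch in text if not is_gremlin(ch))


if __name__ == "__main__":
    # Vertical tab, device control 2 and soft hyphen, as in the text above.
    sample = "ab\x0bc\nd\x12e\u00adf\tg\n"
    for hit in find_gremlins(sample):
        print(hit)
    print(repr(strip_gremlins(sample)))
```

Category Cf also covers characters such as the zero-width joiner and the byte order mark, which are legitimate in some texts, so it is probably safer to review the output of `find_gremlins` before deleting anything.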