Published at LXer:
In my data-cleaning work I often tally selected individual characters in big, UTF-8-encoded data files. What's the best way to do this? As shown below, I've tried grep/sort/uniq, AWK and Ruby, and AWK's the fastest. The trials also revealed an unexpected problem with the uniq program in GNU coreutils.
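To make the task concrete, here is a minimal Ruby sketch of tallying selected characters in UTF-8 text. It is not the code that was timed in the trials; the helper name `tally_chars`, the sample string and the choice of characters are all assumptions made for illustration.

```ruby
# Hypothetical helper (not from the article): count how often each
# character listed in `wanted` occurs in the UTF-8 string `text`.
def tally_chars(text, wanted)
  counts = Hash.new(0)              # missing keys start at 0
  text.each_char do |ch|            # each_char yields whole characters, not bytes
    counts[ch] += 1 if wanted.include?(ch)
  end
  counts
end

# Illustrative sample data containing multi-byte characters.
sample = "25°C, 30°C, 5 µm"
tally_chars(sample, ["°", "µ"]).each do |ch, n|
  puts "#{ch}\t#{n}"
end
```

For a real data file, the same helper could be fed line by line, for example from `File.foreach(path, encoding: "UTF-8")`, with the per-line counts merged into one running tally.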