Quote:
Originally Posted by danielbmartin
Is my /usr/share/dict/words different from yours?
Yes, but it is no big deal. Mine is from dictionaries-common-1.11.5ubuntu1.
Quote:
Originally Posted by danielbmartin
Further, I had never heard of mawk and discover that it exists on my PC. Are there functional differences? If mawk is more efficient, should I use it routinely instead of awk?
Yes, there are functional differences. If you look at The GNU Awk User's Manual, some functions (asort() for example) are marked with a #, meaning they are GNU extensions. (Sorting is often extremely useful, and although you can implement it yourself as an awk function, using GNU awk extensions when you need them makes sense to me.)
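To illustrate what asort() does, here is a rough Python sketch of its documented behaviour: it sorts an array's values (not its indices), replaces the indices with 1..n, and returns n. If you pass a second array, the sorted copy goes there and the source array is left alone. The sort is written out as a plain insertion sort, the kind of loop you would hand-code in a portable awk function. This is only a model, assuming all values are strings or all are numbers; gawk's rules for comparing mixed values and its optional third argument are not reproduced.

```python
from typing import Optional


def asort(source: dict, dest: Optional[dict] = None) -> int:
    """Sketch of gawk's asort(): sort the values and re-index them 1..n.

    Assumption: all values are strings or all are numbers, so Python's
    ordinary comparison is good enough; gawk's mixed-type rules are not modelled.
    """
    values = list(source.values())
    # Insertion sort, as one might write it by hand in a portable awk function.
    for i in range(1, len(values)):
        current = values[i]
        j = i - 1
        while j >= 0 and values[j] > current:
            values[j + 1] = values[j]
            j -= 1
        values[j + 1] = current
    # Without dest, the source itself is overwritten; its old indices are lost.
    target = source if dest is None else dest
    target.clear()
    for index, value in enumerate(values, start=1):
        target[index] = value
    return len(values)


if __name__ == "__main__":
    fruit = {"x": "pear", "y": "apple", "z": "fig"}
    count = asort(fruit)
    print(count, fruit)
```

Running it prints the element count followed by the array re-indexed from 1 in sorted order.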
Also, GNU awk can process NUL-separated input ('\0' separators, the same byte that terminates strings in C) simply by setting RS and/or FS to "\0". mawk cannot. I personally use GNU awk (gawk) for, e.g., filename mangling, supplying the names via find ... -print0 or -printf '%p\0' or similar. This handles all possible filenames in Linux (in any character set Linux supports, too, if you set LANG=C LC_ALL=C to avoid errors due to invalid UTF-8 sequences).
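In case it helps to see the idea outside awk, here is a small Python sketch of the same NUL-separated processing. It works on raw bytes, which plays roughly the role of LANG=C LC_ALL=C: names that are not valid UTF-8 pass through untouched. The mangle() rule (replace each whitespace byte with an underscore) is a hypothetical example, not something find or gawk does for you.

```python
import re


def split_nul_records(data: bytes) -> list[bytes]:
    """Split NUL-terminated data (e.g. output of find -print0) into records."""
    records = data.split(b"\0")
    # -print0 terminates every name, so the last piece is empty; drop it.
    if records and records[-1] == b"":
        records.pop()
    return records


def mangle(name: bytes) -> bytes:
    """Hypothetical mangling rule: replace every ASCII whitespace byte with '_'."""
    return re.sub(rb"\s", b"_", name)


if __name__ == "__main__":
    # A space, a Latin-1 byte that is invalid UTF-8, and an embedded newline.
    sample = b"./a b.txt\0./caf\xe9.txt\0./new\nline\0"
    names = split_nul_records(sample)
    # Write the results back out NUL-terminated, like -print0 does.
    print(b"".join(mangle(n) + b"\0" for n in names))
```

In real use you would read the bytes from sys.stdin.buffer instead of a literal, for example with find . -print0 piped into the script.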
It seems that mawk would be more efficient for plain-text data conversions where sorting or NULs are not needed. For example, setting RS="[\t\n\v\f\r ]*<";FS=">[\t\n\v\f\r ]*" in a BEGIN rule gives you each XML tag (including its attributes) in $1 and the immediate content that follows it in $2. Obviously this is not full XML support (it does not handle CDATA sections or comments, for example), and if you want structure, you need to keep a tag stack. But if you only need to convert massive amounts of logically flat XML data, and you know it is parseable with awk, a simple script will often suffice. Awk is pretty efficient, after all; in this case it even streams the data one record (a tag plus the text content immediately following it) at a time, so it needs very little memory, too.
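Here is a rough Python model of that RS/FS trick, using the same two regular expressions. records() yields the equivalent of $1, $2, ... for each record, and paths() adds the tag stack mentioned above, yielding the element path for each piece of immediate text content. Both function names are made up for illustration. The sketch has the same limitations: declarations and comments are simply skipped, there is no CDATA or entity handling, and unlike awk it splits a string already held in memory instead of streaming.

```python
import re
from typing import Iterator

# Same separators as the awk BEGIN rule: RS swallows whitespace before "<",
# FS splits at ">" and swallows whitespace after it.
RS = re.compile(r"[\t\n\v\f\r ]*<")
FS = re.compile(r">[\t\n\v\f\r ]*")


def records(text: str) -> Iterator[list[str]]:
    """Yield the fields of each record, like awk's $1, $2, ..."""
    for record in RS.split(text):
        if record:  # skip the empty record before the first "<" (NF == 0 in awk)
            yield FS.split(record)


def paths(text: str) -> Iterator[tuple[str, str]]:
    """Keep a tag stack; yield (slash-joined element path, immediate content)."""
    stack: list[str] = []
    for fields in records(text):
        tag = fields[0]
        content = fields[1] if len(fields) > 1 else ""
        if tag.startswith(("?", "!")):
            continue  # <?xml ...?>, <!-- ... -->, <!DOCTYPE ...>: not handled
        if tag.startswith("/"):
            if stack:
                stack.pop()  # closing tag: content after it belongs to the parent
        elif not tag.endswith("/"):  # self-closing tags neither push nor pop
            parts = tag.split()
            if parts:
                stack.append(parts[0])  # element name without attributes
        if content:
            yield "/".join(stack), content


if __name__ == "__main__":
    doc = '<items>\n  <item id="1">\n    <name>Foo</name>\n  </item>\n</items>\n'
    for fields in records(doc):
        print(fields)
    for path, text in paths(doc):
        print(path, text)
```

Note how the whitespace around tags disappears: it is consumed by the separators themselves, which is exactly why the awk version needs no extra trimming.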
In general, all awks try pretty hard to be POSIX-compatible, and aside from extensions (and some bugs, of course) they are to a large extent interchangeable. The POSIX standard for awk is quite close to the historic implementations; awk was first developed at Bell Labs in the 1970s.