sundialsvcs |
03-16-2017 11:11 AM |
The difference ... is ... Unicode. :)
Here's a paragraph that might be relevant, from Perl's perldoc perlre: (emphasis mine)
Quote:
Unlike most locales, which are specific to a language and country pair, Unicode classifies all the characters that are letters somewhere in the world as "\w". For example, your locale might not think that "LATIN SMALL LETTER ETH" is a letter (unless you happen to speak Icelandic), but Unicode does.
Similarly, all the characters that are decimal digits somewhere in the world will match "\d"; this is hundreds, not 10, possible matches. And some of those digits look like some of the 10 ASCII digits, but mean a different number, so a human could easily think a number is a different quantity than it really is. For example, "BENGALI DIGIT FOUR" (U+09EA) looks very much like an "ASCII DIGIT EIGHT" (U+0038). And, "\d+", may match strings of digits that are a mixture from different writing systems, creating a security issue. [...] The "/a" modifier can be used to force "\d" to match just the ASCII 0 through 9.
|
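Perl's implementation of regular expressions is a de facto standard, imitated by many other languages and libraries (PCRE being the best known), so this discussion should carry over fairly directly. Do check your own engine's documentation, though: whether "\d" is Unicode-aware by default varies from one implementation to the next.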
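To make the quoted point concrete, here is a minimal sketch in Python rather than Perl (my choice of language, not something from perlre). It assumes Python 3, where "\d" on a str pattern is Unicode-aware by default and the re.ASCII flag plays roughly the role that Perl's "/a" modifier does. The helper names is_digits_unicode and is_digits_ascii are made up for illustration.

```python
import re

# BENGALI DIGIT FOUR (U+09EA), the look-alike mentioned in the perlre quote.
BENGALI_FOUR = "\u09ea"


def is_digits_unicode(s):
    """True if s consists only of characters matched by Unicode-aware \\d."""
    return re.fullmatch(r"\d+", s) is not None


def is_digits_ascii(s):
    """True if s consists only of ASCII 0-9 (re.ASCII restricts \\d)."""
    return re.fullmatch(r"\d+", s, flags=re.ASCII) is not None


# A string mixing an ASCII digit with a Bengali digit.
mixed = "1" + BENGALI_FOUR

print(is_digits_unicode(mixed))  # True: both characters are decimal digits
print(is_digits_ascii(mixed))    # False: U+09EA is not in 0-9
print(int(BENGALI_FOUR))         # 4, even though the glyph resembles an 8
```

If you only ever want 0 through 9, spelling the class out as [0-9] is another way to say so, and it does not depend on any flag or modifier.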
See also a discussion of so-called "POSIX bracket expressions" (N.B. not "character classes" ...) e.g. here. The one that matters in this case is [:digit:].