LinuxQuestions.org

romagnolo 12-01-2012 08:40 PM

Bash: $'\x00' -- What is this?
 
Our super-modern, commerce-oriented web search engines are simply too stupid to let programmers search for specific sequences of symbols... Well, this is why I'm asking here.

I came across a totally new (to me), never-seen-before Bash expansion or substitution; it consists of character sequences like
Code:

$'\x00\x10\x20'
What it does is replace the UTF-8 byte sequence with the character it represents.

I've never seen this in Bash's manual (and I'm inclined to think it isn't described there at all) or anywhere else.

Do you have any information on this construct, such as its name, or a document where this is described?
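For readers with the same question: the Bash Reference Manual documents this construct under the name "ANSI-C Quoting". A quick way to see the notation in action is bash's own printf %q, which, as far as I know, falls back to the same $'...' form whenever a string contains non-printable characters. The variable names s and r are arbitrary, chosen only for this sketch.

```shell
#!/usr/bin/env bash
# Build a string containing a tab character using ANSI-C quoting
s=$'a\tb'
# %q prints a form that can be reused as shell input; for
# non-printable characters bash chooses the $'...' notation
printf '%q\n' "$s"
# Feeding the quoted form back through eval recovers the original string
eval "r=$(printf '%q' "$s")"
[ "$r" = "$s" ] && echo "round trip OK"
```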

druuna 12-02-2012 03:51 AM

Those are eight-bit characters written as hexadecimal escapes, a notation understood by many programs. Have a look at man ascii for their specific meaning.
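As a small illustration of the "many programs" point, here is a sketch assuming bash. The shell itself, bash's printf builtin, and bash's echo -e all decode the same \xHH escapes, so each line below should print !"#. Other shells' echo may handle -e differently.

```shell
#!/usr/bin/env bash
# The shell decodes $'...' before echo ever sees the argument
echo $'\x21\x22\x23'
# Here printf decodes the escapes in its own format string
printf '\x21\x22\x23\n'
# bash's echo decodes them too, but only when given -e
echo -e '\x21\x22\x23'
```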

This is from the bash man page:
Code:

Words of the form $'string' are treated specially.  The word expands to
      string, with backslash-escaped characters replaced as specified by  the
      ANSI  C  standard.  Backslash escape sequences, if present, are decoded
      as follows:
              \a    alert (bell)
              \b    backspace
              \e
              \E    an escape character
              \f    form feed
              \n    new line
              \r    carriage return
              \t    horizontal tab
              \v    vertical tab
              \\    backslash
              \'    single quote
              \"    double quote
              \nnn  the eight-bit character whose value is  the  octal  value
                    nnn (one to three digits)
              \xHH  the  eight-bit  character  whose value is the hexadecimal
                     value HH (one or two hex digits)
              \cx    a control-x character

      The expanded result is single-quoted, as if the  dollar  sign  had  not
      been present.
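To see the decoding rules from the excerpt side by side, the sketch below writes the same byte (0x41, an ASCII "A") once in hex and once in octal, then dumps a \cA control character with od so the raw byte value is visible. The variable names hex and oct are arbitrary.

```shell
#!/usr/bin/env bash
hex=$'\x41'   # \xHH: one or two hex digits
oct=$'\101'   # \nnn: one to three octal digits
# Both spellings produce the same single byte
[ "$hex" = "$oct" ] && echo "hex and octal agree: $hex"
# \cx gives the control character for x; \cA is byte 0x01
printf '%s' $'\cA' | od -An -tx1
```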

Example:
Code:

# using hexadecimal values
$ echo  $'\x21\x22\x23'
!"#

# using octal values:
$ echo '!~' | tr '\041\176' 'X'
XX
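Note that in the octal example it is tr, not bash, that decodes the escapes, because they sit inside ordinary single quotes. Writing the same set with $'...' moves the decoding into the shell, and tr then just receives the literal characters; with GNU tr (which pads the shorter second set with its last character) both lines should print XX.

```shell
#!/usr/bin/env bash
# tr decodes \041 and \176 itself (plain single quotes)
echo '!~' | tr '\041\176' 'X'
# bash decodes them first; tr is handed the literal characters "!~"
echo '!~' | tr $'\041\176' 'X'
```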

PS: They are not UTF-8 specific; each \xHH (or \nnn) escape stands for a single byte.
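One way to convince yourself of this is to count bytes. Since \xHH always stands for exactly one byte, the UTF-8 encoding of "é" needs two escapes while its ISO-8859-1 encoding needs only one. wc -c and od look only at bytes, so the result should not depend on the terminal's locale or font. The variable names are just for illustration.

```shell
#!/usr/bin/env bash
latin1=$'\xe9'       # "é" in ISO-8859-1: one byte
utf8=$'\xc3\xa9'     # "é" in UTF-8: two bytes
printf '%s' "$latin1" | wc -c      # byte count: 1
printf '%s' "$utf8" | wc -c        # byte count: 2
printf '%s' "$utf8" | od -An -tx1  # raw bytes: c3 a9
```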

romagnolo 12-02-2012 08:39 AM

Thank you, that's very useful. I didn't know we had a man ascii page too.
If only there were an effective way to search across man pages...

jpollard 12-02-2012 09:35 AM

You will note that the first character is 0x00 - which is null. The sequence is the hex representation of a specific UTF32 glyph. Interpretation of the sequence is under the control of the font being used by the terminal emulator...
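A hedged note on that leading 0x00: as far as I know, bash stores the result of $'...' as a C string, so the expansion stops at the first NUL byte and $'\x00\x10\x20' ends up as an empty string (other shells may behave differently). The sketch below checks this by counting bytes, and shows that printf can still emit a real NUL when the escape sits in its format string rather than in an argument.

```shell
#!/usr/bin/env bash
s=$'\x00\x10\x20'
printf '%s' "$s" | wc -c            # 0: nothing survives the leading \x00
printf '%s' $'a\x00b' | od -An -c   # only "a" comes out
# printf writes a NUL byte itself when the escape is in its format string
printf 'a\000b' | wc -c             # 3 bytes: a, NUL, b
```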

druuna 12-02-2012 09:51 AM

Quote:

Originally Posted by romagnolo (Post 4841173)
Thank you, that's very useful. I didn't know we had a man ascii page too.
If only there were an effective way to search across man pages...

You do know that you can search all the man pages present on your box?
Code:

$ man -k utf-8
utf8 (7)            - an ASCII compatible multibyte Unicode encoding
FcStrCmp (3)        - compare UTF-8 strings
FcStrCmpIgnoreCase (3) - compare UTF-8 strings ignoring case
FcStrStr (3)        - locate UTF-8 substring
FcStrStrIgnoreCase (3) - locate UTF-8 substring ignoring ASCII case
FcUcs4ToUtf8 (3)    - convert UCS4 to UTF-8
FcUtf8Len (3)        - count UTF-8 encoded chars
FcUtf8ToUcs4 (3)    - convert UTF-8 to UCS4
utf-8 (7)            - an ASCII compatible multibyte Unicode encoding
uxterm (1)          - X terminal emulator for Unicode (UTF-8) environments

$ man -k ascii
aaxine (1)          - an ASCII art video player
ascii (7)            - the ASCII character set encoded in octal, decimal, and hexadecimal
asciitopgm (1)      - convert ASCII graphics into a portable graymap
asctime (3)          - transform date and time to broken-down time or ASCII
asctime_r (3)        - transform date and time to broken-down time or ASCII
ctime (3)            - transform date and time to broken-down time or ASCII
.
.
.
strtold (3)          - convert ASCII string to floating-point number
toascii (3)          - convert character to ASCII
utf-8 (7)            - an ASCII compatible multibyte Unicode encoding
utf8 (7)            - an ASCII compatible multibyte Unicode encoding

man man for details (Yes, there's a man page for the man page ;) ).

chrism01 12-03-2012 04:26 AM

... and if you don't like computerese such as switches (-k), there is normally an equivalent command, apropos (man -k does the same thing), thus
Code:

apropos utf8
returns the same answers :)


All times are GMT -5.