LinuxQuestions.org

uncle-c 04-22-2012 01:30 PM

sed regex and removing 'whitespace'
 
Was just reading the classic 'Sed One Liners' and I came up with this problem.

Code:

$ cat file
one
  two
    three
 $

Could someone explain why
Code:

$ sed 's/^[ \t]*//' file
removes any leading tabs & white spaces whereas

Code:

  sed 's/^[ \t]+//' file
does not ? What is the subtle difference between the two which causes only the former to remove leading white spaces from each line ?

Snark1994 04-22-2012 01:42 PM

Because the '+' is matching literally. You need

Code:

sed 's/^[ \t]\+//' file
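To see the literal match for yourself, here is a small sketch. It assumes GNU sed (which understands \t inside the brackets); the function names and sample strings are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU sed. Function names are hypothetical, not from the thread.

# Unescaped '+' in a basic regex: one space or tab followed by a literal '+'.
strip_literal_plus() {
    sed 's/^[ \t]+//'
}

# Escaped '\+' (a GNU extension to basic regex): one or more spaces or tabs.
strip_leading_ws() {
    sed 's/^[ \t]\+//'
}

printf '  two\n' | strip_literal_plus   # line comes back unchanged
printf ' +two\n' | strip_literal_plus   # the leading " +" is removed
printf '  two\n' | strip_leading_ws     # the leading spaces are removed
```

The second call only changes its input because that line really does start with a space followed by a plus sign.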

David the H. 04-22-2012 10:58 PM

To be more specific, it's the difference between basic and extended regular expressions. grep and sed use basic regex by default, and most of the more advanced regex devices like '+' are not supported.

GNU grep and sed, however, extend basic regex so that you can "activate" the special meanings of these characters by backslashing them, as in '\+'. Perhaps a better way to do it is to switch to extended regex with "grep -E" and "sed -r" (newer versions of sed also accept "sed -E"). Then the behavior is reversed; the special meanings are enabled by default, and backslash-escaping the characters makes them literal.

Code:

sed -r 's/^[ \t]+//' file
The grep man page goes into good detail about basic vs. extended regex.
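As a rough illustration of that reversal, the sketch below runs the same substitution in both modes. It assumes GNU sed (for -r and for \t inside the brackets), and the function names are hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed. Function names are hypothetical, not from the thread.

# Basic regex: '+' needs a backslash to mean "one or more".
strip_bre() { sed 's/^[ \t]\+//'; }

# Extended regex: '+' means "one or more" without a backslash.
strip_ere() { sed -r 's/^[ \t]+//'; }

# Extended regex with '\+': the backslash now makes the plus literal.
strip_ere_literal() { sed -r 's/^[ \t]\+//'; }

printf 'one\n  two\n\tthree\n' | strip_bre          # leading whitespace removed
printf 'one\n  two\n\tthree\n' | strip_ere          # same result
printf ' +four\n  five\n' | strip_ere_literal       # only the " +" line changes
```

The first two calls should print the same three lines, while the third only strips a space or tab that is followed by an actual plus sign.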

Incidentally, if all you want to do is remove all instances of (a) certain character(s), you'll get better performance with tr.

Code:

tr -d ' \t' <file
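Keep in mind that tr works on every occurrence in the input, not just the leading ones, and that it takes a plain list of characters rather than a regex. A minimal sketch of both points follows; the function names are hypothetical, and the sed line assumes GNU sed for \t inside the brackets.

```shell
#!/bin/sh
# Function names are hypothetical, not from the thread.

# tr deletes every space and tab on the line, not just the leading ones.
delete_all_ws() { tr -d ' \t'; }

# sed anchored with ^ only touches the leading run (GNU sed assumed for \t).
strip_leading_ws() { sed 's/^[ \t]*//'; }

printf '  two words\n' | delete_all_ws      # prints "twowords"
printf '  two words\n' | strip_leading_ws   # prints "two words"

# Square brackets in a tr list are ordinary characters, so they get deleted too.
printf '[a] b\n' | tr -d '[ \t]'            # prints "ab"
```

So tr is the better tool only when the whitespace should disappear everywhere in the file.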

uncle-c 04-23-2012 03:27 AM

Cheers guys. I had been using tr but knew that there was a method using sed. It was only when I read the Sed One Liners page that the '+' problem got me thinking. Could you somehow use a white space character class - [:space:] instead of '[ \t]' to achieve the same result ?

colucix 04-23-2012 03:52 AM

Quote:

Originally Posted by uncle-c (Post 4660493)
Could you somehow use a white space character class - [:space:] instead of '[ \t]' to achieve the same result ?

Yes. The class name goes inside a bracket expression, and it is available in extended regexps as well:
Code:

sed -r 's/^[[:space:]]+//' file
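For comparison, here is a sketch of the same substitution in basic and in extended regex. The function names are hypothetical, and -r assumes GNU sed.

```shell
#!/bin/sh
# Function names are hypothetical, not from the thread.

# Basic regex: the class sits inside an outer bracket expression, with '*'.
strip_space_bre() { sed 's/^[[:space:]]*//'; }

# Extended regex (GNU sed -r assumed): same class, with an unescaped '+'.
strip_space_ere() { sed -r 's/^[[:space:]]+//'; }

printf 'one\n  two\n\tthree\n' | strip_space_bre
printf 'one\n  two\n\tthree\n' | strip_space_ere
```

Both calls should print the three words with no leading whitespace. Note the doubled brackets: the outer pair is the bracket expression and the inner [:space:] names the class.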

David the H. 04-23-2012 08:25 AM

Note that the [:space:] character class covers several other characters as well; the full list is tab, newline, vertical tab, form feed, carriage return, and space. There's also [:blank:], which contains only the regular space and tab characters, and so is exactly equivalent to the '[ \t]' bracket expression used earlier in the thread.

The grep info page is one place you'll find definitions for what the various classes cover.
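A quick way to check the difference between the two classes, sketched with hypothetical function names and assuming GNU sed for -r:

```shell
#!/bin/sh
# Assumes GNU sed. Function names are hypothetical, not from the thread.

strip_space() { sed -r 's/^[[:space:]]+//'; }
strip_blank() { sed -r 's/^[[:blank:]]+//'; }

# Leading space and tab: both classes match them.
printf ' \tfoo\n' | strip_space    # prints "foo"
printf ' \tfoo\n' | strip_blank    # prints "foo"

# Leading form feed: only [:space:] covers it. od -c makes the byte visible.
printf '\ffoo\n' | strip_space | od -c
printf '\ffoo\n' | strip_blank | od -c
```

In the last od dump the \f should still be present, since form feed is not part of [:blank:].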

