sed regex and removing 'whitespace'
Was just reading the classic 'Sed One Liners' and I came up with this problem.
Code:
$ cat file
Code:
$ sed 's/^[ \t]*//' file
Code:
sed 's/^[ \t]+//' file
Because the '+' is being matched literally. You need to escape it:
Code:
sed 's/^[ \t]\+//' file
To be more specific, it's the difference between basic and extended regular expressions. grep and sed use basic regex by default, in which most of the more advanced regex operators, such as '+', are treated as ordinary literal characters.
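A small sketch of the literal-versus-special behaviour described above. It assumes GNU sed (the '\+' form is a GNU extension to basic regex), and the function names and sample strings are made up for illustration.

```shell
#!/bin/sh
# Assumes GNU sed; helper names and sample strings are hypothetical.

# Basic regex: '+' is an ordinary character, so this pattern only matches
# one space followed by a literal plus sign at the start of the line.
bre_plain() { printf '%s\n' "$1" | sed 's/^[ ]+//'; }

# Basic regex with the GNU backslash extension: '\+' means "one or more".
bre_escaped() { printf '%s\n' "$1" | sed 's/^[ ]\+//'; }

bre_plain   '   indented'   # left unchanged: there is no literal '+' to match
bre_plain   ' +indented'    # the leading " +" is removed
bre_escaped '   indented'   # the leading spaces are removed
```

The second call is the telling one: the unescaped '+' only ever matches a real plus sign in the input.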
GNU grep and sed also let you "activate" the special meanings of these characters within basic regex by backslashing them. Perhaps a better way to do it, however, is to switch to extended regex with "grep -E" and "sed -r". Then the behavior is reversed; the special meanings are enabled by default, and backslash-escaping the characters makes them literal.
Code:
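sed -r 's/^[ \t]+//' file
Incidentally, if all you want to do is remove all instances of (a) certain character(s), you'll get better performance with tr. Note that tr takes a plain list of characters rather than a regex, so the brackets are left out here; with them, tr would delete literal '[' and ']' as well.
Code:
tr -d ' \t' <file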
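To make the difference between the two tools concrete, here is a rough comparison on a made-up sample line. It assumes GNU sed (for -r and for '\t' inside a bracket expression); the helper names are hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed; helper names and the sample line are hypothetical.

# Sample line: two spaces, a tab, then bracketed text with more blanks in it.
sample="$(printf '  \t[a b]\tc')"

# sed anchored with '^' strips only the leading run of blanks.
strip_leading() { printf '%s\n' "$1" | sed -r 's/^[ \t]+//'; }

# tr deletes every space and tab, wherever it appears in the input.
delete_all() { printf '%s\n' "$1" | tr -d ' \t'; }

# tr does not take a regex: brackets in the set are ordinary characters,
# so this variant deletes '[' and ']' too.
delete_with_brackets() { printf '%s\n' "$1" | tr -d '[ \t]'; }

strip_leading "$sample"
delete_all "$sample"            # prints: [ab]c
delete_with_brackets "$sample"  # prints: abc
```

So tr is only a substitute for the sed command when no blanks need to survive elsewhere on the line.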
Cheers guys. I had been using tr but knew that there was a method using sed. It was only when I read the Sed One Liners page that the '+' problem got me thinking. Could you somehow use a whitespace character class, [:space:], instead of '[ \t]' to achieve the same result?
Code:
sed -r 's/^[[:space:]]+//' file
Note that the [:space:] character class covers several other characters as well; the full list is tab, newline, vertical tab, form feed, carriage return, and space. There's also [:blank:], which contains only the regular space and tab characters, so '[[:blank:]]' is exactly equivalent to the '[ \t]' used earlier. Either class has to appear inside a bracket expression, hence the doubled brackets.
The grep info page is one place you'll find definitions for what the various classes cover.
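As a quick illustration of how the two classes differ, the sketch below strips leading whitespace from a made-up line that starts with a space, a tab, a carriage return and a form feed. It assumes GNU sed in an ordinary C or UTF-8 locale; the helper names are hypothetical.

```shell
#!/bin/sh
# Assumes GNU sed; helper names and the sample line are hypothetical.

# Sample line: space, tab, carriage return, form feed, then the text.
sample="$(printf ' \t\r\ftext')"

# [:space:] covers all four leading characters.
strip_space() { printf '%s\n' "$1" | sed -r 's/^[[:space:]]+//'; }

# [:blank:] covers only the space and the tab, so the carriage return
# and the form feed are left in place.
strip_blank() { printf '%s\n' "$1" | sed -r 's/^[[:blank:]]+//'; }

# od -c makes the remaining control characters visible.
strip_space "$sample" | od -c
strip_blank "$sample" | od -c
```

This matters mostly for files with DOS line endings or other stray control characters, where [:space:] may remove more than intended.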