LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

04-22-2012, 02:30 PM   #1
uncle-c
Member
 
Registered: Oct 2006
Location: The Ether
Distribution: Fedora 14, Ubuntu , Slax 5.1.8, OpenSolaris, Centos 4.8
Posts: 296

Rep: Reputation: 30
sed regex and removing 'whitespace'


I was just reading the classic 'Sed One-Liners' and came up against this problem.

Code:
 $ cat file
one
  two
    three
 $
Could someone explain why
Code:
 $ sed 's/^[ \t]*//' file
removes any leading tabs and spaces, whereas

Code:
  sed 's/^[ \t]+//' file
does not? What is the subtle difference between the two that causes only the former to remove leading whitespace from each line?

Last edited by uncle-c; 04-23-2012 at 05:55 AM.
 
04-22-2012, 02:42 PM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Location: Wales, UK
Distribution: Arch
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 345
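Because in basic regex (sed's default) the '+' is not special: it matches a literal plus sign. With GNU sed you can escape it to get "one or more":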

Code:
sed 's/^[ \t]\+//' file
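To see the difference concretely, here is a minimal sketch, assuming GNU sed and a POSIX sh (the sample lines are made up to mirror the file in the question, and `sample_input` is a hypothetical helper). In basic regex '^[ \t]+' only matches a blank followed by a literal '+', so it leaves ordinary indentation alone:

```shell
#!/bin/sh
# Assumes GNU sed: '\t' inside brackets and '\+' are GNU extensions.
# sample_input is a hypothetical helper that mimics the file in the question:
# no indent, two spaces, and a tab.
sample_input() {
    printf 'one\n  two\n\tthree\n'
}

# '*' is "zero or more" in basic regex, so every leading run of blanks goes.
star=$(sample_input | sed 's/^[ \t]*//')

# A bare '+' is a literal plus sign in basic regex; no line starts with a
# blank followed by '+', so nothing is changed.
literal_plus=$(sample_input | sed 's/^[ \t]+//')

# With GNU sed, '\+' means "one or more" in basic regex.
escaped_plus=$(sample_input | sed 's/^[ \t]\+//')

# A line that really starts with a blank and a '+' shows the literal match.
plus_line=$(printf ' +x\n' | sed 's/^[ \t]+//')

printf '%s\n' "$star" "$literal_plus" "$escaped_plus" "$plus_line"
```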
 
04-22-2012, 11:58 PM   #3
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
To be more specific, it's the difference between basic and extended regular expressions. grep and sed use basic regex by default, and in basic regex some of the more advanced operators, like '+', are not special.

But GNU grep and sed also extend basic regex so that you can "activate" the special meanings of these characters by backslashing them (e.g. '\+'). Perhaps a better way to do it, however, is to enable extended regex globally with "grep -E" and "sed -r". Then the behavior is reversed: the special meanings are enabled by default, and backslash-escaping the characters makes them literal.

Code:
sed -r 's/^[ \t]+//' file
The grep man page goes into good detail about basic vs. extended regex.
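A small sketch of that "reversed" behavior, assuming GNU sed (where -r selects extended regex); the input lines are made up for illustration:

```shell
#!/bin/sh
# Assumes GNU sed, where -r switches from basic to extended regex (ERE).
input=$(printf 'one\n  two\n\tthree\n+four\n')

# Basic regex with the GNU '\+' extension: one or more blanks.
bre=$(printf '%s\n' "$input" | sed 's/^[ \t]\+//')

# Extended regex: a bare '+' already means "one or more".
ere=$(printf '%s\n' "$input" | sed -r 's/^[ \t]+//')

# In extended regex the escape works the other way: '\+' is a literal plus,
# so only the leading '+' on the last line is removed.
ere_literal=$(printf '%s\n' "$input" | sed -r 's/^\+//')

printf '%s\n---\n' "$bre" "$ere" "$ere_literal"
```

The first two commands print the same thing, which is the point: '\+' in basic regex and '+' in extended regex are two spellings of the same operator.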

Incidentally, if all you want to do is remove every instance of certain characters, you'll get better performance with tr.

Code:
tr -d ' \t' <file
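One thing to keep in mind: tr works on single characters, not patterns, so it deletes every space and tab on a line, not only the leading ones. A minimal sketch comparing it with the anchored sed substitution (GNU sed assumed for '\t' in brackets; the sample line is made up):

```shell
#!/bin/sh
# Assumes GNU sed for '\t' inside brackets; tr's '\t' escape is POSIX.
line=$(printf '  two words\there')

# tr deletes every space and tab on the line, wherever it occurs.
via_tr=$(printf '%s\n' "$line" | tr -d ' \t')

# The anchored sed substitution removes only the leading run of blanks.
via_sed=$(printf '%s\n' "$line" | sed 's/^[ \t]*//')

# tr does not treat a plain '[...]' as a bracket expression: in '[ \t]' the
# brackets are just two more characters to delete, so '[a]' comes out as 'a'.
brackets=$(printf '[a]\n' | tr -d '[ \t]')

printf '%s\n' "$via_tr" "$via_sed" "$brackets"
```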
 
04-23-2012, 04:27 AM   #4
uncle-c
Member
 
Registered: Oct 2006
Location: The Ether
Distribution: Fedora 14, Ubuntu , Slax 5.1.8, OpenSolaris, Centos 4.8
Posts: 296

Original Poster
Rep: Reputation: 30
Cheers guys. I had been using tr but knew that there was a way to do it with sed. It was only when I read the Sed One-Liners page that the '+' problem got me thinking. Could you use a whitespace character class, [:space:], instead of '[ \t]' to achieve the same result?
 
04-23-2012, 04:52 AM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
Quote:
Originally Posted by uncle-c
Could you use a whitespace character class, [:space:], instead of '[ \t]' to achieve the same result?
Yes, and it is available in extended regex as well:
Code:
sed -r 's/^[[:space:]]+//' file
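A minimal sketch showing the class in both regex flavors, assuming GNU sed for -r (the input lines are made up). Note that the class name sits inside an outer bracket expression:

```shell
#!/bin/sh
# Assumes GNU sed (-r). The class name must sit inside an outer bracket
# expression: [[:space:]], not [:space:] on its own.
input=$(printf 'one\n  two\n\tthree\n')

# Extended regex, as in the post above.
ere=$(printf '%s\n' "$input" | sed -r 's/^[[:space:]]+//')

# Character classes are POSIX and also work in basic regex, here with '*'.
bre=$(printf '%s\n' "$input" | sed 's/^[[:space:]]*//')

printf '%s\n' "$ere" "$bre"
```

The basic-regex form with '*' avoids both '\t' and '\+', so it does not rely on GNU extensions.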
 
04-23-2012, 09:25 AM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
Note that the [:space:] character class covers several other characters as well; in the C locale the full list is space, tab, newline, vertical tab, form feed, and carriage return. There's also [:blank:], which contains only space and tab, and so is exactly equivalent to the [ \t] used above.

The grep info page is one place you'll find definitions for what the various classes cover.
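To make the [:blank:]/[:space:] difference visible, here is a minimal sketch using a made-up line that starts with a form feed and a tab. The classes themselves are POSIX; GNU sed in the C or a UTF-8 locale is assumed:

```shell
#!/bin/sh
# Assumes GNU sed in the C or a UTF-8 locale.
# A made-up line that starts with a form feed followed by a tab.
line=$(printf '\f\tx')

# [[:blank:]] is only space and tab, so the leading form feed stops the match
# at once and the line is left as it was.
blank=$(printf '%s\n' "$line" | sed 's/^[[:blank:]]*//')

# [[:space:]] also includes form feed, so both leading characters are removed.
space=$(printf '%s\n' "$line" | sed 's/^[[:space:]]*//')

# For ordinary space/tab indentation the two classes give the same result.
indent=$(printf '  \tx\n' | sed 's/^[[:blank:]]*//')

printf '%s\n' "$blank" "$space" "$indent"
```

For stripping indentation, [:blank:] is usually the safer choice, since it will not swallow carriage returns or other control characters that [:space:] also matches.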
 
  


All times are GMT -5.
