LinuxQuestions.org
Old 01-19-2014, 02:44 PM   #1
BeachHead
LQ Newbie
 
Registered: May 2012
Location: Germany
Distribution: Arch, AOSP
Posts: 24

Rep: Reputation: 2
sed - keep last word only (need explanation)


Hello,
I want to keep only the last word of a line, so I've written this:
Code:
echo 'aaa' | sed 's/.* \([^ ]*$\)/\1/'
echo 'aaa bbb' | sed 's/.* \([^ ]*$\)/\1/'
echo 'aaa bbb ccc' | sed 's/.* \([^ ]*$\)/\1/'
echo 'aaa bbb ccc ddd' | sed 's/.* \([^ ]*$\)/\1/'
echo 'aaa bbb ccc ddd eee' | sed 's/.* \([^ ]*$\)/\1/'
Code:
aaa
bbb
ccc
ddd
eee
This works properly so far, but it's not quite clear to me why. The intention was to match the whole line and output just a selected part (marked with \( \) and \1).

My first attempt was rather this (it has no space after .*) and it didn't work (empty output):
Code:
echo 'aaa bbb ccc' | sed 's/.*\([^ ]*$\)/\1/'
So, what does this additional space actually do? Is it a separator in a regular expression? Will the second match '[^ ]*$' (if it's a second match at all) inherit the result from the first one '.*'? It doesn't look like it's treated literally.

Any idea?

Edit:
Tested on GNU sed (cygwin).

Last edited by BeachHead; 01-19-2014 at 02:54 PM.
 
Old 01-19-2014, 02:58 PM   #2
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1976
The problem is that .* is greedy: it matches everything up to the end of the line, and because [^ ]* is allowed to match nothing, nothing is left for the group to match. You can see this if you try:
Code:
echo 'aaa bbb ccc' | sed 's/.*\([^ ]\+\)/\1/'
where at least one non-space character must be matched at the end of the line, so the group captures the final 'c'. Adding a space after .* makes it match everything up to the last space in the line.
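To see what each pattern actually captures, it can help to print the group inside brackets. This is only a minimal sketch, assuming GNU sed as in the original post (the `\+` form is a GNU extension); `show_group` is a hypothetical helper name, not something from the thread:

```shell
# show_group LINE PATTERN
# Replaces the match with [\1] so the contents of the group become visible.
show_group() {
    printf '%s\n' "$1" | sed "s/$2/[\\1]/"
}

# No space after .* : the greedy .* takes the whole line, the group is empty.
show_group 'aaa bbb ccc' '.*\([^ ]*$\)'     # prints: []

# \+ demands at least one character, so .* gives back exactly one.
show_group 'aaa bbb ccc' '.*\([^ ]\+\)'     # prints: [c]

# The literal space makes .* stop at the last space in the line.
show_group 'aaa bbb ccc' '.* \([^ ]*$\)'    # prints: [ccc]
```

The empty brackets in the first case match the "empty output" reported in the first post.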
 
Old 01-19-2014, 04:55 PM   #3
BeachHead
LQ Newbie
 
Registered: May 2012
Location: Germany
Distribution: Arch, AOSP
Posts: 24

Original Poster
Rep: Reputation: 2
I think I got it, though I'm not sure we mean the same thing here.
The space actually is literal, so it's a SPACE, NON-SPACE condition that stops '.*'. Your illustration code just shows the last character 'c' because its condition is only the last NON-SPACE character.

One can imagine that match as some kind of a backwards or right-to-left parser here.

So, valid code should be:
Code:
echo 'aaa bbb ccc' | sed 's/.* \([^ ]\)/\1/'
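One thing worth noticing about this variant: the group holds only a single character. The whole last word still comes out because sed leaves the unmatched rest of the line in place. A small sketch that makes this visible by bracketing the group:

```shell
# The match is "aaa bbb c"; only that part is replaced, "cc" stays behind.
echo 'aaa bbb ccc' | sed 's/.* \([^ ]\)/[\1]/'    # prints: [c]cc

# Without the brackets the result looks like the complete last word.
echo 'aaa bbb ccc' | sed 's/.* \([^ ]\)/\1/'      # prints: ccc
```

The version from the first post, with `[^ ]*$` inside the group, captures the entire word instead.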

Last edited by BeachHead; 01-19-2014 at 05:02 PM.
 
Old 01-19-2014, 05:32 PM   #4
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,845

Rep: Reputation: 1823
Brave person to ask advice then argue with @colucix about regex ...

I've always found it useful to force extended regex when it's available, because of the different regex engines in use; it also avoids most of those escape characters. Also get comfortable with using character classes (especially [:space:]): they catch things like tabs and make the intent more obvious when posting code.
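As a rough illustration of that advice, here is how the last-word pattern might look with extended regex and character classes. It assumes a sed that accepts `-E` (older GNU sed versions spell it `-r`); `last_word` is just a hypothetical wrapper name:

```shell
# Keep only the last whitespace-separated word of each input line.
# With -E the group is written ( ) instead of \( \).
last_word() {
    sed -E 's/.*[[:space:]]([^[:space:]]*)$/\1/'
}

printf 'aaa bbb ccc\n'   | last_word    # prints: ccc
printf 'aaa\tbbb\tccc\n' | last_word    # tab-separated, prints: ccc
```

The pattern with a literal space would leave the tab-separated line unchanged, since that line contains no space to match.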
 
Old 01-19-2014, 05:51 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 14,845

Rep: Reputation: 1823
Quote:
Originally Posted by BeachHead View Post
One can imagine that match as some kind of a backwards or right-to-left parser here.
Forgot to comment on this - do a search for "regex greediness" - this is really vital to understanding (and using) regex.
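A short sketch of what greediness means in practice, using the same sample line as above. Nothing here is scanned backwards: the match starts as far left as possible, and `.*` then takes as much as it can while still letting the rest of the pattern match.

```shell
# Greedy .* followed by a space: the match runs up to the LAST space.
echo 'aaa bbb ccc' | sed 's/.* //'       # prints: ccc

# A negated class cannot cross a space, so the match ends at the FIRST one.
echo 'aaa bbb ccc' | sed 's/[^ ]* //'    # prints: bbb ccc

# The match begins at the leftmost possible position (the first space),
# then .* swallows everything after it.
echo 'aaa bbb ccc' | sed 's/ .*//'       # prints: aaa
```

The POSIX regex flavours that sed uses have no lazy quantifier, so a negated class like `[^ ]*` is the usual way to make a match stop early.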
 
Old 01-19-2014, 06:47 PM   #6
BeachHead
LQ Newbie
 
Registered: May 2012
Location: Germany
Distribution: Arch, AOSP
Posts: 24

Original Poster
Rep: Reputation: 2
Yep, thx to you both. I still have to learn a lot indeed.
 
Old 01-20-2014, 03:06 AM   #7
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,255

Rep: Reputation: 2686
Quote:
i want to keep only the last word from a line
Whilst sed can perform the task, when it comes to investigating columns I would tend to use awk first:
Code:
awk '{print $NF}'
So I guess it depends on whether you wish to improve your regex skills or get the job done?
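For comparison, a small sketch of how the awk one-liner behaves on less tidy input. With the default field separator awk splits on runs of spaces and tabs and ignores leading and trailing blanks, so it is more forgiving than the sed pattern with a literal space:

```shell
# Mixed tabs, repeated spaces and trailing blanks: still the last word.
printf 'aaa\tbbb  ccc  \n' | awk '{print $NF}'    # prints: ccc

# The sed pattern from the first post captures the empty string after
# the final trailing space, so it prints an empty line here.
printf 'aaa bbb ccc  \n' | sed 's/.* \([^ ]*$\)/\1/'

# Optional: the NF condition skips blank lines instead of printing them.
printf 'aaa bbb\n\nccc ddd\n' | awk 'NF {print $NF}'
```

The last command prints `bbb` and `ddd` on separate lines.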
 
Old 01-20-2014, 01:18 PM   #8
BeachHead
LQ Newbie
 
Registered: May 2012
Location: Germany
Distribution: Arch, AOSP
Posts: 24

Original Poster
Rep: Reputation: 2
Nice, thx. Looks cleaner and easier to read, even though it's about 18% slower here over a 1000-iteration loop.
Code:
sed: 0m48.627s
awk: 0m57.404s
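The post does not show how the loop was written, so the following is only a guess at a comparable setup. `run_sed` and `run_awk` are hypothetical names; each starts one new process per iteration, and that startup cost is likely to dominate the timings, especially on cygwin.

```shell
# Run the sed variant n times on the same line, discarding the output.
run_sed() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo 'aaa bbb ccc' | sed 's/.* \([^ ]*$\)/\1/' >/dev/null
        i=$((i + 1))
    done
}

# Same loop with the awk variant.
run_awk() {
    n=$1
    i=0
    while [ "$i" -lt "$n" ]; do
        echo 'aaa bbb ccc' | awk '{print $NF}' >/dev/null
        i=$((i + 1))
    done
}
```

In bash, `time run_sed 1000` and `time run_awk 1000` would then give figures in the same format as above.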
 
  


All times are GMT -5. The time now is 03:58 PM.
