Old 02-03-2009, 07:57 PM   #1
lindylex
Member
 
Registered: Mar 2007
Posts: 192

Rep: Reputation: 17
extract substring using sed and regular expressions (regexp)


I would like to extract a numeric substring using sed. This is what I have tried:
echo "ifeelfat398pounds" | sed -n -e '/[0-9]/,/[0-9]/p'

This seems like a very simple task, but I have tried lots of combinations and they have all failed.

I want to extract this: "398"

Last edited by lindylex; 02-03-2009 at 08:00 PM.
 
Old 02-03-2009, 08:45 PM   #2
geek745
Member
 
Registered: Jul 2004
Location: Alton, IL
Distribution: Linux Mint; Slackware; Ubuntu; Slax
Posts: 172
Blog Entries: 2

Rep: Reputation: 34
That isn't a good regexp for this. Try:
Code:
/\d+/
The "+" requires at least one occurrence, and \d is a synonym for [0-9] in Perl-style regular expressions (sed itself does not understand \d).

See www.regular-expressions.info

EDIT:
I tried your command, which returned the entire line. That leads me to believe that sed prints any line that matches when the pattern is given as an address, the way you had it. I then tried the substitution command, as follows:
Code:
echo "ifeelfat398pounds" | sed -n -e 's/.*([0-9]+).*/\1/p'
with these results:
Code:
sed: -e expression #1, char 19: invalid reference \1 on `s' command's RHS
I then tried '&', which the documentation describes as a special character that stands for only the text that was matched, and got nothing back:
Code:
echo "ifeelfat398pounds" | sed -n -e 's/([0-9]+)/&/p'
$echo "ifeelfat398pounds" | sed -n -e 's/([0-9]+)/&/p'
Hopefully someone with more sed experience than me can help you out...

Last edited by geek745; 02-03-2009 at 09:13 PM.
 
Old 02-03-2009, 09:22 PM   #3
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,119

Rep: Reputation: 4120
Regex (unfortunately) ain't regex. For sed you'll need [[:digit:]] or [0-9].
And it'll look ugly - grep is a better tool for this (have a look at the -o switch).
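
For reference, here is a minimal sketch of the grep -o approach mentioned above. It assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX). The variable name first_number is just for illustration.

```shell
# -o prints only the part of the line that matched, one match per output line.
# head -n 1 keeps the first match in case the input holds several numbers.
first_number=$(echo "ifeelfat398pounds" | grep -o '[0-9][0-9]*' | head -n 1)
echo "$first_number"   # prints 398
```

Without the head, an input such as "ifeelfat398poundsand15ounces" would give 398 and 15 on separate lines.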
 
Old 02-03-2009, 09:29 PM   #4
sal_paradise42
Member
 
Registered: Jul 2003
Location: Utah
Distribution: Gentoo FreeBSD 5.4
Posts: 150

Rep: Reputation: 16
You can also use Perl:

Code:
echo "ifeelfat398pounds" | perl -wlne 'print $1 if /(\d+)/'
 
Old 02-03-2009, 09:34 PM   #5
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
I think the first poster was using extended regular expression syntax with plain sed, which expects basic regular expressions by default, so the parentheses need to be escaped:

Code:
echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/g'


If you want to get rid of the non-digit characters, they need to be matched by the pattern that gets replaced.
Sed matches greedily (it takes the longest match it can), so I couldn't use '/.*\([0-9][0-9]*\)/' because the leading .* would match 'iweigh29', leaving just the 7 for the group.
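
A possible variation, offered as a sketch rather than the one right answer: anchoring the number with [^0-9]* instead of .*[^0-9] sidesteps the greedy-match problem and also copes with input that starts with digits. Adding -n and the p flag makes sed print nothing when there is no number at all. The -E form assumes a sed that accepts -E for extended regular expressions (GNU sed 4.2 and later, and BSD sed, do). The variable names are only for illustration.

```shell
# Basic regular expressions: groups are written \( \), and "one or more"
# is spelled [0-9][0-9]* for portability.
bre=$(echo 'iweigh297lbs' | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

# Extended regular expressions (-E): plain ( ) and + work here.
ere=$(echo 'iweigh297lbs' | sed -n -E 's/[^0-9]*([0-9]+).*/\1/p')

# No digits in the input: the substitution fails, so -n ... p prints nothing.
none=$(echo 'nodigits' | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

# Input that begins with the number still works.
lead=$(echo '297lbs' | sed -n 's/[^0-9]*\([0-9][0-9]*\).*/\1/p')

echo "bre=$bre ere=$ere none=$none lead=$lead"
```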

Last edited by jschiwal; 02-03-2009 at 09:40 PM.
 
Old 02-03-2009, 10:53 PM   #6
lindylex
Member
 
Registered: Mar 2007
Posts: 192

Original Poster
Rep: Reputation: 17
Jschiwal, thanks, it works well.

echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/g'
 
Old 02-04-2009, 08:36 AM   #7
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
No need for external tools:
Code:
# a="ifeelfat398pounds"
# b=${a//[a-zA-Z]/}
# echo $b
 
Old 02-04-2009, 10:58 AM   #8
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
ghostdog - what is the difference between these two:
b=${a//[a-zA-Z]/}
b=${a/[a-zA-Z]/}
Is the first example equivalent to using the 'g' with sed? If so I've been looking for that.

Last edited by gnashley; 02-04-2009 at 10:59 AM.
 
Old 02-04-2009, 07:26 PM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by gnashley
Is the first example equivalent to using the 'g' with sed? If so I've been looking for that.
Yes, it means global replacement. Please check the Bash guide in my signature for more details.
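
A small sketch of the difference, assuming bash (this expansion is not part of plain POSIX sh). The variable names first_only and all are made up for the example.

```shell
#!/usr/bin/env bash
a="ifeelfat398pounds"

# Single slash: only the first match of the pattern is replaced (here, removed).
first_only=${a/[a-zA-Z]/}

# Double slash: every match is replaced, much like the g flag on sed's s command.
all=${a//[a-zA-Z]/}

echo "$first_only"   # prints feelfat398pounds
echo "$all"          # prints 398
```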
 
Old 02-04-2009, 10:44 PM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Quote:
Originally Posted by ghostdog74
no need for external tools
Code:
# a="ifeelfat398pounds"
# b=${a//[a-zA-Z]/}
# echo $b
This won't work if the string contains any punctuation marks or characters outside of the English alphabet.

Try b=${a//[^0-9]/} instead.

But even this would lead to problems if there were more than one set of numbers in the string. Something like "ifeelfat398poundsand15ounces." would give you an output of "39815".

It would be nice if we could use full regular expressions inside of parameter substitution. Does anyone know if it's possible?
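
As far as I know, the patterns in parameter substitution are shell globs, not regular expressions. Bash does have a separate regex operator, =~ inside [[ ]], which stores the match in the BASH_REMATCH array. Here is a sketch of how it could pull out each number separately. It assumes bash 3.1 or later, and the names numbers, rest and re are just for illustration.

```shell
#!/usr/bin/env bash
a="ifeelfat398poundsand15ounces."
numbers=()
rest=$a
re='[0-9]+'   # extended regular expression, kept in a variable to avoid quoting issues

# Each pass finds the leftmost run of digits in what is left of the string.
while [[ $rest =~ $re ]]; do
    numbers+=("${BASH_REMATCH[0]}")
    # Drop everything up to and including this match, then search again.
    rest=${rest#*"${BASH_REMATCH[0]}"}
done

echo "${numbers[0]}"   # prints 398
echo "${numbers[1]}"   # prints 15
```

This keeps the two numbers apart instead of running them together as "39815", and it still needs no external tools.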
 
Old 02-04-2009, 11:52 PM   #11
lindylex
Member
 
Registered: Mar 2007
Posts: 192

Original Poster
Rep: Reputation: 17
ghostdog74 and David the H., that was nice. This is even better. I like bash scripting and am trying to utilize it to its fullest. One goal is to minimize the use of other languages or external tools.

Thanks so much for the input so far.

Lex
 
Old 02-05-2009, 02:08 AM   #12
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
Thanks ghostdog, I had tried to find that info before but couldn't find or make sense of what I was reading, I guess. The bash man-page is like an epic...
 
Old 02-06-2009, 05:22 AM   #13
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Quote:
Originally Posted by David the H.
This won't work if the string contains any punctuation marks or characters outside of the English alphabet.

Try b=${a//[^0-9]/} instead.

But even this would lead to problems if there were more than one set of numbers in the string. Something like "ifeelfat398poundsand15ounces." would give you an output of "39815".

It would be nice if we could use full regular expressions inside of parameter substitution. Does anyone know if it's possible?
Code:
# a="ifeelfat398poundsand15ounces"
# b=${a//[a-zA-Z]/ }
# set -- $b
# echo $1
398
# echo $2
15
 
Old 02-06-2009, 11:53 AM   #14
lindylex
Member
 
Registered: Mar 2007
Posts: 192

Original Poster
Rep: Reputation: 17
ghostdog74, is "shopt -s -o nounset" the same as the "set --" in your example?
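
For anyone reading along: as far as I can tell the two are unrelated. "set --" replaces the positional parameters ($1, $2, ...) with whatever words follow it, while "shopt -s -o nounset" is another way of writing "set -o nounset", which makes expanding an unset variable an error. A sketch of what "set --" does in the example, assuming bash; it uses [^0-9] rather than [a-zA-Z] so punctuation is handled too, and the names first, second and count are made up.

```shell
#!/usr/bin/env bash
a="ifeelfat398poundsand15ounces"

# Replace every non-digit with a space, leaving the numbers as separate words.
b=${a//[^0-9]/ }

# $b is left unquoted on purpose: word splitting turns it into separate
# arguments, and "set --" stores them as $1, $2, ...
set -- $b

first=$1
second=$2
count=$#
echo "$first $second ($count values)"   # prints 398 15 (2 values)
```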

Last edited by lindylex; 02-08-2009 at 08:58 PM.
 
Old 12-21-2009, 05:40 PM   #15
warrentaylor
LQ Newbie
 
Registered: Dec 2009
Posts: 3

Rep: Reputation: 0
same problem, sort of

I am having the same problem... sort of. I want to extract a combination of characters if it exists. If it doesn't exist, I want nothing. My problem is that if my pattern doesn't match, I get the whole line returned.

If I have .....AA9999999999999999....., I want AA9999999999999999
If I have ............................, I want nothing.

Here AA9999999999999999 is two capital letters followed by 16 digits.

I use 's/.*\(AA[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*/\1/'

because \{16\} as a repeater doesn't work.
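
One way this might be approached, as a sketch: sed prints every line by default whether or not the substitution matched, so combining -n with the p flag on the s command prints only lines where the pattern was found. The \{16\} interval is part of POSIX basic regular expressions and works in GNU sed when the expression is single-quoted, so a quoting problem or a different sed may be why it failed here (that is a guess). The names pattern, hit and miss are only for illustration.

```shell
# -n turns off automatic printing; the trailing p prints the line only when
# the substitution actually happened.
pattern='s/.*\(AA[0-9]\{16\}\).*/\1/p'

hit=$(echo '.....AA9999999999999999.....' | sed -n "$pattern")
miss=$(echo '............................' | sed -n "$pattern")

echo "hit=[$hit]"
echo "miss=[$miss]"
```

To accept any two capital letters rather than a literal AA, the AA could be swapped for [A-Z][A-Z].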
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration