LinuxQuestions.org

lindylex 02-03-2009 07:57 PM

extract substring using sed and regular expressions (regexp)
 
I would like to extract a number substring using sed.
echo "ifeelfat398pounds" | sed -n -e '/[0-9]/,/[0-9]/p'

This is a very simple task but I have tried lots of combinations and have failed.

I want to extract this. "398"

geek745 02-03-2009 08:45 PM

isn't a good regexp
Code:

/\d+/
The "+" requires at least one occurrence, and \d is a synonym for [0-9] in Perl-style regex flavors.

see www.regular-expressions.info

EDIT:
I tried your command, which returned the entire line, leading me to believe that sed's behavior is to return any line that matches, when given as you had it. I tried the substitution command, as follows:
Code:

echo "ifeelfat398pounds" | sed -n -e 's/.*([0-9]+).*/\1/p'
with these results:
Code:

sed: -e expression #1, char 19: invalid reference \1 on `s' command's RHS
Using instead the documentation's reference to a special character that represents only what was matched ('&'), I got nothing back:
Code:

echo "ifeelfat398pounds" | sed -n -e 's/([0-9]+)/&/p'

Hopefully someone with more sed experience than myself can help you out...
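
A possible explanation for the error above: without the -E option (-r is the older GNU spelling), sed uses basic regular expressions, where an unescaped ( ) or + is a literal character, so there is no group for \1 to refer to. A minimal sketch, assuming a sed that accepts -E (GNU and BSD sed do); the variable name first is only for illustration:

```shell
# Extended regular expressions: ( ) forms a group and + means "one or more".
# [^0-9]* consumes the leading non-digits so the group captures the whole number.
first=$(echo "ifeelfat398pounds" | sed -E 's/[^0-9]*([0-9]+).*/\1/')
echo "$first"   # prints 398
```

Using [^0-9]* rather than .* in front of the group keeps the greedy prefix from eating the first digits of the number.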

syg00 02-03-2009 09:22 PM

Regex (unfortunately) ain't regex. For sed you'll need [[:digit:]] or [0-9].
And it'll look ugly - grep is a better tool for this (have a look at the -o switch)
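
The -o suggestion above could look like the following sketch. It assumes a grep that supports -o (GNU and BSD grep do); the variable name digits is illustrative:

```shell
# -o prints only the part of the line that matched, not the whole line.
digits=$(echo "ifeelfat398pounds" | grep -o '[0-9][0-9]*')
echo "$digits"   # prints 398
```

If a line contains several numbers, grep -o prints each match on its own output line.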

sal_paradise42 02-03-2009 09:29 PM

you can also use perl

Code:

echo "ifeelfat398pounds" | perl -wlne 'print $1 if /(\d+)/'

jschiwal 02-03-2009 09:34 PM

I think the first poster was using extended regular expressions with regular sed, so the parentheses need to be escaped:

Code:

echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/g'


If you want to strip the non-digit characters, they need to be matched by the pattern being replaced.

Sed takes the longest possible match, so I couldn't use '/.*\([0-9][0-9]*\)/' because the first part, .*, would match 'iweigh29', leaving just the 7 for the second part.
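
The greedy behaviour described above can be checked side by side. This is only a sketch; the variable names greedy and anchored are made up for the comparison:

```shell
# Without [^0-9], the greedy .* swallows "iweigh29" and leaves one digit for the group.
greedy=$(echo 'iweigh297lbs' | sed 's/.*\([0-9][0-9]*\).*/\1/')
# With [^0-9], .* has to stop at the last non-digit in front of the number.
anchored=$(echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/')
echo "$greedy"     # prints 7
echo "$anchored"   # prints 297
```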

lindylex 02-03-2009 10:53 PM

Jschiwal, thanks it works well.

echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/g'

ghostdog74 02-04-2009 08:36 AM

no need for external tools
Code:

# a="ifeelfat398pounds"
# b=${a//[a-zA-Z]/}
# echo $b


gnashley 02-04-2009 10:58 AM

ghostdog, what is the difference between these two:
b=${a//[a-zA-Z]/}
b=${a/[a-zA-Z]/}
Is the first example equivalent to using the 'g' with sed? If so I've been looking for that.

ghostdog74 02-04-2009 07:26 PM

Quote:

Originally Posted by gnashley (Post 3432038)
Is the first example equivalent to using the 'g' with sed? If so I've been looking for that.

Yes, it means global replacement. Please check the bash guide in my sig for more details.
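
A short sketch of the difference, assuming bash (plain sh may not support these expansions); the names first_only and all are illustrative:

```shell
#!/usr/bin/env bash
a="ifeelfat398pounds"
first_only=${a/[a-zA-Z]/}    # single slash: removes only the first match
all=${a//[a-zA-Z]/}          # double slash: removes every match, like sed's g flag
echo "$first_only"   # prints feelfat398pounds
echo "$all"          # prints 398
```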

David the H. 02-04-2009 10:44 PM

Quote:

Originally Posted by ghostdog74 (Post 3431870)
no need for external tools
Code:

# a="ifeelfat398pounds"
# b=${a//[a-zA-Z]/}
# echo $b


This pattern won't work if there are any punctuation marks or characters outside of the English alphabet.

Try b=${a//[^0-9]/} instead.

But even this would lead to problems if there were more than one set of numbers in the string. Something like "ifeelfat398poundsand15ounces." would give you an output of "39815".

It would be nice if we could use full regular expressions inside of parameter substitution. Does anyone know if it's possible?
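
On the question above: parameter substitution itself only accepts glob-style patterns, but bash 3.0 and later has a separate regex match operator, =~ inside [[ ]], which stores the match in the BASH_REMATCH array. A minimal sketch, assuming bash; the names rest and nums are illustrative:

```shell
#!/usr/bin/env bash
a="ifeelfat398poundsand15ounces."
nums=()
rest=$a
# Each pass finds the leftmost run of digits, records it, then drops
# everything up to and including that run before searching again.
while [[ $rest =~ [0-9]+ ]]; do
    nums+=("${BASH_REMATCH[0]}")
    rest=${rest#*"${BASH_REMATCH[0]}"}
done
echo "${nums[0]}"   # prints 398
echo "${nums[1]}"   # prints 15
```

This keeps the two numbers separate instead of running them together as "39815".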

lindylex 02-04-2009 11:52 PM

ghostdog74 and David the H., that was nice. This is even better. I like bash scripting and am trying to utilize it to its fullest. A goal is to minimize using other languages or external tools.

Thanks so much for the input so far.

Lex

gnashley 02-05-2009 02:08 AM

Thanks ghostdog, I had tried to find that info before but couldn't find or make sense of what I was reading, I guess. The bash man-page is like an epic...

ghostdog74 02-06-2009 05:22 AM

Quote:

Originally Posted by David the H. (Post 3432612)
This pattern won't work if there are any punctuation marks or characters outside of the English alphabet.

Try b=${a//[^0-9]/} instead.

But even this would lead to problems if there were more than one set of numbers in the string. Something like "ifeelfat398poundsand15ounces." would give you an output of "39815".

It would be nice if we could use full regular expressions inside of parameter substitution. Does anyone know if it's possible?

Code:

# a="ifeelfat398poundsand15ounces"
# b=${a//[a-zA-Z]/ }
# set -- $b
# echo $1
398
# echo $2
15
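
The set -- line above relies on word splitting of the unquoted $b and overwrites the script's positional parameters. A hedged alternative sketch, assuming bash, that collects the numbers in an array instead (the name nums is illustrative):

```shell
#!/usr/bin/env bash
a="ifeelfat398poundsand15ounces"
# Replace every non-digit with a space, then let read split on whitespace.
read -r -a nums <<< "${a//[^0-9]/ }"
echo "${#nums[@]}"  # prints 2
echo "${nums[0]}"   # prints 398
echo "${nums[1]}"   # prints 15
```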


lindylex 02-06-2009 11:53 AM

ghostdog74, is "shopt -s -o nounset" the same as the "set --" from your example?
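
As far as the bash manual describes them, the two are unrelated: set -- replaces the positional parameters ($1, $2, ...) with the words that follow it, while shopt -s -o nounset is another way of writing set -o nounset (set -u), which makes expanding an unset variable an error. A small sketch, assuming bash; undefined_var is a hypothetical name chosen because nothing sets it:

```shell
#!/usr/bin/env bash
set -- 398 15                 # $1 becomes 398, $2 becomes 15
summary="$# $1 $2"
echo "$summary"               # prints 2 398 15

unset undefined_var           # make sure the hypothetical variable is really unset
# Run in a subshell so the option does not leak into the rest of the script.
if ( shopt -s -o nounset; : "$undefined_var" ) 2>/dev/null; then
    nounset_result="allowed"
else
    nounset_result="rejected"
fi
echo "$nounset_result"        # prints rejected
```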

warrentaylor 12-21-2009 05:40 PM

same problem, sort of
 
I am having the same problem, sort of. I want to extract a combination of characters if it exists. If it doesn't exist, I want nothing. My problem is that if my pattern doesn't exist, I get the whole line returned.

if I have .....AA9999999999999999....., I want AA9999999999999999
if I have ............................, I want nothing.

where AA9999999999999999 is 2 capital alphas followed by 16 numerics.

I use 's/.*\(AA[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*/\1/'

because \{16\} as a repeater doesn't work.
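
Two details may help here, offered as a sketch rather than a diagnosis. In POSIX basic regular expressions the interval is written \{16\}, and GNU sed accepts it. Combining -n with the p flag prints only the lines where the substitution succeeded, so non-matching lines produce nothing. The sample input and the names pattern, found and missing are made up for illustration:

```shell
pattern='s/.*\(AA[0-9]\{16\}\).*/\1/p'
# A line containing AA followed by 16 digits: only the ID is printed.
found=$(echo 'xxAA1234567890123456yy' | sed -n "$pattern")
# A line without the ID: -n suppresses it, so the result is empty.
missing=$(echo '............................' | sed -n "$pattern")
echo "found=[$found]"       # prints found=[AA1234567890123456]
echo "missing=[$missing]"   # prints missing=[]
```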


All times are GMT -5.