extract substring using sed and regular expressions (regexp)
EDIT:
I tried your command, which returned the entire line, leading me to believe that sed's behavior is to return any line that matches, when given as you had it. I tried the substitution command, as follows:
Code:
echo "ifeelfat398pounds" | sed -n -e 's/.*([0-9]+).*/\1/p'
Regex (unfortunately) ain't regex. For sed you'll need [[:digit:]] or [0-9].
And it'll look ugly - grep is a better tool for this (have a look at the -o switch).
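For what it's worth, here is a minimal sketch of the grep route, assuming a grep that supports -o (GNU and BSD grep do, but POSIX does not require it). The variable name weight is just made up for the illustration.

```shell
# -o prints only the part of each line that matched, one match per line.
# Assumption: your grep supports -o (GNU and BSD grep do).
weight=$(echo "ifeelfat398pounds" | grep -o '[0-9][0-9]*')
echo "$weight"   # 398
```

If the line held several runs of digits, each run would come out on its own line.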
I think the first poster was using extended regular expression syntax with regular sed, which expects basic regular expressions, so the parentheses need to be escaped:
Code:
echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/g'
If you want the surrounding non-digit characters removed, they need to be matched by the pattern being replaced.
Sed will use the longest matching pattern, so I couldn't use '/.*\([0-9][0-9]*\)/' because the first pattern .* would match 'iweigh29', leaving just 7 to match the second pattern.
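To see the greedy match in action, the two forms can be run side by side. This is only a sketch; the variable names greedy and anchored are invented for the comparison.

```shell
# .* grabs as much as it can and leaves only one digit for the group.
greedy=$(echo 'iweigh297lbs' | sed 's/.*\([0-9][0-9]*\).*/\1/')

# Requiring a non-digit just before the group forces the group
# to start at the first digit of the number.
anchored=$(echo 'iweigh297lbs' | sed 's/.*[^0-9]\([0-9][0-9]*\).*/\1/')

echo "greedy:   $greedy"     # 7
echo "anchored: $anchored"   # 297
```

One caveat: the anchored form needs at least one non-digit in front of the number, so a line like '297lbs' would not match and would come back unchanged.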
ghostdog - what is the difference between these two:
b=${a//[a-zA-Z]/}
b=${a/[a-zA-Z]/}
Is the first example equivalent to using the 'g' with sed? If so I've been looking for that.
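As I read the bash manual, yes: the double slash replaces every match, the single slash only the first one. A quick sketch to check it (bash only, not plain sh; the variable names are made up here):

```shell
#!/usr/bin/env bash
a="ifeelfat398pounds"

first_only=${a/[a-zA-Z]/}   # single slash: removes only the first letter
all=${a//[a-zA-Z]/}         # double slash: removes every letter, like sed's g flag

echo "$first_only"   # feelfat398pounds
echo "$all"          # 398
```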
This string won't work if there are any punctuation marks or characters outside of the English alphabet.
Try b=${a//[^0-9]/} instead.
But even this would lead to problems if there were more than one set of numbers in the string. Something like "ifeelfat398poundsand15ounces." would give you an output of "39815".
It would be nice if we could use full regular expressions inside of parameter substitution. Does anyone know if it's possible?
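Not inside parameter substitution as far as I know, but bash (3.0 and later) does have the =~ operator inside [[ ]], which takes an extended regular expression and puts the match in the BASH_REMATCH array. A sketch, assuming bash rather than plain sh, with variable names invented for the example:

```shell
#!/usr/bin/env bash
a="ifeelfat398poundsand15ounces."

# Parameter substitution just strips the non-digits, so both numbers run together.
digits_only=${a//[^0-9]/}

# =~ matches an extended regex; BASH_REMATCH[0] holds the matched text.
first_number=""
if [[ $a =~ [0-9]+ ]]; then
    first_number=${BASH_REMATCH[0]}
fi

echo "$digits_only"    # 39815
echo "$first_number"   # 398
```

This only picks up the first run of digits; getting the second one would need a loop or a more specific pattern.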
ghostdog74 and David the H., that was nice. This is even better. I like bash scripting and am trying to use it to its fullest. A goal is to minimize using other languages or external tools.
Thanks ghostdog, I had tried to find that info before but couldn't find or make sense of what I was reading, I guess. The bash man-page is like an epic...
I am having the same problem... sort of. I want to extract a combination of characters if it exists. If it doesn't exist, I want nothing. My problem is that if my pattern doesn't match, I get the whole line returned.
If I have .....AA9999999999999999....., I want AA9999999999999999
If I have ............................, I want nothing.
where AA9999999999999999 is 2 capital letters followed by 16 digits.
I use 's/.*\(AA[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]\).*/\1/'
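One thing that might help here: sed prints every line by default, whether or not the substitution matched. With -n the automatic printing is turned off, and the p flag on the s command prints only the lines where a substitution actually happened. A sketch under those assumptions, with the sixteen [0-9] written as an interval and made-up sample digits in place of the 9s:

```shell
# -n: do not print lines automatically; p: print only when s/// succeeded.
# \{16\} is the basic-regex way of saying "exactly 16 of the preceding item".
pattern='s/.*\(AA[0-9]\{16\}\).*/\1/p'

found=$(echo '.....AA1234567890123456.....' | sed -n "$pattern")
missing=$(echo '............................' | sed -n "$pattern")

echo "found:   [$found]"     # found:   [AA1234567890123456]
echo "missing: [$missing]"   # missing: []
```

The pattern keeps the literal AA from the expression above; if any two capitals should be accepted, [A-Z][A-Z] could take its place.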