LinuxQuestions.org


dth4h 07-03-2012 04:35 PM

Regular expression question
 
I want to know how to pull the p and then the number(s) next to it out of these strings:

abcp1de
abcp21de32a

Out of the first one I want "p1" and out of the second one I want "p21".
And yes, the strings will all be different, but they will always be just numbers and letters.

I tried to get a grep command that would do this, but I couldn't figure out the right syntax.

Note: If the p is not included and I just get the number that is fine, I would actually prefer that, but it has to be the number that is next to the p.

towheedm 07-03-2012 07:00 PM

Assuming p is always followed by one or more numbers and occurs only once per string:

You can see the required substring with grep:
Code:

echo abcp21de32a | grep --color 'p[0-9]*'
Or use sed to return the substring:
Code:

echo abcp21de32a | sed 's,.*\(p[0-9]*\).*,\1,'
Hope it helps.
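
If the goal is to capture the match rather than just see it highlighted, grep's -o option is one way to do it. The sketch below assumes a grep that supports -o (GNU and BSD grep do; it is not required by POSIX), and the function name extract_p is made up for illustration.

```shell
# Hypothetical helper: print only the "p<digits>" part of each input line.
# -o makes grep print just the matched text instead of the whole line.
# The pattern requires at least one digit after the p.
extract_p() {
    grep -o 'p[0-9][0-9]*'
}

printf '%s\n' abcp1de abcp21de32a | extract_p
```

With the two sample strings from the question this prints p1 and p21, one per line.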

dth4h 07-03-2012 07:13 PM

Thanks! That worked. I am just curious, what does the --color grep option do? I tried the same grep command without that option and it gave me the same results.

towheedm 07-03-2012 07:18 PM

The --color option tells grep to highlight the matched string in color. If you see color without the option, that's because highlighting is already enabled in your Bash environment (for example through an alias).

dth4h 07-03-2012 07:21 PM

Ok, that makes sense. Thanks again.

pixellany 07-03-2012 07:25 PM

The man page for grep will tell you about the --color option.

You really want just the numbers after "p", so the sed command should look like this:
(This is a variant of what's already been suggested.)
Code:

sed -r 's/.*p([0-9]+).*/\1/'
Note that I use the -r flag to turn on extended regex rules, so the () does not have to be escaped. I also use / as the delimiter.
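
To make the escaping difference concrete, here is a minimal sketch of the same substitution written both ways. It assumes GNU sed (for -r); the function names digits_bre and digits_ere are only illustrative.

```shell
# Basic regex (no flag): grouping parens are escaped,
# and "one or more" is spelled \{1,\}.
digits_bre() {
    sed 's/.*p\([0-9]\{1,\}\).*/\1/'
}

# Extended regex (-r): plain ( ) and + work without backslashes.
digits_ere() {
    sed -r 's/.*p([0-9]+).*/\1/'
}

echo abcp21de32a | digits_bre
echo abcp21de32a | digits_ere
```

Both calls print 21 for this input.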

danielbmartin 07-03-2012 07:39 PM

Quote:

Originally Posted by dth4h (Post 4718494)
If the p is not included and I just get the number that is fine, I would actually prefer that, but it has to be the number that is next to the p.

To get only the numeric string between "p" and the next alphabetic, rework towheedm's sed slightly:
Code:

sed -r 's/.*p([0-9]*).*/\1/' < "$InFile"
Daniel B. Martin

paw2012 07-03-2012 07:42 PM

what do I do
 
Now what? This is what I get after rebooting. I'm trying to upgrade to 17.
dropping to debug shell

pixellany 07-03-2012 07:47 PM

Quote:

Originally Posted by paw2012 (Post 4718611)
Now what? This is what I get after rebooting. I'm trying to upgrade to 17.
dropping to debug shell

???

Please start a new thread, and be sure to tell us what you are doing. By itself, the above makes no sense.

dth4h 07-03-2012 08:03 PM

Quote:

Originally Posted by pixellany (Post 4718595)
The man page for grep will tell you about the --color option.

You really want just the numbers after "p", so the sed command should look like this:
(This is a variant of what's already been suggested.)
Code:

sed -r 's/.*p([0-9]+).*/\1/'
Note that I use the -r flag to turn on extended regex rules, so the () does not have to be escaped. I also use / as the delimiter.

Yes this works better. Thanks.


Quote:

Originally Posted by danielbmartin (Post 4718608)
To get only the numeric string between "p" and the next alphabetic, rework towheedm's sed slightly:
Code:

sed -r 's/.*p([0-9]*).*/\1/' < "$InFile"
Daniel B. Martin

What does your line do that is different from towheedm's one?


Quote:

Originally Posted by paw2012 (Post 4718611)
Now what? This is what I get after rebooting. I'm trying to upgrade to 17.
dropping to debug shell

Random???

danielbmartin 07-03-2012 08:39 PM

Quote:

Originally Posted by dth4h (Post 4718636)
What does your line do that is different from towheedm's one?

Mine dropped the unwanted "p"... but pixellany beat me to the punch! Sorry for the duplication. Better too many answers than too few.

Daniel B. Martin

dth4h 07-03-2012 08:43 PM

Ahhh, lol ya that annoys me when that happens. I type this whole big thing (I type slow) and then I go back and see that someone else already beat me to the punch.

Anyway, thanks for your replies and help.

danielbmartin 07-04-2012 08:22 AM

Uh-oh. Possible flaw.

I expanded the test input file to these three lines:
Code:

abcp1de
abcp21de32a
abcp21qp54b

Note that the third line has two p's. Because the leading .* in the sed patterns is "greedy", several of the proposed solutions deliver the numeric string following the second p. I don't think that is what the OP was expecting.

Daniel B. Martin
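
The behavior described above can be reproduced with a short sketch. It assumes GNU sed, and last_p is just an illustrative name for the pattern posted earlier in the thread.

```shell
# The leading .* takes as much as it can and only gives back what it must,
# so the capture group ends up on the LAST "p<digits>" in the line.
last_p() {
    sed -r 's/.*p([0-9]+).*/\1/'
}

printf '%s\n' abcp1de abcp21de32a abcp21qp54b | last_p
```

For the three test lines this prints 1, 21 and 54; the third result comes from the second p.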

dth4h 07-04-2012 11:35 AM

Don't worry, the strings will never have more than one of the same letter in one string at one time. And if they do, my script should error out anyway.

Thanks for the info though.
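
If a script is meant to reject such input, a guard along these lines is one option. This is only a sketch: the function name has_single_p is hypothetical, it counts "p followed by digits" runs rather than every repeated letter, and it assumes a grep that supports -o.

```shell
# Hypothetical guard: succeed only if the string contains
# exactly one "p<digits>" run.
has_single_p() {
    # grep -o prints each match on its own line; wc -l counts them.
    count=$(printf '%s\n' "$1" | grep -o 'p[0-9][0-9]*' | wc -l)
    # $((count)) normalizes any padding some wc implementations add.
    [ "$((count))" -eq 1 ]
}

has_single_p abcp21de32a && echo "ok"
has_single_p abcp21qp54b || echo "rejected: more than one p<digits>"
```

A calling script could use the return status to stop early instead of extracting the wrong number.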

pixellany 07-06-2012 06:12 AM

Quote:

Originally Posted by danielbmartin (Post 4719067)
Uh-oh. Possible flaw.

I expanded the test input file to these three lines:
Code:

abcp1de
abcp21de32a
abcp21qp54b

Note that the third line has two p's. Because the leading .* in the sed patterns is "greedy", several of the proposed solutions deliver the numeric string following the second p. I don't think that is what the OP was expecting.

Daniel B. Martin

Even though OP says it does not matter, let's note for completeness how to deal with this.

If there are two sequences on a line, and you want to match only the first one, then do this:
Code:

sed -r 's/[^p]*p([0-9]+).*/\1/'
Instead of searching for any number of characters followed by "p", search for any number of "not p" characters followed by "p".
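
A quick way to check this against the three-line test input from earlier in the thread, assuming GNU sed (first_p is an illustrative name, not something from the posts):

```shell
# [^p]* cannot run past the first p, so the capture group
# lands on the FIRST "p<digits>" in the line.
first_p() {
    sed -r 's/[^p]*p([0-9]+).*/\1/'
}

printf '%s\n' abcp1de abcp21de32a abcp21qp54b | first_p
```

This prints 1, 21 and 21. Note that the sketch still relies on the thread's assumption that every p is directly followed by at least one digit.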


All times are GMT -5.