LinuxQuestions.org

metallica1973 03-07-2013 09:23 AM

Regex find first 5-7 occurrences of a set of digits within a string
 
Using these strings as an example:
Code:

<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">

Using a regular expression, how can I extract just the first run of 5-7 digits in a string (wherever it appears in the string) and stop there? In this case I want to print only the "first" set of 5-7 digits in each string, which would give me this output:
Code:

145148
1451486
1451489
45148

and "not" several sets on the same string
Code:

145148 943437
45148 94343

I tried:
Code:

\d{5,7}
but it grabs every occurrence on the same line?

druuna 03-07-2013 09:40 AM

Give this a try:
Code:

sed 's/.*=\([0-9]\{5,7\}\).*/\1/' infile

metallica1973 03-07-2013 09:51 AM

I apologize, I meant using a regular expression on its own. So, using a regular expression, how can I extract just the first run of 5-7 digits in a string (wherever it appears in the string) and stop there? In this case I want to print only the "first" set of 5-7 digits in each string, which would give me this output:
Code:

145148
1451486
1451489
45148


druuna 03-07-2013 09:54 AM

Quote:

Originally Posted by metallica1973 (Post 4906788)
I apologize, I meant using a regular expression on its own. So, using a regular expression, how can I extract just the first run of 5-7 digits in a string (wherever it appears in the string) and stop there? In this case I want to print only the "first" set of 5-7 digits in each string, which would give me this output:

Please give an appropriate example.

The solution I gave does use a regexp: [0-9]{5,7} -> any digit, 5 to 7 times.

metallica1973 03-07-2013 10:04 AM

Many thanks Druuna,

I modified my original post to reflect what I need, but here is a quick summary:

Sample strings:
Code:

<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">

What I want to get out of my regular expression:
Code:

145148
1451486
1451489
45148

and "not" several sets on the same string
Code:

145148 943437
45148 94343

I tried:
Code:

\d{5,7}
but it grabs every occurrence on the same line?

druuna 03-07-2013 10:22 AM

Sorry, I probably don't get it.

Code:

$ cat foobar
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">
$ sed 's/.*=\([0-9]\{5,7\}\).*/\1/' foobar
145148
1451486
1451489
45148

But I guess, reading your replies, that this solution isn't what you are after.
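
For readers following along in Python rather than sed, here is a minimal sketch of the same substitution using the standard "re" module. It assumes the only syntax change needed is dropping sed's basic-regex backslashes, so \( \) \{ \} become ( ) { }. The names TEMPLATE, extract_like_sed and results are made up for this illustration, and the sample lines are rebuilt from the strings posted above.

```python
import re

# The four sample lines from the thread, rebuilt from one template.
TEMPLATE = (
    '<a onclick="doShowCHys=1;ShowWindowN(0,\'/daman/man.php?asv4={num}'
    '&amp;playTogether=True\',960,540,{tail});return false;" title="">'
)
samples = [("145148", "943437"), ("1451486", "94343"),
           ("1451489", "94343"), ("45148", "94343")]
lines = [TEMPLATE.format(num=n, tail=t) for n, t in samples]

# Same idea as sed 's/.*=\([0-9]\{5,7\}\).*/\1/', written in Python syntax:
# grouping parentheses and {m,n} quantifiers are not backslash-escaped.
pattern = re.compile(r".*=([0-9]{5,7}).*")


def extract_like_sed(line):
    # Replace the whole line with capture group 1, as the sed command does.
    return pattern.sub(r"\1", line)


results = [extract_like_sed(line) for line in lines]
print("\n".join(results))
```

Like the sed version, this relies on the digits following an equals sign, and the leading greedy .* settles on the last "=" in the line that is followed by 5-7 digits.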

danielbmartin 03-07-2013 10:49 AM

Quote:

Originally Posted by druuna (Post 4906780)
Give this a try:
Code:

sed 's/.*=\([0-9]\{5,7\}\).*/\1/' infile

The method of druuna (above) works perfectly for me, yet the OP is not satisfied.
Is there a communication failure?

Daniel B. Martin

grail 03-07-2013 10:50 AM

Or maybe grep:
Code:

grep -oP '(?<==)\d{5,7}' file

danielbmartin 03-07-2013 10:58 AM

Quote:

Originally Posted by grail (Post 4906830)
Code:

grep -oP '(?<==)\d{5,7}' file

Concise and correct. Superb! Surpasses my understanding. Please explain.

Daniel B. Martin

metallica1973 03-07-2013 11:17 AM

Sorry, I should have added that I am using Python and the "re" module. When I attempt to use the regular expression:
Code:

.*=\([0-9]\{5,7\}\).*/\1/
using this site to test the expression, it does not find what I want.
http://gskinner.com/RegExr/
Try it.

metallica1973 03-07-2013 11:17 AM

Sorry, I should have added that I am using Python and the "re" module. When I attempt to use the regular expression:
Code:

.*=\([0-9]\{5,7\}\).*/\1/
using this site to test the expression, it does not find what I want. I also tried:
Code:

[0-9]{5,7}
\d{5,7}

and it finds all the occurrences. I see that you used sed; could that be my issue, or is it my ignorance?

http://gskinner.com/RegExr/

Try it.
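
In Python's "re" module, the number of matches you get back depends on which function you call, not only on the pattern. As a minimal sketch using one of the sample lines from this thread (the variable names are made up for this illustration): re.findall returns every non-overlapping match, while re.search stops at the leftmost one.

```python
import re

# First sample line from the thread.
line = (
    '<a onclick="doShowCHys=1;ShowWindowN(0,\'/daman/man.php?asv4=145148'
    '&amp;playTogether=True\',960,540,943437);return false;" title="">'
)

digits = re.compile(r"\d{5,7}")

# findall() collects every non-overlapping run of 5-7 digits on the line.
all_matches = digits.findall(line)

# search() returns only the leftmost match (or None if there is none).
first = digits.search(line)
first_run = first.group(0) if first else None

print(all_matches)  # ['145148', '943437']
print(first_run)    # 145148
```

So the pattern \d{5,7} on its own may already be enough when it is used with re.search on each line.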

grail 03-07-2013 11:43 AM

Hey Daniel ... Pretty simple except for the look behind part really.

(?<==) - This is a lookbehind: it checks that an equals (=) sign sits immediately before the rest of the match, but because we are only looking for it, it is not included in the final output.

The other newish part may be the -P option for grep, which tells it to use Perl-compatible regular expressions.

To OP ... using the site you provided, the following works just fine:
Code:

(?<==)\d{5,7}
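
Python's "re" module also accepts this lookbehind syntax. A minimal sketch that applies it to the four sample lines with re.search (TEMPLATE, lookbehind and ids are names made up for this illustration):

```python
import re

# The four sample lines from the thread, rebuilt from one template.
TEMPLATE = (
    '<a onclick="doShowCHys=1;ShowWindowN(0,\'/daman/man.php?asv4={num}'
    '&amp;playTogether=True\',960,540,{tail});return false;" title="">'
)
samples = [("145148", "943437"), ("1451486", "94343"),
           ("1451489", "94343"), ("45148", "94343")]
lines = [TEMPLATE.format(num=n, tail=t) for n, t in samples]

# (?<==) asserts that "=" sits immediately before the digits
# without making it part of the match.
lookbehind = re.compile(r"(?<==)\d{5,7}")

ids = []
for line in lines:
    match = lookbehind.search(line)
    if match:
        ids.append(match.group(0))

print(ids)  # ['145148', '1451486', '1451489', '45148']
```

The trailing numbers such as 943437 are skipped because they follow a comma, not an equals sign.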

metallica1973 03-07-2013 12:01 PM

Awesome
Code:

(?<==)\d{5,7}
It worked, and many thanks to everyone for enlightening me on the other stuff. Regexes always get the best of me.

danielbmartin 03-07-2013 12:11 PM

Quote:

Originally Posted by grail (Post 4906864)
(?<==) - This says to look in front of the rest of the matching regex and look for an equals (=) sign ...

Thank you for this explanation. Now I see that your solution works for the sample file provided by the OP.

I interpret the problem statement this way:
Extract the first numeric string in each line which is of length 5, 6, or 7.
(No reliance on an equals sign.)

If possible, modify your solution to handle this InFile ...
Code:

this is9the way44the world123456ends35
not 54321 with a444444bang 42 but a9whimper

The desired OutFile is ...
Code:

123456
54321

Daniel B. Martin
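
One possible way to handle this more general reading in Python, offered as a sketch and not as the thread's answer: wrap the digit run in negative lookarounds so it cannot be part of a longer run, and take only the leftmost match with re.search. This assumes that runs longer than 7 digits should be skipped and not truncated, which the problem statement does not settle. The names first_run_re and first_run are made up for this illustration.

```python
import re

# A run of 5-7 digits that is not preceded or followed by another digit.
first_run_re = re.compile(r"(?<!\d)\d{5,7}(?!\d)")


def first_run(line):
    # Return the leftmost qualifying digit run, or None if the line has none.
    match = first_run_re.search(line)
    return match.group(0) if match else None


infile = [
    "this is9the way44the world123456ends35",
    "not 54321 with a444444bang 42 but a9whimper",
]

for line in infile:
    print(first_run(line))
# 123456
# 54321
```

Without the lookarounds, a plain \d{5,7} would instead return the first 7 digits of any longer run.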


All times are GMT -5.