LinuxQuestions.org
Old 03-07-2013, 09:23 AM   #1
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Rep: Reputation: 60
Regex find first 5-7 occurrences of a set of digits within a string


Using these strings as an example:
Code:
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">
Using a regular expression, how can I extract just the first run of 5-7 digits in a string (anywhere in the string) and stop there? In this case I want to print only the "first" set of 5-7 digits in each string, which would give me this output:
Code:
145148
1451486
1451489
45148
and "not" several sets on the same string
Code:
145148 943437
45148 94343
I tried:
Code:
\d{5,7}
but it grabs every occurrence on the same line.

Last edited by metallica1973; 03-07-2013 at 11:12 AM.
 
Old 03-07-2013, 09:40 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Give this a try:
Code:
sed 's/.*=\([0-9]\{5,7\}\).*/\1/' infile
 
2 members found this post helpful.
Old 03-07-2013, 09:51 AM   #3
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
I apologize, I meant using only a regular expression. So, using a regular expression, how can I extract just the first run of 5-7 digits in a string (anywhere in the string) and stop there? In this case I want to print only the "first" set of 5-7 digits in each string, which would give me this output:
Code:
145148
1451486
1451489
45148

Last edited by metallica1973; 03-07-2013 at 10:02 AM.
 
Old 03-07-2013, 09:54 AM   #4
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Quote:
Originally Posted by metallica1973
I apologize, I meant using only a regular expression. So, using a regular expression, how can I extract just the first run of 5-7 digits in a string (anywhere in the string) and stop there? In this case I want to print only the "first" set of 5-7 digits in each string, which would give me this output:
Please give an appropriate example.

The solution I gave does use a regexp: [0-9]{5,7} -> any digit, 5 to 7 times.
 
Old 03-07-2013, 10:04 AM   #5
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
Many thanks Druuna,

I modified my original post to reflect what I need, but here is a quick summary:

Sample Strings
Code:
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">
What I want to get out of my regular expression:
Code:
145148
1451486
1451489
45148
and "not" several sets on the same string
Code:
145148 943437
45148 94343
I tried:
Code:
\d{5,7}
but it grabs every occurrence on the same line.

Last edited by metallica1973; 03-07-2013 at 10:09 AM.
 
Old 03-07-2013, 10:22 AM   #6
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Sorry, I probably don't get it.

Code:
$ cat foobar 
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">
$ sed 's/.*=\([0-9]\{5,7\}\).*/\1/' foobar 
145148
1451486
1451489
45148
But I guess, reading your replies, that this solution isn't what you are after.
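For readers working in Python rather than sed, the same substitution can be sketched with re.sub. This is only an illustration of the sed command above, not something posted in the thread: the names SAMPLE, SED_EQUIVALENT and extract are made up for the example. The main difference is that sed's basic-regex escapes \( \) and \{ \} are written as plain ( ) and { } in Python.

```python
import re

# Sample lines copied from the thread (the file called "foobar" above).
SAMPLE = """\
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451486&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=1451489&amp;playTogether=True',960,540,94343);return false;" title="">
<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">
"""

# Same pattern as the sed command, without the backslashes on ( ) and { }.
SED_EQUIVALENT = re.compile(r".*=([0-9]{5,7}).*")


def extract(line):
    """Replace the whole line with the captured digits, as the sed command does."""
    return SED_EQUIVALENT.sub(r"\1", line)


ids = [extract(line) for line in SAMPLE.splitlines()]
print(ids)  # ['145148', '1451486', '1451489', '45148']
```

Like the sed version, this relies on the wanted number being the only run of 5-7 digits that directly follows an equals sign on each line.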
 
Old 03-07-2013, 10:49 AM   #7
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by druuna
Give this a try:
Code:
sed 's/.*=\([0-9]\{5,7\}\).*/\1/' infile
The method of druuna (above) works perfectly for me, yet the OP is not satisfied.
Is there a communication failure?

Daniel B. Martin
 
Old 03-07-2013, 10:50 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Or maybe grep:
Code:
grep -oP '(?<==)\d{5,7}' file
 
2 members found this post helpful.
Old 03-07-2013, 10:58 AM   #9
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by grail
Code:
grep -oP '(?<==)\d{5,7}' file
Concise and correct. Superb! Surpasses my understanding. Please explain.

Daniel B. Martin
 
Old 03-07-2013, 11:17 AM   #10
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
Sorry, I should have added that I am using Python and the "re" module. When I try the regular expression:
Code:
.*=\([0-9]\{5,7\}\).*/\1/
on this site to test it, it does not find what I want.
http://gskinner.com/RegExr/
Try it.
 
Old 03-07-2013, 11:17 AM   #11
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
Sorry, I should have added that I am using Python and the "re" module. When I try the regular expression:
Code:
.*=\([0-9]\{5,7\}\).*/\1/
on this site to test it, it does not find what I want. I also tried:
Code:
[0-9]{5,7}
\d{5,7}
and they find all the occurrences. I see that you used sed; could that be my issue, or is it my ignorance?

http://gskinner.com/RegExr/

Try it.
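A small Python sketch may help show what is going on here. It is an illustration only, under two assumptions not confirmed in the thread: that the online tester highlights every match (a global search, similar to re.findall), and that the pattern was pasted in sed's syntax unchanged. The variable names are made up for the example. Note also that the trailing /\1/ belongs to sed's s command, not to the pattern, so it is left out below.

```python
import re

# First sample line from the thread.
line = """<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">"""

# findall returns every non-overlapping run of 5-7 digits on the line.
all_runs = re.findall(r"\d{5,7}", line)

# search stops at the first match, which is what the question asks for.
first = re.search(r"\d{5,7}", line)
first_run = first.group() if first else None

# In Python, \( and \{ match literal parentheses and braces, so the
# sed-style pattern looks for the text "=(" and never matches this line.
sed_style = re.search(r".*=\([0-9]\{5,7\}\).*", line)

print(all_runs)   # ['145148', '943437']
print(first_run)  # 145148
print(sed_style)  # None
```

So in Python the plain \d{5,7} pattern is enough when it is used with re.search rather than re.findall, as long as the wanted number is the first run of 5-7 digits on the line.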

Last edited by metallica1973; 03-07-2013 at 11:25 AM.
 
1 members found this post helpful.
Old 03-07-2013, 11:43 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Hey Daniel ... Pretty simple except for the lookbehind part really.

(?<==) - This is a lookbehind: it requires an equals (=) sign immediately before the rest of the match, but because we are only checking for it, it is not included in the final output.

The other newish part may be grep's -P option, which enables Perl-compatible regular expressions.

To OP ... using the site you provided, the following works just fine:
Code:
(?<==)\d{5,7}
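Since the OP is using Python's "re" module, here is a minimal sketch of the same lookbehind pattern there. Python's re supports fixed-width lookbehind, so the pattern can be used unchanged; the names ID_AFTER_EQUALS and first_id are made up for this example.

```python
import re

# First and last sample lines from the thread.
lines = [
    """<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=145148&amp;playTogether=True',960,540,943437);return false;" title="">""",
    """<a onclick="doShowCHys=1;ShowWindowN(0,'/daman/man.php?asv4=45148&amp;playTogether=True',960,540,94343);return false;" title="">""",
]

# (?<==) asserts that an "=" sits just before the digits without consuming it.
ID_AFTER_EQUALS = re.compile(r"(?<==)\d{5,7}")


def first_id(line):
    """Return the first run of 5-7 digits that directly follows '=', or None."""
    match = ID_AFTER_EQUALS.search(line)
    return match.group() if match else None


ids = [first_id(line) for line in lines]
print(ids)  # ['145148', '45148']

# Even a global search finds only one hit per line here, because the
# trailing numbers (943437, 94343) follow a comma rather than an equals sign.
print(ID_AFTER_EQUALS.findall(lines[0]))  # ['145148']
```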
 
1 members found this post helpful.
Old 03-07-2013, 12:01 PM   #14
metallica1973
Senior Member
 
Registered: Feb 2003
Location: Washington D.C
Posts: 2,190

Original Poster
Rep: Reputation: 60
Awesome
Code:
(?<==)\d{5,7}
It worked, and many thanks to everyone for enlightening me on the other stuff. Regexes always get the best of me.
 
Old 03-07-2013, 12:11 PM   #15
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Rep: Reputation: 660
Quote:
Originally Posted by grail
(?<==) - This says to look in front of the rest of the matching regex and look for an equals (=) sign ...
Thank you for this explanation. Now I see that your solution works for the sample file provided by the OP.

I interpret the problem statement this way:
Extract the first numeric string in each line which is of length 5, 6, or 7.
(No reliance on an equals sign.)

If possible, modify your solution to handle this InFile ...
Code:
this is9the way44the world123456ends35 
not 54321 with a444444bang 42 but a9whimper
The desired OutFile is ...
Code:
123456
54321
Daniel B. Martin
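One possible way to handle this generalized case in Python is sketched below. It is not a reply from the thread, and it rests on an assumption the problem statement leaves open: a digit run counts only if it is exactly 5, 6, or 7 digits long, so part of a longer run (8 or more digits) is not accepted. The names RUN and first_run are made up for the example.

```python
import re

# A run of 5-7 digits with no digit directly before or after it,
# so the match cannot be a slice of a longer number.
RUN = re.compile(r"(?<!\d)\d{5,7}(?!\d)")


def first_run(line):
    """Return the first digit run of length 5, 6, or 7 on the line, or None."""
    match = RUN.search(line)
    return match.group() if match else None


# The two InFile lines from the post above.
in_lines = [
    "this is9the way44the world123456ends35 ",
    "not 54321 with a444444bang 42 but a9whimper",
]

out_lines = [first_run(line) for line in in_lines]
print(out_lines)  # ['123456', '54321']
```

Using search rather than findall is what limits the result to the first qualifying run; the second line also contains 444444, which is skipped because 54321 comes first.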
 
  


All times are GMT -5.
