LinuxQuestions.org
Old 03-19-2011, 10:34 PM   #1
ckinninger
LQ Newbie
 
Registered: Mar 2011
Posts: 4

Rep: Reputation: 0
Extract Multiple Strings per Line


Hello Guys,

Not sure if sed, awk or maybe even grep would work at all...

I need to extract all 16 and 17 digit numbers from a multiline text file. They need to stay on their original lines.

Some lines may have multiple string matches

Input file:

d-1 1234567890123456 random_text 2543210987654321 random 3234567890123456 4543210987654321
d-2 1234567890123456 25432109876543210 random_text
d-4 1234567890123456 random 2543210987654321
d-5 12345
d-6 random_text

Output


d-1 1234567890123456 2543210987654321 3234567890123456 4543210987654321
d-2 1234567890123456 25432109876543210
d-4 1234567890123456 2543210987654321
d-5
d-6

The first string on each line of the input file always starts with "d-", and I want to keep that if possible.

Can you guys help me get a command going? I've spent time reading about sed but had no luck. My attempts keep only the first or last match.

Thanks.

Last edited by ckinninger; 03-19-2011 at 10:44 PM.
 
Old 03-19-2011, 10:49 PM   #2
savona
Member
 
Registered: Mar 2011
Location: Bellmawr, NJ
Distribution: Red Hat / Fedora
Posts: 194

Rep: Reputation: 50
You should be able to extract what you need with egrep. I haven't tested this; it's just off the top of my head.


egrep '([0-9]{16})|([0-9]{17})' filename
 
Old 03-19-2011, 10:56 PM   #3
savona
Member
 
Registered: Mar 2011
Location: Bellmawr, NJ
Distribution: Red Hat / Fedora
Posts: 194

Rep: Reputation: 50
Actually now that I think about it the first regular expression would catch both 16 and 17 digit strings. So this should find both:

grep '[0-9]\{16\}' file
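
To illustrate why the unanchored pattern catches both lengths, here is a small sketch. The sample data is made up for the demonstration, and `-w` is a GNU/BSD grep option rather than something from the posts above. Note that the unanchored pattern also matches longer digit runs, and that grep prints whole lines either way.

```shell
# Made-up test data: runs of 15, 16, 17 and 18 consecutive digits.
sample='a 123456789012345
b 1234567890123456
c 12345678901234567
d 123456789012345678'

# Unanchored: any line containing a run of at least 16 digits matches,
# so lines b, c and d are printed in full.
printf '%s\n' "$sample" | grep '[0-9]\{16\}'

# With -w the match must stand alone as a word, so only the lines
# holding a 16 or 17 digit number (b and c) are printed.
printf '%s\n' "$sample" | grep -w '[0-9]\{16,17\}'
```

If 18-digit or longer numbers can appear in the file, the second form is probably the safer starting point.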
 
Old 03-19-2011, 11:14 PM   #4
ckinninger
LQ Newbie
 
Registered: Mar 2011
Posts: 4

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by savona View Post
Actually now that I think about it the first regular expression would catch both 16 and 17 digit strings. So this should find both:

grep '[0-9]\{16\}' file
Thanks for the input...

What that does is find every line that contains a 16-digit run and print the whole original line.

What I am trying to do is keep the "d-123" label and follow it with every matching string from the same line, stripping out the garbage (anything that doesn't match).
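
The difference described here can be sketched as follows. This is only an illustration: the sample line is shortened from the first post, and `-o` is a GNU/BSD grep option that was not suggested in the thread.

```shell
# One sample line, shortened from the input file in the first post.
line='d-1 1234567890123456 random_text 2543210987654321'

# Plain grep: the whole line is printed because it contains a match.
printf '%s\n' "$line" | grep -E '[0-9]{16,17}'

# grep -o: only the matched numbers are printed, one per output line.
# The "d-1" label and the grouping by input line are both lost.
printf '%s\n' "$line" | grep -oE '[0-9]{16,17}'
```

Because `-o` drops the label and splits the matches across lines, a field-by-field tool such as awk is likely a better fit for this job.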
 
Old 03-19-2011, 11:22 PM   #5
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
I think this does what you want. You should definitely review the code and carefully test it before trusting it. I am no expert and there is most likely a better solution to be found.

Code:
$ gawk --re-interval '{ printf "%s ", $1 ; for ( i = 2 ; i <= NF ; i ++ ) { if ( $i ~ /^[0-9]{16,17}$/ ) { printf "%s ", $i } } printf "\n" }' input.txt
d-1 1234567890123456 2543210987654321 3234567890123456 4543210987654321
d-2 1234567890123456 2543210987654321
d-4 1234567890123456 2543210987654321
d-5
d-6
$
HTH

Edit

http://www.gnu.org/software/gawk/man...ode/index.html
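
For readers who want to see the same idea spelled out, here is a commented sketch of the field loop. It is a variation, not the exact command above: the function name `extract_ids` is hypothetical, and the length test replaces the `{16,17}` interval so that it should run on awk versions without interval support (no `--re-interval` needed).

```shell
# Hypothetical helper: keep the first field of each line, then every
# later field that consists of exactly 16 or 17 digits.
extract_ids() {
    awk '{
        out = $1                      # keep the leading "d-..." label
        for (i = 2; i <= NF; i++) {
            n = length($i)
            # the field must be all digits and 16 or 17 characters long
            if ($i ~ /^[0-9]+$/ && (n == 16 || n == 17))
                out = out " " $i
        }
        print out                     # one output line per input line
    }' "$@"
}

# Sample lines taken from the input file in the first post.
printf '%s\n' \
    'd-1 1234567890123456 random_text 2543210987654321 random 3234567890123456 4543210987654321' \
    'd-2 1234567890123456 25432109876543210 random_text' \
    'd-5 12345' |
    extract_ids
```

Like the one-liner, this assumes the numbers are separate whitespace-delimited fields. A number glued to other text, such as `id=1234567890123456`, would not be picked up.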

Last edited by Telengard; 03-19-2011 at 11:25 PM.
 
Old 03-19-2011, 11:29 PM   #6
ckinninger
LQ Newbie
 
Registered: Mar 2011
Posts: 4

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by Telengard View Post
I think this does what you want. You should definitely review the code and carefully test it before trusting it. I am no expert and there is most likely a better solution to be found.

Code:
$ gawk --re-interval '{ printf "%s ", $1 ; for ( i = 2 ; i <= NF ; i ++ ) { if ( $i ~ /^[0-9]{16,17}$/ ) { printf "%s ", $i } } printf "\n" }' input.txt
d-1 1234567890123456 2543210987654321 3234567890123456 4543210987654321
d-2 1234567890123456 2543210987654321
d-4 1234567890123456 2543210987654321
d-5
d-6
$
Sweet! Worked great on my 6 line test file. Now going for the big test... will let you know. Thanks a lot.
 
Old 03-19-2011, 11:50 PM   #7
ckinninger
LQ Newbie
 
Registered: Mar 2011
Posts: 4

Original Poster
Rep: Reputation: 0
Thumbs up

Quote:
Originally Posted by Telengard View Post
Code:
$ gawk --re-interval '{ printf "%s ", $1 ; for ( i = 2 ; i <= NF ; i ++ ) { if ( $i ~ /^[0-9]{16,17}$/ ) { printf "%s ", $i } } printf "\n" }' input.txt
$

Worked great. I really appreciate it. I thought I could handle it on my own, but six hours later I had nothing. Now I can move forward.
 
Old 03-20-2011, 01:38 AM   #8
Telengard
Member
 
Registered: Apr 2007
Location: USA
Distribution: Kubuntu 8.04
Posts: 579
Blog Entries: 8

Rep: Reputation: 147
Cool

Quote:
Originally Posted by ckinninger View Post
I really appreciate it.
I'm very pleased to know you found my post helpful. If this solution is acceptable, then please consider using the thread tools option to mark this thread solved.

Happy Linux-ing
 
  



Tags
awk, field, regular expressions


All times are GMT -5. The time now is 04:53 PM.

Main Menu
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration