07-29-2011, 03:50 AM
|
#1
|
|
Member
Registered: Oct 2002
Location: Ayrshire, Scotland
Distribution: Suse(home) RHEL (Work)
Posts: 262
Rep:
|
Using grep or sed to return a regex match
Hi folks,
I'm working on a script that searches a file for regex matches and stores each distinct match in an array. For instance, if the regex matches an IP address, I want each unique IP address found in the file to end up in the array.
For this reason, I'm trying to find a way, presumably with grep (or sed?), to return only the match itself rather than the full line of text the match was found in.
Is this possible? I could strip the match from the line matched, but this would lead to extra complication with multiple matches on a single line. Am I missing something obvious?
Thanks in advance...
|
|
|
|
|
|
07-29-2011, 04:13 AM
|
#2
|
|
Member
Registered: Jan 2010
Location: Lancashire
Distribution: Slackware Stable
Posts: 527
Rep: 
|
Check whether your implementation of grep supports the "-o" flag.
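Where grep does support it (GNU grep does), a minimal sketch of what "-o" gives you is below. The sample file and its contents are made up for illustration, and the pattern is deliberately loose: it does not check that each octet is in the range 0-255.

```shell
# Hypothetical sample input; any text file would do.
sample=$(mktemp)
printf '%s\n' \
  'junk 1.2.3.4 more junk 2.3.4.5 eol' \
  'again 1.2.3.4 eol' > "$sample"

# -o prints each match on its own line instead of the whole line,
# -E enables extended regex syntax, and sort -u drops duplicates.
matches=$(grep -oE '([0-9]+\.){3}[0-9]+' "$sample" | sort -u)

rm -f "$sample"
printf '%s\n' "$matches"
```

Because every match lands on its own line, several addresses on one input line need no special handling, and the output can be fed straight into a bash/ksh array assignment.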
|
|
|
|
07-29-2011, 07:45 AM
|
#3
|
|
Member
Registered: Oct 2002
Location: Ayrshire, Scotland
Distribution: Suse(home) RHEL (Work)
Posts: 262
Original Poster
Rep:
|
Quote:
Originally Posted by devnull10
Check whether your implementation of grep supports the "-o" flag.
|
Unfortunately not. I need it to run on Linux and Solaris:
svr:user$ grep -o hello email.txt
grep: illegal option -- o
usage: grep [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-[ef]] <expr> [<files...>]
|
|
|
|
07-29-2011, 07:53 AM
|
#4
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,896
|
Using sed:
Code:
array=( $(sed 's/[^0-9]*\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/' file | sort -u) )
Which shell are you using? The array assignment above and the command substitution syntax work in bash/ksh.
|
|
|
1 members found this post helpful.
|
07-29-2011, 07:56 AM
|
#5
|
|
Senior Member
Registered: Jan 2010
Posts: 1,604
|
Hi,
try this:
Code:
sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
I used double quotes so that you can replace 'PATTERN_TO_MATCH' with a variable if you need to. Keep in mind that you will have to escape the dots in an IP address, e.g.
1.2.3.4
must become
1\.2\.3\.4
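A rough sketch of the variable approach, assuming GNU sed (for -r). The variable names and sample lines are hypothetical. It also shows one caveat of this form: the leading `.*` is greedy, so a variable-length pattern can lose its first characters to it.

```shell
ip='1.2.3.4'    # hypothetical literal address to look for

# Escape every dot so it matches a literal "." rather than any character.
pattern=$(printf '%s\n' "$ip" | sed 's/\./\\./g')

# The second input line would only match if the dots were left unescaped.
hit=$(printf '%s\n' 'junk 1.2.3.4 eol' 'junk 1x2y3z4 eol' |
      sed -nr "s/.*($pattern).*/\1/p")

# Caveat: the greedy leading .* swallows the first digit of the address.
clipped=$(printf '%s\n' 'junk 12.34.56.78 eol' |
          sed -nr 's/.*(([0-9]+\.){3}[0-9]+).*/\1/p')

printf '%s\n' "$pattern" "$hit" "$clipped"
```

Here `hit` holds 1.2.3.4, while `clipped` holds 2.34.56.78 rather than 12.34.56.78, so this form is safest with fixed strings like the escaped literal.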
|
|
|
|
07-29-2011, 08:01 AM
|
#6
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,896
|
@crts: good catch on the -n option and the p flag. As for the -r option, it's not available in Solaris' sed (if I remember correctly), so we have to escape the parentheses for them to work as grouping operators. The only thing I cannot solve with this sed approach is the presence of multiple IP addresses on the same line (if any).
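For illustration, a sketch of the same extraction written in basic regex syntax only, so it does not rely on -r. The sample line is made up. Grouping uses `\( \)`, the repeat count uses `\{3\}`, and `[0-9][0-9]*` stands in for `[0-9]+`. It has not been tried on Solaris here.

```shell
line='junk 10.20.30.40 eol'    # hypothetical input line

# Basic regex: escaped parentheses group, escaped braces repeat.
bre=$(printf '%s\n' "$line" |
      sed -n 's/[^0-9]*\(\([0-9][0-9]*\.\)\{3\}[0-9][0-9]*\).*/\1/p')

printf '%s\n' "$bre"
```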
Last edited by colucix; 07-29-2011 at 08:02 AM.
|
|
|
|
07-29-2011, 08:57 AM
|
#7
|
|
Senior Member
Registered: Jan 2010
Posts: 1,604
|
Quote:
Originally Posted by colucix
Regarding the -r option, it's not available on Solaris' sed
|
I did not know that. I also did not consider the possibility of multiple IPs on the same line. Let's say we have this file:
Code:
some junk 1.2.3.4 some more junk with numbers 2.3.4.5 eol
some junk 3.4.5.6 some more junk eol
some junk 4.5.6.7 with equal ip on same line 4.5.6.7 eol
some repetition 1.2.3.4 some more junk with numbers 2.3.4.5 eol
lots of ip 5.6.7.8 in 7.8.9.0 this 8.9.0.11 line 12.34.56.89 eol
(duplicate)lots of ip 5.6.7.8 in 7.8.9.0 this 8.9.0.11 line 12.34.56.89
With GNU sed we can handle it:
Code:
sed -rn 's/[^0-9]*(([0-9]+\.){3}[0-9]+)/\1\n/;T;P;D' file
# or without the -r option
sed -n 's/[^0-9]*\(\([0-9]\+\.\)\{3\}[0-9]\+\)/\1\n/;T;P;D' file
However, I am not sure about the sed capabilities on Solaris. So here is another solution with all GNU extensions disabled:
Code:
sed --posix -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;t a;b;:a P;D' file
I know, it's ugly. With the --posix option it wouldn't even accept the '+' quantifier.
So we finally get something like:
Code:
array=( $(sed --posix -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;t a;b;:a P;D' file | sort -u) )
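Since `--posix` is itself a GNU sed option, a sed without it may reject the command line outright. Below is a sketch of the same logic spelled more conservatively: the newline in the replacement is written as a backslash followed by a real line break, and each label ends at the end of its line. The function name `extract_ips` and the sample data are hypothetical, the array syntax assumes bash or ksh, and none of this has been tried on Solaris.

```shell
# Hypothetical sample data, modelled on the test file above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
some junk 1.2.3.4 some more junk with numbers 2.3.4.5 eol
some junk 4.5.6.7 with equal ip on same line 4.5.6.7 eol
EOF

# Put the first address at the front of the pattern space followed by a
# newline, print it (P), delete it (D), and rerun on the rest of the line.
extract_ips() {
  sed -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\
/
t found
b
:found
P
D' "$1"
}

# Unquoted command substitution is word-split into array elements.
array=( $(extract_ips "$sample" | sort -u) )
rm -f "$sample"

for ip in "${array[@]}"; do
  printf 'found: %s\n' "$ip"
done
```

One caveat applies to this sketch and to the one-liners it is based on: a stray number earlier on the line that is not part of an address (for example `12 foo 1.2.3.4`) is left in front of the match and printed glued to it, so the input is assumed not to contain such numbers.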
|
|
|
2 members found this post helpful.
|
08-02-2011, 02:48 AM
|
#8
|
|
Member
Registered: Oct 2002
Location: Ayrshire, Scotland
Distribution: Suse(home) RHEL (Work)
Posts: 262
Original Poster
Rep:
|
Quote:
Originally Posted by crts
I know, it's ugly. With the --posix option it wouldn't even accept the '+' quantifier.
|
Welcome to my world!
Thanks for the response - very comprehensive. Script now working; I appreciate everyone's time on this.
Davee
|
|
|
|