LinuxQuestions.org

davee 07-29-2011 03:50 AM

Using grep or sed to return a regex match
 
Hi folks,

I'm working on a script that searches a file for regex matches and stores each distinct match in an array. For instance, with a regex for IP addresses, I would want each unique IP address found in the file stored in the array.

For this reason, I'm trying to find a way, presumably with grep (or sed?), to return just the match itself rather than the full line of text it was found in.

Is this possible? I could strip the match from the line matched, but this would lead to extra complication with multiple matches on a single line. Am I missing something obvious?

Thanks in advance...

devnull10 07-29-2011 04:13 AM

Check whether your implementation of grep supports the "-o" flag.
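
For reference, a minimal sketch of the "-o" approach, assuming GNU grep (not every grep has -o). The function name list_unique_ips and the simple dotted-quad regex are illustrative assumptions, not something from this thread; the regex does not check that each number is in the range 0-255.

```shell
# list_unique_ips is a hypothetical helper name.
# Assumes GNU grep: -o prints each match on its own line, so several
# addresses on one input line come out as separate results.
# -E enables extended regex syntax; sort -u drops duplicates.
list_unique_ips() {
    grep -oE '([0-9]+\.){3}[0-9]+' "$@" | sort -u
}
```

In bash or ksh the result could then be captured with something like array=( $(list_unique_ips file) ).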

davee 07-29-2011 07:45 AM

Quote:

Originally Posted by devnull10 (Post 4428313)
Check whether your implementation of grep supports the "-o" flag.

Unfortunately not. I need it to run on Linux and Solaris:

svr:user$ grep -o hello email.txt
grep: illegal option -- o
usage: grep [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-[ef]] <expr> [<files...>]

colucix 07-29-2011 07:53 AM

Using sed:
Code:

array=( $(sed 's/[^0-9]*\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/' file | sort -u) )
Which shell are you using? The array assignment above and the command substitution syntax work in bash/ksh.

crts 07-29-2011 07:56 AM

Hi,

try this:
Code:

sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file
I used double quotes so that you can replace 'PATTERN_TO_MATCH' with a variable if you need to. Keep in mind that you will have to escape the dots in an IP address, e.g.
1.2.3.4

must become
1\.2\.3\.4
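
A minimal sketch of how the escaping and the variable substitution could be combined. The helper names escape_dots and find_literal_ip are hypothetical, and the sed call uses escaped parentheses instead of -r so that it does not rely on that option; only the dot is escaped here, other regex metacharacters are not handled.

```shell
# escape_dots and find_literal_ip are hypothetical helper names.

# Turn each "." into "\." so sed treats it as a literal dot.
escape_dots() {
    printf '%s\n' "$1" | sed 's/\./\\./g'
}

# Print the given address once for every input line that contains it.
# Usage: find_literal_ip ADDRESS [FILE...]
find_literal_ip() {
    pattern=$(escape_dots "$1")
    shift
    # Double quotes let the shell expand $pattern inside the sed script.
    sed -n "s/.*\($pattern\).*/\1/p" "$@"
}
```

Without the escaping step, a pattern such as 1.2.3.4 would also match text like 1a2b3c4, because an unescaped dot matches any character.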

colucix 07-29-2011 08:01 AM

@crts: good catch on the -n option and the /p flag. Regarding the -r option, it's not available on Solaris' sed (if I remember correctly), so we have to escape the parentheses to make them work as grouping. The one thing I cannot solve with this sed approach is multiple IP addresses on the same line (if any).

crts 07-29-2011 08:57 AM

Quote:

Originally Posted by colucix (Post 4428450)
Regarding the -r option, it's not available on Solaris' sed

I did not know that. I also did not consider the possibility of multiple IPs on the same line. Let's say we have this file:
Code:

some junk 1.2.3.4 some more junk with numbers 2.3.4.5 eol
some junk 3.4.5.6 some more junk eol
some junk 4.5.6.7 with equal ip on same line 4.5.6.7 eol
some repetition 1.2.3.4 some more junk with numbers 2.3.4.5 eol
lots of ip 5.6.7.8 in 7.8.9.0 this 8.9.0.11 line 12.34.56.89 eol
(duplicate)lots of ip 5.6.7.8 in 7.8.9.0 this 8.9.0.11 line 12.34.56.89

With GNU sed we can handle it:
Code:

sed -rn 's/[^0-9]*(([0-9]+\.){3}[0-9]+)/\1\n/;T;P;D' file
# or without the -r option
sed -n 's/[^0-9]*\(\([0-9]\+\.\)\{3\}[0-9]\+\)/\1\n/;T;P;D' file
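
The one-liner above packs a small loop into four commands. Here is the same script spread over several lines with comments, as a sketch that assumes GNU sed (-r, the T command and \n in the replacement are GNU features). The wrapper name extract_ips is hypothetical.

```shell
# extract_ips is a hypothetical wrapper name; requires GNU sed.
extract_ips() {
    sed -rn '
        # Drop the leading non-digits, keep the first dotted quad and put
        # a newline after it, so it sits alone on the first buffer line.
        s/[^0-9]*(([0-9]+\.){3}[0-9]+)/\1\n/
        # No substitution made: nothing left to extract, go to next line.
        T
        # Print up to the first newline, i.e. the address just isolated.
        P
        # Delete through that newline and rerun the script on the rest
        # of the line without reading new input.
        D
    ' "$@"
}
```

Because the pattern is not anchored, a stray number before an address on the same line can end up glued to the output, so this is best suited to input like the sample file above.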

However, I am not sure about the sed capabilities on Solaris. So here is another solution that uses GNU sed's --posix switch to disable its extensions, as a rough check of what a stricter sed would accept:
Code:

sed --posix -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;t a;b;:a P;D' file
I know, it's ugly. With the --posix option it wouldn't even accept the '+' quantifier.

So we finally get something like:
Code:

array=( $(sed --posix -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;t a;b;:a P;D' file | sort -u) )
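
If the sed loop turns out not to work on a particular system (--posix is itself a GNU sed switch, and \n in the replacement is not specified by POSIX), a different technique is to split the input into tokens first and filter afterwards. This is a sketch only: the name unique_ips is hypothetical, and it has not been checked against Solaris' own tr and grep.

```shell
# unique_ips is a hypothetical name. Usage: unique_ips FILE
unique_ips() {
    # Replace every run of characters other than digits and dots with a
    # single newline, so each candidate token lands on its own line.
    tr -cs '0-9.' '[\n*]' < "$1" |
        # Keep only tokens that are exactly four dot-separated numbers.
        grep -E '^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$' |
        sort -u
}
```

This handles any number of addresses per line, but an address directly followed by a dot (for example at the end of a sentence) is rejected by the anchored pattern.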

davee 08-02-2011 02:48 AM

Quote:

Originally Posted by crts (Post 4428511)
I know, it's ugly. With the --posix option it wouldn't even accept the '+' quantifier.

Welcome to my world!

Thanks for the response - very comprehensive. Script now working; I appreciate everyone's time on this.
Davee


All times are GMT -5.