LinuxQuestions.org - [SOLVED] Using grep or sed to return a regex match

Using grep or sed to return a regex match

Hi folks,

I'm working on a script that searches a file for regex matches and stores each distinct match in an array. For instance, with a regex for IP addresses, I would want each unique IP address found in the file stored in the array.

So I'm trying to find a way, presumably with grep (or sed?), to return just the match itself rather than the full line of text it was found in.

Is this possible? I could strip the match from the line matched, but this would lead to extra complication with multiple matches on a single line. Am I missing something obvious?

Thanks in advance...

Check whether your implementation of grep supports the "-o" flag.
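
Where "-o" is available (GNU grep, for instance), a rough sketch of the idea could look like the following. The sample lines and the variable names `sample` and `ips` are made up for illustration, and the pattern is a loose one that does not validate octet ranges.

```shell
# Hypothetical sample data written to a temporary file.
sample=$(mktemp)
printf '%s\n' 'junk 1.2.3.4 more junk 2.3.4.5 eol' 'junk 1.2.3.4 eol' > "$sample"

# -o prints each match on its own line instead of the whole matching line.
# -E selects extended regex, so + and {3} need no backslashes.
# sort -u then drops the duplicate addresses.
ips=$(grep -oE '([0-9]+\.){3}[0-9]+' "$sample" | sort -u)

printf '%s\n' "$ips"
rm -f "$sample"
```

Because every match gets its own output line, several addresses on one input line are handled without extra work.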

Quote:

Originally Posted by devnull10 (Post 4428313)

Check whether your implementation of grep supports the "-o" flag.

Unfortunately not. I need it to run on Linux and Solaris:

svr:user$ grep -o hello email.txt
grep: illegal option -- o
usage: grep [-[[AB] ]<num>] [-[CEFGVchilnqsvwx]] [-[ef]] <expr> [<files...>]

Using sed:

Code:

array=( $(sed 's/[^0-9]*\([0-9]\+\.[0-9]\+\.[0-9]\+\.[0-9]\+\).*/\1/' file | sort -u) )

Which shell are you using? The array assignment above and the command substitution syntax work in bash/ksh.

Hi,

try this:

Code:

sed -nr "s/.*(PATTERN_TO_MATCH).*/\1/p" file

I used double quotes so that you can replace 'PATTERN_TO_MATCH' with a variable if you need to. Keep in mind that you will have to escape the dots in an IP address, e.g.
1.2.3.4

must become
1\.2\.3\.4
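
To see why the escaping matters, here is a small sketch assuming GNU sed (for -r). The input line and the variable names `line`, `pattern`, `match` and `loose` are invented for the example.

```shell
# Hypothetical input: one real address and one look-alike string.
line='junk 1.2.3.4 and 1x2y3z4 eol'

# Dots escaped, so each one matches a literal dot only.
pattern='1\.2\.3\.4'

# Double quotes let the shell expand $pattern inside the sed script;
# \1 is left alone because backslash-digit is not special to the shell.
match=$(printf '%s\n' "$line" | sed -nr "s/.*($pattern).*/\1/p")

# Unescaped dots match any character, and the greedy leading .*
# makes sed pick the last candidate on the line.
loose=$(printf '%s\n' "$line" | sed -nr "s/.*(1.2.3.4).*/\1/p")

printf 'escaped:   %s\nunescaped: %s\n' "$match" "$loose"
```

The escaped pattern returns 1.2.3.4, while the unescaped one ends up returning 1x2y3z4.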

@crts: good catch on the -n option and the p flag. Regarding the -r option, it's not available on Solaris' sed (if I remember correctly), so there we have to escape the parentheses for them to work as grouping. The one thing I cannot solve with this sed approach is multiple IP addresses on the same line (if any).
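
For illustration, a version without -r might look like this. It sticks to basic regex syntax, but I have only reasoned about it against GNU sed, so treat the Solaris behaviour as an assumption. The input line and the names `line` and `first` are made up.

```shell
# Hypothetical line holding two addresses.
line='some junk 1.2.3.4 some more junk with numbers 2.3.4.5 eol'

# Basic regex only: \( \) for grouping, \{3\} for repetition,
# and [0-9][0-9]* standing in for the + quantifier.
first=$(printf '%s\n' "$line" |
  sed -n 's/[^0-9]*\(\([0-9][0-9]*\.\)\{3\}[0-9][0-9]*\).*/\1/p')

# Only the first address comes back; 2.3.4.5 is swallowed by the trailing .*
printf '%s\n' "$first"
```

This also shows the limitation mentioned above: one substitution per line yields at most one address per line.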

Quote:

Originally Posted by colucix (Post 4428450)

Regarding the -r option, it's not available on Solaris' sed

I did not know that. I also did not consider the possibility of multiple IPs on the same line. Let's say we have this file:

Code:

some junk 1.2.3.4 some more junk with numbers 2.3.4.5 eol

some junk 3.4.5.6 some more junk eol

some junk 4.5.6.7 with equal ip on same line 4.5.6.7 eol

some repetition 1.2.3.4 some more junk with numbers 2.3.4.5 eol

lots of ip 5.6.7.8 in 7.8.9.0 this 8.9.0.11 line 12.34.56.89 eol

(duplicate)lots of ip 5.6.7.8 in 7.8.9.0 this 8.9.0.11 line 12.34.56.89

With GNU sed we can handle it:

Code:

sed -rn 's/[^0-9]*(([0-9]+\.){3}[0-9]+)/\1\n/;T;P;D' file

# or without the -r option

sed -n 's/[^0-9]*\(\([0-9]\+\.\)\{3\}[0-9]\+\)/\1\n/;T;P;D' file
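
For anyone puzzling over the T;P;D part, here is the first command again with each step annotated. This is only a sketch on made-up input, it assumes GNU sed, and `sample` and `found` are names picked for the example.

```shell
# Hypothetical input: one line with two addresses, one line with none.
sample=$(mktemp)
printf '%s\n' 'lots of ip 5.6.7.8 in 7.8.9.0 this eol' 'no address here' > "$sample"

# s///  replaces leading non-digits plus one address with "address\n"
# T     jumps to the end of the script if nothing was substituted
#       (with -n that means nothing is printed for the leftover text)
# P     prints the pattern space up to the first newline: the address
# D     deletes up to that newline and reruns the script on the rest
#       without reading a new input line
found=$(sed -rn 's/[^0-9]*(([0-9]+\.){3}[0-9]+)/\1\n/;T;P;D' "$sample")

printf '%s\n' "$found"
rm -f "$sample"
```

Each pass peels one address off the front of the line, so the loop ends once the remainder holds no further match.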

However, I am not sure about the sed capabilities on Solaris. So here is another solution with all GNU extensions disabled:

Code:

sed --posix -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;t a;b;:a P;D' file

I know, it's ugly. With the --posix option it wouldn't even accept the '+' quantifier.
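
The "t a;b;:a" sequence is there to stand in for GNU's T command. The sketch below runs both forms on the same made-up input so they can be compared. It assumes GNU sed (T and \n in the replacement), and the script is split across -e options here because standard sed reads a label up to the end of the line. The names `input`, `with_T` and `with_t` are invented.

```shell
# Hypothetical two-line input.
input='ip 5.6.7.8 in 7.8.9.0 line
no address on this line'

# GNU shortcut: T jumps to the end of the script when s/// changed nothing.
with_T=$(printf '%s\n' "$input" |
  sed -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;T;P;D')

# Same flow with standard branching commands:
#   t a  substitution succeeded, so jump to label a
#   b    otherwise jump to the end of the script
#   :a   label; P prints the address, D reruns the script on the rest
with_t=$(printf '%s\n' "$input" |
  sed -n -e 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/' \
         -e 't a' -e 'b' -e ':a' -e 'P' -e 'D')

printf '%s\n' "$with_t"
```

Both variables end up holding the same two addresses, one per line.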

So we finally get something like:

Code:

array=( $(sed --posix -n 's/[^0-9]*\([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)/\1\n/;t a;b;:a P;D' file | sort -u) )
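
Once the matches arrive one per line, filling and using the array is straightforward. The sketch below uses a hard-coded list in place of the sed output so it can be run on its own; it needs bash or ksh, and `matches` and `ip` are names chosen for the example.

```shell
# bash/ksh only: arrays are not part of plain POSIX sh.
# Stand-in for the sed output: one address per line, with repeats.
matches=$(printf '%s\n' 1.2.3.4 2.3.4.5 3.4.5.6 1.2.3.4 2.3.4.5)

# The unquoted $(...) is split on whitespace, one element per address.
# sort -u is used because uniq alone only collapses adjacent duplicates.
array=( $(printf '%s\n' "$matches" | sort -u) )

echo "unique addresses: ${#array[@]}"
for ip in "${array[@]}"; do
  echo "found: $ip"
done
```

The word splitting is safe here only because IP addresses contain no whitespace or glob characters.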

Quote:

Originally Posted by crts (Post 4428511)

I know, it's ugly. With the --posix option it wouldn't even accept the '+' quantifier.

Welcome to my world!

Thanks for the response - very comprehensive. Script now working; I appreciate everyone's time on this.
Davee