printing pattern match and not whole line that matches pattern

Avatar33 · 03-08-2005, 08:28 AM

Hi all.

I've been jumping between the manuals of grep, awk and sed to find a way to print the match of a pattern.
Grep seems able to print the entire line that matches the regular expression, but I want to print only the string that matches the regular expression. I could not find anything in awk or sed manuals.

For example, I have an HTML file that has many links in it. I want to output the location of the links to a plain text file. So I would need to make a regular expression similar to the following:

Code:

href="[^"\r\n]*"

that matches everything between the quotes of the href.
I could output this to a file and then remove the href part.

What tool should I be using to do this?

Thanks in advance.
Avatar

druuna · 03-08-2005, 09:09 AM

Hi,

Something like this maybe:

echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'

$ echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'
xdpyinfo.1.html

The \( , \) and \1 are the key. The \1 stands for, and prints, whatever was matched between the \( and \) in the search string.
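
One caveat worth a quick sketch: .* inside the group is greedy, so if the tag carries another quoted attribute after HREF, the capture runs on to the last double quote on the line. The input line below is made up for illustration (the CLASS attribute is an assumption, not from the original example); a bracket expression such as [^"]* keeps the capture inside the first pair of quotes.

```shell
# Hypothetical input with a second quoted attribute after HREF.
line='<A HREF="xdpyinfo.1.html" CLASS="man">xdpyinfo(1)</A>'

# Greedy: \(.*\) runs on to the last double quote on the line.
greedy=$(printf '%s\n' "$line" | sed 's/.*HREF="\(.*\)".*/\1/')

# Bounded: \([^"]*\) stops at the first closing quote.
bounded=$(printf '%s\n' "$line" | sed 's/.*HREF="\([^"]*\)".*/\1/')

printf 'greedy:  %s\nbounded: %s\n' "$greedy" "$bounded"
```

The bounded form prints only the file name, while the greedy form also drags in the text up to the quote that closes CLASS.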

Hope this helps.

Avatar33 · 03-08-2005, 11:51 AM

That's really cool.

I've gotta get into a sed manual/tutorial one of these days :-)

Thanks
Avatar

95se · 03-08-2005, 03:33 PM

If you're using grep,

grep -o PATTERN

The -o option tells it to output only the matching part of the string. Check out man grep for more info.
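
As a rough sketch of how this could apply to the href question at the top of the thread: the sample HTML line below is made up, and -o is an extension found in GNU and BSD grep rather than something POSIX requires, so check your own grep first. Since -o prints the whole match, a small sed step then trims the href=" prefix and the closing quote.

```shell
# Hypothetical one-line HTML sample with two links.
html='<a href="one.html">1</a> <a href="two.html">2</a>'

# grep -o prints each match on its own line; sed then strips the
# leading href=" and the trailing quote from every match.
links=$(printf '%s\n' "$html" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//')

printf '%s\n' "$links"
```

Because every match lands on its own line, several links on one input line come out as separate lines of output.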

wapcaplet · 03-09-2005, 12:24 AM

Handy one-liners for sed is a nice reference, too. I use it a lot.

iggi · 11-05-2007, 02:45 AM

Hi all,

Quote:

$ echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'
xdpyinfo.1.html

Exactly what I was looking for :-) Only problem: sed also prints every line that doesn't match. Using the -n option suppresses "everything". How can I solve this?

grep -o is nice but doesn't offer the flexibility of using \( \), which allows you to match something bigger but print only part of it.

Thanks in advance!

Dirk

ntubski · 11-05-2007, 12:31 PM

Try

Code:

sed -n 's/.*HREF="\(.*\)".*/\1/p'

pixellany · 11-05-2007, 12:38 PM

My favorite sed and awk tutorials here: http://www.grymoire.com/Unix/

druuna · 11-05-2007, 12:38 PM

Hi,

The sed command used is just a substitute-and-print, and it is indeed applied to every line in the file; lines that don't match are printed unchanged.

It's not entirely clear to me what you want to match and what you do not want to match, but the following example should get you going again:

Code:

$ cat sed.infile
a line
another line
<A HREF="xdpyinfo.0.html">xdpyinfo(0)</A>
<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>
line in the middle
<A HREF="xdpyinfo.2.html">xdpyinfo(2)</A>
<A HREF="xdpyinfo.3.html">xdpyinfo(3)</A>
last line


$ sed -n '/xdpyinfo/s/.*HREF="\(.*\)".*/\1/p' sed.infile 
xdpyinfo.0.html
xdpyinfo.1.html
xdpyinfo.2.html
xdpyinfo.3.html

Hope this helps.

angrybanana · 11-05-2007, 01:46 PM

awk:

Code:

awk -F'"' 'NR>1&&$0=$2' RS='HREF=' file
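
For anyone puzzling over that one-liner, here is a more spelled-out sketch of the same idea. The temporary file is a stand-in for the real input and its contents are trimmed from the sample posted above. Two assumptions to flag: a multi-character RS is an extension (gawk and mawk treat it as a regular expression; POSIX leaves it unspecified), and this version prints $2 unconditionally, whereas the compact form skips records whose second field is empty or 0.

```shell
# Recreate a small sample input in a temporary file (hypothetical path).
infile=$(mktemp)
cat > "$infile" <<'EOF'
a line
<A HREF="xdpyinfo.0.html">xdpyinfo(0)</A>
line in the middle
<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>
EOF

# RS='HREF=' starts a new record after every HREF=, so each record
# from the second one onward begins with the quoted link target.
# With -F'"' that target is field 2 (field 1 is the empty text
# before the opening quote). NR > 1 skips the text before the first link.
links=$(awk -F'"' -v RS='HREF=' 'NR > 1 { print $2 }' "$infile")

printf '%s\n' "$links"
rm -f "$infile"
```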

iggi · 11-06-2007, 03:38 AM

Thanks guys! Problem solved: -n in combination with /p. Will have a look at those tutorials... looking good!

Dirk

loc.nguyen · 05-05-2009, 01:50 PM

Please help with this:

cat aa
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<< <a href=#Say,345> >>World<<

Code:

cat aa | sed -n 's/.*href=#Say,\(.*\)>.*/\1/p'

123> >
345> >

What sed or awk command would give output like this:
123
234
345

If it only works for the simpler input below, that is fine, but the above is preferred.
cat bb
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<<

to:
123
234

Kenhelm · 05-05-2009, 09:28 PM

Using GNU sed

Code:

# Patterns such as [^<]*< limit "greedy matching"
sed -n 's/<a href=#Say,\([^>]*\)>[^<]*<</\1/gp' aa
123
234 345

# Adding 's/< </<\n</g' converts the space into a newline
sed -n 's/< </<\n</g; s/<a href=#Say,\([^>]*\)>[^<]*<</\1/gp' aa
123
234
345
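
If GNU sed is not at hand (the \n in the replacement is a GNU feature), roughly the same result can be had from awk. This is only a sketch: it leans on match(), RSTART and RLENGTH, and the temporary file below just recreates the aa file from the question.

```shell
# Recreate the file aa from the question in a temporary location.
aa=$(mktemp)
cat > "$aa" <<'EOF'
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<< <a href=#Say,345> >>World<<
EOF

# match() sets RSTART and RLENGTH for the leftmost match. The loop
# prints what follows the 10-character prefix "href=#Say," and then
# drops everything up to the end of that match before trying again.
ids=$(awk '{
    while (match($0, /href=#Say,[^>]*/)) {
        print substr($0, RSTART + 10, RLENGTH - 10)
        $0 = substr($0, RSTART + RLENGTH)
    }
}' "$aa")

printf '%s\n' "$ids"
rm -f "$aa"
```

Each id is printed on its own line, no matter how many links share an input line.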

loc.nguyen · 05-06-2009, 06:17 AM

It works, but then I ran into other problems, so I wrote it in awk instead. Thanks