LinuxQuestions.org


Avatar33 03-08-2005 09:28 AM

printing pattern match and not whole line that matches pattern
 
Hi all.

I've been jumping between the manuals of grep, awk and sed to find a way to print the match of a pattern.
Grep seems able to print the entire line that matches the regular expression, but I want to print only the string that matches the regular expression. I could not find anything in awk or sed manuals.

For example, I have an HTML file with many links in it. I want to output the locations of the links to a plain text file, so I would need a regular expression similar to the following:
Code:

href="[^"\r\n]*"
that matches everything between the quotes of the href.
I could output this to a file and then remove the href part.

What tool should I be using to do this?

Thanks in advance.
Avatar

druuna 03-08-2005 10:09 AM

Hi,

Something like this maybe:

$ echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'
xdpyinfo.1.html

The \( , \) and \1 are the key: \1 stands for whatever was matched between \( and \) in the search pattern, so the replacement prints just that part.
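One caveat worth knowing: .* is greedy, so \(.*\)" captures up to the last double quote on the line, not the first. A minimal sketch of the difference, assuming a made-up input line that carries a second quoted attribute (the CLASS attribute is invented for the demo):

```shell
# Hypothetical input line with a second quoted attribute after HREF
line='<A HREF="xdpyinfo.1.html" CLASS="man">xdpyinfo(1)</A>'

# Greedy: \(.*\) runs up to the LAST double quote on the line
greedy=$(printf '%s\n' "$line" | sed 's/.*HREF="\(.*\)".*/\1/')

# Bounded: [^"]* cannot cross a quote, so it stops at the closing one
bounded=$(printf '%s\n' "$line" | sed 's/.*HREF="\([^"]*\)".*/\1/')

printf '%s\n' "$greedy"    # prints: xdpyinfo.1.html" CLASS="man
printf '%s\n' "$bounded"   # prints: xdpyinfo.1.html
```

With only one pair of quotes on the line, as in the example above, both forms give the same result.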

Hope this helps.

Avatar33 03-08-2005 12:51 PM

That's really cool.

I've gotta get into a sed manual/tutorial one of these days :-)

Thanks
Avatar

95se 03-08-2005 04:33 PM

If you're using grep,

grep -o PATTERN

The -o option tells it to output only the matching part of the string. Check out man grep for more info.
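Applied to the href question at the top of the thread, this could look roughly like the sketch below. It assumes a grep that supports -o (GNU grep does), and the sample HTML is made up for the demo. The second step strips the href=" prefix and the closing quote, since -o prints the whole match.

```shell
# Made-up sample input, kept in a variable instead of a file
html='<p><a href="one.html">One</a> and <a href="two.html">Two</a></p>
<p>no link here</p>
<a href="three.html">Three</a>'

# grep -o prints each match on its own line, even several per input line;
# sed then removes the leading href=" and the trailing quote
links=$(printf '%s\n' "$html" | grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//')

printf '%s\n' "$links"
# prints:
# one.html
# two.html
# three.html
```

To write the result to a plain text file, redirect the last command, for example with > links.txt (a hypothetical file name).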

wapcaplet 03-09-2005 01:24 AM

Handy one-liners for sed is a nice reference, too. I use it a lot :)

iggi 11-05-2007 03:45 AM

Hi all,

Quote:

$ echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'
xdpyinfo.1.html
Exactly what I was looking for :-) Only problem: sed also prints every line that does not match. Using the -n option suppresses everything, including the matches. How can I solve this?

grep -o is nice but doesn't offer the flexibility of using \( \) which allows you to match something bigger but print only part of it.

Thanks in advance!

Dirk

ntubski 11-05-2007 01:31 PM

Try
Code:

sed -n 's/.*HREF="\(.*\)".*/\1/p'
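Why this works: -n switches off sed's automatic printing of every line, and the p flag on the s command prints a line only when the substitution actually happened. A small sketch comparing the three variants, assuming a made-up two-line input:

```shell
# Made-up input: one line without a link, one with
input='plain text line
<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>'

# No -n: every line is printed, changed or not
all=$(printf '%s\n' "$input" | sed 's/.*HREF="\(.*\)".*/\1/')

# -n without p: automatic printing is off and nothing requests output
none=$(printf '%s\n' "$input" | sed -n 's/.*HREF="\(.*\)".*/\1/')

# -n with the p flag: only lines where the substitution succeeded
only=$(printf '%s\n' "$input" | sed -n 's/.*HREF="\(.*\)".*/\1/p')

printf '[%s]\n' "$all"    # both lines, the second reduced to xdpyinfo.1.html
printf '[%s]\n' "$none"   # prints: []
printf '[%s]\n' "$only"   # prints: [xdpyinfo.1.html]
```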

pixellany 11-05-2007 01:38 PM

My favorite sed and awk tutorials here: http://www.grymoire.com/Unix/

druuna 11-05-2007 01:38 PM

Hi,

The sed command used is just a search and replace, and it is indeed applied to every line in the file; by default sed prints each line whether or not it was changed.

It's not entirely clear to me what you want to match and what you do not want to match, but the following example should get you going again:
Code:

$ cat sed.infile
a line
another line
<A HREF="xdpyinfo.0.html">xdpyinfo(0)</A>
<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>
line in the middle
<A HREF="xdpyinfo.2.html">xdpyinfo(2)</A>
<A HREF="xdpyinfo.3.html">xdpyinfo(3)</A>
last line


$ sed -n '/xdpyinfo/s/.*HREF="\(.*\)".*/\1/p' sed.infile
xdpyinfo.0.html
xdpyinfo.1.html
xdpyinfo.2.html
xdpyinfo.3.html

Hope this helps.

angrybanana 11-05-2007 02:46 PM

awk:
Code:

awk -F'"' 'NR>1&&$0=$2' RS='HREF=' file
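As far as I can tell, the one-liner above relies on a multi-character RS, which gawk and mawk accept but POSIX leaves unspecified. A possible alternative that sticks to the standard match() and substr() functions is sketched below; the input is made up (two links on one line) and the variable names are only for the demo.

```shell
# Made-up input with two links on the same line
html='a line
<A HREF="xdpyinfo.0.html">xdpyinfo(0)</A> <A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>
last line'

links=$(printf '%s\n' "$html" | awk '
{
    rest = $0
    # match() sets RSTART and RLENGTH for the leftmost match in rest
    while (match(rest, /HREF="[^"]*"/)) {
        # skip the 6 characters of HREF=" and drop the closing quote
        print substr(rest, RSTART + 6, RLENGTH - 7)
        # continue searching after the match just handled
        rest = substr(rest, RSTART + RLENGTH)
    }
}')

printf '%s\n' "$links"
# prints:
# xdpyinfo.0.html
# xdpyinfo.1.html
```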

iggi 11-06-2007 04:38 AM

Thanks guys! Problem solved: -n in combination with /p. Will have a look at those tutorials... looking good!

Dirk

loc.nguyen 05-05-2009 02:50 PM

Similar problems
 
Please help with this:

cat aa
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<< <a href=#Say,345> >>World<<

Code:

cat aa | sed -n 's/.*href=#Say,\(.*\)>.*/\1/p'
123> >
345> >

What sed or awk command would give output like this:
123
234
345

If this works, that is fine too, but the above is preferred.
cat bb
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<<

to:
123
234

Kenhelm 05-05-2009 10:28 PM

Using GNU sed
Code:

# Patterns such as [^<]*< limit "greedy matching"
sed -n 's/<a href=#Say,\([^>]*\)>[^<]*<</\1/gp' aa
123
234 345

# Adding 's/< </<\n</g' converts the space into a newline
sed -n 's/< </<\n</g; s/<a href=#Say,\([^>]*\)>[^<]*<</\1/gp' aa
123
234
345
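
Another way to get one number per line, without \n in the replacement, is to let grep -o split the matches and strip the fixed prefix afterwards. This is only a sketch and assumes a grep that supports -o; the contents of aa are copied into a variable for the demo.

```shell
# Same two lines as the file aa above, kept in a variable for the demo
aa='<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<< <a href=#Say,345> >>World<<'

# [^>]* stops at the first >, so each match is href=#Say,NUMBER;
# grep -o puts every match on its own line, sed drops the prefix
nums=$(printf '%s\n' "$aa" | grep -o 'href=#Say,[^>]*' | sed 's/^href=#Say,//')

printf '%s\n' "$nums"
# prints:
# 123
# 234
# 345
```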


loc.nguyen 05-06-2009 07:17 AM

It works
 
It works, but then I ran into other problems, so I wrote it in awk instead. Thanks.


All times are GMT -5.