LinuxQuestions.org
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

Old 03-08-2005, 08:28 AM   #1
Avatar33
Member
 
Registered: May 2003
Location: South Africa
Distribution: Ubuntu
Posts: 75

Rep: Reputation: 15
printing pattern match and not whole line that matches pattern


Hi all.

I've been jumping between the manuals of grep, awk and sed to find a way to print the match of a pattern.
Grep seems able to print the entire line that matches the regular expression, but I want to print only the string that matches the regular expression. I could not find anything in awk or sed manuals.

For example, I have an HTML file that has many links in it. I want to output the link targets to a plain text file, so I need a regular expression similar to the following:
Code:
href="[^"\r\n]*"
that matches everything between the quotes of the href.
I could output this to a file and then remove the href part.

What tool should I be using to do this?

Thanks in advance.
Avatar
 
Old 03-08-2005, 09:09 AM   #2
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

Something like this maybe:

echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'

$ echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'
xdpyinfo.1.html

The \( , \) and \1 are the key. The \1 stands for whatever was matched between the \( and \) in the search string, so the replacement prints only that part.

Hope this helps.
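One caveat worth sketching here: \(.*\) is greedy, so if the line has another double quote after the HREF value, the capture runs on to the last quote on the line. The sample line below, with its extra TITLE attribute, is made up for illustration and is not from the original post. Using [^"]* instead is one common way to stop at the first closing quote:

```shell
# Made-up sample line with a second quoted attribute after HREF.
line='<A HREF="xdpyinfo.1.html" TITLE="docs">xdpyinfo(1)</A>'

# Greedy \(.*\) captures up to the last double quote on the line.
printf '%s\n' "$line" | sed 's/.*HREF="\(.*\)".*/\1/'
# prints: xdpyinfo.1.html" TITLE="docs

# [^"]* cannot cross a double quote, so the capture ends at the first closing quote.
printf '%s\n' "$line" | sed 's/.*HREF="\([^"]*\)".*/\1/'
# prints: xdpyinfo.1.html
```

Both commands still return only one value per line; if a line holds several links, only the last HREF survives, because the leading .* is greedy as well.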

Last edited by druuna; 03-08-2005 at 12:03 PM.
 
Old 03-08-2005, 11:51 AM   #3
Avatar33
Member
 
Registered: May 2003
Location: South Africa
Distribution: Ubuntu
Posts: 75

Original Poster
Rep: Reputation: 15
That's really cool.

I've gotta get into a sed manual/tutorial one of these days :-)

Thanks
Avatar
 
Old 03-08-2005, 03:33 PM   #4
95se
Member
 
Registered: Apr 2002
Location: Windsor, ON, CA
Distribution: Ubuntu
Posts: 740

Rep: Reputation: 32
If you're using grep:

grep -o PATTERN

The -o option tells it to output only the matching part of the string. Check out man grep for more info.
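As a rough sketch of how this could apply to the href question from the first post: grep -o prints each match on its own line, and a small sed step can then strip the href=" prefix and the closing quote. The function name extract_hrefs and the sample input are made up for illustration. As far as I know, a backslash is literal inside a POSIX bracket expression, so [^"\r\n] would also exclude the letters r and n; since grep reads one line at a time, [^"]* should be enough.

```shell
# Hypothetical helper: print every href value found on stdin, one per line.
extract_hrefs() {
    # grep -o emits each href="..." match on its own line;
    # sed then removes the leading href=" and the trailing quote.
    grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

# Made-up sample input with two links on the same line.
printf '%s\n' '<a href="one.html">1</a> <a href="two.html">2</a>' | extract_hrefs
# prints:
# one.html
# two.html
```

It could be used as extract_hrefs < page.html > links.txt, where both file names are placeholders. Note that -o is available in GNU and BSD grep but is not required by POSIX.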
 
Old 03-09-2005, 12:24 AM   #5
wapcaplet
LQ Guru
 
Registered: Feb 2003
Location: Colorado Springs, CO
Distribution: Gentoo
Posts: 2,018

Rep: Reputation: 48
Handy one-liners for sed is a nice reference, too. I use it a lot.
 
Old 11-05-2007, 02:45 AM   #6
iggi
LQ Newbie
 
Registered: Nov 2007
Posts: 29

Rep: Reputation: 16
Hi all,

Quote:
$ echo '<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>' | sed 's/.*HREF="\(.*\)".*/\1/'
xdpyinfo.1.html
Exactly what I was looking for :-) Only problem: sed prints every line, including the ones that don't match. Using the -n option suppresses everything. How can I solve this?

grep -o is nice but doesn't offer the flexibility of using \( \) which allows you to match something bigger but print only part of it.

Thanks in advance!

Dirk
 
Old 11-05-2007, 12:31 PM   #7
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,781

Rep: Reputation: 2082
Try
Code:
sed -n 's/.*HREF="\(.*\)".*/\1/p'
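To make the effect of -n together with the p flag visible, here is a small side-by-side sketch. The two-line input is made up, and the pattern uses [^"]* rather than .* inside the group, which is my own variation:

```shell
# Made-up input: one line without a link, one line with a link.
input='plain text line
<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>'

# Without -n, sed prints every line, whether or not the substitution matched.
printf '%s\n' "$input" | sed 's/.*HREF="\([^"]*\)".*/\1/'
# prints:
# plain text line
# xdpyinfo.1.html

# -n turns off automatic printing; the p flag prints a line only when
# the substitution on that line succeeded.
printf '%s\n' "$input" | sed -n 's/.*HREF="\([^"]*\)".*/\1/p'
# prints:
# xdpyinfo.1.html
```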
 
Old 11-05-2007, 12:38 PM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
My favorite sed and awk tutorials here: http://www.grymoire.com/Unix/
 
Old 11-05-2007, 12:38 PM   #9
druuna
LQ Veteran
 
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7

Rep: Reputation: 2405
Hi,

The sed command used is just a search and replace, and it is indeed applied to every line in the file.

It's not entirely clear to me what you want to match and what you do not want to match, but the following example should get you going again:
Code:
$ cat sed.infile
a line
another line
<A HREF="xdpyinfo.0.html">xdpyinfo(0)</A>
<A HREF="xdpyinfo.1.html">xdpyinfo(1)</A>
line in the middle
<A HREF="xdpyinfo.2.html">xdpyinfo(2)</A>
<A HREF="xdpyinfo.3.html">xdpyinfo(3)</A>
last line


$ sed -n '/xdpyinfo/s/.*HREF="\(.*\)".*/\1/p' sed.infile 
xdpyinfo.0.html
xdpyinfo.1.html
xdpyinfo.2.html
xdpyinfo.3.html
Hope this helps.
 
Old 11-05-2007, 01:46 PM   #10
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
awk:
Code:
awk -F'"' 'NR>1&&$0=$2' RS='HREF=' file
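Since the one-liner above comes without explanation, here is a more spelled-out sketch of what I believe it does. RS='HREF=' makes awk split the input into records at every HREF=, so each record after the first begins with the quoted URL, and -F'"' makes that URL field 2. The sample input is made up, and a multi-character RS is an extension (supported by gawk and mawk, as far as I know) rather than something POSIX guarantees.

```shell
# Made-up sample: two links on the first line, one on the second.
input='<A HREF="a.html">a</A> <A HREF="b.html">b</A>
<A HREF="c.html">c</A>'

# Record 1 is the text before the first HREF=, so it is skipped with NR > 1.
# Every later record starts with "url"..., which puts the URL in $2.
printf '%s\n' "$input" | awk -F'"' -v RS='HREF=' 'NR > 1 { print $2 }'
# prints:
# a.html
# b.html
# c.html
```

The explicit { print $2 } behaves slightly differently from the $0=$2 trick: the original uses the assigned value as the condition, so a record whose URL is empty (or "0") is not printed, while this version prints it anyway.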
 
Old 11-06-2007, 03:38 AM   #11
iggi
LQ Newbie
 
Registered: Nov 2007
Posts: 29

Rep: Reputation: 16
Thanks guys! Problem solved: -n in combination with /p. Will have a look at those tutorials... looking good!

Dirk
 
Old 05-05-2009, 01:50 PM   #12
loc.nguyen
LQ Newbie
 
Registered: May 2009
Posts: 2

Rep: Reputation: 0
Similar problems

Please help with this:

cat aa
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<< <a href=#Say,345> >>World<<

Code:
cat aa | sed -n 's/.*href=#Say,\(.*\)>.*/\1/p'
123> >
345> >

What sed or awk command would give output like this:
123
234
345

If it only works for the simpler case below, that is fine, but the case above is preferred.
cat bb
<a href=#Say,123> >>Hi<<
<a href=#Say,234> >>Hello<<

to:
123
234

Last edited by loc.nguyen; 05-05-2009 at 01:52 PM.
 
Old 05-05-2009, 09:28 PM   #13
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
Using GNU sed
Code:
# Patterns such as [^<]*< limit "greedy matching"
sed -n 's/<a href=#Say,\([^>]*\)>[^<]*<</\1/gp' aa
123
234 345

# Adding 's/< </<\n</g' converts the space into a newline
sed -n 's/< </<\n</g; s/<a href=#Say,\([^>]*\)>[^<]*<</\1/gp' aa
123
234
345
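An alternative sketch for the same input, combining the grep -o suggestion from earlier in the thread with a small sed step. Because grep -o prints each match on its own line, several links on one line need no special handling. The function name say_ids is made up for illustration:

```shell
# Hypothetical helper: print the value after "href=#Say," for every link on stdin.
say_ids() {
    # grep -o emits each href=#Say,... match (up to, not including, the >)
    # on its own line; sed then drops everything through the first comma.
    grep -o 'href=#Say,[^>]*' | sed 's/^[^,]*,//'
}

# Same two lines as the file aa from the question.
printf '%s\n' '<a href=#Say,123> >>Hi<<' \
    '<a href=#Say,234> >>Hello<< <a href=#Say,345> >>World<<' | say_ids
# prints:
# 123
# 234
# 345
```

With the file from the question this would be run as say_ids < aa.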
 
Old 05-06-2009, 06:17 AM   #14
loc.nguyen
LQ Newbie
 
Registered: May 2009
Posts: 2

Rep: Reputation: 0
It works

It works, but then I ran into other problems, so I wrote it in awk instead. Thanks.
 
  



All times are GMT -5. The time now is 07:50 PM.
