LinuxQuestions.org
Old 07-28-2009, 09:06 AM   #1
yaazz
LQ Newbie
 
Registered: Jul 2009
Posts: 5

Rep: Reputation: 0
SED how to find multiple patterns on a single line


Hi there, I am trying to parse some product titles out of an HTML file that can be obtained at
http://www.moen.com/ecatalog/gallery..._/N-67p?Erp=12

What I am trying to do is grab the titles of the products (such as Fina, Mannerly, etc.) and print them one by one. I didn't think this would be difficult using regular expressions, but the problem is that some of the titles sit in a huge piece of HTML that is all on a single line.

Usually I would just use SED and replace the line with the part that I wanted, but that won't work here, since a lot of extra crap would be printed as well.

My last guess was to use this command:
curl 'http://www.moen.com/ecatalog/gallery/bathroom-faucets-sink/_/N-67p?Erp=12' | sed -n 's/.*target="_top"><span class="producttitle">\([a-zA-Z]*\)<\/span>.*/\1/pg'

The output is as follows:
Mannerly
Fina
Fina

As you can see by looking at the page in a web browser, each of these is the last product on its line.
If you take out the .* on each side of the search part of the regex and go through the output with a fine-tooth comb, you can see that it DOES find and replace every match; I just don't know how to print the matches without all of the extra HTML I don't need.


If this does not make sense, please feel free to tell me and I will make myself clearer.
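
The "last product on each line" result is what a greedy .* produces: the leading .* swallows every earlier title on the line. A minimal sketch of that effect, and of one possible sed-only workaround (put a newline after every closing tag first), follows. The one-line sample in `html` is made up for illustration and is not taken from the real page.

```shell
# Hypothetical one-line sample standing in for the real page.
html='<a target="_top"><span class="producttitle">Fina</span></a><a target="_top"><span class="producttitle">Mannerly</span></a>'

# Greedy .* consumes the earlier titles, so only the last one on the line survives.
last_only=$(printf '%s\n' "$html" |
    sed -n 's/.*<span class="producttitle">\([a-zA-Z]*\)<\/span>.*/\1/p')

# Insert a newline after every </span> first, then extract one title per line.
# The backslash followed by a literal newline is the portable way to write a newline
# in a sed replacement.
all_titles=$(printf '%s\n' "$html" | sed 's/<\/span>/&\
/g' | sed -n 's/.*<span class="producttitle">\([a-zA-Z]*\)<\/span>.*/\1/p')

echo "$last_only"    # prints: Mannerly
echo "$all_titles"   # prints: Fina, then Mannerly, one per line
```

This only works while titles match [a-zA-Z]*; a title containing spaces or digits would need a wider character class.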
 
Old 07-28-2009, 09:47 AM   #2
paulsm4
Guru
 
Registered: Mar 2004
Distribution: SusE 8.2
Posts: 5,863
Blog Entries: 1

Rep: Reputation: Disabled
You might be able to use "sed -e PATTERN1 -e PATTERN2 ...":

http://www.unix.com/shell-programmin...-commands.html
 
Old 07-28-2009, 07:32 PM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Code:
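wget -q -O- 'http://www.moen.com/ecatalog/gallery/bathroom-faucets-sink/_/N-67p?Erp=12' | awk 'BEGIN{RS="</span>"}  # one record per </span> (multi-character RS is not POSIX; gawk and mawk support it)
/producttitle/{
    gsub(/.*producttitle\">/,"")  # drop everything through the opening producttitle tag
    gsub(/<.*/,"")                # drop any markup that follows the title
    if ($0) print                 # skip records that end up empty
}'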
 
Old 07-29-2009, 06:36 AM   #4
yaazz
LQ Newbie
 
Registered: Jul 2009
Posts: 5

Original Poster
Rep: Reputation: 0
Cool, thanks a lot. I've never seen the gsub function before in AWK.

What is it doing exactly?
Is it removing everything up to and including producttitle">, then everything after the title until the next </span>?
 
Old 07-29-2009, 09:17 AM   #5
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Quote:
Originally Posted by yaazz
Cool, thanks a lot. I've never seen the gsub function before in AWK.
Most probably you have been using awk only for simple one-liners.

Quote:
What is it doing exactly?
Is it removing everything up to and including producttitle">, then everything after the title until the next </span>?
Half right. The second gsub removes everything from the first "<" (just after the text you need) to the end of the record.
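
To make the two steps concrete, here is a small sketch that runs the same two gsub calls on a single made-up record. The contents of `record` are an assumption for illustration, shaped like what awk would see after RS="</span>" has cut the input; they are not copied from the real page.

```shell
# Hypothetical record, i.e. the text in front of one </span>.
record='<a target="_top"><span class="producttitle">Fina<br/>leftover'

title=$(printf '%s\n' "$record" | awk '{
    gsub(/.*producttitle">/, "")   # first gsub: strip everything through the opening tag
    gsub(/<.*/, "")                # second gsub: strip from the first remaining < to the end
    if ($0) print                  # print only if something is left
}')

echo "$title"   # prints: Fina
```

After the first gsub the record is `Fina<br/>leftover`; the second one removes `<br/>leftover`, which leaves only the title.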
 
Old 07-29-2009, 09:30 AM   #6
joeBuffer
Member
 
Registered: Jul 2009
Distribution: Ubuntu 9.04
Posts: 325

Rep: Reputation: 42
gsub is global substitution, like s/pattern/replacement/g in sed.
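
As a small illustration of that equivalence, the sketch below runs both on a made-up string and adds awk's sub for contrast, which replaces only the first match. The variable names are chosen here for the example.

```shell
line='a-b-c'

with_sed=$(printf '%s\n' "$line" | sed 's/-/+/g')                    # global replace in sed
with_gsub=$(printf '%s\n' "$line" | awk '{ gsub(/-/, "+"); print }') # same thing in awk
with_sub=$(printf '%s\n' "$line" | awk '{ sub(/-/, "+"); print }')   # first match only

echo "$with_sed $with_gsub $with_sub"   # prints: a+b+c a+b+c a+b-c
```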
 
Old 07-30-2009, 10:03 AM   #7
yaazz
LQ Newbie
 
Registered: Jul 2009
Posts: 5

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by ghostdog74
Most probably you have been using awk only for simple one-liners.

Half right. The second gsub removes everything from the first "<" (just after the text you need) to the end of the record.

Yeah, I'm just realizing how powerful AWK is as of late. I always just used sed because it was a lot easier to learn, but after learning some of AWK's amazing features, I find it so much better. I hear perl is very good for text manipulation as well.
What are other members' favorite programs?
 
Old 07-30-2009, 01:56 PM   #8
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Arch/XFCE
Posts: 17,802

Rep: Reputation: 728
Quote:
Originally Posted by paulsm4
You might be able to use "sed -e PATTERN1 -e PATTERN2 ...":

http://www.unix.com/shell-programmin...-commands.html
Doesn't this look for PATTERN1 OR PATTERN2?

How about this:
Code:
sed -n '/P1/{/P2/{/P3/{/P4/p}}}' filename
Keep nesting as required.
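
A quick sketch of the difference, using a made-up three-line input in place of filename: nested addresses act as AND, while separate -e commands each run on every line. The `p;}` spelling (a semicolon before the closing brace) is used because some non-GNU sed implementations reject `p}`.

```shell
# Hypothetical input standing in for "filename".
input='red apple
green apple
red pear'

# Nested addresses: a line is printed only if it matches both patterns.
both=$(printf '%s\n' "$input" | sed -n '/red/{/apple/p;}')

# Two -e commands: each one is applied independently, so a line that matches
# both patterns is printed twice.
either=$(printf '%s\n' "$input" | sed -n -e '/red/p' -e '/apple/p')

echo "$both"     # prints: red apple
echo "$either"   # prints four lines: red apple (twice), green apple, red pear
```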
Quote:
Originally Posted by yaazz
What are other members' favorite programs?
Would you believe----SED!!
 
Old 07-30-2009, 08:30 PM   #9
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,695
Blog Entries: 5

Rep: Reputation: 240
Quote:
Originally Posted by yaazz
Yeah, I'm just realizing how powerful AWK is as of late. I always just used sed because it was a lot easier to learn,
Use sed for simple substitution. For anything beyond that, use awk (or others).

Quote:
I hear perl is very good for text manipulation as well.
So are Python, Ruby, and others with good string manipulation functions/classes. And it's Perl, not perl.

Last edited by ghostdog74; 07-30-2009 at 08:32 PM.
 
Old 07-31-2009, 04:20 AM   #10
joeBuffer
Member
 
Registered: Jul 2009
Distribution: Ubuntu 9.04
Posts: 325

Rep: Reputation: 42
I haven't been using them or learning about them for all that long, but I like sed and awk, and anything else that I can put to use.
I've learned about and used a decent amount of Perl, but I couldn't give an opinion beyond this: it doesn't make me want to avoid learning sed, awk, or anything else.

By the way, Larry Wall has this to say about Perl, on the Linux Journal website:
Quote:
Another interesting tidbit is that the name “perl” wasn't capitalized at first. UNIX was still very much a lower-case-only OS at the time. In fact, I think you could call it an anti-upper-case OS. It's a bit like the folks who start posting on the Net and affect not to capitalize anything. Eventually, most of them come back to the point where they realize occasional capitalization is useful for efficient communication. In Perl's case, we realized about the time of Perl 4 that it was useful to distinguish between “perl” the program and “Perl” the language. If you find a first edition of the Camel Book, you'll see that the title was Programming perl, with a small “p”. Nowadays, the title is Programming Perl.
Also:
Quote:
A couple of years ago, I ran into someone at a trade show who was representing the NSA (National Security Agency). He mentioned to someone else in passing that he'd written a filter program in Perl, so without telling him who I was, I asked him if I could tell people that the NSA uses Perl. His response was, “Doesn't everyone?” So now I don't tell people the NSA uses Perl. I merely tell people the NSA thinks everyone uses Perl. They should know, after all.

Last edited by joeBuffer; 07-31-2009 at 04:28 AM.
 
  


All times are GMT -5. The time now is 08:37 AM.
