SED how to find multiple patterns on a single line
What I am trying to do is grab the titles of the products (such as fina, mannerly, etc.) and print them one by one. I didn't think this would be difficult using regular expressions, but the problem is that some of the titles are in a huge piece of HTML all stuck on a single line.
Usually I would just use sed and replace the line with only the part that I wanted, but that won't work in this case, since a lot of extra crap would be printed as well.
My last guess was to use this command:
curl 'http://www.moen.com/ecatalog/gallery/bathroom-faucets-sink/_/N-67p?Erp=12' | sed -n 's/.*target="_top"><span class="producttitle">\([a-zA-Z]*\)<\/span>.*/\1/pg'
The output is as follows
As you can see by looking at the page in a web browser, this is the last product on each line.
If you take out the .* on each side of the search part of the regex and go through the output with a fine-tooth comb, you can see that this DOES find and replace every match. It's just that I don't know how to print the matches without all of the extra HTML I don't need.
If this does not make sense, please feel free to tell me and I will make myself clearer.
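The behaviour described above follows from the greedy leading .* in the pattern: it swallows everything up to the last title on the line, so only one title per line survives the substitution. One way around it, as a rough sketch rather than a definitive answer, is to let grep -o print every match on its own line and then strip the tags with sed. Note the assumptions: -o is a GNU/BSD grep option rather than POSIX, the sample HTML line is made up to mimic the page's markup, and extract_titles is just an illustrative name.

```shell
# Hypothetical sample: two product titles on one line, mimicking the page's markup.
html='<a href="/a" target="_top"><span class="producttitle">Fina</span></a><a href="/b" target="_top"><span class="producttitle">Mannerly</span></a>'

# extract_titles is an illustrative helper name, not something from the thread.
extract_titles() {
    # grep -o prints each matching span on its own line (GNU/BSD grep);
    # sed then deletes the surrounding tags, leaving only the title text.
    grep -o '<span class="producttitle">[a-zA-Z]*</span>' |
        sed 's/<[^>]*>//g'
}

printf '%s\n' "$html" | extract_titles
```

With the real page, the printf would be replaced by the curl command from the question, piped into the same function.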
Most probably, you have only been using awk for simple one-liners.
Half right. The second gsub removes everything from "<" (after the text you need to get) to the end.
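The awk code being discussed here is not shown in the thread, so the following is only a sketch in the same spirit, not the poster's actual command. It assumes the same made-up sample line as a stand-in for the page, and extract_titles_awk is a hypothetical name. It splits the line on the opening span tag and then, for each piece, removes everything from the next "<" to the end.

```shell
# Hypothetical sample: two product titles on one line, mimicking the page's markup.
html='<a href="/a" target="_top"><span class="producttitle">Fina</span></a><a href="/b" target="_top"><span class="producttitle">Mannerly</span></a>'

# extract_titles_awk is an illustrative helper name, not code from the thread.
extract_titles_awk() {
    awk '{
        # Split the line on the opening span tag; parts[1] is the text before the first title.
        n = split($0, parts, "<span class=\"producttitle\">")
        for (i = 2; i <= n; i++) {
            sub(/<.*/, "", parts[i])   # drop everything from the next "<" to the end
            print parts[i]
        }
    }'
}

printf '%s\n' "$html" | extract_titles_awk
```

Because the loop walks over every piece, each title on a line is printed, not just the last one.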
Yeah, I'm just realizing how powerful AWK is lately. I always just used sed as it was a lot easier to learn, but after learning some of its amazing features, AWK is so much better. I hear Perl is very good for text manipulation as well.
What are other members' favorite programs?
I haven't been using them or learning about them for all that long, but I like sed and awk, and anything else that I can put to use.
I've learned and used a decent amount of Perl, but the only opinion I can give is that it hasn't made me want to avoid learning sed, awk, or anything else.
By the way, Larry Wall has this to say about Perl, on the Linux Journal website:
Another interesting tidbit is that the name “perl” wasn't capitalized at first. UNIX was still very much a lower-case-only OS at the time. In fact, I think you could call it an anti-upper-case OS. It's a bit like the folks who start posting on the Net and affect not to capitalize anything. Eventually, most of them come back to the point where they realize occasional capitalization is useful for efficient communication. In Perl's case, we realized about the time of Perl 4 that it was useful to distinguish between “perl” the program and “Perl” the language. If you find a first edition of the Camel Book, you'll see that the title was Programming perl, with a small “p”. Nowadays, the title is Programming Perl.
A couple of years ago, I ran into someone at a trade show who was representing the NSA (National Security Agency). He mentioned to someone else in passing that he'd written a filter program in Perl, so without telling him who I was, I asked him if I could tell people that the NSA uses Perl. His response was, “Doesn't everyone?” So now I don't tell people the NSA uses Perl. I merely tell people the NSA thinks everyone uses Perl. They should know, after all.