LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Old 10-15-2007, 11:08 AM   #1
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Rep: Reputation: 0
Extract "itemTitle" from ebay web page


I am trying to extract the itemTitle of an eBay item from an HTML page that I have saved, so that I can use it in another part of a script.

I have come as far as:

cat ebayISAPI.dll\......html | grep class\=\"itemTitle\"\> which gives me the block of text with the 1 instance of the phrase itemTitle that I want to use ....

but I can't seem to get the | sed -n '/itemTitle/,/h1/p' to work at all, despite going almost blind reading the man pages and examples that I have found.

I think I am going along the right lines, but confirmation would be helpful so I can continue my research.

The intention is to then rename the HTML file to the title of the item, which I think I have sussed.

TY
 
Old 10-16-2007, 02:52 AM   #2
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
It is sort of working

A little research has revealed that the sed -n '/itemTitle/,/h1/p' is working, in that it prints the whole line that includes the start expression. So it is doing the same as grep class\=\"itemTitle\"\>. Any pointers to getting just the text between the start and end expressions?

TY
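
An address range such as /itemTitle/,/h1/ selects whole lines, so when both patterns sit on the same line sed simply prints that line. A substitution with a capture group can cut the text out instead. This is only a minimal sketch, assuming the title sits on a single line and contains no '<' character; the function name extract_title_sed is made up for illustration.

```shell
# Hypothetical helper: read HTML on standard input and print the text of a
# non-empty <h1 class="itemTitle"> element (at most one per input line).
extract_title_sed() {
    # -n suppresses the default output; the trailing p prints only the
    # lines on which the substitution succeeded.
    # [^<][^<]* needs at least one character, so an empty
    # <h1 class="itemTitle"></h1> is skipped.
    sed -n 's/.*<h1 class="itemTitle">\([^<][^<]*\)<\/h1>.*/\1/p'
}
```

It would be used as extract_title_sed < saved_page.html, where saved_page.html stands in for whatever the saved file is called.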
 
Old 10-16-2007, 04:10 AM   #3
ghostdog74
Senior Member
 
Registered: Aug 2006
Posts: 2,697
Blog Entries: 5

Rep: Reputation: 244
Give a sample of that HTML page, as well as the things you want to get.
 
Old 10-16-2007, 05:50 AM   #4
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
This is the last part of the output from the grep

....imgsrc="http://pics.ebaystatic.com/aw/pics/globalAssets/ltCurve.gif" width="8" height="8"></td><td></td><td class="titlePadding"><h1 class="itemTitle"></h1></td><td width="100%" class="titlePadding"><h1 class="itemTitle">WW2 RAF Spitfire secret signalling transmitter</h1></td><td align="right" nowrap>

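It is just the bold part (the text inside the second <h1 class="itemTitle">, here "WW2 RAF Spitfire secret signalling transmitter") that I want. It obviously changes with each new file, and I want to use it to rename that file in another script that I found on this web site. Awesome resource, don't you think!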
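
Because the page puts all of this on one long line, a plain grep prints the whole line. One possible way to narrow it down is grep's -o option, which prints only the matched text; it is a GNU/BSD extension rather than POSIX. The sketch below assumes the title contains no '<', and the function name extract_title_grep is made up for illustration.

```shell
# Hypothetical helper: read HTML on standard input and print the text of
# every non-empty <h1 class="itemTitle"> element, one per line.
extract_title_grep() {
    # -o prints only the part of the line that matched the pattern.
    # [^<][^<]* needs at least one character, so the empty element is skipped.
    grep -o '<h1 class="itemTitle">[^<][^<]*</h1>' |
        # Strip the opening and closing tags, leaving just the title text.
        sed -e 's/^<h1 class="itemTitle">//' -e 's/<\/h1>$//'
}
```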
 
Old 11-05-2007, 07:27 AM   #5
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
Nearly there.

So .... awk 'NR>1&&$0=RS$1$2$3' RS="itemTitle\">" filename works a treat and gives the result WW2RAFSpitfire

Adding a $4 adds the next space-delimited word, e.g. WW2RAFSpitfiresecret.

But for the gold-plated version .... what can I do to add all the words up to the </h1>? Or should I cut my losses and go with what I have?

DIMonS
 
Old 11-05-2007, 11:29 AM   #6
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
AWK
Just use '<' as the field separator and grab the first field.
Code:
awk -F'<' 'NR>1&&$0=$1' RS='<h1 class="itemTitle">'
Perl or sed might be better for this.

edit:
Perl
Code:
perl -lne 'print for m{<h1 class="itemTitle">(.*?)</h1>}g'

Last edited by angrybanana; 11-05-2007 at 02:00 PM.
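
To tie this back to the original goal of renaming the saved file after its title, here is a rough sketch. It uses a sed substitution for the extraction rather than the one-liners above, assumes the title is on one line with no '<' in it, and replaces anything outside letters, digits, '.', '_' and '-' with underscores. The function name rename_by_title and the .html suffix are assumptions made for illustration.

```shell
# Hypothetical helper: rename a saved page to "<sanitised title>.html"
# in the same directory. Returns non-zero if no title is found.
rename_by_title() {
    file=$1
    # Take the first non-empty itemTitle found in the file.
    title=$(sed -n 's/.*<h1 class="itemTitle">\([^<][^<]*\)<\/h1>.*/\1/p' "$file" | head -n 1)
    # Leave the file alone if nothing was extracted.
    [ -n "$title" ] || return 1
    # Replace characters that are awkward in file names with underscores.
    safe=$(printf '%s' "$title" | tr -c 'A-Za-z0-9._-' '_')
    # Note: mv will overwrite an existing file of the same name.
    mv -- "$file" "$(dirname -- "$file")/$safe.html"
}
```

Calling rename_by_title on a saved page would then leave a file such as WW2_RAF_Spitfire_secret_signalling_transmitter.html in its place.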
 
Old 11-06-2007, 06:36 AM   #7
DIMonS
LQ Newbie
 
Registered: Aug 2007
Posts: 13

Original Poster
Rep: Reputation: 0
TY V much all.

DIMonS
 
  

