Extract "itemTitle" from eBay web page
I am trying to extract the itemTitle of an eBay item from an HTML page that I have saved, and use it in another part of a script.
I have come as far as: cat ebayISAPI.dll\......html | grep class\=\"itemTitle\"\> which gives me the block of text containing the one instance of itemTitle that I want to use, but I can't get | sed -n '/itemTitle/,/h1/p' to work at all, despite going almost blind reading the man pages and examples that I have found. I think I am going along the right lines, but confirmation would be helpful so I can continue my research. The intention is to then rename the HTML file to the title of the item, which I think I have sussed. TY
It is sort of working
A little research has revealed that sed -n '/itemTitle/,/h1/p' is working, in that it prints the whole line that includes the start expression. So it is doing the same as grep class\=\"itemTitle\"\>. Any pointers on getting just the text between the start and end expressions?
TY
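A note on why the range prints the whole line: sed addresses such as /itemTitle/,/h1/ select whole lines, never parts of a line, and in the saved page both patterns sit on the same line. One way to pull out only the text is a substitution with a capture group. The following is a minimal sketch, not a definitive answer; the temporary sample file is hypothetical and its content is modelled on the grep output quoted further down this thread.

```shell
# Hypothetical sample file standing in for the saved eBay page.
sample=$(mktemp)
printf '%s\n' '<td class="titlePadding"><h1 class="itemTitle"></h1></td><td width="100%" class="titlePadding"><h1 class="itemTitle">WW2 RAF Spitfire secret signalling transmitter</h1></td>' > "$sample"

# s/.../\1/p keeps only the captured group and prints the line if it matched.
# [^<][^<]* requires at least one character, so the empty <h1> is skipped.
title=$(sed -n 's/.*<h1 class="itemTitle">\([^<][^<]*\)<\/h1>.*/\1/p' "$sample")

printf '%s\n' "$title"
rm -f "$sample"
```

This assumes the title contains no "<" and that the opening and closing h1 tags are on one line, as they are in the sample.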
Please give a sample of that HTML page, as well as the text you want to get from it.
This is the last part of the output from the grep:
....imgsrc="http://pics.ebaystatic.com/aw/pics/globalAssets/ltCurve.gif" width="8" height="8"></td><td></td><td class="titlePadding"><h1 class="itemTitle"></h1></td><td width="100%" class="titlePadding"><h1 class="itemTitle">WW2 RAF Spitfire secret signalling transmitter</h1></td><td align="right" nowrap>
It is just the bold part (WW2 RAF Spitfire secret signalling transmitter), which obviously changes with each new file, that I want to be able to use to rename the same file in another script that I found on this web site. Awesome resource, don't you think!
Nearly there.
So .... awk 'NR>1&&$0=RS$1$2$3' RS="itemTitle\">" filename works a treat and gives the result WW2RAFSpitfire. Adding a $4 adds the next space-delimited word, e.g. WW2RAFSpitfiresecret. But for the gold-plated version .... what can I do to add all the words up to the </h1>? Or should I cut my losses and go with what I have?
DIMonS
AWK
Just use '<' as the field separator and grab the first field.
Code:
awk -F'<' 'NR>1&&$0=$1' RS='<h1 class="itemTitle">'
edit: Perl
Code:
perl -lne 'print for m{<h1 class="itemTitle">(.*?)</h1>}g'
TY V much all.
DIMonS