LinuxQuestions.org
Old 11-03-2014, 04:56 AM   #1
shridhar22
Member
 
Registered: Mar 2012
Posts: 42

Rep: Reputation: Disabled
Read and extract table data in HTML from unix


I need to extract rows 48 to 53 (the ones that have "-" in all 5 columns).
This page gets updated daily and I need the names in the second column. Currently they are MSNG SIBN MSRS RASP MTLRP MTLR
http://www.nkcbank.ru/viewCatalog.do?menuKey=254

I used the curl command to get the HTML code but don't know how to extract the data I need. TIA
 
Old 11-03-2014, 05:12 AM   #2
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 8,505

Rep: Reputation: 2434
Looks like you need an HTML/XML parser, for example in Perl.
 
Old 11-03-2014, 03:17 PM   #3
ondoho
Senior Member
 
Registered: Dec 2013
Posts: 4,620

Rep: Reputation: 984
try html-xml-utils
and
xmllint (part of libxml2)
 
Old 11-04-2014, 03:01 AM   #4
shridhar22
Member
 
Registered: Mar 2012
Posts: 42

Original Poster
Rep: Reputation: Disabled
thanks ondoho and pan64, I did install html-xml-utils but still a bit confused how to extract the second column from rows marked with *
 
Old 11-04-2014, 03:13 PM   #5
ondoho
Senior Member
 
Registered: Dec 2013
Posts: 4,620

Rep: Reputation: 984
well, i'm not going to click that link.
so if you want to post some html, explain what you want to extract, and show us what you tried so far, we'll be more than happy to assist.
 
Old 11-04-2014, 04:36 PM   #6
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack,CentOS
Posts: 18,327

Rep: Reputation: 3881
Quote:
Originally Posted by shridhar22 View Post
thanks ondoho and pan64, I did install html-xml-utils but still a bit confused how to extract the second column from rows marked with *
Ok..so why don't you post what you have written, show us a sample of the input data, and what you're wanting as the output data, and we can try to help. But we're not going to write your code for you, or click bank-website links in Russia. Post your code and relevant details.
 
Old 11-05-2014, 01:28 AM   #7
shridhar22
Member
 
Registered: Mar 2012
Posts: 42

Original Poster
Rep: Reputation: Disabled
Okay, sorry if I broke the forum norms. I tried to quote the HTML page source, but it is too long (480122 characters). I need information from the 1st table => the name in the 2nd column => for rows that have all blanks (-).
I used the * special character as a marker (there are 7 * signs in total in the page source) and checked where the name in the second column sits relative to it. I found that if the * is at line number 3, then the word I need is at line number 6, and so on.

I used the following command, which works correctly for me as of now, but I know this logic/regex only survives as long as my word stays 3 lines after the grepped * symbol.

Quote:
curl --silent 'http://www.nkcbank.com/viewCatalog.do?menuKey=254' | awk -v lines=3 '/\*/ {for(i=lines;i;--i)getline; print $0 }' | grep -Eo '\b[[:upper:]][[:upper:]][[:upper:]]+\b'
 
Old 11-05-2014, 06:08 AM   #8
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 8,505

Rep: Reputation: 2434
awk | grep can be combined into one single awk script.
The script you wrote does not check for the 5 occurrences of - (it looks for a * instead), which is not the same thing at all.

Here you can find additional information and tips:
http://stackoverflow.com/questions/1...ble-using-bash
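
For illustration, a rough sketch of the first point: the awk | grep pair from post #7 folded into one awk program. It keeps the same assumption as that post (the name sits 3 lines below a line containing *), so it is just as fragile and still does not check for the five - cells. The function name extract_names is made up for this example.

```shell
# Reads HTML on stdin; for every line containing "*", looks at the line
# 3 lines further down and prints each word of 3+ uppercase letters on it.
extract_names() {
  LC_ALL=C awk -v lines=3 '
    /\*/ {
      # Skip ahead: after this loop, $0 holds the line that many lines below the match.
      for (i = lines; i > 0; --i)
        if ((getline) <= 0) exit
      # Split on anything that is not a word character (what grep treats as \b).
      n = split($0, tok, /[^A-Za-z0-9_]+/)
      for (j = 1; j <= n; ++j)
        if (tok[j] ~ /^[A-Z][A-Z][A-Z]+$/) print tok[j]
    }'
}

# Usage (not run here): curl --silent 'URL' | extract_names
```

Splitting on non-word characters and testing each whole token is meant to stand in for the \b anchors in the grep pattern, so something like ABC_D or ABC1 is still rejected.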
 
Old 11-05-2014, 01:24 PM   #9
ondoho
Senior Member
 
Registered: Dec 2013
Posts: 4,620

Rep: Reputation: 984
shridhar22, please believe me, in the long run you'll be happier using html-xml-utils, which contains commands that parse html - something that you're now trying to re-implement from scratch.
xmllint is actually even better, but harder to use.

it's probably easier to parse by css classes, so instead of looking for "the 1st table", you'd be looking for "a table that has the class xxxx"

you can upload the html code of the whole page somewhere else, so interested helpers can use that to see what you're trying to achieve.

i'm not a good coder, but i once made a weather forecast script that uses above mentioned utilities, if you want you can take a peek here.
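
To make the "parse by class" idea concrete, here is a minimal sketch using xmllint with an XPath expression. Everything about the page layout is an assumption, since the HTML was never posted: the class name is whatever you pass in (xxxx is only the placeholder from above), the empty cells are assumed to contain a literal "-", and the function name dash_row_names is invented for the example.

```shell
# Reads HTML on stdin. Inside tables carrying the given class, prints the
# 2nd cell of every row that has exactly five cells containing only "-".
dash_row_names() {
  # Match the class as a whole word within the class attribute.
  xp="//table[contains(concat(' ', normalize-space(@class), ' '), ' $1 ')]"
  xp="$xp//tr[count(td[normalize-space(.)='-'])=5]/td[2]"
  xmllint --html --xpath "$xp" - 2>/dev/null |
    # xmllint prints the matching <td> elements; put each on its own line,
    # then drop the tags, surrounding whitespace and empty lines.
    awk '{ gsub(/<\/td>/, "\n"); print }' |
    sed -e 's/<[^>]*>//g' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# Usage (not run here): curl --silent 'URL' | dash_row_names xxxx
```

Selecting rows by their content rather than by line offsets means the result should not depend on how the page source happens to be wrapped, which is the weak spot of the awk approach.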
 
  


Tags
curl, html, regexp

