LinuxQuestions.org
Old 07-13-2012, 11:14 AM   #1
zero_maniac
LQ Newbie
 
Registered: Jul 2012
Posts: 2

Rep: Reputation: Disabled
Using Perl to extract values from an HTML file


Hi. I have an HTML file that has a section devoted to countries, and I would like to build a hash that maps each country code to the name of the country. Here is an example of the HTML code:

Code:
<option value="SL">Sierra Leone</option><option value="SG">Singapore</option><option value="SK">Slovakia</option><option value="SI">Slovenia</option><option value="SB">Solomon Islands</option><option value="ZA">South Africa</option><option value="ES">Spain</option>
So, for instance, I'd like to have "SL" => "Sierra Leone" in my hash. How can I extract only these values from the HTML source using Perl?

Thank you for your suggestions.

Last edited by zero_maniac; 07-13-2012 at 11:15 AM. Reason: Spelling errors.
 
Old 07-13-2012, 02:58 PM   #2
kakaka
Member
 
Registered: Sep 2003
Posts: 382

Rep: Reputation: 86
Are you familiar with CPAN? There are various HTML parsing
modules available for Perl.

Or, if you are familiar with regular expressions in Perl,
you can use those instead: locate the select element by its
unique ID, then march through the option elements it
contains.
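
A minimal sketch of the regular-expression approach described above, using only core Perl. The sample markup is shortened from the first post; the surrounding select element, its id "country", and the hash name %country are assumptions made for illustration, since the original post does not show them. Regular expressions like these only hold up while the markup keeps this exact shape.

```perl
use strict;
use warnings;

# Sample options from the first post. The <select> wrapper and its
# id="country" are assumed here; the original post does not show them.
my $html = '<select id="country">'
         . '<option value="SL">Sierra Leone</option>'
         . '<option value="SG">Singapore</option>'
         . '<option value="SK">Slovakia</option>'
         . '</select>';

my %country;

# Step 1: isolate the contents of the select element with the assumed id.
if ( $html =~ m{<select\b[^>]*\bid="country"[^>]*>(.*?)</select>}is ) {
    my $options = $1;

    # Step 2: march through the option elements, capturing the value
    # attribute (the code) and the element text (the name).
    while ( $options =~ m{<option\s+value="([^"]*)"\s*>([^<]*)</option>}gi ) {
        $country{$1} = $2;
    }
}

# Print the resulting code => name pairs, sorted by code.
print "$_ => $country{$_}\n" for sort keys %country;
```

If the option tags carry extra attributes or the names contain nested markup, the second pattern will skip those entries, which is one reason a CPAN HTML parser is the safer choice for anything less regular.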
 
Old 07-14-2012, 10:11 PM   #3
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,395
Blog Entries: 2

Rep: Reputation: 903
I am a really big advocate of using industrial-strength parsers for parsing HTML, but in this case, I think a simple Perl parser can do it. If you have some code that already extracts the part you've posted here, then it should be simple enough to split() on '</option><option value="', and then, for each option section, split again on '">' to extract the key/value pairs.

--- rod.
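
The two-step split() suggested above could look roughly like this. It is a sketch, assuming the options sit back to back with no whitespace between the tags, as in the posted sample; the variable names ($list, %name_for) are made up for illustration. One detail the split alone does not handle: the first piece still starts with '<option value="' and the last still ends with '</option>', so those are trimmed first.

```perl
use strict;
use warnings;

# Shortened copy of the sample from the first post.
my $html = '<option value="SL">Sierra Leone</option>'
         . '<option value="SG">Singapore</option>'
         . '<option value="ES">Spain</option>';

# Trim the opening of the first option and the close of the last one,
# so every remaining piece has the form  CODE">NAME
( my $list = $html ) =~ s{^\s*<option value="}{};
$list =~ s{</option>\s*$}{};

my %name_for;

# First split: on the boundary between two adjacent option elements.
for my $chunk ( split m{\Q</option><option value="\E}, $list ) {

    # Second split: on '">' to separate the code from the name.
    # The limit of 2 keeps a name intact even if it contains '">'.
    my ( $code, $name ) = split /">/, $chunk, 2;
    $name_for{$code} = $name;
}

# Print the resulting code => name pairs, sorted by code.
print "$_ => $name_for{$_}\n" for sort keys %name_for;
```

If the real file puts each option on its own line, the first split pattern will not match and a parser or a more tolerant regular expression would be needed.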
 
  




All times are GMT -5. The time now is 08:52 PM.
