Old 07-13-2012, 11:14 AM   #1
LQ Newbie
Registered: Jul 2012
Posts: 2

Rep: Reputation: Disabled
Using Perl to extract values from an HTML file

Hi. I have an HTML file that has a section devoted to countries, and I would like to make a hash of the country codes and the name of the country. Here is an example of the HTML code:

<option value="SL">Sierra Leone</option><option value="SG">Singapore</option><option value="SK">Slovakia</option><option value="SI">Slovenia</option><option value="SB">Solomon Islands</option><option value="ZA">South Africa</option><option value="ES">Spain</option>
So, for instance, I'd like to have "SL" => "Sierra Leone" in my hash, but how do I extract only these values from the HTML source using Perl?

Thank you for your suggestions.

Last edited by zero_maniac; 07-13-2012 at 11:15 AM. Reason: Spelling errors.
Old 07-13-2012, 02:58 PM   #2
Registered: Sep 2003
Posts: 382

Rep: Reputation: 87
Are you familiar with CPAN? There are various HTML parsing
modules available for Perl.

Or, if you are familiar with regular expressions in Perl,
you can always use those, for example by locating
the unique ID of the select element, then stepping through
the option elements it contains.
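
A minimal sketch of the regular-expression route described above, not a definitive implementation. The select element is not shown in the original post, so the `id="country"` attribute and the helper name `countries_from_select` are assumptions made for illustration; the option markup is copied from the first post.

```perl
use strict;
use warnings;

# Sample markup. The <select id="country"> wrapper is an assumption;
# only the <option> elements appear in the original post.
my $html = '<select id="country">'
         . '<option value="SL">Sierra Leone</option>'
         . '<option value="SG">Singapore</option>'
         . '<option value="ZA">South Africa</option>'
         . '</select>';

# Hypothetical helper: find the select element with the given id, then
# collect code => name pairs from the option elements inside it.
sub countries_from_select {
    my ($markup, $id) = @_;
    my %country;

    # Step 1: isolate the contents of the select element with this id.
    return %country
        unless $markup =~ m{<select\b[^>]*\bid="\Q$id\E"[^>]*>(.*?)</select>}is;
    my $block = $1;

    # Step 2: walk the option elements; $1 is the code, $2 the name.
    while ($block =~ m{<option\b[^>]*\bvalue="([^"]*)"[^>]*>([^<]*)</option>}gi) {
        $country{$1} = $2;
    }
    return %country;
}

my %country = countries_from_select($html, 'country');
print "$_ => $country{$_}\n" for sort keys %country;
```

This only copes with markup as regular as the sample. Attributes quoted differently, or entities such as `&amp;` in a country name, would need extra handling, which is where the CPAN parsing modules become the safer choice.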
Old 07-14-2012, 10:11 PM   #3
LQ 5k Club
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,397
Blog Entries: 2

Rep: Reputation: 908
I am a really big advocate of using industrial-strength parsers for parsing HTML, but in this case, I think a simple Perl parser can do it. If you have some code that already extracts the part you've posted here, then it should be simple enough to split() on '</option><option value="', and then for each option section, split again on '">' to extract the key/value pairs.

--- rod.
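
The two-step split() idea above could look roughly like the sketch below. It assumes the option block has already been extracted into a string exactly as posted, with no whitespace between elements; the helper name `countries_by_split` is made up for illustration.

```perl
use strict;
use warnings;

# The option block as posted in the first message.
my $fragment = '<option value="SL">Sierra Leone</option><option value="SG">Singapore</option><option value="SK">Slovakia</option><option value="SI">Slovenia</option><option value="SB">Solomon Islands</option><option value="ZA">South Africa</option><option value="ES">Spain</option>';

# Hypothetical helper: build a code => name hash using only split().
sub countries_by_split {
    my ($text) = @_;
    my $sep = '</option><option value="';

    # The first chunk would still carry the opening tag and the last
    # chunk the closing tag, so trim both ends before splitting.
    $text =~ s/^\s*<option value="//;
    $text =~ s{</option>\s*$}{};

    my %country;
    for my $chunk (split /\Q$sep\E/, $text) {
        # Each chunk now looks like: SL">Sierra Leone
        my ($code, $name) = split /">/, $chunk, 2;
        next unless defined $name;    # skip anything malformed
        $country{$code} = $name;
    }
    return %country;
}

my %country = countries_by_split($fragment);
print "$_ => $country{$_}\n" for sort keys %country;
```

Because the separator string must match exactly, a newline between options or an extra attribute such as selected="selected" would break this approach; a proper HTML parser is the fallback in that case.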


All times are GMT -5.
