zero_maniac 07-13-2012 12:14 PM

Using Perl to extract values from an HTML file
Hi. I have an HTML file that has a section devoted to countries, and I would like to build a hash that maps each country code to the name of the country. Here is an example of the HTML code:


<option value="SL">Sierra Leone</option><option value="SG">Singapore</option><option value="SK">Slovakia</option><option value="SI">Slovenia</option><option value="SB">Solomon Islands</option><option value="ZA">South Africa</option><option value="ES">Spain</option>
So, for instance, I'd like to have "SL" => "Sierra Leone" in my hash. How can I extract only these values from the HTML source using Perl?

Thank you for your suggestions.

kakaka 07-13-2012 03:58 PM

Are you familiar with CPAN? There are various HTML parsing
modules available for Perl.

Or, if you are comfortable with regular expressions in Perl,
you can use those instead: locate the select element by its
unique ID, then step through the option elements it contains.
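
As a rough sketch of the regular expression route, something like the following might work. It assumes every option looks exactly like the sample in the question (a single double-quoted value attribute and no nested tags), and the names $html and %country are made up for illustration; in practice $html would hold the contents of the file.

```perl
use strict;
use warnings;

# Sample markup copied from the question (shortened).
my $html = '<option value="SL">Sierra Leone</option>'
         . '<option value="SG">Singapore</option>'
         . '<option value="SK">Slovakia</option>';

my %country;

# With /g in a while loop, each iteration picks up the next option element.
# $1 is the value attribute, $2 is the text up to the next tag.
while ($html =~ m{<option\s+value="([^"]*)"\s*>([^<]*)</option>}gi) {
    $country{$1} = $2;
}

# Print the pairs in code order to check the result.
print "$_ => $country{$_}\n" for sort keys %country;
```

If the markup is less regular than this (extra attributes, single quotes, line breaks inside tags), one of the CPAN parsing modules is probably the safer choice.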

theNbomr 07-14-2012 11:11 PM

I am a really big advocate of using industrial-strength parsers for parsing HTML, but in this case, I think a simple hand-rolled Perl parser can do it. If you have some code that already extracts the part you've posted here, then it should be simple enough to split() on '</option><option value="', and then, for each option section, split again on '">' to extract the key/value pairs.
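
A minimal sketch of that two-step split, assuming the option block has already been extracted into a string and is formatted exactly as posted (no whitespace between the tags). The names $options and %country are only illustrative. Note that the first and last pieces still carry the leading '<option value="' and the trailing '</option>', so those get trimmed off.

```perl
use strict;
use warnings;

# Assumed to be the already-extracted option block (shortened sample).
my $options = '<option value="SL">Sierra Leone</option>'
            . '<option value="SG">Singapore</option>'
            . '<option value="ES">Spain</option>';

my %country;

# First split: one piece per option, cut at the boundary between options.
for my $piece (split m{</option><option value="}, $options) {
    # Trim the leftovers on the first and last pieces.
    $piece =~ s{^<option value="}{};
    $piece =~ s{</option>$}{};

    # Second split: code on the left of '">', country name on the right.
    my ($code, $name) = split /">/, $piece, 2;
    $country{$code} = $name;
}

print "$_ => $country{$_}\n" for sort keys %country;
```

The limit of 2 on the second split keeps the name intact even if it happened to contain '">' itself.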

