LinuxQuestions.org

ben1173 10-26-2010 07:22 AM

Writing script to extract appropriate line from a web site using links
 
Hi,
I need to write a script called '~/get_birthrate' which, when invoked with a two-letter country abbreviation (e.g., au, ch, ni), extracts the line containing the country's birth rate from the URL http://www.cia.gov/library/publicati...k/geos/ca.html (where "ca" in "ca.html" should be replaced with the appropriate two-letter abbreviation). The output should look like:

$ get_birthrate au
8.69 births/1,000 population (2007 est.)
$ get_birthrate ch
13.45 births/1,000 population (2007 est.)
$ get_birthrate ni
40.2 births/1,000 population (2007 est.)

Any help would really be appreciated.

Thank you

Expeto 10-26-2010 07:40 AM

Your link redirects.

You need to use a link like "https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html", with https instead of http.

As for your question, here is one way of doing it, though not a very good one:

Code:

[Ax@localhost ~]$ wget https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html
--2010-10-26 15:43:20--  https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html
Resolving www.cia.gov... 198.81.129.125
Connecting to www.cia.gov|198.81.129.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366447 (358K) [text/html]
Saving to: “ca.html”

100%[=====================================>] 366,447    55.9K/s  in 6.1s   

2010-10-26 15:43:28 (59.0 KB/s) - “ca.html” saved [366447/366447]

[Ax@localhost ~]$ grep "births/1,000 population" *.html | cut -c 48-88
10.28 births/1,000 population (2010 est.)
[Ax@localhost ~]$ rm *.html


grail 10-26-2010 08:54 AM

If you give wget the -O option with - as the file name, the page is written to standard output instead of a file, so you can pipe it straight into grep, sed or awk and strip out what you like:
Code:

wget -O- https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html | grep -oE "[^>]*births/1,000 population[^<]*"

mjolnir 10-26-2010 09:09 AM

@grail Nice, I just tried this and it worked like a charm.

Expeto 10-26-2010 10:33 AM

@grail Wow, I didn't know that. A very useful trick.
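
For the script the original question asks for, grail's pipeline could be wrapped roughly as follows. This is only a sketch: the base URL is the https one quoted earlier in the thread, the helper name extract_birthrate and the lowercase two-letter check are my own assumptions, and it will only work for as long as the page contains the "births/1,000 population" text in the same form.

```shell
#!/bin/sh
# Sketch of ~/get_birthrate, built on the wget -O- | grep pipeline from this thread.

# Base URL as quoted earlier in the thread; the site layout may have changed since.
BASE_URL="https://www.cia.gov/library/publications/the-world-factbook/geos"

# extract_birthrate (hypothetical helper name): read HTML on standard input and
# print the first piece of text that contains the birth-rate phrase.
extract_birthrate() {
    grep -oE '[^>]*births/1,000 population[^<]*' | head -n 1
}

# get_birthrate CODE: fetch the page for a two-letter country code and print
# its birth-rate line. Anything other than two lowercase letters is rejected.
get_birthrate() {
    case "$1" in
        [a-z][a-z]) ;;
        *)
            echo "usage: get_birthrate <two-letter country code>" >&2
            return 1
            ;;
    esac
    # -q silences the progress output, -O- writes the page to standard output.
    wget -q -O- "$BASE_URL/$1.html" | extract_birthrate
}
```

To use it as a command, save it as ~/get_birthrate, add a last line reading get_birthrate "$1", and make the file executable with chmod +x. Unlike the cut -c 48-88 approach, this does not depend on the text sitting at fixed column positions, and nothing is saved to disk that has to be removed afterwards.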


All times are GMT -5.