LinuxQuestions.org - Writing script to extract appropriate line from a web site using links


Hi,
I need to write a script called '~/get_birthrate' which, when invoked with a two-letter country abbreviation (e.g. au, ch, ni), extracts the line containing the country's birth rate from the URL http://www.cia.gov/library/publicati...k/geos/ca.html (where "ca" should be replaced with the appropriate two-letter abbreviation). The output should look like:

$ get_birthrate au
8.69 births/1,000 population (2007 est.)
$ get_birthrate ch
13.45 births/1,000 population (2007 est.)
$ get_birthrate ni
40.2 births/1,000 population (2007 est.)

Any help will really be appreciated.

Thank you

Your link redirects. You need to use https instead of http, like this: "https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html"

As for your question, one way of doing it (though not a very good one) is this:

Code:

[Ax@localhost ~]$ wget https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html
--2010-10-26 15:43:20--  https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html
Resolving www.cia.gov... 198.81.129.125
Connecting to www.cia.gov|198.81.129.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366447 (358K) [text/html]
Saving to: “ca.html”

100%[=====================================>] 366,447    55.9K/s  in 6.1s

2010-10-26 15:43:28 (59.0 KB/s) - “ca.html” saved [366447/366447]

[Ax@localhost ~]$ grep "births/1,000 population" *.html | cut -c 48-88
10.28 births/1,000 population (2010 est.)
[Ax@localhost ~]$ rm *.html

If you pass -O- to wget, it writes the page to standard output instead of saving a file, so you can pipe it straight into grep, sed or awk and strip out what you like:

Code:

wget -O- https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html | grep -oE "[^>]*births/1,000 population[^<]*"
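To turn that one-liner into the get_birthrate script the original post asks for, something along these lines might work. It is only a sketch: the names BASE_URL, extract_birthrate and get_birthrate are made up for illustration, and it assumes the page still has the "births/1,000 population" text inside a single HTML element on one line.

```shell
#!/bin/sh
# Sketch of ~/get_birthrate, built on the wget | grep pipeline above.

# Base URL as given earlier in the thread; only the two-letter code varies.
BASE_URL="https://www.cia.gov/library/publications/the-world-factbook/geos"

# Read HTML on stdin and print the first run of text that contains the
# birth-rate phrase, i.e. everything between the enclosing '>' and '<'.
extract_birthrate() {
    grep -oE '[^>]*births/1,000 population[^<]*' | head -n 1
}

# Fetch the page for a two-letter country code and print its birth rate.
get_birthrate() {
    case "$1" in
        [a-z][a-z]) ;;   # accept exactly two lowercase letters
        *) echo "usage: get_birthrate <two-letter country code>" >&2
           return 1 ;;
    esac
    # -q silences wget's progress log, -O- writes the page to stdout.
    wget -q -O- "$BASE_URL/$1.html" | extract_birthrate
}

# Run only when an argument is supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    get_birthrate "$1"
fi
```

Saved as ~/get_birthrate and made executable with chmod +x, it would be called as "get_birthrate au". Keeping the grep in its own function means the pattern can be tried on a saved page without going to the network each time.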

@grail Nice, I just tried this and it worked like a charm.

@grail Wow, I didn't know that. A very useful trick.