LinuxQuestions.org
Old 10-26-2010, 07:22 AM   #1
ben1173
LQ Newbie
 
Registered: Sep 2010
Posts: 8

Rep: Reputation: 0
Writing script to extract appropriate line from a web site using links


Hi,
I need to write a script called '~/get_birthrate' which, when invoked with a two-letter country abbreviation (e.g. au, ch, ni), extracts the line containing that country's birth rate from the URL http://www.cia.gov/library/publicati...k/geos/ca.html (where "ca.html" should be replaced with the page for the given two-letter abbreviation). The output should look like:

$ get_birthrate au
8.69 births/1,000 population (2007 est.)
$ get_birthrate ch
13.45 births/1,000 population (2007 est.)
$ get_birthrate ni
40.2 births/1,000 population (2007 est.)

Any help would be much appreciated.

Thank you
 
Old 10-26-2010, 07:40 AM   #2
Expeto
Member
 
Registered: Sep 2010
Posts: 30

Rep: Reputation: Disabled
Your link redirects.

You need to use a link of this form, with https instead of http: "https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html"

As for your question, here is one way to do it, though not a very good one:

Code:
[Ax@localhost ~]$ wget https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html
--2010-10-26 15:43:20--  https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html
Resolving www.cia.gov... 198.81.129.125
Connecting to www.cia.gov|198.81.129.125|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 366447 (358K) [text/html]
Saving to: “ca.html”

100%[=====================================>] 366,447     55.9K/s   in 6.1s    

2010-10-26 15:43:28 (59.0 KB/s) - “ca.html” saved [366447/366447]

[Ax@localhost ~]$ grep "births/1,000 population" *.html | cut -c 48-88
10.28 births/1,000 population (2010 est.)
[Ax@localhost ~]$ rm *.html
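
Two caveats about the commands above: cut -c 48-88 only works while the text starts at exactly that column, and rm *.html removes every HTML file in the current directory, not just the download. A minimal sketch that avoids both, assuming the value sits directly between a '>' and a '<' on one line (the function names birthrate_from_file and fetch_and_extract are made up for illustration):

```shell
#!/bin/sh
# Print the birth-rate text from a saved HTML file ($1).
# The sed expression keeps only the text between the surrounding tags,
# so it does not depend on fixed column offsets.
birthrate_from_file() {
    sed -n 's/.*>\([^<>]*births\/1,000 population[^<]*\).*/\1/p' "$1"
}

# Download URL ($1) to a temporary file, extract the line, then remove
# only that temporary file (other *.html files are left alone).
fetch_and_extract() {
    tmp=$(mktemp) || return 1
    wget -q -O "$tmp" "$1" && birthrate_from_file "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

It would be called as fetch_and_extract followed by the full page URL; whether the pattern still matches depends on the current page layout.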

Last edited by Expeto; 10-26-2010 at 07:48 AM.
 
Old 10-26-2010, 08:54 AM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,438

Rep: Reputation: 2842
If you use -O with - (that is, wget -O-), the page is written to standard output, so you can pipe it straight into grep, sed or awk and strip out what you like:
Code:
wget -O- https://www.cia.gov/library/publications/the-world-factbook/geos/ca.html | grep -oE "[^>]*births/1,000 population[^<]*"
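
To turn that one-liner into the get_birthrate script from the first post, something like the following might do. It is only a sketch: it assumes the page for each country lives at the same path with the two-letter code swapped in, and that the page layout still matches what the grep pattern expects. The helper name extract_birthrate and the BASE_URL variable are my own, not anything standard.

```shell
#!/bin/sh
# get_birthrate: print the birth-rate line for a two-letter country code.

# Base URL taken from the thread; the site may have changed since.
BASE_URL="https://www.cia.gov/library/publications/the-world-factbook/geos"

# Read HTML on stdin and print only the birth-rate text, without tags.
extract_birthrate() {
    grep -oE '[^>]*births/1,000 population[^<]*'
}

# Fetch the page for country code $1 and extract the birth-rate line.
get_birthrate() {
    case "$1" in
        [a-z][a-z]) ;;   # accept exactly two lowercase letters
        *) echo "usage: get_birthrate <two-letter country code>" >&2
           return 1 ;;
    esac
    # -q silences wget's progress output; -O- writes the page to stdout.
    wget -q -O- "$BASE_URL/$1.html" | extract_birthrate
}

# Only run when a country code is given on the command line.
if [ "$#" -gt 0 ]; then
    get_birthrate "$@"
fi
```

Save it as ~/get_birthrate, make it executable with chmod +x ~/get_birthrate, and call it as ~/get_birthrate ca. Calling it as plain get_birthrate, as in the first post, also requires the script's directory to be in your PATH.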
 
Old 10-26-2010, 09:09 AM   #4
mjolnir
Member
 
Registered: Apr 2003
Posts: 748

Rep: Reputation: 82
@grail Nice, I just tried this and it worked like a charm.
 
Old 10-26-2010, 10:33 AM   #5
Expeto
Member
 
Registered: Sep 2010
Posts: 30

Rep: Reputation: Disabled
@grail Wow, I didn't know that. A very useful trick.
 
  


All times are GMT -5.