LinuxQuestions.org
Old 08-13-2010, 03:11 PM   #1
PradeepKr
LQ Newbie
 
Registered: Aug 2010
Posts: 6

Rep: Reputation: 0
Download a webpage using unix


How can I download a webpage using Unix and then parse through its content to extract a particular portion (like a header or the title)?
 
Old 08-13-2010, 03:44 PM   #2
MS3FGX
LQ Guru
 
Registered: Jan 2004
Location: NJ, USA
Distribution: Slackware, Debian
Posts: 5,852

Rep: Reputation: 361
You can download web pages with wget, and then parse the plaintext files with tools like grep, sed, and awk. We could probably give a bit more specific direction if you explained exactly what you wanted to do.
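As a rough sketch of that approach, the function below pulls the page title out of HTML read on standard input, using only tr, grep, sed, and head. The name extract_title is made up for illustration, and the pattern assumes a plain <title> element with no nested tags; regex matching on HTML is fragile, so treat it as a starting point rather than a real parser. It also relies on grep -o, which GNU and BSD grep provide.

```sh
# extract_title: read HTML on stdin, print the text of the first <title>.
# (Hypothetical helper name; assumes the title contains no nested tags.)
extract_title() {
    # 1. Join all lines so a title wrapped across lines still matches.
    # 2. Keep only the <title>...</title> element, ignoring case.
    # 3. Take the first match, strip the tags, and trim outer spaces.
    tr '\n' ' ' |
    grep -o -i '<title[^>]*>[^<]*</title>' |
    head -n 1 |
    sed -e 's/<[^>]*>//g' -e 's/^ *//' -e 's/ *$//'
}
```

It could be fed straight from wget, for example: wget -q -O - http://www.example.com | extract_title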
 
Old 08-13-2010, 06:22 PM   #3
dreamwalking
Member
 
Registered: Dec 2005
Distribution: Slackware 14
Posts: 106

Rep: Reputation: 31
Quote:
wget --wait=20 --limit-rate=20K -c -r -p http://www.example.com
--wait=20 pauses for 20 seconds between downloads, --limit-rate=20K caps the download rate at 20 KB/s, -c lets you resume a download if it is interrupted, -p gets everything needed to display the page (images, etc.), and -r downloads recursively.
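The command above mirrors a whole site. If only one page is needed for parsing, a smaller variant is to have wget write that single document to standard output so it can be piped into other tools. This is a minimal sketch; fetch_page is a hypothetical wrapper name, not something wget provides.

```sh
# fetch_page URL: write the HTML of a single page to standard output.
# (Hypothetical wrapper name, used here only for illustration.)
fetch_page() {
    # -q   suppresses wget's progress messages
    # -O - writes the downloaded document to stdout instead of a file
    wget -q -O - "$1"
}
```

For example, fetch_page http://www.example.com | grep -i '<title' would print the lines containing the title tag.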
 
Old 08-14-2010, 01:40 PM   #4
PradeepKr
LQ Newbie
 
Registered: Aug 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by MS3FGX View Post
You can download web pages with wget, and then parse the plaintext files with tools like grep, sed, and awk. We could probably give a bit more specific direction if you explained exactly what you wanted to do.

The requirement is this:
I already have a script written in PHP which can download a web page and parse through it, but it runs sluggishly.
I want a fast script, maybe a shell script, for extracting the links from a webpage.
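A shell-only link extractor along those lines might look like the sketch below. It reads HTML on standard input and prints one href value per line. The function name extract_links is hypothetical, and the patterns make assumptions: hrefs are quoted, written as href= with no spaces around the equals sign, and the tag and value do not contain a stray > or quote character. Unquoted attributes and other unusual markup will be missed, and whether this beats the PHP script on speed would need to be measured.

```sh
# extract_links: read HTML on stdin, print the href of each <a> tag.
# (Hypothetical helper name; handles quoted href="..." or href='...' only.)
extract_links() {
    # 1. Join lines so tags split across lines still match.
    # 2. Keep only the opening <a ...> tags, ignoring case.
    # 3. Keep only the href="..." part of each tag.
    # 4. Strip the leading href=" and the trailing quote.
    tr '\n' ' ' |
    grep -o -i '<a[[:space:]][^>]*>' |
    grep -o -i "href=[\"'][^\"']*[\"']" |
    sed -e "s/^[^\"']*[\"']//" -e "s/[\"']\$//"
}
```

For example: wget -q -O - http://www.example.com | extract_links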
 
Old 08-14-2010, 04:16 PM   #5
btmiller
Senior Member
 
Registered: May 2004
Location: In the DC 'burbs
Distribution: Arch, Scientific Linux, Debian, Ubuntu
Posts: 4,290

Rep: Reputation: 378
What languages do you know? I hear the HTML::Parse module of Perl is quite good and easy to use, but I switched from Perl to Python (and stopped doing HTML parsing regularly) a couple of years ago. If you're trying to extract some sort of data from HTML pages, it's possible that the author of those pages already put the data in some more easily accessible form (e.g. XML) or provides an API to access it. You might check and see if that is the case.
 
Old 08-15-2010, 10:53 AM   #6
PradeepKr
LQ Newbie
 
Registered: Aug 2010
Posts: 6

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by btmiller View Post
What languages do you know? I hear the HTML::Parse module of Perl is quite good and easy to use, but I switched from Perl to Python (and stopped doing HTML parsing regularly) a couple of years ago. If you're trying to extract some sort of data from HTML pages, it's possible that the author of those pages already put the data in some more easily accessible form (e.g. XML) or provides an API to access it. You might check and see if that is the case.

I know Perl, PHP, basic UNIX, etc.
No, the author of the webpages cannot provide an API. I need to work from the raw HTML only.
Using Perl would be the same as using PHP (which I am already doing).

I need superfast HTML parsing.
 
  

