Old 05-04-2006, 03:19 PM   #1
powah
Member
 
Registered: Mar 2005
Distribution: FC, Gentoo
Posts: 276

Rep: Reputation: 30
wget fails to download pdf files


I want to download all the pdf files from the website http://www.advancedlinuxprogramming.com/alp-folder

There are about 20 pdf files, so I want to use wget to download them all.
However, I cannot figure out the correct way to do it.
I tried these commands, but they all failed:
$ wget -r -l1 --no-parent -A.pdf http://www.advancedlinuxprogramming.com/alp-folder

$ wget -r --no-parent -A.pdf http://www.advancedlinuxprogramming.com/alp-folder

$ wget --convert-links -r -A pdf http://www.advancedlinuxprogramming.com/alp-folder/

$ wget --convert-links -r -A "*.pdf" http://www.advancedlinuxprogramming.com/alp-folder/

$ wget --version
GNU Wget 1.9+cvs-stable (Red Hat modified)

Copyright (C) 2003 Free Software Foundation, Inc.

I am using FC3 Linux.
 
Old 05-04-2006, 04:03 PM   #2
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
The robots.txt file doesn't allow it.

You could save that webpage in your browser and extract the location of each listed pdf file from the .html file you saved (try sed for this). Then you could use curl -O in a "for" loop to download each file in your list.
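
A minimal sketch of that approach (assuming you saved the listing as alp-folder.html, the hrefs are relative links, and each pdf link sits on its own line; the file name and sed pattern are just guesses):

Code:
# pull the *.pdf hrefs out of the saved page, then fetch each one with curl
for f in $(sed -n 's/.*href="\([^"]*\.pdf\)".*/\1/p' alp-folder.html); do
    curl -O "http://www.advancedlinuxprogramming.com/alp-folder/$f"
done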
 
Old 05-04-2006, 04:38 PM   #3
powah
Member
 
Registered: Mar 2005
Distribution: FC, Gentoo
Posts: 276

Original Poster
Rep: Reputation: 30
I discovered that "wget -e robots=off" makes wget ignore the robots.txt file,
i.e. this will download all the pdf files:
wget --convert-links -r -A "*.pdf" -e robots=off http://www.advancedlinuxprogramming.com/alp-folder/

Problem solved.
Thanks!
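
For completeness, a fuller variant along the same lines (the -nd and --no-parent flags are optional additions, not part of what I actually ran, to avoid recreating the site's directory tree locally and to stay inside alp-folder):

Code:
# -e robots=off: ignore robots.txt; -nd: no local directory tree; --no-parent: don't ascend above alp-folder
wget -r -nd --no-parent -A "*.pdf" -e robots=off http://www.advancedlinuxprogramming.com/alp-folder/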

Quote:
Originally Posted by jschiwal
The robots.txt file doesn't allow it.

You could save that webpage in your browser and extract the location of each listed pdf file from the .html file you saved (try sed for this). Then you could use curl -O in a "for" loop to download each file in your list.
 
  

