wget fails to download pdf files
I want to download all the pdf files at the web site http://www.advancedlinuxprogramming.com/alp-folder
There are about 20 pdf files there, so I want to use wget to download them all. However, I can't figure out the correct way to do it. I tried all of these, but they all failed:

$ wget -r -l1 --no-parent -A.pdf http://www.advancedlinuxprogramming.com/alp-folder
$ wget -r --no-parent -A.pdf http://www.advancedlinuxprogramming.com/alp-folder
$ wget --convert-links -r -A pdf http://www.advancedlinuxprogramming.com/alp-folder/
$ wget --convert-links -r -A "*.pdf" http://www.advancedlinuxprogramming.com/alp-folder/

$ wget --version
GNU Wget 1.9+cvs-stable (Red Hat modified)
Copyright (C) 2003 Free Software Foundation, Inc.

I use FC3 Linux.
The site's robots.txt file doesn't allow it. wget honors robots.txt during recursive downloads, so it silently skips the files the site tells robots to stay away from.
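You can inspect the rules yourself; a robots.txt always sits at the site root, so something like this should print it:

$ wget -q -O - http://www.advancedlinuxprogramming.com/robots.txt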
You could save that web page in your browser and extract the location of each listed pdf file from the .html file you saved (try sed for this). Then you could use curl -O in a "for" loop to download each file in your list; see the sketch below.
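A minimal sketch of that approach, assuming you saved the page as alp-folder.html and that the pdf links are relative hrefs (both are assumptions; adjust the filename and URL prefix to match what you actually saved):

# pull out every href that ends in .pdf, then fetch each one with curl
for f in $(sed -n 's/.*href="\([^"]*\.pdf\)".*/\1/p' alp-folder.html); do
    curl -O "http://www.advancedlinuxprogramming.com/alp-folder/$f"
done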
I discovered that "wget -erobots=off" will make wget ignore the robots.txt file,
i.e. this will download all the pdf files:

$ wget --convert-links -r -A "*.pdf" -erobots=off http://www.advancedlinuxprogramming.com/alp-folder/

(The -e option executes robots=off as if it were part of your .wgetrc, which turns off wget's robot-exclusion handling.) Problem is solved. Thanks!