wget multiple downloads problem
Hello (first post here :o)
So I was trying to download all the lectures from my concurrent programming class (mpla-server.mpla.com/courses/CE123) with wget and it failed (it only downloaded an index.html). It's weird because I managed to do it on another class page, e.g. (mpla-server.mpla.com/CE124/lectures). I should point out that the latter had both /lectures.php and a lectures/ directory listing all the pdf files. The first page has hrefs to the pdf pages, but when I run wget recursively it doesn't find any pdfs. Thanks in advance. Sorry if this has already been answered. |
Hi,
Are you saying that the page you are interested in has links to pdf files, but you cannot download them with wget? Try this: Code:
lynx --dump <website> | awk '/http/{print $2}' | grep '\.pdf$' > output.txt
Then you can try: Code:
while read -r url; do wget "$url"; done < output.txt |
Quote:
The question is why wget can't get the PDFs from the first page. |
Not sure why,
but maybe because the first page only links to pages that contain the pdfs (not to the actual files), and wget is not configured to follow the links that far. The second page has lectures.php, which is probably what a browser gets when you hit it, but the directory also holds the actual pdf files, so a recursive wget can pick them up directly. |
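If that's the case, one workaround is to mine the index.html that wget did fetch for pdf hrefs and feed them back to wget one at a time. A minimal sketch (the sample html below is a stand-in for the real index.html, not taken from the course page):

```shell
# Hypothetical: extract pdf hrefs from the already-downloaded page.
# The sample markup here stands in for the real index.html.
html='<a href="lec1.pdf">1</a> <a href="notes.html">n</a> <a href="lec2.pdf">2</a>'
echo "$html" |
  grep -oE 'href="[^"]+\.pdf"' |   # keep only hrefs ending in .pdf
  sed 's/href="//; s/"$//'         # strip the href="..." wrapper
# each printed line can then be fetched with: wget "$line"
```

This prints lec1.pdf and lec2.pdf, one per line, ready for a download loop.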
Many sites check your browser's user-agent string and/or use cookies to block mass-downloading programs, and in such cases they often return a simple index.html instead of the desired file. It's possible to spoof these things, but doing so can be complex and site-specific.
You can start with the -U option, which at least makes wget identify itself as another browser. |
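For example, a dry-run sketch of such an invocation (the user-agent string and URL are examples, not taken from the thread):

```shell
# Build a wget command that masquerades as a desktop browser (-U) and
# keeps session cookies, then echo it as a dry run; remove the echo to
# actually download. The UA string and URL here are only examples.
ua='Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0'
cmd=(wget -U "$ua"
     --keep-session-cookies --save-cookies cookies.txt
     -r -l 1 -np -A '*.pdf'
     http://mpla-server.mpla.com/courses/CE123/)
echo "${cmd[@]}"
```

Echoing the assembled command first lets you sanity-check the flags before hammering the server with a recursive fetch.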