LinuxQuestions.org
Old 10-31-2012, 02:27 PM   #1
agtzim
LQ Newbie
 
Registered: Oct 2012
Posts: 2

Rep: Reputation: Disabled
wget multiple downloads problem


Hello (first post here)

I was trying to download all the lectures from my concurrent programming class (mpla-server.mpla.com/courses/CE123) with wget, but it only downloaded an index.html.

It's odd, because I managed to do it on another class page, e.g. mpla-server.mpla.com/CE124/lectures. I should point out that the latter had both /lectures.php and lectures/, which gives a directory listing with all the PDF files. The first page has hrefs to the PDF pages, but when I run wget recursively it doesn't find any PDFs.

Thanks in advance.
Sorry if this has already been answered.
 
Old 10-31-2012, 06:29 PM   #2
DutchGeek
Member
 
Registered: Sep 2006
Distribution: SuSE, Slackware
Posts: 55

Rep: Reputation: 4
Hi,

Are you saying that the page you are interested in has links to PDF files, but you cannot download them with wget?

Try this:
Code:
lynx -dump <website> | awk '/http/{print $2}' | grep '\.pdf$' > output.txt
It will collect every link on the page that ends in a .pdf extension and put them in a text file.

Then you can try:
Code:
for i in $(cat output.txt); do wget "$i"; done
This will loop through the text file and download each link. I hope it works for you.
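As a quick sanity check you can try the filtering stage on a fake dump first (the example.com URLs below are just placeholders; lynx -dump prints its reference list as numbered lines, which is why awk takes the second field):

```shell
# Simulate two lines of `lynx -dump` reference output; only the .pdf link
# should survive the awk/grep filter.
printf '  1. http://example.com/index.html\n  2. http://example.com/notes.pdf\n' \
  | awk '/http/{print $2}' | grep '\.pdf$'
# prints: http://example.com/notes.pdf
```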
 
1 members found this post helpful.
Old 11-01-2012, 07:44 AM   #3
agtzim
LQ Newbie
 
Registered: Oct 2012
Posts: 2

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by DutchGeek View Post
Are you saying that the page you are interested in has links to pdf files, but you cannot download them with wget? [...]
Thanks! The first page has hrefs for the PDFs. The second page has both lectures.php and lectures/; the lectures/ URL gives a page titled "Index of CE124/lectures" with an Apache notice at the bottom.

The question is why wget can't get the PDFs from the first page.

Last edited by agtzim; 11-01-2012 at 08:02 AM.
 
Old 11-01-2012, 05:14 PM   #4
DutchGeek
Member
 
Registered: Sep 2006
Distribution: SuSE, Slackware
Posts: 55

Rep: Reputation: 4
I'm not sure why, but maybe it's because the first page only has links to the PDFs (not the actual files alongside it), and wget is not configured to follow them. The second page has lectures.php, which is probably what you get when you hit it with a browser, but that directory also contains the actual PDF files, so wget can pick them up directly from the listing.
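One thing worth trying on the first page is telling wget explicitly to recurse and keep only PDFs (the URL below is the course address from the original post, so treat it as a placeholder):

```shell
# -r recurse, -l 1 follow links one level deep, -np never ascend to the
# parent directory, -A pdf keep only files ending in "pdf" (HTML pages
# fetched along the way are deleted after wget extracts their links)
wget -r -l 1 -np -A pdf http://mpla-server.mpla.com/courses/CE123/
```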
 
Old 11-03-2012, 01:09 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1957
Many sites check your browser's user-agent string and/or use cookies in order to block mass-downloading programs, and in such cases they often return a simple index.html instead of the desired file. It's possible to spoof these things, but it can be more complex and site-specific.

You can start by using the -U option to make wget identify itself as another browser, at least.
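For example (the user-agent string here is just an illustrative value, and the URL is the placeholder from the original post):

```shell
# -U sets the User-Agent header wget sends with every request, so the
# server sees what looks like an ordinary desktop browser
wget -U "Mozilla/5.0 (X11; Linux x86_64)" -r -l 1 -np -A pdf \
  http://mpla-server.mpla.com/courses/CE123/
```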
 
  

