wget failing to download some URLs
The background
This question is about wget running under Linux. I run Slackware 12 (not sure if this matters; wget is wget, right?). My web browser is Firefox with cookies and javacrap disabled. I do this to block a great deal of ads. I have dialup. Because I have dialup, larger downloads (anything over 300k) are done incrementally: download a little, stop partway through, resume later on. The downloading ability of Firefox is very broken. Firefox will stop about 100 to 200k in and then claim it is done; it does not realize that it has only gotten a partial file. The reason for this was never figured out; it may be discussed in another thread at a later time. As a workaround, I use a batch-mode downloader called wget. It's a rather spiffy program, once one takes the time to learn it. Many an otherwise ungettable file has been fetched with wget.
The Problem
Recently, wget has been failing to download things that Firefox can at least partially download. There have been pictures that Firefox will load and display but that wget fails on. There have been smaller files that Firefox will download but that wget fails to even start downloading. So here is a specific example. A real URL is:
Code:
http://djdebo.com/podcastgen/?p=episode&name=2008-05-01_podcast1recording.mp3
From the right-click menu in Firefox, I obtain the link to the actual file. This is
Code:
http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
Code:
wget -c "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"
The following happens:
Code:
--11:02:02-- http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
The facts in the above example, summarized:
So, this leads me to theorize:
The questions
Thank you in advance. |
Well, you seem largely justified in being confused. It seems that it's just a very badly maintained website. The file *does* actually download, but it's initially junk. This is what Wireshark shows:
Code:
GET /podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3 HTTP/1.0 |
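For anyone without Wireshark, wget can print the response headers itself with -S (--server-response). What follows is only a sketch, not something that was run in this thread: the function names show_headers and content_length_from_headers are made up for illustration, and --spider sends a header-only request, so a misbehaving script might answer it differently than a real download.

```shell
# Print the response headers wget receives for a URL.
# -S shows the server's headers; --spider avoids fetching the body.
# wget writes its log to stderr, hence the 2>&1.
show_headers() {
    wget -S --spider "$1" 2>&1
}

# Read wget -S output on stdin and print the last Content-Length value, if any.
# The header name is matched case-insensitively; stray carriage returns are dropped.
content_length_from_headers() {
    awk 'tolower($1) == "content-length:" { v = $2 } END { if (v != "") print v }' | tr -d '\r'
}
```

Something like `show_headers "URL" | content_length_from_headers` would then give the size the server claims, which can be compared against what actually arrives on disk.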
Thank you. Your analysis gave me the information necessary to then figure out what to do. My theory of hidden bits in the URL was wayyyyy off.
Anyway, I've gotten wget to work on the specific cited example URL. The solution was to add a "--ignore-length" parameter to the wget command line. The bogus content-length indicator was ignored and wget forged ahead, though without its usual display of the percent downloaded figure. :) Thanks again! :D |
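The exact command line was not posted, so the following is a guess at its shape: it assumes the flag was simply added to the earlier `wget -c` invocation. The function name fetch_ignoring_length is hypothetical.

```shell
# Resume a download while disregarding the server's Content-Length header.
# -c continues a partially downloaded file;
# --ignore-length tells wget not to trust Content-Length.
fetch_ignoring_length() {
    wget -c --ignore-length "$1"
}
```

It would be called with the quoted URL, e.g. `fetch_ignoring_length "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"`. The quotes matter for URLs like these, because `?` and `&` are special to the shell.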
Well, it was a little tin foil hat, but the behavior was certainly strange. I did originally think it was a missing referrer (so the server knows you are clicking on the link on the real page, not just pulling the file outside of the blog environment), which tbh could well be seen to be the "hidden" information, so don't be too hard on yourself!
|
dude i think u got this all wrong..
when u make request like http://stoptazmo.com/downloads/get_f...naruto_187.zip u r asking server to execute a php file sending it necessary parameters..the output of this file will lead u to actual location of the file like see here
-------------------------------
swamy_virupaksha@virupaksha-laptop:~/tmp$ wget --spider "http://stoptazmo.com/downloads/get_file.php?file_category=naruto&mirror=1&file_name=naruto_187.zip"
Spider mode enabled. Check if remote file exists.
--2011-07-20 17:02:54-- http://stoptazmo.com/downloads/get_f...naruto_187.zip
Resolving stoptazmo.com... 67.220.213.75
Connecting to stoptazmo.com|67.220.213.75|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://mirror2.stoptazmo.com/c38f959...naruto_187.zip [following]
Spider mode enabled. Check if remote file exists.
--2011-07-20 17:02:57-- http://mirror2.stoptazmo.com/c38f959...naruto_187.zip
Resolving mirror2.stoptazmo.com... 72.20.4.246
Connecting to mirror2.stoptazmo.com|72.20.4.246|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2028234 (1.9M) [application/zip]
Remote file exists.
-------------------------------------
the file actually exists at the location "http://mirror2.stoptazmo.com/c38f95935814576eaaf98fb1a765932d/4e26bce9//naruto/naruto_187.zip" so u need to wget this file not the php file ....
if u want to follow the mirror links of the php file n download dem directly then use --mirror option with wget like
--------------------------------
wget --mirror http://stoptazmo.com/downloads/get_f...naruto_187.zip
----------------------------------
the above command will execute get_file.php with proper arguments passed to it and it will give the actual url of the file we need to download (naruto_187.zip), --mirror option will follow that link and directly download that file ... |
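A note on the log above: the "[following]" after the Location line shows wget following the 302 redirect on its own. --mirror is wget's recursive-mirroring switch (it turns on recursion and time-stamping) and is not what makes wget follow a redirect. To see where a download script points without fetching the file, the target can be pulled out of the --spider log. This is a minimal sketch that assumes the log format shown above; the function name final_location is made up.

```shell
# Read wget log text on stdin and print the URL from the last "Location:" line.
# Assumes lines of the form "Location: <url> [following]", as in the log above.
final_location() {
    awk '$1 == "Location:" { url = $2 } END { if (url != "") print url }'
}
```

A possible use is `wget --spider "URL" 2>&1 | final_location`, with the URL in quotes so the shell does not interpret the `&` characters in the query string.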
Please don't resurrect old threads, this thread is more than three years old. Also please spell out your words, u is not you and r is not are.
|
who signs up just to post to a dead thread?
|