moob8 05-11-2008 10:55 AM

wget failing to download some URLs
 
The background

This question is about a situation involving wget running under Linux.

I run Slackware 12 (not sure if this matters; wget is wget, right?).

My web browser is Firefox with cookies and JavaScript disabled. I do this to block a great deal of ads.

I have dialup. Because I have dialup, larger downloads (anything over 300 KB) are done incrementally: download a little, stop partway through, resume later on.

The downloading ability of Firefox is badly broken. Firefox will stop about 100 to 200 KB in and then claim it is done, without realizing it has only gotten a partial file. I never figured out why; that may be a topic for another thread at a later time.

As a workaround, I use a batch-mode downloader called wget. It's a rather spiffy program once one takes the time to learn it, and many an otherwise ungettable file has been fetched with it.
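
For reference, the incremental trick is nothing fancy: it is just wget's -c (--continue) switch, which resumes a partial file instead of starting over. A minimal sketch (the URL here is only a placeholder):
Code:

# First session: start the download, hang up or Ctrl-C partway through
wget -c "http://example.com/some-large-file.mp3"
# Later session: run the exact same command; wget resumes from the partial file
wget -c "http://example.com/some-large-file.mp3"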

The Problem

Recently, wget has been failing to download things that Firefox can at least partially download. There have been pictures that Firefox will load and display but that wget fails on, and smaller files that Firefox will download but that wget fails to even start downloading.

So here is a specific example. A real URL is:
Code:

http://djdebo.com/podcastgen/?p=episode&name=2008-05-01_podcast1recording.mp3
If you pop that into your browser, you'll reach a page with a link to a podcast: an mp3 file of a DJ mix that said DJ has made available for public download. Clicking the "download" link on that page causes Firefox to begin downloading. Downloading through Firefox is useless to me, but the fact that it begins to download confirms there is a correct link to an actual file there.

From the right-click menu in Firefox, I obtain the link to the actual file. This is
Code:

http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
So I paste the link onto the command line and construct the following wget command:
Code:

wget -c "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"
Note that the quotes around the URL are needed because the URL contains a question mark, which the shell would otherwise try to expand as a wildcard.
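
To make the quoting point concrete, either quote the URL or escape the question mark (an & in a longer query string would likewise need quoting, since it would background the command):
Code:

# Risky: the shell may glob-expand the "?" (and an "&" would background the command)
wget -c http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3

# Safe: quote the URL, or escape the "?"
wget -c "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"
wget -c http://djdebo.com/podcastgen/download.php\?filename=2008-05-01_podcast1recording.mp3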

When I run the quoted command, the following happens:
Code:

--11:02:02--  http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
          => `download.php?filename=2008-05-01_podcast1recording.mp3'
Resolving djdebo.com... 66.226.64.35
Connecting to djdebo.com|66.226.64.35|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [audio/mpeg]

    [ <=>                                                    ] 0            --.--K/s           

11:02:04 (0.00 B/s) - `download.php?filename=2008-05-01_podcast1recording.mp3' saved [0/0]

And I am then returned to the prompt. Total run time, under five seconds.

The facts in the above example, summarized:
  • there exists a valid link to an actual file
  • Firefox can at least begin to download it
  • wget fails to download it

So, this leads me to theorize:
  • Firefox sends an extra hidden bit as part of the URL.
  • Firefox has a way of ascertaining what this special hidden bit is.
  • That method does not involve Java, JavaScript, or cookies.
  • The hidden bit is not in the visible URL and not in the HTML source, yet somehow Firefox can reconstruct it.
  • When getting this link from Firefox (copy link location), Firefox omits the extra hidden part.
  • wget does not ascertain what the special hidden bit is.

The questions
  • How do I force wget to ascertain and build the extra bits of the URL, and then send the URL such that wget actually fetches the file?
  • Failing that, is there a way to ascertain the full hidden URL so that I can type it out as part of the wget command line myself?
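
To make the second question concrete: if the "hidden bits" turn out to be request headers (things like User-Agent and Referer) rather than parts of the URL, I gather wget can be told to send those too. A speculative sketch, where the User-Agent string is only an illustrative placeholder and the Referer is the episode page above:
Code:

wget -c \
     --user-agent="Mozilla/5.0 (X11; Linux) Firefox" \
     --referer="http://djdebo.com/podcastgen/?p=episode&name=2008-05-01_podcast1recording.mp3" \
     "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"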

Thank you in advance.

acid_kewpie 05-11-2008 01:20 PM

Well, you seem largely justified in being confused. It seems that it's just a very badly maintained website... the file *does* actually download, but it's initially junk... this is what Wireshark shows:

Code:

GET /podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3 HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: djdebo.com
Connection: Keep-Alive

HTTP/1.1 200 OK
Date: Sun, 11 May 2008 17:47:40 GMT
Server: Apache/1.3.39 (Unix)
Cache-Control: must-revalidate, post-check=0, pre-check=0, private
Content-Disposition: attachment; filename=2008-05-01_podcast1recording.mp3;
Content-Transfer-Encoding: binary
Expires: 0
Pragma: public
X-Powered-By: PHP/4.4.7
Content-Length: media/
Keep-Alive: timeout=15, max=256
Connection: Keep-Alive
Content-Type: audio/mpeg

<br />
<b>Warning</b>:  filesize() [<a href='function.filesize'>function.filesize</a>]:
Stat failed for 2008-05-01_podcast1recording.mp3 (errno=2 - No such file or
directory) in <b>/home/u2/scotto811/html/podcastgen/download.php</b> on line
<b>55</b><br />
ID3......DTT2....Podcast1Recording.COM....engiTunPGAP.0..TEN....iTunes
v7.6.1.COM..h.engiTunNORM. 0000029B 0000028D 000037C6 00003A7D 000E5C6D 0034BA0F
0000805D 00007F02 0037231A 00139663.COM....engiTunSMPB. 00000000 00000210
000006FF 000000000B245DF1 00000000 06104055 00000000 00000000 00000000 00000000
00000000

So the PHP function that serves this download is shafted. The actual mp3 will contain that HTTP error message, and it's down to your player whether it ignores the junk or not; mplayer certainly played it fine. TBH I'm not sure *why* wget aborts, but it's not surprising considering it's being given junk. It *might* be comparing the received MIME type to the magic file, in which case that might be where it sees the unexpected data and aborts. Or, on second thought, I'd reckon it's the gibberish Content-Length header of "media/".
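
(Side note: you don't strictly need Wireshark to spot this; wget's own -S/--server-response switch prints the raw headers it receives, which should make the bogus Content-Length visible, e.g.:)
Code:

# Print the server's raw response headers alongside the download attempt
wget -S -c "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"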

moob8 05-11-2008 09:13 PM

Thank you. Your analysis gave me the information I needed to figure out what to do. My theory of hidden bits in the URL was wayyyyy off.

Anyway, I've gotten wget to work on the specific example URL cited above. The solution was to add the --ignore-length switch to the wget command line: the bogus Content-Length header was ignored and wget forged ahead, though without its usual display of the percentage-downloaded figure. :)
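
In other words, the command line that ended up working was just the earlier one with the extra switch added:
Code:

wget -c --ignore-length "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"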

Thanks again! :D

acid_kewpie 05-12-2008 03:28 AM

Well, it was a little tinfoil-hat, but the behavior was certainly strange. I did originally think it was a missing Referer header (sent so the server knows you are clicking the link on the real page, not just pulling the file from outside the blog environment), which, to be honest, could well be seen as the "hidden" information, so don't be too hard on yourself!

virupaksha 07-20-2011 06:42 AM

Dude, I think you've got this all wrong.

When you make a request like http://stoptazmo.com/downloads/get_f...naruto_187.zip, you are asking the server to execute a PHP script, passing it the necessary parameters. The output of that script leads you to the actual location of the file, as you can see here:
Code:

swamy_virupaksha@virupaksha-laptop:~/tmp$ wget --spider "http://stoptazmo.com/downloads/get_file.php?file_category=naruto&mirror=1&file_name=naruto_187.zip"
Spider mode enabled. Check if remote file exists.
--2011-07-20 17:02:54-- http://stoptazmo.com/downloads/get_f...naruto_187.zip
Resolving stoptazmo.com... 67.220.213.75
Connecting to stoptazmo.com|67.220.213.75|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://mirror2.stoptazmo.com/c38f959...naruto_187.zip [following]
Spider mode enabled. Check if remote file exists.
--2011-07-20 17:02:57-- http://mirror2.stoptazmo.com/c38f959...naruto_187.zip
Resolving mirror2.stoptazmo.com... 72.20.4.246
Connecting to mirror2.stoptazmo.com|72.20.4.246|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2028234 (1.9M) [application/zip]
Remote file exists.

The file actually exists at http://mirror2.stoptazmo.com/c38f95935814576eaaf98fb1a765932d/4e26bce9//naruto/naruto_187.zip, so you need to wget that file, not the PHP script.

If you want to follow the mirror links produced by the PHP script and download them directly, use the --mirror option with wget, like this:
Code:

wget --mirror http://stoptazmo.com/downloads/get_f...naruto_187.zip

The above command will execute get_file.php with the proper arguments, the script will return the actual URL of the file we need to download (naruto_187.zip), and the --mirror option will follow that link and download the file directly.
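
(Worth noting: wget follows HTTP 302 redirects by default, so fetching the quoted get_file.php URL directly, without --spider, should also pull down the zip, assuming the mirror link is still live; the -O switch here is just to give the saved file a sensible local name.)
Code:

wget -O naruto_187.zip "http://stoptazmo.com/downloads/get_file.php?file_category=naruto&mirror=1&file_name=naruto_187.zip"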

TobiSGD 07-20-2011 06:46 AM

Please don't resurrect old threads; this one is more than three years old. Also, please spell out your words: u is not you and r is not are.

acid_kewpie 07-20-2011 06:50 AM

Who signs up just to post to a dead thread?

