Old 05-11-2008, 11:55 AM   #1
moob8
Member
 
Registered: Sep 2006
Distribution: slackware
Posts: 132

Rep: Reputation: 15
wget failing to download some URLs


The Background

This question is about a situation involving wget running under Linux.

I run Slackware 12 (not sure if this matters; wget is wget, right?).

My web browser is Firefox with cookies and javacrap disabled. I do this to block a great deal of ads.

I have dialup. Because I have dialup, larger downloads (anything over 300k) are done incrementally: download a little, stop partway through, resume later on.

The downloading ability of Firefox is very broken. Firefox will stop about 100 to 200k in and then claim it is done, not realizing that it has only gotten a partial file. The reason for this was never figured out; it may perhaps be discussed in another thread at a later time.

As a workaround, I use a batch-mode downloader called wget. It's a rather spiffy program, once one takes the time to learn it. Many an otherwise ungettable file has been fetched with wget.
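In case the incremental workflow isn't clear, it goes roughly like this (URL made up purely for illustration):
Code:
# Start the download, then interrupt it partway through (Ctrl-C, or the
# dialup connection drops).
wget -c "http://example.com/files/mix.mp3"

# Later, rerun the exact same command: -c (--continue) resumes from the
# end of the partial file instead of starting over.
wget -c "http://example.com/files/mix.mp3"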

The Problem

Recently, wget has been failing to download things that Firefox can at least partially download. There have been pictures that Firefox will load and display but that wget fails on. There have been smaller files that Firefox will download but that wget fails to even start downloading.

So here is a specific example. A real URL is:
Code:
http://djdebo.com/podcastgen/?p=episode&name=2008-05-01_podcast1recording.mp3
If you pop that into your browser, you'll go to a page that has a link to a podcast, an mp3 file of a DJ mix that said DJ has made available for public download. Clicking the "download" link on that page causes Firefox to begin to download. Downloading through Firefox is useless to me, but the fact that it begins to download confirms that there is a correct link to an actual file there.

From the right-click menu in Firefox, I obtain the link to the actual file. This is
Code:
http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
So, I paste the link into the command line and construct the following wget command:
Code:
wget -c "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"
Note that the quotes around the URL are needed because the URL contains a question mark, which the shell would otherwise treat as a glob character.

The following happens:
Code:
--11:02:02--  http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
           => `download.php?filename=2008-05-01_podcast1recording.mp3'
Resolving djdebo.com... 66.226.64.35
Connecting to djdebo.com|66.226.64.35|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [audio/mpeg]

    [ <=>                                                     ] 0             --.--K/s             

11:02:04 (0.00 B/s) - `download.php?filename=2008-05-01_podcast1recording.mp3' saved [0/0]
And I am then returned to the prompt. Total run time: under five seconds.

The facts in the above example, summarized:
  • there exists a valid link to an actual file
  • Firefox can at least begin to download it
  • wget fails to download it

So, this leads me to theorize:
  • Firefox sends an extra hidden bit as part of the URL.
  • Firefox has a way of ascertaining what the special hidden bit is.
  • This way of ascertaining does not involve Java, JavaScript, or cookies.
  • This hidden bit is not in the visible URL and is not in the HTML source, but somehow Firefox can reconstruct it.
  • When getting this link from Firefox (copy link location), Firefox omits the extra hidden part.
  • wget does not ascertain what the special hidden bit is.

The questions
  • How do I force wget to ascertain and build the extra bits in the URL and then send the URL such that wget actually fetches the file?
  • Failing that, is there a way to ascertain the full hidden URL and then type it out as part of the URL for the wget command line?

Thank you in advance.
 
Old 05-11-2008, 02:20 PM   #2
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,414

Rep: Reputation: 1967
Well, you seem largely justified in being confused. It seems that it's just a very badly maintained website... the file *does* actually download, but it's initially junk... this is what Wireshark shows:

Code:
GET /podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3 HTTP/1.0
User-Agent: Wget/1.10.2
Accept: */*
Host: djdebo.com
Connection: Keep-Alive

HTTP/1.1 200 OK
Date: Sun, 11 May 2008 17:47:40 GMT
Server: Apache/1.3.39 (Unix)
Cache-Control: must-revalidate, post-check=0, pre-check=0, private
Content-Disposition: attachment; filename=2008-05-01_podcast1recording.mp3;
Content-Transfer-Encoding: binary
Expires: 0
Pragma: public
X-Powered-By: PHP/4.4.7
Content-Length: media/
Keep-Alive: timeout=15, max=256
Connection: Keep-Alive
Content-Type: audio/mpeg

<br />
<b>Warning</b>:  filesize() [<a href='function.filesize'>function.filesize</a>]:
Stat failed for 2008-05-01_podcast1recording.mp3 (errno=2 - No such file or
directory) in <b>/home/u2/scotto811/html/podcastgen/download.php</b> on line
<b>55</b><br />
ID3......DTT2....Podcast1Recording.COM....engiTunPGAP.0..TEN....iTunes
v7.6.1.COM..h.engiTunNORM. 0000029B 0000028D 000037C6 00003A7D 000E5C6D 0034BA0F
0000805D 00007F02 0037231A 00139663.COM....engiTunSMPB. 00000000 00000210
000006FF 000000000B245DF1 00000000 06104055 00000000 00000000 00000000 00000000
00000000
So the PHP function that's serving this download is shafted. The actual mp3 itself will contain that HTTP error message, and it's down to your player as to whether it ignores the junk or not; mplayer certainly played it fine. TBH I'm not sure *why* wget aborts, but it's not surprising considering it's being given junk. It *might* be comparing the received MIME type to the magic file, in which case that might be where it sees the unexpected data and aborts. Or, on second thoughts, I'd reckon it's the gibberish Content-Length header of "media/".
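Incidentally, you don't need Wireshark to see the broken header; wget can print the server's response itself. A quick sketch against the same URL:
Code:
# -S (--server-response) prints the HTTP headers the server sends back;
# -O /dev/null discards the junk-prefixed body.
wget -S -O /dev/null "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"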

Last edited by acid_kewpie; 05-11-2008 at 02:25 PM.
 
Old 05-11-2008, 10:13 PM   #3
moob8
Member
 
Registered: Sep 2006
Distribution: slackware
Posts: 132

Original Poster
Rep: Reputation: 15
Thank you. Your analysis gave me the information I needed to figure out what to do. My theory of hidden bits in the URL was wayyyyy off.

Anyway, I've gotten wget to work on the specific cited example URL. The solution was to add the "--ignore-length" option to the wget command line. The bogus Content-Length header was ignored and wget forged ahead, though without its usual display of the percent-downloaded figure.
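For the record, the full command ended up along these lines:
Code:
# --ignore-length tells wget to disregard the server's bogus Content-Length
# header; -c still allows resuming if the transfer is interrupted.
wget -c --ignore-length "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"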

Thanks again!

Last edited by moob8; 05-11-2008 at 10:16 PM. Reason: spelling
 
Old 05-12-2008, 04:28 AM   #4
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,414

Rep: Reputation: 1967
Well, it was a little tin-foil-hat, but the behavior was certainly strange. I did originally think it was a missing referrer (the header that tells the server you are clicking the link on the real page, not just pulling the file from outside the blog environment), which TBH could well be seen to be the "hidden" information, so don't be too hard on yourself!
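For cases where a site really does check the referrer, wget can supply one, along with a browser-like user agent; a sketch (header values are illustrative, not something this particular site needs):
Code:
# --referer sets the Referer header; --user-agent masquerades as a browser.
wget --referer="http://djdebo.com/podcastgen/?p=episode&name=2008-05-01_podcast1recording.mp3" \
     --user-agent="Mozilla/5.0" \
     "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"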
 
Old 07-20-2011, 07:42 AM   #5
virupaksha
LQ Newbie
 
Registered: Jul 2011
Posts: 3

Rep: Reputation: Disabled

dude i think u got this all wrong..

when u make a request like http://stoptazmo.com/downloads/get_f...naruto_187.zip

u r asking the server to execute a php file, sending it the necessary parameters.. the output of this file will lead u to the actual location of the file, like see here:
Code:
swamy_virupaksha@virupaksha-laptop:~/tmp$ wget --spider "http://stoptazmo.com/downloads/get_file.php?file_category=naruto&mirror=1&file_name=naruto_187.zip"
Spider mode enabled. Check if remote file exists.
--2011-07-20 17:02:54-- http://stoptazmo.com/downloads/get_f...naruto_187.zip
Resolving stoptazmo.com... 67.220.213.75
Connecting to stoptazmo.com|67.220.213.75|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://mirror2.stoptazmo.com/c38f959...naruto_187.zip [following]
Spider mode enabled. Check if remote file exists.
--2011-07-20 17:02:57-- http://mirror2.stoptazmo.com/c38f959...naruto_187.zip
Resolving mirror2.stoptazmo.com... 72.20.4.246
Connecting to mirror2.stoptazmo.com|72.20.4.246|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2028234 (1.9M) [application/zip]
Remote file exists.

the file actually exists at the location "http://mirror2.stoptazmo.com/c38f95935814576eaaf98fb1a765932d/4e26bce9//naruto/naruto_187.zip", so u need to wget this file, not the php file ...

if u want wget to go through the php link n download the file directly, u don't actually need the --mirror option (that turns on recursive mirroring); wget follows HTTP redirects on its own, like

Code:
wget "http://stoptazmo.com/downloads/get_f...naruto_187.zip"

the above command will execute get_file.php with the proper arguments passed to it, the server replies with the actual url of the file we need to download (naruto_187.zip), and wget follows that redirect and directly downloads that file ...
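One small wrinkle, assuming wget's default behaviour: the saved filename is derived from the original URL (the php one, query string and all), so -O is handy for picking a sane name:
Code:
# -O names the output file explicitly instead of the URL-derived default.
wget -O naruto_187.zip "http://stoptazmo.com/downloads/get_f...naruto_187.zip"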
 
Old 07-20-2011, 07:46 AM   #7
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Hanover, Germany
Distribution: Main: Gentoo Others: What fits the task
Posts: 15,623
Blog Entries: 2

Rep: Reputation: 4078
Please don't resurrect old threads; this thread is more than three years old. Also, please spell out your words: u is not you and r is not are.
 
Old 07-20-2011, 07:50 AM   #8
acid_kewpie
Moderator
 
Registered: Jun 2001
Location: UK
Distribution: Gentoo, RHEL, Fedora, Centos
Posts: 43,414

Rep: Reputation: 1967
Who signs up just to post to a dead thread?
 
  

