The background
This question is about wget running under linux.
I run slackware 12 (not sure if this matters, wget is wget, right?).
My web browser is firefox with cookies and javacrap disabled. I do this to prevent a great deal of ads.
I have dialup. Because of that, larger downloads (anything over 300k) are done incrementally: download a little, stop partway through, resume later on.
The downloading ability of firefox is very broken. Firefox will stop about 100 to 200k in and then claim it is done; it does not realize that it has only gotten a partial file. The reason for this was never figured out. It may be discussed in another thread at a later time.
As a workaround, I use a batch-mode downloader called wget. It's a rather spiffy program, once one takes the time to learn it. Many an otherwise ungettable file has been fetched with wget.
The problem
Recently, wget has been failing to download things that firefox can at least partially download. There have been pictures that firefox will load and display but that wget fails on. There have been smaller files that firefox will download but that wget fails to even start downloading.
So here is a specific example. A real URL is:
Code:
http://djdebo.com/podcastgen/?p=episode&name=2008-05-01_podcast1recording.mp3
If you pop that in your browser, you'll go to a page that has a link to a podcast, an mp3 file of a DJ mix that said DJ has made available for public download. Clicking the "download" link on that page causes firefox to begin to download. Downloading through firefox is useless to me, but the fact that it begins to download confirms that there is a correct link to an actual file there to be downloaded.
From the right click menu in firefox, I obtain the link to the actual file. This is
Code:
http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
So, I paste the link to the command line and construct the following wget command:
Code:
wget -c "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3"
Note that the quotes around the URL are needed because the URL contains a question mark, which the shell would otherwise treat as a wildcard character.
The following happens:
Code:
--11:02:02-- http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3
=> `download.php?filename=2008-05-01_podcast1recording.mp3'
Resolving djdebo.com... 66.226.64.35
Connecting to djdebo.com|66.226.64.35|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [audio/mpeg]
[ <=> ] 0 --.--K/s
11:02:04 (0.00 B/s) - `download.php?filename=2008-05-01_podcast1recording.mp3' saved [0/0]
And I am then returned to the prompt. Total run time, under five seconds.
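One way to see more of what the server is saying in a run like the one above is to have wget print the response headers. The sketch below wraps that in a small shell function. The name show_headers is made up for illustration; -S (--server-response) and -O are standard wget options. It does not fix anything by itself; it only shows the headers that came along with the empty "200 OK" reply.

```shell
# Hypothetical helper: print the server's response headers for one URL
# without keeping whatever body arrives.
show_headers() {
    # -S (--server-response) prints the HTTP headers sent by the server.
    # -O /dev/null throws the downloaded body away.
    wget -S -O /dev/null "$1"
}
```

Calling it as show_headers "http://djdebo.com/podcastgen/download.php?filename=2008-05-01_podcast1recording.mp3" should print the headers along with the usual wget progress messages.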
The facts in the above example, summarized:
- there exists a valid link to an actual file
- firefox can at least begin to download it
- wget fails to download it
So, this leads me to theorize:
- firefox sends an extra hidden bit as part of the URL.
- firefox has a way of ascertaining what the special hidden bit is.
- This way of ascertaining does not involve java, javascript or cookies.
- This hidden bit is not in the visible URL and is not in the html source but somehow firefox can reconstruct it.
- when getting this link from firefox (copy link location), firefox omits the extra hidden part.
- wget does not ascertain what the special hidden bit is.
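One candidate for such a "hidden bit" that is not part of the URL at all is the set of request headers a browser sends when a link is clicked, in particular Referer (the page the link was on) and User-Agent. This is only a guess: nothing in the output above confirms that this server checks either header. The sketch below shows how the theory could be tested with wget's --referer and --user-agent options. The function name and the browser_agent string are placeholders made up for illustration.

```shell
# Placeholder browser identification string (an assumption): replace it
# with whatever your own firefox actually reports.
browser_agent='Mozilla/5.0 (X11; U; Linux i686; en-US) Gecko Firefox'

# Hypothetical helper: resume a download while sending the two headers a
# browser normally adds when a link on a page is clicked.
fetch_with_browser_headers() {
    referer_page=$1   # the page that contains the download link
    file_url=$2       # the link copied from the browser
    wget -c \
         --referer="$referer_page" \
         --user-agent="$browser_agent" \
         "$file_url"
}
```

With the URLs from this post, the first argument would be the episode page and the second the download.php link, both in quotes. If the result is still a zero-length file, the missing piece is something other than these two headers.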
The questions
- How do I force wget to ascertain and build the extra bits in the URL and then send the URL such that wget actually fetches the file?
- Failing that, is there a way to ascertain the full hidden URL and then type it out as part of the URL for the wget command line?
Thank you in advance.