Old 08-15-2006, 11:40 AM   #1
spx2
Member
 
Registered: Dec 2005
Distribution: debian
Posts: 160

Rep: Reputation: 30
wget problem


I'm trying to sort out a mess I can't figure out.
I saw a site listing some mp3s, took the links from its .html source, and put them in a page of my own here:
http://spx.t35.com/h.html
Now I want to run wget -r http://spx.t35.com/h.html in a console to fetch them.
The problem is that it doesn't seem to download them, and I don't know why.
If anyone knows what the problem might be, please tell me.
 
Old 08-15-2006, 01:42 PM   #2
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora and others
Posts: 757

Rep: Reputation: 145Reputation: 145
Are you sure those are mp3 files and not just jpeg graphic files?
That is one truly weird website, in any case.
 
Old 08-15-2006, 01:52 PM   #3
danga1993
LQ Newbie
 
Registered: Oct 2005
Location: In a house......
Distribution: Debian
Posts: 21

Rep: Reputation: 15
As pointed out, yes, they are linking purely to jpg files. Why not just run a recursive wget on the original site with these links instead of making your own page?
 
Old 08-16-2006, 04:41 AM   #4
spx2
Member
 
Registered: Dec 2005
Distribution: debian
Posts: 160

Original Poster
Rep: Reputation: 30
They are mp3s; they are meant to be renamed after download.
Anyway, as I said above, I am running wget -r on spx.t35... and getting nowhere,
but if I download the links one by one, things work well.
I am not going to sit and click my mouse a lot to download them all.
Automating this could be useful not only for me but also for others with similar tasks,
so it would be pretty helpful and neat to work out a solution.
One option would be to use vim to strip the html tags so only the links remain,
and then put "wget " in front of every line.
But how do I do that?
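
For reference, a rough sketch of that approach without hand-editing in vim, assuming the links in h.html are ordinary absolute URLs inside href="..." attributes (get_all.sh is just an arbitrary name for the generated script):

$ # pull out every href value and turn each one into a "wget URL" line (assumes absolute links)
$ grep -o 'href="[^"]*"' h.html | sed 's/^href="/wget /; s/"$//' > get_all.sh
$ # run the generated list of wget commands
$ sh get_all.sh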
 
Old 08-16-2006, 05:32 AM   #5
timmeke
Senior Member
 
Registered: Nov 2005
Location: Belgium
Distribution: Red Hat, Fedora
Posts: 1,515

Rep: Reputation: 61
wget should be able to follow the links in the html document (using -r), of course.
What error message is wget giving you (if any)?

If you are getting the html page and then editing it, you could also try using 'sed' or 'awk' to do it for you, rather than editing the file in an editor like vim.
Once you get the file with all the URLs, wget's -i option should do the trick, I suppose.
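
A minimal sketch of that sed-plus--i route, assuming there is at most one href per line of h.html and the links are absolute URLs (urls.txt is just an arbitrary file name):

$ # print only the href value from each line that contains one
$ sed -n 's/.*href="\([^"]*\)".*/\1/p' h.html > urls.txt
$ # -i reads one URL per line; -nd drops everything into the current directory
$ wget -nd -i urls.txt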
 
Old 08-16-2006, 05:35 AM   #6
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora and others
Posts: 757

Rep: Reputation: 145Reputation: 145
I got it to work. Apparently wget doesn't think h.html is valid html--maybe because it lacks a DOCTYPE line. But it works if you download h.html separately and then use wget like this:

$ wget -nd -A.jpg -i h.html --force-html

The optional "-nd" switch puts all the files in the current directory.
The optional "-A.jpg" causes only .jpg files to be accepted and doesn't keep others.
You could also try "-D web.tiscali.it" to only download from that domain.
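
Put together, the two-step version described here would look roughly like this (the .jpg accept pattern matches the links on that page; adjust it if the file types differ):

$ # fetch the page itself first
$ wget http://spx.t35.com/h.html
$ # then treat the saved file as html and download only the .jpg links, flat into this directory
$ wget -nd -A.jpg --force-html -i h.html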
 
Old 08-17-2006, 02:56 AM   #7
spx2
Member
 
Registered: Dec 2005
Distribution: debian
Posts: 160

Original Poster
Rep: Reputation: 30
Nice solution. Can you explain how you did it, KenJackson?
 
Old 08-17-2006, 05:25 AM   #8
KenJackson
Member
 
Registered: Jul 2006
Location: Maryland, USA
Distribution: Fedora and others
Posts: 757

Rep: Reputation: 145Reputation: 145
Explain? I'm not sure if you are asking how to execute the command, explain why it works, or explain the process by which I came up with it.

Since I am developing a website as my new hobby, and learning all things web, it bothered me that h.html didn't have a DOCTYPE statement. I also read the man page for wget and noticed the "--force-html" switch (which I had seen before and wondered when you would ever use it). So I tried it.
 
  

