LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   problem with wget (https://www.linuxquestions.org/questions/linux-newbie-8/problem-with-wget-647873/)

goncalopp 06-08-2008 06:16 PM

problem with wget
 
Hello everyone!
I'm somewhat new to Linux, so please bear with me :)

I'm using Ubuntu 8.04, and I'm currently trying Enlightenment 17, which has the most wonderful slideshow desktop gadget: it shows on the desktop the pictures you put in a given directory. After growing tired of my local images, I thought it'd be great if I could make it fetch images from the web. So, after doing my homework on wget, grep, and awk, I'm trying to write a shell script that downloads some new images to the gadget directory every day. The problem is, wget is just not working as I would expect it to (probably my fault here, but... :P)
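
Roughly, the sort of script I have in mind (the directory and URL here are just placeholders, not what I'm actually using):
Code:

#!/bin/sh
# Hypothetical daily fetcher -- the path and URL are made up for illustration
GADGET_DIR="$HOME/slideshow_pics"     # wherever the gadget reads its images
URL="http://example.com/gallery/"     # some page that links to jpg images

# -r: recurse into links, -l1: only one level deep,
# -nd: don't recreate the site's directory tree,
# -A jpg: keep only files ending in .jpg,
# -P: save everything into the gadget directory
wget -r -l1 -nd -A jpg -P "$GADGET_DIR" "$URL"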

For example:
If I do a "wget -r -l2 -A jpg http://google.com", it doesn't download anything (it should download *at least* the logo jpg, right?)
A simple "wget -r -l2 http://google.com" just gives me the index.html file.

At first I thought there was something wrong with my wget, but I've built it from source, and nothing changed...

Am I missing something?
Thank you!

billymayday 06-08-2008 06:35 PM

Well, first, it would seem that the images are .png, not .jpg, so that would be an issue (have a look at the source code for the page).

Second, I'm pretty sure web servers can be set up to block downloads of images referenced from HTML pages, and since Google seems to pull its images from a separate directory, there's a pretty good chance they've done this.

In any case, the images aren't in the root of www.google.com, so I guess it makes sense that wget can't find them there.

Try your code on a site you know has jpgs in it first, to check its correctness.
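
For example (example.com and the image path here are made up), you can sometimes get around a referer check by pretending to be a browser coming from the site itself:
Code:

# --referer fakes the page the request came from; -U fakes the browser.
# Both are real wget options, but whether they help depends on the server.
wget -U Mozilla --referer=http://example.com/ http://example.com/images/photo.jpg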

goncalopp 06-08-2008 07:03 PM

You're right, google is indeed a bad example...
I'm currently trying flickr.com.
It has at least one jpg:
Code:

<img src="http://l.yimg.com/g/images/home_photo_kk.jpg" alt="">
but "wget -nd -r -l1 -A jpg http://flickr.com" only returns "robots.txt"... which it shouldn't :confused: I have no idea what's going on...
I've also tried "-U Mozilla", with no luck.
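
From what I've read, wget honours robots.txt during recursive downloads, so maybe that's what is stopping it? Something like this is supposed to turn that off (I'm not sure it's the whole story, though):
Code:

# -e robots=off tells wget to ignore robots.txt while recursing
wget -nd -r -l1 -A jpg -e robots=off -U Mozilla http://flickr.com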

--edit--
ok, it seems I was missing "-p", which makes wget download all the files referenced by the HTML page.
I'm now trying with deviantart.com, which has plenty of jpgs, and I'm using
wget -nd -r -l2 -p -U Mozilla http://deviantart.com
which gives me just index.html...?

goncalopp 06-08-2008 07:46 PM

Solved it!
It seems by default wget doesn't download files from other hosts...
A simple "--span-hosts" did the trick!
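
In case it helps anyone else, the full command now looks something like this (the exact flags may still need tweaking):
Code:

# -H (--span-hosts) lets the recursion follow links to other hosts,
# which is where most sites keep their actual image files
wget -nd -r -l2 -p -H -A jpg -U Mozilla http://deviantart.com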
Thanks for your help :)

