Old 09-07-2010, 11:44 PM   #1
desmond33 (LQ Newbie)
curl Question


Hi. I purchased a book that comes with access to an online archive of images, and I want to download all of them. The website is only set up to download the images one at a time, though, so I want to use a program to automatically download them.

The website is http://www.taschen.com/pages/en/comm..._1/index.1.htm

I tried the command:

Code:
curl -u username -O http://www.taschen.com/media_archives/type1/downloads/_Q6Q5783.jpg.zip
but it downloaded the following text file:

Code:
Found
The document has moved here.

Apache/2.2.8 (Ubuntu) mod_python/3.3.1 Python/2.5.2 PHP/5.2.4-2ubuntu5.10 with Suhosin-Patch mod_ssl/2.2.8 OpenSSL/0.9.8g mod_perl/2.0.3 Perl/v5.8.8 Server at www.taschen.com Port 80
The "here" link redirects to http://www.taschen.com/type1

This is the first time I've downloaded something this way, so any help is greatly appreciated. I'm using curl, by the way, because I'm on Mac OS, which does not come with wget.
 
Old 09-08-2010, 12:50 AM   #2
14moose (Member)
Hi -

I tried to look, but Taschen is password-protected.

SUGGESTION:
It sounds like the "downloads" URL you tried is simply no longer valid.

Manually download one of the files through your browser and note the exact URL it comes from. Once you know the exact URL for one file, you might have a better chance of using curl for the rest of them.
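
One quick way to see what a given URL actually returns (a sketch; the zip URL is the one from the first post) is to ask curl for just the response headers:

Code:
curl -sI http://www.taschen.com/media_archives/type1/downloads/_Q6Q5783.jpg.zip
A 3xx status plus a Location: header means you are being redirected, most likely to the login page.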
 
Old 09-08-2010, 01:04 AM   #3
desmond33 (LQ Newbie, Original Poster)
That's the thing: the URL that I put in the curl command works fine when I type it into Firefox; the download starts like any other zip file. If I haven't already logged in to the website, though, it redirects me to a login page. So I imagine the issue is getting curl to look like an authenticated user.
 
Old 09-08-2010, 09:44 AM   #4
14moose (Member)
Hi, again -

Try this:
Quote:
http://ask.metafilter.com/18923/How-...okie-with-CURL

If you mean the username and password are entered in a form on a login page, then cURL can "submit" that form like this:

curl -d "username=miniape&password=SeCrEt" http://whatever.com/login

and if you want to store the cookie that comes back you do so by specifying a cookie file:

curl -c cookies.txt -d "username=miniape&password=SeCrEt" http://whatever.com/login

and to use those cookies in later requests you do:

curl -b cookies.txt http://whatever.com/some/protected/page

or do both if you want to both send and receive cookies:

curl -b cookies.txt -c cookies.txt -d "username=miniape&password=SeCrEt" http://whatever.com/login
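
Adapted to this thread, the whole flow might look like this (a sketch: the login URL and the form field names are hypothetical; check the site's actual login form in your browser and substitute the real ones):

Code:
# log in once and save the session cookie (URL and field names are guesses)
curl -c cookies.txt -d "username=YOU&password=SECRET" http://www.taschen.com/login

# reuse the saved cookie for the actual download
curl -b cookies.txt -O http://www.taschen.com/media_archives/type1/downloads/_Q6Q5783.jpg.zip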
 
Old 09-10-2010, 09:19 AM   #5
Valery Reznic (ELF Statifier author)
Quote:
Originally Posted by desmond33 View Post
[quote of post #1 trimmed: the curl -u command downloaded a "Found / The document has moved here" page instead of the zip file]
Looks like you have to specify the -L option. From the curl manpage:
Code:
       -L/--location
              (HTTP/HTTPS) If the server reports that the requested  page  has
              moved to a different location (indicated with a Location: header
              and a 3XX response code), this option will make  curl  redo  the
              request  on the new place.
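For example, adding -L to the command from the first post (a sketch; the cookie handling discussed above may still be needed on top of this):

Code:
curl -L -u username -O http://www.taschen.com/media_archives/type1/downloads/_Q6Q5783.jpg.zip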
 
Old 09-12-2010, 06:12 PM   #7
desmond33 (LQ Newbie, Original Poster)
Thanks for the help. I've figured out how to download a file now by using a Firefox add-on called "Live HTTP Headers" to look at the cookies used by Firefox, and then using a curl command like this:

Code:
curl -b "name1=value1; name2=value2" http://example.com/file.zip -O
Now I'm just trying to figure out how to download multiple files in the same directory automatically.
 
Old 09-13-2010, 01:25 AM   #8
desmond33 (LQ Newbie, Original Poster)
It turns out that curl cannot download recursively, which is a shame. It can download a range of sequentially numbered files, though. The files I wanted to download were not named sequentially, but the HTML index pages that link to them are, so I downloaded all of those pages with curl and dumped them into one text file with this command:

Code:
curl -b "name1=value1; name2=value2" http://www.example.com/index[1-128].html > html_dump.txt
Then I used grep and a text editor to make a file with just the filenames I wanted ("file_name.zip", for example, with each name on a separate line) and used a bash script to download them with curl:

Code:
#!/bin/bash

# read one filename per line from file_list.txt and fetch each one
# with the same session cookie
while read -r name
do
  curl -b "name1=value1; name2=value2" -O "http://www.example.com/$name"
done < file_list.txt

exit 0
Just in case anyone was curious.
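
For the curious, the grep step can be scripted as well. A sketch, assuming the index pages link to the archives as href="something.zip" (the pattern is a guess; adjust it to the actual markup):

Code:
# pull the zip filenames out of the dumped HTML, one per line
grep -o 'href="[^"]*\.zip"' html_dump.txt | sed 's/^href="//; s/"$//' > file_list.txt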

Last edited by desmond33; 09-13-2010 at 02:26 AM.
 
Old 09-13-2010, 01:57 AM   #9
Valery Reznic (ELF Statifier author)
Quote:
Originally Posted by desmond33 View Post
[quote of post #8 trimmed: curl can't recurse, so the filenames were grepped out of the index pages and fetched one by one with a bash loop]
Maybe wget is able to do what you need?
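
For example, something along these lines (a sketch: the start URL is a guess based on the paths in this thread, and --load-cookies expects a Netscape-format cookie file rather than the "name=value" strings used above):

Code:
wget -r -l1 -np -A zip --load-cookies cookies.txt "http://www.taschen.com/media_archives/type1/"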
 
Old 09-13-2010, 02:26 AM   #10
desmond33 (LQ Newbie, Original Poster)
Yeah, wget would probably be easier, but I didn't feel like compiling/installing it for Mac OS.
 
  

