LinuxQuestions.org > Forums > Non-*NIX Forums > General
Old 09-09-2007, 01:27 AM   #1
brian0918
Member
 
Registered: Apr 2003
Posts: 87

Rep: Reputation: 15
How to mass-download from a site?


I have access to a pay-site hosting thousands of public domain images. Since the payment is only for access (after all, the images are PD), I should be able to download them and distribute them freely.

Now it's just a matter of putting that into practice. Downloading an image normally opens a page that fetches the image through JavaScript and crafty document.write disguises. To save the actual file, you have to right-click the image and choose Save As (in Windows, of course).

So, as a first attempt, I determined how the page URLs were generated (all end with a number that increases by 1 for each consecutive page); the image URLs each have unique hashes, so I couldn't touch those. Then I used a program called urlgen to generate the URLs, and a regex editor to put HTML tags around them. Then I tried using Firefox's DownThemAll extension to download each page, hoping it would also download each page's content. It didn't; it only downloaded the HTML of the page.

I know this would be a helluva lot easier to do in Linux, but in Windows, are there any suggestions for accomplishing this? I'm thinking next I'll try one of those keyboard/mouse button combination recorders, but was hoping someone had a simpler (Windows-based) solution.
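The sequential-page scheme described above can be sketched with a short shell loop; the base URL and page count here are made up, since the actual site and range aren't given:

```shell
# Generate sequential page URLs, one per line, into urls.txt.
# "http://example.com/images/page${i}.html" is a placeholder pattern.
for i in $(seq 1 100); do
  echo "http://example.com/images/page${i}.html"
done > urls.txt
```

A list like this can then be handed to a batch downloader rather than fetched page by page.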

Thanks!

Last edited by brian0918; 09-09-2007 at 01:28 AM. Reason: typo
 
Old 09-09-2007, 04:45 AM   #2
makuyl
Senior Member
 
Registered: Dec 2004
Location: Helsinki
Distribution: Debian Sid
Posts: 1,107

Rep: Reputation: 53
Some paysites actually prefer to make a profit. Having people do mass downloads of whole sites consumes a lot of bandwidth, and that costs money. Guess why it isn't recommended...
As for you freely distributing the pictures,... hmm. Copyright and distribution rights come to mind.
 
Old 09-09-2007, 07:29 AM   #3
me-macias
Member
 
Registered: Feb 2006
Posts: 40

Rep: Reputation: 15
Quote:
Originally Posted by brian0918 View Post
So, as a first attempt, I determined how the page URLs were generated (all end with a number that increases by 1 for each consecutive page); the image URLs each have unique hashes, so I couldn't touch those. Then I used a program called urlgen to generate the URLs, and a regex editor to put HTML tags around them.
Did I understand correctly that you already have all the URLs you need? Then make a list and feed it to wget.

have a nice day, bye
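A minimal sketch of that wget workflow; the URLs in the list are placeholders, so the actual download command is left commented out:

```shell
# Write a small example URL list, one URL per line.
# These hosts are stand-ins, not the real site.
printf '%s\n' \
  "http://example.com/images/page1.html" \
  "http://example.com/images/page2.html" > urls.txt

# -i (--input-file) makes wget read URLs from the file;
# --wait pauses between requests so the server isn't hammered.
# Commented out here because the URLs above are placeholders:
# wget --wait=1 -i urls.txt
```

wget also runs on Windows, so this doesn't require a Linux box.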
 
Old 09-09-2007, 11:49 AM   #4
brian0918
Member
 
Registered: Apr 2003
Posts: 87

Original Poster
Rep: Reputation: 15
Quote:
Originally Posted by makuyl View Post
Some paysites actually prefer to make a profit. Having people do mass downloads of whole sites consumes a lot of bandwidth, and that costs money. Guess why it isn't recommended...
As for you freely distributing the pictures,... hmm. Copyright and distribution rights come to mind.
You do realize that these images are not copyrightable, right? They're from the 1800s. Please read up on the concept of "public domain". The pay-site is simply forcing you to pay for initial access to the content. Once you have access to the images, you can do whatever you want with them.

Last edited by brian0918; 09-09-2007 at 12:16 PM.
 
Old 09-09-2007, 11:51 AM   #5
brian0918
Member
 
Registered: Apr 2003
Posts: 87

Original Poster
Rep: Reputation: 15
Hmm. I think I figured it out.

Last edited by brian0918; 09-09-2007 at 12:15 PM.
 
Old 09-10-2007, 08:39 PM   #6
Jorophose
Member
 
Registered: Oct 2006
Location: Ontario, Canada
Distribution: Xubuntu 6.06!! =D
Posts: 137

Rep: Reputation: 15
Even though I kinda condone (Or whatever you use to politely say "Don't do that, bra") that, I sometimes find myself in the same position.

Have you tried wget?
 
Old 09-10-2007, 09:33 PM   #7
SlowCoder
Member
 
Registered: Oct 2004
Location: Southeast, U.S.A.
Distribution: Fedora (Desktop), CentOS (Server), Knoppix (Diags)
Posts: 934

Rep: Reputation: 38
Quote:
Originally Posted by Jorophose View Post
Even though I kinda condone (Or whatever you use to politely say "Don't do that, bra") that, I sometimes find myself in the same position.

Have you tried wget?
Hrmmm ... to condone is to agree with. To condemn is to say "naughty naughty". Besides, what are you doing scolding women's undergarments? What did they ever do to you? Weird ...
 
Old 09-10-2007, 11:52 PM   #8
alred
Member
 
Registered: Mar 2005
Location: singapore
Distribution: puppy and Ubuntu and ... erh ... redhat(sort of) :( ... + the venerable bsd and solaris ^_^
Posts: 658
Blog Entries: 8

Rep: Reputation: 31
i would like to add dillo to wget ...


 
  

