LinuxQuestions.org


brian0918 09-09-2007 12:27 AM

How to mass-download from a site?
 
I have access to a pay-site hosting thousands of public domain images. Since the pay is only for the access (after all, the images are PD), I should be able to download and freely distribute them to all.

Now it's just a matter of putting that into practice. To view an image normally, the site opens a page that fetches the image through JavaScript and crafty document.write disguises. To save the actual file, you have to right-click the image and choose Save As (in Windows, of course).

So, as a first attempt, I figured out how the page URLs are generated (they all end with a number that increases by 1 for each consecutive page); the image URLs each have unique hashes, so I couldn't touch those. Then I used a program called urlgen to generate the URLs for me, and a regex editor to wrap HTML tags around them. Then I tried Firefox's DownThemAll extension on each page, hoping it would also download the page's content. It didn't; it only downloaded the HTML of the page.
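
In a Unix-like shell the whole list could be knocked out in one go, by the way; example.com, the page parameter, and the 5000 count below are just stand-ins for whatever the real pattern is:

Code:

# print page URLs 1 through 5000, one per line, into urls.txt
for i in $(seq 1 5000); do
    echo "http://www.example.com/viewimage.php?page=$i"
done > urls.txt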

I know this would be a helluva lot easier to do in Linux, but in Windows, are there any suggestions for accomplishing this? I'm thinking next I'll try one of those keyboard/mouse button combination recorders, but was hoping someone had a simpler (Windows-based) solution.

Thanks!

makuyl 09-09-2007 03:45 AM

Some paysites actually prefer to make a profit. Having people do mass downloads of whole sites consumes a lot of bandwidth, and that costs money. Guess why it isn't recommended...
As for you freely distributing the pictures,... hmm. Copyright and distribution rights come to mind.

me-macias 09-09-2007 06:29 AM

Quote:

Originally Posted by brian0918 (Post 2886188)
So, as a first attempt, I figured out how the page URLs are generated (they all end with a number that increases by 1 for each consecutive page); the image URLs each have unique hashes, so I couldn't touch those. Then I used a program called urlgen to generate the URLs for me, and a regex editor to wrap HTML tags around them.

Did I understand you correctly that you already have all the URLs you need? Then make the list and feed it to wget.
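
Something like this should do it, assuming the list is saved one URL per line as urls.txt; the cookies file is only needed if the site wants you logged in (export it from your browser):

Code:

# fetch every URL listed in urls.txt; --load-cookies passes an
# exported browser cookie file so the session stays authenticated,
# -w 1 waits a second between requests to go easy on the server
wget --load-cookies cookies.txt -w 1 -i urls.txt

There's a wget build for Windows too, so you don't even need to switch OS for this part.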

have a nice day, bye

brian0918 09-09-2007 10:49 AM

Quote:

Originally Posted by makuyl (Post 2886312)
Some paysites actually prefer to make a profit. Having people do mass downloads of whole sites consumes a lot of bandwidth, and that costs money. Guess why it isn't recommended...
As for you freely distributing the pictures,... hmm. Copyright and distribution rights come to mind.

You do realize that these images are not copyrightable, right? They're from the 1800s. Please read up on the concept of "public domain". The pay-site is simply forcing you to pay for initial access to the content. Once you have access to the images, you can do whatever you want with them.

brian0918 09-09-2007 10:51 AM

Hmm. I think I figured it out.

Jorophose 09-10-2007 07:39 PM

Even though I kinda condone (Or whatever you use to politely say "Don't do that, bra") that, I sometimes find myself in the same position.

Have you tried wget?
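
If building a URL list isn't practical, wget's recursive mode can also crawl a whole directory tree; something along these lines, with the URL as a placeholder:

Code:

# follow links up to 5 levels deep, keep only image files,
# and never climb above the starting directory (-np)
wget -r -l 5 -np -A jpg,jpeg,png,gif http://www.example.com/gallery/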

SlowCoder 09-10-2007 08:33 PM

Quote:

Originally Posted by Jorophose (Post 2888084)
Even though I kinda condone (Or whatever you use to politely say "Don't do that, bra") that, I sometimes find myself in the same position.

Have you tried wget?

Hrmmm ... to condone is to agree with. To condemn is to say "naughty naughty". Besides, what are you doing scolding women's undergarments? What did they ever do to you? Weird ... :)

alred 09-10-2007 10:52 PM

i would like to add dillo to wget ...



