Old 10-27-2015, 02:10 PM   #1
AdultFoundry
Member
 
Registered: Jun 2015
Posts: 282

Rep: Reputation: Disabled
Downloading and compressing images from a list of URLs


I am somewhat familiar with Linux, and I was thinking that I could set up a low-cost VPS account at ovh.com and use it for what is needed here.

I will have a list of URLs, let's say 100 or more, and each of these URLs will have 15 thumbnails linked to regular-size image files. I would like to put this list of URLs into something like a text file and then download all of the large images to my VPS account (one image set per folder). The folders could be named 001, 002, 003, and so on; this could be done automatically, I guess. So if there were 100 URLs in the text file, I would end up with 100 folders and 1500 images overall. After that I would need to compress the images too: I would look at one of them, set the compression rate that looks right, and then apply it to all of the images in all of the folders. Finally I would need to download all of these images to Windows through FTP, or, better still, transfer them to another server (shared hosting) where I have SSH access (it is limited, I think, something like a "chroot jail").
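For that last transfer step, I imagine something along these lines could work (just a sketch; the host name and paths are made up, and I am not sure how much the limited/chroot SSH on the shared host allows):
Code:
# copy the finished, numbered folders from the VPS to the shared host over SSH
# user, host and paths below are placeholders
rsync -av ./sets/ user@sharedhost.example.com:~/public_html/sets/
# or, with plain scp:
scp -r ./sets user@sharedhost.example.com:~/public_html/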

Like I said, I am fairly new to Linux, but I should be able to figure it out.

Is something like this possible? How would it work (the more details the better, at this point)?

Thanks.
 
Old 10-27-2015, 02:16 PM   #2
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142
Sure, it's possible. The only hiccup is going to be determining the URLs of the large images from the webpage of thumbnails. If you have lynx installed, you should be able to come up with a grep/awk pattern to pull the necessary URLs out of the output of "lynx --dump". Once you have the URLs of the individual images, you can loop through them with wget to download them. ImageMagick has a tool called "convert" which can manipulate images (cropping, resizing, compressing, etc.) from the command line and is very easy to script.
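As a rough sketch of how the whole thing could hang together (urls.txt, the folder numbering, and the awk pattern are all placeholders you'd adapt to your pages):
Code:
#!/bin/bash
# sketch: one gallery page URL per line in urls.txt,
# one numbered folder (001, 002, ...) per page
n=0
while read -r page; do
    n=$((n+1))
    dir=$(printf "%03d" "$n")
    mkdir -p "$dir"
    # placeholder extraction: pull full-size image URLs out of the lynx dump
    # (adjust the pattern per site); </dev/null keeps lynx off the loop's stdin
    imgs=$(lynx --dump "$page" </dev/null | awk '/http.*\.jpg/ {print $2}')
    # $imgs is intentionally unquoted so each URL becomes its own argument
    (cd "$dir" && wget -np -nd $imgs)
done < urls.txt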

An example page would be helpful. If you can't share it, maybe just the output of "lynx --dump" for one of them with the necessary domains/names changed.

Last edited by suicidaleggroll; 10-27-2015 at 02:18 PM.
 
Old 10-27-2015, 02:19 PM   #3
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fedora-35
Posts: 5,313

Rep: Reputation: 918
If you have SSH access to the servers, that would be the best protocol to use.

Short of that, hacking something together with wget or curl would probably work.
 
Old 10-27-2015, 02:27 PM   #4
AdultFoundry
Member
 
Registered: Jun 2015
Posts: 282

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by suicidaleggroll View Post
An example page would be helpful. If you can't share it, maybe just the output of "lynx --dump" for one of them with the necessary domains/names changed.
!!!NSFW!!! http://html.blazingmovies.com/11/64/..._c1848_01.html !!!NSFW!!!

This is an example; the URL structure can be different depending on the site, but the general concept is just like this.

Edit: This one actually goes to a different subdomain, and the URL to the right changes too. Normally it should not be as complicated as this, but as this shows, it could be, and it won't be the same on all the sponsor sites. Most of the time, it should be

domain.com/00001/index.html

and then

domain.com/00001/001.jpg
domain.com/00001/002.jpg

... and so on, as an example.

Last edited by AdultFoundry; 10-27-2015 at 03:02 PM.
 
Old 10-27-2015, 02:31 PM   #5
AdultFoundry
Member
 
Registered: Jun 2015
Posts: 282

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by schneidz View Post
If you have SSH access to the servers, that would be the best protocol to use.

Short of that, hacking something together with wget or curl would probably work.
I read somewhere:

wget -i text-file-with-urls.txt

... that's why I am asking here. I am still looking for a Windows solution - there seem to be some, as people say, but nothing works 100 percent (I would do the download and compression parts separately). The best thing I have found is UrlOpener.com plus DownThemAll for Firefox: download the pages one by one manually and then compress. That takes too much time. I want to work on a website, and it would be good if this went fairly fast. That's why I was thinking of getting a good, cheap VPS at ovh.com and setting everything up there, possibly with cron and things like that. I've also seen some Perl programs for this, but that would be a little too much, I think. Like I said, I am a beginner here.
 
Old 10-27-2015, 02:56 PM   #6
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,573

Rep: Reputation: 2142
You should really label that link NSFW...

With the one you provided, the lynx output is very straightforward. Something like the following would grab all of the URLs out of it:
Code:
# take the lines between "Visible links:" and "Hidden links:" in the lynx dump,
# trim the section markers, and keep just the URL field
urls=$(lynx --dump http://html.blazingmovies.com/11/64/pics/55711/nude/372_c1848_01.html | awk '/Visible/,/Hidden/' | head -n -2 | tail -n +2 | awk '{print $2}')
(There's probably a better awk-only way to do that, but I'm not an awk expert)

Then you can grab them with wget:
Code:
wget -np -nd $urls
That will download them to the pwd; if you want them to go into a subdirectory, you could make one and cd there before running the wget.

You'd have to experiment with a few different pages to see if they all behave similarly.
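Depending on how the pages are laid out, you might even be able to skip the lynx step and let wget's accept filter do the work. Just a sketch, and note it will also pull in the thumbnail jpgs:
Code:
# $page = the gallery page URL; -r -l1 follows links one level deep,
# -A keeps only jpg/jpeg files, -np/-nd avoid climbing up or recreating the site's directory tree;
# add -H if the full-size images live on a different host/subdomain
wget -r -l1 -np -nd -A jpg,jpeg "$page"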

Once you have them in a directory, you can loop over them and run convert, e.g.:
Code:
# recompress every jpg in the current directory in place at quality 70
for i in *.jpg; do convert -quality 70 "$i" "$i"; done
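If you'd rather keep the originals untouched, a small variation (just a sketch) writes the compressed copies into a separate folder instead:
Code:
mkdir -p compressed
for i in *.jpg; do convert -quality 70 "$i" "compressed/$i"; done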
 
1 member found this post helpful.
Old 10-28-2015, 01:44 PM   #7
John VV
LQ Muse
 
Registered: Aug 2005
Location: A2 area Mi.
Posts: 17,627

Rep: Reputation: 2651
but BE WARNED!

jpg is a "lossy" format BECAUSE it throws image data into the trash can, and in doing so it ADDS compression artifacts to the image.

That loss of data is UNRECOVERABLE!

If you recompress a "lossy" image with a second "lossy" compression, you will NOT like the results.

If you take a jpg-compressed image and recompress it as a jpg, you remove a lot of the original information and replace it with NOISE artifacts.

a pic is worth ..........

------
http://imgbox.com/94fCbN9t
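if you do end up recompressing anyway, at least check what quality the files are already at and write the results to new files rather than over the originals; a quick sketch with ImageMagick's identify:
Code:
# print the stored/estimated JPEG quality of each image before deciding how hard to squeeze it
for i in *.jpg; do
    echo "$i: quality $(identify -format '%Q' "$i")"
done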

Last edited by John VV; 10-28-2015 at 01:47 PM.
 
  

