Downloading and compressing images from a list of URLs
I am somewhat familiar with Linux, and I was thinking that I could set up a low-cost VPS account at ovh.com and use it for what is needed here.
I will have a list of URLs, let's say 100 or more, and each of these URLs will have 15 thumbnails linked to regular-size image files. I would like to put this list of URLs into something like a text file, and then download all the large images to my VPS account (one image set per folder). The folders could be named like 001, 002, 003, and this could be done automatically, I guess. After that I would need to compress the images too: look at one of them, settle on the right values, and then apply them to all the images in all of the folders. So if there were 100 URLs in the text file, I would have 100 folders and 1500 images overall. I would set the compression rate based on one of the images and then apply it to the rest. After that I would need to download all these images to Windows through FTP, or transfer them to another server (shared hosting), which would be even better (I have SSH access there, though I think it is limited, something like a "chroot jail").
Like I said, I am fairly new to Linux, but I should be able to figure it out.
Is something like this possible? How would it work (the more details the better, at this point)?
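A minimal sketch of the numbered-folder idea, assuming the gallery URLs sit one per line in a file called urls.txt (the filename and layout are just placeholders for this example); the actual download and compression steps are discussed in the replies that follow.
Code:
#!/bin/bash
# Minimal sketch: one numbered folder (001, 002, ...) per URL listed in urls.txt.
n=0
while read -r url; do
    n=$((n + 1))
    dir=$(printf "%03d" "$n")
    mkdir -p "$dir"
    echo "would fetch the image set from $url into $dir/"
    # the download and compression for this set would go here
done < urls.txt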
Sure, it's possible; the only hiccup is going to be determining the URLs of the large images from the webpage of thumbnails. If you have lynx installed, you should be able to come up with a grep/awk pattern to pull the necessary URLs out of the output of "lynx --dump". Once you can get the URLs of the individual images, you can loop through them with wget to download. ImageMagick has a tool called "convert" which can be used to manipulate images (cropping, resizing, compressing, etc.) from the command line and is very easy to script.
An example page would be helpful. If you can't share it, maybe just post the output of "lynx --dump" for one of them with the necessary domains/names changed.
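A rough sketch of that pipeline, assuming the full-size images can be picked out of the lynx dump by their .jpg extension; the grep pattern, page URL, and quality value are placeholders and would need adjusting to the real site.
Code:
#!/bin/bash
# Rough sketch: pull likely image URLs out of one gallery page and fetch them.
page_url="http://domain.com/00001/index.html"   # example layout from this thread
outdir="001"
mkdir -p "$outdir"

lynx -dump "$page_url" \
    | grep -oE 'https?://[^[:space:]]+\.jpg' \
    | sort -u \
    | while read -r img; do
        wget -q -P "$outdir" "$img"
      done

# Compress each downloaded image with ImageMagick's convert; the quality
# value (85 here) would first be tuned on a single sample image.
for f in "$outdir"/*.jpg; do
    convert "$f" -quality 85 "$f"
done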
Quote:
An example page would be helpful. If you can't share it, maybe just post the output of "lynx --dump" for one of them with the necessary domains/names changed.
This is an example; the URL structure can be different depending on the site, but the general concept is just like this.
Edit: This one actually goes to a different subdomain, and the URL on the right changes too. Normally it should not be as complicated as this, but as this shows, it can be, and it won't be the same on all the sponsor sites. Most of the time it should be
domain.com/00001/index.html
and then
domain.com/00001/001.jpg
domain.com/00002/002.jpg
... and so on, as an example.
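When the layout really is that predictable, the image URLs can be generated directly rather than scraped from the page. A small sketch, assuming 15 images per set numbered 001.jpg through 015.jpg (the domain and set number are placeholders):
Code:
#!/bin/bash
# Sketch for the predictable case: set 00001 holds 001.jpg .. 015.jpg.
set_id="00001"
mkdir -p "$set_id"
for i in $(seq -f "%03g" 1 15); do
    wget -q -P "$set_id" "http://domain.com/${set_id}/${i}.jpg"
done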
If you have SSH access to the servers, that would be the best protocol to use.
Short of that, hacking something together with wget or curl would probably work.
I read somewhere:
wget -i text-file-with-urls.txt
... that's why I am asking here. I am still looking for a Windows solution - there seem to be some, as people say, but nothing works 100 percent (I would do the download and compression parts separately). The best things I found are UrlOpener.com and DownThemAll for Firefox: download the pages one by one manually, and then compress. This takes too much time. I want to work on a website, and it would be good if this went fairly fast. That's why I was thinking of getting a good and cheap VPS at ovh.com and setting everything up there, possibly with cron, things like this. I've also seen some Perl programs for this, but that would be a little too much, I think. Like I said, I am a beginner here.
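For reference, wget -i does read URLs one per line from a file, but on its own it drops everything into a single directory, so the one-folder-per-gallery layout still needs a loop like the earlier sketches. A couple of illustrative invocations:
Code:
# Fetch every URL listed in the file (one per line) into the current directory:
wget -i text-file-with-urls.txt

# Same, but recreate the remote directory structure (e.g. 00001/, 00002/, ...)
# instead of dumping everything into one flat folder:
wget -x -nH -i text-file-with-urls.txt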
JPEG is called a "lossy" format BECAUSE it throws image data away, and in doing so adds compression artifacts to the image. This loss of data is UNRECOVERABLE!
If you recompress a "lossy" image with a second "lossy" pass, you will NOT like the results: take a JPEG-compressed image, recompress it as a JPEG, and you have removed a lot of the original information and replaced it with noise artifacts.
A pic is worth a thousand words: http://imgbox.com/94fCbN9t
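In practical terms for the batch job discussed above, that means settling on the quality value once and recompressing each file only once. A small sketch with ImageMagick, assuming the hypothetical 001/, 002/, ... layout from the earlier examples (mogrify rewrites files in place, so test on a copy first):
Code:
# Check what quality a sample image was saved at before choosing a setting:
identify -format "%Q\n" 001/001.jpg

# Then apply the chosen quality to every image in every numbered folder, once.
for d in 0*/; do
    mogrify -quality 80 "$d"*.jpg
done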