LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Newbie
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 06-12-2019, 04:38 AM   #1
guscoan
LQ Newbie
 
Registered: Jun 2019
Posts: 2

Rep: Reputation: Disabled
WGET to download images


I have two CSV files. One contains a URL to an image, and the other the name I want that image downloaded and saved as. They are separate files because they are 3.6 million rows long and I can't combine them without blowing up my computer.

Is there a way to use a wget command to download each link and rename the file?


Both files can be viewed from the below links
https://www.dropbox.com/s/fis8srwdtm87y7i/url.csv?dl=0
https://www.dropbox.com/s/9k2f7zf11q...Names.csv?dl=0

Any help appreciated.

Jodie
 
Old 06-12-2019, 05:53 AM   #2
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 11,662
Blog Entries: 9

Rep: Reputation: 3100
have you tried changing 'dl=0' to 'dl=1'?
 
1 member found this post helpful.
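The tip above can be sketched as follows: a Dropbox share link ending in `dl=0` serves a preview page, while `dl=1` serves the file itself, so wget can fetch the CSV directly. The rewrite is shown here as a string substitution (the URL is the first one from the thread; the actual `wget` fetch is shown but commented out since it needs network access):

```shell
# Dropbox share links end in dl=0 (preview page); dl=1 forces a
# direct download. Rewrite the suffix, then hand the result to wget.
url="https://www.dropbox.com/s/fis8srwdtm87y7i/url.csv?dl=0"
direct="${url%dl=0}dl=1"
echo "$direct"
# https://www.dropbox.com/s/fis8srwdtm87y7i/url.csv?dl=1
# wget -O url.csv "$direct"   # network step, not run here
```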
Old 06-12-2019, 07:21 AM   #3
teckk
Senior Member
 
Registered: Oct 2004
Distribution: FreeBSD Arch
Posts: 2,134

Rep: Reputation: 409
Don't know if these are time sensitive or not.

Code:
wget --spider "https://uc3af9f647b4bcb93f2c716f7d0f.dl.dropboxusercontent.com/cd/0/get/Aiqw78JPywxbsMVjYPY8kiJm-HTa_p5XddLQ6eZi4VhF2XoNQQqNzyvoj3KIQUyQ6Fe8ThO5Q-1H35arCeXumhsXtZB4FW5aW1PeMq638xExdA/file?_download_id=1407438504576448139996869513670874382657307170413355605886783133243&_notify_domain=www.dropbox.com&dl=1"
Spider mode enabled. Check if remote file exists.
--2019-06-12 07:16:56--  https://uc3af9f647b4bcb93f2c716f7d0f.dl.dropboxusercontent.com/cd/0/get/Aiqw78JPywxbsMVjYPY8kiJm-HTa_p5XddLQ6eZi4VhF2XoNQQqNzyvoj3KIQUyQ6Fe8ThO5Q-1H35arCeXumhsXtZB4FW5aW1PeMq638xExdA/file?_download_id=1407438504576448139996869513670874382657307170413355605886783133243&_notify_domain=www.dropbox.com&dl=1
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving uc3af9f647b4bcb93f2c716f7d0f.dl.dropboxusercontent.com (uc3af9f647b4bcb93f2c716f7d0f.dl.dropboxusercontent.com)... 162.125.3.6, 2620:100:6018:6::a27d:306
Connecting to uc3af9f647b4bcb93f2c716f7d0f.dl.dropboxusercontent.com (uc3af9f647b4bcb93f2c716f7d0f.dl.dropboxusercontent.com)|162.125.3.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 413788670 (395M) [application/binary]

Code:
wget --spider "https://uccaa835e0962e78259d66cfbd81.dl.dropboxusercontent.com/cd/0/get/AipgxLQRcOaW9UGzsgtyyS20eSKfa36iegeVLzCJ-_8mAAp-1lB3WLh-UI8CQ8RvTDMQEJdh7NxJFqwFA2tI1lDk3-C30fnsXBfXl7r4FzhwFw/file?_download_id=63981500829626411254479516742057535279546930342219265882653689927&_notify_domain=www.dropbox.com&dl=1"
Spider mode enabled. Check if remote file exists.
--2019-06-12 07:19:05--  https://uccaa835e0962e78259d66cfbd81.dl.dropboxusercontent.com/cd/0/get/AipgxLQRcOaW9UGzsgtyyS20eSKfa36iegeVLzCJ-_8mAAp-1lB3WLh-UI8CQ8RvTDMQEJdh7NxJFqwFA2tI1lDk3-C30fnsXBfXl7r4FzhwFw/file?_download_id=63981500829626411254479516742057535279546930342219265882653689927&_notify_domain=www.dropbox.com&dl=1
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
Resolving uccaa835e0962e78259d66cfbd81.dl.dropboxusercontent.com (uccaa835e0962e78259d66cfbd81.dl.dropboxusercontent.com)... 162.125.3.6, 2620:100:6018:6::a27d:306
Connecting to uccaa835e0962e78259d66cfbd81.dl.dropboxusercontent.com (uccaa835e0962e78259d66cfbd81.dl.dropboxusercontent.com)|162.125.3.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 146592156 (140M) [application/binary]
 
Old 06-12-2019, 08:50 PM   #4
guscoan
LQ Newbie
 
Registered: Jun 2019
Posts: 2

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by teckk View Post
Don't know if these are time sensitive or not.

[...]
Thanks for this. I can put the files in the root folder so it doesn't have to check the link (note: not sensitive data). I can't see how this code renames the file to the corresponding row in the names CSV, though? Perhaps I should combine the CSVs first in Linux...
 
Old 06-13-2019, 01:07 AM   #5
Ktasy
LQ Newbie
 
Registered: Jun 2019
Posts: 10

Rep: Reputation: 0
Quote:
Originally Posted by guscoan View Post
Perhaps i should combine the CSVs first in linux....
You can do that; maybe something different will happen.
 
Old 06-13-2019, 05:40 AM   #6
tshikose
Member
 
Registered: Apr 2010
Location: Kinshasa, Democratic Republic of Congo
Distribution: RHEL, Fedora, CentOS
Posts: 462

Rep: Reputation: 86
Hi,

To combine the two files line by line you can use paste.
My guess is that it won't blow up your computer: paste streams chunks of the two inputs and writes chunks of output, without loading everything into memory.
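A minimal sketch of the paste approach, using tiny stand-in files (the names url.csv and names.csv are assumptions; the real inputs are the two CSVs linked in the first post). paste joins line N of each file, and because it streams both inputs, 3.6 million rows will not exhaust memory:

```shell
# Tiny stand-ins for the two real CSVs (filenames are assumptions)
printf 'http://example.com/a.jpg\nhttp://example.com/b.jpg\n' > url.csv
printf 'first.jpg\nsecond.jpg\n' > names.csv

# paste joins line N of each input; -d, separates fields with a comma.
paste -d, url.csv names.csv > combined.csv
cat combined.csv
# http://example.com/a.jpg,first.jpg
# http://example.com/b.jpg,second.jpg
```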
 
Old 06-14-2019, 07:04 AM   #7
Shadow_7
Senior Member
 
Registered: Feb 2003
Distribution: debian
Posts: 3,858
Blog Entries: 1

Rep: Reputation: 819
I used to use wget to download a list of URLs in a file.

$ wget -c -i file_of_urls.txt

Back when I was on dialup and the local rest areas had free wifi with much better connection speeds. Or the local library parking lot. Although back then the laptop's battery would only last an hour, so I had to get as much as I could as fast as I could before returning to slumming-it status (dialup at home).

You could script it to mv/rename the file after download, but you'd probably need three parameters per line in the file, plus a variant with only one parameter (the URL) for wget. The second and third would be the name of the file as actually downloaded (without the URL and path) and its new name.

$ cat FILE.txt | while read LINE; do OLDNAME=$(echo "$LINE" | awk '{ print $2 }'); NEWNAME=$(echo "$LINE" | awk '{ print $3 }'); echo "$OLDNAME --- $NEWNAME"; mv "$OLDNAME" "$NEWNAME"; done

or something like that. You could add an awk for $1 (the URL) if you wanted to run wget once per line, to work around a file that wget can't use directly.
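Putting the thread's pieces together, one hedged sketch: if the two CSVs were joined into URL,name lines (combined.csv is an assumed filename for that output), wget's -O option saves each download directly under the chosen name, so no separate mv/rename pass is needed. Shown as a dry run that prints each command; drop the echo to actually download:

```shell
# Assumes combined.csv holds lines of the form URL,newname.
# One-line stand-in so the sketch is self-contained:
printf 'http://example.com/a.jpg,first.jpg\n' > combined.csv

# Dry run: print each wget command instead of executing it.
# wget -O writes the download straight to the desired filename.
while IFS=, read -r url name; do
    echo wget -c -O "$name" "$url"
done < combined.csv
# wget -c -O first.jpg http://example.com/a.jpg
```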
 
  

