Old 04-06-2013, 03:10 AM   #1
Si14
LQ Newbie
 
Registered: Mar 2013
Posts: 14

Rep: Reputation: Disabled
wget: download a certain link


I want to download a PDF file. The page URL is:
Code:
www.amazon.com/product1/pdf
In the HTML source of that page there are several links (one PDF, plus JPG files, JS files, and so on).

1- I want to download only the PDF file. Its link, extracted from the HTML source above, is:

Code:
http://book.amazon.com/still/10.1202/shelf.201205216/seed/1811_ftp.pdf?v=1&t=hf6hlzrm&fb62f105
It seems that "ftp.pdf" can be used to filter the above PDF link for wget.

2- I want to save the output file as 1811_ftp.pdf, i.e. whatever comes after "seed/" and before "?v=" in the URL.

Thank you for your help.
 
Old 04-06-2013, 03:43 AM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
In theory something along the lines of a
Code:
curl -s "http://book.amazon.com/still/10.1202/shelf.201205216/seed/1811_ftp.pdf?v=1&t=hf6hlzrm&fb62f105" > ~/1811_ftp.pdf
should work, except that 0) the host name AFAIK is "books" and not "book", and 1) unless you somehow make the download command part of an existing session, or supply the right credentials to log in first (if applicable), it may redirect to another page denying you the download.
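If a login is required, one way to make the download part of a session is curl's cookie jar: -c saves the cookies a login hands out and -b sends them back on later requests. Just a sketch; the login URL and form field names below are made up for illustration:
Code:
# Hypothetical login: the URL and form fields are placeholders, not real ones.
curl -s -c cookies.txt -d 'user=me&pass=secret' 'http://books.amazon.com/login'
# Reuse the saved session cookies for the actual download.
curl -s -b cookies.txt -o ~/1811_ftp.pdf \
  'http://books.amazon.com/still/10.1202/shelf.201205216/seed/1811_ftp.pdf?v=1&t=hf6hlzrm&fb62f105'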
 
Old 04-07-2013, 09:15 PM   #3
Si14
LQ Newbie
 
Registered: Mar 2013
Posts: 14

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by unSpawn View Post
In theory something along the lines of a
Code:
curl -s "http://book.amazon.com/still/10.1202/shelf.201205216/seed/1811_ftp.pdf?v=1&t=hf6hlzrm&fb62f105" > ~/1811_ftp.pdf
should work, except that 0) the host name AFAIK is "books" and not "book", and 1) unless you somehow make the download command part of an existing session, or supply the right credentials to log in first (if applicable), it may redirect to another page denying you the download.
I cannot use the code you mentioned, because:
1- I need to use the following link in the curl command:
Code:
www.amazon.com/product1/pdf
and
2- I need to tell curl to download only the file whose link contains the string "ftp.pdf".
and
3- then save the output file under the name that comes after "seed/" and before "?v=".

I have the following links saved in list.txt; each of them needs to go through the steps above, so I need curl to perform those actions for every entry.

Code:
list.txt:
www.amazon.com/product1/pdf
www.amazon.com/product2/pdf
www.amazon.com/product3/pdf
www.amazon.com/product4/pdf
...
Thank you for your help.
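For a single page, I imagine something along these lines (just a sketch; it assumes the full PDF URL appears verbatim, in double quotes, somewhere in the page's HTML source):
Code:
# Fetch the product page and pull out the first embedded link containing "ftp.pdf".
pdf_url=$(curl -s 'http://www.amazon.com/product1/pdf' \
          | grep -o 'http://[^"]*ftp\.pdf[^"]*' | head -n 1)
echo "$pdf_url"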

Last edited by Si14; 04-07-2013 at 09:17 PM.
 
Old 04-08-2013, 04:48 PM   #4
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
*shrug* When I try your URI I get "We're sorry. The Web address you entered is not a functioning page on our site..."
 
Old 04-09-2013, 04:22 PM   #5
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Assuming the URL is stored in a variable, all it takes is a simple parameter substitution or similar string manipulation.

Code:
url='http://book.amazon.com/still/10.1202/shelf.201205216/seed/1811_ftp.pdf?v=1&t=hf6hlzrm&fb62f105'

fname=${url##*/}        # remove everything up to and including the last "/"
fname=${fname%%[?]*}    # remove the "?" and everything after it

wget -O "$fname" "$url"
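To cover the list.txt part as well, the same substitutions can go in a loop. A sketch, assuming each page embeds exactly one absolute, double-quoted link containing "ftp.pdf" (the grep pattern is a guess based on the example URL above):
Code:
#!/bin/bash
# For each page URL in list.txt: find the embedded "ftp.pdf" link,
# derive the file name from the part between "seed/" and "?v=",
# and download it under that name.
while IFS= read -r page; do
    url=$(curl -s "$page" | grep -o 'http://[^"]*ftp\.pdf[^"]*' | head -n 1)
    [ -z "$url" ] && { echo "no ftp.pdf link found on $page" >&2; continue; }
    fname=${url##*/}
    fname=${fname%%[?]*}
    wget -O "$fname" "$url"
done < list.txt
If the pages require a session, combine this with the cookie handling mentioned earlier (wget has --load-cookies for that).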
 
  

