Old 06-10-2011, 03:39 AM   #1
dazdaz
Member
 
Registered: Aug 2003
Location: Europe
Distribution: RHEL, CentOS, Ubuntu
Posts: 333

Rep: Reputation: 17
script to download a site from Google cache


I am looking for either a script or an application that will let me download all files from the Google cache, as I'd like to archive a site which is currently offline.

Downloading each page by hand, clicking through to its cached copy one at a time, is very tedious.

The script must take into account that search hits span multiple result pages, where you would otherwise click Next manually.

thanks

Last edited by dazdaz; 06-10-2011 at 03:47 AM.
 
Old 06-10-2011, 04:05 AM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
Have you tried something like:

Code:
wget -m -k -w 20 http://www.foobar.com/
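(For reference: -m turns on mirror mode, i.e. recursive download with timestamping; -k converts the links in the downloaded pages so they work locally; and -w 20 waits 20 seconds between requests so you don't hammer the server.)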
 
1 member found this post helpful.
Old 06-10-2011, 04:22 AM   #3
dazdaz
Member
 
Registered: Aug 2003
Location: Europe
Distribution: RHEL, CentOS, Ubuntu
Posts: 333

Original Poster
Rep: Reputation: 17
I only need to download the "Cached" content returned from the Google cache and nothing else.

It should create local directories when necessary and store the content there. I mention this because I found a Firefox plugin which claimed to be able to scoop up content from the Google cache, but it didn't create the local directories, nor did it save the files under the correct local filenames.

wget won't handle the multi-page search results. I don't think Google has an option to return one very long page instead of paginated results, otherwise that would be very handy for scripting :-)
100 results per page is the maximum setting.

I don't think a wget one-liner will do what I want (please correct me if you know of a trick), so something along the lines of the sketch below is what I'm imagining.
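A rough, untested sketch of the idea (the Google search URL parameters, the webcache.googleusercontent.com cache: prefix, and the HTML scraping are all assumptions, and Google may throttle or block scripted queries):

Code:
#!/bin/bash
# Untested sketch: pull Google's cached copy of every indexed page of a site.
SITE="foobar.com"               # hypothetical site to recover
UA="Mozilla/5.0"                # pretend to be a browser; Google may still block this

for start in 0 100 200 300; do  # page through the results, 100 hits per page
    wget -q -U "$UA" -O - \
        "http://www.google.com/search?q=site:${SITE}&num=100&start=${start}" |
    grep -o "http://${SITE}[^\"&]*" | sort -u |
    while read -r url; do
        out="${url#http://}"                        # e.g. foobar.com/docs/page.html
        case "$out" in */) out="${out}index.html" ;; esac
        mkdir -p "$(dirname "$out")"                # recreate the directory structure
        wget -q -U "$UA" -O "$out" \
            "http://webcache.googleusercontent.com/search?q=cache:${url}"
        sleep 20                                    # be polite between requests
    done
done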

Last edited by dazdaz; 06-10-2011 at 07:05 AM.
 
Old 06-10-2011, 03:34 PM   #4
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
Sorry, I'm not quite sure I understand your issue. Are you saying you want to, e.g., go to http://www.google.co.uk/search?q=linux+questions (search for "linux questions") and download the website you get when you click on the "Cached" link next to the search result? (The link I got was http://webcache.googleusercontent.co...w.google.co.uk)
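If so, a single cached page can be fetched from a script with something along these lines (untested; the user-agent override is a guess at what Google expects from a scripted client, and the output filename is arbitrary):

Code:
wget -U "Mozilla/5.0" -O lq-cache.html \
    "http://webcache.googleusercontent.com/search?q=cache:www.linuxquestions.org/"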

EDIT: Wait, I found this article. Are you talking about trying to do this in a script, without the plugins?
 
Old 05-22-2019, 03:41 AM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,126

Rep: Reputation: 4120
Reported as spam
 
  

