Linux - Security This forum is for all security related questions.
Questions, tips, system compromises, firewalls, etc. are all included here.


Old 03-18-2013, 11:57 AM   #1
Registered: Jan 2006
Location: USA
Posts: 540

Rep: Reputation: 52
Searching Cache

So, this isn't exactly a Linux-specific question, but I am looking for some info.

Does anyone know if it's possible to search the cache of the bigger engines like Google, Bing, or Yahoo? I can see how access to the cache could be sold as a service to, say, the Feds, but can the public get access?

I am working an issue where some cached pages might contain data that would be a security issue for my customer.
Old 03-19-2013, 08:15 AM   #2
Senior Member
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 3,985

Rep: Reputation: 819
I'm not sure what you think is in, e.g., Google's cache, but it may not work the way you think it does.

In any case, to the extent that Google caches things you are interested in, Google has access to that information. If the question is 'Can some outsider break in to Google and get access to stuff that Google didn't intend them to?', then Google would tell you about all of the measures they take to make this impossible, but if you consider that a very serious outcome, there can be no guarantee that it will never happen.

For most people there are bigger risks than this, but if you are very sensitive about this particular issue, then you have a problem.

In the one case I can think of offhand where this kind of thing happened, it wasn't a search engine.
Old 03-19-2013, 11:38 AM   #3
Registered: Jan 2006
Location: USA
Posts: 540

Original Poster
Rep: Reputation: 52
OK, I know how the Google cache works. I can query the cache for a specific page to see what that cached page looks like, and this is open to the public. I want to search the public cache (query it), etc. It's easier to query than it is for me to build a list of URLs, pull those in via PHP, and then search them using regex, etc.

The customer may have leaked some data. They have since changed their HTML, but the engines' caches may still have copies of pages that contain this data.

Does this give clarity?
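
The regex step of that URL-list approach could look roughly like the sketch below. It is written in Python rather than the PHP mentioned in the post, and it only covers the searching part: it assumes the page bodies have already been retrieved somehow. The names `html_to_text` and `find_leaks`, and the sample pattern in the usage note, are made up for illustration.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text of an HTML document, skipping script/style bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def find_leaks(pages, pattern):
    """Search each page for the pattern.

    pages   -- dict mapping URL -> HTML body (already fetched, hypothetical input)
    pattern -- regular expression describing the leaked data
    Returns a list of (url, matched_text) tuples.
    """
    regex = re.compile(pattern)
    hits = []
    for url, html in pages.items():
        text = html_to_text(html)
        for match in regex.finditer(text):
            hits.append((url, match.group(0)))
    return hits
```

For example, `find_leaks(pages, r"\b\d{4}-\d{4}\b")` would list every page whose text contains something shaped like an eight-digit account number. Stripping the tags first keeps the pattern from matching inside markup or scripts, at the cost of missing data hidden in attributes.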
Old 03-19-2013, 12:06 PM   #4
Senior Member
Registered: Nov 2005
Distribution: Debian
Posts: 2,793

Rep: Reputation: 990
Doesn't a normal search query the cache? I mean the whole point of caching is to make searches go faster, right?
Old 03-19-2013, 06:07 PM   #5
Registered: May 2001
Posts: 28,826
Blog Entries: 55

Rep: Reputation: 3341
As others said a regular search does "query the cache" and AFAIK Google doesn't provide an API for bulk cache querying.
Old 03-19-2013, 10:52 PM   #6
Registered: Jan 2006
Location: USA
Posts: 540

Original Poster
Rep: Reputation: 52
Nope. I didn't think I would have to explain this. The Google cache is a second, older copy of web pages, etc.

As far as I can tell, the Google cache is not something you can query using Google search operators, hence my original question.

Let me explain how the cache works.
  1. go to Google in FF
  2. in the search field type and click Search
  3. put mouse over the result (not web tools)
  4. now you see a double arrow icon to the right, put your mouse over that
  5. now look to the right, you see a "cached" link, click that
  6. or you can simply use in direct google search

So now you know what the cache is. I am looking for a way to use the engine operators to find specific data that is in cached pages. We, the public, have access to the latest cached page; can you imagine how many copies Google has? Do you see how this may be useful to, say, the Feds or local law enforcement? You change your public Facebook stuff thinking it's gone, yet Google has every change you made, etc. I just need to query for a specific data pattern in the cache that is available to the public. I am thinking I need to build a URI list, use PHP to pull those pages from the Google cache, and then grep the page content for my pattern.
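
A minimal sketch of that URI-list, pull, and grep loop follows, in Python instead of PHP. The thread does not give the cache URL format, so `CACHE_URL_TEMPLATE` below is a deliberately fake placeholder that has to be replaced with whatever per-page cache lookup the engine actually exposes. `build_cache_url`, `http_fetch`, and `grep_cached_pages` are hypothetical helper names, and the fetcher is passed in as a parameter so the loop can be tried without touching the network.

```python
import re
import time
import urllib.parse
import urllib.request

# ASSUMPTION: placeholder endpoint, not a real service. Substitute the
# engine's actual per-page cache lookup URL here.
CACHE_URL_TEMPLATE = "https://cache.example.invalid/lookup?url={quoted_url}"


def build_cache_url(page_url, template=CACHE_URL_TEMPLATE):
    """Turn an original page URL into the (assumed) cache lookup URL."""
    return template.format(quoted_url=urllib.parse.quote(page_url, safe=""))


def http_fetch(url, timeout=10):
    """Fetch a URL and return its body as text."""
    request = urllib.request.Request(url, headers={"User-Agent": "cache-audit-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def grep_cached_pages(page_urls, pattern, fetch=http_fetch, delay=0.0,
                      template=CACHE_URL_TEMPLATE):
    """Fetch the cached copy of each URL and search it for the pattern.

    Returns (hits, errors): hits maps page URL -> list of matched strings,
    errors maps page URL -> the exception raised while fetching it.
    """
    regex = re.compile(pattern)
    hits, errors = {}, {}
    for index, page_url in enumerate(page_urls):
        if index and delay:
            time.sleep(delay)  # pause between requests to stay polite
        try:
            body = fetch(build_cache_url(page_url, template))
        except OSError as exc:  # urllib's URLError/HTTPError derive from OSError
            errors[page_url] = exc
            continue
        found = [match.group(0) for match in regex.finditer(body)]
        if found:
            hits[page_url] = found
    return hits, errors
```

Called as `hits, errors = grep_cached_pages(uri_list, my_pattern, delay=2.0)`, this returns only the pages whose cached copy still contains the pattern, plus a record of any lookups that failed (for example, pages that are not cached at all).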

Last edited by Linux_Kidd; 03-22-2013 at 03:42 PM.
Old 03-19-2013, 11:36 PM   #7
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,626

Rep: Reputation: 2149
If I was going to try that, I'd use Perl with WWW::Mechanize and friends to do it.
Basically you'd be looking for the code that activates that '>>' button.
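
As a rough illustration of that link-hunting step, here is a sketch that uses Python's standard `html.parser` in place of Perl's WWW::Mechanize. It scans a results page for anchors whose visible text is "Cached" and returns their `href` values. The assumption that the target is a plain `<a>` element with that text is mine; the real results markup may differ or be generated by script. `LinkTextFinder` and `find_links_by_text` are hypothetical names.

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collects href values of <a> elements whose text equals wanted_text."""

    def __init__(self, wanted_text):
        super().__init__()
        self.wanted = wanted_text.strip().lower()
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments seen inside that anchor
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip().lower() == self.wanted:
                self.matches.append(self._href)
            self._href = None


def find_links_by_text(html, wanted_text="Cached"):
    """Return the hrefs of all links in html whose text matches wanted_text."""
    finder = LinkTextFinder(wanted_text)
    finder.feed(html)
    finder.close()
    return finder.matches
```

Feeding the HTML of a results page to `find_links_by_text(html)` gives the list of candidate cache links, which can then be fetched and searched. Anchors without an `href` are ignored, and the text comparison is case-insensitive.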

