LinuxQuestions.org
Old 03-18-2013, 12:57 PM   #1
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 539

Rep: Reputation: 51
Searching Cache


So, this isn't exactly a Linux-specific question, but I am looking for some info.

Does anyone know if it's possible to search the cache of the bigger engines like Google, Bing, or Yahoo? I can see how access to the cache could be sold as a service to, say, the Feds, but can the public get access?

I am working on an issue where some cached pages might contain data that would be a security issue for my customer.
 
Old 03-19-2013, 09:15 AM   #2
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 3,916

Rep: Reputation: 777
I'm not sure what you think is in, e.g., Google's cache, but it may not work the way you think it does.

In any case, to the extent that Google caches things that you are interested in, Google has access to that information. Now, if the question is 'Can some outsider break in to Google and get access to stuff that Google didn't intend them to?', then Google would tell you about all of their measures to make this impossible, but if you consider this a very serious outcome, you'd have to say that there can be no guarantee that it will never happen.

For most people there are bigger risks than this, but, if you were very sensitive about this particular issue, then you have a problem.

The one case that I can think of offhand where this kind of thing happened didn't involve a search engine.
 
Old 03-19-2013, 12:38 PM   #3
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 539

Original Poster
Rep: Reputation: 51
OK, I know how Google's cache works. I can query the cache for a specific page to see what that cached page looks like, and this is open to the public. I want to search the public cache (query it). It's easier to query than it is for me to build a list of URLs, pull those in via PHP, and then search using regex.

The customer may have leaked some data and has since changed their HTML, but the engines' caches may still hold copies of pages that contain this data.

Does this make it clearer?
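
For the "search using regex" part, here is a minimal sketch in Python rather than PHP. It assumes you already have a page's HTML as a string; the names `TextExtractor` and `find_pattern` and the `ACCT-` pattern are made up for illustration and do not come from the thread.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping script and style bodies."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def find_pattern(html, pattern):
    """Return every regex match found in the text content of an HTML string."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    # Collapse runs of whitespace so matches are not split by markup layout.
    text = " ".join(" ".join(extractor.parts).split())
    return re.findall(pattern, text)


# Hypothetical leaked-data pattern: account IDs shaped like ACCT-123456.
sample = (
    "<html><body><p>Contact: ACCT-204817</p>"
    "<script>var x = 'ACCT-000000';</script></body></html>"
)
print(find_pattern(sample, r"ACCT-\d{6}"))
```

Searching the extracted text instead of the raw markup keeps tag attributes and inline scripts from producing false hits, at the cost of missing data that only appears inside attributes.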
 
Old 03-19-2013, 01:06 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,527

Rep: Reputation: 872
Doesn't a normal search query the cache? I mean the whole point of caching is to make searches go faster, right?
 
Old 03-19-2013, 07:07 PM   #5
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,561
Blog Entries: 54

Rep: Reputation: 2927
As others said, a regular search does "query the cache", and AFAIK Google doesn't provide an API for bulk cache querying.
 
Old 03-19-2013, 11:52 PM   #6
Linux_Kidd
Member
 
Registered: Jan 2006
Location: USA
Posts: 539

Original Poster
Rep: Reputation: 51
Nope. I didn't think I would have to explain this. Google's cache is a second, older copy of web pages.

As far as I can tell, Google's cache is not something you can query using Google search operators, hence my original question.

Let me explain how the cache works.
  1. Go to Google in Firefox
  2. In the search field, type site:www.cccg.net/missions.html and click Search
  3. Put the mouse over the result (not web tools)
  4. You now see a double-arrow icon to the right; put your mouse over that
  5. Look to the right and you will see a "cached" link; click that
  6. Or you can simply use cache:www.cccg.net/missions.html directly in a Google search
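
The operator strings from steps 2 and 6 can be generated for a whole list of pages. This is only a sketch in Python: `strip_scheme`, `site_query`, and `cache_query` are hypothetical helper names, and it only builds the query text, it does not submit anything to a search engine.

```python
from urllib.parse import urlsplit


def strip_scheme(url):
    """Drop a leading http:// or https:// so the operator gets host + path."""
    parts = urlsplit(url if "://" in url else "//" + url)
    rest = parts.netloc + parts.path
    if parts.query:
        rest += "?" + parts.query
    return rest


def site_query(url):
    """Build the query used in step 2, e.g. site:www.cccg.net/missions.html."""
    return "site:" + strip_scheme(url)


def cache_query(url):
    """Build the query used in step 6, e.g. cache:www.cccg.net/missions.html."""
    return "cache:" + strip_scheme(url)


pages = ["http://www.cccg.net/missions.html", "www.cccg.net/missions.html"]
for page in pages:
    print(site_query(page), "|", cache_query(page))
```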

So now you know what the cache is. I am looking for a way to use the engine operators to find specific data in cached pages. We, the public, have access to the latest cached page; can you imagine how many copies Google has? Do you see how this could be useful to, say, the Feds or local law enforcement? You change your public Facebook stuff thinking it's gone, yet Google has every change you made.

I just need to query for a specific data pattern in the cache that is available to the public. I am thinking I need to build a URI list, use PHP to pull those pages from Google's cache, and then grep the page content for my pattern.
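
The "URI list, pull, grep" plan could look roughly like the sketch below, written in Python instead of PHP. Everything named here is an assumption: `scan_cached_pages`, `http_fetch`, and especially the cache URL template, which is a placeholder (`cache.example.invalid`) because the thread does not give a real one. The fetcher is passed in as a parameter so the loop can be tried offline with a fake one, as the demonstration does.

```python
import re
import urllib.request
from urllib.parse import quote


def http_fetch(url, timeout=10):
    """Default fetcher: plain HTTP GET, decoded leniently as UTF-8."""
    request = urllib.request.Request(url, headers={"User-Agent": "cache-audit-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def scan_cached_pages(targets, pattern, cache_template, fetch=http_fetch):
    """Map each target URI to the pattern matches found in its cached copy.

    Targets whose cached copy cannot be fetched map to None.
    """
    compiled = re.compile(pattern)
    results = {}
    for target in targets:
        cache_url = cache_template.format(target=quote(target, safe=""))
        try:
            page = fetch(cache_url)
        except OSError:  # urllib's URLError and HTTPError are OSError subclasses
            results[target] = None
            continue
        results[target] = compiled.findall(page)
    return results


# Offline demonstration: a fake fetcher stands in for the network.
FAKE_CACHE = {
    "https://cache.example.invalid/?q=www.example.com%2Fa.html": "<p>key=SECRET-1234</p>",
    "https://cache.example.invalid/?q=www.example.com%2Fb.html": "<p>nothing here</p>",
}


def fake_fetch(url):
    if url not in FAKE_CACHE:
        raise OSError("not cached")
    return FAKE_CACHE[url]


report = scan_cached_pages(
    ["www.example.com/a.html", "www.example.com/b.html", "www.example.com/c.html"],
    r"SECRET-\d{4}",
    "https://cache.example.invalid/?q={target}",  # placeholder template
    fetch=fake_fetch,
)
for target, hits in report.items():
    print(target, hits)
```

A result of `None` means the cached copy could not be retrieved, which is worth keeping separate from an empty list (retrieved, no match) when auditing for leaked data.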

Last edited by Linux_Kidd; 03-22-2013 at 04:42 PM.
 
Old 03-20-2013, 12:36 AM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,311

Rep: Reputation: 2040
If I were going to try that, I'd use Perl with WWW::Mechanize and friends http://search.cpan.org/search?query=...anize&mode=all to do it.
Basically you'd be looking for the code that activates that '>>' button.
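
As a rough illustration of "looking for the code that activates that button", the sketch below pulls out the links whose visible text is "Cached" from a page of HTML. It uses Python's standard `html.parser` rather than Perl and WWW::Mechanize, and the sample markup is invented; the real results page may be structured differently or built by JavaScript, in which case this would not find anything.

```python
from html.parser import HTMLParser


class LinkTextFinder(HTMLParser):
    """Collects href values of <a> elements whose text equals a given label."""

    def __init__(self, label):
        super().__init__()
        self.label = label.lower()
        self.hrefs = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            if "".join(self._text).strip().lower() == self.label:
                self.hrefs.append(self._href)
            self._href = None


def find_links(html, label):
    """Return the href of every link whose text matches label, ignoring case."""
    finder = LinkTextFinder(label)
    finder.feed(html)
    finder.close()
    return finder.hrefs


# Invented markup standing in for one search result.
sample = """
<div class="result">
  <a href="http://www.example.com/page.html">Example page</a>
  <a href="http://cache.example.invalid/page">Cached</a>
</div>
"""
print(find_links(sample, "cached"))
```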
 
  


All times are GMT -5. The time now is 01:28 AM.
