LinuxQuestions.org
Old 12-15-2004, 11:47 AM   #1
dlublink
Member
 
Registered: Oct 2004
Location: Canada
Distribution: Ubuntu
Posts: 329

Rep: Reputation: 30
How does google work?


I noticed that Google.com claims to search about 8 billion pages.

How is it that when I click Search it takes less than a second to search all 8 billion pages?

Even if an entire page could be searched in one clock cycle (which it can't be), the fastest servers I have seen are 3 gigahertz. At one page per cycle, google.com would need 8 gigahertz to get through all 8 billion pages in a second.

Obviously it takes a lot more than 1 cycle to search a page, a lot more than 1. So how does google.com (or any other search engine) come up with such incredible power?
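
To put rough numbers on that, here is a back-of-the-envelope sketch. The page count and the 3 GHz clock come from the question above; the 10,000 cycles-per-page figure is a made-up assumption for illustration, not a measurement.

```python
PAGES = 8_000_000_000       # pages Google claims to search (from the post)
CLOCK_HZ = 3_000_000_000    # 3 GHz, the fastest server mentioned in the post


def scan_seconds(pages, cycles_per_page, clock_hz):
    """Seconds for a single CPU to scan every page one after another."""
    return pages * cycles_per_page / clock_hz


# Impossible best case: one page per clock cycle
best_case = scan_seconds(PAGES, 1, CLOCK_HZ)

# Hypothetical cost of 10,000 cycles per page (an assumed figure)
assumed = scan_seconds(PAGES, 10_000, CLOCK_HZ)

print(f"{best_case:.2f} s at 1 cycle per page")
print(f"{assumed / 3600:.1f} h at 10,000 cycles per page (assumed)")
```

Even the impossible best case takes more than two and a half seconds on one CPU, so a sub-second answer cannot come from one machine scanning every page per query.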

Does it have a crazy linked database where every word in the database has like a million links?

Curiosity has gotten the better of me.

Thanks,

David
 
Old 12-15-2004, 12:29 PM   #2
mermxx
Member
 
Registered: Apr 2004
Location: Wales
Distribution: rh9, winxp
Posts: 411

Rep: Reputation: 30
PIGEONS!!!! check out this link!!!!
http://www.google.com/technology/pigeonrank.html
 
Old 12-15-2004, 01:07 PM   #3
david_ross
Moderator
 
Registered: Mar 2003
Location: Scotland
Distribution: Slackware, RedHat, Debian
Posts: 12,047

Rep: Reputation: 64
Moved: This thread is more suitable in General and has been moved accordingly.
 
Old 12-15-2004, 02:40 PM   #4
ddu_
Member
 
Registered: Dec 2004
Distribution: Slackware
Posts: 114

Rep: Reputation: 16
Not sure if it's what you want, but I have a .pdf of the Google Cluster Architecture.

http://www.digitalcarnage.net/files/...chitecture.pdf
 
Old 12-16-2004, 12:29 AM   #5
dawizman
Member
 
Registered: Feb 2004
Distribution: Gentoo
Posts: 119

Rep: Reputation: 15
Google uses the power of clusters. Basically, they have hundreds, possibly thousands, of top-end servers with multiple processors linked together, essentially combining their processors into one. So, for example, if they only had, say, 200 computers clustered together, each with two 2 GHz processors, that would be 400 2 GHz processors, or 800 GHz of processing power. In reality, Google likely has far more servers, with more and faster processors per server. Anyway, that is just a basic explanation of how Google searches so fast. Hopefully another member will have a link to a better explanation.
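
A minimal sketch of the clustering idea, assuming the work is split so each machine searches only its own slice of the documents and the partial results are merged afterwards. Threads stand in for separate machines here, and the function names and sample documents are hypothetical, not anything Google has published.

```python
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard, term):
    """Return the ids of documents in one shard that contain the term."""
    return [doc_id for doc_id, text in shard if term in text.lower().split()]


def search_cluster(docs, term, workers):
    """Split docs into one shard per worker, search in parallel, merge hits."""
    # Round-robin split: worker i gets documents i, i + workers, ...
    shards = [docs[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = pool.map(search_shard, shards, [term] * workers)
        return sorted(doc_id for hits in partial for doc_id in hits)


docs = [
    (1, "Linux clusters at scale"),
    (2, "I like tacos"),
    (3, "Cheap Linux servers"),
    (4, "Pigeons rank pages"),
]
hits = search_cluster(docs, "linux", workers=2)
print(hits)  # [1, 3]
```

Adding up GHz is only a rough measure of what a cluster can do. The gain comes from each worker handling a smaller share of the data, which works when the job can be divided like this.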
 
Old 12-16-2004, 02:32 PM   #6
nex6
Member
 
Registered: Apr 2004
Distribution: Ubuntu;Debain;Redhat
Posts: 46

Rep: Reputation: 16
Google has one of the largest Linux farms anywhere. I think they run a customized version of Red Hat. With that much processing power, they can do pretty much anything they want.


http://www.internetweek.com/lead/lead060100.htm


-Nex6

Last edited by nex6; 12-16-2004 at 02:34 PM.
 
Old 12-16-2004, 02:53 PM   #7
Joey.Dale
Member
 
Registered: Jun 2003
Location: Tampa, Fl
Distribution: Gentoo, Slackware
Posts: 828

Rep: Reputation: 30
Google's cluster

* 719 racks
* 63,272 machines
* 126,544 CPUs
* 253,088 GHz of processing power
* 126,544 GB of RAM
* 5,062 TB of hard drive space

November 2004: 8,058,044,651 web pages, 880,000,000 images, 845,000,000 messages, 4,500 news sources
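
A quick sanity check on the cluster figures listed above, dividing the totals down to per-machine values. The only assumption added here is 1 TB = 1000 GB.

```python
# Totals as listed in the post
racks = 719
machines = 63_272
cpus = 126_544
ghz_total = 253_088
ram_gb = 126_544
disk_tb = 5_062

per_unit = {
    "machines per rack": machines / racks,
    "CPUs per machine": cpus / machines,
    "GHz per CPU": ghz_total / cpus,
    "GB RAM per machine": ram_gb / machines,
    "GB disk per machine": disk_tb * 1000 / machines,  # assumes 1 TB = 1000 GB
}

for label, value in per_unit.items():
    print(f"{label}: {value:.1f}")
```

The totals work out to 88 machines per rack, each with two 2 GHz CPUs, 2 GB of RAM and about 80 GB of disk. Numbers that round are what you would expect if the totals were multiplied up from an assumed per-machine configuration.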
 
Old 12-20-2004, 09:13 PM   #8
dlublink
Member
 
Registered: Oct 2004
Location: Canada
Distribution: Ubuntu
Posts: 329

Original Poster
Rep: Reputation: 30
Holy Crap

I am glad I am not paying that electricity bill....


Thanks guys! I knew that Google had some crazy clusters. But WOW! that is a lot of machines.

David
 
Old 12-21-2004, 12:13 PM   #9
slackist
Member
 
Registered: Feb 2004
Location: Phuket, Thailand
Distribution: Slackware 13, OS X
Posts: 434

Rep: Reputation: Disabled
Quote:
Originally posted by Joey.Dale
Google's cluster

* 719 racks
* 63,272 machines
* 126,544 CPUs
* 253,088 GHz of processing power
* 126,544 GB of RAM
* 5,062 TB of hard drive space

November 2004: 8,058,044,651 web pages, 880,000,000 images, 845,000,000 messages, 4,500 news sources
Wow, *very cool info*, where did you find it?

mark
 
Old 12-21-2004, 02:40 PM   #10
dawizman
Member
 
Registered: Feb 2004
Distribution: Gentoo
Posts: 119

Rep: Reputation: 15
Quote:
Originally posted by chefmark
Wow, *very cool info*, where did you find it?

mark
I've seen those figures around the net and AFAIK they are just estimates.
 
Old 12-21-2004, 02:47 PM   #11
leonscape
Senior Member
 
Registered: Aug 2003
Location: UK
Distribution: Debian SID / KDE 3.5
Posts: 2,313

Rep: Reputation: 47
I also presume that what those clusters are running is a searchable index of keywords in a generated database, not actually searching each and every page.
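
A minimal sketch of such a keyword index, usually called an inverted index: each word maps to the set of pages containing it, so a query only touches the entries for its own words. The page URLs and helper names are hypothetical, and a real search engine does far more (ranking, stemming, compression).

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start from the first word's pages, then intersect with the rest
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


pages = {
    "a.example/1": "Google runs Linux clusters",
    "a.example/2": "Linux on the desktop",
    "a.example/3": "Pigeons rank pages",
}
index = build_index(pages)
print(search(index, "linux clusters"))  # {'a.example/1'}
```

The index is built once, ahead of time. At query time the cost depends on how many pages match the query words, not on how many pages exist in total.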
 
Old 12-22-2004, 05:51 PM   #12
Joey.Dale
Member
 
Registered: Jun 2003
Location: Tampa, Fl
Distribution: Gentoo, Slackware
Posts: 828

Rep: Reputation: 30
http://en.wikipedia.org/wiki/Google

-Joey
 
Old 12-22-2004, 06:21 PM   #13
winword10
LQ Newbie
 
Registered: Sep 2003
Location: New Jersey, USA
Distribution: Gentoo
Posts: 7

Rep: Reputation: 0
I like tacos.
 
Old 12-23-2004, 01:45 AM   #14
dawizman
Member
 
Registered: Feb 2004
Distribution: Gentoo
Posts: 119

Rep: Reputation: 15
Quote:
Originally posted by Joey.Dale
http://en.wikipedia.org/wiki/Google

-Joey
Again, to clarify, if you read the info in that link, it says that those statistics are only an estimate of Google's cluster size. The only people who would know 100% would be Google employees.
 
Old 12-23-2004, 02:29 AM   #15
ror
Member
 
Registered: May 2004
Distribution: Ubuntu
Posts: 583

Rep: Reputation: 33
Re: How does google work?

Quote:
Originally posted by dlublink

Even if an entire page could be searched in one clock cycle (which it can't be), the fastest servers I have seen are 3 gigahertz. At one page per cycle, google.com would need 8 gigahertz to get through all 8 billion pages in a second.
afaik (and this is both rough and from memory)
Thankfully, that's not how it works. Pages don't get searched when you search; instead, when pages are added, hashes of strings on the page are stored, and that's what is searched. Searching actual pages would be far too slow (impossibly slow), although it would mean wildcard searches would be possible, something Google (and other search engines) can't do.
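
A rough sketch of the idea as described above: hash each word when a page is added, and look the hash up at query time. Whether any real search engine stores hashes this way is an assumption of the sketch; the SHA-1 choice, the function names, and the sample pages are all hypothetical.

```python
import hashlib


def key(word):
    """Hash a lowercase word; only this hash is stored in the index."""
    return hashlib.sha1(word.lower().encode("utf-8")).hexdigest()


def add_page(index, url, text):
    """Record the page under the hash of every distinct word it contains."""
    for word in set(text.lower().split()):
        index.setdefault(key(word), set()).add(url)


def lookup(index, word):
    """Exact-match lookup: hash the query word and fetch its pages."""
    return index.get(key(word), set())


index = {}
add_page(index, "a.example/1", "Linux clusters")
add_page(index, "a.example/2", "Linus wrote Linux")

exact = lookup(index, "linux")     # both pages
wildcard = lookup(index, "lin*")   # empty: the hash of "lin*" matches nothing
print(exact, wildcard)
```

The hash of "lin*" has no relation to the hashes of "linux" or "linus", which is why an index keyed only on whole-word hashes cannot answer a wildcard query without extra structures.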
 
  

