My name is Scott, and I am starting this thread so that P.L.U.G. will be represented here at LQ. The 'official' website is here: http://plug.phoenix.az.us/index.php
It is a great group of people who are all friendly and open to newcomers. Recently, two gentlemen from Google gave a presentation at our 'east side' meeting, and I wrote an article about it for LXer.com.
Here is a copy of that article for you to read.
I am a member of the Phoenix Linux Users Group, or PLUG, and at our last meeting two Google engineers, Vince and Pat, gave a presentation on how Linux is used at Google. They explained what Linux is used for and described many of the challenges they have faced in pushing the Linux envelope.
While many Google engineers use Linux, only a few of them have much experience with it. When asked, though, they said that "not many" Windows machines are used by the engineers.
Managing the Network
The setup of their internal network is similar in concept to a college computer lab. However, while most college computer labs have just one location, or at most a few that are close to each other on the same campus, Google has employees all over the world, and they are all on the same network. Engineers who live on different continents work together on projects using the Google Enterprise Network.
All of the computers on the network receive an automatic installation, and all of the resources are remotely mounted. Network file systems such as AFS, CIFS and NFS are managed by administrators. The machines get a 'custom' Debian install with Red Hat Kickstart. Updates are automatic, and every computer must 'call home' daily with a status report that goes into a central repository that tracks 'out-of-date' machines. This allows the admins to quickly look up which machines need updating and exactly what each one needs.
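To make the 'call home' idea concrete, here is a minimal Python sketch of what such a central repository might do. This is not Google's system; the class names, report format, package names and version strings are all made up for illustration. It assumes each machine sends its hostname and installed package versions once a day, and the repository compares them against the versions the admins want everywhere.

```python
# Hypothetical model of the daily "call home" reports (names and formats are made up).
from dataclasses import dataclass, field
from datetime import date


@dataclass
class StatusReport:
    hostname: str
    day: date                 # day the report was received
    packages: dict[str, str]  # package name -> installed version


@dataclass
class Repository:
    expected: dict[str, str]  # package name -> version the admins want everywhere
    latest: dict[str, StatusReport] = field(default_factory=dict)

    def record(self, report: StatusReport) -> None:
        # Keep only the newest report from each machine.
        self.latest[report.hostname] = report

    def out_of_date(self) -> dict[str, dict[str, str]]:
        """Map each machine to the packages whose installed version differs from the target."""
        needs = {}
        for host, report in sorted(self.latest.items()):
            wanted = {pkg: ver for pkg, ver in self.expected.items()
                      if report.packages.get(pkg) != ver}
            if wanted:
                needs[host] = wanted
        return needs

    def silent_since(self, today: date) -> list[str]:
        """Machines whose latest report is older than today, i.e. that missed their check-in."""
        return sorted(h for h, r in self.latest.items() if r.day < today)


# Example with made-up hosts and versions.
repo = Repository(expected={"kernel": "2.6.20", "openssh-server": "4.3"})
repo.record(StatusReport("ws1", date(2007, 3, 2), {"kernel": "2.6.20", "openssh-server": "4.3"}))
repo.record(StatusReport("ws2", date(2007, 3, 1), {"kernel": "2.6.18", "openssh-server": "4.3"}))
print(repo.out_of_date())                   # {'ws2': {'kernel': '2.6.20'}}
print(repo.silent_since(date(2007, 3, 2)))  # ['ws2']
```

The point is that the admin never has to log in to a machine to find out what it needs: the answer to "what is out of date, and where?" comes straight from the stored reports.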
To manage the update process, they have a 'Test' network that most Google employees do not have access to. Vince explained that because they use so much ultra-modern equipment, they make sure to always run the latest version of the kernel. Bugs still turn up, but testing updates before rolling them out to the whole network means much less work down the road.
NFSv3 is not very secure because it uses the Sun RPC layer with AUTH_SYS authentication, in which the server simply trusts the user and group IDs that the client claims. The client sends a list of the user's groups along with each request. With NFSv4, they were able to add security by creating a kernel 'oops' when the client's Kerberos tickets expire. They use the rpc.gssd daemon to close and re-open all of the kernel pipes, thus triggering the 'oops' from the kernel.
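For anyone wondering what the 'list of groups' means in practice: an AUTH_SYS credential, as defined in RFC 5531, carries a UID, a primary GID and at most 16 supplementary GIDs, and nothing in it proves the client is telling the truth. The short Python sketch below models that limit. The function name and sample GIDs are made up for illustration, and it assumes a client that simply sends the first 16 groups and leaves the rest out, which as far as I know is what the Linux client does.

```python
# AUTH_SYS (RFC 5531) credential: stamp, machinename, uid, gid, and gids<16>.
AUTH_SYS_MAX_GIDS = 16


def auth_sys_credential(uid, gid, groups):
    """Hypothetical helper: build the identity an NFSv3 client asserts.

    Returns (credential, dropped), where dropped lists the supplementary
    groups that do not fit in the 16 slots and are never sent to the server.
    """
    sent = list(groups[:AUTH_SYS_MAX_GIDS])
    dropped = list(groups[AUTH_SYS_MAX_GIDS:])
    return {"uid": uid, "gid": gid, "gids": sent}, dropped


# Illustrative GIDs only: five local device groups plus fourteen project groups.
device_groups = [24, 25, 29, 44, 46]
project_groups = list(range(5000, 5014))
cred, dropped = auth_sys_credential(1000, 1000, device_groups + project_groups)
print(len(cred["gids"]), dropped)  # 16 [5011, 5012, 5013]
```

Every local group a user is added to uses up one of those 16 slots, so project groups further down the list can silently stop working on NFS.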
Refreshing Kerberos tickets took a little time to figure out. Modifying pam_krb5 to refresh tickets automatically let applications know that the home directory could go away, reminding them to back up. Even with that done, though, Kerberos tickets are hard-coded to the /tmp directory and are vulnerable to physical attack.
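The /tmp issue comes from where the credential cache lives. The classic MIT and Heimdal default is a file named /tmp/krb5cc_ followed by the user's UID, unless the KRB5CCNAME environment variable points somewhere else. Here is a small Python sketch of that lookup. The function names are hypothetical, and it deliberately ignores the default_ccache_name setting in krb5.conf, which newer MIT setups use to move the cache out of /tmp.

```python
import os


def default_ccache(uid: int, env: dict[str, str]) -> str:
    """Hypothetical helper: the cache name a Kerberos client would fall back to.

    KRB5CCNAME wins if set; otherwise use the traditional file in /tmp.
    (krb5.conf's default_ccache_name is ignored in this sketch.)
    """
    return env.get("KRB5CCNAME", f"FILE:/tmp/krb5cc_{uid}")


def is_shared_tmp_file(ccache: str) -> bool:
    """True if the cache is a plain file under /tmp, i.e. sitting on the local disk."""
    kind, _, path = ccache.partition(":")
    if not path:  # a bare path with no "TYPE:" prefix is treated as a file cache
        kind, path = "FILE", ccache
    return kind == "FILE" and os.path.normpath(path).startswith("/tmp/")


print(default_ccache(1000, {}))                      # FILE:/tmp/krb5cc_1000
print(is_shared_tmp_file(default_ccache(1000, {})))  # True
```

Anyone who can read that file, for example by pulling the disk out of the machine, can reuse the tickets until they expire.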
Users logging on and newly created users have to wait longer because the system has to read the entire user database. Package installation is slow, but most of their software is installed remotely. When Vince brought that one up, we all looked at him, said “join the club” and laughed.
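One reason creating a user can mean reading the whole database: a tool that hands out the lowest unused UID in a range has to know every UID already taken, and when accounts live in a large network directory, that means fetching all of them. Shell completion of ~username has a similar problem, since it has to enumerate every account. Below is a minimal Python sketch of that allocation policy. The function names are hypothetical, and the 1000 to 59999 range is an assumption modeled on Debian's adduser defaults.

```python
from typing import Iterable, Iterator, Optional


def first_free_uid(existing_uids: Iterable[int],
                   first: int = 1000, last: int = 59999) -> Optional[int]:
    """Return the lowest UID in [first, last] that nobody uses yet, or None."""
    taken = set(existing_uids)  # this step consumes the *entire* user database
    for uid in range(first, last + 1):
        if uid not in taken:
            return uid
    return None


def all_uids() -> Iterator[int]:
    """Enumerate every account the system knows about (local files, NIS, LDAP...)."""
    import pwd  # Unix-only module; imported lazily so the sketch loads anywhere
    return (entry.pw_uid for entry in pwd.getpwall())


print(first_free_uid([0, 1000, 1001, 1003]))  # 1002
```

On a real machine you would call first_free_uid(all_uids()), and the cost of that call grows with the number of accounts in the directory, not with the number of users on the box.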
They also came up with what they described as an 'ugly hack' for POSIX: it makes glibc use the local cache (nscd), but it is buggy and cannot help with the initial hit. They have two options for giving users access to local devices:
1. Red Hat pam_console, which gives the user access but does not support more than one user on a machine.
2. Debian groups, which add the user to many groups and get them to the NFS quickly.
Neither of them is as secure as Google would like.
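Why a local cache like nscd "cannot help with the initial hit" is easy to see in a toy model. The Python sketch below is not nscd. It is a hypothetical time-to-live cache in front of a slow directory lookup, and the TTL value and names are made up. The first lookup for any name always has to go to the directory; only repeat lookups inside the TTL are answered locally.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Toy stand-in for a local name-service cache (illustrative only)."""

    def __init__(self, backend: Callable[[str], Optional[dict]], ttl: float = 600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.backend = backend      # the slow lookup, e.g. a network directory
        self.ttl = ttl              # seconds an answer stays valid
        self.clock = clock
        self.entries: Dict[str, Tuple[float, Optional[dict]]] = {}
        self.backend_calls = 0

    def lookup(self, name: str) -> Optional[dict]:
        now = self.clock()
        hit = self.entries.get(name)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]           # warm: answered locally
        self.backend_calls += 1     # cold or expired: must ask the directory
        value = self.backend(name)
        self.entries[name] = (now, value)
        return value


# Example with a fake clock so the timing is predictable.
now = [0.0]

def directory(name: str) -> dict:
    return {"name": name}           # stand-in for a slow NIS/LDAP query

cache = TTLCache(directory, ttl=600.0, clock=lambda: now[0])
cache.lookup("scott")               # initial hit: goes to the directory
cache.lookup("scott")               # served from the cache
now[0] = 601.0
cache.lookup("scott")               # expired: goes to the directory again
print(cache.backend_calls)          # 2
```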
As you can see, because of the unique shape and purpose of Google's network, they have pushed and pulled on parts of Linux in ways that were never dreamed of. Hopefully they will figure it all out and tell us how they did it along the way.
Vince was kind enough to inform me of some technical errors of mine.
Here are his corrections:
"Our Debian installs use the Debian preseed, RH installs use kickstart, the kernel "oops" is actually a bug that we spotted and reported to the Linux NFS maintainer, who was able to fix it with our help. rpc.gssd was patched to _not_ close/re-open all the pipes, which would trigger the oops. pam_krb5 was modified to refresh tickets on screensaver unlock, not to notify when the tickets would expire. The problem of reading the entire passwd database affects new user creation via "adduser", and bash ~home dir expansion. The nscd "hack" for glibc pre-dates Google, and I believe it's modeled after other Unixes. The Debian group access model pollutes the NFSv3 16-group limit."
Thanks Vince :-)
If anyone from P.L.U.G. finds this thread, please participate! This site has given me more information than any other single source. I stuck with Linux because of this site, and this thread is my way of trying to 'give back'. If this thread helps just one person who has yet to find someone else to talk to about Linux, then I am a happy man.