nfs-krb mounted home folder - Client hangs at login
I'm the sysadmin at a school that runs Ubuntu 10.04 on all of our clients and on our server. We're using Kerberos for authentication and NFS to mount our users' home directories.
My problem is that sometimes the login just stops, showing only the default wallpaper. Checking with top I found that rpc.gssd is constantly using around 20% of the CPU and gdm-session-worker about 15% when this error occurs. If I restart the computer everything works fine for a while and then the problem comes back again.
While troubleshooting the issue I have gathered the following information:
Logging in from a virtual console also hangs. After a while the message "gdm-session-worker blocked for more than 120 seconds" appears.
Logging in as root (which works) and then typing login username hangs. su username also hangs.
su username -s /bin/sh, on the other hand, does not hang. It gives me a prompt. I can ls the files in my home directory, but the session hangs if I try to cat or touch a file. It also hangs if I cat something random that doesn't exist.
The problem is hard to troubleshoot since I haven't found any way to reproduce it. I just keep getting reports that people can't login. I've just been "lucky" a few times to stumble upon the issue myself.
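Since the hang shows up when looking up a file that doesn't exist, one way to catch it without waiting for a report might be a small probe run from cron or by hand as root. This is only a rough sketch: probe_lookup is a made-up helper name, the 5 second default is arbitrary, and it assumes GNU timeout is installed on the clients.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: probe_lookup DIR [SECONDS]
# Looks up a name that should not exist inside DIR, mirroring the
# "cat something random that doesn't exist" symptom.
# Returns 0 if the lookup finishes in time (whether or not the name exists),
# non-zero if timeout had to kill the probe.
probe_lookup() {
    dir=$1
    limit=${2:-5}
    # KILL is sent so the probe shell cannot catch or ignore the signal.
    # "|| true" makes a plain "no such file" count as a healthy answer.
    timeout -s KILL "$limit" sh -c '[ -e "$1" ] || true' sh "$dir/.probe-$$"
}
```

For example, `probe_lookup /home/username 10 || logger "home directory lookup stalled"` would leave a line in syslog when the lookup stalls. If the kernel holds the probe in uninterruptible sleep, the probe itself may stay stuck, so it is a hint rather than a guarantee.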
At first I thought the problem was Kerberos related but Kerberos seems to be working fine. The clients get their tickets and they do renew as they're supposed to.
So I'm just throwing this out here hoping someone has an idea of where to go next. :)
Two suggestions that I can think of are:
1. Use autofs in your existing environment, so that the mount is dropped when a client has not used the share (home directory) for a set period of time.
2. Share the home directories with Samba+LDAP instead.
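For the first suggestion, the autofs maps could look roughly like the fragment below. Treat it as a sketch under assumptions: fileserver.example.org, the export path, the NFSv4 mount type and the 120 second timeout are placeholders, not values from this thread, so adjust them to match your server.

```
# /etc/auto.master (fragment)
# Mount home directories on demand; unmount after 120 idle seconds.
/home  /etc/auto.home  --timeout=120

# /etc/auto.home
# "*" matches the requested directory name and "&" substitutes it.
# fileserver.example.org and the export path are placeholders.
*  -fstype=nfs4,sec=krb5  fileserver.example.org:/home/&
```

After changing the maps, autofs has to be restarted, and any static /home entry in /etc/fstab has to be removed so the two don't conflict.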
I'm trying the autofs approach now. So far it's working. I'll have to wait a bit to be sure, though.
Let us know the results for sure :-)
Okay guys, sorry for the delay.
These are my findings so far:
Using autofs reduced the frequency of lockups, but the problem still occurred. I have since found a small, dirty workaround that seems to work.
What I've done is add these lines to the file /etc/gdm/PostLogin/Default