SuSE 9.3: logins and console hang, but pings and Oracle database access OK
We have a login problem with our SuSE 9.3 Linux server, kernel = 2.6.5-7.287.3-bigsmp
All logins hang after an arbitrary period of normal operation. When this happens we can't even log in at the console, and the only way to regain control is to hard reboot the server. Logins work for anywhere from a couple of days to a week or two between hangs, most typically a little over one week.
This problem only started after upgrading Oracle from 9.2.0.5 to 10.2.0.3. We have two instances running, with SGA = 336 MB each. The server RAM is 4 GB. A "show sga" for either instance displays:
Total System Global Area 352321536 bytes
Fixed Size 1261780 bytes
Variable Size 109055788 bytes
Database Buffers 234881024 bytes
Redo Buffers 7122944 bytes
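As a quick sanity check on the figures above, here is a minimal Python sketch that adds up the four SGA components and compares the result with the reported total and with the 4 GB of server RAM. The variable names are just for illustration; the numbers are copied from the "show sga" output.

```python
# SGA components as reported by "show sga", in bytes
components = {
    "Fixed Size": 1261780,
    "Variable Size": 109055788,
    "Database Buffers": 234881024,
    "Redo Buffers": 7122944,
}
reported_total = 352321536  # "Total System Global Area"

MB = 1024 * 1024

# The components should add up to the reported total
computed_total = sum(components.values())
print(computed_total == reported_total)

# Size of one SGA in MB, and of both instances together
sga_mb = reported_total // MB
both_mb = 2 * sga_mb
ram_mb = 4 * 1024  # 4 GB server RAM
print(sga_mb, both_mb, ram_mb)
```

The components do sum to the reported total, which is exactly 336 MB, so the two SGAs together account for 672 MB of the 4096 MB of RAM.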
The sshd version:
OpenSSH_4.1p1, OpenSSL 0.9.7d (dated 17 March 2004)
We turned off IPv6 and left IPv4 running because at least one Google entry said that fixed their problem. It did not fix ours.
A root cron job that was supposed to restart sshd and write a log file wouldn't run, presumably because cron tries to open a tty, which is hung.
We have a root cron job to append "top" session output to a log file every 5 minutes. Just prior to the login hang, the users "novlwww" and "wwwrun" show in the top output. Just after the hang, only their uid numbers show, not their user names. These users control Tomcat and apache2, among other applications. This has to be a clue, but what?
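One possible reading of the top output is that uid-to-name lookups stop working around the time of the hang. That is only a guess. A rough way to test it would be to log the lookup result alongside the top output, as in the Python sketch below. It assumes a Python interpreter is available on the server; the function names are made up for this example, and the real uids of novlwww and wwwrun would have to be substituted for the placeholder list.

```python
import pwd
import time


def describe_uid(uid):
    """Return the user name for uid, or None if the lookup finds no entry."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def log_line(uids):
    """Build one timestamped line showing how each uid currently resolves."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    parts = []
    for uid in uids:
        name = describe_uid(uid)
        if name is None:
            name = "UNRESOLVED"
        parts.append("%d=%s" % (uid, name))
    return stamp + " " + " ".join(parts)


if __name__ == "__main__":
    # Placeholder list: uid 0 only. Replace with the uids of the
    # accounts of interest (hypothetical choice for this sketch).
    print(log_line([0]))
```

Run from the same cron job that captures top, with the output appended to a file, this would show whether name lookups start failing before logins hang. If the script itself stops producing lines, that would suggest the lookup is blocking rather than returning an error.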
Does anyone have any suggestions as to what could be wrong?
Last edited by ponsolo; 03-14-2008 at 05:32 PM.