What to check after a server crash?
Our Unix server, which runs the firewall, NAT, sendmail, procmail, squid, samba, named, etc. for about 50 clients, crashed twice yesterday after about 45 days of uninterrupted uptime.
An uptime of 45 days is sort of OK, but two crashes on the same day annoy me.
The crashes were total: the only way to restart the server was the RESET button (it gave me that good old Windows feeling).
Afterwards, I found no error messages in /var/log/messages, in root's mail, or in any other log file under /var/log that could shed light on what caused the two crashes. Running fsck, I also found the file systems to be clean.
While the server was down, there were no error messages on the screen either. The only strange thing was some extra text shown after the 'login: ' prompt: a complete sentence, but not an error message of any kind.
Could you please give me some hints on which logging to turn on and which log files to check, so that I can trace down any future crashes, or simply monitor free system resources?
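For the "monitor free system resources" part, a minimal sketch might look like the script below. It assumes a Linux kernel that exposes /proc/meminfo and /proc/loadavg. The log path /var/log/resource-monitor.log and the 60-second interval are hypothetical choices, not an existing tool or convention. Every sample is written with fsync so that the last lines before a hard reset have a good chance of actually reaching the disk. That is the point: syslog may buffer, so an empty /var/log/messages after a crash does not prove nothing was happening.

```python
import os
import time

# Hypothetical output file and sampling interval; adjust to taste.
LOG_PATH = "/var/log/resource-monitor.log"
INTERVAL_SECONDS = 60


def parse_meminfo(text):
    """Return /proc/meminfo fields as {name: int}, e.g. {'MemFree': 512} (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        # Keep only lines whose first value is a plain number.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def sample():
    """Build one timestamped log line from the current /proc contents."""
    with open("/proc/meminfo") as f:
        mem = parse_meminfo(f.read())
    with open("/proc/loadavg") as f:
        load = parse_loadavg(f.read())
    return "%s load=%.2f/%.2f/%.2f MemFree=%dkB SwapFree=%dkB" % (
        time.strftime("%Y-%m-%d %H:%M:%S"),
        load[0], load[1], load[2],
        mem.get("MemFree", -1), mem.get("SwapFree", -1),
    )


def main():
    while True:
        line = sample()
        with open(LOG_PATH, "a") as log:
            log.write(line + "\n")
            log.flush()
            # Force the line onto the disk so it survives a RESET.
            os.fsync(log.fileno())
        time.sleep(INTERVAL_SECONDS)


if __name__ == "__main__":
    main()
```

Started in the background as root (e.g. `python monitor.py &`), the last few lines of the log after the next crash would show whether free memory or swap was running out, or the load was climbing, right before the machine froze.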
Last edited by J_Szucs; 02-25-2003 at 10:56 PM.