LinuxQuestions.org

Troubleshooting (http://www.linuxquestions.org/questions/fedora-35/troubleshooting-745143/)

kir2u 08-04-2009 07:33 PM

Hello,
I've had this Fedora box set up for me to use as a mail server. It's running sendmail, and I'm using pop3d so that users can check their email.

The problem is that for the past 3-4 days the server has kept crashing at around 10am. I'd like to know how to go about troubleshooting this. Where are the logs I can look at? /var/log doesn't seem to have the proper logs.

Thank you

Just an update.

I checked the "messages" and "secure" files in /var/log, and all I see is someone running a brute-force attack on my SSH port, trying different users and failing.

unSpawn 08-04-2009 08:55 PM

Quote:

Originally Posted by kir2u (Post 3631564)
The problem is that for the past 3-4 days the server has kept crashing at around 10am. I'd like to know how to go about troubleshooting this. Where are the logs I can look at? /var/log doesn't seem to have the proper logs.

Crashing how? Does it reboot spontaneously, or do you have to reboot it? Does it show errors on the console or when you log in? When a machine reboots unexpectedly, reading back the /var/log/messages lines from around the time of the reboot might reveal information about processes that ran or errored out. Also check at what time logrotate kicks in (/etc/crontab) and with what configuration (/etc/logrotate.d/syslog) so you know whether you also need to read back archived copies of /var/log/messages. Since 10AM sounds too regular, I'd check (copies of) /var/log/cron and root's crontab (/var/spool/cron/root) as well. If none of the logs reveal clues around the time of the reboot, you might want to start logging more information by tweaking what gets logged in /etc/syslog.conf (e.g.: '*.debug -/var/log/debug'), running SMART checks, and collecting system statistics with Atop, Dstat or Collectl.
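
A minimal sketch of reading back only the log lines from around the suspect time. It assumes the traditional syslog layout ("Mon DD HH:MM:SS host ..."), where the third whitespace-separated field is the time; the function name log_window and the 09:45 to 10:15 window are illustrative choices, not anything from the original posts.

```shell
#!/bin/sh
# Print syslog lines whose time of day falls inside a window.
# Assumption: traditional syslog layout, so field 3 is HH:MM:SS.
log_window() {
    # $1 = start (HH:MM), $2 = end (HH:MM), $3 = log file
    awk -v start="$1" -v end="$2" '
        { t = substr($3, 1, 5) }      # keep only HH:MM
        t >= start && t <= end        # string comparison works for zero-padded times
    ' "$3"
}

# When run as a script with three arguments, filter that file directly,
# e.g.: sh log_window.sh 09:45 10:15 /var/log/messages
if [ "$#" -eq 3 ]; then
    log_window "$@"
fi
```

The same call can be repeated for rotated copies of /var/log/messages and for /var/log/cron. Note that the filter matches the time of day on every date in the file, which is useful here because the problem recurs daily.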

kir2u 08-04-2009 11:11 PM

I don't think it's an actual reboot. It just hangs somehow: I stop getting my mail and can't SSH to the box, so I have to manually restart the server to get back into it.

kir2u 08-04-2009 11:18 PM

Another thing I see is:

error: stat of /var/log/ppp/connect-errors failed: No such file or directory

when I run: logrotate /etc/logrotate.conf

Could this be it? It's in the daily cron folder.

chrism01 08-05-2009 01:46 AM

That probably(??) shouldn't cause as much trouble as you're having, but it's definitely worth fixing.
Have a good look through your logfiles for anything at that time of day or just before.
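
A small sketch for tracking down the logrotate error. It looks for the config file that references the missing path; the helper name find_logrotate_refs is made up for illustration, and the config locations in the comments are the usual ones rather than anything confirmed for this box.

```shell
#!/bin/sh
# List the logrotate config files that mention a given log path.
# $1 = path to look for; remaining arguments = config files or directories.
find_logrotate_refs() {
    pattern=$1
    shift
    # -r: recurse into directories, -l: print file names only, -F: fixed string
    grep -rlF -- "$pattern" "$@" 2>/dev/null
}

# Typical use (assumed default config locations):
#   find_logrotate_refs /var/log/ppp/connect-errors /etc/logrotate.conf /etc/logrotate.d
#   logrotate -d /etc/logrotate.conf    # -d: debug mode, a dry run that changes nothing
```

Once the stanza is found, adding the missingok directive to it, or removing the stanza if ppp is not used, should silence the message.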

unSpawn 08-05-2009 06:18 AM

Quote:

Originally Posted by kir2u (Post 3631693)
I don't think it's an actual reboot. It just hangs somehow: I stop getting my mail and can't SSH to the box, so I have to manually restart the server to get back into it.

Depending on hardware specs and load, a machine may appear to hang for some period of time, but since you did not post details showing that it actually crashed, that's just speculation. If "manually restart the server" means hard resetting the machine, then you may expect all sorts of problems. Filesystems are quite robust, but they were not designed to survive repeated, deliberate power cuts like that. Like Chrism01 said, the missing /var/log/ppp/connect-errors is not going to make the machine hang. I think I gave you enough pointers to get started, so do get back to us in more detail about which logs you looked at and what you found.

markseger 08-06-2009 08:49 AM

One thing I've seen on rare occasions is system hangs caused by flaky hardware or a high-priority process that takes over the system; both show up as gaps in collectl data. In other words, when collectl runs as a daemon taking samples every 10 seconds, the samples are exactly 10 seconds apart, to within a msec, with virtually no missed samples. If some piece of hardware misbehaves or some very high priority process such as the 'oom killer' takes over the system, no other process will get any run time until it finishes. This will show up as a few missing collectl samples, and sometimes as much as several minutes' worth.
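
The gap idea can be sketched independently of collectl. The snippet below assumes the sample times have already been exported as one UNIX epoch timestamp per line; that file format and the function name find_gaps are assumptions for illustration, not collectl's own output format or tooling.

```shell
#!/bin/sh
# Report gaps between consecutive samples that exceed the expected interval.
# Assumption: the input file holds one epoch timestamp (seconds) per line.
find_gaps() {
    # $1 = expected interval in seconds, $2 = timestamp file
    awk -v step="$1" '
        # Prints: previous sample, current sample, seconds between them
        NR > 1 && ($1 - prev) > step { print prev, $1, $1 - prev }
        { prev = $1 }
    ' "$2"
}

# When run as a script with two arguments, check that file directly,
# e.g.: sh find_gaps.sh 10 samples.txt   (samples.txt is a hypothetical export)
if [ "$#" -eq 2 ]; then
    find_gaps "$@"
fi
```

Any line printed marks a stretch where the sampler got no run time, which is the symptom described above.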
-mark

chrism01 08-06-2009 09:13 PM

If it's always around 10am, I'd start by looking at all the crontabs... and also look through any logfiles at about that time (start from 09:45).
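
A rough sketch for scanning crontabs for jobs that can fire in a given hour. It is deliberately simplified: it only matches an hour field that is exactly the given number or "*", so ranges, lists and step values such as 9-17 or */2 would be missed. The function name cron_jobs_at_hour is made up for illustration.

```shell
#!/bin/sh
# Print crontab entries whose hour field is the given hour or "*".
# $1 = hour (0-23); remaining arguments = crontab files to scan.
cron_jobs_at_hour() {
    hour=$1
    shift
    awk -v h="$hour" '
        /^[[:space:]]*(#|$)/ { next }      # skip comments and blank lines
        $1 ~ /=/             { next }      # skip VAR=value settings
        $2 == h || $2 == "*" { print FILENAME ": " $0 }
    ' "$@"
}

# Typical use as root (paths are the usual locations, adjust as needed):
#   cron_jobs_at_hour 10 /etc/crontab /etc/cron.d/* /var/spool/cron/*
#   cron_jobs_at_hour 9  /etc/crontab /etc/cron.d/* /var/spool/cron/*
```

Checking hour 9 as well covers jobs that start shortly before the hang.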

unSpawn 08-06-2009 10:00 PM

I already mentioned all of that in post #2.


All times are GMT -5.