LinuxQuestions.org

Linux - General forum, thread: fedora server shuts down for no apparent reason (https://www.linuxquestions.org/questions/linux-general-1/fedora-server-shuts-down-for-no-apparent-reason-360463/)

jordanthompson 09-05-2005 12:39 PM

fedora server shuts down for no apparent reason
 
Hi there,
I am running the most recent (updated by yum) Fedora release of Red Hat. I use this machine as a print/file/web/mail/Samba server. Every once in a while it just shuts itself off. I'm not sure where to look in the logs to find the answer.

thanks in advance,
Jordan

macemoneta 09-05-2005 12:48 PM

Usually, a poweroff is caused by exceeding the thermal critical limit. There may be nothing in the logs, as Linux didn't make the call - the hardware/BIOS did.

Check to make sure all fans are running (and properly oriented - blowing in the correct direction). Monitor the temperature of the system.
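If lm_sensors isn't set up yet, a quick way to see whatever temperatures the kernel already reports is to read the thermal-zone files directly. This is a minimal sketch, assuming a kernel that exposes `/sys/class/thermal/thermal_zone*/temp` in millidegrees Celsius (current kernels do; a 2005-era Fedora kernel may only offer `/proc/acpi/thermal_zone`, and some boards report nothing at all). The function name `show_temps` is made up for this example.

```shell
#!/bin/sh
# Print each thermal zone's reading in whole degrees Celsius.
# Assumption: the kernel exposes /sys/class/thermal/thermal_zone*/temp
# (values in millidegrees C). An alternate base directory may be passed
# as $1 so the function can be tried against sample files.
show_temps() {
    base="${1:-/sys/class/thermal}"
    for zone in "$base"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue      # skip unreadable zones or an unmatched glob
        milli=$(cat "$zone/temp")
        kind=$(cat "$zone/type" 2>/dev/null || echo unknown)
        # Integer division keeps this POSIX-sh friendly (no bc needed).
        echo "$(basename "$zone") ($kind): $((milli / 1000)) C"
    done
}
```

Run `show_temps` with no argument to read the real sysfs tree; the directory argument exists only for testing.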

Another possibility is a brief power outage. Make sure that you are on a UPS. Make sure that you are not exceeding the power draw on your power supply. If your server has redundant power supplies, make sure that you are using independent power sources for each.

PTrenholme 09-05-2005 02:12 PM

You might want to look at sensors (the lm_sensors package) and set a few alarms for high-temperature conditions. Then you'd at least know if that was the problem.

Personally, I think power problems are more likely.

You did look at dmesg just to see what it said, didn't you?

I like
Code:

$ dmesg | gawk '/fail|error/'
If you wanted to look at all your logs, here's one way:
Code:

# gawk '/fail|error/ {print FILENAME ": " $0}' $(find /var/log -maxdepth 1 -type f)
although there are lots of monitoring tools available.

Take a look here for some.

jordanthompson 09-05-2005 02:58 PM

Here are the results for
gawk '/fail/{print FILENAME ": " $0};/error/{print FILENAME ": " $0}' `ls -D /var/log/*` | grep "Sep 5"

I'm guessing it died around 2 hours ago (right after the first post in this thread.)


/var/log/boot.log: Sep 5 13:28:09 dot mdmpd: mdmpd failed
/var/log/maillog: Sep 5 00:49:15 dot imap[31858]: SQUAT failed to open index file
/var/log/maillog: Sep 5 00:49:15 dot imap[31858]: SQUAT failed
/var/log/maillog: Sep 5 00:49:45 dot imap[31928]: SQUAT failed to open index file
/var/log/maillog: Sep 5 00:49:45 dot imap[31928]: SQUAT failed
/var/log/maillog: Sep 5 00:50:08 dot imap[31858]: SQUAT failed to open index file
/var/log/maillog: Sep 5 00:50:08 dot imap[31858]: SQUAT failed
/var/log/maillog: Sep 5 09:10:24 dot imap[32247]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:10:24 dot imap[32247]: SQUAT failed
/var/log/maillog: Sep 5 09:10:54 dot imap[32249]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:10:54 dot imap[32249]: SQUAT failed
/var/log/maillog: Sep 5 09:11:34 dot imap[32248]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:11:34 dot imap[32248]: SQUAT failed
/var/log/maillog: Sep 5 09:15:26 dot imap[1677]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:15:26 dot imap[1677]: SQUAT failed
/var/log/maillog: Sep 5 09:36:06 dot imap[1784]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:36:06 dot imap[1784]: SQUAT failed
/var/log/maillog: Sep 5 09:37:44 dot imap[1790]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:37:44 dot imap[1790]: SQUAT failed
/var/log/maillog: Sep 5 09:38:02 dot imap[1786]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:38:02 dot imap[1786]: SQUAT failed
/var/log/maillog: Sep 5 09:38:10 dot imap[1787]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:38:10 dot imap[1787]: SQUAT failed
/var/log/maillog: Sep 5 09:38:14 dot imap[1790]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:38:14 dot imap[1790]: SQUAT failed
/var/log/maillog: Sep 5 09:38:49 dot imap[1786]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:38:49 dot imap[1786]: SQUAT failed
/var/log/maillog: Sep 5 09:38:55 dot imap[1788]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:38:55 dot imap[1788]: SQUAT failed
/var/log/maillog: Sep 5 09:38:59 dot imap[1791]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:38:59 dot imap[1791]: SQUAT failed
/var/log/maillog: Sep 5 09:39:34 dot imap[1789]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:39:34 dot imap[1789]: SQUAT failed
/var/log/maillog: Sep 5 09:50:34 dot imap[1794]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:50:34 dot imap[1794]: SQUAT failed
/var/log/maillog: Sep 5 09:50:39 dot imap[1795]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:50:39 dot imap[1795]: SQUAT failed
/var/log/maillog: Sep 5 09:53:49 dot imap[1828]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:53:49 dot imap[1828]: SQUAT failed
/var/log/maillog: Sep 5 09:54:09 dot imap[1833]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:54:09 dot imap[1833]: SQUAT failed
/var/log/maillog: Sep 5 09:55:35 dot imap[1838]: SQUAT failed to open index file
/var/log/maillog: Sep 5 09:55:35 dot imap[1838]: SQUAT failed
/var/log/maillog: Sep 5 10:03:29 dot imap[1841]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:03:29 dot imap[1841]: SQUAT failed
/var/log/maillog: Sep 5 10:03:34 dot imap[1843]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:03:34 dot imap[1843]: SQUAT failed
/var/log/maillog: Sep 5 10:04:03 dot imap[1881]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:04:03 dot imap[1881]: SQUAT failed
/var/log/maillog: Sep 5 10:04:18 dot imap[1841]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:04:18 dot imap[1841]: SQUAT failed
/var/log/maillog: Sep 5 10:04:45 dot imap[1842]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:04:45 dot imap[1842]: SQUAT failed
/var/log/maillog: Sep 5 10:16:30 dot imap[1883]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:16:30 dot imap[1883]: SQUAT failed
/var/log/maillog: Sep 5 10:19:48 dot imap[1885]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:19:48 dot imap[1885]: SQUAT failed
/var/log/maillog: Sep 5 10:20:22 dot imap[1886]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:20:22 dot imap[1886]: SQUAT failed
/var/log/maillog: Sep 5 10:23:44 dot imap[1888]: SQUAT failed to open index file
/var/log/maillog: Sep 5 10:23:44 dot imap[1888]: SQUAT failed
/var/log/maillog: Sep 5 15:44:17 dot imap[4189]: SQUAT failed to open index file
/var/log/maillog: Sep 5 15:44:17 dot imap[4189]: SQUAT failed
/var/log/maillog: Sep 5 15:44:28 dot imap[4256]: SQUAT failed to open index file
/var/log/maillog: Sep 5 15:44:28 dot imap[4256]: SQUAT failed
/var/log/maillog: Sep 5 15:45:36 dot imap[5434]: SQUAT failed to open index file
/var/log/maillog: Sep 5 15:45:36 dot imap[5434]: SQUAT failed
/var/log/messages: Sep 5 09:28:30 dot smbd[1779]: getpeername failed. Error was Transport endpoint is not connected
/var/log/messages: Sep 5 09:28:30 dot smbd[1779]: write_socket_data: write failure. Error = Connection reset by peer
/var/log/messages: Sep 5 13:27:38 dot kernel: ** driver failed to call pci_enable_device(). As a temporary
/var/log/messages: Sep 5 13:28:09 dot mdmpd: mdmpd failed
/var/log/messages: Sep 5 13:33:03 dot smbd[5234]: getpeername failed. Error was Transport endpoint is not connected
/var/log/messages: Sep 5 13:33:03 dot smbd[5234]: getpeername failed. Error was Transport endpoint is not connected
/var/log/messages: Sep 5 13:33:03 dot smbd[5234]: write_socket_data: write failure. Error = Connection reset by peer
/var/log/secure: Sep 5 13:27:48 dot sshd[4023]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
/var/log/secure: Sep 5 13:27:48 dot sshd[4023]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
[root@dot jordan]#
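The `grep "Sep 5"` filter above depends on exact spacing: syslog pads single-digit days with an extra space (`Sep  5`), so a literal string match can miss lines. A sketch that compares the month and day fields instead, since awk splits fields on runs of blanks. It uses plain awk (gawk behaves the same here); the function name `errors_on_day` is hypothetical.

```shell
#!/bin/sh
# Show fail/error lines from the given log files for one date.
# Usage: errors_on_day MONTH DAY FILE...
# awk splits on runs of blanks, so "Sep  5" and "Sep 5" both match day 5.
errors_on_day() {
    mon="$1"; day="$2"; shift 2
    awk -v mon="$mon" -v day="$day" '
        $1 == mon && $2 == day && /fail|error/ { print FILENAME ": " $0 }
    ' "$@"
}
```

For example, `errors_on_day Sep 5 /var/log/messages /var/log/secure` would reproduce the search above without depending on spacing.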

jordanthompson 09-05-2005 03:19 PM

By the way, I don't think it's a heat issue. I had just checked the operation of all of the fans (I do that periodically), and I have an extra one on the chassis itself to boot.

jordanthompson 09-08-2005 08:20 PM

Any suggestions where to look?

macemoneta 09-08-2005 11:06 PM

Well, have you set up lm_sensors as PTrenholme suggested, to check the temperature readings before a shutdown? Have you verified that you are not overdrawing your power supply? Do you have a UPS? Also, are you overclocking? Are you using an aftermarket heatsink on the CPU?

Try this; open a command window and enter:

while true; do true; done

That's an infinite loop (you can interrupt it with Ctrl-c). It will cause your CPU temperature to go up quickly. If your system shuts down within about 5 minutes of starting that, it's a heat issue. If not, it's likely a power issue.
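A single `while true` loop keeps only one CPU (or one Hyper-Threading sibling) busy, so on a multi-CPU machine it may not heat the processor as much as intended. A sketch that starts one busy loop per CPU and stops them after a fixed time. Assumptions: `nproc` (GNU coreutils) is available, and the 300-second default and the name `burn_cpus` are illustrative choices.

```shell
#!/bin/sh
# Heat-test sketch: keep every CPU busy for a limited time, then stop.
# Usage: burn_cpus [SECONDS] [WORKERS]
# Assumptions: nproc (GNU coreutils) is installed; the 300 s default
# mirrors the "about 5 minutes" window suggested above.
burn_cpus() {
    secs="${1:-300}"
    workers="${2:-$(nproc)}"
    pids=""
    # If interrupted (Ctrl-C), stop the background loops instead of orphaning them.
    trap 'kill $pids 2>/dev/null; exit 1' INT TERM
    i=0
    while [ "$i" -lt "$workers" ]; do
        ( while true; do :; done ) &   # one busy loop per worker
        pids="$pids $!"
        i=$((i + 1))
    done
    sleep "$secs"
    kill $pids                         # unquoted on purpose: one PID per word
    wait 2>/dev/null                   # reap the stopped loops
    trap - INT TERM
    echo "ran $workers busy loop(s) for $secs s"
}
```

The same reasoning applies as above: if the machine survives the full run with all CPUs loaded, heat becomes a less likely cause.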

jordanthompson 09-09-2005 10:07 PM

Well, have you set up lm_sensors as PTrenholme suggested, to check the temperature readings before a shutdown?
I could not get this to work (compile, install, etc.). I was able to install the RPM, but I can't find where it put the binaries.
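When an RPM installs but you can't find its programs, `rpm -ql PACKAGE` lists every file the package put on disk. A small sketch that filters that list down to executables in bin/sbin directories; the package name lm_sensors is taken from later in this thread, and the helper name `pkg_binaries` is hypothetical. For lm_sensors, the usual workflow is to run `sensors-detect` once as root and then `sensors`, but check the file list on your own system.

```shell
#!/bin/sh
# List the executables an installed RPM package placed on disk.
# rpm -ql prints one installed path per line; keep only bin/sbin entries.
pkg_binaries() {
    rpm -ql "$1" | grep -E '/s?bin/'
}
```

Usage: `pkg_binaries lm_sensors`.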

Have you verified that you are not overdrawing your power supply?
I am definitely not overdrawing the power supply. I have one card; everything else is on the motherboard.

Do you have a UPS?
Actually, I have two - I live in Florida :-)

Also, are you overclocking?
No

Are you using an aftermarket heatsink on the CPU?
No, it's an Intel heatsink; at least, it's the one that came with the CPU.

Try this; open a command window and enter:
while true; do true; done
Did this - it has been running now for over an hour - the computer is still up.

Any other suggestions? Where in the logs could I find a clue, if the OS is shutting it down for some reason?
Thanks very much for your help,
Jordan

macemoneta 09-10-2005 03:22 AM

If the OS is shutting it down (instead of a spontaneous power-off), open /var/log/messages in an editor after booting back up. Go to the bottom of the file, then search backwards for "restart", the syslogd restart message. The messages before it are the last ones recorded by the system before the power-off. If this was a normal, system-initiated shutdown, you will see a series of service termination messages. For example:

Code:

Aug  8 01:22:41 mmouse shutdown: shutting down for system reboot
Aug  8 01:22:43 mmouse init: Switching to runlevel: 6
Aug  8 01:22:44 mmouse cups-config-daemon: cups-config-daemon -TERM succeeded
Aug  8 01:22:45 mmouse dbus: avc:  1 AV entries and 1/512 buckets used, longest chain length 1
Aug  8 01:22:45 mmouse messagebus: messagebus -TERM succeeded
Aug  8 01:22:45 mmouse cups: cupsd shutdown succeeded
Aug  8 01:22:50 mmouse httpd: httpd shutdown succeeded
Aug  8 01:22:50 mmouse sshd: sshd -TERM succeeded
Aug  8 01:22:51 mmouse sendmail: sendmail shutdown succeeded
Aug  8 01:22:51 mmouse sendmail: sm-client shutdown succeeded
Aug  8 01:22:51 mmouse spamassassin: spamd shutdown succeeded
Aug  8 01:22:52 mmouse dhcpd: dhcpd shutdown succeeded
Aug  8 01:22:52 mmouse dhcpd: dhcpd shutdown succeeded
Aug  8 01:22:52 mmouse smartd[3966]: smartd received signal 15: Terminated
Aug  8 01:22:52 mmouse smartd[3966]: smartd is exiting (exit status 0)
Aug  8 01:22:53 mmouse smartd: smartd shutdown succeeded
Aug  8 01:22:53 mmouse xinetd[4084]: Exiting...
Aug  8 01:22:53 mmouse xinetd: xinetd shutdown succeeded
Aug  8 01:22:54 mmouse acpid: acpid shutdown succeeded
Aug  8 01:22:55 mmouse crond: crond shutdown succeeded
Aug  8 01:22:55 mmouse ntpd[4104]: ntpd exiting on signal 15
Aug  8 01:22:55 mmouse mdmonitor: mdadm shutdown succeeded
Aug  8 01:22:55 mmouse kernel: Kernel logging (proc) stopped.
Aug  8 01:22:55 mmouse kernel: Kernel log daemon terminating.
Aug  8 01:22:56 mmouse syslog: klogd shutdown succeeded
Aug  8 01:22:56 mmouse exiting on signal 15
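The backwards search described above can be scripted. A sketch, assuming sysklogd-style logging where syslogd writes a line containing `syslogd` and `restart` when it starts (for example `syslogd 1.4.1: restart.`); newer logging setups may differ. It prints the lines logged just before the most recent restart and guesses whether they include a clean-shutdown marker. The marker strings are taken from the example above; the function name `last_before_restart` is hypothetical.

```shell
#!/bin/sh
# Show what was logged just before the most recent syslogd restart,
# and guess whether the preceding stop was an orderly shutdown.
# Usage: last_before_restart [LOGFILE] [LINES]
last_before_restart() {
    log="${1:-/var/log/messages}"
    n="${2:-20}"
    # Line number of the last syslogd restart message (empty if none).
    last=$(grep -n 'syslogd.*restart' "$log" | tail -n 1 | cut -d: -f1)
    if [ -z "$last" ]; then
        echo "no syslogd restart found in $log"
        return 1
    fi
    start=$((last - n))
    [ "$start" -lt 1 ] && start=1
    # Up to n lines immediately preceding the restart line.
    before=$(awk -v s="$start" -v e="$((last - 1))" 'NR >= s && NR <= e' "$log")
    printf '%s\n' "$before"
    if printf '%s\n' "$before" | grep -qE 'shutting down|Switching to runlevel'; then
        echo "verdict: looks like an orderly shutdown"
    else
        echo "verdict: no shutdown messages; possibly a sudden power-off"
    fi
}
```

A missing shutdown sequence does not prove a hardware power-off, but it points away from the OS having made the call.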


PTrenholme 09-10-2005 01:54 PM

I really don't think you need it, since over-temperature problems are not too likely and your BIOS should be monitoring the temperatures for you. But if you'd like to see the temperatures, the lm_sensors package is, I believe, in the FC4 "base" repository, so just do a
Code:

# yum install lm_sensors
(The version on ATrpms is more current, if you care.)

If you use KDE, try a
Code:

# yum install kdeutils
instead (which should install lm_sensors as a dependency), and then use the "KSensors" applet to display the temperatures on your panel and set alarms. (If you don't see any hardware temperature sensors, look at info sensors and info sensors.conf.)
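To capture readings right before a shutdown (the question raised earlier in the thread), one approach is to append timestamped `sensors` output to a file at a fixed interval, running `sync` after each write so the last entries are more likely to survive a sudden power loss. A sketch only: the log path, the interval, the function name `log_sensors`, and the `SENSORS_CMD` override are illustrative assumptions, not part of lm_sensors.

```shell
#!/bin/sh
# Append timestamped sensor readings to a log so the last values before
# an unexpected power-off are preserved.
# Usage: log_sensors [LOGFILE] [INTERVAL_SECONDS] [COUNT]   (COUNT=0: forever)
# SENSORS_CMD is a hypothetical override; it defaults to lm_sensors' "sensors".
log_sensors() {
    logfile="${1:-/var/log/sensors-history.log}"
    interval="${2:-60}"
    count="${3:-0}"
    cmd="${SENSORS_CMD:-sensors}"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            date '+%Y-%m-%d %H:%M:%S'
            $cmd                  # unquoted so an override may include arguments
            echo "----"
        } >> "$logfile"
        sync                      # flush to disk in case power is cut next
        i=$((i + 1))
        [ "$count" -ne 0 ] && [ "$i" -ge "$count" ] && break
        sleep "$interval"
    done
}
```

After the next unexplained shutdown, the tail of the log shows the last readings taken before the machine went down.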

By the way, my system uses a 3GHz Intel 745 H/T, which was running hot until I replaced the stock cooler with an aftermarket one. Now I'm seldom above 110 F.

Caution: Replacing the heat-sink is not easy for the inexperienced, and experience can be expensive to acquire. It cost me $200 for a new CPU when I bent the pins removing the old CPU/heat-sink. (I thought I could remove the heat-sink leaving the CPU in the M/B, but the thermal grease had a different idea, and prevailed.)


All times are GMT -5.