LinuxQuestions.org


ticoloco 10-07-2010 07:50 AM

Server restarts on its own
 
Hi all,
I have a problem with a server running Debian 3.1 that restarts almost every day at the same time, 6:25 AM. Here are the messages obtained with:
cerberus:/etc# grep -C 5 restart /var/log/messages

.....missing displays....
Oct 6 06:25:27 cerberus kernel: device eth2 left promiscuous mode
Oct 6 06:25:27 cerberus kernel: device eth2 entered promiscuous mode
Oct 6 06:25:29 cerberus syslogd 1.4.1#17: restart.

.....missing displays....
Oct 7 06:25:47 cerberus kernel: device eth2 left promiscuous mode
Oct 7 06:25:47 cerberus kernel: device eth2 entered promiscuous mode
Oct 7 06:25:49 cerberus syslogd 1.4.1#17: restart.
Oct 7 06:37:03 cerberus -- MARK --
Oct 7 06:57:03 cerberus -- MARK --
Oct 7 07:17:02 cerberus -- MARK --
Oct 7 07:37:02 cerberus -- MARK --
Oct 7 07:57:02 cerberus -- MARK --

As I said, there are days when the system does not restart, but more often it does. For example, on 6 October it didn't restart, but on the 7th it did, and the messages are the same.
Any ideas?

Sayan Acharjee 10-07-2010 07:58 AM

Stop the syslogd service and see whether that helps.
Also check the /etc/sysconfig/syslog and /etc/syslog.conf files for any clues, and post them here if you can.

TB0ne 10-07-2010 08:55 AM

Why would stopping syslog (the one service that can give you clues about what's happening) be a good thing?

Since you're getting messages regarding your network interfaces, and then the system is restarting, it would hint at a network-related issue. It mentions promiscuous mode... are you running a sniffer/analyzer on that box? If so, it could be core-dumping/getting overloaded, and shutting your box down.
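
One way to check whether each "syslogd ... restart." line really marks a reboot is to compare it against the kernel boot banners in the same log. This is only a rough sketch: it assumes the kernel logs a "Linux version" line at every boot and that kernel messages end up in /var/log/messages. The function name count_boots is made up for illustration.

```shell
#!/bin/sh
# count_boots FILE: print how many kernel boot banners and how many
# syslogd restart lines appear in FILE.
count_boots() {
    log="$1"
    # grep -c prints 0 when nothing matches, so both variables are always set.
    boots=$(grep -c 'kernel: Linux version' "$log")
    restarts=$(grep -c 'syslogd .*restart' "$log")
    echo "boots=$boots syslogd_restarts=$restarts"
}

# Example:
# count_boots /var/log/messages
```

If the restart count is higher than the boot count, some of those lines may come from the syslog daemon itself being restarted rather than from the machine rebooting. Rotated logs (messages.0 and so on) would need the same check.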

ticoloco 10-08-2010 12:14 AM

Quote:

Originally Posted by TB0ne (Post 4120501)
Since you're getting messages regarding your network interfaces, and then the system is restarting, it would hint at a network-related issue. It mentions promiscuous mode... are you running a sniffer/analyzer on that box? If so, it could be core-dumping/getting overloaded, and shutting your box down.

The box had an analyzer:
#crontab -e
.........
#Run pingmon - for locations monitoring
#* * * * * /usr/local/bin/pingmon
..........

but I commented it out a few weeks ago, as you can see from this output.
I don't have any other scheduled job in crontab at that specific time, 6:25 AM. And, to be honest, I don't believe there's an overload at that hour of the morning, because our workday starts at 8:00 AM, an hour and a half or so later.
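
Note that crontab -e only shows the current user's jobs; system-wide jobs are kept elsewhere. The sketch below filters crontab-format lines for a given time. It assumes the system crontab is /etc/crontab with extra files under /etc/cron.d (the usual Debian locations), and jobs_at is a hypothetical helper name. Stock Debian installs typically schedule the cron.daily scripts from /etc/crontab at 06:25, so it is probably worth looking there.

```shell
#!/bin/sh
# jobs_at HOUR MINUTE: read crontab-format lines on stdin and print those
# whose minute and hour fields are exactly MINUTE and HOUR.
# Comment lines are skipped. Only literal values are matched; entries that
# use "*", ranges or lists in those two fields are not reported.
jobs_at() {
    awk -v h="$1" -v m="$2" '!/^[[:space:]]*#/ && $1 == m && $2 == h'
}

# Example (paths are assumptions, see above):
# cat /etc/crontab /etc/cron.d/* 2>/dev/null | jobs_at 6 25
```

Any line this prints runs at 6:25 AM; for a run-parts entry, the scripts in the directory it names would be the next thing to read.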

leejohnli 10-08-2010 01:24 AM

Leaving your network interface in promiscuous mode is a bad thing.

To see whether it is still enabled:
ifconfig -v | grep -i promisc

To turn it off:
ifconfig eth2 -promisc

TB0ne 10-08-2010 07:45 AM

Quote:

Originally Posted by ticoloco (Post 4121154)
The box had an analyzer:
#crontab -e
.........
#Run pingmon - for locations monitoring
#* * * * * /usr/local/bin/pingmon
..........

but I commented it out a few weeks ago, as you can see from this output.
I don't have any other scheduled job in crontab at that specific time, 6:25 AM. And, to be honest, I don't believe there's an overload at that hour of the morning, because our workday starts at 8:00 AM, an hour and a half or so later.

Agree with leejohnli totally. And it's not the job that's concerning, it's the interface in that mode that can cause problems. If you don't NEED promiscuous mode, then you definitely should turn it off. From what you've posted, every time the box goes down, you're seeing references to it, and a network interface....

ticoloco 10-08-2010 09:28 AM

Quote:

Originally Posted by leejohnli (Post 4121194)
Leaving your network interface in promiscuous mode is a bad thing.

To see whether it is still enabled:
ifconfig -v | grep -i promisc

To turn it off:
ifconfig eth2 -promisc

I've checked, and eth2 is not in promiscuous mode:
#ifconfig
..............
eth2 Link encap:Ethernet HWaddr xxxxxxxxxxxxxxxx
inet addr:xxxxxxxxxxxx Bcast:xxxxxxxxxxxx Mask:xxxxxxxxxxxxx
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1407753 errors:0 dropped:0 overruns:0 frame:0
TX packets:1156515 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:239756633 (228.6 MiB) TX bytes:1164451100 (1.0 GiB)
Interrupt:20 Base address:0xdc00 Memory:ff8fe000-ff8fe038
..............
If it were, the output would contain the line:
................
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
..............

I believe something else is forcing eth2 into promiscuous mode and producing those messages at 6:25 AM.
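
The check described above (looking for PROMISC on the flags line) can be wrapped in a small helper. This is a minimal sketch, assuming the net-tools ifconfig output format shown in this post; has_promisc is a made-up name.

```shell
#!/bin/sh
# has_promisc: read ifconfig output on stdin and succeed (exit status 0)
# if the PROMISC flag appears anywhere in it as a whole word.
has_promisc() {
    grep -qw 'PROMISC'
}

# Example (interface name taken from this thread):
# if ifconfig eth2 | has_promisc; then
#     echo "eth2 is in promiscuous mode"
# else
#     echo "eth2 is not in promiscuous mode"
# fi
```

A one-off check like this only shows the state at that moment. The log lines show eth2 leaving and re-entering promiscuous mode within the same second, so a brief change at 6:25 AM would not be visible later in the day.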

Zippy1970 10-08-2010 07:15 PM

My old webserver (running Woody) did the exact same thing with the same messages in the syslog. Turned out it was failing hardware (although I can't remember exactly what it was). The reason it restarted itself at the exact same time almost daily, was because it was running some heavy indexing program daily at the same time which caused the hardware to fail.

So what I'm trying to say is, first make sure you don't have failing hardware. Hardware failures usually don't generate nice error messages in your log file but simply do stuff like crash or reboot your machine.

Zippy1970 10-08-2010 07:18 PM

PS: As for the NIC entering and leaving promiscuous mode, I had those lines in my syslog as well and tracked them down to a network traffic analyzer I had running (MRTG). So it was actually "normal behavior".

ticoloco 10-09-2010 02:33 AM

Quote:

Originally Posted by Zippy1970 (Post 4121959)
My old webserver (running Woody) did the exact same thing with the same messages in the syslog. Turned out it was failing hardware (although I can't remember exactly what it was). The reason it restarted itself at the exact same time almost daily, was because it was running some heavy indexing program daily at the same time which caused the hardware to fail.

So what I'm trying to say is, first make sure you don't have failing hardware. Hardware failures usually don't generate nice error messages in your log file but simply do stuff like crash or reboot your machine.

What hardware should I check? And how?
Thanks!

Zippy1970 10-09-2010 02:22 PM

Things to check:
  • CPU: Make sure your CPU isn't overheating due to a clogged up cooler or worn CPU fan. Make sure the cooler is free from dust and the fan turns freely.
  • Memory: If you have two (or more) sticks, remove one stick at a time to see if that gets rid of the reboots. If it does, the removed stick is faulty.
  • Hard Drive: If you don't find any obvious error messages in your log files pointing at a problem with your Hard Drive, it could still be the cause of your reboots. Install smartctl (part of the smartmontools package) and hddtemp. Smartctl displays the SMART information of the drive while hddtemp displays the temperature of a hard drive (you can omit hddtemp if you like since SMART data also contains the drive's temperature).
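
For the hard drive check, something along these lines should work. It is a sketch only: the device name /dev/hda is an assumption (use whatever your drive is called), the commands need root, and drive_temp is a hypothetical helper that pulls the temperature out of the SMART attribute table, assuming the drive reports attribute 194.

```shell
#!/bin/sh
# drive_temp: read `smartctl -A` output on stdin and print the raw value of
# attribute 194 (Temperature_Celsius). Prints nothing if the drive does not
# report that attribute.
drive_temp() {
    awk '$1 == 194 { print $10 }'
}

# Example (device name is an assumption):
# smartctl -H /dev/hda               # overall health self-assessment
# smartctl -A /dev/hda               # full attribute table
# smartctl -A /dev/hda | drive_temp  # temperature only, in degrees Celsius
# hddtemp /dev/hda                   # same reading via hddtemp
```

In the attribute table, reallocated or pending sector counts above zero are also worth a closer look.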

ticoloco 10-11-2010 09:48 AM

I'll try all these steps and get back with feedback.
Thanks again


All times are GMT -5.