I have a pair of ProLiant 5500s running CentOS 5.2 that both reboot at random. APIC and ASR are both off. I suspect a software problem rather than a hardware problem, since both machines show the same symptom. One is a mail/web server running Apache and Zimbra; the other is a Samba server.
Dec 21 06:06:48 mail -- MARK --
Dec 21 06:26:48 mail -- MARK --
Dec 21 06:46:48 mail -- MARK --
Dec 21 07:06:48 mail -- MARK --
Dec 21 07:26:48 mail -- MARK --
Dec 21 07:46:48 mail -- MARK --
Dec4.1: restart.
Dec 21 15:05:56 mail audispd: af_unix plugin initialized
Dec 21 15:05:56 mail audispd: audispd initialized with q_depth=64 and 1 active plugins
Dec 21 15:05:56 mail kernel: klogd 1.4.1, log source = /proc/kmsg started.
Dec 21 15:05:56 mail kernel: Linux version 2.6.18-92.1.18.el5 (mockbuild@builder16.centos.org) (gcc version 4.1.2 20071124 (Red Hat 4.1.2-42)) #1 SMP Wed Nov 12 09:30:27 EST 2008
Dec 21 15:05:56 mail kernel: BIOS-provided physical RAM map:
Dec 21 15:05:56 mail kernel: BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
Dec 21 15:05:56 mail kernel: BIOS-e820: 000000000009f800 - 00000000000a0000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 0000000000100000 - 000000009fffc000 (usable)
Dec 21 15:05:56 mail kernel: BIOS-e820: 000000009fffc000 - 00000000a0000000 (ACPI data)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000fec00000 - 00000000fec10000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000fee00000 - 00000000fee10000 (reserved)
Dec 21 15:05:56 mail kernel: BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
Dec 21 15:05:56 mail kernel: 1663MB HIGHMEM available.
Dec 21 15:05:56 mail kernel: 896MB LOWMEM available.
Dec 21 15:05:56 mail kernel: found SMP MP-table at 000f4fd0
Weird. The 'mark' timestamps show up regularly, then stop, and hours later the server restarts. I wonder if something is killing syslogd?
Given that your logs aren't telling you much, do you have some spare RAM that you could swap into one of the two servers? (I think this is worth exploring on the chance that your RAM and motherboard combination is just not playing nice.)
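To put a number on the silence described above, here is a minimal sketch that scans syslog lines and reports any stretch where nothing was logged for longer than one MARK interval. The function names, the 20-minute interval (read off the excerpt), the slack value, and the year 2008 are all assumptions for illustration; syslog lines carry no year, and month names are assumed to be in English.

```python
from datetime import datetime, timedelta

# Spacing between "-- MARK --" lines; 20 minutes matches the excerpt in this thread.
MARK_INTERVAL = timedelta(minutes=20)

def parse_timestamp(line, year=2008):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp; return None if the line has none.

    The year is an assumption, because classic syslog lines do not record it.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None

def find_silent_gaps(lines, slack=timedelta(minutes=5)):
    """Return (last_seen, next_seen) pairs where logging stopped for longer
    than one MARK interval plus some slack."""
    gaps = []
    previous = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue  # skip lines without a readable timestamp
        if previous is not None and stamp - previous > MARK_INTERVAL + slack:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps

# A few lines copied from the log posted in this thread.
excerpt = [
    "Dec 21 07:26:48 mail -- MARK --",
    "Dec 21 07:46:48 mail -- MARK --",
    "Dec 21 15:05:56 mail kernel: klogd 1.4.1, log source = /proc/kmsg started.",
]

for start, end in find_silent_gaps(excerpt):
    print(f"no log entries between {start} and {end} ({end - start})")
```

On the excerpt this reports a single gap of 7:19:08, between the last MARK at 07:46:48 and the first post-boot line at 15:05:56. That suggests logging (or the whole machine) was stalled long before the restart, rather than the box rebooting the moment something went wrong.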
Run Rootkit Hunter and check both systems for rootkits. That looks strange, and two servers doing the same thing may be a coincidence, but it's highly doubtful.
What type of cron jobs do you have running? Do you have any backups running? Anything in particular running you know of between the time stamps of MARK above and the reboot time stamps? I would imagine some process is killing these machines, preventing the logging or the like.
What time are those Zimbra crons running? Anything close to the time the machine is getting rebooted? One of those could be the culprit.
None of them are close to the reboot times, and the reboots are completely random. Load also seems to be no factor, as a reboot is just as likely during off hours as during business hours.
Both servers rebooted at the same time, so I think you should check the power.
I'd have to agree here as well. If it's not a cron job, it happens off business hours, and both servers rebooted at almost exactly the same time, I'd check power as well.
I checked for rootkits last night and both servers were clean. The servers don't reboot at the same time but there could still be an issue with the UPS.
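Whether the two machines go down together is easy to check once the boot times from both logs are written down. A rough sketch follows, assuming the boot times have already been collected by hand; `coinciding_reboots` and the two-minute window are made-up choices, and the Samba server's time below is a hypothetical placeholder, since only the mail server's log was posted.

```python
from datetime import datetime, timedelta

def coinciding_reboots(boots_a, boots_b, window=timedelta(minutes=2)):
    """Return (a, b) pairs of boot times, one from each host, that fall
    within `window` of each other."""
    return [(a, b) for a in boots_a for b in boots_b if abs(a - b) <= window]

# 15:05:56 is the mail server's boot time from the posted log (year assumed).
mail_boots = [datetime(2008, 12, 21, 15, 5, 56)]
# Hypothetical value for illustration only; the Samba server's log was not posted.
samba_boots = [datetime(2008, 12, 20, 3, 12, 0)]

matches = coinciding_reboots(mail_boots, samba_boots)
print("reboots within two minutes of each other:", len(matches))
```

An empty result across several weeks of reboots would make a shared power event less likely, though it would not rule out a UPS that drops each output separately.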
Never rule it out of the equation.
Do these servers have dual power supplies? If so, do you have more than one UPS? If there's more than one, you should split up the power and/or try bypassing the UPS to see if the problem recurs. Also, most UPSes have a management port, so you could probably set up monitoring on it to see whether the UPS is the actual culprit.
They do have dual power supplies. I have another UPS at home that I'll bring and try next week. The UPS does have a management port so I'll set it up and see what it shows.
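For the monitoring side, one possible approach (an assumption, since nobody in the thread named a tool) is Network UPS Tools, whose `upsc` client prints one `key: value` line per variable. The sketch below only parses text in that shape and checks `ups.status` for the `OB` (on battery) flag; the sample text is invented for illustration, and `parse_upsc` and `on_battery` are hypothetical helper names.

```python
def parse_upsc(text):
    """Parse 'key: value' lines, in the shape printed by NUT's upsc, into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon
            values[key.strip()] = value.strip()
    return values

def on_battery(values):
    """True if ups.status carries the OB flag; OL means the UPS is on line power."""
    return "OB" in values.get("ups.status", "").split()

# Invented sample in upsc's output format, not taken from a real UPS.
SAMPLE = """battery.charge: 100
ups.status: OL
"""

status = parse_upsc(SAMPLE)
print("on battery" if on_battery(status) else "on line power")
```

Logging that result with a timestamp every minute or so would show whether the UPS switched to battery, or stopped answering, shortly before each reboot.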