Problem: A brand new CentOS 4.4 server with dual Intel CPUs has serious time issues. We loaded the OS and soon after started seeing lots of disturbing things: corrupted log files, the RAID card rebooting, load fluctuating wildly, and notices like these in /var/log/messages:
Quote:
Apr 26 13:54:33 spam2 syslogd 1.4.1: restart.
Apr 26 14:12:00 spam2 kernel: warning: many lost ticks.
Apr 26 14:12:00 spam2 kernel: Your time source seems to be instable or some driver is hogging interupts
|
Getting a clue, I first started looking into our time server syncs via NTPD. NTPD, though it tried very hard, couldn't keep the time synced, and I finally realized (duh) that our system clock was all over the place. I ran some system time vs. hardware clock comparisons (see below). Note that the hardware clock is correct: I ran "date && hwclock" three times over the course of 17 seconds...
Quote:
[root@host ~]# date && hwclock
Thu Apr 26 13:28:27 MDT 2007
Thu 26 Apr 2007 01:02:32 PM MDT -0.251558 seconds
[root@host ~]# date && hwclock
Thu Apr 26 13:29:48 MDT 2007
Thu 26 Apr 2007 01:02:37 PM MDT -0.286713 seconds
[root@host ~]# date && hwclock
Thu Apr 26 13:31:17 MDT 2007
Thu 26 Apr 2007 01:02:49 PM MDT -0.403288 seconds
|
The "date" command moves forward wildly during this time, i.e. the hardware clock correctly advanced 17 seconds while the OS thinks the time moved forward almost 3 minutes (2 minutes 50 seconds) during the same period. That is roughly ten times too fast.
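For anyone who wants to check my arithmetic, here is a rough bash sketch that works out the drift from the first and last readings in the transcript above. It assumes all readings fall on the same day and ignores the sub-second field that hwclock prints; the helper names to_seconds and elapsed are just made up for this example.

```shell
#!/usr/bin/env bash
# Convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10 so values like "08" are not parsed as octal.
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Elapsed seconds between two same-day HH:MM:SS timestamps (start, end).
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# First and last readings copied from the transcript above.
sys_elapsed=$(elapsed 13:28:27 13:31:17)   # system clock ("date")
hw_elapsed=$(elapsed 01:02:32 01:02:49)    # hardware clock ("hwclock")

echo "system clock advanced:   ${sys_elapsed}s"
echo "hardware clock advanced: ${hw_elapsed}s"
echo "system clock ran about $(( sys_elapsed / hw_elapsed ))x too fast"
```

With the numbers above, the system clock advanced 170 seconds against 17 seconds on the hardware clock.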
My question is: Where does the problem lie? With one or both of the CPUs? With the motherboard? Something else? Since this is a custom machine, I'm trying to figure out what to RMA.
Thanks in advance,
~ Oban