I have a really odd problem. My clock on a box named bach loops over and over, advancing only 1-2 seconds for every 10-20. My prompt, which displays the time, shows the issue nicely:
08:02:18 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:19 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:22 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:18 jptxs@bach:~$
08:02:19 jptxs@bach:~$
08:02:19 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:20 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:21 jptxs@bach:~$
08:02:22 jptxs@bach:~$
08:02:22 jptxs@bach:~$
08:02:18 jptxs@bach:~$
Eventually it gets to a point where it moves on by a second, advancing to 08:02:23 or maybe 08:02:24. But then it skips back to 08:02:19 or 08:02:20 and loops for a bit again. I wrote a script to track it:
#!/bin/bash
# Every 10 seconds, append bach's hardware clock, uptime and system time,
# plus wagner's system time (via ssh), to clockChecks.log.
while true
do
    {
        printf "harware clock says :: "
        hwclock
        echo
        printf "uptime is :: "
        uptime
        echo
        echo "bach clock:"
        date
        echo
        echo "wagner clock:"
        ssh wagner date
        echo
        echo "++++++---------------------++++++++"
        echo
    } >> clockChecks.log
    sleep 10
done
The output compares bach to a working box named wagner, and also tracks uptime and hwclock output. I have posted the log file [clockChecks.log] at pastebin here:
http://pastebin.com/819421, and the latest version here:
http://pastebin.com/820372. As you can see, the two boxes stay in sync for quite a while. Here is some output from the most recent manifestation of the issue (updated to latest):
harware clock says :: Wed 08 Nov 2006 02:14:38 PM EST -0.563844 seconds
uptime is :: 14:14:39 up 1 day, 3:42, 2 users, load average: 0.00, 0.00, 0.00
bach clock:
Wed Nov 8 14:14:39 EST 2006
wagner clock:
Wed Nov 8 14:14:39 EST 2006
++++++---------------------++++++++
harware clock says :: Wed 08 Nov 2006 02:16:54 PM EST -0.967773 seconds
uptime is :: 14:14:53 up 1 day, 3:43, 2 users, load average: 0.00, 0.00, 0.00
bach clock:
Wed Nov 8 14:14:53 EST 2006
wagner clock:
Wed Nov 8 14:16:54 EST 2006
++++++---------------------++++++++
harware clock says :: Wed 08 Nov 2006 02:20:07 PM EST -0.079559 seconds
uptime is :: 14:15:03 up 1 day, 3:43, 2 users, load average: 0.00, 0.00, 0.00
bach clock:
Wed Nov 8 14:15:03 EST 2006
wagner clock:
Wed Nov 8 14:20:07 EST 2006
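To put numbers on the excerpt above, here is a rough sketch that works out how far each clock advanced between the first and third samples. The helper names to_seconds and elapsed are made up for illustration, and it assumes both timestamps fall on the same day.

```bash
#!/bin/bash
# Convert HH:MM:SS to seconds since midnight (awk treats "08" as decimal 8).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed between two same-day timestamps: elapsed START END
elapsed() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Timestamps copied from the first and third samples in the log above.
bach_elapsed=$(elapsed 14:14:39 14:15:03)
wagner_elapsed=$(elapsed 14:14:39 14:20:07)
echo "bach advanced ${bach_elapsed}s while wagner advanced ${wagner_elapsed}s"
```

That comes out to 24 seconds on bach against 328 seconds on wagner. The hwclock readings in the same samples (02:16:54 PM, 02:20:07 PM) match wagner rather than bach's date output, so in this excerpt it is the system clock that falls behind, not the hardware clock.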
I ran ntpdate and the clock was reset, but immediately slowed down and looped again.
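The looping itself (time stepping backwards) can also be caught without wagner. A minimal sketch, assuming one epoch timestamp per line on stdin; find_backward_steps and sample_clock are hypothetical helpers, not part of the script above:

```bash
#!/bin/bash
# Read one epoch timestamp per line and report every step backwards in time.
find_backward_steps() {
    awk 'NR > 1 && ($1 + 0) < prev {
             printf "line %d: %d -> %d (back %d s)\n", NR, prev, $1, prev - $1
         }
         { prev = $1 + 0 }'
}

# Print the system clock as epoch seconds, N times, pausing 1 second between samples.
sample_clock() {
    n=$1
    while [ "$n" -gt 0 ]; do
        date +%s
        sleep 1
        n=$((n - 1))
    done
}

# Example (not run here): sample_clock 60 | find_backward_steps
```

One caveat: sleep depends on the same system timer that is misbehaving, so the spacing between samples on bach cannot be trusted, only their order.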
This gets stranger. This all started when I upgraded from an old Debian install running kernel 2.20 with custom config to Ubuntu 6.06. For various reasons I needed a 2.6 kernel on the box. The box crawled. At that point, I didn't notice the clock issues. I installed over and over. I tried to go back to Debian Sarge. I tried a RedHat AS 3 set of disks I had and that wouldn't boot correctly (seemed like the init scripts were timing out). I finally settled on CentOS 4.4 (Server Disk for the install). That has gotten me to the point where I am now. It seemed to be running fine for 7-8 hours. I reinstalled the services I previously had on the box (pdnsd:
http://www.phys.uu.nl/~rombouts/pdnsd.html, ejabberd:
http://ejabberd.jabber.ru/ and VMWare Server:
http://www.vmware.com/products/server/ running an OpenBSD guest OS). Everything seemed fine, but then the clock started going funny again after a few more hours of operation. So I searched and searched. I found a ton of stuff about XBox clock skips and loops, but nothing I could find seemed relevant to me. I upgraded my BIOS (
http://www-307.ibm.com/pc/support/si...id=MIGR-42952). After that reboot things seemed to be fine again for a bit. That's when I wrote the script to track it. I thought I'd licked it. But then the above unfolded and now I'm here begging for help.
Things I have tried:
- upgraded BIOS
- replaced battery on MB
- reset the RTC clock
- ran without VMWare up (to eliminate vmmon, which was suspect to me at one point)
- ran in single user mode
Other info:
- ps -ef ::
http://pastebin.com/819423
- cat /proc/interrupts|grep 8\: ::
 8: 2207 IO-APIC-edge rtc
- dmidecode output ::
http://pastebin.com/819410
- uname -a ::
Linux bach 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux
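The rtc line above is a single snapshot, and seeing whether a counter is actually moving takes two samples. A rough sketch, assuming the /proc/interrupts layout shown above (IRQ label, one or more numeric count columns, then the controller and device names); irq_count is a hypothetical helper, and its optional second argument is only there so it can be pointed at a saved copy of the file:

```bash
#!/bin/bash
# Print the interrupt count for one IRQ number, summed across CPU columns.
# Usage: irq_count IRQ [FILE]   (FILE defaults to /proc/interrupts)
irq_count() {
    awk -v irq="$1:" '$1 == irq {
        total = 0
        # Count columns are the purely numeric fields right after the label.
        for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        print total
    }' "${2:-/proc/interrupts}"
}

# Example (not run here):
#   before=$(irq_count 8); sleep 10; after=$(irq_count 8)
#   echo "IRQ 8 fired $((after - before)) times"
```

The same caveat applies as anywhere else on bach: the sleep interval is measured by the clock under suspicion, so the delta is best compared against wall time taken from another machine.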
The problem eventually manifested in every case I tried. There seems to be no real tie to time of day (though it has always happened after 9pm, regardless of uptime), and a reboot always seems to fix it for a while, though the uptime that passes before it comes back has varied from 4 to 7 hours.
Could this be hardware? Seems too inconsistent for that, in my mind.
How, if at all, could this be related to the wagner box? It seemed to go out of whack when that went offline. But it was a 100% fresh install in a few cases and still had this issue.
I didn't see any errors in /var/log/messages or other logs at that time, either.
Could this be some odd thing I'm just not seeing?
Any suggestions for search terms? I've tried "clock skip" "clock loop" and all sorts of system indicators (kernel: Linux bach 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux, model: info in dmidecode output above).
Anything I'm just plain missing?
Any help appreciated...
jptxs