System halt after 25 days
Hi
I'm facing a strange system failure. About 25 days after boot (more precisely 24 days, 20 hours and circa 30 min) the system halts. The wall-clock time is irrelevant; only the time since boot seems to matter. The nearest notable value to this uptime is (2^31-1) milliseconds, but the system doesn't halt exactly when CLOCK_MONOTONIC reaches 2147483 seconds: it runs for a handful of minutes more (~10), then it stops. Until then, the system runs smoothly.
It seems that some kernel activity, scheduled for later processing, doesn't properly handle the wrap of this counter and crashes. I suspect something related to the disk cache flush.
I looked through the kernel tree for anything related to this issue, but found nothing. All the time-related functions use struct timespec/timeval or int64, with no reference to milliseconds.
Does anyone have a suggestion?
Thanks in advance
----
Linux kernel: 2.6.26.8-3
CPU: MIPS 4KSd V2.4
System: busybox + libuClibc-0.9.30.so
Storage: jffs2 / mtd |
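For anyone who wants to check the arithmetic in the post above, here is a small sketch in plain sh. It converts (2^31-1) milliseconds into days/hours/minutes/seconds and shows what happens to a signed 32-bit millisecond counter one tick later. The function names ms_to_dhms and to_s32 are made up for this illustration, and to_s32 assumes the shell does 64-bit arithmetic (bash and dash do; a busybox ash build may not). It does not say which kernel counter, if any, is actually wrapping.

```shell
#!/bin/sh
# Convert a millisecond count into days/hours/minutes/seconds.
ms_to_dhms() {
    ms=$1
    s=$((ms / 1000))
    d=$((s / 86400))
    h=$((s % 86400 / 3600))
    m=$((s % 3600 / 60))
    sec=$((s % 60))
    echo "${d}d ${h}h ${m}m ${sec}s"
}

# Interpret a value as a signed 32-bit integer.
# Assumption: the shell evaluates $(( )) with 64-bit integers.
to_s32() {
    v=$(( $1 & 0xFFFFFFFF ))
    if [ "$v" -ge 2147483648 ]; then
        v=$(( v - 4294967296 ))
    fi
    echo "$v"
}

# Largest value of a signed 32-bit counter, taken as milliseconds.
ms_to_dhms 2147483647          # prints: 24d 20h 31m 23s

# One millisecond later the signed counter turns negative.
to_s32 $((2147483647 + 1))     # prints: -2147483648
```

The first result matches the observed uptime of 24 days, 20 hours and about 30 minutes. The second shows why a naive "deadline > now" comparison on such a counter would suddenly give the wrong answer.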
What's on the console when it halts? Is there a stack trace? What does "last" say was the reason? Do you have a hardware watchdog timer enabled in the BIOS?
|
It's a long shot, but is there anything in the logs?
|
Logs
Hi,
unfortunately the console is not usable, because the machine is located remotely and the only access is via ssh. After a reboot the previous logs are lost, because they are in tmpfs. During the tests in the laboratory, with console access, we never hit this issue.
I tried, unsuccessfully, to reproduce the phenomenon:
- "accelerate" the time with jiffies += SOME_LARGE_VALUE in do_timer(), but it doesn't work: Linux doesn't run at all (there is a document by Kobayashi/Toshiba about this, which I discovered *after*)
- "start" the timer near the 25-day expiration point with u64 jiffies_64 ... = INITIAL_JIFFIES + 2000000L; but the system runs flawlessly beyond the critical point |
Can you check for events at the time of the last crash:
Code:
ipmitool sel list |
UPS?
|
Quote:
Write logs to a different location, NOT tmpfs? |
I don't think anyone can solve it without additional information. So, as suggested in post #7, save the logs (and come back after 25 days).
It could even be something as simple as your tmpfs filling up, but we can only guess... |
Well
clearly this is not a "known" issue. We are checking whether it is feasible to connect a remote machine to the console, and hopefully ... But 25 days is a long time :( Thanks |
Quote:
And, as mentioned by others, is there anything in the hardware logs:
Code:
ipmitool sel elist
- set up a syslog server and send syslogs to it.
- in a loop, write dmesg output to a file; same with other info like vmstat, iostat etc.
Also, which OS? On some OSes I have seen boot.olog and boot.log; not sure if you have looked at those? |
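A rough sketch of the "write dmesg in a loop" idea, in plain sh so it should run under busybox. Everything specific here is an assumption: LOGDIR is a hypothetical path that has to live on storage that survives a reboot (not tmpfs), the 60 second interval and 2000 line limit are arbitrary, and /proc/meminfo and df stand in for vmstat/iostat, which a small busybox system may not have. On jffs2, frequent writes wear the flash, so the interval may need to be longer.

```shell
#!/bin/sh
# Hypothetical location; must be on persistent storage, not tmpfs.
LOGDIR=${LOGDIR:-/var/persist/crashlog}

# Append one timestamped snapshot to <dir>/snapshot.log.
snapshot() {
    dir=$1
    mkdir -p "$dir" || return 1
    {
        echo "=== $(date) uptime: $(cat /proc/uptime 2>/dev/null)"
        dmesg 2>&1 | tail -n 50
        cat /proc/meminfo 2>/dev/null
        df 2>/dev/null
    } >> "$dir/snapshot.log"
    sync    # push the data out before a possible crash
}

# Keep only the last <max> lines so the log cannot fill the flash.
trim_log() {
    f=$1
    max=$2
    [ -f "$f" ] || return 0
    tail -n "$max" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
}

# Endless loop: one snapshot per minute, log capped at 2000 lines.
run_logger() {
    while :; do
        snapshot "$LOGDIR"
        trim_log "$LOGDIR/snapshot.log" 2000
        sleep 60
    done
}
```

To use it, call run_logger in the background from an init script. For the syslog server suggestion, busybox syslogd accepts -R HOST[:PORT] to forward messages to a remote host, provided the busybox build has remote logging enabled.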