LinuxQuestions.org > Forums > Linux Forums > Linux - Software > Linux - Kernel
Old 04-10-2019, 11:25 AM   #1
icav
LQ Newbie
 
Registered: Apr 2019
Posts: 3

Rep: Reputation: Disabled
System halt after 25 days


Hi
I'm facing a strange system failure.
After about 25 days of uptime (more precisely 24 days, 20 hours and about 30 minutes) the system halts.
Wall-clock time is irrelevant; only the time since boot seems to matter.

The nearest notable value to this time is (2^31-1) milliseconds. However, the system doesn't halt exactly when CLOCK_MONOTONIC reaches 2147483 seconds: it keeps running for a handful of minutes (~10) and then stops. Until then, the system runs smoothly.
It seems that some kernel activity, scheduled for later processing, doesn't properly handle the wrap of such a counter, and it crashes.
I suspect something related to the disk cache flush.
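
As a sanity check on that figure, here is a minimal sketch in plain shell arithmetic. The function name ms_to_dhms is made up for illustration; it only converts a millisecond count into days, hours, minutes and seconds.

```shell
#!/bin/sh
# Hypothetical helper: convert a millisecond count ($1) into d/h/m/s.
ms_to_dhms() {
    total_s=$(( $1 / 1000 ))             # whole seconds
    d=$(( total_s / 86400 ))             # whole days
    h=$(( total_s % 86400 / 3600 ))      # remaining hours
    m=$(( total_s % 3600 / 60 ))         # remaining minutes
    s=$(( total_s % 60 ))                # remaining seconds
    echo "${d}d ${h}h ${m}m ${s}s"
}

ms_to_dhms 2147483647   # 2^31 - 1 milliseconds; prints: 24d 20h 31m 23s
```

This agrees with the observed uptime of 24 days, 20 hours and about 30 minutes.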

I searched the kernel tree for anything related to this issue but found nothing. All the time-related functions use struct timespec/timeval or int64, with no millisecond counters.

Does anyone have a suggestion?
Thanks in advance

----
Linux kernel: 2.6.26.8-3
CPU:          MIPS 4KSd V2.4
System:       busybox + libuClibc-0.9.30.so
Storage:      jffs2 / mtd
 
Old 04-10-2019, 04:27 PM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,138

Rep: Reputation: 1263
What's on the console when it halts? Is there a stack trace? What does "last" say was the reason? Do you have a hardware watchdog timer enabled in the BIOS?
 
Old 04-10-2019, 05:14 PM   #3
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,311
Blog Entries: 28

Rep: Reputation: 6137
It's a long shot, but is there anything in the logs?
 
Old 04-11-2019, 01:48 AM   #4
icav
LQ Newbie
 
Registered: Apr 2019
Posts: 3

Original Poster
Rep: Reputation: Disabled
Logs

Hi,
unfortunately the console is not usable: the machine is located remotely and the only access is via ssh.
After a reboot the previous logs are lost, because they are kept in tmpfs.

During lab tests, with console access, we never hit this issue.

I tried, unsuccessfully, to reproduce the phenomenon in two ways:

- "accelerate" the time
jiffies += SOME_LARGE_VALUE in do_timer(),
but it doesn't work: Linux doesn't run at all (there is a document
by Kobayashi/Toshiba about this, which I discovered only *afterwards*)

- "start" the timer close to the 25-day mark
u64 jiffies_64 ... = INITIAL_JIFFIES + 2000000L;
but the system ran flawlessly past the critical point
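
How many jiffies the 2^31-1 ms mark corresponds to depends on the kernel's HZ setting, which isn't stated in this thread. A rough sketch, assuming HZ is one of the common values 100, 250 or 1000 (ms_to_jiffies is a hypothetical helper, and its integer division is only exact when HZ divides 1000 evenly):

```shell
#!/bin/sh
# Hypothetical helper: jiffies elapsed after $1 milliseconds at tick rate $2 (HZ).
# Assumes HZ divides 1000 evenly; dividing first keeps values within 32 bits.
ms_to_jiffies() {
    echo $(( $1 / (1000 / $2) ))
}

# HZ values below are assumptions, not taken from the system in question.
for hz in 100 250 1000; do
    echo "HZ=$hz: $(ms_to_jiffies 2147483647 "$hz") jiffies"
done
```

At HZ=100, for example, the mark sits roughly 214 million jiffies after boot, so the size of the starting offset matters when trying to begin close to it.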
 
Old 04-11-2019, 12:29 PM   #5
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 4,138

Rep: Reputation: 1263
Can you check for events at the time of the last crash:

Code:
ipmitool sel list
 
Old 04-11-2019, 06:05 PM   #6
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,120

Rep: Reputation: 4120
UPS?
 
Old 04-12-2019, 01:36 AM   #7
ondoho
LQ Addict
 
Registered: Dec 2013
Posts: 19,872
Blog Entries: 12

Rep: Reputation: 6053
Quote:
Originally Posted by icav
After reboot, the previous logs are lost, because they are in tmpfs.
Is that configurable?
Can you write the logs to a different location, NOT tmpfs?
 
Old 04-12-2019, 02:00 AM   #8
pan64
LQ Addict
 
Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 21,804

Rep: Reputation: 7306
I don't think anyone can solve this without additional information. So, as suggested in post #7, save the logs (and come back after 25 days).
It could even be something as simple as your tmpfs filling up, but we can only guess...
 
Old 04-15-2019, 04:26 AM   #9
icav
LQ Newbie
 
Registered: Apr 2019
Posts: 3

Original Poster
Rep: Reputation: Disabled
Well,
clearly this is not a "known" issue.
We are checking whether it is feasible to connect a remote machine to the console, and hopefully ... But 25 days is a long time.
Thanks
 
Old 04-15-2019, 05:42 AM   #10
dc.901
Senior Member
 
Registered: Aug 2018
Location: Atlanta, GA - USA
Distribution: CentOS/RHEL, openSuSE/SLES, Ubuntu
Posts: 1,005

Rep: Reputation: 370
Quote:
Originally Posted by icav
Well
clearly this is not a "known" issue.
We are verifying the feasibility to connect a remote machine to the console, and hopefully ... But 25days is a long time
Thanks
I assume you have already checked cron?
And, as mentioned by others, is there anything in the hardware logs?
Code:
ipmitool sel elist
ipmitool sensor
Since you know this happens after 25 days, perhaps you should set up a script to capture some information from the system:
- set up a syslog server and send the syslogs to it.
- in a loop, write the dmesg output to a file; do the same with other info such as vmstat, iostat, etc.
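
A minimal sketch of such a capture loop for a busybox-style /bin/sh. The function names, the snapshot contents and the log path are assumptions for illustration; vmstat and iostat are left out because a small busybox build may not include them.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped snapshot of system state to file $1.
capture_snapshot() {
    out=$1
    {
        echo "=== $(date) uptime: $(cat /proc/uptime 2>/dev/null) ==="
        dmesg 2>/dev/null | tail -n 50          # most recent kernel messages
        head -n 5 /proc/meminfo 2>/dev/null     # basic memory figures
    } >> "$out"
}

# Take a snapshot every $2 seconds, $3 times in total (0 or unset = forever).
capture_loop() {
    out=$1
    interval=$2
    count=${3:-0}
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        capture_snapshot "$out"
        sync                                    # push the data out before a possible hang
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Started in the background as, say, `capture_loop /mnt/flash/state.log 60 &` (the path is hypothetical and must be on persistent storage, not tmpfs), this leaves the last snapshots readable after the next reboot. On jffs2 a longer interval keeps flash wear down.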

Also, which OS? On some I have seen boot.olog and boot.log; not sure if you have looked at those?

Last edited by dc.901; 04-15-2019 at 05:45 AM. Reason: adding info about boot.log
 
  


All times are GMT -5. The time now is 09:08 PM.