LinuxQuestions.org
Old 03-14-2016, 03:14 AM   #1
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Rep: Reputation: Disabled
S5520UR Reboots unexpectedly - Nothing on the logs


Hello,

I have an S5520UR server running Ubuntu 14.04.4 LTS (GNU/Linux 3.16.0-62-generic x86_64).

For no apparent reason the server reboots randomly, and I can't find anything in the logs prior to the reboots.

Does anyone know where I could look for the cause?

Thanks!

Ernesto.
 
Old 03-14-2016, 05:28 AM   #2
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Rep: Reputation: 2322
If the logs show nothing, increase your logging level.

Have you a watchdog enabled?
Is ACPI or anything configured to reboot on any condition? You can eliminate ACPI by running acpid with '-l', which throws events into syslog.
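
One place to look while the logs are silent is the login records, which note each boot and each orderly shutdown. The following is only a minimal sketch, assuming `last -x` prints lines that begin with `reboot` and `shutdown`, as it does on typical Ubuntu installs. The function name `count_boot_records` is made up for illustration.

```shell
# Hypothetical helper: summarise boot and clean-shutdown records read
# from standard input (intended input: the output of `last -x`).
count_boot_records() {
    awk '
        $1 == "reboot"   { boots++ }   # one line per system boot
        $1 == "shutdown" { clean++ }   # one line per orderly shutdown
        END { printf "boots=%d clean_shutdowns=%d\n", boots, clean }
    '
}
```

It would be run as `last -x | count_boot_records`. Noticeably more boots than clean shutdowns would suggest the box is being reset or losing power rather than being shut down by software.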
 
Old 03-14-2016, 09:02 AM   #3
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
error

Last edited by ErnestoC; 03-14-2016 at 09:58 AM.
 
Old 03-14-2016, 09:08 AM   #4
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by business_kid View Post
If the logs show nothing, increase your logging level.

Have you a watchdog enabled?
Is ACPI or anything configured to reboot on any condition? You can eliminate ACPI by running acpid with '-l', which throws events into syslog.
Hello business_kid and thanks for your quick answer!
There isn't a watchdog installed.

I'll see if I can increase the level of logging...

Thanks!
 
Old 03-14-2016, 10:57 AM   #5
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by ErnestoC View Post
Hello business_kid and thanks for your quick answer!
There isn't a watchdog installed.

I'll see if I can increase the level of logging...

Thanks!
The only errors I've found are these in syslog:

kernel: [601468.885157] Uhhuh. NMI received for unknown reason 31 on CPU 1.
kernel: [601468.885648] Do you have a strange power saving mode enabled?
kernel: [601468.886129] Dazed and confused, but trying to continue

But they don't happen just before the reboot.

There's still nothing at the moment of the reboot...

Do you think that these error messages could be related to the unexpected reboots?
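
To see how those NMI messages line up with the reboots, it can help to pull them out of the log and count them. This is a minimal sketch, assuming the messages sit in a plain-text syslog-format file such as /var/log/syslog. The name `nmi_events` is made up.

```shell
# Hypothetical helper: print every "NMI received for unknown reason"
# line from the given log file, then the total count on the last line.
nmi_events() {
    logfile=$1
    # "|| true" keeps the function from failing when nothing matches
    grep 'NMI received for unknown reason' "$logfile" || true
    printf 'total: %s\n' "$(grep -c 'NMI received for unknown reason' "$logfile")"
}
```

For example, `nmi_events /var/log/syslog`. Rotated logs such as syslog.1 can be checked the same way (compressed ones would need zgrep instead of grep).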
 
Old 03-15-2016, 02:41 AM   #6
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Rep: Reputation: 2322
Code:
kernel: [601468.886129] Dazed and confused, but trying to continue
That's a kernel error. Google that.

An NMI arriving for an unknown reason is serious and a possible indicator of trouble (software OR hardware). The 'NM' in that stands for "Non Maskable", so the CPU can't ignore it. If the kernel can't tell where it came from, it sounds like something around that CPU goes AWOL.
 
Old 03-21-2016, 05:06 AM   #7
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by business_kid View Post
If the logs show nothing, increase your logging level.

Have you a watchdog enabled?
Is ACPI or anything configured to reboot on any condition? You can eliminate ACPI by running acpid with '-l', which throws events into syslog.
Hi, I've just found out that there is a watchdog enabled. I found these in the kernel log:

kern.log.1:Mar 14 17:27:25 SUPERPUMA kernel: [ 0.323307] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
kern.log.1:Mar 18 20:30:12 SUPERPUMA kernel: [ 0.323312] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.

Is the watchdog causing the reboots?

Do I need to disable it?

Thanks!
 
Old 03-21-2016, 06:34 AM   #8
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Rep: Reputation: 2322
A watchdog watches for processes going out to lunch, and resets the box. This is usually done with a timer counting down. Every time you get back to the basic routine, it loads up the timer. If one of your processes goes out to lunch, your core goes with it, so the watchdog runs out and you reset.
One thing you could do is disable it. After bootup, this might do it.
Code:
echo 0 > /proc/sys/kernel/nmi_watchdog
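
Before and after that echo it is worth confirming what the kernel actually reports. As far as I know, the kernel's NMI watchdog is a lockup detector, and a detected lockup only ends in a reboot if the kernel is also set to panic on it and to reboot after a panic (the `kernel.panic` sysctl), so the sketch below reads both values. The function name and the optional directory argument are made up so it can also be pointed at a test directory.

```shell
# Hypothetical helper: report the NMI watchdog and panic-reboot settings.
# $1 is the directory holding the sysctl files (default /proc/sys/kernel).
watchdog_report() {
    dir=${1:-/proc/sys/kernel}
    for name in nmi_watchdog panic; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            printf '%s=unavailable\n' "$name"
        fi
    done
}
```

Called with no argument, `watchdog_report` reads the live values: nmi_watchdog=0 means the watchdog is off, and panic=0 means the kernel does not reboot by itself after a panic. The echo does not survive a reboot. To keep the setting, a line `kernel.nmi_watchdog = 0` in a file under /etc/sysctl.d/ should do it.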
 
Old 03-21-2016, 09:22 AM   #9
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by business_kid View Post
A watchdog watches for processes going out to lunch, and resets the box. This is usually done with a timer counting down. Every time you get back to the basic routine, it loads up the timer. If one of your processes goes out to lunch, your core goes with it, so the watchdog runs out and you reset.
One thing you could do is disable it. After bootup, this might do it.
Code:
echo 0 > /proc/sys/kernel/nmi_watchdog
Thanks for the answer!

I'm currently testing the system with ACPI off; if I get a reboot I'll disable the watchdog as well...

Do you think these log messages are connected with the reboots?

/var/log/kern.log.1:Mar 18 20:30:12 SUPERPUMA kernel: [ 27.867103] ACPI Warning: SystemIO range 0x0000000000000428-0x000000000000042F conflicts with OpRegion 0x0000000000000428-0x000000000000042F (\GPE0) (20140424/utaddress-254)
/var/log/kern.log.1:Mar 18 20:30:12 SUPERPUMA kernel: [ 27.867109] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
/var/log/kern.log.1:Mar 18 20:30:12 SUPERPUMA kernel: [ 27.867112] ACPI Warning: SystemIO range 0x0000000000000500-0x000000000000052F conflicts with OpRegion 0x0000000000000500-0x000000000000052F (\_SI_.SIOR) (20140424/utaddress-254)
/var/log/kern.log.1:Mar 18 20:30:12 SUPERPUMA kernel: [ 27.867114] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver


/var/log/kern.log.1:Mar 18 20:30:12 SUPERPUMA kernel: [ 0.648897] ACPI FADT declares the system doesn't support PCIe ASPM, so disable it

[ 0.680073] pci 0000:00:1f.0: can't claim BAR 13 [io 0x0400-0x047f]: address
 
Old 03-21-2016, 12:09 PM   #10
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
A little update... The system no longer reboots randomly but every Monday between 5 pm and 6 pm...
It has been the same for the last few weeks...

Does this tell us something that could lead me to a resolution?
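
A fixed weekly slot is easier to confirm by listing exactly when each boot happened. This is a minimal sketch, assuming a syslog-format kern.log whose first three fields are month, day and time. It uses the "NMI watchdog: enabled" line, which shows up once per boot in the log excerpts posted earlier, as the boot marker. The name `boot_times` is made up.

```shell
# Hypothetical helper: print the timestamp of every boot found in a
# syslog-format kernel log. $2 is the marker text that identifies a
# boot; the default is taken from the log lines posted in this thread.
boot_times() {
    logfile=$1
    marker=${2:-NMI watchdog: enabled}
    # index() does a plain substring match, so the marker is not a regex
    awk -v marker="$marker" 'index($0, marker) { print $1, $2, $3 }' "$logfile"
}
```

Running `boot_times /var/log/kern.log.1` over each rotated log gives one line per boot. With GNU date, something like `date -d "Mar 14 17:27:25 2016" +%A` turns one of those lines into a weekday. The year has to be added by hand, since syslog timestamps leave it out.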
 
Old 03-22-2016, 03:43 AM   #11
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Rep: Reputation: 2322
There's usually a rush of email around that time, isn't there? I have no idea what that means in your context. What does 5 - 6 p.m. mean to you?
 
Old 03-23-2016, 11:04 AM   #12
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by business_kid View Post
There's usually a rush of email around that time, isn't there? I have no idea what that means in your context. What does 5 - 6 p.m. mean to you?
Hi!

It is not an email server; we use it only as a virtual machine server.
We have Oracle VirtualBox on it.

Nothing happens on Mondays from 5 to 6 pm.

VM backups are launched at midnight...

Sometimes it reboots without any load, sometimes it does when we are working on the VMs...

ACPI=off didn't help; tonight I'm disabling the NMI watchdog to see what happens...

Thanks for your answers! They really help me because I don't know where else to look...
 
Old 03-24-2016, 05:28 AM   #13
business_kid
LQ Guru
 
Registered: Jan 2006
Location: Ireland
Distribution: Slackware, Slarm64 & Android
Posts: 16,289

Rep: Reputation: 2322
I would turn up logging and scour the log from 5 to 6 to see if anyone attacks, mail or ftp gets busy, whatever. It could be idle, start on disk maintenance, and puke on that. Find out.
 
Old 03-24-2016, 06:16 AM   #14
ErnestoC
LQ Newbie
 
Registered: Mar 2016
Posts: 9

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by business_kid View Post
I would turn up logging and scour the log from 5 to 6 to see if anyone attacks, mail or ftp gets busy, whatever. It could be idle, start on disk maintenance, and puke on that. Find out.
Thanks!

I'll try to compare the Monday syslog with the Tuesday one to see if I can find something odd...
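
For that comparison, it may be easier to reduce each day to a count of which programs logged during the 5 to 6 pm window and then compare the two summaries. This is a minimal sketch, assuming syslog-format lines (month, day, HH:MM:SS, host, program tag) and one file per day. The function name `window_summary` and the file names in the usage line are made up.

```shell
# Hypothetical helper: count log lines per program tag within an hour
# range. Usage: window_summary FILE START_HOUR END_HOUR (end exclusive).
window_summary() {
    awk -v start="$2" -v end="$3" '
        {
            hour = substr($3, 1, 2) + 0        # HH from HH:MM:SS
            if (hour < start || hour >= end) next
            tag = $5
            sub(/\[[0-9]+\]/, "", tag)         # drop the [pid] part
            sub(/:$/, "", tag)                 # drop the trailing colon
            count[tag]++
        }
        END { for (t in count) print count[t], t }
    ' "$1" | LC_ALL=C sort -k2
}
```

Something like `window_summary monday.log 17 18 > mon.txt`, the same for Tuesday, and then `diff mon.txt tue.txt` would show which programs were active only on Monday. A syslog that spans several days would need filtering on the date fields first.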
 
  


All times are GMT -5.