LinuxQuestions.org
Old 11-30-2016, 08:06 PM   #1
mfoley
Senior Member
 
Registered: Oct 2008
Location: Columbus, Ohio USA
Distribution: Slackware
Posts: 2,555

Rep: Reputation: 177
Why did my server enter an endless trace loop?


Sometime on Wednesday afternoon, Nov. 23rd, my Linux server stopped functioning properly, but did not shut down or reboot. I was no longer able to ssh into it, nor was it resolving DNS (it is the LAN DNS server). The mail port (25) was still open and it was receiving email, but not delivering any. Port 80 was still open. When I finally got to it physically on the following Saturday, the console was continuously looping, with the 1st line showing "BUG: unable to handle kernel paging request at ffff88030e28fe40". I made a screenshot of the console (attached).

Some further lines down the output there are the lines:
Code:
CPU: 5 PID: 20523 Comm: sendmail Tainted: G     B    3.10.17 #3
Hardware name: Gigabyte Technology Co., Ltd. Z97X-UD5H/Z97X-UD5H, BIOS F8 06/17/2014
Then a bunch of what appear to be register dumps, etc.

The "3.10.17" number is the Linux kernel version. "Gigabyte Technology" is the motherboard manufacturer and "Z97X-UD5H" is the motherboard product name. "06/17/2014" is the AMI BIOS date and "F8" is the BIOS version number. Not sure any of that is helpful. What might "sendmail Tainted" mean?
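
As far as I can tell, "sendmail Tainted" is not one phrase but two separate fields: "Comm: sendmail" is the process that happened to be running on that CPU when the fault hit, and "Tainted: G B" is a set of kernel taint flags. Here is a minimal Python sketch that splits such a line apart. The flag table is only a subset, written from memory of the kernel's tainted-kernels documentation for 3.x kernels, and `TAINT_FLAGS` and `parse_oops_line` are names made up for this example.

```python
import re

# Subset of taint letters for 3.x-era kernels (assumption: recalled from the
# kernel's tainted-kernels documentation; check it for the full list).
TAINT_FLAGS = {
    "G": "only GPL-compatible modules loaded (no proprietary module)",
    "P": "a proprietary module has been loaded",
    "F": "a module was force-loaded",
    "R": "a module was force-unloaded",
    "M": "a machine check exception was reported",
    "B": "the kernel hit a bad page (bad reference or unexpected page flags)",
    "D": "the kernel has already hit an oops or BUG",
    "W": "the kernel has issued a warning",
    "O": "an out-of-tree module has been loaded",
}

# Matches lines such as:
#   CPU: 5 PID: 20523 Comm: sendmail Tainted: G     B    3.10.17 #3
OOPS_LINE = re.compile(
    r"CPU:\s*(?P<cpu>\d+)\s+PID:\s*(?P<pid>\d+)\s+Comm:\s*(?P<comm>\S+)\s+"
    r"(?:Not tainted|Tainted:\s*(?P<flags>[A-Z ]+?))\s+(?P<kernel>\d\S*)"
)


def parse_oops_line(line):
    """Return the fields of an oops header line as a dict, or None if no match."""
    m = OOPS_LINE.search(line)
    if m is None:
        return None
    flags = m.group("flags") or ""
    return {
        "cpu": int(m.group("cpu")),
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),          # process name, not part of the taint
        "taint": [c for c in flags if c != " "],
        "kernel": m.group("kernel"),
    }


def describe_taint(letters):
    """Map taint letters to descriptions; unknown letters are reported as such."""
    return [f"{c}: {TAINT_FLAGS.get(c, 'not in this subset')}" for c in letters]
```

Calling `describe_taint(parse_oops_line(line)["taint"])` on the line quoted above gives the descriptions for G and B. If the table is right, B means the kernel had already flagged a bad page, which would fit some kind of memory corruption, though it does not say whether hardware or software caused it.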

Simply rebooting fixed the problem -- apparently.

Does anyone have any idea what happened?

Slackware64 14.1
Attachment: HungLoop.jpg (203.4 KB)
 
Old 11-30-2016, 10:38 PM   #2
Darth Vader
Senior Member
 
Registered: May 2008
Location: Romania
Distribution: DARKSTAR Linux 2008.1
Posts: 2,727

Rep: Reputation: 1247
Quote:
Originally Posted by mfoley
Does anyone have any idea what happened?

Some smartass thought he could make a 24/7 server using a high-end gaming motherboard, CPU and a bunch of non-ECC memory.

That's the main and fundamental error. This is NOT a Server, but a Gaming Computer forced to do a 24/7 job...

You know, the smart guys invented the Server Grade Motherboards and the ECC Memories for a reason...

Last edited by Darth Vader; 12-01-2016 at 05:19 AM.
 
Old 12-01-2016, 09:45 AM   #3
mfoley
Senior Member
 
Registered: Oct 2008
Location: Columbus, Ohio USA
Distribution: Slackware
Posts: 2,555

Original Poster
Rep: Reputation: 177
Well, this was a custom-built machine with recommended components, including the motherboard. There was no indication on the MB box that it was specifically for gaming.

What, for example, would you suggest as a Server Grade Motherboard?

You may be right on the non-ECC memory, but how do you know that? Because of the motherboard model?
 
Old 12-01-2016, 10:10 AM   #4
bassmadrigal
LQ Guru
 
Registered: Nov 2003
Location: West Jordan, UT, USA
Distribution: Slackware
Posts: 8,792

Rep: Reputation: 6656
The issue that Darth Vader is (rudely) implying is that server hardware tends to be more robust, and that usually includes using ECC (error-correcting code) memory. Long story short, it is possible for a neutron from a cosmic ray to flip a memory bit from 0 to 1 or 1 to 0. This is not common, because most of those particles pass through without disturbing anything. (Neutrinos, which are a different particle, are the extreme case: there are about a trillion of them passing through your hand every second, but collisions only occur once every few years; see this fun xkcd what-if for more detailed information on neutrinos.) When one of these neutrons does hit a bit in your RAM, it can have no RAMifications (heh) on the computer or it can bring it to its knees. It all depends on what was contained in that bit of memory. ECC memory is designed to notice this and correct it.

Unfortunately, as far as I know, there's no way to know whether this issue was caused by some random particle strike or some other hardware/software glitch. Many times, when this occurs due to a flipped bit, a quick reboot gets you back to normal and you won't experience it again. There's really not much you can do to protect yourself from those types of situations without getting server-grade hardware that uses ECC memory. If this issue starts happening frequently, it could be a sign your hardware is having issues and would be unrelated to cosmic rays.
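
To give a feel for what "notice this and correct it" means, here is a minimal Python sketch of a Hamming(7,4) code, the textbook single-error-correcting code. It is only an illustration of the idea, not what an ECC DIMM actually implements (those typically protect 64-bit words with 8 extra check bits), and the function names are made up for this example.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword laid out as p1 p2 d1 p4 d2 d3 d4."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Return (corrected codeword, 1-based position of the flipped bit or 0)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the bad position
    if position:
        c[position - 1] ^= 1  # flip the bad bit back
    return c, position


def hamming74_decode(codeword):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c, _ = hamming74_correct(codeword)
    return [c[2], c[4], c[5], c[6]]


# Simulate a single bit flip in "memory" and recover the original data.
stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # one bit gets flipped
recovered = hamming74_decode(stored)
```

The three parity bits are chosen so that the pattern of failed checks is the binary number of the flipped position. Two flips in the same word would fool this small code, which is why real ECC adds an extra check bit to at least detect that case.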

Personally, I'd say that as long as a machine isn't considered critical, you probably don't need to spend the extra money for server-grade components. I have been happily using my normal desktop rig as a 24/7 server for many years (both within my LAN and out to the internet), but if my machine goes down, only me, my wife, and a few friends would be affected. I also have an htpc running kodi that runs 24/7 that can act as a media server to my mobile phone. I haven't had issues with either, and, to me, neither computer is worth the cost of server-grade hardware based on what they provide. If you're running a major business, you could lose a lot of money in a short period of time if your desktop-turned-server goes down, so it's typically beneficial to pay the extra money for server-grade hardware when that computer will be providing a critical function (which would be up to you and your company to decide if it is worth it).

TL;DNR: Don't worry about it unless it happens frequently. Then start testing your hardware because there might be a bad component.
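
One rough way to judge "frequently" is to count, per day, how often the kernel logged a fault like the one in the first post. Below is a minimal Python sketch that assumes traditional syslog timestamps ("Nov 23 14:02:11 host kernel: ..."); the patterns, the function name `count_faults_per_day`, and whatever log file you feed it are assumptions for illustration, not something taken from this machine.

```python
import re
from collections import Counter

# Substrings that mark the start of a kernel fault report (assumption: a small,
# non-exhaustive list; the first one is the message seen in this thread).
FAULT_PATTERNS = (
    "BUG: unable to handle kernel",
    "general protection fault",
)

# Leading "Mon DD " part of a traditional syslog timestamp.
STAMP = re.compile(r"^([A-Z][a-z]{2})\s+(\d{1,2})\s")


def count_faults_per_day(lines):
    """Count matching log lines per calendar day ('Nov 23' style keys)."""
    counts = Counter()
    for line in lines:
        if any(pattern in line for pattern in FAULT_PATTERNS):
            m = STAMP.match(line)
            day = f"{m.group(1)} {m.group(2)}" if m else "unknown"
            counts[day] += 1
    return counts
```

You would call it with an open log file, for example `count_faults_per_day(open(path, errors="replace"))`, where `path` is wherever your syslog daemon writes kernel messages. A machine stuck in a trace loop will log the same fault many times, so the number of distinct days with hits says more than the raw counts.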
 
Old 12-01-2016, 01:38 PM   #5
Skaendo
Senior Member
 
Registered: Dec 2014
Location: West Texas, USA
Distribution: Slackware64-14.2
Posts: 1,445

Rep: Reputation: Disabled
My best guess would be that some piece of your hardware had a hiccup. Like bassmadrigal said, you probably don't need to worry about it unless it starts happening more frequently. You could run diagnostic tests, like running a memtest, stressing the CPU, checking the SMART status of your HDD, testing your power supply, etc. But you will need some downtime to do all that.

Personally, I think that it is the MoBo. Gigabyte IMO is junk. I have a friend that cannot give his Gigabyte laptop away. I have had nothing but issues with their MoBos. But that is purely a personal opinion and speculation.

My favorite right now is an old ASUS M2NPV-VM MoBo that I have running my personal web-facing server. This thing is a workhorse. No ECC RAM or anything "server-grade" in it. And it's pretty minimal: MoBo, HDD, PSU, DVD, NIC.
 
  

