LinuxQuestions.org
Linux - Hardware This forum is for Hardware issues.
Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?

03-16-2017, 07:04 PM #1
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Rep: Reputation: 0
Hardware Error


While the system is running, I get the following message, which I find in the messages file after a reboot:

Mar 15 21:57:09 MySystem kernel: [Hardware Error]: Machine check events logged

Where can I find the information on whatever error this concerns? I am running the following hardware:
ASRock X99 Extreme4
Intel Core i7-5820K (liquid cooled)
64 GB DDR4 RAM (ran memtest looking for errors and found none)

Running CentOS 6.8 x86_64, fully updated

This custom system is less than 6 months old, and I want to know some details so that I understand what I'm being told when I take it back.

TIA
Gene
 
03-16-2017, 08:12 PM #2
frankbell
LQ Guru
 
Registered: Jan 2006
Location: Virginia, USA
Distribution: Slackware, Ubuntu MATE, Mageia, and whatever VMs I happen to be playing with
Posts: 19,323
Blog Entries: 28

Rep: Reputation: 6141
Have you checked the logs in /var/log? (I just checked; /var/log exists on CentOS.) I'd look first at /var/log/messages.

You can also use systemd tools to view the logs, but I don't have much experience with them: https://www.digitalocean.com/communi...e-systemd-logs
 
03-17-2017, 02:35 PM #3
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
I saw the message mentioned above in /var/log/messages. I also saw the same message in dmesg, but there was no explanation of what or where (I assume the when is the time stamp on the message).

Now, let me throw another wrinkle into this...
The message appeared again at Mar 17 13:41:00 as the last line in messages. The preceding lines relate to smartd and ntpd, the last of them stamped 13:37:24.
But the system did not crash or lock up; it is still running at the command line.

There are other dates and times (starting Mar 13) where the message appeared and I would venture a guess that the system locks up 75% of the time when the message appears.

TIA
Gene
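To check that 75% estimate against the record, it helps to list every machine check notice by date and compare the list with the days the machine actually locked up. A minimal Python 3 sketch, assuming the syslog line format seen in /var/log/messages in this thread ("Mar 18 17:58:30 jpdsys1 kernel: ..."); the helper name mce_times_by_day is hypothetical:

```python
import re
from collections import defaultdict

# Syslog-style prefix used in /var/log/messages, e.g. "Mar 18 17:58:30 host ...".
# Single-digit days are space-padded ("Mar  5"), hence \s+ after the month.
SYSLOG_PREFIX = re.compile(r"^(\w{3})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) ")
MCE_MARKER = "Machine check events logged"


def mce_times_by_day(lines):
    """Return {"Mon D": ["HH:MM:SS", ...]} for every MCE notice in `lines`.

    Days appear in the order they occur in the input (dicts keep insertion order).
    """
    by_day = defaultdict(list)
    for line in lines:
        if MCE_MARKER not in line:
            continue
        m = SYSLOG_PREFIX.match(line)
        if m:
            month, day, clock = m.groups()
            by_day["%s %d" % (month, int(day))].append(clock)
    return dict(by_day)
```

For example, mce_times_by_day(open("/var/log/messages", errors="replace")) gives one entry per day with the times of each notice; rotated copies of the log would need to be read as well to go back to Mar 13.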
 
03-19-2017, 12:20 PM #4
Soadyheid
Senior Member
 
Registered: Aug 2010
Location: Near Edinburgh, Scotland
Distribution: Cinnamon Mint 20.1 (Laptop) and 20.2 (Desktop)
Posts: 1,672

Rep: Reputation: 486
I reckon that to give anybody a snowball's chance of helping you, you're going to have to post the error message and include at least twenty of the messages prior to it as well. Any clues are likely to be in the lines before the actual machine check.

Play Bonny!
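Gathering the twenty or so lines before each machine check can be automated. A rough Python 3 sketch; context_before_mce is a hypothetical helper, and the default of 20 lines simply follows the suggestion above:

```python
from collections import deque

MCE_MARKER = "Machine check events logged"


def context_before_mce(lines, n=20):
    """Return a list of (preceding_lines, mce_line) pairs, one per MCE notice.

    A sliding window keeps only the last `n` lines, so a large log file
    is streamed rather than read into memory at once.
    """
    window = deque(maxlen=n)
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if MCE_MARKER in line:
            # Snapshot the window as it stands just before this notice.
            hits.append((list(window), line))
        window.append(line)
    return hits
```

Passing an open /var/log/messages file and printing each pair gives exactly the excerpt worth posting; fewer than n lines are returned if the notice appears near the top of the file.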

 
03-19-2017, 12:44 PM #5
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,700

Rep: Reputation: 5895
CentOS 6 is not systemd.

You can post the contents of the /var/log/mcelog file or the output of the command
mcelog
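Once mcelog does produce output, the raw STATUS value reported for each bank is the key field. The sketch below decodes the architectural flag bits of IA32_MCi_STATUS as described in Intel's Software Developer's Manual (Vol. 3B, Machine-Check Architecture chapter). It assumes the STATUS value is available as a hex string; check that against what your mcelog version actually prints, since mcelog normally adds its own decoded description too. decode_mci_status is a hypothetical name:

```python
# Architectural flag bits of IA32_MCi_STATUS (Intel SDM Vol. 3B, Machine-Check Architecture).
STATUS_FLAGS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # an earlier error was overwritten (overflow)
    61: "UC",     # error was NOT corrected by hardware
    60: "EN",     # error reporting was enabled for this error
    59: "MISCV",  # IA32_MCi_MISC holds additional information
    58: "ADDRV",  # IA32_MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a raw MCi_STATUS value (int or hex string) into flags and error codes."""
    if isinstance(status, str):
        status = int(status, 16)  # accepts "be00..." as well as "0xbe00..."
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,                # bits 15:0, architectural
        "model_specific_code": (status >> 16) & 0xFFFF,   # bits 31:16
    }
```

As a rule of thumb, VAL set with UC and PCC clear describes an error the hardware corrected; UC or PCC set points at something more serious.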
 
03-19-2017, 04:02 PM #6
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
The mcelog file is 0 (zero) bytes long and dated Mar 13 23:39. I have no idea why it is not working.

Here are some of the messages just prior to the error message at Mar 18 17:58:30:

Mar 18 17:54:45 jpdsys1 ntpd[5121]: 0.0.0.0 c016 06 restart
Mar 18 17:54:45 jpdsys1 ntpd[5121]: 0.0.0.0 c012 02 freq_set kernel 29.022 PPM
Mar 18 17:54:45 jpdsys1 abrtd: Init complete, entering main loop
Mar 18 17:54:46 jpdsys1 /usr/sbin/gpm[5247]: *** info [daemon/startup.c(136)]:
Mar 18 17:54:46 jpdsys1 /usr/sbin/gpm[5247]: Started gpm successfully. Entered daemon mode.
Mar 18 17:54:46 jpdsys1 kernel: device virbr0-nic entered promiscuous mode
Mar 18 17:54:46 jpdsys1 kernel: virbr0: starting userspace STP failed, starting kernel STP
Mar 18 17:54:46 jpdsys1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Mar 18 17:54:46 jpdsys1 kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: started, version 2.48 cachesize 150
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: compile time options: IPv6 GNU-getopt DBus no-I18N DHCP TFTP "--bind-interfaces with SO_BINDTODEVICE"
Mar 18 17:54:46 jpdsys1 dnsmasq-dhcp[5436]: DHCP, IP range 192.168.122.2 -- 192.168.122.254, lease time 1h
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: reading /etc/resolv.conf
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: using nameserver 68.87.68.166#53
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: using nameserver 75.75.75.75#53
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: using nameserver 8.8.8.8#53
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: read /etc/hosts - 75 addresses
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: read /var/lib/libvirt/dnsmasq/default.addnhosts - 0 addresses
Mar 18 17:54:46 jpdsys1 dnsmasq[5436]: read /var/lib/libvirt/dnsmasq/default.hostsfile
Mar 18 17:54:46 jpdsys1 kernel: Ebtables v2.0 registered
Mar 18 17:54:46 jpdsys1 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Mar 18 17:54:46 jpdsys1 smartd[5505]: smartd 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.15.1.el6.x86_64] (local build)
Mar 18 17:54:46 jpdsys1 smartd[5505]: Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net
Mar 18 17:54:46 jpdsys1 smartd[5505]: Opened configuration file /etc/smartd.conf
Mar 18 17:54:46 jpdsys1 smartd[5505]: Configuration file /etc/smartd.conf was parsed, found DEVICESCAN, scanning devices
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda, type changed from 'scsi' to 'sat'
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], opened
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], PNY CS1311 120GB SSD, S/N:PNY051621667702008B2, WWN:5-f8db4c-0516008b2, FW:CS131122, 120 GB
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], not found in smartd database.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], is SMART capable. Adding to "monitor" list.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb, type changed from 'scsi' to 'sat'
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb [SAT], opened
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb [SAT], PNY CS1311 120GB SSD, S/N:PNY0516216677020158E, WWN:5-f8db4c-05160158e, FW:CS131122, 120 GB
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb [SAT], not found in smartd database.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdb [SAT], is SMART capable. Adding to "monitor" list.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdc, type changed from 'scsi' to 'sat'
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdc [SAT], opened
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdc [SAT], WDC WD4003FZEX-00Z4SA0, S/N:WD-WCC5D4XJJYD8, WWN:5-0014ee-2b77825cb, FW:01.01A01, 4.00 TB
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdc [SAT], not found in smartd database.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdc [SAT], is SMART capable. Adding to "monitor" list.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdd, type changed from 'scsi' to 'sat'
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdd [SAT], opened
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdd [SAT], WDC WD4003FZEX-00Z4SA0, S/N:WD-WCC5D4XJJNZZ, WWN:5-0014ee-262229444, FW:01.01A01, 4.00 TB
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdd [SAT], not found in smartd database.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sdd [SAT], is SMART capable. Adding to "monitor" list.
Mar 18 17:54:46 jpdsys1 smartd[5505]: Monitoring 4 ATA and 0 SCSI devices
Mar 18 17:54:46 jpdsys1 kernel: lo: Disabled Privacy Extensions
Mar 18 17:54:46 jpdsys1 smartd[5518]: smartd has fork()ed into background mode. New PID=5518.
Mar 18 17:54:49 jpdsys1 ntpd[5121]: Listen normally on 25 virbr0 192.168.122.1 UDP 123
Mar 18 17:54:52 jpdsys1 ntpd[5121]: 0.0.0.0 c615 05 clock_sync
Mar 18 17:58:30 jpdsys1 kernel: [Hardware Error]: Machine check events logged
 
03-19-2017, 06:41 PM #7
Soadyheid
Senior Member
 
Registered: Aug 2010
Location: Near Edinburgh, Scotland
Distribution: Cinnamon Mint 20.1 (Laptop) and 20.2 (Desktop)
Posts: 1,672

Rep: Reputation: 486
OK, this is what I see.....
Quote:
Mar 18 17:54:46 jpdsys1 smartd[5505]: smartd 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.15.1.el6.x86_64] (local build)
You're running smartd 5.43 2012-06-30 r3573 [x86_64-linux-2.6.32-642.15.1.el6.x86_64] (local build)

sda and sdb are both 120 GB SSDs.

Both SSDs give messages:
Quote:
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], can't monitor Current_Pending_Sector count - no Attribute 197
Mar 18 17:54:46 jpdsys1 smartd[5505]: Device: /dev/sda [SAT], can't monitor Offline_Uncorrectable count - no Attribute 198
When I Google this, I get Smartmontools ticket #248, which was logged four years ago.

From what I gather, SMART attributes are not standardised across SSDs and HDDs. As you are running a rather old version of smartd (5.43, the version mentioned in the fault ticket), maybe you should be running something more current, 6.5-3 by the look of it? I'm not sure. Perhaps a sysadmin type could either confirm or correct my findings; not me, Chiefy, I'm engines (hardware).

Play Bonny!
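To see exactly which attributes a drive reports, the attribute table printed by smartctl -A /dev/sda can be scanned for the IDs smartd complained about. A Python 3 sketch, assuming the usual table layout that begins with an "ID# ATTRIBUTE_NAME" header row; both helper names are hypothetical:

```python
def smart_attribute_ids(smartctl_output):
    """Map attribute ID -> name from the table printed by `smartctl -A`."""
    attrs = {}
    in_table = False
    for line in smartctl_output.splitlines():
        fields = line.split()
        if fields[:2] == ["ID#", "ATTRIBUTE_NAME"]:
            in_table = True  # header row; data rows follow
            continue
        if in_table and not fields:
            break  # a blank line ends the table
        if in_table and fields[0].isdigit():
            attrs[int(fields[0])] = fields[1]
    return attrs


def missing_attributes(smartctl_output, wanted=(197, 198)):
    """Return the wanted IDs the drive does not report.

    197/198 are the two smartd said it could not monitor on both SSDs.
    """
    present = smart_attribute_ids(smartctl_output)
    return [i for i in wanted if i not in present]
```

An empty list means both attributes are present. On the SSDs their absence most likely reflects what the drive firmware exposes, which is Soadyheid's point about SMART attributes not being standardised.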

 
03-20-2017, 08:50 PM #8
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
I've removed smartmontools (the package for smartd) while I look for a replacement; it seems all of the repositories only have smartd 5.43.

While we are on this portion of the problem, I must say that the fdisk invoked during installation, when I created all of the RAID1 file systems, does not like 4 TB hard drives. Even so, the installer let me make 5 partitions, define them as LVM physical volumes, and mirror them with RAID1. I am still at a loss as to how it did it. Here is what I'm talking about:

When I enter fdisk -l /dev/sdc, I get the following output:

WARNING: GPT (GUID Partition Table) detected on '/dev/sdc'! The util fdisk doesn't support GPT. Use GNU Parted.

Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes
255 heads, 63 sectors/track, 486401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/sdc1 1 267350 2147483647+ ee GPT
Partition 1 does not start on physical sector boundary.


But when I run pvs, I see all 5 partitions (md4 to md8):

PV VG Fmt Attr PSize PFree
/dev/md1 vg_jpdopsys lvm2 a--u 63.93g 29.93g
/dev/md2 vg_jpdswap lvm2 a--u 23.98g 7.98g
/dev/md3 vg_jpdspec lvm2 a--u 22.77g 20.77g
/dev/md4 vg_jpduser lvm2 a--u 359.87g 205.87g
/dev/md5 vg_jpddbdata lvm2 a--u 841.38g 609.38g
/dev/md6 vg_jpdsysdata lvm2 a--u 841.38g 761.38g
/dev/md7 lvm2 ---- 841.38g 841.38g
/dev/md8 lvm2 ---- 841.38g 841.38g


Could all of this be part of the issue?

TIA
Gene
 
03-20-2017, 09:32 PM #9
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,700

Rep: Reputation: 5895
No. As the warning message says, fdisk does not support GPT disks. Look at the output of the command

parted -l

No idea why an MCE is not being logged to the mcelog file. At the moment I suspect a smartd version problem. If possible, you might try a recent live distribution such as Mint to see whether you have the same problem.
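The fdisk output in post #8 follows from the MBR format: partition start and length are stored as 32-bit sector counts, so with 512-byte sectors an MBR can describe at most 2^32 x 512 bytes = 2 TiB. That is why the protective "ee GPT" entry is capped at 2147483647+ one-KiB blocks. The "does not start on physical sector boundary" warning comes from that protective entry starting at sector 1, which is not a multiple of 4096 / 512 = 8. A small Python 3 check of both points using the sizes from that output (helper names hypothetical):

```python
SECTOR = 512              # logical sector size reported by fdisk
PHYSICAL = 4096           # physical sector size reported by fdisk
MBR_MAX_SECTORS = 2**32   # MBR partition entries use 32-bit sector counts


def exceeds_mbr(disk_bytes, sector=SECTOR):
    """True if the disk has more sectors than a 32-bit MBR entry can address."""
    return disk_bytes // sector > MBR_MAX_SECTORS


def aligned(start_sector, sector=SECTOR, physical=PHYSICAL):
    """True if a partition starting at `start_sector` begins on a physical-sector boundary."""
    return (start_sector * sector) % physical == 0


disk = 4000787030016                 # "Disk /dev/sdc: 4000.8 GB, 4000787030016 bytes"
print(disk // SECTOR)                # 7814037168 logical sectors
print(MBR_MAX_SECTORS * SECTOR)      # 2199023255552 bytes, i.e. 2 TiB
print(exceeds_mbr(disk))             # True: the 4 TB drive needs GPT
print(aligned(1), aligned(2048))     # False True
```

gdisk starts partitions on 2048-sector boundaries by default, which passes the alignment check.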
 
03-21-2017, 01:04 PM #10
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
I'm going to work with a friend of mine who is way ahead of me on hardware to see if we can isolate this issue.

I spoke to another friend, and she suggested that I try CentOS 7 because it has better error-reporting processes. However, she suggested a complete reinstall.

I'm also going to look for some documentation for parted for fdisk users.

TIA
Gene
 
03-21-2017, 01:19 PM #11
michaelk
Moderator
 
Registered: Aug 2002
Posts: 25,700

Rep: Reputation: 5895
gdisk is similar to fdisk but is just for GPT partitions.
 
03-24-2017, 02:45 PM #12
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
OK, several stress-test programs were used (testing RAM, the video card, CPU loads, and disk drives) and none failed, but let it be said that all of the tests were run under Windows. So I'm back to thinking that Windows is a lot more tolerant of errors than Linux is.

I'm going to try gdisk and I'll get back to you.

In the meantime I've removed smartmontools and deleted all partitions from both 4 TB disk drives.

TIA
Gene
 
03-24-2017, 07:50 PM #13
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
Not sure what to say other than I performed all of the disk configuration using gdisk, and it was actually simpler than using fdisk. I noticed that there is no 'extended' partition, just 5 sequentially numbered partitions. The md devices are still resyncing, but the machine has been up all day so far.

Why didn't the CentOS installation program 'see' the 4 TB hard drive and invoke the gdisk program?

Looking at /var/log/messages I still see the message (3 times):
kernel: [Hardware Error]: Machine check events logged

But no machine crash - yet.

TIA
Gene
 
03-25-2017, 01:15 PM #14
etpoole60
Member
 
Registered: Jan 2008
Posts: 111

Original Poster
Rep: Reputation: 0
OK, Let me say that what appeared to be a hardware error probably isn't. Using the gdisk program solved all of my partitioning problems on those 4 TB drives.

After that was complete and all of the resync processes had finished, I restarted the machine and received an abort message:

Source   Problem                                                        Last Occurrence
gamin    Process /usr/libexec/gam_server was killed by signal 11 (SIGSEGV)   2017-03-25 00:50

Should I stay here? Move over to some software forum? Go to a CentOS site?

TIA
Gene
 
  


All times are GMT -5.