Computer Specs:
Quote:
BIOS Information
Vendor: American Megatrends Inc.
Version: 1303
Release Date: 05/31/2010
BIOS Revision: 8.15
Motherboard: ASUS P7H55-M PRO
Power Supply: Seasonic 55-520GB (520 W)
Processor: Intel(R) Core(TM) i5 CPU 650 @ 3.20GHz
Communication controller: Intel Corporation 5 Series/3400 Series Chipset HECI Controller (rev 06)
PCI bridge: Intel Corporation Core Processor PCI Express x16 Root Port (rev 12)
Host bridge: Intel Corporation Core Processor DRAM Controller (rev 12)
GPU: NVIDIA GeForce GTX 460
CD/DVD/BD: iHES108 2 (Power has been disconnected to simplify troubleshooting)
HDD: Samsung 204UI
RAM: CORSAIR XMS 3 DDR3 1333MHz - 2 double-bank DIMMs at 2048MB each
Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168 PCI Express Gigabit Ethernet controller (rev 03)
Peripherals attached: USB keyboard and USB mouse. (I had a Bluetooth dongle, keyboard and mouse attached, but I have temporarily disconnected them to simplify things.)
|
BACKGROUND
I have a Slackware 14.1 media center and server at my in-laws' house (killing their internet and not mine). Two days before the Thanksgiving holiday, the computer went offline during a power outage. I will admit that for a few months before this I had noticed the second hard drive (Western Digital) acting up: it would intermittently freeze the computer while data was being copied.
I arrived at my in-laws' for the holidays and discovered upon booting that the SMART status of the Western Digital hard drive was bad. I removed the Western Digital drive, booted off the Samsung, and found that I was getting an "MCE hardware error" and a report of a "CMCI storm detected".
My /var/log/messages has an endless loop of CMCI storm errors:
Quote:
Dec 6 13:34:38 livingroomtv kernel: [ 5475.329597] CMCI storm detected: switching to poll mode
Dec 6 13:35:08 livingroomtv kernel: [ 5504.887601] CMCI storm subsided: switching to interrupt mode
|
I have posted /var/log/messages to the following pastebin:
http://pastebin.com/SxRX0jg7
You will notice towards the end of the file that this CMCI storm is being reported several times a second.
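To put a number on how often the storm is being reported, here is a rough Python sketch that counts the "CMCI storm detected/subsided" messages per minute. It assumes the traditional syslog line layout shown in the quote above; the function name `count_storm_events` is just something I made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
# Dec 6 13:34:38 livingroomtv kernel: [ 5475.329597] CMCI storm detected: switching to poll mode
STORM_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+kernel:.*"
    r"CMCI storm (?P<event>detected|subsided)"
)

def count_storm_events(lines):
    """Return a Counter keyed by (minute, event) for CMCI storm messages."""
    counts = Counter()
    for line in lines:
        match = STORM_RE.search(line)
        if match:
            # Collapse runs of whitespace so "Dec  6" and "Dec 6" share a key.
            minute = " ".join(match.group("stamp").split())
            counts[(minute, match.group("event"))] += 1
    return counts

# Demo on the two lines quoted above (hostname and timestamps from the post).
sample = [
    "Dec 6 13:34:38 livingroomtv kernel: [ 5475.329597] CMCI storm detected: switching to poll mode",
    "Dec 6 13:35:08 livingroomtv kernel: [ 5504.887601] CMCI storm subsided: switching to interrupt mode",
]
for (minute, event), n in sorted(count_storm_events(sample).items()):
    print(f"{minute}  {event:<9} {n}")
```

To run it against the real log, pass an open file instead of the sample list, for example `count_storm_events(open("/var/log/messages", errors="replace"))`.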
My syslog seems to be reporting an equal number of the following error:
Quote:
hid-generic 0003:0A5C:4502.0005: can't reset device, 0000:00:1d.0-1.5.1/input0, status -32
|
Here is my /var/log/syslog pastebin:
http://pastebin.com/8nGExtEr
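For anyone reading the hid-generic line, here is a small sketch that pulls it apart. It assumes the usual kernel naming of HID devices as bus:vendor:product.instance (all hex) and that the status is a negated errno value; the helper name `decode_hid_error` is hypothetical. It also assumes the script runs on Linux so that Python's errno table matches the kernel's.

```python
import errno
import os
import re

# Sample line copied from the syslog quote above.
LINE = ("hid-generic 0003:0A5C:4502.0005: can't reset device, "
        "0000:00:1d.0-1.5.1/input0, status -32")

HID_RE = re.compile(
    r"hid-generic (?P<bus>[0-9A-Fa-f]{4}):(?P<vendor>[0-9A-Fa-f]{4}):"
    r"(?P<product>[0-9A-Fa-f]{4})\.(?P<instance>[0-9A-Fa-f]{4}):"
    r".*status (?P<status>-?\d+)"
)

def decode_hid_error(line):
    """Split a hid-generic error line into IDs and a symbolic errno."""
    m = HID_RE.search(line)
    if m is None:
        return None
    code = abs(int(m.group("status")))  # kernel reports -errno
    return {
        "bus": int(m.group("bus"), 16),        # 0x0003 is BUS_USB
        "vendor": m.group("vendor").lower(),   # USB vendor ID
        "product": m.group("product").lower(), # USB product ID
        "errno_name": errno.errorcode.get(code, "unknown"),
        "errno_text": os.strerror(code),
    }

print(decode_hid_error(LINE))
```

On Linux, status -32 maps to EPIPE, which for USB transfers generally means the endpoint stalled. The vendor ID 0a5c is listed as Broadcom in the public usb.ids database, so these messages may well come from the Bluetooth dongle that was attached at the time, but I have not confirmed that.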
/var/log/dmesg has an MCE hardware error, a CMCI storm and ACPI warnings:
Quote:
[ 3.809619] mce: [Hardware Error]: Machine check events logged
[ 3.809843] mce: [Hardware Error]: Machine check events logged
[ 3.838073] devtmpfs: mounted
[ 3.839240] Freeing unused kernel memory: 1272k freed
[ 3.839605] Write protecting the kernel read-only data: 16384k
[ 3.840932] Freeing unused kernel memory: 552k freed
[ 3.842410] Freeing unused kernel memory: 792k freed
[ 3.881006] CMCI storm detected: switching to poll mode
[ 4.593463] loop: module loaded
[ 4.736720] udevd[194]: starting version 182
[ 5.441489] microcode: CPU0 sig=0x20652, pf=0x2, revision=0x9
[ 5.455270] microcode: CPU1 sig=0x20652, pf=0x2, revision=0x9
[ 5.455485] microcode: CPU2 sig=0x20652, pf=0x2, revision=0x9
[ 5.455749] microcode: CPU3 sig=0x20652, pf=0x2, revision=0x9
[ 5.456025] microcode: Microcode Update Driver: v2.00 <tigran@aivazian.fsnet.co.uk>, Peter Oruba
[ 5.699829] ACPI Warning: 0x0000000000000828-0x000000000000082f SystemIO conflicts with Region \PMRG 1 (20130328/utaddress-251)
[ 5.700289] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 5.700649] ACPI Warning: 0x0000000000000540-0x000000000000054f SystemIO conflicts with Region \GPS1 1 (20130328/utaddress-251)
[ 5.701107] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 5.701466] ACPI Warning: 0x0000000000000530-0x000000000000053f SystemIO conflicts with Region \GPS1 1 (20130328/utaddress-251)
[ 5.701921] ACPI Warning: 0x0000000000000530-0x000000000000053f SystemIO conflicts with Region \GPS0 2 (20130328/utaddress-251)
[ 5.702374] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
[ 5.702734] ACPI Warning: 0x0000000000000500-0x000000000000052f SystemIO conflicts with Region \GPS1 1 (20130328/utaddress-251)
[ 5.703188] ACPI Warning: 0x0000000000000500-0x000000000000052f SystemIO conflicts with Region \GPS0 2 (20130328/utaddress-251)
[ 5.703641] ACPI: If an ACPI driver is available for this device, you should use it instead of the native driver
|
Here is the entire /var/log/dmesg pastebin:
http://pastebin.com/iGNvRfEX
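In case it helps with looking up microcode or errata for this CPU, the sig=0x20652 value in the microcode lines is the CPUID signature. Below is a minimal sketch that decodes it using the standard x86 family/model/stepping bit layout; the function name is my own.

```python
def decode_cpuid_signature(sig):
    """Decode a CPUID leaf 1 EAX signature into (family, model, stepping)."""
    stepping = sig & 0xF
    base_model = (sig >> 4) & 0xF
    base_family = (sig >> 8) & 0xF
    ext_model = (sig >> 16) & 0xF
    ext_family = (sig >> 20) & 0xFF

    # The extended family is only added when the base family is 0xF.
    family = base_family + ext_family if base_family == 0xF else base_family
    # The extended model is only used for base family 0x6 or 0xF.
    if base_family in (0x6, 0xF):
        model = (ext_model << 4) | base_model
    else:
        model = base_model
    return family, model, stepping

family, model, stepping = decode_cpuid_signature(0x20652)
print(f"family {family}, model {model} (0x{model:02x}), stepping {stepping}")
```

For 0x20652 this gives family 6, model 37 (0x25), stepping 2, with microcode revision 0x9 as reported in the same dmesg lines.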
What I've Tried So Far:
I'm not all that familiar with hardware problems, but here is what I have done:
-Physically inspected the motherboard: no swollen capacitors or other obvious problems.
-Disconnected the Bluetooth dongle and the CD/DVD/BD drive to simplify troubleshooting a bit.
-Tried swapping out the RAM for some spare Crucial DDR3 240-pin 1333MHz RAM. This RAM is known to be functional and comes from a recent RAM upgrade of another computer.
-Ran a disk check from GParted on the current Samsung HDD without problems.
-Changed the SATA HDD cable.
-Noted that the CPU was at 150-160 degrees F on all 4 cores after a fresh boot, so I replaced the Arctic Silver thermal paste; it now boots up at under 100 degrees F on all 4 cores.
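Since most CPU temperature specs and sensor tools use Celsius, here is a quick conversion of the temperatures above. The readings are the ones from this post; nothing else is assumed.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9

readings = [
    ("before new paste, low", 150),
    ("before new paste, high", 160),
    ("after new paste, upper bound", 100),
]
for label, deg_f in readings:
    print(f"{label}: {deg_f} F = {fahrenheit_to_celsius(deg_f):.1f} C")
```

That works out to roughly 65.6-71.1 C before replacing the thermal paste and under about 37.8 C afterwards.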