LinuxQuestions.org
Slackware This Forum is for the discussion of Slackware Linux.

01-28-2012, 02:46 PM   #1
granth
Member
 
Registered: Jul 2004
Location: USA
Distribution: Slackware64
Posts: 210

Howto enable EDAC reporting in the Slackware generic/huge kernel


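EDAC (Error Detection And Correction) reporting is useful on systems with ECC RAM: it lets the kernel report the memory errors that the ECC hardware detects.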

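The stock generic config ships with EDAC disabled (# CONFIG_EDAC_MM_EDAC is not set), so you'll want to start by rebuilding the kernel from that config.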

Code:
root@machine:/# cd /usr/src/linux
root@machine:/usr/src/linux# make mrproper
root@machine:/usr/src/linux# cp /boot/config-generic-2.6.37.6 .config
Make the following changes to the kernel configuration:

Code:
root@machine:/usr/src/linux# make menuconfig

	Device Drivers
		|_---> EDAC reporting 
			|_---> (M) AMD64
				|_---> (*) Sysfs HW Err Injection Facility
			|_---> (M) INTEL *
			# Select all modules for generic build
The resulting .config should be similar to this:

Code:
root@machine:/usr/src/linux# diff /boot/config-generic-2.6.37.6 .config
4c4
< # Sat Apr  9 12:54:40 2011
---
> # Fri Jan 27 22:00:03 2012
4183c4183,4196
< # CONFIG_EDAC_MM_EDAC is not set
---
> CONFIG_EDAC_MM_EDAC=m
> CONFIG_EDAC_MCE=y
> CONFIG_EDAC_AMD64=m
> CONFIG_EDAC_AMD64_ERROR_INJECTION=y
> CONFIG_EDAC_E752X=m
> CONFIG_EDAC_I82975X=m
> CONFIG_EDAC_I3000=m
> CONFIG_EDAC_I3200=m
> CONFIG_EDAC_X38=m
> CONFIG_EDAC_I5400=m
> CONFIG_EDAC_I7CORE=m
> CONFIG_EDAC_I5000=m
> CONFIG_EDAC_I5100=m
> CONFIG_EDAC_I7300=m
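Before building, you may want to double-check that the options from the diff actually made it into .config. The function below is a hypothetical helper (check_edac_config is not part of the kernel tree); it only greps for a few of the option names shown in the diff, so adjust the list to the drivers you selected.

```sh
# Hypothetical helper: report which EDAC options are missing from a kernel .config.
# Usage: check_edac_config [path-to-.config]   (defaults to ./.config)
check_edac_config() {
    cfg=${1:-.config}
    missing=0
    # Option names taken from the diff above; extend or trim this list as needed.
    for opt in CONFIG_EDAC_MM_EDAC CONFIG_EDAC_MCE CONFIG_EDAC_AMD64; do
        # Accept both built-in (=y) and modular (=m) settings.
        if grep -q "^$opt=[ym]\$" "$cfg"; then
            echo "ok      $opt"
        else
            echo "missing $opt"
            missing=1
        fi
    done
    # Non-zero exit status if anything was missing.
    return $missing
}
```

Run it from /usr/src/linux as check_edac_config; a "missing" line means menuconfig did not save that option.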
Continue to install the kernel as normal. If you need help, follow the instructions here: http://blog.tpa.me.uk/slackware-kernel-compile-guide/

If you built the EDAC drivers as modules (as in the generic config above), add the appropriate modprobe commands to /etc/rc.d/rc.modules so they are loaded at boot.
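To find out which module names are available for the modprobe lines, you can list what the kernel build installed under /lib/modules. This is a small sketch; it assumes the modules are uncompressed .ko files in the usual kernel/drivers/edac directory. Note that a module's file name may differ from the driver name printed in the boot log.

```sh
# List the EDAC driver modules installed for a kernel, to help pick the
# names for the modprobe lines in /etc/rc.d/rc.modules.
# Usage: list_edac_modules [kernel-version] [modules-root]
list_edac_modules() {
    kver=${1:-$(uname -r)}
    root=${2:-/lib/modules}
    dir="$root/$kver/kernel/drivers/edac"
    if [ ! -d "$dir" ]; then
        echo "no EDAC module directory at $dir" >&2
        return 1
    fi
    # Assumes uncompressed .ko files; print each module name without the suffix.
    for f in "$dir"/*.ko; do
        [ -e "$f" ] || continue
        basename "$f" .ko
    done
}
```

Each name it prints can then go on its own /sbin/modprobe line in rc.modules. Only load the driver that matches your memory controller.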

After rebooting, you should see messages like these (this example is from an AMD Family 10h machine using the amd64_edac driver):

Code:
[    5.965514] EDAC MC: Ver: 2.1.0 Jan 22 2012
[    5.965721] EDAC amd64_edac:  Ver: 3.3.0 Jan 22 2012
[    5.965817] EDAC amd64: ECC is enabled by BIOS.
[    5.965991] EDAC MC: F10h CPU detected
[    5.965998] EDAC amd64: using x4 syndromes.
[    5.966084] EDAC MC: DCT0 chip selects:
[    5.966085] EDAC MC:  0:  2048MB 1:  2048MB
[    5.966086] EDAC MC:  2:  2048MB 3:  2048MB
[    5.966088] EDAC MC:  4:     0MB 5:     0MB
[    5.966089] EDAC MC:  6:     0MB 7:     0MB
[    5.966218] EDAC MC0: Giving out device to 'amd64_edac' 'Family 10h': DEV 0000:00:18.2
[    5.966395] EDAC PCI0: Giving out device to module 'amd64_edac' controller 'EDAC PCI controller': DEV '0000:00:18.2' (POLLED)
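If the messages scrolled past, you can pull them back out of the kernel log later. The sketch below is a hypothetical helper that filters dmesg output for EDAC lines. It keys on the "ECC is enabled" wording that amd64_edac prints above; that is an assumption, and other drivers may phrase it differently or not print it at all.

```sh
# Show the EDAC lines from the kernel log and say whether the driver reported
# ECC as enabled. Reads dmesg output on stdin, e.g.:  dmesg | edac_boot_check
edac_boot_check() {
    # "|| true" keeps a no-match grep from aborting scripts run with set -e.
    lines=$(grep 'EDAC' || true)
    if [ -z "$lines" ]; then
        echo "no EDAC messages found" >&2
        return 1
    fi
    printf '%s\n' "$lines"
    # Wording taken from the amd64_edac log above; other drivers may differ.
    case $lines in
        *"ECC is enabled"*) echo "=> ECC reported as enabled" ;;
        *) echo "=> no explicit ECC-enabled message (check your driver's output)" ;;
    esac
}
```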
You can also peek into /sys. For example:

Code:
root@machine:~# cat /sys/devices/system/edac/mc/mc0/mc_name          
Family 10h
root@machine:~# cat /sys/devices/system/edac/mc/mc0/sdram_scrub_rate 
390720
root@machine:~# cat /sys/devices/system/edac/mc/mc0/size_mb
16384
root@machine:~# cat /sys/devices/system/edac/mc/mc0/csrow0/mem_type 
Unbuffered-DDR3
root@machine:~# cat /sys/devices/system/edac/mc/mc0/csrow0/edac_mode 
S4ECD4ED
root@machine:~# cat /sys/devices/system/edac/mc/mc0/csrow0/size_mb  
4096
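Rather than cat-ing each file, a short loop can print the same attributes for every memory controller and chip-select row. This is a sketch using only the sysfs attributes shown above (mc_name, size_mb, mem_type, edac_mode); the function name is hypothetical, and the base directory is a parameter so the loop can be pointed elsewhere.

```sh
# Summarize each EDAC memory controller and its csrows from sysfs.
# Usage: edac_summary [mc-dir]   (defaults to /sys/devices/system/edac/mc)
edac_summary() {
    base=${1:-/sys/devices/system/edac/mc}
    for mc in "$base"/mc[0-9]*; do
        [ -d "$mc" ] || continue
        printf '%s: %s, %s MB\n' "${mc##*/}" "$(cat "$mc/mc_name")" "$(cat "$mc/size_mb")"
        for cs in "$mc"/csrow[0-9]*; do
            [ -d "$cs" ] || continue
            # One line per chip-select row: size, memory type, ECC mode.
            printf '  %s: %s MB %s %s\n' "${cs##*/}" \
                "$(cat "$cs/size_mb")" "$(cat "$cs/mem_type")" "$(cat "$cs/edac_mode")"
        done
    done
}
```

On the machine above, the first lines would read "mc0: Family 10h, 16384 MB" followed by "csrow0: 4096 MB Unbuffered-DDR3 S4ECD4ED".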
EDAC also provides PCI bus parity error reporting. To turn it on:

Code:
echo 1 > /sys/devices/system/edac/pci/check_pci_errors
Code:
root@machine:~# ls -al /sys/devices/system/edac/pci                 
total 0
drwxr-xr-x 3 root root    0 Jan 27 22:22 ./
drwxr-xr-x 4 root root    0 Jan 27 22:22 ../
-rw-r--r-- 1 root root 4096 Jan 27 22:33 check_pci_errors
-rw-r--r-- 1 root root 4096 Jan 27 22:33 edac_pci_log_npe
-rw-r--r-- 1 root root 4096 Jan 27 22:33 edac_pci_log_pe
-rw-r--r-- 1 root root 4096 Jan 27 22:33 edac_pci_panic_on_pe
drwxr-xr-x 2 root root    0 Jan 27 22:33 pci0/
-r--r--r-- 1 root root 4096 Jan 27 22:33 pci_nonparity_count
-r--r--r-- 1 root root 4096 Jan 27 22:33 pci_parity_count
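A value written into /sys does not survive a reboot, so the echo has to be repeated at every boot, for example from /etc/rc.d/rc.local. The function below is a hypothetical sketch that enables the check only if the sysfs file is writable, then prints the counters listed above.

```sh
# Enable EDAC PCI parity checking (if the interface exists) and print the
# current counters. Could be called from /etc/rc.d/rc.local.
# Usage: edac_pci_setup [edac-pci-dir]   (defaults to /sys/devices/system/edac/pci)
edac_pci_setup() {
    pci=${1:-/sys/devices/system/edac/pci}
    if [ ! -w "$pci/check_pci_errors" ]; then
        echo "cannot write $pci/check_pci_errors (EDAC not loaded, or not root?)" >&2
        return 1
    fi
    echo 1 > "$pci/check_pci_errors"
    printf 'check_pci_errors=%s parity=%s non-parity=%s\n' \
        "$(cat "$pci/check_pci_errors")" \
        "$(cat "$pci/pci_parity_count")" \
        "$(cat "$pci/pci_nonparity_count")"
}
```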
User-space tools such as edac-util (from the edac-utils package) are also available; you must compile and install these separately:

Code:
user@machine:~$ edac-util -v -s
edac-util: EDAC drivers are loaded. 1 MC detected:
  mc0:Family 10h

user@machine:~$ edac-util -v -r
mc0: 0 Uncorrected Errors with no DIMM info
mc0: 0 Corrected Errors with no DIMM info
mc0: csrow0: 0 Uncorrected Errors
mc0: csrow0: ch0: 0 Corrected Errors
mc0: csrow0: ch1: 0 Corrected Errors
mc0: csrow1: 0 Uncorrected Errors
mc0: csrow1: ch0: 0 Corrected Errors
mc0: csrow1: ch1: 0 Corrected Errors
mc0: csrow2: 0 Uncorrected Errors
mc0: csrow2: ch0: 0 Corrected Errors
mc0: csrow2: ch1: 0 Corrected Errors
mc0: csrow3: 0 Uncorrected Errors
mc0: csrow3: ch0: 0 Corrected Errors
mc0: csrow3: ch1: 0 Corrected Errors
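If you don't want to build edac-util, the per-controller totals can be read straight from sysfs: each mcN directory has ce_count (corrected) and ue_count (uncorrected) files. The sketch below is a hypothetical helper suitable for a cron job. It sums those two counters and exits non-zero if either is above zero; it does not break errors down per csrow the way edac-util -r does.

```sh
# Sum corrected/uncorrected error counts across all EDAC memory controllers.
# Exits 0 only if both totals are zero.
# Usage: edac_error_totals [mc-dir]   (defaults to /sys/devices/system/edac/mc)
edac_error_totals() {
    base=${1:-/sys/devices/system/edac/mc}
    ce=0 ue=0
    for mc in "$base"/mc[0-9]*; do
        [ -d "$mc" ] || continue
        ce=$((ce + $(cat "$mc/ce_count")))
        ue=$((ue + $(cat "$mc/ue_count")))
    done
    echo "corrected=$ce uncorrected=$ue"
    [ "$ce" -eq 0 ] && [ "$ue" -eq 0 ] && return 0
    return 1
}
```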
Most users will want the system to halt on uncorrectable errors. Since the generic Slackware kernel has MCE (Machine Check Exception) support configured, this should already happen. The following command is only useful on systems without MCE:

Code:
echo "1" > /sys/module/edac_core/parameters/edac_mc_panic_on_ue
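Like the PCI setting, that echo is lost at reboot. Because edac_mc_panic_on_ue is a parameter of the edac_core module (as its /sys/module path shows), one way to make it stick on a modular kernel is an options line in /etc/modprobe.d. This is a minimal sketch; the file name edac.conf is an assumption, and any *.conf name in that directory should work.

```sh
# Write a modprobe options line so edac_core always loads with
# edac_mc_panic_on_ue=1 (only worthwhile on systems without MCE).
# Usage: edac_panic_on_ue_conf [conf-file]   (edac.conf is a hypothetical name)
edac_panic_on_ue_conf() {
    conf=${1:-/etc/modprobe.d/edac.conf}
    echo "options edac_core edac_mc_panic_on_ue=1" > "$conf"
}
```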
For more information on EDAC and ECC, refer to the following websites:

http://git.kernel.org/?p=linux/kerne...ation/edac.txt
http://bluesmoke.sourceforge.net
http://buttersideup.com/edacwiki/
http://cr.yp.to/hardware/ecc.html
http://www.cs.nmsu.edu/~pfeiffer/cla...notes/ecc.html
 
01-30-2012, 08:15 PM   #2
Old_Fogie
Senior Member
 
Registered: Mar 2006
Distribution: SLACKWARE 4TW! =D
Posts: 1,515

I remember building this in a while ago, and seeing the errors made me paranoid. So I turned it off. Kind of like when your brakes need fixing and you turn the radio up. That said, the computer is still running with the same RAM and hasn't had any issues yet. I'm not discounting your hard work here by any means, just thought I'd post my experience with it.
 
02-01-2012, 12:35 PM   #3
granth
Member
 
Registered: Jul 2004
Location: USA
Distribution: Slackware64
Posts: 210

Original Poster
Do you remember if they were memory errors or PCI bus errors?


Quote:
The presence of PCI Parity errors must be examined with a grain of salt.
There are several add-in adapters that do NOT follow the PCI specification
with regards to Parity generation and reporting. The specification says
the vendor should tie the parity status bits to 0 if they do not intend
to generate parity. Some vendors do not do this, and thus the parity bit
can "float" giving false positives.

If they were memory errors, I suggest you run memtest with ECC disabled.
 
02-06-2012, 02:20 AM   #4
Old_Fogie
Senior Member
 
Registered: Mar 2006
Distribution: SLACKWARE 4TW! =D
Posts: 1,515

Oh, I can't remember. But this thread has reminded me that it's time to run memtest on my machines; it's been a few months.
 
  


All times are GMT -5.
