LinuxQuestions.org
Old 03-16-2017, 01:02 PM   #1
road hazard
Member
 
Registered: Nov 2015
Posts: 241

Rep: Reputation: Disabled
MDADM and bit rot


Mods, being a n00b, if this fits better in that section, feel free to move it.

I've seen more than one discussion pertaining to this topic:

http://unix.stackexchange.com/questi...ion-with-mdadm

Something about how MDADM's scrubs really don't fix errors and stuff about MDADM not verifying parity on reads (only writes). Any truth to all this? Maybe it's a bug that's been addressed since I can't find any recent discussions, only stuff from 2008 to about 2013.
 
Old 03-16-2017, 04:23 PM   #2
thordn
LQ Newbie
 
Registered: Mar 2017
Location: Tyresö, Sweden
Distribution: Slackware
Posts: 12

Rep: Reputation: Disabled
Quote:
Originally Posted by road hazard
Mods, being a n00b, if this fits better in that section, feel free to move it.

I've seen more than one discussion pertaining to this topic:

http://unix.stackexchange.com/questi...ion-with-mdadm

Something about how MDADM's scrubs really don't fix errors and stuff about MDADM not verifying parity on reads (only writes). Any truth to all this? Maybe it's a bug that's been addressed since I can't find any recent discussions, only stuff from 2008 to about 2013.
For RAID5 there is no way to know which block is bad if the disk does not report an error. For RAID6 there is a possibility to recover, but I cannot say whether the current MDADM uses it or not. I normally take an md5 or sha checksum of all files on an array so I can later see if there has been any corruption (and in which file).

When I started using RAID5 you could quite often get silent corruption of the data due to bandwidth problems on the motherboard, or because the system was gradually becoming unreliable, etc. So keeping an external checksum is recommended, so you can at least know that the system is in good condition.
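
To illustrate the RAID5 point, here is a toy sketch in plain shell arithmetic. It assumes a stripe of three data bytes plus one XOR parity byte; the values and the helper name stripe_ok are made up for illustration, and this is not how md actually lays out or checks stripes.

```shell
#!/bin/sh
# Toy RAID5 stripe: three data bytes and one XOR parity byte (hypothetical values).
d0=65 d1=66 d2=67
parity=$(( d0 ^ d1 ^ d2 ))

# stripe_ok D0 D1 D2 P: succeeds when the XOR of all four members is zero.
stripe_ok() {
    [ "$(( $1 ^ $2 ^ $3 ^ $4 ))" -eq 0 ]
}

# Silently flip one bit in d1, as bit rot would; the disk reports no error.
bad_d1=$(( d1 ^ 4 ))

if stripe_ok "$d0" "$bad_d1" "$d2" "$parity"; then
    mismatch=no
else
    mismatch=yes
fi
echo "mismatch detected: $mismatch"

# Try each possible culprit in turn: rebuild that member from the other three.
guess_d0=$(( bad_d1 ^ d2 ^ parity ))   # assume d0 is the bad one
guess_d1=$(( d0 ^ d2 ^ parity ))       # assume d1 is the bad one
guess_p=$(( d0 ^ bad_d1 ^ d2 ))        # assume the parity is the bad one

# Every guess yields a consistent stripe, so parity alone cannot pick one.
stripe_ok "$guess_d0" "$bad_d1" "$d2" "$parity" && echo "rewriting d0 as $guess_d0 looks consistent"
stripe_ok "$d0" "$guess_d1" "$d2" "$parity" && echo "rewriting d1 as $guess_d1 looks consistent"
stripe_ok "$d0" "$bad_d1" "$d2" "$guess_p" && echo "rewriting parity as $guess_p looks consistent"
```

The mismatch is visible, but three different "repairs" all make the stripe consistent again, and only one of them restores the original data. RAID6 keeps a second, independent parity, which in principle gives enough information to work out which single member is wrong; whether md makes use of that is a separate question.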
 
Old 03-16-2017, 05:59 PM   #3
road hazard
Member
 
Registered: Nov 2015
Posts: 241

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by thordn
For RAID5 there is no way to know which block is bad if the disk does not report an error. For RAID6 there is a possibility to recover, but I cannot say whether the current MDADM uses it or not. I normally take an md5 or sha checksum of all files on an array so I can later see if there has been any corruption (and in which file).

When I started using RAID5 you could quite often get silent corruption of the data due to bandwidth problems on the motherboard, or because the system was gradually becoming unreliable, etc. So keeping an external checksum is recommended, so you can at least know that the system is in good condition.
I currently use RAID 6 with my MDADM setup. What is this checksum voodoo you speak of?
 
Old 03-16-2017, 07:02 PM   #4
thordn
LQ Newbie
 
Registered: Mar 2017
Location: Tyresö, Sweden
Distribution: Slackware
Posts: 12

Rep: Reputation: Disabled
Quote:
Originally Posted by road hazard
I currently use RAID 6 with my MDADM setup. What is this checksum voodoo you speak of?
Typically I do something like:

cd <root of structure I want to check>

find . -type f ! -name md5sum.sum ! -name md5check.txt -exec md5sum {} + >md5sum.sum

This may take many hours depending on how much data there is to check; for me the md5sum.sum file can be some 500 MB. The two -name tests keep the checksum list and the check log from being hashed themselves, since both live in the same directory.

Then to check you do:

md5sum -c md5sum.sum >md5check.txt

grep FAIL md5check.txt | more


If you then get a failure on a file you know has not been modified, or you get different failures on a second run, you know your setup has problems.
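
To make the "different failures on a second run" test concrete, here is a rough sketch that compares the logs of two separate md5sum -c runs. The function names failed_files and compare_runs are hypothetical, and it assumes GNU md5sum's "name: FAILED" line format plus a working mktemp.

```shell
#!/bin/sh
# failed_files LOG: print the sorted, unique names that md5sum -c marked FAILED.
# Assumes GNU md5sum's "name: FAILED" output format.
failed_files() {
    grep ': FAILED' "$1" | sed 's/: FAILED.*$//' | LC_ALL=C sort -u
}

# compare_runs LOG1 LOG2: classify the failures seen in two check runs.
compare_runs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    failed_files "$1" >"$a"
    failed_files "$2" >"$b"
    # Failed both times: the file on disk most likely really differs from its recorded sum.
    LC_ALL=C comm -12 "$a" "$b" | sed 's/^/both runs: /'
    # Failed only once: the read itself was not reproducible.
    LC_ALL=C comm -3 "$a" "$b" | tr -d '\t' | sed 's/^/one run only: /'
    rm -f "$a" "$b"
}
```

You would save the output of each check to its own file (say md5check1.txt and md5check2.txt, names chosen here only as an example) and run compare_runs md5check1.txt md5check2.txt. Lines marked "one run only" are the ones that point at the setup rather than at a single damaged file.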
 
Old 03-16-2017, 08:10 PM   #5
road hazard
Member
 
Registered: Nov 2015
Posts: 241

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by thordn
Typically I do something like:

cd <root of structure I want to check>

find . -type f ! -name md5sum.sum ! -name md5check.txt -exec md5sum {} + >md5sum.sum

This may take many hours depending on how much data there is to check; for me the md5sum.sum file can be some 500 MB. The two -name tests keep the checksum list and the check log from being hashed themselves, since both live in the same directory.

Then to check you do:

md5sum -c md5sum.sum >md5check.txt

grep FAIL md5check.txt | more


If you then get a failure on a file you know has not been modified, or you get different failures on a second run, you know your setup has problems.
Thanks for the info! Doesn't seem too complicated but if that link I originally posted is true, I sure do wish mdadm could be updated to do some repairing during a scrub.

Unless you already know how, I think I'll look into automating that so it only sends me an email if there are any failures.
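
For the automation idea, something along these lines might be a starting point. It is only a sketch: the function names are made up, it assumes GNU md5sum (for --quiet) and a working local mail command, and MAILTO is a hypothetical variable that falls back to root.

```shell
#!/bin/sh
# check_array ROOT SUMFILE: verify ROOT against SUMFILE. SUMFILE should be an
# absolute path, and the file names inside it are relative to ROOT.
# Prints only problem lines and returns md5sum's exit status.
check_array() {
    ( cd "$1" && md5sum -c --quiet "$2" ) 2>&1
}

# notify SUBJECT: mail stdin to $MAILTO. Assumes a local `mail` command exists;
# swap in whatever mailer the machine actually has.
notify() {
    mail -s "$1" "${MAILTO:-root}"
}

# run_check ROOT SUMFILE: stay silent on success, mail the report on failure.
run_check() {
    if report=$(check_array "$1" "$2"); then
        return 0
    fi
    printf '%s\n' "$report" | notify "checksum failures under $1"
    return 1
}

# Only act when called with both arguments, e.g. from a cron job.
if [ "$#" -eq 2 ]; then
    run_check "$1" "$2"
fi
```

Saved as, say, check_array.sh, it would be run from cron with the array's mount point and the checksum list as its two arguments. Keeping the checksum list outside the tree being checked also avoids hashing the list itself.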
 
  

