LinuxQuestions.org
Old 06-12-2018, 06:33 AM   #1
HalB
LQ Newbie
 
Registered: Jun 2018
Posts: 1

RAID6 recovery using mdadm


When it comes to recovering from a RAID crash using mdadm, I am a newbie and would like input from others who have more experience with RAID recovery.

Four drives dropped offline with the same event count. The two remaining drives have higher counts, but two drives are not enough to start the array. There is no clear evidence of a hardware failure at this time; the data below is provided so others can recheck.

Most of the array is backed up; however, some data on the RAID6 is at risk. The priority is recovering the data that is not backed up.

The RAID was built with drives from 3 different vendors (see below). The rationale was to spread the risk in case of a vendor-specific problem; it is unclear whether this was a good idea. All three vendors were represented among the 4 drives that dropped offline.

Desktop drives are in use, which I now understand to be a very bad idea. This may be the root of the problem, but it needs to be verified if possible.

The RAID is 3 1/2 years old and had been trouble-free until now.

The problem started during the monthly scan on June 3, 2018, but it was not noticed until a few days later, around Thu Jun 7 08:26:49 2018, when an ssh mount accessed from another system was found to have dropped. See the event counts and times below.
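
To pin down exactly when the members dropped, I intend to search the kernel logs around June 3 (paths assumed for Debian 8; zgrep also reads the rotated, compressed files):

# zgrep -iE 'md0|md/raid|sd[e-h]' /var/log/syslog* /var/log/kern.log*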

# mdadm --version
mdadm - v3.3.2 - 21st August 2014

# cat /etc/debian_version
8.10

The RAID was built under Debian 8.x around Sat Feb 7 18:00:18 2015 and has been updated over time while keeping the same major version.

After reading most of the online help I could find, I did the following:
# mdadm --stop /dev/md0 <--- was not running
# mdadm --assemble /dev/md0 /dev/sd[c-h]1
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
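
From what I have read, the usual next step is a forced assemble, which makes mdadm accept the four members whose event counts are slightly behind (37 events here). I have NOT run this yet and would appreciate confirmation that it is sensible, ideally only after protecting the disks (see the overlay sketch below):

# mdadm --stop /dev/md0
# mdadm --assemble --force /dev/md0 /dev/sd[c-h]1

If it starts, the plan would be to mount read-only first and copy off the data that is not backed up:

# mount -o ro /dev/md0 /mnt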

Additional information is below; please advise.

Thank you,
Hal

Information Follows:
=====================================================
mdadm --examine /dev/sd[c-h]1 | egrep 'Event|/dev/sd'
=====================================================
/dev/sdc1:
Events : 10595 --- Update Time : Thu Jun 7 08:26:49 2018
/dev/sdd1:
Events : 10595 --- Update Time : Thu Jun 7 08:26:49 2018
/dev/sde1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018
/dev/sdf1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018
/dev/sdg1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018
/dev/sdh1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018

The first two share a common count (10595), and the last four share a lower common count (10558). From the best I can tell, sd[e-h] all dropped offline at the same time, but please advise. When I try an assemble, it seems that only sd[c-d] are used. All counts are within about 50 events of each other (10595 - 10558 = 37).
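
Before forcing anything, I am considering protecting each member with a copy-on-write overlay so experiments never write to the real disks, along the lines of the recipe on the kernel RAID wiki. A sketch for one member (the 4G overlay size is a guess, untested here):

# SIZE=$(blockdev --getsz /dev/sde1)
# truncate -s 4G /tmp/overlay-sde1
# LOOP=$(losetup -f --show /tmp/overlay-sde1)
# dmsetup create sde1-ov --table "0 $SIZE snapshot /dev/sde1 $LOOP P 8"

Repeating this for each member and assembling from /dev/mapper/*-ov would leave the originals untouched. Please advise if this is overkill.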

=============================
mdadm --examine /dev/sd[c-h]1
=============================
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 8556c42d:72e48fde:5f5066c4:df4006ad

Internal Bitmap : 8 sectors from superblock
Update Time : Thu Jun 7 08:26:49 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 49136a46 - correct
Events : 10595

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : AA.... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 8ee9e94a:4a89dfc6:e5f0878d:9a09947c

Internal Bitmap : 8 sectors from superblock
Update Time : Thu Jun 7 08:26:49 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e7380b68 - correct
Events : 10595

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 1
Array State : AA.... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : cd3fa9c6:77043fa8:5c497de9:765d673f

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 63242d78 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 2
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 5565a7cb:0e339395:330a6d42:ab140713

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 8205f9a3 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 3
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : bfd3b973:fa4511e1:016ea6b9:e3d016c4

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 9ddf9b01 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 4
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : ef1943bb:879a6ba9:fb129749:b3db730d

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 8710e588 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

============================
The following data is also compiled, but it is longer than the allowed character count for this post. I can post it if needed.

Additional data:
Section 1 - List of drive manufacturers
Section 2 - fdisk -l on the relevant drives
Section 3 - smartctl --xall /dev/sd[c-h] on the raid drives
Section 4 - smartctl -l scterc /dev/sd[c-h] <<--- Not raid-rated drives... may be the cause of the problem! Not sure.
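
If Section 4 shows that SCT ERC is unsupported or disabled (the usual case for desktop drives), the common workaround I have read about is to set a short error-recovery timeout where the drive supports it, or otherwise raise the kernel's per-device timeout so md does not kick a drive that is slow to recover a bad sector. The values below are the commonly quoted ones, not verified for these models:

# smartctl -l scterc,70,70 /dev/sdc          <--- 7-second ERC; repeat per drive
# echo 180 > /sys/block/sdc/device/timeout   <--- fallback if ERC is unsupported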
 
Old 06-13-2018, 08:54 AM   #2
AwesomeMachine
LQ Guru
 
Registered: Jan 2005
Location: USA and Italy
Distribution: Debian testing/sid; OpenSuSE; Fedora; Mint
Posts: 5,524

Just upload a file of all the additional info to a file sharing site and put the link in your OP.

It is suspicious that the array dropped 4 drives so suddenly. It is worth investigating other components in the system, especially the PSU. If you know how to read SMART values, are any of them of concern? Which file system are you using?
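
If not, the output of something like this (attribute names vary by vendor) would be a good start:

# smartctl -A /dev/sdc | egrep -i 'realloc|pending|uncorrect|crc'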
 
  

