HalB 06-12-2018 06:33 AM

RAID6 recovery using mdadm.
 
When it comes to recovering from a RAID crash using mdadm, I am a newbie and would like to draw on input from others who have more experience with RAID recovery.

Four drives dropped offline with the same event count. The two drives with the higher event counts are not enough to start the array. There is no clear evidence of a hardware failure at this time, but the data is provided below so others can double-check.

Most of the array is backed up; however, some data on the RAID6 is at risk. The priority is to recover the data that is not backed up.

The RAID is built from drives by 3 different vendors (see below). The rationale was to spread the risk in case of a vendor-specific problem. It is unclear whether this was a good idea. Of the 4 drives that dropped offline, every vendor was represented in the group that dropped.

Desktop drives are in use, which I now understand to be a very bad idea. This may be the root of the problem, but that needs to be verified if possible.

The RAID is 3 1/2 years old and has been trouble-free.

The problem started during the monthly scan on June 3, 2018, but it was not noticed until a few days later, around Thu Jun 7 08:26:49 2018, when an ssh mount of the array was accessed from another system and found to have dropped. See the event counts and times below.
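
I still need to go back through the logs from around June 3 to see what the kernel reported when the drives dropped. Something along these lines is what I have in mind (standard Debian log locations):

# zgrep -iE 'md0|ata[0-9]+|raid' /var/log/syslog* /var/log/kern.log* <--- look for link resets or read errors around Jun 3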

# mdadm --version
mdadm - v3.3.2 - 21st August 2014

# cat /etc/debian_version
8.10

The RAID was built under Debian 8.x around Sat Feb 7 18:00:18 2015 and has been updated over time, keeping to the same major version.

After reading most of the online help, I did the following:
# mdadm --stop /dev/md0 <--- was not running
# mdadm --assemble /dev/md0 /dev/sd[c-h]1
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.
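
From what I have read so far, the usual next step in this situation is a forced assemble that pulls the four drives with the slightly older event counts back in. This is only a sketch of what I am considering, not something I have run yet, and the advice I keep seeing is to image the members first (e.g. with ddrescue), since --force rewrites the superblocks of the stale members:

# mdadm --stop /dev/md0 <--- make sure nothing half-assembled is left over
# mdadm --assemble --force /dev/md0 /dev/sd[c-h]1 <--- accept the members with the older event counts
# cat /proc/mdstat <--- confirm the array actually started before mounting anything, read-only at first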

Below is additional information; please advise.

Thank you,
Hal

Information Follows:
=====================================================
mdadm --examine /dev/sd[c-h]1 | egrep 'Event|/dev/sd'
=====================================================
/dev/sdc1:
Events : 10595 --- Update Time : Thu Jun 7 08:26:49 2018
/dev/sdd1:
Events : 10595 --- Update Time : Thu Jun 7 08:26:49 2018
/dev/sde1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018
/dev/sdf1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018
/dev/sdg1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018
/dev/sdh1:
Events : 10558 --- Update Time : Sun Jun 3 06:54:45 2018

The first two have a common event count, and the last four also have a common event count. As best I can tell, sd[e-h] all dropped offline at the same time, but please advise. When I try an assemble, it seems that only sd[c-d]1 are used. All of the event counts are within around 50 of each other.
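
To double-check which members a failed assemble actually picked up, I plan to look at the kernel's view of the inactive array and re-run the examine, roughly like this:

# cat /proc/mdstat <--- an inactive md0 should list the members it grabbed
# mdadm --examine /dev/sd[c-h]1 | egrep 'sd|Events|Role' <--- event count and device role side by side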

=============================
mdadm --examine /dev/sd[c-h]1
=============================
/dev/sdc1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 8556c42d:72e48fde:5f5066c4:df4006ad

Internal Bitmap : 8 sectors from superblock
Update Time : Thu Jun 7 08:26:49 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 49136a46 - correct
Events : 10595

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 0
Array State : AA.... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 8ee9e94a:4a89dfc6:e5f0878d:9a09947c

Internal Bitmap : 8 sectors from superblock
Update Time : Thu Jun 7 08:26:49 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : e7380b68 - correct
Events : 10595

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 1
Array State : AA.... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : cd3fa9c6:77043fa8:5c497de9:765d673f

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 63242d78 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 2
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : 5565a7cb:0e339395:330a6d42:ab140713

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 8205f9a3 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 3
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : bfd3b973:fa4511e1:016ea6b9:e3d016c4

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 9ddf9b01 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 4
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x1
Array UUID : f1249a52:20c2d066:3f961210:8a906f88
Name : cuda:0 (local to host cuda)
Creation Time : Sat Feb 7 18:00:18 2015
Raid Level : raid6
Raid Devices : 6

Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
Data Offset : 262144 sectors
Super Offset : 8 sectors
Unused Space : before=262056 sectors, after=0 sectors
State : clean
Device UUID : ef1943bb:879a6ba9:fb129749:b3db730d

Internal Bitmap : 8 sectors from superblock
Update Time : Sun Jun 3 06:54:45 2018
Bad Block Log : 512 entries available at offset 72 sectors
Checksum : 8710e588 - correct
Events : 10558

Layout : left-symmetric
Chunk Size : 512K

Device Role : Active device 5
Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)

============================
The following data has also been compiled, but it is longer than the allowed character count for this post. I can post it if needed.

Additional data:
Section 1 - List of drive manufacturers
Section 2 - fdisk -l on the relevant drives
Section 3 - smartctl --xall /dev/sd[c-h] on the RAID drives
Section 4 - smartctl -l scterc /dev/sd[c-h] <<--- these are not RAID-rated drives... this may be the cause of the problem, but I am not sure (see the commands sketched just below)
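
For reference, the scterc check and (where the drive supports it) the fix look roughly like this; desktop drives typically report that the command is not supported, which, as I understand it, can make the kernel drop a drive when in-drive error recovery takes too long:

# smartctl -l scterc /dev/sdc <--- query the current read/write error-recovery timeouts
# smartctl -l scterc,70,70 /dev/sdc <--- set both to 7.0 seconds (value is in tenths of a second), if the drive allows it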

AwesomeMachine 06-13-2018 08:54 AM

Just upload a file of all the additional info to a file sharing site and put the link in your OP.

It is suspicious that the array dropped 4 drives so suddenly. It's worth investigating other components in the system, especially the PSU. If you know how to read SMART values, are any of them of concern? Which file system are you using?
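
If you're not sure what to look for in the SMART output, something along these lines (adjust the device names to your system) pulls out the attributes that usually matter:

# smartctl -x /dev/sdc | egrep -i 'reallocated|pending|uncorrect|crc'

Non-zero reallocated, pending or uncorrectable sector counts point at the drive itself; a climbing CRC error count points more toward cabling, the backplane or power.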

