RAID6 recovery using mdadm.
I am a newbie when it comes to recovering from a RAID crash using mdadm, and I would like to leverage input from others who have more experience with RAID recovery.
Four drives dropped offline with the same event count. The two drives with higher counts are not enough to rebuild. There is no clear evidence of a hardware failure at this time, but the data is provided below for review. Most of the array is backed up; however, some data on the RAID6 is not, and recovering that unbacked-up data is the priority.

The array was built with drives from 3 different vendors (see below). The rationale was to spread the risk in case there was a vendor-specific problem; it is unclear whether this was a good idea. All three vendors were represented among the four drives that dropped offline. Desktop drives are in use, which I now understand to be a very bad idea. This may be the root of the problem, but I would like to verify that if possible.

The array is 3 1/2 years old and has been trouble-free. The problem started during the monthly scan on June 3, 2018, but it was a few days later before it was noticed, around Thu Jun 7 08:26:49 2018, when an ssh mount accessed from another system was found to have dropped. See the event counts and times below.

# mdadm --version
mdadm - v3.3.2 - 21st August 2014
# cat /etc/debian_version
8.10

The array was built under Debian 8.x around Sat Feb 7 18:00:18 2015 and updated over time, staying on the same major version. After reading most of the online help I could find, I did the following:

# mdadm --stop /dev/md0        <--- was not running
# mdadm --assemble /dev/md0 /dev/sd[c-h]1
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.

Below is additional information; please advise.
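From the recovery guides I have read, when event counts differ only slightly, a forced assembly loses very little data, so I put together a small sanity check of the spread. This is only a sketch using the Events values from the --examine output below; the mdadm command itself is shown in a comment because --force rewrites the superblocks and I have not run it yet:

```shell
# Sketch (not run against the array): compute the event-count spread
# from the six Events values reported by mdadm --examine below.
spread=$(sort -n <<'EOF' | awk 'NR==1{min=$1} {max=$1} END{print max-min}'
10595
10595
10558
10558
10558
10558
EOF
)
echo "event count spread: $spread"
# From what I have read, a spread this small is generally considered safe for
#   mdadm --assemble --force /dev/md0 /dev/sd[c-h]1
# but since --force rewrites the superblocks I have not run it -- please advise.
```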
Thank you, Hal

Information Follows:

=====================================================
mdadm --examine /dev/sd[c-h]1 | egrep 'Event|/dev/sd'
=====================================================
/dev/sdc1:
         Events : 10595
---
    Update Time : Thu Jun  7 08:26:49 2018
/dev/sdd1:
         Events : 10595
---
    Update Time : Thu Jun  7 08:26:49 2018
/dev/sde1:
         Events : 10558
---
    Update Time : Sun Jun  3 06:54:45 2018
/dev/sdf1:
         Events : 10558
---
    Update Time : Sun Jun  3 06:54:45 2018
/dev/sdg1:
         Events : 10558
---
    Update Time : Sun Jun  3 06:54:45 2018
/dev/sdh1:
         Events : 10558
---
    Update Time : Sun Jun  3 06:54:45 2018

The first two have a common count, and the last four also have a common count. From the best I can tell, sd[e-h] all dropped offline at the same time, but please advise. When I try to do an assemble, it seems that only sd[c-d] are used. All are within around 50 counts of each other.

=============================
mdadm --examine /dev/sd[c-h]1
=============================
/dev/sdc1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f1249a52:20c2d066:3f961210:8a906f88
           Name : cuda:0  (local to host cuda)
  Creation Time : Sat Feb  7 18:00:18 2015
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
     Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 8556c42d:72e48fde:5f5066c4:df4006ad

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun  7 08:26:49 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 49136a46 - correct
         Events : 10595

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 0
    Array State : AA.... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdd1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f1249a52:20c2d066:3f961210:8a906f88
           Name : cuda:0  (local to host cuda)
  Creation Time : Sat Feb  7 18:00:18 2015
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
     Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 8ee9e94a:4a89dfc6:e5f0878d:9a09947c

Internal Bitmap : 8 sectors from superblock
    Update Time : Thu Jun  7 08:26:49 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : e7380b68 - correct
         Events : 10595

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 1
    Array State : AA.... ('A' == active, '.' == missing, 'R' == replacing)
/dev/sde1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f1249a52:20c2d066:3f961210:8a906f88
           Name : cuda:0  (local to host cuda)
  Creation Time : Sat Feb  7 18:00:18 2015
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
     Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : cd3fa9c6:77043fa8:5c497de9:765d673f

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Jun  3 06:54:45 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 63242d78 - correct
         Events : 10558

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 2
    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdf1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f1249a52:20c2d066:3f961210:8a906f88
           Name : cuda:0  (local to host cuda)
  Creation Time : Sat Feb  7 18:00:18 2015
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
     Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 5565a7cb:0e339395:330a6d42:ab140713

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Jun  3 06:54:45 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 8205f9a3 - correct
         Events : 10558

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 3
    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdg1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f1249a52:20c2d066:3f961210:8a906f88
           Name : cuda:0  (local to host cuda)
  Creation Time : Sat Feb  7 18:00:18 2015
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
     Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : bfd3b973:fa4511e1:016ea6b9:e3d016c4

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Jun  3 06:54:45 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 9ddf9b01 - correct
         Events : 10558

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 4
    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
/dev/sdh1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : f1249a52:20c2d066:3f961210:8a906f88
           Name : cuda:0  (local to host cuda)
  Creation Time : Sat Feb  7 18:00:18 2015
     Raid Level : raid6
   Raid Devices : 6

 Avail Dev Size : 3750483968 (1788.37 GiB 1920.25 GB)
     Array Size : 7500967936 (7153.48 GiB 7680.99 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : ef1943bb:879a6ba9:fb129749:b3db730d

Internal Bitmap : 8 sectors from superblock
    Update Time : Sun Jun  3 06:54:45 2018
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 8710e588 - correct
         Events : 10558

         Layout : left-symmetric
     Chunk Size : 512K

    Device Role : Active device 5
    Array State : AAAAAA ('A' == active, '.' == missing, 'R' == replacing)
============================

The following data is also compiled, but it is longer than the allowed character count for this post. I can post it if needed.

Additional data:
Section 1 - List of drive manufacturers
Section 2 - fdisk -l on the relevant drives
Section 3 - smartctl --xall /dev/sd[c-h] on the raid drives
Section 4 - smartctl -l scterc /dev/sd[c-h]   <<--- Not RAID-rated drives... may be the cause of the problem!!! Not sure. |
Just upload all the additional info to a file-sharing site and put the link in your OP.
It is suspicious that the array dropped 4 drives so suddenly. It's worth investigating other components within the system, especially the PSU. If you know how to read SMART values, are any of them of concern? Which file system are you using? |
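To check the desktop-drive timeout theory, the per-drive SMART data the OP mentions in Sections 3 and 4 is worth reviewing. A sketch that only prints the commands to run (they need root and the real drives; smartctl is from smartmontools):

```shell
# Sketch: the smartctl checks to review on each array member.
# Printed here rather than executed, since they need the actual hardware.
checks=""
for l in c d e f g h; do
  checks="$checks
smartctl --xall /dev/sd$l
smartctl -l scterc /dev/sd$l"
done
echo "$checks"
```

If `smartctl -l scterc` reports that SCT Error Recovery Control is not supported, a drive can stall on a bad sector long enough for the kernel to kick it from the array, which would fit several drives dropping during a scrub that touches every sector.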