LinuxQuestions.org


countiec 04-21-2015 03:31 AM

Raid 5 issues
 
Hi all

I have been running a RAID 5 server for nearly 1.5 years and all seemed fine up to now.

The RAID 5 was built with 4 drives of 1 TB, leaving me 3 TB of disk. On top of that I run LVM. Now, after rebooting, the drives are not showing up properly. I need to force it to assemble, and then it starts to rebuild.
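For reference, what I run to get it going again is along these lines (the device list is from my box, and I may not have the order exactly right):

mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sda1

After that, mdadm -D reports: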

root@sacorria:~$ mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Feb 6 13:42:47 2013
Raid Level : raid5
Array Size : 2929890816 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Apr 21 07:16:34 2015
State : active, degraded, recovering
Active Devices : 3
Working Devices : 4
Failed Devices : 0
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

Rebuild Status : 1% complete

Name : sacorria:0 (local to host sacorria)
UUID : 3a451151:b4dd6c80:7a7c8668:16b13fa8
Events : 490718

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
5 8 33 1 spare rebuilding /dev/sdc1
3 8 49 2 active sync /dev/sdd1
4 8 1 3 active sync /dev/sda1



Next, I see it mark sda1 as faulty:
root@sacorria:~$ mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Feb 6 13:42:47 2013
Raid Level : raid5
Array Size : 2929890816 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Apr 21 07:19:08 2015
State : active, FAILED, recovering
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

Rebuild Status : 2% complete

Name : sacorria:0 (local to host sacorria)
UUID : 3a451151:b4dd6c80:7a7c8668:16b13fa8
Events : 490748

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
5 8 33 1 spare rebuilding /dev/sdc1
3 8 49 2 active sync /dev/sdd1
4 8 1 3 faulty /dev/sda1


And next it gets even better; I get:

root@sacorria:~$ mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Wed Feb 6 13:42:47 2013
Raid Level : raid5
Array Size : 2929890816 (2794.16 GiB 3000.21 GB)
Used Dev Size : 976630272 (931.39 GiB 1000.07 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent

Intent Bitmap : Internal

Update Time : Tue Apr 21 07:19:34 2015
State : active, FAILED
Active Devices : 2
Working Devices : 3
Failed Devices : 1
Spare Devices : 1

Layout : left-symmetric
Chunk Size : 512K

Name : sacorria:0 (local to host sacorria)
UUID : 3a451151:b4dd6c80:7a7c8668:16b13fa8
Events : 490752

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 0 0 1 removed
3 8 49 2 active sync /dev/sdd1
3 0 0 3 removed

4 8 1 - faulty /dev/sda1
5 8 33 - spare /dev/sdc1


/dev/sda is actually giving errors when I do a smartctl test, but sdc does not. How do I make /dev/sdc active instead of spare? I can still mount the array, but I am worried that it will not last much longer.
- I have seen the recreate with --assume-clean etc. online, but am worried about performing this action. Have some of you done this successfully? I would hate to lose 3 TB of data.
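For what it's worth, the recipe I keep seeing is roughly the following. I have NOT run it; the chunk size, metadata version, layout and device order would all have to match the original exactly, and the order below is only my reading of the -D output above:

mdadm --stop /dev/md0
mdadm --create /dev/md0 --level=5 --metadata=1.2 --chunk=512 --layout=left-symmetric --raid-devices=4 --assume-clean /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sda1

That is exactly the step I am scared of, hence the question.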

Kind regards
Steve

frostschutz 04-22-2015 07:30 AM

Can you post mdadm --examine and smartctl -a for all drives?
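Something along these lines collects it all in one go (adjust the device names to whatever your drives are actually called):

for d in /dev/sd[a-d]1; do echo "=== $d ==="; mdadm --examine "$d"; done
for d in /dev/sd[a-d]; do echo "=== $d ==="; smartctl -a "$d"; done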

The big question is, when/why was sdc1 kicked from your array, and how outdated/useful is the data on it?

With sdc1 out of date, your best option would be to ddrescue sda to a new disk and hope that there are not too many errors.
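A minimal ddrescue run would look something like this (/dev/sdX is the new disk and the mapfile lets you stop and resume; triple-check which device is which before running it):

ddrescue -d -r3 /dev/sda /dev/sdX /root/sda-rescue.map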

With RAID you must test your disks regularly for damage (smartmontools, long self-tests, etc.). If you only notice errors during a rebuild (because the rebuild is the only test you have run in ages), it's too late. Without tests, disk errors can go unnoticed for a very long time...
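As a bare minimum, something like this (drive and array names are just the ones from this thread; smartd can schedule the self-tests for you instead of running them by hand):

smartctl -t long /dev/sda                      # long SMART self-test, repeat for each drive
echo check > /sys/block/md0/md/sync_action     # md scrub: read and verify the whole array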

countiec 04-30-2015 04:29 AM

Hi Frost

I gave up on the RAID and am now on a ZFS-based RAID 10 solution.
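For anyone who finds this thread later: the pool is basically striped mirrors, created along these lines (the pool name and device names are placeholders; in practice /dev/disk/by-id paths are the safer choice):

zpool create tank mirror diskA diskB mirror diskC diskD
zpool status tank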

Kind regards

