LinuxQuestions.org - Drive intermittently dropping from RAID5 array


Drive intermittently dropping from RAID5 array

I have a 9x320G RAID5 array that I am migrating over to a 3x1.5T RAID5 array. Intermittently, a drive would drop out of the older array, and the array would automatically start rebuilding. I thought it was a bad cable or controller somewhere, so when I bought the three new drives, I bought a new controller for them all, too.

I'm running both arrays side by side until I'm happy the new hardware is stable (one drive was DOA). Then I noticed one morning that both arrays were rebuilding themselves. This was in /var/log/messages:

Code:

Jul  5 00:30:19 mnemosyne -- MARK --
Jul  5 00:50:19 mnemosyne -- MARK --
Jul  5 01:06:02 mnemosyne kernel: md: syncing RAID array md0
Jul  5 01:06:02 mnemosyne kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul  5 01:06:02 mnemosyne kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul  5 01:06:02 mnemosyne kernel: md: using 128k window, over a total of 312568576 blocks.
Jul  5 01:06:02 mnemosyne kernel: md: syncing RAID array md1
Jul  5 01:06:02 mnemosyne kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Jul  5 01:06:02 mnemosyne kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Jul  5 01:06:02 mnemosyne kernel: md: using 128k window, over a total of 1465135936 blocks.
Jul  5 01:30:20 mnemosyne -- MARK --
Jul  5 01:50:20 mnemosyne -- MARK --

The arrays are on separate controllers, and the three new drives are on a separate PSU too, without any of the nice drive cages I have for the older ones. Any idea what would cause both arrays to rebuild at the same time? There was nothing in the logs prior to the above.

Is there any other useful information I can provide?
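To make the coincidence in the log excerpt above easier to see, here is a minimal Python sketch that groups "syncing RAID array" lines by timestamp. The sample lines are copied from the excerpt; the function name `sync_starts` and the regular expression are illustrative assumptions, not part of any md or syslog tooling.

```python
import re
from collections import defaultdict

# Sample lines copied from the /var/log/messages excerpt in this post.
LOG = """\
Jul  5 01:06:02 mnemosyne kernel: md: syncing RAID array md0
Jul  5 01:06:02 mnemosyne kernel: md: using 128k window, over a total of 312568576 blocks.
Jul  5 01:06:02 mnemosyne kernel: md: syncing RAID array md1
Jul  5 01:06:02 mnemosyne kernel: md: using 128k window, over a total of 1465135936 blocks.
"""

# Hypothetical pattern: syslog timestamp, hostname, then the md "syncing" message.
SYNC_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: md: syncing RAID array (\w+)$"
)

def sync_starts(log_text):
    """Map each timestamp to the arrays that started syncing at that moment."""
    starts = defaultdict(list)
    for line in log_text.splitlines():
        match = SYNC_RE.match(line.strip())
        if match:
            starts[match.group(1)].append(match.group(2))
    return dict(starts)

if __name__ == "__main__":
    for stamp, arrays in sync_starts(LOG).items():
        print(stamp, arrays)
```

Both arrays land under the same timestamp, which points toward a single trigger rather than two independent hardware faults on separate controllers.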

Got emails that the array was rebuilding this morning, and found this in syslog:

Code:

Sep  6 01:06:01 mnemosyne /USR/SBIN/CRON[4548]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet)
Sep  6 01:06:02 mnemosyne kernel: md: syncing RAID array md0
Sep  6 01:06:02 mnemosyne kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep  6 01:06:02 mnemosyne kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep  6 01:06:02 mnemosyne kernel: md: using 128k window, over a total of 312568576 blocks.
Sep  6 01:06:02 mnemosyne kernel: md: syncing RAID array md1
Sep  6 01:06:02 mnemosyne kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep  6 01:06:02 mnemosyne kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.

So it looks like this is Debian-specific, as checkarray is a script written just for Debian. It runs on the first Sunday of every month at 1:06 AM, which matches the last time I saw this, noted above, on July 5.
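The "first Sunday" behaviour follows from combining a Sunday-only cron schedule with the `[ $(date +%d) -le 7 ]` guard visible in the cron line. The Python sketch below mirrors that logic. Two things are assumptions: the Sunday-only schedule (the log shows the command but not the cron time fields), and the year 2009, which the thread never states and which is used only because July 5 and September 6 both fall on a Sunday in that year.

```python
from datetime import date

def checkarray_would_run(day: date) -> bool:
    """True when a Sunday-only cron entry would pass the day-of-month guard."""
    is_sunday = day.weekday() == 6   # Monday is 0, Sunday is 6
    in_first_week = day.day <= 7     # mirrors [ $(date +%d) -le 7 ]
    return is_sunday and in_first_week

ASSUMED_YEAR = 2009  # assumption: the year is not given anywhere in the thread

if __name__ == "__main__":
    for month, dom in [(7, 5), (9, 6), (7, 12)]:
        d = date(ASSUMED_YEAR, month, dom)
        print(d.isoformat(), checkarray_would_run(d))
```

Under those assumptions, both dates from the logs pass the guard, while a later Sunday such as July 12 does not.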

So, my best guess is there's something wrong with at least one drive on each array that's causing the check to fail, hence the rebuilding. I know there's something up with one of the three drives on md1, and md0 has nine drives, all of which are much older and used, so it wouldn't surprise me if there was an issue there, too.

I guess I was expecting it to say "problem with device X, rebuilding", but it doesn't really know which device has the problem; all it knows is that the array itself is out of sync.
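One way to test that guess, sketched here in Python: the md driver exposes per-array state under `/sys/block/<array>/md/`, including `sync_action` (what the array is doing right now, e.g. `idle` or `check`) and `mismatch_cnt` (the count of mismatches found by the last check). The helper name `md_check_status` and the `sys_root` parameter are hypothetical, added only so the function can be pointed at a different directory. This assumes the running kernel provides those two sysfs files.

```python
from pathlib import Path

def md_check_status(array: str, sys_root: str = "/sys/block") -> dict:
    """Read the current sync action and mismatch count for one md array."""
    md_dir = Path(sys_root) / array / "md"
    return {
        "sync_action": (md_dir / "sync_action").read_text().strip(),
        "mismatch_cnt": int((md_dir / "mismatch_cnt").read_text().strip()),
    }

if __name__ == "__main__":
    # md0 and md1 are the array names that appear in the logs above.
    for name in ("md0", "md1"):
        try:
            print(name, md_check_status(name))
        except (OSError, ValueError) as exc:
            print(name, "not readable:", exc)
```

A nonzero `mismatch_cnt` after the run finishes would support the idea that the array is out of sync, but the counter is array-wide and still would not name a specific drive.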