|
RAID5 data rebuilding by mdadm results in corrupted data!
One of the devices in my software RAID5 array was removed after an unclean reboot. /var/log/messages showed:
md: bind<sda3>
md: bind<sdb3>
md: kicking non-fresh sda3 from array!
md: unbind<sda3>
The same happened to another device, /dev/sda1, which is part of a RAID1 array.
The status of these devices, when checked with `mdadm -D /dev/md*`, was "removed".
Also, `mdadm -E /dev/sd[a-d]3 | grep Event` showed that sda3 had a different Events number (0.4647) than sdc3 and sdd3 (0.8721011). (Note: sdb3 also had a slightly different Events number, 0.8721010, but it produced no error message in /var/log/messages and no bad status in the output of `mdadm -E /dev/sdb3`.)
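For anyone who wants to repeat that comparison, here is a rough sketch of how it could be scripted. `stale_members` is a made-up helper name, not part of mdadm. It assumes the old-style `hi.lo` Events notation shown above and treats a value without a dot as a plain counter. It only reads text and never touches the array.

```shell
# Sketch: report which members are behind the newest Events count, and by how much.
# stale_members is a hypothetical helper; input is "<device> <events>" per line.
stale_members() {
    awk '{
        n = split($2, p, ".")
        # Assumption: "hi.lo" are the high and low 32-bit words of the counter.
        if (n == 2) ev[$1] = p[1] * 4294967296 + p[2]
        else        ev[$1] = p[1] + 0
        if (ev[$1] > max) max = ev[$1]
    }
    END {
        # Print only members whose counter is lower than the newest one.
        for (d in ev) if (ev[d] < max) print d, max - ev[d]
    }' | sort
}

# Possible usage against real devices (read-only, needs root):
# for d in /dev/sd[a-d]3; do
#     printf '%s %s\n' "${d#/dev/}" "$(mdadm -E "$d" | awk '/Events/ {print $3}')"
# done | stale_members
```

Fed the four values quoted above, this lists sda3 as 8716364 events behind and sdb3 as 1 event behind, while sdc3 and sdd3 are not listed.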
I tried to rebuild my data by simply removing and re-adding the faulty devices:
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1
mdadm /dev/md0 --add /dev/sda1
This seemed to work fine and the device synced in a couple of minutes. So I did the same with /dev/sda3:
mdadm /dev/md2 --fail /dev/sda3 --remove /dev/sda3
mdadm /dev/md2 --add /dev/sda3
However, this resulted in an error and sda3 could not be added. The array status, which was "Not active, degraded" before, changed to "Active, not started".
So, to start the array, I ran the command
mdadm --run /dev/md2
which started the rebuild of /dev/sda3.
At this point I thought I was going to save my data, but as it turned out, the rebuilt data is highly corrupt. Many files contain garbage, and many files and directories have no timestamps or other metadata. Some of the original filenames are listed when I do an ls, but I can't open them. There are entries like
?--------- ? ? ? ? ? filename
when I do ls -l.
Does this mean that I have lost my data entirely? Or is there a way to tell mdadm exactly how it should regenerate this device so that the data comes out correct? As of now, mdadm thinks that all the RAID devices are clean.
Thanks in advance to anyone who can point me to a solution.
|