Linux - Software
First off, what commands are you using to get each of those outputs?
I cannot even fathom what may give those divergent results, so please update the post with the command used for each.
Without the commands we cannot even hope to know the answer.
It is possible that you have had one drive in a failed state for some time and a second failure took the array offline (and makes it unrecoverable). We need more info to know.
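For reference, and purely as a sketch (assuming the array is /dev/md0 and the members are the whole disks named later in the thread), the divergent outputs people compare usually come from commands like these:
Code:
cat /proc/mdstat            # the kernel's summary of every md array
mdadm --detail /dev/md0     # array-level view; only works while the array is assembled
mdadm --examine /dev/sda    # per-member superblock; repeat for sdb, sdd, sde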
Do you have a hardware RAID or a software RAID? If you have a hardware RAID, just replace the defective drives and the controller will rebuild the array itself. If you have a software RAID, it's trashed.
md stops writing to a device when it fails, so sdd failed first; all devices were still good up to that point. It was device 0.
sde failed next. Device 0 had been failed since Nov 25. sde was device 2.
sda and sdb both noted the missing drives (0 and 2) on Dec 5, which is when sde and the RAID failed. They are devices 3 and 1.
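One way to read that ordering off the superblocks yourself is to compare the Events and Update Time fields on each member. Just a sketch, assuming the members are the raw disks rather than partitions and the superblocks use 1.x metadata (the field names differ for 0.90):
Code:
for d in /dev/sda /dev/sdb /dev/sdd /dev/sde; do
  echo "== $d =="
  mdadm --examine "$d" | grep -E 'Update Time|Events|Device Role|Array State'
done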
Ideally, you should set up monitoring with mdadm in monitor mode and have it email or something when a drive dies.
No idea why it now thinks the array is RAID 0.
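As a rough illustration of the monitoring suggestion above (the email address and config path are placeholders, and most distros already ship a service that runs the monitor for you):
Code:
# /etc/mdadm/mdadm.conf (or /etc/mdadm.conf, depending on the distro)
MAILADDR you@example.com

# run the monitor as a daemon; it mails on Fail/DegradedArray events
mdadm --monitor --scan --daemonise

# one-off test message to confirm mail actually gets through
mdadm --monitor --scan --oneshot --test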
Okay, that makes sense to me. Unfortunately I was out of the country when all this happened, so even if I had been notified, I was in no position to do anything about it.
Right now I'm using 'dd' to pull data from the 4 drives in hopes I can rebuild the information at least long enough to recover some of the data. What are the odds of that actually working?
Also please post the output of
Code:
cat /proc/mdstat
rnturn & michaelk are correct in their deductions of the commands I used. /proc/mdstat states:
That output from /proc/mdstat is not surprising, since the RAID array has failed and is not active.
I appreciate the confirmation on the commands.
I do not envy you the recovery process as it will certainly be tedious at best.
If attempting to use dd to recover the data, I would not suggest using anything other than /dev/sde (the last to fail) for recovery, since it lasted a lot longer than the others and the first drive to fail will have data that is way out of date.
If you can recover a good image of that drive, write the data to a new drive, and thus get the array back online in a still-degraded state, then you can add in a drive to replace the first one that failed. Once it has rebuilt the data, you may have a fully functioning RAID array again.
You may find this an interesting read - especially the bit about using overlay files to save stressing dodgy drives. Also note the preference for ddrescue rather than dd where an image is actually required.
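A rough sketch of that approach, with the image written to a filesystem assumed to be mounted at /mnt/recovery (the names are placeholders). Unlike plain dd, ddrescue keeps a map file so it can resume and go back over bad areas:
Code:
ddrescue -n  /dev/sde /mnt/recovery/sde.img /mnt/recovery/sde.map   # fast first pass, skip the scraping phase
ddrescue -r3 /dev/sde /mnt/recovery/sde.img /mnt/recovery/sde.map   # then retry the bad areas a few times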
mdadm --examine reports the data stored on each individual device. Once a device falls out of the array, md naturally stops writing to it, so different data on different devices is perfectly OK and lets you track the order in which the array collapsed: the device still showing AAAA was the first to go, followed by the one showing .AAA, and after that the array stopped. Since the event counts on all devices are pretty close, the array should be [almost] OK after you force-assemble it. Run fsck and checkarray after that.
The event count on /dev/sdd is tiny compared to the other three. /dev/sde is only 6 events less than /dev/sda and /dev/sdb, so he may be able to force assemble those 3 into a degraded state.
I would suggest, as you did, that he do an fsck and checkarray, but then immediately add a 4th disk to replace /dev/sdd and allow the array to fully rebuild before doing anything else, not even mounting it. Alternatively, he could back up the data on that array, which would involve mounting it read-only while still in the degraded state.
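Putting those steps together, a minimal sketch might look like the following. /dev/md0, the member names, and the replacement disk /dev/sdf are all assumptions, the filesystem is assumed to sit directly on the array, and ideally this is run against images or overlays rather than the original disks:
Code:
mdadm --stop /dev/md0                                          # clear any half-assembled remnant
mdadm --assemble --force /dev/md0 /dev/sda /dev/sdb /dev/sde   # the three members with close event counts
fsck -n /dev/md0                                               # read-only filesystem check first
mdadm --manage /dev/md0 --add /dev/sdf                         # add the replacement for sdd
cat /proc/mdstat                                               # watch the rebuild progress
echo check > /sys/block/md0/md/sync_action                     # consistency check once the resync finishes
                                                               # (on Debian-based systems, checkarray /dev/md0 does the same)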
The event count on /dev/sdd is tiny compared to the other three.
Oh yes, missed that on a cursory reading - they are all 5-somethings :) So actually the first drive dropped out of the array ages ago, but the OP was not paying attention...