LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Hardware (https://www.linuxquestions.org/questions/linux-hardware-18/)
-   -   RAID degraded, partition missing from md0 (https://www.linuxquestions.org/questions/linux-hardware-18/raid-degraded-partition-missing-from-md0-4175483697/)

Ser Olmy 11-15-2013 02:55 PM

It looks like your current /dev/sdc may have issues. You should resync md3 immediately.

I've actually never seen an md device become read-only before. I found a forum post describing what seems to be a similar issue. Are you by any chance accessing Intel software RAID sets with mdadm?

As for the device names, well, welcome to the SCSI system, where device names are assigned by the kernel on a "first come, first served" basis.

When you removed sdb, that name became vacant. Normally that would mean that every device gets to move one step up the ladder (sdc becomes sdb, sdd becomes sdc and so on), but on some (if not most) distributions, daemons like udev may interfere and try to preserve device-to-node mappings.

Thankfully, it doesn't really matter to the md driver what name is assigned to devices and partitions, as every component is labeled with a UUID. It does, however, make it difficult to determine exactly which device has any given device name at any given time. If you start off with six drives:
Code:

Normal setup:

  1    2    3    4    5    6   
[sda] [sdb] [sdc] [sdd] [sde] [sdf]

...and one is hot-removed, the device disappears:
Code:

After hot-removing sdb:

  1    2    3    4    5    6   
[sda] ----- [sdc] [sdd] [sde] [sdf]

But after a reboot, there's always a risk that device names may have been reassigned:
Code:

After a reboot, and after being subjected
to typically inconsistent udev behaviour:

  1    2    3    4    5    6   
[sda] ----- [sdc] [sdd] [sde] [sdb]

It's mostly just a nuisance, unless you're using device names rather than labels or UUIDs in /etc/fstab.
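
If you ever need to work out which physical drive currently has which name, the persistent identifiers are more useful than the sdX letters; for instance (the device and array names here are just examples):
Code:

ls -l /dev/disk/by-id/        # maps drive model/serial numbers to the current sdX names
mdadm --examine /dev/sdc1     # shows which array UUID a member partition belongs to
blkid /dev/md3                # shows the filesystem UUID you would use in /etc/fstab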

reano 11-15-2013 03:11 PM

Quote:

Originally Posted by Ser Olmy (Post 5065220)
It looks like your current /dev/sdc may have issues. You should resync md3 immediately.

I've actually never seen an md device become read-only before. I found a forum post describing what seems to be a similar issue. Are you by any chance accessing Intel software RAID sets with mdadm?

As for the device names, well, welcome to the SCSI system, where device names are assigned by the kernel on a "first come, first served" basis.

When you removed sdb, that name became vacant. Normally that would mean that every device gets to move one step up the ladder (sdc becomes sdb, sdd becomes sdc and so on), but on some (if not most) distributions, daemons like udev may interfere and try to preserve device-to-node mappings.

Thankfully, it doesn't really matter to the md driver what name is assigned to devices and partitions, as every component is labeled with a UUID. It does, however, make it difficult to determine exactly which device has any given device name at any given time. If you start off with six drives:
Code:

Normal setup:

  1    2    3    4    5    6   
[sda] [sdb] [sdc] [sdd] [sde] [sdf]

...and one is hot-removed, the device disappears:
Code:

After hot-removing sdb:

  1    2    3    4    5    6   
[sda] ----- [sdc] [sdd] [sde] [sdf]

But after a reboot, there's always a risk that device names may have been reassigned:
Code:

After a reboot, and after being subjected
to typically inconsistent udev behaviour:

  1    2    3    4    5    6   
[sda] ----- [sdc] [sdd] [sde] [sdb]

It's mostly just a nuisance, unless you're using device names rather than labels or UUIDs in /etc/fstab.

Thanks for the explanation - I suspected it was simply a case of devices being renamed into the empty slot, but at this stage I'm so paranoid that I fear the worst whenever anything looks strange :P Luckily we do use UUIDs in the fstab, yes, so it should be all good.

I've read the post you've linked, but am still unsure what resolution to follow regarding the read-only swap md device. Not sure what you mean by Intel software raid - we didn't set up the raid devices using the onboard raid utility, we set them up during the original Linux installation using Ubuntu's software raid. So I guess the answer is no?

Could resyncing md3 result in the same catastrophic crash that we experienced when resyncing md0 earlier today? (Also, how exactly do I resync the "right" way?)

PS: I really owe you for sticking with me through this. Much appreciated!

Ser Olmy 11-15-2013 04:34 PM

Quote:

Originally Posted by reano (Post 5065229)
I've read the post you've linked, but am still unsure what resolution to follow regarding the read-only swap md device. Not sure what you mean by Intel software raid - we didn't set up the raid devices using the onboard raid utility, we set them up during the original Linux installation using Ubuntu's software raid. So I guess the answer is no?

I guess so. The person in that thread ended up destroying and recreating the RAID array, and you could do the same if the device in question is only used for swap (which obviously isn't working now, with the device being read-only).
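
If you do go that route for a swap-only array, the outline would be something like this (the device and partition names below are placeholders; double-check them against your own layout first):
Code:

swapoff -a                                    # stop using the swap array
mdadm --stop /dev/mdX                         # stop the read-only array
mdadm --zero-superblock /dev/sdY2 /dev/sdZ2   # clear the old member superblocks
mdadm --create /dev/mdX --level=1 --raid-devices=2 /dev/sdY2 /dev/sdZ2
mkswap /dev/mdX                               # creates a new UUID, so update /etc/fstab accordingly
swapon -a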

Quote:

Originally Posted by reano (Post 5065229)
Could resyncing md3 result in the same catastrophic crash that we experienced when resyncing md0 earlier today? (Also, how exactly do I resync the "right" way?)

A resync is highly unlikely to cause any problems, quite the opposite. The md driver is remarkably tolerant of errors, and will try to rewrite a bad sector several times using data from another device in the array before failing a RAID member.

Your experience with the drive that used to be sdb is very much atypical, but problems can occur if a device is allowed to "bit rot" for an extended period of time. Arrays need to be verified/"scrubbed" regularly, and the S.M.A.R.T. status of all drives should be continuously monitored.

You can resync an md device by writing "check" to /sys/devices/virtual/block/<device>/md/sync_action. In this case, this command should initiate a verify/resync:
Code:

echo check > /sys/devices/virtual/block/md3/md/sync_action
Quote:

Originally Posted by reano (Post 5065229)
PS: I really owe you for sticking with me through this. Much appreciated!

You're welcome.

reano 11-15-2013 05:50 PM

Quote:

Originally Posted by Ser Olmy (Post 5065261)
I guess so. The person in that thread ended up destroying and recreating the RAID array, and you could do the same if the device in question is only used for swap (which obviously isn't working now, with the device being read-only).

Strange, the read-only flag disappeared suddenly. I'll see what it does after the next reboot (which will probably only be after md3's resync, and preferably on Monday when I'm onsite again to monitor the boot process).

Quote:

Originally Posted by Ser Olmy (Post 5065261)
A resync is highly unlikely to cause any problems, quite the opposite. The md driver is remarkably tolerant of errors, and will try to rewrite a bad sector several times using data from another device in the array before failing a RAID member.

Your experience with the drive that used to be sdb is very much atypical, but problems can occur if a device is allowed to "bit rot" for an extended period of time. Arrays need to be verified/"scrubbed" regularly, and the S.M.A.R.T. status of all drives should be continuously monitored.

You can resync an md device by writing "check" to /sys/devices/virtual/block/<device>/md/sync_action. In this case, this command should initiate a verify/resync:
Code:

echo check > /sys/devices/virtual/block/md3/md/sync_action

Thanks, I'll do that. How do I check the progress of the resync? Also via /proc/mdstat?

By the way, I've noticed something else. Every night at 30 minutes past midnight, the server backs up the contents of the /home directory to a NAS drive. This process usually takes about 20 minutes, but tonight it took over 90 minutes. /home resides on md3 - why would it take so long this time? I haven't started the resync on md3 yet, so it can't be that?

Ser Olmy 11-15-2013 05:59 PM

Quote:

Originally Posted by reano (Post 5065297)
Thanks, I'll do that. How do I check the progress of the resync? Also via /proc/mdstat?

That, or run mdadm --detail /dev/md3
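
If you want something that refreshes on its own, a plain watch works too:
Code:

watch -n 60 cat /proc/mdstat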
Quote:

Originally Posted by reano (Post 5065297)
By the way, I've noticed something else. Every night at 30 minutes past midnight, the server backs up the contents of the /home directory to a NAS drive. This process usually takes about 20 minutes, but tonight it took over 90 minutes. /home resides on md3 - why would it take so long this time? I haven't started the resync on md3 yet, so it can't be that?

The md driver implements "read balancing" for RAID 1 sets, so I'd expect read performance to suffer with one device missing.

reano 11-15-2013 06:01 PM

Quote:

Originally Posted by Ser Olmy (Post 5065301)
That, or run mdadm --detail /dev/md3

The md driver implements "read balancing" for RAID 1 sets, so I'd expect read performance to suffer with one device missing.

The device isn't missing though. md3 is running on both devices (sdb1, sdc1). The one device (sdc) has some pending sectors, could it be that?

Ser Olmy 11-15-2013 06:04 PM

Quote:

Originally Posted by reano (Post 5065302)
The device isn't missing though. md3 is running on both devices (sdb1, sdc1). The one device (sdc) has some pending sectors, could it be that?

That could certainly be the reason, in which case you should see read errors in the logs.

reano 11-15-2013 06:06 PM

Quote:

Originally Posted by Ser Olmy (Post 5065303)
That could certainly be the reason, in which case you should see read errors in the logs.

Doing the resync now on md3 - this is going to take a few hours. Perfect excuse to get some sleep, it's about 2:30AM here now and it's (literally and figuratively) been a stormy night. Non-stop lightning since early evening. Seemed extremely appropriate to the situation, too - irony is a right bastard sometimes, hehe.

Okay, so it seems both the (old) sdb and the current sdc are faulty. So you'd recommend I replace both those drives, right?

Btw, what do I check for specifically in the SMART status to determine if a drive is going AWOL on me? Only Pending sectors and Reallocated sectors, or is there another red flag to watch out for?

Ser Olmy 11-15-2013 06:52 PM

Yes, I would recommend replacing both drives.

A growing number of defects is the first sign of a drive (slowly) going bad. The sectors first show up in the Current_Pending_Sector count as the drive lists them for reallocation, and once reallocated they are added to the Reallocated_Sector_Ct statistic.

The problem with S.M.A.R.T. is that the drive has to detect the errors for them to show up among the attributes. A bad sector will go undetected until you attempt to read it. Even regular backups might not cause such a sector to be read, as incremental or delta backups and de-duplication have become common features. That's why regular verify/scrubbing of a RAID array is of the utmost importance.

As for other S.M.A.R.T. attributes, they can usually be ignored unless the drive status changes to "failing". smartd can be configured to send an e-mail whenever an attribute changes, something I would strongly recommend. Combined with mdadm in "--monitor" mode, you'll be informed if there's trouble brewing.
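
As a rough sketch, the two pieces would look something like this (the mail address is a placeholder; adjust for your setup):
Code:

# /etc/smartd.conf: watch all drives, run the standard checks, mail warnings
DEVICESCAN -a -m admin@example.com

# md monitoring as a daemon (on Debian/Ubuntu the mdadm package normally starts
# this for you when MAILADDR is set in /etc/mdadm/mdadm.conf)
mdadm --monitor --scan --daemonise --mail=admin@example.com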

reano 11-15-2013 06:53 PM

I'm planning to do a weekly scrubbing/resync of the arrays. My plan is to do it via a cron job (echo check > /sys/devices/virtual/block/<md_device>/md/sync_action) as follows:

- md1 (swap, only a few GB) on Wednesday mornings at 3AM, should finish before 4AM.
- md0 (root filesystem, 1.5TB) on Thursday mornings at 3AM, should finish by about 7AM.
- md2 (shared user data, 1.5TB) on Friday mornings at 3AM, should finish by about 7AM.
- md3 (home directories, 3.0TB) on Saturday mornings at 3AM, should finish by about 11AM.
- md4 (user IMAP mails, 3.0TB) on Saturday afternoons at 12PM, should finish by about 8PM.
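
In root's crontab, that would look roughly like this (a sketch only; md numbers as listed above):
Code:

0 3 * * 3   echo check > /sys/devices/virtual/block/md1/md/sync_action   # Wed: md1 (swap)
0 3 * * 4   echo check > /sys/devices/virtual/block/md0/md/sync_action   # Thu: md0 (root fs)
0 3 * * 5   echo check > /sys/devices/virtual/block/md2/md/sync_action   # Fri: md2 (shared data)
0 3 * * 6   echo check > /sys/devices/virtual/block/md3/md/sync_action   # Sat: md3 (home dirs)
0 12 * * 6  echo check > /sys/devices/virtual/block/md4/md/sync_action   # Sat: md4 (IMAP mail)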

Can users use the system, shared resources, mails, homedirs, etc. while the resync is taking place? Just in case there are some early birds starting work before 7AM, or on Saturdays? Or will they experience some serious slowdowns?

Then, further to that, I want to do another cronjob to mail me the smartctl output of all drives, daily, every morning at 8AM.

Does this make sense (and was I more or less correct with my time/duration estimates), or would you recommend any changes to the plan above?

Ser Olmy 11-15-2013 06:58 PM

A resync/check could have a slight effect on performance, but nothing anyone will notice unless the system is under significant load.

I probably wouldn't bother with the smartctl reports, as smartd monitors the exact same parameters. Just make sure to run regular tests, as smartd sends notifications only when a parameter actually changes.

reano 11-15-2013 07:04 PM

Quote:

Originally Posted by Ser Olmy (Post 5065333)
A resync/check could have a slight effect on performance, but nothing anyone will notice unless the system is under significant load.

I probably wouldn't bother with the smartctl reports, as smartd monitors the exact same parameters. Just make sure to run regular tests, as smartd sends notifications only when a parameter actually changes.

Ah okay - I was wondering how a resync would react to files changing on the drive while it's trying to sync them (like a user saving a couple of new documents in his homedir while a resync is active on that drive).
Thanks for the advice re smartd - I'll study it a bit and set it up.
Will let you know how the resync went and I'll probably shout for some more advice when the time comes to replace the drives, if you don't mind :)

reano 11-15-2013 07:48 PM

Quote:

Originally Posted by Ser Olmy (Post 5065333)
A resync/check could have a slight effect on performance, but nothing anyone will notice unless the system is under significant load.

I probably wouldn't bother with the smartctl reports, as smartd monitors the exact same parameters. Just make sure to run regular tests, as smartd sends notifications only when a parameter actually changes.

Wait, sorry - do you mean I have to run this regularly:
Code:

smartctl -t long /dev/sda
In addition to actually checking what smartctl (or smartd) says? How "outdated" would the smartctl -a output be if I don't run a longtest? Ooorr, does smartctl -a always show the latest information, whereas smartd only shows updated info after a test has been run?

How regularly would you recommend? Not sure how long it would take on a 3TB drive, want to see if I can work it into the nightly schedule.

Ser Olmy 11-15-2013 08:12 PM

Quote:

Originally Posted by reano (Post 5065368)
Wait, sorry - do you mean I have to run this regularly:
Code:

smartctl -t long /dev/sda

No, the "test" I was referring to, is a test of smartd's capability to send mail. Since it only sends e-mails whenever there's actually something to report, a "silent failure" may go undetected.

I have a separate smartd configuration file with the "-M test" parameter (/etc/smartd-test.conf), and a cron job that runs smartd -q onecheck -c /etc/smartd-test.conf >/dev/null once a month.
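
Concretely, something like this (the mail address is a placeholder, and the cron line is in /etc/cron.d format):
Code:

# /etc/smartd-test.conf -- same devices/directives as the normal config, plus -M test
DEVICESCAN -a -m admin@example.com -M test

# /etc/cron.d/smartd-mailtest -- once a month, register and check the devices once
# against the test config; -M test makes smartd send a test e-mail per device
0 8 1 * * root smartd -q onecheck -c /etc/smartd-test.conf >/dev/null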

reano 11-18-2013 05:56 PM

Hi again,

Ok, the replacement drives have arrived from our suppliers. I've read up on http://www.howtoforge.com/replacing_..._a_raid1_array

The way I *think* that I have to proceed now is:

1. Shut down the system
2. Insert the new hard drive (it will probably be sdf)
3. Copy the partition tables from sda to sdf, with:
Code:

sfdisk -d /dev/sda | sfdisk /dev/sdf
4. Add the sdf partitions to the md0, md1 and md2 arrays:
Code:

mdadm --manage /dev/md0 --add /dev/sdf1
mdadm --manage /dev/md1 --add /dev/sdf2
mdadm --manage /dev/md2 --add /dev/sdf3

Now a few questions:

a) How do I know the partition table copy will copy the structure in the right order? What I mean is, will the size of sda1 be equal to the size of sdf1, or might it mix them up so that sda1 matches sdf2 and sda2 matches sdf1, etc.? If you know what I mean?

b) sfdisk won't work, as these are GPT partition tables. What do I use in its stead?

c) Can I add all 3 partitions to the 3 arrays (as demonstrated in point 4 above) at the same time, or do I have to add them one by one and let each one sync first?

d) Anything I'm missing? Am I missing any steps in my list above? Am I correct in my assumptions in points 1 - 4?

e) Do I not have to format the sdf partitions after copying the partition structure from sda to sdf?

PS: I'm only replacing the faulty sdb (which will now probably be sdf) tonight. If it goes well I'll replace sdc later in the week.

Ser Olmy 11-18-2013 08:45 PM

If you plug the new drives into the same SATA ports as the old ones, they will probably be enumerated in the same order as the old disks. And then there's a chance udev will mess it all up and rename them to /dev/sdg or somesuch, but you'll see soon enough.

Don't copy the partition table from another disk! GPT partition tables are called "GUID Partition Tables" for a reason; there's a GUID in there, and under no circumstances do you want disks with duplicate GUIDs on your system.

parted handles GUID partition tables just fine. Look at the partition table of the mirror disk, and just create partitions of the same size.
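
For example (assuming sda is the surviving mirror and the new disk shows up as sdf; take the actual start/end sectors from the print output):
Code:

parted /dev/sda unit s print                       # note the Start and End sector of each partition
parted -s /dev/sdf mklabel gpt
parted -s /dev/sdf mkpart raid1 <start>s <end>s    # repeat for each partition, using the sectors printed above
parted -s /dev/sdf set 1 raid on                   # flag partition 1 as a RAID member; same for 2 and 3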

reano 11-19-2013 12:28 AM

Quote:

Originally Posted by Ser Olmy (Post 5066967)
If you plug the new drives into the same SATA ports as the old ones, they will probably be enumerated in the same order as the old disks. And then there's a chance udev will mess it all up and rename them to /dev/sdg or somesuch, but you'll see soon enough.

Don't copy the partition table from another disk! GPT partition tables are called "GUID Partition Tables" for a reason; there's a GUID in there, and under no circumstances do you want disks with duplicate GUIDs on your system.

parted handles GUID partition tables just fine. Look at the partition table of the mirror disk, and just create partitions of the same size.

What about:

Code:

sgdisk -R=/dev/sdb /dev/sda
sgdisk -G /dev/sdb

That supposedly copies the GPT partition table from sda to sdb, and the second line randomizes the GUIDs on sdb.

Any feedback on the other questions I got? :)

Ser Olmy 11-21-2013 06:11 PM

Sorry about the late reply. I suppose you've replaced the drives by now.

If you followed the procedure you outlined above, you should be up and running with fully functional RAID arrays.

reano 12-10-2013 04:35 PM

Just an update on the situation...
The original problem has been resolved, drives have been replaced, etc.
However, last night the md3 array failed. The problem is that md3 contains the /home partition (and nothing else). Luckily we do have a full daily home-directory backup on a NAS drive; the challenge, however, is to get the array back up.

md3 consists of sdc1 and sdd1 (only one partition per drive, so the entire sdc and sdd drives were involved). sdc went down completely, with sdd still up but severely damaged. So my first step was to replace sdc with a new drive and attempt a resync/recovery by adding the new drive into md3. However, the resync kept on failing because of the read errors on sdd. Hours later, I was running out of ideas, and decided to get rid of sdd as well and start the md3 array fresh with no data on it, and then copy the info from the NAS drive.

The only way I managed to do this was to comment out the /home mount on md3 in fstab and reboot into recovery mode. This then enabled me to --stop the md3 array. (At this point, both the sdd and sdc drives were physically disconnected, and the new drive was in the machine; Ubuntu saw this new drive as sdc.) So I then recreated md3 with RAID level 1, but with raid-devices also set to 1, and specified the device as sdc1 (I had copied the partition structure from sdd before I removed it).

This worked well - md3 was up with only sdc (the new drive). I checked blkid on md3, and the UUID matched the old md3 UUID. Good so far. I then edited fstab to again mount /home on that UUID and rebooted into recovery mode again. Ubuntu detected tons of filesystem errors (not sure why?) and asked if it should repair them. I said yes. After that, it continued the startup process and was up and running, with /home mounted. Except that md3 has now changed into md127. Apart from that, everything seems fine.

The weird thing is that, whenever I plug sdd (the one faulty drive) back in (SATA port #4), it brings it up on a reboot as md3 with /home mounted on that, and brings up the NEW sdc drive as md127 but with nothing mounted on it (I don't think so anyway). Why on earth would this happen?

So my questions:

1) I probably did not follow the 100% correct procedure (but I didn't know what else to do, it was more an act of desperation) to get the new drive online with /home mounted on it as a new array (md127). But will it work? And I assume I can then "grow" that array once a second hard drive arrives from the suppliers to get 2 devices on that array again (roughly what I've sketched at the end of this post)?

2) Why does it become md127 instead of md3 after a reboot?

3) Why does md3 come back with the old drive as a device on it (and /home mounted on THAT) whenever I plug the old faulty drive in again and reboot? Is it because of the SATA port, or because of the drive UUID? I'm now too "scared" to plug the second new drive (once it arrives from the suppliers) into that SATA port, for fear that it would try to bring that up as md3 and bring down my home directories (that are currently on md127) again. It's almost as if md3 is still in a configuration saved somewhere with the old drives in it or something. I don't know...
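
For question 1, this is the grow step I have in mind once the second replacement drive is here (the new partition's name is a guess - it may well come up as something else):
Code:

mdadm --add /dev/md127 /dev/sdd1            # add the new partition as a spare
mdadm --grow /dev/md127 --raid-devices=2    # grow the mirror back to two devices; the spare then syncs in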

