LinuxQuestions.org
Old 02-22-2016, 04:06 PM   #1
gephenzie
LQ Newbie
 
Registered: Feb 2016
Location: Warren, MI USA
Distribution: CentOS 7, Arch Linux
Posts: 8

Rep: Reputation: Disabled
reactivating raid after drive disconnect - all drives now listed as spares


CentOS7 - I have a 32 drive array that I'm working with to learn and once solid, use for storage. Before I do that, I wanted to ensure I knew how to recover from disaster. I've already removed / replaced a disk, and now, after a catastrophic test (unplugged 16 drives in the middle of use, rebooted) I'm unable to get it back online.

Drives in the raid are /dev/sdb1 - /dev/sdag1, fs is ext4. Raid 10. When mdadm assembled, it was a hodgepodge of which drive from which box went with which, so when I unplugged the 16 drives at the same time, it was definitely not just mirrors of the other 16 - I expected it to fail.

mdadm.conf is:
Code:
ARRAY /dev/md0 metadata=1.2 name=hz16:0 UUID=0c8a51b5:c79e4eae:a2a30468:40a1e2d4
(I've tried this booting with and w/o the mdadm.conf and get the same result)
After a reboot, /proc/mdstat says:
Code:
[root@hz16 ~]# cat /proc/mdstat
Personalities :
md0 : inactive sdq1[15](S) sdf1[4](S) sdh1[6](S) sdd1[2](S) sdj1[8](S) sde1[3](S) sdp1[14](S) sdn1[12](S) sds1[17](S) sdo1[13](S) sdm1[11](S) sdg1[5](S) sdr1[16](S) sdl1[10](S) sdk1[9](S) sdi1[7](S) sdae1[29](S) sdb1[0](S) sdu1[19](S) sdy1[23](S) sdt1[18](S) sdag1[31](S) sdad1[28](S) sdab1[26](S) sdx1[22](S) sdaa1[25](S) sdaf1[30](S) sdw1[21](S) sdc1[1](S) sdz1[24](S) sdac1[27](S) sdv1[20](S)
      15497953280 blocks super 1.2

unused devices: <none>
Now all my drives are marked (S), i.e. as spares, for some reason.
mdadm --detail /dev/md0 shows:
Code:
[root@hz16 ~]# mdadm --detail /dev/md0
/dev/md0:
        Version : 1.2
     Raid Level : raid0
  Total Devices : 32
    Persistence : Superblock is persistent

          State : inactive

           Name : hz16:0  (local to host hz16)
           UUID : 0c8a51b5:c79e4eae:a2a30468:40a1e2d4
         Events : 8811

    Number   Major   Minor   RaidDevice

       -      65      161        -        /dev/sdaa1
       -      65      177        -        /dev/sdab1
       -      65      193        -        /dev/sdac1
       -      65      209        -        /dev/sdad1
       -      65      225        -        /dev/sdae1
       -      65      241        -        /dev/sdaf1
       -      66        1        -        /dev/sdag1
       -       8       17        -        /dev/sdb1
       -       8       33        -        /dev/sdc1
       -       8       49        -        /dev/sdd1
       -       8       65        -        /dev/sde1
       -       8       81        -        /dev/sdf1
       -       8       97        -        /dev/sdg1
       -       8      113        -        /dev/sdh1
       -       8      129        -        /dev/sdi1
       -       8      145        -        /dev/sdj1
       -       8      161        -        /dev/sdk1
       -       8      177        -        /dev/sdl1
       -       8      193        -        /dev/sdm1
       -       8      209        -        /dev/sdn1
       -       8      225        -        /dev/sdo1
       -       8      241        -        /dev/sdp1
       -      65        1        -        /dev/sdq1
       -      65       17        -        /dev/sdr1
       -      65       33        -        /dev/sds1
       -      65       49        -        /dev/sdt1
       -      65       65        -        /dev/sdu1
       -      65       81        -        /dev/sdv1
       -      65       97        -        /dev/sdw1
       -      65      113        -        /dev/sdx1
       -      65      129        -        /dev/sdy1
       -      65      145        -        /dev/sdz1
I don't know if it's relevant, but blkid looks like this:
Code:
[root@hz16 ~]# blkid
/dev/sda1: UUID="dbdeac26-ee9b-438c-a476-3818399a0853" TYPE="xfs"
/dev/sda2: UUID="aNhB8I-38A5-Jbr2-u5hx-zHVb-VXu3-c7c0nC" TYPE="LVM2_member"
/dev/sdb1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="a63a50af-e3ea-4e75-3964-1dd125d55df9" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdd1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="7d82e0da-679d-3630-9739-8a76dda9c389" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sde1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="b13b82c1-d796-32a7-f927-7f05e83fe79a" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdf1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="7c35bcc0-2dac-841e-9a2f-41cd59e8e2f9" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdg1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="44029566-bf9a-70bd-3c5f-1ff9baca4d12" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdh1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="5c0ce453-de30-aa0e-2544-2f76269eb19b" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdi1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="f3f93985-c01b-c981-face-99a4bf5c19a6" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdc1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="ddb36ec7-6ede-0601-afdc-3d9e079c545b" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdj1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="b2e08a3c-6c81-b25b-2a0e-94d1de178d68" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdk1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="5731ff29-f928-8748-70f6-956b22f09d14" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdm1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="fff41ba1-f3af-71cd-d1ba-5ab45897b5a9" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdl1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="5d5fb37d-fc55-e3f4-e95e-da61bcc7b923" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdn1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="4487b1a3-059f-187a-5096-9fcaa18a3f50" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdo1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="304480b3-4f7f-d026-24d3-7c2a0dbe032c" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdp1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="bd52d56d-2d51-77a2-cba5-2874a7e8ff48" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdq1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="d29d5e15-cf7a-12e1-9bb3-bd3a33750b1a" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdr1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="663e48d7-1121-aa4e-d4d5-b33925abbb3b" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sds1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="e5f8140f-20d5-9f60-c944-d2a197365e85" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdt1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="5d49e23f-40b9-57f1-defc-a226b4d95532" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdu1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="4b1839e0-2f9b-16e1-c686-9af33bd8e537" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdv1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="688c659e-1a9f-ef00-c352-b8cd00fbb854" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdw1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="03074160-826f-1f57-c0f0-cb38191d69e6" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdx1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="9620abd1-1250-24cc-120d-5959ba12e17b" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdy1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="5f7d09a7-03ac-74e0-3e94-b062b1d60c34" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdz1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="a788fdbb-b88e-0144-b7df-caa161a23350" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdaa1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="8963d482-2770-0b67-fd0c-a7e75fa5c476" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdab1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="ad77ac2d-eff5-dec7-3970-4e7abe7eaed8" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdac1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="d7194587-d504-37ab-eecf-863cfd67725d" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdad1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="e5c1d0fd-bc80-b801-1922-9bb4c1cb650a" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdae1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="586fb498-83cc-03cf-0b60-5bef78d0e3d7" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdaf1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="df4dcd45-fd9b-4a7e-713a-8dd78779717b" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/sdag1: UUID="0c8a51b5-c79e-4eae-a2a3-046840a1e2d4" UUID_SUB="0d63a689-dc69-402c-222a-45a14aaca376" LABEL="hz16:0" TYPE="linux_raid_member"
/dev/mapper/centos-root: UUID="940f0995-cc6e-4e90-a79e-47aa7848255c" TYPE="xfs"
/dev/mapper/centos-swap: UUID="f1d3c5d3-7dee-44e0-beb5-65efb63779f2" TYPE="swap"
/dev/mapper/centos-home: UUID="f9eb5725-8690-429f-a596-f937964086f5" TYPE="xfs"
The output of
mdadm --examine /dev/sd{b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag}1
is very long, so I won't include that here. However, a single drive is:
Code:
/dev/sdag1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x1
     Array UUID : 0c8a51b5:c79e4eae:a2a30468:40a1e2d4
           Name : hz16:0  (local to host hz16)
  Creation Time : Sun Feb 21 21:58:16 2016
     Raid Level : raid10
   Raid Devices : 32

 Avail Dev Size : 968622080 (461.88 GiB 495.93 GB)
     Array Size : 7748976640 (7390.00 GiB 7934.95 GB)
    Data Offset : 262144 sectors
   Super Offset : 8 sectors
   Unused Space : before=262056 sectors, after=0 sectors
          State : clean
    Device UUID : 0d63a689:dc69402c:222a45a1:4aaca376

Internal Bitmap : 8 sectors from superblock
    Update Time : Mon Feb 22 10:51:42 2016
  Bad Block Log : 512 entries available at offset 72 sectors
       Checksum : 6f9ddbd4 - correct
         Events : 8811

         Layout : near=2
     Chunk Size : 512K

   Device Role : Active device 31
   Array State : AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA ('A' == active, '.' == missing, 'R' == replacing)
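Only one member's superblock is shown above, so it may help to compare the event counters of all 32 members: members whose counters match are treated as in sync, and `--force` exists for the case where some lag slightly behind. The following is a minimal sketch, not something from the thread. `events_by_device` is a hypothetical helper name, and it assumes the `mdadm --examine` output format shown above.

```shell
#!/bin/sh
# events_by_device: read `mdadm --examine` output on stdin and print
# one "device events" pair per member. Hypothetical helper for illustration.
events_by_device() {
    awk '
        /^\/dev\//     { dev = $1; sub(/:$/, "", dev) }   # e.g. "/dev/sdag1:"
        $1 == "Events" { print dev, $3 }                  # e.g. "Events : 8811"
    '
}

# Demo on the excerpt quoted in the post (no real devices are touched).
events_by_device <<'EOF'
/dev/sdag1:
          Magic : a92b4efc
         Events : 8811
   Device Role : Active device 31
EOF
# prints: /dev/sdag1 8811
```

On the real machine the input would come from something like `mdadm --examine /dev/sd[b-z]1 /dev/sda[a-g]1 | events_by_device`; piping the result through `awk '{print $2}' | sort | uniq -c` then shows how many members share each counter value.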
Ok, so I go to assemble this array,
Code:
mdadm --assemble --force /dev/md0 /dev/sd{b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag}1
and get:
Code:
mdadm: /dev/sdh1 is busy - skipping
for all 32 drives.

All drives are online and ready, but can't get it to assemble, and all drives are tagged as spares. What do I have to do to get this to assemble once more?
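One common reason for the "busy" message is that the inactive /dev/md0 still holds every member, so nothing else can open them until that array is stopped. Below is a minimal sketch of the usual sequence. It is an assumption that this applies here, since the thread never confirms a fix. The `run`, `members` and `reassemble` helpers and the `DRY_RUN` variable are hypothetical names for illustration; with `DRY_RUN=1` (the default) the commands are only printed.

```shell
#!/bin/sh
# Sketch only: DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# run: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# members: print the 32 member partitions named in the post (sdb1 .. sdag1).
members() {
    for d in b c d e f g h i j k l m n o p q r s t u v w x y z \
             aa ab ac ad ae af ag; do
        printf '/dev/sd%s1 ' "$d"
    done
}

# reassemble: release the members held by the inactive array, then retry.
reassemble() {
    run mdadm --stop /dev/md0
    # $(members) is left unquoted on purpose: one argument per device.
    run mdadm --assemble --force --verbose /dev/md0 $(members)
}

reassemble
```

Running it with `DRY_RUN=0` as root would execute the two mdadm commands for real; `cat /proc/mdstat` afterwards shows whether the array came up.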

Thanks,
-Jeff
 
Old 02-24-2016, 01:53 AM   #2
sag47
Senior Member
 
Registered: Sep 2009
Location: Orange County, CA
Distribution: Kubuntu x64, Raspbian, CentOS
Posts: 1,845
Blog Entries: 36

Rep: Reputation: 453
I'll have to look through my notes and get back to you. FWIW the #raid channel on irc.freenode.net has pretty knowledgeable users who can help. They've helped me. Just don't forget about IRC etiquette (simply ask rather than ask to ask, people in different time zones may take 24 hrs to respond, be respectful when asking for help, etc).
 
Old 02-24-2016, 06:21 AM   #3
Soadyheid
Senior Member
 
Registered: Aug 2010
Location: Near Edinburgh, Scotland
Distribution: Cinnamon Mint 17.3 and 18 at present.
Posts: 1,200

Rep: Reputation: 207
Quote:
after a catastrophic test (unplugged 16 drives in the middle of use, rebooted) I'm unable to get it back online
I can't guess what sort of failure scenario you were trying to simulate.
I'd say it's toast. RAID 10 can only recover if a failed/pulled disk has its mirror intact.

Quote:
so when I unplugged the 16 drives at the same time, it was definitely not just mirrors of the other 16 - I expected it to fail.
Yup! Toast.

I've only been involved with replacing failed disks in arrays (HP, IBM, Sun), usually at three in the morning! (Why's that?) So I'd say that your 16 out of 32 disks in the same RAID 10 "failure" is highly improbable. If you split them across two 16-disk JBODs, I'd expect one JBOD to mirror the other so a PSU failure on one JBOD wouldn't kill everything. (Most arrays have redundant PSUs to mitigate this as well.)

In your scenario I'd say your quickest option to get back in business would be to re-initialise the RAID and do a restore.

My

Play Bonny!

 
Old 02-24-2016, 02:23 PM   #4
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,661

Rep: Reputation: 1256
Most raid structures allow for single disk failures to be recovered. raid6 allows for two disks to fail.

Unfortunately, if more than that fail you are toast.

This is why most raid architectures have done it in groups of 5 disks/volume (raid5), then combined multiple raid5 volumes into mirror groups... Thus requiring four disks to fail (two in each raid5 in a mirror group).

Last edited by jpollard; 02-24-2016 at 02:38 PM.
 
Old 02-24-2016, 02:50 PM   #5
gephenzie
LQ Newbie
 
Registered: Feb 2016
Location: Warren, MI USA
Distribution: CentOS 7, Arch Linux
Posts: 8

Original Poster
Rep: Reputation: Disabled
I figured that since all the drives are still as they were at the time of the failure (data / partitions all intact, superblocks unchanged), the array could be assembled once again - I'd even expect it to assemble without intervention. I thought that at worst I'd just have to do a file system repair after reassembly to correct any minor file system errors on the last file written. It seems like a major weakness that mdadm can't handle a temporary loss of the drives (power failure, cable disconnect). It's true that I'd expect a total failure if half of a mirror was *permanently* lost, but in this case, where it just disappeared for a short time (and thus the disk stopped being utilized), it would seem to be highly recoverable. Think about a scenario where all the power to all equipment goes out - different power supplies die after different time periods - maybe 0.5 seconds apart. Considering that, and the assumption that my current situation is unrecoverable, *any* power failure would result in a total rebuild of the array and restore of data from backup.

In my current setup I have 2 boxes of 16 drives. The failure was essentially a power failure on one of those boxes (or a loose data cable - my test was actually unplugging the cable for a while). I just thought mdadm would be more tolerant of a temporary disconnect than that.

As I was laying things out for this, one of my questions early on was how to direct mdadm to use which drive in which part of the raid. Obviously as raid-10 with 32 drives in 2 boxes, I'd want one half of each of the 16 mirrors to be on box1, and the other half of those mirrors to be on box2; then stripe the mirrors. But mdadm during the assemble puts the pairs all over the place. How do you tell it what to stick where? Does it require setting up 16 raid1's first and then setting up the raid0 (instead of just specifying to assemble a raid10)?
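On the placement question: as the md documentation describes it, device roles follow the order in which devices are listed at creation time, and with the near=2 layout on an even number of devices the two copies of each chunk land on adjacent roles (0 and 1, 2 and 3, and so on). Listing the devices alternately from each box should therefore put one half of every mirror in each box. The sketch below only prints the resulting command. It assumes, purely for illustration, that box 1 holds sdb1 to sdq1 and box 2 holds sdr1 to sdag1 (the thread does not say), and `interleave` is a hypothetical helper.

```shell
#!/bin/sh
# ASSUMPTION: which partitions sit in which box is not stated in the thread.
BOX1="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1 /dev/sdj1 /dev/sdk1 /dev/sdl1 /dev/sdm1 /dev/sdn1 /dev/sdo1 /dev/sdp1 /dev/sdq1"
BOX2="/dev/sdr1 /dev/sds1 /dev/sdt1 /dev/sdu1 /dev/sdv1 /dev/sdw1 /dev/sdx1 /dev/sdy1 /dev/sdz1 /dev/sdaa1 /dev/sdab1 /dev/sdac1 /dev/sdad1 /dev/sdae1 /dev/sdaf1 /dev/sdag1"

# interleave: print devices alternately from two space-separated lists
# of equal length, so that neighbours come from different boxes.
interleave() {
    box1=$1
    box2=$2
    set -- $box2            # positional parameters now walk through box 2
    out=
    for a in $box1; do
        out="$out $a $1"    # one device from box 1, then its partner from box 2
        shift
    done
    echo "${out# }"         # drop the leading space
}

# Print (do not run) a create command whose adjacent devices span both boxes.
echo mdadm --create /dev/md0 --level=10 --layout=n2 --raid-devices=32 \
    $(interleave "$BOX1" "$BOX2")
```

Kernel device names can change between boots, so it is worth confirming which physical drive is behind each name (for example via `/dev/disk/by-path`) before relying on an ordering like this.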

-Jeff
 
Old 02-24-2016, 03:09 PM   #6
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,661

Rep: Reputation: 1256
Ah... no.

No matter how fast you unplug - some of the disks will be disabled... and the remaining disks informed of that failure.

Even a hard power off won't do that, as the disks are designed to keep running for a second (or so) to finish the current DMA and write any buffers to disk.

The kernel raid software is designed to protect... not prevent.

It likely would have worked if the system were powered down instead...
 
Old 02-24-2016, 04:31 PM   #7
suicidaleggroll
LQ Guru
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 5,396

Rep: Reputation: 2017
I'm not sure what everyone is freaking out about. What the OP did is a realistic scenario, think power interruption on the backplane, failed power splitter feeding half the array, etc.

Of course the array will go down, nothing but RAID 1 can protect against that, the point is he replaced the drives and the array is not rebuilding/verifying. The drives, when added back in, were detected as spares instead of the missing parts of the failed array.

I have had exactly this scenario happen on a 24 drive 80 TB RAID 60 system of mine. A power cable went bad and power to 8 drives was cut during operation. It was a hardware RAID, not software, and recovery simply consisted of deleting the array and re-creating it without initialization, followed by an fsck. No data loss, and only minor down time.

Unfortunately I do not know the proper steps to recover the array with mdadm. Frankly I've never heard of somebody using a 32 drive software array, it sounds dangerous to me given my limited experience and numerous hiccups with software raid.
 
Old 02-24-2016, 05:50 PM   #8
gephenzie
LQ Newbie
 
Registered: Feb 2016
Location: Warren, MI USA
Distribution: CentOS 7, Arch Linux
Posts: 8

Original Poster
Rep: Reputation: Disabled
@suicidaleggroll thanks - that was my point; that it seems there *should* be an easy solution for bringing it back online after such an event. I'm not convinced that there isn't a way, but I haven't figured it out yet obviously. You should be able to flick the "spare" bit to "up" then reassemble.

The excessive 32 drive software array is just because I happened into free hardware - one man's garbage... They are only 500GB drives, but it'd be a shame to not play with them and get more familiar with software raids. I've scripted the setup so I can re-do it quickly when necessary. I also have a hardware array on the machine, but I have not dug into playing with it much yet.

Data was never really lost - all just test data to play with the raid.

But if there's no way to recover, does anyone know about the other question? - how to dictate which drive goes where in the raid? Is the "solution" I mentioned (establish 16 mirrors then stripe them) the best way to achieve that? I was concerned that I might lose something to overhead by creating a raid of raids manually like that, but perhaps mdadm is smart enough to optimize it properly. Can mdadm handle swapping a bad drive within a mirror within a stripe like that? To do it, it would have to have translated the raid1's in the raid0 to be a raid10 on its own.

I'm looking forward to pulling the plug on 16 drives and having it still run

-Jeff
 
Old 02-24-2016, 05:59 PM   #9
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,661

Rep: Reputation: 1256
Well, I haven't tried this (not enough disks actually), but mdadm does have a "--re-add" option under the manage command. According to the manpage:

Quote:
If the device name given is faulty then mdadm will find all
devices in the array that are marked faulty, remove them and
attempt to immediately re-add them. This can be useful if you
are certain that the reason for failure has been resolved.
For your case, this might work.

Last edited by jpollard; 02-24-2016 at 06:00 PM.
 
Old 02-25-2016, 06:57 AM   #10
gephenzie
LQ Newbie
 
Registered: Feb 2016
Location: Warren, MI USA
Distribution: CentOS 7, Arch Linux
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by jpollard View Post
Well, I haven't tried this (not enough disks actually), but mdadm does have a "--re-add" option under the manage command.
I was hopeful there for a moment, but it did not work for me. In fact, it does nothing:
Code:
[root@hz16 ~]# mdadm --re-add --verbose /dev/md0
[root@hz16 ~]#
No changes in /proc/mdstat either
Also tried
Code:
[root@hz16 ~]#  mdadm --create --assume-clean /dev/md0 --level=10 --raid-devices=32 /dev/sd{b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z,aa,ab,ac,ad,ae,af,ag}1
mdadm: cannot open /dev/sdb1: Device or resource busy
But nope, it's looking grim.
 
Old 02-25-2016, 01:33 PM   #11
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,661

Rep: Reputation: 1256
I'm wondering what/why /dev/sdb1 is busy. Something must be using it for it to be busy, and shouldn't be.
 
Old 02-26-2016, 11:20 AM   #12
Soadyheid
Senior Member
 
Registered: Aug 2010
Location: Near Edinburgh, Scotland
Distribution: Cinnamon Mint 17.3 and 18 at present.
Posts: 1,200

Rep: Reputation: 207
Quote:
I'm wondering what/why /dev/sdb1 is busy. Something must be using it for it to be busy, and shouldn't be.
OK, here's a guess...

All the drives in a RAID have an extra small partition on them containing data which keeps track of the RAID; disk details including serial No., position of the disk within the array, disk status (ready, failed, recovery/rebuild, etc,).

It needs to read this information, which is used to recover from disk failures. 16 disks of a 32 disk array suddenly disappeared, so the data on these 16 partitions no longer matches that on the remaining 16, which should have been updated to reflect the now missing disks. (This may or may not have happened as the RAID was effectively shot in the head!) I'd imagine the OP pulled them one at a time, so at least some of the remaining disks have had this data updated. Now nothing matches.... AAarrgghh!

You'll notice that in the OP's attempt to recover the RAID by re-adding the disks, the first disk it tries to access to read this config data is /dev/sdb1, which I reckon is this RAID system config partition on that disk.
Which disks were pulled? Was this one of them? Maybe the data is now corrupt?

Anyway, that's my If I'm wrong in my conceptual description, I think I should at least get a gold star for the attempt!

Play Bonny!

 
Old 02-26-2016, 12:13 PM   #13
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,661

Rep: Reputation: 1256
Yes, but it isn't "an extra small partition", that is the partition header.

And if the raid doesn't get activated (none of the disks are), then it shouldn't be busy.
 
Old 02-26-2016, 09:31 PM   #14
gephenzie
LQ Newbie
 
Registered: Feb 2016
Location: Warren, MI USA
Distribution: CentOS 7, Arch Linux
Posts: 8

Original Poster
Rep: Reputation: Disabled
"lsof | grep sdb" reports nothing. I am not sure why mdadm reports it as busy (the array is not started, drive not mounted) but it's consistent across reboots.
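`lsof` lists files opened by processes, so a partition claimed by the kernel's md driver would not appear there. The sysfs `holders` directory shows kernel-level claims instead. A small sketch follows, where `holders` is a hypothetical helper and the optional second argument exists only so the function can be pointed at a test directory instead of the real /sys.

```shell
#!/bin/sh
# holders: list the kernel devices that currently hold a block device.
#   $1 = device name without /dev/ (e.g. sdb1)
#   $2 = sysfs root, defaults to /sys (overridable for testing)
holders() {
    dev=$1
    sysroot=${2:-/sys}
    dir="$sysroot/class/block/$dev/holders"
    [ -d "$dir" ] || return 1   # unknown device or no sysfs entry
    ls "$dir"
}

# Usage on the affected machine:
#   holders sdb1
```

If this prints `md0` for sdb1, the inactive array itself is what keeps the partition busy, which would match a "busy" result that persists across reboots with nothing mounted.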
 
Old 02-26-2016, 09:45 PM   #15
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 15,169

Rep: Reputation: 1943
Maybe a race - I used this blog a while back.
 
  


All times are GMT -5.