10-17-2007, 10:06 AM   #1
Captain Mullet
Interesting Problem With Software Raid 5


Greetings,

The short version is that I'm wondering if there is a file somewhere I can edit that would allow me to manually specify which hard drives should be used in an array created with mdadm.
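The closest thing I have found is /etc/mdadm/mdadm.conf, where the member devices and the array UUID can be pinned down. Something like the lines below (the UUID is the one from my array's superblocks), though I don't know whether this alone is enough to fix what follows:

Code:
DEVICE /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=bb177475:83977a04:b367dffe:1ee00c72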

The long version is this:
Recently I had one of four disks fail in my RAID 5 array. Finding the disk was relatively easy, as it was clunking ever so gracefully. I bought a new disk and slipped it in. When I restarted my computer, I ran
Code:
mdadm /dev/md0 -a /dev/sdd1
and nothing happened. I tried a few more commands, looked around on the internet, and then rebooted again. My desktop was giving me a fair number of disk errors on boot, so I took out every disk in my server except the boot drive and the 4 array drives. I edited mdadm.conf to remove the other array I had created (a RAID 1), and removed all references to the arrays from /etc/fstab.
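At that point the sensible thing is to ask the kernel what it thinks is going on; these are just the standard status commands, nothing exotic:

Code:
cat /proc/mdstat            # kernel's view of all md arrays
mdadm --detail /dev/md0     # array-level status
mdadm --examine /dev/sdd1   # per-member superblock contents
Now, when I boot, this is the relevant dmesg excerpt: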

Code:
scsi3 : sata_promise
  Vendor: ATA       Model: ST3500630AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: ST3500630AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: ST3500630AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
  Vendor: ATA       Model: ST3500630AS       Rev: 3.AA
  Type:   Direct-Access                      ANSI SCSI revision: 05
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
 sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
 sdb: sdb1
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 976773168 512-byte hdwr sectors (500108 MB)
 sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
 sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
md: raid1 personality registered for level 1
raid5: automatically using best checksumming function: pIII_sse
   pIII_sse  :  1978.000 MB/sec
raid5: using function: pIII_sse (1978.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: md1 stopped.
md: md0 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
md: bind<sdd1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
md: kicking non-fresh sdb1 from array!
md: unbind<sdb1>
md: export_rdev(sdb1)
raid5: device sdd1 operational as raid disk 3
raid5: not enough operational devices for md0 (3/4 failed)
RAID5 conf printout:
 --- rd:4 wd:1 fd:3
 disk 3, o:1, dev:sdd1
raid5: failed to run raid set md0
md: pers->run() failed ...
I figured it would be a simple matter of failing the disks, removing them, and re-adding them. But when I attempt to fail or remove the two disks, I am told that they do not exist, and when I attempt to add them, I am told that they are busy or in use. They are not mounted, and since the filesystems cannot be accessed, I do not know what could possibly be holding them. I also notice that md1 is still mentioned in the dmesg; is deleting the reference from mdadm.conf not enough?
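For reference, the sequence I expected to work was along these lines (a sketch; the device names are as the kernel sees them now, which may not match what the superblocks think they are):

Code:
# mark a member failed, remove it, then re-add it
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1
mdadm /dev/md0 --add /dev/sdb1
Also, I notice this oddity in the superblocks: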

Code:
/dev/sdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : bb177475:83977a04:b367dffe:1ee00c72
  Creation Time : Thu Aug  2 20:58:10 2007
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Wed Oct 10 22:01:44 2007
          State : active
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 65060a36 - correct
         Events : 0.440046

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8       65        2      active sync   /dev/.static/dev/sde1

   0     0       0        0        0      removed
   1     1       8       49        1      active sync   /dev/sdd1
   2     2       8       65        2      active sync   /dev/.static/dev/sde1
   3     3       8       81        3      active sync   /dev/.static/dev/sdf1
Compare that with this:

Code:
/dev/sda1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : bb177475:83977a04:b367dffe:1ee00c72
  Creation Time : Thu Aug  2 20:58:10 2007
     Raid Level : raid5
    Device Size : 488383936 (465.76 GiB 500.11 GB)
     Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 0

    Update Time : Thu Oct 11 02:09:25 2007
          State : clean
 Active Devices : 1
Working Devices : 1
 Failed Devices : 5
  Spare Devices : 0
       Checksum : 650cfb57 - correct
         Events : 0.440055

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     4       8       49       -1      spare   /dev/sdd1

   0     0       0        0        0      removed
   1     1       0        0        1      faulty removed
   2     2       0        0        2      faulty removed
   3     3       8       81        3      active sync   /dev/.static/dev/sdf1
And cat /proc/mdstat shows:
Code:
Personalities : [raid1] [raid5] [raid4] 
md0 : inactive sdd1[3] sda1[4](S)
      976767872 blocks
       
unused devices: <none>
Each disk seems to have a different idea of what is going on, and I am at a complete loss as to what course I should take at this point. I'm fairly certain that three of the four disks are completely functional. Would it be possible to replace my operating system drive (which is itself starting to get uppity with me; I've been planning to replace it eventually, but was hoping to get a backup done first), do a fresh operating system install, and build a completely new array using the information already on the disks? Or start an array with the three disks, and then add the fourth and grow it? A sketch of the first option follows.
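To be concrete about the first option: what I have in mind is the last-resort re-create over the existing disks, reusing the parameters from the old superblocks (RAID 5, 4 devices, 64K chunk, left-symmetric). I understand this preserves the data only if every parameter and the device order are exactly right, so treat this as a sketch of the idea rather than something I have run:

Code:
# DANGEROUS: only preserves data if level/chunk/layout/order all match the original
mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      --chunk=64 --layout=left-symmetric --assume-clean \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1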

If there's any other information that may be helpful, I can provide it. Running fdisk on all the disks showed valid partitions on each of them.

It is a short book, but I appreciate you making it all the way through. Any and all suggestions will be taken with extreme gratitude.
 
10-19-2007, 06:04 AM   #2
InDubio

Well, to be honest, I'm still a little puzzled about what's going on with your drives and how things should be, so I will state what I have made out so far (correct me if I'm wrong):

You have a RAID 5 array consisting of 4 partitions, namely sda1, sdb1, sdc1, and sdd1?

The drives still physically working are sda, sdc, and sdd?

Did you try to assemble the array by hand?
Code:
mdadm --assemble --verbose --run /dev/md0 /dev/sda1 /dev/sdc1 /dev/sdd1
Maybe you will find some answers here:
Linux Raid Howto, especially chapter 8
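One more thought: if the drives report busy, the kernel may have half-assembled the array at boot and still be holding the members. Stopping the array first might release them (adjust the md device to whatever /proc/mdstat shows):

Code:
mdadm --stop /dev/md0
cat /proc/mdstat   # the members should no longer be listed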
 
10-19-2007, 08:55 AM   #3
Captain Mullet (Original Poster)
Yep, tried that.
The issue was that during boot or initialization, the system was attempting to build the array. It failed, and somehow the drives were never really released, so after boot they were perpetually "busy". I could never run that assemble command, because I was always told the drives were in use or busy. Somehow, when I put in the new drive, the order in which the BIOS recognized the drives changed, and my RAID 1 and RAID 5 got all intermixed. It is an old motherboard, and I had three drive controller cards plugged in, so I think the poor thing just got confused. I unplugged all the drives except the ones for the RAID 5, and still no luck.

The strange thing was that running --examine on the disks showed two of them (the ones that kept saying they were busy) as active in an undetected array. Well, I couldn't --stop that array to release the disks, because as far as mdadm was concerned it didn't exist. I also tried to find where I could tell Debian not to build the array on boot, but nothing I tried worked; the knobs I was hunting for are sketched below.
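For the record, the knobs turned out to be roughly these; this is for the Debian of that era and partly from memory, so the exact variable names may differ on other releases:

Code:
# /etc/default/mdadm
INITRDSTART='none'   # don't assemble any arrays from the initramfs
AUTOSTART=false      # don't auto-start arrays from the init script

# rebuild the initramfs so the change takes effect on the next boot
update-initramfs -u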

Anyway, like I said, I have been wanting to rebuild my OS for a while, so that's what I did. A clean install of Debian, apt-get install mdadm, and running mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 worked like a charm. I'm happily watching my movies again.
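For anyone who finds this thread later, the whole recovery boiled down to this (the --detail check at the end is just my own sanity check):

Code:
apt-get install mdadm
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --detail /dev/md0   # confirm the array came up and is resyncing
The --force flag tells mdadm to assemble the array even though the event counters in the superblocks disagree, which is exactly what the --examine output above showed.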
 
  

