Greetings,
The short version is that I'm wondering whether there is a file somewhere I can edit to manually specify which hard drives should be used in an array created with mdadm.
The long version is this:
Recently one of the four disks in my RAID 5 array failed. Finding the disk was relatively easy, as it was clunking ever so gracefully. I bought a new disk and slipped it in. When I restarted my computer, I ran:
# mdadm /dev/md0 -a /dev/sdd1
And nothing happened. I tried a few more commands, looked around on the internet, and then rebooted again. My desktop was giving me a fair number of disk errors on boot, so I took out every disk in my server except the boot drive and the 4 array drives. I edited mdadm.conf to remove the other array I had created (a RAID 1), and removed every reference to the arrays from /etc/fstab. Now, when I boot, this is the relevant dmesg excerpt:
Code:
scsi3 : sata_promise
Vendor: ATA Model: ST3500630AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST3500630AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST3500630AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
Vendor: ATA Model: ST3500630AS Rev: 3.AA
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 976773168 512-byte hdwr sectors (500108 MB)
sda: sda1
sd 0:0:0:0: Attached scsi disk sda
SCSI device sdb: 976773168 512-byte hdwr sectors (500108 MB)
sdb: sdb1
sd 1:0:0:0: Attached scsi disk sdb
SCSI device sdc: 976773168 512-byte hdwr sectors (500108 MB)
sdc: sdc1
sd 2:0:0:0: Attached scsi disk sdc
SCSI device sdd: 976773168 512-byte hdwr sectors (500108 MB)
sdd: sdd1
sd 3:0:0:0: Attached scsi disk sdd
md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: bitmap version 4.39
md: raid1 personality registered for level 1
raid5: automatically using best checksumming function: pIII_sse
pIII_sse : 1978.000 MB/sec
raid5: using function: pIII_sse (1978.000 MB/sec)
md: raid5 personality registered for level 5
md: raid4 personality registered for level 4
md: md1 stopped.
md: md0 stopped.
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
md: bind<sdd1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
md: kicking non-fresh sdb1 from array!
md: unbind<sdb1>
md: export_rdev(sdb1)
raid5: device sdd1 operational as raid disk 3
raid5: not enough operational devices for md0 (3/4 failed)
RAID5 conf printout:
--- rd:4 wd:1 fd:3
disk 3, o:1, dev:sdd1
raid5: failed to run raid set md0
md: pers->run() failed ...
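To make that log easier to read at a glance, here is a minimal sketch that sorts the member partitions by what md did with them. The function name classify_members and the "unaccounted" label are made up for illustration, and the regular expressions only cover the message formats that appear in the excerpt above.

```python
import re

# Lines copied from the dmesg excerpt above (only the md/raid5 assembly part).
DMESG_EXCERPT = """\
md: bind<sdb1>
md: bind<sdc1>
md: bind<sda1>
md: bind<sdd1>
md: kicking non-fresh sdc1 from array!
md: unbind<sdc1>
md: export_rdev(sdc1)
md: kicking non-fresh sdb1 from array!
md: unbind<sdb1>
md: export_rdev(sdb1)
raid5: device sdd1 operational as raid disk 3
raid5: not enough operational devices for md0 (3/4 failed)
"""


def classify_members(log):
    """Group member partitions by how the assembly log treated them."""
    bound = re.findall(r"^md: bind<(\w+)>", log, flags=re.M)
    kicked = re.findall(r"^md: kicking non-fresh (\w+) from array!", log, flags=re.M)
    operational = re.findall(
        r"^raid5: device (\w+) operational as raid disk \d+", log, flags=re.M
    )
    # Bound but neither kicked nor reported operational.
    unaccounted = [d for d in bound if d not in kicked and d not in operational]
    return {
        "bound": bound,
        "kicked": kicked,
        "operational": operational,
        "unaccounted": unaccounted,
    }


if __name__ == "__main__":
    for label, devices in classify_members(DMESG_EXCERPT).items():
        print(label, devices)
```

On this excerpt, sdb1 and sdc1 are the kicked members, sdd1 is the only operational one, and sda1 is bound without being either, which lines up with the (S) marker next to sda1 in /proc/mdstat further down.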
I figured it would be a simple matter of failing the disks, removing them, and re-adding them. But when I attempt to fail or remove the two disks, I am told that they do not exist. When I attempt to add them, I am told that they are busy or in use. They are not mounted, and since the filesystems cannot be accessed, I do not know what could possibly prevent me from adding them. I also notice that md1 is mentioned in the dmesg output. Is deleting the reference from mdadm.conf not enough? Also, I noticed this oddity:
Code:
/dev/sdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : bb177475:83977a04:b367dffe:1ee00c72
Creation Time : Thu Aug 2 20:58:10 2007
Raid Level : raid5
Device Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Wed Oct 10 22:01:44 2007
State : active
Active Devices : 3
Working Devices : 3
Failed Devices : 1
Spare Devices : 0
Checksum : 65060a36 - correct
Events : 0.440046
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     2       8       65        2        active sync   /dev/.static/dev/sde1
   0     0       0        0        0        removed
   1     1       8       49        1        active sync   /dev/sdd1
   2     2       8       65        2        active sync   /dev/.static/dev/sde1
   3     3       8       81        3        active sync   /dev/.static/dev/sdf1
Compare that with this:
Code:
/dev/sda1:
Magic : a92b4efc
Version : 00.90.00
UUID : bb177475:83977a04:b367dffe:1ee00c72
Creation Time : Thu Aug 2 20:58:10 2007
Raid Level : raid5
Device Size : 488383936 (465.76 GiB 500.11 GB)
Array Size : 1465151808 (1397.28 GiB 1500.32 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Update Time : Thu Oct 11 02:09:25 2007
State : clean
Active Devices : 1
Working Devices : 1
Failed Devices : 5
Spare Devices : 0
Checksum : 650cfb57 - correct
Events : 0.440055
Layout : left-symmetric
Chunk Size : 64K
      Number   Major   Minor   RaidDevice   State
this     4       8       49       -1        spare   /dev/sdd1
   0     0       0        0        0        removed
   1     1       0        0        1        faulty removed
   2     2       0        0        2        faulty removed
   3     3       8       81        3        active sync   /dev/.static/dev/sdf1
And the output of cat /proc/mdstat is:
Code:
Personalities : [raid1] [raid5] [raid4]
md0 : inactive sdd1[3] sda1[4](S)
976767872 blocks
unused devices: <none>
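To line up the two superblock dumps above side by side, here is a minimal sketch that pulls out the Events, Update Time, and State fields per device and reports which members are behind the newest event count. The function names are hypothetical, and treating the Events value as dot-separated integers is an assumption about how the 0.90 superblock prints it. As far as I understand it, a member whose event count is behind is what md calls "non-fresh", but take that as my reading rather than a definitive statement.

```python
import re

# Abbreviated from the two superblock dumps above; only the compared fields are kept.
SUPERBLOCKS = """\
/dev/sdc1:
Update Time : Wed Oct 10 22:01:44 2007
State : active
Events : 0.440046
/dev/sda1:
Update Time : Thu Oct 11 02:09:25 2007
State : clean
Events : 0.440055
"""

WANTED = ("Events", "Update Time", "State")


def summarize_superblocks(text):
    """Return {device: {field: value}} for the fields listed in WANTED."""
    members = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"(/dev/\S+):", line)
        if header:
            current = header.group(1)
            members[current] = {}
            continue
        if current is None or ":" not in line:
            continue  # device-table rows and blank lines carry no "key : value"
        key, _, value = line.partition(":")
        if key.strip() in WANTED:
            members[current][key.strip()] = value.strip()
    return members


def events_key(value):
    """Assumption: Events is printed as dot-separated integers, e.g. 0.440046."""
    return tuple(int(part) for part in value.split("."))


def stale_members(members):
    """Devices whose event count is lower than the highest one seen."""
    newest = max(events_key(m["Events"]) for m in members.values())
    return sorted(
        dev for dev, m in members.items() if events_key(m["Events"]) < newest
    )


if __name__ == "__main__":
    members = summarize_superblocks(SUPERBLOCKS)
    for dev, fields in members.items():
        print(dev, fields["Events"], fields["Update Time"], fields["State"])
    print("behind the newest event count:", stale_members(members))
```

With only these two dumps as input, sdc1 is the one that comes out behind sda1, which is consistent with sdc1 being kicked in the dmesg excerpt.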
Each disk seems to have a different idea of what is going on, and I am at a complete loss as to what course I should take at this point. I'm fairly certain that three of the four disks are completely functional. Would it be possible to replace my operating system drive (which is itself starting to get uppity with me), start a fresh operating system install, and build a completely new array using the information already on the disks? I've been planning to do this eventually, but was hoping to do a backup first. Or could I start an array with the three good disks, and then add the fourth and grow it?
If there's any other information that may be helpful, I can provide it. I ran fdisk on all the disks, and it showed valid partitions on every one of them.
It is a short book, but I appreciate you making it all the way through. Any and all suggestions will be taken with extreme gratitude.