Linux - Hardware
This forum is for Hardware issues. Having trouble installing a piece of hardware? Want to know if that peripheral is compatible with Linux?
Hi,
the system:
- Promise SATA-300TX4 (4-port PCI SATA controller)
- 4x Maxtor DiamondMax 10 300GB SATA-II, 3.5", 7200rpm, 16MB cache
- Fedora Core 6 desktop system on a separate disk
During the Fedora setup I configured the 4 drives as a RAID 5 device, without a spare disk.
Since then I have had two problems:
1) While booting, the controller BIOS recognized only 3 of the 4 attached drives; rearranging the cables a few times solved this.
2) I got this message from mdadm as mail to root a couple of times:
A DegradedArray event had been detected on md device /dev/md0
...
The /proc/mdstat file currently contains the following:
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
879148800 blocks level 5, 256k chunk, algorithm 2 [4/3] [UUU_]
unused devices: <none>
Besides that, it worked for a few weeks until yesterday. While copying a file over the network from the Fedora machine to my notebook I got a networking error, and the Linux machine just hung, with no reaction to mouse or keyboard input.
So I rebooted to the login screen and tried to copy again; I could log in via Samba, but after a minute it hung again.
I thought it might be some sort of overheating problem and retried an hour later, with the same result.
The next time I booted, I was dropped to a shell a few seconds after the graphical loading screen appeared:
...
Group00" now active [ok]
Checking filesystems
fsck.ext3: Invalid argument while trying to open /dev/md0 [failed]
*** An error occurred during the file system check.
*** Dropping you to a shell; the system will reboot
*** when you leave the shell.
Give root password for maintenance :
...
# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.03
RAID Level : raid5
Device Size : 293049600 (279.47 GiB 300.0 GB)
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Thu Jun 7 20:36:04 2007
State : active, degraded, Not Started
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256k
UUID : 83f17573:e4194bcf:57741e3e:9ffdca4f
Events : 0.49781
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 17 1 active sync /dev/sdb1
2 8 1 2 active sync /dev/sda1
3 0 0 3 removed
Under /dev/ all the entries sdX and sdX1 are there for a-d, and the controller does recognize all drives on booting.
I need your help with the following:
1) How do I get the system to run again, either without md0 or, better, by forcing mdadm to run it with the 3 remaining drives?
2) Where is the problem, a hardware error... ?
3) How can I rebuild the RAID array after purchasing a new drive, if it is a hardware error?
If there is anything else I should post, just tell me.
Sorry for my English, and thanks for your help!!
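(For reference on question 1: forcing a degraded md array to start with only the surviving members generally looks like the following sketch. The member names are taken from the mdadm output above; verify them on your own system first.)

```shell
# Assemble the array from the three surviving members, forcing a start
# even though one device is missing.
mdadm --assemble --force --run /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1

# Check the result, then attempt the filesystem check on the degraded array.
cat /proc/mdstat
fsck.ext3 /dev/md0
```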
Distribution: suse, opensuse, debian, others for testing
Posts: 307
Rep:
Running with only 3 of 4 disks is NUTS. The more components there are, the more likely a failure of the whole thing becomes, so please do yourself a favour and get it working again with 4 disks for protection against disk failure.
The rebuild is quite easy:
- get a new drive
- create a partition that matches the other ones in size (or is bigger)
- add sdd1 to the array, which starts the rebuild.
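A sketch of those steps, assuming the new drive appears as /dev/sdd (the device name is an assumption; check dmesg after plugging it in):

```shell
# Copy the partition layout from an existing array member to the new drive.
# (Assumes the new disk is /dev/sdd -- double-check before running!)
sfdisk -d /dev/sda | sfdisk /dev/sdd

# Add the new partition to the degraded array; the rebuild starts automatically.
mdadm /dev/md0 --add /dev/sdd1

# Watch the resync progress.
cat /proc/mdstat
```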
That is a bad setup. You should match the number of hard drives to the number of controllers. Using four controllers with one hard drive on each reduces the chance that a controller problem takes out the array. Using one controller with four hard drives connected to it invites problems. Software RAID level 5 is not a smart choice either. I suggest spending the money on a hardware RAID controller.
Promise controllers are not reliable under Linux.
RAID-5 is not there to prevent data loss. It is there to give you more time to do backups. Back up the data after the array reconstructs it.
Hi,
thanks for your replies. I already managed to reassemble the array, but at least some of the files stored on it are now corrupted. That isn't too bad, because I was just testing and had no important stuff on it.
But I'm not quite satisfied with the RAID 5 solution, so I'll try another one.
My problem now is how to get rid of the auto-assembly at boot, to free up my drives.
I already erased the corresponding line in the /etc/mdadm config so that it looks like this:
DEVICE partitions
MAILADDR root
and removed md0 from the /etc/fstab file, but after a reboot I still get
/dev/sda1 is apparently in use by the system..
and
mdadm -D /dev/md0 gives :
/dev/md0:
Version : 00.90.03
RAID Level : raid5
Raid Devices : 4
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sat Jun 9 22:09:03 2007
State : clean, degraded
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 256k
UUID : 83f17573:e4194bcf:57741e3e:9ffdca4f
Events : 0.50134
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 17 1 active sync /dev/sdb1
2 8 1 2 active sync /dev/sda1
3 0 0 3 removed
So it is still started on boot?
How do I completely erase md0?
The partition type of sd[abcd]1 is most likely set to "fd" (= Linux raid autodetect); that's why it shows up again after a reboot. So use fdisk to set the partition type to 83 (= Linux). You should also erase the RAID superblocks with:
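The command this points at is presumably mdadm's superblock zeroing; a sketch, using the member names from the mdadm output above (stop the array first, and double-check the device names, since wiping a superblock is irreversible):

```shell
# Stop the array if it is still assembled.
mdadm --stop /dev/md0

# Wipe the md superblock from each former member partition so nothing
# auto-assembles it at the next boot.
mdadm --zero-superblock /dev/sda1 /dev/sdb1 /dev/sdc1
```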