Mdadm Raid question
I had a server crash overnight and I'm still struggling to find the cause; the logs don't tell me anything.
This machine has software RAID set up, so I started querying the RAID config. I'm new to mdadm and didn't set it up originally, but here is what I get. Maybe someone with mdadm skills will be able to help out.

Standard disk info stuff:

# fdisk -l

Disk /dev/hdc: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1   *         1       207    104296+  fd  Linux raid autodetect
/dev/hdc2           208      2312   1060920   fd  Linux raid autodetect
/dev/hdc3          2313    155061  76985496   fd  Linux raid autodetect

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   fd  Linux raid autodetect
/dev/hda2            14       145   1060290   fd  Linux raid autodetect
/dev/hda3           146      9729  76983480   fd  Linux raid autodetect

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md2               73G   50G   19G  73% /
/dev/md0               99M   19M   75M  21% /boot
none                  503M     0  503M   0% /dev/shm

But when querying the array:

# mdadm -E /dev/hdc1
/dev/hdc1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 7f5b6639:a36365fd:def88079:b731abb5
  Creation Time : Mon Dec 15 01:27:33 2003
     Raid Level : raid1
    Device Size : 104192 (101.75 MiB 106.69 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Update Time : Tue Aug  9 17:38:38 2005
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0
       Checksum : e4ebb016 - correct
         Events : 0.57

      Number   Major   Minor   RaidDevice State
this     1      22        1        1      active sync   /dev/hdc1
   0     0       0        0        0      faulty removed
   1     1      22        1        1      active sync   /dev/hdc1

One of the devices is listed as "faulty removed"! I only get this when querying hdc; the hda partitions are all active sync. Do I have a problem? Thx |
To examine an array, you should do:
mdadm --detail /dev/md0

This shows you the full details of what has been removed, etc. If hda is damaged, you want to replace it ASAP:

1. Go and buy a new hard drive of the same size (it simplifies everything).
2. Make the partitions the same size as they used to be on the old hda.
3. Run this command to add the partition to the array:

   mdadm /dev/md0 --add /dev/hda1

   (change md0 and hda1 as required.)

The hard drives will then spend some time resyncing. If you do:

cat /proc/mdstat

it might look like:

root@hamishnet:/# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hde3[0]
      20015744 blocks [2/1] [U_]

md0 : active raid1 hde1[1] hda5[0]
      19542976 blocks [2/2] [UU]

In mine, the md0 array is good (indicated by "[UU]"), but my md1 array has one drive missing.

hamish |
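Hamish's three steps might be sketched as follows. All device names (/dev/hda, /dev/hdc, md0..md2) are examples taken from this thread, not a prescription, and every command is only echoed via a `run` wrapper so nothing is modified until you remove the echo and run it yourself as root:

```shell
# Sketch of the drive-replacement procedure described above. Device
# names are examples from this thread; adjust them to your own layout.
# Commands are only echoed -- nothing is executed as written.
run() { echo "would run: $*"; }

# 1. (after fitting the new drive) copy the partition table from the
#    surviving disk to the replacement: sfdisk -d dumps a table that
#    sfdisk can replay onto another disk of the same size
run "sfdisk -d /dev/hdc | sfdisk /dev/hda"

# 2. add each rebuilt partition back into its mirror
run mdadm /dev/md0 --add /dev/hda1
run mdadm /dev/md1 --add /dev/hda2
run mdadm /dev/md2 --add /dev/hda3

# 3. watch the resync progress
run cat /proc/mdstat
```

The dry-run wrapper is just a safety habit for commands like these, where a transposed device name can destroy the surviving half of the mirror.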
Thx,
When I do:

# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdc3[1]
      76983360 blocks [2/1] [_U]

md1 : active raid1 hdc2[1]
      1060224 blocks [2/1] [_U]

md0 : active raid1 hdc1[1]
      104192 blocks [2/1] [_U]

unused devices: <none>

So is this OK or not? Whenever I do the following I always see that "faulty removed" and I don't know whether that's normal or not:

# mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.00
  Creation Time : Mon Dec 15 01:26:32 2003
     Raid Level : raid1
     Array Size : 76983360 (73.42 GiB 78.83 GB)
    Device Size : 76983360 (73.42 GiB 78.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Tue Aug  9 17:38:38 2005
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      faulty removed
       1      22        3        1      active sync   /dev/hdc3
           UUID : 7b337624:a23aad65:c485d413:28f65fbb
         Events : 0.72 |
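For what it's worth, the degraded state shown in /proc/mdstat can be checked mechanically: any `_` inside the `[UU]`-style status field means a missing member. A small sketch (the function name `check_md_status` is my own invention; it just parses mdstat-style text from stdin):

```shell
# check_md_status: read /proc/mdstat-style text on stdin and print, for
# each md array, "ok" or "DEGRADED" based on its [UU]/[_U] status field.
# A sketch only; the function name is invented for illustration.
check_md_status() {
    awk '
        /^md[0-9]/ { array = $1 }            # remember the array name
        {
            # a status field looks like [UU] or [_U]; "_" = missing member
            if (match($0, /\[[U_]+\]/)) {
                status = substr($0, RSTART, RLENGTH)
                if (status ~ /_/) print array " DEGRADED " status
                else              print array " ok " status
            }
        }
    '
}
```

Running `check_md_status < /proc/mdstat` against output like the above would flag md0, md1 and md2 as degraded.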
Hi,
Newbie here, following this with interest (I've got a RAID 0 at home working OK; I know, bad idea). Anyway:

/dev/hdb is missing from your fdisk -l. That hard drive might be dead then! Can you please post your mdadm.conf?

- How do you know the HD controller is not faulty?
- Have you got so-called SMART hard drives? The log might have recorded signs of failure via smartmontools.

Hamish: how can stefaandk physically identify which of the two drives has failed (other than by trial and error)?

A stab in the dark: did you try to restart the array with mdadm (from a CLI) to see what mdadm says? |
It doesn't seem like the mdadm.conf file is being used
I basically inherited this system, so I'm trying to make sense of its RAID config. Since I have no prior experience with mdadm, I don't want to start putting in commands that could potentially blow up this RAID.

How would I manually try to start hdb? And if there was an hdb in this config, would that mean there was a mirror across 3 disks? |
what is the output of
mdadm --detail /dev/md2

Have you got the output of /etc/raidtab? By the way, which distro have you got? I am pretty sure /dev/hda and hdc are part of the raid. You can have raid with 3 hard drives (but I suppose it would say raid 5 then). Forget about me asking about hdb; it just sounded strange, but it is possible to have a raid across hda and hdc.

You are not necessarily using mdadm at the minute; there is an older series of utilities called mdtools (I think?).

You have [_U]. One of your hard drives is malfunctioning or dead, but because you have a raid1 system (mirror), the system still works. |
This is on a RedHat 9 box.
So the _U tells me with certainty that one of my drives is dead? Because with fdisk -l I get:

Disk /dev/hdc: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1   *         1       207    104296+  fd  Linux raid autodetect
/dev/hdc2           208      2312   1060920   fd  Linux raid autodetect
/dev/hdc3          2313    155061  76985496   fd  Linux raid autodetect

Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   fd  Linux raid autodetect
/dev/hda2            14       145   1060290   fd  Linux raid autodetect
/dev/hda3           146      9729  76983480   fd  Linux raid autodetect

It seems there are two disks there. Or does this show even if the disk is dead, because of the raid?

Here are the other commands you asked for:

# mdadm --detail /dev/md2
/dev/md2:
        Version : 00.90.00
  Creation Time : Mon Dec 15 01:26:32 2003
     Raid Level : raid1
     Array Size : 76983360 (73.42 GiB 78.83 GB)
    Device Size : 76983360 (73.42 GiB 78.83 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 2
    Persistence : Superblock is persistent
    Update Time : Tue Aug  9 17:38:38 2005
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      faulty removed
       1      22        3        1      active sync   /dev/hdc3
           UUID : 7b337624:a23aad65:c485d413:28f65fbb
         Events : 0.72

# more /etc/raidtab
raiddev             /dev/md2
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda3
    raid-disk               0
    device                  /dev/hdc3
    raid-disk               1
raiddev             /dev/md0
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda1
    raid-disk               0
    device                  /dev/hdc1
    raid-disk               1
raiddev             /dev/md1
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdc2
    raid-disk               1 |
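Since a raidtab like that spells out the intended layout, one way to compare it with reality is to extract the expected members per array and check them against what fdisk and mdadm actually report. A sketch (the function name is invented; it parses raidtab-style text from stdin):

```shell
# list_raidtab_members: read /etc/raidtab-style text on stdin and print
# one "array member" pair per line. The function name is made up for
# this sketch; raidtab's format itself is as shown in the thread.
list_raidtab_members() {
    awk '
        $1 == "raiddev" { array = $2 }          # e.g. /dev/md2
        $1 == "device"  { print array " " $2 }  # e.g. /dev/hda3
    '
}
```

Comparing `list_raidtab_members < /etc/raidtab` with the devices mdadm --detail lists as "active sync" shows which expected member has dropped out.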
re [_U]
Looks like it, yes (but I am like you: is this 100% sure?). You might want to do some backups first and then try to restart the raid with some of the mdtools rather than mdadm. I heard mdadm is "better" and I use it, but then you will need to edit mdadm.conf.

I don't have enough knowledge to see why fdisk still sees both drives. Maybe one of them is not that badly damaged?

An example: http://aplawrence.com/Linux/rebuildraid.html

I have never rebuilt an array myself (and cannot, because I have RAID 0).

A generic piece of info: http://gentoo-wiki.com/HOWTO_Gentoo_..._Software_RAID

Maybe you could try to plug each drive in on its own and reboot (I have no idea of the possible consequences of that). |
[_U] means that one of the disks is broken.
The above means that the first drive in the array is unavailable; [U_] means that the second drive is unavailable.

I believe that trial and error is the only way to find out. You are right in thinking that running fdisk -l will give you an indication of which one is broken. If you do that and see that hdb is not listed in fdisk -l, then you can open up the PC and see whether hdb is in fact a hard drive.

raidtab has nothing to do with mdadm. They are two different packages for doing the same thing. Raidtab is older, and mdadm is becoming more popular. You will find that /etc/mdadm.conf is probably unused. I have never used it; in fact, I didn't know it existed!

Best of luck |
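The underscore-position mapping described above can be expressed as a tiny helper. A sketch (the name `missing_slots` is invented for illustration):

```shell
# missing_slots: given an mdstat status field like "[_U]", print the
# zero-based slot number of each missing member. The helper and its
# name are my own invention, just to illustrate the mapping described
# above ([_U] -> slot 0 missing, [U_] -> slot 1 missing).
missing_slots() {
    chars=$(printf '%s' "$1" | tr -d '[]')   # strip the brackets
    i=0
    while [ "$i" -lt "${#chars}" ]; do       # walk each character
        c=$(printf '%s' "$chars" | cut -c "$((i + 1))")
        if [ "$c" = "_" ]; then
            echo "slot $i missing"
        fi
        i=$((i + 1))
    done
}
```

With two-disk RAID 1, slot 0 and slot 1 correspond to the two raid-disk entries in raidtab (here hda and hdc respectively), which is why [_U] points at hda.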
I suppose one can do without mdadm.conf while using mdadm with some scripts,
and this will depend on the distro. On my distro the RAID array is started automatically, and I think mdadm takes the info it needs from mdadm.conf (which I configured by hand). My point was that possibly mdtools was used by stefaandk's legacy system rather than mdadm. That said, stefaandk can indeed use either mdtools or mdadm. |
Thanks for all the help with this guys, it was indeed a faulty drive and I had it replaced and it's all good now!
|
:)
|
Glad to know you are sorted. Hope you have learned about raid in the process :-)
|