08-08-2005, 07:56 PM | #1
Member
Registered: Jun 2005
Distribution: Centos
Posts: 215
Mdadm Raid question
I had a server crash overnight, and I'm still struggling to find the cause as the logs don't tell me anything.
This machine has software RAID set up, and I started querying the RAID config.
I'm new to mdadm and I didn't set it up originally, but I get the following.
Maybe someone with mdadm skills will be able to help out.
Standard disk info stuff
--------------------------------
# fdisk -l
Disk /dev/hdc: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Device Boot Start End Blocks Id System
/dev/hdc1 * 1 207 104296+ fd Linux raid autodetect
/dev/hdc2 208 2312 1060920 fd Linux raid autodetect
/dev/hdc3 2313 155061 76985496 fd Linux raid autodetect
Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hda1 * 1 13 104391 fd Linux raid autodetect
/dev/hda2 14 145 1060290 fd Linux raid autodetect
/dev/hda3 146 9729 76983480 fd Linux raid autodetect
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/md2 73G 50G 19G 73% /
/dev/md0 99M 19M 75M 21% /boot
none 503M 0 503M 0% /dev/shm
------------------------------------------------------------------
But when querying the array using
# mdadm -E /dev/hdc1
/dev/hdc1:
Magic : a92b4efc
Version : 00.90.00
UUID : 7f5b6639:a36365fd:def88079:b731abb5
Creation Time : Mon Dec 15 01:27:33 2003
Raid Level : raid1
Device Size : 104192 (101.75 MiB 106.69 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Update Time : Tue Aug 9 17:38:38 2005
State : dirty, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Checksum : e4ebb016 - correct
Events : 0.57
Number Major Minor RaidDevice State
this 1 22 1 1 active sync /dev/hdc1
0 0 0 0 0 faulty removed
1 1 22 1 1 active sync /dev/hdc1
---
One of the devices is listed as faulty removed!
I only get this when querying hdc; querying the hda partitions shows everything as active sync.
Do I have a problem?
Thx
08-09-2005, 03:36 PM | #2
Member
Registered: Aug 2003
Location: Edinburgh
Distribution: Server: Gentoo2004; Desktop: Ubuntu
Posts: 720
To examine an array, you should do:
mdadm --detail /dev/md0
This shows you the full details of what has been removed, etc.
If hda is damaged, you want to replace it ASAP.
1. go and buy a new hard drive of the same size (it simplifies everything).
2. make the partitions the same size as they used to be on the old hda
3. run this command to add the partition to the array:
mdadm /dev/md0 --add /dev/hda1
(change md0 and hda1 as required.)
The hard drives will now spend time resyncing.
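A rough sketch of the whole sequence, assuming hda is the disk being replaced and hdc is the surviving mirror (the device and md names here are only examples, adjust them to your own layout):
# sfdisk -d /dev/hdc | sfdisk /dev/hda
(copies the partition table from the good disk onto the new one)
# mdadm /dev/md0 --add /dev/hda1
# mdadm /dev/md1 --add /dev/hda2
# mdadm /dev/md2 --add /dev/hda3
(re-adds each new partition to its array)
# cat /proc/mdstat
(shows the resync progress)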
if you do :
cat /proc/mdstat
it might look like:
root@hamishnet:/# cat /proc/mdstat
Personalities : [raid1]
md1 : active raid1 hde3[0]
20015744 blocks [2/1] [U_]
md0 : active raid1 hde1[1] hda5[0]
19542976 blocks [2/2] [UU]
In mine, the md0 array is good (indicated by "[UU]"); however, my md1 array has one drive missing.
hamish
08-09-2005, 09:41 PM | #3
Member
Registered: Jun 2005
Distribution: Centos
Posts: 215
Original Poster
Thx,
When I do
# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hdc3[1]
76983360 blocks [2/1] [_U]
md1 : active raid1 hdc2[1]
1060224 blocks [2/1] [_U]
md0 : active raid1 hdc1[1]
104192 blocks [2/1] [_U]
unused devices: <none>
So is this OK or not?
Whenever I run the following I always see that "faulty removed" entry, and I don't know whether that's normal or not.
# mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.00
Creation Time : Mon Dec 15 01:26:32 2003
Raid Level : raid1
Array Size : 76983360 (73.42 GiB 78.83 GB)
Device Size : 76983360 (73.42 GiB 78.83 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Tue Aug 9 17:38:38 2005
State : dirty, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 0 0 0 faulty removed
1 22 3 1 active sync /dev/hdc3
UUID : 7b337624:a23aad65:c485d413:28f65fbb
Events : 0.72
08-10-2005, 01:21 AM | #4
Senior Member
Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many. Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606
Hi,
Newbie here, following this with interest (I've got a RAID 0 at home working OK,
I know, bad idea). Anyway:
/dev/hdb is missing from your fdisk -l.
That hard drive might be dead, then!
Can you please post your mdadm.conf?
- How do you know the HD controller is not faulty?
- Have you got SMART-capable drives? If smartmontools is set up, the log might have recorded
signs of the failure, and you can also query the drive directly.
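For example (assuming the smartmontools package is installed; the device name is just an example):
# smartctl -H /dev/hda
(overall health self-assessment)
# smartctl -a /dev/hda
(full SMART attributes and the drive's error log)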
Hamish: how can stefaandk physically identify which of the two HDs has failed
(other than by trial and error)?
A stab in the dark: did you try to restart the array with mdadm (from the command line)
to see what mdadm says?
08-10-2005, 09:15 PM | #5
Member
Registered: Jun 2005
Distribution: Centos
Posts: 215
Original Poster
It doesn't seem like the mdadm.conf file is being used:
Quote:
# more /etc/mdadm.conf
# mdadm configuration file
#
# mdadm will function properly without the use of a configuration file,
# but this file is useful for keeping track of arrays and member disks.
# In general, a mdadm.conf file is created, and updated, after arrays
# are created. This is the opposite behavior of /etc/raidtab which is
# created prior to array construction.
#
#
# the config file takes two types of lines:
#
# DEVICE lines specify a list of devices of where to look for
# potential member disks
#
# ARRAY lines specify information about how to identify arrays so
# so that they can be activated
#
# You can have more than one device line and use wild cards. The first
# example includes SCSI the first partition of SCSI disks /dev/sdb,
# /dev/sdc, /dev/sdd, /dev/sdj, /dev/sdk, and /dev/sdl. The second
# line looks for array slices on IDE disks.
#
#DEVICE /dev/sd[bcdjkl]1
#DEVICE /dev/hda1 /dev/hdb1
#
# If you mount devfs on /dev, then a suitable way to list all devices is:
#DEVICE /dev/discs/*/*
#
#
#
# ARRAY lines specify an array to assemble and a method of identification.
# Arrays can currently be identified by using a UUID, superblock minor number,
# or a listing of devices.
#
# super-minor is usually the minor number of the metadevice
# UUID is the Universally Unique Identifier for the array
# Each can be obtained using
#
# mdadm -D <md>
#
#ARRAY /dev/md0 UUID=3aaa0122:29827cfa:5331ad66:ca767371
#ARRAY /dev/md1 super-minor=1
#ARRAY /dev/md2 devices=/dev/hda1,/dev/hda2
#
# ARRAY lines can also specify a "spare-group" for each array. mdadm --monitor
# will then move a spare between arrays in a spare-group if one array has a failed
# drive but no spare
#ARRAY /dev/md4 uuid=b23f3c6d:aec43a9f:fd65db85:369432df spare-group=group1
#ARRAY /dev/md5 uuid=19464854:03f71b1b:e0df2edd:246cc977 spare-group=group1
#
# When used in --follow (aka --monitor) mode, mdadm needs a
# mail address and/or a program. This can be given with "mailaddr"
# and "program" lines to that monitoring can be started using
# mdadm --follow --scan & echo $! > /var/run/mdadm
# If the lines are not found, mdadm will exit quietly
#PROGRAM /usr/sbin/handle-mdadm-events
I basically inherited this system, so I'm trying to make sense of its RAID config.
Since I have no prior experience with mdadm, I don't want to start putting in commands that could potentially blow up this RAID.
How would I manually try to start hdb?
But if there were an hdb in this config, would that mean the mirror was across 3 disks?
08-11-2005, 01:30 AM | #6
Senior Member
Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many. Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606
What is the output of
mdadm --detail /dev/md2
Have you got the contents of
/etc/raidtab
Btw, which distro have you got?
I am pretty sure /dev/hda and hdc are both part of the RAID.
You can have RAID with 3 HDs (but I suppose it would say RAID 5 then).
Forget about me asking about hdb, it just sounded strange, but
it is possible to have a RAID across just hda and hdc.
You are not necessarily using mdadm at the minute; there is also
an older set of utilities called mdtools (I think?).
You have [_U]: one of your hard drives is malfunctioning or dead.
But because you have a RAID 1 (mirror) setup, the system still works.
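One non-destructive check (just a suggestion; the partition names are taken from your fdisk output): read the RAID superblock on the matching hda and hdc partitions and compare them:
# mdadm -E /dev/hda3
# mdadm -E /dev/hdc3
If hda is failing you may get read errors, or its superblock will show an older Update Time / Events count than hdc's.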
08-11-2005, 01:55 AM | #7
Member
Registered: Jun 2005
Distribution: Centos
Posts: 215
Original Poster
This is on a RedHat 9 box.
So does the _U tell me with certainty that one of my drives is dead?
Because with fdisk -l I get:
Disk /dev/hdc: 80.0 GB, 80026361856 bytes
16 heads, 63 sectors/track, 155061 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
Device Boot Start End Blocks Id System
/dev/hdc1 * 1 207 104296+ fd Linux raid autodetect
/dev/hdc2 208 2312 1060920 fd Linux raid autodetect
/dev/hdc3 2313 155061 76985496 fd Linux raid autodetect
Disk /dev/hda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System
/dev/hda1 * 1 13 104391 fd Linux raid autodetect
/dev/hda2 14 145 1060290 fd Linux raid autodetect
/dev/hda3 146 9729 76983480 fd Linux raid autodetect
It seems that both disks are there, or does fdisk still show a disk even when it's dead, because of the RAID?
Here is the other output you asked for:
# mdadm --detail /dev/md2
/dev/md2:
Version : 00.90.00
Creation Time : Mon Dec 15 01:26:32 2003
Raid Level : raid1
Array Size : 76983360 (73.42 GiB 78.83 GB)
Device Size : 76983360 (73.42 GiB 78.83 GB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 2
Persistence : Superblock is persistent
Update Time : Tue Aug 9 17:38:38 2005
State : dirty, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Number Major Minor RaidDevice State
0 0 0 0 faulty removed
1 22 3 1 active sync /dev/hdc3
UUID : 7b337624:a23aad65:c485d413:28f65fbb
Events : 0.72
----------------
# more /etc/raidtab
raiddev /dev/md2
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda3
raid-disk 0
device /dev/hdc3
raid-disk 1
raiddev /dev/md0
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda1
raid-disk 0
device /dev/hdc1
raid-disk 1
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
chunk-size 64k
persistent-superblock 1
nr-spare-disks 0
device /dev/hda2
raid-disk 0
device /dev/hdc2
raid-disk 1
--------
08-11-2005, 02:22 AM | #8
Senior Member
Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many. Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606
Re [_U]:
Looks like it, yes (but I am like you, is this 100% sure?).
You might want to do some backups first and then try
to restart the RAID with some of the mdtools
rather than mdadm. I heard mdadm is "better" and I use it,
but then you will need to edit mdadm.conf.
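If the box really is managed with the old raidtools package (what I called mdtools), the commands would be something like this sketch (assuming raidtools is installed and /etc/raidtab describes the arrays):
# raidstart /dev/md2
(starts an array defined in /etc/raidtab)
# raidhotadd /dev/md2 /dev/hda3
(re-adds a member partition once the disk is known good or replaced)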
I don't have enough knowledge to say why fdisk still sees both HDs.
Maybe one of them is not that badly damaged?
An example
http://aplawrence.com/Linux/rebuildraid.html
I have never rebuilt an array myself (and cannot, because I have RAID 0).
A generic piece of info
http://gentoo-wiki.com/HOWTO_Gentoo_..._Software_RAID
Maybe you could try to plug each HD on its own and reboot
(I have no idea of the possible consequences of that)
08-11-2005, 10:12 AM | #9
Member
Registered: Aug 2003
Location: Edinburgh
Distribution: Server: Gentoo2004; Desktop: Ubuntu
Posts: 720
[_U] means that one of the disks is broken.
The above means that the first drive in the array is unavailable. [U_] means that the second HDD is unavailable.
I believe that trial and error is the only way to find out. You are right in thinking that doing fdisk -l will give you an indication of which one is broken. If you do that and see that hdb is not listed in fdisk -l, then you can open up the PC and see if hdb is in fact a hard drive.
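Another hint (just a suggestion): the kernel log usually records IDE errors from a dying drive, so you can grep it for the suspect device, e.g.:
# dmesg | grep -i hda
# grep hda /var/log/messages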
raidtab has nothing to do with mdadm; it belongs to the older raidtools package. They are two different tools for doing the same thing; raidtools is older, and mdadm is becoming more popular.
You will find that /etc/mdadm.conf is probably unused. I have never used it; in fact, I didn't know it existed!
best of luck
08-12-2005, 12:50 AM | #10
Senior Member
Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many. Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606
I suppose one can do without mdadm.conf while using mdadm with some scripts,
and this will depend on the distro.
On my distro the RAID arrays are started automatically, and I think mdadm
takes the info it needs from mdadm.conf (which I configured by hand).
My point was that possibly mdtools was used by stefaandk's legacy system
rather than mdadm.
It must be said that indeed stefaandk can use either mdtools or mdadm.
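As an aside, if you ever want to populate mdadm.conf from the running arrays, mdadm can print the ARRAY lines itself; a minimal sketch (check the output before appending it to the file):
# mdadm --detail --scan
You would then add those ARRAY lines, plus a DEVICE line, to /etc/mdadm.conf.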
08-15-2005, 07:47 PM | #11
Member
Registered: Jun 2005
Distribution: Centos
Posts: 215
Original Poster
Thanks for all the help with this, guys. It was indeed a faulty drive; I had it replaced and it's all good now!
08-16-2005, 02:11 AM | #13
Senior Member
Registered: Nov 2004
Distribution: Mandriva mostly, vector 5.1, tried many. Suse gone from HD because bad Novell/Zinblows agreement
Posts: 1,606
Glad to know you are sorted. Hope you have learned about RAID in the process :-)