[SOLVED] Software RAID failing?

pi314 · 07-16-2009, 04:43 AM

Hello everyone!

I've been reading these forums for a long time and have found them really useful.
Now I have a problem and would love some help. Thanks in advance!

I have a PC at home running Ubuntu Server 7.10 as a server.
It has 5 hard disks:

1 PATA/IDE disk (system + swap)
4 SATA disks (software RAID 5 array of all 4 disks)

When I turn on the server, I see:

Code:

fsck.ext3: Unable to resolve 'UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f'
fsck died with exit status 8
* File system check failed
Please repair the file system manually

This is what "fdisk -l" shows:

Code:

Disk /dev/hda: 120.0 GB, 120000000000 bytes
255 heads, 63 sectors/track, 14589 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xdc08dc08

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1       13991   112382676   83  Linux
/dev/hda2           13992       14589     4803435    5  Extended
/dev/hda5           13992       14589     4803403+  82  Linux swap / Solaris

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00003b91

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005978d

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000077ee

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00098808

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       60801   488384001   fd  Linux raid autodetect

md0 is missing!

Fstab:

Code:

# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# /dev/hda1
#UUID=b2e37da0-e59b-41ff-9f14-f09cd75e4cb8 /               ext3    defaults,errors=remount-ro 0       1

# /dev/md0
UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /media/raid     ext3    defaults        0       2

# /dev/hda5
UUID=321f3c6b-19e1-4feb-9c27-8046d30188c1 none            swap    sw              0       0
/dev/hdb        /media/cdrom0   udf,iso9660 user,noauto,exec 0       0

I tried changing this line in fstab:
UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /media/raid ext3 defaults 0 2
to:
/dev/md0 /media/raid ext3 defaults 0 2

Then running "mount -a" (or rebooting) gives:

Code:

mount: special device /dev/disk/by-uuid/2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f does not exist
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Could it be a RAID failure?
Do I need to create the array again?

Any kind of help will be really appreciated.
Thank you very much.

eco · 07-16-2009, 07:15 AM

Hi,

At first glance it doesn't look like it's a RAID problem. Have you done any system changes, upgrades, ...?

Have a look in /dev/disk/by-uuid to see if you can find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f, and check whether /dev/md0 or /dev/md/0 still exists.

Also try the following two commands, just in case:

Code:

# cat /proc/mdstat
# mdadm --detail /dev/md0

I suspect the system lost the UUID or that it changed. Maybe all you need is to add the new UUID to your RAID config file, but get more info before making any changes. You can really make things worse if you are not very careful.

I also noticed your '/' was commented out in your fstab!

Any logs?

pi314 · 07-16-2009, 08:54 AM

Quote:

Originally Posted by eco

Hi,

At first glance it doesn't look like it's a RAID problem. Have you done any system changes, upgrades, ...?

No changes.

Quote:

Originally Posted by eco

Hi,
Have a look in /dev/disk/by-uuid to see if you can find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f, and check whether /dev/md0 or /dev/md/0 still exists.

Can't find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f. /dev/md0 does exist.

ls /dev/disk/by-uuid/ -l

Code:

total 0
lrwxrwxrwx 1 root root 10 2009-07-16 11:38 321f3c6b-19e1-4feb-9c27-8046d30188c1 -> ../../hda5
lrwxrwxrwx 1 root root  9 2009-07-16 11:38 ae4a00f0-52b4-4df7-9f2b-71e20fcf25de -> ../../sdc
lrwxrwxrwx 1 root root 10 2009-07-16 11:38 b2e37da0-e59b-41ff-9f14-f09cd75e4cb8 -> ../../hda1

Why only these three? What about sda, sdb, sdd...? Is that normal?

cat /proc/mdstat:

Code:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdc1[2](S) sdd1[3](S) sdb1[1](S) sda1[0](S)
      1953535744 blocks

mdadm --detail /dev/md0:

Code:

mdadm: md device /dev/md0 does not appear to be active.
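In /proc/mdstat, "(S)" after a device marks it as a spare. Here md0 is "inactive" and all four members are marked (S), which most likely means the kernel found the RAID partitions but never started the array. With the array stopped there is no ext3 filesystem on /dev/md0 to read, which would fit both the missing by-uuid link and the "bad superblock" error from mount. As a sketch, a hypothetical check_md_arrays helper (not part of mdadm; it assumes awk and the mdstat line layout shown above) could flag this state from a script or cron job:

```shell
#!/bin/sh
# check_md_arrays [FILE]: read /proc/mdstat (or FILE, e.g. a saved copy) and
# report every md array the kernel lists as inactive, together with how many
# of its member devices are flagged (S) = spare.
# Hypothetical helper; it relies on lines of the form
#   md0 : inactive sdc1[2](S) sdd1[3](S) ...
check_md_arrays() {
    awk '
        $2 == ":" && $3 == "inactive" {
            spares = 0
            # Fields 4..NF are the member devices.
            for (i = 4; i <= NF; i++)
                if ($i ~ /\(S\)$/) spares++
            printf "%s inactive: %d of %d members marked (S)\n", $1, spares, NF - 3
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the mdstat output above, it would print one line for md0 reporting all four members as spares; on a healthy box it prints nothing.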

Quote:

Originally Posted by eco

I also noticed your '/' was commented out in your fstab!

You're right. How could it work like that? It's no longer commented out.

Thank you very much for your help.

eco · 07-16-2009, 09:16 AM

Hi again, and sorry for the late reply,

Well, you can start by adding a link to md0 and seeing if that helps (remember to set the RAID line in fstab back to what it was):

# cd /dev/disk/by-uuid
# ln -s 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /dev/md0

Then try 'mount -a' and see if that helps.

I think your RAID is fine but something happened with the system.

If this doesn't fix it we can dig deeper.

Best of luck.

pi314 · 07-16-2009, 11:39 AM

This gives an error:
ln -s 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /dev/md0

It's like this, isn't it?
ln -s /dev/md0 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f

mount -a:

Code:

mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

Thanks a lot.

eco · 07-17-2009, 04:29 AM

Sorry, I always get those muddled up.
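For the record, ln -s takes the existing target first and the new link name second (ln -s TARGET LINK_NAME), so from inside /dev/disk/by-uuid the working form is pi314's ln -s /dev/md0 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f. The sketch below demonstrates the order in a scratch directory made with mktemp rather than in /dev. Note that the link only fixes the name lookup: as the mount -a output in the reply above shows, /dev/md0 itself still could not be mounted. Links created by hand under /dev are also typically recreated by udev at boot, so they do not persist.

```shell
#!/bin/sh
# ln -s TARGET LINK_NAME: the existing thing first, the new link's name second.
# A scratch directory stands in for /dev/disk/by-uuid here.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

# Create a link named after the filesystem UUID that points at /dev/md0.
# (The target does not have to exist for ln -s to succeed.)
ln -s /dev/md0 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f

# readlink shows where the link points, i.e. /dev/md0.
readlink 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f
```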

If you have the space, I'd back up all the RAID disks using dd and then try to force a rebuild of the RAID. But seriously, back them up before you make any changes.

Didn't you have the data backed up before the 'failure'?

Got to go now... got a call, sorry.

pi314 · 07-17-2009, 06:01 AM

I was setting up an incremental remote backup, but it wasn't working yet, so I have no backups! Always the same...

Can I make exact 1:1 copies of the disks? With dd?

Thanks for all your efforts.

pi314 · 07-18-2009, 03:12 AM

Answering my own question:

http://wiki.linuxquestions.org/wiki/Dd

And if you want to create a compressed image:

Code:

dd if=/dev/hdx | gzip > /path/to/image.gz
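A sketch of the full round trip, assuming GNU dd, gzip and gunzip; backup_disk and restore_disk are hypothetical helper names, and the device and image paths are placeholders. Restoring simply reverses the pipe. The bs=1M option is an addition to the command above: dd's default block size is 512 bytes, and a larger block size is usually noticeably faster, which matters at 500 GB per disk.

```shell
#!/bin/sh
# Sketch: image a whole disk into a gzip-compressed file and restore it later.
# Device and file names are placeholders; run as root, and make sure the
# source is not mounted read-write while it is being imaged.

# backup_disk SRC IMAGE: copy SRC (e.g. /dev/sda) byte for byte into IMAGE, compressed.
backup_disk() {
    src=$1
    image=$2
    # bs=1M: read 1 MiB at a time instead of dd's 512-byte default.
    dd if="$src" bs=1M | gzip -c > "$image"
}

# restore_disk IMAGE DST: decompress IMAGE and write it back onto DST.
# WARNING: this overwrites everything on DST, so double-check the target device.
restore_disk() {
    image=$1
    dst=$2
    gunzip -c "$image" | dd of="$dst" bs=1M
}
```

For example, backup_disk /dev/sda /mnt/backup/sda.img.gz and later restore_disk /mnt/backup/sda.img.gz /dev/sda, where /mnt/backup is a hypothetical mount point on a disk outside the array. Imaging the whole disk (/dev/sda rather than /dev/sda1) also keeps the partition table.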

pi314 · 07-19-2009, 02:31 PM

It's taking about 8 hours to back up and compress each 500 GB hard disk. Once that's done, I'll try some dirty work...

eco · 07-20-2009, 03:22 AM

Sorry, was away on a short holiday...

dd is slow, but it's the best way to make sure you have an exact copy of your disks in case of failure. Be sure you know which image goes back to which disk in the RAID.

I still think the problem is with the system and not the RAID software. Are you sure no changes were made? Had the box ever been rebooted since the RAID was built? Does dmesg say anything more?

A long-winded approach would be to recreate an identical RAID, then dump the data back onto each disk and hope for the best.

pi314 · 07-20-2009, 04:48 AM

I hope the 4 disks in the array are OK and that I can restore the data.

Should I make a backup of the system disk too?
If I do, I think it should be unmounted, right?
I could use a live CD.
By the way, I didn't unmount anything while making backups of the array disks. Is that OK?

Thank you very much.

eco · 07-21-2009, 02:52 AM

Well, the disks of the array were never started as RAID disks, so I can't see this being a problem. For the system disk you're right: boot off a live CD and make the image. What I tend to do for testing is create a VM in, say, VirtualBox with a disk the same size as the OS disk, dd the image back onto the VM's disk, boot the VM to check that it all works, and make the changes there.

Also make a copy of all the info about your RAID, since you will want the recreated array to be identical, and make sure you don't write to the RAID disks when you recreate it.
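One way to capture that information before touching the disks, assuming mdadm is installed; save_raid_info and the output directory are hypothetical names. Everything here only reads from the member disks: mdadm --examine prints each member's superblock (RAID level, chunk size, device role, event count), which is what you need to recreate an identical array. Keep the output somewhere other than the RAID disks.

```shell
#!/bin/sh
# save_raid_info OUTDIR MEMBER...: record the array layout before any changes.
# Read-only with respect to the disks; needs root for mdadm on real devices.
save_raid_info() {
    outdir=$1
    shift
    mkdir -p "$outdir"
    # The kernel's current view of all md arrays.
    cat /proc/mdstat > "$outdir/mdstat.txt" 2>&1
    # One ARRAY line per array found in the member superblocks.
    mdadm --examine --scan > "$outdir/examine-scan.txt" 2>&1
    # Full superblock of every member: level, chunk size, device role, events.
    for dev in "$@"; do
        mdadm --examine "$dev" > "$outdir/examine-$(basename "$dev").txt" 2>&1
    done
}
```

For this box that would be something like save_raid_info /root/raid-info /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1, followed by copying /root/raid-info off the machine.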

Backups are your friends... :\

pi314 · 07-25-2009, 10:17 AM

Finally, after making backups of every disk, I installed Debian instead of Ubuntu.
During installation I rebuilt the RAID array, because it looked damaged (a RAID 5 of 2 disks; it should have had 4).

After installing Debian, everything is fine. rsnapshot is already making backups every 4 hours, every day, every week, etc., to avoid future problems.

Thank you very much for helping me.