LinuxQuestions.org (/questions/)
-   Linux - Server (http://www.linuxquestions.org/questions/linux-server-73/)
-   -   Software RAID failing? (http://www.linuxquestions.org/questions/linux-server-73/software-raid-failing-740481/)

pi314 07-16-2009 04:43 AM

Software RAID failing?
 
Hello everyone!

I've been reading these forums for a long time and I've found them really useful.
Now I've got a problem and I'd love your help. Thanks in advance!

I have a PC at home working as a server, running Ubuntu Server 7.10.
It has 5 hard disks:
  • 1 PATA/IDE disk (system + swap)
  • 4 SATA disks (software RAID: a 4-disk RAID 5 array)

When the server boots, it shows:

Code:

fsck.ext3: Unable to resolve 'UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f'
fsck died with exit status 8
* File system check failed
Please repair the file system manually


This is what "fdisk -l" shows:

Code:

Disk /dev/hda: 120.0 GB, 120000000000 bytes
255 heads, 63 sectors/track, 14589 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xdc08dc08

  Device Boot      Start        End      Blocks  Id  System
/dev/hda1  *          1      13991  112382676  83  Linux
/dev/hda2          13992      14589    4803435    5  Extended
/dev/hda5          13992      14589    4803403+  82  Linux swap / Solaris

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00003b91

  Device Boot      Start        End      Blocks  Id  System
/dev/sda1              1      60801  488384001  fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005978d

  Device Boot      Start        End      Blocks  Id  System
/dev/sdb1              1      60801  488384001  fd  Linux raid autodetect

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000077ee

  Device Boot      Start        End      Blocks  Id  System
/dev/sdc1              1      60801  488384001  fd  Linux raid autodetect

Disk /dev/sdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00098808

  Device Boot      Start        End      Blocks  Id  System
/dev/sdd1              1      60801  488384001  fd  Linux raid autodetect

md0 is missing!




Fstab:
Code:

# /etc/fstab: static file system information.
#
# <file system> <mount point>  <type>  <options>      <dump>  <pass>
proc            /proc          proc    defaults        0      0
# /dev/hda1
#UUID=b2e37da0-e59b-41ff-9f14-f09cd75e4cb8 /              ext3    defaults,errors=remount-ro 0      1

# /dev/md0
UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /media/raid    ext3    defaults        0      2

# /dev/hda5
UUID=321f3c6b-19e1-4feb-9c27-8046d30188c1 none            swap    sw              0      0
/dev/hdb        /media/cdrom0  udf,iso9660 user,noauto,exec 0      0


I've tried to change this line in fstab:
UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /media/raid ext3 defaults 0 2
to:
/dev/md0 /media/raid ext3 defaults 0 2

Then "mount -a" or rebooting:
Code:

mount: special device /dev/disk/by-uuid/2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f does not exist
mount: wrong fs type, bad option, bad superblock on /dev/md0,
      missing codepage or helper program, or other error
      (could this be the IDE device where you in fact use
      ide-scsi so that sr0 or sda or so is needed?)
      In some cases useful info is found in syslog - try
      dmesg | tail  or so

Could it be a RAID failure?
Do I need to create the array again?

Any kind of help will be really appreciated.
Thank you very much.

eco 07-16-2009 07:15 AM

Hi,

At first glance it doesn't look like a RAID problem. Have you made any system changes, upgrades, etc.?

Have a look in /dev/disk/by-uuid and see if you can find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f, and check whether /dev/md0 or /dev/md/0 still exists.

Also try the following two commands, just in case:
Code:

# cat /proc/mdstat
# mdadm --detail /dev/md0
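
It's also worth checking what UUID (if any) the system currently sees on the array device itself; something like this should show it (blkid ships with a default Ubuntu install, so it should be there):

Code:

# blkid /dev/md0
# ls -l /dev/disk/by-uuid/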

I suspect the system got rid of the UUID or that it changed. Maybe all you need is to add the new UUID to your RAID config file, but get more info before making any changes. You can really make things worse if you are not very careful.
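
If it does come down to refreshing the config, the usual trick is to regenerate the ARRAY line from mdadm itself and put it in /etc/mdadm/mdadm.conf. Just as a reference for later (don't change anything yet), the output looks roughly like this; note that the array UUID used there is not the same thing as the filesystem UUID in your fstab:

Code:

# mdadm --detail --scan
ARRAY /dev/md0 level=raid5 num-devices=4 UUID=xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxxxxxx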

I also noticed your '/' was commented out in your fstab!

Any logs?

pi314 07-16-2009 08:54 AM

Quote:

Originally Posted by eco (Post 3609453)
Hi,

At first glance it doesn't look like it's a RAID problem. Have you done any system changes, upgrades, ...?

No changes.

Quote:

Originally Posted by eco (Post 3609453)
Hi,
Have a look in: /dev/disk/by-uuid and see if you can find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f and look to see if /dev/md0 or /dev/md/0 still exist.

Can't find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f. /dev/md0 does exist.

ls /dev/disk/by-uuid/ -l
Code:

total 0
lrwxrwxrwx 1 root root 10 2009-07-16 11:38 321f3c6b-19e1-4feb-9c27-8046d30188c1 -> ../../hda5
lrwxrwxrwx 1 root root  9 2009-07-16 11:38 ae4a00f0-52b4-4df7-9f2b-71e20fcf25de -> ../../sdc
lrwxrwxrwx 1 root root 10 2009-07-16 11:38 b2e37da0-e59b-41ff-9f14-f09cd75e4cb8 -> ../../hda1

Why only these three? What about sda, sdb, sdd...? Is that normal?


cat /proc/mdstat:
Code:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : inactive sdc1[2](S) sdd1[3](S) sdb1[1](S) sda1[0](S)
      1953535744 blocks

mdadm --detail /dev/md0:
Code:

mdadm: md device /dev/md0 does not appear to be active.
Quote:

Originally Posted by eco (Post 3609453)
I also noticed your '/' was commented out in your fstab!

It's true. How could it have been working with that commented out? It's not commented out anymore.


Thank you very much for your help.

eco 07-16-2009 09:16 AM

Hi again, and sorry for the late reply.

Well, you can start by adding a link for md0 and seeing if that helps (remember to set fstab back to what it was for your RAID):

# cd /dev/disk/by-uuid
# ln -s 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /dev/md0

then try a 'mount -a' and see if it helps.

I think your RAID is fine but something happened with the system.

If this doesn't fix it we can dig deeper.

Best of luck.

pi314 07-16-2009 11:39 AM

This gives an error:
ln -s 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /dev/md0

It's like this, isn't it?
ln -s /dev/md0 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f

mount -a:
Code:

mount: wrong fs type, bad option, bad superblock on /dev/md0,
      missing codepage or helper program, or other error
      (could this be the IDE device where you in fact use
      ide-scsi so that sr0 or sda or so is needed?)
      In some cases useful info is found in syslog - try
      dmesg | tail  or so

Thanks a lot.

eco 07-17-2009 04:29 AM

Sorry, I always get those muddled up.

If you have the space, I'd back up all the RAID disks using dd and then try to force a rebuild of the RAID. But seriously, back it up first before you make any changes.

Did you not have the data backed up before the 'failure'?

Got to go now... got a call, sorry.

pi314 07-17-2009 06:01 AM

I was preparing an incremental remote backup, but it wasn't working yet, so I have no backups! Always the same...

Can I make exact 1:1 copies of the disks? With dd?

Thanks for all your efforts.

pi314 07-18-2009 03:12 AM

Answering my own question:

http://wiki.linuxquestions.org/wiki/Dd

And if you want to create a compressed image:

Code:

dd if=/dev/hdx | gzip > /path/to/image.gz
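
And to restore such an image later (a sketch; double-check the target device before writing to it):

Code:

gunzip -c /path/to/image.gz | dd of=/dev/hdx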

pi314 07-19-2009 02:31 PM

It's taking about 8 hours to back up and compress each 500 GB hard disk. Once it's completed, I'll try some dirty work...

eco 07-20-2009 03:22 AM

Sorry, was away on a short holiday...

dd is slow, but it's the best way to make sure you have an exact copy of your disks in case of failure. Be sure to note which image goes back to which disk in the RAID.

I still think the problem is with the system and not the RAID software. Are you sure no changes were made? Was the box ever rebooted since the RAID was built? Does dmesg say anything more?

A longer process would be to recreate an identical RAID and then dump the data back onto each disk and hope for the best.
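
For reference only, and only once the dd backups are safe: recreating an array "in place" without destroying the data generally means re-running mdadm --create with exactly the same level, device order and chunk size as the original, and telling it the members are already in sync. A sketch (the chunk size and device order here are placeholders; pull the real values from 'mdadm --examine' on the members first):

Code:

# mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 \
        --assume-clean /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

If the order or chunk size doesn't match the original layout, the filesystem on top will look like garbage, so treat this as a last resort.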

pi314 07-20-2009 04:48 AM

I hope the 4 disks in the array are OK and that I can restore the data.

Should I make a backup of the system disk too?
If I do, I think it should be unmounted, right?
I could use a live CD.
By the way, I haven't unmounted anything while making backups of the array disks; is that OK?

Thank you very much.

eco 07-21-2009 02:52 AM

Well, the disks of the array were never started as RAID disks, so I can't see this being a problem. For the system disk you are right: boot off a live CD and make the image. What I tend to do for testing is create a VM in, say, VirtualBox with a disk the same size as the OS disk, dd the image back into the VM, boot the VM to see if it all works, and make any changes there.
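
If you go the VirtualBox route, the raw dd image of the system disk can be converted into a VDI the VM can boot from, roughly like this (filenames are just examples): image the disk from the live CD, then convert it on the machine running VirtualBox:

Code:

# dd if=/dev/hda of=/mnt/backup/hda.img
# VBoxManage convertfromraw /mnt/backup/hda.img hda.vdi --format VDI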

Also make a copy of all the info regarding your RAID, as you will want to recreate it identically, and make sure you don't write to the RAID disks when you recreate it.
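
mdadm --examine reads the RAID superblock straight off each member partition, so it works even while the array is inactive; something like this would capture the layout you need to recreate it identically (output file name is just an example):

Code:

# mdadm --examine /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1 > /root/raid-members.txt
# cat /proc/mdstat >> /root/raid-members.txt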

Backups are your friends... :\

pi314 07-25-2009 10:17 AM

Finally, after making backups of every disk, I installed Debian instead of Ubuntu.
During the installation I rebuilt the RAID array, because it looked damaged (it showed up as a RAID 5 of 2 disks when it should have been 4).

After installing Debian everything is fine. rsnapshot is already making backups every 4 hours, plus daily and weekly ones, to avoid future problems.
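
In case it helps anyone later, the relevant bits of my rsnapshot setup look roughly like this (paths and retention counts are just examples, and rsnapshot.conf wants tabs, not spaces, between fields):

Code:

# /etc/rsnapshot.conf (excerpt)
snapshot_root   /media/backup/snapshots/
interval        hourly  6
interval        daily   7
interval        weekly  4
backup          /media/raid/    localhost/

with cron entries calling "rsnapshot hourly", "rsnapshot daily" and "rsnapshot weekly" at the matching times.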

Thank you very much for helping me.

