LinuxQuestions.org
Old 07-16-2009, 04:43 AM   #1
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Rep: Reputation: 2
Software RAID failing?


Hello everyone!

I've read these forums for a long time and found them really useful.
Now I've got a problem of my own and would love your help. Thanks in advance!

I have a PC at home working as a server, running Ubuntu Server 7.10.
It has 5 hard disks:
  • 1 PATA/IDE disk (system + swap)
  • 4 SATA disks (software raid: array of 4 disks RAID 5)

When the server boots, it prints:

Code:
fsck.ext3: Unable to resolve 'UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f'
fsck died with exit status 8
* File system check failed
Please repair the file system manually

This is what "fdisk -l" shows:

Code:
Disk /dev/hda: 120.0 GB, 120000000000 bytes
255 heads, 63 sectors/track, 14589 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xdc08dc08

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *           1       13991   112382676   83  Linux
/dev/hda2           13992       14589     4803435    5  Extended
/dev/hda5           13992       14589     4803403+  82  Linux swap / Solaris

Disk /dev/sda: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00003b91

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0005978d

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdc: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000077ee

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1       60801   488384001   fd  Linux raid autodetect

Disk /dev/sdd: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00098808

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1       60801   488384001   fd  Linux raid autodetect
md0 is missing!




Fstab:
Code:
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
# /dev/hda1
#UUID=b2e37da0-e59b-41ff-9f14-f09cd75e4cb8 /               ext3    defaults,errors=remount-ro 0       1

# /dev/md0
UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /media/raid     ext3    defaults        0       2

# /dev/hda5
UUID=321f3c6b-19e1-4feb-9c27-8046d30188c1 none            swap    sw              0       0
/dev/hdb        /media/cdrom0   udf,iso9660 user,noauto,exec 0       0

I've tried to change this line in fstab:
UUID=2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /media/raid ext3 defaults 0 2
to:
/dev/md0 /media/raid ext3 defaults 0 2

Then running "mount -a" or rebooting gives:
Code:
mount: special device /dev/disk/by-uuid/2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f does not exist
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
Could it be a RAID failure?
Do I need to create the array again?

Any kind of help will be really appreciated.
Thank you very much.
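
For readers following along: the fsck error above means no device with that UUID could be found at boot. A minimal sketch of checking every UUID= entry in fstab against /dev/disk/by-uuid, assuming the usual locations; the function name check_fstab_uuids is made up for illustration and is not part of any tool.

```shell
# List every active UUID= entry in an fstab-style file and report whether a
# matching entry exists in the by-uuid directory.
# $1 = fstab path, $2 = by-uuid directory (both default to the usual locations).
check_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    uuid_dir=${2:-/dev/disk/by-uuid}
    # Commented-out lines start with '#', so their first field never matches ^UUID=
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1, $2 }' "$fstab" |
    while read -r uuid mountpoint; do
        if [ -e "$uuid_dir/$uuid" ]; then
            echo "OK      $mountpoint ($uuid)"
        else
            echo "MISSING $mountpoint ($uuid)"
        fi
    done
}
```

Run with no arguments it reads /etc/fstab. With the fstab posted above, the /media/raid entry would be the one reported as MISSING whenever the array's UUID has no link in /dev/disk/by-uuid.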
 
Old 07-16-2009, 07:15 AM   #2
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Hi,

At first glance it doesn't look like it's a RAID problem. Have you done any system changes, upgrades, ...?

Have a look in: /dev/disk/by-uuid and see if you can find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f and look to see if /dev/md0 or /dev/md/0 still exist.

Also try the following two commands, just in case:
Code:
# cat /proc/mdstat
# mdadm --detail /dev/md0
I suspect the system got rid of the UUID or that it changed. Maybe all you need is to add a new UUID to your RAID config file, but get more info before making any changes. You can really make things worse if you are not very careful.

I also noticed your '/' was commented out in your fstab!

Any logs?
 
Old 07-16-2009, 08:54 AM   #3
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
Quote:
Originally Posted by eco View Post
Hi,

At first glance it doesn't look like it's a RAID problem. Have you done any system changes, upgrades, ...?
No changes.

Quote:
Originally Posted by eco View Post
Hi,
Have a look in: /dev/disk/by-uuid and see if you can find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f and look to see if /dev/md0 or /dev/md/0 still exist.
Can't find 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f. /dev/md0 does exist.

ls /dev/disk/by-uuid/ -l
Code:
total 0
lrwxrwxrwx 1 root root 10 2009-07-16 11:38 321f3c6b-19e1-4feb-9c27-8046d30188c1 -> ../../hda5
lrwxrwxrwx 1 root root  9 2009-07-16 11:38 ae4a00f0-52b4-4df7-9f2b-71e20fcf25de -> ../../sdc
lrwxrwxrwx 1 root root 10 2009-07-16 11:38 b2e37da0-e59b-41ff-9f14-f09cd75e4cb8 -> ../../hda1
Why only these three? What about sda, sdb, sdd...? Is it normal?


cat /proc/mdstat:
Code:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10] 
md0 : inactive sdc1[2](S) sdd1[3](S) sdb1[1](S) sda1[0](S)
      1953535744 blocks
mdadm --detail /dev/md0:
Code:
mdadm: md device /dev/md0 does not appear to be active.
Quote:
Originally Posted by eco View Post
I also noticed your '/' was commented out in your fstab!
It's true. How can it work? It's not commented anymore.


Thank you very much for your help.
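
For anyone following along: in the /proc/mdstat output above, md0 is listed as inactive and every member carries an (S) flag. A minimal sketch of spotting that state from a script, assuming the usual mdstat layout of "name : state ..."; the function name list_inactive_arrays is made up for illustration.

```shell
# Print the name of every md array that mdstat reports as inactive.
# $1 = file to read (defaults to /proc/mdstat).
list_inactive_arrays() {
    # Array lines look like: "md0 : inactive sdc1[2](S) ..." or "md0 : active raid5 ..."
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }' "${1:-/proc/mdstat}"
}
```

On the server in this thread it would print md0. Read-only commands such as "mdadm --examine /dev/sda1" show what each member's superblock records. A common next step, which was not tried in this thread and is best left until the disks have been imaged, is "mdadm --stop /dev/md0" followed by "mdadm --assemble --scan".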
 
Old 07-16-2009, 09:16 AM   #4
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Hi again and sorry for late reply,

Well, you can start by adding a link to md0 and see if that helps (remember to set fstab back to what it was for your RAID)

# cd /dev/disk/by-uuid
# ln -s 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /dev/md0

then try a 'mount -a' and see if it helps.

I think your RAID is fine but something happened with the system.

If this doesn't fix it we can dig deeper.

Best of luck.
 
Old 07-16-2009, 11:39 AM   #5
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
This gives an error:
ln -s 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f /dev/md0

It's like this, isn't it?
ln -s /dev/md0 2ec026e5-1ed9-4007-9d4f-82fbfafb2d9f

mount -a:
Code:
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       (could this be the IDE device where you in fact use
       ide-scsi so that sr0 or sda or so is needed?)
       In some cases useful info is found in syslog - try
       dmesg | tail  or so
Thanks a lot.
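
A note on the argument order discussed above: ln -s takes the existing target first and the name of the new link second. A minimal sketch in a scratch directory, so nothing under /dev is touched; the file names are stand-ins chosen for illustration.

```shell
# ln -s TARGET LINK_NAME: the thing that already exists comes first.
demo_dir=$(mktemp -d)
touch "$demo_dir/md0"                              # stand-in for /dev/md0
ln -s "$demo_dir/md0" "$demo_dir/by-uuid-link"     # TARGET, then LINK_NAME
readlink "$demo_dir/by-uuid-link"                  # prints the path of the md0 stand-in
```

A hand-made link only supplies the name that fstab looks up. It does not start the array, which is consistent with the mount error shown in this post.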
 
Old 07-17-2009, 04:29 AM   #6
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Sorry, I always get those muddled up.

If you have the space, I'd back up all the RAID disks using dd and then try to force a rebuild of the RAID. But seriously, back it up first, before you make any changes.

Did you not have the data backed up before the 'failure'?

Got to go now... got a call, sorry.
 
Old 07-17-2009, 06:01 AM   #7
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
I was preparing an incremental remote backup, but it wasn't working yet so I have no backups! Always the same...

Can I make exact 1:1 copies of the disks? With dd?

Thanks for all your efforts.
 
Old 07-18-2009, 03:12 AM   #8
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
Answering my own question:

http://wiki.linuxquestions.org/wiki/Dd

And if you want to create a compressed image:

Code:
dd if=/dev/hdx | gzip > /path/to/image.gz
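
A compressed image is only useful if it reads back correctly, so it may be worth verifying it before touching the array. A minimal sketch built on the same dd and gzip pipeline; the function names image_disk and verify_image are made up for illustration, and the block size is an assumption rather than a tuned value.

```shell
# Image a block device (or any file) to a gzip-compressed file.
# $1 = source, e.g. /dev/sda; $2 = destination image path.
# Note: in a plain sh pipeline the exit status is gzip's, not dd's,
# so read errors from dd only show up on stderr.
image_disk() {
    dd if="$1" bs=1048576 | gzip > "$2"
}

# Decompress the image and compare it byte for byte with the source.
# $1 = source, $2 = image path. Returns 0 only when they are identical.
verify_image() {
    gunzip -c "$2" | cmp -s - "$1"
}
```

For example, "image_disk /dev/sda /path/to/sda.img.gz && verify_image /dev/sda /path/to/sda.img.gz", with the source not being written to in between. Verification reads the whole disk a second time, so it roughly doubles the time per disk.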
 
Old 07-19-2009, 02:31 PM   #9
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
It's taking about 8 hours to back up and compress each 500 GB hard disk. Once it's completed, I'll try some dirty work...
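
To put that figure in perspective, the average throughput can be worked out from the disk size in the fdisk output. A small sketch of the arithmetic; the function name throughput_mb_s is made up for illustration, and it uses integer division with 1 MB taken as 1,000,000 bytes.

```shell
# Average throughput in MB/s (rounded down) for copying a disk.
# $1 = size in bytes, $2 = duration in hours.
throughput_mb_s() {
    echo $(( $1 / ($2 * 3600) / 1000000 ))
}

throughput_mb_s 500107862016 8    # prints 17
```

So each disk is being read and compressed at roughly 17 MB/s on average.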
 
Old 07-20-2009, 03:22 AM   #10
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Sorry, was away on a short holiday...

dd is slow, but it's the best way to make sure you have an exact copy of your disks in case of failure. Be sure you know which image goes back to which disk in the RAID.

I still think the problem is with the system and not the RAID software. Are you sure no changes were made? Was the box ever rebooted since the RAID was built? Does dmesg say anything more?

A long process might be to recreate an exact RAID and then dump the data back onto each disk and hope for the best.
 
Old 07-20-2009, 04:48 AM   #11
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
I hope that the 4 disks in the array are ok, and I can restore the info.

Should I make a backup of the system disk too?
If I do, I think it should be unmounted, right?
I could use a liveCD.
By the way, I haven't unmounted anything to make the backups of the array disks. Is that OK?

Thank you very much.
 
Old 07-21-2009, 02:52 AM   #12
eco
Member
 
Registered: May 2006
Location: BE
Distribution: Debian/Gentoo
Posts: 412

Rep: Reputation: 48
Well, the disks of the array were never started as RAID disks, so I can't see this being a problem. For the system disk you are right: boot off a live CD and make the image. What I tend to do for testing is create a VM in, say, VirtualBox with a disk the same size as the OS disk, dd the image back into the VM, boot the VM to see if it all works, and make the changes there.

Also make a copy of all the info regarding your RAID as you will want it to be identical and make sure you don't write to the RAID disks when you recreate it.

Backups are your friends... :\
 
Old 07-25-2009, 10:17 AM   #13
pi314
LQ Newbie
 
Registered: Jul 2009
Posts: 13

Original Poster
Rep: Reputation: 2
Finally, after making backups of every disk, I installed Debian instead of Ubuntu.
During the installation I built the RAID array again, because it looked damaged (it showed up as a RAID 5 of 2 disks when it should be 4 disks).

After installing Debian everything is fine. rsnapshot is already making backups every 4 hours, day, week, etc. to avoid future problems.

Thank you very much for helping me.
 
  


Tags
fsck, fstab, mdadm, raid, server, ubuntu




All times are GMT -5.
