LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 03-07-2013, 02:20 AM   #1
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Rep: Reputation: Disabled
Raid failure - Need help


Yesterday, after a restart of the server, we made the horrifying discovery that our home folder had been rolled back by almost half a year!
We immediately contacted our server provider (Hetzner), and they told us that the RAID array md127 didn't start up correctly and that we should reassemble the RAID manually.

I wanted to get help from this forum, since I am new to this whole subject and there is a high risk of losing all your data if you approach this issue the wrong way.

To our problem:

When typing in "cat /proc/mdstat":
http://puu.sh/2dtXE (Screenshot)
we can see that md127 shows [_U]. As far as I understand, that means sda4 can't be loaded but sdb4 is loaded.
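For anyone else reading the bracket map: each position in the [..] status corresponds to one member device, 'U' meaning up and '_' meaning missing. A minimal sketch of spotting degraded arrays this way (the sample lines below are illustrative, not the real server's output):

```shell
# Illustrative /proc/mdstat-style sample, modeled on this thread's arrays.
cat > /tmp/mdstat.sample <<'EOF'
md127 : active raid1 sdb4[1]
      1822442815 blocks super 1.2 [2/1] [_U]
md2 : active raid1 sda3[0] sdb3[1]
      1073740664 blocks super 1.2 [2/2] [UU]
EOF
# The "blocks" line ends with the status map; any '_' in it means a member is missing.
awk '/^md/ {name=$1} /blocks/ { if ($NF ~ /_/) print name " is degraded (" $NF ")" }' /tmp/mdstat.sample
```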

When taking a closer look at md127 with "mdadm -D /dev/md127":
http://puu.sh/2du19 (Screenshot)
we see that device number 0 has been removed and 1 is running.

I would have also given you /etc/raidtab, but for some reason it's missing from our root!


As mentioned above, I don't really know how to proceed in such a case. Do I just reactivate the RAID with commands and copy it over to the other disk, or do we even have to get the disk swapped?

I would be very thankful for any kind of help and advice you can give me.
I am kind of scared of losing our important data; that's why I am asking here :S I hope you can understand that.

Thanks in advance
ravand

Last edited by ravand; 03-07-2013 at 04:53 AM.
 
Old 03-07-2013, 05:03 AM   #2
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
1. How about /etc/mdadm.conf ?

2. what distro+version
Code:
uname -a

cat /etc/*release*
3. It looks like a strange setup; you appear to have 4 RAID1 (mirror) sets, but only 2 physical disks: sda, sdb.
This is not a good idea: if one disk goes bad, all RAID sets would be affected.

md0 = sda1, sdb1
md1 = sda2, sdb2
md2 = sda3, sdb3

& I suspect md3 should be = sda4, sdb4.

What you appear to have is that md3 has split into 2 single-disk RAID1 sets: md3 & md127.
Can you check the conf file, or otherwise find out how the RAID sets were built, e.g. by asking your provider?
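Even without a conf file, the mdstat listing itself shows which partitions back each array. A small sketch of pulling out that mapping (sample lines modeled on this thread's layout; illustrative only):

```shell
# Sample mdstat-style member lines matching the layout described above.
cat > /tmp/mdstat.layout <<'EOF'
md0 : active raid1 sda1[0] sdb1[1]
md1 : active raid1 sda2[0] sdb2[1]
md2 : active raid1 sda3[0] sdb3[1]
md3 : active raid1 sda4[0]
md127 : active raid1 sdb4[1]
EOF
# Print "array: members"; a mirror listing only one member is running degraded.
awk '/^md/ {printf "%s:", $1; for (i = 5; i <= NF; i++) printf " %s", $i; print ""}' /tmp/mdstat.layout
```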
 
1 member found this post helpful.
Old 03-07-2013, 05:26 AM   #3
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
1. I didn't find mdadm.conf in the /etc folder, but in the /etc/mdadm/ folder; it doesn't say much :/
Quote:
DEVICES /dev/[hs]d*
MAILADDR xxxxx@xxxxxx
MAILFROM xxxxx@xxxxx
I x'ed out the email addresses for privacy reasons.

2. The kernel is:
Quote:
Linux localhost 2.6.32-5-amd64 #1 SMP Sun May 6 04:00:17 UTC 2012 x86_64 GNU/Linux
The command you provided for the distro didn't work, and neither did "cat /etc/*-release".

But I got it working with "lsb_release -a":

Quote:
No LSB modules are available.
Distributor ID: Debian
Description: Debian GNU/Linux 6.0.6 (squeeze)
Release: 6.0.6
Codename: squeeze
3. Do you think that would explain the odd number 127?

Last edited by ravand; 03-07-2013 at 05:36 AM.
 
Old 03-07-2013, 05:41 AM   #4
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
2. Actually, there's no '-' in my 'cat' command; it's deliberate, so that it usually works on most distros.

3. It certainly looks like it, given the other RAID arrays and their numbering.
That's why it's important to find out whether they are 2 halves of the same RAID set, or whether you've broken 2 sets. My money is on the former.
Whoever built the sets should know... and you NEED to know before you try fixing anything.
Incidentally, if you can avoid using those 2, and ideally unmount them, that should stop any further drift in content.
 
1 member found this post helpful.
Old 03-07-2013, 05:49 AM   #5
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by chrism01 View Post
2. Actually, there's no '-' in my 'cat' command; it's deliberate, so that it usually works on most distros.

3. It certainly looks like it, given the other RAID arrays and their numbering.
That's why it's important to find out whether they are 2 halves of the same RAID set, or whether you've broken 2 sets. My money is on the former.
Whoever built the sets should know... and you NEED to know before you try fixing anything.
Incidentally, if you can avoid using those 2, and ideally unmount them, that should stop any further drift in content.
We haven't really touched anything on the RAID arrays; it must have happened automatically.
Do you have any other way of finding out how everything looked before the incident, since the conf files don't provide anything for some reason :/
Would the provider know?

EDIT: I might have found something that could support your assumption.

When typing in "mdadm --detail --scan >> /etc/mdadm/mdadm.conf" I get the following, plus 1 error message:

mdadm.conf:
Quote:
ARRAY /dev/md/0 metadata=1.2 name=rescue:0 UUID=457ffb60:47f0ba44:0aa1b92a:647d5935
ARRAY /dev/md/1 metadata=1.2 name=rescue:1 UUID=b36da940:7c5b51e8:78805318:4bf6110a
ARRAY /dev/md/2 metadata=1.2 name=rescue:2 UUID=182a8f0d:8f295d0a:bf4e2ebf:7113e813
ARRAY /dev/md/3 metadata=1.2 name=rescue:3 UUID=45958b4b:1024b8cb:30a98470:705d7110
error:
Quote:
mdadm: cannot open /dev/md/rescue:3: No such file or directory
EDIT2:

Also, here is a screenshot of the md3 details. Both md127 and md3 refer to the name "rescue:3"; do you think that is a hint of a split?
http://puu.sh/2dxdd
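Two device nodes carrying the same array name and UUID usually do point at one mirror assembled as two separate halves. A sketch of flagging duplicate UUIDs in `--detail --scan`-style output (the sample lines reuse the UUID quoted above; the parsing itself is illustrative):

```shell
# Sample ARRAY lines in the style of `mdadm --detail --scan`; real systems
# print one line per assembled md device.
cat > /tmp/scan.sample <<'EOF'
ARRAY /dev/md3 metadata=1.2 name=rescue:3 UUID=45958b4b:1024b8cb:30a98470:705d7110
ARRAY /dev/md127 metadata=1.2 name=rescue:3 UUID=45958b4b:1024b8cb:30a98470:705d7110
EOF
# Extract every UUID=... field and print only the duplicated ones; any output
# means two md devices claim membership of the same array.
awk '{for (i = 1; i <= NF; i++) if ($i ~ /^UUID=/) print $i}' /tmp/scan.sample | sort | uniq -d
```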

Last edited by ravand; 03-07-2013 at 06:06 AM.
 
Old 03-07-2013, 11:17 AM   #6
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
EDIT: Sorry, I didn't mean to spam that much; it seemed like I or the webpage was lagging, so I may have accidentally clicked post several times.

Last edited by ravand; 03-07-2013 at 12:14 PM.
 
 
Old 03-07-2013, 11:19 AM   #8
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
I unmounted md127 to see what would happen. I restarted the server, and the md127 entry was gone, as was the md127 file in /dev/. The /home directory is empty now.

Is that normal? Or did we screw up here?

We also get this error:
Quote:
mount: wrong fs type, bad option, bad superblock on /dev/md3,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so


Last edited by ravand; 03-07-2013 at 11:27 AM.
 
Old 03-07-2013, 05:08 PM   #9
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,289

Rep: Reputation: 2034
1. check the partitions again
Code:
cat /proc/mdstat

mdadm --detail /dev/md3
mdadm --detail /dev/md127
2. Might be worth listing the disks/partitions as well
Code:
fdisk -l
that's a lowercase L

3. Do ask your provider how they set it up

4. hope you have a backup
 
Old 03-08-2013, 05:10 AM   #10
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
It seems like md127 has reappeared after another restart, but the home directory is still empty.

1. I noticed that after typing the command, md3 and md127 both say "(auto-read-only)". What does that mean?
Quote:
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md127 : active (auto-read-only) raid1 sdb4[1]
1822442815 blocks super 1.2 [2/1] [_U]

md3 : active (auto-read-only) raid1 sda4[0]
1822442815 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[1]
1073740664 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
524276 blocks super 1.2 [2/2] [UU]

md0 : active (auto-read-only) raid1 sda1[0] sdb1[1]
33553336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
mdadm --detail /dev/md3:
Quote:
/dev/md3:
Version : 1.2
Creation Time : Sat Jun 23 13:47:29 2012
Raid Level : raid1
Array Size : 1822442815 (1738.02 GiB 1866.18 GB)
Used Dev Size : 1822442815 (1738.02 GiB 1866.18 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent

Update Time : Thu Mar 7 17:51:13 2013
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

Name : rescue:3
UUID : 45958b4b:1024b8cb:30a98470:705d7110
Events : 3101836

Number Major Minor RaidDevice State
0 8 4 0 active sync /dev/sda4
1 0 0 1 removed
mdadm --detail /dev/md127:
Quote:
/dev/md127:
Version : 1.2
Creation Time : Sat Jun 23 13:47:29 2012
Raid Level : raid1
Array Size : 1822442815 (1738.02 GiB 1866.18 GB)
Used Dev Size : 1822442815 (1738.02 GiB 1866.18 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent

Update Time : Thu Mar 7 17:51:13 2013
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

Name : rescue:3
UUID : 45958b4b:1024b8cb:30a98470:705d7110
Events : 67654

Number Major Minor RaidDevice State
0 0 0 0 removed
1 8 20 1 active sync /dev/sdb4
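A side note on the two listings above: the Events counters differ enormously (3101836 vs 67654). In an md mirror, the half with the higher event count has seen the more recent writes, so a gap like this suggests one half stopped updating long ago. A small sketch, using the values quoted above:

```shell
# Events counters copied from the two mdadm --detail listings above.
cat > /tmp/events.sample <<'EOF'
md3 3101836
md127 67654
EOF
# Track the array half with the highest event count; that half holds the
# most recently written data.
awk '$2 > max {max = $2; name = $1} END {print name " holds the newer data (events=" max ")"}' /tmp/events.sample
```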


2. fdisk -l gives the following:
Quote:
WARNING: GPT (GUID Partition Table) detected on '/dev/sdb'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sdb: 3000.6 GB, 3000592982016 bytes
255 heads, 63 sectors/track, 364801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/sdb1 1 267350 2147483647+ ee GPT
Partition 1 does not start on physical sector boundary.

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sda: 3000.6 GB, 3000592982016 bytes
256 heads, 63 sectors/track, 363376 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Device Boot Start End Blocks Id System
/dev/sda1 1 266306 2147483647+ ee GPT
Partition 1 does not start on physical sector boundary.

Disk /dev/md0: 34.4 GB, 34358616064 bytes
2 heads, 4 sectors/track, 8388334 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 536 MB, 536858624 bytes
2 heads, 4 sectors/track, 131069 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table

Disk /dev/md2: 1099.5 GB, 1099510439936 bytes
2 heads, 4 sectors/track, 268435166 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md2 doesn't contain a valid partition table

Disk /dev/md3: 1866.2 GB, 1866181442560 bytes
2 heads, 4 sectors/track, 455610703 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md3 doesn't contain a valid partition table

Disk /dev/md127: 1866.2 GB, 1866181442560 bytes
2 heads, 4 sectors/track, 455610703 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md127 doesn't contain a valid partition table
Also here the fstab:
Quote:
proc /proc proc defaults 0 0
none /dev/pts devpts gid=5,mode=620 0 0
/dev/md/0 none swap sw 0 0
/dev/md/1 /boot ext3 defaults 0 0
/dev/md/2 / ext4 defaults 0 0
#Old entry
#/dev/md/3 /home ext4 defaults 0 0
/dev/md127 /home ext4 defaults 0 0
/dev/md3 /home ext4 defaults 0 0
tmpfs /ramdisk tmpfs defaults,size=6000M 0 0
3. I have read somewhere that if a RAID array can't be loaded, the system tries to create a new RAID set as a mirror of the broken one, and all of these automatically created arrays are numbered from 127 downwards. So in this case I would assume that the setup was md0, md1, md2, md3. Also, the Hetzner wiki article about repairing a broken RAID describes this same setup. However, if you think this is not yet enough information, we can contact the provider again to be 100% sure.

4. Hmm... more or less. We had most of our backups in the home directory, which has been rolled back by 5 months (I honestly don't understand why 5 months), and we only have backups that are 2-3 months old on external devices. We are in a kind of problematic situation.


EDIT: By the way, if we can't manage to get the RAID arrays mounted again, do you know any way of extracting or dumping the contents of a RAID1 member to a directory, or isn't that possible? We are planning on formatting the whole system IF we can get the files back.

Last edited by ravand; 03-08-2013 at 05:29 AM.
 
Old 03-08-2013, 05:53 AM   #11
whizje
Member
 
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 583

Rep: Reputation: 129
Code:
mdadm -S /dev/md127
mdadm --add /dev/md3 /dev/sdb4
Stop the md127 array and add the disk back to md3. Remove md127 from fstab and add /dev/md/3 back to fstab.
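Once the disk is re-added, /proc/mdstat shows a recovery line with a percentage. A minimal sketch of extracting it (the sample recovery line below is fabricated for illustration; actual speeds and times will differ):

```shell
# Fabricated mdstat-style sample with a rebuild in progress.
cat > /tmp/mdstat.rebuild <<'EOF'
md3 : active raid1 sdb4[1] sda4[0]
      1822442815 blocks super 1.2 [2/1] [U_]
      [==>..................]  recovery = 12.6% (229940928/1822442815) finish=181.5min speed=146188K/sec
EOF
# On the recovery line the fields are: progress bar, "recovery", "=", percentage, ...
awk '/recovery/ {print "rebuild progress: " $4}' /tmp/mdstat.rebuild
```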

Last edited by whizje; 03-08-2013 at 06:00 AM.
 
1 member found this post helpful.
Old 03-08-2013, 06:01 AM   #12
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
Ok, I have done that. Now I get the following:

Quote:
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md3 : active (auto-read-only) raid1 sda4[0] sdb4[1]
1822442815 blocks super 1.2 [2/1] [U_]

md2 : active raid1 sda3[0] sdb3[1]
1073740664 blocks super 1.2 [2/2] [UU]

md1 : active raid1 sda2[0] sdb2[1]
524276 blocks super 1.2 [2/2] [UU]

md0 : active (auto-read-only) raid1 sda1[0] sdb1[1]
33553336 blocks super 1.2 [2/2] [UU]

unused devices: <none>
The home folder is still empty, though. I still get the "auto-read-only"; I didn't have that before. Any explanation?
 
Old 03-08-2013, 06:06 AM   #13
whizje
Member
 
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 583

Rep: Reputation: 129
After editing fstab, do a
Code:
mount -a
 
1 member found this post helpful.
Old 03-08-2013, 06:08 AM   #14
ravand
LQ Newbie
 
Registered: Mar 2013
Posts: 11

Original Poster
Rep: Reputation: Disabled
This is what I get for mount -a:
Quote:
mount: none already mounted or /dev/pts busy
mount: according to mtab, devpts is already mounted on /dev/pts
mount: wrong fs type, bad option, bad superblock on /dev/md3,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so

mount: wrong fs type, bad option, bad superblock on /dev/md3,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
EDIT:
Here's the dmesg tail:
Quote:
[ 19.551459] INFO-xpp: FEATURE: with sync_tick() from DAHDI
[ 19.643844] INFO-xpp_usb: revision Unknown
[ 19.644024] usbcore: registered new interface driver xpp_usb
[ 20.307779] dahdi: Registered tone zone 0 (United States / North America)
[ 21.759758] eth0: no IPv6 routers present
[ 34.032136] [drm] Initialized drm 1.1.0 20060810
[ 34.635433] lp: driver loaded but no devices found
[ 34.771899] ppdev: user-space parallel port driver
[ 142.734088] EXT4-fs (md3): VFS: Can't find ext4 filesystem
[ 142.742375] EXT4-fs (md3): VFS: Can't find ext4 filesystem
EDIT2:
md3 details give this:
Quote:
/dev/md3:
Version : 1.2
Creation Time : Sat Jun 23 13:47:29 2012
Raid Level : raid1
Array Size : 1822442815 (1738.02 GiB 1866.18 GB)
Used Dev Size : 1822442815 (1738.02 GiB 1866.18 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Fri Mar 8 12:58:03 2013
State : clean, degraded
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

Name : rescue:3
UUID : 45958b4b:1024b8cb:30a98470:705d7110
Events : 3101840

Number Major Minor RaidDevice State
0 8 4 0 active sync /dev/sda4
1 8 20 1 spare rebuilding /dev/sdb4
It now says "spare rebuilding".

Last edited by ravand; 03-08-2013 at 06:11 AM.
 
Old 03-08-2013, 06:23 AM   #15
whizje
Member
 
Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 583

Rep: Reputation: 129
Try
Code:
fsck /dev/md3
 
  

