LinuxQuestions.org
Old 05-30-2011, 02:25 PM   #1
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Rep: Reputation: Disabled
Hard drive dropping out of RAID on reboot


Every time I reboot my server, one of my hard drives drops out of the RAID5 array. I'm pretty sure that there's nothing wrong with the drive itself. I bought all three drives at the same time, and they are identical in make/model/capacity. While the server is running, it's smooth sailing. However, whenever I shut down or reboot, I get an email message that the array is degraded. It's always /dev/sda1 that drops out of the array. I can always rebuild the array by adding the partition back in, but it's a bit of a pain. Any suggestions on how to troubleshoot this?

I've got MDADM 3.0.3 on Fedora 12. Linux 2.6.32.26-175.fc12.i686 on i686

Here is my mdadm.conf file:

MAILADDR email@address.com
MAILFROM raid@address.com
DEVICE /dev/sda1 /dev/sdc1 /dev/sdb1
ARRAY /dev/md0 level=raid5 devices=,dev/sdc1,/dev/sdb1,/dev/sda1
 
Old 05-30-2011, 03:21 PM   #2
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477

Hi there,

Quote:
MAILADDR email@address.com
MAILFROM raid@address.com
DEVICE /dev/sda1 /dev/sdc1 /dev/sdb1
ARRAY /dev/md0 level=raid5 devices=,dev/sdc1,/dev/sdb1,/dev/sda1
Why is there a comma right after the = in the last line?

Did you perform a hardware diagnostic on all the disks?

As you said it always happens with /dev/sda1, did you perform fsck on that partition?

Please let us know the above and we can take it from there. I am still confused about why there is a comma in the last line. Was that just a typo? ;-)
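
The stray comma discussed above can be caught mechanically. Here is a minimal sketch, assuming the config is an mdadm.conf-style file passed as the first argument. `check_devices` is a hypothetical helper name, not part of mdadm. It splits each `devices=` value on commas and reports entries that are empty or that do not start with `/dev/`.

```shell
# check_devices FILE
# Hypothetical helper (not an mdadm command): report suspicious entries in the
# devices= list of every ARRAY line in an mdadm.conf-style file.
check_devices() {
    grep '^ARRAY' "$1" |
    grep -o 'devices=[^[:space:]]*' |
    cut -d= -f2 |
    tr ',' '\n' |
    while IFS= read -r dev; do
        case "$dev" in
            /dev/*) ;;                                   # absolute device path, looks fine
            '')     echo "empty entry" ;;                # e.g. a leading or doubled comma
            *)      echo "not an absolute path: $dev" ;; # e.g. dev/sdc1 without the slash
        esac
    done
}

# Usage: check_devices /etc/mdadm.conf
```

Run against the ARRAY line quoted above, this should flag both the empty entry before the first comma and `dev/sdc1`, which suggests the comma may have been typed in place of a slash.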
 
Old 05-30-2011, 03:37 PM   #3
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Thanks for the reply!

Quote:
Originally Posted by T3RM1NVT0R View Post
Why is there a comma right after the = in the last line?
Oops. That must be a typo. I used Webmin to create the array, so maybe it put the extra comma in there. Or, maybe I did that by mistake. I removed the comma. Not sure if that will fix the problem though, since I have to wait for the array to rebuild, and then reboot the server to see if the problem happens again.

Quote:
Did you perform a hardware diagnostic on all the disks?

As you said it always happens with /dev/sda1, did you perform fsck on that partition?
I removed the partition from the array and ran fsck but I get an error.

$ sudo fsck /dev/sda1
fsck from util-linux-ng 2.16.2
fsck: fsck.linux_raid_member: not found
fsck: Error 2 while executing fsck.linux_raid_member for /dev/sda1

I get the same error if the partition is part of the array.

I checked the SMART status of /dev/sda and it looks fine.
 
Old 05-30-2011, 03:57 PM   #4
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477

Did you run this while the system was up? I assume so, since you said you are waiting to reboot. In that case the error is not too surprising, as some process might be holding a lock on the partition. I can see that you are running Fedora 12. Did you try running fsck on /dev/sda1 from a live CD?
 
Old 05-30-2011, 11:03 PM   #5
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
I haven't tried using a live CD because this is a very stripped-down headless box. I don't even have an optical drive installed in it right now. So, I've been trying many different ways to get fsck to run on /dev/sda1 but I haven't had any success. I removed the partition from the array, cleared the partition table, then created a new RAID partition using parted. When I try to run fsck on it, I get this:

$ sudo fsck /dev/sda1
fsck from util-linux-ng 2.16.2
e2fsck 1.41.9 (22-Aug-2009)
fsck.ext3: Group descriptors look bad... trying backup blocks...
fsck.ext3: Bad magic number in super-block when using the backup blocks
fsck.ext3: going back to original superblock
fsck.ext3: Device or resource busy while trying to open /dev/sda1
Filesystem mounted or opened exclusively by another program?

Maybe this is a dumb question, but how can it be mounted or opened exclusively when it's an unmounted brand new partition?
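
One way to approach the "opened exclusively" question is to check whether the md driver has claimed the partition again, for example because an assembly step picked it up after it was recreated. That is only a possible explanation and is not confirmed anywhere in this thread. A minimal sketch follows; `md_members` is a hypothetical helper that lists the member devices named in mdstat-style text.

```shell
# md_members [FILE]
# Hypothetical helper: print "array member" pairs from mdstat-style text.
# FILE defaults to /proc/mdstat.
md_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\[/) {            # member fields look like sda1[0] or sdb1[1](F)
                dev = $i
                sub(/\[.*/, "", dev)    # strip the [slot] suffix and any flags
                print $1, dev
            }
    }' "${1:-/proc/mdstat}"
}

# Usage:
#   md_members | grep -w sda1          # is sda1 claimed by an array right now?
#   ls /sys/block/sda/sda1/holders     # sysfs view of what sits on top of sda1
```

If sda1 shows up in either place, the "Device or resource busy" message would be consistent with the array holding the partition, even though nothing is mounted from it directly.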
 
Old 05-31-2011, 01:57 AM   #6
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477

Well, it is possible that either you have other partitions on the same device (/dev/sda) or the kernel is not releasing the lock. You can prevent it from being mounted during boot and then try to run fsck.

You can check whether the system is able to read the partition table on that device, which will also make sure that the device is not in use, using the following command: partprobe /dev/sda

Last edited by T3RM1NVT0R; 05-31-2011 at 01:58 AM.
 
Old 05-31-2011, 08:46 AM   #7
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
I've tried preventing the RAID from mounting on reboot as well as preventing the start of services that might be trying to access it (SMB, mdmonitor) but still get the same error. I re-added /dev/sda1 to the array and let it rebuild overnight. This morning I unmounted the array and ran fsck on it. I was able to get a result.

$ sudo fsck -f -v /dev/md0
fsck from util-linux-ng 2.16.2
e2fsck 1.41.9 (22-Aug-2009)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information

189953 inodes used (0.10%)
14676 non-contiguous files (7.7%)
167 non-contiguous directories (0.1%)
# of inodes with ind/dind/tind blocks: 122934/55986/1
217313809 blocks used (29.67%)
0 bad blocks
3 large files

181727 regular files
8217 directories
0 character device files
0 block device files
0 fifos
0 links
0 symbolic links (0 fast symbolic links)
0 sockets
--------
189944 files

Still can't get it to run on /dev/sda1 though. Is it possible that fsck just won't run on individual RAID partitions?

Last edited by superkev; 05-31-2011 at 11:24 AM.
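
On the question of running fsck on a single member: the earlier error (`fsck.linux_raid_member: not found`) suggests that fsck picks its helper from the detected signature, and a RAID member carries md metadata and stripes, not a complete filesystem. The filesystem lives on the assembled /dev/md0, which is where the check above succeeded. Here is a minimal sketch of that decision, assuming `blkid` is used to report the signature type; `fsck_hint` is a hypothetical name.

```shell
# fsck_hint TYPE
# Hypothetical helper: given the signature type reported for a block device,
# say what kind of check makes sense for it.
fsck_hint() {
    case "$1" in
        linux_raid_member) echo "md member: check the assembled array, inspect with mdadm --examine" ;;
        ext2|ext3|ext4)    echo "filesystem: fsck applies" ;;
        '')                echo "no signature detected" ;;
        *)                 echo "other signature: $1" ;;
    esac
}

# Usage: fsck_hint "$(blkid -o value -s TYPE /dev/sda1)"
```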
 
Old 05-31-2011, 09:44 AM   #8
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by T3RM1NVT0R View Post
You can check whether the system is able to read the partition table on that device, which will also make sure that the device is not in use, using the following command: partprobe /dev/sda
I rebooted the server to test again, and as usual, /dev/sda1 dropped out of the array. I ran partprobe and got no output at all. I can see the partition table in parted and in fdisk though.

GNU Parted 1.9.0
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) print
Model: ATA WDC WD15EARS-00M (scsi)
Disk /dev/sda: 1500GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start  End     Size    Type     File system  Flags
 1      405MB  1500GB  1500GB  primary               raid

The partition starts at 405MB because I had to follow the directions here to get the three drives to align properly in Linux:

http://community.wdc.com/t5/Desktop/...EARS/td-p/6395

I'd understand if all three drives had the same problem, but it seems like /dev/sda is unique in having trouble on reboot and only on reboot.
 
Old 06-01-2011, 01:06 PM   #9
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477

Please paste the output of following commands:

1. cat /proc/mdstat
2. mdadm --detail /dev/md0 or /dev/md1 (whatever the name of the RAID device)
3. cat /etc/fstab
 
Old 06-01-2011, 01:13 PM   #10
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by T3RM1NVT0R View Post
Please paste the output of following commands:

1. cat /proc/mdstat
2. mdadm --detail /dev/md0 or /dev/md1 (whatever the name of the RAID device)
3. cat /etc/fstab
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[1] sda1[0] sdc1[2]
2929437824 blocks level 5, 4k chunk, algorithm 2 [3/3] [UUU]

unused devices: <none>

$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Mon Dec 27 01:04:42 2010
Raid Level : raid5
Array Size : 2929437824 (2793.73 GiB 2999.74 GB)
Used Dev Size : 1464718912 (1396.86 GiB 1499.87 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Jun 1 14:07:56 2011
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 4K

UUID : 1aa42d21:250c2ab3:2a093f01:df0d1412 (local to host hostname)
Events : 0.29620

Number   Major   Minor   RaidDevice   State
   0       8       1         0        active sync   /dev/sda1
   1       8      17         1        active sync   /dev/sdb1
   2       8      33         2        active sync   /dev/sdc1

$ cat /etc/fstab

/dev/VolGroup00/LogVol01  /                 ext3    defaults        1 1
/dev/VolGroup00/LogVol02  /var              ext3    defaults        1 2
LABEL=/boot               /boot             ext3    defaults        1 2
tmpfs                     /dev/shm          tmpfs   defaults        0 0
devpts                    /dev/pts          devpts  gid=5,mode=620  0 0
sysfs                     /sys              sysfs   defaults        0 0
proc                      /proc             proc    defaults        0 0
/dev/VolGroup00/LogVol00  swap              swap    defaults        0 0
/dev/md0                  /var/shares/raid  ext3    defaults        0 0
/dev/sde1                 /mnt/usbdrive     ext3    defaults        0 0
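
The `--detail` output above describes the assembled array as a whole. To see why one member is rejected at boot, a common next step is to compare the per-device superblocks, in particular the event counter, right after a reboot where the drive has dropped out. Here is a minimal sketch, assuming `mdadm --examine` prints an `Events` line in the same `name : value` layout as the `--detail` output; `events_of` is a hypothetical helper.

```shell
# events_of
# Hypothetical helper: read mdadm-style "name : value" text on stdin and
# print the value of the first Events line.
events_of() {
    sed -n 's/^[[:space:]]*Events[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Usage (as root), one line per member:
#   for dev in /dev/sda1 /dev/sdb1 /dev/sdc1; do
#       printf '%s %s\n' "$dev" "$(mdadm --examine "$dev" | events_of)"
#   done
```

If /dev/sda1 reports a lower counter than the other two, that would suggest its superblock is not being updated at shutdown, which is one plausible reason for it to be treated as stale when the array is assembled.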
 
Old 06-01-2011, 02:55 PM   #11
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477

Can you please paste the output of cat /var/log/dmesg? I forgot to mention it in my previous post.
 
Old 06-01-2011, 03:16 PM   #12
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
Quote:
Originally Posted by T3RM1NVT0R View Post
Can you please paste the output of cat /var/log/dmesg? I forgot to mention it in my previous post.
The output was pretty long, so I've attached it as a txt file instead of pasting it.

dmesg.txt
 
Old 06-02-2011, 12:47 PM   #13
T3RM1NVT0R
Senior Member
 
Registered: Dec 2010
Location: Internet
Distribution: Linux Mint, SLES, CentOS, Red Hat
Posts: 2,385

Rep: Reputation: 477

I had a look at the dmesg file. From it I can see that /dev/sda1 is the root device of the RAID, and it appears that /dev/sda1 is not getting enough time to join the array: /dev/sdb1 and /dev/sdc1 join first, and /dev/sda1 is then left out because the RAID does not look at the root RAID device again.

What we can try here: at reboot, edit the kernel line and append rootdelay=20 at the end, after a space. This puts a 20-second delay before the RAID starts putting devices into /dev/md, so the SCSI drives get enough time to initialize properly. It might prevent /dev/sda1 from dropping out of the RAID after reboot.

If this works, we can make it permanent in /boot/grub/grub.conf.
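
Making the option permanent means appending it to the `kernel` line of the relevant boot entry. Here is a minimal sketch that prints a modified copy of the file instead of editing it in place. `add_kernel_arg` is a hypothetical helper, and it assumes GRUB legacy syntax in which each boot entry contains a line starting with `kernel`.

```shell
# add_kernel_arg FILE ARG
# Hypothetical helper: print FILE with ARG appended to every "kernel" line
# that does not already contain it. The file itself is not modified.
add_kernel_arg() {
    awk -v arg="$2" '
        /^[ \t]*kernel[ \t]/ {
            found = 0
            n = split($0, words, /[ \t]+/)
            for (i = 1; i <= n; i++)
                if (words[i] == arg) found = 1   # already present, leave the line alone
            if (!found) $0 = $0 " " arg
        }
        { print }
    ' "$1"
}

# Usage: review the result before replacing anything
#   add_kernel_arg /boot/grub/grub.conf rootdelay=20 > /tmp/grub.conf.new
#   diff /boot/grub/grub.conf /tmp/grub.conf.new
```

Writing to a separate file and diffing first keeps a typo from leaving a headless box unbootable.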
 
Old 06-02-2011, 02:25 PM   #14
superkev
LQ Newbie
 
Registered: May 2011
Posts: 8

Original Poster
Rep: Reputation: Disabled
I tried and unfortunately it did not help. The drive dropped out again on reboot. I'm rebuilding the array again.
 
  

