LinuxQuestions.org
Old 04-13-2010, 08:14 PM   #1
dbrazeau
Member
 
Registered: Aug 2009
Distribution: Fedora, OpenSuse, DENX Embedded Linux
Posts: 162

Rep: Reputation: 28
mdadm RAID5 degraded/rebuild access issues.


I am trying to test my system's RAID5 recovery and seem to be running into some issues.

I created a RAID5 array from 3 drives using mdadm, made an ext3 filesystem on it, and copied a 3.5GB test file to the new filesystem. I then faulted and removed one of the drives using mdadm. Up to this point all seems well: the now-degraded RAID is still mounted, and I can see the test file on it with "ls -l".

From here on is where the trouble starts.

With the RAID still in degraded mode, I try to copy (read) the test file and get a bunch of errors. It doesn't matter whether I try to create a duplicate copy of it on the RAID or copy it to a drive outside the RAID. Adding the "faulted" drive back into the RAID and waiting for it to recover does not fix the issue; I still get the same errors.
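For reference, the test sequence described above can be reproduced with roughly the following commands (device names match the mdadm -D output later in the thread; the mount point is an example):

```shell
# Build a 3-drive RAID5 array
mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

# Create an ext3 filesystem on it and mount it (mount point is an example)
mkfs.ext3 /dev/md0
mount /dev/md0 /mnt/raid

# Copy the 3.5GB test file onto the array
cp testfile.tar /mnt/raid/

# Fault and remove one member, leaving the array degraded
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# Later: re-add the member and watch the rebuild
mdadm /dev/md0 --add /dev/sdb1
cat /proc/mdstat
```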

Here are the errors I'm seeing:
Code:
# cp testfile.tar testfile_degraded.tar
attempt to access beyond end of device
md0: rw=0, want=15236514744, limit=22490368
__ratelimit: 626 callbacks suppressed
Buffer I/O error on device md0, logical block 1904564342
attempt to access beyond end of device
md0: rw=0, want=33612653048, limit=22490368
Buffer I/O error on device md0, logical block 4201581630
attempt to access beyond end of device
md0: rw=0, want=18764592712, limit=22490368
Buffer I/O error on device md0, logical block 2345574088
attempt to access beyond end of device
md0: rw=0, want=9395562552, limit=22490368
Buffer I/O error on device md0, logical block 1174445318
cp: read error: Input/output error
Does anyone know how to fix these errors?

Please let me know if there is any other information that would be helpful.

Last edited by dbrazeau; 04-13-2010 at 08:17 PM.
 
Old 04-13-2010, 11:41 PM   #2
anuragccsu
LQ Newbie
 
Registered: Jun 2009
Posts: 11

Rep: Reputation: 0
Hi there,

Did you try checking the status of the array with mdadm's -D option? What does it say? That option gives you the complete status of the RAID.
A few more things to check:
What was the status of the array before you failed the drive, and what is it after the failure?
Was your array full?
You can run the sync command to flush all unwritten buffers to disk, since your output shows some buffer I/O errors.
To me it looks like your array may be full; try copying a small file, or try creating files with the touch command.
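The checks suggested above amount to something like the following (the array name is from this thread; the mount point is an example):

```shell
# Full array status: state, active/failed devices, sizes
mdadm -D /dev/md0

# Kernel-side summary of all md arrays
cat /proc/mdstat

# Flush unwritten buffers to disk
sync

# Free space on the mounted filesystem (mount point is an example)
df -h /mnt/raid
```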

Thanks
Anurag
 
Old 04-14-2010, 12:03 PM   #3
dbrazeau
Member
 
Registered: Aug 2009
Distribution: Fedora, OpenSuse, DENX Embedded Linux
Posts: 162

Original Poster
Rep: Reputation: 28
My RAID is not full. I created a RAID5 array that's a little bigger than 11GB. As I mentioned before, my test file is about 3.5GB. At the time I fail the drive and put the RAID in degraded mode, the 3.5GB test file is the only file on it, so there is still about 7.5GB free.

Also, I have no problem creating a file with touch while the RAID is in degraded mode. When I try to create a duplicate copy of the test file on the degraded RAID, some of it actually gets copied: the first time about 700MB was copied before the errors started, and the second time about 350MB. Either way, there should be plenty of space available on the RAID to complete my copy operation.

Based on the error returned from cp:
Code:
cp: read error: Input/output error
it seems the trouble is reading my test file, not writing more data to the RAID. Supporting this claim, as I mentioned before, I also cannot copy the test file to a separate drive that is not part of the array.

So the basic issue is that when trying to read my test file off a degraded RAID5, I get a bunch of "out of bounds" errors.

Does anyone know what units are used in this error:
Code:
md0: rw=0, want=9395562552, limit=22490368
Is 22490368 in blocks, cylinders, or bytes?
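For what it's worth, the numbers line up if "limit" is counted in 512-byte sectors: the mdadm -D output later in this thread reports an Array Size of 11245184 (in 1-KiB blocks), and 11245184 × 2 = 22490368. A quick sanity check (the sector-unit interpretation is an assumption):

```python
# Assumption: the kernel's "limit" in
#   md0: rw=0, want=..., limit=22490368
# is counted in 512-byte sectors, while mdadm -D reports
# "Array Size" in 1-KiB blocks.

array_size_kib = 11245184   # "Array Size" from mdadm -D in this thread
limit_sectors = 22490368    # "limit" from the kernel error message

# Two 512-byte sectors per KiB
assert array_size_kib * 2 == limit_sectors
print("limit matches the array size in 512-byte sectors")
```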
 
Old 04-14-2010, 04:35 PM   #4
dbrazeau
Member
 
Registered: Aug 2009
Distribution: Fedora, OpenSuse, DENX Embedded Linux
Posts: 162

Original Poster
Rep: Reputation: 28
Here are some more details. In this test I copied the 3.5GB test file to the RAID; at that point I could still read the file back fine. Then I faulted one of the partitions in the RAID. Now when I try to read the file, I get the "attempt to access beyond end of device" errors. This time I was able to copy about 3GB of the 3.5GB file before the errors started.

Here are the details about my RAID before failing one of the drives:
Code:
~ # mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Apr 14 12:50:54 2010
     Raid Level : raid5
     Array Size : 11245184 (10.72 GiB 11.52 GB)
  Used Dev Size : 5622592 (5.36 GiB 5.76 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Apr 14 13:04:46 2010
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ab1369c8:669b3be5:14975abc:932ab79d (local to host Testbox)
         Events : 0.38

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
       2       8       33        2      active sync   /dev/sdc1
This is the command I use to fault one of the drives:
Code:
~ # mdadm /dev/md0 -f /dev/sdb1
raid5: Disk failure on sdb1, disabling device.
raid5: Operation continuing on 2 devices.
mdadm: set /dev/sdb1 faulty in /dev/md0
~ # RAID5 conf printout:
 --- rd:3 wd:2
 disk 0, o:1, dev:sda1
 disk 1, o:0, dev:sdb1
 disk 2, o:1, dev:sdc1
RAID5 conf printout:
 --- rd:3 wd:2
 disk 0, o:1, dev:sda1
 disk 2, o:1, dev:sdc1
Here are the details about my RAID after faulting the drive:
Code:
~ # mdadm -D /dev/md0
/dev/md0:
        Version : 0.90
  Creation Time : Wed Apr 14 12:50:54 2010
     Raid Level : raid5
     Array Size : 11245184 (10.72 GiB 11.52 GB)
  Used Dev Size : 5622592 (5.36 GiB 5.76 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Wed Apr 14 13:18:40 2010
          State : clean, degraded
 Active Devices : 2
Working Devices : 2
 Failed Devices : 1
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

           UUID : ab1369c8:669b3be5:14975abc:932ab79d (local to host Testbox)
         Events : 0.40

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       0        0        1      removed
       2       8       33        2      active sync   /dev/sdc1

       3       8       17        -      faulty spare   /dev/sdb1

~ # cat /proc/mdstat 
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] 
md0 : active raid5 sdb1[3](F) sdc1[2] sda1[0]
      11245184 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
I then try to copy the test file to a drive that is not in the RAID.
Here are the errors:
Code:
attempt to access beyond end of device
md0: rw=0, want=19245217848, limit=22490368
Buffer I/O error on device md0, logical block 2405652230
attempt to access beyond end of device
md0: rw=0, want=6566075848, limit=22490368
Buffer I/O error on device md0, logical block 820759480
attempt to access beyond end of device
md0: rw=0, want=27860599552, limit=22490368
Buffer I/O error on device md0, logical block 3482574943
attempt to access beyond end of device
md0: rw=0, want=16777306888, limit=22490368
Buffer I/O error on device md0, logical block 2097163360
...
I have googled this error, and it looks like others have had similar issues, but I was unable to find a solution.
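Since the "want" values are far beyond the device size, one way to narrow down whether the filesystem metadata or the array itself is at fault is to compare their sizes and run a read-only filesystem check (a diagnostic sketch, not a fix):

```shell
# The filesystem's own idea of its size
dumpe2fs -h /dev/md0 | grep -i 'block count\|block size'

# The array size as the kernel sees it, in bytes
blockdev --getsize64 /dev/md0

# Read-only ext3 check on the unmounted, degraded array
umount /dev/md0
fsck.ext3 -n /dev/md0
```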
 
Old 04-14-2010, 10:43 PM   #5
anuragccsu
LQ Newbie
 
Registered: Jun 2009
Posts: 11

Rep: Reputation: 0
Hi there,

From the error messages ("attempt to access beyond end of device") I assume that your hard disk itself has some bad sectors: after the failure you are able to write some data again, and you run into errors when you hit a bad sector on the disk. Could you please carry out the same test with some other (known-good) disks?
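If bad sectors are the suspicion, the member disks can also be checked directly (assuming smartmontools is installed; device names as in this thread):

```shell
# SMART health summary and logged errors for a member disk
smartctl -H -l error /dev/sdb

# Non-destructive read-only surface scan of the partition (slow)
badblocks -sv /dev/sdb1
```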

Thanks
Anurag
 
Old 04-15-2010, 01:12 PM   #6
dbrazeau
Member
 
Registered: Aug 2009
Distribution: Fedora, OpenSuse, DENX Embedded Linux
Posts: 162

Original Poster
Rep: Reputation: 28
Quote:
Originally Posted by anuragccsu View Post
Hi there,

From the error messages ("attempt to access beyond end of device") I assume that your hard disk itself has some bad sectors: after the failure you are able to write some data again, and you run into errors when you hit a bad sector on the disk. Could you please carry out the same test with some other (known-good) disks?

Thanks
Anurag
I'll give it a go with some other disks, but I'm not sure that is the problem, since I have not seen any errors while the RAID is running in non-degraded mode. I have also run some heavy data integrity tests on the non-degraded RAID with no errors; if there were a bad sector, I would expect the data integrity test to fail.
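One simple way to repeat that kind of integrity check is to compare checksums before and after degrading the array (a sketch; paths are examples):

```shell
# Record a checksum while the array is healthy
md5sum /mnt/raid/testfile.tar > /tmp/testfile.md5

# Fail a member, then verify the file still reads back intact
mdadm /dev/md0 --fail /dev/sdb1
md5sum -c /tmp/testfile.md5
```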
 
  



