Distribution: Fedora, OpenSuse, DENX Embedded Linux
Posts: 184
Rep:
mdadm RAID5 degraded/rebuild access issues.
I am trying to test my system's RAID5 recovery, and I seem to be running into some issues.
I created a RAID5 array with 3 drives using mdadm. I then created an ext3 file system on the RAID and copied a 3.5GB test file to it. Next I faulted and removed one of the drives from the RAID using mdadm. Up to this point all seems well: the now-degraded RAID is still mounted and I can see the test file on it using "ls -l".
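For anyone who wants to reproduce this, the steps above can be sketched roughly as below. This is only a sketch under assumptions: the partition names are the ones shown later in this thread, while the mount point `/mnt/raid`, the `--wait` step, and the `run` dry-run helper are not from the original post. By default the script only prints the commands; running them for real needs root and destroys whatever is on those partitions.

```shell
#!/bin/sh
# Sketch of the RAID5 fault test. DRY_RUN=1 (the default) only prints the
# commands; set DRY_RUN=0 and run as root to execute them (DESTRUCTIVE).
DRY_RUN="${DRY_RUN:-1}"
MOUNT_POINT="${MOUNT_POINT:-/mnt/raid}"   # hypothetical mount point

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

raid5_fault_test() {
    # Build a 3-device RAID5 array.
    run mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1
    # Assumption: wait for the initial sync to finish before testing.
    run mdadm --wait /dev/md0
    run mkfs.ext3 /dev/md0
    run mount /dev/md0 "$MOUNT_POINT"
    run cp testfile.tar "$MOUNT_POINT/"
    # Mark one member faulty, then remove it, leaving the array degraded.
    run mdadm /dev/md0 --fail /dev/sdb1
    run mdadm /dev/md0 --remove /dev/sdb1
}

raid5_fault_test
```

Keeping the destructive commands behind a dry-run switch makes it easy to review the exact sequence before pointing it at real disks.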
This is where the trouble starts.
With the RAID still in degraded mode, I try to copy (read) the test file and get a bunch of errors. It doesn't matter whether I try to create a duplicate copy of it on the RAID or copy it to a drive outside the RAID. Adding the "faulted" drive back into the RAID and waiting for it to recover does not fix the issue; I still get the same errors.
Here are the errors I'm seeing:
Code:
# cp testfile.tar testfile_degraded.tar
attempt to access beyond end of device
md0: rw=0, want=15236514744, limit=22490368
__ratelimit: 626 callbacks suppressed
Buffer I/O error on device md0, logical block 1904564342
attempt to access beyond end of device
md0: rw=0, want=33612653048, limit=22490368
Buffer I/O error on device md0, logical block 4201581630
attempt to access beyond end of device
md0: rw=0, want=18764592712, limit=22490368
Buffer I/O error on device md0, logical block 2345574088
attempt to access beyond end of device
md0: rw=0, want=9395562552, limit=22490368
Buffer I/O error on device md0, logical block 1174445318
cp: read error: Input/output error
Does anyone know how to fix these errors?
Please let me know if there is any other information that would be helpful.
Did you check the status of the array using mdadm's -D option? What does it say? That option gives you the complete status of the RAID.
A few more things to check:
What was the status of the array before you failed the drive, and what is it after the failure?
Was your array full?
You can run the sync command to flush all unwritten buffers to disk, since your output shows some buffer I/O errors.
To me it seems your array is full. Try copying a small file, or try creating files with the touch command.
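The "is the array full" question can be checked directly instead of guessed at. Here is a small sketch; the helper name `has_room` and the paths in the usage line are made up for illustration. It compares the size of a file with the space `df` reports as available on the target file system.

```shell
# Report whether the file system holding directory $2 has room for
# another copy of file $1. Returns 0 if there is room, 1 otherwise.
has_room() {
    file_bytes=$(wc -c < "$1")
    file_kib=$(( (file_bytes + 1023) / 1024 ))   # round up to whole KiB
    # With -P -k, df prints one line per file system; column 4 is available KiB.
    avail_kib=$(df -Pk "$2" | awk 'NR == 2 { print $4 }')
    echo "file: ${file_kib} KiB, available: ${avail_kib} KiB"
    [ "$avail_kib" -gt "$file_kib" ]
}
```

Usage would be something like `has_room testfile.tar /mnt/raid && echo "array is not full"`, with the mount point adjusted to wherever the array is mounted.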
Original Poster
My RAID is not full. I created a RAID5 that's a little bigger than 11GB. As I mentioned before, my test file is about 3.5GB. At the time I fail the drive and put my RAID in degraded mode, the 3.5GB test file is the only file on the RAID, so there is still about 7.5GB free.
Also, I have no problem creating a file using touch when the RAID is in degraded mode. When I try to create a duplicate copy of the test file on the degraded RAID, some of it actually gets copied. The first time I tried, about 700MB was copied before the errors started happening, and the second time about 350MB was copied before I got the errors. Either way there should be plenty of available space on my RAID to complete my copy operation.
Look at the error returned from cp:
Code:
cp: read error: Input/output error
It seems like it is having trouble reading my test file, not writing more data to the RAID. This is also supported by the fact that, as I mentioned before, I cannot copy the test file to a separate drive that is not part of the array.
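One way to take the write side out of the picture entirely is to read the file and throw the data away. A minimal sketch (the function name `read_test` is just for illustration):

```shell
# Read a file from start to end and discard the data, so that only the
# read path is exercised. Returns 0 on success, 1 on any read error.
read_test() {
    if cat -- "$1" > /dev/null; then
        echo "read OK: $1"
    else
        echo "read FAILED: $1" >&2
        return 1
    fi
}
```

If `read_test testfile.tar` fails on the degraded array in the same way as cp does, free space on the destination cannot be the cause.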
So the basic issue is that when trying to read my test file off a degraded RAID5 I get a bunch of "out of bounds" errors.
Does anyone know what metrics are used for this error:
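On the units: as far as I know, the kernel prints want= and limit= in 512-byte sectors, and the "logical block" in the Buffer I/O line is counted in file system blocks. The arithmetic below uses the first error from this thread and assumes a 4096-byte ext3 block size (8 sectors), which is not stated anywhere in the thread but fits the numbers.

```shell
# Values copied from the first error message in this thread.
limit=22490368      # "limit=" in 512-byte sectors
want=15236514744    # "want="  in 512-byte sectors
block=1904564342    # "logical block" from the matching Buffer I/O error

# Sectors -> KiB: this matches the "Array Size : 11245184" reported by mdadm -D.
limit_kib=$(( limit * 512 / 1024 ))
echo "limit = ${limit_kib} KiB"

# Sectors -> GiB: the requested offset is in the terabyte range.
want_gib=$(( want * 512 / 1024 / 1024 / 1024 ))
echo "want is about ${want_gib} GiB into the device"

# Assumed 4096-byte blocks (8 sectors each): the end of the requested
# block lands exactly on "want".
echo "block * 8 + 8 = $(( block * 8 + 8 ))"
```

So the limit is simply the real size of the array, while the requested offsets are thousands of GiB past it. One possible reading, not confirmed anywhere in this thread, is that the file system is following block pointers that contain garbage, rather than the device having shrunk.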
Original Poster
Here are some more details. In this test I copied the 3.5GB test file to the RAID. At this point I can still read the test file back fine. Then I fault one of the partitions in my RAID. Now when I try to read the file I get the "attempt to access beyond end of device" errors. In this test I was able to copy about 3GB of the 3.5GB file before I started getting errors.
Here are the details about my RAID before failing one of the drives:
Code:
~ # mdadm -D /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Wed Apr 14 12:50:54 2010
Raid Level : raid5
Array Size : 11245184 (10.72 GiB 11.52 GB)
Used Dev Size : 5622592 (5.36 GiB 5.76 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Apr 14 13:04:46 2010
State : clean
Active Devices : 3
Working Devices : 3
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : ab1369c8:669b3be5:14975abc:932ab79d (local to host Testbox)
Events : 0.38
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 8 17 1 active sync /dev/sdb1
2 8 33 2 active sync /dev/sdc1
This is the command I use to fault one of the drives:
Code:
~ # mdadm /dev/md0 -f /dev/sdb1
raid5: Disk failure on sdb1, disabling device.
raid5: Operation continuing on 2 devices.
mdadm: set /dev/sdb1 faulty in /dev/md0
RAID5 conf printout:
--- rd:3 wd:2
disk 0, o:1, dev:sda1
disk 1, o:0, dev:sdb1
disk 2, o:1, dev:sdc1
RAID5 conf printout:
--- rd:3 wd:2
disk 0, o:1, dev:sda1
disk 2, o:1, dev:sdc1
Here are the details about my RAID after faulting the drive:
Code:
~ # mdadm -D /dev/md0
/dev/md0:
Version : 0.90
Creation Time : Wed Apr 14 12:50:54 2010
Raid Level : raid5
Array Size : 11245184 (10.72 GiB 11.52 GB)
Used Dev Size : 5622592 (5.36 GiB 5.76 GB)
Raid Devices : 3
Total Devices : 3
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Wed Apr 14 13:18:40 2010
State : clean, degraded
Active Devices : 2
Working Devices : 2
Failed Devices : 1
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 64K
UUID : ab1369c8:669b3be5:14975abc:932ab79d (local to host Testbox)
Events : 0.40
Number Major Minor RaidDevice State
0 8 1 0 active sync /dev/sda1
1 0 0 1 removed
2 8 33 2 active sync /dev/sdc1
3 8 17 - faulty spare /dev/sdb1
~ # cat /proc/mdstat
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[3](F) sdc1[2] sda1[0]
11245184 blocks level 5, 64k chunk, algorithm 2 [3/2] [U_U]
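The `[3/2] [U_U]` part of the mdstat output is the quickest way to spot the degraded state: each `U` is a member that is up and each `_` is a missing one. A small sketch that pulls this out of mdstat-style text (the helper name `md_health` is made up for illustration):

```shell
# Read /proc/mdstat-style text on stdin and report the member status of
# the array named in $1, e.g. "degraded [U_U]" or "ok [UUU]".
md_health() {
    awk -v dev="$1" '
        # The array header line starts with the device name, e.g. "md0 : active ...".
        $1 == dev { found = 1; next }
        # The following line carries the status map made only of U and _.
        found && match($0, /\[[U_]+\]/) {
            status = substr($0, RSTART, RLENGTH)
            print (index(status, "_") ? "degraded" : "ok") " " status
            exit
        }
    '
}
```

On a live system this would be called as `md_health md0 < /proc/mdstat`.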
I then try to copy the test file to a drive that is not in the RAID.
Here are the errors:
Code:
attempt to access beyond end of device
md0: rw=0, want=19245217848, limit=22490368
Buffer I/O error on device md0, logical block 2405652230
attempt to access beyond end of device
md0: rw=0, want=6566075848, limit=22490368
Buffer I/O error on device md0, logical block 820759480
attempt to access beyond end of device
md0: rw=0, want=27860599552, limit=22490368
Buffer I/O error on device md0, logical block 3482574943
attempt to access beyond end of device
md0: rw=0, want=16777306888, limit=22490368
Buffer I/O error on device md0, logical block 2097163360
...
I have googled this error, and it looks like others have had similar issues, but I was unable to find any solution to resolve it.
From the error messages (attempt to access beyond end of device), I assume your hard disk has some bad sectors: after the failure you can still write some data, and you run into errors only when you hit a bad sector on the disk. Could you please repeat the same test with some other disks that are known to be good?
Original Poster
Quote:
Originally Posted by anuragccsu
Hi there,
From the error messages (attempt to access beyond end of device), I assume your hard disk has some bad sectors: after the failure you can still write some data, and you run into errors only when you hit a bad sector on the disk. Could you please repeat the same test with some other disks that are known to be good?
Thanks
Anurag
I'll give it a go with some other disks, but I'm not so sure that is the problem, since I have not seen any errors while the RAID is running in non-degraded mode. I have also run some heavy data integrity tests on the non-degraded RAID with no errors. If there were a bad sector, I would expect the data integrity test to fail.
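One caveat about testing only in non-degraded mode: on a healthy RAID5, ordinary reads are served from the data chunks and parity is not consulted, so a file-level integrity test would not notice bad parity. The md driver can be asked to compare data against parity through sysfs. A sketch, assuming the array is md0 and the kernel exposes the usual `sync_action` and `mismatch_cnt` attributes (the helper names are made up for illustration):

```shell
# Directory with the md sysfs attributes; /sys/block/md0/md is an
# assumption based on the array name used in this thread.
MD_SYSFS="${MD_SYSFS:-/sys/block/md0/md}"

# Ask md to read every stripe and compare data against parity (needs root).
start_check() {
    echo check > "$MD_SYSFS/sync_action"
}

# Print the mismatch count found by the last check.
mismatches() {
    cat "$MD_SYSFS/mismatch_cnt"
}
```

After `start_check` has finished (progress shows up in /proc/mdstat), a non-zero value from `mismatches` on the healthy array would point at inconsistent parity rather than at bad sectors.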