LinuxQuestions.org


exodist 11-27-2006 10:15 PM

software raid, corruption, not problem w/ media
 
first off some specs:

Pentium4 3.0 HT disabled
1gb ram
VIA onboard IDE w/ 320 GB Maxtor HD
VIA onboard SATA w/ 1 250 GB Western Digital HD
first Silicon Image SATA controller, 4 Western Digital 250 GB drives
second Silicon Image SATA controller, 2 Seagate 300 GB HDs
Gentoo up to date

What I have tried / the problem: I will list what I have done, the alternatives I have tried to fix it with, and the problem itself. Extra debug info will follow.
I create RAID arrays; I have tried levels 1 and 5. Creating the array and making the filesystem works fine up to this point (tried both xfs and reiserfs, same problem occurs on both). I have tried several combinations of drives and controllers; all controllers/drives have the same problem. I have tried both the 2.6.17 and 2.6.18 kernels, no love.

Basically I create the raid, wait for it to resync, then create the filesystem (xfs or reiserfs), then copy a lot of files to it. I then try to delete some stuff, read some stuff, or modify the drive in some way. Stuff fails or even segfaults, requiring a reboot. dmesg shows that io was trying to access way beyond the end of the device numerous times, going out to arbitrary block numbers that are often 10+ digits long. I run the filesystem's scanning/repairing utility and it finds tons of files with incorrect sizes/lengths. Usually this is it, but a few times on reiser it has found corruption requiring the tree to be rebuilt.
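One way to see exactly which files come back different after the copy step is to compare checksums of the originals against the copies on the array. The following is a minimal sketch, not something used in this thread: the function names `sha256_of` and `compare_trees` are made up for illustration, and it assumes the original files still exist on a known-good non-raid partition. Recently written data may be served from the page cache, so unmounting and remounting the array before comparing is probably needed for the read to actually hit the disks.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src_root, dst_root):
    """Return sorted relative paths that differ or could not be read.

    src_root: directory holding the original files (hypothetical path).
    dst_root: directory on the array holding the copies (hypothetical path).
    """
    bad = []
    for dirpath, _dirs, files in os.walk(src_root):
        for name in files:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            try:
                if sha256_of(src) != sha256_of(dst):
                    bad.append(rel)
            except OSError:
                # Missing copy or an I/O error on either side.
                bad.append(rel)
    return sorted(bad)
```

Calling `compare_trees("/path/to/originals", "/path/to/raid/copy")` (both paths are placeholders) returns an empty list when every copy matches; any I/O error while reading is reported as a bad file rather than aborting the run.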

If I repair the fs and then scan it, it is clean. I mount it, then unmount it and scan again: still clean. I try to make changes on the drive and once again get errors, and the scan says repair is needed. Same drill as before: it all gets fixed, then the scan says clean.

Once again I have tried different filesystems, different kernels, different raid builders (both mdadm and raidtools)

Filesystems on non-raid partitions (including all partitions used in the raids) are not corrupted after much use.

As far as I can narrow it down, the problem is in the raid array, not the filesystem or devices.

Forgot to mention: simply copying data from the drive also seems to ?cause? corruption. Like I said, I can repair with the fs repair tool, and after that a scan says it is good. I then mount it, unmount it, and scan again, and it still says good. But when I start copying files from it, after the first few I start to get read errors and dmesg gets the "access beyond end of device" errors. Then a scan finds the corruption... I have not tried mounting the fs read-only yet; when this rebuild-tree is done I will do that (a 700gb raid takes a long time to repair).

some extra stuff:

Kernel messages when the errors occur (this is after repairing, scanning and finding it clean, then setting the raid read-only and mounting read-only, then trying to copy data off of it)
Code:

md8: rw=0, want=4261343920, limit=1465175424
attempt to access beyond end of device
md8: rw=0, want=4261343920, limit=1465175424
Buffer I/O error on device md8, logical block 532667989
attempt to access beyond end of device
md8: rw=0, want=4261343920, limit=1465175424
Buffer I/O error on device md8, logical block 532667989
attempt to access beyond end of device
md8: rw=0, want=17166515696, limit=1465175424
attempt to access beyond end of device
md8: rw=0, want=17166515696, limit=1465175424
Buffer I/O error on device md8, logical block 2145814461
attempt to access beyond end of device
md8: rw=0, want=17166515696, limit=1465175424
Buffer I/O error on device md8, logical block 2145814461
attempt to access beyond end of device
md8: rw=0, want=18446744068902090496, limit=1465175424
attempt to access beyond end of device
md8: rw=0, want=18446744068902090496, limit=1465175424
Buffer I/O error on device md8, logical block 18446744073108618975
attempt to access beyond end of device
md8: rw=0, want=18446744068902090496, limit=1465175424
Buffer I/O error on device md8, logical block 18446744073108618975
attempt to access beyond end of device
md8: rw=0, want=4597359648, limit=1465175424
attempt to access beyond end of device
md8: rw=0, want=4597359648, limit=1465175424
Buffer I/O error on device md8, logical block 574669955
attempt to access beyond end of device
md8: rw=0, want=4597359648, limit=1465175424
Buffer I/O error on device md8, logical block 574669955
attempt to access beyond end of device
md8: rw=0, want=18446744070477053544, limit=1465175424
attempt to access beyond end of device
md8: rw=0, want=18446744070477053544, limit=1465175424
Buffer I/O error on device md8, logical block 18446744073305489356
attempt to access beyond end of device
md8: rw=0, want=18446744070477053544, limit=1465175424
Buffer I/O error on device md8, logical block 18446744073305489356
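
The 20-digit want= values in the log above sit just below 2^64, so they read more naturally as negative numbers. The sketch below decodes them. It assumes that want/limit are counted in 512-byte sectors and that the "logical block" numbers are 4096-byte blocks (8 sectors each); both are assumptions, but the logged numbers are consistent with them, since every want equals (logical block + 1) * 8 modulo 2^64, and the limit works out to roughly 699 GiB, matching the 700gb array. The helper names are made up for illustration, and this only reinterprets the numbers; it does not say where the bad block numbers come from.

```python
import re

SECTOR_BYTES = 512        # assumption: want/limit are in 512-byte sectors
SECTORS_PER_BLOCK = 8     # assumption: 4096-byte logical blocks
WANT_RE = re.compile(r"want=(\d+), limit=(\d+)")


def to_signed64(value):
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value."""
    return value - (1 << 64) if value >= (1 << 63) else value


def want_from_block(block):
    """Sector just past the end of a logical block, wrapped to 64 bits."""
    return ((block + 1) * SECTORS_PER_BLOCK) % (1 << 64)


def decode(line):
    """Return (signed want, limit, limit in GiB) or None if no match."""
    m = WANT_RE.search(line)
    if m is None:
        return None
    want, limit = int(m.group(1)), int(m.group(2))
    return to_signed64(want), limit, limit * SECTOR_BYTES / 2**30


if __name__ == "__main__":
    samples = [
        "md8: rw=0, want=4261343920, limit=1465175424",
        "md8: rw=0, want=18446744068902090496, limit=1465175424",
    ]
    for s in samples:
        want, limit, gib = decode(s)
        print(f"want={want} limit={limit} ({gib:.1f} GiB)")
```

Read this way, want=18446744068902090496 is -4807461120 and want=18446744070477053544 is -3232498072, i.e. the requested block numbers are garbage in both directions (far past the end of the device and "before" its start), not just too large.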

Here is blotted-out output from rsync when the errors occur:
Code:

Luxor hd1 # rsync -aqP /Blotted/Out/Path/U* ./
rsync: read errors mapping "/Blotted/Out/file1.xxx": Input/output error (5)
rsync: read errors mapping "/Blotted/Out/file2.xxx": Input/output error (5)
rsync: read errors mapping "/Blotted/Out/file3.xxx": Input/output error (5)
rsync: read errors mapping "/Blotted/Out/file4.xxx": Input/output error (5)


exodist 12-05-2006 12:02 PM

The motherboard was defective, apparently, but for some reason there is only corruption with the raid array. Something to do w/ the amount of bandwidth being used.


All times are GMT -5.