Greetings
I have had this same problem:
end_request: I/O Error, dev sda
Buffer I/O error on device dm-2, logical block 326
SCSI error: <1 0 0 0> return code=0x70000
The dmesg shows 7 iterations of the following:
SCSI error : <1 0 0 0> return code = 0x70000
end_request : I/O error, dev sda, sector 3031605
We run ServeRAID on this as well, on RHEL4u2. This partition (dm-2) is a very large ext3 filesystem that serves as the mount for a postgres database that is very heavily used. The partition is in LVM2, so I thought it might be an LVM issue, but:
The live dm data corresponds to what we see in the LVM2 metadata, so we can probably rule out any problem at the volume manager/device mapper layer:
Code:
vg00-lvol01: 0 2097152 linear 8:2 384
vg00-lvol09: 0 10289152 linear 8:2 643891584
vg00-lvol08: 0 4128768 linear 8:2 639762816
vg00-lvol10: 0 632881152 linear 8:2 2752896
vg00-lvol07: 0 4128768 linear 8:2 635634048
vg00-lvol06: 0 655360 linear 8:2 2097536
vg00-lvol05: 0 4128768 linear 8:2 679870848
vg00-lvol04: 0 6160384 linear 8:2 673710464
vg00-lvol03: 0 18481152 linear 8:2 655229312
vg00-lvol02: 0 1048576 linear 8:2 654180736
These linear mappings correspond to the device regions identified in the /etc/lvm/backup/vg00 metadata file, for example:
Code:
lvol10 {
    id = "v1j3Ii-GqDO-tVHF-y845-kEQj-vWe5-l4NT7f"
    status = ["READ", "WRITE", "VISIBLE"]
    segment_count = 1
    segment1 {
        start_extent = 0
        extent_count = 9657 # 301.781 Gigabytes
        type = "striped"
        stripe_count = 1 # linear
        stripes = [
            "pv0", 42
        ]
    }
}
pv0 is sda2 (8:2):
pv0 {
    id = "0MWHVn-TYKx-0ifq-jCw8-KnrK-LVLD-BH5QGg"
    device = "/dev/sda2" # Hint only
    status = ["ALLOCATABLE"]
    pe_start = 384
    pe_count = 10625 # 332.031 Gigabytes
}
Extent 42 above ("pv0", 42) puts us right at the beginning of the region on the disk that is throwing back all the SCSI errors. The region throwing the errors is only about 27k in size, is high up on the device, and corresponds to the journal itself (for the ext3 filesystem).
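To double-check the extent arithmetic, here is a rough Python sketch that recomputes where lvol10 should start on sda2 and compares it with the live dm table line. The extent size is NOT shown in the metadata excerpt; 65536 sectors (32 MiB) is my assumption, inferred from "9657 extents = 301.781 Gigabytes". The function names are just for illustration.

```python
# Sketch: recompute the linear mapping for lvol10 from the LVM2 metadata.
# dm tables and pe_start are expressed in 512-byte sectors.
SECTOR_BYTES = 512

PE_START = 384        # pv0 pe_start (sectors), from the metadata
START_PE = 42         # first physical extent, from stripes = ["pv0", 42]
EXTENT_COUNT = 9657   # lvol10 extent_count, from the metadata

# ASSUMPTION: extent_size is not in the excerpt; 32 MiB is inferred
# from 9657 extents being reported as 301.781 Gigabytes.
EXTENT_SECTORS = 65536


def lv_start_sector(pe_start, start_pe, extent_sectors):
    """Offset of the segment on the PV, in sectors."""
    return pe_start + start_pe * extent_sectors


def lv_length_sectors(extent_count, extent_sectors):
    """Length of the segment, in sectors."""
    return extent_count * extent_sectors


def parse_linear_line(line):
    """Parse one linear dm table line as quoted in this post."""
    name, rest = line.split(":", 1)
    start, length, target, device, offset = rest.split()
    return {
        "name": name,
        "start": int(start),
        "length": int(length),
        "target": target,
        "device": device,
        "offset": int(offset),
    }


live = parse_linear_line("vg00-lvol10: 0 632881152 linear 8:2 2752896")
expected_offset = lv_start_sector(PE_START, START_PE, EXTENT_SECTORS)
expected_length = lv_length_sectors(EXTENT_COUNT, EXTENT_SECTORS)

print("offset matches:", live["offset"] == expected_offset)
print("length matches:", live["length"] == expected_length)
print("size in GiB:", expected_length * SECTOR_BYTES / 2**30)
```

Under that extent-size assumption both the offset and the length agree with the live table, which is why I am ruling out the LVM/device mapper layer.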
So I suspected a problem with how the ips driver or firmware is dealing with the I/Os being sent down from the jbd (journaling block device) driver. Immediately remaking the partition as ext2 resolves the problem ... hmmm ...
Anyway, I wanted to keep it as ext3, so I tried increasing the commit interval to 30 seconds by editing /etc/fstab. For example:
Code:
/dev/vg0/varvol /var ext3 commit=30 1 2
Then remount or reboot for the setting to take effect on the filesystem.
No go with that, so I also tried decreasing the block flushing frequency via dirty_writeback_centisecs and dirty_expire_centisecs:
Code:
# /etc/sysctl.conf
vm.dirty_expire_centisecs = 8000
vm.dirty_writeback_centisecs = 2000
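For anyone checking the units: those values are in centiseconds. A small Python sketch (the helper names are mine, nothing standard) that parses sysctl.conf-style lines and converts the two settings to seconds:

```python
# Sketch: read sysctl.conf-style text and convert centisecond values to seconds.
SYSCTL_TEXT = """\
vm.dirty_expire_centisecs = 8000
vm.dirty_writeback_centisecs = 2000
"""


def parse_sysctl(text):
    """Return {key: value} for 'key = value' lines, ignoring comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def centisecs_to_seconds(value):
    """Convert a centisecond setting (string or int) to seconds."""
    return int(value) / 100


settings = parse_sysctl(SYSCTL_TEXT)
for key in ("vm.dirty_expire_centisecs", "vm.dirty_writeback_centisecs"):
    print(key, "=", centisecs_to_seconds(settings[key]), "seconds")
```

So with these settings dirty data is considered old enough to write out after 80 seconds, and the writeback thread wakes every 20 seconds.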
Again, no go. smartd is disabled, etc., along with all the original items posted in this thread. We have this problem at multiple locations running the same configuration. The funny thing is that the ext2 filesystem is now spitting out the SCSI I/O errors on dm-2 (lvol10), the postgres mount partition, but only when doing a dd to the partition. I tried the irqpoll boot option; again, no go. It ain't a hardware issue, as 18 different sites are having the same problem ...
Anyone?? ;-)