SuSE: large reiserfs disk partitions hanging
We have recently been experiencing a problem with a Linux server which is used as a NetBackup media server and disk cache.
The server is a Dell R710 running SuSE 10.2 x64, with an attached MD3000 disk array. The array is divided into two LUNs, the larger of which is further partitioned into two volumes. Each of the three volumes on the array is formatted with reiserfs.
During the backup runs over the last two days, the partition on the smaller LUN and the larger partition on the larger LUN have become unresponsive. No writes are being performed to these filesystems, and backup processes attempting to write to them seem to be hung. When running iostat I can see that there appear to be up to 900 read transactions per second on the disks but no write transactions. The smaller partition on the larger LUN, however, is unaffected. You also cannot get a directory listing on the affected filesystems while this is happening. Attempting to kill any of the backup "bptm" processes with kill -9 also fails if they are attempting to write to these areas.
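A process that ignores kill -9 is normally in uninterruptible sleep (state "D"), waiting inside the kernel for I/O to complete. In case it helps anyone reproduce the diagnosis, here is a minimal sketch that lists such processes by reading /proc/<pid>/stat, along with the kernel wait channel from /proc/<pid>/wchan where it is readable. It assumes a Python 3 interpreter is available on the box; the function names are my own and not part of NetBackup or any SuSE tool.

```python
import os


def parse_stat_line(line):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so use the first '(' and the last ')' as delimiters.
    left = line.index("(")
    right = line.rindex(")")
    pid = int(line[:left].strip())
    comm = line[left + 1:right]
    state = line[right + 1:].split()[0]
    return pid, comm, state


def read_wchan(pid, proc_root="/proc"):
    """Return the kernel wait channel for pid, or '' if it is unreadable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as fh:
            return fh.read().strip()
    except OSError:
        return ""


def find_blocked(proc_root="/proc"):
    """Return (pid, comm, wchan) for every process currently in state D."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            # The process exited between listdir() and open(), or the
            # stat line was not in the expected format; skip it.
            continue
        if state == "D":
            blocked.append((pid, comm, read_wchan(pid, proc_root)))
    return blocked


if __name__ == "__main__":
    for pid, comm, wchan in find_blocked():
        print(f"{pid}\t{comm}\t{wchan or '?'}")
```

If the hung bptm processes show up here, the wait channel column gives a rough idea of which kernel function they are blocked in, which may help narrow down whether the stall is in the filesystem or lower in the storage stack.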
There are no errors reported in the messages log, nor are there hardware errors reported on the array.
I have tried running fsck on both affected filesystems, but it reports each one as clean after about 9 hours of processing per filesystem.
The affected filesystems are approximately 9TB and 7.5TB in size.
Any suggestions on how to resolve this would be greatly appreciated.