ext3_readdir: directory #xxx contains a hole at offset
I am now on the fifth rebuild of my RedHat Linux 8 system and am getting a bit frustrated.
Here's my issue: I've got an ASUS P2B-DS motherboard with a pair of 450MHz P3's and just under 1GB of RAM. I'm using a Maxtor ATA/133 IDE controller with four brand new IDE drives: two identical 200GB Maxtors and two identical 120GB Seagates. I'm running 2.4.20-24.8smp #1 SMP Mon Dec 1 13:19:19 EST

[root@webserver log]# /sbin/hdparm -i /dev/hde

/dev/hde:

 Model=ST3120026A, FwRev=3.06, SerialNo=3JT2ETEF
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=8192kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=234441648
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 *udma5
 AdvancedPM=no WriteCache=enabled
 Drive conforms to: ATA/ATAPI-6 T13 1410D revision 2: 1 2 3 4 5 6

[root@webserver log]# /sbin/hdparm -i /dev/hdg

/dev/hdg:

 Model=Maxtor 6Y200P0, FwRev=YAR41BW0, SerialNo=Y6188P3E
 Config={ Fixed }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=57
 BuffType=DualPortCache, BuffSize=7936kB, MaxMultSect=16, MultSect=16
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=268435455
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes: pio0 pio1 pio2 pio3 pio4
 DMA modes: mdma0 mdma1 mdma2
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: (null): 1 2 3 4 5 6 7

I do a full install of RedHat 8 and apply all of the up2date patches.
Within about a week, I start getting lots of the following messages:

Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320040960
Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320045056
Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320049152
Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320053248

Note that the offsets are a bit ridiculous. This directory obviously doesn't extend that far... Here's some info from debugfs on the directory named in those errors:

[root@webserver log]# /sbin/debugfs /dev/hde3
debugfs 1.27 (8-Mar-2002)
debugfs: stat <32898>
Inode: 32898   Type: directory   Mode: 0755   Flags: 0x0   Generation: 3797941428
User: 23   Group: 23   Size: 320081920
File ACL: 0   Directory ACL: 0
Links: 2   Blockcount: 8
Fragment:  Address: 0   Number: 0   Size: 0
ctime: 0x3fd3712f -- Sun Dec 7 12:27:59 2003
atime: 0x3fd9bce0 -- Fri Dec 12 07:04:32 2003
mtime: 0x3fd3712f -- Sun Dec 7 12:27:59 2003
BLOCKS:
(0):66185
TOTAL: 1

I originally set the machine up with software RAID (/dev/md), but found a thread that implied that it had problems with SMP. I then converted to plain EXT3 volumes and am having the same problem. Note that there are NO disk I/O errors logged in /var/log/messages. The system just spontaneously starts spitting out these "hole" messages, typically around 04:00, some time within a week of the initial install.

As a side note, I originally partitioned my system volume with just a "/", a "/boot" and a swap partition. In an attempt to limit the spread of the corruption, on the second rebuild I created individual filesystems for each of the major directories. Since then, the corruption seems to be directed at "/var" and "/usr".
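For anyone who wants to sanity-check the numbers in the log and debugfs output above, here is a minimal Python sketch. It assumes a 4096-byte filesystem block size (inferred from the spacing of the offsets, not stated anywhere in the output) and that debugfs reports Blockcount in 512-byte units. The function names are made up for illustration.

```python
import re

# The four kernel messages quoted above.
LOG_LINES = [
    "Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320040960",
    "Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320045056",
    "Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320049152",
    "Dec 12 07:04:33 webserver kernel: EXT3-fs error (device ide2(33,3)): ext3_readdir: directory #32898 contains a hole at offset 320053248",
]

PATTERN = re.compile(r"directory #(\d+) contains a hole at offset (\d+)")


def parse_holes(lines):
    """Return (inode, offset) pairs for every 'contains a hole' message."""
    holes = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            holes.append((int(match.group(1)), int(match.group(2))))
    return holes


def summarize(holes, inode_size, blockcount_512, block_size=4096):
    """Compare the reported offsets with what the inode actually has allocated.

    block_size=4096 is an assumption; blockcount_512 is the debugfs
    'Blockcount' value, assumed to be in 512-byte units.
    """
    offsets = sorted(offset for _, offset in holes)
    return {
        "strides": {b - a for a, b in zip(offsets, offsets[1:])},
        "block_indices": [offset // block_size for offset in offsets],
        "allocated_bytes": blockcount_512 * 512,
        "claimed_blocks": inode_size // block_size,
    }


holes = parse_holes(LOG_LINES)
summary = summarize(holes, inode_size=320081920, blockcount_512=8)
print(summary)
```

Under those assumptions, the inode has a single 4096-byte block allocated while its Size field claims roughly 78,000 blocks, and each "hole" message corresponds to one unallocated block inside that claimed range. That is consistent with a corrupted size field in the inode rather than a genuinely huge directory.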
One other point: I've only seen this issue on the Seagate 120GB drives, which contain everything except /home and are only about 30% full. The Maxtor 200GB drives, which probably contain 10x more files, have never had problems.

At this point, I am becoming suspicious of the following:
- Maxtor IDE controller support
- Seagate drives on Maxtor IDE controller
- SMP issues
- Newer kernels

Has anyone else seen this issue? Any suggestions???
Filesystem corruption with Maxtor version of Promise ATA controllers and Seagate drvs
Wow! I would have saved myself a lot of pain and agony if I had just taken notes...:rolleyes:
Okay, here I am, about seven months later and going down the same path (I completely forgot about the previously posted experience). The resolution to the aforementioned issue was to move all four drives onto the internal IDE controllers (PIIX4 if I remember correctly), and put a SCSI CDROM on the internal SCSI bus. It worked flawlessly and I pushed the whole thing out of my mind (obviously).

Flash forward seven months--I've moved this system over to an Intel L440GX+ motherboard and upgraded to 750MHz PIII's. I also just finished upgrading my kernel to 2.4.26. Of course, if it ain't broke, break it, so...

I've recently started lusting after Fedora Core 2 and 2.6, so I started looking at upgrading. Unfortunately, I ran into a nasty bug that had appeared in kernels shipped with Redhat prior to 8, and has cropped up again in the 2.6 code (see Redhat bugzilla #107880). Basically, when trying to do the Fedora Core 2 install/upgrade, it hangs at the "Loading AIC7XXX driver...". If you really want to know the details, check out the history with this MOBO and Linux... it's interesting--apparently a BIOS issue. I did, however, find that if you do the install with a "linux noprobe", it bypasses the AIC7XXX issue, but leaves me CDROM-less.

So, the obvious solution (having forgotten the above posting) was to, you guessed it... install a Maxtor Ultra ATA controller and move the drives over to it, then put the CDROM on the IDE bus. :)

To make a very long and painful story short (I am recovering from pneumonia, so I've been banging on this full time for about 1 1/2 weeks), I ended up in exactly the same situation that I was in last December. It was only in Googling for this issue that I came across the above posting. Now, here's what's changed since that last posting:
The test is fairly trivial. Basically, all I have to do is mount the Seagate drive:

    mount /dev/hde1 /mnt

Then sync up the filesystems using rsync (the ReiserFS experiment was done with tar, with the same results):

    rsync -axq --delete --exclude "/var/spool/squid" / /mnt

After some period of time, I'll start seeing nasty I/O errors and such. An fsck of the drive reveals massive corruption. Naturally, doing the same on the Maxtor drive gives no issues whatsoever.

Does anyone have any ideas? :Pengy:
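For anyone who wants to repeat the test described above, here is a rough Python sketch of the sequence. It is not the script that was actually used. The mount and rsync commands are the ones quoted above; the umount step and the read-only `e2fsck -f -n` check are assumptions about how one might follow up (and only apply to an ext2/ext3 target), and the function names are made up. By default it only prints the commands.

```python
import shlex
import subprocess


def build_commands(device="/dev/hde1", mountpoint="/mnt"):
    """Return the test sequence as argument lists (no shell involved)."""
    return [
        ["mount", device, mountpoint],
        # Same rsync invocation as in the post.
        ["rsync", "-axq", "--delete", "--exclude", "/var/spool/squid",
         "/", mountpoint],
        # Assumed follow-up: unmount, then a forced, read-only check.
        ["umount", mountpoint],
        ["e2fsck", "-f", "-n", device],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if dry_run:
            continue
        result = subprocess.run(cmd)
        print("exit status:", result.returncode)
        # e2fsck exits non-zero when it finds problems, which is the
        # interesting outcome here; any other failure aborts the run so
        # that rsync never writes into an unmounted /mnt.
        if result.returncode != 0 and cmd[0] != "e2fsck":
            raise SystemExit("stopping: %s failed" % cmd[0])


commands = build_commands()
run(commands)  # dry run: prints the four commands without executing them
```

Calling `run(commands, dry_run=False)` as root would actually execute the sequence. Because `--delete` is destructive, double-check the device and mount point first.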
Well, it looks like I have proof of an incompatibility between the Maxtor version of the Promise IDE controller and (some) Seagate drives. I have replaced the Maxtor TX2 controller with an SIIG Ultra ATA/133 IDE controller and the system has been rsyncing flawlessly for the last 18 hours. I now need to determine whether this is a Linux issue or a Maxtor issue.
In any case, if you have a Maxtor Ultra100 or Ultra133 controller, stick with Maxtor drives. With other drives, YMMV.

Brett