LinuxQuestions.org

jkobrien 01-30-2004 07:16 AM

recurring ext3 error
 
Hi,

I have a dual-boot system, running 2.4.23 (RH9) on the Linux partition. I've had to switch back and forth between the two quite a bit lately and am forced to run a manual fsck every third or fourth reboot.

When I come in the morning, having left the computer at the linux login screen (runlevel 3) there's often the following message on the screen...

EXT3-fs error (device ide0(3,1)): ext3_readdir: bad entry in directory #11: rec_len % 4 != 0 - offset=0, inode = 1146636005, rec_len=8247, name_len=32.
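
The complaint in that message is that the record length of a directory entry is not a multiple of 4. A minimal Python sketch of pulling the fields out of such a line and repeating the check is shown here; the regular expression and the name `parse_dir_error` are illustrative assumptions based only on the message quoted in this thread, not the kernel's own format definition.

```python
import re

# Assumed field layout, inferred from the message quoted in this thread.
PATTERN = re.compile(
    r"bad entry in directory #(?P<dir>\d+): (?P<reason>.*?) - "
    r"offset\s*=\s*(?P<offset>\d+), inode\s*=\s*(?P<inode>\d+), "
    r"rec_len\s*=\s*(?P<rec_len>\d+), name_len\s*=\s*(?P<name_len>\d+)"
)


def parse_dir_error(line):
    """Return the numeric fields of a 'bad entry in directory' line, or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = {k: int(v) for k, v in match.groupdict().items() if k != "reason"}
    fields["reason"] = match.group("reason")
    return fields


msg = ("ext3_readdir: bad entry in directory #11: rec_len % 4 != 0 - "
       "offset=0, inode = 1146636005, rec_len=8247, name_len=32.")
entry = parse_dir_error(msg)

# A directory entry's rec_len must be a multiple of 4; this one is not.
print("directory inode:", entry["dir"])
print("rec_len % 4 =", entry["rec_len"] % 4)
```

Keeping the parsed fields from each occurrence makes it easier to tell later whether the numbers are the same from one boot to the next.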

My / and /boot partitions are both ext3. Is ext3 a bad idea for /boot? Someone said this to me somewhere once, but I can't remember where or why.

How can I tell which directory is #11? Also, I've tried searching for that inode but turn up nothing.

Any suggestions?

Thanks in advance,

John

moses 01-30-2004 07:30 AM

/boot isn't changing much, so ext3 isn't really necessary. It shouldn't really matter, but if it concerns you, you can remove the journaling from an ext3 filesystem (revert to ext2) without too much of a problem. Read the man page for tune2fs.

Is the error message always the same, or is it similar to the one above?

jkobrien 01-30-2004 07:54 AM

It's always the same message. I couldn't swear to the inode, and some of the other numbers, but other than that it's the same.

Yeah, I thought of switching back to ext2, but wouldn't that be just switching off the error message? Or could the journalling itself be causing the problem? I guess this is the same thing that's causing me to have to do a manual fsck so often.

Thanks for the reply,

John

Shade 01-30-2004 02:18 PM

You're sure it's being halted cleanly between reboots?

--Shade

moses 01-31-2004 12:10 AM

Quote:

Originally posted by jkobrien
It's always the same message. I couldn't swear to the inode, and some of the other numbers, but other than that it's the same.

Yeah, I thought of switching back to ext2, but wouldn't that be just switching off the error message? Or could the journalling itself be causing the problem? I guess this is the same thing that's causing me to have to do a manual fsck so often.

Thanks for the reply,

John

Actually, the numbers are the important part. If they're the same, you may just have a bad cluster on your disk. If they're changing, then something more random is happening. . . IF the journal is the problem, then turning it off will do away with the error (on a level other than just hiding it).
The error IS an ext3 error. I recommend removing the journal (you can always go back to ext3 if it's not the problem) and looking for errors. . .

jkobrien 02-02-2004 11:01 AM

Hi,

Shade, yes, the reboots are always clean - /sbin/reboot or /sbin/init 0.

Moses, a bad cluster could be it alright. Between filesystem checks, the error messages are identical. I've just run fsck and will wait a day or two to see if they're still the same. After that I'll convert /boot to ext2 and let you know what happens.

Thanks again,

John

moses 02-03-2004 12:57 AM

If it's something physically wrong with the drive, changing partitions won't fix it. . . Back up your /boot to somewhere safe (CD-ROM) before you make any changes. . . Back up your other important stuff from this drive too.

jkobrien 02-16-2004 12:35 PM

Hi,

Remember me?

I left my system for a few days booted to linux and didn't see the error recur. Then I rebooted to MS-Win for another few days and when I rebooted back to linux there was the same error message again. The only difference being the "rec_len = 8247" above was now "rec_len = 8259". All other numbers were identical. Could this be a symptom of a bad cluster? When I ran fsck, there was only one error reported this time, as opposed to quite a list the last time.

It seems to me like something that happens while booted to MS-Win is causing the problem. All I can think of is a nightly virus scan that our sysadmin has scheduled. I wouldn't have thought that MS-Win programs would even be aware of the linux partitions though.

I've now converted /boot to ext2 (and backed up! Good advice!) and will wait again for a few days to see what happens.

John

moses 02-19-2004 11:22 PM

I don't know what Windows could do to the partition. My first inclination is to think it's just a coincidence that this happened after booting to Windows. However, it'll stay on the back burner. . .
How often were you rebooting to Linux? How long would the system stay in Linux? How long were you using Linux before you'd notice the error messages? Have you looked in the syslog, /var/log/messages, and dmesg for the error? Does it occur during boot, or at some other time?

jkobrien 03-25-2004 11:29 AM

Hi,

I'm a bit anal about loose ends. So this is just to report that since converting the /boot partition back to ext2 that error hasn't recurred and there's no sign yet of any other repercussions.

I think it was some sort of journalling error. The message came up after every third or fourth reboot or sometimes would appear at the login prompt and then wouldn't recur for a few weeks (which is why I've waited so long to pronounce it gone).

Anyway, academic now I guess.

John

moses 03-25-2004 08:08 PM

I'm not convinced it's just the filesystem error--usually there's a "good" reason for the error, and with filesystems, it's usually either a bug (those get reported relatively quickly on filesystems since it's very very important to have a reliable filesystem) or a disk problem. I'm guessing that your issue is actually a disk problem that isn't being "activated" as often by ext2 as it was by ext3. I would still be careful about backups. . .

jkobrien 03-26-2004 02:57 AM

Thanks for the tip, moses. I'll do that.

If I could have tracked down the error, I would have stuck with ext3 but as I couldn't trace the directory or file node, I couldn't get anywhere with it.

John

jkobrien 04-15-2004 07:23 AM

Hi,

Back from the "spoke-too-soon" department. That error has recurred, in a slightly different format this time - presumably because the filesystem is now ext2 rather than ext3.

EXT2-fs error (device ide0(3,1)): ext2_check_page: bad entry in directory #11: unaligned directory entry - offset=1024, inode = 827868901, rec_len = 13622, name_len = 32
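
One way to look at these numbers is as the raw bytes the kernel read from the disk. An ext2/ext3 directory entry starts with a 4-byte inode number, a 2-byte record length and a 1-byte name length, all little-endian. The sketch below packs the three sets of values reported in this thread back into that layout; the helper name `raw_header` is made up for illustration, and the second reading assumes the inode was unchanged, as reported on 02-16.

```python
import struct


def raw_header(inode, rec_len, name_len):
    """Pack the reported fields the way they sit on disk:
    little-endian 32-bit inode, 16-bit rec_len, 8-bit name_len."""
    return struct.pack("<IHB", inode, rec_len, name_len)


# (inode, rec_len, name_len) as quoted in the three reports in this thread.
readings = [
    (1146636005, 8247, 32),   # first report (ext3)
    (1146636005, 8259, 32),   # 02-16 report: only rec_len had changed
    (827868901, 13622, 32),   # 04-15 report (ext2)
]

for inode, rec_len, name_len in readings:
    data = raw_header(inode, rec_len, name_len)
    print(data.hex(" "), repr(data))
```

Apart from the leading 0xE5 byte, every byte in all three readings is printable ASCII (letters, digits and spaces). That may suggest the block holds data that was never a directory entry, rather than an entry with a single flipped bit. A first byte of 0xE5 followed by space-padded upper-case text also resembles a deleted entry in a FAT directory, but that is speculation from the numbers alone, not a diagnosis.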

find . -inum 827868901 gives me

./proc/1325/fd/4: No such file or directory
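
A search from / like the one above wanders into /proc and other mounted filesystems, and inode numbers are only unique within one filesystem (find has -xdev to stay on one). The inode value in the error message is also read from the damaged entry itself, so it may not belong to any real file; the number worth looking up is the directory, #11. On ext2/ext3 that is commonly lost+found, since it is usually the first inode created after the reserved ones, but that should be checked rather than assumed. Below is a rough Python equivalent of a single-filesystem inode search; `find_by_inode` is a hypothetical helper and the mount point in the usage comment is an assumption.

```python
import os


def _lstat(path):
    """lstat that returns None instead of raising for vanished or unreadable paths."""
    try:
        return os.lstat(path)
    except OSError:
        return None


def find_by_inode(root, inum):
    """Return paths under root, on root's own filesystem, whose inode is inum."""
    root_stat = os.lstat(root)
    matches = [root] if root_stat.st_ino == inum else []
    for dirpath, dirnames, filenames in os.walk(root):
        keep = []
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = _lstat(path)
            if st is None or st.st_dev != root_stat.st_dev:
                continue  # skip other filesystems, e.g. /proc
            if st.st_ino == inum:
                matches.append(path)
            if name in dirnames:
                keep.append(name)
        dirnames[:] = keep  # do not descend into other mounts
    return matches


# Example (assumes the affected filesystem is mounted at /boot):
# print(find_by_inode("/boot", 11))
```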

I've been mostly using the MS-Win partition lately but had switched back to Linux occasionally with no sign of any problems. I switched to Linux again this morning and left it at the login screen (runlevel 3) over lunch. When I came back just now the above message was on screen.

I've just checked dmesg and the same error is there.

Any ideas?

Thanks in advance,

John

moses 04-26-2004 11:57 AM

I don't know, it really looks like a disk issue to me. . . You might be able to mitigate it with a bad blocks check using the ext2 tools, but I don't know. Physical hardware problems (especially hard disks) are difficult to get around, and once they start going from bad to worse, I've usually just given up and purchased new hardware--it's usually not worth my time to fight with bad hardware. Anyway, my suggestion is that you keep making backups of your important data (don't overwrite your old backups, make new ones) and wait until you either find the cause of the error or decide just to give up and get a new disk. If you're not having real data loss issues, you can probably make it for quite some time before you need to replace the drive.
It's also possible, though improbable, that it's not a drive issue and is, instead, a bus issue. I say improbable because the error looks like the disk is returning a bad result when a read is performed on specific bad blocks. Try the filesystem tools; I know there are ways to check for (and then mark as unusable) bad blocks on the drive. . . man ext2fs, tune2fs, fsck, etc. . .


All times are GMT -5.