LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - General (https://www.linuxquestions.org/questions/linux-general-1/)
-   -   Thousands of "unattached inode" entries freezing fsck. (https://www.linuxquestions.org/questions/linux-general-1/thousands-of-unattached-inode-entries-freezing-fsck-630593/)

Nothsa 03-25-2008 04:11 PM

Thousands of "unattached inode" entries freezing fsck.
 
We have some CentOS systems with ext3 filesystems that occasionally experience power failures longer than the UPS can handle. We run fsck on the filesystems at every boot, and sometimes when a system comes back up after a power failure, fsck reports tens of thousands of "unattached inode" entries:

Code:

Inode ##### ref count is 1, should be 2. Fix? yes
Unattached inode ##### Connect to /lost+found? yes

Where ##### is a different number for each entry. (I've set fsck to answer 'yes' to all questions, hence the "yes"es on the two lines, but I have also tried setting it to answer "no" to all questions.) The problem is that it will go through about 7000-8000 of these entries and then freeze, like it's reached some kind of limit and doesn't want to process any more. At that point, someone has to reboot the machine, and fsck processes another 7000-8000 entries before it has to be rebooted again. I'm pretty certain this is not a hard drive fault, because it has happened across 20 different systems and 20 different hard drives.

Does anyone have any ideas:
a) what might be causing the problem, and how to get around it, and
b) what I can do to fix/avoid it without any human intervention? Possibly change filesystems, or something else?

I don't want any of these unattached files as I am sure they are just temporary files, so I don't have a problem with just dumping them all. I just can't find a way to do that =/
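One possible approach (a sketch only, not something suggested in the thread so far): fsck reconnects every orphan it finds into lost+found, so if those recovered files really are all disposable temp files, a boot script could simply empty lost+found once fsck finishes. The demo below uses a scratch directory standing in for a real mount point; all paths and filenames are made up for illustration.

```shell
# Scratch directory standing in for a real ext3 mount point (hypothetical)
MNT=$(mktemp -d)
mkdir "$MNT/lost+found"
# Fake "recovered" files, named the way e2fsck names reconnected inodes
touch "$MNT/lost+found/#12345" "$MNT/lost+found/#12346"

# Empty lost+found without removing the directory itself
find "$MNT/lost+found" -mindepth 1 -delete

ls -A "$MNT/lost+found"    # prints nothing: the directory is empty again
```

On a real system you would run this against the actual mount point after fsck has completed, and only if you are certain nothing in lost+found is ever worth keeping.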

tronayne 03-26-2008 09:20 AM

Ouch!

Well, a power-fail crash leaves lots of things hanging open, and that's pretty much that. You might be able to cut the damage down with a periodic file system sync (just run sync from cron every so often; sync flushes the file system buffers, so there's less unwritten data at any given moment). This may be your best option without having to do a lot more work.
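The cron-driven sync described above might look something like this as a root crontab entry (the 15-minute interval is purely illustrative; pick whatever suits the workload):

```shell
# Hypothetical root crontab entry (install with `crontab -e` as root):
# flush dirty filesystem buffers to disk every 15 minutes, so an abrupt
# power loss has less in-flight data to orphan.
*/15 * * * * /bin/sync
```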

You might want to try one of the other journaling file systems (which means you have to unload everything, reinitialize the partition, and reload). I've been using Reiser for some years and have had zero problems with it -- on the rare occasions I've had similar outages, the systems came back up clean (of course there is an automagic check on reboot, but they do come up clean). There are other journaling file systems; take a look at http://en.wikipedia.org/wiki/Comparison_of_file_systems for a discussion.

One suggestion, though: get your UPS to shut down the systems cleanly when the batteries are about to go. Most of them can do that.
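For that UPS-driven shutdown, a daemon like apcupsd can do the job. The fragment below is an assumption-laden sketch: it presumes an APC unit monitored by apcupsd, and the threshold values are examples only; other UPS vendors ship similar tools (NUT, for instance).

```shell
# Hypothetical fragment of /etc/apcupsd/apcupsd.conf -- assumes an APC
# UPS monitored by apcupsd; thresholds are illustrative, not recommended values.
BATTERYLEVEL 10    # start a clean shutdown when 10% battery charge remains
MINUTES 5          # ...or when an estimated 5 minutes of runtime remain
TIMEOUT 0          # 0 = rely on the two thresholds above, not a fixed timer
```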

Nothsa 03-26-2008 10:44 AM

Thanks for the tips! I didn't know about the sync command, so that might be all I need, but I'll do some reading up on it.

Also, isn't ext3 already a journaling file system?

tronayne 03-26-2008 12:19 PM

Well, duh, yeah, it is (cripes I hate getting old).

It does have some advantages and disadvantages, though, discussed at http://en.wikipedia.org/wiki/Ext3

Seems like, if you're getting that many orphaned inodes all over the place, you might be doing a whole lot of file creation, updates, and the like? It might also be worth a look at how your applications are doing things; for example, do applications stay open for a long, long time and do lots and lots of reads and writes to files? It would be worth flushing after a write, or a few writes, in the application itself (like a call to fflush()) if you can. I'm talking about stuff users start up and leave running for hours (or sometimes days) -- I've seen more than a few instances of files left open in editors for three or four days. Even something as simple as a scheduled reboot at, oh, 0330 on Sunday can alleviate a lot of that nonsense. It doesn't hurt, either, to "bounce" a DBMS server in the middle of the night sometimes -- just stopping and restarting the DBMS server makes it flush and clean up after itself (pending updates to tables, logs, locks, all that stuff).
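The scheduled-reboot and DBMS-bounce ideas above could be cron jobs along these lines. The times, and especially the service name, are examples only, not anything from this thread:

```shell
# Hypothetical root crontab entries.
# Reboot every Sunday at 03:30 to clear long-lived open files:
30 3 * * 0 /sbin/shutdown -r now
# Bounce the database nightly at 03:15 so it flushes and cleans up after
# itself (service name is an example; CentOS of that era used `service`):
15 3 * * * /sbin/service mysqld restart
```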

And, syncing every hour or so can't hurt either.

You can see whether sync will help by just logging in as root on a given server, typing sync, and hitting return -- if it takes a while (more than a second or two), that tells you you've got a lot of stuff hanging out there. I generally run it in threes (sync;sync;sync); the first one goes slowly, the second and third usually go really, really fast.

Anyway, sorry about being old and dumb and hope some of the above helps a little.

archtoad6 05-11-2008 04:56 PM

I like the cron sync idea.

If you want to be a little more precise about how long sync is taking & how much repeating it speeds it up, try:
Code:

time sync; time sync; time sync

