LinuxQuestions.org - fsck on mounted filesystems

ksplice allows you to run your server in perpetuity - i.e. without ever restarting it. So do lazy sysadmins who simply never reboot. So how would you know if your filesystem has errors in it? If you run fsck -n <dev> on the mounted filesystem, you get errors that you really need to know your stuff backwards to judge as actual problems or not (especially with Xen and raw disk image files).

Another scenario: your server goes down (for whatever reason) and it's been up for a year. Your filesystem is set to check itself after a period of time. This means your time to reboot is 10-15 minutes instead of 1. Heart attack time. But it's necessary - you can't just tune2fs the checks out, for example.
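To see ahead of time whether the next reboot would trigger that long check, you can read the counters that tune2fs -l prints. A minimal Python sketch, assuming an ext2/3/4 filesystem and the usual field names in that output (Mount count, Maximum mount count, Next check after); the function names and the sample text are made up for illustration, and the comparison rules are my reading of how the check gets triggered, not taken from the e2fsprogs source:

```python
from datetime import datetime

# Illustrative sample only; real input would come from `tune2fs -l <dev>`.
SAMPLE = """tune2fs 1.41.12 (17-May-2010)
Mount count:              37
Maximum mount count:      36
Last checked:             Sat Dec 31 09:00:00 2011
Check interval:           15552000 (6 months)
Next check after:         Thu Jun 28 09:00:00 2012
"""

def parse_tune2fs(text):
    """Turn `tune2fs -l` output into a dict of field name -> value string."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as the version banner
            fields[key.strip()] = value.strip()
    return fields

def check_due(fields, now):
    """Return the reasons (possibly none) a full check looks due at `now`."""
    reasons = []
    mount_count = int(fields.get("Mount count", "0"))
    max_mounts = int(fields.get("Maximum mount count", "-1"))
    # A maximum of 0 or -1 means the mount-count trigger is switched off.
    if max_mounts > 0 and mount_count >= max_mounts:
        reasons.append("mount count reached")
    # This field is only printed when a time-based interval is set.
    next_check = fields.get("Next check after")
    if next_check:
        # Assumes English day and month names (C locale) in the output.
        due = datetime.strptime(next_check, "%a %b %d %H:%M:%S %Y")
        if now >= due:
            reasons.append("check interval elapsed")
    return reasons
```

Calling check_due(parse_tune2fs(SAMPLE), datetime.now()) from a cron job and mailing yourself a non-empty result would at least turn the surprise into a scheduled event.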

I've never really come up with a satisfactory answer for this. The closest I've got is to take an LVM snapshot and run fsck over that instead. However, that's a pita to implement automatically (say, as a cron job), and I do have some systems that aren't on LVM.
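For what it's worth, the snapshot approach can be sketched in a few lines of Python. This is only a rough outline under stated assumptions: the filesystem is ext2/3/4 (hence e2fsck -f -n rather than plain fsck), the volume group has free space for the snapshot, and the names fsck_snap, 1G, vg0 and root are placeholders I picked for illustration. It needs root to do anything real.

```python
import subprocess

def snapshot_check_commands(vg, lv, snap_name="fsck_snap", snap_size="1G"):
    """Build the three commands: create snapshot, check it, remove it."""
    origin = f"/dev/{vg}/{lv}"
    snap = f"/dev/{vg}/{snap_name}"
    return [
        ["lvcreate", "--snapshot", "--size", snap_size,
         "--name", snap_name, origin],
        # -f forces a check even if the filesystem is marked clean,
        # -n opens it read-only and answers "no" to every repair prompt.
        ["e2fsck", "-f", "-n", snap],
        ["lvremove", "-f", snap],
    ]

def run_snapshot_check(vg, lv, run=subprocess.call, **kwargs):
    """Run the check on a snapshot and return the checker's exit status.

    `run` takes a command list and returns its exit status; it is a
    parameter so the sequence can be exercised without touching LVM.
    """
    create, check, remove = snapshot_check_commands(vg, lv, **kwargs)
    if run(create) != 0:
        raise RuntimeError("could not create snapshot")
    try:
        return run(check)
    finally:
        # Always drop the snapshot, even if the check itself blew up.
        run(remove)
```

From cron you would call something like run_snapshot_check("vg0", "root") and raise an alert on a non-zero return. It does nothing for the systems that aren't on LVM.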

Is a disk check a little like quantum physics - i.e. observing it causes a state change? What's the deal? Anyone got any interesting fantastic manuals or articles to read? (rtfm/a?) Anyone else got this problem licked? How do you do it?

Cheers

Quote:

Originally Posted by hohum (Post 4521873)

You can't fsck a mounted filesystem; doing that would just corrupt it. You're not expected to know the filesystem details, just let fsck do its job correcting any corruption, and restore any deleted files from your backup. They were corrupted anyway, that's the point - fsck is just the messenger.

Quote:

Another scenario: your server goes down (for whatever reason) and it's been up for a year. Your filesystem is set to check itself after a period of time. This means your time to reboot is 10-15 minutes instead of 1. Heart attack time. But it's necessary - you can't just tune2fs the checks out, for example.

In the days before redundant load balanced servers, downtime was a problem. If you're still configured without redundancy, your uptime is a low priority, not "heart attack time".

Quote:

I've never really come up with a satisfactory answer for this. The closest I've got is to take an LVM snapshot and run fsck over that instead. However, that's a pita to implement automatically (say, as a cron job), and I do have some systems that aren't on LVM.

Is a disk check a little like quantum physics - i.e. observing it causes a state change? What's the deal? Anyone got any interesting fantastic manuals or articles to read? (rtfm/a?) Anyone else got this problem licked? How do you do it?

Cheers

You schedule downtime, verify the health of the backup server, and switch over the backup to primary. Then take your server down for maintenance. Take your time, do it right, verify function, then put it back in service.

I concur.

I was reluctant to reply to this as I'm not responsible for (large) production environments.
However, I don't see the ksplice situation as any different from the past, when admins crowed about uptime measured in years. If you want to fix (FSVO "fix") your filesystem, you take your system(s) down to do so. And use a modern filesystem - ext4 is much quicker to fsck, as no doubt are others. There will be online fsck in the not too distant future, but it may take a while to be accepted (and shipped) by the major distributors.

But that's why I said fsck -n (see the man page):

Quote:

-n For some filesystem-specific checkers, the -n option will cause the fs-specific fsck to avoid attempting to repair any problems, but simply report such problems to stdout. This is however not true for all filesystem-specific checkers. In particular, fsck.reiserfs(8) will not report any corruption if given this option. fsck.minix(8) does not support the -n option at all.
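If the report from fsck -n is going to be read by a script rather than a person, the exit status is easier to work with than the text. As far as I know from fsck(8), the status is a sum of bit flags; the small Python sketch below decodes it. The function and table names are my own, and the flag meanings are paraphrased from the man page.

```python
# Exit status bits as documented in fsck(8); 0 means no errors.
FSCK_EXIT_BITS = {
    1: "filesystem errors corrected",
    2: "system should be rebooted",
    4: "filesystem errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "checking canceled by user request",
    128: "shared-library error",
}

def describe_fsck_status(status):
    """Return the list of conditions encoded in an fsck exit status."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_EXIT_BITS.items() if status & bit]
```

With -n nothing is repaired, so a damaged filesystem would be expected to show up as the "errors left uncorrected" bit.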

You're absolutely right about redundancy. I have just taken on this role, and there are some complexities surrounding that with these particular systems.

Thanks for your input though - I appreciate and value it.

It'd sure be nice to get an answer too, if anyone knows.