Setting up tune2fs on production servers with long uptimes - is there a best practice?
Like so many things in computing, "best practice" for me may not be the same for you. What's best depends on the needs of the user.
If I were responsible for maintaining production servers, I'd apply the following two rules:
1) partitions which change most frequently over time should be checked most frequently. Use the -i option of tune2fs to set the number of days|weeks|months before an fsck is forced, and the -c option to set the maximum number of mounts before a check is forced (see the example commands after this list).
2) since production servers and reboot times are the issue, schedule the reboots for the time of day when the servers are least active, such as the wee hours of the night.
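As a rough sketch, the settings could look something like this (the partition name /dev/sda3 is just a placeholder for whichever filesystem changes the most):

    tune2fs -c 50 -i 30d /dev/sda3                      # force a check after 50 mounts or 30 days, whichever comes first
    tune2fs -l /dev/sda3 | grep -iE 'mount count|check' # confirm the new values in the superblock

The -i option also accepts w and m suffixes for weeks and months; with no suffix the number is taken as days.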
I've seen some arguments that you don't want fsck to run every time you reboot; however, journaling is built into ext3/ext4, so would it matter?
fsck takes a long time to run. When using ext2 you have to run fsck every time the system crashes and the file system is not unmounted cleanly during shutdown. Running fsck after the system shuts down normally is a waste of time. So back when I used ext2, I always used tune2fs to turn off the automatic fsck run every so many boots.
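From memory it was just something along these lines (substitute your own partition for /dev/sda1):

    tune2fs -c 0 -i 0 /dev/sda1   # 0 disables both the mount-count and the time-based forced checks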
With a journaling file system the journal performs the recovery after a crash, and it does it far faster than fsck. So the only time that you need to run fsck on a journaling file system is when the journal recovery does not work.
In the case where your system crashes, journaling does not recover the file system, and fsck does not recover the file system then you need to reformat the partition and restore from your rsync backup. Trying to recover data using any of the dd copy methods takes an order of magnitude longer and is significantly less likely to work than reformatting and restoring with rsync.
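As a sketch only (the device and paths below are made up, yours will differ, and the partition has to be unmounted before the reformat):

    mkfs.ext4 /dev/sda3                        # recreate the file system
    mount /dev/sda3 /srv/data                  # mount it back at its usual location
    rsync -aHAX /backup/srv/data/ /srv/data/   # restore from the rsync backup, keeping hard links, ACLs and xattrs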
--------------------------------
Steve Stites
I agree. In my experience, journaling essentially replaced the need to go sniffing around with fsck. If a hard crash occurs, the journal is nearly always sufficient to recover the drive. (In fact, the only time I saw it fail to do so was when the drive was failing anyway.)
File systems are designed to run without incident for years, and to gracefully recover through journaling when an unexpected shutdown does occur.
You should also be running a monitor for the S.M.A.R.T.™ internal diagnostics that are a standard feature of all modern drives. "If Linux is listening," then the drive – which performs error-detection and track/sector sparing all on its own now – will tell you when it is failing, long before it actually does. (smartctl command and so forth.)
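For example, assuming the drive shows up as /dev/sda and smartmontools is installed, a few hand checks look like this; for continuous monitoring you would run the smartd daemon instead:

    smartctl -H /dev/sda         # quick pass/fail health verdict
    smartctl -a /dev/sda         # full attribute table and error log
    smartctl -t short /dev/sda   # start a short offline self-test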
First things first. Why are you not using ext4 everywhere? One of its aims was to reduce fsck time, especially as filesystems get larger.
And don't worry about the "reboot count" - that has been ignored for ages unless you specifically enforce it. e2fsprogs checks on every mount and runs fsck only if needed, regardless of the count.
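You can see what your filesystems are actually set to with something like this (the device name is an example):

    tune2fs -l /dev/sda1 | grep -iE 'mount count|last checked|check interval'

A maximum mount count of -1 (or 0) and a check interval of 0 mean the periodic checks are disabled.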
If you are finding faults on every reboot you have other (probably bigger) issues.
All of the servers that I manage are virtual and live on a SAN, with the exception of one. With the disks on the SAN, would having S.M.A.R.T even be necessary?
The previous admin had S.M.A.R.T running on all of the RHEL5 systems, which were all VMs too. Not sure if it is available for RHEL6, so I would have to read up on it.
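A quick check would be something like this (assuming the package is still called smartmontools, as it was on RHEL5):

    rpm -q smartmontools    # is it already installed?
    yum info smartmontools  # is it available from the configured repos?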
Probably not. Normally, a SAN controller queries the S.M.A.R.T. data on all of the drives that it controls, and has some means to notify you if it detects a problem. It usually doesn't expose the individual devices, as it is the one responsible for controlling them all. However, "SANs do vary." Some of them are very simple while others are very sophisticated.
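One quick way to see whether the hypervisor or SAN even passes S.M.A.R.T. data through is to ask the virtual disk directly (the device name will vary, e.g. /dev/sda or /dev/vda):

    smartctl -i /dev/sda    # on most virtual disks this reports that SMART support is unavailable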
You should be running ext4 (journaling enabled) on all of your file systems.
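If you want to double-check what a box is actually running, something like this works (the device name is just an example):

    df -T                                      # file system type of every mounted volume
    dumpe2fs -h /dev/sda1 | grep -i features   # 'has_journal' should be listed for ext3/ext4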
Now that I look back, there were a TON of errors with S.M.A.R.T on these RHEL VMs. I turned it off, since the SAN was already monitoring those disks.
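On RHEL 5/6 that was just the usual init-script dance (newer releases would use systemctl instead):

    service smartd stop
    chkconfig smartd off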