LinuxQuestions.org
Old 03-07-2017, 11:03 AM   #1
JockVSJock
Senior Member
 
Registered: Jan 2004
Location: DC
Distribution: RHEL/CentOS
Posts: 1,386
Blog Entries: 4

Rep: Reputation: 164
Setting up tune2fs on production servers with long uptimes - is there a best practice?


I have a number of RHEL production servers with filesystems in the terabyte range that also haven't been rebooted in over 250 days.

I want to set up regular reboots on these production servers so that we aren't waiting a long time for fsck to check all of the various data disks.

I've seen some arguments that you don't want fsck to run on every reboot; however, journaling is built into ext3/ext4, so would it matter?

I guess what I'm trying to ask is:

- How often do you reboot Linux production servers? (I typically reboot when there is a new kernel, but other than that?)

- Should tune2fs be used to modify partitions so that fsck is forced every so many days, in order to keep reboot times low?


This is a new subject for me and I'm still trying to understand everything here, so I might be rambling...

thanks
 
Old 03-07-2017, 12:55 PM   #2
bigrigdriver
LQ Addict
 
Registered: Jul 2002
Location: East Centra Illinois, USA
Distribution: Debian stable
Posts: 5,908

Rep: Reputation: 355
Like so many things in computing, "best practice" for me may not be the same for you. What's best depends on the needs of the user.
If I were responsible for maintaining production servers, I'd apply the following two rules:
1) Partitions which change most frequently over time should be checked most frequently. Use the -i option of tune2fs to set the number of days|weeks|months before fsck is forced, and the -c option to set the number of mounts between forced checks.
2) Since production servers and reboot times are the issue, schedule the reboots for the time of day when the servers are least active, such as the wee hours of the night.
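To make rule 1 concrete, here is a minimal sketch of those two tune2fs options. It runs against a throwaway ext4 image file so it is safe to try anywhere; on a real server you would point tune2fs at the actual partition (e.g. /dev/sdb1, a hypothetical device name here):

```shell
# Create a small scratch ext4 image (no root needed) to demonstrate safely
dd if=/dev/zero of=/tmp/tune2fs-demo.img bs=1M count=8 status=none
mkfs.ext4 -q -F /tmp/tune2fs-demo.img

# -i: force a check after 30 days; -c: force a check after 20 mounts
tune2fs -i 30d -c 20 /tmp/tune2fs-demo.img

# Confirm the new schedule
tune2fs -l /tmp/tune2fs-demo.img | grep -E 'Maximum mount count|Check interval'
```

Whichever limit is hit first (days or mounts) triggers the forced check, so you can stagger the -c values across partitions to avoid every filesystem being checked on the same reboot.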
 
Old 03-07-2017, 01:42 PM   #3
jailbait
LQ Guru
 
Registered: Feb 2003
Location: Mineral, Virginia
Distribution: Debian 8
Posts: 7,893

Rep: Reputation: 339
Quote:
Originally Posted by JockVSJock View Post
I've seen some arguments that you don't want fsck to run on every reboot; however, journaling is built into ext3/ext4, so would it matter?
fsck takes a long time to run. When using ext2 you have to run fsck every time the system crashes and the file system is not unmounted cleanly during shutdown. Running fsck after the system shuts down normally is a waste of time, so back when I used ext2 I always used tune2fs to turn off the automatic fsck run every so many boots.

With a journaling file system, the journal performs the recovery after a crash, and it does so far faster than fsck. So the only time you need to run fsck on a journaling file system is when journal recovery does not work.

In the case where your system crashes, journaling does not recover the file system, and fsck does not recover it either, you need to reformat the partition and restore from your rsync backup. Trying to recover data using any of the dd copy methods takes an order of magnitude longer and is significantly less likely to work than reformatting and restoring with rsync.

--------------------------------
Steve Stites

Last edited by jailbait; 03-07-2017 at 05:39 PM. Reason: typos
 
1 member found this post helpful.
Old 03-07-2017, 02:13 PM   #4
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,159
Blog Entries: 4

Rep: Reputation: 3233
I agree. In my experience, journaling essentially replaced the need to go sniffing around with fsck. If a hard crash occurs, the journal is nearly always sufficient to recover the drive. (In fact, the only time when I saw it not do so, was when the drive was failing anyway.)

File systems are designed to run without incident for years, and to gracefully recover through journaling when an unexpected shutdown does occur.

You should also be running a monitor for the S.M.A.R.T. internal diagnostics that are a standard feature of all modern drives. "If Linux is listening," then the drive, which performs error-detection and track/sector sparing all on its own now, will tell you when it is failing, long before it actually does. (The smartctl command and so forth.)

For Ubuntu, see https://help.ubuntu.com/community/Smartmontools. As noted there, you can arrange for your system to check the drive diagnostics periodically and to notify you.

Nagios can also do it, of course.
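As a sketch of what that monitoring looks like (assuming smartmontools is installed; /dev/sda is just an example device name, and the smartd.conf line is a config fragment, not something to run):

```shell
# One-off health check of a drive's SMART status (assumes smartmontools)
smartctl -H /dev/sda

# Config fragment for /etc/smartd.conf, for continuous monitoring by smartd:
#
#   /dev/sda -a -o on -S on -s (S/../.././02|L/../../6/03) -m root@localhost
#
# -a : monitor all SMART attributes and log changes
# -s : run a short self-test daily at 02:00 and a long one Saturdays at 03:00
# -m : mail warnings to this address
```

After editing smartd.conf you would restart the smartd service so it picks up the new entry.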
 
Old 03-07-2017, 05:56 PM   #5
syg00
LQ Veteran
 
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 19,722

Rep: Reputation: 3550
First things first: why are you not using ext4 everywhere? One of its aims was to reduce fsck time, especially as filesystems get larger.
And don't worry about the "reboot count" - that has been ignored for ages unless you specifically enforce it. e2fsprogs checks the filesystem state on every mount and runs fsck only when needed, regardless of the count.

If you are finding faults on every reboot you have other (probably bigger) issues.
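You can see the current check schedule for yourself with tune2fs -l. A sketch against a throwaway image (a freshly made ext4 filesystem from a modern e2fsprogs typically shows Maximum mount count: -1, i.e. the count-based check is disabled by default):

```shell
# Build a scratch ext4 image (no root needed) and inspect its check schedule
dd if=/dev/zero of=/tmp/ext4-state.img bs=1M count=8 status=none
mkfs.ext4 -q -F /tmp/ext4-state.img

# Current mount count, the enforced maximum, last-checked time and interval
tune2fs -l /tmp/ext4-state.img | \
    grep -E 'Mount count|Maximum mount count|Last checked|Check interval'
```

On a real server you would run `tune2fs -l /dev/sdb1` (hypothetical device name) against each data partition to see whether any count or interval is actually being enforced.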
 
Old 03-08-2017, 07:44 AM   #6
JockVSJock
Senior Member
 
Registered: Jan 2004
Location: DC
Distribution: RHEL/CentOS
Posts: 1,386

Original Poster
Blog Entries: 4

Rep: Reputation: 164
Quote:
Originally Posted by sundialsvcs View Post

You should also be running a monitor for the S.M.A.R.T.™ internal diagnostics that are a standard feature of all modern drives. "If Linux is listening," then the drive – which performs error-detection and track/sector sparing all on its own now – will tell you when it is failing, long before it actually does. (smartctl command and so forth.)
All of the servers that I manage are virtual and live on a SAN, with the exception of one. With the disks on the SAN, would having S.M.A.R.T. even be necessary?

The previous admin had S.M.A.R.T running on all of the RHEL5 systems, which were all VMs too. Not sure if it is available for RHEL6, so I would have to read up on it.

Last edited by JockVSJock; 03-08-2017 at 07:53 AM.
 
Old 03-08-2017, 09:33 AM   #7
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 9,159
Blog Entries: 4

Rep: Reputation: 3233
Quote:
Originally Posted by JockVSJock View Post
All of the servers that I manage are virtual and live on a SAN, with the exception of one. With the disks on the SAN, would having S.M.A.R.T. even be necessary?

The previous admin had S.M.A.R.T running on all of the RHEL5 systems, which were all VMs too. Not sure if it is available for RHEL6, so I would have to read up on it.
Probably not. Normally, a SAN controller queries the S.M.A.R.T. data on all of the drives that it controls, and has some means to notify you if it detects a problem. It usually doesn't expose the individual devices, as it is the one responsible for controlling them all. However, "SANs do vary." Some of them are very simple while others are very sophisticated.

You should be running ext4 (journaling enabled) on all of your file systems.
 
Old 03-08-2017, 02:23 PM   #8
JockVSJock
Senior Member
 
Registered: Jan 2004
Location: DC
Distribution: RHEL/CentOS
Posts: 1,386

Original Poster
Blog Entries: 4

Rep: Reputation: 164
Quote:
Originally Posted by sundialsvcs View Post
Probably not. Normally, a SAN controller queries the S.M.A.R.T. data on all of the drives that it controls, and has some means to notify you if it detects a problem. It usually doesn't expose the individual devices, as it is the one responsible for controlling them all. However, "SANs do vary." Some of them are very simple while others are very sophisticated.
Now that I look back, there were a TON of errors with S.M.A.R.T. on these RHEL VMs. I turned it off, since the SAN was monitoring those disks.
 
Tags
fsck, rhel, tune2fs


