Hi, we have an unusual problem. We are running a database server on which an incremental file system backup runs every day during the morning hours.
In sequence, the backup of the / file system completes and then the backup agent moves on to /boot. This is where things go wrong: first there is high I/O wait and CPU utilisation spikes, and then the NMI watchdog causes the kernel to restart the system.
This happens every day unless we exclude the /boot file system from the backup. The backup agent we are running is Idera 5.8. Our kernel version is 2.6.32-431.29.2, and we are running Red Hat 6.5.
What might cause the high I/O wait when the backup starts working on the /boot file system?
Another question we need answered: how useful is a daily backup of the /boot file system for a bare-metal restore?
The daily backup is incremental. What challenges will it pose for a restore if we exclude /boot from the daily incremental backup?
/boot should only have things like grub and kernel(s) in it.
It should change only when you modify either of these.
You can confirm this by checking file dates with
Code:
sudo ls -lR --full-time /boot
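The same check can be narrowed to recent changes only. A minimal sketch, assuming a standard find(1): the helper below (`changed_since_days` is a hypothetical name) lists files under /boot whose status changed within the last N days. It uses ctime rather than mtime because rpm installs files with the modification times recorded in the package, so a freshly installed kernel can show an older date in the `ls` listing above.

```shell
#!/bin/sh
# List files under DIR (default /boot) whose inode status changed less than
# DAYS (default 1) days ago. ctime is used instead of mtime because rpm keeps
# the package's original modification times when it installs a kernel.
# changed_since_days is a hypothetical helper name.
changed_since_days() {
    dir=${1:-/boot}
    days=${2:-1}
    # -ctime -N: status changed less than N*24 hours ago
    find "$dir" -type f -ctime "-$days" -print
}
```

For example, if `changed_since_days /boot 1` prints nothing, a daily incremental run most likely has nothing new to copy from /boot.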
On my PC, I have no problem backing up anything (including /boot) to another disk (or other location) with rsync. However, I find it necessary to boot from a "live" DVD or USB stick to do this.
This guarantees that the main system is not in use at the time of the backup.
Probably not something you want to mess around with on a server...
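A minimal sketch of that approach, assuming the server's /boot partition has already been mounted from the live system and a backup disk is available. The function name and the example mount points are hypothetical; only rsync's `-a` (archive), `-H` (hard links) and `--delete` options are used.

```shell
#!/bin/sh
# Mirror a /boot partition mounted from a live DVD/USB session to a backup
# directory, so nothing on the installed system is in use during the copy.
# backup_boot and the example mount points are hypothetical.
backup_boot() {
    src=$1    # e.g. /mnt/sysboot  (server's /boot, mounted from the live system)
    dest=$2   # e.g. /mnt/backup/boot
    mkdir -p "$dest"
    # -a: recurse and preserve permissions, owners, times and symlinks
    # -H: preserve hard links
    # --delete: remove files from dest that no longer exist in src
    # Trailing slashes copy the *contents* of src into dest.
    rsync -aH --delete "$src/" "$dest/"
}
```

Usage: `backup_boot /mnt/sysboot /mnt/backup/boot`, with both paths adjusted to wherever the live session mounted the disks.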
First off, JeremyBoden, thanks for your response. I have checked, and /boot has not changed since we last updated the kernel.
Padeen, it is a separate partition and the file system appears clean; we have run fsck on it a few times. Also, when we stop the databases running on the server and run the backup with /boot included, it completes successfully. The I/O wait is still high then, but it peaks at about 30% and does not reach the 50% we see when the backup includes /boot while the databases are running.
Hope this gives a clear picture. And thanks for responding.
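To compare figures such as the 30% and 50% quoted above across runs, the I/O wait share can be sampled directly from /proc/stat, which is where top and vmstat get their `wa` value. A minimal sketch; the helper names are hypothetical, and it assumes the usual field order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, ...).

```shell
#!/bin/sh
# Print "<iowait> <total>" jiffies from the aggregate "cpu" line of a
# /proc/stat-format file (default /proc/stat).
# Fields after "cpu": user nice system idle iowait irq softirq steal guest ...
# guest/guest_nice are already counted in user/nice, so only $2..$9 are summed.
cpu_times() {
    awk '/^cpu / { t = 0; for (i = 2; i <= 9; i++) t += $i; print $6, t; exit }' \
        "${1:-/proc/stat}"
}

# Percentage of CPU time spent in iowait over INTERVAL seconds (default 5).
iowait_percent() {
    interval=${1:-5}
    set -- $(cpu_times)
    iowait_start=$1; total_start=$2
    sleep "$interval"
    set -- $(cpu_times)
    iowait_end=$1; total_end=$2
    # awk does the floating-point division that POSIX sh arithmetic lacks.
    # The iowait counter can occasionally go backwards, so clamp at zero.
    awk -v dw="$((iowait_end - iowait_start))" -v dt="$((total_end - total_start))" \
        'BEGIN { if (dw < 0) dw = 0; if (dt > 0) printf "%.1f\n", 100 * dw / dt; else print "0.0" }'
}
```

Running `iowait_percent 5` in a loop while the backup works through / and then /boot would show exactly when the spike starts.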
We have decided to run a full backup every time we perform a kernel update. That way, at the very least, the backup of /boot stays current with every kernel update.
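A minimal sketch of that policy, assuming GNU find and a cron-style job: it compares /boot against a stamp file left by the previous full copy and only archives /boot when something has changed. tar stands in for the real backup tool (Idera's own commands are not shown), and the stamp path, archive path and function names are all hypothetical. It uses `find -cnewer` (ctime) because rpm preserves the original modification times of installed kernel files.

```shell
#!/bin/sh
# Take a full copy of /boot only when something in it has changed since the
# previous full copy (for example after a kernel update).
# tar is a stand-in for the real backup tool; all paths are hypothetical.
BOOT_DIR=${BOOT_DIR:-/boot}
STAMP=${STAMP:-/var/lib/boot-backup/last-full.stamp}
ARCHIVE_DIR=${ARCHIVE_DIR:-/var/backups/boot}

boot_changed() {
    # No stamp yet: report a change so the first run always backs up.
    [ -e "$STAMP" ] || return 0
    # GNU find -cnewer: status changed after the stamp's modification time.
    [ -n "$(find "$BOOT_DIR" -cnewer "$STAMP" -print | head -n 1)" ]
}

full_boot_backup() {
    mkdir -p "$ARCHIVE_DIR" "$(dirname "$STAMP")" || return 1
    archive="$ARCHIVE_DIR/boot-$(date +%Y%m%d%H%M%S).tar.gz"
    # Record the start time first, so changes made during the copy are
    # still detected on the next run.
    touch "$STAMP.new" || return 1
    tar -czf "$archive" -C "$BOOT_DIR" . || return 1
    mv "$STAMP.new" "$STAMP"
    echo "$archive"
}
```

A scheduled job could then run `boot_changed && full_boot_backup`, so /boot is copied in full after each kernel update and skipped otherwise.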
However, the real problem is the high spike in I/O wait when the backup runs with the /boot file system included.
It would be really helpful if anyone with insight could offer some advice.
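One mitigation worth trying, offered as a minimal sketch rather than a known fix: lower the CPU and I/O priority of the backup agent's processes so the databases win contention for the disk. `renice` and `ionice` are standard util-linux tools, but the function name is hypothetical, and ionice priorities only take effect with the CFQ I/O scheduler (the RHEL 6 default); deadline and noop ignore them.

```shell
#!/bin/sh
# Lower the CPU and I/O priority of the given PIDs, e.g. the backup agent's
# processes, so that database I/O is served first.
# deprioritize is a hypothetical helper name.
deprioritize() {
    for pid in "$@"; do
        # Lowest CPU scheduling priority (nice 19 for a process at nice 0).
        renice -n 19 -p "$pid" >/dev/null || return 1
        # Best-effort class (2), lowest priority level (7). Class 3 (idle) is
        # stricter, but can stall the backup while the databases stay busy.
        ionice -c 2 -n 7 -p "$pid" || return 1
    done
}
```

Usage: `deprioritize $(pgrep -x NAME)`, where NAME is the actual process name of the backup agent (not known from this thread).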
Hi R
Backups are very CPU-intensive. I'm assuming you're using a DLT drive? The man page does mention that a CPU crisis will cause a reboot, and top only reports the average. You can disable or suspend the watchdog and let the backup finish.
See the watchdog man page: "The watchdog daemon can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled."
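Before suspending anything, it helps to confirm which watchdog is actually involved: the original report mentions the kernel's NMI watchdog, while the man page quoted above belongs to the userspace watchdog daemon, which can reboot the machine when the load average exceeds the `max-load-1`, `max-load-5` or `max-load-15` limits in /etc/watchdog.conf. A minimal read-only sketch; the function names are hypothetical, and /proc/sys/kernel/nmi_watchdog is only read if the running kernel provides it.

```shell
#!/bin/sh
# Report which watchdog mechanisms are active. Read-only: changes nothing.
# watchdog_load_limits and report_watchdogs are hypothetical helper names.
WATCHDOG_CONF=${WATCHDOG_CONF:-/etc/watchdog.conf}

# Print the uncommented max-load-1/5/15 settings from a watchdog.conf file.
watchdog_load_limits() {
    [ -r "$1" ] || return 0
    grep -E '^[[:space:]]*max-load-(1|5|15)[[:space:]]*=' "$1"
    return 0
}

report_watchdogs() {
    # -x matches the daemon's exact name, not kernel threads like "watchdog/0".
    if pgrep -x watchdog >/dev/null 2>&1; then
        echo "watchdog daemon: running"
        watchdog_load_limits "$WATCHDOG_CONF"
    else
        echo "watchdog daemon: not running"
    fi
    if [ -r /proc/sys/kernel/nmi_watchdog ]; then
        echo "kernel.nmi_watchdog = $(cat /proc/sys/kernel/nmi_watchdog)"
    fi
}

report_watchdogs
```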
It is unlikely to be a file system corruption.
I would consider the following:
The tapes are damaged.
The tapes are out of space.
The tapes are being used at double density (double density doesn't work!)
All of these consume CPU time.
In the end, the watchdog also needs CPU time, and a watchdog-triggered reboot means the backup may not complete. That would render the backup useless.
Incremental Backups.
Incremental backups are fine if you have lots of individual files, but bad for databases. Again, whatever databases you are backing up, an incremental backup will only back up the files that have changed, not the whole package. The first time you put your DR plan into action, you will find all its flaws, leading to some very late nights.