Linux - General (LinuxQuestions.org)
This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
I'm using a special-purpose process I wrote, running on a Linux server, to support a piece of telecom hardware. Since this is telecom, high reliability in an unattended environment is critical.
If the process should crash for any reason, I'd like Linux to restart it. If the whole machine has a power failure, when power comes back I want Linux to boot itself and restart my process. How do I make these things happen?
I'm using Red Hat Linux 7.2 in a server configuration (i.e., no GUI or desktop).
You could tie it to init (a respawn entry in /etc/inittab) or use a HW/SW watchdog (needs kernel support). Powering back on after an outage is a BIOS setting AFAIK, and unless an e2fsck run gets in the way after the unclean shutdown, everything should be up and running again.
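The inittab route works because init's `respawn` action starts the program again whenever it exits; an entry has the form `id:runlevels:respawn:/path/to/program`. To show what that buys you, here is a minimal Python 3 sketch of a respawning supervisor. The names `supervise`, `max_runs` and `delay` are hypothetical, and unlike init this loop is itself unprotected if it dies, which is why handing the job to init is the sturdier choice.

```python
import subprocess
import sys
import time


def supervise(argv, max_runs=None, delay=1.0):
    """Start argv and start it again every time it exits, like init's 'respawn'.

    max_runs=None keeps going forever; a number caps the total starts.
    Returns the exit status of each run (negative = killed by that signal).
    """
    statuses = []
    runs = 0
    while max_runs is None or runs < max_runs:
        proc = subprocess.Popen(argv)
        status = proc.wait()  # blocks until the child exits or crashes
        statuses.append(status)
        runs += 1
        print(f"{argv[0]} exited with status {status}", file=sys.stderr)
        if max_runs is None or runs < max_runs:
            # Pause so a program that crashes instantly doesn't spin the CPU.
            time.sleep(delay)
    return statuses
```

For example, `supervise(["/usr/local/bin/telecomd"])` would keep a (hypothetical) `/usr/local/bin/telecomd` running indefinitely; init's own respawn also rate-limits entries that die too quickly, which this sketch only approximates with `delay`.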
I've never had a HW watchdog, but the purpose is the same as with the SW kind: a tool is configured to "watch" and repair other processes. You can configure it to reboot under high load, watch the motherboard's sensors, check that given processes still exist, etc. It's launched through the usual SysV init process and tries to write to /dev/watchdog; if it can't (due to overload, for instance), it can be set to start or kill off processes.
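The check/repair/keepalive cycle described here can be modelled roughly as follows. This is a simplified sketch under stated assumptions: `watchdog_loop`, `checks`, `feed` and `repair` are hypothetical names, not the daemon's interface, and the real daemon also attempts a controlled shutdown rather than only withholding writes. The point it illustrates is that the timer is fed only while every check passes (or a repair succeeds).

```python
import time


def watchdog_loop(checks, feed, repair=None, interval=10.0, cycles=None):
    """Simplified, hypothetical model of a watchdog daemon's main loop.

    Each cycle runs every check; if all pass, feed() is called (in a real
    setup, a write to /dev/watchdog). If a check fails and repair() cannot
    fix things, the loop stops feeding and returns, so the timer expires
    and resets the machine. cycles=None loops forever.
    Returns the number of times feed() was called.
    """
    fed = 0
    done = 0
    while cycles is None or done < cycles:
        healthy = all(check() for check in checks)
        if not healthy and repair is not None:
            healthy = repair()  # e.g. restart or kill off a process
        if not healthy:
            return fed  # stop feeding: the watchdog timer will fire
        feed()
        fed += 1
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval)
    return fed
```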
The SW watchdog (CONFIG_SOFT_WATCHDOG, the softdog driver) is selected in the kernel config, and a link to the source is in the kernel docs.
From the watchdog(8) man page:
DESCRIPTION
Watchdog is a daemon that checks if your system is still working. If programs in user space are no longer executed, it will hard reset the system (i.e., when it can't repair the system state).
The kernel provides /dev/watchdog, which, once opened, must be written to within a minute or the machine will reboot. Each write delays the reboot by another minute. If a minute passes without a write, the watchdog hardware will cause the reset. In the case of the software watchdog, the ability to reboot will depend on the state of the machine and interrupts.
Watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless of course your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.
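The keepalive protocol described above can be sketched in a few lines of Python. This assumes the one-minute timeout quoted in the man page and a driver that supports the kernel's "Magic Close" feature, where writing the character `V` just before closing is what "closed correctly" means. `feed_watchdog` and its parameters are hypothetical; only try it against the real device on a box where an unexpected reboot is acceptable, since opening /dev/watchdog arms the timer.

```python
import os
import time


def feed_watchdog(path="/dev/watchdog", interval=10.0, writes=None):
    """Keep the watchdog timer from firing, then disarm it on a clean stop.

    Opening the device arms the timer; each write pushes the reset back.
    Sending 'V' just before close asks Magic Close drivers to stop the
    timer; with CONFIG_WATCHDOG_NOWAYOUT it keeps running regardless.
    writes=None loops forever; returns the number of keepalive writes.
    """
    fd = os.open(path, os.O_WRONLY)
    done = 0
    while writes is None or done < writes:
        os.write(fd, b"\0")  # plain keepalive byte (anything but 'V')
        done += 1
        if writes is None or done < writes:
            time.sleep(interval)  # keep this well under the timeout
    # Reached only on an intentional stop. If an exception escapes above,
    # 'V' is never sent, so a Magic Close driver stays armed and resets.
    os.write(fd, b"V")
    os.close(fd)
    return done
```

For the telecom case, the keepalive would arguably belong inside the process's own main loop, so that a hung process stops feeding the timer instead of a separate helper masking the hang.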
TESTS
Watchdog itself does several additional tests to check the system status:
Check whether the process table is full.
Check whether there is enough free memory available.
Check whether some given files are accessible.
Check whether some given files change in a given interval.
Check whether the average work load exceeds a predefined maximal value.
Check whether a file table overflow occurred.
Check whether a given process (specified by a pid file) is still running.
(etc etc)
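A few of those tests are simple enough to sketch in Python, which shows what they actually look at. The function names here are hypothetical, not the daemon's code: `os.kill(pid, 0)` checks that a process exists without sending it anything, `os.getloadavg()` returns the same load averages `uptime` prints, and the file-change test compares a file's modification time with the current time.

```python
import os
import time


def write_pidfile(path):
    """What the monitored process would do at startup so it can be checked."""
    pid = os.getpid()
    with open(path, "w") as f:
        f.write(f"{pid}\n")
    return pid


def pidfile_process_running(pidfile):
    """Check whether the process named in a pid file still exists."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing or garbled pid file counts as "not running"
    if pid <= 0:
        return False  # kill(0, ...) would target the whole process group
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def load_below(max_load):
    """Check the 1-minute load average against a ceiling."""
    return os.getloadavg()[0] < max_load


def file_changed_within(path, seconds):
    """Check whether a file (e.g. a heartbeat file) was modified recently."""
    try:
        age = time.time() - os.stat(path).st_mtime
    except OSError:
        return False
    return age <= seconds
```

Because these checks run inside one long-lived process and never fork, they can still run when the process table is full.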
HTH somehow.
*BTW, don't make the mistake I made of using a scripted "check/repair" tool. If the system runs out of resources (I tested with fork bombs), it won't cope, since a script has to fork for every external command it runs :-]