High-reliability process

civiltongue · 08-27-2002, 03:17 PM

I'm using a special-purpose process I wrote, running on a Linux server, to support a piece of telecom hardware. Being telecom, high reliability in an unattended environment is critical.

If the process should crash for any reason, I'd like Linux to restart it. If the whole machine has a power failure, when power comes back I want Linux to boot itself and restart my process. How do I make these things happen?

I'm using RH 7.2 in a server configuration (i.e., no GUI or desktop).

unSpawn · 08-27-2002, 03:43 PM

You could tie it to init (via /etc/inittab) or use a HW/SW watchdog (needs kernel support). Booting again after a power failure is a BIOS setting AFAIK, and unless an e2fsck run gets in the way everything should be up 'n running again.

civiltongue · 08-27-2002, 04:38 PM

Can you elaborate on HW/SW watchdog?

unSpawn · 08-27-2002, 04:58 PM

Never had a HW watchdog, but the purpose is the same as the SW kind: a tool configured to "watch"/repair other processes. You can configure it to reboot under high load, watch the motherboard's sensors, watch processes for existence, etc. It's launched through the usual SysV init scripts and keeps writing to /dev/watchdog; if the writes stop (due to overload, for instance) the machine resets, and it can also be set to start/kill off processes when one of its checks fails.

The SW watchdog is selected in the kernel config, and the kernel docs have a link to the source.
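
To make the "keeps writing to /dev/watchdog" part concrete, here is a minimal sketch in Python of what such a keepalive loop might look like. It is not the watchdog daemon's actual code. The function name pet_watchdog and its parameters are made up for illustration, and the final "V" write assumes a driver that supports the kernel's "magic close" feature; on other drivers it just counts as one more write.

```python
import os
import time


def pet_watchdog(device="/dev/watchdog", interval=10.0, beats=None, disarm=True):
    """Write to the watchdog device every `interval` seconds.

    beats=None loops forever; a number stops after that many writes.
    Returns the number of keepalive writes performed.
    (Hypothetical helper, for illustration only.)
    """
    fd = os.open(device, os.O_WRONLY)   # opening the device arms the watchdog
    written = 0
    try:
        while beats is None or written < beats:
            os.write(fd, b"\0")         # any write postpones the reset
            written += 1
            if beats is None or written < beats:
                time.sleep(interval)    # must stay shorter than the driver timeout
        if disarm:
            # Assumption: the driver honours "magic close", where a final "V"
            # tells it that the close that follows is intentional.
            os.write(fd, b"V")
    finally:
        # If we get here through an exception, no "V" was written, so a
        # magic-close driver stays armed and the machine will reset.
        os.close(fd)
    return written
```

Run as root, pet_watchdog() with no arguments would loop forever against the real device. Don't try that on a box you care about until the interval is comfortably below the timeout.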

From the man page:
DESCRIPTION
Watchdog is a daemon that checks if your system is still working. If programs in user space are no longer executed it will hard reset the system. (That is, when it can't repair the system state.)

The kernel provides /dev/watchdog, which when open must be written to within a minute or the machine will reboot. Each write delays the reboot time another minute. After a minute of inactivity the watchdog hardware will cause the reset. In the case of the software watchdog the ability to reboot will depend on the state of the machines and interrupts.

Watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless of course your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.

TESTS
Watchdog itself does several additional tests to check the system status:
Check whether the process table is full.
Check whether there is enough free memory available.
Check whether some given files are accessible.
Check whether some given files change in a given interval.
Check whether the average work load exceeds a predefined maximal value.
Check whether a file table overflow occurred.
Check whether a given process (specified by a pid file) is still running.
(etc etc)
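
The pid file test from that list is easy to picture. A rough sketch in Python, assuming the pid file holds just the process id as text; pid_file_alive is a made-up name and this is not how the daemon itself is written:

```python
import os


def pid_file_alive(pid_file):
    """Return True if the pid recorded in `pid_file` belongs to a live process.

    (Hypothetical helper, for illustration only.)
    """
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        return False        # missing, unreadable, or garbled pid file
    if pid <= 0:
        return False        # 0 and negative values address process groups
    try:
        os.kill(pid, 0)     # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False        # no process with that pid
    except PermissionError:
        return True         # it exists, but belongs to another user
    return True
```

Keep in mind this only proves that some process has that pid. If the original died and the pid got reused, the check still passes.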

HTH somehow.
*Btw, don't make the mistake I made of relying on a scripted "check/repair" tool. If something eats the system's resources (I was testing forkbombs), the script can't start new processes either and won't cope :-]

Sixpax · 08-28-2002, 11:17 AM

Just put it in /etc/inittab as a "respawn":

mypg:3:respawn:/usr/local/bin/myprogram

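Make sure the program can run unattended with no terminal, though, and that it doesn't fork itself into the background: init respawns the process it started as soon as that process exits.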
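
If it helps to picture what "respawn" means, here is a toy version of the idea in Python. It is only a sketch of the behaviour, not what init actually runs, and the name respawn and its parameters are invented for the example. It also shows why the program has to stay in the foreground: the loop only knows about the process it started, so a program that forks itself away and exits looks like a crash.

```python
import subprocess
import time


def respawn(argv, max_runs=None, delay=1.0):
    """Run `argv` and start it again every time it exits.

    max_runs=None loops forever; a number makes the loop finite.
    Returns the list of exit codes seen.
    (Hypothetical helper, for illustration only.)
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        child = subprocess.Popen(argv)    # start the program in the foreground
        exit_codes.append(child.wait())   # block until this exact process exits
        time.sleep(delay)                 # brief pause so a broken program can't spin
    return exit_codes
```

For example, respawn(["/usr/local/bin/myprogram"]) would keep that program running for as long as the loop itself stays alive, which is the part init takes care of for you.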