LinuxQuestions.org - Starting daemon via inittab:respawn causes "respawning too fast: disabled for 5 minutes"

I know what the problem here is, but not how to fix it, and am wondering if anybody has a brilliant idea.

I have a temperature/fan control daemon for a large rack system that has to run all the time in the background. If it dies or is killed by the user, it needs to restart. The application is critical in that it's the one controlling the fan speeds and shutting down power to the system if it gets too hot.

The application starts up at init and runs fine as it is; everything's working. It's operating as a daemon, doing all it needs to do, communicating with all the parts it needs to. All is well; there are no piles of melted chips and solder on the bottom of the chassis. (I've seen this happen elsewhere. It's pretty, but quite embarrassing, not to mention expensive.)

The problem that I'm running into is getting the daemon to restart via inittab:respawn.

This is an embedded system running busybox on the control processor. That really shouldn't matter, as this is a generic Linux/inittab/daemon question.

Here's the problem.

1) All the standard Linux documentation that I have seen says that a daemon app should fork() from the parent so that the child can call setsid() to get its own session, and close stdin/stdout/stderr so that it runs properly in the background. The parent process returns (dies a clean death), and the net effect is that the daemon's PID moves to a new value (that of the child process).

2) At the very bottom of the busybox implementation of the RESPAWN option is this (in init/init.c):

    if (a->action_type & (RESPAWN | ASKFIRST)) {
        // Only run stuff with pid == 0. If pid != 0, it is already running
        if (a->pid == 0)
            a->pid = run(a);
    }

In run(), it's 1) doing a fork() from init, and then 2) calling exec() to execute my daemon application, and returning its PID to the code above.

There's some other code in init that scans the PIDs that it has against the system PID table, and zeros out those values in its table that no longer have valid entries.
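The bookkeeping described here can be pictured roughly like this. The struct and function names are made up for illustration and are much simpler than what init actually keeps; the point is only that a cleared PID is what makes the respawn check fire again.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified init table entry. */
struct tracked_entry {
    const char *command;
    pid_t pid;              /* 0 means "not running", so respawn may start it */
};

/* Clear the PID of the entry that matches a process that has exited.
 * Returns 1 if an entry was cleared, 0 if the PID was not being tracked. */
int mark_exited(struct tracked_entry *table, size_t count, pid_t exited)
{
    for (size_t i = 0; i < count; i++) {
        if (table[i].pid != 0 && table[i].pid == exited) {
            table[i].pid = 0;
            return 1;
        }
    }
    return 0;
}
```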

The problem here, of course, is that the PID init records for my daemon is that of the parent before the fork, and that process exits (so the PID disappears) shortly after it has forked off the child.
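A small demonstration of the mismatch, as a sketch under assumptions: the function names are hypothetical, a pipe is used only so the demo can learn the real daemon's PID, and sleep(30) stands in for the daemon's main loop.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if a process with this PID currently exists. */
int is_alive(pid_t pid)
{
    return kill(pid, 0) == 0;
}

/* Start a child that "daemonizes" by forking again and exiting. Returns the
 * PID the supervisor tracked (already exited and reaped on return) and
 * stores the PID of the surviving daemon in *daemon_pid. Returns -1 on error. */
pid_t spawn_self_daemonizing(pid_t *daemon_pid)
{
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t tracked = fork();
    if (tracked < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (tracked == 0) {
        /* This is the process the supervisor launched and is tracking. */
        close(fds[0]);
        pid_t child = fork();
        if (child == 0) {
            /* This is the real daemon, with a PID the supervisor never saw. */
            setsid();
            pid_t self = getpid();
            if (write(fds[1], &self, sizeof self) != (ssize_t)sizeof self)
                _exit(1);
            close(fds[1]);
            sleep(30);          /* stand-in for the daemon's main loop */
            _exit(0);
        }
        _exit(child < 0 ? 1 : 0);   /* the parent "dies a clean death" */
    }

    close(fds[1]);
    ssize_t n = read(fds[0], daemon_pid, sizeof *daemon_pid);
    close(fds[0]);

    /* Returns almost at once: the tracked process is already gone. */
    if (waitpid(tracked, NULL, 0) != tracked || n != (ssize_t)sizeof *daemon_pid)
        return -1;
    return tracked;
}
```

After the call, the tracked PID has exited while the PID stored in *daemon_pid is still alive, which is exactly the state that makes respawn think the daemon died.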

Inittab:respawn gives me the start at boot time that I need, and all the background code that does the restarting of the application if it dies.

The problem is that because the "parent" is dead, init triggers the inittab:respawn entry to bring it back to life again. It's only because my code has a built-in "run no more than one instance of the application" check that I haven't ended up with thousands of copies running after a few hours.

I don't have any working keepalive code on the system at this time, and would prefer not to have to implement it just for this one feature.

Obviously this has got to be a common problem, but I'm missing something here in coming up with a common solution.

Any ideas on how to make this scenario (busybox inittab:respawn and child killing parent standard daemon) work?

Thanks

I have no solution for you, but have you considered alternatives? Comparison of different methods.

Thanks for the alternative ideas. Unfortunately I'm in an embedded busybox environment, and pulling in "service" is a bit out of scope for the solution domain, particularly when the solution revolves around how to get one PID value to the right place.

A simple solution here would be for busybox inittab:respawn to wait a few seconds after it starts the app, and then ask the system for the PID of the program name that it just started; it would get the correct (new child) value. But because it uses the PID value returned from the fork() that execs the daemon, which is the parent's PID that goes away, it gets all confused and enters a restart storm on my daemon.

It seems to me that busybox inittab:respawn was never intended to run proper self-daemonizing applications.

Has anyone seen anything on this subject before?

- - -

I'm a bit confused, not about the necessity of "daemonizing" the application (fork, exit the parent, call setsid() in the child, close the standard file descriptors, etc.), but about WHO does it.

Busybox supports init.d and start-stop-daemon. In start-stop-daemon, there appears to be code that does just this to the daemon that's being exec'd. My code is also doing it, so I might be double-daemonizing it (unless I made changes) if I went that way. But when I discovered that init.d didn't support native respawn features, I backed out to the apparently simpler inittab:respawn.

... (A downside of inittab:respawn is that I haven't figured out whether you can start/stop a daemon there, which is a pain while developing.)

- - -

Anyway, is there any difference or best practice for writing daemons (must be on busybox) for inittab versus init.d and start-stop-daemon? Particularly with attention paid to the 'daemonizing' feature.

If there is additional information I'm missing that would be helpful in answering this question, please let me know.

A daemon that runs independently does do the fork/setsid step (become a daemon). This allows it to become a session and process group leader (via setsid) and detach from a possible console terminal (controlling tty).

A daemon that is run using respawn cannot do that, because the init process can only monitor its direct descendants. When the parent process exits and leaves the new process running as the daemon, init receives the "death of a child" signal (SIGCHLD), and "respawn" will attempt to restart it. This requires the daemon being started to have two modes: one for when it is run independently, and another in which it remains a child of init.
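One way such a two-mode startup could look, as a minimal sketch. The -F flag, the function names, and the daemon name and path in the comment are all hypothetical; a real daemon would pick its own option (sshd uses -D).

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the hypothetical -F ("stay in foreground") flag was given. */
int wants_foreground(int argc, char *const argv[])
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "-F") == 0)
            return 1;
    return 0;
}

/* Classic detach: fork, let the parent exit, start a new session, and point
 * stdin/stdout/stderr at /dev/null. Returns 0 in the surviving child and
 * -1 on error; the original parent never returns from this call. */
int detach_from_parent(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid > 0)
        _exit(0);               /* parent exits; init would see this death */
    if (setsid() < 0)
        return -1;
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0)
        return -1;
    dup2(fd, STDIN_FILENO);
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    if (fd > STDERR_FILENO)
        close(fd);
    return 0;
}

/* Call once at startup, before the main loop.
 * A matching busybox inittab line might look like this (path is made up):
 *     ::respawn:/usr/sbin/fand -F
 */
int prepare_process(int argc, char *const argv[])
{
    if (wants_foreground(argc, argv))
        return 0;   /* stay init's direct child so respawn tracks the right PID */
    return detach_from_parent();
}
```

With the flag set, the process that init exec'd is the daemon itself, so the PID init recorded stays valid for as long as the daemon runs.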

An example of this is sshd. Normally sshd is designed to run independently, but sometimes (such as with some installations of systemd) it is started with the -D option, which suppresses becoming a daemon, on the assumption that it will be restarted by the init process.

Hope this helps explain things...