is systemd really all that bad?

TobiSGD · 11-04-2015, 04:49 AM

Quote:

Originally Posted by Drakeo

But this article here is from the people that really deal with systems and large server farms.

Yeah, looked at that article and stopped reading at

Quote:

Install a new package that has a .service file and "poof", your entire boot behavior and order can change. Got a cyclic dependency? [BLEEP], you won't even discover this until that unplanned reboot.

Apparently, this person has never heard of testing changes on developer systems before rolling them into production, which lets me question their credibility.

jpollard · 11-04-2015, 06:34 AM

Quote:

Originally Posted by TobiSGD

Yeah, looked at that article and stopped reading at [...]

Apparently, this person has never heard of testing changes on developer systems before rolling them into production, which lets me question their credibility.

It doesn't just happen in testing.

Due to the random scheduling, it can happen at any time - and not happen during testing.

ANYTHING can be a change - extra interrupts can slow a process down ... and expose another dependency failure.

Sometimes it works... Sometimes it doesn't.

That is why people keep sticking restarts of some services into rc.local. It tends to get more reliable. If a service did start, attempting to start it again gets canceled. If it didn't start, then it is more likely to get started.

Of course, when that fails too, people start sticking sleeps into rc.local to try and make it work too.

The very BASIC problem is due to the nature of network analysis.

The more complex the dependency network, the more likely adding a single node to it will cause the network to collapse.

This fact was learned back in the mid 1970s and early 1980s with PERT charting for project management - it doesn't scale well:

https://en.wikipedia.org/wiki/Progra...view_technique

The next problem is that the network IS NOT a simple network (just the "before" and "after" would be the base network). There are conditional sub-networks that make it more complex ("wants", "requires"...) for yet another network layer... And that generates the need for multiple "targets" that do nothing but create sub-nets (it reduces the size of the list of dependencies, but can also make the network more confusing).

So adding ONE new service could cause a number of previously untested services to also get started...
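To make the ordering-network point concrete, here is a minimal sketch that treats "before"/"after" pairs as graph edges and lets `tsort` look for a loop. The service names are made up, `start_order` is a hypothetical helper, and it assumes GNU coreutils `tsort`, which exits non-zero when the input contains a loop. It only models plain ordering; the "wants"/"requires" layers are not covered.

```shell
#!/bin/sh
# Each argument is one ordering edge "A B", meaning "A starts before B".
# Prints one possible start order; the exit status is tsort's, which is
# non-zero when the edges contain a loop (assumes GNU coreutils tsort).
start_order() {
    printf '%s\n' "$@" | tsort 2>/dev/null
}

# Hypothetical services: a simple chain has a valid start order.
if start_order "network db" "db webapp" >/dev/null; then
    echo "base network: no cycle"
fi

# Adding ONE edge (a new unit ordered the wrong way round) creates a loop.
if ! start_order "network db" "db webapp" "webapp network" >/dev/null; then
    echo "after adding one edge: cycle detected"
fi
```

The check is static, so it only catches ordering loops; it says nothing about the timing-dependent failures described above.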

The next problem is that for "reliability" all services need to be modified to tell systemd when they are "ready". The problem here is that services that are started by other services introduce additional problems: NetworkManager is my favorite bad example. NetworkManager has to tell systemd when the network is ready... but NetworkManager isn't always in control - that is up to the DHCP client.... So now the DHCP client has to tell NetworkManager when it is done... so that NetworkManager can tell systemd that the network is ready...

Which works - sort of (it is why there is a "NetworkManager-wait-online" service). Most places have two or three levels of network "ready" state: one for administrative access (needed by admins to fix things), another for service networks (must be ready for service access such as remote databases), and a third for public access (needed for general use). The first one MUST be up. The second one CAN be up - but if it is not, admins can still connect to find/fix things. If both are up, then all services SHOULD be up and admins can verify proper operation... or find out why the public access network isn't working. The third one must be up for general use... Can NetworkManager handle this? Nope. All networks are either up or down. The only workaround is to take some of the networks OUT of NetworkManager's control... or dump NetworkManager entirely.

And this doesn't address the problems of a cluster when dependencies are external to the system... (DHCP is a small example of this, but what about remote database access?).

BTW, There is no race condition for socket connections. If it exists, it is a bug in the service in the first place. Service startup is supposed to:

1) process configuration files, report any errors,
2) initialize the network (up through the listen system call) and report any errors, THEN
3) become a daemon.

At the point the "listen" system call is completed, the service is ready to accept connections. After the fork, the child starts accepting incoming connections (which are in the queue), the parent can then close the socket (the child has it) and exit normally (the event that signals it is ready). Systemd breaks this as there is no point where the service can be inherently identified as "ready".... unless it is modified to TELL systemd it is ready... Thus, the need to tell systemd there are "forking" services... which again defeats the purpose of systemd, as these services can't be monitored by systemd.

In this way, systemd takes over more and more of the formerly independent projects.

As you can see, systemd is not my favorite init.

Drakeo · 11-04-2015, 09:35 AM

Quote:

Originally Posted by TobiSGD

Yeah, looked at that article and stopped reading at [...]

Apparently, this person has never heard of testing changes on developer systems before rolling them into production, which lets me question their credibility.

Keep it simple. I wish everyone thought the way you do. But today many hands in one pot. Like I said it is about trust and stability.

Shadow_7 · 11-04-2015, 09:52 AM

Quote:

Originally Posted by TobiSGD

May I ask which distro you are using?

Debian jessie/stable mostly.

Mostly installed like this:
http://www.debian.org/releases/stabl...apds03.html.en

Before systemd I could exit the chroot and umount the mount point and the shared points /chroot/proc and /chroot/dev. Now I have to stop udev and dbus on the host system to umount /proc and /dev in the /chroot. Although I don't even think I could get that much done now, I just have to reboot to clean up. I would often use the debootstrap install method to set up a USB boot medium for another machine with a minimal install, and never had to reboot the machine I ran the install from.

It's probably because the package manager installs stuff and tries to restart services (in the chroot). And I guess to some degree succeeds in restarting them in the chroot, forever binding the host system to the chroot (until reboot). Or less graceful endeavors to force things.
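If the guess about services being started inside the chroot is right, one common Debian-side mitigation is a `policy-rc.d` script that tells `invoke-rc.d` not to start anything while packages are installed. This is a sketch, not something tested on this setup: the function names are hypothetical, and it assumes the maintainer scripts go through `invoke-rc.d` (or a wrapper that honors the same policy file).

```shell
#!/bin/sh
# Write a policy-rc.d into the chroot at $1. invoke-rc.d treats exit
# status 101 from this script as "action forbidden by policy".
deny_service_starts() {
    chroot_dir=$1
    mkdir -p "$chroot_dir/usr/sbin"
    printf '#!/bin/sh\nexit 101\n' > "$chroot_dir/usr/sbin/policy-rc.d"
    chmod 755 "$chroot_dir/usr/sbin/policy-rc.d"
}

# Remove the policy file again so the installed system boots normally.
allow_service_starts() {
    rm -f "$1/usr/sbin/policy-rc.d"
}
```

Usage would be `deny_service_starts /mnt/debinst` before entering the chroot and `allow_service_starts /mnt/debinst` after leaving it.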

(after the debootstrap setup of /mnt/debinst/ in this case.)

Code:

# mount -t proc proc /mnt/debinst/proc
# mount --rbind /dev /mnt/debinst/dev
# export LANG=C; chroot /mnt/debinst /bin/bash
(chroot)# apt-get install linux-image-amd64
(chroot)# passwd
(chroot)# exit
# umount /mnt/debinst/dev
(fail)
# umount /mnt/debinst/proc
(fail)
# umount /mnt/debinst
(fail)
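One likely contributor to those failures, independent of any services left running, is the `--rbind`: it also binds everything mounted below /dev (such as /dev/pts), so a plain `umount` of /mnt/debinst/dev finds the target busy. A sketch of a teardown under that assumption follows. `submounts` and `teardown_chroot` are hypothetical helper names; the `--make-rslave` step assumes the host uses shared mount propagation (the systemd default) and is there so the unmounts do not propagate back to the host's own /dev.

```shell
#!/bin/sh
# Read /proc/mounts-style lines on stdin (field 2 is the mount point) and
# print every mount point at or below $1, deepest first.
# Paths containing spaces (escaped as \040 in /proc/mounts) are not handled.
submounts() {
    awk -v root="$1" '$2 == root || index($2, root "/") == 1 { print $2 }' |
        LC_ALL=C sort -r
}

# Unmount a chroot tree bottom-up. Needs root; not called automatically.
teardown_chroot() {
    # Keep unmount events from propagating to the host's /dev mounts.
    mount --make-rslave "$1/dev"
    submounts "$1" < /proc/mounts | while read -r mnt; do
        umount "$mnt"
    done
}
```

With a reasonably recent util-linux, `umount -R /mnt/debinst` after the `--make-rslave` step should do the same job in one command. Processes still running inside the chroot will keep the mounts busy either way.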