It seems that in future Linux kernel itself will force the use of systemd
From: Lennart Poettering <lennart <at> poettering.net>
Subject: [HEADSUP] cgroup changes
Date: 2013-06-21 17:36:03 GMT (19 weeks, 5 days, 5 hours and 28 minutes ago)
On monday I posted this mail:
Here's an update and a bit on the bigger picture:
Half of what I mentioned there is now in place. There's now a new
"slice" unit type in place in git, and everything is hooked up to
it. logind will now also keep track of running containers/VMs. The
various container/VM managers have to register with logind now. This
serves the purpose of better integration of containers/VMs everywhere
(so that "ps" can show for each process where it belongs to). However,
the main reason for this is that this is eventually going to be the only
way how containers/VMs can get a cgroup of their own.
So, in that context, a bit of the bigger picture:
It took us a while to realize the full extent how awfully unusable
cgroups currently are. The attributes have way more interdependencies
than people might think and it is trivial to create non-sensical
Of course, understanding how awful the status quo is a good first
step. But we really needed to figure out what we can do about this to
clean this up in the long run, and how we can get to something useful
quickly. So, after much discussion between Tejun (the kernel cgroup
maintainer) and various other folks here's the new scheme that we want
to go for:
1) In the long run there's only going to be a single kernel cgroup
hierarchy, the per-controller hierarchies will go away. The single
hierarchy will allow controllers to be individually enabled for each
cgroup. The net effect is that the hierarchies the controllers see are
not orthogonal anymore, they are always subtrees of the full single
2) This hierarchy becomes private property of systemd. systemd will set
it up. Systemd will maintain it. Systemd will rearrange it. Other
software that wants to make use of cgroups can do so only through
systemd's APIs. This single-writer logic is absolutely necessary, since
interdependencies between the various controllers, the various
attributes, the various cgroups are non-obvious and we simply cannot
allow that cgroup users alter the tree independently of each other
forever. Due to all this: The "Pax Cgroup" document is a thing of the
past, it is dead.
3) systemd will hide the fact that cgroups are internally used almost
entirely. In fact, we will take away the unit configuration options
ControlGroup=, ControlGroupModify=, ControlGroupPersistent=,
ControlGroupAttribute= in their entirety. The high-level options
CPUShares=, MemoryLimit=, .. and so on will continue to exist and we'll
add additional ones like them. The system.conf setting
DefaultControllers=cpu will go away too. Basically, you'll get more
high-level settings, but all the low level bits will go away without
replacement. We will take away the ability for the admin to set
arbitrary low-level attributes, to arrange things in completely
arbitrary cgroup trees or to enable arbitrary controllers for a service.
4) systemd git introduced a new unit type called "slice" (see
above). This is for partitioning up resources of the system into
slices. Slices are hierarchial, and other units (such as services, but
also containers/VMs and logged in users) can then be assigned to these
slices. Slices internally map to cgroups, but they are a very high-level
construct. Slices will expose the same CPUShares=, MemoryLimit=
properties as the other units do. This means resource management will
become a first-class, built-in functionality of systemd. You can create
slices for your customers, and in them subslices for their departments,
and then run services, users, vms in them. In the long run these will by
dynamically moveable even (while they are running), but that'll take
more kernel work. By default there will three slices: "system.slice"
(where all system services are located by default), "user.slice" (where
all logged in users are located by default), "machine.slice" (where all
running VMs/containers are located by default). However, the admin will
have full freedom to create arbitary slices and then move the other
units into them.
5) systemd's logind daemon already kept track of logged in
users/sessions. It is now extended to also keep track of virtual
machines/containers. In fact, this is how libvirt/nspawn and friends
will now get their own cgroups. They register as a machine, which means
passing a bit of meta info to systemd, and getting a cgroup assigned in
response. This registration ensures that "ps" and friends can show to
which VM/container a process belongs, but easily allows other tools to
query container/VM info too, so that we'll be able to provide an
integration level of containers/VMs like solaris zones can do it in the
So, this all together sounds like an awful lot of change. #1 and #2 are
long term changes. However #3, #4, #5 are something we can do now and
should do now, as prepartion for the single-writer, unified cgroup
tree. We really, really shouldn't ship the cgroup mess for longer, so
that people make use of the current systemd APIs that expose way too
many internal guts, stuff that we *know* right now is broken and will
cease to exist. We don't want to expose low-level details we already
know *now* we cannot support for long.
Even though #3, #4, #5 sound like major work they are not. In fact #4
and #5 are fully implemented on the systemd side already now upstream. I
am working on #3. I am confident that I'll have this finished in a few
days too, since this is really actually just about deleting code more
than writing code.
With #3, #4, #5 we have something in place that should do the basic
things and first and foremost will hide all the lower-level details of
cgroups. This has the big benefit of allowing us to rearrange these
details later without having to break the user or
programming interfaces, and that's what I really care about here.
Now, what does this mean for other projects using cgroups? So basically,
since we won't implement #1 + #2 immediately the cgroup tree stays
relatively open for other cgroup users. They can continue to fiddle with
it for now, but it must be clear that this is temporary, and that they
don't attempt too fancy things. Direct access to the cgroup tree is on
is way out and that must be clear to everybody.
More specifically: libcgroup is out of the game with
this. libvirt/openshift/lxc/.. can continue to do what they do for now,
however they should be updated sooner rather than later to do things the
systemd way, i.e. rely on systemd VM/container registration and user
And to make one last thing clear: this time, it's not Kay and me who are
taking away the cgroup tree from everybody else, it's actually all
Tejun's fault as the kernel cgroup maintainer... ;-) He wants a unified,
single-writer hierarchy, and it took us a while to agree to that, but
we're now fully on the same page with him.
If you are using non-trivial cgroup setups with systemd right now, then
things will change for you. We will provide you with similar
functionality as before, but things will be different and less
low-level. As long as you only used the high-level options such as
CPUShares, MemoryLimit and so on you should be on the safe side.
I hope this makes some sense,
Lennart Poettering - Red Hat, Inc.
When one looks at all the commotion about systemd on this SLACKWARE forum an observer might conclude that systemd is a core component of Slackware ... but it's not.
In the future, the Linux Kernel will do whatever Linus Torvalds wants it to do.
I hope so too.
You have to consider the source of the message.
Would you care to point out where Pottering says "Linux kernel itself will force the use of systemd"?
Because there is no mention of "Linux" anywhere in his post.
It is interesting how many of the systemd OP's have less than 10 posts...
Lennart can claim all he wants but the Linux foundation and Linus Torvalds gives final authority over what does what.
The Linux kernel is just that... a kernel. Nothing more or less, and it can be used with any software that can be coded for it, which happens to be at the moment, The GNU Operating System, or the Android Operating System.
Wouldn't surprise me, but then again, what really does anymore?
Just because it says systemd does not mean that's the only software that can handle the role. It seems to be assumed that systemd is the only available that can do it, and maybe that is the case for now. But it's a role to manage the cgroups, not a dependency. It just want to have ONE user space writer and some people seem to conclude it will be systemd.
I'm still hoping to rid of udev, soon.
And the kernel would be forked then.
Seeing this astroturfing, I trust the 0pointer code even less. Maybe someone wants to place a backdoor on every Linux box...
please read his mail list before posting.
he only says that the kernel cgroup architecture is changing.
and that systems running with systemd, needs to use the systemd API to change the value's.
otherwise systemd will not allow you to do so.
on non systemd systems, you can still make your own API
|All times are GMT -5. The time now is 12:42 AM.|