Why do pids jump in a container?
Entering a container (e.g. with `docker run` or `docker exec`) makes the PID of the next process created jump ahead. Why is that?
For example, in the image below the second process in the container (`ps`) is assigned PID 10, not PID 2: Attachment 33283
According to this answer, Linux appears to allocate PIDs sequentially. Is that not the case? |
Let's get this out of the way first:
You're aware that each Docker container creates its own namespace for pids, right? |
Quote:
If I just use nsenter to enter the container's PID namespace, this jump doesn't appear to happen: Attachment 33288
I know that in the process of joining a container runC forks a few times, but I thought that most of that happens in the host PID namespace, so those processes shouldn't count in the container's PID namespace. Also, the jump seems to vary from 5 to 9 PIDs ahead, and I'm pretty sure runC is consistent in the number of times it forks to enter a container.
Additionally, what's weird is that it's not the PID of the entered process that jumps ahead, but the PID of the next process in the container. If you have any ideas on why this behaviour occurs, please share. Thanks |
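One way to check which namespace a process is being counted in is the NSpid field of /proc/<pid>/status, which lists the process's ID in each PID namespace it belongs to. A minimal sketch, assuming a Linux kernel recent enough to expose NSpid (4.1 or later); `ns_pids` is a hypothetical helper name, not something from runC or Docker:

```shell
# Print the PID of process $1 as seen in each PID namespace it is a member of.
# Assumption: /proc/<pid>/status has an NSpid line (Linux 4.1+).
ns_pids() {
    # Drop the "NSpid:" label and print the remaining IDs separated by spaces.
    awk '/^NSpid:/ { $1 = ""; sub(/^ +/, ""); print }' "/proc/$1/status"
}

# For the current shell: one ID on the host, more than one inside a
# nested PID namespace when read from an outer /proc.
ns_pids $$
```

Run from the host against a process inside a container, this should print two IDs, the host-side PID first and the container-side PID last.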
This is interesting, I've never seen it myself before either. But it doesn't actually seem to be container related: the exact same thing also happens on the host.
Have you tried it yourself? I ran a loop with ls -l /proc/self to see how high the PID goes, and after 32767 it starts again at 300 (in my case). Code:
while true; do ls -l /proc/self; done |
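To make jumps easier to spot than in the raw ls output, the same idea can print only the PID of each short-lived child and the gap to the previous one. This is a rough sketch assuming a POSIX shell; `pid_delta` is a made-up helper name, and the size of the gap depends on the shell and on whatever else the system is spawning:

```shell
# Print the PID of a few short-lived children plus the gap to the previous one.
# pid_delta is a hypothetical helper; $1 is how many children to start.
pid_delta() {
    count=${1:-5}
    prev=
    i=0
    while [ "$i" -lt "$count" ]; do
        # The child sh expands $$ itself, so this is the child's own PID.
        cur=$(sh -c 'echo $$')
        if [ -n "$prev" ]; then
            # The delta can be negative if the PID counter wraps around.
            echo "pid=$cur delta=$((cur - prev))"
        else
            echo "pid=$cur"
        fi
        prev=$cur
        i=$((i + 1))
    done
}

pid_delta 5
```

On a quiet system the deltas should be small and steady; a larger delta means something else took IDs from the pool in between.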
Quote:
I believe the kernel also spawns processes in the initial PID namespace from time to time, so your loop will also occasionally see a jump of a few PIDs. Edit: The host also has a lot of services running that might spawn processes as well. I do think what I'm seeing is related to containers, and specifically to how runC enters a container's PID namespace, but I may be wrong. |
I honestly can't see any difference, but if there is, I'd love it if someone explained it to me:
Again, directly on the host: Code:
[root@macroscian ~]# ps aux | grep "ps aux"
OK, so you mean the difference is that the PID jumps ahead by several numbers at once, not merely that it increases with every command that is run. You're right, it's more complicated than it initially seems :) |
I talked with one of runc's maintainers, Aleksa Sarai, and he explained why this is happening.
By design, the Go runtime spawns several threads to manage a process. runc is written in Go, and when creating a container or exec'ing into one, there is a short time during which the runc process is running inside the container (before it execs the executable the user requested, e.g. bash in `docker exec bash`). In Linux, threads and processes are both identified with IDs from the same pool, so the Go runtime threads are counted in the container's PID namespace, leading to the PID jump I described. |
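The shared ID pool can be seen through /proc: every thread of a process appears as an entry under /proc/<pid>/task, named by its thread ID, and the first thread's ID is the PID itself. A small sketch assuming a Linux /proc; `thread_ids` is a hypothetical helper name:

```shell
# List the thread IDs (TIDs) of process $1.
# Each entry under /proc/<pid>/task is one thread of that process.
thread_ids() {
    ls "/proc/$1/task"
}

# A shell normally has a single thread, whose TID equals its PID.
echo "shell pid: $$"
echo "thread ids: $(thread_ids $$ | tr '\n' ' ')"
```

Pointed at the PID of a running Go program, the same helper should list several IDs. Each of those consumed a number that a later process can no longer get, which is the effect described above.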
Simply treat all pids ... in every context ... as being "opaque handles." (A common term is "nonce.") Their values are unpredictable and don't mean anything. Neither do they "point to" anything. Take the value that you are given but don't look closely at it. You have no idea what the next one might be. Use it only for its intended purpose – as a "primary key." The value is entirely arbitrary and contains no embedded information. The entire notion of "n+1" is entirely meaningless.
P.S.: These days, many handle-values are now purposely "unpredictable," specifically so that rogue software has a much more difficult time exploiting them. |