Greetings, everyone.
I'm not a kernel programmer (obviously =)), and not even an experienced C programmer, but out of urgent need I started patching the LXC userspace tools (http://lxc.sourceforge.net/) to support setting capabilities for new containers (e.g. dropping CAP_SYS_MODULE, to prevent kernel modules from being loaded from inside a container).
The first implementation was rather simple; it can be seen (as a patch) here:
http://git.niifaq.ru/portage-niifaq/...-support.patch
In short, I applied the capabilities via libcap functions: I load the initial cap_t from cap_get_proc(), modify all three sets (INHERITABLE, PERMITTED and EFFECTIVE) at once, and then apply the final cap_t with a cap_set_proc() call.
OK, that was rather simple, but... it does nothing. There is another capability modification just a few calls later, done via prctl, which removes CAP_SYS_BOOT. That modification shows up properly on the init of the new container via 'getpcaps $(pgrep init)', but there is no sign of my other modifications:
Code:
hellgate lxc # getpcaps $(pgrep init)
Capabilities for `1': =ep cap_setpcap-e
Capabilities for `6525': =ep cap_sys_boot-ep
But if I run lxc-start under strace, /sbin/init freezes (that's normal) in some kind of socket deadlock, yet the capabilities are exactly as they should be. This and other checks confirm that the capabilities were actually set, but then were somehow forgotten. Replacing /sbin/init (in the container) with a shell script or something else changes nothing.
OK, I tried moving the capability operations as close to the prctl call as possible (both before and after it): no effect. I tried implementing all this via the low-level capset call, linux/capability.h etc.: no effect.
OK, then I tried assuming that a root-started process has all privileges (not actually correct: containers could be run without full root privileges if some capabilities are set on the lxc executables) and rewrote the code to drop any 'bad' capability via prctl, leaving the others as they are (again, assuming they are set; anyway, if they are not set, most likely setting them wouldn't be allowed either). That works, as predicted, but it is not a very correct way, in my opinion. Anyway, I'm just curious why the initial implementation did not work.
So, here is the question itself: what is the difference between
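A minimal sketch of such a prctl-based drop, for illustration only (this is not the actual patch, and the helper names bounding_set_has and drop_from_bounding_set are made up here). It assumes a Linux system with kernel headers; PR_CAPBSET_DROP needs CAP_SETPCAP, while PR_CAPBSET_READ works unprivileged:

```c
#include <stddef.h>
#include <sys/prctl.h>
#include <linux/capability.h>

/* Hypothetical helper: returns 1 if 'cap' is in the calling thread's
 * bounding set, 0 if it is not, -1 on error (errno set by prctl). */
int bounding_set_has(int cap)
{
    return prctl(PR_CAPBSET_READ, (unsigned long)cap, 0UL, 0UL, 0UL);
}

/* Hypothetical helper: drops every capability in 'caps' from the bounding
 * set. Returns 0 on success, -1 at the first failure (errno set by prctl,
 * e.g. EPERM when the caller lacks CAP_SETPCAP). */
int drop_from_bounding_set(const int *caps, size_t ncaps)
{
    for (size_t i = 0; i < ncaps; i++) {
        if (prctl(PR_CAPBSET_DROP, (unsigned long)caps[i], 0UL, 0UL, 0UL) != 0)
            return -1;
    }
    return 0;
}
```

It would be called with something like an array holding CAP_SYS_MODULE and CAP_SYS_BOOT right before the exec, and bounding_set_has() can be used afterwards to check the result.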
Code:
prctl(PR_CAPBSET_DROP, ..., 0, 0, 0 );
and, for example,
Code:
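#include <sys/capability.h>   /* libcap, link with -lcap */
#include <unistd.h>

char *args[] = { "./dummy", (char *)0 };
char *envp[] = { (char *)0 };
cap_value_t cap_list[2];
cap_t caps = cap_get_proc();
cap_list[0] = CAP_SYS_MODULE;
cap_list[1] = CAP_SYS_BOOT;
/* clear both capabilities in all three sets of the current process */
cap_set_flag( caps, CAP_EFFECTIVE, 2, cap_list, CAP_CLEAR );
cap_set_flag( caps, CAP_PERMITTED, 2, cap_list, CAP_CLEAR );
cap_set_flag( caps, CAP_INHERITABLE, 2, cap_list, CAP_CLEAR );
cap_set_proc( caps );
cap_free( caps );
execve( args[0], args, envp );   /* execve() needs the environment as third argument */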
Just to be clear about the lxc-specific background: as far as I can see, lxc-start forks itself into a new namespace (via a 'clone' call), then initializes various things (network, utsname, etc.), chroots into the container rootfs, and then (this is where I set capabilities) drops CAP_SYS_BOOT via prctl and afterwards execvp's into /sbin/init, which should inherit all the flags (filesystem capabilities are clear), but it does not (if the dropping is done by any means other than prctl).
So, why does prctl actually drop capabilities with inheritance, while the capset syscall drops them without inheritance? Or is the problem hidden somewhere else?
Sorry for any mistakes =)
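One possible explanation, going by the execve() rules in capabilities(7), can be modelled in a few lines. This is only a toy model with made-up names (model_caps and the model_* functions), not kernel code. It assumes the exec is done by uid 0, with no securebits set, no file capabilities on the binary, and ambient capabilities ignored. In that case the kernel treats the file's inheritable and permitted sets as all ones, so the new permitted set is rebuilt from the inheritable and bounding sets, and whatever was cleared with cap_set_proc() comes back:

```c
#include <stdint.h>

/* Capability numbers as defined in linux/capability.h. */
enum { MODEL_CAP_SYS_MODULE = 16, MODEL_CAP_SYS_BOOT = 22 };

#define MODEL_BIT(n) (UINT64_C(1) << (n))
#define MODEL_ALL    (~UINT64_C(0))

/* Hypothetical model of one process's capability sets as bit masks. */
struct model_caps {
    uint64_t permitted;
    uint64_t effective;
    uint64_t inheritable;
    uint64_t bounding;
};

/* Models cap_set_flag(..., CAP_CLEAR) on all three sets followed by
 * cap_set_proc(): E, P and I lose the bits, the bounding set does not. */
struct model_caps model_clear_in_sets(struct model_caps p, uint64_t mask)
{
    p.effective   &= ~mask;
    p.permitted   &= ~mask;
    p.inheritable &= ~mask;
    return p;
}

/* Models prctl(PR_CAPBSET_DROP, ...): only the bounding set changes. */
struct model_caps model_drop_bounding(struct model_caps p, uint64_t mask)
{
    p.bounding &= ~mask;
    return p;
}

/* Models execve() by root under the assumptions stated above:
 * P'(permitted)   = (P(inheritable) & F(inheritable)) | (F(permitted) & P(bounding))
 * P'(effective)   = P'(permitted)   (file effective bit treated as set)
 * P'(inheritable) = P(inheritable)
 * with F(inheritable) and F(permitted) treated as all ones. */
struct model_caps model_exec_as_root(struct model_caps p)
{
    struct model_caps n;
    n.inheritable = p.inheritable;
    n.bounding    = p.bounding;
    n.permitted   = (p.inheritable & MODEL_ALL) | (MODEL_ALL & p.bounding);
    n.effective   = n.permitted;
    return n;
}
```

Starting from full permitted, effective and bounding sets with an empty inheritable set, model_clear_in_sets() for CAP_SYS_MODULE followed by model_exec_as_root() gives the bit back, while model_drop_bounding() for CAP_SYS_BOOT keeps it cleared after the exec. That would match the getpcaps output shown earlier, if the model's assumptions hold for lxc-start.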