Enabling no_new_privs/NoNewPrivs, disabling setuid on Linux
TL;DR
Setuid binaries are not dangerous in themselves, but they are a worthwhile target: a vulnerability in one of them can lead to privilege escalation.
So the goal is to remove this attack surface. The kernel provides a solution with the no_new_privs flag, which effectively disables setuid/setgid bits and file capabilities. This flag can be set system-wide by systemd with the NoNewPrivs option (NoNewPrivileges= in system.conf) for PID 1.
The most important services which need to keep working with this flag set are authentication and account management. For su and sudo there is run0 from systemd as a replacement; for polkit (polkit-agent-helper-1) a patch has been merged upstream but is not yet released; and some shadow utilities and pam_unix.so from Linux-PAM can be replaced with account-utils. Even rootless podman containers continue to work; only a solution for setuid binaries inside containers is missing.
What are the problems?
There are two problems which at first glance are not connected with each other, but which are solved by the same solution:
- UID/GID drift on image-based Linux distributions
- Privilege escalation through bugs in setuid binaries
The solution for the first problem is easy: all files below /usr need to be owned by root:root. Currently, files owned by a different user are in most cases setuid/setgid binaries; the remaining cases are packaging bugs. So if we solve the second problem by finding an alternative to the setuid/setgid bit, we get the first problem solved for free.
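Whether a system already meets the root:root requirement can be checked with find. The following is a minimal sketch and not part of any package mentioned here; the function name non_root_owned is made up for this example, and numeric IDs are used so that the result does not depend on the local user database.

```sh
#!/bin/sh
# Hypothetical helper: print every path below a directory tree (default
# /usr) that is not owned by UID 0 and GID 0, i.e. not root:root.
# -xdev keeps find on the same filesystem as the starting directory.
non_root_owned() {
    find "${1:-/usr}" -xdev \( ! -user 0 -o ! -group 0 \) -print
}
```

Calling non_root_owned /usr on a system where everything below /usr is owned by root:root prints nothing; every line of output is either a setuid/setgid candidate or a possible packaging bug.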
What is “no_new_privs”?
The execve system call can grant a newly started program privileges that its parent did not have. The most obvious examples are setuid/setgid programs and file capabilities. The no_new_privs flag makes it possible for any process to avoid privilege escalation. After the flag is set, it persists across execve, clone and fork syscalls, and cannot be cleared. execve() will not grant permission for actions that would not have been possible without calling execve. For example, the setuid and setgid bits will no longer change the uid or gid; file capabilities will not add to the permitted set, and LSMs will not relax constraints after execve.
Note that no_new_privs does not prevent privilege changes that do not involve execve(). An appropriately privileged task can still call setuid(2) or setresuid(2).
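The state of the flag is visible from user space. The following sketch assumes a kernel of version 4.10 or newer (for the NoNewPrivs field in /proc/<pid>/status) and uses setpriv from util-linux, if it is installed, to set the flag for a single child process; the helper name nnp_state is made up for this example.

```sh
#!/bin/sh
# Hypothetical helper: print the no_new_privs bit of a process (default:
# this shell) as the kernel reports it in /proc/<pid>/status.
# Kernels older than 4.10 have no "NoNewPrivs:" field and print nothing.
nnp_state() {
    awk '/^NoNewPrivs:/ { print $2 }' "/proc/${1:-$$}/status"
}

echo "this shell: $(nnp_state)"

# setpriv sets the flag with prctl(PR_SET_NO_NEW_PRIVS) and then executes
# the given command. The flag is inherited by the child and cannot be
# cleared there; the parent shell is not affected.
if command -v setpriv >/dev/null 2>&1; then
    setpriv --no-new-privs awk '/^NoNewPrivs:/ { print "child: " $2 }' /proc/self/status
fi
```

On a system where PID 1 runs with the flag set, nnp_state 1 and nnp_state for any other process should report the same value, because the flag is inherited across fork and execve.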
Design
A well-defined IPC service is generally preferable to any set*id binary, so let’s replace set*id binaries with systemd services or socket-activated services. Such a service is started by systemd and runs as root. Since the service only needs to implement the functionality which really needs higher privileges, it is smaller, does not need to process arbitrary user input and is much easier to audit. Additionally, the service can be sandboxed and secured with the corresponding systemd unit options.
As an example, chage -l needs access to /etc/shadow to read the user’s own password and account aging information. For this it either has the setuid bit set, or it is owned by group shadow and has the setgid bit set. The latter requires that /etc/shadow is owned by root:shadow and is readable by the group. Very often it is not, for “security reasons”, which breaks chage. The solution is a small systemd socket-activated service which provides the user’s shadow entry to this tool, and only the user’s own shadow entry, no other one. The utility can then communicate with the service, e.g. via varlink, to get the entry and does not need special privileges.
Common tasks
Authentication
Authentication is today done by a PAM module. PAM expects that the calling application has all privileges needed to access authentication tokens like the password hash in /etc/shadow. Since most of these applications are running as root, this is in many cases no problem. But think of su, sudo or a screensaver, which are started by an unprivileged user.
Password-based authentication boils down to pam_unix.so, which uses /sbin/unix_chkpwd as a setuid helper for checking passwords. pam_unix.so meanwhile supports fetching the user’s password hash via pwaccessd, but it is, for example, not able to change the password. There is a new PAM module, pam_unix_ng.so (part of account-utils), which has full support for pwaccessd including fallback code for containers.
A second service for authentication is polkit, often used by e.g. desktop applications to allow users special tasks like mounting a USB stick or managing wireless networks. The polkit service calls a setuid helper which uses PAM for user authentication. For polkit <= 126 there is a patch that replaces the setuid helper with a socket-based one; afterwards, enable the polkit-agent-helper socket:
systemctl enable --now polkit-agent-helper.socket
Account Management
The user’s account data is normally managed by setuid binaries from the shadow package. The most common ones, such as chage, chfn, chsh and passwd, have been replaced by variants from the account-utils package. They are now only simple frontends which handle the interaction with the user and communicate via varlink with the systemd socket-based services pwaccessd and pwupdd, which do the real work.
The shadow package also provides the commands newuidmap and newgidmap. These utilities are used to set up the UID/GID mapping between the OS and the container namespaces. The variants from account-utils work without setuid bit and file capabilities and allow e.g. rootless containers with podman.
The source code of account-utils, with utilities like chage, chfn, chsh, passwd, newuidmap and newgidmap and the PAM module pam_unix_ng.so, can be found in the account-utils repository on GitHub.
Run commands as different user
Several binaries allow running commands as a different user; all of them except run0 need a setuid bit:
- pkexec
- pleaser
- run0
- su
- sudo
Two binaries on this list are essential for the system and mentioned in millions of scripts and examples: su and sudo.
They will stop working. For su it should be simple to create a compatibility wrapper around run0. For sudo this will be more complicated: systemd does not provide the necessary data to polkit to make decisions based on the called program or command line.
run0 is the replacement provided by systemd; it starts a transient unit. But to make it really usable as a replacement, it needs some enhancements so that polkit gets more data, see the systemd issues “run0: improve polkit integration to replace sudo” and “run0 create own action.id for polkit”.
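How a compatibility wrapper for su could map its arguments onto run0 is sketched below. This is an illustration only, not an existing tool: the function name su_to_run0 is hypothetical, only the forms su [-] [-c COMMAND] [USER] are handled, /bin/sh stands in for the target user’s login shell, postgres is just an example user name, and the resulting run0 command line is printed instead of executed.

```sh
#!/bin/sh
# Hypothetical sketch: translate "su [-|-l] [-c COMMAND] [USER]" into a
# run0 command line and print it, one argument per line. A real wrapper
# would end with: exec "$@"
su_to_run0() {
    su_user=root
    su_cmd=
    while [ $# -gt 0 ]; do
        case "$1" in
            -|-l|--login)
                ;;  # accepted and ignored in this sketch
            -c|--command)
                if [ $# -lt 2 ]; then
                    echo "su_to_run0: option $1 requires an argument" >&2
                    return 2
                fi
                su_cmd=$2
                shift
                ;;
            -*)
                echo "su_to_run0: unsupported option: $1" >&2
                return 2
                ;;
            *)
                su_user=$1
                ;;
        esac
        shift
    done
    if [ -n "$su_cmd" ]; then
        # Assumption: /bin/sh instead of the target user's login shell.
        set -- run0 --user="$su_user" -- /bin/sh -c "$su_cmd"
    else
        # Without a command, run0 starts the target user's shell.
        set -- run0 --user="$su_user"
    fi
    printf '%s\n' "$@"
}

su_to_run0 -c 'id -u' postgres
```

The sketch deliberately rejects every option it does not know. A wrapper for sudo cannot be built the same way, because the polkit decision would have to depend on the called program and its command line.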
Polkit rules for run0
This polkit rule tries to partly simulate sudo behavior for run0:
- Users in the _run0_nopasswd list don’t need to enter a password.
- Users that are in the admin group always authenticate with their own password.
- The remaining users need to provide the root password.
This matches the sudo-policy-auth-wheel-self configuration of openSUSE and SUSE Linux Enterprise Server 16.0.
To specify a list of users who don’t need a password, create
a file /etc/polkit-1/rules.d/51-run0-nopasswd.rules with the
following example content:
polkit._run0_nopasswd = ["user1","user2",...];
What is completely missing is the possibility to match on the called program and its command line. This needs enhancements in run0.
/* The _run0_nopasswd variable specifies the users who don't need
 * a password to run "run0". To configure the list of users, create
 * a file "/etc/polkit-1/rules.d/51-run0-nopasswd.rules" with the
 * content:
 *   polkit._run0_nopasswd = ["user1","user2",...]; */
polkit._run0_nopasswd = [];
polkit.addRule(function(action, subject) {
    if (action.id == "org.freedesktop.systemd1.manage-units") {
        for (var i in polkit._run0_nopasswd) {
            var g = polkit._run0_nopasswd[i];
            if (subject.user == g) {
                return polkit.Result.YES;
            }
        }
    }
    if (action.id == "org.freedesktop.systemd1.manage-units" &&
        subject.isInGroup("wheel")) {
        return polkit.Result.AUTH_SELF_KEEP;
    }
    if (action.id == "org.freedesktop.systemd1.manage-units") {
        return polkit.Result.AUTH_ADMIN_KEEP;
    }
});
Open issues
Usage of su and sudo
- It needs a wrapper around run0 for su.
- run0: improve polkit integration to replace sudo
- run0 create own action.id for polkit
Setuid binaries in containers
With NoNewPrivs set by systemd, setuid binaries will also stop working inside containers. Some people had the idea to disable no_new_privs in namespaces via an ioctl or by BPF. But this would be a task for more skilled kernel developers.
openSUSE MicroOS
There is a working PoC for openSUSE MicroOS as Container Host OS. A polkit package with all necessary backports is available in the standard repositories. Apart from that, new packages from an extra repository are necessary.
The SELinux policy is not yet adjusted for the new polkit and account-utils. For this reason you need to set SELinux to permissive mode: change the SELINUX variable in /etc/selinux/config to permissive or boot with the enforcing=0 kernel command line option.
After installing openSUSE MicroOS add the repository https://download.opensuse.org/repositories/home:/kukuk:/no_new_privs/openSUSE_Tumbleweed/ and install the package disable-setuid. This package will install the /usr/lib/systemd/system.conf.d/10-disable-setuid.conf config file with NoNewPrivs enabled. Additionally it will install a system-preset file to enable the polkit agent, pwaccessd, pwupdd and newidmapd socket units.
zypper ar -f https://download.opensuse.org/repositories/home:/kukuk:/no_new_privs/openSUSE_Tumbleweed/ no_new_privs
transactional-update run zypper install --allow-vendor-change disable-setuid
Conclusion
The biggest blockers to widespread acceptance are the problem with setuid binaries inside containers and the missing compatibility wrappers around run0 for su and sudo.
Otherwise, the current solution for the use case of a container host already works very well.
Further documentation
List of setuid binaries on openSUSE Tumbleweed
This is the list of all binaries in openSUSE Tumbleweed which have a setuid or setgid bit set. Binaries that use file capabilities are not included.
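A comparable list for a single installed system can be produced with find. The sketch below is not the script that was used to create the list in this section; the function name list_setid is made up for this example, and it only sees files that are actually installed, not all packages of the distribution.

```sh
#!/bin/sh
# Hypothetical helper: list regular files below a tree (default /usr)
# that carry the setuid (4000) or setgid (2000) bit, in "ls -l" style
# so that mode, owner and group are visible.
list_setid() {
    find "${1:-/usr}" -xdev -type f \( -perm -4000 -o -perm -2000 \) -exec ls -ld {} +
}
```

list_setid /usr prints one line per match. File capabilities are not covered by this check; where the libcap tools are installed, getcap -r /usr lists those.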
Legend of icons:
- ☑️ -> this package is installed by default
- ✅ -> this binary got already adjusted
- ❎ -> binary will work in all relevant use cases without setuid bit
- ❌ -> there are no plans to adjust this binary
- ❓ -> unclear if this binary is really needed
Package list:
- at
  - /usr/bin/at (-rwsr-x---, root:trusted)
- authbind
  - /usr/libexec/authbind/helper (-rwsr-xr-x, root:root)
- cronie --> use systemd.timer instead
  - ❌ /usr/bin/crontab (-rwsr-x---, root:trusted)
- ☑️ dbus-1 <-- this looks like packaging bugs
  - ❓ /usr/libexec/dbus-1/dbus-daemon-launch-helper (-rwsr-x---, root:messagebus)
- enlightenment
  - /usr/lib64/enlightenment/utils/enlightenment_system (-rwsr-xr-x, root:root)
- exim
  - /usr/sbin/exim (-rwsr-xr-x, root:root)
- firejail
  - /usr/bin/firejail (-rwsr-x---, root:firejail)
- ☑️ fuse
  - /usr/bin/fusermount (-rwsr-x---, root:trusted)
- ☑️ fuse3
  - /usr/bin/fusermount3 (-rwsr-x---, root:trusted)
- hawk2
  - /usr/sbin/hawk_chkpwd (-rwsr-x---, root:haclient)
- libcgroup-tools
  - /usr/bin/cgexec (-rwxr-sr-x, root:cgred)
- libgnomesu
  - /usr/libexec/libgnomesu/gnomesu-pam-backend (-rwsr-xr-x, root:root)
- libgtop
  - /usr/libexec/libgtop_server2 (-rwsr-xr-x, root:root)
- liblxc1
  - /usr/libexec/lxc/lxc-user-nic
- libspice-client-glib-helper
  - /usr/bin/spice-client-glib-usb-acl-helper (-rwsr-x---, root:kvm)
- ☑️ mariadb
  - /usr/lib64/mysql/plugin/auth_pam_tool_dir/auth_pam_tool (-rwsr-xr-x, root:root)
- nvidia-modprobe
  - /usr/bin/nvidia-modprobe (-rwsr-xr-x, root:root)
- OpenSMTPD
  - /usr/libexec/opensmtpd/lockspool (-rwsr-xr-x, root:root)
  - /usr/sbin/smtpctl (-rwxr-sr-x, root:_smtpq)
- ☑️ pam --> replace pam_unix.so with pam_unix_ng.so
  - ❌ /usr/sbin/unix_chkpwd (-rwsr-xr-x, root:shadow)
- pcmciautils
  - /usr/sbin/pccardctl (-rwsr-x---, root:trusted)
- physlock
  - /usr/bin/physlock (-rwsr-x---, root:trusted)
- pkexec (polkit)
  - ❌ /usr/bin/pkexec (-rwsr-xr-x, root:root)
- pleaser
  - /usr/bin/please (-rwsr-xr-x, root:root)
  - /usr/bin/pleaseedit (-rwsr-xr-x, root:root)
- policycoreutils-newrole
  - /usr/bin/newrole (-rwsr-xr-x, root:root)
- polkit
  - /usr/libexec/polkit-1/polkit-agent-helper-1 (-rwsr-xr-x, root:root)
- ☑️ postfix
  - /usr/sbin/postlog (-rwxr-sr-x, root:maildrop)
  - /usr/sbin/postqueue (-rwxr-sr-x, root:maildrop)
  - /usr/sbin/postdrop (-rwxr-sr-x, root:maildrop)
- qemu-tools
  - /usr/libexec/qemu-bridge-helper (-rwsr-x---, root:kvm)
- sendfax
  - /usr/libexec/mgetty+sendfax/faxq-helper (-rwsr-x---, fax:trusted)
- ☑️ shadow
  - ✅ /usr/bin/chage (-rwxr-sr-x, root:shadow)
  - ✅ /usr/bin/chfn (-rwsr-xr-x, root:shadow)
  - ✅ /usr/bin/chsh (-rwsr-xr-x, root:shadow)
  - ✅ /usr/bin/expiry (-rwsr-xr-x, root:shadow)
  - ❓ /usr/bin/gpasswd (-rwsr-xr-x, root:shadow)
  - ✅ /usr/bin/newgidmap (-rwsr-xr-x, root:shadow)
  - ❌ /usr/bin/newgrp (-rwsr-xr-x, root:root)
  - ✅ /usr/bin/newuidmap (-rwsr-xr-x, root:shadow)
  - ✅ /usr/bin/passwd (-rwsr-xr-x, root:shadow)
- smc-tools
  - ❌ /usr/lib64/libsmc-preload.so (-rwsr-xr-x, root:root)
- ☑️ sudo
  - ❌ /usr/bin/sudo (-rwsr-xr-x, root:root)
- texlive
  - /usr/libexec/mktex/public (-rwxr-sr-x, root:mktex)
- thttpd
  - /usr/bin/makeweb (-rwxr-s--x, root:www)
- usbauth-notifier
  - /usr/libexec/usbauth-npriv (-rwsr-x---, root:usbauth)
  - /usr/libexec/usbauth-notifier/usbauth-notifier (-rwxr-sr-x, root:usbauth)
- ☑️ util-linux
  - ❎ /usr/bin/umount (-rwsr-xr-x, root:root)
  - ❎ /usr/bin/mount (-rwsr-xr-x, root:root)
  - ❌ /usr/bin/su (-rwsr-xr-x, root:root)
- v4l-conf
  - /usr/bin/v4l-conf (-rwsr-x---, root:video)
- xorg-x11-server-wrapper
  - /usr/bin/Xorg.wrap (-rwsr-xr-x, root:root)
