A while back we started authenticating users on our Linux servers against Active Directory. As far as the actual authentication part goes, things are working great.
However, one side effect is that Linux thinks (sort of correctly) that it has several thousand (~15-20k) users. We've seen several issues that seem to be SELinux-related (one of which is https://serverfault.com/questions/236419/usr-bin-install-hangs-apparently-due-to-selinux). Some other issues include:
- dmesg repeatedly reports that restorecon gets killed by the OOM killer
- booting on some servers takes a very long time; this happens after the kernel loads, apparently while reading the volume groups, but also while running the restorecon startup script.
- yum updates hang (similar to the mmap/munmap behavior in my SELinux/GNU 'install' question)
We see these issues with SELinux in permissive mode. They go away when we disable SELinux completely. Disabling SELinux is an option. I'm also looking at ways to limit the number of users AD presents to Linux using an OU or group. But the nerd in me always wants to know more.
So this is a pretty broad question, but does anyone have advice for dealing with SELinux and a large number of users? I'm not particularly familiar with SELinux, but this could be a learning opportunity.
This feels like an oversight in libselinux to me.
A 'fix' here would be to rename the existing /etc/selinux/targeted/contexts/files/file_contexts.homedirs to something else, create a new one (typically containing just a few generic regular expressions, which you can find at the top of the original file), and then set that file immutable (chattr +i) so that the policy tooling doesn't regenerate it (this happens when a new selinux-policy-targeted RPM is deployed).
This will prevent the CPU churn you are seeing.
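The steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a tested tool: it assumes you have already inspected the original file and counted how many leading lines hold the generic expressions (the hypothetical `keep_lines` parameter), and the `.orig` backup name is an illustrative choice. On a real server it has to run as root because of the path and `chattr`.

```python
import itertools
import os
import subprocess

# Path from the answer above (targeted policy); adjust if your policy type differs.
HOMEDIRS = "/etc/selinux/targeted/contexts/files/file_contexts.homedirs"


def shrink_homedirs(path: str, keep_lines: int, make_immutable: bool = True) -> str:
    """Swap `path` for a copy that keeps only its first `keep_lines` lines.

    `keep_lines` is a hypothetical parameter: count how many leading lines of
    the original file are the generic regular expressions before running this.
    Returns the path the full original file was renamed to.
    """
    backup = path + ".orig"  # hypothetical backup name
    os.rename(path, backup)  # keep the full generated file for reference

    # Copy only the generic entries from the top of the original file.
    with open(backup) as src, open(path, "w") as dst:
        dst.writelines(itertools.islice(src, keep_lines))

    if make_immutable:
        # chattr +i makes the file immutable, so nothing can rewrite it
        # until someone runs `chattr -i` on it again.
        subprocess.run(["chattr", "+i", path], check=True)
    return backup
```

On a server you would call something like `shrink_homedirs(HOMEDIRS, keep_lines=N)` with N taken from your own file. If you later want a policy update to regenerate the file, clear the attribute with `chattr -i` first.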
Your problem happens because restorecond opens this file as a reference for scanning files in users' home directories, which must always be protected from invalid file label changes. But since your file contains thousands upon thousands of entries, the scan uses up large quantities of CPU.
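To see why the number of entries matters, here is a toy model in Python. It is NOT libselinux's real lookup code: it assumes two hypothetical per-user entries per account and a plain linear scan over compiled regular expressions in which the last matching entry wins. It only illustrates how per-path cost grows with the user count, compared with a file reduced to a couple of generic expressions.

```python
import re

# Toy model only, not libselinux's implementation. Assumptions: two
# hypothetical entries per user, linear scan, last matching entry wins.


def build_per_user_specs(users):
    """Build (compiled regex, type) pairs, two per user (hypothetical entries)."""
    specs = []
    for user in users:
        home = "/home/" + re.escape(user)
        specs.append((re.compile(home), "user_home_dir_t"))
        specs.append((re.compile(home + "/.+"), "user_home_t"))
    return specs


# The handful of generic expressions a reduced file would keep.
GENERIC_SPECS = [
    (re.compile(r"/home/[^/]+"), "user_home_dir_t"),
    (re.compile(r"/home/[^/]+/.+"), "user_home_t"),
]


def lookup(path, specs):
    """Return (type or None, number of regexes tested) for `path`."""
    label = None
    tried = 0
    for pattern, file_type in specs:
        tried += 1
        if pattern.fullmatch(path):
            label = file_type  # a later match overrides an earlier one
    return label, tried
```

With 20,000 users, `lookup("/home/user5/notes.txt", build_per_user_specs([f"user{i}" for i in range(20000)]))` tests 40,000 compiled expressions for a single path, while the same lookup against `GENERIC_SPECS` tests two. Every compiled pattern also stays in memory for the life of the process.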
I suspect this was never considered when the library was created and probably needs a rethink on the SELinux side. But for now, that 'fix' should work.
It really depends on what restorecon is actually doing, but normally you don't want it to run at all, since needing it means files are being created with the wrong labels and restorecon has to correct them. The real solution is to have these files created with the right label in the first place.
If it's actually the restorecond daemon, running in the background, that does the relabeling, you can tune it so it doesn't touch files where it shouldn't. See the restorecond man page.
Do these servers deal with many files? Or do many files get created? Do they mount NFS shares?
What distro do you use? Red Hat and Fedora are very responsive to SELinux-related problems. If the sheer size of the user database, or of the group memberships, is the real problem, they will almost certainly want to know about it. File a bug in their Bugzilla.