I've set up an AWS image and a Kubuntu image, both hardened according to the CIS benchmarks published at https://www.cisecurity.org/cis-benchmarks/.
Since then I've sporadically run into an authentication problem that can only be described as weird.
The machine stops accepting a user's password and reports it as incorrect.
I have then tried to resolve the problem by:
- Checking file permissions on the user's folder. No luck, the permissions are fine and passwordless authentication still functions perfectly.
- Using root via a second user, checking file permissions on the 8 security files in /etc/ (passwd, group, shadow, gshadow, and their counterparts). No luck, the permissions are fine and all other users work.
- Using root via a second user, checking the integrity of the lines in the files listed above. No luck, the lines tally up similarly to the other lines.
- Using root via a second user, resetting the password of the troubled user. No luck, the user still can't log in, even when the password has been set to blank.
- Using root via a second user, creating a new user and scrapping the old one. This works, until the issue recurs.
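For anyone repeating the permission check above, here is a minimal Python sketch that reports the mode and ownership of the eight files. It assumes "their counterparts" means the backup copies with a trailing dash (passwd-, group-, shadow-, gshadow-). The function name `report_permissions` is made up for illustration. It only reports what it finds and does not judge whether a mode is correct.

```python
import os
import stat

# Assumption: the "counterparts" are the backup copies with a trailing dash.
SECURITY_FILES = ["passwd", "group", "shadow", "gshadow",
                  "passwd-", "group-", "shadow-", "gshadow-"]


def report_permissions(base_dir="/etc"):
    """Return {filename: (octal mode string, uid, gid)} for the files that exist."""
    report = {}
    for name in SECURITY_FILES:
        path = os.path.join(base_dir, name)
        try:
            info = os.stat(path)
        except FileNotFoundError:
            continue  # file is absent on this system, skip it
        mode = format(stat.S_IMODE(info.st_mode), "04o")
        report[name] = (mode, info.st_uid, info.st_gid)
    return report


if __name__ == "__main__":
    for name, (mode, uid, gid) in report_permissions().items():
        print(f"{name:10} mode={mode} uid={uid} gid={gid}")
```

Comparing the output between a working box and a broken one is a quick way to rule these files out, which is what happened in my case.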
The images on Amazon are spun up into new boxes as and when needed, and the Kubuntu machines aren't used often, but this has allowed some interesting observations:
- This seems to happen after a number of successful authentications: if a box spun up from the Amazon image is fine for a few authentications and then fails, another box spun up from the same image will fail at the same point.
- Flashing the root partition seems to resolve this, ruling out /var and /home from being part of the problem as they are on separate partitions on the Kubuntu boxes
What I'd like is to know exactly what is causing this and how to resolve it, so I don't have to create new users and reassign ownership every so often.
It turns out PAM was counting successful logins as failed logins, and never resetting the failure count.
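To illustrate why this produces a lockout after a fixed number of good logins, here is a toy Python model. It is not real PAM code. It assumes the counter is incremented on every attempt before the password is checked and is only cleared by a separate reset step, which is my understanding of how tally-style modules such as pam_tally2 behave. The names `TallyCounter` and `logins_until_lockout`, and the threshold of 5, are made up for the example.

```python
class TallyCounter:
    """Toy model of a login-failure counter; not the real PAM implementation."""

    def __init__(self, deny, reset_on_success):
        self.deny = deny                          # lock once the count exceeds this
        self.reset_on_success = reset_on_success  # models the reset step being configured
        self.count = 0

    def login(self, password_ok):
        # Assumption: the attempt is counted before the password is checked.
        self.count += 1
        if self.count > self.deny:
            return False  # locked: even a correct password is rejected
        if not password_ok:
            return False
        if self.reset_on_success:
            self.count = 0  # the step that was missing in my setup
        return True


def logins_until_lockout(deny, reset_on_success, attempts=20):
    """Return the number of the first correct-password login that fails, or None."""
    counter = TallyCounter(deny, reset_on_success)
    for n in range(1, attempts + 1):
        if not counter.login(password_ok=True):
            return n
    return None


print(logins_until_lockout(deny=5, reset_on_success=False))  # 6
print(logins_until_lockout(deny=5, reset_on_success=True))   # None
```

Without the reset, every fresh copy of the image locks the user out on the same login, which matches what I saw when spinning up new boxes.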
PAM seems to be missing a line in its standard setup that should prevent this issue. Specifically, a line that resets the failure count on a successful login should be present as standard in /etc/pam.d/common-account. Once I included it, I could see that successful logins were resetting the failed-login count to 0 correctly.
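As a rough way to check whether an image has this problem, the sketch below scans PAM configuration text for an uncommented account entry for a given module. It assumes the lockout comes from pam_tally2, which this post does not confirm, and the entry `account required pam_tally2.so` is the form given in that module's documentation rather than a line copied from my system. The sample text and the function name `has_account_entry` are hypothetical.

```python
def has_account_entry(pam_text, module):
    """Return True if an uncommented 'account' line references the given module."""
    for raw in pam_text.splitlines():
        tokens = raw.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(tokens) >= 3 and tokens[0] == "account":
            # The module may follow a bracketed control field or be an absolute path.
            if any(tok.endswith(module) for tok in tokens[2:]):
                return True
    return False


# Hypothetical excerpt for illustration, not a verbatim copy of any real file.
SAMPLE = """\
account [success=1 new_authtok_reqd=done default=ignore] pam_unix.so
account requisite pam_deny.so
account required pam_permit.so
"""

# Assumption: pam_tally2 is the module doing the counting.
FIXED = SAMPLE + "account required pam_tally2.so\n"

print(has_account_entry(SAMPLE, "pam_tally2.so"))  # False
print(has_account_entry(FIXED, "pam_tally2.so"))   # True
```

If a different module does the counting on your system, pass its file name instead.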