We have a longstanding production issue where cron on certain servers periodically reports the following error:
sudo: uid NNN does not exist in the passwd file!
The user does exist; in fact, it is the uid of the cron user. There are 7 jobs in that user's crontab, in the format:
* * * * * sudo /run/this/every_minute
0,5,10,15,20,25,30,35,40,45,50,55 * * * * sudo /run/this/every_5mins
10,40 * * * * sudo /run/this/every_30mins
...
0 11 * * 6 sudo /run/this/once_per_week
The every_5mins job sometimes modifies /etc/passwd, but in an atomic fashion. It never touches the userid in question, and when I compare the logs of our /etc/passwd changes with the times at which we received this error, there is no correlation.
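For readers unsure what "atomic" usually means here, this is a minimal sketch of the common write-to-temp-then-rename pattern. It is an assumption about the general technique, not a description of what our every_5mins job actually runs; the function name atomic_replace is hypothetical. The key property is that rename() within one filesystem swaps the file in a single step, so a concurrent reader sees either the old or the new contents, never a partial file.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: replace `path` with `len` bytes from `data`.
 * Returns 0 on success, -1 on failure (the original file is left untouched).
 * Note: mkstemp() creates the file with mode 0600; a real /etc/passwd
 * updater would also need to set ownership and mode (e.g. with fchmod)
 * before the rename. */
int atomic_replace(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    size_t off = 0;
    int fd;

    /* The temp file must live on the same filesystem as the target,
     * otherwise rename() cannot be atomic. */
    int n = snprintf(tmp, sizeof tmp, "%s.tmpXXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    /* Write everything, retrying on short writes and EINTR. */
    while (off < len) {
        ssize_t w = write(fd, data + off, len - off);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        off += (size_t)w;
    }

    /* Flush to disk before the new file becomes visible. */
    if (fsync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        fd = -1;
        goto fail;
    }
    fd = -1;

    /* The atomic step: readers see the old file or the new one. */
    if (rename(tmp, path) != 0)
        goto fail;
    return 0;

fail:
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    return -1;
}
```

If the job instead truncates and rewrites /etc/passwd in place, a lookup racing with the write could see a short file, which would be worth ruling out.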
I had a look at the sudo source for the version in use (1.6.8p12), and this error is raised when the C library function getpwnam returns a NULL passwd struct pointer. Besides the obvious reason (no matching entry), the call can also fail with EINTR, EIO, EMFILE, ENFILE or ENOMEM, and sudo doesn't distinguish between the user not being found and these other error conditions. I think I can assume that cron insulates against EIO and EINTR. Our servers are monitored and seem to have plenty of memory, which should rule out ENOMEM. The system-wide max FD limit is very high (> 700k), which should rule out ENFILE. There aren't any capacity problems that we're aware of, and I think other things would be failing if these limits were breached. That leaves EMFILE, yet I don't see how it could cause this error, since the failure happens at the very beginning, before the command is launched. I believe EMFILE can only occur if the per-process max FD limit (1024) is reached in a particular process.
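To narrow down which of these conditions is actually occurring, something like the following could be run from the same crontab. It is only a sketch: it assumes the lookup is done by uid with getpwuid (the error message reports a uid), and the names classify_lookup and diagnose_uid are hypothetical, not part of sudo. It relies on the POSIX convention of clearing errno before the call, so that a NULL return with errno still 0 means "no such entry" and a non-zero errno means a real failure.

```c
#include <errno.h>
#include <pwd.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

enum lookup_result { LOOKUP_FOUND, LOOKUP_NOT_FOUND, LOOKUP_ERROR };

/* Classify a passwd lookup from the returned pointer and the errno value
 * observed immediately after the call (errno must have been 0 before it). */
enum lookup_result classify_lookup(const struct passwd *pw, int err)
{
    if (pw != NULL)
        return LOOKUP_FOUND;
    return err == 0 ? LOOKUP_NOT_FOUND : LOOKUP_ERROR;
}

/* Look up `uid` and log which case occurred. Some C libraries may set
 * errno (e.g. ENOENT) even for a plain "not found", so the logged errno
 * should be read as a hint rather than proof. */
enum lookup_result diagnose_uid(uid_t uid, FILE *log)
{
    struct passwd *pw;
    int saved;
    enum lookup_result r;

    errno = 0;              /* required to tell "not found" from "error" */
    pw = getpwuid(uid);
    saved = errno;          /* save before any other call can clobber it */
    r = classify_lookup(pw, saved);

    switch (r) {
    case LOOKUP_FOUND:
        fprintf(log, "uid %ld -> %s\n", (long)uid, pw->pw_name);
        break;
    case LOOKUP_NOT_FOUND:
        fprintf(log, "uid %ld: no matching passwd entry\n", (long)uid);
        break;
    case LOOKUP_ERROR:
        fprintf(log, "uid %ld: lookup failed: %s (errno %d)\n",
                (long)uid, strerror(saved), saved);
        break;
    }
    return r;
}
```

Calling diagnose_uid(getuid(), stderr) from a small program scheduled every minute alongside the sudo jobs would log the errno at the moments the lookup fails, which sudo itself does not report.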
So I'm stumped - any ideas welcome.