All machines are running CentOS 6.5. We have about 85 client machines doing a Kerberized NFSv4 mount (sec=krb5p) to a server. This weekend, the server was changed (serverA changed to serverB).
Everything appears to work, except that on a few client machines only, and only for a few special users, idmapd seems to stop working after around 30 to 45 minutes.
Simple test: on the client machine in question, I do something like this:
while [ 1 ]; do touch test.`date +%H%M%S`.txt ; sleep 1m ; done
And then watch the files as they are being created. They start out with the proper user and group ID. But after 35 minutes or so, they suddenly switch to being owned by nfsnobody:nfsnobody.
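To pin down exactly when the ownership flips, the same test can log the owner of each file as it is created. This is only a rough sketch: the function names classify_owner and run_probe are made up for illustration, and it assumes GNU stat (as shipped with CentOS) and that it is run from a directory on the NFS mount as one of the affected users.

```shell
#!/bin/sh
# Sketch of an ownership probe for an NFSv4 mount (helper names are hypothetical).

# Print BAD if either half of "user:group" is nfsnobody, otherwise ok.
classify_owner() {
    case "$1" in
        nfsnobody:*|*:nfsnobody) echo BAD ;;
        *) echo ok ;;
    esac
}

# Create one test file per interval in the current directory and log
# how its owner appears on the client.
# $1 = seconds between probes, $2 = number of probes
run_probe() {
    interval=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        f="test.$(date +%H%M%S).txt"
        touch "$f" || return 1
        owner=$(stat -c '%U:%G' "$f")   # GNU stat, as shipped with CentOS
        echo "$(date '+%F %T') $f $owner $(classify_owner "$owner")"
        sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, after sourcing the file, run_probe 60 120 would probe once a minute for two hours, and the lines marked BAD show when the mapping was lost and when it came back.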
The idmapd process is still running. Other users and other machines are apparently not affected. (Of course we didn't test all users and all machines, but spot testing other users and other machines found no problems.)
Edit: Forgot to post some important details:
- Initially, the new NFS server did not have the correct /etc/idmapd.conf. It had the default. That has since been rectified, and the idmapd service restarted on both server and client.
- The /etc/idmapd.conf files are the same on both client and server.
- The /etc/passwd and /etc/group files are the same on both client and server.
Edit2: Upon further review, we observed the following:
- The /etc/idmapd.conf on the server is different from the client in that it also has some static mappings. These static mappings are for a few key users who need to run cronjobs from Kerberized NFSv4 shares. This link describes exactly the type of special config on the server's /etc/idmapd.conf file.
- The problem would actually come and go (not simply get bad and stay bad). In my example "touch" test above, the files would be created for a while with the right user and group ownership. Then after ~45 minutes, they'd start getting created with nfsnobody ownership. Then after some time they'd be created with the right ownership. Off and on, with no discernible pattern.
- This happened only for the users with the "special" /etc/idmapd.conf mappings described above, and only on the machines where those users had cron jobs.
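For readers who have not seen static mappings before, a server-side /etc/idmapd.conf fragment of this general kind might look roughly like the following. The principal, realm, host name, and account name are made-up placeholders, not the actual entries from this server, and the Method line is an assumption about how the static plug-in was enabled here.

```
[Translation]
# Assumption: try the static table first, then fall back to nsswitch
Method = static,nsswitch

[Static]
# Hypothetical entry: Kerberos principal on the left, local account on the right
cronuser/client01.example.com@EXAMPLE.COM = cronuser
```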
At the risk of jinxing myself, it appears a reboot of the NFSv4 server fixed the issue. It's been about three hours since it was rebooted, and so far no issues. Previously, we never went over an hour without at least one account getting into the bad "state" described above.
I can't really explain this except to guess that there was some leftover "junk" on the server that did not get flushed when the rpc.idmapd service was restarted with the correct config.
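One untested idea for anyone hitting the same thing: the kernel NFS server keeps its own ID-mapping caches under /proc/net/rpc (nfs4.idtoname and nfs4.nametoid), separate from the rpc.idmapd process that fills them, so restarting the daemon alone may not clear them. The sketch below writes the current time to each cache's flush file, which is the usual way to expire entries in these kernel caches. Whether that would have avoided the reboot in this case is an assumption, and the function name flush_idmap_caches is hypothetical. It must be run as root on the server.

```shell
#!/bin/sh
# Sketch: expire the kernel nfsd ID-mapping caches (function name is hypothetical).
# $1 = cache base directory, defaults to /proc/net/rpc
flush_idmap_caches() {
    base=${1:-/proc/net/rpc}
    now=$(date +%s)
    for cache in nfs4.idtoname nfs4.nametoid; do
        f="$base/$cache/flush"
        if [ -w "$f" ]; then
            # Entries last refreshed before this timestamp are treated as expired.
            echo "$now" > "$f" && echo "flushed $cache"
        else
            echo "skipped $cache (no writable $f)" >&2
        fi
    done
}
```

Calling flush_idmap_caches with no argument acts on /proc/net/rpc. The content file next to each flush file can be read beforehand to see which mappings the kernel currently holds.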