We have a Windows domain spread across multiple sites, and we use Ansible to orchestrate the Windows rebuild process. During rebuilds we see some Kerberos-related issues that we suspect may be caused by the way our workflow works.
The rebuild process works as follows:
- Since this is a rebuild, the computer object already exists in AD
- A Kerberos ticket is created
- The rebuild starts: disks are wiped, Windows is installed, and the computer is rejoined to Active Directory
- Once the computer is up and running, Ansible obtains a new Kerberos ticket to connect to it
In some cases, however, Ansible fails to connect to the rebuilt server.
I am trying to understand what happens during this phase that could cause the failure. As I understand it, the sequence is:
- We obtain a TGT at the beginning of the Ansible play
- The server is rebuilt and rejoins the domain
- AD replication is still in progress, so the updated computer account has not yet replicated to every KDC (DC)
- Ansible contacts a KDC that has not yet received the rejoin update and uses the TGT to request a service ticket for WinRM on the new server. As a result, it receives a WinRM service ticket encrypted with a key derived from the old computer account password
- Ansible tries to connect to the new server with this ticket, and the connection fails with the error 'WINRM CONNECTION ERROR: the specified credentials were rejected by the server'
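The suspected race can be illustrated with a small toy model. Everything in it is hypothetical: the DC names, the computer account `SERVER01$`, and the use of a single key version number (kvno) as a stand-in for the key derived from the computer account password. It does not talk to real AD; it only shows why a service ticket issued by a DC that has not yet replicated the rejoin would be rejected by the rebuilt server.

```python
"""Toy model of the suspected race between a rejoin and AD replication.

All names and kvno values are hypothetical; no real AD or KDC is contacted.
"""
from dataclasses import dataclass, field


@dataclass
class Kdc:
    name: str
    # Key version this DC currently holds for each service account.
    known_kvno: dict[str, int] = field(default_factory=dict)

    def issue_service_ticket(self, account: str) -> int:
        """Return the kvno of the key the service ticket is encrypted with."""
        return self.known_kvno[account]


@dataclass
class Server:
    account: str
    kvno: int  # key version the rebuilt server holds after the rejoin

    def accept(self, ticket_kvno: int) -> bool:
        # In this model the server only decrypts tickets made with its current key.
        return ticket_kvno == self.kvno


def replicate(source: Kdc, target: Kdc, account: str) -> None:
    """Copy the account's current key version from one DC to another."""
    target.known_kvno[account] = source.known_kvno[account]


ACCOUNT = "SERVER01$"  # hypothetical computer account

# Before the rebuild, both DCs agree on key version 2.
dc_site_a = Kdc("DC-SITE-A", {ACCOUNT: 2})
dc_site_b = Kdc("DC-SITE-B", {ACCOUNT: 2})

# The rejoin is processed by DC-SITE-A: the password changes, kvno becomes 3.
dc_site_a.known_kvno[ACCOUNT] = 3
server = Server(ACCOUNT, kvno=3)

# Ansible asks the not-yet-replicated DC for a WinRM service ticket.
stale_ok = server.accept(dc_site_b.issue_service_ticket(ACCOUNT))
# The DC that processed the rejoin issues a usable ticket.
fresh_ok = server.accept(dc_site_a.issue_service_ticket(ACCOUNT))

# Once replication completes, either DC issues a usable ticket.
replicate(dc_site_a, dc_site_b, ACCOUNT)
after_replication_ok = server.accept(dc_site_b.issue_service_ticket(ACCOUNT))

print(stale_ok, fresh_ok, after_replication_ok)  # False True True
```

In this model the failure window is exactly the period between the rejoin and replication reaching the DC that Ansible happens to ask.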
To work around the replication issue, we configured the Ansible Kerberos client to use the DC in the client's site as its KDC. This improved things, but we still see the error occasionally.
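A minimal sketch of what pinning the KDC might look like, assuming the Ansible control node uses MIT Kerberos and reads a standard `krb5.conf`. The realm `EXAMPLE.COM` and the host `dc01.site-a.example.com` are hypothetical placeholders; the helper `render_krb5_conf` is illustrative and not part of Ansible or any Kerberos library.

```python
"""Render a krb5.conf fragment that pins the control node to specific KDCs.

Hypothetical: realm EXAMPLE.COM and DC dc01.site-a.example.com.
Assumes MIT Kerberos on the Ansible control node.
"""


def render_krb5_conf(realm: str, site_kdcs: list[str]) -> str:
    """Return krb5.conf text that uses only the given KDCs for `realm`."""
    if not site_kdcs:
        raise ValueError("at least one KDC is required")
    # AD Kerberos realms are conventionally the DNS domain in upper case.
    realm = realm.upper()
    lines = [
        "[libdefaults]",
        f"    default_realm = {realm}",
        # Disable DNS SRV discovery of KDCs so only the hosts listed below are used.
        "    dns_lookup_kdc = false",
        "",
        "[realms]",
        f"    {realm} = {{",
    ]
    lines += [f"        kdc = {kdc}" for kdc in site_kdcs]
    lines += ["    }", ""]
    return "\n".join(lines)


print(render_krb5_conf("example.com", ["dc01.site-a.example.com"]))
```

Under the hypothesis above, pinning like this only helps when the pinned DC already holds the rebuilt server's new password, for example when the rejoin itself was processed by that same DC.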
Can someone comment on whether our assumptions and our fix are correct?