We manage more than 90 servers with 3 devops engineers via Ansible. Everything works great, but there is a giant security problem right now: each engineer uses their own local SSH key to access the servers directly. Each engineer works from a laptop, and any of those laptops could be compromised, opening the entire network of prod servers up to an attack.
I am looking for a solution to manage access centrally, so that access for any given key can be blocked, not unlike how keys are added to Bitbucket or GitHub.
Off the top of my head, I would assume the solution is a tunnel from one machine, the gateway, to the desired prod server. While passing through the gateway, the request would pick up a new key and use it to gain access to the prod server. The result would be that we can kill access for any engineer within seconds just by denying access to the gateway.
Is this good logic? Has anyone seen a solution out there already to thwart this problem?
That's too complicated (checking whether a key has access to a specific prod server). Use the gateway server as a jump host that accepts every valid key (removing a specific key there removes its access to all servers in turn), and then add only the allowed keys to each respective server. After that, make sure the SSH port of every server is reachable only via the jump host.
This is the standard approach.
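To make the "remove the key on the jump host" step concrete, here is a minimal sketch, assuming the gateway uses a plain OpenSSH `authorized_keys` file. The function name `revoke_key` and the sample keys are made up for illustration; it only transforms the file content and leaves reading and writing the real file to you.

```python
def revoke_key(authorized_keys_text: str, revoked_key_blob: str) -> str:
    """Return authorized_keys content without any entry for the revoked key.

    Hypothetical helper: revoked_key_blob is the base64 part of the public
    key, i.e. the field that follows the key type (for example ssh-ed25519).
    """
    kept = []
    for line in authorized_keys_text.splitlines():
        stripped = line.strip()
        # Keep blank lines and comments untouched.
        if not stripped or stripped.startswith("#"):
            kept.append(line)
            continue
        # An entry looks like "[options] keytype base64-blob [comment]".
        # Match on the blob so that an edited comment cannot hide the key.
        if revoked_key_blob in stripped.split():
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    # Placeholder keys, not real key material.
    sample = (
        "ssh-ed25519 AAAAalice alice@laptop\n"
        "ssh-ed25519 AAAAbob bob@laptop\n"
    )
    print(revoke_key(sample, "AAAAbob"), end="")
```

Since the prod servers are only reachable through the jump host, dropping the key there is enough to cut that laptop off, even though the same key may still be listed on individual servers.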
Engineers should not be running Ansible directly from their laptops, unless this is a dev/test environment.
Instead, have a central server that pulls the runbooks from git. This allows for additional controls (four eyes, code review).
Combine this with a bastion or jump-host to restrict access further.
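A rough sketch of what such a central runner could execute, assuming the playbooks live in a git checkout on that server and the prod hosts are only reachable through the bastion. The paths, host names and the function name `build_run_commands` are placeholders of mine; the `git` and `ansible-playbook` options used are standard ones.

```python
import shlex


def build_run_commands(repo_dir, playbook, inventory, bastion):
    """Build the two commands a central runner would execute in order.

    All arguments are illustrative placeholders, not values from a real setup.
    """
    # Only fast-forward, so the runner never creates merge commits itself.
    pull = ["git", "-C", repo_dir, "pull", "--ff-only"]
    # Route every SSH connection Ansible opens through the bastion.
    play = [
        "ansible-playbook",
        "-i", inventory,
        "--ssh-common-args", f"-o ProxyJump={bastion}",
        playbook,
    ]
    return [pull, play]


if __name__ == "__main__":
    commands = build_run_commands(
        "/srv/playbooks", "site.yml", "inventory/prod",
        "ansible@bastion.example.com",
    )
    for cmd in commands:
        # Printed only; a real runner would execute them from repo_dir,
        # for example with subprocess.run(cmd, check=True, cwd=repo_dir).
        print(shlex.join(cmd))
```

The point of the design is that the only SSH key with access to prod lives on the runner, and changes reach prod only after they have been reviewed and merged in git.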
Netflix implemented your setup and released some free software to help that situation.
See this video https://www.oreilly.com/learning/how-netflix-gives-all-its-engineers-ssh-access or this presentation at https://speakerdeck.com/rlewis/how-netflix-gives-all-its-engineers-ssh-access-to-instances-running-in-production with the core point:
Their software is available here: https://github.com/Netflix/bless
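As far as I understand it, BLESS is built around short-lived SSH certificates signed by a certificate authority, instead of long-lived keys copied to every host. The sketch below is not the BLESS API; it only shows the underlying OpenSSH mechanism by building a plain `ssh-keygen` signing command. The paths, identity, principal and the function name `build_sign_command` are assumptions for illustration.

```python
def build_sign_command(ca_key_path, user_pubkey_path, identity, principal,
                       validity="+5m"):
    """Build an ssh-keygen command that signs a user public key with a CA key.

    Illustrative helper: the resulting certificate is valid from now until
    `validity` has passed (five minutes by default).
    """
    return [
        "ssh-keygen",
        "-s", ca_key_path,   # CA private key used for signing
        "-I", identity,      # key identity, shows up in the server's logs
        "-n", principal,     # login name(s) the certificate is valid for
        "-V", validity,      # validity interval
        user_pubkey_path,    # the engineer's public key to be signed
    ]


if __name__ == "__main__":
    print(" ".join(build_sign_command(
        "/etc/ssh/user_ca", "/tmp/alice.pub", "alice", "deploy")))
```

Running the resulting command writes a `-cert.pub` file next to the public key. The servers have to trust the CA through `TrustedUserCAKeys` in `sshd_config`, and because the certificate expires within minutes, a stolen laptop key is of little use on its own.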
Some interesting takeaways even if you do not implement their whole solution:
Check out the open-source CLD software, which solves this problem: https://github.com/classicdevops/cld
Your engineers will be able to access any server according to an access matrix, and it also provides 2FA by IP address as an option.
OneIdentity (ex-Balabit) SPS is the exact thing you need in this scenario. With this appliance you can manage user identities on basically any machine, track user behavior, monitor and alert, and index whatever the users are doing for later review.
My suggestion is to disallow SSH access from user machines.
Instead you should
The sample execution model,
OR
If you are limited in server resources, the same Jenkins server can host Git (scm-manager) as well, although there is an additional security risk if one of the developer machines is infected. You may be able to mitigate this by disconnecting the Jenkins server from the internet and resolving Ansible dependencies locally.