I'm trying to ask this question in a way that's answerable, but part of the problem is understanding the implications of my current situation and whether there's an issue or technical debt that will bite me later on.
I've set up a few IPA servers in a master and replicas setup.
server1: dns A record (and fqdn hostname): srv1.mydomain.com
server2: dns A record (and fqdn hostname): srv2.mydomain.com
server3: dns A record (and fqdn hostname): srv3.mydomain.com
The servers have CNAMEs of auth-a, auth-b and auth-c respectively, and use a self-signed cert as per a normal IPA install.
This worked fine for months for SSH connections, sssd and so on. The issue arrived when trying to hook in applications which only allow one LDAP server to be specified. There are SRV DNS records set up for failover, but in an attempt to get these apps to work I also put in a DNS round-robin record.
The catch is that this round robin only works for plain LDAP lookups, not LDAP over SSL. I can make SSL work, however, if I disable checking of the SSL cert.
So... the questions!
a) Realistically, how bad is it to disable checking of the cert on an internal service? This LDAP server is only ever going to be queried from the LAN. I believe I'm opening myself up to a possible MITM attack, but I'm not certain how worried I need to be about that. I mean, right now my other option is not using SSL, and that's scary sauce. To perform the MITM attack they'd already need to be on my network and have control of the DNS, no? Any advice that could quantify that concern in real terms would be helpful.
b) As I understand it, to actually fix this I'd need to add the RR DNS entry as a subject alternative name on the self-signed cert of the server(s). That means re-keying the server, right? Which in the case of IPA means rejoining every client to IPA for the new cert. That's a non-starter, I think.
c) Given the current situation and the outcome of (a) and (b), what would you recommend as the best course of action to allow apps which only allow one LDAP server to be specified (and don't use SRV DNS records in any way) to fail over to another server should one go down, and still allow LDAP over SSL given my certificates?
You should issue new certificates with subjectAltName entries and point the DNS record for that name at a load balancer.
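To make the subjectAltName point concrete, here is a rough Python sketch of the kind of check a TLS client performs: it compares the name it was told to connect to against the DNS entries in the certificate. The cert dictionaries mimic the shape returned by Python's `ssl.SSLSocket.getpeercert()`. The helper names and the round-robin name `ldap.mydomain.com` are assumptions made up for illustration, and the matching is simplified (exact match plus a single leftmost wildcard label), not a full implementation of the real rules.

```python
def san_dns_names(cert):
    """Return the DNS entries of a cert dict shaped like getpeercert() output."""
    return [value.lower()
            for kind, value in cert.get("subjectAltName", ())
            if kind == "DNS"]


def name_matches(pattern, hostname):
    """Simplified match: exact name, or '*.' covering exactly one leftmost label."""
    pattern = pattern.lower()
    hostname = hostname.lower().rstrip(".")
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        first_label, _, rest = hostname.partition(".")
        return bool(first_label) and rest == pattern[2:]
    return False


def cert_covers(cert, hostname):
    """True if any DNS subjectAltName entry in the cert matches hostname."""
    return any(name_matches(p, hostname) for p in san_dns_names(cert))


# Hypothetical certs: one as issued today, one reissued with the extra name.
current = {"subjectAltName": (("DNS", "srv1.mydomain.com"),)}
reissued = {"subjectAltName": (("DNS", "srv1.mydomain.com"),
                               ("DNS", "ldap.mydomain.com"))}

print(cert_covers(current, "ldap.mydomain.com"))   # name not in the cert
print(cert_covers(reissued, "ldap.mydomain.com"))  # name added as a SAN
```

The client checks the name it was configured with, not whatever the DNS record eventually resolves to, which is why a round-robin name that is missing from the certificate fails verification even though each server's own name is fine.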
Round-robin DNS will not necessarily give you more availability when a server fails unless you also pull the failed server's A/AAAA record from DNS, because the client picks one of the servers at random, including the failed one. If the application doesn't attempt to reconnect, or is unlucky and gets the same record enough times in a row, it fails. Putting a load balancer in front adds complexity, but it reduces that risk.

If you're happy with round robin for load sharing, I'd check whether adding a subjectAltName to the certificate satisfies the LDAPS clients; failing that, a wildcard certificate might be suitable.

You can also prevent man-in-the-middle attacks by running your own internal PKI and deploying its CA as trusted on your client machines. This has the added advantage of giving you a central place to see expiring or expired certificates, rather than having to manage them on each host or service that has its own certificate.
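On the round-robin point, whether a dead server hurts you depends on whether the client walks through every address it got back or gives up after the first. A minimal Python sketch of the "try them all" behaviour is below. The function name and the `resolve`/`connect` parameters are my own invention (they default to the standard `socket` calls and exist only so the logic can be exercised without a network); an application that only accepts one LDAP server may well not behave like this.

```python
import socket


def connect_first_reachable(host, port, timeout=3.0,
                            resolve=socket.getaddrinfo,
                            connect=socket.create_connection):
    """Try each address returned for host until one accepts a TCP connection."""
    errors = []
    seen = []
    for _family, _type, _proto, _canon, sockaddr in resolve(
            host, port, type=socket.SOCK_STREAM):
        address = sockaddr[:2]          # (ip, port) for both IPv4 and IPv6
        if address in seen:             # skip duplicate records
            continue
        seen.append(address)
        try:
            return connect(address, timeout=timeout)
        except OSError as exc:          # refused, timed out, unreachable...
            errors.append((address, exc))
    raise ConnectionError(f"no server reachable for {host}: {errors}")
```

A client written this way survives one failed server behind a round-robin name at the cost of a timeout per dead address. A client that connects to only the first record and never retries will fail whenever the dead server's record happens to come first, which is the case a load balancer (or removing the record) protects you from.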
If all you are after is HA, I would do something a bit simplistic, but useful:
Set up an HA cluster for IPA (to avoid trouble, just run it in a VM, where the libvirt service is the protected process) and use that IPA instance for all those limited apps, while the other IPAs tend to user auth. IPA works great on KVM; I've run quite a few instances with zero issues over the years.