Two weeks ago, two of our four RHEL 7 VMs, all running on the same OpenStack infrastructure, rebooted unprompted and unnoticed. The reboot may have been triggered externally.
Ever since, every service that relies on DNS has been unreliable on these two servers, while working fine on the two that did not reboot.
After some diagnosis, it turns out that dig takes about 5 seconds on average on the affected VMs, versus close to 0 on the healthy ones, and sometimes times out. Ping/ICMP is not available, but strangely enough no other IP seems to have trouble being reached.
dig @1.2.3.4 example.corp +short
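To put numbers on the delay, the query above could be repeated and the reported query times collected. The following is only a rough sketch: the function names `query_time_ms` and `sample_dig` are made up for illustration, and it assumes dig prints its usual `;; Query time: N msec` statistics line (which `+short` suppresses, so it is left out here). Because dig's default per-try timeout is 5 seconds, a roughly 5-second result may mean the first packet was lost and a retry succeeded; `+tries=1` makes that case show up as a timeout instead.

```shell
#!/bin/sh
# Extract the "Query time" value (in msec) from dig output read on stdin.
# Prints nothing when the statistics line is absent, e.g. after a timeout.
query_time_ms() {
    awk '/^;; Query time:/ { print $4 }'
}

# Hypothetical helper: query SERVER for NAME, COUNT times (default 10),
# printing one timing in msec per line, or "timeout" if dig got no answer.
sample_dig() {
    server=$1
    name=$2
    count=${3:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        # +tries=1 disables retries, so a lost packet is reported as a
        # timeout rather than hidden behind a successful second attempt.
        t=$(dig "@$server" "$name" +tries=1 +time=5 2>/dev/null | query_time_ms)
        echo "${t:-timeout}"
        i=$((i + 1))
    done
}
```

Running `sample_dig 1.2.3.4 example.corp 20` on an affected VM and on a healthy one would give two lists to compare. Many `timeout` lines on the affected VM only would point toward packet loss rather than a slow resolver.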
All four VMs run the same OS, with identical /etc/hosts, resolv.conf, and nsswitch.conf. According to traceroute, all four reach the DNS server through the same gateway.
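One way to double-check that the resolver configuration really is identical is to compare checksums rather than eyeballing the files. This is a minimal sketch, assuming `sha256sum` from coreutils is available; `config_fingerprint` is a hypothetical helper name, not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print a SHA-256 checksum line for each given file,
# or an "unreadable" marker line if the file cannot be read.
config_fingerprint() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            sha256sum "$f"
        else
            echo "unreadable  $f"
        fi
    done
}
```

Running `config_fingerprint /etc/hosts /etc/resolv.conf /etc/nsswitch.conf` on each VM and diffing the output would confirm or rule out a silent difference between the files.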
What else can I use to debug the issue?
It's not a DNS issue. It turns out this was not a simple reboot but also a live migration on OpenStack, and something is wrong with the new host. DNS is simply the most affected service.