I've recently been thinking about the TTL of our DNS records. We have A records for our servers and CNAME records for the customer-facing names. For example, the www.example.com CNAME points to server-01.example.com. To cope with a failure, we have the TTL set to 15 minutes on both the CNAME and the A record.
However, it dawns on me that this might not be optimal. Surely the A record should have a TTL of 48 hours and the CNAME a TTL of 15 minutes. In the event of a failure, the CNAME just gets pointed to server-02.example.com. The A record should (in theory) be cached quite happily for a long time, because we use the CNAME as the switcher.
Looking around the Internet, I found lots of people setting their CNAME TTL long and their A record TTL short, for example in the question "CNAME and A record have different TTLs. Which one will be cached?"
This seems contrary to what anybody would want. The question is: does DNS work the way I hope it does, in that the CNAME's TTL is the one that matters if I need to switch servers in a hurry?
Assuming that the apex A record for example.com. was pointing at a broken IP address, most companies I know would change the A record and skip the www change entirely: www.example.com over example.com. (hint: most of us don't)
Moving on to your linked example, you're comparing apples and oranges. Apex DNS records in web hosting scenarios are a massive pain because of the well-known apex CNAME problem. There are only two correct choices in this circumstance: either the apex A record is changed as necessary to point it at a valid IP, or you forgo having an apex record entirely. Anything between the two is half-baked and inconsistent.
All of this is somewhat beside the point though: if you are relying on manual record changes to handle high availability for your service, you are doing something wrong. The IP address the web browser hits should be a load balancer, an anycast address, a CDN, or a web hosting provider who can provide this high availability if your own server farms cannot. Multiple address records can also work if you're confident that the primary applications consuming them follow RFC 6724 guidelines (i.e. most popular web browsers), but many applications are lazy and simply use the first address record returned.
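To illustrate the difference between a "lazy" client and one that copes with multiple address records, here is a minimal Python sketch. The function names resolve_all and connect_first_reachable are made up for this example, and it only covers the "try the next address on failure" part, not the full RFC 6724 selection rules (the ordering is whatever the system resolver library returns).

```python
import socket


def resolve_all(host, port):
    """Return every (family, sockaddr) pair the system resolver offers for host."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    return [(family, sockaddr) for family, _type, _proto, _canon, sockaddr in infos]


def connect_first_reachable(candidates, timeout=3.0):
    """Try each candidate in order and return the first socket that connects.

    A lazy client would only ever try candidates[0] and give up on failure.
    """
    last_error = None
    for family, sockaddr in candidates:
        sock = socket.socket(family, socket.SOCK_STREAM)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # This address is dead or unreachable; remember why and move on.
            last_error = exc
            sock.close()
    raise last_error or OSError("no candidate addresses")
```

Usage would look like connect_first_reachable(resolve_all("www.example.com", 443)). Python's own socket.create_connection already loops over the getaddrinfo results in a similar way, so the sketch is only meant to show the behaviour you'd want to confirm in whatever client your users actually run.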
For the sake of the argument, let's examine Google's CNAME chain on its own merits without putting it into the context of your original problem. This will look familiar, as it's the text of my original answer:
Record type is inconsequential here. If the record needs to be changed frequently, it should have a very low TTL. If it doesn't need to be changed frequently, it stands to reason that it doesn't need a low TTL and you can use whatever you're comfortable with.
No one (other than Google) can really comment on why Google wants ghs.l.google.com IN A to have a lower TTL than the CNAME records pointing at it. You can't draw any conclusions without understanding their larger design, and the design is what dictates your moving parts.
I agree.
As long as the "real servers" have stable IP addresses the A records should have long TTLs. Keep the TTL on the CNAME records low to enable fast switching to another real server in case of failure or whatever.
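To make that concrete, here is a toy simulation of a caching resolver. It assumes each record in the chain is cached independently for its own TTL. The Record and CachingResolver names, the 192.0.2.x addresses and the timings are all made up for illustration, and real resolvers add behaviour this ignores (some cap or floor TTLs, for instance).

```python
from dataclasses import dataclass


@dataclass
class Record:
    value: str  # CNAME target or IP address
    ttl: int    # seconds


class CachingResolver:
    """Toy resolver that caches every record for that record's own TTL."""

    def __init__(self, zone):
        self.zone = zone   # stands in for the authoritative data: name -> Record
        self.cache = {}    # name -> (value, expiry time)

    def _lookup(self, name, now):
        cached = self.cache.get(name)
        if cached and cached[1] > now:
            return cached[0]
        record = self.zone[name]
        self.cache[name] = (record.value, now + record.ttl)
        return record.value

    def resolve(self, name, now):
        # Follow the chain until the value is no longer a name in the zone.
        value = self._lookup(name, now)
        while value in self.zone:
            value = self._lookup(value, now)
        return value


zone = {
    "www.example.com": Record("server-01.example.com", ttl=900),   # 15 minutes
    "server-01.example.com": Record("192.0.2.1", ttl=172800),      # 48 hours
    "server-02.example.com": Record("192.0.2.2", ttl=172800),
}
resolver = CachingResolver(zone)
print(resolver.resolve("www.example.com", now=0))    # 192.0.2.1

# Failover one minute later: only the CNAME is repointed.
zone["www.example.com"] = Record("server-02.example.com", ttl=900)
print(resolver.resolve("www.example.com", now=899))  # 192.0.2.1 (CNAME still cached)
print(resolver.resolve("www.example.com", now=900))  # 192.0.2.2
```

In this model the stale answer lasts at most as long as the CNAME's TTL, and the 48-hour A record for server-01 never gets in the way because the new chain no longer passes through it. If you instead repointed the long-lived A record, the same cache would keep serving the old address until that longer TTL ran out.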