This may seem quite obscure and perhaps quite niche but bear with me.
The situation is:
- www.domain.com points to infrastructure in EC2, specifically an ELB
- domain.com points to a micro instance in EC2 running an nginx server whose only purpose is to redirect all requests to www.domain.com
- said micro instance is a single point of failure (SPOF) because there is only one of it
What we'd like to do is remove the SPOF by having multiple servers run this redirect. Setting the root domain to a CNAME in order to use an ELB is against the RFC, I believe, and I don't think our DNS host will allow us to do this anyway.
What should we do to remove this SPOF given these limitations? Admittedly it's low impact if it does disappear for any reason but the business wants to mitigate this risk.
There is a better approach to dealing with the apex domain than hosting a 'redirector' instance on EC2.
You could host a static website on Amazon S3 and configure it to redirect all requests to a particular domain. If you are using Route 53, there is an 'alias' record type to help you achieve that. Other DNS providers have similar ones.
See this blog article for the details: https://aws.amazon.com/blogs/aws/root-domain-website-hosting-for-amazon-s3/
S3 is a fault-tolerant service, so you would remove your SPOF at fairly low cost.
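For illustration, here is a rough sketch of what the redirect configuration could look like with boto3. The helper names are made up for this example, and it assumes the bucket already exists, is named after the apex domain, and that AWS credentials are configured. Whether to redirect to http or https depends on what www.domain.com actually serves, which the question doesn't say.

```python
def redirect_all_config(target_host, protocol):
    """Build a WebsiteConfiguration that redirects every request to target_host."""
    if protocol not in ("http", "https"):
        raise ValueError("protocol must be 'http' or 'https'")
    return {"RedirectAllRequestsTo": {"HostName": target_host, "Protocol": protocol}}


def apply_redirect(bucket, target_host, protocol):
    """Apply the redirect to an existing bucket (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so redirect_all_config works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=redirect_all_config(target_host, protocol),
    )


# Hypothetical usage: the bucket is named after the apex domain.
# apply_redirect("domain.com", "www.domain.com", "http")
```

With this in place there is no instance to keep alive; the bucket's website endpoint answers the request and issues the redirect.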
As has been mentioned, S3 static hosting is a good way to go, but it requires an Alias record hosted in Route 53... and it doesn't support TLS.
TLS (for SNI-capable browsers) for sites hosted on an S3 bucket can be provided by using CloudFront in front of S3, which works perfectly, but also requires an Alias in Route 53.
Note that an Alias is not a type of DNS record. An Alias is an internal configuration directive in the Route 53 DNS servers that says "when we receive a request for this record over here, we will internally (in the Route 53 database) look up and return the same result we would have returned if we had actually received a request for another, different record." In the end, it offers functionality that seems similar to a CNAME. A CNAME, though, tells the resolver to treat one hostname as equivalent to another and go look up the other record for further information; Route 53 does that lookup step internally, leaving the resolver none the wiser as to the actual mechanism used to satisfy the request. This internal (not external) lookup is why Alias records work as desired at the zone apex, when CNAMEs don't.
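To make that concrete, this is roughly what creating an Alias looks like through boto3. Treat it as a sketch: the function names are invented for the example, and the target DNS name and target hosted zone ID are placeholders you would have to look up for the S3 website endpoint or CloudFront distribution you are pointing at. Notice that the record set is still an ordinary type A; the Alias part is just the `AliasTarget` block, and there is no TTL or list of addresses because Route 53 fills those in itself.

```python
def alias_change_batch(apex_name, target_dns_name, target_zone_id):
    """Build a ChangeBatch that upserts an A record aliased to an AWS endpoint."""
    return {
        "Comment": "Point the zone apex at an AWS endpoint via an Alias",
        "Changes": [
            {
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": apex_name,
                    "Type": "A",  # still a plain A record on the wire
                    "AliasTarget": {
                        # Hosted zone ID of the *target* service, not your own zone.
                        "HostedZoneId": target_zone_id,
                        "DNSName": target_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


def upsert_alias(own_zone_id, apex_name, target_dns_name, target_zone_id):
    """Send the change to Route 53 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    route53 = boto3.client("route53")
    return route53.change_resource_record_sets(
        HostedZoneId=own_zone_id,
        ChangeBatch=alias_change_batch(apex_name, target_dns_name, target_zone_id),
    )
```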
Unless you have a DNS mechanism that can adapt to dynamic IP addresses, as hosting sites on S3 and CloudFront require, there's not a solid way to eliminate the SPOF, though some other DNS providers support a capability similar to Route 53's Alias records. CloudFlare, for example, calls it "CNAME Flattening," where their DNS server, when it receives a query for your A-Record, does a lookup on the back-side for a different A-Record (in a different domain, such as from s3.amazonaws.com or cloudfront.net) and then returns that answer to the requester. That accomplishes the same net result as an Alias. It's not truly "internal" to the 3rd party DNS server, but since the second request is sent out the back, the client resolver doesn't see anything unusual in the behavior.
If you have to stay with a single instance, note that in January 2015 AWS announced EC2 Instance Auto-Recovery, which will tear down and rebuild an instance that fails its CloudWatch availability checks, creating a new instance with the same "everything" (instance ID, EBS volumes, elastic IP, etc.). This feature works with several instance classes, including the t2 class.
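Auto-recovery is set up as a CloudWatch alarm on the instance's system status check with the EC2 "recover" action attached. A minimal sketch with boto3 follows; the alarm name, period and evaluation count are illustrative choices of mine, not recommended values, and the function names are hypothetical.

```python
def recovery_alarm_params(instance_id, region):
    """Build put_metric_alarm arguments that trigger EC2 auto-recovery."""
    return {
        "AlarmName": f"auto-recover-{instance_id}",
        "Namespace": "AWS/EC2",
        "MetricName": "StatusCheckFailed_System",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Minimum",
        "Period": 60,            # seconds per datapoint (illustrative)
        "EvaluationPeriods": 2,  # consecutive failing datapoints (illustrative)
        "Threshold": 0.0,
        "ComparisonOperator": "GreaterThanThreshold",
        # Built-in action that asks EC2 to recover the instance.
        "AlarmActions": [f"arn:aws:automate:{region}:ec2:recover"],
    }


def create_recovery_alarm(instance_id, region):
    """Create the alarm (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch", region_name=region)
    cloudwatch.put_metric_alarm(**recovery_alarm_params(instance_id, region))
```

This shortens an outage rather than preventing it, since the redirect is still down while the instance is being recovered.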
Or, as a last resort, you could partially alleviate the SPOF by provisioning more than one of these redirecting machines and provisioning multiple A-records at your zone apex for round-robin "load balancing." This would reduce, but not eliminate, the impact of an outage of one of the machines, though the viability of such a solution is heavily dependent on browser behavior. It wouldn't be considered a "high-availability" solution, but would (maybe) be better than nothing.
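The dependence on client behavior can be sketched like this. The code models a well-behaved client that walks the list of A-records and falls through to the next address when one fails to connect; a client that only ever tries the first address it was handed would simply fail while that machine is down. The function names are hypothetical and the addresses in the usage line are documentation-range placeholders.

```python
import socket


def resolve_a_records(hostname, port=80):
    """Return the distinct IPv4 addresses the resolver hands back for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses = []
    for *_, sockaddr in infos:
        if sockaddr[0] not in addresses:
            addresses.append(sockaddr[0])
    return addresses


def first_reachable(addresses, port=80, timeout=2.0, connect=socket.create_connection):
    """Try each address in order and return the first that accepts a connection."""
    for address in addresses:
        try:
            conn = connect((address, port), timeout=timeout)
        except OSError:
            continue  # this redirector is down, try the next A-record
        conn.close()
        return address
    return None  # every redirector was unreachable


# Hypothetical usage against two redirectors:
# first_reachable(["192.0.2.10", "192.0.2.11"])
```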
If I understand this comment correctly, then, you actually have to do TLS there. If I point my browser to https://example.com, that endpoint has to speak TLS and have a valid cert for the hostname, or the link is effectively broken -- a server can't do a redirect if it can't negotiate TLS in a way that keeps the browser happy.
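A quick way to check this from the outside is to attempt the same handshake a browser would. The sketch below uses only the Python standard library; the function name is my own, and it treats any failure (no listener, handshake error, untrusted or mismatched certificate) as "not OK" without distinguishing between them.

```python
import socket
import ssl


def apex_tls_ok(hostname, port=443, timeout=5.0):
    """Return True if hostname completes a TLS handshake with a valid certificate."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=hostname):
                return True
    except OSError:  # covers connection errors, ssl.SSLError and certificate errors
        return False


# Hypothetical usage:
# apex_tls_ok("example.com")
```

If this returns False for the apex, an https:// link to it will fail in the browser before any redirect can be sent.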