There was an outage of Authorize.Net's websites in July 2009 because of a local fire. If you went to their website during that time, a notice or redirect pointed you to status updates on their Twitter account. That seemed like a good solution.
That got me thinking. For the websites I manage, in their current setup, if my host lost its internet connection entirely, the user would see a 'Server not found' error in their browser. I'd hate to have visitors think the company was no longer in business. I'd rather have the visitor see some kind of 'Unplanned outage' page.
Currently I'd have to:
- Notice the site was down (ip monitoring)
- Update the nameserver's DNS records to point to another host (hopefully already set up)
- Wait for the new DNS records to propagate (25 mins - 48hrs)
This seems like a horrible solution. I know there has to be a better way of doing this.
Question #1: What is a solution to avoid this?
An idea I had would be to have Nameserver 1 & 2 pointed to nameservers physically located where the website is hosted, and to have Nameserver 3 & 4 pointed to another host where an 'Unplanned outage' page can be viewed.
Question #2: Would this solution work?
Question #3: Can I rely on the nameservers being queried in order (1,2,3,4)?
Question #4: Is this a horrible idea or frowned upon?
Your assumptions under "Currently I'd have to" are sound. Note that DNS record propagation time is controlled by the TTL values in your zone (the SOA record and the per-record TTLs on your nameservers), so you can make it much shorter. Look at the records for any prominent site and you'll see that they generally use short TTLs.
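To put rough numbers on that, here is a minimal sketch of the worst-case failover window for the manual process described in the question. The function name and all four parameters are illustrative assumptions, not values from any real setup; the point is only that the TTL adds directly to detection and update time.

```python
def worst_case_failover_seconds(check_interval, failures_needed, update_delay, ttl):
    """Upper bound on how long a caching resolver may keep sending
    visitors to the dead IP after the outage begins.

    check_interval  -- seconds between monitoring checks (assumed)
    failures_needed -- consecutive failed checks before declaring an outage (assumed)
    update_delay    -- seconds needed to push the new record to the nameservers (assumed)
    ttl             -- TTL on the record, in seconds
    """
    detection = check_interval * failures_needed
    # A resolver that cached the old answer just before the update
    # may keep serving it for a full TTL.
    return detection + update_delay + ttl


# Same monitoring, two different TTLs: 5 minutes versus 24 hours.
short_ttl = worst_case_failover_seconds(60, 3, 30, 300)
long_ttl = worst_case_failover_seconds(60, 3, 30, 86400)
print(short_ttl, long_ttl)  # 510 86610
```

With these assumed numbers the TTL dominates everything else, which is why lowering it ahead of time matters more than faster monitoring.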
However, your solution wouldn't work because DNS servers aren't ordered. There's no 1,2,3,4.
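A small simulation can illustrate why that breaks the four-nameserver idea. This is only a sketch: it assumes a resolver picks uniformly at random among the listed nameservers (real resolvers often favor the fastest responder, but none of them walk the list in order), and the nameserver names and IP addresses are placeholders from the documentation ranges.

```python
import random
from collections import Counter

# Hypothetical setup from the question: ns1/ns2 answer with the real site,
# ns3/ns4 answer with the host serving the 'Unplanned outage' page.
NAMESERVER_ANSWERS = {
    "ns1": "203.0.113.10",   # real site (placeholder IP)
    "ns2": "203.0.113.10",
    "ns3": "198.51.100.20",  # outage page (placeholder IP)
    "ns4": "198.51.100.20",
}
OUTAGE_IP = "198.51.100.20"


def simulate_lookups(n, seed=0):
    """Count which answer n independent resolvers receive when each one
    picks a nameserver at random (an assumption, see above)."""
    rng = random.Random(seed)
    servers = sorted(NAMESERVER_ANSWERS)
    return Counter(NAMESERVER_ANSWERS[rng.choice(servers)] for _ in range(n))


total = 10_000
counts = simulate_lookups(total)
outage_share = counts[OUTAGE_IP] / total
print(f"lookups sent to the outage page while the site is healthy: {outage_share:.0%}")
```

Under that assumption, roughly half of all visitors would land on the outage page even when the real site is perfectly fine.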
One way I've handled this for a large website in the past was similar to what you described, but with a failover component: DNS servers in the primary datacenter, DNS servers in a secondary hot-spare datacenter, and when the primary datacenter failed, the DNS was updated to point WWW to the secondary datacenter. There were commercial products to handle this automatically (BigIP 3DNS, hah) but it wasn't hard to script.
You could do something very similar on-the-cheap.
- Get an inexpensive VPS and configure it as a secondary nameserver for your domain(s), and update your records with your registrar to make sure everybody knows about that nameserver.
- Host a site outage page on your new DNS server.
- Tweak TTL/Retry/Refresh numbers in your DNS SOA record to correspond to desired failover window.
- If your primary site fails, update your DNS manually (or automatically, if you can detect the failure reliably and script it).
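The "detect the failure reliably and script it" part could look something like the sketch below. Everything here is an assumption for illustration: the IP addresses are placeholders, the health check is a bare TCP connect to port 80, and the rule of three consecutive failures is arbitrary. The actual DNS update is left out because it depends entirely on your nameserver or provider.

```python
import socket

PRIMARY_IP = "203.0.113.10"   # placeholder for the real web server
OUTAGE_IP = "198.51.100.20"   # placeholder for the VPS hosting the outage page


def tcp_check(ip, port=80, timeout=3.0):
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_target(recent_checks, failures_needed=3):
    """Decide which IP the www record should point to.

    recent_checks is a list of booleans, oldest first. Fail over only when
    the last failures_needed checks all failed, so a single dropped packet
    does not flip the site to the outage page.
    """
    tail = recent_checks[-failures_needed:]
    if len(tail) == failures_needed and not any(tail):
        return OUTAGE_IP
    return PRIMARY_IP


# Example with canned results instead of live checks:
print(choose_target([True, False, False, False]))  # 198.51.100.20
print(choose_target([False, False, True]))         # 203.0.113.10
```

In a real loop you would append `tcp_check(PRIMARY_IP)` to the history once per interval and push a DNS change only when the chosen target differs from what is currently published. Because one successful check resets the decision, the record also flips back on its own once the primary recovers.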
I'm sure others will have some suggestions on the (many) ways you could handle this.
Take a look at AutoFailover.com
Snip from their offering:
Autofailover
The mainstay of TZO-HA and the foundation for the high availability option is the unique capability of maintaining extraordinarily low cache times. This allows for near real time traffic redirection.
When TZO-HA detects a failure it automatically updates the DNS record for your domain so that the server requests are sent to the IP address of your alternate server or server cluster.
Unprecedented failover time
The maximum time to re-direct server requests is 2-1/2 minutes including failure detection, DNS record changes, and DNS propagation time through other DNS servers. Typically, this all occurs within 1 minute. Competitive offerings can only deliver time frames of 10 to 30 minutes or more. TZO-HA also Includes Multiple Failover modes.
Doing that via DNS is a horrible idea. Not only will it take forever for your clients to get the hint that your IP has changed, but they'll then cache that you're down, even after you come back up.
What the big guys do is have a second site available (hosting the "we're down" page, or maybe just another copy of the site), and have some routers doing BGP in front of them. If one site goes down, packets magically go to the other site. When it comes back up, it has priority, and there you go.
That's expensive. You probably don't need it. If you do, well... get spending :)
Another option would be to host your main page off of a CDN (that presumably won't go down). If your site is hosed, flip them over to your "hey, things are bad, but they'll get better" page while you make your fixes.