I currently use DNS round robin for load balancing, and it works great. The records look like this (with a TTL of 120 seconds):
;; ANSWER SECTION:
orion.2x.to. 116 IN A 80.237.201.41
orion.2x.to. 116 IN A 87.230.54.12
orion.2x.to. 116 IN A 87.230.100.10
orion.2x.to. 116 IN A 87.230.51.65
I learned that not every ISP / device treats such a response the same way. For example, some DNS servers rotate the addresses randomly or cycle through them, some propagate only the first entry, and others try to pick the regionally nearest server by looking at the IP address.
However, if the user base is big enough (spread over multiple ISPs, etc.) it balances pretty well. The discrepancy between the highest- and lowest-loaded server hardly ever exceeds 15%.
However, now I have the problem that I'm introducing more servers into the system, and they don't all have the same capacity.
I currently only have 1 Gbps servers, but I want to add 100 Mbps and 10 Gbps servers as well.
So what I want is to introduce a 10 Gbps server with a weight of 100, a 1 Gbps server with a weight of 10 and a 100 Mbps server with a weight of 1.
I previously added a server to DNS twice to bring more traffic to it (which worked nicely; its bandwidth almost doubled). But adding a 10 Gbps server to DNS 100 times is a bit ridiculous.
So I thought about using the TTL.
If I give server A a 240-second TTL and server B only 120 seconds (which is about the minimum usable for round robin, since many DNS servers reportedly bump anything lower up to 120), I think something like this should happen in an ideal scenario:
First 120 seconds
50% of requests get server A -> keep it for 240 seconds.
50% of requests get server B -> keep it for 120 seconds
Second 120 seconds
50% of requests still have server A cached -> keep it for another 120 seconds.
25% of requests get server A -> keep it for 240 seconds
25% of requests get server B -> keep it for 120 seconds
Third 120 seconds
25% will get server A (from the 50% of Server A that now expired) -> cache 240 sec
25% will get server B (from the 50% of Server A that now expired) -> cache 120 sec
25% will have server A cached for another 120 seconds
12.5% will get server B (from the 25% of server B that now expired) -> cache 120sec
12.5% will get server A (from the 25% of server B that now expired) -> cache 240 sec
Fourth 120 seconds
25% will have server A cached -> cache for another 120 secs
12.5% will get server A (from the 25% of b that now expired) -> cache 240 secs
12.5% will get server B (from the 25% of b that now expired) -> cache 120 secs
12.5% will get server A (from the 25% of a that now expired) -> cache 240 secs
12.5% will get server B (from the 25% of a that now expired) -> cache 120 secs
6.25% will get server A (from the 12.5% of b that now expired) -> cache 240 secs
6.25% will get server B (from the 12.5% of b that now expired) -> cache 120 secs
12.5% will have server A cached -> cache another 120 secs
... I think I lost track at this point, but you get the idea...
As you can see, this gets pretty complicated to predict, and it certainly won't work out exactly like this in practice, but it should definitely have an effect on the distribution!
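The scenario above can be checked with a quick simulation. This is only a sketch under idealized assumptions (every resolver re-queries the instant its cached record expires, and the authoritative server hands out A or B with equal probability); real resolver behaviour is far messier:

```python
import random

def simulate(ttl_a=240, ttl_b=120, horizon=100_000, trials=2_000, seed=1):
    """Estimate the long-run fraction of client time spent pointed at
    server A when each re-resolution returns A or B with probability
    1/2, but A's record carries a longer TTL."""
    rng = random.Random(seed)
    time_a = time_b = 0
    for _ in range(trials):
        t = 0
        while t < horizon:
            if rng.random() < 0.5:   # resolver receives server A
                time_a += ttl_a
                t += ttl_a
            else:                    # resolver receives server B
                time_b += ttl_b
                t += ttl_b
    return time_a / (time_a + time_b)

print(round(simulate(), 2))  # tends toward 240 / (240 + 120) = 2/3
```

Under these assumptions a server's share is simply proportional to its TTL, so doubling the TTL only doubles the weight; getting anywhere near a 100:1 ratio this way would need absurdly long TTLs.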
I know that weighted round robin exists and can be controlled by the authoritative DNS server. It cycles through DNS records when responding and returns records with a probability that corresponds to the weighting. My DNS server does not support this, and my requirements are not that precise. If it doesn't weight perfectly, that's okay, as long as it goes in the right direction.
I think using the TTL field could be a more elegant and easier solution. It doesn't require a DNS server that controls the weighting dynamically, which saves resources; that, in my opinion, is the whole point of DNS load balancing versus hardware load balancers.
My question now is: Are there any best practices / methods / rules of thumb to weight round robin distribution using the TTL attribute of DNS records?
Edit:
The system is a forward proxy server system. The amount of bandwidth (not requests) exceeds what a single server with Ethernet can handle, so I need a balancing solution that distributes the bandwidth across several servers. Are there alternative methods to DNS? Of course I could use a load balancer with fibre channel etc., but the costs are ridiculous, and it would only widen the bottleneck, not eliminate it. The only thing I can think of is anycast (is it anycast or multicast?) IP addresses, but I don't have the means to set up such a system.
First off, I completely agree with @Alnitak that DNS isn't designed for this sort of thing, and best practice is to not (ab)use DNS as a poor man's load balancer.
To answer the premise of the question: the approach used to perform basic weighted round robin using DNS is to vary the proportion of responses that contain each server's address. If Server A is to have 1/3 of the traffic and Server B 2/3, then 1/3 of authoritative DNS responses to DNS proxies would contain only A's IP, and 2/3 of responses only B's IP. (If two or more servers share the same 'weight', they can be bundled into one response.) Amazon's Route 53 DNS service uses this method.
Right. So as I understand it, you have some sort of 'cheap' downloads / video distribution / large-file download service, where the total service bitrate exceeds 1 Gbit/s.
Without knowing the exact specifics of your service and your server layout, it's hard to be precise, but there is a common solution for this case.
This kind of setup can be built with open-source software, or with purpose-built appliances from many vendors. The load balancing tag here is a great starting point, or you could hire sysadmins who have done this before to consult for you...
Yes, best practice is: don't do it!
Please repeat after me
DNS is for mapping a name to one or more IP addresses. Any subsequent balancing you get is through luck, not design.
Take a look at PowerDNS. It allows you to create a custom pipe backend. I've modified an example load-balancer DNS backend written in Perl to use the Algorithm::ConsistentHash::Ketama module, which lets me set arbitrary weights for each server.
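Since the Perl backend itself isn't shown here, a minimal Python sketch of the underlying ketama idea may help (server names, weights, and the points-per-weight factor are all invented; the real module is more refined). Each server appears on a hash ring a number of times proportional to its weight, and each client key maps to the next point on the ring:

```python
import bisect
import hashlib

def build_ring(servers, points_per_weight=40):
    """Place each (name, weight) server on a hash ring, with a number
    of points proportional to its weight."""
    ring = []
    for name, weight in servers:
        for i in range(weight * points_per_weight):
            h = int.from_bytes(
                hashlib.md5(f"{name}-{i}".encode()).digest()[:4], "big")
            ring.append((h, name))
    ring.sort()
    return ring

def lookup(ring, key):
    """Map a client key (e.g. its IP) to the first ring point at or
    after the key's hash, wrapping around the ring."""
    h = int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")
    i = bisect.bisect_left(ring, (h, ""))
    return ring[i % len(ring)][1]

ring = build_ring([("10g.example.net", 100),
                   ("1g.example.net", 10),
                   ("100m.example.net", 1)])
# Client keys land on the servers roughly in 100:10:1 proportion, and
# adding a server or adjusting a weight only remaps a small slice of keys.
```

That last property is what makes consistent hashing attractive here: changing the pool disturbs only a fraction of existing client-to-server assignments.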
I've added a CNAME from my desired top-level domain to a subdomain I call gslb (for Global Server Load Balancing). From there, I invoke this custom DNS server and send out A records according to my desired weights.
Works like a champ. The ketama hash has the nice property of minimal disruption to the existing mapping as you add servers or adjust weights.
I recommend reading Alternative DNS Servers, by Jan-Piet Mens. He has many good ideas in there as well as example code.
I'd also recommend abandoning the TTL modulation. You are already pretty far afield, and adding another kludge on top will make troubleshooting and documentation extremely difficult.
To handle this sort of setup you need to look at a real load-balancing solution. Read up on Linux Virtual Server and HAProxy. You get the additional benefit that failed servers are automatically removed from the pool, and the effects are much more easily understood. Weighting is simply a setting to be tweaked.
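For instance, the 100:10:1 capacity ratio from the question maps directly onto HAProxy's per-server `weight` setting (backend name, addresses and ports below are made up for illustration):

```
backend proxy_pool
    balance roundrobin
    server big10g   192.0.2.10:3128 check weight 100
    server mid1g    192.0.2.11:3128 check weight 10
    server small1   192.0.2.12:3128 check weight 1
```

The `check` keyword enables health checks, so a failed server drops out of the rotation automatically.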
You can use PowerDNS to do weighted round robin, although distributing load in such an unbalanced fashion (100:1?) may get very interesting, at least with the algorithm I used in my solution: each RR entry has a weight between 1 and 100, and a random value is used to include or exclude records.
Here's an article I wrote on using the MySQL backend in PowerDNS to do weighted RR DNS: http://www.mccartney.ie/wordpress/2008/08/wrr-dns-with-powerdns/
R.I.Pienaar also has some Ruby based examples (using the PowerDNS pipe backend): http://code.google.com/p/ruby-pdns/wiki/RecipeWeightedRoundRobin
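A minimal sketch of that include/exclude idea (record data is invented; the linked article has the real MySQL-backed implementation):

```python
import random

# Each candidate record carries a weight from 1-100; per query, a record
# is included in the answer if a random draw falls at or under its weight.
RECORDS = [
    ("192.0.2.10", 100),  # always included
    ("192.0.2.11", 50),   # included in roughly half of the responses
    ("192.0.2.12", 1),    # included in roughly 1% of the responses
]

def build_response(records=RECORDS, rng=random):
    answer = [ip for ip, w in records if rng.randint(1, 100) <= w]
    # Never return an empty answer; fall back to the heaviest record.
    return answer or [max(records, key=lambda r: r[1])[0]]
```

Note that this weights how often a record appears in responses, not precisely how much traffic each server receives, since clients cache whichever subset they happen to get.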