Trying to set up a Graphite/Carbon cluster. I have an elastic load balancer that distributes traffic between two nodes in my cluster, each with one web app, relay, and cache.
In this example, I sent 1000 counts for Metric1 to the cluster.
Here's a diagram:
The problem
As seen in the diagram above, each server holds approximately half of the actual metric count. When queried, the web app returns only one half of the actual count. According to this fantastic post, this is expected behavior, because the web app returns the first result it sees. This implies (and is documented) that only complete counts should be stored on the nodes (in my example, one or both of the nodes should have 1000).
So my issue appears to be improper sharding and replication of the count. In my example above, when a new count comes in from the web, it can be directed to either NodeA or NodeB. I had assumed that counts could enter the cluster via any relay. To test this assumption, I removed the load balancer from the cluster and directed all incoming counts to NodeA's relay. This worked: the full count appeared on one node, then replicated to the second, and the full count was returned correctly by the web app.
My question
The `carbon-relay` appears to act as an application-level load balancer. This is fine; however, I'm concerned that when inbound traffic becomes too great, using a single `carbon-relay` as a load balancer will become a bottleneck and a single point of failure. I'd much prefer to use an actual load balancer to distribute incoming traffic evenly across the cluster's relays. However, `carbon-relay` doesn't seem to play nicely with that, hence the problem illustrated above.
- Why did the relay cluster split Metric1 between the two caches in the scenario above, when the load balancer distributed the input across different relays?
- Can I use an elastic load balancer in front of my Graphite/Carbon cluster? Have I misconfigured my cluster for this purpose?
- If I can't, should I put my primary `carbon-relay` on its own box to function as a load balancer?
Turns out my config's `DESTINATIONS` actually pointed to the `carbon-cache`s instead of the other `carbon-relay`, via a typo in the port number. Fixing the config to actually match the diagram pictured in the question seemed to fix the problem: data now appears in complete form on each node (after replication).

As a side note, however, I am now suffering from inconsistent results from the web app's render API, as detailed in this question. It may or may not be related to the configuration described above.
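For anyone hitting the same thing, the corrected `[relay]` section looked roughly like the sketch below. The hostnames, instance names, and replication factor are placeholders rather than my literal values; the point is that the second destination is the other node's relay (pickle receiver, port 2014 by default), not its carbon-cache (port 2004), which is where my typo was.

```
# carbon.conf on NodeA -- illustrative [relay] section, not a literal copy of my config
[relay]
LINE_RECEIVER_PORT = 2013
PICKLE_RECEIVER_PORT = 2014
RELAY_METHOD = consistent-hashing   # or 'rules', depending on how you route
REPLICATION_FACTOR = 2              # assumption: mirror every metric so each node ends up with the full count
# Local carbon-cache (pickle port 2004) plus NodeB's carbon-relay (pickle port 2014).
# My typo pointed this second entry at NodeB's carbon-cache port instead.
DESTINATIONS = 127.0.0.1:2004:a, nodeb.example.com:2014
```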
What you have is an authority problem. Using Whisper, each time-series database needs to be owned by one and only one carbon-cache daemon, or you run into the consistency problems you're seeing. carbon-relay attempts to address this by consistently sending the same time series to the same endpoint. You can do this either with the regex-based rule engine or by using consistent hashing.
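In carbon.conf terms, the two options look roughly like this (illustrative hostnames and values, not taken from your setup):

```
# carbon.conf, [relay] section -- option 1: consistent hashing
RELAY_METHOD = consistent-hashing
REPLICATION_FACTOR = 1
DESTINATIONS = nodea.example.com:2004:a, nodeb.example.com:2004:b
```

With `RELAY_METHOD = rules`, routing instead comes from relay-rules.conf, something like:

```
# relay-rules.conf -- option 2: regex-based rules (hypothetical pattern; Metric1 is the question's example)
[metric1]
pattern = ^Metric1$
destinations = nodea.example.com:2004:a

# Exactly one rule must be marked as the default; a metric is sent to every
# destination listed in the rule that matches it.
[default]
default = true
destinations = nodeb.example.com:2004:b
```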
My recommendation would be to not over-engineer the problem, scale up until you can't anymore, and only then scale out. We have a single carbon-relay handling 350,000 metrics every 60 seconds with no problems on a single 5-year-old Westmere EP core. If you're using consistent hashing, it's a very low-cost operation to figure out where to route a metric downstream. If you're using a mass of regex rules, that's a lot of string matching and you can hit a performance wall much faster.
The Whisper database is not especially performant. In all likelihood, you're going to hit an I/O performance bottleneck long before the relay starts giving you problems. You are completely overthinking your architecture.
If and when you really need to scale out beyond what a single node is able to give you, you can either route to a specific relay based on client configuration management logic, or you can set up an ELB that routes to multiple relays that each run on the same set of rules and route metrics to the same endpoints. I believe this would require you to use regex-based matching, but consistent hashing might work too if your relays are the same version; I've never tested this approach.
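A rough sketch of that last option (the hostnames and pattern below are placeholders, and I haven't tested this exact config): both relay hosts carry identical routing configuration, so it doesn't matter which relay the ELB hands a given datapoint to.

```
# carbon.conf [relay] section, identical on both relay hosts behind the ELB
RELAY_METHOD = rules        # consistent hashing might also work if the relays run the same version
DESTINATIONS = nodea.example.com:2004:a, nodeb.example.com:2004:b

# relay-rules.conf, also identical on both hosts, so Metric1 always lands on the same cache
[metric1]
pattern = ^Metric1$
destinations = nodea.example.com:2004:a

[default]
default = true
destinations = nodeb.example.com:2004:b
```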