I have Graphite set up on three EC2 instances:
- carbon-relay: relay1.graphite.prod.example.ec2
- carbon-cache + webapp: cache3.graphite.prod.example.ec2
- carbon-cache + webapp: cache4.graphite.prod.example.ec2
The relay is working perfectly with consistent hashing. The problem is that the two web servers are not communicating with each other, so I only see the metrics from one server.
I spent a lot of time looking at https://answers.launchpad.net/graphite/+question/114206 and I can't figure out what I have set up incorrectly. I can run a wget from cache3 against cache4, get data back, and see the request in the Apache logs, so I don't think it's a firewall issue. I tried setting suppressError = False in remote_storage.py and turned on DEBUG in local_settings.py, but I don't see any errors in Firebug.
cache3 - local_settings.py
CLUSTER_SERVERS = [ 'cache4.graphite.prod.example.ec2', 'localhost' ]
cache4 - local_settings.py
CLUSTER_SERVERS = [ 'cache3.graphite.prod.example.ec2', 'localhost' ]
I have tried using IP addresses as well and that had no impact.
I did a little more debugging and modified storage.py to hard-code my remote hosts directly:
STORE = Store(settings.DATA_DIRS, remote_hosts=["cache4.graphite.prod.example.ec2", "127.0.0.1"])
That worked. So, somehow my CLUSTER_SERVERS value isn't getting pulled in from local_settings.py correctly.
Any suggestions?
It turns out the permissions on local_settings.py were too restrictive and Apache was unable to read it. Changing the permissions to 644 (instead of 600) resolved the problem.
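
For anyone hitting the same symptom, here is a minimal Python sketch of how the permission check and the fix could be scripted. The helper names and the file path are hypothetical (the path depends on where your Graphite webapp is installed), and the check assumes Apache runs as a user that neither owns the file nor shares its group, so it relies on the "other" read bit.

```python
import os
import stat


def mode_string(path):
    """Return the permission bits of path as an octal string, e.g. '600'."""
    return format(stat.S_IMODE(os.stat(path).st_mode), "o")


def readable_by_others(path):
    """Return True if the 'other' read bit is set on path.

    A web server user that is neither the owner nor in the file's group
    needs this bit in order to read the file.
    """
    return bool(os.stat(path).st_mode & stat.S_IROTH)


def make_world_readable(path):
    """Set the mode to 644 (rw-r--r--), matching the fix described above."""
    os.chmod(path, 0o644)


if __name__ == "__main__":
    # Hypothetical location; adjust to your own installation.
    settings_path = "/opt/graphite/webapp/graphite/local_settings.py"
    if os.path.exists(settings_path):
        print("current mode:", mode_string(settings_path))
        if not readable_by_others(settings_path):
            make_world_readable(settings_path)
            print("new mode:", mode_string(settings_path))
```

Run it as the file's owner or as root, then restart Apache so the webapp reloads its settings.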