I have a Docker swarm running our business stack, defined in a docker-compose.yml, on two servers (nodes). The compose file starts cAdvisor on each of the two nodes like this:
    cadvisor:
      image: gcr.io/google-containers/cadvisor:latest
      command: "--logtostderr --housekeeping_interval=30s"
      volumes:
        - /var/run/docker.sock:/var/run/docker.sock:ro
        - /:/rootfs:ro
        - /var/run:/var/run
        - /sys:/sys:ro
        - /var/lib/docker/:/var/lib/docker:ro
        - /dev/disk:/dev/disk/:ro
      ports:
        - "9338:8080"
      deploy:
        mode: global
        resources:
          limits:
            memory: 128M
          reservations:
            memory: 64M
On a third server I run a standalone Docker instance, separate from the swarm on nodes 1 and 2; this server runs Prometheus and Grafana. Prometheus is configured to scrape only node1:9338 to get the cAdvisor information.
Occasionally, when scraping node1:9338, not all containers running on nodes 1 and 2 show up in the cAdvisor statistics.
I was assuming that cAdvisor syncs its information across the swarm, so that I could configure Prometheus to use only node1:9338 as the entry point into the swarm and scrape everything from there.
Or do I also have to put node2:9338 into my Prometheus configuration to always get the information for all nodes? If so, how should this scale, given that I would need to add each new node to the Prometheus config?
Running Prometheus together with the business stack in one swarm is no option.
edit: Today, when opening the cAdvisor metrics URLs http://node1:9338/metrics and http://node2:9338/metrics, I saw strange behaviour: both URLs returned the same information, covering only the containers running on node1. The information for the containers running on node2 was missing when requesting http://node2:9338/metrics.
Could it be that the Docker-internal load balancing is routing the request for http://node2:9338/metrics to the cAdvisor on node1, so that the metrics of node1 are shown even though node2 was requested?
cAdvisor looks at the container information provided by Linux on the machine it runs on; it knows nothing of Swarm. You'll want to have Prometheus scrape all your machines.
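As a minimal sketch of what scraping every machine could look like, assuming the host names and port from the question (node1:9338, node2:9338) and a job name chosen here only for illustration:

```yaml
# prometheus.yml (fragment). Sketch only; job name and file path are assumptions.
scrape_configs:
  - job_name: cadvisor            # hypothetical job name
    static_configs:
      - targets:
          - node1:9338            # cAdvisor on swarm node 1
          - node2:9338            # cAdvisor on swarm node 2
    # Alternative to static_configs when nodes come and go: keep the target
    # list in a separate file that Prometheus re-reads on change.
    # file_sd_configs:
    #   - files:
    #       - /etc/prometheus/targets/cadvisor.json   # hypothetical path
```

With file-based discovery, adding a node would mean appending it to the targets file rather than editing and reloading the main config. Note that each target only returns that node's own metrics if the request actually reaches the cAdvisor instance on that node.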
Indeed the problem was the docker-internal load balancing in swarm mode.
As I wrote in my initial post we were adding cAdvisor to our docker-compose file and we were instantiating the docker-swarm via
The configuration of cAdvisor with mode: global in the deploy section leads to one instance per node, but requesting a certain node via http://node2:9338/metrics doesn't mean you get the result of the cAdvisor running on that node. Swarm's ingress routing mesh might route your request to the instance on node1, so you won't be able to reliably scrape the real cAdvisor results from node2.
The solution that worked for me was to explicitly tell Docker to use
mode: host
in the ports section of cAdvisor in my docker-compose. My final config looks like:
Please notice the changed ports section.
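The final config itself is not shown above. A sketch of what it could look like, assuming the service definition from the question is kept unchanged and only the ports entry is rewritten in the long syntax (which, as far as I know, needs compose file format 3.2 or newer):

```yaml
# Sketch only: service from the question, with ports switched to host mode.
cadvisor:
  image: gcr.io/google-containers/cadvisor:latest
  command: "--logtostderr --housekeeping_interval=30s"
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock:ro
    - /:/rootfs:ro
    - /var/run:/var/run
    - /sys:/sys:ro
    - /var/lib/docker/:/var/lib/docker:ro
    - /dev/disk:/dev/disk/:ro
  ports:
    # Long syntax: publish 9338 directly on each node instead of via the
    # ingress routing mesh, so nodeN:9338 reaches the instance on nodeN.
    - target: 8080        # port cAdvisor listens on inside the container
      published: 9338     # port exposed on the host
      protocol: tcp
      mode: host
  deploy:
    mode: global          # one cAdvisor task per node
    resources:
      limits:
        memory: 128M
      reservations:
        memory: 64M
```

Because the service runs in global mode, there is exactly one task per node, so the host-mode port does not collide with another replica on the same machine.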