I have a fresh Kubernetes cluster on GKE.
Whenever I run
> kubectl top node gke-data-custom-vm-6-25-0cbae9b9-hrkc
I get
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
At the same time I have this service:
> kubectl -n kube-system get services
NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
default-http-backend   NodePort    10.11.241.20    <none>        80:32688/TCP    59d
heapster               ClusterIP   10.11.245.182   <none>        80/TCP          59d
kube-dns               ClusterIP   10.11.240.10    <none>        53/UDP,53/TCP   59d
metrics-server         ClusterIP   10.11.249.26    <none>        443/TCP         59d
and a Heapster pod is running (although I can see it has been restarted many times):
> kubectl -n kube-system get pods
NAME                                     READY   STATUS    RESTARTS   AGE
event-exporter-v0.2.3-85644fcdf-kwd6g    2/2     Running   0          16d
fluentd-gcp-scaler-8b674f786-dbrcr       1/1     Running   0          16d
fluentd-gcp-v3.2.0-2fqgl                 2/2     Running   0          17d
fluentd-gcp-v3.2.0-47586                 2/2     Running   0          17d
fluentd-gcp-v3.2.0-552xm                 2/2     Running   0          16d
heapster-v1.6.0-beta.1-fdc7fd478-8s998   3/3     Running   73         16d
However, I can see some errors in the logs of the heapster-nanny container:
> kubectl logs -n kube-system --tail 10 -f po/heapster-v1.6.0-beta.1-fdc7fd478-8s998 -c heapster-nanny
ERROR: logging before flag.Parse: E0418 23:30:10.075539 1 nanny_lib.go:95] Error while querying apiserver for resources: Get https://10.11.240.1:443/api/v1/namespaces/kube-system/pods/heapster-v1.6.0-beta.1-fdc7fd478-8s998: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:10.971230 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:11.972337 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:12.973637 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:13.975024 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:14.976582 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:16.063760 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: getsockopt: connection refused
ERROR: logging before flag.Parse: E0418 23:30:27.065693 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: net/http: TLS handshake timeout
ERROR: logging before flag.Parse: E0418 23:30:30.077159 1 nanny_lib.go:95] Error while querying apiserver for resources: Get https://10.11.240.1:443/api/v1/namespaces/kube-system/pods/heapster-v1.6.0-beta.1-fdc7fd478-8s998: net/http: TLS handshake timeout
ERROR: logging before flag.Parse: E0418 23:30:59.778560 1 reflector.go:205] k8s.io/autoscaler/addon-resizer/nanny/kubernetes_client.go:107: Failed to list *v1.Node: Get https://10.11.240.1:443/api/v1/nodes?resourceVersion=0: dial tcp 10.11.240.1:443: i/o timeout
and also in the heapster container:
I0423 07:02:10.765134 1 heapster.go:113] Starting heapster on port 8082
W0423 07:16:27.975467 1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:16:43.064110 1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:20:36.875359 1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:20:44.383790 1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:22:29.683060 1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:22:40.278962 1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
W0423 07:31:27.072711 1 manager.go:152] Failed to get all responses in time (got 2/3)
W0423 07:31:54.580031 1 manager.go:107] Failed to get kubelet_summary:10.128.0.49:10255 response in time
How can I fix this?
Is there any additional info that I should provide?
Heapster Deprecation
Heapster is a deprecated project and may have problems when running in recent Kubernetes versions.
See the Heapster Deprecation Timeline.
Since Kubernetes v1.10, kubectl top relies on metrics-server by default (see CHANGELOG-1.10.md).
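To confirm that metrics-server is actually serving data, independently of Heapster, you can query the metrics API directly. The following is only a sketch: the function name check_metrics_api is made up for illustration, and the APIService name v1beta1.metrics.k8s.io assumes that metrics-server registers the v1beta1 version of the metrics API, which may differ on your cluster.

```shell
#!/bin/sh
# Hypothetical helper: query the metrics API directly, bypassing `kubectl top`.
check_metrics_api() {
    if ! command -v kubectl >/dev/null 2>&1; then
        echo "kubectl not found in PATH" >&2
        return 1
    fi
    # Assumption: the metrics API is registered as v1beta1.metrics.k8s.io.
    kubectl get apiservice v1beta1.metrics.k8s.io || return 1
    # Raw node metrics as served through the API server (JSON output).
    kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
}

check_metrics_api || echo "metrics API check failed" >&2
```

If both commands succeed, the metrics API is healthy and the kubectl top error most likely comes from the client still asking for Heapster.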
What you should do:
As Heapster is deprecated, and you already have a metrics-server deployed, the best option is to use kubectl version v1.10 or above, as it fetches the metrics from metrics-server.
However, beware of the kubectl Version Skew Policy: check your kube-apiserver version before choosing your kubectl version.
I guess your issue might be related to an auto-upgrade of your GKE master nodes.
Mine got upgraded recently to v1.11.8-gke.6, and during the upgrade I observed the same intermittent errors inside the heapster-nanny container.
For me, the problem no longer persists, and I can safely get the nodes' metrics with kubectl.
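The version check suggested above can be sketched in plain shell. kubectl is supported within one minor version (older or newer) of kube-apiserver, so the idea is to compare the minor numbers of the two version strings reported by kubectl version. The helper names minor_of and minor_skew are made up for this illustration, and the client version v1.10.0 is only an example value, not something taken from your cluster.

```shell
#!/bin/sh
# Hypothetical helper: extract the minor number from a version
# string such as "v1.11.8-gke.6" (prints 11 for that input).
minor_of() {
    echo "$1" | sed -E 's/^v?[0-9]+\.([0-9]+).*/\1/'
}

# Hypothetical helper: print the absolute difference between the
# minor versions of two version strings.
minor_skew() {
    a=$(minor_of "$1")
    b=$(minor_of "$2")
    if [ "$a" -ge "$b" ]; then
        echo $((a - b))
    else
        echo $((b - a))
    fi
}

# Server version from the answer above, client version assumed for the example.
# Take the real strings from the output of `kubectl version`.
minor_skew "v1.11.8-gke.6" "v1.10.0"   # prints 1
```

A result of 0 or 1 is within the supported skew; anything larger suggests installing a kubectl closer to the master version.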