My development environments run on Google Container Engine, and the following pods were created by Replication Controllers:
NAME READY STATUS RESTARTS AGE NODE
couchdb-dev-ocbud 1/1 Running 3 13h cz5w
couchdb-stage-8f9bn 1/1 Running 1 13h uqu4
etcd-1-rmwzy 1/1 Running 0 3d q0cz
etcd-2-n4ckp 1/1 Running 8 3d uqu4
etcd-3-yzz2x 1/1 Running 0 3d yt9e
mongodb-dev-ig9xo 1/1 Running 3 16h cz5w
mysql-dev-rykih 1/1 Running 3 17h cz5w
mysql-stage-n240p 1/1 Running 3 16h cz5w
redis-dev-19dxg 0/1 Running 5 3d cz5w
redis-dev-s5v6k 1/1 Running 0 3d yt9e
redis-dev-wccyb 0/1 Running 8 3d uqu4
redis-stage-qnbb6 0/1 Running 8 3d uqu4
redis-stage-xb54r 0/1 Running 0 3d yt9e
redis-stage-xntc2 0/1 Running 5 3d cz5w
shadowsocks-b8009 1/1 Running 0 2d q0cz
shadowsocks-i1anu 1/1 Running 0 2d yt9e
ts-stage-4esg8 1/1 Running 8 3d uqu4
ts-stage-cer7a 1/1 Running 5 3d cz5w
ts-stage-dtpdh 1/1 Running 0 3d yt9e
ts-stage-mah7w 1/1 Running 0 3d q0cz
uls-dev-upibo 1/1 Running 5 1d cz5w
uls-stage-zht0j 1/1 Running 6 1d uqu4
zookeeper-1-4dklm 1/1 Running 0 3d q0cz
zookeeper-2-pw13k 1/1 Running 8 3d uqu4
zookeeper-3-u9a34 1/1 Running 0 3d yt9e
The pods on node uqu4 were restarted 8 times without any interaction from me.
Here is the termination reason from kubectl describe pod <pod>; the exit code is 137:
Last Termination State: Terminated
Reason: Error
Exit Code: 137
Started: Mon, 21 Mar 2016 08:33:24 +0000
Finished: Mon, 21 Mar 2016 21:04:57 +0000
Ready: True
Restart Count: 8
When I SSH into the uqu4 node, I receive the following warning:
WARNING: Could not setup log file in /root/.config/gcloud/logs, (OSError: [Errno 28] No space left on device: '/root/.config/gcloud/logs/2016.03.22')
The df -h output looks OK:
Filesystem Size Used Avail Use% Mounted on
rootfs 99G 14G 82G 14% /
udev 10M 0 10M 0% /dev
tmpfs 750M 340K 750M 1% /run
/dev/disk/by-uuid/6be8ff15-205a-4019-99e0-92d9c347301b 99G 14G 82G 14% /
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 1.5G 1.7M 1.5G 1% /run/shm
cgroup 3.7G 0 3.7G 0% /sys/fs/cgroup
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/46f374dc-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/4a17371c-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sdb 976M 187M 722M 21% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/etcd-2-data-disk
/dev/sdb 976M 187M 722M 21% /var/lib/kubelet/pods/4a13021d-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~gce-pd/etcd-data
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/4a13021d-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sdc 976M 9.5M 900M 2% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/zookeeper-2-data-disk
/dev/sdc 976M 9.5M 900M 2% /var/lib/kubelet/pods/4a5933ee-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~gce-pd/zookeeper-2-data
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/4a5933ee-ecbe-11e5-bf3b-42010af00080/volumes/kubernetes.io~secret/default-token-binen
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/b93210e7-ecfb-11e5-a962-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sdd 30G 48M 28G 1% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/uls-stage-data-disk
/dev/sdd 30G 48M 28G 1% /var/lib/kubelet/pods/f2764484-ee6b-11e5-a962-42010af00080/volumes/kubernetes.io~gce-pd/uls-stage-data-disk
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/f2764484-ee6b-11e5-a962-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/sde 50G 52M 47G 1% /var/lib/kubelet/plugins/kubernetes.io/gce-pd/mounts/couchdb-stage-data-disk
/dev/sde 50G 52M 47G 1% /var/lib/kubelet/pods/e721dfb1-ef5b-11e5-a962-42010af00080/volumes/kubernetes.io~gce-pd/couchdb-stage-data-disk
tmpfs 3.7G 8.0K 3.7G 1% /var/lib/kubelet/pods/e721dfb1-ef5b-11e5-a962-42010af00080/volumes/kubernetes.io~secret/default-token-binen
/dev/disk/by-uuid/6be8ff15-205a-4019-99e0-92d9c347301b 99G 14G 82G 14% /var/lib/docker/aufs
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/8d9c854d1688439657c6b55107f6898d6b9fbdb74b9610dd0b48a1b22c6102d1
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/9e09bc6c69af03192569ba25762861edd710bf45baf65c449a4caf5ad69500f3
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/f82c122422db51310ce965173ca2b043ffa7b55b84f5b28bf9c19004a3e44fa9
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/6a0ccec3cedbcdf481a2ce528f2dcc9d1626f263591bebdb96a77beea0c0443f
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/ae8059fb1c2abbbffc72813a0355a4dd3d2633c720ef61b16d19a46ed2d63358
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/9d5b9ad1148e1ee4e10f826fc500f0a5c549bdc9ed66519e5f3b222d99641dfd
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/668f95f658cb13457b193f31716df5e5b8da7f227bc3ae1e0367098ec20580b0
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/bdf7d3660b81879c75a0048f921fa47b0138c3a9ec5454e85a55e62ccf9d86fe
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/8cb75d5e0df5d34ceefe41ec55a88198568a0670b6bddade4d8bb7194ba49779
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/a9bb332d1aebc349d1440416a59f898f9ed12be1c744e11e8f3e502dd630df0e
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/36a2bd14af419e19fe89fe32e3f02f490f5553246e76d6c7636ae80e6bba8662
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/a8c983eb3b1701263d1501b025f080ae0d967ddee2fd4bd5071e6e9297b389b9
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/e0131ab5360fce8e3a83522b9bc7176d005b893b726bf616d0ee2f7e5ab4269e
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/2e1fd00cb2ec9ca11323b3ac66f413b6873ca2e949ceb3ba5eb368de8de18af5
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/78c89fcc2b2a81c8544883209fac002a3525fed8504ebf43722b592179264dea
none 99G 14G 82G 14% /var/lib/docker/aufs/mnt/4e56c31cbc3dfde7df17c1075595d80214dc81e55093ee9d9b63ef88b09502ad
Here is the result from free:
total used free shared buffers cached
Mem: 7679824 5625036 2054788 0 207872 1148568
-/+ buffers/cache: 4268596 3411228
Swap: 0 0 0
What is causing the pods to restart?
I'd recommend running the following commands to view the current state of the pods and nodes:
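Something along these lines (with <pod-name> and <node-name> as placeholders) should show the relevant details:

# Describe a failing pod and the node it is scheduled on
kubectl describe pod <pod-name>
kubectl describe node <node-name>

# List recent cluster events, which usually record why a container was killed or restarted
kubectl get events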
These commands provide detailed information about the pods that might be failing, as well as the nodes where those pods are scheduled. The Kubernetes documentation has more information on how to determine the root cause of a pod failure.
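As a rough sketch of that approach, the last termination state can also be read directly with a Go template (the pod name is a placeholder):

# Print the reason recorded for the container's last termination
kubectl get pod <pod-name> --output=go-template="{{range .status.containerStatuses}}{{.lastState.terminated.reason}}{{end}}"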
For monitoring the resources used by pods, it's better to use the monitoring tools that Kubernetes suggests or the Web UI (Dashboard), as these can provide detailed information about the resources used by every pod.
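If the Dashboard add-on is enabled in your cluster (an assumption here), it can usually be reached through the API server proxy:

# Start a local proxy to the API server, then open http://localhost:8001/ui in a browser
kubectl proxy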