I'd like to set up instrumentation for OOMKilled events, which look like this when examining a pod with kubectl describe:
Name:           pnovotnak-manhole-123456789-82l2h
Namespace:      test
Node:           test-cluster-cja8smaK-oQSR/10.x.x.x
Start Time:     Fri, 03 Feb 2017 14:34:57 -0800
Labels:         pod-template-hash=123456789
                run=pnovotnak-manhole
Status:         Running
IP:             10.x.x.x
Controllers:    ReplicaSet/pnovotnak-manhole-123456789
Containers:
  pnovotnak-manhole:
    Container ID:   docker://...
    Image:          pnovotnak/it
    Image ID:       docker://sha256:...
    Port:
    Limits:
      cpu:      2
      memory:   3Gi
    Requests:
      cpu:      200m
      memory:   256Mi
    State:          Running
      Started:      Fri, 03 Feb 2017 14:41:12 -0800
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 03 Feb 2017 14:35:08 -0800
      Finished:     Fri, 03 Feb 2017 14:41:11 -0800
    Ready:          True
    Restart Count:  1
    Volume Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-tder (ro)
    Environment Variables:  <none>
Conditions:
  Type          Status
  Initialized   True
  Ready         True
  PodScheduled  True
Volumes:
  default-token-46euo:
    Type:       Secret (a volume populated by a Secret)
    SecretName: default-token-tder
QoS Class:      Burstable
Tolerations:    <none>
Events:
  FirstSeen  LastSeen  Count  From                                  SubObjectPath                       Type    Reason     Message
  ---------  --------  -----  ----                                  -------------                       ----    ------     -------
  11m        11m       1      {default-scheduler }                                                      Normal  Scheduled  Successfully assigned pnovotnak-manhole-123456789-82l2h to test-cluster-cja8smaK-oQSR
  10m        10m       1      {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Created    Created container with docker id xxxxxxxxxxxx; Security:[seccomp=unconfined]
  10m        10m       1      {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Started    Started container with docker id xxxxxxxxxxxx
  11m        4m        2      {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Pulling    pulling image "pnovotnak/it"
  10m        4m        2      {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Pulled     Successfully pulled image "pnovotnak/it"
  4m         4m        1      {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Created    Created container with docker id yyyyyyyyyyyy; Security:[seccomp=unconfined]
  4m         4m        1      {kubelet test-cluster-cja8smaK-oQSR}  spec.containers{pnovotnak-manhole}  Normal  Started    Started container with docker id yyyyyyyyyyyy
All I get from the pod logs is:
{
  textPayload: "shutting down, got signal: Terminated
"
  insertId: "aaaaaaaaaaaaaaaa"
  resource: {
    type: "container"
    labels: {
      pod_id: "pnovotnak-manhole-123456789-82l2h"
      ...
    }
  }
  timestamp: "2017-02-03T22:34:48Z"
  severity: "ERROR"
  labels: {
    container.googleapis.com/container_name: "POD"
    ...
  }
  logName: "projects/myproj/logs/POD"
}
And the kubelet logs:
{
  insertId: "bbbbbbbbbbbbbb"
  jsonPayload: {
    _BOOT_ID: "ffffffffffffffffffffffffffffffff"
    MESSAGE: "I0203 22:41:11.925928 1843 kubelet.go:1816] SyncLoop (PLEG): "pnovotnak-manhole-123456789-82l2h_test(a-uuid)", event: &pleg.PodLifecycleEvent{ID:"another-uuid", Type:"ContainerDied", Data:"..."}"
    ...
That doesn't seem like quite enough to uniquely identify this as an OOM event. Any other ideas?
Although the OOMKilled event isn't present in the logs, if you can detect that a pod was killed, you can then use

kubectl get pod -o go-template=... <pod-id>

to determine the reason. The docs include an example go-template for exactly this; a sketch in that style is below.
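For example, something along these lines (a sketch, not a verbatim quote of the docs: the template is modeled on the docs' example, with the pod name and namespace from your describe output plugged in):

# Print each container's name and last recorded state for the pod
kubectl get pod pnovotnak-manhole-123456789-82l2h -n test \
  -o go-template='{{range .status.containerStatuses}}{{"Container Name: "}}{{.name}}{{"\r\nLastState: "}}{{.lastState}}{{end}}'

For an OOM-killed container, the lastState this prints should contain a terminated entry with reason:OOMKilled and exitCode:137, matching the describe output above.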
If you're doing this programmatically, a better alternative to relying on kubectl output is the Kubernetes REST API's GET /api/v1/pods method. Methods for accessing the API are also given in the documentation.
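For instance, a minimal sketch (assuming kubectl proxy for authentication and jq for filtering; both are my choices, not the only way to do this):

# Open an authenticated local proxy to the API server
kubectl proxy --port=8001 &

# Fetch the pod object and extract the last terminated reason
curl -s http://localhost:8001/api/v1/namespaces/test/pods/pnovotnak-manhole-123456789-82l2h \
  | jq '.status.containerStatuses[].lastState.terminated.reason'

A watcher that polls this (or uses the API's watch mechanism) and alerts whenever the reason is "OOMKilled" gives you the instrumentation hook you're after.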