I continue down the frustratingly stop-start road of learning Kubernetes (specifically MicroK8S).
I build an image locally on a development laptop thus:
docker build -t k8s-workload .
This is a simple PHP web app that reports some request metadata. It builds successfully:
Sending build context to Docker daemon 13.82kB
Step 1/5 : FROM php:8.2-cli-alpine
---> c5f1f9770838
Step 2/5 : WORKDIR /root
---> Using cache
---> 492c997c963b
Step 3/5 : RUN apk update && apk upgrade
---> Using cache
---> f91505d5fe68
Step 4/5 : COPY src /root
---> 02bcc72dfc97
Step 5/5 : CMD ["sh", "/root/bin/start-server.sh"]
---> Running in 6bc3b72365e4
Removing intermediate container 6bc3b72365e4
---> 0c8a405b06af
Successfully built 0c8a405b06af
Successfully tagged k8s-workload:latest
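For reference, the Dockerfile implied by that build output is roughly the following; it is reconstructed from the five build steps above, so treat it as a sketch rather than the exact file:

FROM php:8.2-cli-alpine
WORKDIR /root
# keep the Alpine packages current
RUN apk update && apk upgrade
# the PHP app lives in src/ and is copied into the image
COPY src /root
# start-server.sh presumably wraps the PHP built-in web server
CMD ["sh", "/root/bin/start-server.sh"]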
I create a tarball from this so it can be sent to my three-node cluster:
docker save k8s-workload > k8s-workload.docker.tar
I then send it to the leader in the cluster (though I assume it could be sent to any of them):
scp k8s-workload.docker.tar 192.168.50.251:/home/myuser/
This is all looking good so far. Now I want to sideload the image into all nodes in the cluster:
root@arran:/home/myuser# microk8s images import < k8s-workload.docker.tar
Pushing OCI images to 192.168.50.251:25000
Pushing OCI images to 192.168.50.135:25000
Pushing OCI images to 192.168.50.74:25000
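In hindsight, a check worth doing immediately after the import would have been to confirm that the image landed intact on every node, e.g. by running the listing command (shown later in this question) on each of the three machines:

# run on each node; the grep just narrows the output to the new image
microk8s ctr images list | grep workload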
With that looking successful, I tried to create a workload:
root@arran:/home/myuser# microk8s kubectl create deployment k8s-workload --image=k8s-workload
Finally let's get the status of this pod:
root@arran:/home/myuser# microk8s kubectl get pods
NAME READY STATUS RESTARTS AGE
k8s-workload-6cdfbb6b59-zvgrl 0/1 ImagePullBackOff 0 35m
OK, that doesn't look good. There was also an ErrImagePull error earlier, but that status seems to have been replaced now.
How can I debug why an image won't start?
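(As the updates below show, the most useful source of detail turned out to be the pod's event list from kubectl describe; a sketch, using the pod name from the listing above:)

# the Events section at the bottom shows why the container is not starting
microk8s kubectl describe pod k8s-workload-6cdfbb6b59-zvgrl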
I discovered a way to list the images on a node. I have found my newly built image on the leader node:
root@arran:/home/myuser# microk8s ctr images list | grep workload
docker.io/library/k8s-workload:latest application/vnd.docker.distribution.manifest.v2+json sha256:725b...582b 103.5 MiB linux/amd64
So the image is available. I can get some logs for the failing pod, but they do not reveal anything I don't already know:
root@arran:/home/myuser# microk8s kubectl logs k8s-workload-1cdfaa6c49-zvgrl
Error from server (BadRequest): container "k8s-workload" in pod "k8s-workload-1cdfaa6c49-zvgrl" is waiting to start: trying and failing to pull image
What can I try next? No node should actually need to pull the image, as far as I know, since it is already available on every node.
Update 1
I was hesitant to add too many problems to the one question, but on balance I think they are worth adding, since they are all obstacles to obtaining one result: successfully deploying a trivial workload on K8S.
When describing the single pod within the single deployment, I noticed it showed me this error:
kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Yikes! Another thing that doesn't work out of the box. I have fixed this the MicroK8S way using this answer. It hasn't solved the problem, but at least I am knocking the roadblocks on the head, one by one.
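(For anyone hitting the same warning: the MicroK8S fix is to enable the bundled DNS addon, which I believe is what the linked answer amounts to:)

# enables CoreDNS so kubelet has a ClusterDNS IP to hand to pods
microk8s enable dns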
Update 2
I wanted to check that the side-loaded image was valid, so I did this on the leader:
root@arran:/home/myuser# docker load < k8s-workload.docker.tar
That unpacks fine:
bb01bd7e32b5: Loading layer [==================================================>] 7.618MB/7.618MB
e759f13eb8bc: Loading layer [==================================================>] 6.015MB/6.015MB
1a72c946ba2b: Loading layer [==================================================>] 12.29kB/12.29kB
9bbacedbd5e4: Loading layer [==================================================>] 6.144kB/6.144kB
53b5e1394bc2: Loading layer [==================================================>] 12.08MB/12.08MB
aff825926dad: Loading layer [==================================================>] 4.096kB/4.096kB
c76bce6229c6: Loading layer [==================================================>] 71.7MB/71.7MB
0503c7346508: Loading layer [==================================================>] 12.8kB/12.8kB
8c2f9e7d94bb: Loading layer [==================================================>] 65.54kB/65.54kB
7e0ad9ed4982: Loading layer [==================================================>] 10.97MB/10.97MB
b99f234d8751: Loading layer [==================================================>] 5.632kB/5.632kB
Loaded image: k8s-workload:latest
I then run it on the leader on a custom port (i.e. this is in Docker, not K8S):
root@arran:/home/myuser# docker run -p 9000:80 -it k8s-workload
This responds to cURL requests from another machine on the LAN, just as I would expect.
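(The check from the other machine was just a plain HTTP request against the leader's LAN address and the mapped port, something like this:)

# 192.168.50.251 is the leader; 9000 is the host port mapped above
curl http://192.168.50.251:9000/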
Update 3
It occurred to me that the "namespaced" image name might make a difference: should I be specifying docker.io/library/k8s-workload:latest rather than plain k8s-workload? I tried both, and I get the same result either way.
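For completeness, the fully-qualified attempt was along these lines (the earlier deployment has to be deleted first, since the name is reused):

# same as before, but using the name containerd reports in its image list
microk8s kubectl create deployment k8s-workload --image=docker.io/library/k8s-workload:latest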
So here is the latest error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 4m46s default-scheduler Successfully assigned default/k8s-workload-68c899df98-qhmhr to yamazaki
Normal Pulling 3m17s (x4 over 4m45s) kubelet Pulling image "k8s-workload"
Warning Failed 3m15s (x4 over 4m43s) kubelet Failed to pull image "k8s-workload": rpc error: code = NotFound desc = failed to pull and unpack image "docker.io/library/k8s-workload:latest": failed to unpack image on snapshotter overlayfs: unexpected media type text/html for sha256:e823...45c8: not found
Warning Failed 3m15s (x4 over 4m43s) kubelet Error: ErrImagePull
Warning Failed 2m52s (x6 over 4m43s) kubelet Error: ImagePullBackOff
Normal BackOff 2m37s (x7 over 4m43s) kubelet Back-off pulling image "k8s-workload"
OK, so I now have a bit more detail. What does the "failed to unpack image" error actually mean?
Update 4
A helpful answer below suggests that I might need to set a pull policy to tell K8S that the image is already available on each node and that it should not try to pull it (the image does not exist in any remote registry).
However, when taking the supplied advice, although I get a different error code (CreateContainerError), the underlying cause is the same:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 64s default-scheduler Successfully assigned default/k8s-workload to yamazaki
Normal Pulled 6s (x7 over 62s) kubelet Container image "k8s-workload" already present on machine
Warning Failed 6s (x7 over 62s) kubelet Error: failed to create containerd container: error unpacking image: unexpected media type text/html for sha256:1f2c...753e1: not found
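For reference, one way to apply that advice to the existing deployment is a patch roughly like the one below. This is a sketch: it assumes the container is named k8s-workload, which is what kubectl create deployment produces by default.

# tell kubelet never to pull; the image must already be present on the node
microk8s kubectl patch deployment k8s-workload \
  -p '{"spec":{"template":{"spec":{"containers":[{"name":"k8s-workload","imagePullPolicy":"Never"}]}}}}'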
Update 5
I have reported this as a bug for now, though I would still welcome answers here.
Update 6
On the basis that dogged persistence is mysteriously good for the soul, I tried deleting the image using the ctr subcommand. This is on a follower node:
root@yamazaki:/home/myuser# microk8s ctr images rm docker.io/library/k8s-workload:latest
docker.io/library/k8s-workload:latest
Then using the same subcommand I reimported:
root@yamazaki:/home/myuser# microk8s ctr images import k8s-workload.docker.tar
unpacking docker.io/library/k8s-workload:latest (sha256:725b...582b)...done
Since this operates at the node level and not the cluster, I did this for each of the three nodes.
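A rough sketch of that per-node repetition, assuming root SSH access to each node and that the tarball has already been copied to the same path on each one (the IPs are the three nodes from the import output earlier):

for node in 192.168.50.251 192.168.50.135 192.168.50.74; do
  # remove the possibly-corrupt record, then re-import from the local tarball
  ssh root@$node "microk8s ctr images rm docker.io/library/k8s-workload:latest"
  ssh root@$node "microk8s ctr images import /home/myuser/k8s-workload.docker.tar"
done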
I then used the run command, since this permits a pull policy to be set, and I don't want to conflate the unpack problem with a pull problem on top. This is back on the cluster leader:
root@arran:/home/myuser# microk8s kubectl run k8s-workload --image=k8s-workload --image-pull-policy='Never' --port=80
pod/k8s-workload created
I then describe the resulting pod, and get a familiar error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 36s default-scheduler Successfully assigned default/k8s-workload to yamazaki
Normal Pulled 6s (x5 over 35s) kubelet Container image "k8s-workload" already present on machine
Warning Failed 6s (x5 over 35s) kubelet Error: failed to create containerd container: error unpacking image: unexpected media type text/html for sha256:5f76...a3aa: not found
Paradoxically this is reassuring - sending an image to each node individually is a right hassle, and thus I'd want the cluster level image import to work. I suspect it will once I have got to the bottom of the unpack problem.
Update 7
Right, I have spotted something. The image tarball on all nodes has an identical checksum, as one would expect. But after import, one node reports the wrong media type for the image. The listings below are lightly reformatted for ease of comparison:
Node "Arran":
docker.io/library/k8s-workload:latest
application/vnd.docker.distribution.manifest.v2+json
sha256:725b...582b 103.5 MiB
linux/amd64
io.cri-containerd.image=managed
Node "Yamazaki":
docker.io/library/k8s-workload:latest
text/html
sha256:5f76...a3aa 218.4 KiB
-
io.cri-containerd.image=managed
Node "Nikka":
docker.io/library/k8s-workload:latest
application/vnd.docker.distribution.manifest.v2+json
sha256:725b...582b 103.5 MiB
linux/amd64
io.cri-containerd.image=managed
It looks like the workloads were consistently scheduled on Yamazaki, and that's the node with the corrupted image. Now to reimport the image there and get it to match the others...
(Related note: per https://stackoverflow.com/questions/59980445/setting-image-pull-policy-using-kubectl, kubectl run will take --image-pull-policy as a command-line argument.)

My final update hints at the problem: one node had a corrupted image. By coincidence this was the node on which K8S wanted to run the workload. All I had to do to fix this was to reimport the image locally on that node.
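For the record, these are the same per-node ctr commands already shown in Update 6, run on the corrupted node (yamazaki):

microk8s ctr images rm docker.io/library/k8s-workload:latest
microk8s ctr images import k8s-workload.docker.tar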
As per the question updates, I imported this image in two ways, both involving MicroK8S:
- microk8s images import, which does a cluster-wide import
- microk8s ctr images import, which does a per-node import
I think I can say, with a high degree of certainty, that MicroK8S or containerd corrupted the image (i.e. it cannot be blamed on scp or erroneous file handling). For the per-node import I verified the local tarball with sha256sum, and it was the same as all the others. Unfortunately I expect that this is no longer an investigable bug, given that the exact history of commands is now so complex it can be considered lost.
That said, I will try zapping the image from all containerd instances and using the cluster importer again. It is possible that the bug might be triggered again.