About a month ago I installed the latest version of Docker on Debian 12 on a VMware ESXi 8.0U2 virtual machine (latest vSphere version).
I tried to get VNC running but other priorities took over, so for now I'm just accessing the VM via VMware Remote Console, using Gnome.
It has the latest version of VMware Tools installed:
$ sudo apt-get install open-vm-tools open-vm-tools-desktop
open-vm-tools is already the newest version (2:12.2.0-1+deb12u2).
open-vm-tools-desktop is already the newest version (2:12.2.0-1+deb12u2).
There is only ever one container running at a time. I regularly run docker image prune -f and docker system prune.
Whenever the VM is running "on its own" (without me connected to it via Remote Console on Gnome), it runs like a champ, and never ever goes down.
Whenever I am accessing the VM and doing stuff, it randomly hangs, and my "fix" is to reboot the VM. Trying to access the nginx server running in the container from another machine while it is hanging results in our reverse proxy returning Error 503 Service Unavailable - No server is available to handle this request.
The entire Debian Docker host VM locks up when this happens: the screen freezes and I can't click or type. I am not doing anything demanding. I am usually just editing a file in nano in the Terminal or something.
Sometimes I'm doing nothing whatsoever on the VM, just connected to it via VMware Remote Console, and it will still go down. The interval is random: I usually get an hour or two before it crashes, sometimes a few hours, and lately less.
If I wait and don't reboot the VM, Remote Console does eventually come back to life after 5-10 minutes: the screen unfreezes and I can give input again. The Docker daemon dies. I am not using Docker Desktop. It is installed, but it is not set to start automatically on boot/login.
Sometimes there are no containers running when it crashes. The container that I do use on there is a very simple webserver that only I access.
Sometimes (but not always) even issuing a Reboot to the VM from within VMware Remote Console while the VM is hanging won't work the 1st time (but it usually works the 2nd time):
Within vSphere when the VM is hanging I am [sometimes?] seeing the CPU sustaining 100% usage! I have reduced the CPU cores from 4 to 2 as it was causing very high CPU consumption on the VMware host server with a 6c 12t Xeon E5-1650V3. The VM has 8GB RAM.
I have no problems with any of my other (Linux, FreeBSD & Windows) non-Docker VMs.
Where do I begin trying to troubleshoot this please?
My feeling is that this is either VMware Tools or Gnome related. Whenever I'm running a Dockerfile build (but also whilst doing nothing sometimes), I have noticed that these processes often consume very high CPU:
qemu-system-x86_64
gnome-shell
docker-scout
com.docker.backend
Often the build will complete successfully, but those processes still remain using high CPU for a little while afterwards - sometimes causing a crash (when nothing is building any more).
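One way to start narrowing this down would be to log the top CPU consumers at regular intervals, so there is a record from just before a hang. This is only a minimal sketch, assuming the procps version of ps that Debian ships. The function name top_cpu and the log path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# top_cpu is a hypothetical helper name, not part of any tool.
# Prints a timestamp, then the N processes using the most CPU (default 5).
top_cpu() {
    n="${1:-5}"
    date '+%Y-%m-%d %H:%M:%S'
    # procps ps: sort descending by CPU usage, suppress the header row,
    # then keep only the first n rows.
    ps -eo pid,pcpu,pmem,comm --sort=-pcpu --no-headers | head -n "$n"
}

# Example usage (assumed log path), sampling every 30 seconds:
#   while true; do top_cpu >> /var/tmp/cpu-top.log; sleep 30; done
```

Leaving the loop running in an SSH session or a tmux window, rather than in the Gnome terminal, means the log keeps growing even if the desktop itself freezes.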
I am not averse to trying something on a brand new install of Debian, but I'm not convinced it will help if I change nothing else in my setup.
OP here.
Our setup is: physical box runs ESXi > has a VM for Debian/Docker > has a container for nginx.
I originally misunderstood @GeraldSchneider's comment - I thought he meant to uninstall Gnome Desktop (maybe he did too), but in fact uninstalling Docker Desktop makes total sense now, in hindsight.
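For anyone in the same situation, a quick check for whether Docker Desktop is actually running on the host, even when you believe it isn't, could look like the sketch below. The process names are the ones I listed in the question. The function name desktop_procs is made up, and pgrep -f is just one way of matching them.

```shell
#!/bin/sh
# desktop_procs is a hypothetical helper name.
# Reports which of the processes named in the question are currently running.
desktop_procs() {
    found=0
    for name in com.docker.backend qemu-system-x86_64 docker-scout; do
        # pgrep -f matches against the full command line of each process
        if pgrep -f "$name" >/dev/null 2>&1; then
            echo "running: $name"
            found=1
        fi
    done
    [ "$found" -eq 1 ] || echo "no Docker Desktop processes found"
}

# If Docker Desktop was installed from Docker's .deb package, the package
# name should be docker-desktop (an assumption; check with dpkg -l first):
#   sudo apt-get remove docker-desktop
```

Note that pgrep -f can also match a shell whose own command line contains one of these names, so treat a hit as a hint to look closer rather than as proof.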
Thanks go to @AB's comments for pointing out that:
I abandoned the original install and reinstalled Debian from scratch, following these guides:
I also switched to KDE, now that Gnome-related things aren't needed by Docker Desktop. Only time will tell, but it seems to be a lot snappier in response times and no crashes (yet)!! 🥳
See also (but do NOT follow the guides):