The pods in my application scale at one pod per user (each user gets their own pod). I have the resource requests and limits for the application container set up like so:
resources:
  limits:
    cpu: 250m
    memory: 768Mi
  requests:
    cpu: 100m
    memory: 512Mi
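For context, this block sits in a per-user Deployment along these lines (all names and the image are placeholders, not the actual spec):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: theia-deployment--example      # placeholder
spec:
  replicas: 1                          # one replica per user
  selector:
    matchLabels:
      app: theia-example
  template:
    metadata:
      labels:
        app: theia-example
    spec:
      containers:
        - name: ide                    # placeholder
          image: example/theia:latest  # placeholder
          resources:
            limits:
              cpu: 250m
              memory: 768Mi
            requests:
              cpu: 100m
              memory: 512Mi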
The nodes in my nodepool have 8GB of memory each. I started up a bunch of user instances to begin testing, and watched my resource metrics go up as I started each one:
[CPU and memory metrics charts showing utilization climbing as each instance started]
At 15:40, the event log showed this error (note: the first node is excluded using a taint):
0/2 nodes are available: 1 Insufficient memory, 1 node(s) didn't match node selector.
Why did this happen when the memory/CPU requests were still well below total capacity (~50% CPU, ~60% memory)?
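For reference, scheduling failures like this can also be listed straight from the events API; a command along these lines should work:

kubectl get events --field-selector reason=FailedScheduling --sort-by='.lastTimestamp'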
Here is some relevant info from kubectl describe node:
Non-terminated Pods:          (12 in total)
  Namespace     Name                                                               CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  ---------     ----                                                               ------------  ----------  ---------------  -------------  ---
  ide           theia-deployment--ac031811--football-6b6d54ddbb-txsd4              110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    13m
  ide           theia-deployment--ac031811--footballteam-6fb7b68794-cv4c9          110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    12m
  ide           theia-deployment--ac031811--how-to-play-football-669ddf7c8cjrzl    110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    14m
  ide           theia-deployment--ac031811--packkide-7bff98d8b6-5twkf              110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    9m54s
  ide           theia-deployment--ac032611--static-website-8569dd795d-ljsdr        110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    16m
  ide           theia-deployment--aj090111--spiderboy-6867b46c7d-ntnsb             110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    2m36s
  ide           theia-deployment--ar041311--tower-defenders-cf8c5dd58-tl4j9        110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    14m
  ide           theia-deployment--np091707--my-friends-suck-at-coding-fd48ljs7z    110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    4m14s
  ide           theia-deployment--np091707--topgaming-76b98dbd94-fgdz6             110m (5%)     350m (18%)  528Mi (9%)       832Mi (15%)    5m17s
  kube-system   csi-azurefile-node-nhbpg                                           30m (1%)      400m (21%)  60Mi (1%)        400Mi (7%)     12d
  kube-system   kube-proxy-knq65                                                   100m (5%)     0 (0%)      0 (0%)           0 (0%)         12d
  lens-metrics  node-exporter-57zp4                                                10m (0%)      200m (10%)  24Mi (0%)        100Mi (1%)     6d20h
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource                       Requests      Limits
  --------                       --------      ------
  cpu                            1130m (59%)   3750m (197%)
  memory                         4836Mi (90%)  7988Mi (148%)
  ephemeral-storage              0 (0%)        0 (0%)
  hugepages-1Gi                  0 (0%)        0 (0%)
  hugepages-2Mi                  0 (0%)        0 (0%)
  attachable-volumes-azure-disk  0             0
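(Those totals check out against the pod list: 9 user pods × 110m + 30m + 100m + 10m = 1130m of CPU requests, and 9 × 528Mi + 60Mi + 24Mi = 4836Mi of memory requests. Note each user pod requests 110m/528Mi rather than the 100m/512Mi above, presumably because of a second container in the pod.)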
According to the Kubernetes documentation:

  When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods. The scheduler ensures that, for each resource type, the sum of the resource requests of the scheduled containers is less than the capacity of the node.

More information about how Pods with resource limits are run can be found here.
Update:

It is possible to optimize resource consumption by readjusting the memory limits and by adding an eviction policy that fits your preferences. You can find more details in the Kubernetes documentation here and here.
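As a sketch of what an eviction policy can look like at the kubelet level (the threshold values below are illustrative, not recommendations; on a managed cluster like AKS, kubelet settings are surfaced through the provider's node configuration rather than edited by hand):

# KubeletConfiguration sketch -- illustrative thresholds only
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"      # hard-evict pods once free node memory drops below this
evictionSoft:
  memory.available: "500Mi"      # soft threshold, tolerated for the grace period below
evictionSoftGracePeriod:
  memory.available: "1m30s"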
Update 2:
To better understand why the scheduler refuses to place a Pod on a node, I suggest enabling resource logs in your AKS cluster. Take a look at this guide from the AKS documentation. Among the collected logs, look for the kube-scheduler logs to see more details.
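Once the resource logs are flowing into a Log Analytics workspace, a query along these lines pulls the scheduler entries (the workspace ID is a placeholder, and the AzureDiagnostics table/columns are an assumption about how your diagnostics are configured):

az monitor log-analytics query \
  --workspace <workspace-id> \
  --analytics-query "AzureDiagnostics | where Category == 'kube-scheduler' | project TimeGenerated, log_s | top 50 by TimeGenerated"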
I found out that when viewing available capacity, you need to pay attention to Allocatable, and not Capacity. From Azure support:
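The difference between the two is easy to see directly (node name is a placeholder):

kubectl get node <node-name> -o jsonpath='{.status.capacity.memory}{"  "}{.status.allocatable.memory}{"\n"}'

kubectl describe node computes its percentages against Allocatable, not Capacity, which resolves the discrepancy above: 4836Mi of memory requests is only ~59% of the raw 8192Mi capacity (the ~60% figure in the question), but 90% of what the scheduler can actually hand out once system reservations are taken off the top. That leaves too little headroom for the next pod's 528Mi request, hence the Insufficient memory event.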