In a k3s cluster (with multiple control-plane nodes) with Rancher Longhorn installed, I am observing the following warning for pods that have a PVC using the default longhorn storage class (output of kubectl get events, wrapped for better readability):
LAST SEEN TYPE REASON OBJECT
113s Warning FailedMount pod/grafana-6756f6587b-rv2xj
MESSAGE
MountVolume.SetUp failed for volume "pvc-7b6d12e3-132d-4af1-99c0-920ac5af0687" :
rpc error:
code = Aborted
desc = no Pending workload pods for volume pvc-7b6d12e3-132d-4af1-99c0-920ac5af0687
to be mounted: map[Running:[grafana-6756f6587b-rv2xj]]
What does this recurring warning mean, and what action is required to fix it?
As far as I can tell from inside the pod's container, the volume has been mounted. I have already tried restarting and recreating the pod, but the warning persists. The Longhorn dashboard also does not show any problems.
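For reference, this is roughly how I checked the mount and the claim status; it is a minimal sketch using the Deployment, PVC, and PV names from the event above and the manifests below (adjust the namespace and names if yours differ):

# Confirm the volume is mounted at the expected path inside the container
kubectl -n monitoring exec deploy/grafana -- df -h /var/lib/grafana

# Both the claim and the bound volume should report "Bound"
kubectl -n monitoring get pvc grafana-pvc
kubectl get pv pvc-7b6d12e3-132d-4af1-99c0-920ac5af0687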
See below for the PVC and Deployment resource definitions:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  storageClassName: longhorn
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
  labels:
    app: grafana
spec:
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      securityContext:
        fsGroup: 472
        supplementalGroups:
          - 0
      containers:
        - name: grafana
          image: grafana/grafana:latest
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 3000
              name: http-grafana
              protocol: TCP
          readinessProbe:
            failureThreshold: 3
            httpGet:
              path: /robots.txt
              port: 3000
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 30
            successThreshold: 1
            timeoutSeconds: 2
          livenessProbe:
            failureThreshold: 3
            initialDelaySeconds: 30
            periodSeconds: 10
            successThreshold: 1
            tcpSocket:
              port: 3000
            timeoutSeconds: 1
          resources:
            requests:
              cpu: 250m
              memory: 750Mi
          volumeMounts:
            - mountPath: /var/lib/grafana
              name: grafana-pv
      volumes:
        - name: grafana-pv
          persistentVolumeClaim:
            claimName: grafana-pvc
This is not expected behavior; it is a bug in recent versions of Longhorn, triggered by a kubelet restart (in our case, a rolling deployment of RKE2). A fix is on its way: https://github.com/longhorn/longhorn/issues/8072
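To check whether your cluster is already on a Longhorn release that contains the fix, you can read the version off the manager image tag; this sketch assumes Longhorn is installed in its default longhorn-system namespace:

# Print the longhorn-manager image; its tag matches the installed Longhorn release
kubectl -n longhorn-system get daemonset longhorn-manager \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'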