Recover Pods From Out of Space Errors¶
Starting from release 1.5, Rok ships with Rok Scheduler, a custom extension of the Kubernetes scheduler that supports capacity aware scheduling.
The Rok Scheduler schedules Pods to nodes with sufficient free space to provision their new volumes. However, it does not yet support capacity aware scheduling for unpinned volumes. As a consequence, when a user drains a node and Rok unpins its volumes, the scheduler may place the Pods of the drained node on nodes without enough storage available for their Rok volumes.
Scheduling a Pod on a node with insufficient space for its unpinned volumes results in the Pod getting stuck in the Init state, because there is not enough free space for Rok to recover the volumes.
This guide will walk you through recovering such Pods and migrating their volumes to new nodes.
What You’ll Need¶
- A configured management environment.
- An existing Rok deployment.
Check Your Environment¶
Check if a Pod using one or more Rok PVCs is stuck in the Init state:

root@rok-tools:~# kubectl get pods -n personal-user
NAME              READY   STATUS     RESTARTS   AGE
test-notebook-0   0/2     Init:0/1   0          60s

Specify the name of the Pod:
root@rok-tools:~# export POD=<POD_NAME>

Replace <POD_NAME> with the name of the Pod, for example:

root@rok-tools:~# export POD=test-notebook-0

Specify the Pod’s namespace:
root@rok-tools:~# export NAMESPACE=<POD_NAMESPACE>

Replace <POD_NAMESPACE> with the namespace of the Pod, for example:

root@rok-tools:~# export NAMESPACE=personal-user

List all Rok PVCs used by the Pod, along with their access mode:
root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
>     | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
>     | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
>     | jq -r 'select(.spec.storageClassName=="rok") | .metadata.name,.spec.accessModes[0]' \
>     | paste - -
test-notebook-datavol-1-6grlb   ReadWriteMany
test-notebook-workspace-65qvh   ReadWriteOnce

Here `paste - -' joins each pair of successive lines, so every PVC ends up on one row next to its access mode.

Verify that the Pod is stuck because the node does not have enough free space to restore the Pod’s volumes. For each PVC listed in the output of the previous step:
Specify the name of the PVC:
root@rok-tools:~# export PVC=<PVC_NAME>

Replace <PVC_NAME> with the name of the PVC, for example:

root@rok-tools:~# export PVC=test-notebook-datavol-1-6grlb

Describe the PVC. If the node doesn’t have enough free space to restore the volume, you will see events like the following:
root@rok-tools:~# kubectl describe pvc -n ${NAMESPACE:?} ${PVC:?}
Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Warning  JobFailed  2m9s  rok-csi  Job Failed: Insufficient free space: 318901321728 bytes required, but only 258708865024 bytes available: Command `<ExtCommand [2mwNle-iCr8] `lvcreate -n roklvm-9e0e95a6-6d3d-4086-bf4f-b92746d2044c-data -L 318901321728B rokvg --wipesignatures n --config "devices { global_filter = [ 'r|^/dev/.*roklvm.*|' ] }"', status=FINISHED (ret: 5), PID=7360, shell=False>' failed. Error log: Volume group "rokvg" has insufficient free space (61681 extents): 76032 required.\n: Run `kubectl logs -n rok rok-csi-node-pmv4z -c csi-node' for more information

Note the name of each PVC that has failed and its access mode, as you are going to use them later.
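The byte and extent figures in the JobFailed event are consistent with LVM’s default extent size of 4 MiB. As a quick sanity check, assuming the default extent size (a differently configured volume group would change these numbers):

```shell
#!/bin/bash
# Cross-check the figures from the JobFailed event above, assuming
# the LVM default extent size of 4 MiB (4194304 bytes).
EXTENT_SIZE=4194304

REQUIRED_BYTES=318901321728    # "bytes required" from the event
AVAILABLE_BYTES=258708865024   # "bytes available" from the event

echo "required extents:  $((REQUIRED_BYTES / EXTENT_SIZE))"   # 76032
echo "available extents: $((AVAILABLE_BYTES / EXTENT_SIZE))"  # 61681
```

These match the "(61681 extents): 76032 required" figures that lvcreate reports, i.e. the node is short by 14351 extents, roughly 56 GiB.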
Procedure¶
For each failed RWX (ReadWriteMany) PVC you noted earlier, do the following. If there are no affected RWX PVCs, skip this step.
Specify the name of the RWX PVC:
root@rok-tools:~# export RWX_PVC=<RWX_PVC_NAME>Replace
<RWX_PVC_NAME>
with the name of the RWX PVC, for example:root@rok-tools:~# export RWX_PVC=test-notebook-datavol-1-6grlbGet the name of the RWX PV:
root@rok-tools:~# export RWX_PV=$(kubectl get pvc \
>     -n ${NAMESPACE:?} ${RWX_PVC:?} -ojson \
>     | jq -r '.spec.volumeName')

Get the name of the RWO PV backing the RWX volume:
root@rok-tools:~# export ACCESS_SERVER_PV=$(kubectl get pvc \
>     -n rok vol-rok-access-${RWX_PV:?}-0 -ojson \
>     | jq -r '.spec.volumeName')

Initialize an empty array to store the cordoned nodes:
root@rok-tools:~# NODES=()

Find the node where the volume lives:
root@rok-tools:~# export NODE=$(kubectl get pv ${ACCESS_SERVER_PV:?} -ojson \
>     | jq -r '.spec.nodeAffinity.required.nodeSelectorTerms[]?.matchExpressions[].values[]')

Append the node to the array of cordoned nodes:
root@rok-tools:~# NODES+=(${NODE:?})

Cordon the node:
root@rok-tools:~# kubectl cordon ${NODE:?}
node/ip-192-168-173-13.eu-central-1.compute.internal cordoned

Delete the access-server Pod:
root@rok-tools:~# kubectl delete pods -n rok rok-access-${RWX_PV:?}-0
pod "rok-access-pvc-4d7b0f3d-b9da-49af-a089-a468e912d531-0" deleted

Wait until Rok moves the volume to a new node. Describe the PVC and wait until you see the following events:
root@rok-tools:~# watch "kubectl describe pvc -n ${NAMESPACE:?} ${RWX_PVC:?} | tail"
Every 2.0s: kubectl describe pvc -n personal-user test-notebook-datavol-1-6grlb | tail    rok-tools: Thu Jun 23 11:07:17 2022
...
  Normal  INFO  47s  rok-csi  Successfully pinned PVC `vol-rok-access-pvc-4d7b0f3d-b9da-49af-a089-a468e912d531-0' (PV `pvc-1bf297c4-e8b4-4ef6-890f-bbed465dd565') to node `ip-192-168-144-56.eu-central-1.compute.internal'
...
  Normal  INFO  45s  rok-csi  Successfully recovered volume `pvc-1bf297c4-e8b4-4ef6-890f-bbed465dd565'
...

Troubleshooting
The new node has insufficient free space.
If the new node Kubernetes picks for the volume doesn’t have enough free space either, you will see the following events in the output of kubectl describe:

Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Warning  JobFailed  2m9s  rok-csi  Job Failed: Insufficient free space: 318901321728 bytes required, but only 258708865024 bytes available: Command `<ExtCommand [2mwNle-iCr8] `lvcreate -n roklvm-9e0e95a6-6d3d-4086-bf4f-b92746d2044c-data -L 318901321728B rokvg --wipesignatures n --config "devices { global_filter = [ 'r|^/dev/.*roklvm.*|' ] }"', status=FINISHED (ret: 5), PID=7360, shell=False>' failed. Error log: Volume group "rokvg" has insufficient free space (61681 extents): 76032 required.\n: Run `kubectl logs -n rok rok-csi-node-pmv4z -c csi-node' for more information

In this case, go back to step 1e and repeat the steps for the new node.
Note
You may need to repeat these steps more than once, until the new node has sufficient free space for the Pod’s volume.
No more nodes available in the cluster.
If you have cordoned all nodes in the cluster, meaning that none of the existing nodes has sufficient free space available for the Pod’s volume, you will see the following events when describing the access-server Pod:

root@rok-tools:~# kubectl describe pod -n rok rok-access-${RWX_PV:?}-0
Events:
  Type     Reason             Age  From                Message
  ----     ------             ---- ----                -------
  Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
  Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
  Normal   NotTriggerScaleUp  30s  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached

In this case, you have to manually scale up your cluster and add one or more nodes with enough storage capacity to accommodate the Pod’s volume, or increase the maximum size of the appropriate node group, so the Cluster Autoscaler can scale up the cluster automatically. For EKS, check out the Scale Out EKS Cluster guide.
Uncordon the nodes:
root@rok-tools:~# kubectl uncordon ${NODES[@]:?}
node/ip-192-168-173-13.eu-central-1.compute.internal uncordoned
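The cordon bookkeeping above relies on standard bash array operations: every cordoned node is appended to NODES, so a single uncordon at the end undoes all of them. A minimal standalone sketch of the pattern, with placeholder node names:

```shell
#!/bin/bash
# Accumulate every node you cordon, so one final
# `kubectl uncordon ${NODES[@]}' can undo all of them.
# Node names below are placeholders.
NODES=()                  # start with no cordoned nodes
NODES+=(node-a.example)   # after cordoning the first node
NODES+=(node-b.example)   # after cordoning another one

# ${NODES[@]} expands to one word per node, which is exactly
# the argument list `kubectl uncordon' expects:
echo kubectl uncordon "${NODES[@]}"
```

This is why the guide initializes NODES=() once per procedure and appends after every cordon, even when only one node ends up in the array.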
For affected RWO (ReadWriteOnce) PVCs, do the following. If there are no affected RWO PVCs, skip this step.
List all RWO Rok PVCs used by the Pod:
root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
>     | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
>     | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
>     | jq -r 'select(.spec.storageClassName=="rok" and .spec.accessModes[0]=="ReadWriteOnce") | .metadata.name'
test-notebook-workspace-65qvh

Initialize an empty array to store the cordoned nodes:
root@rok-tools:~# NODES=()

Pick an affected RWO PVC and specify its name:
root@rok-tools:~# export RWO_PVC=<RWO_PVC_NAME>

Since all RWO PVCs live on the same node, it doesn’t matter which one you pick. Replace <RWO_PVC_NAME> with the name of the RWO PVC, for example:

root@rok-tools:~# export RWO_PVC=test-notebook-workspace-65qvh

Get the name of the RWO PV:
root@rok-tools:~# export RWO_PV=$(kubectl get pvc \
>     -n ${NAMESPACE:?} ${RWO_PVC:?} -ojson \
>     | jq -r '.spec.volumeName')

Find the node where the volume lives:
root@rok-tools:~# export NODE=$(kubectl get pv ${RWO_PV:?} -ojson \
>     | jq -r '.spec.nodeAffinity.required.nodeSelectorTerms[]?.matchExpressions[].values[]')

Append the node to the array of cordoned nodes:
root@rok-tools:~# NODES+=(${NODE:?})

Cordon the node:
root@rok-tools:~# kubectl cordon ${NODE:?}
node/ip-192-168-151-238.eu-central-1.compute.internal cordoned

Delete all Pods using the same RWO PVCs as the affected Pod. This will allow Rok to move the volumes to a new node:
root@rok-tools:~# kubectl get pods -n ${NAMESPACE:?} ${POD:?} -ojson \
>     | jq -r '.spec.volumes[]?.persistentVolumeClaim.claimName | values' \
>     | xargs -r -n1 kubectl get pvc -n ${NAMESPACE:?} -ojson \
>     | jq -r 'select(.spec.storageClassName=="rok") | select(.spec.accessModes[]?=="ReadWriteOnce") | .metadata.name' \
>     | while read pvc; do kubectl get pods -n ${NAMESPACE:?} -ojson \
>     | jq -r --arg PVC "${pvc:?}" \
>     '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName==$PVC) | .metadata.name' \
>     | xargs -r -n1 kubectl delete pods -n ${NAMESPACE:?}; done
pod "pvc-viewer-test-notebook-workspace-65qvh" deleted
pod "test-notebook-0" deleted

After deleting the Pod, its name might change. This can happen, for example, if the Pod is managed by a Deployment. In such a case, specify the name of the Pod again:

root@rok-tools:~# export POD=<POD_NAME>

Replace <POD_NAME> with the new name of the Pod.

For each RWO PVC listed in the output of step 2a:
Specify the name of the RWO PVC:
root@rok-tools:~# export RWO_PVC=<RWO_PVC_NAME>Replace
<RWO_PVC_NAME>
with the name of the RWO PVC, for example:root@rok-tools:~# export RWO_PVC=test-notebook-workspace-65qvhWait until Rok moves the volume to a new node. Describe the PVC and wait until you see the following events:
root@rok-tools:~# watch "kubectl describe pvc -n ${NAMESPACE:?} ${RWO_PVC:?} | tail"
Every 2.0s: kubectl describe pvc -n personal-user test-notebook-workspace-65qvh | tail    rok-tools: Thu Jun 23 11:07:17 2022
...
  Normal  INFO  47s  rok-csi  Successfully pinned PVC `test-notebook-workspace-65qvh' (PV `pvc-9bf4843e-1630-403f-a418-cd279abc9813') to node `ip-192-168-144-56.eu-central-1.compute.internal'
...
  Normal  INFO  45s  rok-csi  Successfully recovered volume `pvc-9bf4843e-1630-403f-a418-cd279abc9813'
...

Troubleshooting
The new node has insufficient free space.
If the new node Kubernetes picks for the volume doesn’t have enough free space either, you will see the following events in the output of kubectl describe:

Events:
  Type     Reason     Age   From     Message
  ----     ------     ----  ----     -------
  Warning  JobFailed  2m9s  rok-csi  Job Failed: Insufficient free space: 318901321728 bytes required, but only 258708865024 bytes available: Command `<ExtCommand [2mwNle-iCr8] `lvcreate -n roklvm-9e0e95a6-6d3d-4086-bf4f-b92746d2044c-data -L 318901321728B rokvg --wipesignatures n --config "devices { global_filter = [ 'r|^/dev/.*roklvm.*|' ] }"', status=FINISHED (ret: 5), PID=7360, shell=False>' failed. Error log: Volume group "rokvg" has insufficient free space (61681 extents): 76032 required.\n: Run `kubectl logs -n rok rok-csi-node-pmv4z -c csi-node' for more information

In this case, go back to step 2c, and repeat the steps for the new node.
Note
You may need to repeat these steps more than once, until the new node has sufficient free space for the Pod’s volume(s).
No more nodes available in the cluster.
If you have cordoned all nodes in the cluster, meaning that none of the existing nodes has sufficient free space available for the Pod’s volume(s), you will see the following events when describing the Pod:

root@rok-tools:~# kubectl describe pod -n ${NAMESPACE:?} ${POD:?}
Events:
  Type     Reason             Age  From                Message
  ----     ------             ---- ----                -------
  Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
  Warning  FailedScheduling   35s  rok-scheduler       0/2 nodes are available: 2 node(s) were unschedulable.
  Normal   NotTriggerScaleUp  30s  cluster-autoscaler  pod didn't trigger scale-up (it wouldn't fit if a new node is added): 1 max node group size reached

In this case, you have to manually scale up your cluster and add one or more nodes with enough storage capacity to accommodate the Pod’s volume(s), or increase the maximum size of the appropriate node group, so the Cluster Autoscaler can scale up the cluster automatically. For EKS, check out the Scale Out EKS Cluster guide.
Uncordon the nodes:
root@rok-tools:~# kubectl uncordon ${NODES[@]:?}
node/ip-192-168-151-238.eu-central-1.compute.internal uncordoned
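The nested pipeline used above to delete every Pod sharing the affected RWO PVCs can be hard to parse at a glance. Here is a sketch of its control flow with the kubectl/jq lookups replaced by a stub, so no cluster is needed; all names are illustrative placeholders:

```shell
#!/bin/bash
# Control flow of the Pod-deletion pipeline, kubectl stubbed out.
# Outer loop: one iteration per affected RWO PVC.
# Inner step: find every Pod mounting that PVC, delete each one.

# Stub standing in for `kubectl get pods ... | jq ...', which
# returns every Pod that mounts the given PVC (placeholder names).
pods_using_pvc() {
    printf '%s\n' "pvc-viewer-$1" "test-notebook-0"
}

printf '%s\n' "test-notebook-workspace-65qvh" \
    | while read -r pvc; do
          # stands in for `... | xargs -r -n1 kubectl delete pods'
          pods_using_pvc "$pvc" | xargs -r -n1 echo would delete pod
      done
```

Running the sketch prints one "would delete pod ..." line per Pod, mirroring the two "deleted" lines in the real output: both the pvc-viewer Pod and the notebook Pod mount the same workspace PVC, so both must go before Rok can move the volume.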
Verify¶
Wait until the Pod is up and running:
root@rok-tools:~# watch kubectl get pod -n ${NAMESPACE:?} ${POD:?}
Every 2.0s: kubectl get pod -n personal-user test-notebook-0    rok-tools: Thu Jun 23 11:05:52 2022

NAME              READY   STATUS    RESTARTS   AGE
test-notebook-0   2/2     Running   0          8m31s
Summary¶
You have successfully recovered the stuck Pod and migrated its volumes to nodes with enough space to accommodate them.
What’s Next¶
Check out the rest of the maintenance operations that you can perform on your cluster.