Scale In EKS Cluster¶
EKF supports automatic scaling operations on the Kubernetes cluster using a modified version of the Cluster Autoscaler that supports Rok volumes.
This guide will walk you through manually scaling in your EKS cluster by selecting and removing nodes one by one.
See also
- Scale In Kubernetes Cluster using rok-k8s-drain to forcefully scale your EKS cluster to a desired size.
What You’ll Need¶
A configured management environment.
An existing EKS cluster.
One or more managed or self-managed node groups.
(Optional) A working Cluster Autoscaler.
Procedure¶
List the Kubernetes nodes of your cluster:
root@rok-tools:~# kubectl get nodes
NAME                                               STATUS   ROLES    AGE   VERSION
ip-192-168-147-191.eu-central-1.compute.internal   Ready    <none>   18d   v1.23.13-eks-ba74326
ip-192-168-168-207.eu-central-1.compute.internal   Ready    <none>   18d   v1.23.13-eks-ba74326

Specify the node you want to remove:
root@rok-tools:~# export NODE=<NODE>

Replace <NODE> with the node name. For example:

root@rok-tools:~# export NODE=ip-192-168-168-207.eu-central-1.compute.internal

Note
Normally, the Cluster Autoscaler finds a scale-in candidate automatically. In order to find a good candidate manually, you have to:
- Pick an underutilized node.
- Ensure that you don’t try to scale in past the ASG’s minSize.
- Ensure that existing EBS volumes are reachable from other nodes in the cluster.
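The minSize check can be sketched as a small shell guard. This is a hypothetical helper, not part of EKF; in practice the desired size and minSize values would come from `aws autoscaling describe-auto-scaling-groups`:

```shell
# Hypothetical helper: verify that removing one node keeps the Auto
# Scaling group at or above its minSize. The numeric arguments stand
# in for values you would read from the ASG itself.
can_scale_in() {
    local desired=$1 min=$2
    # Removing one node must not drop the group below minSize.
    [ $((desired - 1)) -ge "${min}" ]
}

if can_scale_in 3 2; then echo "OK to remove one node"; else echo "already at minSize"; fi
if can_scale_in 2 2; then echo "OK to remove one node"; else echo "already at minSize"; fi
```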
Start a drain operation for the selected node:
root@rok-tools:~# kubectl drain --ignore-daemonsets --delete-local-data ${NODE?}
...
node/ip-192-168-168-207.eu-central-1.compute.internal evicted

Note
This may take a while, since Rok is unpinning all volumes on this node and, as such, rok-csi-guard pods are expected to be evicted last.

Warning
Do not delete rok-csi-guard pods manually, since this might cause data loss.

Troubleshooting
The command does not complete.
Most likely, the unpinning of a Rok PVC is failing. Inspect the logs of the Rok CSI controller to debug further.
Delete the master Rok Pod if it runs on the selected node:
Note
The Rok master Pod, which runs as part of the Rok DaemonSet, has the cluster-autoscaler.kubernetes.io/safe-to-evict: false annotation, which prevents the Cluster Autoscaler from removing the node. To allow the Cluster Autoscaler to remove the node, you need to delete this Pod so that another Rok Pod gets elected as master.

Delete the master Rok Pod:
root@rok-tools:~# kubectl get pods -n rok -l app=rok,role=master \
>    --field-selector spec.nodeName==${NODE?} -ojson \
>    | jq -r '.items[] | .metadata.name' \
>    | xargs -r kubectl delete pod -n rok
pod "rok-c2fj5" deleted

If the previous command does not produce any output, that is normal: it indicates that the Rok master Pod does not run on the selected node.
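The `-r` flag of xargs (`--no-run-if-empty`) is what makes an empty result harmless: when the selector matches no Pods, no delete command runs at all. A minimal illustration, with `echo` standing in for `kubectl delete`:

```shell
# With empty input, `xargs -r` runs nothing at all; without -r, GNU
# xargs would still invoke the command once with no arguments.
printf '' | xargs -r echo "deleting:"            # prints nothing

# With one Pod name on stdin, the command runs once with that argument.
printf 'rok-c2fj5\n' | xargs -r echo "deleting:" # deleting: rok-c2fj5
```

Note that `-r` is a GNU extension; BSD xargs skips empty input by default.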
Wait until a Rok Pod has been elected as master:
root@rok-tools:~# watch kubectl get pods -n rok -l app=rok,role=master
Every 2.0s: kubectl get pods -n rok -l app=rok,role=master    rok-tools: Tue Mar 7 14:19:35 2023

NAME        READY   STATUS    RESTARTS   AGE
rok-ghb9q   2/2     Running   0          1m

Ensure that the new Rok Pod that is created on the selected node has not been elected as master:
root@rok-tools:~# kubectl get pods -n rok -l app=rok,role=master \
>    --field-selector spec.nodeName==${NODE?} -ojson \
>    | jq -e '.items == []' >/dev/null && echo OK || echo FAIL
OK

Troubleshooting
The output of the command is FAIL.
The new Rok Pod of the selected node has been elected as master again. Repeat the previous step, i.e., delete the Rok Pod of the selected node again, to trigger a new election.
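The jq expression in the check above simply tests whether the API returned an empty Pod list; the `-e` flag makes jq's exit status reflect the result, which the `&&`/`||` chain turns into OK or FAIL. A standalone sketch of the same check against inline sample JSON:

```shell
# No master Pod on the node: the API returns an empty items list,
# `.items == []` evaluates to true, and `jq -e` exits 0.
echo '{"items": []}' | jq -e '.items == []' >/dev/null && echo OK || echo FAIL

# A non-empty list (one Pod) makes the same check exit non-zero.
echo '{"items": [{"metadata": {"name": "rok-ghb9q"}}]}' \
    | jq -e '.items == []' >/dev/null && echo OK || echo FAIL
```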
Once the drain operation completes, remove the node.
Fast Forward
Skip this step if you have a Cluster Autoscaler instance running in your cluster: it will see the drained node, consider it unneeded, and, after a period of time (based on the scale-down-unneeded-time option), automatically terminate the EC2 instance and decrement the desired size of the Auto Scaling group.

Find the EC2 instance of the drained node:
root@rok-tools:~# export INSTANCE=$(kubectl get nodes ${NODE?} \
>    -o jsonpath={.spec.providerID} \
>    | sed 's|aws:///.*/||')

Terminate the instance and decrement the desired capacity of its Auto Scaling group:
root@rok-tools:~# aws autoscaling terminate-instance-in-auto-scaling-group \
>    --instance-id ${INSTANCE?} \
>    --should-decrement-desired-capacity
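The sed expression used when setting INSTANCE extracts the EC2 instance ID from the node's providerID, which has the form aws:///&lt;availability-zone&gt;/&lt;instance-id&gt;. A standalone sketch with a sample value (the availability zone here is illustrative):

```shell
# Sample providerID, as reported in a node's .spec.providerID field.
PROVIDER_ID="aws:///eu-central-1b/i-0f992f0b02d777901"

# The greedy .* matches up to the last slash, so the substitution
# strips everything before the instance ID.
INSTANCE=$(echo "${PROVIDER_ID}" | sed 's|aws:///.*/||')
echo "${INSTANCE}"    # i-0f992f0b02d777901
```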
Verify¶
Ensure that the selected node has been removed from your Kubernetes cluster:
root@rok-tools:~# kubectl get nodes ${NODE?}
Error from server (NotFound): nodes "ip-192-168-168-207.eu-central-1.compute.internal" not found

Ensure that the underlying EC2 instance has been terminated:
root@rok-tools:~# aws ec2 describe-instances --instance-ids ${INSTANCE?}
An error occurred (InvalidInstanceID.NotFound) when calling the DescribeInstances operation: The instance ID 'i-0f992f0b02d777901' does not exist
What’s Next¶
Check out the rest of the EKS maintenance operations that you can perform on your cluster.