Scale Up Rok etcd
This guide walks you through increasing the size of your Rok etcd cluster by one additional member. To add more members, simply run through the guide again.
Important
To withstand a node failure, use a cluster with at least three members. We highly recommend an odd number of members, and no more than seven.
See also
- Official guide on etcd clustering.
- etcd learner design.
What You’ll Need
- A configured management environment.
- Your clone of the Arrikto GitOps repository.
- An existing Kubernetes cluster.
- An existing Rok deployment.
Check Your Environment
Retrieve the endpoints of all etcd cluster members:
```
root@rok-tools:~/ops/deployments# export ETCD_ENDPOINTS=$(kubectl \
>     exec -ti -n rok sts/rok-etcd -- etcdctl member list -w json \
>     | jq -r '.members[].clientURLs[]' | paste -sd, -)
```

Ensure that the etcd cluster is currently healthy. Inspect the etcd endpoints and verify that the HEALTH field is true for all endpoints:
```
root@rok-tools:~/# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl --endpoints ${ETCD_ENDPOINTS?} endpoint health -w table
+--------------------------------------+--------+------------+-------+
|               ENDPOINT               | HEALTH |    TOOK    | ERROR |
+--------------------------------------+--------+------------+-------+
| rok-etcd-0.rok-etcd-cluster.rok:2379 |  true  | 9.302141ms |       |
| rok-etcd-1.rok-etcd-cluster.rok:2379 |  true  | 9.302141ms |       |
+--------------------------------------+--------+------------+-------+
```
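The endpoint-collection pipeline above can be exercised locally. The sketch below runs the same jq filter and paste join on canned, trimmed `member list -w json` output (not live data), assuming jq is installed:

```shell
# Sketch: how ETCD_ENDPOINTS is assembled. jq prints one client URL
# per line, and paste joins the lines into the comma-separated list
# that etcdctl's --endpoints flag expects.
# JSON below is canned example data mirroring this guide's cluster.
JSON='{"members":[
  {"clientURLs":["http://rok-etcd-0.rok-etcd-cluster.rok:2379"]},
  {"clientURLs":["http://rok-etcd-1.rok-etcd-cluster.rok:2379"]}]}'
ETCD_ENDPOINTS=$(echo "$JSON" | jq -r '.members[].clientURLs[]' | paste -sd, -)
echo "$ETCD_ENDPOINTS"
```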
Procedure
Go to your GitOps repository, inside your rok-tools management environment:

```
root@rok-tools:~# cd ~/ops/deployments
```

Retrieve the current size of the etcd cluster:
```
root@rok-tools:~/ops/deployments# export ETCD_CLUSTER_SIZE=$(kubectl get sts \
>     -n rok rok-etcd -o jsonpath="{.spec.replicas}") \
>     && echo ${ETCD_CLUSTER_SIZE?}
2
```

Set the name of the new etcd member:
```
root@rok-tools:~/ops/deployments# export \
>     NAME=rok-etcd-${ETCD_CLUSTER_SIZE?}.rok-etcd-cluster.rok
```

Set the URL of the new etcd member:
```
root@rok-tools:~/ops/deployments# export \
>     PEER_URL=http://rok-etcd-${ETCD_CLUSTER_SIZE?}.rok-etcd-cluster.rok:2380
```

Add a new member to the etcd cluster:
```
root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl member add --learner ${NAME?} --peer-urls ${PEER_URL?}
Member 49a1544e41ae84e4 added to cluster 844c2991de84c0b
ETCD_NAME="rok-etcd-2.rok-etcd-cluster.rok"
ETCD_INITIAL_CLUSTER="rok-etcd-2.rok-etcd-cluster.rok=http://rok-etcd-2.rok-etcd-cluster.rok:2380,rok-etcd-1.rok-etcd-cluster.rok=http://rok-etcd-1.rok-etcd-cluster.rok:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="http://rok-etcd-2.rok-etcd-cluster.rok:2380"
ETCD_INITIAL_CLUSTER_STATE="existing"
```

Troubleshooting
Error: etcdserver: unhealthy cluster
There are cases, mostly due to a network hiccup, where an existing member rejoins the cluster, for example after a Pod restart, and other members end up considering it inactive. In such a case, member add fails with:
```
{"level":"warn","ts":"2022-09-23T09:52:00.805Z","logger":"etcd-client","caller":"v3/retry_interceptor.go:62","msg":"retrying of unary invoker failed","target":"etcd-endpoints://0xc000458a80/127.0.0.1:2379","attempt":0,"error":"rpc error: code = Unavailable desc = etcdserver: unhealthy cluster"}
Error: etcdserver: unhealthy cluster
```

At the same time, the etcd cluster remains operational and clients are able to access it and make read/write requests.
To recover, follow the steps below:
Retrieve the endpoints of all etcd cluster members:
```
root@rok-tools:~/ops/deployments# export ETCD_ENDPOINTS=$(kubectl \
>     exec -ti -n rok sts/rok-etcd -- etcdctl member list -w json \
>     | jq -r '.members[].clientURLs[]' | paste -sd, -)
```

Ensure that the etcd cluster is currently healthy. Inspect the etcd endpoints and verify that the HEALTH field is true for all endpoints:
```
root@rok-tools:~/# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl --endpoints ${ETCD_ENDPOINTS?} endpoint health -w table
+--------------------------------------+--------+------------+-------+
|               ENDPOINT               | HEALTH |    TOOK    | ERROR |
+--------------------------------------+--------+------------+-------+
| rok-etcd-0.rok-etcd-cluster.rok:2379 |  true  | 9.302141ms |       |
| rok-etcd-1.rok-etcd-cluster.rok:2379 |  true  | 9.325642ms |       |
+--------------------------------------+--------+------------+-------+
```

Restart the etcd Pods:
```
root@rok-tools:~/# kubectl delete pods -n rok -l app=etcd
```

Rerun the command to add a member to the etcd cluster.
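The naming scheme used when adding the member follows the StatefulSet's zero-based Pod indices: with a current size of N, the next Pod is rok-etcd-N. A minimal sketch, using the example values from this guide:

```shell
# Sketch: deriving the new member's name and peer URL from the
# current StatefulSet size. Pod indices are zero-based, so a cluster
# of size 2 has rok-etcd-0 and rok-etcd-1, and the next member is
# rok-etcd-2. Values mirror the example output above.
ETCD_CLUSTER_SIZE=2
NAME=rok-etcd-${ETCD_CLUSTER_SIZE}.rok-etcd-cluster.rok
PEER_URL=http://rok-etcd-${ETCD_CLUSTER_SIZE}.rok-etcd-cluster.rok:2380
echo "$NAME"
echo "$PEER_URL"
```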
Increase the etcd cluster size:
```
root@rok-tools:~/ops/deployments# let ETCD_CLUSTER_SIZE++
```

Render the patch for the cluster size:
```
root@rok-tools:~/ops/deployments# j2 \
>     rok/rok-external-services/etcd/overlays/deploy/patches/cluster-size.yaml.j2 \
>     -o rok/rok-external-services/etcd/overlays/deploy/patches/cluster-size.yaml
```

Set the cluster state:
```
root@rok-tools:~/ops/deployments# export ETCD_CLUSTER_STATE=existing
```

Render the patch for the cluster state:
```
root@rok-tools:~/ops/deployments# j2 \
>     rok/rok-external-services/etcd/overlays/deploy/patches/cluster-state.yaml.j2 \
>     -o rok/rok-external-services/etcd/overlays/deploy/patches/cluster-state.yaml
```

Edit rok/rok-external-services/etcd/overlays/deploy/kustomization.yaml and ensure that both cluster-size and cluster-state patches are enabled:

```
patches:
- path: patches/cluster-size.yaml
  target:
    kind: StatefulSet
    name: etcd
- path: patches/cluster-state.yaml
```

Commit your changes:
```
root@rok-tools:~/ops/deployments# git commit -am "Scale Rok etcd to ${ETCD_CLUSTER_SIZE?} members"
```

Apply the kustomization:
```
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/etcd/overlays/deploy
```

Wait for a few minutes to give the new member a chance to join the cluster, then retrieve its member ID. Ensure the following command outputs SUCCESS:
```
root@rok-tools:~/ops/deployments# export ID=$(kubectl \
>     exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl member list -w json --hex \
>     | jq -r '.members[] | select(.name == "'${NAME?}'") | .ID') \
>     && [[ -z "${ID?}" ]] && echo ERROR || echo SUCCESS
SUCCESS
```

Troubleshooting
The command output is ERROR
If the new member has not yet managed to join the cluster, its name will be empty and the above command will output ERROR. In this case, wait a few minutes to allow the new member to start and join the cluster, and try again.
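The jq selection used in the member-ID check can be tried locally. The sketch below applies the same filter to canned, trimmed `member list -w json --hex` output (hypothetical data mirroring this guide's examples, not a live query):

```shell
# Sketch: extracting a member's hex ID by name. If the member has not
# joined yet, select() matches nothing, ID stays empty, and the check
# prints ERROR; otherwise it prints SUCCESS.
# Canned example document with only the fields the filter touches.
JSON='{"members":[
  {"ID":"b2ff88bb2eae13b7","name":"rok-etcd-0.rok-etcd-cluster.rok"},
  {"ID":"49a1544e41ae84e4","name":"rok-etcd-2.rok-etcd-cluster.rok"}]}'
NAME=rok-etcd-2.rok-etcd-cluster.rok
ID=$(echo "$JSON" | jq -r '.members[] | select(.name == "'${NAME}'") | .ID')
[ -z "$ID" ] && echo ERROR || echo SUCCESS
```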
Promote the new member to a voting member:
```
root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl member promote ${ID?}
Member 49a1544e41ae84e4 promoted in cluster 4c194b295a903d33
```

Troubleshooting
The member is not in sync with the leader
If the above command fails with the following error:
```
Error: etcdserver: can only promote a learner member which is in sync with leader
```

it means that you tried to promote the new member before it managed to catch up with the cluster. In this case, wait a few more minutes and try again.
Verify
Ensure that all Rok etcd Pods are ready. Verify that field READY is 2/2 and field STATUS is Running for all Pods:
```
root@rok-tools:~/ops/deployments# kubectl get pods -n rok -l app=etcd
NAME         READY   STATUS    RESTARTS   AGE
rok-etcd-0   2/2     Running   0          2d22h
rok-etcd-1   2/2     Running   0          2d22h
rok-etcd-2   2/2     Running   0          2d22h
```

Retrieve the endpoints of all etcd cluster members:
```
root@rok-tools:~/ops/deployments# export ETCD_ENDPOINTS=$(kubectl \
>     exec -ti -n rok sts/rok-etcd -- etcdctl member list -w json \
>     | jq -r '.members[].clientURLs[]' | paste -sd, -)
```

Ensure that the etcd cluster is currently healthy. Inspect the etcd endpoints and verify that the HEALTH field is true for all endpoints:
```
root@rok-tools:~/# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl --endpoints ${ETCD_ENDPOINTS?} endpoint health -w table
+--------------------------------------+--------+------------+-------+
|               ENDPOINT               | HEALTH |    TOOK    | ERROR |
+--------------------------------------+--------+------------+-------+
| rok-etcd-0.rok-etcd-cluster.rok:2379 |  true  | 9.302141ms |       |
| rok-etcd-1.rok-etcd-cluster.rok:2379 |  true  | 9.325642ms |       |
| rok-etcd-2.rok-etcd-cluster.rok:2379 |  true  | 9.325642ms |       |
+--------------------------------------+--------+------------+-------+
```

Ensure that the Rok etcd cluster has the expected member count. Verify that the output of the following command equals the new cluster size, in this example 3:

```
root@rok-tools:~# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl member list | wc -l
3
```

List the members of the etcd cluster. Verify that field STATUS is started and field IS LEARNER is false for all members:
```
root@rok-tools:~/ops/deployments# kubectl exec -ti -n rok sts/rok-etcd -c etcd -- \
>     etcdctl member list -w table
+------------------+---------+---------------------------------+---------------------------------------------+---------------------------------------------+------------+
|        ID        | STATUS  |              NAME               |                 PEER ADDRS                  |                CLIENT ADDRS                 | IS LEARNER |
+------------------+---------+---------------------------------+---------------------------------------------+---------------------------------------------+------------+
| b2ff88bb2eae13b7 | started | rok-etcd-0.rok-etcd-cluster.rok | http://rok-etcd-0.rok-etcd-cluster.rok:2380 | http://rok-etcd-0.rok-etcd-cluster.rok:2379 |      false |
| f823900dacf44825 | started | rok-etcd-1.rok-etcd-cluster.rok | http://rok-etcd-1.rok-etcd-cluster.rok:2380 | http://rok-etcd-1.rok-etcd-cluster.rok:2379 |      false |
| 49a1544e41ae84e4 | started | rok-etcd-2.rok-etcd-cluster.rok | http://rok-etcd-2.rok-etcd-cluster.rok:2380 | http://rok-etcd-2.rok-etcd-cluster.rok:2379 |      false |
+------------------+---------+---------------------------------+---------------------------------------------+---------------------------------------------+------------+
```
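These final checks can also be scripted. The sketch below runs the same count and learner checks on canned `etcdctl member list` output in its default comma-separated format (the data mirrors the example table above; against a live cluster you would pipe the real command instead):

```shell
# Sketch: verifying member count and voting status on canned
# `etcdctl member list` output. The last field is IS LEARNER;
# every member should report false after promotion.
LIST='b2ff88bb2eae13b7, started, rok-etcd-0.rok-etcd-cluster.rok, http://rok-etcd-0.rok-etcd-cluster.rok:2380, http://rok-etcd-0.rok-etcd-cluster.rok:2379, false
f823900dacf44825, started, rok-etcd-1.rok-etcd-cluster.rok, http://rok-etcd-1.rok-etcd-cluster.rok:2380, http://rok-etcd-1.rok-etcd-cluster.rok:2379, false
49a1544e41ae84e4, started, rok-etcd-2.rok-etcd-cluster.rok, http://rok-etcd-2.rok-etcd-cluster.rok:2380, http://rok-etcd-2.rok-etcd-cluster.rok:2379, false'
COUNT=$(echo "$LIST" | wc -l)
LEARNERS=$(echo "$LIST" | grep -c ', true$' || true)
echo "members=${COUNT} learners=${LEARNERS}"
```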
What’s Next
Check out the rest of the maintenance operations you can perform on your Rok etcd cluster.