Upgrade Rok¶
This guide will walk you through upgrading Rok.
We assume that you are already running a 1.4 Rok cluster on Kubernetes and that you also have access to the 1.5.3 kustomization tree you are upgrading to. Since a Rok cluster on Kubernetes consists of multiple components, you will upgrade each one of them separately.
During the upgrade, Rok Operator will remove all members from the cluster and add a dedicated one to perform the upgrade. It will scale the cluster down to zero, run a Kubernetes Job to upgrade the cluster config on etcd and apply any needed migrations, and finally scale the cluster back to its initial size.
What You’ll Need¶
- An upgraded management environment.
- An existing Kubernetes cluster.
- An existing Rok 1.4 deployment.
- Your local clone of the Arrikto GitOps repository.
- Arrikto manifests for EKF version 1.5.3.
Procedure¶
Note
To increase observability and gain insight into the status of the cluster upgrade, run the following commands in a separate window:
Get the live cluster status:
root@rok-tools:~# watch kubectl get rokcluster -n rok

Get the live cluster events:
root@rok-tools:~# watch 'kubectl describe rokcluster -n rok rok | tail -n 20'
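If you prefer a scriptable check over eyeballing the watch output, a small helper can test the rokcluster status line directly. This is an illustrative sketch, not part of the official tooling; the function name is made up, and the column positions (HEALTH third, PHASE sixth) assume the default output columns shown in this guide.

```shell
# Hypothetical helper: check whether a single `kubectl get rokcluster` data
# line reports HEALTH=OK and PHASE=Running. Column positions (3rd and 6th)
# match the default output columns shown in this guide.
rok_cluster_healthy() {
    # shellcheck disable=SC2086
    set -- $1                      # split the line into whitespace-separated fields
    [ "$3" = "OK" ] && [ "$6" = "Running" ]
}

# Example usage against a live cluster (requires kubectl access):
# kubectl get rokcluster -n rok rok --no-headers | {
#     read -r line
#     rok_cluster_healthy "$line" && echo "cluster healthy"
# }
```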
Go to your GitOps repository, inside your rok-tools management environment:

root@rok-tools:~# cd ~/ops/deployments

Set the namespace in which you deployed Rok. Choose one of the following options, based on your cloud provider.

Restore the required context from previous sections:

root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

Alternatively, set the namespace explicitly:

root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok

Upgrade Rok Disk Manager:
Apply the latest Rok Disk Manager manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-disk-manager/overlays/deploy

Ensure Rok Disk Manager has become ready. Verify field READY is 1/1 and field STATUS is Running for all Pods:

root@rok-tools:~# watch kubectl get pods -n rok-system -l name=rok-disk-manager
Every 2.0s: kubectl get pods -n rok-system -l name=rok-disk-manager    rok-tools: Thu Nov 25 09:36:49 2021

NAME                     READY   STATUS    RESTARTS   AGE
rok-disk-manager-4kk5m   1/1     Running   0          1m
rok-disk-manager-prqzl   1/1     Running   0          1m
Upgrade Rok kmod:
Apply the latest Rok kmod manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-kmod/overlays/deploy

Ensure Rok kmod has become ready. Verify field READY is 1/1 and field STATUS is Running for all Pods:

root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-kmod
Every 2.0s: kubectl get pods -n rok-system -l app=rok-kmod    rok-tools: Thu Nov 25 09:39:58 2021

NAME             READY   STATUS    RESTARTS   AGE
rok-kmod-j9bpw   1/1     Running   0          1m
rok-kmod-pqbxb   1/1     Running   0          1m

Troubleshooting

The STATUS field of some or all Rok kmod Pods is CrashLoopBackOff or ERROR
Inspect the logs of the Pods in question:
root@rok-tools:~/ops/deployments# kubectl logs -n rok-system <POD_NAME>

Replace <POD_NAME> with the name of the failing Pod.

If you see the following error in the logs:

modprobe: FATAL: Module dm_era is in use.

it means that Rok kmod failed to install the new version of the kernel module, because the old version is in use by some device.
This is expected. Please go on with this guide and then follow the Upgrade Kernel Modules guide to finish upgrading the kernel modules.
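If you want to confirm on the node that the old module really is still loaded and in use, /proc/modules exposes the reference count. The helper below is an illustrative sketch (not from this guide); it reads /proc/modules-formatted lines (name size refcount users state address) on stdin and succeeds only if the named module is loaded with a nonzero reference count.

```shell
# Hypothetical check: does a module appear in /proc/modules with refcount > 0?
# /proc/modules lines look like: name size refcount users state address
module_in_use() {
    awk -v mod="$1" '
        $1 == mod { found = 1; inuse = ($3 + 0 > 0) }
        END { exit (found && inuse) ? 0 : 1 }
    '
}

# Example usage on the node (requires a Linux host):
# module_in_use dm_era < /proc/modules && echo "dm_era is still in use"
```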
Upgrade Rok Operator:
Apply the latest Rok Operator manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-operator/overlays/deploy

Note

The above command also updates the RokCluster CRD.

Ensure Rok Operator has become ready. Verify field READY is 1/1 and field STATUS is Running:

root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok-system -l app=rok-operator
Every 2.0s: kubectl get pods -n rok-system -l app=rok-operator    rok-tools: Thu Nov 25 09:47:35 2021

NAME             READY   STATUS    RESTARTS   AGE
rok-operator-0   1/1     Running   0          1m
Upgrade the Rok cluster:
Apply the latest Rok cluster manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-cluster/overlays/deploy

Ensure Rok cluster has been upgraded:

Check the status of the cluster upgrade Job:

root@rok-tools:~/ops/deployments# kubectl get job -n ${ROK_CLUSTER_NAMESPACE?} rok-upgrade-release-1.5-l0-release-1.5.3
NAME                                     COMPLETIONS   DURATION   AGE
rok-upgrade-release-1.5-l0-release-1.5   1/1           45s        3m

Ensure that Rok is up and running after the upgrade Job finishes. Verify field HEALTH is OK and field PHASE is Running:

root@rok-tools:~/ops/deployments# kubectl get rokcluster -n ${ROK_CLUSTER_NAMESPACE?} rok
NAME   VERSION                      HEALTH   TOTAL MEMBERS   READY MEMBERS   PHASE     AGE
rok    release-1.5-l0-release-1.5   OK       2               2               Running   1m
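If you are scripting the upgrade, the Job's COMPLETIONS column can be checked mechanically instead of by eye. The helper below is an illustrative sketch (the function name is made up for this example); it succeeds when a `kubectl get job` data line shows COMPLETIONS of the form n/n.

```shell
# Hypothetical helper: check that the COMPLETIONS field (2nd column) of a
# `kubectl get job` data line shows n/n, i.e. all completions have finished.
job_complete() {
    # shellcheck disable=SC2086
    set -- $1                     # split the line into fields
    [ "${2%/*}" = "${2#*/}" ]     # compare the two sides of "n/n"
}

# Example usage (requires kubectl access; the Job name follows the
# rok-upgrade-<from>-<to> pattern shown above):
# kubectl get job -n "${ROK_CLUSTER_NAMESPACE?}" --no-headers | {
#     read -r line
#     job_complete "$line" && echo "upgrade Job finished"
# }
```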
Upgrade Rok etcd:
Apply the latest Rok etcd manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/rok-external-services/etcd/overlays/deploy \
>     --force --force-kinds StatefulSet

Note

You need to re-create the StatefulSet because Rok 1.5 changes the port names of the container, which are immutable fields. The underlying PVC will not be deleted.

Ensure that Rok etcd has become ready. Verify field READY is 1/1 and field STATUS is Running:

root@rok-tools:~/ops/deployments# watch kubectl get pods -n rok -l app=etcd
Every 2.0s: kubectl get pods -n rok -l app=etcd    rok-tools: Thu Nov 25 09:47:35 2021

NAME         READY   STATUS    RESTARTS   AGE
rok-etcd-0   1/1     Running   0          1m
Upgrade Rok Monitoring stack:
Apply the latest Rok Monitoring manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply rok/monitoring/overlays/deploy \
>     --force --force-kinds Deployment DaemonSet RoleBinding

Note

You need to re-create these resources because Rok 1.5 renames the Kube State Metrics cluster-scoped RBAC resources, and the references to them are immutable fields.

Remove a stale RBAC resource that is left behind by the previous version of Rok:

root@rok-tools:~/ops/deployments# kubectl delete role -n monitoring kube-state-metrics \
>     --ignore-not-found
role.rbac.authorization.k8s.io "kube-state-metrics" deleted
Upgrade the rest of the Rok installation components by applying the latest Rok manifests:
root@rok-tools:~/ops/deployments# rok-deploy --apply install/rok
Verify¶
Go to your GitOps repository, inside your rok-tools management environment:

root@rok-tools:~# cd ~/ops/deployments

Set the namespace in which you deployed Rok. Choose one of the following options, based on your cloud provider.

Restore the required context from previous sections:

root@rok-tools:~/ops/deployments# source deploy/env.cloudidentity
root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE

Alternatively, set the namespace explicitly:

root@rok-tools:~/ops/deployments# export ROK_CLUSTER_NAMESPACE=rok

Ensure all Pods in the rok-system namespace are up and running. Verify field READY is 1/1 and field STATUS is Running for all Pods:

root@rok-tools:~/ops/deployments# kubectl get pods -n rok-system
NAME                     READY   STATUS    RESTARTS   AGE
rok-disk-manager-4kk5m   1/1     Running   0          1m
rok-disk-manager-prqzl   1/1     Running   0          1m
rok-kmod-j9bpw           1/1     Running   0          1m
rok-kmod-pqbxb           1/1     Running   0          1m
rok-operator-0           1/1     Running   0          1m

Ensure all Pods in the Rok namespace are up and running. Verify field READY is n/n and field STATUS is Running for all Pods:

root@rok-tools:~/ops/deployments# kubectl get pods -n ${ROK_CLUSTER_NAMESPACE?}
NAME                                                              READY   STATUS    RESTARTS   AGE
rok-csi-controller-0                                              4/4     Running   0          1m
rok-csi-guard-ip-172-31-34-181.eu-central-1.compute.interntthrs   1/1     Running   0          1m
rok-csi-guard-ip-172-31-47-250.eu-central-1.compute.internnsgb5   1/1     Running   0          1m
rok-csi-node-27422                                                2/2     Running   0          1m
rok-csi-node-qs7pm                                                2/2     Running   0          1m
rok-etcd-0                                                        1/1     Running   0          1m
rok-p7kqh                                                         1/1     Running   0          1m
rok-postgresql-0                                                  1/1     Running   0          1m
rok-redis-0                                                       2/2     Running   0          1m
rok-vd5lp                                                         1/1     Running   0          1m

Ensure that Dex is up and running. Verify that field READY is 1/1:
root@rok-tools:~/ops/deployments# kubectl get deploy -n auth
NAME   READY   UP-TO-DATE   AVAILABLE   AGE
dex    1/1     1            1           1m

Ensure that AuthService is up and running. Verify that field READY is 1/1:

root@rok-tools:~/ops/deployments# kubectl get sts -n istio-system authservice
NAME          READY   AVAILABLE   AGE
authservice   1/1     1           1m

Ensure that Reception is up and running. Verify that field READY is 1/1:

root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow kubeflow-reception
NAME                 READY   UP-TO-DATE   AVAILABLE   AGE
kubeflow-reception   1/1     1            1           1m

Ensure that the Profiles Controller is up and running. Verify that field READY is 1/1:

root@rok-tools:~/ops/deployments# kubectl get deploy -n kubeflow profiles-deployment
NAME                  READY   UP-TO-DATE   AVAILABLE   AGE
profiles-deployment   1/1     1            1           1m

Verify that the cert-manager Pods are up and running. Check the Pod status and verify field STATUS is Running and field READY is 1/1 for all Pods:

root@rok-tools:~/ops/deployments# kubectl -n cert-manager get pods
NAME                                       READY   STATUS    RESTARTS   AGE
cert-manager-6d86476c77-bl9rs              1/1     Running   0          1m
cert-manager-cainjector-5b9cd446fd-n5jpd   1/1     Running   0          1m
cert-manager-webhook-64d967c45-cdfwh       1/1     Running   0          1m

Verify that the Rok Monitoring Stack is up and running:

root@rok-tools:~/ops/deployments# kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
grafana-6d7d7b78f7-6flm7               2/2     Running   0          1m
kube-state-metrics-765c7c7f95-chkzn    4/4     Running   0          1m
node-exporter-zng26                    2/2     Running   0          1m
prometheus-k8s-0                       2/2     Running   1          1m
prometheus-operator-5f75d76f9f-fmpp5   3/3     Running   0          1m
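The per-namespace checks above all follow the same pattern: every Pod must show STATUS Running and a READY count of n/n. A scripted version of that check might look like the following sketch (illustrative only, not part of the official tooling; it parses default `kubectl get pods` output fed on stdin):

```shell
# Hypothetical helper: succeed only if every data line of `kubectl get pods`
# output (fed on stdin) shows READY as n/n and STATUS as Running.
all_pods_ready() {
    awk '
        NR > 1 {                         # skip the header line
            split($2, r, "/")
            if (r[1] != r[2] || $3 != "Running") bad = 1
        }
        END { exit bad ? 1 : 0 }
    '
}

# Example usage (requires kubectl access):
# kubectl get pods -n rok-system | all_pods_ready && echo "all Pods ready"
```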