Configure Notebook Culling
The Notebook Controller periodically checks the state of every Notebook
Server. You can inspect the Last activity of each Notebook Server in the
corresponding column of the Notebooks UI. Based on the execution state of the
kernels, the Notebook Controller updates the
notebooks.kubeflow.org/last-activity annotation of each Notebook CR (Custom
Resource). When at least one kernel is busy performing computations, the Last
activity of the Notebook Server is the current time. When no kernel is
performing computations, the Last activity is the time at which the last
kernel completed its computations.
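
The rule described above can be sketched as a small shell function. This is only an illustration, not the Notebook Controller's actual code: the function name derive_last_activity and its input format (one kernel per line, an execution state followed by a last-activity time in epoch seconds) are assumptions made for this example.

```shell
# Sketch only: derive a "Last activity" value from a list of kernels.
# Hypothetical input on stdin, one kernel per line:
#   <execution_state> <last_activity_epoch_seconds>
# Argument $1: the current time in epoch seconds.
derive_last_activity() {
  awk -v now="$1" '
    $1 == "busy" { busy = 1 }      # at least one kernel is computing
    $2 > latest  { latest = $2 }   # most recent time any kernel was active
    END { print (busy ? now : latest) }
  '
}
```

With one idle kernel last active at 100 and one busy kernel, a current time of 500 yields 500. With only idle kernels last active at 100 and 300, it yields 300.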
The culling feature allows you to stop a Notebook Server based on its Last
activity. The following table lists the parameters you can define in the
ConfigMap. These parameters set the values of the corresponding environment
variables of the Notebook Controller, and together they form a culling policy.
Parameter Name | Default Value | Description |
---|---|---|
ENABLE_CULLING | "false" | If set to "true", the Notebook Controller scales to zero all Notebooks whose Last activity is older than CULL_IDLE_TIME. |
CULL_IDLE_TIME | "1440" (minutes) | If the time elapsed since a Notebook's Last activity exceeds this value, the Notebook is scaled to zero (culled). ENABLE_CULLING must be set to "true" for this setting to take effect. |
IDLENESS_CHECK_PERIOD | "1" (minutes) | How frequently the controller polls each Notebook to update its Last activity. |
If you have enabled culling and the Last activity of a Notebook Server is
older than CULL_IDLE_TIME, the Notebook Controller culls this Notebook Server.
Note
This means that the Notebook Server will stop. The Notebook Server will not get deleted and the PVCs will not be affected. When starting their Notebooks again, the users can resume their work without any data loss.
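
The culling condition can be expressed as a short check. The following is a minimal sketch, not the controller's implementation: the function name should_cull is hypothetical, the timestamp format is an assumption about how the annotation value looks, and the script relies on GNU date.

```shell
# Sketch only: mirrors the culling condition described above.
# Requires GNU date for the "-d" option.
#
# should_cull LAST_ACTIVITY NOW CULL_IDLE_TIME
#   LAST_ACTIVITY, NOW: timestamps such as 2024-01-01T00:00:00Z (assumed format)
#   CULL_IDLE_TIME:     idle threshold in minutes
# Returns 0 when the idle time exceeds the threshold, 1 when it does not,
# and 2 when a timestamp cannot be parsed.
should_cull() {
  local last now
  last=$(date -u -d "$1" +%s) || return 2
  now=$(date -u -d "$2" +%s) || return 2
  [ $(( now - last )) -gt $(( $3 * 60 )) ]
}

# Example usage (my-notebook and my-namespace are placeholder names):
#   last=$(kubectl get notebook my-notebook -n my-namespace \
#     -o jsonpath='{.metadata.annotations.notebooks\.kubeflow\.org/last-activity}')
#   should_cull "$last" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" 30 && echo "would be culled"
```

The comparison is strict, matching the wording "exceeds this value": a Notebook Server that has been idle for exactly CULL_IDLE_TIME minutes is not yet reported as a culling candidate in this sketch.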
This guide will walk you through setting a culling policy for your Notebook Controller.
What You'll Need
- A configured management environment.
- Your clone of the Arrikto GitOps repository.
- An existing Kubernetes Cluster.
- An existing Kubeflow deployment.
Procedure
Go to your GitOps repository, inside your rok-tools management environment:

    root@rok-tools:~# cd ~/ops/deployments

Uncomment the following line in
kubeflow/manifests/apps/jupyter/notebook-controller/upstream/overlays/deploy/kustomization.yaml:

    patchesStrategicMerge:
    [...]
    #- patches/culler-config-map.yaml # <-- Uncomment this line to enable culling.

Edit the Notebook Controller config at
kubeflow/manifests/apps/jupyter/notebook-controller/upstream/overlays/deploy/patches/culler-config-map.yaml
and set the values for the parameters:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: config
    data:
      ENABLE_CULLING: "true" # <-- Update this line with your desired value.
      CULL_IDLE_TIME: "30" # <-- Update this line with your desired value.
      IDLENESS_CHECK_PERIOD: "1" # <-- Update this line with your desired value.

Commit your changes:

    root@rok-tools:~/ops/deployments# git commit -am "kubeflow: Configure Notebook Culling"

Reapply the kustomization:

    root@rok-tools:~/ops/deployments# rok-deploy --apply kubeflow/manifests/apps/jupyter/notebook-controller/upstream/overlays/deploy
Verify
Get the Notebook Controller pod name:

    root@rok-tools:~# export POD=$(kubectl get pod -n kubeflow \
    >     -l app=notebook-controller -o jsonpath="{.items[0].metadata.name}") \
    >     && echo ${POD}
    notebook-controller-deployment-54884d6854-gzs2r

Get the environment variables of the Notebook Controller container:

    root@rok-tools:~# kubectl exec -n kubeflow ${POD} -c manager -- printenv | \
    >     grep -E "IDLE|CULL"
    ENABLE_CULLING=true
    CULL_IDLE_TIME=30
    IDLENESS_CHECK_PERIOD=1

Note
Make sure the above environment variables have the values you defined previously.
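
If you prefer not to compare the values by eye, the check can be scripted. The helper below is a hypothetical sketch (check_culling_env is not part of rok-tools or Kubeflow): it reads printenv output on standard input and reports every expected KEY=VALUE pair that does not appear as an exact line.

```shell
# Sketch only: compare printenv output (stdin) with expected KEY=VALUE pairs.
# check_culling_env KEY=VALUE...
# Returns 0 when every pair is present as a whole line, 1 otherwise.
check_culling_env() {
  local env_output expected status=0
  env_output=$(cat)
  for expected in "$@"; do
    # -x matches whole lines only, so CULL_IDLE_TIME=3 does not match "30".
    if ! printf '%s\n' "$env_output" | grep -qxF -- "$expected"; then
      echo "missing or different: $expected" >&2
      status=1
    fi
  done
  return "$status"
}

# Example usage, with the values set in the procedure above:
#   kubectl exec -n kubeflow ${POD} -c manager -- printenv | \
#     check_culling_env ENABLE_CULLING=true CULL_IDLE_TIME=30 IDLENESS_CHECK_PERIOD=1
```

A non-zero exit status indicates that at least one variable is unset or holds a value other than the one you expected.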
What's Next
Check out the rest of the operations you can perform on your Kubeflow deployment.