Create GKE Cluster¶
This section will guide you through creating a GKE cluster using the Google Cloud SDK. After completing this guide, you will have a GKE cluster with:
- Kubernetes 1.19.
- Worker nodes with local NVMe SSDs.
See also: Overview
What You'll Need¶
- A configured management environment.
- A configured cloud environment.
Procedure¶
To create the GKE cluster, follow the steps below:
Switch to your management environment and specify the cluster name:
root@rok-tools:~# export GKE_CLUSTER=arrikto-cluster
Specify the Kubernetes version:
root@rok-tools:~# export CLUSTER_VERSION=1.19.16-gke.1500
Specify the name of the default node pool:
root@rok-tools:~# export NODE_POOL_NAME=default-workers
Specify the machine type:
root@rok-tools:~# export MACHINE_TYPE=n1-standard-8
Specify the number of nodes to create:
root@rok-tools:~# export NUM_NODES=3
Specify the number of local NVMe SSDs to add:
root@rok-tools:~# export NUM_SSD=3
Note
Rok will automatically find and use all local SSDs, which are expected to be unformatted. Each local NVMe SSD is 375 GB in size. You can attach a maximum of 24 local SSD partitions for 9 TB per instance.
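The capacity figures in the note follow from simple arithmetic, sketched below with the sizes from this guide (375 GB per SSD is the fixed GCP local SSD size):

```shell
# Each local NVMe SSD on GCP has a fixed size of 375 GB, and an instance
# can attach at most 24 of them.
SSD_SIZE_GB=375
MAX_SSDS=24
MAX_TOTAL_GB=$((SSD_SIZE_GB * MAX_SSDS))
echo "Maximum local SSD capacity per instance: ${MAX_TOTAL_GB} GB"  # 9000 GB = 9 TB

# With NUM_SSD=3, as in this guide, each node gets:
NUM_SSD=3
NODE_TOTAL_GB=$((SSD_SIZE_GB * NUM_SSD))
echo "Local SSD capacity per node: ${NODE_TOTAL_GB} GB"  # 1125 GB
```

The per-node figure (1125 GB here) is also the amount that counts against the regional SSD_TOTAL_GB quota for each node you create.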
Create the cluster:
root@rok-tools:~# gcloud alpha container clusters create ${GKE_CLUSTER?} \
>     --account ${CLUSTER_ADMIN_ACCOUNT?} \
>     --cluster-version ${CLUSTER_VERSION?} \
>     --release-channel None \
>     --no-enable-basic-auth \
>     --node-pool-name ${NODE_POOL_NAME?} \
>     --machine-type ${MACHINE_TYPE?} \
>     --image-type UBUNTU \
>     --disk-type pd-ssd \
>     --disk-size 200 \
>     --local-ssd-volumes count=${NUM_SSD?},type=nvme,format=block \
>     --metadata disable-legacy-endpoints=True \
>     --workload-pool=${PROJECT_ID?}.svc.id.goog \
>     --scopes gke-default \
>     --num-nodes ${NUM_NODES?} \
>     --logging=SYSTEM,WORKLOAD \
>     --monitoring=SYSTEM \
>     --enable-ip-alias \
>     --default-max-pods-per-node 110 \
>     --no-enable-master-authorized-networks \
>     --no-enable-intra-node-visibility \
>     --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
>     --max-surge-upgrade 1 \
>     --max-unavailable-upgrade 0 \
>     --no-enable-autoupgrade \
>     --no-enable-autorepair \
>     --enable-shielded-nodes
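Before running the command above, you can sanity-check that every variable it references is set. This is a minimal, optional sketch; CLUSTER_ADMIN_ACCOUNT and PROJECT_ID are assumed to come from the prerequisite management and cloud environment guides:

```shell
# Fail fast if any variable that the `clusters create` command references
# is missing or empty. Uses `eval` for POSIX-compatible indirection.
check_vars() {
    missing=0
    for var in GKE_CLUSTER CLUSTER_VERSION NODE_POOL_NAME MACHINE_TYPE \
               NUM_NODES NUM_SSD CLUSTER_ADMIN_ACCOUNT PROJECT_ID; do
        eval "val=\${$var}"
        if [ -z "${val}" ]; then
            echo "Missing: ${var}" >&2
            missing=1
        fi
    done
    return ${missing}
}
check_vars || echo "Set the variables above before creating the cluster." >&2
```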
Troubleshooting
The command fails with 'Insufficient regional quota to satisfy request: resource "SSD_TOTAL_GB"'
Ensure that your region has a sufficient local SSD quota. To inspect the current usage and limit, run:
root@rok-tools:~# gcloud compute regions describe ${REGION?} --format json | \
>     jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"'
Either delete some resources or choose a different region/zone.
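To see what the jq filter in this check extracts, here is the same expression run against a minimal, made-up sample of the `regions describe` payload (the numbers are purely illustrative):

```shell
# Hypothetical, trimmed-down sample of `gcloud compute regions describe` output.
cat > /tmp/region-sample.json <<'EOF'
{"quotas": [
  {"metric": "CPUS", "usage": 24, "limit": 72},
  {"metric": "SSD_TOTAL_GB", "usage": 1125, "limit": 2048}
]}
EOF

# Select the SSD_TOTAL_GB quota entry and print it as usage/limit.
jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"' \
    /tmp/region-sample.json
# Prints: 1125/2048
```

If the usage plus the capacity of the nodes you are about to create exceeds the limit, the cluster creation will fail with the quota error above.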
The command fails with 'Master version "1.19.16-gke.1500" is unsupported'
If the above command fails with an error message similar to the following:
ERROR: (gcloud.alpha.container.clusters.create) ResponseError: code=400, message=Master version "1.19.16-gke.1500" is unsupported.
it means that the Kubernetes version 1.19.16-gke.1500 is not supported in your selected zone. To proceed, do the following:
Check the Kubernetes versions that are available in your selected zone:
root@rok-tools:~# gcloud container get-server-config \
>     --format="yaml(validMasterVersions)"
Select one of the available 1.19 Kubernetes versions:
root@rok-tools:~# export CLUSTER_VERSION=<CLUSTER_VERSION>
Replace <CLUSTER_VERSION> with your selected 1.19 Kubernetes version.
Go back to step 7 and create the cluster.
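For illustration, here is one way to pick out the 1.19 versions from the `get-server-config` output, applied to a made-up sample (the version list below is hypothetical; your zone will report its own):

```shell
# Hypothetical sample of `gcloud container get-server-config` output.
cat > /tmp/server-config-sample.yaml <<'EOF'
validMasterVersions:
- 1.21.5-gke.1302
- 1.20.11-gke.1801
- 1.19.16-gke.1500
- 1.19.15-gke.1801
EOF

# Keep only the 1.19 entries; GKE lists versions newest-first,
# so the first match is the most recent 1.19 release.
grep -o '1\.19\.[0-9a-z.-]*' /tmp/server-config-sample.yaml | head -n 1
# Prints: 1.19.16-gke.1500
```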
Note
This will create a zonal cluster with 3 nodes in the cluster's primary zone. It will use the default network and subnet in the zone.
Verify¶
Switch back to your management environment and ensure that the GKE cluster exists and its status is RUNNING:
root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?}
...
name: arrikto-cluster
...
status: RUNNING
Get the list of the node pools:
root@rok-tools:~# gcloud container node-pools list --cluster=${GKE_CLUSTER?}
NAME             MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-workers  n1-standard-8  200           1.19.16-gke.1500
Ensure that the default node pool exists and its status is RUNNING:
root@rok-tools:~# gcloud container node-pools describe ${NODE_POOL_NAME?} \
>     --cluster=${GKE_CLUSTER?}
...
name: default-workers
...
status: RUNNING
Verify that all instances of your node pool have the necessary storage attached:
Find the Instance group that corresponds to the workers node pool:
root@rok-tools:~# export INSTANCE_GROUP=$(gcloud container node-pools describe ${NODE_POOL_NAME?} \
>     --cluster=${GKE_CLUSTER?} \
>     --format="value(instanceGroupUrls)")
Find the Template of the Instance group:
root@rok-tools:~# export TEMPLATE=$(gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
>     --format="value(instanceTemplate)")
Inspect the Template and ensure that the kube-env metadata key has the expected NODE_LOCAL_SSDS_EXT value:
root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
>     jq -r '.properties.metadata.items[] | select(.key == "kube-env") | .value' | \
>     grep NODE_LOCAL_SSDS
NODE_LOCAL_SSDS_EXT: 3,nvme,block
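To see what this jq filter extracts before running it against a live template, here is the same expression applied to a minimal, made-up sample of the template JSON:

```shell
# Hypothetical, trimmed-down sample of
# `gcloud compute instance-templates describe --format json` output.
cat > /tmp/template-sample.json <<'EOF'
{"properties": {"metadata": {"items": [
  {"key": "disable-legacy-endpoints", "value": "True"},
  {"key": "kube-env", "value": "NODE_LOCAL_SSDS_EXT: 3,nvme,block\nKUBELET_ARGS: ..."}
]}}}
EOF

# Pull the kube-env value out of the metadata items, then keep only
# the NODE_LOCAL_SSDS line.
jq -r '.properties.metadata.items[] | select(.key == "kube-env") | .value' \
    /tmp/template-sample.json | grep NODE_LOCAL_SSDS
# Prints: NODE_LOCAL_SSDS_EXT: 3,nvme,block
```

The `3,nvme,block` triple mirrors the `--local-ssd-volumes count=3,type=nvme,format=block` flag passed at cluster creation.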
Inspect the Template and ensure that it has NVMe local SSDs attached. The command below will list all disks of type SCRATCH and show their interface, which should be NVME:
root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
>     jq -r '.properties.disks[] | select(.type == "SCRATCH") | .index, .deviceName, .interface' | \
>     paste - - -
1  local-ssd-0  NVME
2  local-ssd-1  NVME
3  local-ssd-2  NVME
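The `jq ... | paste - - -` idiom above prints three values per disk on one row. Here is the same pipeline run against a minimal, made-up sample of the template's disk list:

```shell
# Hypothetical sample of the disks section of an instance template.
# Local SSDs show up as type SCRATCH; the boot disk is PERSISTENT.
cat > /tmp/disks-sample.json <<'EOF'
{"properties": {"disks": [
  {"type": "PERSISTENT", "index": 0, "deviceName": "boot", "interface": "SCSI"},
  {"type": "SCRATCH", "index": 1, "deviceName": "local-ssd-0", "interface": "NVME"},
  {"type": "SCRATCH", "index": 2, "deviceName": "local-ssd-1", "interface": "NVME"}
]}}
EOF

# jq emits index, deviceName, and interface on separate lines;
# `paste - - -` folds every three lines into one tab-separated row.
jq -r '.properties.disks[] | select(.type == "SCRATCH") | .index, .deviceName, .interface' \
    /tmp/disks-sample.json | paste - - -
```

Note that the boot disk is filtered out by `select(.type == "SCRATCH")`, so only the local SSDs appear.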
Ensure that all instances inside the instance group run with the desired template:
root@rok-tools:~# gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
>     --format="value(status.versionTarget.isReached)"
True