Create GKE Cluster
This section will guide you through creating a GKE cluster using the Google Cloud SDK. After completing this guide, you will have a GKE cluster with:
- Kubernetes 1.19, 1.20, or 1.21.
- Worker nodes with local NVMe SSDs.
See also: Overview
What You’ll Need
- A configured management environment.
- A configured cloud environment.
Procedure
To create the GKE cluster, follow the steps below:
1. Switch to your management environment and specify the cluster name:

       root@rok-tools:~# export GKE_CLUSTER=arrikto-cluster

2. Specify the Kubernetes version. Choose one of the following options, among the supported Kubernetes versions:

       root@rok-tools:~# export CLUSTER_VERSION=1.21.5-gke.1805

       root@rok-tools:~# export CLUSTER_VERSION=1.20.15-gke.300

       root@rok-tools:~# export CLUSTER_VERSION=1.19.16-gke.6800

3. Specify the name of the default node pool:

       root@rok-tools:~# export NODE_POOL_NAME=default-workers

4. Specify the machine type:

       root@rok-tools:~# export MACHINE_TYPE=n1-standard-8

5. Specify the number of nodes to create:

       root@rok-tools:~# export NUM_NODES=3

6. Specify the number of local NVMe SSDs to add:

       root@rok-tools:~# export NUM_SSD=3

   Note: Rok will automatically find and use all local SSDs, which are expected to be unformatted. Each local NVMe SSD is 375 GB in size. You can attach a maximum of 24 local SSD partitions, for 9 TB per instance.
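As a quick sanity check, the local SSD capacity these settings imply can be computed up front. This is a sketch that only restates the fixed 375 GB size from the note above:

```shell
# Local SSD capacity implied by the settings above
# (each GCP local NVMe SSD is a fixed 375 GB).
NUM_NODES=3
NUM_SSD=3
echo "per node:    $((NUM_SSD * 375)) GB"
echo "per cluster: $((NUM_NODES * NUM_SSD * 375)) GB"
```

With the default values above, this reports 1125 GB per node and 3375 GB across the cluster.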
7. Create the cluster:

       root@rok-tools:~# gcloud alpha container clusters create ${GKE_CLUSTER?} \
       >     --account ${CLUSTER_ADMIN_ACCOUNT?} \
       >     --cluster-version ${CLUSTER_VERSION?} \
       >     --release-channel stable \
       >     --no-enable-basic-auth \
       >     --node-pool-name ${NODE_POOL_NAME?} \
       >     --machine-type ${MACHINE_TYPE?} \
       >     --image-type UBUNTU \
       >     --disk-type pd-ssd \
       >     --disk-size 200 \
       >     --local-ssd-volumes count=${NUM_SSD?},type=nvme,format=block \
       >     --metadata disable-legacy-endpoints=True \
       >     --workload-pool=${PROJECT_ID?}.svc.id.goog \
       >     --scopes gke-default \
       >     --num-nodes ${NUM_NODES?} \
       >     --logging=SYSTEM,WORKLOAD \
       >     --monitoring=SYSTEM \
       >     --enable-ip-alias \
       >     --default-max-pods-per-node 110 \
       >     --no-enable-master-authorized-networks \
       >     --no-enable-intra-node-visibility \
       >     --addons HorizontalPodAutoscaling,HttpLoadBalancing,GcePersistentDiskCsiDriver \
       >     --max-surge-upgrade 1 \
       >     --max-unavailable-upgrade 0 \
       >     --enable-autoupgrade \
       >     --enable-autorepair \
       >     --enable-shielded-nodes

   Troubleshooting
The command fails with ‘Insufficient regional quota to satisfy request: resource “SSD_TOTAL_GB”’
Ensure that your region has enough quota for local SSDs. To inspect the usage and limits, run:

    root@rok-tools:~# gcloud compute regions describe ${REGION?} --format json | \
    >     jq -r '.quotas[] | select(.metric=="SSD_TOTAL_GB") | "\(.usage)/\(.limit)"'

Either delete some resources or choose a different region or zone.
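The usage/limit pair printed by the command above can be turned into a quick headroom check. The sketch below uses hypothetical sample values: in practice QUOTA would hold the jq output, and NEEDED mirrors the nodes × SSDs × 375 GB arithmetic from the note above:

```shell
# Hypothetical "usage/limit" pair; in practice this would be the
# output of the jq command above.
QUOTA="375.0/2000.0"
USAGE=${QUOTA%/*}                # part before the slash -> 375.0
LIMIT=${QUOTA#*/}                # part after the slash  -> 2000.0
NEEDED=$((3 * 3 * 375))          # NUM_NODES x NUM_SSD x 375 GB
FREE=$(( ${LIMIT%.*} - ${USAGE%.*} ))   # integer headroom in GB
if [ "$FREE" -ge "$NEEDED" ]; then
    echo "quota OK: ${FREE} GB free, ${NEEDED} GB needed"
else
    echo "insufficient: ${FREE} GB free, ${NEEDED} GB needed"
fi
```

With these sample values the check reports insufficient quota (1625 GB free against 3375 GB needed), which is exactly the situation this troubleshooting entry describes.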
The command fails with ‘Master version is unsupported’
If the above command fails with an error message similar to the following:
    ERROR: (gcloud.alpha.container.clusters.create) ResponseError: code=400, message=Master version "1.21.5-gke.1805" is unsupported.

This means that the Kubernetes version you have specified is not supported in your selected zone.
To proceed, do the following:
1. Check the Kubernetes versions that are available in your selected zone:

       root@rok-tools:~# gcloud container get-server-config \
       >     --flatten="channels" \
       >     --filter="channels.channel=STABLE" \
       >     --format="yaml(channels.channel,channels.validVersions)"
       Fetching server config for us-east1-b
       ---
       channels:
         channel: STABLE
         validVersions:
         - 1.21.5-gke.1805
         - 1.20.15-gke.2500
         - 1.20.15-gke.1000
         - 1.20.15-gke.300
         - 1.19.16-gke.8300
         - 1.19.16-gke.6800

2. Select one of the available Kubernetes versions you found in the previous step:

       root@rok-tools:~# export CLUSTER_VERSION=<CLUSTER_VERSION>

   Replace <CLUSTER_VERSION> with your selected Kubernetes version. For example:

       root@rok-tools:~# export CLUSTER_VERSION=1.21.5-gke.1805

       root@rok-tools:~# export CLUSTER_VERSION=1.20.15-gke.2500

       root@rok-tools:~# export CLUSTER_VERSION=1.19.16-gke.8300

3. Go back to step 7 and create the cluster.
Note: This will create a zonal cluster with 3 nodes in the cluster’s primary zone. It will use the default network and subnet in the zone.
Verify
1. Ensure that the GKE cluster exists and its status is RUNNING:

       root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
       >     --format="value(status)"
       RUNNING

   Troubleshooting

   The status is RECONCILING

   If the status of the GKE cluster is RECONCILING, it means that some work is actively being done on the cluster. One possibility is that there is an auto-upgrade in progress. Check for running control plane and node upgrade operations:

       root@rok-tools:~# gcloud container operations list \
       >     --filter="TYPE:(UPGRADE_MASTER OR UPGRADE_NODES) AND \
       >     TARGET:(${GKE_CLUSTER?} OR \
       >     ${NODE_POOL_NAME?}) AND STATUS:RUNNING"

   You can also check for other running operations:

       root@rok-tools:~# gcloud container operations list \
       >     --filter="STATUS:RUNNING"

   In any case, wait until the running operations complete and re-run this verification step.
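The wait-and-retry advice above can be sketched as a simple poll loop. This is illustrative only: running_ops is a stand-in for the gcloud container operations list command shown above, stubbed here so the sketch runs self-contained (it pretends the operations finish after two polls):

```shell
# Poll until no operations are RUNNING, then re-run the verification.
polls=0
running_ops() {
    # Stub standing in for:
    #   gcloud container operations list --filter="STATUS:RUNNING" --format="value(name)"
    [ "$polls" -lt 2 ] && echo "operation-123"
}
while [ -n "$(running_ops)" ]; do
    polls=$((polls + 1))
    echo "operations still running (poll ${polls}), waiting..."
    # sleep 30   # in practice, pause between polls
done
echo "no running operations; re-run the verification step"
```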
2. Ensure that the GKE cluster is enrolled in the STABLE release channel:

       root@rok-tools:~# gcloud container clusters describe ${GKE_CLUSTER?} \
       >     --format="value(releaseChannel.channel)"
       STABLE

3. Obtain the Kubernetes version of the control plane:

       root@rok-tools:~# VERSION=$(gcloud container clusters describe ${GKE_CLUSTER?} \
       >     --format="value(currentMasterVersion)")

4. Ensure that the control plane runs the desired Kubernetes minor version:

       root@rok-tools:~# [[ ${VERSION%*.*.*?} == ${CLUSTER_VERSION%*.*.*?} ]] \
       >     && echo OK || echo FAIL
       OK

5. Get the list of the node pools:
       root@rok-tools:~# gcloud container node-pools list --cluster=${GKE_CLUSTER?}
       NAME             MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
       default-workers  n1-standard-8  200           1.21.5-gke.1805

6. Ensure the default node pool exists and its status is RUNNING:

       root@rok-tools:~# gcloud container node-pools describe ${NODE_POOL_NAME?} \
       >     --cluster=${GKE_CLUSTER?} \
       >     --format="value(status)"
       RUNNING

7. Obtain the Kubernetes version of your default node pool:

       root@rok-tools:~# VERSION=$(gcloud container clusters describe ${GKE_CLUSTER?} \
       >     --format=json \
       >     | jq -r ".nodePools[] | select(.name == \"${NODE_POOL_NAME?}\") | .version")

8. Ensure that the default node pool runs the desired Kubernetes version:

       root@rok-tools:~# [[ ${VERSION?} == ${CLUSTER_VERSION?} ]] \
       >     && echo OK || echo FAIL
       OK

9. Verify that all instances of your node pool have the necessary storage attached:
   1. Find the instance group that corresponds to the default-workers node pool:

          root@rok-tools:~# export INSTANCE_GROUP=$(gcloud container node-pools describe ${NODE_POOL_NAME?} \
          >     --cluster=${GKE_CLUSTER?} \
          >     --format="value(instanceGroupUrls)")

   2. Find the template of the instance group:

          root@rok-tools:~# export TEMPLATE=$(gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
          >     --format="value(instanceTemplate)")

   3. Inspect the template and ensure that the kube-env metadata key has the expected NODE_LOCAL_SSDS_EXT value:

          root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
          >     jq -r '.properties.metadata.items[] | select(.key == "kube-env") | .value' | \
          >     grep NODE_LOCAL_SSDS
          NODE_LOCAL_SSDS_EXT: 3,nvme,block

   4. Inspect the template and ensure that it has NVMe local SSDs attached. The command below will list all disks of type SCRATCH and show their interface, which should be NVME:

          root@rok-tools:~# gcloud compute instance-templates describe ${TEMPLATE?} --format json | \
          >     jq -r '.properties.disks[] | select(.type == "SCRATCH") | .index, .deviceName, .interface' | paste - - -
          1  local-ssd-0  NVME
          2  local-ssd-1  NVME
          3  local-ssd-2  NVME

   5. Ensure that all instances inside the instance group run with the desired template:

          root@rok-tools:~# gcloud compute instance-groups managed describe ${INSTANCE_GROUP?} \
          >     --format="value(status.versionTarget.isReached)"
          True
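Two shell idioms do the heavy lifting in the verification commands above, and are worth seeing in isolation with sample values. The ${VAR%pattern} expansion used in the version checks strips everything from the patch component onward, so only major.minor is compared; paste - - - folds jq's one-value-per-line output into one row per disk:

```shell
# 1. Suffix stripping: %*.*.*? deletes the shortest trailing match,
#    reducing a full GKE version string to its major.minor prefix.
VERSION=1.21.5-gke.1805
SHORT=${VERSION%*.*.*?}
echo "$SHORT"    # 1.21

# 2. paste - - - reads stdin three lines at a time and joins each
#    triple with tabs: one row per disk (index, device, interface).
printf '%s\n' 1 local-ssd-0 NVME 2 local-ssd-1 NVME | paste - - -
```

This is why the minor-version check in step 4 passes for any patch release of the same minor, while the node pool check in step 8 requires an exact match.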
What’s Next
The next step is to restrict auto-upgrades for your GKE cluster.