Managing Nodes on VMware vSphere
This document explains how to manage worker nodes on VMware vSphere after the baseline cluster is running. Node lifecycle operations are managed through VSphereResourcePool, VSphereMachineTemplate, KubeadmConfigTemplate, and MachineDeployment resources.
Prerequisites
Before you begin, ensure the following conditions are met:
- The workload cluster was created successfully. See Creating Clusters on VMware vSphere.
- The worker CAPV static allocation pool has enough available slots.
- The control plane is healthy and reachable.
- You know which manifest files currently define the worker nodes.
Steps
Scale out worker nodes
When you add more worker nodes, update the worker static allocation pool before you increase the replica count.
1. Add one or more new node slots to `03-vsphereresourcepool-worker.yaml`.
2. Update `replicas` in `30-workers-md-0.yaml`.
3. Apply the updated manifests in that order, pool first.
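As a minimal sketch, assuming `kubectl` targets the management cluster and the manifests are in the current directory:

```shell
# Apply the enlarged allocation pool first so the new slots exist
kubectl apply -f 03-vsphereresourcepool-worker.yaml

# Then apply the MachineDeployment with the increased replica count
kubectl apply -f 30-workers-md-0.yaml
```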
Note: If MachineDeployment.spec.replicas is greater than the number of available slots in VSphereResourcePool.spec.resources[], the new worker nodes cannot be assigned correctly.
Roll out updated worker node configuration
When you need to change worker VM specifications, create a new VSphereMachineTemplate and update the MachineDeployment to reference it. This triggers a rolling update that replaces worker nodes with the new configuration.
Templates are immutable
VSphereMachineTemplate resources cannot be modified in-place. You must create a new template with a new name and update the reference in MachineDeployment. For more information, refer to the Cluster API documentation.
Typical changes include:
- VM template name (`spec.template.spec.template`)
- CPU or memory sizing (`numCPUs`, `memoryMiB`)
- System disk or data disk layout (`diskGiB`, `dataDisks`)
1. Export the existing template.
2. Modify the template.
   Edit `new-worker-template.yaml`:
   - Set `metadata.name` to a new unique name (for example, `<cluster_name>-worker-v2`).
   - Update the desired specification fields.
   - Remove server-generated fields: `metadata.resourceVersion`, `metadata.uid`, `metadata.generation`, `metadata.creationTimestamp`, `metadata.managedFields`, `metadata.annotations["kubectl.kubernetes.io/last-applied-configuration"]`, and `status`.
3. Apply the new template.
4. Update the MachineDeployment reference.
   If you also need to change bootstrap settings, see Updating Bootstrap Templates below.
5. Monitor the rolling update.
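Taken together, the steps above can be sketched as the following `kubectl` sequence. The object names (`<cluster_name>-worker`, `<cluster_name>-md-0`) and the namespace are placeholders, not names defined by this document:

```shell
# 1. Export the existing worker template
kubectl -n <namespace> get vspheremachinetemplate <cluster_name>-worker -o yaml \
  > new-worker-template.yaml

# 2. Edit new-worker-template.yaml: set a new metadata.name (for example
#    <cluster_name>-worker-v2), change the desired spec fields, and delete
#    the server-generated metadata fields and status.

# 3. Apply the new template
kubectl apply -f new-worker-template.yaml

# 4. Point the MachineDeployment at the new template to trigger the rollout
kubectl -n <namespace> patch machinedeployment <cluster_name>-md-0 --type merge \
  -p '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"<cluster_name>-worker-v2"}}}}}'

# 5. Watch the old machines being replaced
kubectl -n <namespace> get machines -w
```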
Rolling back a failed update
If the rolling update fails (for example, new VMs fail to boot), revert the MachineDeployment reference back to the previous template name. The old template still exists and Cluster API will roll back to it.
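For example, assuming the previous template was named `<cluster_name>-worker` (a placeholder):

```shell
# Point the MachineDeployment back at the previous, still-existing template
kubectl -n <namespace> patch machinedeployment <cluster_name>-md-0 --type merge \
  -p '{"spec":{"template":{"spec":{"infrastructureRef":{"name":"<cluster_name>-worker"}}}}}'
```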
Verify worker node status
Run the following commands to verify the management-cluster and workload-cluster status:
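A typical check, assuming the workload cluster kubeconfig was exported to `<cluster_name>.kubeconfig` (a placeholder path), looks like:

```shell
# On the management cluster: MachineDeployment and Machine status
kubectl -n <namespace> get machinedeployment,machines

# On the workload cluster: confirm the nodes join and become Ready
kubectl --kubeconfig <cluster_name>.kubeconfig get nodes
```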
Confirm the following results:
- The target worker replica count is reached.
- Every new worker node joins the cluster.
- The nodes eventually become `Ready`.
Updating Bootstrap Templates
KubeadmConfigTemplate resources are also immutable. Changes to an existing template do not trigger rollouts of existing machines. To update bootstrap configuration, create a new template and update the MachineDeployment reference.
1. Export the existing template.
2. Modify the template.
   Edit `new-bootstrap-template.yaml`:
   - Set `metadata.name` to a new unique name (for example, `<cluster_name>-worker-bootstrap-v2`).
   - Update the desired bootstrap configuration fields.
   - Remove the same server-generated fields listed in step 2 of Roll out updated worker node configuration.
3. Apply the new template.
4. Update the MachineDeployment reference.
The Cluster API controller triggers a rolling update. Existing machines continue using the old bootstrap configuration until they are replaced.
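The bootstrap rollout mirrors the infrastructure-template rollout. As a sketch, with placeholder object names and namespace:

```shell
# 1. Export the current bootstrap template
kubectl -n <namespace> get kubeadmconfigtemplate <cluster_name>-worker-bootstrap -o yaml \
  > new-bootstrap-template.yaml

# 2. Edit new-bootstrap-template.yaml (new metadata.name, updated bootstrap
#    fields, server-generated fields removed), then apply it
kubectl apply -f new-bootstrap-template.yaml

# 3. Point the MachineDeployment bootstrap reference at the new template
kubectl -n <namespace> patch machinedeployment <cluster_name>-md-0 --type merge \
  -p '{"spec":{"template":{"spec":{"bootstrap":{"configRef":{"name":"<cluster_name>-worker-bootstrap-v2"}}}}}}'
```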
Upgrading Kubernetes version? See Upgrading Clusters on VMware vSphere for the full control plane and worker upgrade workflow.
Troubleshooting
Use the following checks first when worker node management fails:
- Check `VSphereMachine` conditions for `ResourcePoolReady`. If `False`, the reason indicates why slot allocation failed:
  - `PoolBoundToOtherConsumer`: the pool is already bound to a different `KubeadmControlPlane` or `MachineDeployment`.
  - `NoAvailableSlots`: no slots match the required datacenter or failure domain.
- Verify that the worker CAPV static allocation pool still has free slots.
- Verify that the worker IP addresses, gateway, and DNS settings are correct.
- Verify that the worker VM template still matches the required Kubernetes version and guest-tools requirements.
- Check `VSphereVM.status.addresses` when a node is waiting for IP allocation.
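The checks above can be run along these lines (resource names are placeholders):

```shell
# Inspect slot-allocation conditions on a specific worker machine
kubectl -n <namespace> describe vspheremachine <machine_name>

# Check whether the VM has been allocated addresses yet
kubectl -n <namespace> get vspherevm <vm_name> -o jsonpath='{.status.addresses}'
```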
Next Steps
If you need to change worker networking, placement, or disk topology, continue with Extending a VMware vSphere Cluster Deployment.