How to delete a failed TKGm or Native k8s cluster in CSE?

In CSE 3.1.1, a delete operation on a cluster (Native or TKGm) that is in an error state (RDE.state = RESOLUTION_ERROR, or status.phase ending in :FAILED) may fail with Bad Request (400), or the delete may remain stuck in the ‘DELETE:IN_PROGRESS’ state. The steps below resolve the issue.

Step1: Assign the API Explorer privilege to the CSE service account

Log in to the VCD Provider portal as Administrator.

Navigate to Administration > Provider Access Control > Roles > CSE Service Role.

Edit the CSE Service Role and add the API Explorer privilege.

In the tenant portal, check whether there are any stale vApp entries for the failed clusters. If so, delete them.

Step2: Get the ID of the failed cluster with the vcd CLI

# vcd cse cluster list
Name        Org             Owner     VDC            K8s Runtime    K8s Version            Status
----------  --------------  --------  -------------  -------------  ---------------------  ------------------
test-tkg-1  CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg         CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkgtest     CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg-test    CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg-test3   CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS

# vcd cse cluster info tkg-test3 | grep uid
  uid: urn:vcloud:entity:cse:nativeCluster:9364bf18-0faa-49ce-8be7-7e92af692d1b
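When several clusters need cleaning up, the uid can be pulled out of the `vcd cse cluster info` output programmatically. The following is a minimal sketch, not part of CSE: the function name and the URN pattern are assumptions based on the sample output above.

```python
import re

# Assumed shape of a defined-entity URN, based on the sample output above:
# urn:vcloud:entity:<vendor>:<type>:<36-character UUID>
URN_PATTERN = re.compile(
    r"urn:vcloud:entity:[A-Za-z0-9_.\-]+:[A-Za-z0-9_.\-]+:[0-9a-fA-F\-]{36}"
)


def extract_cluster_uid(info_output: str) -> str:
    """Return the URN found on the first 'uid:' line of the given text."""
    for line in info_output.splitlines():
        # Split "uid: urn:vcloud:..." at the first colon only.
        key, _, value = line.strip().partition(":")
        if key == "uid":
            match = URN_PATTERN.search(value)
            if match:
                return match.group(0)
    raise ValueError("no uid line with a defined-entity URN found")


if __name__ == "__main__":
    sample = "  uid: urn:vcloud:entity:cse:nativeCluster:9364bf18-0faa-49ce-8be7-7e92af692d1b"
    print(extract_cluster_uid(sample))
```

The text passed in would be the captured output of `vcd cse cluster info <cluster name>`.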

Step3: Run a GET call to check the status

Log in to the VCD Provider portal with the CSE service account that has the CSE Service Role assigned.
Open API Explorer.
Click GET in the definedEntity section.

Click Try it out.
In the Description column, enter the cluster UID from the previous step.
In the response, the state is shown as PRE_CREATED.
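The same GET call can be issued outside API Explorer. The sketch below only builds the request and reads the state from a response body; it does not send anything. It assumes the `/cloudapi` URL prefix, an API version of 36.0, a bearer token obtained separately for the CSE service account, and a top-level `state` field in the returned JSON. The host name and function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "36.0"  # assumption: use the API version of your VCD release


def build_get_entity_request(host: str, token: str, entity_id: str) -> urllib.request.Request:
    """Build (but do not send) GET /cloudapi/1.0.0/entities/{id}."""
    # Keep the colons of the URN unescaped in the path.
    path_id = urllib.parse.quote(entity_id, safe=":")
    url = f"https://{host}/cloudapi/1.0.0/entities/{path_id}"
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": f"application/json;version={API_VERSION}",
            "Authorization": f"Bearer {token}",
        },
    )


def entity_state(response_body: str) -> str:
    """Read the top-level 'state' field (assumed name) from the entity JSON."""
    return json.loads(response_body)["state"]
```

The request can then be sent with `urllib.request.urlopen(request)` and the decoded body passed to `entity_state`.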

Step4: Run the POST resolve call

Select the POST call from the definedEntity section.

/1.0.0/entities/{id}/resolve
Validates the defined entity against the entity type schema.

Provide the cluster ID and run the call. The state changes to RESOLVED.


Step5: Run the DELETE call to delete the RDE

Provide the cluster ID and ‘false’ as the value for invokeHooks.
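For reference, the resolve and delete calls can be expressed as plain HTTP requests. This is a minimal sketch that builds the two requests without sending them. It assumes the `/cloudapi` URL prefix, an API version of 36.0, and a bearer token obtained separately for the CSE service account; the helper names and host are hypothetical. Only the path suffix `/resolve` and the `invokeHooks` parameter come from the steps above.

```python
import urllib.parse
import urllib.request

API_VERSION = "36.0"  # assumption: use the API version of your VCD release


def _entity_request(host, token, entity_id, method, suffix="", query=None):
    """Build a request against /cloudapi/1.0.0/entities/{id}[suffix][?query]."""
    path_id = urllib.parse.quote(entity_id, safe=":")
    url = f"https://{host}/cloudapi/1.0.0/entities/{path_id}{suffix}"
    if query:
        url += "?" + urllib.parse.urlencode(query)
    return urllib.request.Request(
        url,
        method=method,
        headers={
            "Accept": f"application/json;version={API_VERSION}",
            "Authorization": f"Bearer {token}",
        },
    )


def build_resolve_request(host, token, entity_id):
    """POST .../entities/{id}/resolve: validate the entity against its schema."""
    return _entity_request(host, token, entity_id, "POST", suffix="/resolve")


def build_delete_request(host, token, entity_id):
    """DELETE .../entities/{id}?invokeHooks=false: remove the RDE without hooks."""
    return _entity_request(
        host, token, entity_id, "DELETE", query={"invokeHooks": "false"}
    )
```

Send the resolve request first and the delete request second, for example with `urllib.request.urlopen(request)`.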

Confirm that the failed cluster has been deleted.

# vcd cse cluster list
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Name      Org             Owner     VDC            K8s Runtime    K8s Version            Status
--------  --------------  --------  -------------  -------------  ---------------------  ------------------
tkg       CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkgtest   CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg-test  CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
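The check against the list output can also be scripted. The sketch below is a hypothetical helper, not part of the vcd CLI. It assumes the table layout shown above: any warning lines and the header come first, followed by a row of dashes and then one cluster per line with the name in the first column.

```python
from typing import List


def cluster_names(list_output: str) -> List[str]:
    """Return the first-column names from `vcd cse cluster list` output."""
    names = []
    in_table = False
    for line in list_output.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        # The separator row consists only of dashes and spaces.
        if set(stripped) <= {"-", " "}:
            in_table = True
            continue
        # Lines before the separator (warnings, header) are ignored.
        if in_table:
            names.append(stripped.split()[0])
    return names


def is_deleted(list_output: str, cluster_name: str) -> bool:
    """True if the cluster no longer appears in the list output."""
    return cluster_name not in cluster_names(list_output)
```

Calling `is_deleted(output, "tkg-test3")` on the captured list output reports whether that cluster is gone.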