How to validate a TKGm cluster?

Follow the steps below to validate a TKGm cluster deployed through VMware Container Service Extension (CSE).

Step 1 : Download kubeconfig file

  • Download the kubeconfig file to a Windows machine that has access to the Kubernetes cluster.
  • Create a folder named .kube under $HOME.

$HOME\.kube

  • Copy the downloaded kubeconfig file to the .kube folder.
  • Rename the file to ‘config’ without any extension.
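
The folder and file steps above can also be scripted. The following is a minimal sketch, assuming a Bash-compatible shell is available (for example Git Bash on the Windows machine). install_kubeconfig is a hypothetical helper name, and the source path is wherever the kubeconfig was downloaded.

```shell
#!/usr/bin/env bash
# Sketch: place a downloaded kubeconfig where kubectl looks for it by default.
# install_kubeconfig is a hypothetical helper, not part of kubectl or CSE.
install_kubeconfig() {
  local src="$1"                    # path to the downloaded kubeconfig file
  local dest_dir="${HOME}/.kube"    # kubectl's default configuration folder

  [ -f "$src" ] || { echo "kubeconfig not found: $src" >&2; return 1; }

  mkdir -p "$dest_dir"              # create $HOME/.kube if it is missing
  cp "$src" "${dest_dir}/config"    # the file must be named 'config', no extension
  chmod 600 "${dest_dir}/config"    # keep the cluster credentials private
  echo "${dest_dir}/config"         # print the final location
}
```

For example, `install_kubeconfig ~/Downloads/kubeconfig-tkg.txt` copies that file to $HOME/.kube/config. The file name in this example is illustrative only.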

Step 2 : Download kubectl

  • Download kubectl for Windows from
https://dl.k8s.io/release/v1.22.0/bin/windows/amd64/kubectl.exe
  • Create the folder $HOME\kubectl and copy kubectl.exe to it. Add the folder to the ‘Path’ User variable in Environment Variables.

Run kubectl from a new command prompt to confirm that it is found on the Path.
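
A quick way to confirm that both kubectl and the kubeconfig work is to query the cluster. This is a minimal sketch, assuming a Bash-compatible shell; validate_cluster is a hypothetical helper name, while the kubectl subcommands it calls are standard.

```shell
# Sketch: confirm kubectl is on the PATH and can reach the cluster.
# validate_cluster is a hypothetical helper name.
validate_cluster() {
  command -v kubectl >/dev/null 2>&1 || { echo "kubectl not found on PATH" >&2; return 1; }
  kubectl version --client || return 1   # the client binary runs
  kubectl cluster-info || return 1       # the API server from the kubeconfig answers
  kubectl get nodes -o wide              # lists control plane and worker nodes
}
```

Running `validate_cluster` should end with a node list in which every node reports the Ready status.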

Step 3: Run a ‘hello world’ application in the cluster.

Follow the steps in the following article to deploy a Hello World application in the K8s cluster you created.

Exposing an External IP Address to Access an Application in a Cluster | Kubernetes

Note: The article exposes the deployment with --type=LoadBalancer. Use --type=NodePort instead:

kubectl expose deployment hello-world --type=NodePort --name=my-service
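
Once the service is exposed as NodePort, the application is reached on a node IP and the port Kubernetes assigned. The sketch below builds that URL. It assumes the hello-world deployment from the article already exists and that the workstation can reach the node's internal IP; hello_world_url is a hypothetical helper name.

```shell
# Sketch: expose hello-world as NodePort and print the URL to test it.
# hello_world_url is a hypothetical helper name.
hello_world_url() {
  local node_port node_ip

  # Create the NodePort service (status message goes to stderr so stdout stays clean).
  kubectl expose deployment hello-world --type=NodePort --name=my-service >&2 || return 1

  # Port that Kubernetes allocated on every node for this service.
  node_port="$(kubectl get service my-service -o jsonpath='{.spec.ports[0].nodePort}')"

  # Internal IP of the first node in the cluster.
  node_ip="$(kubectl get nodes -o jsonpath='{.items[0].status.addresses[?(@.type=="InternalIP")].address}')"

  echo "http://${node_ip}:${node_port}"
}
```

For example, `curl "$(hello_world_url)"` should return the Hello World response from one of the pods.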

How to create an NSX-T routed network in VCD for Tanzu Kubernetes Grid (TKG) clusters?

Follow the steps below to configure the network in VCD for deploying TKG clusters.

Add the public IP to the Static IP Pool of T0 GW

  • Log in to the VCD Provider portal.
  • Navigate to Resources > Cloud Resources > Tier-0 Gateways.
  • Select the T0 Gateway.
  • Select ‘Network Specification’.
  • Click Edit.
  • Add the Public IP(s) to the ‘Static IP Pools’.

Create Edge Gateway (T1 Router)

  • Log in to the VCD Provider portal.
  • Navigate to Resources > Cloud Resources > Edge Gateways.
  • Select New
  • Select the Org VDC and click Next
  • Provide a name for the Edge.
  • Select the appropriate T0 Gateway
  • Choose the appropriate Edge Cluster option for your environment.
  • Assign the Public IP for SNAT as Primary IP
  • Click Next, review the settings and click Finish.

Create Organization Network

  • From the Provider portal, select the Test organization.
  • Navigate to Networking > Networks.
  • Click New.
  • Select the Org VDC.
  • Select Network Type ‘Routed’.
  • Select the Edge Gateway (T1).
  • Provide the Name and Gateway CIDR.
  • Provide a DNS server that is accessible from the Org Network being created. The DNS server should be able to resolve FQDNs in the public domain/internet.
  • Click Next, review the settings and click Finish.

Create SNAT

  • From the Provider portal, select the Test organization.
  • Navigate to Networking > Edge Gateways
  • Select the Edge Gateway (T1)
  • Navigate to Services > NAT
  • Click New
  • Provide the details as mentioned in the screenshot.

Modify default Firewall rule

  • From the Provider portal, select the Test organization.
  • Navigate to Networking > Edge Gateways
  • Select the Edge Gateway (T1)
  • Navigate to Services > Firewall
  • Select ‘Edit Rules’
  • Select the ‘default_rule’
  • Click Edit.
  • Select Allow as Action.

How to run VMware Container Service Extension (CSE) as a Linux service?

After installing CSE, follow the steps below to run it as a service.

Create cse.sh file

Create the cse.sh file. You can copy the following code or create a new file based on the following link.
container-service-extension/cse.sh at master · vmware/container-service-extension (github.com)

# vi /opt/vmware/cse/cse.sh
#!/usr/bin/env bash
export CSE_CONFIG=/opt/vmware/cse/encrypted-config.yaml
export CSE_CONFIG_PASSWORD=<passwd>
cse run

Copy encrypted-config.yaml to the /opt/vmware/cse directory.

Change the file permission

chmod +x /opt/vmware/cse/cse.sh

Create cse.service file

Create the cse.service file. You can copy the following content or create a new file based on the following link.
container-service-extension/cse.service at master · vmware/container-service-extension (github.com)

# vi /etc/systemd/system/cse.service
[Unit]
Description=Container Service Extension for VMware Cloud Director

[Service]
ExecStart=/opt/vmware/cse/cse.sh
User=root
WorkingDirectory=/opt/vmware/cse
Type=simple
Restart=always

[Install]
WantedBy=default.target

Enable and start the service

# systemctl enable cse.service
# systemctl start cse.service

Check the service status

# systemctl status cse.service
  cse.service - Container Service Extension for VMware Cloud Director
   Loaded: loaded (/etc/systemd/system/cse.service; enabled; vendor preset: disabled)
   Active: active (running) since Wed 2021-11-24 14:43:56 +01; 1min 9s ago
 Main PID: 770 (bash)
   CGroup: /system.slice/cse.service
           ├─770 bash /opt/vmware/cse/cse.sh
           └─775 /usr/local/bin/python3.7 /usr/local/bin/cse run

Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Validating CSE installation according to config file
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: MQTT extension and API filters found
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Found catalog 'cse-site1-k8s'
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: CSE installation is valid
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Started thread 'MessageConsumer' (140229531580160)
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Started thread 'ConsumerWatchdog' (140229523187456)
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Container Service Extension for vCloud Director
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Server running using config file: /opt/vmware/cse/encrypted-config.yaml
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: Log files: /root/.cse-logs/cse-server-info.log, /root/.cse-logs/cse-server-debug.log
Nov 24 14:44:06 cse01.lab.com cse.sh[770]: waiting for requests (ctrl+c to close)
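
For a scripted check instead of reading the status output by eye, systemctl can report the enabled and active states directly. This is a minimal sketch; cse_service_ok is a hypothetical helper name, and it assumes the unit is named cse.service as above.

```shell
# Sketch: verify that cse.service starts at boot and is currently running.
# cse_service_ok is a hypothetical helper name.
cse_service_ok() {
  systemctl is-enabled --quiet cse.service || { echo "cse.service is not enabled" >&2; return 1; }
  systemctl is-active --quiet cse.service || { echo "cse.service is not running" >&2; return 1; }
  echo "cse.service is enabled and running"
}
```

If the check fails, `journalctl -u cse.service -n 50 --no-pager` shows the most recent log lines written by the service.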

How to enable TKG in Container Service Extension (CSE) 3.1.1?

Step1: Download the TKG OVA

Starting with CSE 3.1.1, providers can import Ubuntu 20.04 based VMware Tanzu Kubernetes Grid OVAs into VCD through the CSE server CLI. The following link provides the different TKG templates. Download the template with the K8s version you need.
Note: Ubuntu 20.04 Kubernetes OVAs from VMware Tanzu Kubernetes Grid versions 1.4.0, 1.3.1, and 1.3.0 are supported.
Kubernetes OVAs for VMware Tanzu Kubernetes Grid 1.4.0 are available here


I’ve downloaded the Ubuntu 20.04 Kubernetes v1.21.2 OVA since that’s the latest available version.
File Name : ubuntu-2004-kube-v1.21.2+vmware.1-tkg.1-7832907791984498322.ova

Step2: Import TKG OVA to VCD Catalog

Upload the downloaded OVA to the CSE server. Use the following command to import the OVA into the catalog.

# cse template import -c encrypted-config.yaml -F ubuntu-2004-kube-v1.21.2+vmware.1-tkg.1-7832907791984498322.ova

Required Python version: >= 3.7.3
Installed Python version: 3.7.12 (default, Nov 23 2021, 15:49:55)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)]
Password for config file decryption:
Decrypting 'encrypted-config.yaml'
Validating config file 'encrypted-config.yaml'
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Connected to vCloud Director (vcd.lab.com:443)
Connected to vCenter Server 'demovc.local' as '[email protected]' (demovc.local)
Config file 'encrypted-config.yaml' is valid
Uploading 'ubuntu-2004-kube-v1.21.2+vmware.1-tkg.1-7832907791984498322' to catalog 'cse-site1-k8s'
Uploaded 'ubuntu-2004-kube-v1.21.2+vmware.1-tkg.1-7832907791984498322' to catalog 'cse-site1-k8s'
Writing metadata onto catalog item ubuntu-2004-kube-v1.21.2+vmware.1-tkg.1-7832907791984498322.
Successfully imported TKGm OVA.

Step3: Restart CSE service.

I assume you’ve configured CSE to run as a service. If yes, restart the service.

Step4: Confirm TKG is available as an option for Kubernetes Runtime

Log in to the tenant portal and navigate to More > Kubernetes Container Clusters.

Click New and check that TKG is offered as a Kubernetes runtime.

How to delete a failed TKGm or Native K8s cluster in CSE?

In CSE 3.1.1, a delete operation on a cluster (Native or TKG) that is in an error state (RDE.state = RESOLUTION_ERROR, or a status.phase ending in :FAILED) may fail with Bad Request (400), or the delete process may get stuck in the ‘DELETE:IN_PROGRESS’ state. The steps below resolve the issue.

Step1: Assign API explorer privilege to the CSE Service Account.

Log in to the VCD Provider portal as Administrator.

Navigate to Administration > Provider Access Control > Roles > CSE Service Role.

Edit the CSE Service Role and add the API Explorer privilege.

In the tenant portal, check whether there are any stale vApp entries for the failed clusters. If so, delete them.

Step2: Get the failed cluster ID through vcd cli

# vcd cse cluster list
Name        Org             Owner     VDC            K8s Runtime    K8s Version            Status
----------  --------------  --------  -------------  -------------  ---------------------  ------------------
test-tkg-1  CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg         CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkgtest     CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg-test    CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg-test3   CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS

# vcd cse cluster info tkg-test3 | grep uid
  uid: urn:vcloud:entity:cse:nativeCluster:9364bf18-0faa-49ce-8be7-7e92af692d1b
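
To reuse the ID in later steps without copying it by hand, the URN can be pulled out of the uid line. This is a small sketch; cluster_uid is a hypothetical helper name, and it assumes the output keeps the "uid: <urn>" layout shown above.

```shell
# Sketch: read `vcd cse cluster info` output on stdin and print only the URN.
# cluster_uid is a hypothetical helper name.
cluster_uid() {
  # Match the line whose first field is "uid:" and print its second field.
  awk '$1 == "uid:" { print $2; exit }'
}
```

For example, `CLUSTER_ID="$(vcd cse cluster info tkg-test3 | cluster_uid)"` stores the URN in a shell variable.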

Step3: Run the GET call to check the status

Log in to the VCD Provider portal with the CSE service account that has the CSE Service Role assigned.
Open API Explorer.
Click GET in the definedEntity section.

Click Try it out.
In Description, provide the cluster UID from the previous step.
In the output, the state is shown as PRE_CREATED.

Step4: Run the POST resolve call to resolve the entity

Select the POST call from the definedEntity section.

/1.0.0/entities/{id}/resolve
Validates the defined entity against the entity type schema.

Provide the cluster ID and run the call. The state changes to RESOLVED.

Step5: Run the DELETE call to delete the RDE.

Provide the cluster ID and ‘false’ as the value for invokeHooks.
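
The same GET, resolve and DELETE calls can be issued with curl instead of API Explorer. The sketch below rests on several assumptions that are not in the steps above: the API Explorer paths are served under /cloudapi on the VCD host, VCD_TOKEN holds a bearer token for the CSE service account obtained separately, and API version 36.0 matches your VCD release. The rde_* function names are hypothetical.

```shell
# Sketch: call the defined entity (RDE) endpoints with curl.
# Assumptions: /cloudapi base path, bearer token in VCD_TOKEN, API version 36.0.
VCD_HOST="${VCD_HOST:-vcd.lab.com}"          # VCD host name from the sample output
VCD_API_VERSION="${VCD_API_VERSION:-36.0}"   # adjust to your VCD release

# Build the entity URL; the optional second argument is a path or query suffix.
rde_url() {
  printf 'https://%s/cloudapi/1.0.0/entities/%s%s' "$VCD_HOST" "$1" "${2:-}"
}

# Send one request with the headers the VCD OpenAPI endpoints expect.
rde_call() {
  local method="$1" url="$2"
  curl --silent --show-error --request "$method" \
    --header "Accept: application/json;version=${VCD_API_VERSION}" \
    --header "Authorization: Bearer ${VCD_TOKEN}" \
    "$url"
}

rde_get()     { rde_call GET    "$(rde_url "$1")"; }                        # check the state
rde_resolve() { rde_call POST   "$(rde_url "$1" /resolve)"; }               # move to RESOLVED
rde_delete()  { rde_call DELETE "$(rde_url "$1" '?invokeHooks=false')"; }   # delete without hooks
```

With the cluster URN in CLUSTER_ID, the sequence would be `rde_get "$CLUSTER_ID"`, then `rde_resolve "$CLUSTER_ID"`, then `rde_delete "$CLUSTER_ID"`. If VCD uses a self-signed certificate, curl needs the CA certificate supplied with --cacert.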

Check and confirm that the failed cluster is now deleted.

# vcd cse cluster list
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised.
Name      Org             Owner     VDC            K8s Runtime    K8s Version            Status
--------  --------------  --------  -------------  -------------  ---------------------  ------------------
tkg       CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkgtest   CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS
tkg-test  CSE-Site1-Test  orgadmin  CSE-TEST-OVDC  TKGm           TKGm v1.21.2+vmware.1  DELETE:IN_PROGRESS