Kubernetes Deployment#

Overview#

This guide demonstrates deploying the Portal Sample on Azure Kubernetes Service (AKS), providing a Kubernetes-based alternative to the standalone deployment. If you are using a different Kubernetes service, use this information as guidance and adapt the steps to suit your configuration.

This document covers deploying:

  • An AKS cluster with managed NGINX ingress

  • The Portal Sample frontend and backend as containerized workloads

  • TLS certificates via cert-manager and Let’s Encrypt

Considerations#

To deploy the Portal Sample on Kubernetes, you must:

  • Define an FQDN for the portal endpoint

  • Configure DNS records to resolve to the ingress external IP

  • Obtain TLS certificates for HTTPS access

The specific steps vary based on your organization’s DNS infrastructure, certificate authority, and security policies. Consult your IT, DevOps, or Security team for guidance.

For evaluation and testing, this guide provides an alternative approach using:

  • nip.io for DNS resolution without configuring DNS zones

  • Let’s Encrypt with cert-manager for automated certificate provisioning

Warning

The following configuration is for demonstration purposes only and should not be used in production environments. Production deployments must use your organization’s DNS and certificate management infrastructure.

Prerequisites#

  • An Azure subscription with permissions to create AKS resources

  • A Linux workstation with Docker installed (used to build the portal sample containers)

  • An NGC Account with a Personal API Key (NVCF and Private Registry scopes enabled)

  • An OIDC identity provider configured

  • Nucleus server FQDN (if connecting to Omniverse assets)

Create an AKS Cluster#

If you already have an AKS cluster, proceed to Prepare the AKS Cluster.

  1. Navigate to: https://portal.azure.com/#browse/Microsoft.ContainerService%2FmanagedClusters.

  2. Click Kubernetes Cluster.

  3. Configure basic parameters of your cluster.

  4. Configure node pools. The Portal Sample has minimal resource requirements; a small node pool with 2-4 vCPUs and 4-8 GB of memory is sufficient.

  5. Accept the default values for the remaining configuration options, or modify them to align with your organization’s requirements.

  6. Create the cluster and wait for deployment to complete (this takes a few minutes).
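Alternatively, the cluster can be created with the Azure CLI instead of the portal. The following is a minimal sketch only; the resource group, region, and node VM size are placeholders chosen to match the sizing guidance above, so adjust them to your environment:

az group create --name <your resource group name> --location <your region>

az aks create \
  --resource-group <your resource group name> \
  --name <your cluster name> \
  --node-count 2 \
  --node-vm-size Standard_D2s_v3 \
  --generate-ssh-keys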

Prepare the AKS Cluster#

Install Azure CLI#

Skip this section if you already have Azure CLI installed.

These steps assume a Linux workstation or VM. Follow the Azure CLI installation guide.

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

Example output:

Preparing to unpack .../azure-cli_2.70.0-1~jammy_amd64.deb ...
Unpacking azure-cli (2.70.0-1~jammy) over (2.69.0-1~jammy) ...
Setting up azure-cli (2.70.0-1~jammy) ...

Log in to Your Azure Subscription#

az login --use-device-code

Example output:

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code <redacted> to authenticate.

Open a browser on any PC, navigate to https://microsoft.com/devicelogin, and enter the code provided by the az login --use-device-code command.

Complete the login/SSO procedure.

If your account has access to multiple tenants, select the appropriate one:

[Tenant and subscription selection]

No   Subscription name      Subscription ID                        Tenant
---  --------------------   ------------------------------------   ------------------
[1]  <redacted>             <redacted>                             NVIDIA Corporation
[2]  <redacted>             <redacted>                             NVIDIA Corporation
[3]  <redacted>             <redacted>                             NVIDIA Corporation

The default is marked with an *; the default tenant is 'NVIDIA Corporation' and subscription is '<redacted>' (<redacted>).

Select a subscription and tenant (Type a number or Enter for no changes):

Select the right tenant and confirm selection.
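If you later need to confirm or switch the active subscription, the standard Azure CLI account commands can be used, for example:

az account show --output table
az account set --subscription "<subscription name or ID>"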

Install kubectl#

Skip this section if you already have kubectl installed.

Follow the kubectl installation guide.

curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
rm -rf kubectl

Verify kubectl:

kubectl version

Example output:

Client Version: v1.32.2
Kustomize Version: v5.5.0
Server Version: v1.30.9
WARNING: version difference between client (1.32) and server (1.30) exceeds the supported minor version skew of +/-1

Use Azure CLI to Retrieve kubeconfig#

Remove any existing .kube artifacts. This permanently deletes all existing Kubernetes configurations; if you need to preserve access to other clusters, back up the directory first.
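A minimal backup sketch, assuming the default ~/.kube location (the backup path is arbitrary):

cp -r ~/.kube ~/.kube.backup

Then remove the directory: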

rm -rf ~/.kube

List available AKS clusters:

az aks list --output table

Note the Name and ResourceGroup of the cluster that matches the one you’ve created or plan to use.

Get kubeconfig:

az aks get-credentials --admin --name <your cluster name> --resource-group <your resource group name>

Example output:

Merged "my-k8s-for-ov-admin" as current context in /home/USERNAME/.kube/config

Verify you have connection to the cluster’s API endpoint:

kubectl get nodes

Example output:

NAME                                STATUS   ROLES    AGE   VERSION
aks-agentpool-19968595-vmss000000   Ready    <none>   28m   v1.30.9

Install Helm#

Skip this section if you already have Helm installed.

Follow the Helm installation guide.

wget https://get.helm.sh/helm-v3.17.1-linux-amd64.tar.gz
tar -zxvf helm-v3.17.1-linux-amd64.tar.gz
sudo mv linux-amd64/helm /usr/local/bin/helm
rm -rf linux-amd64
rm -rf helm-v3.17.1-linux-amd64.tar.gz

Verify the Helm installation:

helm version

Example output:

version.BuildInfo{Version:"v3.17.1", GitCommit:"980d8ac1939e39138101364400756af2bdee1da5", GitTreeState:"clean", GoVersion:"go1.23.5"}

Install the Managed AKS Ingress#

Skip this section if you already have a custom Ingress implementation or the managed AKS Ingress is already installed.

Follow the AKS app routing guide.

Check that the AKS cluster has no IngressClasses defined:

kubectl get ingressClasses -A

Execute the Azure CLI command to install the managed AKS Ingress (this operation takes approximately 5 minutes):

az aks approuting enable --resource-group <your resource group> --name <your cluster name>

Check IngressClasses again:

kubectl get ingressClasses -A

Example output:

NAME                                 CONTROLLER                                 PARAMETERS   AGE
webapprouting.kubernetes.azure.com   webapprouting.kubernetes.azure.com/nginx   <none>       4m57s

Verify that the nginx service was created and an external IP address has been assigned:

kubectl get svc nginx -n app-routing-system

Example output:

NAME    TYPE           CLUSTER-IP    EXTERNAL-IP     PORT(S)                      AGE
nginx   LoadBalancer   10.0.247.55   4.255.101.179   80:30142/TCP,443:30119/TCP   5m41s

Visit the EXTERNAL-IP with curl or navigate to it in a web browser:

curl http://<your EXTERNAL-IP>

Example output:

<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Create a DNS Record#

This is where you need to use your organization’s methods to define a portal FQDN, configure DNS zones, and request TLS certificates.

NVIDIA cannot provide prescriptive guidance on how this is done for each particular case. The rest of this guide focuses on a deployment example that doesn’t require any external dependencies.

Warning

Do not use the guide below as a solution for production environments.

Get a DNS Mapping for the Ingress External IP#

We will use the https://nip.io/ service to provide a DNS mapping for the external IP without needing to edit any DNS zones. This is achieved by constructing an FQDN that contains the target IP address.

For example: app.10.8.0.1.nip.io maps to 10.8.0.1

Construct an FQDN using the pattern: ov-portal.<EXTERNAL-IP>.nip.io
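If you prefer to script this step, the FQDN can be derived directly from the ingress service. This is a minimal sketch; MY_OV_PORTAL_FQDN is the same variable exported later in this guide:

EXTERNAL_IP=$(kubectl get svc nginx -n app-routing-system \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
export MY_OV_PORTAL_FQDN="ov-portal.${EXTERNAL_IP}.nip.io"
echo $MY_OV_PORTAL_FQDN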

Use curl to verify you can reach the ingress IP via the domain name.

HTTP:

curl http://ov-portal.4.255.101.179.nip.io
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

HTTPS:

curl -k https://ov-portal.4.255.101.179.nip.io
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>

Set Up Infrastructure to Issue Public Certificates#

This section follows the DNS approach established in the previous section. It uses cert-manager as the Kubernetes automation tool and Let’s Encrypt as the certificate issuer.

Note

The DNS and certificate configuration below uses nip.io and Let’s Encrypt for demonstration purposes only. For production deployments, use your organization’s DNS and certificate management infrastructure.

Install cert-manager#

Follow the cert-manager Helm installation guide.

helm repo add jetstack https://charts.jetstack.io --force-update

Example output:

"jetstack" has been added to your repositories
helm install \
  cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.17.0 \
  --set crds.enabled=true

Example output:

NAME: cert-manager
LAST DEPLOYED: Tue Mar 11 16:11:35 2025
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.17.0 has been deployed successfully!

In order to begin issuing certificates, you will need to set up a ClusterIssuer
or Issuer resource (for example, by creating a 'letsencrypt-staging' issuer).

More information on the different types of issuers and how to configure them
can be found in our documentation:

https://cert-manager.io/docs/configuration/

For information on how to configure cert-manager to automatically provision
Certificates for Ingress resources, take a look at the `ingress-shim`
documentation:

https://cert-manager.io/docs/usage/ingress/
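Before continuing, verify that the cert-manager pods are running:

kubectl get pods -n cert-manager

The cert-manager, cainjector, and webhook pods should all report a Running status.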

Create a Kubernetes Namespace for Your OV Portal Installation#

kubectl create namespace ov-portal

Example output:

namespace/ov-portal created

Configure cert-manager Issuers#

Skip this section if you already have an Issuer or ClusterIssuer, or if you don't plan to use cert-manager/Certbot.

Follow the cert-manager configuration guide.

Note

You must specify a valid email in the configurations below. If you use the default example email, Let’s Encrypt might refuse to issue a certificate.

As per the AKS Let’s Encrypt tutorial:

The email address is only used by Let’s Encrypt to remind you to renew the certificate 30 days before expiry. You will only receive this email if something goes wrong when renewing the certificate with cert-manager.

We create two issuers because the Let’s Encrypt production issuer has strict rate limits. When experimenting and learning, it’s easy to hit those limits. We’ll start with the Let’s Encrypt staging issuer, and once we’re confident it’s working, we’ll switch to the production issuer.

kubectl apply -n ov-portal -f - << EOF
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-staging
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: <REPLACE_WITH_VALID_EMAIL>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-staging
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            ingressClassName: webapprouting.kubernetes.azure.com
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # The ACME server URL
    server: https://acme-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: <REPLACE_WITH_VALID_EMAIL>
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-prod
    # Enable the HTTP-01 challenge provider
    solvers:
      - http01:
          ingress:
            ingressClassName: webapprouting.kubernetes.azure.com
EOF

Verify that the Issuers were created. They may report READY as False for a short time while ACME account registration completes; re-run the command until both show True:

kubectl get issuer -n ov-portal

Example output:

NAME                  READY   AGE
letsencrypt-prod      False   2m24s
letsencrypt-staging   False   44s
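If an Issuer remains in a READY False state for more than a few minutes, inspect its status conditions for ACME registration errors, for example:

kubectl describe issuer letsencrypt-staging -n ov-portal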

Alternative Approach#

You can import existing certificates, create a secret from them (inside the target namespace), and reference that secret in the ingress.tls.secretName OV portal Helm value (named web-streaming-example by default).

In this case, you don’t need cert-manager, issuers, or the cert-manager.io/issuer ingress annotation.
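For example, a minimal sketch of creating such a secret from existing certificate files (the file paths here are placeholders):

kubectl create secret tls web-streaming-example -n ov-portal \
  --cert=path/to/tls.crt \
  --key=path/to/tls.key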

Build, Package, and Upload the OV Portal Containers#

Obtain the OV Portal Source Code#

git clone https://github.com/NVIDIA-Omniverse/ov-dgxc-portal-sample

Create the OV on DGXC Portal Frontend and Backend Containers#

This guide uses the NGC Private Registry to store the OV Portal frontend and backend containers. You can use an alternative registry, such as Azure Container Registry, if preferred.

The following commands assume Docker is already authenticated to the NGC Private Registry. To verify or establish authentication, run:

docker login nvcr.io

Use $oauthtoken as the username and your NGC Personal API Key as the password.

Obtain your NCA_ID from your NGC profile page: https://org.ngc.nvidia.com/profile

Set the required environment variables, then build and push the portal container images:

export NCA_ID=<paste your NCA_ID here>
export DOCKER_REGISTRY=nvcr.io/$NCA_ID
export DOCKER_BUILDKIT=1

echo "Build backend"
docker build -t $DOCKER_REGISTRY/ov-portal-backend:latest -f backend/Dockerfile backend
docker push $DOCKER_REGISTRY/ov-portal-backend:latest

echo "Build web"
docker build -t $DOCKER_REGISTRY/ov-portal-frontend:latest -f web/Dockerfile --secret id=npmrc,src=./web/.npmrc web
docker push $DOCKER_REGISTRY/ov-portal-frontend:latest

These commands build and push the container images to the NGC Private Registry.
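As a quick sanity check before deploying, you can confirm that both images exist locally and were tagged with your registry prefix:

docker images | grep ov-portal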

Deploy the Sample Portal on AKS#

Export Variables#

export NVCF_TOKEN=<your NGC/NVCF Personal API Key with the NVIDIA Cloud Functions and Private Registry scopes enabled>
export MY_OV_PORTAL_FQDN=<your OV portal FQDN>
export MY_OV_NUCLEUS_FQDN=<your Nucleus server FQDN>

Create NGC Image Pull Secret#

kubectl create secret docker-registry ngc -n ov-portal --docker-server=nvcr.io --docker-username='$oauthtoken' --docker-password=$NVCF_TOKEN

Example output:

secret/ngc created

Deploy the Helm Chart#

Deploy the sample Helm chart with the required values:

helm upgrade --install web-streaming-example ./helm/web-streaming-example/. -f ./helm/web-streaming-example/values.yaml -n ov-portal --create-namespace\
  --set images.backend.repository="$DOCKER_REGISTRY/ov-portal-backend"\
  --set images.backend.tag=latest\
  --set images.web.repository="$DOCKER_REGISTRY/ov-portal-frontend"\
  --set images.web.tag=latest\
  --set imagePullSecrets[0].name=ngc\
  --set nvcf.key=$NVCF_TOKEN\
  --set ingress.className=webapprouting.kubernetes.azure.com\
  --set ingress.hosts[0].host=$MY_OV_PORTAL_FQDN\
  --set ingress.tls[0].hosts[0]=$MY_OV_PORTAL_FQDN\
  --set ingress.annotations."cert-manager\.io\/issuer"=letsencrypt-prod\
  --set config.auth.clientId="<your OIDC client ID>"\
  --set config.auth.redirectUri="https://$MY_OV_PORTAL_FQDN/openid"\
  --set config.auth.authority="<your OIDC authority URL>"\
  --set config.auth.metadataUri="<your OIDC well-known configuration URL>"\
  --set config.auth.jwksAlg="<your OIDC signing algorithm, e.g. RS256>"\
  --set config.auth.jwksTtl="<your OIDC JWKS validity time>"\
  --set config.auth.userinfoTtl="<your OIDC userinfo validity time>"\
  --set config.auth.scope="<your OIDC space-separated scopes>"\
  --set config.auth.adminGroup="<your IdP admin group name>"\
  --set config.endpoints.backend="https://$MY_OV_PORTAL_FQDN/api"\
  --set config.endpoints.nucleus="$MY_OV_NUCLEUS_FQDN"\
  --set config.csp="<Content Security Policy directives, or empty>"\
  --set config.maxAppInstancesCount=<your desired number of sessions per user per app>\
  --set certificate.create=false\
  --set clusterIssuer.create=false

Example with sample values:

export NVCF_TOKEN=<redacted>
export MY_OV_PORTAL_FQDN=ov-portal.<redacted>.nip.io
export MY_OV_NUCLEUS_FQDN=nucleus.<redacted>.com

helm upgrade --install web-streaming-example ./helm/web-streaming-example/. -f ./helm/web-streaming-example/values.yaml -n ov-portal --create-namespace\
  --set images.backend.repository="$DOCKER_REGISTRY/ov-portal-backend"\
  --set images.backend.tag=latest\
  --set images.web.repository="$DOCKER_REGISTRY/ov-portal-frontend"\
  --set images.web.tag=latest\
  --set imagePullSecrets[0].name=ngc\
  --set nvcf.key=$NVCF_TOKEN\
  --set ingress.className=webapprouting.kubernetes.azure.com \
  --set ingress.hosts[0].host=$MY_OV_PORTAL_FQDN\
  --set ingress.tls[0].hosts[0]=$MY_OV_PORTAL_FQDN\
  --set ingress.annotations."cert-manager\.io\/issuer"=letsencrypt-prod\
  --set config.auth.clientId="<redacted>"\
  --set config.auth.redirectUri="https://$MY_OV_PORTAL_FQDN/openid"\
  --set config.auth.authority="https://login.microsoftonline.com/<redacted>/v2.0"\
  --set config.auth.metadataUri="https://login.microsoftonline.com/<redacted>/v2.0/.well-known/openid-configuration"\
  --set config.auth.jwksAlg="RS256"\
  --set config.auth.jwksTtl=1000\
  --set config.auth.userinfoTtl=1000\
  --set config.auth.scope="openid profile email"\
  --set config.auth.adminGroup="034cc6ca-d3c9-429c-a207-xxxxxxxxxx"\
  --set config.endpoints.backend="https://$MY_OV_PORTAL_FQDN/api"\
  --set config.endpoints.nucleus="$MY_OV_NUCLEUS_FQDN"\
  --set config.csp=""\
  --set config.maxAppInstancesCount=2\
  --set certificate.create=false\
  --set clusterIssuer.create=false

Check that the certificate was issued (this may take some time, especially if you have exceeded the Let’s Encrypt production issuer rate limits):

kubectl get certificates -n ov-portal

Example output:

NAMESPACE   NAME                    READY   SECRET                  AGE
ov-portal   web-streaming-example   True    web-streaming-example   2m13s
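If the certificate does not reach READY within a few minutes, inspect the Certificate resource and any pending ACME challenges:

kubectl describe certificate web-streaming-example -n ov-portal
kubectl get challenges -n ov-portal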

Note

While iterating and experimenting, you might want to replace:

--set ingress.annotations."cert-manager\.io\/issuer"=letsencrypt-prod\

with:

--set ingress.annotations."cert-manager\.io\/issuer"=letsencrypt-staging\

This avoids hitting production rate limits during testing.
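Once the certificate has been issued with the production issuer, you can verify that the portal responds over HTTPS (add -k when testing against a staging certificate, which is not publicly trusted):

curl -I https://$MY_OV_PORTAL_FQDN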