Cloud Architecture

[Figure: Omniverse on DGX Cloud (OVC) architecture diagram]

As shown in the above diagram, the Omniverse on DGX Cloud (OVC) architecture consists of two interconnected networks. The first is the NVIDIA Access Virtual Network (VNet) in Azure Commercial, where the majority of OVC’s control plane services are deployed using Azure Kubernetes Service (AKS), Azure’s managed Kubernetes offering. The second hosts a separate Kubernetes cluster responsible for managing NVIDIA OVX Pods, which run either on Azure Dedicated or within NVIDIA’s own data center. To ensure the best performance, both clusters should be located in close geographic proximity.

To establish seamless connectivity between the NVIDIA Access VNet and Azure Dedicated, we rely on network peering. Azure VPN tunneling, with up to 2 Gbps of bandwidth, peers the OVX clusters in NVIDIA’s data center, while Azure Dedicated is peered through the Microsoft Smart Switch, which offers much higher throughput (up to 3.2 Tbps).

The AKS cluster houses the OVC control plane services, including storage, authentication, public stream API, streaming session, farm agent, jobs, metrics, retries, and settings services. Furthermore, static web content, encompassing the user portal and the WebRTC streaming client, is also hosted in this infrastructure for user access.

On Azure Dedicated, the focus is primarily on running data- and compute-intensive applications and services that make full use of OVC’s high-performance hardware, such as L40 GPUs and high-bandwidth networking. Notably, Kit-based applications such as Factory Explorer and USD Composer run on OVX nodes, serving either as interactive streaming applications or as batch applications that render high-resolution scenes.

To interact with the control plane services on AKS, controller agents known as controller services run on Azure Dedicated. For instance, the streaming controller service receives requests from the streaming session service running on AKS and orchestrates the spawning of Kit-based applications within Kubernetes on Azure Dedicated.

Because access to services and applications within Azure Dedicated is restricted, reverse proxy services running within AKS forward WebRTC streaming traffic from Azure Dedicated to end users. Specifically, when the streaming session service sends a request to launch a Kit-based application, a reverse proxy routing rule is established, enabling the streaming traffic from Azure Dedicated to be forwarded through the proxy to the end user.
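Conceptually, each routing rule maps a streaming session to the backend endpoint where its Kit-based application runs. The sketch below is purely illustrative; the names (`SessionRouter`, `register_session`, `resolve`) are hypothetical and do not reflect the actual proxy implementation:

```python
# Illustrative sketch only: the real routing rules live in the reverse
# proxy's own configuration. This just models the session-to-backend
# mapping that the streaming session service establishes per launch.
class SessionRouter:
    """Maps streaming session IDs to backend Kit application endpoints."""

    def __init__(self):
        self._routes = {}

    def register_session(self, session_id, backend_host, backend_port):
        # Called when the streaming session service launches a Kit app;
        # the proxy then forwards this session's traffic to that backend.
        self._routes[session_id] = (backend_host, backend_port)

    def resolve(self, session_id):
        # Look up the backend for an incoming streaming connection;
        # returns None if no route exists for this session.
        return self._routes.get(session_id)

    def unregister_session(self, session_id):
        # Tear down the route when the session ends.
        self._routes.pop(session_id, None)
```

When the session ends, the rule is removed so the proxy stops forwarding traffic for it.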

A distinctive feature of our architecture is the delivery of services, including streaming, over private and secured channels. To accomplish this, we have incorporated Azure’s Private Link Service (PLS), a managed service enabling private connections, into the NVIDIA Access VNet. Before onboarding its end users, each tenant must work with the OVC integration team to connect a private endpoint (PE) in their Azure commercial virtual network to the PLS of their dedicated NVIDIA Access VNet.

We assume that the tenant has already established network peering between their on-premises network and Azure VNet, utilizing site-to-site VPN tunneling or Azure ExpressRoute. By adopting this approach, the end-to-end OVC service traffic is kept within private channels and never needs to traverse the public Internet, providing users with direct and secure access to OVC services.

As part of the tenant-OVC integration, it is crucial for a fully qualified domain name (FQDN), such as tenant.cloud.omniverse.nvidia.com, to resolve to the IP address of this private endpoint. Once this resolution is properly set up, users can access our services, including streaming, by simply visiting this URL in their web browser and logging in.
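A tenant can verify this DNS setup with a short check along these lines; the FQDN and expected private endpoint address in the commented example are placeholders for the tenant’s own values:

```python
import socket

def resolves_to_private_endpoint(fqdn: str, expected_ip: str) -> bool:
    """Return True if the FQDN resolves to the private endpoint's IP address."""
    infos = socket.getaddrinfo(fqdn, 443, proto=socket.IPPROTO_TCP)
    resolved = {info[4][0] for info in infos}  # collect all resolved addresses
    return expected_ip in resolved

# Example (both values are placeholders, not real OVC assignments):
# resolves_to_private_endpoint("tenant.cloud.omniverse.nvidia.com", "10.0.0.4")
```

If the check fails, the tenant’s DNS zone is not yet pointing the FQDN at the private endpoint, and traffic would not stay on the private channel.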

Our portal operates on TCP port 443, using HTTPS for secure communication. The streaming proxy handles TCP traffic for signaling server communications, while UDP carries the media streaming traffic. Specifically, we employ HAProxy to route signaling traffic on TCP port 48322, performing SSL/TLS termination there. The UDP port range 10500-20000 is used for streaming traffic. To ensure the highest level of security, all WebRTC streaming traffic is encrypted using DTLS. The tenant is responsible for configuring firewall rules to permit the use of these ports.
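Before onboarding, a tenant can sanity-check the TCP ports from inside their network with a minimal probe like the one below. The host name in the commented example is a placeholder, and note that the UDP media range cannot be probed this way, since WebRTC media traffic is connectionless:

```python
import socket

# Ports from the section above; the UDP media range is listed for
# reference only, as UDP reachability cannot be tested with connect().
REQUIRED_TCP_PORTS = [443, 48322]          # portal (HTTPS) and HAProxy signaling
STREAMING_UDP_PORTS = range(10500, 20001)  # WebRTC media traffic (DTLS-encrypted)

def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example (host is a placeholder for the tenant's actual FQDN):
# for port in REQUIRED_TCP_PORTS:
#     print(port, tcp_port_open("tenant.cloud.omniverse.nvidia.com", port))
```

A failed TCP probe usually points at a firewall rule blocking the port; UDP rules for the streaming range still need to be verified against the firewall configuration directly.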