Portal Sample Metrics

Overview

The Omniverse on DGX Cloud Portal Sample includes built-in OpenTelemetry (OTel) metrics export for session monitoring and observability. This guide covers the exported metrics, the collector setup, and how to observe the metrics in Azure Monitor.

Metrics Exported

The Omniverse on DGX Cloud Portal Sample exports the following OpenTelemetry session metrics:

  • sessions.active.count (UpDownCounter)

    Description: Current number of active streaming sessions

    Use Case: Real-time capacity monitoring, concurrent user tracking

  • sessions.start.count (Counter)

    Description: Total number of sessions started

    Use Case: Usage analytics, growth tracking

  • sessions.end.count (Counter)

    Description: Total number of sessions ended

    Use Case: Completion rate analysis, session lifecycle tracking

  • sessions.duration (Histogram)

    Description: Session duration in seconds with histogram buckets

    Use Case: Performance analysis, user engagement metrics

Dimensional Data

Each metric includes the following attributes for filtering and analysis:

  • session.id - Unique session identifier

  • session.username - User name

  • session.app - Application name being streamed

  • session.user - User ID

  • nvcf.function_id - NVIDIA Cloud Function ID

  • nvcf.function_version_id - NVCF Function Version

  • session.duration.seconds - Duration for end events
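To make the relationship between the four instruments and their attributes concrete, here is a minimal in-memory sketch in plain Python. It does not use the OpenTelemetry SDK and is not the Portal Sample's implementation. The class name `SessionMetrics`, the helper `session_attributes`, and the bucket boundaries are hypothetical, since this guide does not state the real histogram boundaries.

```python
import bisect
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical bucket upper bounds in seconds (assumption: the real
# boundaries used by the Portal Sample are not stated in this guide).
BUCKET_BOUNDS = [60, 300, 900, 3600]


def session_attributes(session_id, username, app, user_id,
                       function_id, function_version_id,
                       duration_seconds: Optional[float] = None) -> dict:
    """Build the attribute set listed above for one session event."""
    attrs = {
        "session.id": session_id,
        "session.username": username,
        "session.app": app,
        "session.user": user_id,
        "nvcf.function_id": function_id,
        "nvcf.function_version_id": function_version_id,
    }
    if duration_seconds is not None:  # only present on end events
        attrs["session.duration.seconds"] = duration_seconds
    return attrs


@dataclass
class SessionMetrics:
    """In-memory stand-in for the four exported instruments."""
    active: int = 0    # sessions.active.count (UpDownCounter)
    started: int = 0   # sessions.start.count (Counter)
    ended: int = 0     # sessions.end.count (Counter)
    # sessions.duration (Histogram): one count per bucket plus overflow
    duration_buckets: list = field(
        default_factory=lambda: [0] * (len(BUCKET_BOUNDS) + 1)
    )

    def on_start(self) -> None:
        self.started += 1   # counters only ever increase
        self.active += 1    # up-down counter goes up on start

    def on_end(self, duration_seconds: float) -> None:
        self.ended += 1
        self.active -= 1    # and down again on end
        # bisect_left places a value equal to a bound in that bound's bucket
        index = bisect.bisect_left(BUCKET_BOUNDS, duration_seconds)
        self.duration_buckets[index] += 1


if __name__ == "__main__":
    m = SessionMetrics()
    m.on_start()
    m.on_start()
    m.on_end(420.0)
    print(m.active, m.started, m.ended, m.duration_buckets)
    # prints: 1 2 1 [0, 0, 1, 0, 0]
```

The point of the sketch is that `sessions.active.count` is the only instrument that can decrease, so it always equals starts minus ends, while the duration histogram receives a value only when a session ends.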

Prerequisites

  • Docker installed on the collector instance, with ports 4317 (gRPC) and 4318 (HTTP) available.

  • Network connectivity between the Portal Sample and the collector.

  • Observability Backend: This guide uses Azure Monitor as the example backend for Portal metrics. Steps to configure Grafana Cloud and Datadog are provided in the NVCF Observability Guide.
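Once a collector is running, the network connectivity prerequisite can be checked from the Portal Sample host with a short script such as the following sketch. The function name `port_open` and the host `127.0.0.1` are placeholders chosen for illustration. The script only confirms that a TCP connection can be opened, not that the collector accepts OTLP data.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors
        return False


if __name__ == "__main__":
    collector_host = "127.0.0.1"  # placeholder: replace with the collector address
    for port in (4317, 4318):     # gRPC and HTTP ports from the prerequisites
        state = "reachable" if port_open(collector_host, port) else "unreachable"
        print(f"{collector_host}:{port} is {state}")
```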

OTel Config for Azure Monitor

The OTel collector can be set up on the same instance as the portal or a different instance. Configure the collector to receive metrics from the portal, process them (add labels, batch them), and forward them to Azure Monitor.

  1. Create a directory for organizing the OTel collector’s configuration:

mkdir observability

  2. Create the OTel configuration file:

touch otel-collector-config.yaml

  3. Configure the receiver, processor, and exporter blocks by copying the following code block into the otel-collector-config.yaml file:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  # Add resource attributes to identify the source
  resource:
    attributes:
      - key: service.name
        value: "ov-dgxc-portal"
        action: upsert
      - key: service.version
        value: "1.0.0"
        action: upsert
      - key: deployment.environment
        value: "production"
        action: upsert

  # Batch processor for efficient export
  batch:
    timeout: 1s
    send_batch_size: 1024
    send_batch_max_size: 2048

  # Memory limiter to prevent OOM
  memory_limiter:
    limit_mib: 256
    check_interval: 1s

exporters:
  debug:
    verbosity: detailed
  azuremonitor:
    connection_string: "${APPLICATIONINSIGHTS_CONNECTION_STRING}"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, resource, batch]
      exporters: [debug, azuremonitor]

Additional documentation for OTel collector configuration is available in the OpenTelemetry Collector documentation.

Azure Monitor Setup

Use an existing Application Insights instance in Azure Monitor, or create a new one.

In the Azure portal, open the Application Insights instance and navigate to Overview -> JSON View.

Copy the ConnectionString value.

It will be in the format:

"InstrumentationKey=xxxxxxxxxx;IngestionEndpoint=https://xxxxx.applicationinsights.azure.com/;LiveEndpoint=https://xxxxx.monitor.azure.com/;ApplicationId=xxxxxxx"
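Because the connection string is a semicolon-separated list of key=value pairs, it is easy to sanity-check before passing it to the collector. The sketch below is illustrative only: the function name `parse_connection_string` is hypothetical, and the check assumes that InstrumentationKey is the one field that must be present.

```python
def parse_connection_string(value: str) -> dict:
    """Split an Application Insights connection string into its fields."""
    parts = {}
    # Tolerate surrounding whitespace and the quotes shown in the example
    for segment in value.strip().strip('"').split(";"):
        if not segment:
            continue  # ignore a trailing semicolon
        key, sep, val = segment.partition("=")
        if not sep:
            raise ValueError(f"malformed segment: {segment!r}")
        parts[key.strip()] = val.strip()
    if "InstrumentationKey" not in parts:
        raise ValueError("connection string has no InstrumentationKey")
    return parts


if __name__ == "__main__":
    # Placeholder value in the same shape as the example above
    example = ("InstrumentationKey=xxxxxxxxxx;"
               "IngestionEndpoint=https://xxxxx.applicationinsights.azure.com/")
    for key, val in parse_connection_string(example).items():
        print(f"{key}: {val}")
```

Splitting each segment at the first `=` only keeps endpoint URLs intact even if they contain further `=` characters.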

Create OTel Collector

Launch the collector as a Docker container so that it can receive metrics from the Portal Sample and forward them to Azure Monitor. Replace the placeholder connection string with the value captured from the Azure portal:

docker run -d \
--name otel-collector \
-p 4317:4317 \
-p 4318:4318 \
-e APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx;IngestionEndpoint=https://xxxx.in.applicationinsights.azure.com/;LiveEndpoint=https://xxxxxx.livediagnostics.monitor.azure.com/;ApplicationId=xxxxxx-xxxxx-xxxxx-xxxxx-xxxxxxxxxxxx" \
-v "$(pwd)/otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml" \
otel/opentelemetry-collector-contrib:latest

Export Environment Variables

Configure the Portal Sample to send its built-in OpenTelemetry metrics to your collector. Navigate to the Omniverse on DGX Cloud Portal Sample instance and configure the following environment variables:

export OTEL_EXPORTER_OTLP_ENDPOINT="http://<IP_OF_OTEL_INSTANCE>:4317"
export OTEL_SERVICE_NAME="web-streaming-backend"

Note

Either the IP address or the service name of the collector instance can be used in OTEL_EXPORTER_OTLP_ENDPOINT.
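A malformed endpoint value is an easy mistake to make at this step, so a small pre-flight check can help. The following sketch reads the variable and confirms that it has a scheme, a host, and an explicit port. The function name `check_otlp_endpoint` is hypothetical, and requiring an explicit port is a choice of this sketch based on the example value above, not a rule of the Portal Sample.

```python
import os
from urllib.parse import urlparse


def check_otlp_endpoint(endpoint: str) -> tuple:
    """Return (host, port) if the endpoint looks usable, else raise ValueError."""
    parsed = urlparse(endpoint)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://: {endpoint!r}")
    if not parsed.hostname:
        raise ValueError(f"endpoint has no host: {endpoint!r}")
    if parsed.port is None:
        raise ValueError(f"endpoint has no explicit port: {endpoint!r}")
    return parsed.hostname, parsed.port


if __name__ == "__main__":
    value = os.environ.get("OTEL_EXPORTER_OTLP_ENDPOINT", "")
    try:
        host, port = check_otlp_endpoint(value)
        print(f"Metrics will be sent to {host} on port {port}")
    except ValueError as err:
        print(f"Check OTEL_EXPORTER_OTLP_ENDPOINT: {err}")
```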

Metrics Export Verification

Check Collector Status:

docker logs -f otel-collector

Verify metrics export:

  1. Navigate to the backend directory on the Portal instance:

cd ov-dgxc-portal-sample/backend

  2. Test the metrics export:

poetry run test-metrics

The expected output is:

Testing OpenTelemetry metrics...
Recording session start...
Incrementing active sessions...
Recording session end...
Decrementing active sessions...
Metrics recorded. Check your collector/backend for the data.
Waiting 10 seconds to ensure export...

To generate session activity from the Portal Sample, start streaming sessions from within it.

Confirm Telemetry on Azure Monitor

  1. Log in to the Azure portal, then navigate to Monitor -> Application Insights -> Metrics.

  2. Select the appropriate metric namespace.

Sample Azure Monitor Queries

Active Sessions Monitoring:

customMetrics
| where name == "sessions.active.count"
| extend session_app = tostring(customDimensions.session_app)
| extend session_user = tostring(customDimensions.session_user)
| extend nvcf_function_id = tostring(customDimensions.nvcf_function_id)
| project timestamp, name, value, session_app, session_user, nvcf_function_id

Session Duration:

customMetrics
| where name == "sessions.duration"
| extend session_app = tostring(customDimensions.session_app)
| extend session_user = tostring(customDimensions.session_user)
| extend nvcf_function_id = tostring(customDimensions.nvcf_function_id)
| project timestamp, name, value, session_app, session_user, nvcf_function_id

Usage Trends:

customMetrics
| where name == "sessions.start.count"
| extend session_app = tostring(customDimensions.session_app)
| summarize session_starts = count() by bin(timestamp, 1h), session_app
| render timechart
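For readers less familiar with KQL, the aggregation in the Usage Trends query (one count per hour and application) can be mirrored in a few lines of Python. This is only an illustration of what `summarize count() by bin(timestamp, 1h), session_app` computes. The function name `hourly_session_starts` and the sample rows are made up for the example and are not real telemetry.

```python
from collections import Counter
from datetime import datetime, timezone


def hourly_session_starts(rows):
    """Count rows per (hour, session_app), like bin(timestamp, 1h) in KQL.

    rows is an iterable of (timestamp, session_app) tuples.
    """
    counts = Counter()
    for timestamp, session_app in rows:
        # Truncate to the start of the hour, as bin(timestamp, 1h) does
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        counts[(hour, session_app)] += 1
    return counts


if __name__ == "__main__":
    utc = timezone.utc
    sample_rows = [  # made-up rows for illustration only
        (datetime(2024, 1, 1, 10, 5, tzinfo=utc), "app-a"),
        (datetime(2024, 1, 1, 10, 40, tzinfo=utc), "app-a"),
        (datetime(2024, 1, 1, 11, 2, tzinfo=utc), "app-a"),
    ]
    for (hour, app), n in sorted(hourly_session_starts(sample_rows).items()):
        print(hour.isoformat(), app, n)
```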