Observability (Logs and Metrics)#


Overview#

This section describes how to configure observability for Omniverse on DGX Cloud. The solution captures Kit application logs and forwards them, in OpenTelemetry (OTel) format, to your chosen observability platform (Grafana Cloud, Datadog, or Azure Monitor), along with NVCF and Portal Sample platform metrics.

Key Capabilities#

  • Real-time OV application log capture and forwarding

  • NVCF platform metrics forwarding as part of observability

  • OpenTelemetry-compliant log transformation

  • Automatic health monitoring and recovery

  • Seamless integration with Kit-based applications

Terms Used#

| Term | Explanation |
| --- | --- |
| Vector | Refers to vector.dev, an open-source log processor |
| NVCF | NVIDIA Cloud Functions |
| KitVector | Refers to the container that integrates the Kit App and Vector |
| OTel | OpenTelemetry, a CNCF project used to standardize telemetry formats |
| OTel Collector | The NVCF cluster's OpenTelemetry collector, used to export telemetry |

Architecture#

The architecture diagram below illustrates the complete data flow from Kit application logs to your observability platform.

_images/byoo_architecture.png
  1. The Kit application is packaged with an open-source log processor (Vector, from vector.dev) that scrapes stdout/stderr or log files. Vector is embedded within the same container as the Kit application.

  2. Vector performs the following functions:

    • Log Capture: Scrapes the Kit application's stdout/stderr or a log file location.

    • Transformation: Transforms plaintext logs into the OpenTelemetry (OTel) format required by NVCF.

    • Forwarding: Forwards the transformed Kit OTel logs to the BYOO OTel collector in NVCF.
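The transformation step can be illustrated with a minimal Python sketch. In the actual solution Vector performs this step; the sketch only shows the kind of mapping involved. The input line layout, the `kit.source` attribute key, and the function name `to_otel_record` are assumptions made for illustration, not the format the Kit application or NVCF necessarily uses. The output fields follow the general shape of an OTLP JSON log record.

```python
import json
import re
from datetime import datetime, timezone

# Hypothetical Kit log line layout, assumed for illustration only:
#   2025-01-15 10:32:07 [Info] [omni.kit.app] message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>\w+)\] \[(?P<source>[^\]]+)\] (?P<msg>.*)$"
)

# Base severity numbers from the OpenTelemetry logs data model.
SEVERITY = {"TRACE": 1, "DEBUG": 5, "INFO": 9, "WARN": 13, "ERROR": 17, "FATAL": 21}
# Assumed spelling variants mapped onto the OTel short names.
ALIASES = {"WARNING": "WARN"}


def to_otel_record(line: str) -> dict:
    """Map one plaintext log line to an OTLP-style log record dictionary."""
    match = LINE_RE.match(line)
    if match is None:
        # Lines that do not fit the assumed layout are kept as the body,
        # with an unspecified severity.
        return {
            "severityText": "UNSPECIFIED",
            "severityNumber": 0,
            "body": {"stringValue": line},
        }
    # The sketch assumes timestamps are in UTC.
    ts = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc
    )
    level = match["level"].upper()
    level = ALIASES.get(level, level)
    return {
        # OTLP JSON encodes 64-bit integers as strings.
        "timeUnixNano": str(int(ts.timestamp()) * 1_000_000_000),
        "severityText": level,
        "severityNumber": SEVERITY.get(level, 0),
        "body": {"stringValue": match["msg"]},
        "attributes": [
            {"key": "kit.source", "value": {"stringValue": match["source"]}}
        ],
    }


if __name__ == "__main__":
    sample = "2025-01-15 10:32:07 [Warning] [omni.kit.app] Extension took long to start"
    print(json.dumps(to_otel_record(sample), indent=2))
```

Lines that do not match the assumed layout are passed through unchanged rather than dropped, so that no application output is lost during transformation.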

Components#

| Component | Role |
| --- | --- |
| Kit Application | Existing OV Kit applications |
| Vector.dev | Open-source log processor that captures, transforms, and forwards logs |
| Dockerfile | Vector & Kit integration within the same container |
| Custom Entrypoint | Orchestrates Vector configuration and application startup |
| Vector Configuration | Defines log source, transforms, and sinks. Base64 encoded |
| NVCF BYOO Collector | Platform component that routes logs to the observability endpoint |
| Customer Cloud BYOO | Observability endpoint to receive container telemetry |
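Because the Vector configuration is Base64 encoded, it can help to see the encode and decode round trip. The following Python sketch shows one way to do it. The configuration text is a placeholder, the `kit_logs` source name and the function names are hypothetical, and this section does not specify how the encoded value is supplied to the container.

```python
import base64

# Placeholder configuration text. A real Vector configuration would define
# the log source, transforms, and sinks. The "kit_logs" name is hypothetical.
config_text = '[sources.kit_logs]\ntype = "file"\n'


def encode_config(text: str) -> str:
    """Return the Base64 form of a configuration, as a single ASCII string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_config(encoded: str) -> str:
    """Decode a Base64 configuration; raises an error on malformed input."""
    return base64.b64decode(encoded, validate=True).decode("utf-8")


if __name__ == "__main__":
    encoded = encode_config(config_text)
    print(encoded)
    # Decoding the encoded value should reproduce the original text exactly.
    print(decode_config(encoded) == config_text)
```

Passing `validate=True` makes decoding fail on characters outside the Base64 alphabet, which surfaces a corrupted or truncated value before it is used.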

Container Architecture#

The enhanced container (Kit + Vector) includes:

  • Your existing Kit application and dependencies

  • Vector binary (/opt/vector/bin/vector)

  • Custom entrypoint script for orchestration

  • Health monitoring script

  • Vector configuration template
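The health monitoring script itself is not shown in this section. As a rough illustration of the monitor-and-recover idea, the Python sketch below starts a process, polls it at a fixed interval, and restarts it if it has exited. The function name `monitor`, its parameters, and the restart policy are assumptions, not the behavior of the shipped script.

```python
import subprocess
import time


def monitor(command, checks, interval_s=1.0):
    """Start `command`, poll it `checks` times, and restart it if it exited.

    Returns the number of restarts performed. The process is stopped when
    monitoring ends, so this sketch is bounded rather than a daemon loop.
    """
    proc = subprocess.Popen(command)
    restarts = 0
    try:
        for _ in range(checks):
            time.sleep(interval_s)
            if proc.poll() is not None:  # a return code means it has exited
                proc = subprocess.Popen(command)
                restarts += 1
    finally:
        if proc.poll() is None:
            proc.terminate()
        proc.wait()
    return restarts


# Hypothetical usage, with the Vector binary path from this section and a
# placeholder configuration path:
#   monitor(["/opt/vector/bin/vector", "--config", "/path/to/vector.toml"],
#           checks=60, interval_s=5.0)
```

A production script would normally loop indefinitely and log each restart. The bounded loop here keeps the sketch easy to run and reason about.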

Important Considerations#

This procedure document covers:

Kit App & Vector integration:

  • Required files and utilities for co-packaging Kit applications with Vector log processor

  • Container build process and configuration

  • Environment variable setup and validation

Known Limitations & Mitigation#

  • Kit applications do not currently implement a log file rotation policy. This is a known application limitation that engineering is addressing. The Vector BYOO solution does not implement a log file rotation mechanism either.

  • Verbose Kit logging can be enabled with a Kit argument during function creation. Although useful for debugging and monitoring, verbose logs carry risks. The Vector agent captures and redirects all Kit application output (stdout/stderr) to a local log file, so for continuously running applications, verbose logging can exhaust disk space and increase observability costs because of the high data volume. A feature request has been submitted to implement granular verbose logging levels that balance detail with resource efficiency.
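Until log rotation is available, one possible mitigation is to watch the size of the local log file and the free space on its disk. The Python sketch below is an illustration only: the function name `log_disk_status` and both thresholds are assumptions, and the log file location must be supplied by the caller because it is not specified in this section.

```python
import os
import shutil


def log_disk_status(log_path, max_log_bytes, min_free_fraction):
    """Report the log file size and free disk space against two thresholds."""
    log_bytes = os.path.getsize(log_path)
    # Measure the filesystem that holds the log file.
    usage = shutil.disk_usage(os.path.dirname(os.path.abspath(log_path)))
    free_fraction = usage.free / usage.total
    return {
        "log_bytes": log_bytes,
        "free_fraction": free_fraction,
        "log_too_large": log_bytes > max_log_bytes,
        "disk_low": free_fraction < min_free_fraction,
    }
```

A caller could run this check periodically and raise an alert, or reduce log verbosity, when either flag is set.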

Prerequisites#

  • Follow the Developing and Containerizing Apps documentation. After completing those steps, a local Docker container of the OV Kit App should be available (refer to the document guidelines).

  • Observability backend: This guide provides an example configuration for Azure Monitor. Steps for configuring Grafana Cloud and Datadog are provided in the NVCF Observability Guide.

  • vector.dev: Vector is an open-source, platform-specific application that can be downloaded from the following link. This document has been verified against Vector version 0.46.1. These instructions use vector-0.46.1-x86_64-unknown-linux-gnu.tar.gz.