Metrics Facility

About

The Metrics Facility collects user-defined metrics and allows them to be scraped by, or pushed to, a metrics stack. By default, Prometheus-based metrics are used, exposing metric types such as Gauges, Counters and Histograms.

When enabled, the extension collects a set of default metrics for each endpoint. Custom metrics can also be collected and exposed through the facility, giving service developers a standardized way to provide key insight into how their services are performing.

Configuration

The Metrics Facility can be accessed from a Kit Service by first registering the Facility’s instance on the Service’s router:

metrics/extension.py
import omni.ext

from omni.services.core import main
from omni.services.facilities.monitoring.metrics.facilities import MetricsFacility

from .services.sample import router


class SampleMetricsFacilityExtension(omni.ext.IExt):
    """Sample Extension illustrating usage of the Metrics Facility."""

    def on_startup(self) -> None:
        router.register_facility(name="metrics", facility_inst=MetricsFacility("sample"))
        main.register_router(router=router, prefix="/sample-progress", tags=["example"])

    def on_shutdown(self) -> None:
        main.deregister_router(router=router, prefix="/sample-progress")

Usage

Once configured, the Metrics Facility then becomes available to the Service’s endpoints via dependency injection:

metrics/services/sample.py
import asyncio

from omni.services.core import routers
from omni.services.facilities.monitoring.metrics.facilities import MetricsFacility

router = routers.ServiceAPIRouter()


async def _process_next_frame() -> None:
    """Placeholder for the actual per-frame work."""
    await asyncio.sleep(0)


@router.post("/sample-endpoint")
async def sample(
    metrics_facility: MetricsFacility = router.get_facility("metrics"),
):
    # Placeholder values: a real service would derive these from the request.
    stage_name, agent, camera = "sample.usd", "agent-0", "camera-0"
    frame_range = range(10)

    frame_count = metrics_facility.counter("processed_frames", "Number of frames processed for a given USD stage", labelnames=("stage", "agent", "camera"), unit="frames")
    process_time = metrics_facility.summaries("process_time", "Time taken to process a frame", labelnames=("stage", "agent", "camera"))

    for frame in frame_range:
        # Time the processing of each frame, then count the frame as processed.
        with process_time.labels(stage=stage_name, agent=agent, camera=camera).time():
            await _process_next_frame()
        frame_count.labels(stage=stage_name, agent=agent, camera=camera).inc()

    return {"success": True}

For a service that is ‘always on’, the metrics can then be scraped, by default from the /metrics endpoint. To set a specific job name, use the following setting:

metrics/config/extension.toml
[settings]
exts."services.monitoring.metrics".job_name = "sample-service"
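
To give a sense of what a scrape of the /metrics endpoint returns, the sketch below renders a labelled counter in the Prometheus text exposition format using only the standard library. The SketchCounter class, the metric name and the label values are illustrative assumptions; this is not the Metrics Facility's API or its actual output, only an approximation of the general shape.

```python
from collections import defaultdict


class SketchCounter:
    """Toy stand-in for a labelled Prometheus counter (hypothetical, not the Kit API)."""

    def __init__(self, name, documentation, labelnames):
        self.name = name
        self.documentation = documentation
        self.labelnames = tuple(labelnames)
        # One running total per distinct combination of label values.
        self._values = defaultdict(float)

    def inc(self, amount=1.0, **labels):
        key = tuple(labels[name] for name in self.labelnames)
        self._values[key] += amount

    def render(self):
        """Return the counter in the Prometheus text exposition format.

        Label values are not escaped here, which a real client library would do.
        """
        lines = [
            f"# HELP {self.name} {self.documentation}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            pairs = ",".join(f'{n}="{v}"' for n, v in zip(self.labelnames, key))
            lines.append(f"{self.name}{{{pairs}}} {value}")
        return "\n".join(lines) + "\n"


frames = SketchCounter(
    "processed_frames_total",
    "Number of frames processed for a given USD stage",
    labelnames=("stage", "camera"),
)
frames.inc(stage="demo.usd", camera="cam_a")
frames.inc(stage="demo.usd", camera="cam_a")
frames.inc(stage="demo.usd", camera="cam_b")
exposition = frames.render()
print(exposition)
```

Each distinct combination of label values becomes its own line (time series) in the output, which is why label sets are best kept small and bounded.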

For services that shut down after each request, for example in a Function-as-a-Service setup, the metrics can instead be pushed periodically to a Pushgateway. This is configured using the following settings:

  • push_metrics: Boolean indicating if metrics should be pushed (default: false)

  • push_gateway: String endpoint to push the metrics to (default: http://localhost:9091, which assumes a Prometheus Pushgateway is installed and running on the local machine; remote hosts can be used as well)

  • push_interval: Integer value indicating how many seconds should pass between pushing the metrics.

metrics/app/example.kit
[settings]
exts."services.monitoring.metrics".push_metrics = true
exts."services.monitoring.metrics".push_gateway = "http://localhost:9091"
exts."services.monitoring.metrics".push_interval = 10
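
For orientation, a push to a Prometheus Pushgateway is an HTTP request against a path that includes the job name. The sketch below builds such a request with the standard library without sending it. The build_push_request helper and the payload are hypothetical, and how the extension performs its pushes internally is not documented here; only the /metrics/job/<job_name> path convention comes from the Pushgateway itself.

```python
import urllib.parse
import urllib.request


def build_push_request(gateway, job_name, payload):
    """Build (but do not send) an HTTP PUT request for a Prometheus Pushgateway."""
    # The Pushgateway groups pushed metrics under /metrics/job/<job_name>.
    quoted_job = urllib.parse.quote(job_name, safe="")
    url = f"{gateway.rstrip('/')}/metrics/job/{quoted_job}"
    return urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


# Illustrative payload in the Prometheus text exposition format.
payload = "# TYPE processed_frames_total counter\nprocessed_frames_total 3.0\n"
request = build_push_request("http://localhost:9091", "sample-service", payload)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the push, provided a Pushgateway is reachable at the configured address.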

Implementation References

For examples of the Metrics Facility being used in the Omniverse Farm ecosystem, consult the following extensions:

  • omni.services.collect: Bulk collection of USD assets

  • omni.services.render: Rendering service for USD stages