Metrics Facility

About

The Metrics Facility collects user-defined metrics and makes them available to a metrics stack, either by exposing them for scraping or by pushing them. By default, metrics are Prometheus-based and expose metric types such as gauges, counters, and histograms.

When enabled, the extension collects a set of default metrics for each endpoint. Custom metrics can also be collected and exposed through the facility, giving service developers a standardized way to provide key insight into how their services are performing.
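To make the metric types mentioned above concrete, here is a minimal pure-Python sketch of how a counter, a gauge, and a histogram differ in behaviour. The classes `SketchCounter`, `SketchGauge`, and `SketchHistogram` are hypothetical stand-ins written for illustration only; they are not the facility's API or the Prometheus client's implementation.

```python
from bisect import bisect_left
from typing import List, Sequence


class SketchCounter:
    """Hypothetical counter: a value that only ever increases."""

    def __init__(self) -> None:
        self.value = 0.0

    def inc(self, amount: float = 1.0) -> None:
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class SketchGauge:
    """Hypothetical gauge: a value that can go up and down."""

    def __init__(self) -> None:
        self.value = 0.0

    def set(self, value: float) -> None:
        self.value = value


class SketchHistogram:
    """Hypothetical histogram: observations counted into cumulative buckets."""

    def __init__(self, upper_bounds: Sequence[float]) -> None:
        self.upper_bounds = sorted(upper_bounds)
        # One extra slot for the implicit +Inf bucket.
        self._counts = [0] * (len(self.upper_bounds) + 1)
        self.total = 0.0

    def observe(self, value: float) -> None:
        # bisect_left finds the first bucket whose upper bound is >= value.
        self._counts[bisect_left(self.upper_bounds, value)] += 1
        self.total += value

    def cumulative_counts(self) -> List[int]:
        # Each bucket includes the counts of all smaller buckets.
        running, result = 0, []
        for count in self._counts:
            running += count
            result.append(running)
        return result
```

A counter suits values such as frames processed, a gauge suits values such as jobs currently in flight, and a histogram suits distributions such as request durations.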

Configuration

The Metrics Facility can be accessed from a Kit Service by first registering the Facility’s instance on the Service’s router:

metrics/extension.py
import omni.ext

from omni.services.core import main
from omni.services.facilities.monitoring.metrics.facilities import MetricsFacility

from .services.sample import router


class SampleMetricsFacilityExtension(omni.ext.IExt):
    """Sample Extension illustrating usage of the Metrics Facility."""

    def on_startup(self) -> None:
        # Make the facility available to the router's endpoints under the name "metrics".
        router.register_facility(name="metrics", facility_inst=MetricsFacility("sample"))
        main.register_router(router=router, prefix="/sample-progress", tags=["example"])

    def on_shutdown(self) -> None:
        main.deregister_router(router=router, prefix="/sample-progress")

Usage

Once configured, the Metrics Facility then becomes available to the Service’s endpoints via dependency injection:

metrics/services/sample.py
from typing import Dict

from omni.services.core import routers
from omni.services.facilities.monitoring.metrics.facilities import MetricsFacility

router = routers.ServiceAPIRouter()


@router.post("/sample-endpoint")
async def sample(
    # Injected by the router, using the name the facility was registered under.
    metrics_facility: MetricsFacility = router.get_facility("metrics"),
) -> Dict:
    frame_count = metrics_facility.counter(
        name="processed_frames",
        documentation="Number of frames processed for a given USD stage",
        labelnames=("stage", "agent", "camera"),
        unit="frames",
    )
    process_time = metrics_facility.summaries(
        name="process_time",
        documentation="Time taken to process a frame",
        labelnames=("stage", "agent", "camera"),
    )

    # [...]

    for frame in frame_range:
        # Record how long each frame takes, then count it as processed.
        with process_time.labels(stage=stage_name, agent=agent, camera=camera).time():
            await _process_next_frame()
        frame_count.labels(stage=stage_name, agent=agent, camera=camera).inc()

    return {"success": True}
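The endpoint above times a block of work per label combination. As a rough illustration of what such a timer records, the following pure-Python sketch keeps a count and a total duration for each label set. `SketchSummary` is a hypothetical stand-in, not the object returned by the facility, and the label values are made up for the example.

```python
from collections import defaultdict
from contextlib import contextmanager
from time import perf_counter
from typing import Dict, Iterator, Tuple

LabelKey = Tuple[Tuple[str, str], ...]


class SketchSummary:
    """Hypothetical stand-in tracking count and total time per label set."""

    def __init__(self) -> None:
        self.count: Dict[LabelKey, int] = defaultdict(int)
        self.total_seconds: Dict[LabelKey, float] = defaultdict(float)

    @contextmanager
    def time(self, **labels: str) -> Iterator[None]:
        # Sort the labels so that keyword order does not create distinct series.
        key = tuple(sorted(labels.items()))
        start = perf_counter()
        try:
            yield
        finally:
            # In this sketch the observation is recorded even if the block raises.
            self.count[key] += 1
            self.total_seconds[key] += perf_counter() - start


process_time = SketchSummary()
for _ in range(3):
    with process_time.time(stage="demo.usd", camera="main"):
        sum(range(1000))  # Placeholder for real per-frame work.
```

Each distinct combination of label values becomes its own series, so labels with many possible values multiply the number of series that have to be stored.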

For a service that is “always on”, the metrics are exposed for scraping on the /metrics endpoint by default. To set and use a specific job name, use the following setting:

metrics/config/extension.toml
[settings]
exts."services.monitoring.metrics".job_name = "sample-service"
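To check by hand what a scraper would see, the endpoint can be fetched and parsed with the standard library. The sketch below assumes the endpoint returns the Prometheus text exposition format without trailing timestamps. The helpers `parse_samples` and `scrape` are hypothetical names, and the base URL passed to `scrape` depends on where the service is actually running.

```python
from typing import Dict
from urllib.request import urlopen


def parse_samples(text: str) -> Dict[str, float]:
    """Map each sample (metric name plus labels) to its value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# HELP" and "# TYPE" lines carry metadata only.
        # The value follows the last space; label values may contain spaces.
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples


def scrape(base_url: str) -> Dict[str, float]:
    """Fetch <base_url>/metrics and parse the response body."""
    with urlopen(f"{base_url}/metrics", timeout=5) as response:
        return parse_samples(response.read().decode("utf-8"))
```

In a real deployment the scraping is done by Prometheus itself; a helper like this is mainly useful for smoke tests.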

For services that shut down after each request, for example in a “Function as a Service” setup, it is also possible to push the metrics periodically to a Pushgateway. This is configured with the following settings:

  • push_metrics: Boolean indicating if metrics should be pushed (default: false)

  • push_gateway: String endpoint to push the metrics to (default: http://localhost:9091, which assumes a Prometheus Pushgateway is installed and running on the local machine; remote hosts can be used as well)

  • push_interval: Integer value indicating how many seconds should pass between pushing the metrics.

metrics/app/example.kit
[settings]
exts."services.monitoring.metrics".push_metrics = true
exts."services.monitoring.metrics".push_gateway = "http://localhost:9091"
exts."services.monitoring.metrics".push_interval = 10
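For context, a Prometheus Pushgateway accepts metrics in the text exposition format over HTTP at /metrics/job/<job name>. The sketch below builds such a request by hand with the standard library. It is not how the facility pushes internally; `build_push_request` and `push_once` are hypothetical helpers, and the sketch assumes a job name without `/` characters and a payload that ends with a newline.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_push_request(gateway: str, job: str, payload: str) -> Request:
    """Build a PUT request, which replaces all metrics in the job's group."""
    url = f"{gateway.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return Request(
        url,
        data=payload.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


def push_once(gateway: str, job: str, payload: str) -> int:
    """Send the payload to the gateway and return the HTTP status code."""
    with urlopen(build_push_request(gateway, job, payload), timeout=5) as response:
        return response.status
```

Using POST instead of PUT would replace only the metrics that share a name with the pushed ones, leaving the rest of the group untouched.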

Implementation References

For examples of the Metrics Facility being used in the Omniverse Farm ecosystem, consult the following extensions:

  • omni.services.collect: Bulk collection of USD assets

  • omni.services.render: Rendering service for USD stages