Error Handling

This document outlines how the Execution Framework (EF) handles errors. EF errors fall into one of the following broad categories:

  • Memory allocation errors.

  • Invalid pointers passed to the API.

  • Unmet API preconditions.

  • Failure to build the execution graph.

  • Failure to execute.

  • Failure to retrieve a node’s data.

Most APIs in EF are expected to never fail and, as such, do not return a result indicating success or failure. The general approach taken by EF is to terminate the program when unrecoverable errors or programmer errors are detected. For errors generated by plugins (i.e. developer-authored executors and passes), it is up to the developer to report errors via either the integration layer (e.g. omni.kit.exec.core) or the authoring layer (e.g. omni.graph.core).

The following sections explore the topics above in-depth.

Memory Allocation Errors

EF allocates memory on the heap during both graph construction and execution. The size of each allocation is generally small (less than 1KB). Because of the small size of each allocation, if an allocation fails, EF considers the system’s memory to be exhausted and no reasonable action can be taken to free memory. The system is in a bad state, and as such, EF terminates the application.

This termination happens in two ways:

Invalid Pointers

EF is a low-level API designed with speed in mind. As such, EF spends little time validating and reporting bad input to its API. The expectation is that the developer provides valid input. When invalid input is detected, EF immediately terminates the application.

While seemingly harsh, this “fail-fast” approach has several benefits:

  • Developers often neglect to handle errors returned from APIs. This neglect can leave the application in an unexpected state and produce hard-to-find bugs.

  • By failing fast and terminating the application, API misuse is captured by Omniverse’s Carbonite Crash Reporter. During local development, the crash reporter immediately reports the stack trace of any API misuse. During testing, the reporter logs the API misuse and generates telemetry. This telemetry can be aggregated and examined to find API misuse across Omniverse’s suite of products before those products ship to customers.

To implement this fail-fast strategy, EF primarily uses two macros: OMNI_GRAPH_EXEC_ASSERT() and OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG().

OMNI_GRAPH_EXEC_ASSERT() is used to validate that a supplied pointer is not nullptr. Its use is preferred when the pointer will be dereferenced by the function before it returns. The reason for this is two-fold:

  • OMNI_GRAPH_EXEC_ASSERT() checks the given pointer only in debug builds. This means there is no performance penalty in release builds.

  • Since the pointer will be used by the function performing the check, in release builds a crash will be generated (and reported) due to dereferencing the null pointer.

The latter point suggests OMNI_GRAPH_EXEC_ASSERT() is not strictly needed. While true, OMNI_GRAPH_EXEC_ASSERT() serves as “code as documentation” and provides a helpful message when the check fails.
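To make the debug-only behavior concrete, here is a minimal sketch of how such a check can be built on the standard NDEBUG convention. DEMO_EXEC_ASSERT, Node, and nodeName() are hypothetical names for illustration; the real OMNI_GRAPH_EXEC_ASSERT() may be implemented differently and print a different message.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// DEMO_EXEC_ASSERT is a hypothetical stand-in for OMNI_GRAPH_EXEC_ASSERT().
#ifdef NDEBUG
// Release builds: the check compiles away entirely, so it costs nothing.
#define DEMO_EXEC_ASSERT(expr) ((void)0)
#else
// Debug builds: print the failed expression and its location, then abort.
#define DEMO_EXEC_ASSERT(expr)                                                 \
    ((expr) ? (void)0                                                          \
            : (std::fprintf(stderr, "%s:%d: assertion failed: %s\n", __FILE__, \
                            __LINE__, #expr),                                  \
               std::abort()))
#endif

// Hypothetical node type used only for this illustration.
struct Node
{
    const char* name;
};

// The pointer is dereferenced right away, so a debug-only check is enough:
// in release builds a null pointer still crashes here (and is reported).
const char* nodeName(const Node* node) noexcept
{
    DEMO_EXEC_ASSERT(node);
    return node->name;
}
```

Because the check disappears under NDEBUG, the only release-build cost is the dereference the function needed to do anyway.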

The following example shows when it is appropriate to use OMNI_GRAPH_EXEC_ASSERT().

#include <iostream> // std::cout, std::endl (INode comes from the EF headers)

void printName(INode* node) noexcept
{
    OMNI_GRAPH_EXEC_ASSERT(node); // prints a useful message in debug builds if node is nullptr

    // if node is nullptr, a crash will be triggered and reported in the release build.
    //
    // prefer using OMNI_GRAPH_EXEC_ASSERT() to check if an input parameter is nullptr when
    // the pointer is immediately used by the function.  this means you'll get a helpful message in
    // debug builds and an easy-to-debug crash in release builds.
    std::cout << node->getName() << std::endl;
}

The next macro is OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG(). EF prefers this macro when the input pointer is not used immediately but is instead stored for later use. OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG() has the benefit of performing the nullptr check in both debug and release builds. Checking the pointer in both build flavors avoids hard-to-debug situations where the stored pointer is later used and is unexpectedly nullptr. In such a situation, questions such as “Was nullptr passed in?” or “Was the stored pointer corrupted by an overrun?” are reasonable. Checking for nullptr at the point where the pointer is stored makes questions like these much easier to answer.

Below, you can see an example use of OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG().

void MyObject::setDef(IDef* def) noexcept
{
    // prints a useful message in both release and debug builds if def is nullptr
    OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG(def);

    // here we store def for later use.  by checking if def is nullptr above, we can quickly
    // debug why m_def is nullptr when later used.
    m_def = def;
}
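For comparison with the debug-only assert, the sketch below shows one way an always-on argument check can be written so that it is never compiled out. DEMO_FATAL_UNLESS_ARG, Def, and this MyObject are hypothetical; the real OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG() may format its message and terminate differently.

```cpp
#include <cassert>
#include <cstdio>
#include <exception>

// Hypothetical stand-in for OMNI_GRAPH_EXEC_FATAL_UNLESS_ARG(). There is no
// NDEBUG branch, so the check runs in both debug and release builds.
#define DEMO_FATAL_UNLESS_ARG(arg)                                         \
    do                                                                     \
    {                                                                      \
        if (!(arg))                                                        \
        {                                                                  \
            std::fprintf(stderr, "%s:%d: invalid argument: %s\n",          \
                         __FILE__, __LINE__, #arg);                        \
            std::terminate(); /* handled by the crash reporter in Omniverse */ \
        }                                                                  \
    } while (0)

// Hypothetical definition type used only for this illustration.
struct Def
{
    int id;
};

class MyObject
{
public:
    // def is stored rather than used, so it is validated at the point of entry.
    void setDef(Def* def) noexcept
    {
        DEMO_FATAL_UNLESS_ARG(def);
        m_def = def;
    }

    // Once setDef() has returned, m_def is known to be non-null.
    int defId() const noexcept { return m_def->id; }

private:
    Def* m_def = nullptr;
};
```

If setDef() is ever handed nullptr, the failure points at the caller that passed it rather than at whichever later call happens to dereference m_def.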

Unmet Preconditions

To avoid the generation of hard to investigate bugs, EF lists expected preconditions for each part of its API and terminates the program if any of these preconditions are not met. Preconditions that are not nullptr checks are usually checked with the OMNI_GRAPH_EXEC_FATAL_UNLESS() macro. This macro performs the precondition check in both release and debug builds. An example of one of these checks follows:

PassTypeRegistryEntry getPassAt_abi(uint64_t index) noexcept override
{
    OMNI_GRAPH_EXEC_FATAL_UNLESS(index < passes.size());

    return { passes[index].id, passes[index].name.c_str(), passes[index].factory.get(),
             &(passes[index].nameToMatch), passes[index].priority };
}

For hot code paths, OMNI_GRAPH_EXEC_ASSERT() can be used to eliminate the performance cost of these checks in release builds.
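The trade-off between an always-on precondition and a debug-only one can be sketched as two accessors on the same container. demoFatalUnless() and PassRegistry are hypothetical; demoFatalUnless() plays the role of OMNI_GRAPH_EXEC_FATAL_UNLESS(), and the standard assert() plays the role of OMNI_GRAPH_EXEC_ASSERT().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <exception>
#include <string>
#include <utility>
#include <vector>

// Hypothetical always-on precondition check (stand-in for OMNI_GRAPH_EXEC_FATAL_UNLESS()).
inline void demoFatalUnless(bool condition, const char* what) noexcept
{
    if (!condition)
    {
        std::fprintf(stderr, "precondition failed: %s\n", what);
        std::terminate();
    }
}

// Hypothetical registry used only to illustrate the two kinds of checks.
class PassRegistry
{
public:
    void add(std::string name) { m_names.push_back(std::move(name)); }

    // Regular API entry point: the precondition is checked in every build flavor.
    const std::string& getAt(std::size_t index) const noexcept
    {
        demoFatalUnless(index < m_names.size(), "index < m_names.size()");
        return m_names[index];
    }

    // Hot path: assert() is compiled out when NDEBUG is defined (typical of release
    // builds), removing the cost of the check.
    const std::string& getAtUnchecked(std::size_t index) const noexcept
    {
        assert(index < m_names.size());
        return m_names[index];
    }

private:
    std::vector<std::string> m_names;
};
```

The unchecked variant is only worth it where profiling shows the comparison matters; elsewhere the always-on check keeps misuse visible in release builds.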

Failure to Build the Execution Graph

Graph construction is handled by user plugins via passes. The main method in these passes is the run() method (e.g. IPopulatePass::run()). run() does not report errors. It is up to the implementor of run() to handle and report errors.

How a developer handles errors is their choice. They may choose to flag to the integration layer that the graph should not be executed. They may choose to populate the graph with “pass-through” nodes. They may choose to report the error via an authoring level API or an integration layer API.

The main message here is that EF assumes graph construction will succeed. If it does not, it’s up to the developer to handle and report the failure during construction and to ensure the program is left in a defined state.
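As an illustration of one of the options above, the following sketch shows a pass that reports a failure and inserts a placeholder “pass-through” node instead of leaving the graph half-built. GraphBuilder, IntegrationLayer, loadDefinition(), and runPopulatePass() are all hypothetical; EF’s real IPopulatePass::run() has a different signature, and flagging the integration layer not to execute would be an equally valid policy.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the graph being populated.
struct GraphBuilder
{
    std::vector<std::string> nodes;
};

// Hypothetical stand-in for an integration-layer error reporting API.
struct IntegrationLayer
{
    std::vector<std::string> errors;
    void reportError(std::string msg) { errors.push_back(std::move(msg)); }
};

// Hypothetical helper that may fail while reading authoring data.
bool loadDefinition(const std::string& path, std::string& outDef)
{
    if (path.empty())
        return false;
    outDef = "def:" + path;
    return true;
}

// Like run(), this returns nothing, so the pass itself must report the failure
// and leave the graph in a defined state.
void runPopulatePass(const std::string& path, GraphBuilder& builder,
                     IntegrationLayer& layer) noexcept
{
    std::string def;
    if (!loadDefinition(path, def))
    {
        layer.reportError("failed to load definition for '" + path + "'");
        builder.nodes.push_back("pass-through"); // placeholder keeps the topology valid
        return;
    }
    builder.nodes.push_back(def);
}
```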

Failure During Graph Execution

Failures are expected during graph execution. For example, it is reasonable to assume that a node that makes an I/O request may periodically fail. EF’s execution APIs are designed to flag that a task failed, but nothing more: EF does not contain APIs to describe the failure or even to associate a failure with nodes or definitions.

EF’s execution APIs generally return a Status object, which is a bit-field of possible execution outcomes. When using the default ExecutorFallback, nodes downstream of a failing node are still executed, and their resulting Status values are OR’d together. The end result is a Status which encodes that some of the downstream nodes failed while some succeeded. Which nodes failed and which succeeded is not reported.
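The information loss described above follows directly from OR-ing bit flags. The sketch below uses a hypothetical Status enum whose enumerator names and values are assumptions, not EF’s actual definitions; it only shows why the combined value can say that something failed but not which node failed.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Hypothetical bit-field modeled on the description of EF's Status.
enum class Status : std::uint32_t
{
    eUnknown = 0,
    eSuccess = 1u << 0,
    eSkip = 1u << 1,
    eFailure = 1u << 2,
};

constexpr Status operator|(Status a, Status b) noexcept
{
    return static_cast<Status>(static_cast<std::uint32_t>(a) |
                               static_cast<std::uint32_t>(b));
}

constexpr bool hasFlag(Status s, Status flag) noexcept
{
    return (static_cast<std::uint32_t>(s) & static_cast<std::uint32_t>(flag)) != 0;
}

// OR together the outcomes of several nodes. The result records *that* a failure
// happened, but the identity of the failing node is gone.
Status combine(std::initializer_list<Status> results) noexcept
{
    Status total = Status::eUnknown;
    for (Status s : results)
        total = total | s;
    return total;
}
```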

Again, much like graph construction, it is up to the developer to handle and report errors during graph execution as they see fit. Implementors of IExecutor may choose to report errors via their authoring layer or even stop executing nodes. EF provides enough information to let the developer know, at a coarse level, what happened and react appropriately.
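One policy an executor author might choose is to stop at the first failure and record which node failed, information EF itself does not keep. This is a minimal sketch with hypothetical types (DemoNode, StopOnFailureExecutor); it assumes the nodes arrive already in a valid execution order and does not use the real IExecutor interface.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical node: a name plus work that reports success or failure.
struct DemoNode
{
    std::string name;
    std::function<bool()> work;
};

// Hypothetical executor policy: stop at the first failing node and remember it.
struct StopOnFailureExecutor
{
    // In a real plugin these names could be forwarded to an authoring-layer API.
    std::vector<std::string> failedNodes;

    bool execute(const std::vector<DemoNode>& nodesInOrder)
    {
        for (const DemoNode& node : nodesInOrder)
        {
            if (!node.work())
            {
                failedNodes.push_back(node.name);
                return false; // skip the remaining downstream nodes
            }
        }
        return true;
    }
};
```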

Failure to Retrieve Node Data

Node data needed by the graph during construction and execution is stored in IExecutionContext. This context allows each instance of a node to store arbitrary data based on the node’s path and a user defined key. The data is accessed with the IExecutionContext::getNodeData() method, which returns a pointer to the data.

The pointer returned by this method may be nullptr. Here we run into a design decision. Does nullptr mean the data was never set or does it mean the data was set, but set to nullptr?

EF is designed to allow for the latter scenario. A returned nullptr means the data was explicitly set to nullptr.

In order to handle the case where the data was never set, IExecutionContext::getNodeData() returns an omni::expected. An omni::expected contains either the “expected” value or an “unexpected” value. For IExecutionContext::getNodeData(), it contains either the pointer value set by the user or an omni::core::Result with a value of omni::core::kResultNotFound.
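The distinction between “never set” and “set to nullptr” can be sketched with a small key/value store. DemoNodeDataStore is hypothetical, and std::optional stands in for omni::expected: an empty optional plays the role of the kResultNotFound error, while an engaged optional holding nullptr means the data was explicitly set to nullptr. Unlike omni::expected, std::optional carries no error code.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical per-node store keyed by (node path, user-defined key).
class DemoNodeDataStore
{
public:
    void setNodeData(const std::string& path, const std::string& key, void* data)
    {
        m_data[{path, key}] = data;
    }

    std::optional<void*> getNodeData(const std::string& path, const std::string& key) const
    {
        auto it = m_data.find({path, key});
        if (it == m_data.end())
            return std::nullopt; // never set: the "unexpected" case
        return it->second;       // set, possibly to nullptr
    }

private:
    std::map<std::pair<std::string, std::string>, void*> m_data;
};
```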

An example of valid usage of this API is as follows:

auto data = OMNI_GRAPH_EXEC_GET_NODE_DATA_AS(
    task->getContext(),        // pointer to either IExecutionContext or IExecutionStateInfo
    GraphContextCacheOverride, // the type of the data to retrieve
    task->getUpstreamPath(),   // node path
    tokens::kInstanceContext   // key to use as a lookup in the node's key/value datastore
);

if (data)
{
    GraphContextCacheOverride* item = data.value();
    // ...
}
else
{
    omni::core::Result badResult = data.error(); // e.g. kResultNotFound (see docs)
    // ...
}

An alternative usage of the API can be seen here:

auto data = OMNI_GRAPH_EXEC_GET_NODE_DATA_AS(
    task->getContext(),        // pointer to either IExecutionContext or IExecutionStateInfo
    GraphContextCacheOverride, // the type of the data to retrieve
    task->getUpstreamPath(),   // node path
    tokens::kInstanceContext   // key to use as a lookup in the node's key/value datastore
).value(); // will throw an exception if the result is unexpected

Above, the code does not check whether the omni::expected holds an unexpected value. If it does, omni::expected throws an exception when the value is accessed. This exception can be caught by the developer. If it is not caught, it will eventually reach an ABI boundary; because ABI functions are noexcept, the C++ runtime calls std::terminate() and the program ends. Such a strategy is useful when missing node data represents an unexpected state in the program.
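When missing data is recoverable, the exception can be caught close to the lookup. In this sketch, lookupCacheSize() and cacheSizeOrDefault() are hypothetical, and std::optional::value(), which throws std::bad_optional_access when empty, stands in for accessing an omni::expected that holds an error; the exception type omni::expected actually throws is not shown here.

```cpp
#include <cassert>
#include <optional>

// Hypothetical lookup standing in for OMNI_GRAPH_EXEC_GET_NODE_DATA_AS(); an empty
// optional plays the role of an omni::expected holding kResultNotFound.
std::optional<int> lookupCacheSize(bool present)
{
    if (present)
        return 64;
    return std::nullopt;
}

int cacheSizeOrDefault(bool present, int fallback)
{
    try
    {
        // value() throws std::bad_optional_access when the optional is empty.
        return lookupCacheSize(present).value();
    }
    catch (const std::bad_optional_access&)
    {
        // Missing data is recoverable here, so handle it rather than letting the
        // exception travel toward a noexcept ABI boundary.
        return fallback;
    }
}
```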

Exceptions

EF does not use exceptions to report errors. Rather, it uses the error reporting strategies outlined above.

This raises two questions developers may ask:

  • Can I use exceptions in my EF plugin?

  • What happens if I throw an exception and don’t catch it?

Developers are free to use exceptions in their plugins. However, if an exception crosses an ABI boundary (i.e. escapes a function suffixed with _abi), the following will happen:

  • Because _abi functions are declared noexcept, the C++ runtime calls std::terminate().

  • In Omniverse applications, std::terminate() has been set to be handled by Omniverse’s Carbonite Crash Reporter. The reporter will generate a .dmp file for later inspection, print out a stack trace, upload the .dmp to Omniverse’s crash aggregation system, and produce telemetry describing the context of the crash.

In short, developers should feel free to use exceptions. If an exception can be handled, it should be caught and appropriate cleanup actions performed. If an exception represents an undefined state, it can be left uncaught so that it is reported by the crash reporting system, which will terminate the ill-defined application.
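The guidance above can be sketched as a noexcept entry point that catches only the exception types it knows how to recover from. DemoPlugin, execute_abi(), and executeImpl() are hypothetical names; the only assumption carried over from this document is that _abi functions are marked noexcept, as getPassAt_abi() is in the earlier example.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical plugin illustrating exception handling at an ABI boundary.
class DemoPlugin
{
public:
    // noexcept: anything escaping this function calls std::terminate(). Only
    // std::runtime_error is caught; any other exception type signals an
    // ill-defined state and is deliberately left to terminate the process so
    // the crash reporter can record it.
    bool execute_abi(const std::string& path) noexcept
    {
        try
        {
            return executeImpl(path);
        }
        catch (const std::runtime_error&)
        {
            // Recoverable (e.g. a failed I/O request): clean up and report failure.
            ++m_failures;
            return false;
        }
    }

    int failures() const noexcept { return m_failures; }

private:
    bool executeImpl(const std::string& path)
    {
        if (path.empty())
            throw std::runtime_error("no input path");
        return true;
    }

    int m_failures = 0;
};
```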