Efficiently processing sets of prims with USDRT Scenegraph API
About
USDRT has new APIs for efficiently processing sets of prims selected from the stage according to some criteria. The APIs support processing on CPU and GPU, in C++ and Python (using NVIDIA Warp). Possible uses include:
A physics system updating all physics meshes.
A clash detector finding collisions between all tagged meshes.
A renderer rendering all meshes.
An animation system updating all skeletons.
A shader compiler compiling all materials.
A procedural mesh system generating meshes for all prims containing input parameters.
Using the new APIs has the following advantages over the standard USDRT APIs for finding and iterating over prims:
By leveraging Fabric’s stage index, the new APIs find the set of prims in constant time, as opposed to visiting every prim on the stage using a traversal. For example, if you want to select 1000 prims from a stage of 1000000, the search cost is some constant k, not k*1000000 or even k*1000.
The new APIs give access to Fabric’s vectorized memory representation of the selected prims’ attributes, which allows fast, parallelizable access on CPU or GPU.
Accessing the N selected prims using the new APIs is faster than calling the existing APIs N times, because it allows USDRT to amortize overheads. This gives a significant performance benefit with N as low as 100.
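To make the first advantage concrete, here is a minimal sketch in plain Python of why an index lookup does not depend on stage size. It is a toy model and not Fabric's implementation: the ToyStage class, its by_type dictionary, and the prim paths are hypothetical names introduced only for illustration.

```python
from collections import defaultdict


class ToyStage:
    """Toy stand-in for a stage; hypothetical, not part of USDRT."""

    def __init__(self):
        self.prims = []                    # every prim as (path, type_name)
        self.by_type = defaultdict(list)   # index: type_name -> list of paths

    def add_prim(self, path, type_name):
        self.prims.append((path, type_name))
        self.by_type[type_name].append(path)  # the index is maintained on insert

    def find_by_traversal(self, type_name):
        # Visits every prim on the stage: cost grows with the stage size.
        return [path for path, t in self.prims if t == type_name]

    def find_by_index(self, type_name):
        # One dictionary lookup: cost does not depend on the stage size.
        return self.by_type.get(type_name, [])


stage = ToyStage()
for i in range(10_000):
    stage.add_prim(f"/World/prim_{i}", "Mesh" if i % 10 == 0 else "Xform")

meshes = stage.find_by_index("Mesh")
```

Both methods return the same prims; only the cost of finding them differs.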
This document reviews these APIs, their behavior, and how you can leverage them to increase performance in your own application or extension. It first describes how to use the new APIs to select which prims and attributes to process, and then how to use a new iterator (on CPU or GPU) to get fast access to the attribute data as we process it.
Selecting which prims and attributes to process
API
The UsdStage has a new API, SelectPrims, that we use to select the subset of the stage’s prims that we want to work on. We can select prims according to their type and/or what applied schemas they have. Additionally, SelectPrims has a parameter that we use to select the subset of the prims’ attributes that we want to access or modify, and the prim selection will only include prims that have them. Although maybe less useful, it’s also possible to leave prim type and applied schema unspecified, and just find all prims that have particular attributes. SelectPrims is defined as follows.
C++
RtPrimSelection UsdStage::SelectPrims(std::vector<TfToken> requireAppliedSchema,
                                      std::vector<AttrSpec> requireAttrs,
                                      carb::cpp17::optional<TfToken> requirePrimType = {},
                                      uint32_t device = kDeviceCpu)
The device parameter is an integer that specifies whether to make the RtPrimSelection on CPU or GPU, and which GPU to use on multi-GPU systems. Set device to kDeviceCpu to run on CPU, or 0 to run on CUDA GPU 0. Running on other GPUs in multi-GPU systems is not currently supported.
Python
Usd.Stage.SelectPrims(device: str,
                      require_applied_schemas: list[str] = None,
                      require_attrs: list[tuple] = None,
                      require_prim_type: str = None)
In Python the device is specified by a string: set device to “cpu” to use the CPU, or “cuda:0” to run on CUDA GPU 0. Running on other GPUs in multi-GPU systems is not currently supported.
In both C++ and Python, SelectPrims returns an object of a new type, RtPrimSelection, described in a later section, Processing prims by iterating over the selection.
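The selection rule described in this section can be sketched in plain Python. This is a toy model under stated assumptions and not the USDRT implementation: it assumes a prim is selected only if it matches the requested type (when given), has every listed applied schema, and has every listed attribute. The select_prims helper and the dictionary layout of each prim are hypothetical.

```python
def select_prims(prims, require_applied_schemas=(), require_attrs=(), require_prim_type=None):
    """Toy filter mirroring the selection rule; hypothetical helper, not USDRT."""
    selected = []
    for prim in prims:
        # An unspecified prim type matches every prim.
        if require_prim_type is not None and prim["type"] != require_prim_type:
            continue
        # Every required applied schema must be present.
        if not set(require_applied_schemas) <= set(prim["schemas"]):
            continue
        # Every required attribute must be present.
        if not set(require_attrs) <= set(prim["attrs"]):
            continue
        selected.append(prim["path"])
    return selected


prims = [
    {"path": "/A", "type": "Mesh", "schemas": ["PhysicsRigidBodyAPI"], "attrs": ["primvars:displayColor"]},
    {"path": "/B", "type": "Mesh", "schemas": [], "attrs": ["primvars:displayColor"]},
    {"path": "/C", "type": "Xform", "schemas": ["PhysicsRigidBodyAPI"], "attrs": []},
]

meshes = select_prims(prims, require_prim_type="Mesh")
physics_meshes = select_prims(prims, ["PhysicsRigidBodyAPI"], require_prim_type="Mesh")
colored = select_prims(prims, require_attrs=["primvars:displayColor"])
```

The toy loops over every prim for clarity. It models only which prims are selected, not how quickly they are found.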
Populating the stage into Fabric
Unlike the USDRT query APIs, SelectPrims searches only USD prims that are present in Fabric, not the whole USD stage. For the examples in this document we want to search USD stages loaded from files, so we need the following code to populate Fabric.
C++
// Load a USD stage and populate Fabric with it
using namespace usdrt;
UsdStageRefPtr stage = UsdStage::Open("./data/usd/tests/cornell.usda");
for (UsdPrim prim : stage->Traverse())
{
} // Populate Fabric
Python
# Load a USD stage and populate Fabric with it
from usdrt import Sdf, Usd, UsdGeom, Rt
stage = Usd.Stage.Open(TEST_DIR + "/data/usd/tests/cornell_with_physics.usda")
for prim in stage.Traverse():  # Populate Fabric
    pass
Example 1: Selecting prims by type
The following example shows how to select all the meshes on the stage using C++ and Python.
C++
using namespace usdrt;
RtPrimSelection selection = stage->SelectPrims({}, {}, TfToken("Mesh"), kDeviceCpu);
std::cout << "Found " << selection.GetCount() << " meshes\n";
Python
selection = stage.SelectPrims(require_prim_type="Mesh", device="cpu")
print(f"Found {selection.GetCount()} meshes")
Example 2: Selecting prims by applied schema
With SelectPrims we can also select prims according to which applied schemas they have. The following example shows how to select all physics meshes, which are prims that have type “Mesh” and applied schema “PhysicsRigidBodyAPI”.
C++
using namespace usdrt;
std::vector<TfToken> requireAppliedSchemas = { TfToken("PhysicsRigidBodyAPI") };
RtPrimSelection selection = stage->SelectPrims(requireAppliedSchemas, {}, TfToken("Mesh"), kDeviceCpu);
std::cout << "Found " << selection.GetCount() << " physics meshes\n";
Python
selection = stage.SelectPrims(
    require_prim_type="Mesh", require_applied_schemas=["PhysicsRigidBodyAPI"], device="cpu"
)
print(f"Found {selection.GetCount()} physics meshes")
Example 3: Specifying which attributes to access or modify
So far we’ve selected a subset of the stage’s prims. Next we need to specify which of their attributes we want to access. The following example selects all meshes that have a displayColor, and specifies that we want read/write access to it.
C++
TfToken requireType = TfToken("Mesh");
std::vector<AttrSpec> requireAttrs = { AttrSpec{ usdrt::SdfValueTypeNames->Color3fArray,
                                                 UsdGeomTokens->primvarsDisplayColor, AccessType::eReadWrite } };
RtPrimSelection selection = stage->SelectPrims({}, requireAttrs, requireType, kDeviceCpu);
std::cout << "Found " << selection.GetCount() << " meshes with displayColor\n";
Python
selection = stage.SelectPrims(
    require_attrs=[
        (Sdf.ValueTypeNames.Color3fArray, UsdGeom.Tokens.primvarsDisplayColor, Usd.Access.ReadWrite)
    ],
    require_prim_type="Mesh",
    device="cpu",
)
print(f"Found {selection.GetCount()} meshes with displayColor")
This example requests read/write access to displayColor. The other access types are read-only and overwrite. Read/write and overwrite access cause Fabric’s change tracker to log the access, which has a cost, so it’s best to use read-only access for attributes that we only read. Overwrite tells USDRT that we will overwrite any existing data. This allows a performance optimization, because USDRT doesn’t need to initialize data that it knows is about to be overwritten. The optimization matters even more on GPU, because Fabric doesn’t need to copy arrays between CPU and GPU when it knows they are about to be overwritten.
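The guidance above on choosing an access type can be summarized as a small decision function. The sketch below is plain Python and does not use USDRT: the Access enum is a stand-in whose member names are assumptions, and choose_access is a hypothetical helper.

```python
from enum import Enum


class Access(Enum):
    """Stand-in for the three access types named above; member names are assumptions."""
    READ = "read-only"
    READ_WRITE = "read/write"
    OVERWRITE = "overwrite"


def choose_access(reads_existing_value, writes_value):
    """Pick the cheapest access type that still covers what the code does."""
    if reads_existing_value and writes_value:
        return Access.READ_WRITE   # existing data is needed, and the write is tracked
    if writes_value:
        return Access.OVERWRITE    # existing data is discarded, so it need not be initialized or copied
    return Access.READ             # nothing is written, so there is no change-tracking cost


# A kernel that offsets existing positions both reads and writes them.
offset_access = choose_access(reads_existing_value=True, writes_value=True)
# A kernel that fills an attribute without looking at its old value only writes.
fill_access = choose_access(reads_existing_value=False, writes_value=True)
# A kernel that only inspects an attribute never writes.
inspect_access = choose_access(reads_existing_value=True, writes_value=False)
```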
Processing prims by iterating over the selection
Once we’ve made a prim selection we can iterate over it and do some computation. We’ll look at how to do that first in Python, then in C++.
Python
To process the selected prims in Python we use NVIDIA Warp to iterate over the prims and apply a kernel function to each. The advantage of using Warp is that it allows us to iterate and apply the Python kernel on either CPU or GPU. To pass attributes to the kernel function we need to wrap each one with warp.fabricarray, and then launch the function using warp.launch. The definitions of warp.fabricarray and warp.launch are as follows.
warp.fabricarray(view: dict, attrib: str)
warp.launch(kernel, dim, inputs, device)
Example 4: Setting world position
In this example we’ll select all the physics meshes on the stage and change their world position. We’ll select the prims for processing on CUDA GPU 0, but if we wanted to use the CPU instead the only change we’d need to make would be to set the device parameter to “cpu”.
selection = stage.SelectPrims(
    require_applied_schemas=["PhysicsRigidBodyAPI"],
    require_attrs=[(Sdf.ValueTypeNames.Double3, "_worldPosition", Usd.Access.ReadWrite)],
    require_prim_type="Mesh",
    device="cuda:0",
)
Next we define the kernel function we want to apply to the selected prims.
import warp as wp
@wp.kernel(enable_backward=False)
def move_prims(
    pos: wp.fabricarray(dtype=wp.vec3d),
):
    i = wp.tid()  # index of the selected prim handled by this thread
    old_pos = pos[i]
    pos[i] = wp.vec3d(old_pos.x + 1, old_pos.y, old_pos.z)
Finally we run the kernel on the selection using Warp.
positions = wp.fabricarray(selection.__fabric_arrays_interface__, "_worldPosition")
wp.launch(move_prims, dim=positions.size, inputs=[positions], device="cuda:0")
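Conceptually, launching a kernel with dim=N runs the kernel body once per thread index, and wp.tid() returns that index. The following pure-Python sketch imitates that execution model sequentially for the move_prims kernel above. It does not use Warp or Fabric: toy_launch is a hypothetical stand-in, and positions are modeled as plain tuples.

```python
def toy_launch(kernel, dim, inputs):
    """Sequential stand-in for a kernel launch: one call per thread index."""
    for tid in range(dim):
        kernel(tid, *inputs)


def move_prims(tid, pos):
    # Same logic as the Warp kernel above: shift each prim by +1 along x.
    x, y, z = pos[tid]
    pos[tid] = (x + 1.0, y, z)


# One world position per selected prim.
positions = [(0.0, 0.0, 0.0), (2.5, 1.0, -3.0)]
toy_launch(move_prims, dim=len(positions), inputs=[positions])
```

Because each thread index touches only its own element, the calls are independent of one another, which is what lets a real launch run them in parallel.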
Example 5: Setting an array-valued attribute
In this example we’ll select all the meshes on the stage and change their color to red. A quirk of USD is that displayColor is an array-valued attribute, even when one color is applied to the whole mesh. We’ll use this example to demonstrate the use of array-valued attributes, even though in this case the array has size one. First we select all the meshes on the stage.
selection = stage.SelectPrims(
    require_attrs=[
        (Sdf.ValueTypeNames.Color3fArray, UsdGeom.Tokens.primvarsDisplayColor, Usd.Access.ReadWrite)
    ],
    require_prim_type="Mesh",
    device="cpu",
)
print(f"Found {selection.GetCount()} meshes with displayColor")
Next we define the kernel function we want to apply to the selected prims. The main difference compared to the last example is that the type of the colors parameter is fabricarrayarray instead of fabricarray. The parameter is an array-of-arrays, because the selection contains multiple prims, each with an array of colors. We index the array using colors[i, 0] to access color 0 of selected prim i.
@wp.kernel(enable_backward=False)
def change_colors(colors: wp.fabricarrayarray(dtype=wp.vec3f)):  # Color3f is three 32-bit floats
    i = wp.tid()
    colors[i, 0] = wp.vec3f(1.0, 0.0, 0.0)
Finally we make a warp.fabricarray for the attribute we want to pass to the kernel, and then launch it using Warp. Note that here the array is a fabricarray, not a fabricarrayarray.
colors = wp.fabricarray(selection.__fabric_arrays_interface__, "primvars:displayColor")
wp.launch(change_colors, dim=colors.size, inputs=[colors], device="cpu")
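The array-of-arrays indexing used in this example can be pictured with nested Python lists. The sketch below is a toy model, not Warp or Fabric: the outer list has one entry per selected prim, each inner list is that prim's color array, and the sample colors are made up for illustration.

```python
def change_colors(tid, colors):
    # colors[tid] is the color array of prim tid; element 0 is its first color.
    colors[tid][0] = (1.0, 0.0, 0.0)


# Two prims: one with a single color, one with several colors (lengths may differ).
colors = [
    [(0.2, 0.2, 0.2)],
    [(0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)],
]

# The launch dimension is the number of prims, not the total number of colors.
for tid in range(len(colors)):
    change_colors(tid, colors)
```

Only element 0 of each prim's array changes. A prim that stores more than one color keeps its remaining entries.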
C++
SelectPrims returns an RtPrimSelection, which has the following methods in C++.
AttributeRef GetRef(SdfValueTypeName type, const TfToken& name) const;
omni::fabric::batch::View GetBatchView() const;
size_t GetCount() const;
To iterate over the prim selection’s data on CPU or GPU we need to construct a batch::ViewIterator, using GetBatchView as follows.
batch::ViewIterator iter(selection.GetBatchView());
batch::ViewIterator has the following methods:
ViewIterator(const batch::View& view, const size_t index = 0)
bool peek();
bool advance(const size_t stride = 1);
template<typename T> const T& getAttributeRd(const AttributeRef& attributeRef) const;
template<typename T> T& getAttributeWr(const AttributeRef& attributeRef) const;
template<typename T> SpanOf<const T> getArrayAttributeRd(const AttributeRef& attributeRef) const;
template<typename T> SpanOf<T> getArrayAttributeWr(const AttributeRef& attributeRef) const;
size_t getGlobalIndex() const;
size_t getElementRangesIndex() const;
size_t getElementIndex() const;
size_t getElementCount(const AttributeRef& attributeRef) const;
uint64_t getPath() const;
Example 6: Setting an array-valued attribute in C++ on CPU
As in the Python example, we’ll select all the meshes on the stage and make them red. We make a prim selection, get an AttributeRef for displayColor, construct a batch::ViewIterator, and then iterate over the data using a while loop.
TfToken requireType = TfToken("Mesh");
std::vector<AttrSpec> requireAttrs = { AttrSpec{ usdrt::SdfValueTypeNames->Color3fArray,
                                                 UsdGeomTokens->primvarsDisplayColor, AccessType::eReadWrite } };
RtPrimSelection selection = stage->SelectPrims({}, requireAttrs, requireType, kDeviceCpu);
AttributeRef colors = selection.GetRef(usdrt::SdfValueTypeNames->Color3fArray, UsdGeomTokens->primvarsDisplayColor);
batch::ViewIterator iter(selection.GetBatchView());
while (iter.advance())
{
    iter.getArrayAttributeWr<GfVec3f>(colors).elements[0] = { 1.0f, 0.0f, 0.0f };
}
Example 7: Setting an array-valued attribute in C++ on GPU
Now we’ll port the previous example to the GPU. First we will define a CUDA kernel to change the colors.
__global__ void changeColors_impl(const omni::fabric::batch::View view, omni::fabric::batch::AttributeRef displayColors)
{
    const unsigned int index = blockIdx.x * blockDim.x + threadIdx.x;
    const unsigned int gridStride = blockDim.x * gridDim.x;
    omni::fabric::batch::ViewIterator iter(view, index);
    while (iter.advance(gridStride))
    {
        iter.getArrayAttributeWr<carb::Float3>(displayColors).elements[0] = { 1, 0, 0 };
    }
}
void changeColors(const usdrt::RtPrimSelection selection, omni::fabric::batch::AttributeRef displayColors)
{
    const dim3 blockDim(768);
    const dim3 gridDim((selection.GetCount() + blockDim.x - 1) / blockDim.x);
    changeColors_impl<<<gridDim, blockDim>>>(selection.GetBatchView(), displayColors);
}
Next we make a prim selection and AttributeRef as before, then launch the CUDA kernel.
TfToken requireType = TfToken("Mesh");
std::vector<AttrSpec> requireAttrs = { AttrSpec{ usdrt::SdfValueTypeNames->Color3fArray,
                                                 UsdGeomTokens->primvarsDisplayColor, AccessType::eReadWrite } };
RtPrimSelection selection = stage->SelectPrims({}, requireAttrs, requireType, kDeviceCuda0);
AttributeRef colors = selection.GetRef(usdrt::SdfValueTypeNames->Color3fArray, UsdGeomTokens->primvarsDisplayColor);
// Launch CUDA kernel
changeColors(selection, colors);
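The launch configuration and the strided loop in this example follow the usual CUDA grid-stride pattern. The pure-Python sketch below checks the arithmetic: the grid size is a ceiling division of the prim count by the block size, and each element is visited exactly once. It assumes that a thread starting at index and advancing by gridStride visits index, index + gridStride, and so on, which is an assumption about the iterator's behavior. The launch_config and visited_indices helpers and the count of 2000 are hypothetical.

```python
def launch_config(count, block_size=768):
    """Ceiling division, as in changeColors: enough blocks to cover every element."""
    grid_size = (count + block_size - 1) // block_size
    return grid_size, block_size


def visited_indices(thread_index, grid_stride, count):
    """Elements one thread handles in a grid-stride loop (assumed iterator behavior)."""
    return list(range(thread_index, count, grid_stride))


count = 2000
grid_size, block_size = launch_config(count)
grid_stride = grid_size * block_size   # total number of threads launched

# Count how many times each element is visited across all threads.
visits = [0] * count
for thread_index in range(grid_stride):
    for element in visited_indices(thread_index, grid_stride, count):
        visits[element] += 1
```

Threads whose index is at or beyond the prim count do no work, so rounding the grid size up is safe.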