Annotators Information

Annotators provide you with labeled synthetic data. Within Replicator there are multiple off-the-shelf annotators. This tutorial shows what annotated data you can get and the format it is output in.

Annotator Registry

The annotator registry is where all of the annotators are registered. To access them, use the function AnnotatorRegistry.get_annotator(). The annotators currently available through the registry are:

  • RGB

  • Normals

  • Bounding Box 2d Loose

  • Bounding Box 2d Tight

  • Bounding Box 3d

  • Distance to Camera

  • Distance to Image Plane

  • Semantic Segmentation

  • Instance ID Segmentation

  • Instance Segmentation

  • Point Cloud

  • Pose

  • Motion Vectors

  • Cross Correspondence

Some annotators support initialization parameters. For example, bounding box annotators can be parameterized with the valid semantic types so that other semantics are filtered out. The code line below gets the tight 2D bounding box annotator.

AnnotatorRegistry.get_annotator("bounding_box_2d_tight", init_params={"semanticTypes": semantic_types})
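
The registry can also be queried for the names under which annotators are registered; these are the snake_case names used in code (e.g. “rgb”, “bounding_box_2d_tight”). A minimal sketch, assuming your Replicator version exposes AnnotatorRegistry.get_registered_annotators():

import omni.replicator.core as rep

# Print the names of all registered annotators (the identifiers passed to get_annotator())
print(rep.AnnotatorRegistry.get_registered_annotators())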

To see how annotators are used within a writer, we have prepared scripts that implement a basic writer covering all standard annotators. How to get there is shown in Scripts for Replicator.

Annotator Output

Below is a short description of each output. For more information, consult the script annotator_registry.py (follow the instructions in Scripts for Replicator to find it).

RGB

The “rgb” annotator produces an array of type np.uint8 with shape (height, width, 4), where the four channels correspond to R, G, B, A.

Example

import omni.replicator.core as rep

async def test_rgb():
    cone = rep.create.cone()

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    rgb = rep.AnnotatorRegistry.get_annotator("rgb")
    rgb.attach(rp)

    await rep.orchestrator.step_async()
    data = rgb.get_data()
    print(data.shape, data.dtype)   # ((512, 1024, 4), uint8)

import asyncio
asyncio.ensure_future(test_rgb())

Normals

The “normals” annotator produces an array of type np.float32 with shape (height, width, 4). The first three channels correspond to (x, y, z). The fourth channel is unused.

Example

import omni.replicator.core as rep

async def test_normals():
    cone = rep.create.cone()

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    normals = rep.AnnotatorRegistry.get_annotator("normals")
    normals.attach(rp)

    await rep.orchestrator.step_async()
    data = normals.get_data()
    print(data.shape, data.dtype)   # ((512, 1024, 4), float32)

import asyncio
asyncio.ensure_future(test_normals())

Bounding Box 2D Loose

Outputs the loose 2D bounding box of each entity with semantics in the camera’s field of view. Loose bounding boxes bound the entire entity regardless of occlusions.

Initialization Parameters

  • semanticTypes: list of allowed semantic types. For example, if semanticTypes is [“class”], only the bounding boxes of prims with semantics of type “class” will be retrieved.

Output Format

The bounding box annotator returns a dictionary with the bounds and semantic id found under the “data” key, while other information is under the “info” key: “idToLabels”, “bboxIds” and “primPaths”.

{
    "data": np.dtype(
                [
                    ("semanticId", "<u4"),
                    ("x_min", "<i4"),
                    ("y_min", "<i4"),
                    ("x_max", "<i4"),
                    ("y_max", "<i4"),
                ]
            ),
    "info": {
        "idToLabels": {<semanticId>: <semantic_labels>},    # mapping from integer semantic ID to a comma delimited list of associated semantics
        "bboxIds": [<bbox_id_0>, ..., <bbox_id_n>],         # ID specific to bounding box annotators allowing easy mapping between different bounding box annotators.
        "primPaths": [<prim_path_0>, ... <prim_path_n>],    # prim path tied to each bounding box
    }
}

Other Notes

  • bounding_box_2d_loose produces the loose 2D bounding box of any prim in the viewport, whether it is partially or fully occluded.

Example

import omni.replicator.core as rep

async def test_bbox_2d_loose():
    cone = rep.create.cone(semantics=[("prim", "cone")], position=(100, 0, 0))
    sphere = rep.create.sphere(semantics=[("prim", "sphere")], position=(-100, 0, 0))
    invalid_type = rep.create.cube(semantics=[("shape", "boxy")], position=(0, 100, 0))

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    bbox_2d_loose = rep.AnnotatorRegistry.get_annotator("bounding_box_2d_loose", init_params={"semanticTypes": ["prim"]})
    bbox_2d_loose.attach(rp)

    await rep.orchestrator.step_async()
    data = bbox_2d_loose.get_data()
    print(data)
    # {
    #   'data': array([
    #       (0, 443, 198, 581, 357),
    #       (1, 245,  92, 375, 220)],
    #       dtype=[('semanticId', '<u4'),
    #              ('x_min', '<i4'),
    #              ('y_min', '<i4'),
    #              ('x_max', '<i4'),
    #              ('y_max', '<i4')]),
    #   'info': {
    #       'bboxIds': array([0, 1], dtype=uint32),
    #       'idToLabels': {'0': {'prim': 'cone'}, '1': {'prim': 'sphere'}},
    #       'primPaths': ['/Replicator/Cone_Xform_03', '/Replicator/Sphere_Xform_03']
    #   }
    # }

import asyncio
asyncio.ensure_future(test_bbox_2d_loose())

Bounding Box 2D Tight

Outputs the tight 2D bounding box of each entity with semantics in the camera’s viewport. Tight bounding boxes bound only the visible pixels of entities. Completely occluded entities are omitted.

Initialization Parameters

  • semanticTypes: list of allowed semantic types. For example, if semanticTypes is [“class”], only the bounding boxes of prims with semantics of type “class” will be retrieved.

Output Format

The bounding box annotator returns a dictionary with the bounds and semantic id found under the “data” key, while other information is under the “info” key: “idToLabels”, “bboxIds” and “primPaths”.

{
    "data": np.dtype(
                [
                    ("semanticId", "<u4"),
                    ("x_min", "<i4"),
                    ("y_min", "<i4"),
                    ("x_max", "<i4"),
                    ("y_max", "<i4"),
                ]
            ),
    "info": {
        "idToLabels": {<semanticId>: <semantic_labels>},    # mapping from integer semantic ID to a comma delimited list of associated semantics
        "bboxIds": [<bbox_id_0>, ..., <bbox_id_n>],         # ID specific to bounding box annotators allowing easy mapping between different bounding box annotators.
        "primPaths": [<prim_path_0>, ... <prim_path_n>],    # prim path tied to each bounding box
    }
}

Other Notes

  • bounding_box_2d_tight bounds only visible pixels.

Example

import omni.replicator.core as rep

async def test_bbox_2d_tight():
    cone = rep.create.cone(semantics=[("prim", "cone")], position=(100, 0, 0))
    sphere = rep.create.sphere(semantics=[("prim", "sphere")], position=(-100, 0, 0))
    invalid_type = rep.create.cube(semantics=[("shape", "boxy")], position=(0, 100, 0))

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    bbox_2d_tight = rep.AnnotatorRegistry.get_annotator("bounding_box_2d_tight", init_params={"semanticTypes": ["prim"]})
    bbox_2d_tight.attach(rp)

    await rep.orchestrator.step_async()
    data = bbox_2d_tight.get_data()
    print(data)
    # {
    #   'data': array([
    #       (0, 443, 198, 581, 357),
    #       (1, 245,  94, 368, 220)],
    #       dtype=[('semanticId', '<u4'),
    #              ('x_min', '<i4'),
    #              ('y_min', '<i4'),
    #              ('x_max', '<i4'),
    #              ('y_max', '<i4')]),
    #   'info': {
    #       'bboxIds': array([0, 1], dtype=uint32),
    #       'idToLabels': {'0': {'prim': 'cone'}, '1': {'prim': 'sphere'}},
    #       'primPaths': ['/Replicator/Cone_Xform_03', '/Replicator/Sphere_Xform_03']
    #   }
    # }

import asyncio
asyncio.ensure_future(test_bbox_2d_tight())

Bounding Box 3D

Outputs the 3D bounding box of each entity with semantics in the camera’s viewport.

Initialization Parameters

  • semanticTypes: list of allowed semantic types. For example, if semanticTypes is [“class”], only the bounding boxes of prims with semantics of type “class” will be retrieved.

Output Format

The bounding box annotator returns a dictionary with the bounds and semantic id found under the “data” key, while other information is under the “info” key: “idToLabels”, “bboxIds” and “primPaths”.

{
    "data": np.dtype(
                [
                    ("semanticId", "<u4"),
                    ("x_min", "<f4"),
                    ("y_min", "<f4"),
                    ("x_max", "<f4"),
                    ("y_max", "<f4"),
                    ("z_min", "<f4"),
                    ("z_max", "<f4"),
                    ("transform", "<f4", (4, 4)),
                ]
            ),
    "info": {
        "idToLabels": {<semanticId>: <semantic_labels>},    # mapping from integer semantic ID to a comma delimited list of associated semantics
        "bboxIds": [<bbox_id_0>, ..., <bbox_id_n>],         # ID specific to bounding box annotators allowing easy mapping between different bounding box annotators.
        "primPaths": [<prim_path_0>, ... <prim_path_n>],    # prim path tied to each bounding box
    }
}

Other Notes

  • 3D bounding boxes are generated regardless of occlusion.

  • bounding box dimensions (<axis>_min, <axis>_max) are expressed in stage units.

Example

import omni.replicator.core as rep

async def test_bbox_3d():
    cone = rep.create.cone(semantics=[("prim", "cone")], position=(100, 0, 0))
    sphere = rep.create.sphere(semantics=[("prim", "sphere")], position=(-100, 0, 0))
    invalid_type = rep.create.cube(semantics=[("shape", "boxy")], position=(0, 100, 0))

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    bbox_3d = rep.AnnotatorRegistry.get_annotator("bounding_box_3d", init_params={"semanticTypes": ["prim"]})
    bbox_3d.attach(rp)

    await rep.orchestrator.step_async()
    data = bbox_3d.get_data()
    print(data)
    # {
    #   'data': array([
    #       (0, -50., -50., -50., 50., 49.9999, 50., [[   1.,    0.,    0.,    0.], [   0.,    1.,    0.,    0.], [   0.,    0.,    1.,    0.], [ 100.,    0.,    0.,    1.]]),
    #       (1, -50., -50., -50., 50., 50.    , 50., [[   1.,    0.,    0.,    0.], [   0.,    1.,    0.,    0.], [   0.,    0.,    1.,    0.], [-100.,    0.,    0.,    1.]])],
    #       dtype=[('semanticId', '<u4'),
    #              ('x_min', '<f4'),
    #              ('y_min', '<f4'),
    #              ('x_max', '<f4'),
    #              ('y_max', '<f4'),
    #              ('z_min', '<f4'),
    #              ('z_max', '<f4'),
    #              ('transform', '<f4', (4, 4))]),
    #   'info': {
    #       'bboxIds': array([0, 1], dtype=uint32),
    #       'idToLabels': {'0': {'prim': 'cone'}, '1': {'prim': 'sphere'}},
    #       'primPaths': ['/Replicator/Cone_Xform_03', '/Replicator/Sphere_Xform_03']
    #   }
    # }

import asyncio
asyncio.ensure_future(test_bbox_3d())

Distance to Camera

Outputs a depth map from objects to the camera position. The “distance_to_camera” annotator produces a 2D array of type np.float32 with 1 channel.

Data Details

  • The unit for distance to camera is in meters (For example, if the object is 1000 units from the camera, and the meters_per_unit variable of the scene is 100, the distance to camera would be 10).

  • 0 in the 2d array represents infinity (which means there is no object in that pixel).
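
Example

The sketch below follows the same pattern as the RGB and normals examples above; the printed shape assumes the single-channel (height, width) layout described in this section.

import omni.replicator.core as rep

async def test_distance_to_camera():
    cone = rep.create.cone()

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    dist_to_cam = rep.AnnotatorRegistry.get_annotator("distance_to_camera")
    dist_to_cam.attach(rp)

    await rep.orchestrator.step_async()
    data = dist_to_cam.get_data()
    print(data.shape, data.dtype)   # expected: (512, 1024) float32

import asyncio
asyncio.ensure_future(test_distance_to_camera())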

Distance to Image Plane

Outputs a depth map from objects to the image plane of the camera. The “distance_to_image_plane” annotator produces a 2D array of type np.float32 with 1 channel.

Data Details

  • The unit for distance to image plane is in meters (For example, if the object is 1000 units from the image plane of the camera, and the meters_per_unit variable of the scene is 100, the distance to the image plane would be 10).

  • 0 in the 2d array represents infinity (which means there is no object in that pixel).
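
Example

Usage mirrors the distance_to_camera sketch above; only the annotator name changes. The printed shape again assumes a single-channel (height, width) array.

import omni.replicator.core as rep

async def test_distance_to_image_plane():
    cone = rep.create.cone()

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    dist_to_plane = rep.AnnotatorRegistry.get_annotator("distance_to_image_plane")
    dist_to_plane.attach(rp)

    await rep.orchestrator.step_async()
    data = dist_to_plane.get_data()
    print(data.shape, data.dtype)   # expected: (512, 1024) float32

import asyncio
asyncio.ensure_future(test_distance_to_image_plane())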

Semantic Segmentation

Outputs semantic segmentation of each entity in the camera’s viewport that has semantic labels.

Output Format

  • Semantic segmentation image:

      • If colorize is set to true, the image will be a 2D array of type np.uint8 with 4 channels; different colors represent different semantic labels.

      • If colorize is set to false, the image will be a 2D array of type np.uint32 with 1 channel, containing the semantic id of each entity.

  • Id to labels JSON file:

      • If colorize is set to true, it will be the mapping from color to semantic labels.

      • If colorize is set to false, it will be the mapping from semantic id to semantic labels.

Data Details

  • colorize: whether to output the colorized semantic segmentation or the non-colorized one.

  • semanticTypes: filters the types of semantic labels to include. For example, if semanticTypes is [“class”], only semantic labels of type “class” will be output when writing the data.

Other Notes

  • The semantic labels of an entity are its own labels plus all semantic labels it inherits from its parents; labels of the same type are concatenated, separated by a comma. For example, if an entity has the semantic label [{“class”: “cube”}] and its parent has [{“class”: “rectangle”}], the final semantic labels of that entity will be [{“class”: “rectangle, cube”}].

  • If an entity and its parents do not have any semantic labels of the types required by semanticTypes, the entity will be ignored by the annotator in the final output.
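
Example

A minimal sketch of attaching the semantic segmentation annotator; the init_params keys follow the Data Details above, and the “data”/“info”/“idToLabels” layout of the returned dictionary is an assumption based on the output description.

import omni.replicator.core as rep

async def test_semantic_segmentation():
    cone = rep.create.cone(semantics=[("class", "cone")])

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    sem_seg = rep.AnnotatorRegistry.get_annotator(
        "semantic_segmentation",
        init_params={"colorize": False, "semanticTypes": ["class"]},
    )
    sem_seg.attach(rp)

    await rep.orchestrator.step_async()
    data = sem_seg.get_data()
    print(data["data"].shape, data["data"].dtype)   # expected: (512, 1024) uint32 when colorize is false
    print(data["info"]["idToLabels"])               # mapping from semantic id to semantic labels

import asyncio
asyncio.ensure_future(test_semantic_segmentation())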

Instance ID Segmentation

Outputs the instance id segmentation of each entity in the camera’s viewport. The instance id is unique for each prim path in the scene.

Output Format

  • Instance id segmentation image:

      • If colorize is set to true, the image will be a 2D array of type np.uint8 with 4 channels; different colors represent different instance ids.

      • If colorize is set to false, the image will be a 2D array of type np.uint32 with 1 channel, containing the instance id of each entity.

  • Id to labels JSON file:

      • If colorize is set to true, it will be the mapping from color to the USD prim path of each entity.

      • If colorize is set to false, it will be the mapping from instance id to the USD prim path of each entity.

Data Details

  • colorize: whether to output the colorized instance id segmentation or the non-colorized one.

Other Notes

  • Instance ids are assigned such that every leaf prim in the scene receives an instance id, whether or not it has semantic labels.
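
Example

A minimal sketch following the same pattern; the “data”/“info” dictionary layout is an assumption based on the output description above.

import omni.replicator.core as rep

async def test_instance_id_segmentation():
    cone = rep.create.cone(semantics=[("class", "cone")])
    sphere = rep.create.sphere(position=(-100, 0, 0))  # no semantics, but still receives an instance id

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    instance_id_seg = rep.AnnotatorRegistry.get_annotator("instance_id_segmentation", init_params={"colorize": False})
    instance_id_seg.attach(rp)

    await rep.orchestrator.step_async()
    data = instance_id_seg.get_data()
    print(data["data"].shape, data["data"].dtype)   # expected: (512, 1024) uint32 when colorize is false
    print(data["info"]["idToLabels"])               # mapping from instance id to USD prim path

import asyncio
asyncio.ensure_future(test_instance_id_segmentation())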

Instance Segmentation

Outputs the instance segmentation of each entity in the camera’s viewport. The main difference between instance id segmentation and instance segmentation is that the instance segmentation annotator goes down the hierarchy to the lowest-level prim that has semantic labels, whereas instance id segmentation always goes down to the leaf prim.

Output Format

  • Instance segmentation image:

      • If colorize is set to true, the image will be a 2D array of type np.uint8 with 4 channels; different colors represent different semantic instances.

      • If colorize is set to false, the image will be a 2D array of type np.uint32 with 1 channel, containing the instance id of each semantic entity.

  • Id to labels JSON file:

      • If colorize is set to true, it will be the mapping from color to the USD prim path of each semantic entity.

      • If colorize is set to false, it will be the mapping from instance id to the USD prim path of each semantic entity.

  • Id to semantics JSON file:

      • If colorize is set to true, it will be the mapping from color to the semantic labels of each semantic entity.

      • If colorize is set to false, it will be the mapping from instance id to the semantic labels of each semantic entity.

Data Details

  • colorize: whether to output the colorized instance segmentation or the non-colorized one.

  • semanticTypes: filters the types of semantic labels to include. For example, if semanticTypes is [“class”], only entities with semantic labels of type “class” will be output when writing the data.

Other Notes

  • Two prims with the same semantic labels but different USD paths will have different ids.

  • If two prims have no semantic labels of their own but share a parent that has semantic labels, they will be classified as the same instance.
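
Example

A minimal sketch following the same pattern; the “data”/“info” layout and the “idToSemantics” key name are assumptions based on the output description above.

import omni.replicator.core as rep

async def test_instance_segmentation():
    cone = rep.create.cone(semantics=[("class", "cone")], position=(100, 0, 0))
    sphere = rep.create.sphere(semantics=[("class", "sphere")], position=(-100, 0, 0))

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    instance_seg = rep.AnnotatorRegistry.get_annotator(
        "instance_segmentation",
        init_params={"colorize": False, "semanticTypes": ["class"]},
    )
    instance_seg.attach(rp)

    await rep.orchestrator.step_async()
    data = instance_seg.get_data()
    print(data["data"].shape, data["data"].dtype)   # expected: (512, 1024) uint32 when colorize is false
    print(data["info"]["idToLabels"])               # mapping from instance id to USD prim path
    print(data["info"]["idToSemantics"])            # mapping from instance id to semantic labels

import asyncio
asyncio.ensure_future(test_instance_segmentation())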

Point Cloud

Outputs a 2D array of shape (N, 3) representing the points sampled on the surfaces of the prims in the viewport, where N is the number of points.

Output Format

The point cloud annotator returns the positions of the points under the “data” key, while other information is under the “info” key: “pointRgb”, “pointNormals” and “pointSemantic”.

{
    "data": np.dtype(np.float32),                               # position value of each point, shape (N, 3)
    "info": {
        "pointRgb": [<rgb_0>, ..., <rgb_n>],                    # rgb value of each point, shape (N, 4)
        "pointNormals": [<normal_0>, ..., <normal_n>],          # normal value of each point, shape (N, 3)
        "pointSemantic": [<semantic_id_0>, ..., <semantic_id_n>],   # semantic id of each point, shape (N,)
    }
}

Data Details

  • Point positions are in world space.

  • Sample resolution is determined by the resolution of the render product.

Other Notes

  • To get the mapping from semantic ids to semantic labels, the point cloud annotator is best used together with the semantic segmentation annotator, from which the idToLabels data can be extracted.

Example

The point cloud annotator captures the prims seen by the camera and samples points on their surfaces, based on the resolution of the render product attached to the camera. In addition to the sampled points, it also outputs the rgb, normal and semantic id values associated with the prim each point belongs to. Prims without any valid semantic labels are ignored by the annotator.

Just like the other annotators, the point cloud annotator can be initialized through a writer.

writer = rep.WriterRegistry.get("BasicWriter")
writer.initialize(output_dir=out_dir, pointcloud=True, semantic_types=["class"])

Because the point cloud annotator only samples what is visible in the current viewport, the camera needs to be placed at several positions to capture the whole object.

# Camera positions to capture the cube
camera_positions = [(500, 500, 0), (-500, -500, 0), (500, 0, 500), (-500, 0, -500)]

with rep.trigger.on_frame(num_frames=len(camera_positions)):
    with camera:
        rep.modify.pose(position=rep.distribution.sequence(camera_positions), look_at=cube)  # make the camera look at the cube

Below is the entire script, which can be copied directly into the script editor.

import omni.replicator.core as rep
import os
import asyncio

# Pointcloud only capture prims with valid semantics
cube = rep.create.cube(position=(0, 0, 0), semantics=[("class", "cube")])

camera = rep.create.camera()

render_product = rep.create.render_product(camera, (1024, 512))

out_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "out")
os.makedirs(out_dir, exist_ok=True)

# Camera positions to capture the cube
camera_positions = [(500, 500, 0), (-500, -500, 0), (500, 0, 500), (-500, 0, -500)]

with rep.trigger.on_frame(num_frames=len(camera_positions)):
    with camera:
        rep.modify.pose(position=rep.distribution.sequence(camera_positions), look_at=cube)  # make the camera look at the cube

# Initialize and attach writer
writer = rep.WriterRegistry.get("BasicWriter")
writer.initialize(output_dir=out_dir, pointcloud=True, semantic_types=["class"])
writer.attach([render_product])

# Run the simulation graph
rep.orchestrator.run()

After the points are saved to disk, you can read and process them. Here is an example that saves the points as a PLY file.

import open3d as o3d
import os
import numpy as np

pointcloud_out_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "out")

points = []
points_rgb = []

all_files = [f for f in os.listdir(pointcloud_out_dir) if os.path.isfile(os.path.join(pointcloud_out_dir, f))]

for file_name in all_files:
    if file_name.startswith("pointcloud") and file_name.endswith(".npy"):
        if file_name.split(".")[0].split("_")[1] == "rgb":
            points_rgb.append(np.load(os.path.join(pointcloud_out_dir, file_name))[..., :3] / 255)  # Convert rgb data from [0, 255] to [0, 1]
        elif len(file_name.split(".")[0].split("_")) == 2:
            points.append(np.load(os.path.join(pointcloud_out_dir, file_name)))

ply_out_dir = os.path.join(os.path.dirname(os.path.realpath(__file__)), "out", "pointcloud.ply")


pc_data = np.concatenate(points)
pc_rgb = np.concatenate(points_rgb)

pcd = o3d.geometry.PointCloud()
pcd.points = o3d.utility.Vector3dVector(pc_data)
pcd.colors = o3d.utility.Vector3dVector(pc_rgb)
o3d.io.write_point_cloud(ply_out_dir, pcd)

Pose

The pose annotator outputs information about the skeletons in the scene view.

Output Format

Skeleton JSON file.

Data Details

  • “global_translations” - Global translation of each joint

  • “local_rotations” - Local rotation of each joint

  • “skeleton_joints” - All the joint names

  • “skeleton_parents” - Index of each joint’s parent joint; -1 indicates the root

  • “skel_name” - Name of the skeleton

  • “translations_2d” - Projected 2d points of each joint

  • “in_view” - If the skeleton is in view of the camera

  • “rest_local_rotations” - n/a

  • “rest_local_translations” - n/a

Other Notes

  • Pose annotator currently only works with pinhole type cameras.

Example

Below is an example script that outputs 10 images with pose annotation.

import omni.replicator.core as rep

# Define paths for the character
PERSON_SRC = 'omniverse://localhost/NVIDIA/Assets/Characters/Reallusion/Worker/Worker.usd'

with rep.new_layer():
    # Human Model
    person = rep.create.from_usd(PERSON_SRC, semantics=[('class', 'person')])
    # Area to scatter cubes in
    area = rep.create.cube(scale=2, position=(0.0, 0.0, 100.0), visible=False)

    # Create the camera and render product
    camera = rep.create.camera(position=(25, -421.0, 182.0), rotation=(77.0, 0.0, 3.5))
    render_product = rep.create.render_product(camera, (1024, 1024))

    def randomize_spheres():
        spheres = rep.create.sphere(scale=0.1, count=100)
        with spheres:
            rep.randomizer.scatter_3d(area)
        return spheres.node

    rep.randomizer.register(randomize_spheres)

    with rep.trigger.on_frame(interval=10, num_frames=5):
        rep.randomizer.randomize_spheres()

    # Initialize and attach writer
    writer = rep.WriterRegistry.get("BasicWriter")
    writer.initialize(output_dir="_output_pose_example", rgb=True, skeleton_data=True)
    writer.attach([render_product])

Motion Vectors

Outputs a 2D array of motion vectors representing the relative motion of a pixel in the camera’s viewport between frames.

Output Format

  • The “motion_vectors” annotator produces a 2D array of type np.float32 with 4 channels.

Data Details

  • Each value is a normalized direction in 3D space

Other Notes

  • The values represent motion relative to camera space.
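
Example

A minimal sketch following the RGB example pattern; the printed shape assumes the (height, width, 4) float32 layout described above.

import omni.replicator.core as rep

async def test_motion_vectors():
    cone = rep.create.cone()

    cam = rep.create.camera(position=(500,500,500), look_at=cone)
    rp = rep.create.render_product(cam, (1024, 512))

    motion_vectors = rep.AnnotatorRegistry.get_annotator("motion_vectors")
    motion_vectors.attach(rp)

    await rep.orchestrator.step_async()
    data = motion_vectors.get_data()
    print(data.shape, data.dtype)   # expected: (512, 1024, 4) float32

import asyncio
asyncio.ensure_future(test_motion_vectors())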

Cross Correspondence

The cross correspondence annotator outputs a 2D array representing the camera optical flow map of the camera’s viewport against a reference viewport.

Output Format

The “cross_correspondence” annotator produces a 2D array of type np.float32 with 4 channels.

Data Details

  • The components of each entry in the 2D array represent four different values encoded as floating point numbers:

      • x: dx - difference to the x value of the corresponding pixel in the reference viewport, normalized to [-1.0, 1.0]

      • y: dy - difference to the y value of the corresponding pixel in the reference viewport, normalized to [-1.0, 1.0]

      • z: occlusion mask - boolean signifying that the pixel is occluded or truncated in one of the cross-referenced viewports (1.0 = True, 0.0 = False)

      • w: geometric occlusion calculated - boolean signifying whether the pixel can be tested for occluded geometry (e.g. no occlusion testing is performed on missed rays) (1.0 = True, 0.0 = False)

Other Notes

  • Invalid data is returned as [-1.0, -1.0, -1.0, -1.0]