Synthetic Data

Omniverse Isaac Sim features sensors that return data such as bounding boxes, segmentation, and depth, which, together with fast, high-quality rendering, make it a great platform for developing Deep Learning applications.

Tutorials

Adding Semantic Information

Adding Domain Randomization

Synthetic Data Recording

Synthetic Data Recorder

The Synthetic Data Recorder enables offline data generation of various synthetic data sensor output locally from multiple viewports.

Synthetic Data Visualizer

The Synthetic Data Visualizer enables visualization of various synthetic data sensor output.

Python Samples

Below we provide several examples showcasing the different sensors currently available and their use in a deep learning training application using PyTorch.

Basic Sample

This is an introductory sample that shows how to set up a scene, collect groundtruth, and visualize it.

Scene

This example will demonstrate how to build a simple randomized scene and utilize various sensors to collect groundtruth data. Below we will go through the code one piece at a time. If you would like to dive straight in or run the code right away, the demo can be run with:

./python.sh python_samples/syntheticdata/basic/visualize_groundtruth.py

First, let’s import a few things as well as some helper functions:

import copy
import os
import omni
import random
import numpy as np
from omni.isaac.python_app import OmniKitHelper

Let’s focus on a few of the imports. OmniKitHelper takes care of launching Omniverse Kit and gives us control over an update() function we can call whenever we would like to render a new frame. Some imports can happen only after Kit has been launched, e.g. pxr imports and Omniverse extensions such as omni.isaac.synthetic_utils and omni.syntheticdata. SyntheticDataHelper provides convenience functions that wrap the omni.syntheticdata API to make it easier to collect groundtruth for common use cases.

kit = OmniKitHelper(
    {"renderer": "RayTracedLighting", "experience": f'{os.environ["EXP_PATH"]}/omni.isaac.sim.python.kit'}
)
from pxr import UsdGeom, Semantics
from omni.isaac.synthetic_utils import SyntheticDataHelper
sd_helper = SyntheticDataHelper()

Next, we need to set the stage. We will create a simple scene consisting of a distant light and an assortment of spheres and cubes.

# Get the current stage
stage = kit.get_stage()

# Add a distant light
stage.DefinePrim("/World/Light", "DistantLight")

# Create 10 randomly positioned and coloured spheres and cubes
# We will assign each a semantic label based on its shape (sphere/cube)
for i in range(10):
    prim_type = random.choice(["Cube", "Sphere"])
    prim = stage.DefinePrim(f"/World/cube{i}", prim_type)
    translation = np.random.rand(3) * TRANSLATION_RANGE
    UsdGeom.XformCommonAPI(prim).SetTranslate(translation.tolist())
    UsdGeom.XformCommonAPI(prim).SetScale((SCALE, SCALE, SCALE))
    prim.GetAttribute("primvars:displayColor").Set([np.random.rand(3).tolist()])

    # Add semantic label based on prim type
    sem = Semantics.SemanticsAPI.Apply(prim, "Semantics")
    sem.CreateSemanticTypeAttr()
    sem.CreateSemanticDataAttr()
    sem.GetSemanticTypeAttr().Set("class")
    sem.GetSemanticDataAttr().Set(prim_type)

The last block defines a semantic label for each prim. This is how you specify the class name used for classification, object detection, or semantic segmentation tasks. In our case, all spheres are labelled “Sphere” and all cubes are labelled “Cube”.
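
If you want to verify the labels later, they can be read back from a prim. The snippet below is a minimal sketch (not part of the sample) and assumes the SemanticsAPI instance name "Semantics" used above:

# Minimal sketch: read back the label applied to the last prim created above
sem_check = Semantics.SemanticsAPI.Get(prim, "Semantics")
if sem_check:
    print(prim.GetPath(), sem_check.GetSemanticTypeAttr().Get(), sem_check.GetSemanticDataAttr().Get())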

Groundtruth

Now we’re ready to collect groundtruth data from our sensors. More information on the sensors can be found here.

# Get groundtruth
kit.update()
viewport = omni.kit.viewport.get_default_viewport_window()
gt = sd_helper.get_groundtruth(
    [
        "rgb",
        "depth",
        "boundingBox2DTight",
        "boundingBox2DLoose",
        "instanceSegmentation",
        "semanticSegmentation",
        "boundingBox3D",
    ],
    viewport,
)

We now have all the groundtruth data inside the gt dictionary. From here we could save it to disk to create a training dataset or feed it directly to a model being trained (we will look at that scenario in a later example). For now, let’s visualize our groundtruth to confirm that everything is working as expected.
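
If you do want to write the data out at this point, a minimal sketch looks like the following (assuming Pillow is available; the offline generation samples later in this section use a dedicated data writer instead):

from PIL import Image

# gt["rgb"] is an RGBA uint8 array; save it as a PNG
Image.fromarray(gt["rgb"]).save("rgb_0000.png")
# Save the raw depth and semantic segmentation arrays for later processing
np.save("depth_0000.npy", gt["depth"])
np.save("semantic_0000.npy", gt["semanticSegmentation"])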

Visualization

Let’s import a few more things to help with visualization.

import matplotlib.pyplot as plt
from omni.syntheticdata import visualize, helpers

Now we set up a matplotlib figure and create a subplot for each sensor.

# Setup a figure
_, axes = plt.subplots(2, 4, figsize=(20, 7))
axes = axes.flat
for ax in axes:
    ax.axis("off")

# RGB
axes[0].set_title("RGB")
# Show the RGB image in every used subplot so the overlays below are drawn on top of it
for ax in axes[:-1]:
    ax.imshow(gt["rgb"])

# DEPTH
axes[1].set_title("Depth")
depth_data = np.clip(gt["depth"], 0, 255)
axes[1].imshow(visualize.colorize_depth(depth_data.squeeze()))

# BBOX2D TIGHT
axes[2].set_title("BBox 2D Tight")
rgb_data = copy.deepcopy(gt["rgb"])
axes[2].imshow(visualize.colorize_bboxes(gt["boundingBox2DTight"], rgb_data))

# BBOX2D LOOSE
axes[3].set_title("BBox 2D Loose")
rgb_data = copy.deepcopy(gt["rgb"])
axes[3].imshow(visualize.colorize_bboxes(gt["boundingBox2DLoose"], rgb_data))

# INSTANCE SEGMENTATION
axes[4].set_title("Instance Segmentation")
instance_seg = gt["instanceSegmentation"][0]
instance_rgb = visualize.colorize_segmentation(instance_seg)
axes[4].imshow(instance_rgb, alpha=0.7)

# SEMANTIC SEGMENTATION
axes[5].set_title("Semantic Segmentation")
semantic_seg = gt["semanticSegmentation"]
semantic_rgb = visualize.colorize_segmentation(semantic_seg)
axes[5].imshow(semantic_rgb, alpha=0.7)

# BBOX 3D
axes[6].set_title("BBox 3D")
bbox_3d_data = gt["boundingBox3D"]
bboxes_3d_corners = bbox_3d_data["corners"]
projected_corners = helpers.world_to_image(bboxes_3d_corners.reshape(-1, 3), viewport)
projected_corners = projected_corners.reshape(-1, 8, 3)
rgb_data = copy.deepcopy(gt["rgb"])
bboxes3D_rgb = visualize.colorize_bboxes_3d(projected_corners, rgb_data)
axes[6].imshow(bboxes3D_rgb)

# Save Figure
print("saving figure to: ", os.getcwd() + "/visualize_groundtruth.png")
plt.savefig("visualize_groundtruth.png")

If all goes well, we should see something like the image below saved locally as visualize_groundtruth.png. Every run is random, so your results will look different.

Synthetic Data Demo

Advanced Samples

Dataset With Physics and Glass

This example extends our basic sample and demonstrates how to create simple shapes, apply physics, gravity, and a glass material, collect groundtruth, and visualize it. Objects are simulated for one second to allow them to settle on a flat plane. RGB, depth, semantic labels, and bounding boxes are shown after simulation. Open the locally saved image file visualize_groundtruth_physics.png to see the visualization.

./python.sh python_samples/syntheticdata/advanced/visualize_groundtruth_physics.py
Synthetic Data Glass Demo

There are three parts to this sample: physics setup, object creation, and finally simulation.

The following shows how to create a physics scene in the stage and apply gravity. Because the number of objects is small, CPU physics is used.

# Add physics scene
scene = UsdPhysics.Scene.Define(stage, Sdf.Path("/World/physicsScene"))
# Set gravity vector
scene.CreateGravityDirectionAttr().Set(Gf.Vec3f(0.0, 0.0, -1.0))
scene.CreateGravityMagnitudeAttr().Set(981.0)
# Set physics scene to use cpu physics
PhysxSchema.PhysxSceneAPI.Apply(stage.GetPrimAtPath("/World/physicsScene"))
physxSceneAPI = PhysxSchema.PhysxSceneAPI.Get(stage, "/World/physicsScene")
physxSceneAPI.CreateEnableCCDAttr(True)
physxSceneAPI.CreateEnableStabilizationAttr(True)
physxSceneAPI.CreateEnableGPUDynamicsAttr(False)
physxSceneAPI.CreateBroadphaseTypeAttr("MBP")
physxSceneAPI.CreateSolverTypeAttr("TGS")

# Create a ground plane
PhysicsSchemaTools.addGroundPlane(stage, "/World/groundPlane", "Z", 1000, Gf.Vec3f(0, 0, -100), Gf.Vec3f(1.0))

Next, we create 10 different objects of varying shapes and apply semantic tags to them representing their shapes.

# Create 10 randomly positioned and coloured cubes, spheres and cylinders
# We will assign each a semantic label based on its shape (cube/sphere/cylinder)
prims = []
for i in range(10):
    prim_type = random.choice(["Cube", "Sphere", "Cylinder"])
    prim = stage.DefinePrim(f"/World/cube{i}", prim_type)
    translation = np.random.rand(3) * TRANSLATION_RANGE
    UsdGeom.XformCommonAPI(prim).SetTranslate(translation.tolist())
    UsdGeom.XformCommonAPI(prim).SetScale((SCALE, SCALE, SCALE))
    # prim.GetAttribute("primvars:displayColor").Set([np.random.rand(3).tolist()])

    # Add semantic label based on prim type
    sem = Semantics.SemanticsAPI.Apply(prim, "Semantics")
    sem.CreateSemanticTypeAttr()
    sem.CreateSemanticDataAttr()
    sem.GetSemanticTypeAttr().Set("class")
    sem.GetSemanticDataAttr().Set(prim_type)

    # Add physics to prims
    utils.setRigidBody(prim, "convexHull", False)
    # Set Mass to 1 kg
    mass_api = UsdPhysics.MassAPI.Apply(prim)
    mass_api.CreateMassAttr(1)
    # add prim reference to list
    prims.append(prim)

Next, each object gets a glass material with a random color. The OmniGlass material has many parameters that can be changed; only those related to glass rendering are demonstrated here.

# Apply glass material
for prim in prims:
    # Create Glass material
    mtl_created_list = []
    kit.execute(
        "CreateAndBindMdlMaterialFromLibrary",
        mdl_name="OmniGlass.mdl",
        mtl_name="OmniGlass",
        mtl_created_list=mtl_created_list,
    )
    mtl_prim = stage.GetPrimAtPath(mtl_created_list[0])

    # Set material inputs, these can be determined by looking at the .mdl file
    # or by selecting the Shader attached to the Material in the stage window and looking at the details panel
    color = Gf.Vec3f(random.random(), random.random(), random.random())
    omni.usd.create_material_input(mtl_prim, "glass_color", color, Sdf.ValueTypeNames.Color3f)
    omni.usd.create_material_input(mtl_prim, "glass_ior", 1.25, Sdf.ValueTypeNames.Float)
    # This value is the volumetric light absorption scale, reduce to zero to make glass clearer
    omni.usd.create_material_input(mtl_prim, "depth", 0.001, Sdf.ValueTypeNames.Float)
    # Enable for thin glass objects if needed
    omni.usd.create_material_input(mtl_prim, "thin_walled", False, Sdf.ValueTypeNames.Bool)
    # Bind the material to the prim
    prim_mat_shade = UsdShade.Material(mtl_prim)
    UsdShade.MaterialBindingAPI(prim).Bind(prim_mat_shade, UsdShade.Tokens.strongerThanDescendants)

Next, we simulate physics while allowing material loading to complete. During simulation, the rendering mode is changed to RayTracedLighting for better performance. Once simulation finishes, PathTracing is used to capture the final image.

# force RayTracedLighting mode for better performance while simulating physics
kit.set_setting("/rtx/rendermode", "RayTracedLighting")

# start simulation
kit.play()
# Step simulation so that objects fall to rest
# wait until all materials are loaded
frame = 0
print("simulating physics...")
while frame < 60 or kit.is_loading():
    kit.update(1 / 60.0)
    frame = frame + 1
print("done")

# Return to user specified render mode
kit.set_setting("/rtx/rendermode", CUSTOM_CONFIG["renderer"])

As in the previous sample, the SyntheticDataHelper is used to get the groundtruth data before visualizing it locally. Details about visualization are covered in the Visualization section above.

viewport = omni.kit.viewport.get_default_viewport_window()
gt = sd_helper.get_groundtruth(
    [
        "rgb",
        "depth",
        "boundingBox2DTight",
        "boundingBox2DLoose",
        "instanceSegmentation",
        "semanticSegmentation",
        "boundingBox3D",
    ],
    viewport,
)

ShapeNet to USD

For this example, we’ll use the ShapeNet dataset and convert it to USD. We assume that you already have an account and have all or a subset of the dataset available locally.

First, we set a variable to tell the script where to find the ShapeNet dataset locally:

export SHAPENET_LOCAL_DIR=<path/to/shapenet>

We will convert only the geometry to allow for quick loading of assets into our scene. With the SHAPENET_LOCAL_DIR variable set, run the following script. Note that this will create a new directory at {SHAPENET_LOCAL_DIR}_nomat where the geometry-only USD files will be stored.

./python.sh python_samples/syntheticdata/advanced/shapenet_usd_convertor.py --categories plane watercraft rocket --max-models 100

Here we’ve told the script to convert the plane, watercraft, and rocket categories, with a maximum of 100 models per category.
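
To quickly check the result, a converted file can be opened directly with the USD Python API. The snippet below is a minimal sketch; the path is a placeholder and the exact folder layout under {SHAPENET_LOCAL_DIR}_nomat depends on the converter output:

from pxr import Usd

# Placeholder path: point this at one of the converted geometry-only USD files
stage = Usd.Stage.Open("<path/to/shapenet>_nomat/plane/<model>/<model>.usd")
for prim in stage.Traverse():
    print(prim.GetPath(), prim.GetTypeName())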

Offline Dataset Generation

This example demonstrates how to generate a synthetic dataset offline that can be used for training deep neural networks.

Single Camera

To generate a synthetic dataset offline using a single camera, run the following command.

./python.sh python_samples/syntheticdata/offline_generation/generator.py

The above command accepts three arguments; any that are not specified fall back to their default values.

  • scenario: The USD stage to load from the Omniverse server for dataset generation.

  • num_frames: Number of frames to record.

  • max_queue_size: Maximum size of the queue used to store and process synthetic data. If this value is less than or equal to zero, the queue size is infinite.

With all arguments specified, the command looks as follows.

./python.sh python_samples/syntheticdata/offline_generation/generator.py --scenario omniverse://<server-name>/Isaac/Samples/Synthetic_Data/Stage/warehouse_with_sensors.usd --num_frames 10 --max_queue_size 500

The offline dataset generated can be found locally as shown below.

Offline Synthetic Data
Code deep dive

There are three parts to this sample: loading the environment, getting sensor output, and finally writing the output to disk using a data writer.

The following shows how to load the environment by opening a USD stage.

async def load_stage(self, path):
    await omni.usd.get_context().open_stage_async(path)

def _setup_world(self, scenario_path):
    # Load scenario
    setup_task = asyncio.ensure_future(self.load_stage(scenario_path))
    while not setup_task.done():
        self.kit.update()

We render frame by frame, get the corresponding synthetic sensor output, and write it to disk within the __next__ function.

# step once and then wait for materials to load
self.dr_helper.randomize_once()
self.kit.update()
while self.kit.is_loading():
    self.kit.update()

If the loaded USD stage has DR components, the randomize_once() call randomizes the scene once. For example, Isaac/Samples/Synthetic_Data/Stage/warehouse_with_sensors.usd contains a DR movement component that is attached to a camera, so the camera’s location changes each time randomize_once() is called. The update loop that follows ensures all materials are loaded before we collect synthetic sensor output.

Next, we set up the synthetic sensors and their output formats for offline data generation.

# Enable/disable sensor output and their format
self._enable_rgb = True
self._enable_depth = True
self._enable_instance = True
self._enable_semantic = True
self._enable_bbox_2d_tight = True
self._enable_bbox_2d_loose = True
self._enable_depth_colorize = True
self._enable_instance_colorize = True
self._enable_semantic_colorize = True
self._enable_bbox_2d_tight_colorize = True
self._enable_bbox_2d_loose_colorize = True
self._enable_depth_npy = True
self._enable_instance_npy = True
self._enable_semantic_npy = True
self._enable_bbox_2d_tight_npy = True
self._enable_bbox_2d_loose_npy = True
self._num_worker_threads = 4
self._output_folder = os.getcwd() + "/output"

The _enable_* flags at the top specify the synthetic sensors whose output we want to save. To save a sensor’s output as a colorized image (.png), we set its *_colorize flag; similarly, to save the output as a NumPy array (.npy), we set its *_npy flag. Finally, _output_folder specifies the location where the data gets saved locally.

Finally, we set up the data writer, get the synthetic sensor output, and write it to disk.

# Write to disk
if self.data_writer is None:
    self.data_writer = self.writer_helper(self._output_folder, self._num_worker_threads, self.max_queue_size)
    self.data_writer.start_threads()

viewport_iface = omni.kit.viewport.get_viewport_interface()
viewport_name = "Viewport"
viewport = viewport_iface.get_viewport_window(viewport_iface.get_instance(viewport_name))
groundtruth = {
    "METADATA": {
        "image_id": str(self.cur_idx),
        "viewport_name": viewport_name,
        "DEPTH": {},
        "INSTANCE": {},
        "SEMANTIC": {},
        "BBOX2DTIGHT": {},
        "BBOX2DLOOSE": {},
    },
    "DATA": {},
}

gt_list = []
if self._enable_rgb:
    gt_list.append("rgb")
if self._enable_depth:
    gt_list.append("depthLinear")
if self._enable_bbox_2d_tight:
    gt_list.append("boundingBox2DTight")
if self._enable_bbox_2d_loose:
    gt_list.append("boundingBox2DLoose")
if self._enable_instance:
    gt_list.append("instanceSegmentation")
if self._enable_semantic:
    gt_list.append("semanticSegmentation")

# Collect Groundtruth
gt = self.sd_helper.get_groundtruth(gt_list, viewport)

# RGB
image = gt["rgb"]
if self._enable_rgb:
    groundtruth["DATA"]["RGB"] = gt["rgb"]

# Depth
if self._enable_depth:
    groundtruth["DATA"]["DEPTH"] = gt["depthLinear"].squeeze()
    groundtruth["METADATA"]["DEPTH"]["COLORIZE"] = self._enable_depth_colorize
    groundtruth["METADATA"]["DEPTH"]["NPY"] = self._enable_depth_npy

# Instance Segmentation
if self._enable_instance:
    instance_data = gt["instanceSegmentation"][0]
    instance_data_shape = instance_data.shape
    groundtruth["DATA"]["INSTANCE"] = instance_data
    groundtruth["METADATA"]["INSTANCE"]["WIDTH"] = instance_data_shape[1]
    groundtruth["METADATA"]["INSTANCE"]["HEIGHT"] = instance_data_shape[0]
    groundtruth["METADATA"]["INSTANCE"]["COLORIZE"] = self._enable_instance_colorize
    groundtruth["METADATA"]["INSTANCE"]["NPY"] = self._enable_instance_npy

# Semantic Segmentation
if self._enable_semantic:
    semantic_data = gt["semanticSegmentation"]
    semantic_data_shape = semantic_data.shape
    groundtruth["DATA"]["SEMANTIC"] = semantic_data
    groundtruth["METADATA"]["SEMANTIC"]["WIDTH"] = semantic_data_shape[1]
    groundtruth["METADATA"]["SEMANTIC"]["HEIGHT"] = semantic_data_shape[0]
    groundtruth["METADATA"]["SEMANTIC"]["COLORIZE"] = self._enable_semantic_colorize
    groundtruth["METADATA"]["SEMANTIC"]["NPY"] = self._enable_semantic_npy

# 2D Tight BBox
if self._enable_bbox_2d_tight:
    groundtruth["DATA"]["BBOX2DTIGHT"] = gt["boundingBox2DTight"]
    groundtruth["METADATA"]["BBOX2DTIGHT"]["COLORIZE"] = self._enable_bbox_2d_tight_colorize
    groundtruth["METADATA"]["BBOX2DTIGHT"]["NPY"] = self._enable_bbox_2d_tight_npy

# 2D Loose BBox
if self._enable_bbox_2d_loose:
    groundtruth["DATA"]["BBOX2DLOOSE"] = gt["boundingBox2DLoose"]
    groundtruth["METADATA"]["BBOX2DLOOSE"]["COLORIZE"] = self._enable_bbox_2d_loose_colorize
    groundtruth["METADATA"]["BBOX2DLOOSE"]["NPY"] = self._enable_bbox_2d_loose_npy

self.data_writer.q.put(groundtruth)

The data writer is initialized first (if it does not already exist) and its worker threads are started. For each enabled sensor, we then collect its output, fill in the groundtruth dictionary with additional metadata, and finally push the dictionary onto the data writer’s queue.

Multiple Cameras

To generate a synthetic dataset offline using multiple cameras, run the following command.

./python.sh python_samples/syntheticdata/offline_generation/generator_stereo.py

This sample extends the single-camera offline dataset generation sample above to support multiple cameras. It accepts the same three command line arguments as the previous sample. The generated offline dataset can be found locally in unique folders corresponding to each camera.

Note

Depending on when the sensors start generating synthetic data after initialization, certain frames may be missing from the output folders.

Code deep dive

Similar to the previous example, there are three parts to this sample: loading the environment with a stereo camera setup, getting sensor output, and writing the output to disk using a data writer. Below, we elaborate on the code that is unique to this sample.

First, we load the environment by opening a USD stage. Then we add the stereo camera setup in the add_stereo_setup() function.

async def load_stage(self, path):
    await omni.usd.get_context().open_stage_async(path)

def _setup_world(self, scenario_path):
    # Load scenario
    setup_task = asyncio.ensure_future(self.load_stage(scenario_path))
    while not setup_task.done():
        self.kit.update()
    self.add_stereo_setup()
    self.kit.update()

In the add_stereo_setup() function, we first create a stereo camera rig: a USD Xform prim at stereoPrimPath containing two cameras, LeftCamera and RightCamera, that differ only slightly in their translation. Next, we create two viewport windows with a resolution of 1280x720, each assigned to one of the cameras. Finally, we use a movement randomization component to give the stereo camera rig a circular movement behavior.

def add_stereo_setup(self):
    from pxr import Gf, UsdGeom

    stage = omni.usd.get_context().get_stage()
    # Create two camera
    center_point = Gf.Vec3d(-1100, 5000, 200)
    stereoPrimPath = "/World/Stereo"
    leftCameraPrimPath = stereoPrimPath + "/LeftCamera"
    rightCameraPrimPath = stereoPrimPath + "/RightCamera"
    self.stereoPrim = stage.DefinePrim(stereoPrimPath, "Xform")
    UsdGeom.XformCommonAPI(self.stereoPrim).SetTranslate(center_point)
    leftCameraPrim = stage.DefinePrim(leftCameraPrimPath, "Camera")
    UsdGeom.XformCommonAPI(leftCameraPrim).SetTranslate(Gf.Vec3d(0, -10, 0))
    UsdGeom.XformCommonAPI(leftCameraPrim).SetRotate(Gf.Vec3f(90, 0, 90))
    rightCameraPrim = stage.DefinePrim(rightCameraPrimPath, "Camera")
    UsdGeom.XformCommonAPI(rightCameraPrim).SetTranslate(Gf.Vec3d(0, 10, 0))
    UsdGeom.XformCommonAPI(rightCameraPrim).SetRotate(Gf.Vec3f(90, 0, 90))

    # Need to set this before setting viewport window size
    carb.settings.acquire_settings_interface().set_int("/app/renderer/resolution/width", -1)
    carb.settings.acquire_settings_interface().set_int("/app/renderer/resolution/height", -1)
    # Get existing viewport, set active camera as left camera
    viewport_handle_1 = omni.kit.viewport.get_viewport_interface().get_instance("Viewport")
    viewport_window_1 = omni.kit.viewport.get_viewport_interface().get_viewport_window(viewport_handle_1)
    viewport_window_1.set_texture_resolution(1280, 720)
    viewport_window_1.set_active_camera(leftCameraPrimPath)
    # Create new viewport, set active camera as right camera
    viewport_handle_2 = omni.kit.viewport.get_viewport_interface().create_instance()
    viewport_window_2 = omni.kit.viewport.get_viewport_interface().get_viewport_window(viewport_handle_2)
    viewport_window_2.set_active_camera("/World/Stereo/RightCamera")
    viewport_window_2.set_texture_resolution(1280, 720)
    viewport_window_2.set_window_pos(720, 0)
    viewport_window_2.set_window_size(720, 890)

    # Setup stereo camera movement randomization
    radius = 100
    target_points_list = []
    for theta in range(200, 300):
        th = theta * np.pi / 180
        x = radius * np.cos(th) + center_point[0]
        y = radius * np.sin(th) + center_point[1]
        target_points_list.append(Gf.Vec3f(x, y, center_point[2]))
    lookat_target_points_list = [a for a in target_points_list[1:]]
    lookat_target_points_list.append(target_points_list[0])
    result, prim = omni.kit.commands.execute(
        "CreateTransformComponentCommand",
        prim_paths=[stereoPrimPath],
        target_points=target_points_list,
        lookat_target_points=lookat_target_points_list,
        enable_sequential_behavior=True,
    )

Next, we set up the synthetic sensors and their output formats for offline data generation. This step differs slightly from the previous sample because we are now dealing with multiple cameras.

# Enable/disable sensor output and their format
sensor_settings_viewport_1 = {
    "rgb": {"enabled": True},
    "depth": {"enabled": True, "colorize": True, "npy": True},
    "instance": {"enabled": True, "colorize": True, "npy": True},
    "semantic": {"enabled": True, "colorize": True, "npy": True},
    "bbox_2d_tight": {"enabled": True, "colorize": True, "npy": True},
    "bbox_2d_loose": {"enabled": True, "colorize": True, "npy": True},
}
sensor_settings_viewport_2 = {
    "rgb": {"enabled": True},
    "depth": {"enabled": True, "colorize": True, "npy": True},
    "instance": {"enabled": True, "colorize": True, "npy": True},
    "semantic": {"enabled": True, "colorize": True, "npy": True},
    "bbox_2d_tight": {"enabled": True, "colorize": True, "npy": True},
    "bbox_2d_loose": {"enabled": True, "colorize": True, "npy": True},
}
viewports = self._viewport.get_instance_list()
self._viewport_names = [self._viewport.get_viewport_window_name(vp) for vp in viewports]
# Make sure two viewports are initialized
if len(self._viewport_names) != 2:
    return
self._sensor_settings[self._viewport_names[0]] = copy.deepcopy(sensor_settings_viewport_1)
self._sensor_settings[self._viewport_names[1]] = copy.deepcopy(sensor_settings_viewport_2)
self._num_worker_threads = 4
self._output_folder = os.getcwd() + "/output"

Finally, we set up the data writer, get the synthetic sensor output, and write it to disk.

# Write to disk
if self.data_writer is None:
    self.data_writer = self.writer_helper(self._output_folder, self._num_worker_threads, self.max_queue_size)
    self.data_writer.start_threads()

image = None
for viewport_name in self._viewport_names:
    groundtruth = {
        "METADATA": {
            "image_id": str(self.cur_idx),
            "viewport_name": viewport_name,
            "DEPTH": {},
            "INSTANCE": {},
            "SEMANTIC": {},
            "BBOX2DTIGHT": {},
            "BBOX2DLOOSE": {},
        },
        "DATA": {},
    }

    gt_list = []
    if self._sensor_settings[viewport_name]["rgb"]["enabled"]:
        gt_list.append("rgb")
    if self._sensor_settings[viewport_name]["depth"]["enabled"]:
        gt_list.append("depthLinear")
    if self._sensor_settings[viewport_name]["bbox_2d_tight"]["enabled"]:
        gt_list.append("boundingBox2DTight")
    if self._sensor_settings[viewport_name]["bbox_2d_loose"]["enabled"]:
        gt_list.append("boundingBox2DLoose")
    if self._sensor_settings[viewport_name]["instance"]["enabled"]:
        gt_list.append("instanceSegmentation")
    if self._sensor_settings[viewport_name]["semantic"]["enabled"]:
        gt_list.append("semanticSegmentation")

    # Collect Groundtruth
    viewport = self._viewport.get_viewport_window(self._viewport.get_instance(viewport_name))
    gt = self.sd_helper.get_groundtruth(gt_list, viewport)

    # RGB
    image = gt["rgb"]
    if self._sensor_settings[viewport_name]["rgb"]["enabled"] and gt["state"]["rgb"]:
        groundtruth["DATA"]["RGB"] = gt["rgb"]

    # Depth
    if self._sensor_settings[viewport_name]["depth"]["enabled"] and gt["state"]["depthLinear"]:
        groundtruth["DATA"]["DEPTH"] = gt["depthLinear"].squeeze()
        groundtruth["METADATA"]["DEPTH"]["COLORIZE"] = self._sensor_settings[viewport_name]["depth"]["colorize"]
        groundtruth["METADATA"]["DEPTH"]["NPY"] = self._sensor_settings[viewport_name]["depth"]["npy"]

    # Instance Segmentation
    if self._sensor_settings[viewport_name]["instance"]["enabled"] and gt["state"]["instanceSegmentation"]:
        instance_data = gt["instanceSegmentation"][0]
        groundtruth["DATA"]["INSTANCE"] = instance_data
        groundtruth["METADATA"]["INSTANCE"]["WIDTH"] = instance_data.shape[1]
        groundtruth["METADATA"]["INSTANCE"]["HEIGHT"] = instance_data.shape[0]
        groundtruth["METADATA"]["INSTANCE"]["COLORIZE"] = self._sensor_settings[viewport_name]["instance"][
            "colorize"
        ]
        groundtruth["METADATA"]["INSTANCE"]["NPY"] = self._sensor_settings[viewport_name]["instance"]["npy"]

    # Semantic Segmentation
    if self._sensor_settings[viewport_name]["semantic"]["enabled"] and gt["state"]["semanticSegmentation"]:
        semantic_data = gt["semanticSegmentation"]
        semantic_data[semantic_data == 65535] = 0  # deals with invalid semantic id
        groundtruth["DATA"]["SEMANTIC"] = semantic_data
        groundtruth["METADATA"]["SEMANTIC"]["WIDTH"] = semantic_data.shape[1]
        groundtruth["METADATA"]["SEMANTIC"]["HEIGHT"] = semantic_data.shape[0]
        groundtruth["METADATA"]["SEMANTIC"]["COLORIZE"] = self._sensor_settings[viewport_name]["semantic"][
            "colorize"
        ]
        groundtruth["METADATA"]["SEMANTIC"]["NPY"] = self._sensor_settings[viewport_name]["semantic"]["npy"]

    # 2D Tight BBox
    if self._sensor_settings[viewport_name]["bbox_2d_tight"]["enabled"] and gt["state"]["boundingBox2DTight"]:
        groundtruth["DATA"]["BBOX2DTIGHT"] = gt["boundingBox2DTight"]
        groundtruth["METADATA"]["BBOX2DTIGHT"]["COLORIZE"] = self._sensor_settings[viewport_name][
            "bbox_2d_tight"
        ]["colorize"]
        groundtruth["METADATA"]["BBOX2DTIGHT"]["NPY"] = self._sensor_settings[viewport_name]["bbox_2d_tight"][
            "npy"
        ]

    # 2D Loose BBox
    if self._sensor_settings[viewport_name]["bbox_2d_loose"]["enabled"] and gt["state"]["boundingBox2DLoose"]:
        groundtruth["DATA"]["BBOX2DLOOSE"] = gt["boundingBox2DLoose"]
        groundtruth["METADATA"]["BBOX2DLOOSE"]["COLORIZE"] = self._sensor_settings[viewport_name][
            "bbox_2d_loose"
        ]["colorize"]
        groundtruth["METADATA"]["BBOX2DLOOSE"]["NPY"] = self._sensor_settings[viewport_name]["bbox_2d_loose"][
            "npy"
        ]

    self.data_writer.q.put(groundtruth)

The data writer is initialized first (if it does not already exist) and its worker threads are started. Then, for each viewport, we collect the output of every enabled sensor, fill in the groundtruth dictionary with additional metadata, and push the dictionary onto the data writer’s queue.

Offline Training with TLT

This example demonstrates how to generate a synthetic dataset offline in Kitti format that can be used for training deep neural networks with NVIDIA’s Transfer Learning Toolkit (TLT). TLT is a Python-based AI toolkit for taking purpose-built pretrained AI models and customizing them with your own data.

Offline Kitti Dataset Generation

To leverage TLT, we need a dataset in the Kitti format. We extend the offline data generator and introduce a writer_mode flag that stores the data in the proper format. The command below produces 500 frames of synthetic data containing bounding boxes from the default scene, /Isaac/Samples/Synthetic_Data/Stage/warehouse_with_sensors.usd, for object detection in Kitti format with the classes ceiling and floor.

./python.sh python_samples/syntheticdata/offline_generation/generator.py --writer_mode kitti --classes ceiling floor --num_frames 500
Code deep dive

There are three parts to this sample: loading the environment, getting sensor output, and finally writing the output to disk using a data writer. The first two parts are similar to those described in the code deep dive section of the offline data generator. The part that differs is the writer, which reads the synthetic data and writes it in Kitti format. You will find the writer here: exts/omni.isaac.synthetic_utils/omni/isaac/synthetic_utils/scripts/kitti_writer.py

The following shows how the generated synthetic data is processed to meet the definition of the Kitti data format for 2D object detection. Below is the section of the code that would need to be modified to produce a different format.

def save_label(self, data):
    """Saves the labels for the 2d bounding boxes in Kitti format."""
    label_set = []
    viewport_width = data["METADATA"]["BBOX2DLOOSE"]["WIDTH"]
    viewport_height = data["METADATA"]["BBOX2DLOOSE"]["HEIGHT"]

    for box in data["DATA"]["BBOX2DLOOSE"]:
        label = []

        # 2D bounding box points
        x_min, y_min, x_max, y_max = int(box[6]), int(box[7]), int(box[8]), int(box[9])

        # Check if bounding boxes are in the viewport
        if (
            x_min < 0
            or y_min < 0
            or x_max > viewport_width
            or y_max > viewport_height
            or x_min > viewport_width
            or y_min > viewport_height
            or y_max < 0
            or x_max < 0
        ):
            continue

        semantic_label = str(box[2])

        # Skip label if not in class list
        if self.classes != [] and semantic_label not in self.classes:
            continue

        # Adding Kitti data. NOTE: Only class and 2D bbox coordinates are filled in
        label.append(semantic_label)
        label.append(f"{0.00:.2f}")
        label.append(3)
        label.append(f"{0.00:.2f}")
        label.append(x_min)
        label.append(y_min)
        label.append(x_max)
        label.append(y_max)
        for _ in range(7):
            label.append(f"{0.00:.2f}")

        label_set.append(label)

    with open(os.path.join(self.train_label_dir, f"{data['METADATA']['image_id']}.txt"), "w") as annotation_file:
        writer = csv.writer(annotation_file, delimiter=" ")
        writer.writerows(label_set)
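
For reference, each row written by save_label above contains the 15 space-separated fields of a Kitti 2D label: class, truncation, occlusion, alpha, the four 2D bbox coordinates, and seven unused 3D fields. A resulting label line looks like the following (pixel coordinates are made up for illustration):

ceiling 0.00 3 0.00 118 42 486 207 0.00 0.00 0.00 0.00 0.00 0.00 0.00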

Using the Transfer Learning Toolkit with Synthetic Data

Once the generated synthetic data is in Kitti format, we can proceed to use the Transfer Learning Toolkit to train a model. The Transfer Learning Toolkit provides segmentation, classification, and object detection models. For this example, we will look at an object detection use case, using the Detectnet V2 model.

To get started with TLT, follow the setup instructions. Then activate the virtual environment and download the Jupyter notebooks as explained in detail here.

TLT uses Jupyter notebooks to guide you through the training process. In the folder tlt_cv_samples_v1.0.2, you will find notebooks for multiple models. For this use case, refer to any of the object detection networks; we will use Detectnet_V2.

In the detectnet_v2 folder, you will find the Jupyter notebook and the specs folder. The documentation mentioned here goes into detail about this sample. TLT works with configuration files that can be found in the specs folder; you need to modify them to point to the generated synthetic data as the input.

To prepare the data, run the command below.

tlt detectnet_v2 dataset-convert [-h] -d DATASET_EXPORT_SPEC -o OUTPUT_FILENAME [-f VALIDATION_FOLD]

This command is included in the Jupyter notebook with a sample configuration. Modify the spec file to match the folder structure of your synthetic data.
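
As an illustration only, the dataset export spec typically contains a kitti_config block along the lines of the sketch below; the field names and paths here are assumptions based on the TLT sample specs, so check them against the files shipped in the specs folder:

kitti_config {
  root_directory_path: "/path/to/synthetic/kitti/data"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".png"
  partition_mode: "random"
  num_partitions: 2
  val_split: 10
  num_shards: 10
}
image_directory_path: "/path/to/synthetic/kitti/data"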

After that, the data will be in TFRecord format and ready for training. Again, the training spec file has to be updated with the path to the converted synthetic data and the classes being detected.

tlt detectnet_v2 train [-h] -k <key>
                        -r <result directory>
                        -e <spec_file>
                        [-n <name_string_for_the_model>]
                        [--gpus <num GPUs>]
                        [--gpu_index <comma separate gpu indices>]
                        [--use_amp]
                        [--log_file <log_file>]

For any questions regarding the Transfer Learning Toolkit, we encourage you to watch the webinar and check out the TLT documentation, which goes into further detail.

Online Training

In this example, we extend the basic sample above by integrating scene generation and groundtruth collection into a PyTorch dataset that we will use to train a Mask R-CNN instance segmentation model.

Dataset

We will cover the most interesting parts of this example but trim some of it down for length. The full example can be found under syntheticdata/online_generation/segmentation/dataset.py.

First, we’ll create a dataset that generates data during training. We will use PyTorch’s torch.utils.data.IterableDataset class, as our dataset simply generates an endless stream of random scenes.

The basic structure we’ll follow is like this:

class MyAwesomeDataset(torch.utils.data.IterableDataset):
    def __init__(self):
        initialize_omnikit()
        setup_scene()

    def __next__(self):
        populate_scene()
        gt = collect_groundtruth()
        return gt

Now that we have a skeleton of what we want to do, let’s put our dataset together. We first launch a Kit instance using OmniKitHelper and pass it our rendering configuration. We then set up the SyntheticDataHelper we used in the earlier examples. The self._find_usd_assets() method searches the category directories we’ve specified under the root directory for USD files and returns their paths. When we want to add a new asset to our scene, we simply pick a path at random and attach it as a reference to a new prim. We use split to select a subset of training samples so that we can keep a hold-out set for validation. Finally, self._setup_world() creates a room, lights, and a camera.

RENDER_CONFIG = {
    "width": 600,
    "height": 600,
    "renderer": "PathTracing",
    "samples_per_pixel_per_frame": 12,
    "experience": f'{os.environ["EXP_PATH"]}/omni.isaac.sim.python.kit',
}


class RandomObjects(torch.utils.data.IterableDataset):
    """Dataset of random ShapeNet objects. """

    def __init__(
        self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
    ):
        self.kit = OmniKitHelper(config=RENDER_CONFIG)
        from omni.isaac.synthetic_utils import SyntheticDataHelper
        from omni.isaac.synthetic_utils import shapenet

        self.sd_helper = SyntheticDataHelper()
        self.stage = self.kit.get_stage()

        # If ShapeNet categories are specified with their names, convert to synset ID
        # Remove this if using with a different dataset than ShapeNet
        category_ids = [shapenet.LABEL_TO_SYNSET.get(c, c) for c in categories]
        self.categories = category_ids
        self.range_num_assets = (num_assets_min, max(num_assets_min, num_assets_max))
        self.references = self._find_usd_assets(root, category_ids, max_asset_size, split, train)

        self._setup_world()
        self.cur_idx = 0

    def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
        ... # (see code for implementation details)

    def _setup_world(self):
        ... # (see code for implementation details)

Now, we want to load assets and place them in our scene so that they rest on the ground plane in random poses. We will create a load_single_asset method that does just that. Note that to ensure the asset is resting on the ground, we simply get its bounds with ComputeWorldBound() and translate it by the negative of its y (up-axis) component.

def load_single_asset(self, ref, semantic_label, suffix=""):
    """Load a USD asset with random pose.
    args
        ref (str): Path to the USD that this prim will reference.
        semantic_label (str): Semantic label.
        suffix (str): String to add to the end of the prim's path.
    """
    from pxr import UsdGeom

    x = random.uniform(*RANDOM_TRANSLATION_X)
    z = random.uniform(*RANDOM_TRANSLATION_Z)
    rot_y = random.uniform(*RANDOM_ROTATION_Y)
    asset = self.kit.create_prim(
        f"/World/Asset/mesh{suffix}",
        "Xform",
        scale=(SCALE, SCALE, SCALE),
        rotation=(0.0, rot_y, 0.0),
        ref=ref,
        semantic_label=semantic_label,
    )
    bound = UsdGeom.Mesh(asset).ComputeWorldBound(0.0, "default")
    box_min_y = bound.GetBox().GetMin()[1]
    UsdGeom.XformCommonAPI(asset).SetTranslate((x, -box_min_y, z))
    return asset

Now that we can generate a single asset, we can populate the whole scene by simply looping through the number of assets we want to generate.

def populate_scene(self):
    """Clear the scene and populate it with assets."""
    self.stage.RemovePrim("/World/Asset")
    self.assets = []
    num_assets = random.randint(*self.range_num_assets)
    for i in range(num_assets):
        category = random.choice(list(self.references.keys()))
        ref = random.choice(self.references[category])
        self.assets.append(self.load_single_asset(ref, category, i))

We then want to randomize our assets’ material properties. Here we pick random values within reasonable ranges for diffuse, roughness and metallic channels.

def randomize_asset_material(self):
    """Randomize asset material properties."""
    for asset in self.assets:
        colour = (random.random(), random.random(), random.random())

        # Here we choose not to have materials unrealistically rough or reflective.
        roughness = random.uniform(0.1, 0.9)

        # Here we choose to have more metallic than non-metallic objects.
        metallic = random.choices([0.0, 1.0], weights=(0.8, 0.2))[0]
        self._add_preview_surface(asset, colour, roughness, metallic)

def _add_preview_surface(self, prim, diffuse, roughness, metallic):
    ... # (see code for implementation details)

Finally, we also want to vary our viewpoint. We want to keep our camera pointing to the centre of the stage but vary its azimuth and elevation angles. An easy trick to do this is to make the camera a child of an Xform prim which we’ll call camera_rig. Now to vary the distance from the camera to the centre of the stage, we translate the camera with respect to the rig, and to change the azimuth and elevation angles, we rotate the rig. We’ve set the camera as a child of a camera_rig Xform in our _setup_world() method, so our randomize_camera() method below simply clears any previous transform and sets new angles on the Y and X axes.

def randomize_camera(self):
    """Randomize the camera position."""
    # By simply rotating a camera "rig" instead of repositioning the camera
    # itself, we greatly simplify our job.

    # Clear previous transforms
    self.camera_rig.ClearXformOpOrder()
    # Change azimuth angle
    self.camera_rig.AddRotateYOp().Set(random.random() * 360)
    # Change elevation angle
    self.camera_rig.AddRotateXOp().Set(random.random() * -90)

And that’s it for scene generation! Now we simply fill in the __next__ method. The first three lines generate the scene using the methods we just described. The next step is to collect the groundtruth. The code that follows consists of preparing the data for our model to consume and will be in large part specific to the model you are using and your application.

def __iter__(self):
    return self

def __next__(self):
    # Generate a new scene
    self.populate_scene()
    self.randomize_camera()
    self.randomize_asset_material()
    # step once and then wait for materials to load
    self.kit.update()
    print("waiting for materials to load...")
    while self.kit.is_loading():
        self.kit.update()
    print("done")
    self.kit.update()
    # Collect Groundtruth
    gt = self.sd_helper.get_groundtruth(["rgb", "boundingBox2DTight", "instanceSegmentation"], self.viewport)

    # RGB
    # Drop alpha channel
    image = gt["rgb"][..., :3]
    # Cast to tensor if numpy array
    if isinstance(gt["rgb"], np.ndarray):
        image = torch.tensor(image, dtype=torch.float, device="cuda")
    # Normalize between 0. and 1. and change order to channel-first.
    image = image.float() / 255.0
    image = image.permute(2, 0, 1)

    # Bounding Box
    gt_bbox = gt["boundingBox2DTight"]

    # Create mapping from categories to index
    mapping = {cat: i + 1 for i, cat in enumerate(self.categories)}
    bboxes = torch.tensor(gt_bbox[["x_min", "y_min", "x_max", "y_max"]].tolist())
    # For each bounding box, map semantic label to label index
    labels = torch.LongTensor([mapping[bb["semanticLabel"]] for bb in gt_bbox])

    # Calculate the area of each bounding box
    areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
    # Identify invalid bounding boxes to filter the final output
    valid_areas = (areas > 0.0) * (areas < (image.shape[1] * image.shape[2]))

    # Instance Segmentation
    instance_data, instance_mappings = gt["instanceSegmentation"][0], gt["instanceSegmentation"][1]
    instance_list = [im[0] for im in instance_mappings]
    masks = np.zeros((len(instance_list), *instance_data.shape), dtype=bool)  # np.bool is deprecated; use the builtin bool
    for i, instances in enumerate(instance_list):
        masks[i] = np.isin(instance_data, instances)
    if isinstance(masks, np.ndarray):
        masks = torch.tensor(masks, device="cuda")

    target = {
        "boxes": bboxes[valid_areas],
        "labels": labels[valid_areas],
        "masks": masks[valid_areas],
        "image_id": torch.LongTensor([self.cur_idx]),
        "area": areas[valid_areas],
        "iscrowd": torch.BoolTensor([False] * len(bboxes[valid_areas])),  # Assume no crowds
    }

    self.cur_idx += 1
    return image, target

And that’s it for our dataset! We can now generate an endless stream of randomized data to train with. Below, we show a visualization of the data our dataset produces with the plane, watercraft, and rocket categories selected. Open the locally saved image file dataset.png to see the visualization.

./python.sh python_samples/syntheticdata/online_generation/segmentation/dataset.py \
--root $SHAPENET_LOCAL_DIR'_nomat' \
--categories plane watercraft rocket \
--max-asset-size 50
Instance Segmentation Dataset

Train

Now that we have a dataset, we can start training! The training can be launched with:

./python.sh python_samples/syntheticdata/online_generation/segmentation/train.py \
--root $SHAPENET_LOCAL_DIR'_nomat' \
--categories plane watercraft rocket \
--visualize \
--max-asset-size 50

You should see the loss going down in your terminal and, after a hundred iterations or so, start to see instance segmentation and object detection results being visualized. The --max-asset-size 50 argument tells the dataset to skip assets over 50 MB in size. This helps avoid out-of-memory errors caused by loading larger assets; the value can be increased depending on the capacity of the GPU in use.

Let’s dive into the details. If you have worked with PyTorch before, the following code will appear absolutely unremarkable. We set up our device, dataset, dataloader, model, and optimizer.

device = "cuda"

# Setup data
train_set = RandomObjects(
    args.root, args.categories, num_assets_min=3, num_assets_max=5, max_asset_size=args.max_asset_size
)
train_loader = DataLoader(train_set, batch_size=2, collate_fn=lambda x: tuple(zip(*x)))

# Setup Model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, num_classes=1 + len(args.categories))
model = model.to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)

Finally, we have our training loop. After sending our data to the GPU, we do a forward pass through the model, calculate the loss, and do a backward pass to update the model’s weights.

for i, train_batch in enumerate(train_loader):
    if i > args.max_iters:
        break

    model.train()
    images, targets = train_batch
    images = [i.to(device) for i in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)
    loss = sum(loss for loss in loss_dict.values())

    print(f"ITER {i} | {loss:.6f}")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

That’s it! Open the locally saved image file train.png to see something like the image below during training.

Instance Segmentation Training