10.3. Offline Dataset Generation

Example of using Isaac Sim and Replicator to create synthetic datasets offline (on disk) for the training of machine learning models.

10.3.1. Learning Objectives

This tutorial illustrates the process of generating synthetic datasets using the omni.replicator extension. The resulting data will be stored offline (on disk), making it readily available for training deep neural networks. The examples can be executed within the Isaac Sim Python standalone environment.

After this tutorial you will be able to:

  • Utilize and set up external customizable config files (YAML/JSON) to adjust simulation and scenario parameters

  • Load custom environments

  • Spawn assets using the Isaac Sim API

  • Run randomized physics simulations

  • Register various Replicator randomization graphs

  • Create cameras and render products with the Replicator API

  • Use Replicator writers to save data to disk

10.3.1.1. Prerequisites

10.3.2. Scenario

The scenario is executed by default in a warehouse environment. Within this setting, a forklift is randomly placed in a designated area. Based on the forklift’s position, a pallet is placed in front of it at a randomized distance. Using Replicator’s scatter_2d randomization function with the collision check argument check_for_collisions set to True, the pallet is scattered with boxes, ensuring the boxes do not self-collide. The scatter graph node will randomly scatter the boxes each capture frame. Additionally, a traffic cone is randomly positioned at one of the bottom corners of the forklift’s oriented bounding box (OBB). Before the synthetic data generation (SDG) pipeline starts, a short physics simulation is executed, during which several boxes are dropped onto a pallet situated behind the forklift.

../_images/isaac_tutorial_replicator_offline_data.png

Three camera views are used for the synthetic data generation (SDG). The first (top_view_cam) offers a top-down view of the scenario (left), the second (pallet_cam) captures a randomized view of the boxes scattered on the pallet (center), and the third (driver_cam) overlooks the pallet from the driver’s seat in the forklift at varying heights (right). The data is collected using Replicator writers, with the BasicWriter being the default choice. The writer’s config parameters are loaded from the writer_config entry and used to initialize the writer with annotators such as rgb, semantic_segmentation, bounding_box_3d, etc. The data is stored in the output_dir folder (by default <working_dir>/_out_offline_generation).

10.3.3. Getting Started

The main script of the tutorial is located at <install_path>/standalone_examples/replicator/offline_generation/offline_generation.py and runs as a standalone application. Its helper functions are located in the offline_generation_utils.py file. The script can run with different configurations: the defaults are stored in the script itself as a Python dictionary, and a custom config file can be provided as a command-line argument using --config <path/to/file.json/yaml>, which will overwrite the default parameters. Example config files are stored in offline_generation/config/*; they serve as templates that showcase the configurability of the script. The script also works without any external config file, using its default parameters, which can themselves be adjusted to the user’s requirements.
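
The override behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the script's actual code: the function name apply_overrides and the shortened default dictionary are hypothetical, and it assumes that a top-level entry such as writer_config is replaced as a whole rather than merged key by key.

```python
import copy
import json

# Hypothetical, shortened defaults; the real script defines a larger dictionary
DEFAULT_CONFIG = {
    "resolution": [1024, 1024],
    "num_frames": 20,
    "writer": "BasicWriter",
    "writer_config": {"output_dir": "_out_offline_generation", "rgb": True, "occlusion": True},
}


def apply_overrides(defaults, overrides):
    """Return a copy of `defaults` in which every top-level key of `overrides` wins."""
    merged = copy.deepcopy(defaults)
    # Assumption: nested dictionaries are replaced wholesale, not merged recursively
    merged.update(copy.deepcopy(overrides))
    return merged


# Stand-in for the contents of a JSON file passed through --config
overrides = json.loads('{"resolution": [512, 512], "writer_config": {"output_dir": "_out_custom", "rgb": true}}')
config = apply_overrides(DEFAULT_CONFIG, overrides)
print(config["resolution"], config["num_frames"], config["writer_config"])
```

Keys that are missing from the custom file (here num_frames and writer) keep their default values, which is why the example config files can stay short.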

To generate a synthetic dataset offline, run the following command.

./python.sh standalone_examples/replicator/offline_generation/offline_generation.py

10.3.3.1. Config Scenarios

The following sections describe the various config scenarios.

Without an explicit config file, the script will use the default parameters stored in the script itself. The default parameters are the following:

Built-in (default) Config
config = {
    "launch_config": {
        "renderer": "RayTracedLighting",
        "headless": False,
    },
    "resolution": [1024, 1024],
    "rt_subframes": 2,
    "num_frames": 20,
    "env_url": "/Isaac/Environments/Simple_Warehouse/full_warehouse.usd",
    "scope_name": "/MyScope",
    "writer": "BasicWriter",
    "writer_config": {
        "output_dir": "_out_offline_generation",
        "rgb": True,
        "bounding_box_2d_tight": True,
        "semantic_segmentation": True,
        "distance_to_image_plane": True,
        "bounding_box_3d": True,
        "occlusion": True,
    },
    "clear_previous_semantics": True,
    "forklift": {
        "url": "/Isaac/Props/Forklift/forklift.usd",
        "class": "Forklift",
    },
    "cone": {
        "url": "/Isaac/Environments/Simple_Warehouse/Props/S_TrafficCone.usd",
        "class": "TrafficCone",
    },
    "pallet": {
        "url": "/Isaac/Environments/Simple_Warehouse/Props/SM_PaletteA_01.usd",
        "class": "Pallet",
    },
    "cardbox": {
        "url": "/Isaac/Environments/Simple_Warehouse/Props/SM_CardBoxD_04.usd",
        "class": "Cardbox",
    },
}

The following command will run the script with the default parameters.

./python.sh standalone_examples/replicator/offline_generation/offline_generation.py

The config_basic_writer.yaml config file explicitly chooses the BasicWriter with the given writer_config configurations. It also changes the environment to /Isaac/Environments/Grid/default_environment.usd.

Custom YAML Config
launch_config:
    renderer: RayTracedLighting
    headless: false
resolution: [512, 512]
env_url: "/Isaac/Environments/Grid/default_environment.usd"
rt_subframes: 32
writer: BasicWriter
writer_config:
    output_dir: _out_basicwriter
    rgb: true

The following command will run the script with the custom parameters.

./python.sh standalone_examples/replicator/offline_generation/offline_generation.py \
    --config standalone_examples/replicator/offline_generation/config/config_basic_writer.yaml

The config_default_writer.json file does not name a writer, so the default one (the BasicWriter) is used, and it sets the writer_config to use the rgb and instance_segmentation annotators.

Custom JSON Config
{
    "launch_config": {
        "renderer": "RayTracedLighting",
        "headless": false
    },
    "resolution": [512, 512],
    "writer_config": {
        "output_dir": "_out_defaultwriter",
        "rgb": true,
        "instance_segmentation": true
    }
}

The following command will run the script with the custom parameters.

./python.sh standalone_examples/replicator/offline_generation/offline_generation.py \
    --config standalone_examples/replicator/offline_generation/config/config_default_writer.json

The config_kitti_writer.yaml config file uses the KittiWriter with the given writer_config configurations.

Custom YAML Config using KittiWriter
launch_config:
    renderer: RayTracedLighting
    headless: true
resolution: [512, 512]
num_frames: 5
clear_previous_semantics: false
writer: KittiWriter
writer_config:
    output_dir: _out_kitti
    colorize_instance_segmentation: true

The following command will run the script with the custom parameters.

./python.sh standalone_examples/replicator/offline_generation/offline_generation.py \
    --config standalone_examples/replicator/offline_generation/config/config_kitti_writer.yaml

10.3.4. Running as a SimulationApp

The code for this tutorial is different from the default omni.replicator examples, which are usually executed using the script editor in the GUI. The provided script runs an instance of Isaac Sim as a standalone application. For this, the SimulationApp object needs to be created before importing any other dependencies (such as omni.replicator.core).

Isaac Sim as a Standalone Application
from omni.isaac.kit import SimulationApp

[..]

# Create the simulation app with the given launch_config
simulation_app = SimulationApp(launch_config=config["launch_config"])

[..]

import offline_generation_utils

# Late import of runtime modules (the SimulationApp needs to be created before loading the modules)
import omni.replicator.core as rep
import omni.usd
from omni.isaac.core.utils import prims
from omni.isaac.core.utils.nucleus import get_assets_root_path
from omni.isaac.core.utils.rotations import euler_angles_to_quat
from omni.isaac.core.utils.stage import get_current_stage, open_stage
from pxr import Gf

Note

Omniverse related imports need to be included after the SimulationApp is created.

10.3.5. Loading the Environment

The environment is a USD stage. As a first step, the stage is loaded using the helper function open_stage. To build the full Nucleus path of the environment URL, get_assets_root_path is used to get the path to the Nucleus server. The open_stage function returns a boolean value indicating whether the stage was successfully loaded. If the stage could not be loaded, the application is terminated.

Load the Environment
# Get server path
assets_root_path = get_assets_root_path()
if assets_root_path is None:
    carb.log_error("Could not get nucleus server path, closing application..")
    simulation_app.close()

# Open the given environment in a new stage
print(f"Loading Stage {config['env_url']}")
if not open_stage(assets_root_path + config["env_url"]):
    carb.log_error(f"Could not open stage {config['env_url']}, closing application..")
    simulation_app.close()

10.3.6. Creating the Cameras and the Writer

The example provides two ways of creating cameras, rep.create.camera (Replicator API) and prims.create_prim (Isaac Sim API); the cameras are then used to create the render products that generate the data. The created render products are attached to the built-in BasicWriter to collect the data from the selected annotators (rgb, semantic_segmentation, bounding_box_3d, etc.) and to write it to the given output path. Using rep.get.prim_at_path, the driver_cam_prim prim can be wrapped in an OmniGraph node so that it is randomized each step by the randomization graph generated by Replicator.

The cameras in the following snippet are created using Replicator (rep.create.camera) and are then used by the render products to generate data.

Cameras
driver_cam = rep.create.camera(
    focus_distance=400.0, focal_length=24.0, clipping_range=(0.1, 10000000.0), name="DriverCam"
)

# Camera looking at the pallet
pallet_cam = rep.create.camera(name="PalletCam")

# Camera looking at the forklift from a top view with large min clipping to see the scene through the ceiling
top_view_cam = rep.create.camera(clipping_range=(6.0, 1000000.0), name="TopCam")

Being a built-in writer, BasicWriter is already registered, and can be accessed from the WriterRegistry. The writer is then initialized with the output directory and the selected annotators. Finally, the render products are created from the cameras and attached to the writer.

Writer and Render Products
# Setup the writer
writer = rep.WriterRegistry.get(config["writer"])
writer.initialize(**config["writer_config"])
forklift_rp = rep.create.render_product(top_view_cam, config["resolution"], name="TopView")
driver_rp = rep.create.render_product(driver_cam, config["resolution"], name="DriverView")
pallet_rp = rep.create.render_product(pallet_cam, config["resolution"], name="PalletView")
writer.attach([forklift_rp, driver_rp, pallet_rp])
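
The call writer.initialize(**config["writer_config"]) relies on Python keyword unpacking: each key of writer_config becomes a keyword argument of the writer. The following stand-in illustrates the mechanism only; SketchWriter is a hypothetical class and does not reproduce the real BasicWriter or its parameter list.

```python
class SketchWriter:
    """Hypothetical stand-in for a Replicator writer (not the real BasicWriter)."""

    def initialize(self, output_dir, rgb=False, semantic_segmentation=False, **other_annotators):
        self.output_dir = output_dir
        # Collect every flag that was passed and keep only the enabled annotators
        flags = {"rgb": rgb, "semantic_segmentation": semantic_segmentation, **other_annotators}
        self.annotators = sorted(name for name, enabled in flags.items() if enabled)


writer_config = {"output_dir": "_out_basicwriter", "rgb": True, "bounding_box_3d": True, "occlusion": False}

sketch_writer = SketchWriter()
# Equivalent to initialize(output_dir="_out_basicwriter", rgb=True, bounding_box_3d=True, occlusion=False)
sketch_writer.initialize(**writer_config)
print(sketch_writer.output_dir, sketch_writer.annotators)
```

Because the config keys are forwarded as keyword arguments, a custom config file should only contain writer_config entries that the selected writer accepts.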

10.3.7. Domain Randomization

The following snippet provides examples of various randomization possibilities using the Isaac Sim and Replicator APIs. It starts by spawning a forklift at a randomly generated pose using the Isaac Sim API. It then uses the forklift pose to place a pallet in front of it, within a randomized distance range.

Isaac Sim API Randomization
# Spawn a new forklift at a random pose
forklift_prim = prims.create_prim(
    prim_path="/World/Forklift",
    position=(random.uniform(-20, -2), random.uniform(-1, 3), 0),
    orientation=euler_angles_to_quat([0, 0, random.uniform(0, math.pi)]),
    usd_path=assets_root_path + config["forklift"]["url"],
    semantic_label=config["forklift"]["class"],
)

# Spawn the pallet in front of the forklift with a random offset on the Y (pallet's forward) axis
forklift_tf = omni.usd.get_world_transform_matrix(forklift_prim)
pallet_offset_tf = Gf.Matrix4d().SetTranslate(Gf.Vec3d(0, random.uniform(-1.2, -1.8), 0))
pallet_pos_gf = (pallet_offset_tf * forklift_tf).ExtractTranslation()
forklift_quat_gf = forklift_tf.ExtractRotationQuat()
forklift_quat_xyzw = (forklift_quat_gf.GetReal(), *forklift_quat_gf.GetImaginary())

pallet_prim = prims.create_prim(
    prim_path="/World/Pallet",
    position=pallet_pos_gf,
    orientation=forklift_quat_xyzw,
    usd_path=assets_root_path + config["pallet"]["url"],
    semantic_label=config["pallet"]["class"],
)
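
The product pallet_offset_tf * forklift_tf expresses an offset given in the forklift's local frame in world coordinates. A minimal sketch of the same idea without USD types is shown below; it assumes the forklift is rotated only about the Z axis, and the function name local_offset_to_world and the sample values are hypothetical.

```python
import math


def local_offset_to_world(base_pos, base_yaw, local_offset):
    """Express `local_offset` (given in the base frame) in world coordinates.

    Only a rotation about Z (yaw, in radians) is considered.
    """
    ox, oy, oz = local_offset
    cos_yaw, sin_yaw = math.cos(base_yaw), math.sin(base_yaw)
    return (
        base_pos[0] + ox * cos_yaw - oy * sin_yaw,
        base_pos[1] + ox * sin_yaw + oy * cos_yaw,
        base_pos[2] + oz,
    )


# Made-up forklift pose: rotated by 90 degrees about Z
forklift_pos = (-5.0, 1.0, 0.0)
forklift_yaw = math.pi / 2

# Offset of 1.5 m along the negative local Y axis
pallet_pos = local_offset_to_world(forklift_pos, forklift_yaw, (0.0, -1.5, 0.0))
print(pallet_pos)
```

With a yaw of 90 degrees, the offset along the local Y axis ends up along the world X axis, so the pallet follows the forklift's orientation instead of always being shifted in the same world direction.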

Furthermore, various randomizers are registered using the Replicator API. The first is a rep.randomizer.scatter_2d example, where boxes are randomly scattered on the surface of the pallet in front of the forklift. The same randomizer also randomizes the materials of the boxes using rep.randomizer.materials. The generated randomization graph is then registered using rep.randomizer.register.

Replicator Randomization Graph I
# Register the boxes and materials randomizer graph
def register_scatter_boxes(pallet_prim, assets_root_path, config):
    # Calculate the bounds of the prim to create a scatter plane of its size
    bb_cache = create_bbox_cache()
    bbox3d_gf = bb_cache.ComputeLocalBound(pallet_prim)
    prim_tf_gf = omni.usd.get_world_transform_matrix(pallet_prim)

    # Calculate the bounds of the prim
    bbox3d_gf.Transform(prim_tf_gf)
    range_size = bbox3d_gf.GetRange().GetSize()

    # Get the quaternion of the prim from usd (real part first, i.e. w, x, y, z order, despite the variable name)
    prim_quat_gf = prim_tf_gf.ExtractRotation().GetQuaternion()
    prim_quat_xyzw = (prim_quat_gf.GetReal(), *prim_quat_gf.GetImaginary())

    # Create a plane on the pallet to scatter the boxes on
    plane_scale = (range_size[0] * 0.8, range_size[1] * 0.8, 1)
    plane_pos_gf = prim_tf_gf.ExtractTranslation() + Gf.Vec3d(0, 0, range_size[2])
    plane_rot_euler_deg = quat_to_euler_angles(np.array(prim_quat_xyzw), degrees=True)
    scatter_plane = rep.create.plane(
        scale=plane_scale, position=plane_pos_gf, rotation=plane_rot_euler_deg, visible=False
    )

    cardbox_mats = [
        f"{assets_root_path}/Isaac/Environments/Simple_Warehouse/Materials/MI_PaperNotes_01.mdl",
        f"{assets_root_path}/Isaac/Environments/Simple_Warehouse/Materials/MI_CardBoxB_05.mdl",
    ]

    def scatter_boxes():
        cardboxes = rep.create.from_usd(
            assets_root_path + config["cardbox"]["url"], semantics=[("class", config["cardbox"]["class"])], count=5
        )
        with cardboxes:
            rep.randomizer.scatter_2d(scatter_plane, check_for_collisions=True)
            rep.randomizer.materials(cardbox_mats)
        return cardboxes.node

    rep.randomizer.register(scatter_boxes)
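
To give an intuition for what a collision-checked 2D scatter does, the sketch below places square footprints on a plane with simple rejection sampling. This is not Replicator's implementation of scatter_2d; the function name scatter_2d_no_overlap, the pallet dimensions, and the box size are assumptions chosen for illustration.

```python
import random


def scatter_2d_no_overlap(plane_size, box_size, count, rng, max_attempts=1000):
    """Place `count` axis-aligned square footprints on a plane without overlaps."""
    # Keep the whole footprint inside the plane, not only its center
    half_x = plane_size[0] / 2 - box_size / 2
    half_y = plane_size[1] / 2 - box_size / 2
    centers = []
    for _ in range(count):
        for _ in range(max_attempts):
            candidate = (rng.uniform(-half_x, half_x), rng.uniform(-half_y, half_y))
            # Two equal squares are disjoint if they are box_size apart on at least one axis
            if all(
                abs(candidate[0] - cx) >= box_size or abs(candidate[1] - cy) >= box_size
                for cx, cy in centers
            ):
                centers.append(candidate)
                break
        else:
            raise RuntimeError("Could not find a collision-free position")
    return centers


# Scatter plane covering 80% of a hypothetical 1.5 m x 1.0 m pallet footprint
plane_size = (1.5 * 0.8, 1.0 * 0.8)
centers = scatter_2d_no_overlap(plane_size, box_size=0.1, count=5, rng=random.Random(0))
print(centers)
```

Shrinking the plane to 80% of the pallet footprint, as in the snippet above, keeps the scattered boxes away from the pallet edges.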

The next randomization example calculates the bottom corners of the forklift’s oriented bounding box (OBB), enlarged by 30% along X and Y, and uses them as a predefined list of locations at which to place a traffic cone.

Replicator Randomization Graph II
# Register the place cones randomizer graph
def register_cone_placement(forklift_prim, assets_root_path, config):
    # Get the bottom corners of the oriented bounding box (OBB) of the forklift
    bb_cache = create_bbox_cache()
    centroid, axes, half_extent = compute_obb(bb_cache, forklift_prim.GetPrimPath())
    larger_xy_extent = (half_extent[0] * 1.3, half_extent[1] * 1.3, half_extent[2])
    obb_corners = get_obb_corners(centroid, axes, larger_xy_extent)
    bottom_corners = [
        obb_corners[0].tolist(),
        obb_corners[2].tolist(),
        obb_corners[4].tolist(),
        obb_corners[6].tolist(),
    ]

    # Orient the cone using the OBB (Oriented Bounding Box)
    obb_quat = Gf.Matrix3d(axes).ExtractRotation().GetQuaternion()
    obb_quat_xyzw = (obb_quat.GetReal(), *obb_quat.GetImaginary())
    obb_euler = quat_to_euler_angles(np.array(obb_quat_xyzw), degrees=True)

    def place_cones():
        cones = rep.create.from_usd(
            assets_root_path + config["cone"]["url"], semantics=[("class", config["cone"]["class"])]
        )
        with cones:
            rep.modify.pose(position=rep.distribution.sequence(bottom_corners), rotation_z=obb_euler[2])
        return cones.node

    rep.randomizer.register(place_cones)
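
The geometry behind the cone locations can be sketched without the Isaac Sim helpers. The function obb_bottom_corners below is hypothetical and does not reproduce the corner ordering returned by get_obb_corners; it assumes that axes holds the three unit axes of the box and that the bottom face lies along the negative local Z axis.

```python
def obb_bottom_corners(centroid, axes, half_extent, xy_scale=1.0):
    """Return the four bottom corners of an oriented bounding box.

    `axes` holds three unit vectors (the local X, Y, Z axes of the box in world space).
    """
    hx = half_extent[0] * xy_scale
    hy = half_extent[1] * xy_scale
    hz = half_extent[2]
    corners = []
    for sx, sy in ((-1, -1), (1, -1), (1, 1), (-1, 1)):
        # Move from the centroid along +/-X and +/-Y, and down along -Z
        corners.append(
            tuple(
                centroid[i] + sx * hx * axes[0][i] + sy * hy * axes[1][i] - hz * axes[2][i]
                for i in range(3)
            )
        )
    return corners


# Made-up, axis-aligned box: 2 m x 4 m footprint, 2 m tall, resting on the ground
centroid = (0.0, 0.0, 1.0)
axes = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
bottom_corners = obb_bottom_corners(centroid, axes, (1.0, 2.0, 1.0), xy_scale=1.3)
print(bottom_corners)
```

Scaling only the X and Y half extents moves the candidate locations outward from the forklift while keeping them at ground level.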

The following example randomizes light parameters and their placement above the forklift and the pallet area.

Replicator Randomization Graph III
# Register light randomization graph
def register_lights_placement(forklift_prim, pallet_prim):
    bb_cache = create_bbox_cache()
    combined_range_arr = compute_combined_aabb(bb_cache, [forklift_prim.GetPrimPath(), pallet_prim.GetPrimPath()])
    pos_min = (combined_range_arr[0], combined_range_arr[1], 6)
    pos_max = (combined_range_arr[3], combined_range_arr[4], 7)

    def randomize_lights():
        lights = rep.create.light(
            light_type="Sphere",
            color=rep.distribution.uniform((0.2, 0.1, 0.1), (0.9, 0.8, 0.8)),
            intensity=rep.distribution.uniform(500, 2000),
            position=rep.distribution.uniform(pos_min, pos_max),
            scale=rep.distribution.uniform(5, 10),
            count=3,
        )
        return lights.node

    rep.randomizer.register(randomize_lights)
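
The sampling range for the light positions can be illustrated in plain Python. The sketch assumes, based on the indexing in the snippet above, that the combined bounds are laid out as (min_x, min_y, min_z, max_x, max_y, max_z); the helper names combine_aabbs and sample_uniform and all numeric values are hypothetical.

```python
import random


def combine_aabbs(aabbs):
    """Merge boxes given as (min_x, min_y, min_z, max_x, max_y, max_z)."""
    mins = [min(box[i] for box in aabbs) for i in range(3)]
    maxs = [max(box[i] for box in aabbs) for i in range(3, 6)]
    return (*mins, *maxs)


def sample_uniform(lower, upper, rng):
    """Sample each component independently between its lower and upper bound."""
    return tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))


# Made-up axis-aligned bounds of the forklift and the pallet
forklift_aabb = (-6.0, 0.0, 0.0, -4.0, 2.5, 2.2)
pallet_aabb = (-5.5, -2.0, 0.0, -4.3, -0.8, 0.15)
combined = combine_aabbs([forklift_aabb, pallet_aabb])

# Lights are sampled above the combined XY footprint, between 6 and 7 units high
pos_min = (combined[0], combined[1], 6)
pos_max = (combined[3], combined[4], 7)
light_pos = sample_uniform(pos_min, pos_max, random.Random(0))
print(combined, light_pos)
```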

Similarly to the above examples, Replicator has support for many other randomizations. For more information, please refer to Replicator’s randomizer examples tutorials.

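The registered randomizations are set up to trigger every frame, together with the movements of the pallet and driver cameras. The pallet camera looks at the pallet in front of the forklift from randomized positions around it, and the driver camera looks at the pallet from slightly varying heights. The top-view camera, which looks at the whole scene from various heights above, is randomized only at every fourth frame.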

Replicator Randomization Triggers
# Generate graph nodes to be triggered every frame
with rep.trigger.on_frame():
    rep.randomizer.scatter_boxes()
    rep.randomizer.place_cones()
    rep.randomizer.randomize_lights()

    pallet_cam_min = (pallet_pos_gf[0] - 2, pallet_pos_gf[1] - 2, 2)
    pallet_cam_max = (pallet_pos_gf[0] + 2, pallet_pos_gf[1] + 2, 4)
    with pallet_cam:
        rep.modify.pose(
            position=rep.distribution.uniform(pallet_cam_min, pallet_cam_max),
            look_at=str(pallet_prim.GetPrimPath()),
        )

    driver_cam_min = (driver_cam_pos_gf[0], driver_cam_pos_gf[1], driver_cam_pos_gf[2] - 0.25)
    driver_cam_max = (driver_cam_pos_gf[0], driver_cam_pos_gf[1], driver_cam_pos_gf[2] + 0.25)
    with driver_cam:
        rep.modify.pose(
            position=rep.distribution.uniform(driver_cam_min, driver_cam_max),
            look_at=str(pallet_prim.GetPrimPath()),
        )

# Generate graph nodes to be triggered only at the given interval
with rep.trigger.on_frame(interval=4):
    top_view_cam_min = (forklift_pos_gf[0], forklift_pos_gf[1], 9)
    top_view_cam_max = (forklift_pos_gf[0], forklift_pos_gf[1], 11)
    with top_view_cam:
        rep.modify.pose(
            position=rep.distribution.uniform(top_view_cam_min, top_view_cam_max),
            rotation=rep.distribution.uniform((0, -90, -30), (0, -90, 30)),
        )
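
The effect of the interval argument can be pictured with a small frame counter. This is a simplified model, not Replicator code: the function triggered_frames is hypothetical, and it assumes that an interval trigger fires on the first frame and then on every fourth frame after it.

```python
def triggered_frames(num_frames, interval=1):
    """Return the frame indices on which a trigger with the given interval fires."""
    # Assumption: the first frame (index 0) always triggers
    return [frame for frame in range(num_frames) if frame % interval == 0]


every_frame = triggered_frames(20)
every_fourth = triggered_frames(20, interval=4)
print(len(every_frame), every_fourth)
```

Under these assumptions, a run with the default 20 frames re-randomizes the boxes, cone, lights, pallet camera, and driver camera 20 times, while the top-view camera changes its pose only 5 times.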

After the forklift and the empty pallet are spawned and the randomization graphs are registered, and before the data collection starts, the example runs a short physics simulation in which several stacked boxes are dropped onto a second pallet behind the forklift.

Isaac Sim Simulation
def simulate_falling_objects(forklift_prim, assets_root_path, config, max_sim_steps=250, num_boxes=8):
    # Create the isaac sim world to run any physics simulations
    world = World(physics_dt=1.0 / 90.0, stage_units_in_meters=1.0)

    # Set a random relative offset to the pallet using the forklift transform as a base frame
    forklift_tf = omni.usd.get_world_transform_matrix(forklift_prim)
    pallet_offset_tf = Gf.Matrix4d().SetTranslate(Gf.Vec3d(random.uniform(-1, 1), random.uniform(-4, -3.6), 0))
    pallet_pos = (pallet_offset_tf * forklift_tf).ExtractTranslation()

    # Spawn pallet prim at a relative random offset to the forklift
    [..]

    # Spawn boxes falling on the pallet
    for i in range(num_boxes):
        # Spawn box prim
        cardbox_prim_name = f"SimulatedCardbox_{i}"
        box_prim = prims.create_prim(
            prim_path=f"/World/{cardbox_prim_name}",
            usd_path=assets_root_path + config["cardbox"]["url"],
            semantic_label=config["cardbox"]["class"],
        )

        # Get the next spawn height for the box
        spawn_height += bb_cache.ComputeLocalBound(box_prim).GetRange().GetSize()[2] * 1.1

        # Wrap the cardbox prim into a rigid prim to be able to simulate it
        box_rigid_prim = RigidPrim(
            prim_path=str(box_prim.GetPrimPath()),
            name=cardbox_prim_name,
            position=pallet_pos + Gf.Vec3d(random.uniform(-0.2, 0.2), random.uniform(-0.2, 0.2), spawn_height),
            orientation=euler_angles_to_quat([0, 0, random.uniform(0, math.pi)]),
        )

        # Make sure physics are enabled on the rigid prim
        box_rigid_prim.enable_rigid_body_physics()

        # Register rigid prim with the scene
        world.scene.add(box_rigid_prim)

    # Reset the world to handle the physics of the newly created rigid prims
    world.reset()

    # Simulate the world for the given number of steps or until the highest box stops moving
    last_box = world.scene.get_object(f"SimulatedCardbox_{num_boxes - 1}")
    for i in range(max_sim_steps):
        world.step(render=False)
        if last_box and np.linalg.norm(last_box.get_linear_velocity()) < 0.001:
            print(f"Falling objects simulation finished at step {i}..")
            break
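
The stopping criterion of the simulation loop (step until the observed object is nearly at rest, or until a maximum number of steps is reached) can be sketched with a toy integrator. The code below does not use the Isaac Sim physics engine; simulate_drop is a hypothetical function that models a single point mass with a perfectly inelastic ground contact.

```python
def simulate_drop(height, dt=1.0 / 90.0, max_sim_steps=250, rest_speed=0.001, gravity=-9.81):
    """Step a falling point mass until it rests or the step budget is exhausted."""
    z, velocity = height, 0.0
    for step in range(max_sim_steps):
        # Semi-implicit Euler integration of one physics step
        velocity += gravity * dt
        z += velocity * dt
        if z <= 0.0:
            # Perfectly inelastic contact with the ground
            z, velocity = 0.0, 0.0
        # Same idea as checking the norm of the linear velocity of the last box
        if abs(velocity) < rest_speed:
            return step, z
    return max_sim_steps, z


finished_step, final_height = simulate_drop(height=1.0)
print(f"Falling object simulation finished at step {finished_step}..")
```

The early exit keeps the warm-up phase short when the objects settle quickly, while max_sim_steps bounds the run time if they never come to rest.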

10.3.8. Running the Script

Finally, the randomizations and frame writing are triggered to run for the given number of frames, followed by a wait until all data is written to disk, after which the application is closed.

Script Execution
# Register the randomizer graphs
offline_generation_utils.register_scatter_boxes(pallet_prim, assets_root_path, config)
offline_generation_utils.register_cone_placement(forklift_prim, assets_root_path, config)
offline_generation_utils.register_lights_placement(forklift_prim, pallet_prim)

[..]

# Run a simulation before generating data
offline_generation_utils.simulate_falling_objects(forklift_prim, assets_root_path, config)

[..]

# Run the SDG
rep.orchestrator.run_until_complete(num_frames=config["num_frames"])

simulation_app.close()

10.3.9. Summary

This tutorial covered the following topics:

  1. Starting a SimulationApp instance of Isaac Sim to work with Replicator

  2. Loading a stage and spawning custom assets at random locations using the plain Isaac Sim API

  3. Setting up cameras, render products, and writers with Replicator to generate data

  4. Creating Replicator randomization graphs

  5. Running a physics simulation with Isaac Sim API

10.3.9.1. Next Steps

One possible use for the created data is with the TAO Toolkit.

Once the generated synthetic data is in Kitti format, you can use the TAO Toolkit to train a model. TAO provides segmentation, classification, and object detection models. This example uses object detection with the Detectnet_V2 model as a use case.

To get started with TAO, follow the set-up instructions. Then, activate the virtual environment and run the Jupyter Notebooks as explained in detail here.

TAO uses Jupyter notebooks to guide you through the training process. In the folder cv_samples_v1.3.0, you will find notebooks for multiple models. You can use any of the object detection networks for this use case, but this example uses Detectnet_V2.

In the detectnet_v2 folder, you will find the Jupyter notebook and the specs folder. The TAO Detectnet_V2 documentation goes into more detail about this sample. TAO works with configuration files that can be found in the specs folder. Here, you need to modify the specs to refer to the generated synthetic data as the input.

To prepare the data, you need to run the following command.

tao detectnet_v2 dataset-convert [-h] -d DATASET_EXPORT_SPEC -o OUTPUT_FILENAME [-f VALIDATION_FOLD]

This command is included in the Jupyter notebook together with a sample configuration. Modify the spec file to match the folder structure of your synthetic data. The converted data will be in TFRecord format and ready for training. Again, you need to change the training spec file so that it points to the synthetic data and lists the classes being detected.

tao detectnet_v2 train [-h] -k <key>
                        -r <result directory>
                        -e <spec_file>
                        [-n <name_string_for_the_model>]
                        [--gpus <num GPUs>]
                        [--gpu_index <comma separate gpu indices>]
                        [--use_amp]
                        [--log_file <log_file>]

For any questions regarding the TAO Toolkit, refer to the TAO documentation, which goes into further detail.

10.3.9.2. Further Learning

To learn how to use Omniverse Isaac Sim to create data sets in an interactive manner, see the Synthetic Data Recorder, and then visualize them with the Synthetic Data Visualizer.