6. Offline Pose Estimation Synthetic Data Generation

6.1. Learning Objectives

This tutorial demonstrates how to generate synthetic data to train a 6D pose estimation model, while leveraging Isaac Sim's APIs to implement domain randomization techniques inspired by NVIDIA's synthetic data research team. More specifically, we will focus on implementing randomization techniques similar to those used for the MESH and DOME datasets in the NViSII paper. Data will be written in a format similar in structure to the YCB Video Dataset, which can then be used to train a variety of 6D pose estimation models. Example images generated from this tutorial, corresponding to the MESH and DOME datasets, are shown below.

../_images/isaac_tutorial_replicator_pose_estimation_dome.png

Fig 1. DOME scene

../_images/isaac_tutorial_replicator_pose_estimation_mesh.png

Fig 2. MESH scene

We will examine standalone_examples/replicator/offline_pose_generation/offline_pose_generation.py to understand how Isaac Sim's APIs can be used to allow flying distractors to continuously collide within a volume, and how the poses of objects can be manipulated. The full example can be executed within the Isaac Sim Python environment.

50-60 min tutorial

6.1.1. Prerequisites

This tutorial requires a working knowledge of the Offline Dataset Generation and Hello World tutorials. We also highly recommend reading the NViSII paper to learn more about the MESH and DOME datasets and how they can be used to help bridge the Sim-to-Real Gap. At a high-level, both the MESH and DOME datasets add flying distractors around the object of interest. These datasets also randomize the distractors’ colors and materials, in addition to lighting conditions.

There are two notable differences between the MESH and DOME datasets: (1) the DOME dataset uses fewer distractors than the MESH dataset, and (2) the DOME dataset uses Dome Lights to provide realistic backgrounds.

6.2. Getting Started

To generate the synthetic dataset to train a pose estimation model, run the following command.

./python.sh standalone_examples/replicator/offline_pose_generation/offline_pose_generation.py

The command above accepts several optional arguments; any argument that is not specified is set to its default value.

  • --num_mesh: Number of frames (similar to samples found in the MESH dataset) to record. Defaults to 30.

  • --num_dome: Number of frames (similar to samples found in the DOME dataset) to record. Defaults to 30.

  • --max_queue_size: Maximum size of queue to store and process synthetic data. If the value of this field is less than or equal to zero, the queue size is infinite.
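For example, to record 100 MESH-style frames and 50 DOME-style frames while capping the writer queue at 500 entries (the values here are arbitrary), the script could be invoked as follows.

./python.sh standalone_examples/replicator/offline_pose_generation/offline_pose_generation.py --num_mesh 100 --num_dome 50 --max_queue_size 500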

Note that on Windows, be sure to specify the --vulkan flag when running the example. Without the flag, images whose dimensions are not powers of 2 will not render properly.

.\python.bat .\standalone_examples\replicator\offline_pose_generation\offline_pose_generation.py --vulkan

6.3. Setting up the Environment

Let’s first examine the _setup_world() function, which populates the stage with assets and sets up the environment prior to dataset generation.

6.3.1. Creating a Collision Box

We create an invisible collision box to hold a set of flying distractors, allowing the distractors to collide with one another and ricochet off the walls of the collision box. Since we want the distractors to be visible in our synthetic dataset, it’s important to place the collision box so that it’s in view of the camera and properly oriented.

Creating a Collision Box.
 1  # Allow flying distractors to float
 2  world.get_physics_context().set_gravity(0.0)
 3
 4  # Create a collision box in view of the camera, allowing distractors placed in the box to be within
 5  # [MIN_DISTANCE, MAX_DISTANCE] of the camera. The collision box will be placed in front of the camera,
 6  # regardless of CAMERA_ROTATION or CAMERA_RIG_ROTATION.
 7  theta_x = self.camera_rig.fov_x / 2.0
 8  theta_y = self.camera_rig.fov_y / 2.0
 9
10  collision_box_width = 2 * MAX_DISTANCE * math.tan(theta_x)
11  collision_box_height = 2 * MAX_DISTANCE * math.tan(theta_y)
12  collision_box_depth = MAX_DISTANCE - MIN_DISTANCE
13
14  collision_box_path = "/World/collision_box"
15  collision_box_name = "collision_box"
16
17  # Collision box is centered between MIN_DISTANCE and MAX_DISTANCE, with translation relative to camera in the z
18  # direction being negative due to cameras in Isaac Sim having coordinates of -z out, +y up, and +x right.
19  collision_box_translation_from_camera = np.array([0, 0, -(MIN_DISTANCE + MAX_DISTANCE) / 2.0])
20
21  # Collision box has no rotation with respect to the camera
22  collision_box_rotation_from_camera = np.array([0, 0, 0])
23  collision_box_orientation_from_camera = euler_angles_to_quat(collision_box_rotation_from_camera, degrees=True)
24
25  # Get the desired pose of the collision box from a pose defined locally with respect to the camera.
26  collision_box_center, collision_box_orientation = get_world_pose_from_local(
27      world, self.camera_rig.camera_path, collision_box_translation_from_camera,
28      collision_box_orientation_from_camera
29  )
30
31  collision_box = CollisionBox(
32      collision_box_path, collision_box_name, position=collision_box_center,
33      orientation=collision_box_orientation, width=collision_box_width, height=collision_box_height,
34      depth=collision_box_depth
35  )
36  world.scene.add(collision_box)

On lines 10-12, we define the dimensions of the collision box so as to minimize the number of out-of-view flying distractors and to constrain the distractors to lie between MIN_DISTANCE and MAX_DISTANCE from the camera.
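For intuition, here is a quick numeric check of those dimensions under assumed values (MIN_DISTANCE = 0.5, MAX_DISTANCE = 1.0, a 60° horizontal FOV, and a 40° vertical FOV; the actual constants in the example may differ).

import math

MIN_DISTANCE, MAX_DISTANCE = 0.5, 1.0          # assumed values in stage units
theta_x = math.radians(60.0) / 2.0             # half of the assumed horizontal FOV
theta_y = math.radians(40.0) / 2.0             # half of the assumed vertical FOV

width = 2 * MAX_DISTANCE * math.tan(theta_x)   # ~1.15: spans the horizontal view at MAX_DISTANCE
height = 2 * MAX_DISTANCE * math.tan(theta_y)  # ~0.73: spans the vertical view at MAX_DISTANCE
depth = MAX_DISTANCE - MIN_DISTANCE            # 0.5: keeps distractors between the two distances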

Next, we find the pose of the collision box with respect to the camera's frame. The translation is calculated on line 19, taking into account the camera coordinate system convention used in Isaac Sim (-z out, +y up, +x right). On line 23 we use the omni.isaac.core.utils.rotations library to find the quaternion representation of the collision box's rotation relative to the camera.

The get_world_pose_from_local() function on line 26 (and expanded below) is of particular importance, since it allows us to use the pose defined in the camera’s frame to find the corresponding pose in the world’s frame.

Getting the Collision Box's World Pose.
 1  def get_world_pose_from_local(world: World, prim_path, local_translation, local_orientation):
 2      """Get a pose defined in the world frame from a pose defined in the local frame of the prim at prim_path"""
 3
 4      prim = world.stage.GetPrimAtPath(prim_path)
 5
 6      # Row-major transformation matrix from the prim's coordinate system to the world coordinate system
 7      prim_transform_matrix = UsdGeom.Xformable(prim).ComputeLocalToWorldTransform(Usd.TimeCode.Default())
 8
 9      # Convert transformation matrix to column-major
10      prim_to_world = np.transpose(prim_transform_matrix)
11
12      # Column-major transformation matrix from the local pose to the frame the local pose is defined with respect to
13      local_pose_to_prim = tf_matrix_from_pose(local_translation, local_orientation)
14
15      # Chain the transformations
16      local_pose_to_world = prim_to_world @ local_pose_to_prim
17
18      # Translation and quaternion with respect to the world frame of the locally defined pose
19      world_translation, world_orientation = pose_from_tf_matrix(local_pose_to_world)
20
21      return world_translation, world_orientation

To get the transformation matrix from the local frame of a prim to the world frame, the built-in ComputeLocalToWorldTransform() function can be used; we use this function on line 7 to get the prim-to-world (or in our case, camera-to-world) transformation matrix.

Additionally, using the desired pose of the collision box (defined locally with respect to the camera), we can use the tf_matrix_from_pose() function from the omni.isaac.core.utils.transformations library to get the corresponding transformation matrix.

On line 16, the two transformations described above are chained and yield the desired pose of the collision box with respect to the world frame. With the pose defined relative to the world frame, the set_world_pose() function (described in more detail in later sections) can be used to appropriately place the collision box in our scene.
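As a brief usage sketch (not part of the example itself), the same helper can place any prim at a fixed offset in front of the camera. Here marker_prim is a hypothetical prim wrapper that exposes set_world_pose(), and world and self.camera_rig are assumed to be available as in the snippets above.

import numpy as np
from omni.isaac.core.utils.rotations import euler_angles_to_quat

# Desired pose in the camera frame: 1.5 stage units straight ahead (-z is "out" of the camera), no rotation.
offset_from_camera = np.array([0.0, 0.0, -1.5])
orientation_from_camera = euler_angles_to_quat(np.array([0.0, 0.0, 0.0]), degrees=True)

# Convert the camera-relative pose to a world pose, then apply it to the prim.
position, orientation = get_world_pose_from_local(
    world, self.camera_rig.camera_path, offset_from_camera, orientation_from_camera
)
marker_prim.set_world_pose(position, orientation)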

6.3.2. Flying Distractors

To help us manage the hundreds of distractors that we’ll be adding to the scene, we use several classes defined in the standalone_examples/replicator/offline_pose_generation/ directory, and provide a high-level overview of these below. Namely, we discuss: FlyingDistractors, DynamicAssetSet, DynamicShapeSet, DynamicObjectSet, and DynamicObject.

The flying distractors used in the MESH and DOME datasets consist of both shapes and objects. We create the DynamicAssetSet class to provide an API relevant to all flying distractors, regardless of whether the particular assets contained in the set are shapes or objects. The API allows the flying distractors managed by the set to be kept in motion within the collision box, and allows various properties of the assets to be randomized.

To create a suite of methods specific to shapes (and not objects), we create the DynamicShapeSet class. The DynamicShapeSet class allows dynamic shapes to be spawned and managed, in addition to inheriting methods from the DynamicAssetSet class. These dynamic shapes are chosen from predefined classes found in omni.isaac.core.prims, namely DynamicCuboid, DynamicSphere, DynamicCylinder, DynamicCone, and DynamicCapsule. Each of these dynamic shape classes is implemented in a similar way; the respective shape prim is wrapped with both the RigidPrim class to provide rigid body attributes, and the GeometryPrim class to provide an API for collisions and physics materials. For the purposes of this tutorial, GeometryPrim is critical as it allows objects to collide rather than intersect, leading to more realistic shadows for better Sim-to-Real transfer.

Similarly, within DynamicObjectSet, dynamic objects are created using the DynamicObject class, which in turn provides a way to take an asset from its USD reference and wrap it with RigidPrim and GeometryPrim. Note that in this tutorial we use YCB objects (common household objects) in our DynamicObjectSet.

Finally, FlyingDistractors allows us to simultaneously manage multiple instances of both the DynamicShapeSet and DynamicObjectSet classes. For more information on FlyingDistractors, DynamicAssetSet, DynamicShapeSet, DynamicObjectSet, or DynamicObject, please refer to the class definitions in the standalone_examples/replicator/offline_pose_generation/ directory.
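As a rough sketch of the spawning idea used by DynamicShapeSet (this is not the example's actual implementation), one of the dynamic shape classes can be chosen at random and added to the scene. The import path below is an assumption; depending on the Isaac Sim version these classes may instead be importable from omni.isaac.core.prims, as described above, and constructor arguments beyond prim_path, name, and position are omitted.

import random
import numpy as np
from omni.isaac.core.objects import DynamicCuboid, DynamicSphere, DynamicCone

def spawn_random_shape(world, index, position):
    """Hypothetical helper: spawn one random dynamic shape and register it with the scene."""
    shape_cls = random.choice([DynamicCuboid, DynamicSphere, DynamicCone])
    shape = shape_cls(
        prim_path=f"/World/flying_distractors/shape_{index}",
        name=f"shape_{index}",
        position=np.array(position),
    )
    # Scene bookkeeping mirrors how the collision box was added earlier.
    world.scene.add(shape)
    return shape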

To keep our flying distractors in motion, we define the apply_force_to_prims() method in DynamicAssetSet to apply random forces to them.

Applying Forces in Random Directions.
 1  def apply_force_to_prims(self, force_limit):
 2      """Apply force in random direction to prims in dynamic asset set"""
 3
 4      for path in itertools.chain(self.glass_asset_paths, self.nonglass_asset_paths):
 5
 6          # X, Y, and Z components of the force are constrained to be within [-force_limit, force_limit]
 7          random_force = np.random.uniform(-force_limit, force_limit, 3).tolist()
 8
 9          handle = self.world.dc_interface.get_rigid_body(path)
10
11          self.world.dc_interface.apply_body_force(handle, random_force, (0, 0, 0), False)

The Dynamic Control API provides a comprehensive physics interface; on lines 9 and 11 it is used to get a handle to each distractor's rigid body and apply a random force to it.
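A minimal sketch of how such a method might be driven, assuming dynamic_shape_set is a DynamicShapeSet-like instance, world is the World used above, and the force limit is an arbitrary value; the actual example triggers this per frame through FlyingDistractors.apply_force_to_assets() inside __next__ (shown later).

FORCE_LIMIT = 10.0  # assumed magnitude; tune to the scene scale and asset masses

for _ in range(100):
    # Re-apply random forces each step so the distractors keep bouncing around the collision box.
    dynamic_shape_set.apply_force_to_prims(FORCE_LIMIT)
    world.step(render=False)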

6.3.3. Adding Object of Interest

In this tutorial we use the YCB Cracker Box asset as our object of interest, on which the pose estimation model will be trained. To train a pose estimation model on a different object, the Cracker Box would need to be replaced with an object of your choosing.

Adding the Object of Interest.
 1      # Add the part to train the network on
 2      part_name = "003_cracker_box"
 3      ref_path = self.asset_path + part_name + ".usd"
 4      prim_type = f"_{part_name[1:]}"
 5      path = "/World/" + prim_type
 6      mesh_path = path + "/" + prim_type
 7      name = "train_part"
 8
 9      self.train_part_mesh_path_to_prim_path_map[mesh_path] = path
10
11      train_part = DynamicObject(usd_path=ref_path, prim_path=path, mesh_path=mesh_path, name=name, mass=1.0)
12
13      train_part.prim.GetAttribute("physics:rigidBodyEnabled").Set(False)
14
15      self.train_parts.append(train_part)
16
17      # Add semantic information
18      mesh_prim = world.stage.GetPrimAtPath(mesh_path)
19      add_update_semantics(mesh_prim, prim_type)

To prevent the object of interest from being knocked off-screen by a collision with a distractor, we disable its rigid body dynamics (physics:rigidBodyEnabled) on line 13. Semantic information is then added to the part on lines 18-19, similar to what is done in the Visualizing Synthetic Data tutorial of the omni.replicator documentation.
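As a hedged illustration of swapping the training object, the same wiring can point at a different asset by changing part_name, assuming the corresponding USD file exists under self.asset_path (the asset name below is only an example).

# Hypothetical substitution: train on a different YCB object instead of the cracker box
part_name = "006_mustard_bottle"
ref_path = self.asset_path + part_name + ".usd"
prim_type = f"_{part_name[1:]}"
path = "/World/" + prim_type
mesh_path = path + "/" + prim_type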

6.3.4. Domain Randomization

We define the following functions for domain randomization:

Randomize Lighting.
 1  def randomize_sphere_lights():
 2      lights = rep.create.light(
 3          light_type="Sphere",
 4          color=rep.distribution.uniform((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
 5          intensity=rep.distribution.uniform(100000, 3000000),
 6          position=rep.distribution.uniform((-250, -250, -250), (250, 250, 100)),
 7          scale=rep.distribution.uniform(1, 20),
 8          count=NUM_LIGHTS,
 9      )
10
11      return lights.node

We randomize the color, intensity, position, and size of the sphere lights in lines 4-7. This allows us to generate scenes under different lighting scenarios for the MESH and DOME datasets.

Randomize Domelight.
1  def randomize_domelight(texture_paths):
2      lights = rep.create.light(
3          light_type="Dome",
4          rotation=rep.distribution.uniform((0, 0, 0), (360, 360, 360)),
5          texture=rep.distribution.choice(texture_paths)
6      )
7
8      return lights.node

The rotation and texture of the Dome Light are randomized on lines 4-5. This allows samples similar to those in the DOME dataset to have a randomly selected background with realistic lighting conditions.

Randomize Shape Properties.
 1  def randomize_colors(prim_path_regex):
 2      prims = rep.get.prims(path_pattern=prim_path_regex)
 3
 4      mats = rep.create.material_omnipbr(
 5          metallic=rep.distribution.uniform(0.0, 1.0),
 6          roughness=rep.distribution.uniform(0.0, 1.0),
 7          diffuse=rep.distribution.uniform((0, 0, 0), (1, 1, 1)),
 8          count=100,
 9      )
10
11      with prims:
12          rep.randomizer.materials(mats)
13
14      return prims.node

We randomize the metallic, roughness, and diffuse color properties of the distractor shapes on lines 5-7. This allows a wide variety of material properties to be present in the distractors.

To invoke the domain randomization, we must register these functions with rep.randomizer. For more examples of using omni.replicator.core.randomizer, please refer to the omni.replicator documentation.

Registering and Calling Domain Randomization.
rep.randomizer.register(randomize_sphere_lights, override=True)
rep.randomizer.register(randomize_colors, override=True)

with rep.trigger.on_frame():
    rep.randomizer.randomize_sphere_lights()
    rep.randomizer.randomize_colors("(?=.*shape)(?=.*nonglass).*")

Note that we register randomize_domelight() only later, inside the __next__ function, because the dome light should not be randomized while we first generate images for the MESH dataset.

rep.randomizer.register(randomize_domelight, override=True)

dome_texture_paths = [self.dome_texture_path + dome_texture + ".hdr" for dome_texture in DOME_TEXTURES]

with rep.trigger.on_frame():
    rep.randomizer.randomize_domelight(dome_texture_paths)

randomize_movement_in_view() is another custom method we define to randomize the pose of the object of interest, while keeping it in view of the camera.

Randomize Movement in View.
def randomize_movement_in_view(self, prim):
    """Randomly move and rotate prim such that it stays in view of camera"""

    translation, orientation = self.camera_rig.get_random_world_pose_in_view(
        MIN_DISTANCE, MAX_DISTANCE, FRACTION_TO_SCREEN_EDGE, MIN_ROTATION_RANGE, MAX_ROTATION_RANGE
    )

    prim.set_world_pose(translation, orientation)

While get_random_world_pose_in_view() is simply a randomized version of how the pose of the collision box was determined, the set_world_pose() function is worth highlighting, as it provides a convenient way to set USD transform properties. As the object of interest is wrapped with RigidPrim, we can use RigidPrim’s set_world_pose() method. If you’d like to manipulate the pose of an object that doesn’t have rigid body attributes, please refer to XFormPrim.
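For prims without rigid body attributes, a minimal sketch using XFormPrim might look like the following (the prim path and pose values are arbitrary).

import numpy as np
from omni.isaac.core.prims import XFormPrim
from omni.isaac.core.utils.rotations import euler_angles_to_quat

# Wrap an existing (non-rigid-body) prim and set its pose directly in the world frame.
static_prop = XFormPrim("/World/static_prop", name="static_prop")
static_prop.set_world_pose(
    position=np.array([0.0, 0.2, 0.5]),
    orientation=euler_angles_to_quat(np.array([0.0, 0.0, 90.0]), degrees=True),
)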

6.4. Generating Data

We randomize our scene, capture the ground truth data, and send the data to our data writer within the __next__ function.

Generate Data.
 1  def __next__(self):
 2
 3      if self.cur_idx == self.num_mesh:  # MESH dataset generation complete, switch to DOME dataset
 4
 5          self.mesh = False
 6
 7          # Hide the FlyingDistractors used for the MESH dataset
 8          self.mesh_distractors.set_visible(False)
 9
10          # Show the FlyingDistractors used for the DOME dataset
11          self.dome_distractors.set_visible(True)
12
13          # Create and randomize a dome light for the DOME dataset
14          def randomize_domelight(texture_paths):
15              lights = rep.create.light(
16                  light_type="Dome",
17                  rotation=rep.distribution.uniform((0, 0, 0), (360, 360, 360)),
18                  texture=rep.distribution.choice(texture_paths),
19              )
20
21              return lights.node
22
23          rep.randomizer.register(randomize_domelight, override=True)
24
25          dome_texture_paths = [self.dome_texture_path + dome_texture + ".hdr" for dome_texture in DOME_TEXTURES]
26
27          with rep.trigger.on_frame():
28              rep.randomizer.randomize_domelight(dome_texture_paths)
29
30          world.step(render=False)
31
32      if self.mesh:
33          flying_distractors = self.mesh_distractors
34      else:
35          flying_distractors = self.dome_distractors
36
37      flying_distractors.apply_force_to_assets(FORCE_RANGE)
38
39      flying_distractors.randomize_asset_glass_color()
40
41      for train_part in self.train_parts:
42          self.randomize_movement_in_view(train_part)
43
44      rep.orchestrator.preview()
45
46      world.step()
47
48      self._num_worker_threads = 4
49
50      # Write to disk
51      if self.data_writer is None:
52          self.data_writer = self.writer_helper(
53              self._output_folder, self._num_worker_threads, self.train_size, self.max_queue_size
54          )
55          self.data_writer.start_threads()
56
57      image = self._capture_viewport()
58      self.cur_idx += 1
59      return image

We begin the data generation process by generating samples similar to those in the MESH dataset. As samples are being generated, we (1) apply the previously described randomization function to keep the flying distractors in motion; (2) utilize the randomization functions we defined above to change the material properties of the distractor shapes and the sphere lights; and (3) randomize the pose of our object of interest, on lines 37, 44, and 42, respectively.

Once num_mesh samples have been generated, we prepare our scene for generating samples similar to the DOME dataset by dynamically modifying which assets are visible. On lines 8-11, we hide the FlyingDistractors used for the MESH samples and show the smaller set of FlyingDistractors used for the DOME samples. Additionally, we define our randomize_domelight() function, register it with rep.randomizer, and call it within a rep.trigger.on_frame() context. This randomizer is triggered once we call rep.orchestrator.preview() on line 44; note that this call also triggers the two randomization functions registered earlier, randomize_sphere_lights() and randomize_colors().

Ground truth data is captured (and sent to our data writer) using the _capture_viewport() function on line 57. The _capture_viewport() function sends information such as the RGB image, segmentation, depth, and transform data to our data writer so that output images for training can be generated. The pose of the object of interest is calculated with respect to the camera rig (rather than the camera), enabling the pose to be defined with respect to the camera coordinate system used by the YCB Video Dataset (which differs from the default Isaac Sim camera coordinate system). This requires a specific fixed transformation between the camera and the camera rig. For more information, please refer to offline_pose_generation.py and camera_rig.py (located in standalone_examples/replicator/offline_pose_generation/), as well as ycb_video.py (located in omni.isaac.synthetic_utils).
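As a hedged illustration of why such a fixed transform is needed: the Isaac Sim camera frame uses -z out, +y up, +x right, whereas computer-vision conventions such as the one typically assumed by the YCB Video Dataset use +z out, +y down, +x right; the two differ by a 180° rotation about the camera's x-axis. A minimal sketch of re-expressing a pose accordingly:

import numpy as np

# 180-degree rotation about x: flips the y and z axes of the camera frame.
ISAAC_CAM_TO_CV_CAM = np.array([
    [1.0,  0.0,  0.0, 0.0],
    [0.0, -1.0,  0.0, 0.0],
    [0.0,  0.0, -1.0, 0.0],
    [0.0,  0.0,  0.0, 1.0],
])

def object_pose_in_cv_camera(object_to_isaac_camera):
    """Re-express a 4x4 object pose given in the Isaac Sim camera frame in a CV-style camera frame."""
    return ISAAC_CAM_TO_CV_CAM @ object_to_isaac_camera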

6.5. Summary

This tutorial covered the following topics:

  1. Getting transformation matrices with ComputeLocalToWorldTransform(), and manipulating transformations with the omni.isaac.core.utils.transformations library.

  2. Wrapping prims with the classes found in omni.isaac.core.prims, allowing a simple yet powerful set of APIs to be used. GeometryPrim was used for collisions, RigidPrim for rigid body attributes, and XFormPrim to get and set poses.

  3. Creating custom randomization functions, allowing us to randomize (1) the pose of a prim and (2) the force applied to a prim to keep it in motion.

  4. Applying these randomization functions with OV Replicator when we randomized (1) the properties of sphere lights in the scene, (2) the material properties of distractor shapes, and (3) the texture files of background dome lights.

6.5.1. Next steps

The generated synthetic data can now be used to train a 6D pose estimation model, such as PoseCNN or PoseRBPF.