6. Offline Pose Estimation Synthetic Data Generation

6.1. Learning Objectives

This tutorial demonstrates how to generate synthetic data to train a 6D pose estimation model, while leveraging Isaac-Sim's APIs to implement domain randomization techniques inspired by NVIDIA's synthetic data research team. More specifically, we focus on implementing randomization techniques similar to those used to create the MESH and DOME datasets in the NViSII paper. By default, data is written in a format similar in structure to the YCB Video Dataset, which can then be used to train a variety of 6D pose estimation models. Data can also be written in the DOPE training format. Example images generated by this tutorial, corresponding to the MESH and DOME datasets, are shown below.


Fig 1. DOME scene


Fig 2. MESH scene

We will examine standalone_examples/replicator/offline_pose_generation/offline_pose_generation.py to understand how Isaac-Sim’s APIs can be used to allow flying distractors to continuously collide within a volume, and how the poses of objects can be manipulated. The full example can be executed within the Isaac-Sim Python environment.

50-60 min tutorial

6.1.1. Prerequisites

This tutorial requires a working knowledge of the Offline Dataset Generation and Hello World tutorials. We also highly recommend reading the NViSII paper to learn more about the MESH and DOME datasets and how they can help bridge the Sim-to-Real Gap. At a high level, both the MESH and DOME datasets add flying distractors around the object of interest. These datasets also randomize the distractors' colors and materials, as well as the lighting conditions.

There are two notable differences between the MESH and DOME datasets: (1) the DOME dataset uses fewer distractors than the MESH dataset, and (2) the DOME dataset uses Dome Lights to provide realistic backgrounds.

6.2. Getting Started

To generate the synthetic dataset to train a pose estimation model, run the following command.

./python.sh standalone_examples/replicator/offline_pose_generation/offline_pose_generation.py

The script accepts several optional arguments. Any argument that is not specified takes its default value:

  • --num_mesh: Number of frames (similar to samples found in the MESH dataset) to record. Defaults to 30.

  • --num_dome: Number of frames (similar to samples found in the DOME dataset) to record. Defaults to 30.

  • --dome_interval: Number of frames to capture before changing the DOME background. When generating larger datasets, increasing this interval will increase performance. Defaults to 1.

  • --output_folder: The folder to write output data to. Defaults to output.

  • --use_s3: If this flag is passed, the output is written directly to an s3 bucket. Writing to s3 is currently supported only with the DOPE writer.

  • --endpoint: The endpoint to write to when --use_s3 is specified.

  • --bucket: The bucket to write to when --use_s3 is specified.

  • --writer: Which writer to use, either YCBVideo or DOPE. Defaults to YCBVideo.
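For reference, the flags above can be reproduced with Python's argparse. This is a hypothetical stand-in written from the list above, not the parser inside offline_pose_generation.py. The validate() helper, which rejects --use_s3 without the DOPE writer, only illustrates the stated restriction; the real script may handle that case differently.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Offline pose estimation SDG (sketch)")
    parser.add_argument("--num_mesh", type=int, default=30, help="Frames similar to the MESH dataset")
    parser.add_argument("--num_dome", type=int, default=30, help="Frames similar to the DOME dataset")
    parser.add_argument("--dome_interval", type=int, default=1, help="Frames per DOME background")
    parser.add_argument("--output_folder", default="output", help="Local output folder")
    parser.add_argument("--use_s3", action="store_true", help="Write to s3 (DOPE writer only)")
    parser.add_argument("--endpoint", default=None, help="s3 endpoint, used with --use_s3")
    parser.add_argument("--bucket", default=None, help="s3 bucket, used with --use_s3")
    parser.add_argument("--writer", choices=["YCBVideo", "DOPE"], default="YCBVideo")
    return parser


def validate(args):
    """Illustrative check: s3 output is only documented for the DOPE writer."""
    if args.use_s3 and args.writer != "DOPE":
        raise ValueError("--use_s3 is only supported with --writer DOPE")
    return args


defaults = build_parser().parse_args([])
dope_s3 = validate(build_parser().parse_args(["--writer", "DOPE", "--use_s3", "--bucket", "my-bucket"]))
```

Parsing an empty list returns the documented defaults. This gives a quick way to confirm that the list above agrees with any wrapper scripts you write around the example.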

Note

On Windows, the --vulkan flag is required when running the examples. Without it, images whose dimensions are not powers of 2 will not render properly:

.\python.bat .\standalone_examples\replicator\offline_pose_generation\offline_pose_generation.py --vulkan

6.3. Setting up the Environment

Let’s first examine the _setup_world() function, which populates the stage with assets and sets up the environment prior to dataset generation.

6.3.1. Creating a Collision Box

We create an invisible collision box to hold a set of flying distractors, allowing the distractors to collide with one another and ricochet off the walls of the collision box. Since we want the distractors to be visible in our synthetic dataset, it’s important to place the collision box so that it’s in view of the camera and properly oriented.

Creating a Collision Box.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# Disable gravity in the scene to allow the flying distractors to float around
world.get_physics_context().set_gravity(0.0)

# Create a collision box in view of the camera, allowing distractors placed in the box to be within
# [MIN_DISTANCE, MAX_DISTANCE] of the camera. The collision box will be placed in front of the camera,
# regardless of CAMERA_ROTATION or CAMERA_RIG_ROTATION.
self.fov_x = 2 * math.atan(WIDTH / (2 * F_X))
self.fov_y = 2 * math.atan(HEIGHT / (2 * F_Y))
theta_x = self.fov_x / 2.0
theta_y = self.fov_y / 2.0

# Avoid collision boxes with width/height dimensions smaller than 1.3
collision_box_width = max(2 * MAX_DISTANCE * math.tan(theta_x), 1.3)
collision_box_height = max(2 * MAX_DISTANCE * math.tan(theta_y), 1.3)
collision_box_depth = MAX_DISTANCE - MIN_DISTANCE

collision_box_path = "/World/collision_box"
collision_box_name = "collision_box"

# Collision box is centered between MIN_DISTANCE and MAX_DISTANCE, with translation relative to camera in the z
# direction being negative due to cameras in Isaac Sim having coordinates of -z out, +y up, and +x right.
collision_box_translation_from_camera = np.array([0, 0, -(MIN_DISTANCE + MAX_DISTANCE) / 2.0])

# Collision box has no rotation with respect to the camera
collision_box_rotation_from_camera = np.array([0, 0, 0])
collision_box_orientation_from_camera = euler_angles_to_quat(collision_box_rotation_from_camera, degrees=True)

# Get the desired pose of the collision box from a pose defined locally with respect to the camera.
collision_box_center, collision_box_orientation = get_world_pose_from_relative(
    self.camera_path, collision_box_translation_from_camera, collision_box_orientation_from_camera
)

collision_box = CollisionBox(
    collision_box_path,
    collision_box_name,
    position=collision_box_center,
    orientation=collision_box_orientation,
    width=collision_box_width,
    height=collision_box_height,
    depth=collision_box_depth,
)
world.scene.add(collision_box)

In lines 13-15, we define the dimensions of the collision box to minimize the number of out-of-view flying distractors, and to constrain the distractors to be between MIN_DISTANCE and MAX_DISTANCE from the camera.
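The sizing rule can be checked without launching Isaac Sim. The sketch below repeats the formulas from lines 7-15 in plain Python. The intrinsics (a 640x480 image with 640-pixel focal lengths) and the 0.5/2.0 distances are made-up values, not the tutorial's constants.

```python
import math


def collision_box_dims(width_px, height_px, f_x, f_y, min_dist, max_dist, min_side=1.3):
    """Return (fov_x, fov_y, width, height, depth) using the formulas from lines 7-15."""
    # Full field of view from the pinhole intrinsics (radians)
    fov_x = 2 * math.atan(width_px / (2 * f_x))
    fov_y = 2 * math.atan(height_px / (2 * f_y))
    # Cross-section of the view frustum at the far distance, never smaller than min_side
    box_w = max(2 * max_dist * math.tan(fov_x / 2.0), min_side)
    box_h = max(2 * max_dist * math.tan(fov_y / 2.0), min_side)
    # The box spans the band of allowed distances in front of the camera
    box_d = max_dist - min_dist
    return fov_x, fov_y, box_w, box_h, box_d


# Hypothetical intrinsics and distances (scene units)
_, _, w, h, d = collision_box_dims(640, 480, 640.0, 640.0, min_dist=0.5, max_dist=2.0)
print(round(w, 3), round(h, 3), round(d, 3))  # 2.0 1.5 1.5
```

Since tan(fov_x / 2) = WIDTH / (2 * F_X), the width simplifies to MAX_DISTANCE * WIDTH / F_X. The 1.3 floor therefore only takes effect for a short MAX_DISTANCE or a narrow field of view.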

Next, we find the pose of the collision box with respect to the camera's frame. The translation is calculated on line 22, taking into account the camera coordinate convention used in Isaac-Sim (-z out, +y up, +x right). On line 26, we use euler_angles_to_quat() from the omni.isaac.core.utils.rotations library to find the quaternion representation of the collision box's rotation relative to the camera.

The get_world_pose_from_relative() function on line 29 (expanded below) computes the collision box's world pose from its pose relative to the camera's frame.

Getting the Collision Box's World Pose.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
def get_world_pose_from_relative(prim_path, relative_translation, relative_orientation):
    """Get a pose defined in the world frame from a pose defined relative to the frame of the prim at prim_path"""

    stage = get_current_stage()

    prim = stage.GetPrimAtPath(prim_path)

    # Transform from the prim's frame to the world frame, in USD's row-vector convention (translation in the last row)
    prim_transform_matrix = UsdGeom.Xformable(prim).ComputeLocalToWorldTransform(Usd.TimeCode.Default())

    # Transpose to the column-vector convention (translation in the last column)
    prim_to_world = np.transpose(prim_transform_matrix)

    # Column-vector transformation matrix from the pose to the frame the pose is defined with respect to
    relative_pose_to_prim = tf_matrix_from_pose(relative_translation, relative_orientation)

    # Chain the transformations
    relative_pose_to_world = prim_to_world @ relative_pose_to_prim

    # Translation and quaternion with respect to the world frame of the relatively defined pose
    world_position, world_orientation = pose_from_tf_matrix(relative_pose_to_world)

    return world_position, world_orientation

To get the transformation matrix from the local frame of a prim to the world frame, the USD built-in ComputeLocalToWorldTransform() function can be used; we use this function on line 9 to get the prim-to-world (or in our case, camera-to-world) transformation matrix.

Additionally, using the desired pose of the collision box (defined locally with respect to the camera), we can use the tf_matrix_from_pose() function from the omni.isaac.core.utils.transformations library to get the corresponding transformation matrix.

On line 18, the two transformations described above are chained, yielding the collision box's transformation with respect to the world frame. Line 21 then extracts the corresponding world-frame translation and quaternion.
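The chaining step can be reproduced in plain Python. The sketch below builds 4x4 transforms in the column-vector convention, which is the form get_world_pose_from_relative() obtains by transposing USD's matrix. It then multiplies them in the same order as line 18. The sketch assumes scalar-first (w, x, y, z) quaternions and a made-up camera pose. The helpers quat_to_rot, tf_from_pose, and matmul4 are illustrative and are not the omni.isaac.core utilities.

```python
import math


def quat_to_rot(q):
    """3x3 rotation matrix from a unit quaternion in (w, x, y, z) order."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def tf_from_pose(t, q):
    """4x4 transform for column vectors: rotation upper-left, translation in the last column."""
    r = quat_to_rot(q)
    return [r[0] + [t[0]], r[1] + [t[1]], r[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def matmul4(a, b):
    """Plain 4x4 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


# Hypothetical camera pose: at (1, 2, 3), rotated 90 degrees about the world Y axis
half = math.radians(90) / 2
cam_q = (math.cos(half), 0.0, math.sin(half), 0.0)
camera_to_world = tf_from_pose((1.0, 2.0, 3.0), cam_q)

# Box 1.25 units along the camera's -Z (viewing) axis, no relative rotation
box_to_camera = tf_from_pose((0.0, 0.0, -1.25), (1.0, 0.0, 0.0, 0.0))

# Same chaining order as line 18: prim_to_world @ relative_pose_to_prim
box_to_world = matmul4(camera_to_world, box_to_camera)
print([round(box_to_world[i][3], 6) for i in range(3)])  # [-0.25, 2.0, 3.0]
```

The camera looks down its local -Z axis. After the 90-degree rotation about Y, the box offset of (0, 0, -1.25) therefore ends up along world -X.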

6.3.2. Flying Distractors

To help us manage the hundreds of distractors that we'll be adding to the scene, we use several classes defined in the standalone_examples/replicator/offline_pose_generation/flying_distractors directory. Below is a high-level overview of these classes: FlyingDistractors, DynamicAssetSet, DynamicShapeSet, DynamicObjectSet, and DynamicObject.

The flying distractors used in the MESH and DOME datasets consist of both primitive shapes and objects. We create the DynamicAssetSet class to provide an API relevant to all flying distractors, regardless of whether the particular assets contained in the set are shapes or objects. The API allows the flying distractors managed by the set to be kept in motion within the collision box, and allows various properties of the assets to be randomized.

To create a suite of methods specific to shapes (and not objects), we create the DynamicShapeSet class. The DynamicShapeSet class allows dynamic shapes to be spawned and managed, in addition to inheriting methods from the DynamicAssetSet class. These dynamic shapes are chosen from predefined classes found in omni.isaac.core.prims, namely DynamicCuboid, DynamicSphere, DynamicCylinder, DynamicCone, and DynamicCapsule. Each of these dynamic shape classes is implemented in a similar way; the respective shape prim is wrapped with both the RigidPrim class to provide rigid body attributes, and the GeometryPrim class to provide an API for collisions and physics materials. For the purposes of this tutorial, GeometryPrim is critical as it allows objects to collide rather than intersect, leading to more realistic shadows for better Sim-to-Real transfer.

Similarly, within DynamicObjectSet, dynamic objects are created using the DynamicObject class, which in turn provides a way to take an asset from its USD reference and wrap it with RigidPrim and GeometryPrim. Note that in this tutorial we use YCB objects (common household objects) in our DynamicObjectSet.

Finally, FlyingDistractors allows us to simultaneously manage multiple instances of both the DynamicShapeSet and DynamicObjectSet classes. For more information on FlyingDistractors, DynamicAssetSet, DynamicShapeSet, DynamicObjectSet, or DynamicObject, please refer to the class definitions in the standalone_examples/replicator/offline_pose_generation/ directory.
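The relationships described above can be summarized in a hypothetical skeleton. None of these method bodies are the real implementations. The real classes wrap prims with RigidPrim and GeometryPrim and apply forces through the Dynamic Control API. The method names apply_force_to_prims, apply_force_to_assets, and set_visible are taken from the snippets in this tutorial. Everything else is invented for illustration, including the last_forces record that stands in for real physics. DynamicObject is omitted.

```python
import random


class DynamicAssetSet:
    """Hypothetical base: behavior shared by every distractor set."""

    def __init__(self, asset_paths):
        self.asset_paths = list(asset_paths)
        self.visible = True
        self.last_forces = {}  # stand-in for forces that would be sent to the physics engine

    def apply_force_to_prims(self, force_limit, rng=random):
        # One random 3D force per asset, each component in [-force_limit, force_limit]
        for path in self.asset_paths:
            self.last_forces[path] = [rng.uniform(-force_limit, force_limit) for _ in range(3)]

    def set_visible(self, visible):
        self.visible = visible


class DynamicShapeSet(DynamicAssetSet):
    """Hypothetical: primitive shapes (cuboids, spheres, cylinders, cones, capsules)."""


class DynamicObjectSet(DynamicAssetSet):
    """Hypothetical: USD-referenced objects such as YCB models."""


class FlyingDistractors:
    """Hypothetical aggregate that forwards calls to every set it manages."""

    def __init__(self, asset_sets):
        self.asset_sets = list(asset_sets)

    def apply_force_to_assets(self, force_limit, rng=random):
        for asset_set in self.asset_sets:
            asset_set.apply_force_to_prims(force_limit, rng)

    def set_visible(self, visible):
        for asset_set in self.asset_sets:
            asset_set.set_visible(visible)
```

Keeping the aggregate separate from the sets is what lets __next__ swap the whole MESH group for the DOME group through a single current_distractors reference.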

To keep our flying distractors in motion, we define the apply_force_to_prims() method in DynamicAssetSet to apply random forces to them.

Applying Forces in Random Directions.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
def apply_force_to_prims(self, force_limit):
    """Apply force in random direction to prims in dynamic asset set"""

    for path in itertools.chain(self.glass_asset_paths, self.nonglass_asset_paths):

        # X, Y, and Z components of the force are constrained to be within [-force_limit, force_limit]
        random_force = np.random.uniform(-force_limit, force_limit, 3).tolist()

        handle = self.world.dc_interface.get_rigid_body(path)

        self.world.dc_interface.apply_body_force(handle, random_force, (0, 0, 0), False)

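On lines 9-11, the Dynamic Control API is used to look up each distractor's rigid body handle and apply the random force to it. The force is applied at position (0, 0, 0). The final False argument indicates that the force and position are given in the body's local coordinates rather than in global coordinates.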

6.3.3. Adding Object of Interest

In this tutorial, we use the YCB Cracker Box asset as our object of interest, on which the pose estimation model will be trained. To train a pose estimation model on a different object, replace the Cracker Box with an object of your choosing.

Adding the Object of Interest.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
    # Add the part to train the network on
    part_name = "003_cracker_box"
    ref_path = self.asset_path + part_name + ".usd"
    prim_type = f"_{part_name[1:]}"
    path = "/World/" + prim_type
    mesh_path = path + "/" + prim_type
    name = "train_part"

    self.train_part_mesh_path_to_prim_path_map[mesh_path] = path

    train_part = DynamicObject(
        usd_path=ref_path,
        prim_path=path,
        mesh_path=mesh_path,
        name=name,
        position=np.array([0.0, 0.0, 0.0]),
        scale=OBJECT_SCALE,
        mass=1.0,
    )

    train_part.prim.GetAttribute("physics:rigidBodyEnabled").Set(False)

    self.train_parts.append(train_part)

    # Add semantic information
    mesh_prim = world.stage.GetPrimAtPath(mesh_path)
    add_update_semantics(mesh_prim, prim_type)

To prevent the object of interest from being knocked off-screen by a colliding distractor, we disable its rigid body simulation on line 21 by setting physics:rigidBodyEnabled to False. Then, semantic information is added to the part on lines 26-27, similar to the approach in the Adding Semantics to a Scene tutorial in the omni.replicator documentation.
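The naming on lines 2-6 is easy to misread, so the sketch below reproduces it as a standalone function (train_part_paths is a hypothetical name). The leading "0" of "003_cracker_box" is replaced with an underscore, presumably because USD prim names cannot begin with a digit. The resulting prim_type string is also the semantic label applied on line 27.

```python
def train_part_paths(part_name):
    """Reproduce the prim naming from lines 2-6 (sketch)."""
    # Presumably avoids a prim name starting with a digit: "003_..." -> "_03_..."
    prim_type = f"_{part_name[1:]}"
    path = "/World/" + prim_type
    mesh_path = path + "/" + prim_type
    return prim_type, path, mesh_path


print(train_part_paths("003_cracker_box"))
# ('_03_cracker_box', '/World/_03_cracker_box', '/World/_03_cracker_box/_03_cracker_box')
```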

6.3.4. Domain Randomization

We define the following functions for domain randomization:

Randomize Lighting.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
def randomize_sphere_lights():
    lights = rep.create.light(
        light_type="Sphere",
        color=rep.distribution.uniform((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
        intensity=rep.distribution.uniform(100000, 3000000),
        position=rep.distribution.uniform((-250, -250, -250), (250, 250, 100)),
        scale=rep.distribution.uniform(1, 20),
        count=NUM_LIGHTS,
    )
    return lights.node

We randomize the color, intensity, position, and size of the sphere lights in lines 4-7. This allows us to generate scenes under different lighting scenarios for the MESH and DOME datasets.
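To make the sampled ranges concrete, the sketch below draws sphere-light parameters with Python's random module instead of rep.distribution.uniform. It assumes that each component of a tuple-valued range is drawn independently, and it uses 4 as a stand-in for NUM_LIGHTS. The function names are illustrative. Note that the upper z bound of the position range is 100 rather than 250.

```python
import random

# Ranges copied from randomize_sphere_lights(); the sampler itself is a plain-Python stand-in
COLOR_RANGE = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))
INTENSITY_RANGE = (100000, 3000000)
POSITION_RANGE = ((-250, -250, -250), (250, 250, 100))
SCALE_RANGE = (1, 20)


def uniform_vec(lo, hi, rng):
    """Sample each component independently from [lo[i], hi[i]]."""
    return tuple(rng.uniform(a, b) for a, b in zip(lo, hi))


def sample_sphere_light(rng):
    """One light's worth of randomized parameters."""
    return {
        "color": uniform_vec(*COLOR_RANGE, rng),
        "intensity": rng.uniform(*INTENSITY_RANGE),
        "position": uniform_vec(*POSITION_RANGE, rng),
        "scale": rng.uniform(*SCALE_RANGE),
    }


rng = random.Random(7)
lights = [sample_sphere_light(rng) for _ in range(4)]  # 4 stands in for NUM_LIGHTS
```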

Randomize Domelight.
1
2
3
4
5
6
7
8
def randomize_domelight(texture_paths):
    lights = rep.create.light(
        light_type="Dome",
        rotation=rep.distribution.uniform((0, 0, 0), (360, 360, 360)),
        texture=rep.distribution.choice(texture_paths)
    )

    return lights.node

The rotation and texture of the Dome Light are randomized on lines 4-5. This gives the samples similar to the DOME dataset a randomly selected background with realistic lighting conditions.

Randomize Shape Properties.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
def randomize_colors(prim_path_regex):
    prims = rep.get.prims(path_pattern=prim_path_regex)

    mats = rep.create.material_omnipbr(
        metallic=rep.distribution.uniform(0.0, 1.0),
        roughness=rep.distribution.uniform(0.0, 1.0),
        diffuse=rep.distribution.uniform((0, 0, 0), (1, 1, 1)),
        count=100,
    )
    with prims:
        rep.randomizer.materials(mats)
    return prims.node

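We randomize the metallic, roughness, and diffuse color properties on lines 5-7, creating a pool of 100 materials (line 8). rep.randomizer.materials() then assigns materials from this pool to the matched distractor shapes. This allows a wide variety of material properties to be present in the distractors.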
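randomize_colors() selects its targets with rep.get.prims(path_pattern=...). The tutorial passes the pattern "(?=.*shape)(?=.*nonglass).*" when calling it. Both lookaheads must succeed, so a path is selected only if it contains both "shape" and "nonglass", in either order. Glass shapes are therefore left out; __next__ handles them separately through randomize_asset_glass_color(). The sketch uses Python's re.match as a stand-in for Replicator's matching, which is an assumption. For this pattern, however, re.match and re.search select the same paths. The prim paths are hypothetical.

```python
import re

# Pattern passed to randomize_colors() in the registration code
SHAPE_NONGLASS = re.compile(r"(?=.*shape)(?=.*nonglass).*")

# Hypothetical prim paths; the real paths depend on how the distractor sets are named
paths = [
    "/World/mesh_distractors/shape_set/nonglass/cube_0",
    "/World/mesh_distractors/shape_set/glass/sphere_3",
    "/World/mesh_distractors/object_set/nonglass/mug_1",
    "/World/_03_cracker_box",
]
selected = [p for p in paths if SHAPE_NONGLASS.match(p)]
print(selected)  # ['/World/mesh_distractors/shape_set/nonglass/cube_0']
```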

To apply these domain randomization functions, we must register them with rep.randomizer and then call them inside a trigger such as rep.trigger.on_frame(). For more examples of using omni.replicator.core.randomizer, please see the omni.replicator documentation.

Registering and Calling Domain Randomization
1
2
3
4
5
6
rep.randomizer.register(randomize_sphere_lights, override=True)
rep.randomizer.register(randomize_colors, override=True)

with rep.trigger.on_frame():
    rep.randomizer.randomize_sphere_lights()
    rep.randomizer.randomize_colors("(?=.*shape)(?=.*nonglass).*")

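Note that randomize_domelight() is registered separately, in the _setup_dome_randomizers() function that __next__ calls once the MESH samples are complete. We don't register it initially because the Dome Light should not be randomized while the images for the MESH dataset are being generated.

Registering the Dome Light Randomizer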

1
2
3
4
5
6
rep.randomizer.register(randomize_domelight, override=True)

dome_texture_paths = [self.dome_texture_path + dome_texture + ".hdr" for dome_texture in DOME_TEXTURES]

with rep.trigger.on_frame(interval=self.dome_interval):
    rep.randomizer.randomize_domelight(dome_texture_paths)

randomize_movement_in_view() is another custom method we define to randomize the pose of the object of interest, while keeping it in view of the camera.

Randomize Movement in View.
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
def randomize_movement_in_view(self, prim):
    """Randomly move and rotate prim such that it stays in view of camera"""

    translation, orientation = get_random_world_pose_in_view(
        self.camera_path,
        MIN_DISTANCE,
        MAX_DISTANCE,
        self.fov_x,
        self.fov_y,
        FRACTION_TO_SCREEN_EDGE,
        self.rig.prim_path,
        MIN_ROTATION_RANGE,
        MAX_ROTATION_RANGE,
    )
    prim.set_world_pose(translation, orientation)

While get_random_world_pose_in_view() is simply a randomized version of how the pose of the collision box was determined, the set_world_pose() function is worth highlighting, as it provides a convenient way to set USD transform properties. As the object of interest is wrapped with RigidPrim, we can use RigidPrim’s set_world_pose() method. If you’d like to manipulate the pose of an object that doesn’t have rigid body attributes, please refer to XFormPrim.
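The translation half of get_random_world_pose_in_view() can be approximated as follows. This is a hypothetical reimplementation, not the function's source. It samples a depth between the two distances, then samples an x/y offset inside the visible area at that depth. That offset is shrunk by a fraction_to_edge factor standing in for FRACTION_TO_SCREEN_EDGE; the exact meaning of that constant in the script is an assumption. The real function also samples a rotation and converts the result to the world frame, as was done for the collision box.

```python
import math
import random


def random_point_in_view(min_dist, max_dist, fov_x, fov_y, fraction_to_edge, rng):
    """Sample a camera-frame point (camera looks down -Z) that projects inside the image.

    fraction_to_edge < 1 keeps the point away from the image border.
    """
    depth = rng.uniform(min_dist, max_dist)
    # Half-extent of the visible area at this depth, shrunk toward the image center
    half_w = depth * math.tan(fov_x / 2.0) * fraction_to_edge
    half_h = depth * math.tan(fov_y / 2.0) * fraction_to_edge
    x = rng.uniform(-half_w, half_w)
    y = rng.uniform(-half_h, half_h)
    return (x, y, -depth)


# Hypothetical fields of view (640x480 image, 640-pixel focal length)
fov_x = 2 * math.atan(0.5)
fov_y = 2 * math.atan(0.375)
point = random_point_in_view(0.5, 2.0, fov_x, fov_y, 0.8, random.Random(3))
```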

6.4. Generating Data

We randomize our scene, capture the ground-truth data, and send the data to our data writer within the __next__ function.

Generate Data
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
def __next__(self):

    if self.cur_idx == self.num_mesh:  # MESH dataset generation complete, switch to DOME dataset
        print(f"Starting DOME dataset generation of {self.num_dome} frames..")

        if rep.orchestrator.get_is_started():
            rep.orchestrator.stop()  # This is necessary to ensure that the first new frame will have been randomized

        # Increase subframes to 3 to clear the frames in flight and ensure dome light texture is loaded
        # See known issues: https://docs.omniverse.nvidia.com/prod_extensions/prod_extensions/ext_replicator.html
        rep.settings.carb_settings("/omni/replicator/RTSubframes", 3)

        # Hide the FlyingDistractors used for the MESH dataset
        self.mesh_distractors.set_visible(False)

        # Show the FlyingDistractors used for the DOME dataset
        self.dome_distractors.set_visible(True)

        # Switch the distractors to DOME
        self.current_distractors = self.dome_distractors

        # Randomize the dome backgrounds
        self._setup_dome_randomizers()

    # Randomize the distractors by applying forces to them and changing their materials
    self.current_distractors.apply_force_to_assets(FORCE_RANGE)
    self.current_distractors.randomize_asset_glass_color()

    # Randomize the pose of the object(s) of interest in the camera view
    for train_part in self.train_parts:
        self.randomize_movement_in_view(train_part)

    # Step physics, avoid objects overlapping each other
    world.step(render=False)
    world.step(render=False)

    print(f"ID: {self.cur_idx}/{self.train_size - 1}")
    rep.orchestrator.step()
    self.cur_idx += 1

    # Check if last frame has been reached
    if self.cur_idx >= self.train_size:
        print(f"Dataset of size {self.train_size} has been reached, generation loop will be stopped..")
        self.last_frame_reached = True

We begin the data generation process by generating samples similar to those in the MESH dataset. For each sample, we do the following:

(1) Apply random forces to keep the flying distractors in motion (line 26) and randomize the glass distractors' color (line 27).

(2) Randomize the pose of our object of interest (line 31), then step physics twice without rendering so that objects do not overlap (lines 34-35).

(3) Call rep.orchestrator.step() (line 38), which triggers the Replicator randomizers registered above to change the material properties of the distractor shapes and the sphere lights.

Once num_mesh samples have been generated (line 3), we prepare our scene for generating samples similar to the DOME dataset. First, the orchestrator is stopped so that the first new frame is randomized (lines 6-7). RTSubframes is then raised to 3 so that frames in flight are cleared and the dome light texture is loaded (line 11). Next, we dynamically modify which assets are visible. On lines 14-17, we hide the FlyingDistractors used for the MESH samples and show the smaller set of FlyingDistractors used for the DOME samples. Additionally, _setup_dome_randomizers() (line 23) defines our randomize_domelight function, registers it with rep.randomizer, and calls it within rep.trigger. This randomizer is triggered when rep.orchestrator.step() is called on line 38. That call also triggers the two previously registered randomizers, randomize_sphere_lights() and randomize_colors().
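The MESH-to-DOME switch is a one-time state change keyed on cur_idx. The sketch below mimics only that control flow, assuming train_size equals num_mesh + num_dome. frame_schedule is a hypothetical helper. For each frame, the real __next__ also applies forces, moves the object of interest, steps physics, and calls rep.orchestrator.step().

```python
def frame_schedule(num_mesh, num_dome):
    """Label each frame with the phase __next__ would be in (sketch).

    Assumes train_size == num_mesh + num_dome.
    """
    train_size = num_mesh + num_dome
    phase, schedule = "MESH", []
    for cur_idx in range(train_size):
        if cur_idx == num_mesh:  # same check as line 3: switch once, then stay in DOME
            phase = "DOME"
        schedule.append(phase)
    return schedule


print(frame_schedule(3, 2))  # ['MESH', 'MESH', 'MESH', 'DOME', 'DOME']
```

Because the check is an equality test rather than a range test, the scene setup for DOME runs exactly once. With num_mesh set to 0, the very first frame is already a DOME frame.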

For more information, please refer to offline_pose_generation.py.

6.5. Writing Output

In order to capture data, we first create a camera using rep.create.camera() and create a render product using rep.create.render_product().

Creating Camera and Render Product
1
2
3
4
5
6
7
8
9
# Setup camera and render product
self.camera = rep.create.camera(
    position=(0, 0, -MAX_DISTANCE),
    rotation=CAMERA_ROTATION,
    focal_length=focal_length,
    clipping_range=(0.01, 10000),
)

self.render_product = rep.create.render_product(self.camera, (WIDTH, HEIGHT))
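The snippet passes a focal_length whose computation is not shown here. Under USD's pinhole camera model, focal length and horizontal aperture share the same units, and the pixel focal length is F_X = focal_length * WIDTH / horizontal_aperture. The sketch below inverts that relation to get a focal length from the F_X used elsewhere in the tutorial. The 768-pixel F_X is a made-up value, and 20.955 is USD's default horizontalAperture. This may not be how the script itself computes focal_length.

```python
def focal_length_from_fx(f_x_px, width_px, horizontal_aperture):
    """Focal length (same units as the aperture) that yields f_x_px pixels across width_px."""
    return f_x_px * horizontal_aperture / width_px


def fx_from_focal_length(focal_length, width_px, horizontal_aperture):
    """Inverse relation: pixel focal length from the USD camera parameters."""
    return focal_length * width_px / horizontal_aperture


# Hypothetical values: 640 px wide image, F_X = 768 px, default 20.955 horizontal aperture
fl = focal_length_from_fx(768.0, 640, 20.955)  # ~25.146
```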

Then, we set up the writer that captures ground-truth data in the _setup_writer() function. We first initialize either the YCBVideo writer (line 8) or the DOPE writer (line 23), then attach the render product to the writer on line 31.

Setup Writers
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
def _setup_writer(self):
    """Setup the OV Replicator dataset writer and attach it to a render product.
    """

    if self.writer_helper == YCBVideoWriter:
        # Initialize and attach Replicator writer
        self.writer = rep.WriterRegistry.get("YCBVideoWriter")
        self.writer.initialize(
            output_dir=self._output_folder,
            num_frames=self.train_size,
            semantic_types=None,
            rgb=True,
            bounding_box_2d_tight=True,
            semantic_segmentation=True,
            distance_to_image_plane=True,
            pose=True,
            class_name_to_index_map=CLASS_NAME_TO_INDEX,
            factor_depth=10000,
            intrinsic_matrix=np.array([[F_X, 0, C_X], [0, F_Y, C_Y], [0, 0, 1]]),
        )
    elif self.writer_helper == DOPEWriter:
        self.writer = rep.WriterRegistry.get("DOPEWriter")
        self.writer.initialize(
            output_dir=self._output_folder,
            class_name_to_index_map=CLASS_NAME_TO_INDEX,
            use_s3=self.use_s3,
            bucket_name=self.bucket,
            endpoint_url=self.endpoint,
        )

    self.writer.attach([self.render_product])

For more details on how the DOPE or YCBVideo writers are defined, refer to dope_writer.py or ycbvideo_writer.py located in the omni.replicator.isaac extensions folder. To see how the custom annotator nodes are defined, refer to OgnDope.ogn, OgnDope.py or OgnPose.ogn, OgnPose.py in the omni.replicator.isaac extensions folder.
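The YCBVideo writer is given an intrinsic matrix built from F_X, F_Y, C_X, and C_Y, along with factor_depth=10000. For orientation, the sketch below shows what those two values conventionally mean in YCB Video-style data. The first is a pinhole projection with K. The second is a depth image stored as integers equal to depth times factor_depth. The point uses the OpenCV camera convention (z forward), not Isaac-Sim's -z-forward camera frame. The intrinsics are made-up values. Whether ycbvideo_writer.py applies exactly this rounding and uint16 clipping is an assumption; check the writer source.

```python
def project(point_cam, f_x, f_y, c_x, c_y):
    """Pinhole projection with K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]].

    point_cam uses the OpenCV convention: x right, y down, z forward.
    """
    x, y, z = point_cam
    return (f_x * x / z + c_x, f_y * y / z + c_y)


def encode_depth(depth, factor_depth=10000):
    """Store depth as an integer (depth * factor_depth), clipped to the uint16 range."""
    value = int(round(depth * factor_depth))
    return max(0, min(value, 65535))


print(project((0.0, 0.0, 1.0), 768.0, 768.0, 320.0, 240.0))  # (320.0, 240.0)
print(encode_depth(1.25))  # 12500
```

Under these assumptions, a factor of 10000 caps the representable depth at 6.5535 units before clipping.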

6.6. Switching Writers

Feel free to skip this section if you are only writing data in the YCB Video format.

We also have the option to switch to the DOPE writer and output data in the format used to train a DOPE network. To do so, simply specify --writer DOPE when running offline_pose_generation.py.

There is also the option to write directly to an s3 bucket when using the DOPE writer. To write to an s3 bucket instead of your local machine, pass the --use_s3 flag at runtime. The DOPE writer uses the boto3 module to write to the bucket. boto3 expects a configuration file at ~/.aws/config, which stores your credentials and allows boto3 to authenticate before writing to the endpoint. A sample file is shown below. To set it up, copy it into ~/.aws/config and replace the placeholders with your credentials.
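Before a long run with --use_s3, it can help to confirm that the configuration file parses and contains both credential keys. This standard-library sketch is not part of the tutorial and does not contact s3; it only checks the file's structure. The function names are hypothetical.

```python
import configparser
import os

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_aws_keys(config_text, profile="default"):
    """Return the required keys that are absent or empty in a profile of ~/.aws/config-style text."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not parser.get(profile, key, fallback="").strip()]


def check_local_config(path=os.path.expanduser("~/.aws/config")):
    """Check the real file; returns None if it does not exist."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return missing_aws_keys(f.read())
```

An empty list from check_local_config() means both keys are present; it says nothing about whether the credentials themselves are valid.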

Sample Config File
[default]
aws_access_key_id = <username>
aws_secret_access_key = <secret_key>
region = us-east-1

6.7. Summary

This tutorial covered the following topics:

  1. Getting transformation matrices with ComputeLocalToWorldTransform(), and manipulating transformations with the omni.isaac.core.utils.transformations library.

  2. Wrapping prims with the classes found in omni.isaac.core.prims, allowing a simple yet powerful set of APIs to be used. GeometryPrim was used for collisions, RigidPrim for rigid body attributes, and XFormPrim to get and set poses.

  3. Creating custom randomization functions, allowing us to randomize (1) the pose of a prim and (2) the force applied to a prim to keep it in motion.

  4. Registering randomization functions with OV Replicator to randomize (1) the properties of sphere lights in the scene, (2) the material properties of distractor shapes, and (3) the texture files of background dome lights.

  5. Switching to a new writer to output data suitable for training a DOPE model.

  6. Setting up s3 credentials to write data directly to an s3 bucket.

6.7.1. Next steps

The generated synthetic data can now be used to train a 6D pose estimation model, such as PoseCNN or PoseRBPF.