3. Offline Dataset Generation

3.1. Learning Objectives

This tutorial demonstrates how to generate a synthetic dataset offline that can be used for training deep neural networks. The full example can be executed through the Isaac Sim Python environment, and this tutorial uses standalone_examples/replicator/offline_generation.py to demonstrate how to use the omni.replicator extension to collect ground-truth information from the sensors that come with omni.replicator.

After this tutorial, you should be able to collect and save sensor data from a stage and randomize components in it.

25-30 min tutorial

3.1.1. Prerequisites

Read the Getting Started With Replicator document to become familiar with the basics of omni.replicator

3.2. Getting Started

To generate a synthetic dataset offline, run the following command.

./python.sh standalone_examples/replicator/offline_generation.py

3.3. Start Omniverse Isaac Sim running

The code for this tutorial is a little different from the standard omni.replicator examples, which are generally written to be run in the Script Editor inside the Kit GUI.

After setting a few configuration parameters, we start up an instance of Omniverse Isaac Sim in headless mode. The SimulationApp object must be created before omni.replicator.core can be imported. In fact, most omni libraries will not load in a python.sh script without first creating an instance of SimulationApp.

Starting Isaac Sim
from omni.isaac.kit import SimulationApp

# Set rendering parameters and create an instance of kit
CONFIG = {"renderer": "RayTracedLighting", "headless": True, "width": 1024, "height": 1024, "num_frames": 10}
STAGE = "/Isaac/Environments/Simple_Warehouse/full_warehouse.usd"

kit = SimulationApp(launch_config=CONFIG)

# we will be using the replicator library
import omni.replicator.core as rep

# os is used later when building the output directory path
import os

3.4. Loading the Environment

The environment is a USD stage. Once a new layer is created with rep.new_layer, the stage is loaded in that layer with the rep.create.from_usd function.

Important

The fact that we load the stage in a new layer is important. If a prim in the original stage is called, for example, /World/Camera/RandomCamera, then in the new layer that same prim will be named /Replicator/Ref/Camera/RandomCamera. You can see this when we choose the object to look at with our camera.

Load the stage
with rep.new_layer():
    print("Loading Stage")
    env = rep.create.from_usd(STAGE)
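
If you need to refer to a prim from the loaded stage later, remember this prefix. The snippet below is a minimal sketch (not part of the original script), assuming the warehouse stage contains a prim named SM_CardBoxA_3; it would sit inside the same with rep.new_layer() block:

    # Hypothetical: query the cardboard box through replicator's prim query API.
    # In the new layer its full path is /Replicator/Ref/SM_CardBoxA_3, not the
    # path it had in the original USD file.
    box = rep.get.prims(path_pattern="SM_CardBoxA_3")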

3.5. Creating The Sensors

We create the sensors by specifying a camera, with rep.create.camera, and a render product, with rep.create.render_product. The data itself is written by a writer, which we retrieve with rep.WriterRegistry.get("BasicWriter"); the BasicWriter outputs NumPy data. The writer can output any or all image types, like distance or segmentation data, and it can save data for things like bounding boxes in a non-image format.

Which type of data we gather with the camera will be determined when we initialize a Writer, which we will talk about later.

Creating the camera
    camera = rep.create.camera(position=(-2.5, 1.2, 2.5), clipping_range=(0.01, 10000.0))
    render_product = rep.create.render_product(camera, (CONFIG["width"], CONFIG["height"]))
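
Writers are not the only way to consume a render product. If you want to inspect the data directly in the script rather than writing it to disk, omni.replicator also provides annotators. The snippet below is an illustrative sketch, not part of offline_generation.py, and assumes the "rgb" annotator is available in your version of omni.replicator.core:

    # Hypothetical: attach an RGB annotator to the render product so the image
    # can be read back as a NumPy array once the graph has been stepped.
    rgb_annotator = rep.AnnotatorRegistry.get_annotator("rgb")
    rgb_annotator.attach([render_product])
    # After at least one rep.orchestrator.step() has run:
    # rgb_data = rgb_annotator.get_data()  # HxWx4 uint8 array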

3.6. Domain Randomization

In this example, we will randomize the camera position. There are many other things that can be randomized, and you are encouraged to read the omni.replicator documentation to see them all!

We want the camera to move around the stage and look back at a single object from different angles. By fixing the object the camera looks at and randomizing the camera's position, we do just that.

Domain Randomization
    with rep.trigger.on_frame(num_frames=CONFIG["num_frames"]):
        with camera:
            rep.modify.pose(
                position=rep.distribution.uniform((-6.00, -10.0, 1.0), (4.00, 7.0, 5.0)),
                look_at="/Replicator/Ref/SM_CardBoxA_3",
            )
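
The same trigger can drive more than one randomization. As an illustration (not part of the original script), the cardboard box itself could be given a random rotation inside the same rep.trigger.on_frame block; the prim pattern and rotation range below are assumptions:

        # Hypothetical addition inside the same trigger block: also give the
        # cardboard box a random rotation about the vertical axis each frame.
        box = rep.get.prims(path_pattern="SM_CardBoxA_3")
        with box:
            rep.modify.pose(rotation=rep.distribution.uniform((0, 0, 0), (0, 0, 360)))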

3.7. Choosing the data and format

When writing the data, we pick the format we want by choosing a writer type, and then initialize the writer with the sensor outputs we would like. In this case, we want to output all possible data, so we set all sensor types to True when we initialize the writer. Finally, we attach the render_product to the writer.

At this point, the data is not yet written; it is only set up to be written. The run_orchestrator function is what actually makes the writer write.

Data will be saved in the _output_headless subfolder of the directory the process was started from; the full path is printed when the script is executed.

Note

The run_orchestrator function is left as the last line, and we will talk about its importance in the next section.

Choose the data to write and its format
    writer = rep.WriterRegistry.get("BasicWriter")
    output_directory = os.getcwd() + "/_output_headless"
    print("Outputting data to ", output_directory)
    writer.initialize(
        output_dir=output_directory,
        rgb=True,
        bounding_box_2d_tight=True,
        bounding_box_2d_loose=True,
        semantic_segmentation=True,
        instance_segmentation=True,
        distance_to_camera=True,
        distance_to_image_plane=True,
        bounding_box_3d=True,
        occlusion=True,
        normals=True,
        motion_vectors=True
    )

    writer.attach([render_product])
    run_orchestrator()
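
Once the script has finished, the saved data can be loaded back for inspection or fed into a training pipeline. The snippet below is a minimal, hedged sketch: the exact file names depend on the BasicWriter version and the sensors you enabled, so treat the paths as assumptions and adapt them to what you actually find in _output_headless.

import os
import numpy as np
from PIL import Image

output_directory = os.path.join(os.getcwd(), "_output_headless")

# Assumed file names; check the output folder for the naming your writer uses.
rgb = np.asarray(Image.open(os.path.join(output_directory, "rgb_0000.png")))
depth = np.load(os.path.join(output_directory, "distance_to_camera_0000.npy"))
print("RGB shape:", rgb.shape, "depth shape:", depth.shape)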

3.8. Collecting randomized data

Once the environment is set up, the parameters we wish to randomize are set, the data format is chosen, and the sensor information we want to output is chosen, we need something to control all those pieces. That is where rep.orchestrator comes in.

The first thing we do is start it running with rep.orchestrator.run(). In fact, if this were being done in the Script Editor in Kit, that is all we would have to do.

Because we are running headless in a standalone script, we need to make sure the orchestrator has everything started before we proceed. Once it has started, we then have to wait until the orchestrator is finished.

Finally, rep.BackendDispatch.wait_until_done() makes sure all the files we opened are closed, so we do not end up with corrupt or missing data.

Collecting the data
def run_orchestrator():
    rep.orchestrator.run()

    # Wait until started
    while not rep.orchestrator.get_is_started():
        kit.update()

    # Wait until stopped
    while rep.orchestrator.get_is_started():
        kit.update()

    rep.BackendDispatch.wait_until_done()
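
In the standalone script, the SimulationApp should also be shut down once data collection has finished so the process exits cleanly. A minimal sketch of how the end of such a script might look (the original file may structure this differently):

# Hypothetical end of the standalone script: once run_orchestrator() has
# returned and all files have been written, close the SimulationApp.
kit.close()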

3.9. Summary

This tutorial covered the following topics:

  1. Starting a SimulationApp instance of Omniverse Isaac Sim to work with replicator

  2. Loading a stage into a replicator layer

  3. Setting up a camera and randomizing its position

  4. Choosing which data and which format to write out

  5. Using orchestrator to run the data collection.

3.9.1. Next Steps

One possible use for the created data is with the TAO Toolkit.

Once the generated synthetic data is in KITTI format, you can use the TAO Toolkit to train a model. TAO provides segmentation, classification, and object detection models. This example uses object detection with the Detectnet V2 model as a use case.

To get started with TAO, follow the setup instructions. Then, activate the virtual environment and download the Jupyter notebooks as explained in detail here.

TAO uses Jupyter notebooks to guide you through the training process. In the folder cv_samples_v1.3.0, you will find notebooks for multiple models. You can use any of the object detection networks for this use case, but this example uses Detectnet_V2.

In the detectnet_v2 folder, you will find the Jupyter notebook and the specs folder. The TAO Detectnet_V2 documentation goes into more detail about this sample. TAO works with configuration files that can be found in the specs folder. Here you need to modify the specs to refer to the generated synthetic data as the input.

To prepare the data, you need to run the following command.

tao detectnet_v2 dataset-convert [-h] -d DATASET_EXPORT_SPEC -o OUTPUT_FILENAME [-f VALIDATION_FOLD]

This is in the Jupyter notebook with a sample configuration. Modify the spec file to match the folder structure of your synthetic data. The data will be in TFRecord format and is then ready for training. Again, you need to change the spec file for training to point to the synthetic data and the classes being detected.

tao detectnet_v2 train [-h] -k <key>
                        -r <result directory>
                        -e <spec_file>
                        [-n <name_string_for_the_model>]
                        [--gpus <num GPUs>]
                        [--gpu_index <comma separate gpu indices>]
                        [--use_amp]
                        [--log_file <log_file>]

For any questions regarding the TAO Toolkit, refer to the TAO documentation, which goes into further detail.

3.9.2. Further Learning

To learn how to use Omniverse Isaac Sim to create datasets interactively, see the Synthetic Data Recorder, and then visualize them with the Synthetic Data Visualizer.