# 5. Online Generation

## 5.1. Learning Objectives

This example extends the visualize synthetic data sample and demonstrates how to set up a PyTorch DataLoader and train Deep Neural Networks (DNNs) in an online manner. The full example can be executed through the Isaac-Sim Python environment; in this tutorial we will examine that script section by section.

In this tutorial, we will integrate scene generation and groundtruth collection into a PyTorch dataloader that we will use to train a Mask-RCNN instance segmentation model.

## 5.2. Mesh Converter

Before we can generate data, we need to convert the ShapeNet assets in the database to USD. We assume you have already downloaded the ShapeNet dataset to a local directory. First, we set an environment variable to tell the script where to find the ShapeNet dataset locally:

    export SHAPENET_LOCAL_DIR=<path/to/shapenet>


We will convert only the geometry to allow for quick loading of assets into our scene. With the SHAPENET_LOCAL_DIR variable set, run the following script. Note that this will create a new directory at {SHAPENET_LOCAL_DIR}_nomat, where the geometry-only USD files will be stored.

    ./python.sh standalone_examples/api/omni.isaac.shapenet/usd_convertor.py --categories plane watercraft rocket --max-models 100


Here we’ve told the script to convert the plane, watercraft, and rocket categories, with a maximum of 100 models per category.

To run the example, use the following command.

    ./python.sh standalone_examples/replicator/online_generation/generate_shapenet.py \
    --root $SHAPENET_LOCAL_DIR'_nomat' \
    --categories plane watercraft rocket \
    --max-asset-size 50

The generate_shapenet.py script will generate an endless stream of randomized data with which to train. To see a visualization of the data our dataset produces with the plane, watercraft, and rocket categories selected, open the locally saved image file domain_randomization_test_image_*.png.

### 5.3.1. The Code

#### 5.3.1.1. Core of dataloader

To create a dataloader, we will use PyTorch’s torch.utils.data.IterableDataset class to generate an endless stream of random scenes and their corresponding groundtruth. The basic structure of the dataset is shown below:

    class MyAwesomeDataset(torch.utils.data.IterableDataset):
        def __init__(self):
            setup_scene()

        def __iter__(self):
            # The dataset acts as its own iterator
            return self

        def __next__(self):
            populate_scene()
            randomize_scene()
            gt = collect_groundtruth()
            return gt

Now that we have a skeleton of what we want to do, let’s put our dataset together by filling in the __next__ method. We first generate and randomize our scene, from the self.populate_scene() call through the RandomizeOnceCommand call. The next step is to collect the groundtruth with self.sd_helper.get_groundtruth(). The code that follows, from the RGB section through the target dictionary, prepares the data for our model to consume and is in large part specific to the model we are using and our application.

    def __iter__(self):
        return self

    def __next__(self):
        from omni.isaac.core.utils.stage import is_stage_loading

        # Generate a new scene
        self.populate_scene()
        self.randomize_camera()

        # The update calls below set the paths of the prims that need to be
        # randomized, using the settings provided in their corresponding
        # DR create component.

        # In this example, either update texture or color of assets
        # self.update_dr_comp(self.color_comp)
        self.update_dr_comp(self.texture_comp)

        # Also update movement, rotation and scale components
        # self.update_dr_comp(self.movement_comp)
        # self.update_dr_comp(self.rotation_comp)
        self.update_dr_comp(self.scale_comp)

        # randomize once
        self.dr.commands.RandomizeOnceCommand().do()

        # step once and then wait for materials to load
        self.kit.update()
        print("waiting for materials to load...")
        # Keep stepping until the stage reports that loading has finished
        while is_stage_loading():
            self.kit.update()
        print("done")
        self.kit.update()

        # Collect Groundtruth
        gt = self.sd_helper.get_groundtruth(["rgb", "boundingBox2DTight", "instanceSegmentation"], self.viewport)

        # RGB
        # Drop alpha channel
        image = gt["rgb"][..., :3]

        # Cast to tensor if numpy array
        if isinstance(gt["rgb"], np.ndarray):
            image = torch.tensor(image, dtype=torch.float, device="cuda")

        # Normalize between 0. and 1. and change order to channel-first.
        image = image.float() / 255.0
        image = image.permute(2, 0, 1)

        # Bounding Box
        gt_bbox = gt["boundingBox2DTight"]

        # Create mapping from categories to index (0 is left for the background)
        mapping = {cat: i + 1 for i, cat in enumerate(self.categories)}
        bboxes = torch.tensor(gt_bbox[["x_min", "y_min", "x_max", "y_max"]].tolist())

        # For each bounding box, map semantic label to label index
        labels = torch.LongTensor([mapping[bb["semanticLabel"]] for bb in gt_bbox])

        # Calculate the area of each bounding box
        areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])

        # Identify invalid bounding boxes to filter final output
        valid_areas = (areas > 0.0) * (areas < (image.shape[1] * image.shape[2]))

        # Instance Segmentation
        instance_data, instance_mappings = gt["instanceSegmentation"][0], gt["instanceSegmentation"][1]
        instance_list = [im[0] for im in gt_bbox]
        # np.bool was removed from recent NumPy releases; the builtin bool is equivalent
        masks = np.zeros((len(instance_list), *instance_data.shape), dtype=bool)
        for i, instances in enumerate(instance_list):
            masks[i] = np.isin(instance_data, instances)
        if isinstance(masks, np.ndarray):
            masks = torch.tensor(masks, device="cuda")

        target = {
            "boxes": bboxes[valid_areas],
            "labels": labels[valid_areas],
            "masks": masks[valid_areas],
            "image_id": torch.LongTensor([self.cur_idx]),
            "area": areas[valid_areas],
            "iscrowd": torch.BoolTensor([False] * len(bboxes[valid_areas])),  # Assume no crowds
        }

        self.cur_idx += 1
        return image, target

Details about the rest of the dataloader, including the initialization step and the methods called within __next__, are explained in the sections below.

#### 5.3.1.2. Initialization step

We first launch kit using SimulationApp and pass it our rendering configuration. Once the app starts, the default Isaac extensions are hot loaded and we can import from them. We then set up the SyntheticDataHelper that we used in the earlier examples, as well as our nucleus server, which we use in this example to manage our domain randomization assets. Domain randomization is entirely handled through the dr extension.

    from omni.isaac.kit import SimulationApp

    # Setup default generation variables
    # Values are (min, max) ranges
    RANDOM_TRANSLATION_X = (-30.0, 30.0)
    RANDOM_TRANSLATION_Z = (-30.0, 30.0)
    RANDOM_ROTATION_Y = (0.0, 360.0)
    SCALE = 20
    CAMERA_DISTANCE = 300
    BBOX_AREA_THRESH = 16

    # Default rendering parameters
    RENDER_CONFIG = {"renderer": "PathTracing", "samples_per_pixel_per_frame": 12, "headless": False}


    class RandomObjects(torch.utils.data.IterableDataset):
        def __init__(
            self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
        ):
            assert len(categories) > 1
            assert (split > 0) and (split <= 1.0)

            self.kit = SimulationApp(RENDER_CONFIG)
            # These extensions can only be imported once the app is running
            from omni.isaac.synthetic_utils import SyntheticDataHelper
            from omni.isaac.shapenet import utils
            import omni.isaac.dr as dr
            from omni.isaac.core.utils.nucleus_utils import find_nucleus_server

            self.sd_helper = SyntheticDataHelper()
            self.dr = dr
            self.dr.commands.ToggleManualModeCommand().do()
            self.stage = self.kit.context.get_stage()

            result, nucleus_server = find_nucleus_server()
            if result is False:
                carb.log_error("Could not find nucleus server with /Isaac folder")
                return
            self.asset_path = nucleus_server + "/Isaac"
            . . .

The self._find_usd_assets() method searches the category directories we’ve specified within the root directory for USD files and returns their paths. When we want to add a new asset to our scene, we simply pick a path at random and attach it as a reference to a new prim in our scene. We use split to select a subset of training samples so that we can keep a hold-out set for validation. Finally, self._setup_world() creates a room, lights, and a camera.

    class RandomObjects(torch.utils.data.IterableDataset):
        def __init__(
            self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
        ):
            . . .
            # If ShapeNet categories are specified with their names, convert to synset ID
            # Remove this if using with a different dataset than ShapeNet
            category_ids = [utils.LABEL_TO_SYNSET.get(c, c) for c in categories]
            self.categories = category_ids
            self.range_num_assets = (num_assets_min, max(num_assets_min, num_assets_max))
            self.references = self._find_usd_assets(root, category_ids, max_asset_size, split, train)

            self._setup_world()
            self.cur_idx = 0
            self.exiting = False

            signal.signal(signal.SIGINT, self._handle_exit)

        def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
            ...  # (see code for implementation details)

        def _setup_world(self):
            ...  # (see code for implementation details)

#### 5.3.1.3. Load Assets

Now, we want to load assets and place them in our scene so that they rest on the ground plane in random poses. We will create a load_single_asset method that does just that. Note that to ensure the asset is resting on the ground, we simply get its bounds with ComputeWorldBound() and translate it by the negative of the minimum y (up-axis) value of its bounding box.

    def load_single_asset(self, ref, semantic_label, suffix=""):
        """Load a USD asset with random pose.

        Args:
            ref (str): Path to the USD that this prim will reference.
            semantic_label (str): Semantic label.
            suffix (str): String to add to the end of the prim's path.
        """
        from pxr import Usd, UsdGeom
        from omni.isaac.core.utils.prims import create_prim, get_prim_path
        from omni.isaac.core.utils.rotations import euler_angles_to_quat

        x = random.uniform(*RANDOM_TRANSLATION_X)
        z = random.uniform(*RANDOM_TRANSLATION_Z)
        rot_y = random.uniform(*RANDOM_ROTATION_Y)
        asset = None
        try:
            _asset = create_prim(
                f"/World/Asset/mesh{suffix}",
                "Xform",
                scale=np.array([SCALE, SCALE, SCALE]),
                orientation=euler_angles_to_quat(np.array([0.0, rot_y, 0.0])),
                usd_path=ref,
                semantic_label=semantic_label,
            )
            asset = _asset
        except Exception:
            # Log the failure and dump the current stage paths for debugging
            carb.log_warn("load_single_asset failure")
            print(ref, semantic_label, suffix)
            print("CURRENT PATHS**********************************")
            curr_prim = self.stage.GetPrimAtPath("/")
            for prim in Usd.PrimRange(curr_prim):
                print(get_prim_path(prim))
            print("END ERROR PRINTS********************************")

        # Rest the asset on the ground by offsetting it by its lowest point
        bound = UsdGeom.Mesh(asset).ComputeWorldBound(0.0, "default")
        box_min_y = bound.GetBox().GetMin()[1]
        UsdGeom.XformCommonAPI(asset).SetTranslate((x, -box_min_y, z))
        return asset

#### 5.3.1.4. Populate scene

Now that we can generate a single asset, we can populate the whole scene by simply looping through the number of assets we want to generate.

    def populate_scene(self):
        """Clear the scene and populate it with assets."""
        from omni.isaac.core.utils.prims import delete_prim

        delete_prim("/World/Asset")
        self.assets = []
        num_assets = random.randint(*self.range_num_assets)
        for i in range(num_assets):
            category = random.choice(list(self.references.keys()))
            ref = random.choice(self.references[category])
            self.assets.append(self.load_single_asset(ref, category, i))

#### 5.3.1.5. Randomize scene

Every time we query the dataset for new images, we want to randomize the scene. We manage this process through the domain randomization extension, which is covered in more detail elsewhere in the documentation.

    def create_dr_comp(self):
        """Creates DR components with various attributes.
        The asset prims to randomize is an empty list for most components
        since we get a new list of assets every iteration.
        The asset list will be updated for each component in update_dr_comp()
        """
        texture_list = [
            self.asset_path + "/Samples/DR/Materials/Textures/checkered.png",
            self.asset_path + "/Samples/DR/Materials/Textures/marble_tile.png",
            self.asset_path + "/Samples/DR/Materials/Textures/picture_a.png",
            self.asset_path + "/Samples/DR/Materials/Textures/picture_b.png",
            self.asset_path + "/Samples/DR/Materials/Textures/textured_wall.png",
            self.asset_path + "/Samples/DR/Materials/Textures/checkered_color.png",
        ]
        material_list = [
            self.asset_path + "/Samples/DR/Materials/checkered.mdl",
            self.asset_path + "/Samples/DR/Materials/checkered_color.mdl",
            self.asset_path + "/Samples/DR/Materials/marble_tile.mdl",
            self.asset_path + "/Samples/DR/Materials/picture_a.mdl",
            self.asset_path + "/Samples/DR/Materials/picture_b.mdl",
            self.asset_path + "/Samples/DR/Materials/textured_wall.mdl",
        ]
        light_list = ["World/Light1", "World/Light2"]
        self.texture_comp = self.dr.commands.CreateTextureComponentCommand(
            prim_paths=[], enable_project_uvw=True, texture_list=texture_list
        ).do()
        self.color_comp = self.dr.commands.CreateColorComponentCommand(prim_paths=[]).do()
        self.movement_comp = self.dr.commands.CreateMovementComponentCommand(prim_paths=[]).do()
        self.rotation_comp = self.dr.commands.CreateRotationComponentCommand(prim_paths=[]).do()
        self.scale_comp = self.dr.commands.CreateScaleComponentCommand(prim_paths=[], max_range=(50, 50, 50)).do()
        self.light_comp = self.dr.commands.CreateLightComponentCommand(light_paths=light_list).do()
        self.visibility_comp = self.dr.commands.CreateVisibilityComponentCommand(prim_paths=[]).do()

    def update_dr_comp(self, dr_comp):
        """Updates DR component with the asset prim paths that will be randomized"""
        comp_prim_paths_target = dr_comp.GetPrimPathsRel()
        comp_prim_paths_target.ClearTargets(True)
        for asset in self.assets:
            comp_prim_paths_target.AddTarget(asset.GetPrimPath())

Finally, we also want to vary our viewpoint.
We want to keep our camera pointing to the center of the stage but vary its azimuth and elevation angles. An easy trick to do this is to make the camera a child of an Xform prim, which we’ll call camera_rig. To vary the distance from the camera to the center of the stage, we translate the camera with respect to the rig, and to change the azimuth and elevation angles, we rotate the rig. We’ve set the camera as a child of a camera_rig Xform in our _setup_world() method, so our randomize_camera() method below simply clears any previous transform and sets new angles on the Y and X axes.

    def randomize_camera(self):
        """Randomize the camera position."""
        # By simply rotating a camera "rig" instead of repositioning the camera
        # itself, we greatly simplify our job.

        # Clear previous transforms
        self.camera_rig.ClearXformOpOrder()
        # Change azimuth angle
        self.camera_rig.AddRotateYOp().Set(random.random() * 360)
        # Change elevation angle
        self.camera_rig.AddRotateXOp().Set(random.random() * -90)

## 5.4. Train

### 5.4.1. Getting Started

Now that we have a dataloader, we can start training. To run the training example, use the following command.

    ./python.sh standalone_examples/replicator/online_generation/train_shapenet.py \
    --root $SHAPENET_LOCAL_DIR'_nomat' \
    --categories plane watercraft rocket \
    --visualize \
    --max-asset-size 50


You should see the loss going down in your terminal and, after a hundred iterations or so, start to see instance segmentation and object detection results being visualized. The --max-asset-size 50 argument tells the dataset to skip assets over 50 MB in size. This helps avoid out-of-memory errors caused by loading larger assets. This value can be increased depending on the capacity of the GPU in use. The specific optimizer we use in this example maintains a gradient history which grows with iteration number. If you lack VRAM on your hardware, you can adjust the --max-iters command line argument to address this.
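
The size cutoff can be pictured as a filter over file sizes on disk. The following is a minimal sketch rather than the script's actual code: `filter_assets_by_size` is a hypothetical helper, and it assumes the limit is given in megabytes counted as 10^6 bytes. The real logic lives in `_find_usd_assets()` and may differ in its details.

```python
import os
import tempfile


def filter_assets_by_size(paths, max_asset_size_mb=None):
    """Return the paths whose file size does not exceed the limit (in MB).

    Hypothetical helper for illustration; passing None disables the filter.
    """
    if max_asset_size_mb is None:
        return list(paths)
    limit_bytes = max_asset_size_mb * 1e6  # assumption: 1 MB = 10**6 bytes
    return [p for p in paths if os.path.getsize(p) <= limit_bytes]


def demo():
    """Create two dummy files and keep only the one under a 1 MB limit."""
    with tempfile.TemporaryDirectory() as tmp:
        small = os.path.join(tmp, "small.usd")
        large = os.path.join(tmp, "large.usd")
        with open(small, "wb") as f:
            f.write(b"\0" * 1_000)  # about 1 KB
        with open(large, "wb") as f:
            f.write(b"\0" * 2_000_000)  # about 2 MB
        kept = filter_assets_by_size([small, large], max_asset_size_mb=1)
        return [os.path.basename(p) for p in kept]


if __name__ == "__main__":
    print(demo())
```

Filtering before any asset is referenced in the scene means oversized files never reach the GPU, which is why raising the limit is only advisable when more memory is available.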

Open the locally saved image file train.png during training to see the visualized results.

### 5.4.2. The Code

First, we set up our device, dataset, dataloader, model, and optimizer.

    device = "cuda"

    # Setup data
    train_set = RandomObjects(
        args.root, args.categories, num_assets_min=3, num_assets_max=5, max_asset_size=args.max_asset_size
    )
    train_loader = DataLoader(train_set, batch_size=2, collate_fn=lambda x: tuple(zip(*x)))

    # Setup Model
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, num_classes=1 + len(args.categories))
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
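
The `collate_fn` passed to the DataLoader deserves a note. Each sample is an `(image, target)` pair, and the number of boxes and masks in `target` varies from scene to scene, so the targets cannot simply be stacked into one tensor. torchvision's detection models instead take a list of images and a list of target dictionaries, and `tuple(zip(*x))` performs that regrouping. A small sketch with plain Python stand-ins (the string and dictionary values are made up for illustration):

```python
def collate(batch):
    """Regroup a list of (image, target) pairs into (images, targets)."""
    return tuple(zip(*batch))


# Stand-ins for the tensors and dictionaries the dataset would yield.
batch = [
    ("image_0", {"labels": [1]}),
    ("image_1", {"labels": [2, 3]}),
]

images, targets = collate(batch)
print(images)   # ('image_0', 'image_1')
print(targets)  # ({'labels': [1]}, {'labels': [2, 3]})
```

With `batch_size=2`, each iteration of the loader therefore yields a tuple of two images and a tuple of two target dictionaries.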


Next, we have our training loop. After sending our data to the GPU, we do a forward pass through our model, calculate the loss, and do a backward pass to update the model’s weights.

    for i, train_batch in enumerate(train_loader):
        if i > args.max_iters:
            break

        model.train()
        images, targets = train_batch
        images = [i.to(device) for i in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
        loss_dict = model(images, targets)
        loss = sum(loss for loss in loss_dict.values())

        print(f"ITER {i} | {loss:.6f}")