.. _isaac_sim_app_tutorial_replicator_online_generation:

==========================================
Online Generation
==========================================

Learning Objectives
===================

This example uses ground truth visualizations from |isaac-sim| to demonstrate how to set up a PyTorch DataLoader and train Deep Neural Networks (DNNs) online (the generated training data is fed directly to the training process, without the need to store it on disk). The full example can be executed through the Isaac Sim Python environment, and in this tutorial you will examine the script section by section.

In this tutorial, you will integrate scene generation and groundtruth collection into a PyTorch dataloader that you will use to train a `Mask-RCNN `_ instance segmentation model.

.. _Shapenet Dataset DR:

Mesh Converter
==================

Before you can generate data, you need to convert the ShapeNet assets in the database to USD. You will need to first download the `ShapeNetCore `_ dataset to a local directory.

Then, set a variable to tell the script where to find the ShapeNet dataset locally:

.. code-block:: bash

    export SHAPENET_LOCAL_DIR=

You will convert only the geometry to allow for quick loading of assets into the scene. With the `SHAPENET_LOCAL_DIR` variable set, run the following script. Note that this will create a new directory at ``{SHAPENET_LOCAL_DIR}_nomat`` where the geometry-only USD files will be stored.

.. code-block:: bash

    ./python.sh standalone_examples/api/omni.isaac.shapenet/usd_convertor.py --categories plane watercraft rocket --max_models 100

The above command tells the script to convert the `plane watercraft rocket` categories and to convert a maximum of 100 models per category.

.. Note::

    Other category examples: ``table``, ``monitor``, ``phone``, ``chair``, ``bowl``, ``bench``, ``plane``, ``car``, ``microwave``, ``piano``, ``pillow``, ``sofa``, ``bottle``, etc.

DataLoader
===========

To run the example on Linux, use the following command.

.. code-block:: bash

    ./python.sh standalone_examples/replicator/online_generation/generate_shapenet.py \
        --root $SHAPENET_LOCAL_DIR'_nomat' \
        --categories plane watercraft rocket \
        --max_asset_size 50

On Windows, use the following command.

.. code-block:: bash

    python.bat standalone_examples/replicator/online_generation/generate_shapenet.py --root %SHAPENET_LOCAL_DIR%_nomat --categories plane watercraft rocket --max_asset_size 50

The ``generate_shapenet.py`` script generates an endless stream of randomized data with which to train. Below is a visualization of the data the dataset produces with the ``plane watercraft rocket`` categories selected. Open the locally saved image files at ``_out_gen_imgs/domain_randomization_test_image_*.png`` to see the visualization.

.. figure:: /content/images/isaac_synth-data_dataset.gif
    :align: center
    :alt: Instance Segmentation Dataset

The Code
^^^^^^^^

The Dataloader Core
-------------------

To create a dataloader, you will use the PyTorch ``torch.utils.data.IterableDataset`` class, which will generate an endless stream of random scenes, each with a corresponding groundtruth. The basic structure for the dataset is shown below:

.. code-block:: python

    class MyAwesomeDataset(torch.utils.data.IterableDataset):
        def __init__(self):
            # Setup the scene, lights, walls, camera, etc.
            setup_scene()

            # Setup replicator randomizer graph
            setup_replicator()

        def __next__(self):
            # Trigger a randomization and a render of the scene
            self.rep.orchestrator.step()

            # Collect groundtruth
            gt = {
                "rgb": self.rgb.get_data(device="cuda"),
                "boundingBox2DTight": self.bbox_2d_tight.get_data(device="cpu"),
                "instanceSegmentation": self.instance_seg.get_data(device="cuda"),
            }

            # [..]

            return image, target
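Because the dataset is an ``IterableDataset``, it produces an endless stream of samples and never raises ``StopIteration`` on its own. As a quick orientation, here is a minimal usage sketch (the loop bound and print calls are purely illustrative, not part of the example script):

.. code-block:: python

    # Minimal sketch: pull a few samples from the endless stream.
    dataset = MyAwesomeDataset()
    for i, (image, target) in enumerate(dataset):
        # image is a channel-first float tensor; target holds boxes, labels and masks
        print(i, image.shape, target["boxes"].shape)
        if i >= 9:
            # The stream never ends on its own, so stop explicitly
            break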
Now that you have an outline, assemble your dataset by simply filling in the ``__next__`` method. The randomization and rendering of a new scene is triggered by ``self.rep.orchestrator.step()`` at the top of ``__next__``. The groundtruth is then collected into the ``gt`` dictionary. The code that follows prepares the data for the model to consume; this code will be in large part specific to the model you are using and your application.

.. code-block:: python
    :linenos:

    def __iter__(self):
        return self

    def __next__(self):
        # Step - trigger a randomization and a render
        self.rep.orchestrator.step()

        # Collect Groundtruth
        gt = {
            "rgb": self.rgb.get_data(device="cuda"),
            "boundingBox2DTight": self.bbox_2d_tight.get_data(device="cpu"),
            "instanceSegmentation": self.instance_seg.get_data(device="cuda"),
        }

        # RGB
        # Drop alpha channel
        image = self.wp.to_torch(gt["rgb"])[..., :3]

        # Normalize between 0. and 1. and change order to channel-first.
        image = image.float() / 255.0
        image = image.permute(2, 0, 1)

        # Bounding Box
        gt_bbox = gt["boundingBox2DTight"]["data"]

        # Create mapping from categories to index
        bboxes = torch.tensor(gt_bbox[["x_min", "y_min", "x_max", "y_max"]].tolist(), device="cuda")
        id_to_labels = gt["boundingBox2DTight"]["info"]["idToLabels"]
        prim_paths = gt["boundingBox2DTight"]["info"]["primPaths"]

        # For each bounding box, map semantic label to label index
        cat_to_id = {cat: i + 1 for i, cat in enumerate(self.categories)}
        semantic_labels_mapping = {int(k): v.get("class", "") for k, v in id_to_labels.items()}
        semantic_labels = [cat_to_id[semantic_labels_mapping[i]] for i in gt_bbox["semanticId"]]
        labels = torch.tensor(semantic_labels, device="cuda")

        # Calculate bounding box area for each object
        areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
        # Identify invalid bounding boxes to filter final output
        valid_areas = (areas > 0.0) * (areas < (image.shape[1] * image.shape[2]))

        # Instance Segmentation
        instance_data = self.wp.to_torch(gt["instanceSegmentation"]["data"]).squeeze()
        path_to_instance_id = {v: int(k) for k, v in gt["instanceSegmentation"]["info"]["idToLabels"].items()}

        instance_list = [im[0] for im in gt_bbox]
        masks = torch.zeros((len(instance_list), *instance_data.shape), dtype=bool, device="cuda")

        # Filter for the mask of each object
        for i, prim_path in enumerate(prim_paths):
            # Merge child instances of prim_path as one instance
            for instance in path_to_instance_id:
                if prim_path in instance:
                    masks[i] += torch.isin(instance_data, path_to_instance_id[instance])

        target = {
            "boxes": bboxes[valid_areas],
            "labels": labels[valid_areas],
            "masks": masks[valid_areas],
            "image_id": torch.LongTensor([self.cur_idx]),
            "area": areas[valid_areas],
            "iscrowd": torch.BoolTensor([False] * len(bboxes[valid_areas])),  # Assume no crowds
        }

        self.cur_idx += 1
        return image, target

Details about the rest of the dataloader, including the initialization step and the methods used within ``__next__``, are explained in the sections below.

Initialization Step
-------------------

First, launch kit using the ``SimulationApp`` and the rendering configurations. Once the app starts, the default Isaac extensions are hot-loaded so you can ``import`` from them. You then set up Replicator and your Nucleus server, which are used in this example to manage the domain randomization assets. Domain randomization is entirely handled through Replicator in this example.

.. code-block:: python

    # Imports used by the snippets in this section
    import signal
    import sys

    import carb
    import torch

    from omni.isaac.kit import SimulationApp

    # Setup default variables
    RESOLUTION = (1024, 1024)
    OBJ_LOC_MIN = (-50, 5, -50)
    OBJ_LOC_MAX = (50, 5, 50)
    CAM_LOC_MIN = (100, 0, -100)
    CAM_LOC_MAX = (100, 100, 100)
    SCALE_MIN = 15
    SCALE_MAX = 40

    # Default rendering parameters
    RENDER_CONFIG = {"renderer": "PathTracing", "samples_per_pixel_per_frame": 12, "headless": False}


    class RandomObjects(torch.utils.data.IterableDataset):
        def __init__(
            self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
        ):
            assert len(categories) > 1
            assert (split > 0) and (split <= 1.0)

            self.kit = SimulationApp(RENDER_CONFIG)

            from omni.isaac.shapenet import utils
            import omni.replicator.core as rep
            import warp as wp

            self.rep = rep
            self.wp = wp

            from omni.isaac.core.utils.nucleus import get_assets_root_path

            self.assets_root_path = get_assets_root_path()
            if self.assets_root_path is None:
                carb.log_error("Could not find Isaac Sim assets folder")
                return

            . . .
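The elided portion of ``__init__`` also creates the camera, a render product, and the annotators (``self.rgb``, ``self.bbox_2d_tight``, ``self.instance_seg``) that ``__next__`` reads from. The exact code lives in the script; the following is only a sketch of the typical Replicator pattern, with a hypothetical ``_setup_annotators`` helper name and an illustrative camera pose:

.. code-block:: python

    def _setup_annotators(self):
        """Hypothetical helper: create a render product and attach annotators to it."""
        # The camera pose here is illustrative; the script randomizes it via Replicator
        self.camera = self.rep.create.camera(position=(100, 100, 100), look_at=(0, 0, 0))
        self.render_product = self.rep.create.render_product(self.camera, RESOLUTION)

        # Annotators provide the groundtruth read in __next__
        self.rgb = self.rep.AnnotatorRegistry.get_annotator("rgb")
        self.bbox_2d_tight = self.rep.AnnotatorRegistry.get_annotator("bounding_box_2d_tight")
        self.instance_seg = self.rep.AnnotatorRegistry.get_annotator("instance_segmentation")

        self.rgb.attach(self.render_product)
        self.bbox_2d_tight.attach(self.render_product)
        self.instance_seg.attach(self.render_product)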
The ``self._find_usd_assets()`` method will search the ``root`` directory for USD files within the category directories you've specified and return their paths. When you want to add a new asset to your scene, you will simply pick a path at random and attach it as a reference to a new prim in the scene. Use ``split`` to select a subset of training samples so that you can keep a hold-out set for validation. Finally, ``self.setup_scene()`` creates a room, lights, and a camera.

.. code-block:: python

    class RandomObjects(torch.utils.data.IterableDataset):
        def __init__(
            self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
        ):
            . . .

            # If ShapeNet categories are specified with their names, convert to synset ID
            # Remove this if using with a different dataset than ShapeNet
            category_ids = [utils.LABEL_TO_SYNSET.get(c, c) for c in categories]
            self.categories = category_ids
            self.range_num_assets = (num_assets_min, max(num_assets_min, num_assets_max))
            try:
                self.references = self._find_usd_assets(root, category_ids, max_asset_size, split, train)
            except ValueError as err:
                carb.log_error(str(err))
                self.kit.close()
                sys.exit()

            # Setup the scene, lights, walls, camera, etc.
            self.setup_scene()

            # Setup replicator randomizer graph
            self.setup_replicator()

            self.cur_idx = 0
            self.exiting = False
            signal.signal(signal.SIGINT, self._handle_exit)

        def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
            ...  # (see code for implementation details)

        def setup_scene(self):
            ...  # (see code for implementation details)
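For reference, the behavior described above could be implemented along the following lines. This is a simplified sketch rather than the exact implementation in the script; in particular, the ``<root>/<category>/*/*.usd`` directory layout and the megabyte interpretation of ``max_asset_size`` are assumptions:

.. code-block:: python

    import glob
    import os

    def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
        """Sketch: collect USD paths per category and apply a train/val split."""
        references = {}
        for category in categories:
            assets = sorted(glob.glob(os.path.join(root, category, "*/*.usd")))
            if max_asset_size is not None:
                # Skip assets larger than max_asset_size (interpreted here as MB)
                assets = [a for a in assets if os.path.getsize(a) <= max_asset_size * 1e6]
            if not assets:
                raise ValueError(f"No USD assets found for category {category}")
            num_train = int(len(assets) * split)
            # Keep the head of the list for training and the tail for validation
            references[category] = assets[:num_train] if train else assets[num_train:]
        return references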
Setting up a Replicator graph
------------------------------

Now, we want to set up our randomizers to vary the content and appearance of every frame. We do this by leveraging Omni Replicator, which lets us create a randomization graph that executes the randomizations we specify.

We'll start by setting up our static components, in this case two sphere lights. Next, we set a replicator ``on_frame`` trigger, which lets us trigger randomization at each new frame. We then create the randomization components. The first will modify the ``color`` attribute of our two lights. Next, we randomize the camera position and set its ``look_at`` value to the origin so that the camera always orients itself towards that point. Finally, we set up our asset randomizers for each asset category and randomize their position, rotation, scale, and material texture. Using the ``instantiate`` method, we create a prototype of each asset in cache; new instances then reference that prototype.

.. code-block:: python

    def _instantiate_category(self, category, references):
        with self.rep.randomizer.instantiate(references, size=1, mode="reference"):
            self.rep.modify.semantics([("class", category)])
            self.rep.modify.pose(
                position=self.rep.distribution.uniform(OBJ_LOC_MIN, OBJ_LOC_MAX),
                rotation=self.rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
                scale=self.rep.distribution.uniform(SCALE_MIN, SCALE_MAX),
            )
            self.rep.randomizer.texture(self._get_textures(), project_uvw=True)

    def setup_replicator(self):
        """Setup the replicator graph with various attributes."""
        # Create two sphere lights
        light1 = self.rep.create.light(light_type="sphere", position=(-450, 350, 350), scale=100, intensity=30000.0)
        light2 = self.rep.create.light(light_type="sphere", position=(450, 350, 350), scale=100, intensity=30000.0)

        with self.rep.new_layer():
            with self.rep.trigger.on_frame():
                # Randomize light colors
                with self.rep.create.group([light1, light2]):
                    self.rep.modify.attribute("color", self.rep.distribution.uniform((0.1, 0.1, 0.1), (1.0, 1.0, 1.0)))

                # Randomize camera position
                with self.camera:
                    self.rep.modify.pose(
                        position=self.rep.distribution.uniform((100, 0, -100), (100, 100, 100)), look_at=(0, 0, 0)
                    )

                # Randomize asset positions and textures
                for category, references in self.references.items():
                    self._instantiate_category(category, references)

        # Run replicator for a single iteration without triggering any writes
        self.rep.orchestrator.preview()

Train
=====

Getting Started
^^^^^^^^^^^^^^^^^

Now that you have a dataloader, you can start training. To run the training example, use the following command.

.. code-block:: bash

    ./python.sh standalone_examples/replicator/online_generation/train_shapenet.py \
        --root $SHAPENET_LOCAL_DIR'_nomat' \
        --categories plane watercraft rocket \
        --visualize \
        --max_asset_size 50

You should see the loss going down in your terminal and, after approximately 100 iterations, start to see instance segmentation and object detection results being visualized. The ``--max_asset_size 50`` argument tells the dataset to skip assets over 50 MB in size. This helps avoid out-of-memory errors caused by loading larger assets. The value can be increased depending on the capacity of the GPU in use. The specific optimizer used in this example maintains a gradient history that grows with the iteration number. If you lack VRAM on your hardware, you can lower the ``--max_iters`` command line argument to address this.

Open the locally saved image files at ``_out_train_imgs/train_*.png`` to see something like the image below during training.

.. figure:: /content/images/isaac_synth-data_train.gif
    :align: center
    :alt: Instance Segmentation Training

The Code
^^^^^^^^

First, set up the device, dataset, dataloader, model, and optimizer.

.. code-block:: python

    device = "cuda"

    # Setup data
    train_set = RandomObjects(
        args.root, args.categories, num_assets_min=3, num_assets_max=5, max_asset_size=args.max_asset_size
    )
    train_loader = DataLoader(train_set, batch_size=2, collate_fn=lambda x: tuple(zip(*x)))

    # Setup Model
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, num_classes=1 + len(args.categories))
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
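The ``collate_fn`` deserves a brief note: torchvision detection models expect a list of images and a list of targets rather than stacked tensors, so ``lambda x: tuple(zip(*x))`` simply transposes a batch of ``(image, target)`` pairs into a tuple of images and a tuple of targets. A small standalone illustration with dummy data:

.. code-block:: python

    # Standalone illustration of the collate_fn used above, with dummy placeholder data.
    batch = [("img0", {"boxes": "b0"}), ("img1", {"boxes": "b1"})]

    collate = lambda x: tuple(zip(*x))
    images, targets = collate(batch)

    print(images)   # ('img0', 'img1')
    print(targets)  # ({'boxes': 'b0'}, {'boxes': 'b1'})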
Next, set up the training loop. After sending the data to the GPU, perform a forward pass through the model, calculate the loss, and perform a backward pass to update the model weights.

.. code-block:: python

    for i, train_batch in enumerate(train_loader):
        if i > args.max_iters:
            break

        model.train()
        images, targets = train_batch
        images = [i.to(device) for i in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        loss = sum(loss for loss in loss_dict.values())
        print(f"ITER {i} | {loss:.6f}")

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
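During training, the example periodically visualizes predictions (the ``--visualize`` flag and the ``_out_train_imgs`` output mentioned above). The plotting code itself is in the script; the following is only a sketch of how predictions could be obtained for visualization, with an arbitrarily chosen score threshold:

.. code-block:: python

    import torch

    # Sketch: run the model in inference mode on the current batch and keep
    # only confident detections (the 0.5 threshold is an illustrative choice).
    model.eval()
    with torch.no_grad():
        predictions = model(images)  # list of dicts with "boxes", "labels", "scores", "masks"

    score_thresh = 0.5
    for pred in predictions:
        keep = pred["scores"] > score_thresh
        boxes = pred["boxes"][keep]
        labels = pred["labels"][keep]
        masks = pred["masks"][keep]  # shape (N, 1, H, W), values in [0, 1]
        # These tensors can then be drawn on top of the corresponding input image.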