.. _isaac_sim_app_tutorial_replicator_online_generation:

==========================================
Online Generation
==========================================

Learning Objectives
===================

This example uses ground truth visualizations from |isaac-sim| to demonstrate how to set up a PyTorch DataLoader and train Deep Neural Networks (DNNs) online (the generated training data is fed directly to the training process, without the need to store it on disk). The full example can be executed through the Isaac Sim Python environment, and in this tutorial you will examine the script section by section.

In this tutorial, you will integrate scene generation and groundtruth collection into a PyTorch dataloader that you will use to train a `Mask-RCNN `_ instance segmentation model.

.. _Shapenet Dataset DR:

Mesh Converter
==================

Before you can generate data, you need to convert the ShapeNet assets in the database to USD. You will need to first download the `ShapeNetCore `_ dataset to a local directory.

Then, set a variable to tell the script where to find the ShapeNet dataset locally:

.. code-block:: bash

    export SHAPENET_LOCAL_DIR=

You will convert only the geometry to allow for quick loading of assets into the scene. With the `SHAPENET_LOCAL_DIR` variable set, run the following script. Note that this will create a new directory at ``{SHAPENET_LOCAL_DIR}_nomat`` where the geometry-only USD files will be stored.

.. code-block:: bash

    ./python.sh standalone_examples/api/omni.isaac.shapenet/usd_convertor.py --categories plane watercraft rocket --max_models 100

The above command tells the script to convert the `plane watercraft rocket` categories and to convert a maximum of 100 models per category.

.. Note::

    Other category examples: ``table``, ``monitor``, ``phone``, ``chair``, ``bowl``, ``bench``, ``plane``, ``car``, ``microwave``, ``piano``, ``pillow``, ``sofa``, ``bottle``, etc.

DataLoader
===========

To run the example on Linux, use the following command.

.. code-block:: bash

    ./python.sh standalone_examples/replicator/online_generation/generate_shapenet.py \
        --root $SHAPENET_LOCAL_DIR'_nomat' \
        --categories plane watercraft rocket \
        --max_asset_size 50

On Windows, use the following command.

.. code-block:: bash

    python.bat standalone_examples/replicator/online_generation/generate_shapenet.py --root %SHAPENET_LOCAL_DIR%_nomat --categories plane watercraft rocket --max_asset_size 50

The ``generate_shapenet.py`` script generates an endless stream of randomized data with which to train. Below is a visualization of the data the dataset produces with the ``plane watercraft rocket`` categories selected. Open the locally saved image files at ``_out_gen_imgs/domain_randomization_test_image_*.png`` to see the visualization.

.. figure:: /content/images/isaac_synth-data_dataset.gif
    :align: center
    :alt: Instance Segmentation Dataset

The Code
^^^^^^^^

The Dataloader Core
-------------------

To create a dataloader, you will use the PyTorch ``torch.utils.data.IterableDataset`` class, which will generate an endless stream of random scenes, each with a corresponding groundtruth. The basic structure for the dataset is shown below:

.. code-block:: python

    class MyAwesomeDataset(torch.utils.data.IterableDataset):
        def __init__(self):
            # Setup the scene, lights, walls, camera, etc.
            setup_scene()

            # Setup replicator randomizer graph
            setup_replicator()

        def __next__(self):
            # Trigger a randomization and a render of the scene
            self.rep.orchestrator.step()

            # Collect groundtruth
            gt = {
                "rgb": self.rgb.get_data(device="cuda"),
                "boundingBox2DTight": self.bbox_2d_tight.get_data(device="cpu"),
                "instanceSegmentation": self.instance_seg.get_data(device="cuda"),
            }

            # [..]

            return image, target
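Because the dataset is an ``IterableDataset``, it produces an endless stream of samples and never raises ``StopIteration`` on its own. As a quick orientation, here is a minimal usage sketch (the loop bound and print calls are purely illustrative, not part of the example script):

.. code-block:: python

    # Minimal sketch: pull a few samples from the endless stream.
    dataset = MyAwesomeDataset()
    for i, (image, target) in enumerate(dataset):
        # image is a channel-first float tensor; target holds boxes, labels and masks
        print(i, image.shape, target["boxes"].shape)
        if i >= 9:
            # The stream never ends on its own, so stop explicitly
            break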
Now that you have an outline, assemble your dataset by simply filling in the ``__next__`` method. The randomization and rendering of a new scene is triggered by ``self.rep.orchestrator.step()`` at the top of ``__next__``. The groundtruth is then collected into the ``gt`` dictionary. The code that follows prepares the data for the model to consume; this code will be in large part specific to the model you are using and your application.

.. code-block:: python
    :linenos:

    def __iter__(self):
        return self

    def __next__(self):
        # Step - trigger a randomization and a render
        self.rep.orchestrator.step()

        # Collect Groundtruth
        gt = {
            "rgb": self.rgb.get_data(device="cuda"),
            "boundingBox2DTight": self.bbox_2d_tight.get_data(device="cpu"),
            "instanceSegmentation": self.instance_seg.get_data(device="cuda"),
        }

        # RGB
        # Drop alpha channel
        image = self.wp.to_torch(gt["rgb"])[..., :3]

        # Normalize between 0. and 1. and change order to channel-first.
        image = image.float() / 255.0
        image = image.permute(2, 0, 1)

        # Bounding Box
        gt_bbox = gt["boundingBox2DTight"]["data"]

        # Create mapping from categories to index
        bboxes = torch.tensor(gt_bbox[["x_min", "y_min", "x_max", "y_max"]].tolist(), device="cuda")
        id_to_labels = gt["boundingBox2DTight"]["info"]["idToLabels"]
        prim_paths = gt["boundingBox2DTight"]["info"]["primPaths"]

        # For each bounding box, map semantic label to label index
        cat_to_id = {cat: i + 1 for i, cat in enumerate(self.categories)}
        semantic_labels_mapping = {int(k): v.get("class", "") for k, v in id_to_labels.items()}
        semantic_labels = [cat_to_id[semantic_labels_mapping[i]] for i in gt_bbox["semanticId"]]
        labels = torch.tensor(semantic_labels, device="cuda")

        # Calculate bounding box area for each object
        areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
        # Identify invalid bounding boxes to filter final output
        valid_areas = (areas > 0.0) * (areas < (image.shape[1] * image.shape[2]))

        # Instance Segmentation
        instance_data = self.wp.to_torch(gt["instanceSegmentation"]["data"]).squeeze()
        path_to_instance_id = {v: int(k) for k, v in gt["instanceSegmentation"]["info"]["idToLabels"].items()}

        instance_list = [im[0] for im in gt_bbox]
        masks = torch.zeros((len(instance_list), *instance_data.shape), dtype=bool, device="cuda")

        # Filter for the mask of each object
        for i, prim_path in enumerate(prim_paths):
            # Merge child instances of prim_path as one instance
            for instance in path_to_instance_id:
                if prim_path in instance:
                    masks[i] += torch.isin(instance_data, path_to_instance_id[instance])

        target = {
            "boxes": bboxes[valid_areas],
            "labels": labels[valid_areas],
            "masks": masks[valid_areas],
            "image_id": torch.LongTensor([self.cur_idx]),
            "area": areas[valid_areas],
            "iscrowd": torch.BoolTensor([False] * len(bboxes[valid_areas])),  # Assume no crowds
        }

        self.cur_idx += 1
        return image, target

Details about the rest of the dataloader, including the initialization step and the methods used within ``__next__``, are explained in the sections below.

Initialization Step
-------------------

First, launch kit using the ``SimulationApp`` and the rendering configurations. Once the app starts, the default Isaac extensions are hot-loaded so you can ``import`` from them. You then set up Replicator and your Nucleus server, which are used in this example to manage the domain randomization assets. Domain randomization is entirely handled through Replicator in this example.

.. code-block:: python

    # Imports used by the snippets in this section
    import signal
    import sys

    import carb
    import torch

    from omni.isaac.kit import SimulationApp

    # Setup default variables
    RESOLUTION = (1024, 1024)
    OBJ_LOC_MIN = (-50, 5, -50)
    OBJ_LOC_MAX = (50, 5, 50)
    CAM_LOC_MIN = (100, 0, -100)
    CAM_LOC_MAX = (100, 100, 100)
    SCALE_MIN = 15
    SCALE_MAX = 40

    # Default rendering parameters
    RENDER_CONFIG = {"renderer": "PathTracing", "samples_per_pixel_per_frame": 12, "headless": False}


    class RandomObjects(torch.utils.data.IterableDataset):
        def __init__(
            self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
        ):
            assert len(categories) > 1
            assert (split > 0) and (split <= 1.0)

            self.kit = SimulationApp(RENDER_CONFIG)

            from omni.isaac.shapenet import utils
            import omni.replicator.core as rep
            import warp as wp

            self.rep = rep
            self.wp = wp

            from omni.isaac.core.utils.nucleus import get_assets_root_path

            self.assets_root_path = get_assets_root_path()
            if self.assets_root_path is None:
                carb.log_error("Could not find Isaac Sim assets folder")
                return

            . . .
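The elided portion of ``__init__`` also creates the camera, a render product, and the annotators (``self.rgb``, ``self.bbox_2d_tight``, ``self.instance_seg``) that ``__next__`` reads from. The exact code lives in the script; the following is only a sketch of the typical Replicator pattern, with a hypothetical ``_setup_annotators`` helper name and an illustrative camera pose:

.. code-block:: python

    def _setup_annotators(self):
        """Hypothetical helper: create a render product and attach annotators to it."""
        # The camera pose here is illustrative; the script randomizes it via Replicator
        self.camera = self.rep.create.camera(position=(100, 100, 100), look_at=(0, 0, 0))
        self.render_product = self.rep.create.render_product(self.camera, RESOLUTION)

        # Annotators provide the groundtruth read in __next__
        self.rgb = self.rep.AnnotatorRegistry.get_annotator("rgb")
        self.bbox_2d_tight = self.rep.AnnotatorRegistry.get_annotator("bounding_box_2d_tight")
        self.instance_seg = self.rep.AnnotatorRegistry.get_annotator("instance_segmentation")

        self.rgb.attach(self.render_product)
        self.bbox_2d_tight.attach(self.render_product)
        self.instance_seg.attach(self.render_product)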
The ``self._find_usd_assets()`` method will search the ``root`` directory for USD files within the category directories you've specified and return their paths. When you want to add a new asset to your scene, you will simply pick a path at random and attach it as a reference to a new prim in the scene. Use ``split`` to select a subset of training samples so that you can keep a hold-out set for validation. Finally, ``self.setup_scene()`` creates a room, lights, and a camera.

.. code-block:: python

    class RandomObjects(torch.utils.data.IterableDataset):
        def __init__(
            self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
        ):
            . . .

            # If ShapeNet categories are specified with their names, convert to synset ID
            # Remove this if using with a different dataset than ShapeNet
            category_ids = [utils.LABEL_TO_SYNSET.get(c, c) for c in categories]
            self.categories = category_ids
            self.range_num_assets = (num_assets_min, max(num_assets_min, num_assets_max))
            try:
                self.references = self._find_usd_assets(root, category_ids, max_asset_size, split, train)
            except ValueError as err:
                carb.log_error(str(err))
                self.kit.close()
                sys.exit()

            # Setup the scene, lights, walls, camera, etc.
            self.setup_scene()

            # Setup replicator randomizer graph
            self.setup_replicator()

            self.cur_idx = 0
            self.exiting = False
            signal.signal(signal.SIGINT, self._handle_exit)

        def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
            ...  # (see code for implementation details)

        def setup_scene(self):
            ...  # (see code for implementation details)
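For reference, the behavior described above could be implemented along the following lines. This is a simplified sketch rather than the exact implementation in the script; in particular, the ``<root>/<category>/*/*.usd`` directory layout and the megabyte interpretation of ``max_asset_size`` are assumptions:

.. code-block:: python

    import glob
    import os

    def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
        """Sketch: collect USD paths per category and apply a train/val split."""
        references = {}
        for category in categories:
            assets = sorted(glob.glob(os.path.join(root, category, "*/*.usd")))
            if max_asset_size is not None:
                # Skip assets larger than max_asset_size (interpreted here as MB)
                assets = [a for a in assets if os.path.getsize(a) <= max_asset_size * 1e6]
            if not assets:
                raise ValueError(f"No USD assets found for category {category}")
            num_train = int(len(assets) * split)
            # Keep the head of the list for training and the tail for validation
            references[category] = assets[:num_train] if train else assets[num_train:]
        return references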
Setting up a Replicator graph
------------------------------

Now, we want to set up our randomizers to vary the content and appearance of every frame. We do this by leveraging Omni Replicator, which lets us create a randomization graph that executes the randomizations we specify.

We'll start by setting up our static components, in this case two sphere lights. Next, we set a replicator ``on_frame`` trigger, which lets us trigger randomization at each new frame. We then create the randomization components. The first will modify the ``color`` attribute of our two lights. Next, we randomize the camera position and set its ``look_at`` value to the origin so that the camera always orients itself towards that point. Finally, we set up our asset randomizers for each asset category and randomize their position, rotation, scale, and material texture. Using the ``instantiate`` method, we create a prototype of each asset in cache; new instances then reference that prototype.

.. code-block:: python

    def _instantiate_category(self, category, references):
        with self.rep.randomizer.instantiate(references, size=1, mode="reference"):
            self.rep.modify.semantics([("class", category)])
            self.rep.modify.pose(
                position=self.rep.distribution.uniform(OBJ_LOC_MIN, OBJ_LOC_MAX),
                rotation=self.rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
                scale=self.rep.distribution.uniform(SCALE_MIN, SCALE_MAX),
            )
            self.rep.randomizer.texture(self._get_textures(), project_uvw=True)

    def setup_replicator(self):
        """Setup the replicator graph with various attributes."""
        # Create two sphere lights
        light1 = self.rep.create.light(light_type="sphere", position=(-450, 350, 350), scale=100, intensity=30000.0)
        light2 = self.rep.create.light(light_type="sphere", position=(450, 350, 350), scale=100, intensity=30000.0)

        with self.rep.new_layer():
            with self.rep.trigger.on_frame():
                # Randomize light colors
                with self.rep.create.group([light1, light2]):
                    self.rep.modify.attribute("color", self.rep.distribution.uniform((0.1, 0.1, 0.1), (1.0, 1.0, 1.0)))

                # Randomize camera position
                with self.camera:
                    self.rep.modify.pose(
                        position=self.rep.distribution.uniform((100, 0, -100), (100, 100, 100)), look_at=(0, 0, 0)
                    )

                # Randomize asset positions and textures
                for category, references in self.references.items():
                    self._instantiate_category(category, references)

        # Run replicator for a single iteration without triggering any writes
        self.rep.orchestrator.preview()

Train
=====

Getting Started
^^^^^^^^^^^^^^^^^

Now that you have a dataloader, you can start training. To run the training example, use the following command.

.. code-block:: bash

    ./python.sh standalone_examples/replicator/online_generation/train_shapenet.py \
        --root $SHAPENET_LOCAL_DIR'_nomat' \
        --categories plane watercraft rocket \
        --visualize \
        --max_asset_size 50

You should see the loss going down in your terminal and, after approximately 100 iterations, start to see instance segmentation and object detection results being visualized. The ``--max_asset_size 50`` argument tells the dataset to skip assets over 50 MB in size. This helps avoid out-of-memory errors caused by loading larger assets. The value can be increased depending on the capacity of the GPU in use. The specific optimizer used in this example maintains a gradient history that grows with the iteration number. If you lack VRAM on your hardware, you can lower the ``--max_iters`` command line argument to address this.

Open the locally saved image files at ``_out_train_imgs/train_*.png`` to see something like the image below during training.

.. figure:: /content/images/isaac_synth-data_train.gif
    :align: center
    :alt: Instance Segmentation Training

The Code
^^^^^^^^

First, set up the device, dataset, dataloader, model, and optimizer.

.. code-block:: python

    device = "cuda"

    # Setup data
    train_set = RandomObjects(
        args.root, args.categories, num_assets_min=3, num_assets_max=5, max_asset_size=args.max_asset_size
    )
    train_loader = DataLoader(train_set, batch_size=2, collate_fn=lambda x: tuple(zip(*x)))

    # Setup Model
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, num_classes=1 + len(args.categories))
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
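The ``collate_fn`` deserves a brief note: torchvision detection models expect a list of images and a list of targets rather than stacked tensors, so ``lambda x: tuple(zip(*x))`` simply transposes a batch of ``(image, target)`` pairs into a tuple of images and a tuple of targets. A small standalone illustration with dummy data:

.. code-block:: python

    # Standalone illustration of the collate_fn used above, with dummy placeholder data.
    batch = [("img0", {"boxes": "b0"}), ("img1", {"boxes": "b1"})]

    collate = lambda x: tuple(zip(*x))
    images, targets = collate(batch)

    print(images)   # ('img0', 'img1')
    print(targets)  # ({'boxes': 'b0'}, {'boxes': 'b1'})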
Next, set up the training loop. After sending the data to the GPU, perform a forward pass through the model, calculate the loss, and perform a backward pass to update the model weights.

.. code-block:: python

    for i, train_batch in enumerate(train_loader):
        if i > args.max_iters:
            break

        model.train()
        images, targets = train_batch
        images = [i.to(device) for i in images]
        targets = [{k: v.to(device) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)
        loss = sum(loss for loss in loss_dict.values())
        print(f"ITER {i} | {loss:.6f}")

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
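During training, the example periodically visualizes predictions (the ``--visualize`` flag and the ``_out_train_imgs`` output mentioned above). The plotting code itself is in the script; the following is only a sketch of how predictions could be obtained for visualization, with an arbitrarily chosen score threshold:

.. code-block:: python

    import torch

    # Sketch: run the model in inference mode on the current batch and keep
    # only confident detections (the 0.5 threshold is an illustrative choice).
    model.eval()
    with torch.no_grad():
        predictions = model(images)  # list of dicts with "boxes", "labels", "scores", "masks"

    score_thresh = 0.5
    for pred in predictions:
        keep = pred["scores"] > score_thresh
        boxes = pred["boxes"][keep]
        labels = pred["labels"][keep]
        masks = pred["masks"][keep]  # shape (N, 1, H, W), values in [0, 1]
        # These tensors can then be drawn on top of the corresponding input image.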