1. Omniverse Isaac Gym

The Omniverse Isaac Gym extension provides an interface for performing reinforcement learning training and inferencing in Isaac Sim. This framework simplifies the process of connecting reinforcement learning libraries and algorithms with other components in Isaac Sim. Similar to existing frameworks and environment wrapper classes that inherit from gym.Env, the Omniverse Isaac Gym extension also provides an interface inheriting from gym.Env and implements a simple set of APIs required by most common RL libraries. This interface can be used as a bridge connecting RL libraries with physics simulation and tasks running in the Isaac Sim framework.

1.1. Learning Objectives

In this tutorial, we introduce Omniverse Isaac Gym and the interfaces provided in the extension. We will:

  1. Introduce the reinforcement learning ecosystem in Isaac Sim

  2. Introduce the different environment wrapper interfaces in Omniverse Isaac Gym

5 Minute Tutorial

1.2. Getting Started

This is an introductory tutorial that covers the basics of reinforcement learning interfaces provided in Isaac Sim.

1.3. Reinforcement Learning in Isaac Sim

We can view the RL ecosystem as three main pieces: the Task, the RL policy, and the Environment wrapper that provides an interface for communication between the task and the RL policy. We aim to provide the latter with Omniverse Isaac Gym.

1.3.1. Task

The Task is where the main task logic is implemented, such as computing observations and rewards. This is where we collect the states of actors in the scene and apply controls or actions to them. Omniverse Isaac Gym allows tasks to be defined following the BaseTask definition in omni.isaac.core. This provides flexibility for users to re-use task implementations for both RL and non-RL use cases. A minimal sketch of such a task follows.
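As a rough illustration, a task following this pattern might look like the sketch below. The import path and the hook names calculate_metrics and is_done are assumptions for illustration (the exact RL task interface depends on your Isaac Sim version); pre_physics_step and get_observations mirror the hooks referenced in the next section.

```python
# A minimal sketch of an RL-style task built on omni.isaac.core's BaseTask.
# Hook names beyond pre_physics_step/get_observations are illustrative only;
# consult the Isaac Sim API docs for the exact interface in your version.
import numpy as np
from omni.isaac.core.tasks import BaseTask


class CartpoleTask(BaseTask):
    def __init__(self, name="cartpole", offset=None):
        super().__init__(name=name, offset=offset)
        self.num_observations = 4
        self.num_actions = 1

    def set_up_scene(self, scene) -> None:
        # Add robots, sensors, and other assets to the stage here.
        super().set_up_scene(scene)

    def pre_physics_step(self, actions) -> None:
        # Apply the actions received from the RL policy as joint controls.
        pass

    def get_observations(self):
        # Collect actor states from the scene into an observation buffer.
        return np.zeros(self.num_observations, dtype=np.float32)

    def calculate_metrics(self):
        # Compute the reward for the current step.
        return 0.0

    def is_done(self):
        # Signal episode termination (e.g. time limit hit, pole fell over).
        return False
```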

1.3.2. Environment Wrappers

The main purpose of the Omniverse Isaac Gym extension is to provide Environment Wrapper interfaces that allow RL policies to communicate with simulation in Isaac Sim. As the base interface, we provide a class named VecEnvBase, a vectorized interface inheriting from gym.Env that implements common RL APIs. This class can also be easily extended for RL libraries that require additional APIs by creating a derived class.

Commonly used APIs provided by the base wrapper class VecEnvBase include the following (a usage sketch is shown after the list):

  • render(self, mode: str = "human"): renders the current frame

  • close(self): closes the simulator

  • seed(self, seed: int = -1): sets a seed. Use -1 for a random seed.

  • step(self, actions: Union[np.ndarray, torch.Tensor]): triggers task pre_physics_step with actions, steps simulation and renderer, computes observations, rewards, dones, and returns state buffers

  • reset(self): triggers task reset(), steps simulation, and re-computes observations
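Putting these APIs together, a typical single-threaded setup creates the wrapper, attaches a task, and hands the environment to the RL loop. The sketch below is illustrative only: the VecEnvBase import path, the set_task() helper and its backend argument, and the my_tasks module are assumptions rather than APIs confirmed by this tutorial.

```python
# A minimal sketch of the single-threaded flow, assuming VecEnvBase is exposed
# by the omni.isaac.gym extension and provides a set_task() helper.
from omni.isaac.gym.vec_env import VecEnvBase

env = VecEnvBase(headless=True)          # launches Isaac Sim

from my_tasks import CartpoleTask        # hypothetical task module
task = CartpoleTask(name="cartpole")
env.set_task(task, backend="numpy")      # attach the task to the wrapper

# The RL library drives the usual gym.Env loop:
obs = env.reset()
for _ in range(1000):
    # Assumes the task defines an action_space; replace with policy(obs).
    actions = env.action_space.sample()
    obs, rewards, dones, info = env.step(actions)
env.close()
```

Because VecEnvBase implements the gym.Env APIs listed above, the wrapped environment can be passed directly to RL libraries that expect a gym.Env.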

1.3.2.1. Multi-Threaded Environment Wrapper

VecEnvBase is a simple interface designed to provide the commonly used gym.Env APIs required by RL libraries. Users can create an instance of this class, attach their task to the interface, and provide the wrapper instance to the RL policy. Since the RL algorithm maintains the main loop of execution, interaction with the UI and environments in the scene can be limited and may interfere with the training loop.

We also provide another environment wrapper class called VecEnvMT, which is designed to isolate the RL policy in a new thread, separate from the main simulation and rendering thread. This class provides the same set of APIs as VecEnvBase, but also includes threaded queues for sending and receiving actions and states between the RL policy and the task. To use this wrapper interface, users must provide a TrainerMT class that implements a run() method, which initiates the RL loop on a new thread. The setup for using VecEnvMT is more involved than for the single-threaded VecEnvBase interface, but it gives users more control over starting and stopping the training loop through interaction with the UI.

Note that VecEnvMT has a timeout variable, which defaults to 30 seconds. If the RL thread waits longer than the timeout for physics state, or the simulation thread waits longer than the timeout for RL actions, the threaded queues will throw an exception and terminate training. For larger scenes that require longer simulation or training time, increase the timeout to prevent unnecessary timeouts. This can be done by passing a timeout argument when calling VecEnvMT.initialize(), as in the sketch below.
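A rough sketch of the multi-threaded setup follows. Only the existence of TrainerMT, its run() method, and the timeout argument to initialize() come from this tutorial; the import path, the stop() method, the constructor and set_task() arguments, and the remaining initialize() arguments are assumptions for illustration.

```python
# A rough sketch of the multi-threaded flow, assuming VecEnvMT and TrainerMT
# are exposed by the omni.isaac.gym extension. Only run() and the timeout
# argument to initialize() are described in this tutorial; everything else
# is illustrative.
from omni.isaac.gym.vec_env import VecEnvMT, TrainerMT

env = VecEnvMT(headless=False)            # launches Isaac Sim with the UI

from my_tasks import CartpoleTask         # hypothetical task module
task = CartpoleTask(name="cartpole")
env.set_task(task)


class Trainer(TrainerMT):
    """Runs the RL loop on its own thread, separate from simulation/rendering."""

    def run(self):
        # Start the RL training loop; actions and states are exchanged with
        # the simulation thread through the wrapper's threaded queues.
        obs = env.reset()
        for _ in range(1000):
            actions = env.action_space.sample()   # replace with policy(obs)
            obs, rewards, dones, info = env.step(actions)

    def stop(self):
        # Stop the training loop when requested from the UI (assumed hook).
        pass


# Raise the queue timeout (seconds) for large scenes that simulate slowly.
# The positional arguments of initialize() are an assumption; only the
# timeout keyword is described in this tutorial.
env.initialize(Trainer(), timeout=120)
```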

1.4. Summary

This tutorial covered the following topics:

  1. Introduction to RL in Isaac Sim

  2. Introduction to environment wrapper interfaces in Omniverse Isaac Gym

1.4.1. Next Steps

Continue on to the next tutorial in our Reinforcement Learning Tutorials series, Getting Started with Cloner, to learn about using the Cloner interface to generate scenes for reinforcement learning.