RL [omni.isaac.gym]

Base Environment Wrapper

class VecEnvBase(headless: bool, sim_device: int = 0, enable_livestream: bool = False, enable_viewport: bool = False, launch_simulation_app: bool = True, experience: Optional[str] = None)

This class provides a base interface for connecting RL policies with task implementations. APIs provided in this interface follow the interface in gym.Env. This class also provides utilities for initializing simulation apps, creating the World, and registering a task.
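
A minimal usage sketch (the import path below reflects the extension's typical layout and should be checked against your Isaac Sim install); constructing the wrapper launches the SimulationApp, so create it before importing other omni.isaac modules that require a running app:

    from omni.isaac.gym.vec_env import VecEnvBase   # import path assumed

    # Headless environment for training; set headless=False and
    # enable_viewport=True for an interactive session instead.
    env = VecEnvBase(headless=True)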

action_space: spaces.Space[ActType]
close() None

Closes simulation.

create_viewport_render_product(resolution=(1280, 720))

Create a render product of the viewport for rendering.

metadata: dict[str, Any] = {'render_modes': []}
property np_random: numpy.random._generator.Generator

Returns the environment’s internal _np_random generator, initialising it with a random seed if it has not been set.

Returns

Instance of np.random.Generator

property num_envs

Retrieves number of environments.

Returns

Number of environments.

Return type

num_envs(int)

observation_space: spaces.Space[ObsType]
render(mode='human') None

Run rendering without stepping through the physics.

By convention, if mode is:
  • human: render to the current display and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

Parameters

mode (str, optional) – The mode to render with. Defaults to “human”.

property render_enabled

Whether rendering is enabled.

Returns

True if rendering is enabled.

Return type

render_enabled (bool)

render_mode: str | None = None
reset(seed=None, options=None)

Resets the task and updates observations.

Parameters
  • seed (Optional[int]) – Seed.

  • options (Optional[dict]) – Options as used in gymnasium.

Returns
  • observations (Union[numpy.ndarray, torch.Tensor]) – Buffer of observation data.

  • info (dict) – Dictionary of extras data.
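
For illustration, a gymnasium-style reset on an environment whose task has already been registered via set_task (see below); the seed value is arbitrary:

    # Returns the observation buffer and an extras dictionary.
    obs, info = env.reset(seed=42)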

reward_range = (-inf, inf)
seed(seed=-1)

Sets a seed. Pass in -1 for a random seed.

Parameters

seed (int) – Seed to set. Defaults to -1.

Returns

Seed that was set.

Return type

seed (int)

set_task(task, backend='numpy', sim_params=None, init_sim=True, rendering_dt=0.016666666666666666) None

Creates a World object and adds the Task to the World.

Initializes and registers the task to the environment interface and triggers task start-up.

Parameters
  • task (RLTask) – The task to register to the env.

  • backend (str) – Backend to use for task. Can be “numpy” or “torch”. Defaults to “numpy”.

  • sim_params (dict) – Simulation parameters for physics settings. Defaults to None.

  • init_sim (Optional[bool]) – Automatically starts simulation. Defaults to True.

  • rendering_dt (Optional[float]) – dt for rendering. Defaults to 1/60s.
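
A sketch of registering a task, assuming a hypothetical CartpoleTask subclass of RLTask; the sim_params keys shown are placeholders whose accepted values depend on the physics configuration in use:

    from my_project.tasks.cartpole import CartpoleTask   # hypothetical task module

    task = CartpoleTask(name="Cartpole")
    env.set_task(
        task,
        backend="torch",                        # torch buffers instead of numpy
        sim_params={"use_gpu_pipeline": True},  # placeholder physics settings
        init_sim=True,                          # start simulation immediately
    )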

signal_handler(sig, frame)
property simulation_app

Retrieves the SimulationApp object.

Returns

SimulationApp.

Return type

simulation_app(SimulationApp)

spec: EnvSpec | None = None
step(actions)

Basic implementation for stepping simulation.

Can be overridden by inherited Env classes to satisfy the requirements of specific RL libraries. This method passes actions to the task for processing, steps the simulation, and computes observations, rewards, and resets.

Parameters

actions (Union[numpy.ndarray, torch.Tensor]) – Actions buffer from policy.

Returns
  • observations (Union[numpy.ndarray, torch.Tensor]) – Buffer of observation data.

  • rewards (Union[numpy.ndarray, torch.Tensor]) – Buffer of rewards data.

  • dones (Union[numpy.ndarray, torch.Tensor]) – Buffer of resets/dones data.

  • info (dict) – Dictionary of extras data.
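
A sketch of the resulting gym-style interaction loop; policy is a placeholder callable that maps the observation buffer to an actions buffer of the shape the task expects:

    obs, info = env.reset()
    for _ in range(1000):
        actions = policy(obs)                          # hypothetical policy
        obs, rewards, dones, info = env.step(actions)  # 4-tuple as documented above
    env.close()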

property task

Retrieves the task.

Returns

Task.

Return type

task(BaseTask)

property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]

Returns the base non-wrapped environment.

Returns

The base non-wrapped gymnasium.Env instance

Return type

Env

update_task_params()
property world

Retrieves the World object for simulation.

Returns

Simulation World.

Return type

world(World)

Multi-Threaded Environment Wrapper

exception TaskStopException

Exception class for signalling task termination.

args
with_traceback()

Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.

class TrainerMT

A base abstract trainer class for controlling starting and stopping of RL policy.

abstract run()

Runs the RL loop in a new thread.

abstract stop()

Stops the RL thread.
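
A minimal TrainerMT subclass sketch, assuming the class is importable from the vec_env module; run() launches a placeholder policy loop on its own thread and stop() signals it to exit:

    import threading

    from omni.isaac.gym.vec_env import TrainerMT   # import path assumed

    class SimpleTrainer(TrainerMT):
        def __init__(self, env):
            self.env = env
            self._stop = threading.Event()
            self._thread = None

        def run(self):
            def _loop():
                obs, info = self.env.reset()
                while not self._stop.is_set():
                    actions = policy(obs)                           # hypothetical policy
                    obs, rew, dones, info = self.env.step(actions)
            self._thread = threading.Thread(target=_loop, daemon=True)
            self._thread.start()

        def stop(self):
            self._stop.set()
            if self._thread is not None:
                self._thread.join()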

class VecEnvMT(headless: bool, sim_device: int = 0, enable_livestream: bool = False, enable_viewport: bool = False, launch_simulation_app: bool = True, experience: Optional[str] = None)

This class provides a base interface for connecting RL policies with task implementations in a multi-threaded fashion. RL policies using this class run on a different thread than the one the simulation runs on, which can be useful for interacting with the UI before, during, and after running RL policies. Data sharing between the threads happens through message passing on multi-threaded queues.

action_space: spaces.Space[ActType]
clear_queues()

Clears all queues.

close() None

Closes simulation.

create_viewport_render_product(resolution=(1280, 720))

Create a render product of the viewport for rendering.

get_actions(block=True)

Retrieves actions from policy by waiting for actions to be sent to the queue from the RL thread.

Parameters

block (Optional[bool]) – Whether to block thread when waiting for data.

Returns

Actions buffer retrieved from the queue.

Return type

actions (Union[np.ndarray, torch.Tensor, None])

get_data(block=True)

Retrieves data from task by waiting for data dictionary to be sent to the queue from the simulation thread.

Parameters

block (Optional[bool]) – Whether to block thread when waiting for data.

Returns

Data dictionary retrieved from the queue.

Return type

data (Union[dict, None])

initialize(action_queue, data_queue, timeout=30)

Initializes queues for sharing data across threads.

Parameters
  • action_queue (queue.Queue) – Queue for passing actions from policy to task.

  • data_queue (queue.Queue) – Queue for passing data from task to policy.

  • timeout (Optional[int]) – Seconds to wait for data when queue is empty. An exception will be thrown when the timeout limit is reached. Defaults to 30 seconds.
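
An illustrative queue setup; MyVecEnvMT stands in for a concrete VecEnvMT subclass:

    import queue

    env = MyVecEnvMT(headless=True)         # hypothetical VecEnvMT subclass
    action_queue = queue.Queue(maxsize=1)   # policy -> simulation
    data_queue = queue.Queue(maxsize=1)     # simulation -> policy
    env.initialize(action_queue, data_queue, timeout=30)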

metadata: dict[str, Any] = {'render_modes': []}
property np_random: numpy.random._generator.Generator

Returns the environment’s internal _np_random generator, initialising it with a random seed if it has not been set.

Returns

Instance of np.random.Generator

property num_envs

Retrieves number of environments.

Returns

Number of environments.

Return type

num_envs(int)

observation_space: spaces.Space[ObsType]
render(mode='human') None

Run rendering without stepping through the physics.

By convention, if mode is:
  • human: render to the current display and return nothing. Usually for human consumption.

  • rgb_array: Return a numpy.ndarray with shape (x, y, 3), representing RGB values for an x-by-y pixel image, suitable for turning into a video.

Parameters

mode (str, optional) – The mode to render with. Defaults to “human”.

property render_enabled

Whether rendering is enabled.

Returns

True if rendering is enabled.

Return type

render_enabled (bool)

render_mode: str | None = None
reset(seed=None, options=None)

Resets the task and updates observations.

Parameters
  • seed (Optional[int]) – Seed.

  • options (Optional[dict]) – Options as used in gymnasium.

Returns
  • observations (Union[numpy.ndarray, torch.Tensor]) – Buffer of observation data.

  • info (dict) – Dictionary of extras data.

reward_range = (-inf, inf)
async run(trainer)

Main loop for controlling simulation and task stepping. This method is responsible for stepping the task and simulation, collecting buffers from the task, sending data to the policy, and retrieving actions from the policy. It also handles the case where the policy terminates on completion, keeping the simulation thread running so that the UI is not affected.

Parameters

trainer (TrainerMT) – A Trainer object that implements APIs for starting and stopping RL thread.
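
Since run() is a coroutine, one way to drive it is to schedule it on the application's asyncio loop once the queues are initialized; this is only a sketch, and the exact scheduling depends on how the hosting script runs its event loop:

    import asyncio

    trainer = SimpleTrainer(env)             # TrainerMT subclass from the sketch above
    asyncio.ensure_future(env.run(trainer))  # steps simulation and exchanges data with the RL thread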

seed(seed=-1)

Sets a seed. Pass in -1 for a random seed.

Parameters

seed (int) – Seed to set. Defaults to -1.

Returns

Seed that was set.

Return type

seed (int)

send_actions(actions, block=True)

Sends actions from RL thread to simulation thread by adding actions to queue.

Parameters
  • actions (Union[np.ndarray, torch.Tensor]) – actions buffer to be added to queue.

  • block (Optional[bool]) – Whether to block thread when writing to queue.

send_data(data, block=True)

Sends data from task thread to RL thread by adding data to queue.

Parameters
  • data (dict) – Dictionary containing task data.

  • block (Optional[bool]) – Whether to block thread when writing to queue.
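
The sketch below shows one round of the queue hand-off as seen from the RL thread; the dictionary keys are placeholders, since the actual contents are whatever the simulation thread packs into the data dictionary:

    env.send_actions(actions, block=True)   # policy -> simulation queue
    data = env.get_data(block=True)         # simulation -> policy queue
    if data is not None:
        obs = data.get("obs")               # placeholder key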

set_render_mode(render_mode)
set_task(task, backend='numpy', sim_params=None, init_sim=True, rendering_dt=0.016666666666666666) None

Creates a World object and adds the Task to the World.

Initializes and registers the task to the environment interface and triggers task start-up.

Parameters
  • task (RLTask) – The task to register to the env.

  • backend (str) – Backend to use for task. Can be “numpy” or “torch”. Defaults to “numpy”.

  • sim_params (dict) – Simulation parameters for physics settings. Defaults to None.

  • init_sim (Optional[bool]) – Automatically starts simulation. Defaults to True.

  • rendering_dt (Optional[float]) – dt for rendering. Defaults to 1/60s.

signal_handler(sig, frame)
property simulation_app

Retrieves the SimulationApp object.

Returns

SimulationApp.

Return type

simulation_app(SimulationApp)

spec: EnvSpec | None = None
step(actions)

Basic implementation for stepping simulation.

Can be overridden by inherited Env classes to satisfy the requirements of specific RL libraries. This method passes actions to the task for processing, steps the simulation, and computes observations, rewards, and resets.

Parameters

actions (Union[numpy.ndarray, torch.Tensor]) – Actions buffer from policy.

Returns
  • observations (Union[numpy.ndarray, torch.Tensor]) – Buffer of observation data.

  • rewards (Union[numpy.ndarray, torch.Tensor]) – Buffer of rewards data.

  • dones (Union[numpy.ndarray, torch.Tensor]) – Buffer of resets/dones data.

  • info (dict) – Dictionary of extras data.

property task

Retrieves the task.

Returns

Task.

Return type

task(BaseTask)

property unwrapped: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType]

Returns the base non-wrapped environment.

Returns

The base non-wrapped gymnasium.Env instance

Return type

Env

update_task_params()
property world

Retrieves the World object for simulation.

Returns

Simulation World.

Return type

world(World)