2. Domain Randomization for RL

Reinforcement learning agents sometimes need to be robust to physics that differ from those they were trained with, for example when attempting a sim2real policy transfer. With domain randomization (DR), we repeatedly randomize the simulation dynamics during training so that the learned policy performs well across a wide range of physical parameters.

Isaac Sim provides the omni.replicator.isaac extension, which is built on top of the Replicator framework. This extension supports “on the fly” domain randomization of physics attributes, so dynamics can be changed without reloading assets. Randomizations can therefore be applied efficiently, without common overheads such as re-parsing asset files.

2.1. Learning Objectives

In this tutorial, we will focus on domain randomization of physics-related properties that are specific to the RL use case. We will:

  1. Explain different parameters available for DR

  2. Walk through an example of applying DR during simulation

10-15 Minute Tutorial

2.2. Getting Started

  • Refer to Default Python Environment to learn about Isaac Sim’s Python environment and locate the Python executable in Isaac Sim.

2.3. Domain Randomization Parameters

We will first explain what can be randomized in the scene and the sampling distributions. There are five main parameter groups that support randomization. They are:

  • observations: Add noise directly to the agent observations

  • actions: Add noise directly to the agent actions

  • simulation: Add noise to physical parameters defined for the entire scene, such as gravity

  • rigid_prim_views: Add noise to properties belonging to rigid prims, such as material_properties.

  • articulation_views: Add noise to properties belonging to articulations, such as stiffness of joints.

For each parameter you wish to randomize, you can choose among three modes that determine when the randomization is applied:

  • on_reset: Adds correlated noise to a parameter of an environment when that environment gets reset. This correlated noise remains with the environment until that environment is reset again, at which point a new correlated noise is sampled. To trigger on_reset, the indices of the environments that need to be reset must be passed in to omni.replicator.isaac.physics_view.step_randomization(reset_inds).

  • on_interval: Adds uncorrelated noise to a parameter at a frequency specified by frequency_interval. If a parameter also has on_reset randomization, the on_interval noise is combined with the noise applied at on_reset.

  • on_startup: Applies randomization once prior to the start of the simulation. Only available for the rigid prim scale, mass, and density parameters and the articulation scale parameter.
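
The interplay between on_reset and on_interval can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions: ToyRandomizer and its attributes are hypothetical names, not part of omni.replicator.isaac, and the two noise terms are assumed to combine by addition, which the text above does not specify.

```python
import random


class ToyRandomizer:
    """Hypothetical stand-in for one randomized parameter across several environments."""

    def __init__(self, num_envs, frequency_interval, seed=0):
        self.rng = random.Random(seed)
        self.frequency_interval = frequency_interval
        self.steps = [0] * num_envs             # per-environment step counters
        self.reset_noise = [0.0] * num_envs     # correlated: held until the next reset
        self.interval_noise = [0.0] * num_envs  # uncorrelated: redrawn on each interval

    def step(self, reset_inds):
        """Mimics the role of step_randomization(reset_inds) in this toy model."""
        for env in range(len(self.steps)):
            if env in reset_inds:
                # on_reset: draw a new offset and restart this environment's counter
                self.reset_noise[env] = self.rng.uniform(-1.0, 1.0)
                self.steps[env] = 0
            else:
                self.steps[env] += 1
                if self.steps[env] % self.frequency_interval == 0:
                    # on_interval: draw fresh noise every frequency_interval steps
                    self.interval_noise[env] = self.rng.gauss(0.0, 0.1)

    def noise(self, env):
        # Assumption: the two contributions are simply added together
        return self.reset_noise[env] + self.interval_noise[env]
```

In this model, an environment keeps its reset_noise value for a whole episode, while interval_noise changes every frequency_interval steps within that episode.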

For on_reset, on_interval, and on_startup, you can specify the following settings:

  • distribution: The distribution to generate a sample x from. The available distributions are listed below. Note that parameters a and b are defined by the distribution_parameters setting.
    • uniform: x ~ unif(a, b)

    • loguniform: x ~ exp(unif(log(a), log(b)))

    • gaussian: x ~ normal(a, b)

  • distribution_parameters: The parameters to the distribution.
    • For observations and actions, this setting is specified as a tuple [a, b] of real values.

    • For simulation and view parameters, this setting is specified as a nested tuple in the form of [[a_1, a_2, …, a_n], [b_1, b_2, …, b_n]], where n is the dimension of the parameter (e.g. n is 3 for position). It can also be specified as a tuple in the form of [a, b], which will be broadcast to the correct dimensions.

    • For uniform and loguniform distributions, a and b are the lower and upper bounds.

    • For gaussian, a is the distribution mean and b is the variance.

  • operation: Defines how the generated sample x will be applied to the original simulation parameter. The options are additive, scaling, and direct.
    • additive: add the sample to the original value.

    • scaling: multiply the original value by the sample.

    • direct: directly sets the sample as the parameter value.

  • frequency_interval: Specifies the number of steps between successive applications of the randomization.
    • Only used with on_interval.

    • The step count of each environment is incremented with each omni.replicator.isaac.physics_view.step_randomization(reset_inds) call and reset to zero if the environment index is in reset_inds.

  • num_buckets: Only used for material_properties randomization
    • PhysX only allows 64000 unique physics materials in the scene at once. If more than 64000 materials are needed, increase num_buckets to allow materials to be shared between prims.
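
The distribution, distribution_parameters, and operation settings can be summarized with a small plain-Python sketch. The helper names broadcast, sample, and apply_operation are hypothetical and chosen for illustration; they are not the extension's API. The gaussian branch assumes b is a variance, as stated above, and therefore takes its square root before sampling.

```python
import math
import random


def broadcast(params, dim):
    """Expand [a, b] to per-dimension lists; pass the nested form through unchanged."""
    a, b = params
    if isinstance(a, (int, float)):
        return [a] * dim, [b] * dim
    return list(a), list(b)


def sample(distribution, params, dim, rng):
    """Draw one sample x of length dim from the named distribution."""
    a, b = broadcast(params, dim)
    if distribution == "uniform":
        return [rng.uniform(lo, hi) for lo, hi in zip(a, b)]
    if distribution == "loguniform":
        return [math.exp(rng.uniform(math.log(lo), math.log(hi))) for lo, hi in zip(a, b)]
    if distribution == "gaussian":
        # Assumption: b is the variance, so the standard deviation is sqrt(b)
        return [rng.gauss(mean, math.sqrt(var)) for mean, var in zip(a, b)]
    raise ValueError(f"unknown distribution: {distribution}")


def apply_operation(operation, original, x):
    """Combine the sample x with the original parameter value."""
    if operation == "additive":
        return [o + s for o, s in zip(original, x)]
    if operation == "scaling":
        return [o * s for o, s in zip(original, x)]
    if operation == "direct":
        return list(x)
    raise ValueError(f"unknown operation: {operation}")


rng = random.Random(0)
scale = sample("loguniform", [0.5, 2.0], 3, rng)  # [a, b] broadcast to three dimensions
new_value = apply_operation("scaling", [1.0, 1.0, 1.0], scale)
```

A loguniform distribution is a natural fit for scaling, because halving and doubling are then equally likely.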

The exact parameters that can be randomized are listed below:

simulation:

  • gravity (dim=3): The gravity vector of the entire scene.

rigid_prim_views:

  • position (dim=3): The position of the rigid prim. In meters.

  • orientation (dim=3): The orientation of the rigid prim, specified with Euler angles. In radians.

  • linear_velocity (dim=3): The linear velocity of the rigid prim. In m/s. CPU pipeline only.

  • angular_velocity (dim=3): The angular velocity of the rigid prim. In rad/s. CPU pipeline only.

  • velocity (dim=6): The linear + angular velocity of the rigid prim.

  • force (dim=3): Apply a force to the rigid prim. In N.

  • mass (dim=1): Mass of the rigid prim. In kg. CPU pipeline only during runtime.

  • inertia (dim=3): The diagonal values of the inertia matrix. CPU pipeline only.

  • material_properties (dim=3): Static friction, Dynamic friction, and Restitution.

  • contact_offset (dim=1): A small distance from the surface of the collision geometry at which contacts start being generated.

  • rest_offset (dim=1): A small distance from the surface of the collision geometry at which the effective contact with the shape takes place.

  • scale (dim=1): The scale of the rigid prim. on_startup only.

  • density (dim=1): Density of the rigid prim. on_startup only.

articulation_views:

  • position (dim=3): The position of the articulation root. In meters.

  • orientation (dim=3): The orientation of the articulation root, specified with Euler angles. In radians.

  • linear_velocity (dim=3): The linear velocity of the articulation root. In m/s. CPU pipeline only.

  • angular_velocity (dim=3): The angular velocity of the articulation root. In rad/s. CPU pipeline only.

  • velocity (dim=6): The linear + angular velocity of the articulation root.

  • stiffness (dim=num_dof): The stiffness of the joints.

  • damping (dim=num_dof): The damping of the joints.

  • joint_friction (dim=num_dof): The friction coefficient of the joints.

  • joint_positions (dim=num_dof): The joint positions. In radians or meters.

  • joint_velocities (dim=num_dof): The joint velocities. In rad/s or m/s.

  • lower_dof_limits (dim=num_dof): The lower limit of the joints. In radians or meters.

  • upper_dof_limits (dim=num_dof): The upper limit of the joints. In radians or meters.

  • max_efforts (dim=num_dof): The maximum force or torque that the joints can exert. In N or Nm.

  • joint_armatures (dim=num_dof): A value added to the diagonal of the joint-space inertia matrix. Physically, it corresponds to the rotating part of a motor.

  • joint_max_velocities (dim=num_dof): The maximum velocity allowed on the joints. In rad/s or m/s.

  • joint_efforts (dim=num_dof): Applies a force or a torque on the joints. In N or Nm.

  • body_masses (dim=num_bodies): The mass of each body in the articulation. In kg. CPU pipeline only.

  • body_inertias (dim=num_bodies×3): The diagonal values of the inertia matrix of each body. CPU pipeline only.

  • material_properties (dim=num_bodies×3): The static friction, dynamic friction, and restitution of each body in the articulation, specified in the following order: [body_1_static_friction, body_1_dynamic_friction, body_1_restitution, body_2_static_friction, body_2_dynamic_friction, body_2_restitution, … ]

  • contact_offset (dim=1): A small distance from the surface of the collision geometry at which contacts start being generated.

  • rest_offset (dim=1): A small distance from the surface of the collision geometry at which the effective contact with the shape takes place.

  • tendon_stiffnesses (dim=num_tendons): The stiffness of the fixed tendons in the articulation.

  • tendon_dampings (dim=num_tendons): The damping of the fixed tendons in the articulation.

  • tendon_limit_stiffnesses (dim=num_tendons): The limit stiffness of the fixed tendons in the articulation.

  • tendon_lower_limits (dim=num_tendons): The lower limits of the fixed tendons in the articulation.

  • tendon_upper_limits (dim=num_tendons): The upper limits of the fixed tendons in the articulation.

  • tendon_rest_lengths (dim=num_tendons): The rest lengths of the fixed tendons in the articulation.

  • tendon_offsets (dim=num_tendons): The offsets of the fixed tendons in the articulation.

  • scale (dim=1): The scale of the articulation. on_startup only.
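
The flat, body-major ordering used by the articulation material_properties parameter can be sketched as follows. The helper names are hypothetical and serve only to show how per-body triples map to a vector of length num_bodies×3.

```python
def flatten_material_properties(per_body):
    """Flatten (static_friction, dynamic_friction, restitution) triples, body by body."""
    flat = []
    for static_friction, dynamic_friction, restitution in per_body:
        flat.extend([static_friction, dynamic_friction, restitution])
    return flat


def body_slice(flat, body_index):
    """Return the three material values that belong to one body (0-based index)."""
    start = 3 * body_index
    return flat[start:start + 3]


# Two bodies: the first more slippery than the second (illustrative values only)
flat = flatten_material_properties([(0.2, 0.1, 0.0), (0.9, 0.8, 0.3)])
second_body = body_slice(flat, 1)
```

The same layout applies to the lower and upper bounds passed as distribution parameters, so each bound vector also has num_bodies×3 entries.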

2.4. Domain Randomization Example

An example of applying domain randomization is provided at standalone_examples/api/omni.replicator.isaac/randomization_demo.py. To execute the example, please use the python executable provided by Isaac Sim, and run:

python.sh standalone_examples/api/omni.replicator.isaac/randomization_demo.py

Let’s walk through the key randomization components of this script.

First, we import both the omni.replicator.isaac extension and the omni.replicator.core extension. omni.replicator.isaac provides the APIs to randomize physics-specific attributes that are commonly required for domain randomization in reinforcement learning, while omni.replicator.core provides the core randomization APIs, such as the various distributions.

import omni.replicator.isaac as dr
import omni.replicator.core as rep

Next, we need to register our simulation world and the object views in the scene to which we want to apply randomization. In this example, the objects of interest are our spheres and our Franka robots.

dr.physics_view.register_simulation_context(world)
dr.physics_view.register_rigid_prim_view(object_view)
dr.physics_view.register_articulation_view(franka_view)

We can then set up the parameters that we’d like to randomize, along with when to randomize, and how to sample the randomization values. First, we define an event trigger dr.trigger.on_rl_frame(num_envs=num_envs), which allows us to increment the internal randomization counter by calling dr.physics_view.step_randomization(reset_inds). Notice how we can pass in a set of environment indices when stepping the randomization counter. This allows us to specify which environments to apply randomizations for.

with dr.trigger.on_rl_frame(num_envs=num_envs):
    ...

while simulation_app.is_running():
    ...
    dr.physics_view.step_randomization(reset_inds)
    world.step(render=True)
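
How reset_inds is computed is left to the training code. One common pattern, sketched here in plain Python under the assumption that each environment is reset after a fixed episode length, is to track a per-environment step count and collect the indices that have reached the limit. The helper names collect_reset_inds and advance are hypothetical and not part of omni.replicator.isaac.

```python
def collect_reset_inds(episode_steps, max_episode_length):
    """Return the indices of environments whose episode has reached its maximum length."""
    return [env for env, steps in enumerate(episode_steps) if steps >= max_episode_length]


def advance(episode_steps, reset_inds):
    """Restart the counters of reset environments and increment all the others."""
    return [0 if env in reset_inds else steps + 1 for env, steps in enumerate(episode_steps)]


episode_steps = [0, 5, 9]  # three environments at different points in their episodes
reset_inds = collect_reset_inds(episode_steps, max_episode_length=9)
episode_steps = advance(episode_steps, reset_inds)
```

In the loop above, the resulting reset_inds list would be the value passed to dr.physics_view.step_randomization(reset_inds).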

With this setup, we control when the randomization counter is incremented and which environments are randomized. To control the randomization frequency, we can specify two types of gates. The first is dr.gate.on_interval(interval=20), which triggers when the internal randomization counter reaches the specified interval, in this case 20. The second is dr.gate.on_env_reset(), which triggers for the environment indices that we passed in when stepping the randomization. We can define multiple gates for the different properties to randomize, including properties of simulation_context, rigid_prim_view, and articulation_view. For each randomization, we can also specify the view to which it is applied, the property to randomize (indicated by the argument name), and the desired operation and distribution.

with dr.trigger.on_rl_frame(num_envs=num_envs):
    with dr.gate.on_interval(interval=20):
        dr.physics_view.randomize_simulation_context(
            operation="scaling", gravity=rep.distribution.uniform((1, 1, 0.0), (1, 1, 2.0))
        )
    with dr.gate.on_interval(interval=50):
        dr.physics_view.randomize_rigid_prim_view(
            view_name=object_view.name, operation="direct", force=rep.distribution.uniform((0, 0, 2.5), (0, 0, 5.0))
        )
    with dr.gate.on_interval(interval=10):
        dr.physics_view.randomize_articulation_view(
            view_name=franka_view.name,
            operation="direct",
            joint_velocities=rep.distribution.uniform(tuple([-2] * num_dof), tuple([2] * num_dof)),
        )
    with dr.gate.on_env_reset():
        dr.physics_view.randomize_rigid_prim_view(
            view_name=object_view.name,
            operation="additive",
            position=rep.distribution.normal((0.0, 0.0, 0.0), (0.2, 0.2, 0.0)),
            velocity=[0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        )
        dr.physics_view.randomize_articulation_view(
            view_name=franka_view.name,
            operation="additive",
            joint_positions=rep.distribution.uniform(tuple([-0.5] * num_dof), tuple([0.5] * num_dof)),
            position=rep.distribution.normal((0.0, 0.0, 0.0), (0.2, 0.2, 0.0)),
        )
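
To get a feel for what the gravity randomization above can produce, the reachable range can be worked out by hand. The sketch below assumes a default gravity vector of (0, 0, -9.81) m/s² with Z up, which is an assumption and not something the script states. scaled_range is a hypothetical helper.

```python
def scaled_range(original, low, high):
    """Per-component (min, max) reachable when original is multiplied by a sample in [low, high]."""
    bounds = []
    for value, lo, hi in zip(original, low, high):
        ends = (value * lo, value * hi)
        bounds.append((min(ends), max(ends)))
    return bounds


gravity = (0.0, 0.0, -9.81)  # assumed default gravity, Z-up, in m/s^2
bounds = scaled_range(gravity, (1, 1, 0.0), (1, 1, 2.0))
z_min, z_max = bounds[2]  # z gravity ranges from twice the default down to zero
```

Under that assumption, the x and y components are unchanged, because their scale factor is fixed at 1, and the z component varies between zero gravity and twice the default.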

Finally, to initiate the randomization graph, we call:

rep.orchestrator.run()

2.5. Summary

This tutorial covered the following topics:

  1. Understanding domain randomization parameters for RL

  2. Running an example of applying domain randomization

2.5.1. Next Steps

Continue on to the next tutorial in our Reinforcement Learning Tutorials series, Transferring Policies from Isaac Gym Preview Releases, to read about tips on transferring policies from standalone Isaac Gym to Isaac Sim.

2.5.2. Further Learning

  • For more details on the domain randomization framework in OmniIsaacGymEnvs, please refer to the Domain Randomization page in OmniIsaacGymEnvs.