Camera

Camera sensors can be created by adding a camera prim to the USD file. Data can then be acquired from the camera prim through the use of render products.

Render products can be created through multiple Omniverse extensions, such as the viewport and Replicator.
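
For example, a render product can be created programmatically with the Replicator extension. The snippet below is a minimal sketch; the camera path and resolution are placeholder values:

import omni.replicator.core as rep

# Create a render product that renders the given camera prim at a fixed resolution
render_product = rep.create.render_product("/World/camera", (512, 512))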

Note

The Isaac Sim cameras are built on top of the Omniverse cameras.

GUI Example

  1. Create a cube by selecting Create > Shape > Cube and change its location and scale through the property panel as indicated in the screenshot below.

    ../../_images/isaac_sim_camera_sensor_1.png
  2. Create a camera prim by selecting Create > Camera and then select it from the stage window to view its field of view as indicated below.

    ../../_images/isaac_sim_camera_sensor_2.png
  3. To render frames from the camera, switch the default viewport (which is itself a render product) to the camera prim we just created. Click the video icon at the top of the viewport window, then select the new camera prim under the Cameras menu (a Python equivalent is sketched after this list).

    ../../_images/isaac_sim_camera_sensor_3.png
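
For reference, the viewport switch from step 3 can also be done from Python. This is a sketch using omni.kit.viewport.utility and assumes the camera prim was created at /World/Camera:

from omni.kit.viewport.utility import get_active_viewport

# Point the active viewport (a render product) at the new camera prim
viewport = get_active_viewport()
viewport.camera_path = "/World/Camera"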

Python Example

Data can be retrieved from the render product attached to the camera prim using multiple extensions in Isaac Sim. One of the methods provided is the Camera class in the omni.isaac.sensor extension. Run the script below as a standalone application; it is also provided at standalone_examples/api/omni.isaac.sensor/camera.py.

from omni.isaac.kit import SimulationApp

simulation_app = SimulationApp({"headless": False})

from omni.isaac.core.objects import DynamicCuboid
from omni.isaac.sensor import Camera
from omni.isaac.core import World
import omni.isaac.core.utils.numpy.rotations as rot_utils
import numpy as np
import matplotlib.pyplot as plt


my_world = World(stage_units_in_meters=1.0)

cube_2 = my_world.scene.add(
    DynamicCuboid(
        prim_path="/new_cube_2",
        name="cube_1",
        position=np.array([5.0, 3, 1.0]),
        scale=np.array([0.6, 0.5, 0.2]),
        size=1.0,
        color=np.array([255, 0, 0]),
    )
)

cube_3 = my_world.scene.add(
    DynamicCuboid(
        prim_path="/new_cube_3",
        name="cube_2",
        position=np.array([-5, 1, 3.0]),
        scale=np.array([0.1, 0.1, 0.1]),
        size=1.0,
        color=np.array([0, 0, 255]),
        linear_velocity=np.array([0, 0, 0.4]),
    )
)

camera = Camera(
    prim_path="/World/camera",
    position=np.array([0.0, 0.0, 25.0]),
    frequency=20,
    resolution=(256, 256),
    orientation=rot_utils.euler_angles_to_quats(np.array([0, 90, 0]), degrees=True),
)

my_world.scene.add_default_ground_plane()
my_world.reset()
camera.initialize()

i = 0
camera.add_motion_vectors_to_frame()

while simulation_app.is_running():
    my_world.step(render=True)
    print(camera.get_current_frame())
    if i == 100:
        points_2d = camera.get_image_coords_from_world_points(
            np.array([cube_3.get_world_pose()[0], cube_2.get_world_pose()[0]])
        )
        points_3d = camera.get_world_points_from_image_coords(points_2d, np.array([24.94, 24.9]))
        print(points_2d)
        print(points_3d)
        imgplot = plt.imshow(camera.get_rgba()[:, :, :3])
        plt.show()
        print(camera.get_current_frame()["motion_vectors"])
    if my_world.is_playing():
        if my_world.current_time_step_index == 0:
            my_world.reset()
    i += 1


simulation_app.close()
../../_images/isaac_sim_camera_sensor_4.png
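
Motion vectors are only one of the annotators that the Camera class can attach. As a sketch based on the omni.isaac.sensor Camera API, additional data channels can be added to the frame in the same way before stepping the world:

camera.add_distance_to_image_plane_to_frame()   # per-pixel depth along the view axis
camera.add_semantic_segmentation_to_frame()     # requires semantic labels on the scene prims

# After stepping the world, the extra channels appear in the frame dictionary
depth = camera.get_current_frame()["distance_to_image_plane"]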

Calibrated Camera Sensors

Isaac Sim approximates real camera sensors by ray tracing the scene and rendering the result. To achieve this, the size of the camera sensor and the optical path need to be defined.

Calibration toolkits like OpenCV or ROS normally provide the calibration parameters in the form of an intrinsic matrix and distortion coefficients, while Isaac Sim uses physical units for the camera sensor size and focal length. To convert the calibration parameters to Isaac Sim units, the following example can be used:

# OpenCV camera matrix and width and height of the camera sensor, from the calibration file
width, height = 1920, 1200
camera_matrix = [[958.8, 0.0, 957.8], [0.0, 956.7, 589.5], [0.0, 0.0, 1.0]]

# Pixel size in microns, aperture and focus distance from the camera sensor specification
# Note: to disable the depth of field effect, set the f_stop to 0.0. This is useful for debugging.
pixel_size = 3 * 1e-3   # in mm, 3 microns is a common pixel size for high resolution cameras
f_stop = 1.8            # f-number, the ratio of the lens focal length to the diameter of the entrance pupil
focus_distance = 0.6    # in meters, the distance from the camera to the object plane

# Calculate the focal length and aperture size from the camera matrix
((fx, _, cx), (_, fy, cy), (_, _, _)) = camera_matrix
horizontal_aperture = pixel_size * width    # The aperture size in mm
vertical_aperture = pixel_size * height
focal_length_x = fx * pixel_size
focal_length_y = fy * pixel_size
focal_length = (focal_length_x + focal_length_y) / 2    # The focal length in mm

# Set the camera parameters, note the unit conversion between Isaac Sim sensor and Kit
camera.set_focal_length(focal_length / 10.0)                # Convert from mm to cm (or 1/10th of a world unit)
camera.set_focus_distance(focus_distance)                   # The focus distance in meters
camera.set_lens_aperture(f_stop * 100.0)                    # Convert the f-stop to Isaac Sim units
camera.set_horizontal_aperture(horizontal_aperture / 10.0)  # Convert from mm to cm (or 1/10th of a world unit)
camera.set_vertical_aperture(vertical_aperture / 10.0)

camera.set_clipping_range(0.05, 1.0e5)
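
As a quick sanity check (an illustrative computation, not part of the original script), the horizontal field of view implied by these values can be computed with the pinhole relation and compared against the sensor datasheet:

import numpy as np

# Pinhole horizontal FOV: 2 * atan((aperture / 2) / focal_length)
horizontal_fov = 2 * np.arctan2(horizontal_aperture / 2, focal_length)
print(np.degrees(horizontal_fov))   # roughly 90 degrees for the example values above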

Note that the script above is an initial approximation: it assumes that the optical center is at the center of the image plane and that the pixels are square. To extend this approximation, a distortion model needs to be added. The following models are supported by the rendering engine:

Note

To support intrinsics where the optical center is not at the center of the image plane, it is necessary to add a distortion model. Non-square pixels are not supported by the rendering engine.

Camera Models

Model                    Description
Pinhole (rectilinear)    \(r = f \cdot \tan(\theta)\)
Fisheye Orthographic     \(r = f \cdot \sin(\theta)\)
Equidistant              \(r = f \cdot \theta\)
Equisolid (equal area)   \(r = 2f \cdot \sin(\theta/2)\)
Polynomial (f-theta)     \(\theta = A + Br + Cr^2 + Dr^3 + Er^4\)
Kannala Brandt 2006      \(r = \theta + A\theta^3 + B\theta^5 + C\theta^7 + D\theta^9\)
RadTanThinPrism          \(\theta = A + Br + Cr^2 + Dr^3 + Er^4\), plus tangential and thin prism distortion support

Distortion parameters are normally provided by calibration toolkits in the form of distortion coefficients, the most popular models being Rational Polynomial, Brown Conrady, and Fisheye. These models are currently supported by approximating them with the f-theta model.

Below are a few examples of how to convert calibration parameters from the OpenCV and ROS calibration toolkits to Isaac Sim units. The examples assume that the camera sensor pixels are square.

OpenCV Rational Polynomial Example

To convert OpenCV Rational Polynomial calibration parameters to Isaac Sim units, the following example can be used (the full script is also provided as a standalone application at standalone_examples/api/omni.isaac.sensor/camera_opencv.py):

# Given the OpenCV camera matrix and distortion coefficients (Rational Polynomial model),
width, height = 1920, 1200
camera_matrix = [[958.8, 0.0, 957.8], [0.0, 956.7, 589.5], [0.0, 0.0, 1.0]]
distortion_coefficients = [0.14, -0.03, -0.0002, -0.00003, 0.009, 0.5, -0.07, 0.017]

# Define the field of view to be rendered (diagonal), pixels outside that FOV will be black
diagonal_fov = 140

# Take the optical center in pixels from the camera matrix, and set the distortion coefficients
((fx, _, cx), (_, fy, cy), (_, _, _)) = camera_matrix
camera.set_projection_type("fisheyePolynomial")           # f-theta model, to approximate the model
camera.set_rational_polynomial_properties(width, height, cx, cy, diagonal_fov, distortion_coefficients)

OpenCV Fisheye Calibration Example

To convert the OpenCV Fisheye calibration parameters to the Isaac Sim model, the following example can be used (see standalone_examples/api/omni.isaac.sensor/camera_opencv_fisheye.py for the complete example):

# Given the OpenCV camera matrix and distortion coefficients (Fisheye model),
width, height = 1920, 1200
camera_matrix = [[455.8, 0.0, 943.8], [0.0, 454.7, 602.3], [0.0, 0.0, 1.0]]
distortion_coefficients = [0.05, 0.01, -0.003, -0.0005]

# Define the field of view to be rendered (diagonal), pixels outside that FOV will be black
diagonal_fov = 235

# Take the optical center in pixels from the camera matrix, and set the distortion coefficients
((fx, _, cx), (_, fy, cy), (_, _, _)) = camera_matrix
camera.set_projection_type("fisheyePolynomial")           # f-theta model, to approximate the fisheye model
camera.set_kannala_brandt_properties(width, height, cx, cy, diagonal_fov, distortion_coefficients)

ROS Calibration Example

In this example, a printout of a ROS topic containing the intrinsic and extrinsic parameters of the camera will be used to create a camera sensor in Isaac Sim. It assumes that a physical camera sensor was calibrated by the vendor or with the ROS camera calibration toolkit, and that the calibration is published as a ROS topic.

As a first step, print the camera_info topic with:

rostopic echo /camera/color/camera_info

The output will look similar to the following (Intel® RealSense™ Depth Camera D435i is used here as an example):

frame_id: "camera_color_optical_frame"
height: 480
width: 640
distortion_model: "rational_polynomial"
D: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
K: [612.4178466796875, 0.0, 309.72296142578125, 0.0, 612.362060546875, 245.35870361328125, 0.0, 0.0, 1.0]
R: [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
P: [612.4178466796875, 0.0, 309.72296142578125, 0.0, 0.0, 612.362060546875, 245.35870361328125, 0.0, 0.0, 0.0, 1.0, 0.0]

Note

The distortion model could be “plumb_bob” or “pinhole”. In this example we will use the “rational_polynomial” model. To use those models, set the distortion_model to “rational_polynomial” and pad the D array with 0.0 until it has 8 elements.
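
For example, the padding could be done as follows (a minimal sketch; the coefficient values are placeholders):

D = [0.1, -0.2, 0.001, 0.002, 0.05]   # example plumb_bob coefficients (k1, k2, p1, p2, k3)
D = D + [0.0] * (8 - len(D))          # pad with zeros to the 8 elements rational_polynomial expects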

Note

The camera_info topic in the Isaac Sim ROS bridge will be in the Rational Polynomial format.

Then, as a second step, open standalone_examples/api/omni.isaac.sensor/camera_ros.py and copy the output into the yaml_data variable. Populate additional parameters using the sensor manual:

pixel_size = 1.4  # Pixel size in microns, 3 microns is common
f_stop = 2.0  # F-number, the ratio of the lens focal length to the diameter of the entrance pupil
focus_distance = 0.5  # Focus distance in meters, the distance from the camera to the object plane
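
Conceptually, the script turns the copied printout back into intrinsic parameters along these lines (an illustrative sketch, not the script's exact code):

import yaml
import numpy as np

# yaml_data is assumed to hold the camera_info printout copied above
msg = yaml.safe_load(yaml_data)
width, height = msg["width"], msg["height"]
K = np.array(msg["K"]).reshape(3, 3)   # 3x3 intrinsic matrix (row-major)
D = list(msg["D"])                     # distortion coefficients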

Run the script as a standalone application. It will create a camera and a sample scene, render an image, and save it to the camera_ros.png file. The asset is also saved to the camera_ros.usd file.

python.sh standalone_examples/api/omni.isaac.sensor/camera_ros.py

Optionally, to validate the render product, this script can overlay the rendered image with a few points projected by the OpenCV distortion model. To enable this, install the OpenCV Python package:

python.sh -m pip install opencv-python

Extrinsic Calibration

Extrinsic calibration parameters are normally provided by calibration toolkits in the form of a transformation matrix. The convention for the axes and rotation order is important, and it varies between toolkits.

To set the extrinsic parameters for an individual camera sensor, use the following example to convert the transformation matrix from the calibration toolkit into Isaac Sim units:

import numpy as np
import omni.isaac.core.utils.numpy.rotations as rot_utils  # convenience functions for quaternion operations

dX, dY, dZ = ...      # Extrinsics translation vector from the calibration toolkit
rW, rX, rY, rZ = ...  # Note the order of the rotation parameters, it depends on the toolkit

Camera(
    prim_path="/rig/camera_color",
    position=np.array([-dZ, dX, dY]),         # Note, translation in the local frame of the prim
    orientation=np.array([rW, -rZ, rX, rY]),  # quaternion orientation in the world/local frame of the prim
                                              # (depends on whether translation or position is specified)
)
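
If the toolkit provides the rotation as a 3x3 matrix rather than a quaternion, it can be converted first. The sketch below assumes the rot_matrices_to_quats helper from the rot_utils module imported above and a matrix that already follows the target axis convention:

R = np.eye(3)                              # placeholder rotation matrix from the calibration toolkit
quat = rot_utils.rot_matrices_to_quats(R)  # scalar-first (w, x, y, z) quaternion
rW, rX, rY, rZ = quat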

As an alternative, the camera sensor can be attached to a prim. In that case, the camera sensor will inherit the position and orientation from the prim.

import omni.isaac.core.utils.prims as prim_utils

camera_prim = prim_utils.create_prim(
    name,
    "Xform",
    translation=...,
    orientation=...,
)

camera = Camera(
    prim_path=f"{name}/camera",
    ...
)

Creating Camera Sensor Rigs

A camera sensor rig is a collection of camera sensors attached to a single prim. It can be assembled from individual sensors that are either created manually or derived from calibration parameters.

What follows is a short discussion of how we created a digital twin of the Intel® RealSense™ Depth Camera D455. The USD file for the camera can be found in the content folder at /Isaac/Sensors/Intel/RealSense/rsd455.usd.

There are three visual sensors and one IMU sensor on the RealSense. Their placement relative to the camera origin was taken from the layout diagram in the TechSpec document on Intel’s website.

Most camera parameters were also found in the TechSpec. For example, the USD parameter fStop is the denominator of the F Number from the TechSpec, the focalLength is the Focal Length, and the fthetaMaxFov is the Diagonal Field of View. However, some parameters, like the focusDistance, were estimated through informed guesses and comparison with real output.

The horizontalAperture and verticalAperture in that example are also derived from the technical specification. The TechSpec lists the left, right, and color sensors as an OmniVision Technologies OV9782, and the spec sheet for that sensor lists the image area as 3896 x 2453 µm. We used those dimensions as the aperture sizes.

The resolution of the depth and color cameras is 1280 x 800, but it is up to the user to attach a render product of that size to the outputs.


The Pseudo Depth camera is a stand in for the depth image created by the camera’s firmware. We don’t attempt to copy the algorithms that create the image from stereo, but the Camera_Pseudo_Depth component is a convenience camera that can return the scene depth as seen from that camera position between the left and right stereo cameras. It would be more accurate to create a depth image from stereo, and if the same algorithm that is used in the RealSense was used then the same results (including artifacts) would be produced.