Audio

Audio in NVIDIA Omniverse™ Kit based apps allows for more interactivity in your USD stage. NVIDIA Omniverse™ Create’s audio system provides both spatial and non-spatial audio support. Sounds can be triggered at a specific time on the animation timeline, or triggered as needed through Python scripts. Two USD prims are used to define a stage’s audio: a ‘sound’ prim and a ‘listener’ prim. Sound prims behave as emitters for sounds and can be either ‘spatial’ or ‘non-spatial’. A listener prim is necessary for spatial audio because it gives the point in space from which the audio is intended to be heard. A listener does not affect non-spatial sounds.

Sound Prims

A sound prim represents an object in the world that produces sound. The sound prim can either be a spatial or non-spatial audio emitter. Spatial emitters have a position and orientation in 3D space and are used to simulate the distance from the listener. The sound emitter prims can be attached to other objects in the world so that they move around the world with them.

Creating Sound Prims

There are several ways to create a new sound prim.

  • Using the ‘Audio’ sub-menu of the ‘Create’ menu,

    Create a spatial or non-spatial Sound Prim From the Create Menu.
  • Right clicking in the viewport area, selecting ‘Create’, and using the ‘Audio’ sub-menu, or

    Create a spatial or non-spatial Sound Prim From the Viewport Right-Click Create Menu.
  • Dragging a sound asset from the content browser window to the viewport.

By default, a new sound prim in the stage will be given the same name as its asset. This can be changed at any time from the “Stage” window. A new sound prim that was created using either of the “Create” menus will be given the default name of “Sound” since its asset is unknown at creation time.

Renaming a Sound Prim From the Stage Tree Window.

Each sound prim has the following properties that can affect its playback:

Sound Prim Properties

Usage

Attenuation Range

Defines the range over which the emitter’s sound falls off to silence. The first value is the distance up to which there is no attenuation. The second value is the distance at which the sound will have fallen off to silence. This defaults to the range <0, 100>.

Attenuation Type

Defines how the sound should fall off to silence. This defines the mathematical model for the fall off calculation. This may be ‘inverse’ for an inverse square fall off, ‘linear’ for a linear fall off, and ‘linearSquare’ for a linear squared fall off. The default is ‘inverse’.
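To illustrate how ‘Attenuation Range’ and ‘Attenuation Type’ work together, here is a minimal Python sketch. The function name and the exact fall off formulas (in particular the rescaled ‘inverse’ curve) are assumptions made for illustration only. The audio engine’s actual curves may differ.

```python
def attenuation_gain(distance, range_min=0.0, range_max=100.0, model="inverse"):
    """Return a hypothetical volume multiplier in [0.0, 1.0] for an emitter."""
    if distance <= range_min:
        return 1.0  # closer than the first range value: no attenuation
    if distance >= range_max:
        return 0.0  # at or beyond the second range value: silence
    # Normalized position inside the range: 0.0 at range_min, 1.0 at range_max.
    t = (distance - range_min) / (range_max - range_min)
    if model == "linear":
        return 1.0 - t
    if model == "linearSquare":
        return (1.0 - t) ** 2
    if model == "inverse":
        # Assumed curve: an inverse-square shape, rescaled so that it
        # equals 1.0 at t = 0 and 0.0 at t = 1.
        inv = 1.0 / (1.0 + t) ** 2
        return (inv - 0.25) / 0.75
    raise ValueError(f"unknown attenuation model: {model!r}")


# Compare the three models halfway through the default <0, 100> range.
for name in ("inverse", "linear", "linearSquare"):
    print(name, round(attenuation_gain(50.0, model=name), 3))
```

Under these assumptions, the ‘linear’ model is the loudest at mid-range, while the two curved models drop off more quickly near the emitter.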

Aural Mode

Defines how the sound emitter will behave in the world. This may be either ‘spatial’ or ‘non-spatial’. Spatial sounds will render such that they appear to be coming from a specific location or distance from the listener. A non-spatial sound will render as it was originally authored. Spatial sounds are usually used for in-world sound effects and non-spatial sounds are often used for background music or dialogue. The default is ‘spatial’.

Cone Angles

Defines the angles for a cone where the sound can be heard from. This defines an ‘inner’ and ‘outer’ angle for the cone. The cone always sweeps out relative to the emitter’s front vector. When the line between the listener and this emitter is within the inner cone angle, the sound is not attenuated. When the line is outside of the outer cone angle, the emitter cannot be heard. When the line is between the two cone angles, the volume will be adjusted toward silence as it gets closer to the outer angle. When both these angles are 180 degrees, the emitter is omni-directional and the cones are disabled. The default is <180, 180>.

Cone Low-pass Filter

Defines low-pass filter parameters to use for the inner and outer cone angles. These parameters are unit-less and indicate the relative amount of filtering that should be performed at each of the cone angles. Listener to emitter lines that land between the two cone angles will also have their low-pass filter values interpolated accordingly. A filter parameter of 0.0 indicates that no filtering should occur. A value of 1.0 indicates that maximum filtering should occur. This defaults to <0, 0>.

Cone Volumes

Defines the volume levels to use at the inner and outer cone angles. These are normalized volume levels between 0.0 (silence) and 1.0 (full volume). Listener to emitter lines that land between the two cone angles will also have their volume value interpolated accordingly. This defaults to <1.0, 0.0>.
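The cone behavior can be sketched in a few lines of Python. This is a simplified model, not the engine’s implementation: it assumes the cone angles are measured in degrees from the emitter’s front vector and that values between the two angles are linearly interpolated. The function and parameter names are hypothetical.

```python
def cone_volume(angle_deg, inner_deg=180.0, outer_deg=180.0,
                inner_volume=1.0, outer_volume=0.0):
    """Volume multiplier for a listener at angle_deg off the emitter's front vector."""
    if inner_deg >= 180.0 and outer_deg >= 180.0:
        return 1.0  # cones disabled: the emitter is omni-directional
    if angle_deg <= inner_deg:
        return inner_volume  # inside the inner cone
    if angle_deg >= outer_deg:
        return outer_volume  # outside the outer cone
    # Between the two cones: interpolate (assumed linear) toward the outer volume.
    t = (angle_deg - inner_deg) / (outer_deg - inner_deg)
    return inner_volume + (outer_volume - inner_volume) * t


print(cone_volume(45.0, inner_deg=30.0, outer_deg=60.0))  # halfway between the cones
```

The same interpolation could be applied to the ‘Cone Low-pass Filter’ values by passing the filter parameters in place of the volumes.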

Enable Distance Delay

Defines whether distance delay calculations will be performed for this sound. When enabled, this will delay the start of playing the sound according to the current speed of sound and the total distance between the listener and this emitter. This can be ‘on’, ‘off’, or ‘default’. When set to ‘default’, the global audio settings will control whether these calculations are performed or not. This defaults to ‘default’.
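As a rough guide to the size of this effect, the delay is the travel time of sound over the listener to emitter distance. The sketch below assumes distances in meters and uses the stage’s default speed of sound of 340 m/s. The function name is hypothetical.

```python
def distance_delay_seconds(distance_m, speed_of_sound=340.0):
    """Time for sound to travel from the emitter to the listener."""
    if speed_of_sound <= 0.0:
        raise ValueError("speed of sound must be positive")
    return distance_m / speed_of_sound


# An emitter 170 m away starts playing half a second late.
print(distance_delay_seconds(170.0))
```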

Enable Doppler

Defines whether doppler shift calculations will be performed for this sound. When enabled, this will actively calculate and apply a doppler shift factor to the playing sound based on its current velocity relative to the listener. This can be ‘on’, ‘off’, or ‘default’. When set to ‘default’, the global audio settings will control whether these calculations are performed or not. This defaults to ‘default’.

Enable Interaural Delay

Defines whether inter-aural time delay calculations will be performed for this sound. When enabled, this will actively calculate and apply a small time delay to the left and right side speakers depending on the location of this emitter relative to the listener. This can be ‘on’, ‘off’, or ‘default’. When set to ‘default’, the global audio settings will control whether these calculations are performed or not. This defaults to ‘default’.

End Time

Defines the time index at which this sound should stop playing. This time index is relative to the animation timeline. If this is less than or equal to the “Start Time” value, this sound will play until its asset naturally ends. This defaults to 0.

File Path

Defines the path to the asset to use for this sound. This may be an Omniverse path or a local path. The asset will be loaded as soon as possible and cached internally. This defaults to an empty string.

Gain

Defines the volume level to play this sound at. This is a normalized linear volume level in the range <0.0, 1.0>. A value of 0.0 indicates silence. A value of 1.0 indicates full volume. This may also be larger than 1.0 to make the sound louder than it was originally authored at. This may also be negative to invert the sound. This defaults to 1.0.
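Because the gain is a linear value, it can be helpful to convert it to decibels when comparing levels. The helper below is only an illustrative conversion using the standard amplitude formula, 20·log10(|gain|). It is not part of any Omniverse API.

```python
import math


def gain_to_decibels(gain):
    """Convert a linear gain to decibels; a negative gain only inverts the signal."""
    magnitude = abs(gain)
    if magnitude == 0.0:
        return float("-inf")  # a gain of 0.0 is silence
    return 20.0 * math.log10(magnitude)


for g in (1.0, 0.5, 2.0, -1.0):
    print(g, round(gain_to_decibels(g), 2), "dB")
```

A gain of 0.5 is roughly 6 dB quieter than the authored level and a gain of 2.0 is roughly 6 dB louder.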

Loop Count

Defines the number of times to repeat this sound after it finishes naturally. The sound will always play at least once. This indicates the number of times it will repeat after that initial play through. This may be negative to indicate that the sound should loop infinitely. This defaults to 0.

Media Offset End

Defines the end of the region of the sound to play. This is measured in video frames. This may be less than or equal to the ‘Media Offset Start’ value to indicate that the sound should play until its natural end. This defaults to 0.

Media Offset Start

Defines the start of the region of the sound to play. This is measured in video frames. This may be 0 to indicate the start of the sound. This defaults to 0.

Priority

Defines the relative priority level to use for this sound when deciding which sounds are the most important to play in the stage. This is ignored as long as there are fewer sounds playing than there are available sound processing voices. Once there are more playing sounds than voices, this value is used along with each sound’s effective volume to decide which sounds are most important to keep active in the stage. This is an arbitrary scale with 0 being the default priority and larger numbers meaning higher priority levels. Negative values indicate lower than default priority.
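The following sketch shows one way such a selection could work. The ordering rule used here (priority first, effective volume as the tie-breaker) is an assumption. The text above only states that both values are taken into account, and the function name is hypothetical.

```python
def select_audible(sounds, max_voices=64):
    """Pick the sounds that keep a voice when there are more sounds than voices.

    sounds is a list of (name, priority, effective_volume) tuples.
    """
    if len(sounds) <= max_voices:
        return [name for name, _, _ in sounds]  # priority is ignored in this case
    # Assumed rule: higher priority wins, louder sounds win among equal priorities.
    ranked = sorted(sounds, key=lambda s: (s[1], s[2]), reverse=True)
    return [name for name, _, _ in ranked[:max_voices]]


playing = [("footsteps", 0, 0.9), ("alarm", 1, 0.1), ("wind", 0, 0.5)]
print(select_audible(playing, max_voices=2))
```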

Start Time

Defines the time index at which this sound should start playing. This time index is relative to the animation timeline and uses the same scale. This may be negative to indicate that this sound is only to be dynamically triggered through Python scripts. This defaults to 0.

Time Scale

Defines the rate to play back this sound at. This is a scaling factor where 1.0 indicates that the sound should be played back at its originally authored rate. A value greater than 1.0 will increase the playback speed and increase the pitch. A value less than 1.0 will decrease the playback speed and decrease the pitch. This defaults to 1.0.
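The effect of the time scale on duration and pitch can be estimated with a short calculation. This sketch assumes the rate change behaves like simple resampling, where pitch and speed change together as described above. The function names are hypothetical.

```python
import math


def playback_duration(asset_seconds, time_scale=1.0):
    """Length of one play through of the asset at the given time scale."""
    if time_scale <= 0.0:
        raise ValueError("time scale must be positive")
    return asset_seconds / time_scale


def pitch_shift_semitones(time_scale=1.0):
    """Pitch change in semitones; doubling the rate raises the pitch one octave."""
    if time_scale <= 0.0:
        raise ValueError("time scale must be positive")
    return 12.0 * math.log2(time_scale)


print(playback_duration(10.0, 2.0), pitch_shift_semitones(2.0))
print(playback_duration(10.0, 0.5), pitch_shift_semitones(0.5))
```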

Each of these prim properties can be adjusted at runtime through Python scripts. Changes made from a script take effect immediately and apply to all future instances of the sound. Some sounds should only be triggered by interactive events. These sounds can still be added to the USD stage; simply set their “Start Time” property to -1. They will never be triggered by the animation timeline, but can be triggered as needed from a Python script.
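The timing rules from the ‘Start Time’, ‘End Time’, and ‘Loop Count’ properties can be summarized in plain Python. This is only a model of the documented rules and does not call any Kit API. The function names are hypothetical.

```python
def timeline_window(start_time=0.0, end_time=0.0):
    """Return None for script-only sounds, else (start, end).

    An end of None means the sound plays until its asset naturally ends.
    """
    if start_time < 0:
        return None  # never triggered by the animation timeline
    if end_time <= start_time:
        return (start_time, None)
    return (start_time, end_time)


def total_plays(loop_count=0):
    """Number of play throughs, or None when the sound loops infinitely."""
    if loop_count < 0:
        return None
    return loop_count + 1  # the sound always plays at least once


print(timeline_window(-1.0))      # script-triggered only
print(timeline_window(2.0, 5.0))  # plays from 2 to 5 on the timeline
print(total_plays(3))             # initial play through plus three repeats
```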

Listener Prims

A listener represents the point in space in the virtual world from which spatial sounds are heard. Its position and orientation affect which sounds can be heard and at which volume levels they are heard. The listener’s velocity relative to each sound emitter is also used when calculating doppler shift factors.

The listener object can be an explicit object in the USD stage, or it can be implicit from the current camera. Each camera can have a listener attached to it. If a third person view is preferred instead, some situations may be better suited to having the listener attached to the third person character object. This will have the effect of hearing the world from that character’s perspective. As with sound emitters, a listener prim can be attached to another prim in the world so that it moves with the other prim.

A listener prim does not strictly need to be created in a USD stage if it is going to be implicitly attached to the active camera. However, if a listener prim is needed, one can be created by:

  • Using the ‘Create’ menu in the ‘Audio’ sub-menu.

    Create a Listener Prim From the Create Menu.
  • Right clicking in the viewport area, selecting ‘Create’, and using the ‘Audio’ sub-menu.

    Create a Listener Prim From the Viewport Right-Click Create Menu.

By default, a new listener will be given the name “Listener”. This can be changed in the “Stage” window in the same manner as with renaming the sound prims.

Each listener prim has the following properties that can affect how sounds are heard:

Listener Prim Properties

Usage

Cone Angles

Defines the angles for a cone within which the listener can hear sounds. This defines an ‘inner’ and ‘outer’ angle for the cone. The cone always sweeps out relative to the listener’s front vector. When the line between the listener and an emitter is within the inner cone angle, the sound is not attenuated. When the line is outside of the outer cone angle, the emitter cannot be heard. When the line is between the two cone angles, the volume will be adjusted toward silence as it gets closer to the outer angle. When both these angles are 180 degrees, the listener is omni-directional and the cones are disabled. The default is <180, 180>.

Cone Low-pass Filter

Defines low-pass filter parameters to use for the inner and outer cone angles. These parameters are unitless and indicate the relative amount of filtering that should be performed at each of the cone angles. Listener to emitter lines that land between the two cone angles will also have their low-pass filter values interpolated accordingly. A filter parameter of 0.0 indicates that no filtering should occur. A value of 1.0 indicates that maximum filtering should occur. This defaults to <0, 0>.

Cone Volumes

Defines the volume levels to use at the inner and outer cone angles. These are normalized volume levels between 0.0 (silence) and 1.0 (full volume). Listener to emitter lines that land between the two cone angles will also have their volume value interpolated accordingly. This defaults to <1.0, 0.0>.

Orientation From View

Defines whether the orientation of the listener is taken directly from the prim’s orientation or from the orientation of the camera. If this option is enabled, the orientation will be taken to match the active camera. If disabled, the orientation will come directly from the listener prim. This is useful for some third person situations - if the listener does a lot of rotating, the audio output could be very confusing and disorientating if it comes from the listener’s perspective. In this case, it might be more friendly to have the listener’s orientation come from the camera instead. This defaults to enabled.

The active listener for the stage is chosen through the Audio Settings menu described below. By default, the active camera will be the listener. If this is disabled, the active listener can be explicitly chosen. As with sound prims, all properties of listener prims can also be modified from Python scripts. This includes dynamically selecting the active listener prim.

Audio Player

Unlike graphics assets such as textures and meshes, audio assets cannot be visually previewed. To handle this, Create provides a simple audio player window that can be used to preview assets before adding them to the USD stage. The audio player window is provided as an extension in NVIDIA Omniverse™ Kit based apps. It can be enabled by opening the Extension Manager window (“Window” menu -> “Extensions”), searching for “audio player”, and enabling the “Audio Player Window” extension. This will also enable the required “Audio Player” extension.

'Audio Player Window' extension in the Extension Manager.

Once its extension is enabled, the audio player can be accessed in one of two ways:

  • by selecting “Audio Player” from the “Window” menu, or

    Audio Player in the Window Menu.
  • by choosing “Play Audio” from the right-click context menu of an audio asset in the Content Browser window.

    Audio Player in the Content Browser.

The audio player has a few simple controls on it:

Audio Player Window.

Option

Result

Asset Name / Picker

A new asset may be chosen by either typing in its path, or clicking on the file folder button to bring up an asset picker window. When a new asset path is given, the previous asset (if any) will stop playing and be unloaded. Assets from both local storage and Omniverse may be chosen. If an asset fails to load, a failure message will be displayed in red in the window.

Timeline

Shows the progress of playing the sound asset. This includes the length and current position displayed in minutes and seconds. Clicking on the timeline or dragging the slider will reposition the play cursor to a different spot in the asset, but only when in ‘stopped’ mode.

Play/Pause

The play/pause button will toggle between the two actions (Play / Pause) each time it is pressed (while an asset is loaded). When paused, playback will stop but playback position is kept. When un-paused, playback will resume from the same position it was last paused at.

Stop

Stopping the playback will reset the current position back to the start of the asset.

Stage Audio Settings

The audio settings window contains several audio settings values that are specific to the current USD stage. These are global settings for the stage that affect the audio behavior. These properties can be found in the “Layer” window by selecting the “Root Layer (Authoring Layer)” object. The audio settings will be at the bottom of the “Property” window below.

Audio Settings Window.

Audio Setting

Usage

Active Listener

Defines the path to the prim to use as the active listener for the stage. This setting is ignored if “use active camera as listener” is enabled. The active listener may be chosen by typing in the prim’s full path. The default is an empty string.

Note: this will be changed to a drop-down box containing all of the stage’s listener objects in a future version.

Use Active Camera As Listener

Defines whether to use the active camera as an implicit listener or to use an explicit listener prim. When enabled, the active listener will always be attached to the active camera and will use the camera’s orientation. In this case, the “Active Listener” setting is ignored. When disabled, the active listener will be selected either through the “Active Listener” setting or by a Python script that sets the active listener. The default is enabled.

Doppler Default

The default global behavior for doppler effect calculations. This will affect all sound prims that choose the ‘default’ mode for their “Enable Doppler” property. This may be ‘on’ or ‘off’ to affect only the sound prims that use the ‘default’ mode. This may also be set to ‘forceOn’ or ‘forceOff’ to turn doppler calculations on or off for all spatial sound prims regardless of their “Enable Doppler” property. This defaults to ‘off’.

Distance Delay Default

The default global behavior for distance delay calculations. This will affect all sound prims that choose the ‘default’ mode for their “Enable Distance Delay” property. This may be ‘on’ or ‘off’ to affect only the sound prims that use the ‘default’ mode. This may also be set to ‘forceOn’ or ‘forceOff’ to turn distance delay calculations on or off for all spatial sound prims regardless of their “Enable Distance Delay” property. This defaults to ‘off’.

Interaural Delay Default

The default global behavior for interaural time delay calculations. This will affect all sound prims that choose the ‘default’ mode for their “Enable Interaural Delay” property. This may be ‘on’ or ‘off’ to affect only the sound prims that use the ‘default’ mode. This may also be set to ‘forceOn’ or ‘forceOff’ to turn interaural delay calculations on or off for all spatial sound prims regardless of their “Enable Interaural Delay” property. This defaults to ‘on’.
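The interaction between a sound prim’s ‘on’, ‘off’, or ‘default’ mode and the stage’s ‘on’, ‘off’, ‘forceOn’, or ‘forceOff’ default is the same for the doppler, distance delay, and interaural delay settings. The sketch below models that rule in plain Python. The function name is hypothetical, and it covers spatial sound prims only.

```python
PRIM_MODES = ("on", "off", "default")
STAGE_DEFAULTS = ("on", "off", "forceOn", "forceOff")


def effect_enabled(prim_mode, stage_default):
    """Decide whether an effect is calculated for one spatial sound prim."""
    if prim_mode not in PRIM_MODES:
        raise ValueError(f"unknown prim mode: {prim_mode!r}")
    if stage_default not in STAGE_DEFAULTS:
        raise ValueError(f"unknown stage default: {stage_default!r}")
    if stage_default == "forceOn":
        return True   # overrides every prim's own setting
    if stage_default == "forceOff":
        return False  # overrides every prim's own setting
    if prim_mode == "default":
        return stage_default == "on"  # defer to the stage setting
    return prim_mode == "on"


print(effect_enabled("default", "off"))  # stage default applies
print(effect_enabled("on", "off"))       # the prim's explicit setting wins
print(effect_enabled("on", "forceOff"))  # the stage forces the effect off
```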

Concurrent Voices

Defines the maximum number of sounds that can be played simultaneously. This can affect the overall processing requirements of the stage. This must be at least 1 and less than 4096. If more sounds than this are active in the scene at any given time, only the ones that are the loudest or marked as the highest priority will be audible in the stage. This can be set to an optimal value that balances processing needs and correctness for the stage through a process of trial and error. This defaults to 64.

Speed of Sound

Defines the speed of sound setting for the stage. This affects doppler and distance delay calculations for the stage. This is always expressed in meters per second. This defaults to 340 m/s.

Doppler Scale

Defines a scaling value to exaggerate or reduce the effect of doppler shift calculations. A value of 1.0 means that the calculated doppler shift values should be unscaled. A value greater than 1.0 means that the effect of the doppler shift calculations should be exaggerated. A value less than 1.0 means that the effect of the doppler shift calculations should be reduced. Negative values are not allowed. This defaults to 1.0.

Doppler Limit

Defines a limit for doppler shift factors. This helps to reduce unintentional audio corruption due to velocities larger than the speed of sound being used in the stage. This limit is a unit-less scaling factor that is non-linearly proportional to the speed of sound setting. For example, a value of 16.0 is approximately equivalent to a relative velocity of 95% the speed of sound. A value of 2.0 is approximately equivalent to 50% the speed of sound. This defaults to 2.0.
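One relationship that roughly matches the two examples above is the classic doppler factor for an emitter approaching a stationary listener, factor = c / (c - v). The sketch below inverts that formula to estimate the relative velocity a given limit corresponds to. Treat this as an assumption for building intuition. The exact formula used by the audio engine is not stated here, and the function name is hypothetical.

```python
def velocity_fraction_for_limit(doppler_limit):
    """Relative velocity, as a fraction of the speed of sound, at the given limit.

    Assumes factor = c / (c - v), which gives v / c = 1 - 1 / factor.
    """
    if doppler_limit < 1.0:
        raise ValueError("this model only covers limits of 1.0 or more")
    return 1.0 - 1.0 / doppler_limit


for limit in (2.0, 16.0):
    print(limit, velocity_fraction_for_limit(limit))
```

Under this assumption a limit of 2.0 corresponds to exactly 50% of the speed of sound, and a limit of 16.0 to about 94%, close to the approximate figure quoted above.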

Spatial Time Scale

Defines the global time scale value for all spatial sound prims. This is equivalent to changing the “Time Scale” property on all spatial sound prims, except that it only needs to be managed from one spot and it doesn’t change the individual time scale values of each sound prim. This can be used for global time dilation effects and the like. This defaults to 1.0.

Non-Spatial Time Scale

Defines the global time scale value for all non-spatial sound prims. This is equivalent to changing the “Time Scale” property on all non-spatial sound prims, except that it only needs to be managed from one spot and it doesn’t change the individual time scale values of each sound prim. This can be used for global time dilation effects and the like. This defaults to 1.0.