Audio

Audio in NVIDIA Omniverse™ Kit based apps allows for more interactivity in your USD stage. NVIDIA Omniverse™ Create’s audio system provides both spatial and non-spatial audio support. Sounds can be triggered at a specific time on the animation timeline, or triggered as needed through Python scripts. Two USD prims define a stage’s audio: a ‘sound’ prim and a ‘listener’ prim. Sound prims behave as emitters for sounds. A listener prim is necessary for spatial audio because it gives the point in space from which the audio is intended to be heard.

Sound Prims

A sound prim represents an object in the world that produces sound. The sound prim can either be a spatial or non-spatial audio emitter. Spatial emitters have a position and orientation in 3D space and are used to simulate the distance from the listener. Sound emitter prims can be attached to other objects in the world so that they move around the world with them.

Creating Sound Prims

There are several ways to create a new sound prim.

  • the ‘Create’ menu in the ‘Audio’ sub-menu,

    Create a Sound Prim From the Create Menu.
  • right clicking in the viewport area, selecting ‘Create’, and using the ‘Audio’ sub-menu, or

    Create a Sound Prim From the Viewport Right-Click Create Menu.
  • by dragging a sound asset from the content browser window to the viewport.

By default, a new sound prim in the stage will be given the same name as its asset. This can be changed at any time from the “Stage” window. A new sound prim that was created using either of the “Create” menus will be given the default name of “Sound”.

Renaming a Sound Prim From the Stage Tree Window.

Each sound prim has the following properties that can affect its playback:

Sound Prim Properties

Usage

Attenuation Range

Defines the range over which the emitter’s sound falls off to silence. The first value is the
range up to which there is no attenuation. The second value is the range at which the sound will
fall off to silence. This defaults to the range <0, 100>.

Attenuation Type

Defines how the sound should fall off to silence. This defines the mathematical model for the
fall off calculation. This may be ‘inverse’ for an inverse square fall off, ‘linear’ for a linear
fall off, or ‘linearSquare’ for a linear squared fall off. The default is ‘inverse’.

Aural Mode

Defines how the sound emitter will behave in the world. This may be either ‘spatial’ or
‘non-spatial’. Spatial sounds will render such that they appear to be coming from a specific
location or distance from the listener. A non-spatial sound will render as it was originally
authored. Spatial sounds are usually used for in-world sound effects and non-spatial sounds are
often used for background music or dialogue. The default is ‘spatial’.

Cone Angles

Defines the angles for a cone where the sound can be heard from. This defines an ‘inner’ and
‘outer’ angle for the cone. The cone always sweeps out relative to the emitter’s front vector.
When the line between the listener and this emitter is within the inner cone angle, the sound is
not attenuated. When the line is outside of the outer cone angle, the emitter cannot be heard.
When the line is between the two cone angles, the volume will be adjusted toward silence as it
gets closer to the outer angle. When both these angles are 180 degrees, the emitter is
omni-directional and the cones are disabled. The default is <180, 180>.

Cone Low-pass Filter

Defines low-pass filter parameters to use for the inner and outer cone angles. These parameters
are unit-less and indicate the relative amount of filtering that should be performed at each of
the cone angles. Listener to emitter lines that land between the two cone angles will also have
their low-pass filter values interpolated accordingly. A filter parameter of 0.0 indicates that
no filtering should occur. A value of 1.0 indicates that maximum filtering should occur. This
defaults to <0, 0>.

Cone Volumes

Defines the volume levels to use at the inner and outer cone angles. These are normalized volume
levels between 0.0 (silence) and 1.0 (full volume). Listener to emitter lines that land between
the two cone angles will also have their volume value interpolated accordingly. This defaults
to <1.0, 0.0>.

Enable Distance Delay

Defines whether distance delay calculations will be performed for this sound. When enabled, this
will delay the start of playing the sound according to the current speed of sound and the total
distance between the listener and this emitter. This can be ‘on’, ‘off’, or ‘default’. When
set to ‘default’, the global audio settings will control whether these calculations are performed
or not. This defaults to ‘default’.

Enable Doppler

Defines whether doppler shift calculations will be performed for this sound. When enabled, this
will actively calculate and apply a doppler shift factor to the playing sound based on its current
velocity relative to the listener. This can be ‘on’, ‘off’, or ‘default’. When set to ‘default’,
the global audio settings will control whether these calculations are performed or not. This
defaults to ‘default’.

Enable Interaural Delay

Defines whether inter-aural time delay calculations will be performed for this sound. When
enabled, this will actively calculate and apply a small time delay to the left and right side
speakers depending on the location of this emitter relative to the listener. This can be ‘on’,
‘off’, or ‘default’. When set to ‘default’, the global audio settings will control whether
these calculations are performed or not. This defaults to ‘default’.

Note: this feature is not yet supported in Create.

End Time

Defines the time index at which this sound should stop playing. This time index is relative to
the animation timeline. If this is less than or equal to the “Start Time” value, this sound will
play until its asset naturally ends. This defaults to 0.

File Path

Defines the path to the asset to use for this sound. This may be an Omniverse path or a local path.
The asset will be loaded as soon as possible and cached internally. This defaults to an empty
string.

Gain

Defines the volume level to play this sound at. This is a normalized linear volume level in the
range <0.0, 1.0>. A value of 0.0 indicates silence. A value of 1.0 indicates full volume. This
may also be larger than 1.0 to make the sound louder than it was originally authored at. This may
also be negative to invert the sound. This defaults to 1.0.

Loop Count

Defines the number of times to repeat this sound after it finishes naturally. The sound will always
play at least once. This indicates the number of times it will repeat after that initial play
through. This may be negative to indicate that the sound should loop infinitely. This defaults
to 0.

Media Offset End

Defines the end of the region of the sound to play. This is measured in video frames. This may
be less than or equal to the ‘Media Offset Start’ value to indicate that the sound should play
until its natural end. This defaults to 0.

Media Offset Start

Defines the start of the region of the sound to play. This is measured in video frames. This may
be 0 to indicate the start of the sound. This defaults to 0.

Priority

Defines the relative priority level to use for this sound when deciding which sounds are the most
important to play in the stage. This is ignored as long as there are fewer sounds playing than
there are available sound processing voices. Once there are more playing sounds than voices, this
value along with each sound’s effective volume is used to decide which sounds are most important
to stay active in the stage. This is an arbitrary scale with 0 being the default priority and
larger numbers meaning higher priority levels. Negative values indicate lower than default
priority.

Start Time

Defines the time index at which this sound should start playing. This time index is relative to
the animation timeline using its same scale. This may be negative to indicate that this sound is
only to be dynamically triggered through Python scripts. This defaults to 0.

Time Scale

Defines the rate to play back this sound at. This is a scaling factor where 1.0 indicates that
the sound should be played back at its originally authored rate. A value greater than 1.0 will
increase the playback speed and increase the pitch. A value less than 1.0 will decrease the
playback speed and decrease the pitch. This defaults to 1.0.
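As a rough illustration of the “Enable Distance Delay” property above, the delay is simply the time sound needs to cover the distance between emitter and listener. The sketch below is not the audio system’s implementation; the function name `distance_delay_seconds` is hypothetical, distances are assumed to be in meters, and the 340 m/s default is the “Speed of Sound” value from the Audio Settings window.

```python
def distance_delay_seconds(distance_m: float, speed_of_sound: float = 340.0) -> float:
    """Return how long a sound takes to travel distance_m meters.

    Hypothetical helper for illustration only; not part of any Omniverse API.
    """
    if speed_of_sound <= 0.0:
        raise ValueError("speed of sound must be positive")
    # Time = distance / speed. Negative distances are clamped to zero.
    return max(distance_m, 0.0) / speed_of_sound


# An emitter 170 m from the listener starts half a second late at 340 m/s.
print(distance_delay_seconds(170.0))
```

Lowering the speed of sound setting increases the delay for the same distance, which is why that setting affects distance delay calculations for the whole stage.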

Each of these prim properties can be adjusted through Python scripts at runtime as needed. When a property is modified through a Python script, the change takes effect immediately and affects all future instances of the sound. Some sounds should only be triggered by interactive events. These sounds can still be added to the USD stage, with their start time property set to -1. They will never be triggered as part of the animation timeline, but can instead be triggered as needed through a Python script.
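The timeline rules described for “Start Time”, “End Time”, and “Loop Count” can be summarized in a few lines of plain Python. This is only a sketch of the documented rules, not the audio system’s code; the names `timeline_window` and `total_plays` are hypothetical.

```python
from typing import Optional, Tuple


def timeline_window(start_time: float, end_time: float = 0.0) -> Optional[Tuple[float, Optional[float]]]:
    """Return (start, stop) on the animation timeline, or None if script-only.

    A stop of None means "play until the asset ends naturally".
    """
    if start_time < 0.0:
        # Negative start time: the timeline never triggers this sound.
        return None
    if end_time <= start_time:
        # End time at or before the start time: no explicit stop point.
        return (start_time, None)
    return (start_time, end_time)


def total_plays(loop_count: int = 0) -> Optional[int]:
    """Return how many times the sound plays; None means it loops forever."""
    # The sound always plays once, then repeats loop_count more times.
    return None if loop_count < 0 else loop_count + 1


print(timeline_window(-1.0))      # script-triggered only
print(timeline_window(2.0, 5.0))  # plays from time 2 to time 5
print(total_plays(3))             # initial play plus three repeats
```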

Listener Prims

A listener represents the point in space in the virtual world from which spatial sounds are heard. Its position and orientation affect which sounds can be heard and the volume levels at which they are heard. The listener’s velocity relative to each sound emitter is also used when calculating doppler shift factors.

The listener object can be an explicit object in the USD stage, or it can be implicit from the current camera. Each camera can have a listener attached to it. If a third person view is preferred instead, some situations may be better suited to having the listener attached to the third person character object. This will have the effect of hearing the world from that character’s perspective. As with sound emitters, a listener prim can be attached to another prim in the world so that it moves with the other prim.

A listener prim does not strictly need to be created in a USD stage if it is going to be implicitly attached to the active camera. However, if a listener prim is needed, one can be created by:

  • using the ‘Create’ menu in the ‘Audio’ sub-menu.

    Create a Sound Prim From the Create Menu.
  • right clicking in the viewport area, selecting ‘Create’, and using the ‘Audio’ sub-menu.

    Create a Sound Prim From the Viewport Right-Click Create Menu.

By default, a new listener will be given the name “Listener”. This can be changed in the “Stage” window in the same manner as with renaming the sound prims.

Each listener prim has the following properties that can affect its playback:

Listener Prim Properties

Usage

Cone Angles

Defines the angles for a cone where the sound can be heard from. This defines an ‘inner’ and
‘outer’ angle for the cone. The cone always sweeps out relative to the emitter’s front vector.
When the line between the listener and this emitter is within the inner cone angle, the sound is
not attenuated. When the line is outside of the outer cone angle, the emitter cannot be heard.
When the line is between the two cone angles, the volume will be adjusted toward silence as it
gets closer to the outer angle. When both these angles are 180 degrees, the emitter is
omni-directional and the cones are disabled. The default is <180, 180>.

Cone Low-pass Filter

Defines low-pass filter parameters to use for the inner and outer cone angles. These parameters
are unitless and indicate the relative amount of filtering that should be performed at each of
the cone angles. Listener to emitter lines that land between the two cone angles will also have
their low-pass filter values interpolated accordingly. A filter parameter of 0.0 indicates that
no filtering should occur. A value of 1.0 indicates that maximum filtering should occur. This
defaults to <0, 0>.

Cone Volumes

Defines the volume levels to use at the inner and outer cone angles. These are normalized volume
levels between 0.0 (silence) and 1.0 (full volume). Listener to emitter lines that land between
the two cone angles will also have their volume value interpolated accordingly. This defaults
to <1.0, 0.0>.

Orientation From View

Defines whether the orientation of the listener is taken directly from the prim’s orientation or
from the orientation of the camera. If this option is enabled, the orientation will be taken to
match the active camera. If disabled, the orientation will come directly from the listener prim.
This is useful for some third person situations - if the listener does a lot of rotating, the
audio output could be very confusing and disorientating if it comes from the listener’s perspective.
In this case, it might be more friendly to have the listener’s orientation come from the camera
instead. This defaults to enabled.
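The cone properties above describe a value that is constant inside the inner angle, constant outside the outer angle, and interpolated in between. A minimal sketch follows, assuming linear interpolation and assuming the angle is measured in degrees from the front vector (0 to 180); the documentation does not state either detail, and the function name `cone_value` is hypothetical.

```python
def cone_value(angle_deg, cone_angles=(180.0, 180.0), cone_values=(1.0, 0.0)):
    """Interpolate a cone parameter (volume or low-pass amount) for an angle.

    angle_deg:   angle between the front vector and the listener-emitter line.
    cone_angles: (inner, outer) angles; the default <180, 180> disables the cone.
    cone_values: values used at the inner and outer angles.
    Linear interpolation is an assumption made for this sketch.
    """
    inner_angle, outer_angle = cone_angles
    inner_value, outer_value = cone_values
    if angle_deg <= inner_angle:
        return inner_value  # inside the inner cone: inner value applies
    if angle_deg >= outer_angle:
        return outer_value  # outside the outer cone: outer value applies
    # Between the two angles: blend from the inner to the outer value.
    t = (angle_deg - inner_angle) / (outer_angle - inner_angle)
    return inner_value + (outer_value - inner_value) * t


print(cone_value(60.0, (30.0, 90.0)))              # halfway between the cones
print(cone_value(60.0, (30.0, 90.0), (0.0, 1.0)))  # same blend for a low-pass amount
```

With the default cone volumes of <1.0, 0.0>, a line outside the outer angle gets a volume of 0.0, which matches the statement that the sound cannot be heard there.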

The active listener for the stage is chosen through the Audio Settings menu described below. By default, the active camera will be the listener. If this is disabled, the active listener can be explicitly chosen. As with sound prims, all properties of listener prims can be modified from Python scripts as well. This includes dynamically selecting the active listener prim.
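The choice between the implicit camera listener and an explicit listener prim can be pictured as a small decision function. This is a sketch of the documented rule only; `resolve_listener` and the prim paths are hypothetical, and the behavior when no explicit listener is set is not described here, so the sketch just returns `None` in that case.

```python
from typing import Optional


def resolve_listener(use_active_camera: bool,
                     active_camera_path: str,
                     active_listener_path: str = "") -> Optional[str]:
    """Return the path of the prim that acts as the listener.

    Hypothetical helper: it mirrors the "Use Active Camera As Listener" and
    "Active Listener" settings, not any real Omniverse function.
    """
    if use_active_camera:
        # The "Active Listener" setting is ignored in this mode.
        return active_camera_path
    # An empty string (the default) means no explicit listener was chosen.
    return active_listener_path or None


print(resolve_listener(True, "/World/Camera", "/World/Listener"))
print(resolve_listener(False, "/World/Camera", "/World/Listener"))
```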

Audio Player

Unlike graphics assets such as textures, meshes, etc., audio assets cannot be visually previewed. To handle this, Create provides a simple audio player window that can be used to preview assets before adding them to the USD stage. This audio player can be accessed in one of two ways:

  • by selecting “Audio Player” from the “Window” menu, or

    Audio Player in the Window Menu.
  • by choosing “Play Audio” from the right-click context menu of an audio asset in the Content Browser window.

    Audio Player in the Content Browser.

The audio player has a few simple controls on it:

Audio Player Window.

Option

Result

Asset Name / Picker

A new asset may be chosen by either typing in its path, or clicking on the file folder button to bring up an asset picker window.
When a new asset path is given, the previous asset (if any) will stop playing and be unloaded. Assets from both local storage and
Omniverse may be chosen. If an asset fails to load, a failure message will be displayed in red in the window.

Timeline

Shows the progress of playing the sound asset. This includes the length and current position displayed in minutes and seconds.
Clicking on the timeline or dragging the slider will reposition the play cursor to a different spot in the asset, but only when
in ‘stopped’ mode.

Play/Pause

The play/pause button will toggle between the two actions (Play / Pause) each time it is pressed (while an asset is loaded).
When paused, playback will stop but playback position is kept.
When un-paused, playback will resume from the same position it was last paused at.

Stop

Stopping the playback will reset the current position back to the start of the asset.

Audio Settings

The audio settings window contains several audio settings values that are specific to the current USD stage. These are global settings for the stage that affect the audio behavior. The window can be opened from the “Window” menu by choosing the “Audio Settings” item.

Audio Settings Window.

Audio Setting

Usage

Active Listener

Defines the path to the prim to use as the active listener for the stage. This setting is
ignored if “use active camera as listener” is enabled. The active listener may be chosen either
by selecting it in the viewport or stage tree then clicking on the “link” button, or by typing
in the prim’s full path. The default is an empty string.

Use Active Camera As Listener

Defines whether to use the active camera as an implicit listener or to use an explicit listener
prim. When enabled, the active listener will always be attached to the active camera using its
same orientation. In this case, the “Active Listener” setting is ignored. When disabled, the
active listener will be selected either through the “Active Listener” setting or from a Python
script setting the active listener. The default is enabled.

Doppler Default

The default global behavior for doppler effect calculations. This will affect all sound prims
that choose the ‘default’ mode for their “Enable Doppler” property. This may be ‘on’ or ‘off’
to affect only the sound prims that use the ‘default’ mode. This may also be set to ‘forceOn’
or ‘forceOff’ to turn doppler calculations on or off for all spatial sound prims regardless of
their “Enable Doppler” property. This defaults to ‘off’.

Distance Delay Default

The default global behavior for distance delay calculations. This will affect all sound prims
that choose the ‘default’ mode for their “Enable Distance Delay” property. This may be ‘on’
or ‘off’ to affect only the sound prims that use the ‘default’ mode. This may also be set to
‘forceOn’ or ‘forceOff’ to turn distance delay calculations on or off for all spatial sound
prims regardless of their “Enable Distance Delay” property. This defaults to ‘off’.

Interaural Delay Default

The default global behavior for interaural time delay calculations. This will affect all
sound prims that choose the ‘default’ mode for their “Enable Interaural Delay” property. This
may be ‘on’ or ‘off’ to affect only the sound prims that use the ‘default’ mode. This may
also be set to ‘forceOn’ or ‘forceOff’ to turn interaural delay calculations on or off for
all spatial sound prims regardless of their “Enable Interaural Delay” property. This defaults
to ‘on’.

Concurrent Voices

Defines the maximum number of sounds that can be played simultaneously. This can affect the
overall processing requirements of the stage. This must be at least 1 and less than 4096.
If more sounds than this are active in the scene at any given time, only the ones that are
the loudest or marked as the highest priority will be audible in the stage. This can be
set to an optimal value that balances processing needs and correctness for the stage through
a process of trial and error. This defaults to 64.

Speed of Sound

Defines the speed of sound setting for the stage. This affects doppler and distance delay
calculations for the stage. This is always expressed in meters per second. This defaults
to 340m/s.

Doppler Scale

Defines a scaling value to exaggerate or reduce the effect of doppler shift calculations.
A value of 1.0 means that the calculated doppler shift values should be unscaled. A value
greater than 1.0 means that the effects of the doppler scale calculations should be exaggerated.
A value less than 1.0 means that the effect of the doppler scale calculations should be
reduced. Negative values are not allowed. This defaults to 1.0.

Doppler Limit

Defines a limit for doppler shift factors. This helps to reduce unintentional audio
corruption due to velocities larger than the speed of sound being used in the stage. This
limit is a unit-less scaling factor that is non-linearly proportional to the speed of sound
setting. For example, a value of 16.0 is approximately equivalent to a relative velocity
of 95% the speed of sound. A value of 2.0 is approximately equivalent to 50% the speed of
sound. This defaults to 2.0.

Spatial Time Scale

Defines the global time scale value for all spatial sound prims. This is equivalent to changing
the “Time Scale” property on all spatial sound prims, except that it only needs to be managed
from one spot and it doesn’t change the individual time scale values of each sound prim. This
can be used for global time dilation effects and the like. Defaults to 1.0.

Non-Spatial Time Scale

Defines the global time scale value for all non-spatial sound prims. This is equivalent to
changing the “Time Scale” property on all non-spatial sound prims, except that it only needs
to be managed from one spot and it doesn’t change the individual time scale values of each
sound prim. This can be used for global time dilation effects and the like. Defaults to 1.0.
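Two of the rules in this table lend themselves to a short sketch: how a per-prim ‘on’/‘off’/‘default’ mode combines with a global ‘on’/‘off’/‘forceOn’/‘forceOff’ default, and how a doppler limit caps the shift factor. The doppler formula used here is the textbook one for a source approaching a stationary listener; it is an assumption chosen because it reproduces the “2.0 is about 50% of the speed of sound” example, and it gives 93.75% rather than exactly 95% for a limit of 16.0. The function names are hypothetical, and “Doppler Scale” is left out because the text does not say how it is applied.

```python
def is_enabled(prim_mode: str, global_default: str) -> bool:
    """Combine a prim's 'on'/'off'/'default' mode with the global default."""
    if global_default == "forceOn":
        return True   # overrides every spatial sound prim
    if global_default == "forceOff":
        return False  # overrides every spatial sound prim
    if prim_mode == "default":
        return global_default == "on"
    return prim_mode == "on"


def doppler_factor(closing_speed: float, speed_of_sound: float = 340.0,
                   limit: float = 2.0) -> float:
    """Return a pitch scaling factor, capped at limit.

    closing_speed is positive when emitter and listener approach each other.
    Assumed model: factor = c / (c - v), not the audio system's actual code.
    """
    if closing_speed >= speed_of_sound:
        return limit  # at or beyond the speed of sound: clamp to the limit
    return min(speed_of_sound / (speed_of_sound - closing_speed), limit)


print(is_enabled("default", "off"))  # follows the global default
print(doppler_factor(170.0))         # half the speed of sound
```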

Audio App Preferences

There are some application preferences that can help to control the behavior of audio output globally in Omniverse Create. These preferences affect all USD stages loaded in Omniverse Create. These settings are not stored as part of the USD stage. The app preferences window can be opened by going to the “Edit” menu and choosing “Preferences”. The audio preferences can be found by selecting “Audio” in the sections list on the left.

Audio Application Preferences.

Audio Output Section

Usage

Output Device

Displays a drop-down box containing the names of all audio output devices connected to the system.
This may be used to select the desired device for output in Omniverse Create. This affects output for the
main USD stage output and all UI audio. Once a device is selected from the list, the “Apply”
button must be pushed to accept the change. Changing both this setting and the speaker
configuration below will cause the output of all open audio contexts to be changed. If the
state of devices attached to the system has changed recently (i.e., a new device was connected or
a device was disconnected from the system), the “Refresh” button can be used to collect the new
device list. By default, the system’s default output device will be chosen.

If the selected device is disconnected from the system between launches of Omniverse Create or the device
list changes between launches, the audio system will first try to find the previously selected
device on the next launch. If it is still attached to the system, it will be used. If it cannot
be found in the device list, the system’s default output device will be used instead.

Speaker Configuration

Sets the speaker configuration to use for output. All configurations are supported regardless
of the device’s capabilities (e.g., a 5.1 configuration is still supported on a stereo device).
If the output mode is not directly supported by the selected device, the final output
of the audio system will be down-mixed to the device’s preferred configuration. As much of the
original stream as possible will be preserved in the down-mixed output.

If the “auto-detect” configuration is selected, the output will try to match the device’s
preferred format. Note that this could result in extra processing requirements on some devices
due to the larger number of speaker channels.

The “Apply” button must be pushed or Omniverse Create relaunched after changing this setting for this to
take effect.
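The device fallback rule described under “Output Device” amounts to a lookup with a default. The sketch below only restates that rule; `choose_output_device` and the device names are hypothetical and do not correspond to a real Omniverse API.

```python
from typing import Sequence


def choose_output_device(saved_device: str,
                         available: Sequence[str],
                         system_default: str) -> str:
    """Pick the output device to use on launch.

    saved_device is the name chosen in a previous session ("" if none).
    """
    if saved_device and saved_device in available:
        # The previously selected device is still attached: keep using it.
        return saved_device
    # Otherwise fall back to the system's default output device.
    return system_default


devices = ["Speakers", "USB Headset"]
print(choose_output_device("USB Headset", devices, "Speakers"))
print(choose_output_device("HDMI Monitor", devices, "Speakers"))
```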

Audio Parameters Section

Usage

Auto Stream Threshold

Defines the asset size at which the audio system will decide to stream a compressed audio asset
instead of decompressing it into memory. This threshold is expressed in kilobytes. If this is set
to zero the auto-streaming feature will be disabled. If this is set to any larger value, any
compressed audio asset with a decompressed size larger than this threshold will be streamed from
the original compressed object instead of being decompressed. The benefit of this is lower
memory usage. However, streaming sounds does require slightly more processing time. The
default value is 0KB.
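The auto-streaming decision can be written as a single comparison. This sketch assumes one kilobyte is 1024 bytes, which the text does not specify, and the name `should_stream` is hypothetical.

```python
def should_stream(decompressed_size_bytes: int,
                  threshold_kb: int = 0,
                  is_compressed: bool = True) -> bool:
    """Decide whether an asset is streamed instead of decompressed into memory.

    Assumption for this sketch: 1 KB = 1024 bytes.
    """
    if threshold_kb <= 0 or not is_compressed:
        # A threshold of zero disables auto-streaming; the rule only
        # applies to compressed assets.
        return False
    return decompressed_size_bytes > threshold_kb * 1024


print(should_stream(5 * 1024 * 1024))                     # default threshold: disabled
print(should_stream(5 * 1024 * 1024, threshold_kb=1024))  # 5 MiB asset, 1 MiB threshold
```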

Volume Levels Section

Usage

Master Volume

Defines the master volume level for all audio output in Omniverse Create. All other volume levels are
effectively multiplied by this volume level to get the final overall volume. Setting this to
0.0 will result in silence (though audio data will still be fully processed). Setting this to
1.0 will be full volume. The volume level changes linearly across this range. This defaults to
1.0.

USD Volume

Defines the volume level to be used by all audio for the USD stage audio output. This affects
all spatial and non-spatial sounds. Setting this to 0.0 will result in silence (though audio
data will still be fully processed). Setting this to 1.0 will be full volume. The volume level
changes linearly across this range. This defaults to 1.0.

Spatial Voice Volume

Defines the volume level to be used for all spatial sounds in the USD stage. This volume level
is effectively multiplied by the “USD Volume” level setting as well before output to get the
final volume level for spatial sounds. Setting this to 0.0 will result in silence (though audio
data will still be fully processed). Setting this to 1.0 will be full volume. The volume level
changes linearly across this range. This defaults to 1.0.

Non-spatial Voice Volume

Defines the volume level to be used for all non-spatial sounds in the USD stage. This volume
level is effectively multiplied by the “USD Volume” level setting as well before output to get
the final volume level for non-spatial sounds. Setting this to 0.0 will result in silence
(though audio data will still be fully processed). Setting this to 1.0 will be full volume.
The volume level changes linearly across this range. This defaults to 1.0.

UI Audio Volume

Defines the volume level to be used for all UI audio sounds in Omniverse Create. This affects all sounds
that go through the omni.kit.uiaudio Python interface. Setting this to 0.0 will result in
silence (though audio data will still be fully processed). Setting this to 1.0 will be full
volume. The volume level changes linearly across this range. This defaults to 1.0.
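Because each level in this section is described as being multiplied into the final output, the combined volume for a category of sound can be sketched as a product. The function name `final_volume` and the category strings are hypothetical; per-sound gain and distance attenuation are left out, and the assumption that UI audio bypasses the “USD Volume” level is inferred from the descriptions above rather than stated outright.

```python
def final_volume(kind: str, master: float = 1.0, usd: float = 1.0,
                 spatial: float = 1.0, non_spatial: float = 1.0,
                 ui: float = 1.0) -> float:
    """Combine the preference volume levels for one category of sound.

    kind is 'spatial', 'non-spatial', or 'ui' (labels used only in this sketch).
    """
    if kind == "ui":
        # Assumed: UI audio is scaled by the master and UI levels only.
        return master * ui
    if kind == "spatial":
        return master * usd * spatial
    if kind == "non-spatial":
        return master * usd * non_spatial
    raise ValueError(f"unknown sound kind: {kind!r}")


# Halving three levels in the chain leaves one eighth of full volume.
print(final_volume("spatial", master=0.5, usd=0.5, spatial=0.5))
```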

Debug Section

Usage

Stream Dump Filename

Defines the filename to be used when dumping the USD stage audio output to file. This will be
written out in WAVE file format regardless of the extension on the filename. The channel count
and data format will match the current output device’s selected channel count and format. This
file will be written to disk as audio is played and will always try to remain within a few
milliseconds of audio away from what is playing on the device (as close as possible).

The output file must be on a local file volume. Sending output to an Omniverse location is not
supported. Once stream dumping is enabled, the output file will be created and it will be
written to as new audio data is produced. The output will continue until stream dumping is
disabled or Omniverse Create is exited. The default value for this setting is an empty string.

Note that as long as this feature is left enabled, data will continue to be written to the
output file. Since this is written as uncompressed data, this file will tend to grow rather
quickly. For example, a 48 kHz stereo floating point signal will write approximately 22 MB per
minute. For this reason, the “Enable Stream Dump” setting is not persistent in Omniverse Create’s user
configuration. It will always be off when Omniverse Create launches.

Enable Stream Dump

Defines whether stream dumping is currently enabled. As soon as this is enabled and a valid
filename is selected in “Stream Dump Filename”, writing to the output file will begin. Stream
dumping will continue until this setting is disabled or Omniverse Create is exited. This setting does not
persist in Omniverse Create’s user configuration. It will always be disabled on a fresh launch of Omniverse Create.
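The growth rate quoted for the stream dump file follows directly from the output format. The arithmetic below assumes 32-bit (4-byte) floating point samples and ignores the small WAVE header; the function name is hypothetical.

```python
def dump_bytes_per_minute(sample_rate_hz: int = 48000,
                          channels: int = 2,
                          bytes_per_sample: int = 4) -> int:
    """Return the uncompressed audio data written per minute, in bytes."""
    # frames per second * channels * bytes per sample * 60 seconds
    return sample_rate_hz * channels * bytes_per_sample * 60


size = dump_bytes_per_minute()
print(size)          # 23040000 bytes per minute
print(size / 2**20)  # roughly 22 when expressed in units of 1024 * 1024 bytes
```

A 5.1 output at the same rate would use `channels=6` and grow three times as fast, which is worth keeping in mind before leaving stream dumping enabled for long sessions.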