Audio2Gesture (A2G) is a neural network trained to generate body motion derived entirely from an audio source. It offers a range of animation styles and can animate either the full body or the upper body only. Connect your character with the automatic Retargeting tool. A2G provides a high-quality, efficient way to generate body motion for characters in dialogue-heavy scenarios.


Overview Tutorial Video

Getting Started in Audio2Gesture

Upon loading the Audio2Gesture extension, the user is presented with the following pipeline options.




A2G Offline Pipeline

The Audio2Gesture Offline Pipeline loads the “Regular” Audio Player, which uses WAV audio files to generate animation clips.

A2G Streaming Pipeline

The Audio2Gesture Streaming Pipeline loads the “Streaming” Audio Player and enables a runtime workflow for streaming text-to-speech (TTS) audio.

Base Skeleton

Loads the main skeleton that A2G manipulates to drive performances. The Base Skeleton loads by default when you build a new pipeline. It is provided here as a convenient retargeting reference in case you encounter problems setting up retargeting for your character.

Audio Players

See Audio2Face documentation links below.

A2G Offline Pipeline


Target Skeleton

This field lists every valid SkelRoot found in the current stage.

Skeleton Connected


The green check mark means the selected skeleton has retargeting set up and is connected.

Skeleton not connected


Indicates that the currently assigned skeleton is not ready for retargeting. Clicking the icon exposes the Run AutoRetarget command.

Auto Retarget

On success, a green check mark appears. On failure, you are prompted to open the Retargeting window. For a comprehensive look at the Retargeting tool, refer to its documentation.

Open Retargeting Window


Opens the Retargeting tool for a more comprehensive setup of characters. For more details, refer to the documentation for Animation Retargeting.

Run A2G


Runs an optimization algorithm to find the best-suited animations for the current audio source and parameters, then puts A2G in a run state, ready to receive audio. A progress bar is shown during the process.


Every time you change the parameters for Audio2Gesture, you must click “Run A2G” again so that the neural network can process the new parameters.


The animation style setting provides a variety of options to suit different spoken-word scenarios:

  • Neutral (default)

  • Big Gestures

  • Calm Speech

  • Public Speech

  • Public Speech - casual

  • Public Speech - behind a table

Animation Mode

A post-processing feature that lets you choose between a full-body animation performance and an upper-body-only performance.

Animation Option

A2G presents a number of motion-type options that best suit the processed audio file and defaults to the best, or “top”, option. You can switch between the other options to explore alternative character performances.

Advanced Settings

After changing any of these settings, you must run A2G again.




Num Epochs

A2G performs iterative optimizations for each new audio track. More iterations generally produce better quality.

Num Samples

On each iteration, A2G generates a number of sample animations. More samples generally produce better quality.
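The relationship between Num Epochs and Num Samples can be pictured with a toy best-of-N random search. This is only an illustrative sketch, not A2G's actual optimizer: the function name, the seed argument, and the random placeholder score are all assumptions made for the example.

```python
import random


def best_candidate_score(num_epochs: int, num_samples: int, seed: int = 0) -> float:
    """Toy best-of-N random search (hypothetical, not A2G's algorithm).

    Each epoch draws `num_samples` candidates, and the best score seen
    across all epochs is kept.
    """
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(num_epochs):
        for _ in range(num_samples):
            # Placeholder for "how well does this sample animation fit the audio".
            score = rng.random()
            best = max(best, score)
    return best


if __name__ == "__main__":
    # With a fixed seed, a larger search budget can only match or improve the result.
    print(best_candidate_score(num_epochs=2, num_samples=4))
    print(best_candidate_score(num_epochs=8, num_samples=16))
```

In this toy model the total number of candidates examined is Num Epochs multiplied by Num Samples, which is why raising either value tends to improve the result while increasing processing time.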

Smoothing Time Span

Controls the duration of the smoothing applied to source animations as they are stitched together.
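To make the idea of a smoothing time span concrete, the sketch below stitches two clips with a linear crossfade. How A2G actually blends source animations is not documented here, so the linear blend, the function name, and the single-channel list representation are all assumptions chosen for illustration.

```python
def crossfade(clip_a, clip_b, fps, smoothing_time_span_s):
    """Stitch two single-channel clips with a linear crossfade (hypothetical sketch).

    The last `n` frames of clip_a are blended with the first `n` frames of
    clip_b, where n = smoothing_time_span_s * fps, capped by the clip lengths.
    """
    n = min(int(round(smoothing_time_span_s * fps)), len(clip_a), len(clip_b))
    if n == 0:
        # No smoothing requested: plain concatenation with a hard cut.
        return list(clip_a) + list(clip_b)

    head = list(clip_a[: len(clip_a) - n])
    tail = list(clip_b[n:])
    blended = []
    for i in range(n):
        # The weight ramps from clip_a toward clip_b across the overlap.
        w = (i + 1) / (n + 1)
        a = clip_a[len(clip_a) - n + i]
        b = clip_b[i]
        blended.append((1 - w) * a + w * b)
    return head + blended + tail


if __name__ == "__main__":
    # Two flat clips at different values; a 1-second span at 2 fps blends 2 frames.
    print(crossfade([0.0] * 4, [1.0] * 4, fps=2, smoothing_time_span_s=1.0))
```

A longer span spreads the transition over more frames, which gives softer motion but moves the stitched result further from the original clips. That is the trade-off the Audio Sync Strength setting addresses.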

Audio Sync Strength

Animation smoothing can affect audio synchronization. This option controls the balance between smoothing and synchronization accuracy.

Animation Graph Setup





Select a character from the current stage.

Translation Var

Select a translation variable from an anim graph in the current stage.

Rotation Var

Select a rotation variable from an anim graph in the current stage.

Animation Recording


Destination Path

Specify a folder on disk to write your animation clip to. Click the folder icon to select the folder in a browser window. Click the link button to open the folder in the file explorer.

Take Name

Specify a name for your animation clip. The output USD will be: {destination_path}/{target_prim_name}_{take_name}.usd. The output USD will contain one SkelAnimation named with the Take Name.
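The naming pattern above can be reproduced in a few lines, for example to predict where a recorded clip will land before running a batch of takes. This is a minimal sketch; the function name and the example values are hypothetical, and only the pattern itself comes from the text above.

```python
from pathlib import PurePosixPath


def output_usd_path(destination_path: str, target_prim_name: str, take_name: str) -> str:
    """Build the clip path as {destination_path}/{target_prim_name}_{take_name}.usd."""
    return str(PurePosixPath(destination_path) / f"{target_prim_name}_{take_name}.usd")


if __name__ == "__main__":
    # Example values are placeholders, not names used by the extension.
    print(output_usd_path("/projects/clips", "Character", "take_01"))
    # prints /projects/clips/Character_take_01.usd
```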

Export FPS

Sets the frames per second at which the animation data is recorded (defaults to 60 fps).


Clicking Record executes the “run” command and starts playing the audio, producing a clean output of the full animation that matches the duration of the audio clip.
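Because the recorded animation matches the audio clip duration, the approximate length of the clip in frames follows from the Export FPS. The sketch below assumes the frame count is the duration multiplied by the frame rate, rounded up. The exact rounding A2G uses is not stated here, and the function name is hypothetical.

```python
import math


def recorded_frame_count(audio_duration_s: float, export_fps: float = 60.0) -> int:
    """Estimate the number of frames needed to cover the audio clip (assumed rounding up)."""
    if audio_duration_s < 0 or export_fps <= 0:
        raise ValueError("duration must be non-negative and fps must be positive")
    # The small epsilon guards against floating-point noise such as 0.1 * 30 = 3.0000000000000004.
    return math.ceil(audio_duration_s * export_fps - 1e-9)


if __name__ == "__main__":
    print(recorded_frame_count(2.5))        # 2.5 s at the default 60 fps -> 150
    print(recorded_frame_count(10.0, 24))   # 10 s at 24 fps -> 240
```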

Offline Pipeline Video Tutorial

Streaming Pipeline Video Tutorial