Carbonite Audio Data Interface
Overview
The carb::audio::IAudioData
interface provides access to audio assets and their general management.
All audio assets are stored in sound data objects (SDOs). The audio playback interface and audio group interface
operate only on sound data objects as the asset data for their various operations. This interface allows
assets to be loaded, created, converted, and saved to file. It also provides methods for sending output to a
file and for encoding and decoding sound data objects.
Sound data objects support loading and manipulating sound assets in the following formats (PCM formats allow up to 64 channels):
8-bit unsigned integer PCM data.
16-, 24-, and 32-bit signed integer PCM data.
32-bit floating point PCM data.
Vorbis.
FLAC.
Opus.
MP3 (encoding to MP3 is not supported).
A sound data object consists of a set of information about the sound data’s format (i.e. its frame rate, channel count, sample size and type, etc.), the length of the asset, and potentially a buffer of data to be processed by a decoder. A sound data object may be fully decoded in memory, streamed from an encoded buffer in memory, or streamed from disk. These objects may also contain additional optional information such as metadata strings (i.e. authoring, genre, and format information), peak volume information, event point records, loop points, playlist information, etc.
Each sound data object may have a single ‘user data’ object associated with it. This user data consists
of a block of data and an optional destructor function for the user data object. The user data object
is never accessed internally by any functionality, but the host app may use it to associate its own
object with the sound data object. The host app may retrieve this object from the sound data object
at any time with carb::audio::IAudioData::getUserData
. When the sound data object that
holds the user data block is destroyed, its optional destructor is called. Similarly, if the user data
block is replaced, the destructor for the previous user data block will be called.
Creating and Loading
Sound data objects may be created in several ways, but all through the same call -
carb::audio::IAudioData::createData
. The method of loading or creating the object depends
on the settings passed in the carb::audio::SoundDataLoadDesc
descriptor. A sound asset can
be loaded from a disk file, a blob in memory, or a ‘user decoded stream’. If the asset is loaded from a
blob in memory, it may either be copied into internally owned memory or reference the user memory
that was originally passed in. The asset may also be either decoded on load or decoded at runtime as it
plays in order to save memory. Additionally, a new sound data object may be created empty, with a given
length. When an asset is loaded from a file and the file contains format information, that information will
always be used instead of any format information in the load descriptor. If the asset data does not contain
format information (i.e. a user decoded stream) or there is no asset data (i.e. an empty asset), the format
information must be provided in the load descriptor.
Once created, all sound data objects are reference counted. When a sound data object is played, a reference
to the object will be held internally as long as it is in use. The external caller may release only the
references that it has acquired. One reference is acquired on object creation, and another for each call to
carb::audio::IAudioData::acquire
. A sound data object will only be destroyed once all references
have been released.
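The reference-counting rules above can be illustrated with a minimal stand-in (not the real sound data object type):

```cpp
#include <cassert>

// Stand-in for a reference-counted sound data object.
struct RefCounted {
    int refs = 1;           // creation implicitly takes the first reference
    bool destroyed = false;

    void acquire() { ++refs; }

    // Drops one reference; the object is "destroyed" when the count hits zero.
    void release() {
        if (--refs == 0)
            destroyed = true;
    }
};

// Mirrors the lifecycle above: creation, an internal reference held while the
// sound plays, then balanced releases ending with the creation reference.
inline bool destroyedAfterBalancedReleases() {
    RefCounted sound;   // one reference from creation
    sound.acquire();    // e.g. taken internally while the sound plays
    sound.release();    // internal reference dropped when playback ends
    sound.release();    // caller releases its creation reference
    return sound.destroyed;
}
```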
A sound asset does not necessarily have to be loaded into the sound data object in its original format. It may be decoded on load or converted to a different format on load. This is especially true for user decoded streams. A user decoded stream allows the caller to provide a stream of PCM data decoded from an arbitrary format. This effectively allows the caller to perform the initial decoding of the asset data from a proprietary format, or to plug in a data generator. The user decoded stream consists of a single callback function that is used to provide a specified number of frames of PCM data. A user decoded stream can either provide the data only at load time or be used as a streaming source. When it is used as a streaming source, a positioning callback function must be provided as well. This allows the stream to be repositioned as needed for playback instead of always having to restart from the beginning.
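As an illustration of this callback shape (the signatures below are assumptions for illustration; the real carb::audio callback types are not shown here), a user decoded stream driven by a read callback and a positioning callback might look like:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed callback shapes for a user decoded stream: one callback produces
// PCM frames on demand, the other repositions the stream.
using ReadCallback = size_t (*)(void* context, float* buffer, size_t frames);
using SetPosCallback = void (*)(void* context, size_t frame);

// A toy generator that streams a mono ramp signal and tracks its position.
struct RampSource {
    size_t position = 0;
    size_t totalFrames = 0;
};

// Produces up to `frames` frames; returning fewer signals the end of the stream.
inline size_t rampRead(void* context, float* buffer, size_t frames) {
    auto* src = static_cast<RampSource*>(context);
    size_t produced = 0;
    while (produced < frames && src->position < src->totalFrames)
        buffer[produced++] = static_cast<float>(src->position++);
    return produced;
}

// The positioning callback lets a streaming consumer seek instead of restarting.
inline void rampSetPos(void* context, size_t frame) {
    static_cast<RampSource*>(context)->position = frame;
}

// Drains the stream in fixed-size chunks, the way a streaming consumer might.
inline std::vector<float> drain(ReadCallback read, void* ctx, size_t chunk) {
    std::vector<float> out;
    std::vector<float> buf(chunk);
    size_t got;
    while ((got = read(ctx, buf.data(), chunk)) > 0)
        out.insert(out.end(), buf.begin(), buf.begin() + got);
    return out;
}
```

Without the positioning callback, looping or seeking would force the consumer back to frame zero every time.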
Converting
Once created, a sound data object can be converted to any other supported format as needed. The conversion can either be done in-place and replace the existing object’s internal asset data, or done by creating a new sound data object. Whether the conversion is done in place or not, the returned object will be given a new reference that will need to be released by the caller at some point.
The conversion is done in the least destructive manner possible. However, depending on the original format and the selected destination format, some loss of information may be unavoidable. For example, converting from 32-bit float PCM to 8-bit integer PCM will result in a significant loss of information. Note that a conversion only changes the sample format, not other aspects of the asset’s format (i.e. its channel count).
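As a worked example of that loss, quantizing 32-bit float samples to 8-bit unsigned PCM and back shows a nonzero round-trip error for most values. The rounding convention below is a common one, not necessarily the one used internally:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// 8-bit unsigned PCM stores samples biased around 128.
inline uint8_t floatToU8(float s) {
    if (s > 1.0f) s = 1.0f;   // clamp to the valid float PCM range
    if (s < -1.0f) s = -1.0f;
    return static_cast<uint8_t>(std::lround(s * 127.0f) + 128);
}

inline float u8ToFloat(uint8_t s) {
    return (static_cast<int>(s) - 128) / 127.0f;
}

// Round-trip error: zero only for the 255 exactly representable levels,
// up to about half a quantization step (1/254) for everything else.
inline float roundTripError(float s) {
    return std::fabs(u8ToFloat(floatToU8(s)) - s);
}
```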
Decoding & Encoding
When a sound data object needs to be played or processed, its decoded PCM data can be accessed using a
decoder. In most cases, this is not necessary for external callers. This will be done automatically
internally when the asset is played through the carb::audio::IAudioPlayback
interface. The
decoder can be used to get access to a stream of PCM data for the asset regardless of its encoded format.
Similarly, an encoder can take a stream of PCM data and encode it into another format in a
sound data object.
Encoding or decoding a sound has the following limitations:
decoding from a sound data object may start from any sample format, but must always produce a PCM format.
encoding into a sound data object may target any sample format, but must always come from a PCM source.
Performing an encode or decode operation requires that a codec state object be created first. The codec
state is created with the carb::audio::IAudioData::createCodecState
function. The codec state
allows for a sound data object to be decoded multiple times simultaneously without affecting any other instances.
Each codec state may only perform operations in a single direction - encoding or decoding. Once the
codec state is created, buffers of data may either be received from the decoder or submitted to the
encoder (depending on the direction of the codec state). A codec state’s current position may be queried
or changed as needed, though the accuracy of the positioning depends on the format. For example, some
compressed formats may not support frame accurate seeking. Once the encode or decode operation is
complete, the codec state can be destroyed with carb::audio::IAudioData::destroyCodecState
.
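The decode loop implied above can be sketched with a stand-in codec state. The function names mirror those in this document, but the signatures and behavior are simplified assumptions, not the real carb::audio API:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a decode-direction codec state over some encoded asset; the
// real codec state is opaque and created from a sound data object.
struct CodecState {
    size_t cursor = 0;       // current frame position within the asset
    size_t totalFrames = 0;  // length of the asset in frames
};

inline CodecState* createCodecState(size_t totalFrames) {
    return new CodecState{0, totalFrames};
}

// Decodes up to `frames` frames of PCM (silence here); returns how many were
// produced.  A short read signals the end of the stream.
inline size_t decodeData(CodecState* state, float* buffer, size_t frames) {
    size_t produced = 0;
    while (produced < frames && state->cursor < state->totalFrames) {
        buffer[produced++] = 0.0f;
        ++state->cursor;
    }
    return produced;
}

inline void destroyCodecState(CodecState* state) { delete state; }

// The full decode loop: create a state, pull buffers until the decoder
// produces nothing, then destroy the state.
inline size_t decodeAll(size_t totalFrames, size_t chunk) {
    CodecState* state = createCodecState(totalFrames);
    std::vector<float> buf(chunk);
    size_t decoded = 0;
    size_t got;
    while ((got = decodeData(state, buf.data(), chunk)) > 0)
        decoded += got;
    destroyCodecState(state);
    return decoded;
}
```

Because each state carries its own cursor, two states over the same asset can decode concurrently without interfering, which is the property the codec state object provides.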
Saving & Output Streams
A sound data object may be written to a file on disk if needed. The asset may optionally be converted to a different format when writing it to disk, or written in its current format. For some formats, a conversion or re-encoding may need to occur in order to write the asset to disk. Future versions may allow the file to be written to a blob in memory as well.
An output stream is similar to saving a sound to file (in fact, an output stream is used internally when saving to file). As with saving to file, an optional format conversion may occur in the output stream. The output stream allows an arbitrary stream of PCM data to be sent to a file. The PCM data is sent in anonymous buffers by ‘writing’ it to the stream. Any conversion to the destination format is performed on the buffer before attempting to write it to the stream. Depending on the destination format, the converted data may not be flushed to disk immediately. The only time that it will be guaranteed to be flushed to disk is when the output stream is closed.
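A minimal sketch of that staging behavior, assuming a hypothetical float-to-16-bit destination format (not the actual stream implementation, which writes to a file):

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in output stream with a 16-bit integer destination format.
struct OutputStream {
    std::vector<int16_t> staged;   // converted, but not yet flushed
    std::vector<int16_t> flushed;  // what has actually reached the "file"

    // Conversion to the destination format happens when a buffer is written...
    void write(const float* pcm, size_t frames) {
        for (size_t i = 0; i < frames; ++i)
            staged.push_back(static_cast<int16_t>(std::lround(pcm[i] * 32767.0f)));
    }

    // ...but the data is only guaranteed to be flushed when the stream closes.
    void close() {
        flushed.insert(flushed.end(), staged.begin(), staged.end());
        staged.clear();
    }
};
```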
Event Points & Loop Points
Each sound data object may optionally include a set of caller specified event points or loop points. An event point is a spot in the asset where an event is expected to occur. An event point may or may not be used when playing the sound - that is decided by the caller when playing the sound. An event point consists of the frame number where the event point is expected to be triggered, an arbitrary identifier (used to match it when updating, deleting, or adding event points), a text name (for UI display), optional text, and a user data object. A loop point is a specialized event point that also specifies a region length and optional play index. There is no limit on the number of event points or loop points that a sound data object may contain.
Some asset file formats may include event point information that can be parsed out of the file on load. Parsing these event points is only guaranteed for RIFF/WAV files. However, few authoring tools have the ability to create or store this event point information in the file.
Event points and loop points are set on a sound data object using carb::audio::IAudioData::setEventPoints
.
These are always set in a group. In a single call, individual event points may be added, deleted, or
modified depending on the arbitrary identifier value in the source buffer and the given frame number.
Event points may be retrieved either in groups or individually by various criteria (i.e. by identifier,
by play index, or by index).
When playing a sound data object, event points may be enabled using the carb::audio::fPlayFlagUseEventPoints
flag. When this flag is used, a callback function is also expected to be specified in the play descriptor.
When one of the sound data object’s event points is hit, the callback will be performed with the
carb::audio::VoiceCallbackType::eEventPoint
value set in its type
parameter. The
triggered event point descriptor will be passed in the callback’s data
parameter. This callback and
event point data may be used to trigger some external action in the program. Note that if the
carb::audio::fPlayFlagRealtimeCallbacks
flag is also used, the callback should execute and return
as quickly as possible; otherwise it will stall the audio processing engine. In general, the callback should
just flag that something needs to occur and store any required information for another thread to handle.
If the carb::audio::fPlayFlagRealtimeCallbacks
flag is not used, the carb::audio::IAudioPlayback::update
function must be called in order for the event point callbacks to be performed.
Some examples of using event points may be:
apply closed captioning text to a dialogue sound track. The event point’s carb::audio::EventPoint::text member would contain the text to be displayed or added to a scrolling display. Each new line of text would show up when the audio sound track’s playback reached each event point frame.
trigger the next part of an animation that is synchronized to the sound a character is making. When the event point is fired, the character’s animation state would be updated to continue the sequence. This would allow for audio data driven animation sequences without needing to modify code.
trigger an in-game cut-scene sequence, such as unlocking a door or starting an automated fight sequence at a certain point during a voice-over sequence. Again, this would be data driven so re-recording or editing the dialogue would not require code changes.
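The realtime-callback guidance above (record the event, let another thread act on it) can be sketched as follows, with simplified stand-ins for the event point and queue types (the real carb::audio event point carries more fields, such as the name, text, and user data):

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Simplified stand-in for an event point.
struct EventPoint {
    size_t frame;  // frame at which the event triggers
    int id;        // arbitrary caller-chosen identifier
};

// A small fixed-capacity queue: cheap enough to touch from the audio thread.
struct PendingEvents {
    int ids[64];
    std::atomic<size_t> count{0};

    void push(int id) {
        size_t i = count.load(std::memory_order_relaxed);
        if (i < 64) {
            ids[i] = id;
            count.store(i + 1, std::memory_order_release);
        }
    }
};

// What a realtime-safe event point callback should do: as the playback cursor
// sweeps from `begin` to `end`, record which event points fired and return.
// Any real work happens later on another thread that drains the queue.
inline void fireEventPoints(const EventPoint* points, size_t n,
                            size_t begin, size_t end, PendingEvents& out) {
    for (size_t i = 0; i < n; ++i)
        if (points[i].frame >= begin && points[i].frame < end)
            out.push(points[i].id);
}
```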
Typical Usage
This interface should be used to handle all operations on sound assets. This typically begins by calling
carb::audio::IAudioData::createData
to create or load the sound data object from some form
of source data (i.e. a file on disk, a blob in memory, or an empty buffer). The sound data object is
then passed to a playback context to play on a new voice using carb::audio::IAudioPlayback::playSound
.
Once the play task is done, the sound data object is released using carb::audio::IAudioData::release
.
Another usage scenario would be to convert a sound asset from one format to another. The original sound
asset is loaded with carb::audio::IAudioData::createData
. A conversion request is then set up
and passed to carb::audio::IAudioUtils::convert
. This can either create a new sound data object
with the conversion result, or replace the data in the original sound data object. The resulting sound data
object can then be played, saved to disk, or otherwise operated on. The new and original sound data objects
are then released with carb::audio::IAudioData::release
when they are no longer needed. Both
carb::audio::IAudioData::release
calls are needed even if the conversion replaces the original
object’s contents.
The carb::audio::IAudioData
interface can also be used to manually decode a sound data object
to raw PCM or encode it to another format from raw PCM. This is done by creating a ‘codec state’ object (with
carb::audio::IAudioData::createCodecState
) for an existing sound data object, then either decoding or
encoding buffers of data with carb::audio::IAudioData::decodeData
or carb::audio::IAudioData::encodeData
.