Carbonite Audio

Overview

In audio processing, audio can either be captured (e.g. through a microphone) or output (e.g. through speakers). In both cases, the processing of the audio data itself is much the same; the only differences are its final destination and production rate.

When capturing audio, a source will produce audio to be consumed by a caller. The source may be a microphone, a line-in jack, a disk source, an audio generator, etc. The consumer could be an audio playback device, a data buffer, a disk destination, etc. The audio data is typically produced at a rate controlled exclusively by the source.

When playing back audio data, a producer provides the audio data to be consumed by an output. The producer is often a set of in-memory data buffers, but could also be a disk streamed data source, a capture’s live output, or the output of another playback processor. The consumer in this case would be something along the lines of an audio output device, a disk file, a memory buffer, or a capture device. The audio data used as input is often immediately available, but could also be streamed at a limited data rate for playback.

Before continuing, it would be useful to ensure familiarity with some Common Audio Terms and some Basics of Audio Processing.

Interfaces

In the case of Carbonite’s audio offerings, there are three major interfaces involved as well as some additional helper interfaces and objects. All objects, interfaces, structs, and constants can be found in the carb::audio namespace:

carb::audio::IAudioData: provides creation, loading, saving, and manipulation of audio assets. This includes conversion between formats, querying format information, generic encoding and decoding of data, loading existing assets from disk or memory, and creation of empty buffers. Sound Data Objects (SDOs) are always reference counted and will only be destroyed once all references have been removed.
carb::audio::IAudioCapture: provides audio data acquisition from a hardware device. This interface is intentionally simplified to only provide access to the raw PCM data itself. It is only intended to record audio from actual hardware sources.
carb::audio::IAudioPlayback: provides processing, mixing, and playback of audio data. This simulates playback of audio data as either spatial or non-spatial objects. Spatial sounds will attempt to simulate being positioned at a certain distance and orientation from a ‘listener’ object. Non-spatial sounds will play back as they were originally mastered. An SDO will be the unit of work to be operated on by a ‘voice’. A voice is an instance of a playing sound asset. Up to 16,777,216 active voices may exist at any one time in a scene, but only a small subset of those (up to 4096) will be played to an actual output at any given time. The output will go to a hardware device and optionally one or more ‘streamer’ objects. A playback context can also be created to produce output at faster than real-time rates in a ‘baking’ setting. This type of output may only target streamers however.
carb::audio::IAudioGroup: provides creation and manipulation of sound group objects. A sound group is a collection of one or more sound assets (as SDOs), optional playback ranges, and probabilities. A sound group will allow for multiple related sounds to be selected from or cycled through in some manner. Each sound in the group can have an optional playback range within the sound asset (instead of playing the whole asset). Each sound also has a relative probability value associated with it to control the frequency at which it is chosen when in ‘random’ mode.
carb::audio::Streamer: this interface provides a way for a playback context’s output to be sent to a generic destination. The streamer has three main operations - open, close, and write. The open operation provides the streamer with the expected data format. The close operation allows the streamer to clean up its instance. The write operation provides a single buffer of output audio data. It is up to the implementation of the streamer to decide what to do with the data it receives. There is also a C++ wrapper class for the streamer interface that simplifies implementation. A sample implementation of a streamer is provided in the OutputStreamer class. This simply writes the received audio data to a file on disk.
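The open, write, and close operations described for the streamer can be sketched with a small standalone class. This is an illustration of the pattern only: ExampleFormat, ExampleStreamer, and MemoryStreamer are hypothetical names invented here, not the carb::audio definitions, and the real signatures may differ. Where the OutputStreamer sample writes to a file on disk, this sketch collects the data in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the data format handed to open().
// These fields are illustrative, not the carb::audio format description.
struct ExampleFormat
{
    std::uint32_t channels;
    std::uint32_t frameRate; // frames per second
    std::uint32_t bytesPerSample;
};

// Hypothetical streamer interface with the three operations described above.
class ExampleStreamer
{
public:
    virtual ~ExampleStreamer() = default;
    virtual bool open(const ExampleFormat& format) = 0; // receives the expected format
    virtual void write(const void* data, std::size_t bytes) = 0; // one buffer of output
    virtual void close() = 0; // clean up the instance
};

// Example implementation that appends every buffer it receives to memory.
class MemoryStreamer : public ExampleStreamer
{
public:
    bool open(const ExampleFormat& format) override
    {
        // Reject formats this sketch cannot make sense of.
        if (format.channels == 0 || format.bytesPerSample == 0)
            return false;
        m_format = format;
        m_data.clear();
        m_open = true;
        return true;
    }

    void write(const void* data, std::size_t bytes) override
    {
        assert(m_open && "write() called before open()");
        const auto* p = static_cast<const std::uint8_t*>(data);
        m_data.insert(m_data.end(), p, p + bytes);
    }

    void close() override { m_open = false; }

    bool isOpen() const { return m_open; }

    // Number of complete frames received so far.
    std::size_t frameCount() const
    {
        const std::size_t frameSize = std::size_t(m_format.channels) * m_format.bytesPerSample;
        return frameSize ? m_data.size() / frameSize : 0;
    }

private:
    ExampleFormat m_format{};
    std::vector<std::uint8_t> m_data;
    bool m_open = false;
};
```

In this sketch the streamer decides what to do with each buffer (here, append it), which mirrors the statement above that the streamer implementation owns that decision.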

Notes On Thread Safety

Note that most of the audio interfaces are only thread safe to a degree. For the most part, only the internal functionality that a caller has no direct control over is guaranteed to be thread safe. Any external calls will need to be serialized by the caller if concurrent access would be problematic. Some specifics follow:

carb::audio::IAudioData: for the most part, this interface is thread safe since many of its operations either create a new object or return constant data. However, it is always the caller’s responsibility to ensure references are taken before performing an operation if there is a chance that a reference may be released on another thread during the operation. Some operations, such as modifying the event points or metadata on an SDO, are not thread safe and must be serialized by the caller.
carb::audio::IAudioPlayback: many parameter change operations on a context or voice are thread safe to a degree. There is a chance of ordering issues between competing parameter changes, but the consumption of changes once produced will always be serialized internally. Still, it is best practice to only make parameter changes from a single thread or to serialize calls externally. Attempts to play sounds should be thread safe, but may produce undesirable results if not serialized (since each call will produce a unique playing instance of the sound). Attempting to stop a voice should generally be serialized by the caller. Failing to do so is unlikely to cause problems, but serializing is a good idea in general.
carb::audio::IAudioCapture: this interface is not thread safe. It is expected to be called from a single thread only, or to have multi-threaded access to it externally serialized.
carb::audio::IAudioGroup: this interface is not thread safe. It is expected to be called from a single thread only, or to have multi-threaded access to it externally serialized.
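The rule for IAudioData about taking a reference before an operation can be illustrated by analogy with std::shared_ptr. This is only an analogy under stated assumptions: ExampleSound and sumSamples are hypothetical names, and SDOs use their own reference counting through the interface rather than std::shared_ptr.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a reference counted sound data object.
struct ExampleSound
{
    std::vector<float> samples;
};

// Taking the pointer by value means the operation holds its own reference
// for its whole duration, so a release elsewhere cannot destroy the object.
inline float sumSamples(std::shared_ptr<const ExampleSound> sound)
{
    assert(sound && "caller must pass a live object");
    float total = 0.0f;
    for (float s : sound->samples)
        total += s;
    return total;
}
```

If another owner drops its reference while sumSamples is running, the object stays alive until the function's own reference goes out of scope.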

In general, it is best practice to assume that no operations are thread safe and externally serialize access to everything in each interface. The general intention is that any internal multithreaded access will be protected, but callers should make sure to either call into the interfaces from a single thread or to serialize multithreaded access.
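One way a host app might serialize access externally is to route every call through a single mutex. The following is a minimal sketch of that idea, assuming a wrapper owned by the host app; Serialized and ExampleGroup are hypothetical names used for illustration and are not part of carb::audio.

```cpp
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for an interface object that is not thread safe.
struct ExampleGroup
{
    std::vector<int> soundIds;
    void addSound(int id) { soundIds.push_back(id); }
    std::size_t count() const { return soundIds.size(); }
};

// Owns an object and a mutex. Every access goes through call(), which
// holds the lock for the duration of the supplied function.
template <typename T>
class Serialized
{
public:
    template <typename Fn>
    auto call(Fn&& fn)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return std::forward<Fn>(fn)(m_object);
    }

private:
    std::mutex m_mutex;
    T m_object{};
};
```

Each thread would then use something like group.call([](ExampleGroup& g) { g.addSound(7); }) instead of touching the object directly. Because the wrapped object is private, the only path to it is through the lock.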

Note that this behaviour is likely to change in future versions. The change is intended to simplify usage from the host app’s perspective. These thread locking changes will, however, only affect multiple simultaneous calls from the host app side, not internal behaviour. The overall performance impact should be minimal.