This document describes AX's audio renderer at a conceptual level. For information about how the renderer works in practice, see the AX Programming Model reference.
Rendering sound consists of the following steps, each of which is explained in detail in later sections:
An audio channel is a buffer of continuous sound data for a particular speaker of a given output device. The number of available channels varies for each output device. For example, the TV output device supports 6 channels: left, right, center, surround-left, surround-right, and low-frequency effects (LFE).
An audio bus is made up of multiple channels. For example, the TV output device supports a main bus and 3 auxiliary (AUX) buses. Note that the number of supported buses varies for each output device.
After all effects have been applied to the AUX buses for a device, AX mixes the output channels of the buses together to create the output for that device. For example, the left speaker channel of each of the TV's buses is mixed to create a final output buffer for the TV's left speaker channel.
|Output device||Buses||Channels|
|TV||Main, AUX A, AUX B, AUX C||6: Left, right, center, surround-left, surround-right, low-frequency effects (LFE)|
|DRC||Main, AUX A, AUX B, AUX C||4: Left, right, surround-left, surround-right|
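The bus-to-device mix described above can be sketched in plain C. This is only an illustration: the `Bus` type, the buffer sizes, and the function `mix_buses_to_device` are hypothetical names chosen for this example and are not part of the AX API. It assumes that mixing is a simple per-sample sum of the same channel across all buses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes only: 4 buses (Main + 3 AUX) and 6 channels match the
   TV row of the table; FRAME_SAMPLES is an arbitrary value for the sketch. */
enum { NUM_BUSES = 4, NUM_CHANNELS = 6, FRAME_SAMPLES = 96 };

/* Hypothetical layout, not an AX type: one sample buffer per channel. */
typedef struct {
    int32_t samples[NUM_CHANNELS][FRAME_SAMPLES];
} Bus;

/* Sum the same channel of every bus into the device output for that channel. */
void mix_buses_to_device(const Bus buses[NUM_BUSES],
                         int32_t out[NUM_CHANNELS][FRAME_SAMPLES])
{
    assert(buses != NULL && out != NULL);
    for (size_t ch = 0; ch < NUM_CHANNELS; ++ch) {
        for (size_t i = 0; i < FRAME_SAMPLES; ++i) {
            int32_t acc = 0;
            for (size_t b = 0; b < NUM_BUSES; ++b) {
                acc += buses[b].samples[ch][i];
            }
            out[ch][i] = acc;
        }
    }
}
```

In this sketch, the left channel (index 0) of the output is the sum of the left channel of the main bus and of each AUX bus, which mirrors the TV left-speaker example.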
AX is a voice-based sound system. Applications acquire and configure a voice to play a particular sound. The voice generation process that occurs for each voice is illustrated below.
In summary, the steps are as follows:
The mixing step combines the output of each generated voice into the channels and buses of each output device, at a volume determined by the voice's mixing parameters.
The channels and buses of each output device are organized in the manner illustrated below.
The bus-level mixing topology for a given output device is illustrated here:
Note that the number of available buses and channels will vary per output device.
The output level of a given voice to any channel is configured with
A single voice may be mixed to multiple output devices at the same time. That is, voices need not be dedicated to a particular output device.
While this is convenient for cross-fading a sound between outputs, the computation cost of a voice increases in proportion to the number of devices and channels to which it is mixed.
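To make the cost relationship concrete, here is a minimal sketch of mixing one voice into several devices. The `DeviceMix` structure and `mix_voice` function are hypothetical and do not correspond to AX types or calls; the sketch assumes that a channel with a gain of zero is skipped. The return value counts multiply-accumulate operations so that the scaling is visible.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_CHANNELS = 6, FRAME_SAMPLES = 8 };

/* Hypothetical per-device mix target, not an AX structure. */
typedef struct {
    size_t num_channels;                      /* e.g. 6 for TV, 4 for DRC */
    float  gain[MAX_CHANNELS];                /* voice level per channel; 0 = not mixed */
    float  bus[MAX_CHANNELS][FRAME_SAMPLES];  /* accumulating bus buffers */
} DeviceMix;

/* Adds one frame of a voice into every channel it targets on every device.
   Returns the number of multiply-accumulate operations performed. */
size_t mix_voice(const float voice[FRAME_SAMPLES],
                 DeviceMix *devices, size_t num_devices)
{
    size_t ops = 0;
    assert(voice != NULL && devices != NULL);
    for (size_t d = 0; d < num_devices; ++d) {
        assert(devices[d].num_channels <= MAX_CHANNELS);
        for (size_t ch = 0; ch < devices[d].num_channels; ++ch) {
            float g = devices[d].gain[ch];
            if (g == 0.0f) {
                continue;  /* channel not targeted: no work done */
            }
            for (size_t i = 0; i < FRAME_SAMPLES; ++i) {
                devices[d].bus[ch][i] += g * voice[i];
                ++ops;
            }
        }
    }
    return ops;
}
```

A voice routed to two TV channels and one DRC channel performs three times the per-frame work of a voice routed to a single channel.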
After the final mix has been created, additional processing occurs before the audio is transmitted to the TV and DRC output devices. The processing stages are described below:
In this stage, AX upsamples the final mix to 48 kHz for output to the devices. If the mixing rate is already set to 48 kHz (supported as of Sound2), no upsampling is performed.
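As a rough illustration of what rate conversion to 48 kHz involves, the sketch below upsamples by a factor of 3/2 using linear interpolation. Two things here are assumptions rather than statements about AX: the 32 kHz source rate (this section does not state the mixing rate), and linear interpolation as the filter, which stands in for whatever higher-quality resampler is actually used. The name `upsample_2_to_3` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Upsample by 3/2 (e.g. an assumed 32 kHz input to 48 kHz) using linear
   interpolation. Returns the number of output samples written.
   Not the AX algorithm: a production resampler would use a proper
   low-pass interpolation filter. */
size_t upsample_2_to_3(const float *in, size_t in_len,
                       float *out, size_t out_cap)
{
    size_t out_len = in_len * 3 / 2;
    assert(in != NULL && out != NULL && out_cap >= out_len);
    for (size_t n = 0; n < out_len; ++n) {
        size_t pos  = n * 2;      /* input position in thirds of a sample */
        size_t idx  = pos / 3;    /* integer input index */
        size_t frac = pos % 3;    /* fractional part, in thirds */
        /* Clamp at the end of the buffer instead of reading past it. */
        size_t next = (idx + 1 < in_len) ? idx + 1 : idx;
        out[n] = in[idx] + (in[next] - in[idx]) * (float)frac / 3.0f;
    }
    return out_len;
}
```

Every two input samples yield three output samples; the last output sample holds the final input value because no later sample is available to interpolate toward.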
By default, upsampling occurs before the final mix callback.
It is possible to have it instead occur immediately after the final mix callback with
After all channels and buses have been mixed for a given output device, AX invokes an optional final-mix callback function. Applications can use it to modify the final mix, or to add audio data to it, on every frame for each output device.
Possible uses for this facility include applying virtual surround or other spatialization algorithms, or mixing audio from other sources (such as a proprietary audio codec).
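The second use case, mixing in audio from another source, might look roughly like the following. The callback signature shown here is hypothetical and is not the AX signature; `ExtraSource`, `add_extra_source`, and `run_final_mix` are likewise names invented for this sketch. It assumes the callback receives one buffer per channel and may modify the samples in place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback shape; the real AX signature is not shown here. */
typedef void (*FinalMixCallback)(int32_t **channels, size_t num_channels,
                                 size_t num_samples, void *user);

/* A mono stream decoded elsewhere (for example by a proprietary codec). */
typedef struct {
    const int32_t *samples;
    size_t         length;
    size_t         cursor;
} ExtraSource;

/* Example callback: adds the extra mono source to the first two channels. */
void add_extra_source(int32_t **channels, size_t num_channels,
                      size_t num_samples, void *user)
{
    ExtraSource *src = user;
    assert(channels != NULL && src != NULL && num_channels >= 1);
    for (size_t i = 0; i < num_samples && src->cursor < src->length; ++i) {
        int32_t s = src->samples[src->cursor++];
        channels[0][i] += s;
        if (num_channels > 1) {
            channels[1][i] += s;
        }
    }
}

/* Stand-in for the renderer: invokes the callback only if one is registered. */
void run_final_mix(FinalMixCallback cb, void *user, int32_t **channels,
                   size_t num_channels, size_t num_samples)
{
    if (cb != NULL) {
        cb(channels, num_channels, num_samples, user);
    }
}
```

Because the callback is optional, the stand-in renderer leaves the mix untouched when no callback has been registered.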
Note that only TV and DRC support the final mix callback. For more information about playing Wii Remote audio with AX, see the Programming Model's Wii Remotes section.
AX includes a volume compressor to help reduce the dynamic range of the audio. Users can turn the compressor on or off; it is off by default. The compressor is a basic design: when it detects that the output has exceeded the threshold, it applies a ramped attenuation to bring the output back below the threshold.
Because compression is a non-linear operation, turning the compressor on may introduce distortion. Users can experiment with their own threshold levels and compressor settings.
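The threshold-and-ramp behavior can be sketched as follows. This is not the AX compressor: the `Compressor` structure, the per-frame peak detection, the linear ramp over one frame, and the return to unity gain when the signal is quiet are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical compressor state, not an AX structure. */
typedef struct {
    float threshold;  /* maximum desired absolute sample value */
    float gain;       /* current gain, 1.0 = no attenuation */
} Compressor;

/* Processes one frame in place. If the frame's peak exceeds the threshold,
   the gain ramps linearly across the frame toward threshold / peak;
   otherwise it ramps back toward 1.0. */
void compressor_process(Compressor *c, float *buf, size_t n)
{
    assert(c != NULL && buf != NULL && c->threshold > 0.0f);
    if (n == 0) {
        return;
    }
    float peak = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float a = (buf[i] < 0.0f) ? -buf[i] : buf[i];
        if (a > peak) {
            peak = a;
        }
    }
    float target = (peak > c->threshold) ? c->threshold / peak : 1.0f;
    float step = (target - c->gain) / (float)n;
    for (size_t i = 0; i < n; ++i) {
        c->gain += step;
        buf[i] *= c->gain;
    }
    c->gain = target;  /* snap to the target to avoid accumulated rounding */
}
```

Note that with a ramp, the first samples of a loud frame can still exceed the threshold; the output only settles at the threshold once the ramp completes. The changing gain is also what makes the operation non-linear.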
AX natively mixes audio using the maximum number of channels supported for each output device. However, the actual number of channels selected by the user can vary for some devices (such as the TV).
AX provides a remix matrix that applications may set to handle any discrepancies between
the number of channels being mixed to and the number of channels available to output to.
For example, if the user sets the TV mode to STEREO,
the application may still render 6 channels of audio and downmix them to two channels using the remix matrix.
As of this writing, AX does not automatically downmix and/or upmix.
It is the duty of the application to provide such a matrix if it needs remix to be applied.
For further details, see
Note that AX does not allow the application to change the sound mode of the system; this parameter is configured by the user in the system settings. During development, the parameter can be changed with the System Config Tool.
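A remix matrix of the kind described above is a set of coefficients that maps each mixed channel to each output channel. The sketch below downmixes 6 channels to stereo. The channel order, the coefficient values, and the names `STEREO_DOWNMIX` and `apply_remix` are assumptions for illustration only; they are not AX defaults or AX calls.

```c
#include <assert.h>
#include <stddef.h>

enum { IN_CH = 6, OUT_CH = 2 };

/* Assumed input order: left, right, center, surround-left, surround-right, LFE.
   Coefficients are illustrative, not values recommended by AX. */
static const float STEREO_DOWNMIX[OUT_CH][IN_CH] = {
    /* L     R     C     SL    SR    LFE */
    { 1.0f, 0.0f, 0.5f, 0.5f, 0.0f, 0.0f },  /* output left  */
    { 0.0f, 1.0f, 0.5f, 0.0f, 0.5f, 0.0f },  /* output right */
};

/* Applies the matrix to one sample frame: out[o] = sum over i of m[o][i] * in[i]. */
void apply_remix(const float matrix[OUT_CH][IN_CH],
                 const float in[IN_CH], float out[OUT_CH])
{
    assert(matrix != NULL && in != NULL && out != NULL);
    for (size_t o = 0; o < OUT_CH; ++o) {
        float acc = 0.0f;
        for (size_t i = 0; i < IN_CH; ++i) {
            acc += matrix[o][i] * in[i];
        }
        out[o] = acc;
    }
}
```

An upmix works the same way with more output rows than input columns; an identity matrix leaves the mix unchanged.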
At this stage, the audio is formatted to be sent out to the device. This involves interleaving audio samples to be picked up by the audio DMA engines.
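Interleaving converts one buffer per channel into a single buffer of frames, where each frame holds one sample from every channel in turn. The following sketch shows the idea; the 16-bit sample format, the channel ordering, and the function name `interleave` are assumptions, since the actual output format is not described in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Converts planar buffers (one array per channel) into a single interleaved
   buffer: ch0[0], ch1[0], ..., ch0[1], ch1[1], ...
   `out` must hold num_channels * num_samples samples. */
void interleave(const int16_t *const *planar, size_t num_channels,
                size_t num_samples, int16_t *out)
{
    assert(planar != NULL && out != NULL);
    for (size_t i = 0; i < num_samples; ++i) {
        for (size_t ch = 0; ch < num_channels; ++ch) {
            out[i * num_channels + ch] = planar[ch][i];
        }
    }
}
```

A consumer that reads the buffer sequentially, such as a DMA engine, then receives all channels for a given point in time together.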
SURROUND, the Wii U tells the connected HDMI audio device to switch to LPCM 5.1ch mode via an HDMI info packet. The amplifier then changes its audio mode to LPCM 5.1ch and outputs only 5.1 channels of sound.
The Cafe audio system uses a dedicated DSP for most sound rendering. However, due to the high number of channels, buses, and audio output devices, a software renderer running on the application CPU supplements the DSP. The software renderer ("PPC renderer") is functionally identical to the DSP renderer.
As of this writing, voices are rendered on the DSP by default. Extra voices may be explicitly
directed to the PPC software renderer via
AXSetVoiceRenderer. The default renderer may also be revised by calling
2013/09/03 Update to reflect Sound2 changes.
2013/05/08 Automated cleanup pass.
2013/05/06 Use real sections under See Also.
2012/08/01 Cleanup Pass.
2011/02/21 Initial version.