AX Rendering Model


This document describes AX's audio renderer at a conceptual level. For information about how the renderer works in practice, see the AX Programming Model reference.

Rendering sound consists of the following steps, each of which is explained in detail in later sections:

  1. Voice Generation.
  2. Mixing.
  3. Output Formatting.

Channels, Buses, and Output Devices

An audio channel is a buffer of continuous sound data for a particular speaker of a given output device. The number of available channels varies for each output device. For example, the TV output device supports 6 channels: left, right, center, surround-left, surround-right, and low-frequency effects (LFE).

Multiple channels make up an audio bus. For example, the TV output device supports a main bus and three auxiliary buses (AUX A, AUX B, and AUX C). Note that the number of supported buses varies for each output device.

After all effects have been applied to a device's AUX buses, AX mixes the corresponding channels of all of that device's buses together to create the device's output. For example, the left channel of each of the TV's buses is mixed into a final output buffer for the TV's left speaker channel.
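
The per-device bus mixdown can be sketched in C as below. This is a minimal illustration, not AX's implementation: the function name mix_buses_to_device, the float sample format, and the flat [bus][channel][sample] buffer layout are all hypothetical, and effects are assumed to have already been applied to the AUX buses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical planar layout (not AX's actual buffer format):
   bus_buf holds num_buses * num_channels * frame_len floats, ordered
   [bus][channel][sample]; out holds num_channels * frame_len floats. */
void mix_buses_to_device(const float *bus_buf, size_t num_buses,
                         size_t num_channels, size_t frame_len, float *out)
{
    assert(bus_buf != NULL && out != NULL);
    for (size_t c = 0; c < num_channels; c++) {
        for (size_t s = 0; s < frame_len; s++) {
            float acc = 0.0f;
            /* Channel c of the device output is the sum of channel c of every bus. */
            for (size_t b = 0; b < num_buses; b++)
                acc += bus_buf[(b * num_channels + c) * frame_len + s];
            out[c * frame_len + s] = acc;
        }
    }
}
```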

Device  Buses                        Channels
TV      Main, AUX A, AUX B, AUX C    6: Left, right, center, surround-left, surround-right, low-frequency effects (LFE)
DRC     Main, AUX A, AUX B, AUX C    4: Left, right, surround-left, surround-right
RMT     Main                         1: Mono
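
For code that sizes mix buffers or estimates mixing work, the table above could be captured as data. The sketch below is hypothetical: DeviceLayout, kDeviceLayouts, and device_mix_targets are illustrative names, not AX types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C representation of the device table; not AX types. */
typedef struct {
    const char *name;
    int num_buses;     /* Main plus any AUX buses */
    int num_channels;  /* channels per bus */
} DeviceLayout;

static const DeviceLayout kDeviceLayouts[] = {
    { "TV",  4, 6 },  /* Main, AUX A-C; L, R, C, SL, SR, LFE */
    { "DRC", 4, 4 },  /* Main, AUX A-C; L, R, SL, SR */
    { "RMT", 1, 1 },  /* Main; mono */
};

/* Number of bus channels a device provides (buses x channels).
   Returns -1 for an unknown device name. */
int device_mix_targets(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof kDeviceLayouts / sizeof kDeviceLayouts[0]; i++) {
        if (strcmp(kDeviceLayouts[i].name, name) == 0)
            return kDeviceLayouts[i].num_buses * kDeviceLayouts[i].num_channels;
    }
    return -1;
}
```

For example, the TV layout gives 4 x 6 = 24 bus channels that a single voice could be mixed into.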

Voice Generation

AX is a voice-based sound system. Applications acquire and configure a voice to play a particular sound. The voice generation process that occurs for each voice is summarized below.

In summary, the steps are as follows:

  1. Sound Sample Decoding
     ADPCM, PCM8, and PCM16 sound data are decoded into the renderer's native PCM16 processing format.

  2. Sample Rate Converter
     Samples are read at the rate described by the voice's SRC values. After this step, audio samples are at the sampling rate selected for the renderer: 32 kHz or 48 kHz.

  3. Volume Envelope
     The input volume for the sound is applied here. Volume changes are ramped to minimize pop, click, or "zipping" artifacts.

  4. Low Pass Filter
     For information, see Filtering.

  5. Biquad Filter
     For information, see Filtering.

  6. Remote Controller Audio Filtering
     The Remote Controller output devices have an additional, optional filtering step.
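
The first three steps above can be sketched in C as follows. This is an illustration under stated assumptions, not AX's implementation: only PCM8 decoding is shown (ADPCM is omitted), the resampler uses plain linear interpolation with a hypothetical 16.16 fixed-point step, the volume ramp uses float gains, and the function names decode_pcm8, src_linear, and apply_volume_ramp are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1 (sketch): widen signed 8-bit PCM to the 16-bit processing format. */
void decode_pcm8(const int8_t *in, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(in[i] * 256);  /* equivalent to shifting left by 8 bits */
}

/* Step 2 (sketch): linear-interpolation resampler. step is the input advance
   per output sample in 16.16 fixed point (0x10000 = same rate). A real SRC
   would keep history across frames. Returns the number of samples written. */
size_t src_linear(const int16_t *in, size_t n_in, uint32_t step,
                  int16_t *out, size_t max_out)
{
    assert(step > 0);
    uint64_t pos = 0;  /* read position in 16.16 fixed point */
    size_t n_out = 0;
    while (n_out < max_out && (size_t)(pos >> 16) + 1 < n_in) {
        size_t i = (size_t)(pos >> 16);
        int64_t frac = (int64_t)(pos & 0xFFFFu);
        int64_t a = in[i];
        int64_t b = in[i + 1];
        out[n_out++] = (int16_t)(a + (b - a) * frac / 65536);
        pos += step;
    }
    return n_out;
}

/* Step 3 (sketch): move the gain linearly from vol_start to vol_end across
   the frame instead of jumping, which is what avoids clicks and "zipping". */
void apply_volume_ramp(int16_t *buf, size_t n, float vol_start, float vol_end)
{
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        /* The gain reaches vol_end exactly on the last sample of the frame. */
        float g = vol_start + (vol_end - vol_start) * (float)(i + 1) / (float)n;
        float v = (float)buf[i] * g;
        if (v > 32767.0f) v = 32767.0f;
        if (v < -32768.0f) v = -32768.0f;
        buf[i] = (int16_t)v;
    }
}
```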


The mixing step consists of the combination the output of each generated voice to the channels and buses of each output device at a volume according to the voice's mixing parameters.

The channels and buses of each output device are organized as described in Channels, Buses, and Output Devices, and the bus-level mixing topology for a given output device is shown in the Bus-Level Mixing figure.

Note that the number of available buses and channels varies per output device.

The output level of a given voice to any channel is configured with AXSetVoiceDeviceMix.
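
Conceptually, the per-channel output levels act as one gain per bus channel of a device. The sketch below shows that idea only; it is not the AXSetVoiceDeviceMix API, and the function name mix_voice_into_device, the float samples, and the flat gain and buffer layouts are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-voice mix for one device: gain[bus * num_channels + ch]
   is the level for that bus channel. bus_buf uses [bus][channel][sample]. */
void mix_voice_into_device(const float *voice, size_t frame_len,
                           const float *gain, size_t num_buses,
                           size_t num_channels, float *bus_buf)
{
    assert(voice != NULL && gain != NULL && bus_buf != NULL);
    for (size_t b = 0; b < num_buses; b++) {
        for (size_t c = 0; c < num_channels; c++) {
            float g = gain[b * num_channels + c];
            if (g == 0.0f)
                continue;  /* silent target: no work for this bus channel */
            float *dst = bus_buf + (b * num_channels + c) * frame_len;
            for (size_t s = 0; s < frame_len; s++)
                dst[s] += voice[s] * g;  /* accumulate on top of other voices */
        }
    }
}
```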

Unified Matrix Mixer

A single voice may be mixed to multiple output devices at the same time. That is, voices need not be dedicated to a particular output device.

While this is convenient for cross-fading a sound between outputs, the computation cost of a voice increases in proportion to the number of devices and channels to which it is mixed.
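
Both points can be sketched together: a voice cross-faded from the TV to the DRC, plus a rough per-frame cost count. The struct, the function names, the use of only the Main bus left and right channels, and the cost model (one multiply-add per sample for each non-zero gain) are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gains for one voice on the Main bus of two devices. */
typedef struct {
    float tv[2];   /* TV Main bus: left, right */
    float drc[2];  /* DRC Main bus: left, right */
} CrossfadeGains;

/* Linear cross-fade: t = 0 plays only on the TV, t = 1 only on the DRC. */
CrossfadeGains crossfade_tv_to_drc(float t)
{
    assert(t >= 0.0f && t <= 1.0f);
    CrossfadeGains g = { { 1.0f - t, 1.0f - t }, { t, t } };
    return g;
}

/* Assumed cost model: one multiply-add per sample for every non-zero gain. */
size_t mix_ops_per_frame(const CrossfadeGains *g, size_t frame_len)
{
    size_t active = 0;
    for (int c = 0; c < 2; c++) {
        active += (g->tv[c] != 0.0f);
        active += (g->drc[c] != 0.0f);
    }
    return active * frame_len;
}
```

Midway through the fade the voice feeds both devices, so under this model its mixing cost doubles. A linear fade keeps the sketch simple; an equal-power curve is a common alternative.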

Output Formatting Stage

After the final mix has been created, additional processing occurs before the audio is transmitted to the TV and DRC output devices. The processing stages are described below.

Upsampling

In this stage, AX converts the final mix output to 48 kHz for output to the devices. If the mixing rate is already set to 48 kHz, which is supported as of Sound2, no upsampling is performed.
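
The 32 kHz to 48 kHz conversion is a 3:2 rate change. The sketch below uses linear interpolation purely to show the index arithmetic; AX's actual upsampling filter is not documented here, and a production resampler would use a proper low-pass interpolation filter and carry history across frames. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of a 32 kHz -> 48 kHz upsampler (ratio 3:2) by linear interpolation.
   Writes n_in * 3 / 2 samples; n_in must be even. */
size_t upsample_32k_to_48k(const float *in, size_t n_in, float *out)
{
    assert(n_in % 2 == 0);
    size_t n_out = n_in * 3 / 2;
    for (size_t j = 0; j < n_out; j++) {
        size_t num = 2 * j;  /* input position is num / 3 */
        size_t i = num / 3;
        float frac = (float)(num % 3) / 3.0f;
        /* Hold the last sample at the frame edge (no look-ahead). */
        float next = (i + 1 < n_in) ? in[i + 1] : in[i];
        out[j] = in[i] + (next - in[i]) * frac;
    }
    return n_out;
}
```

If frames are 3 ms long (an assumption here), a 96-sample frame at 32 kHz becomes 144 samples at 48 kHz.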

By default, upsampling occurs before the final mix callback. AXSetDeviceUpsampleStage can be used to make it occur immediately after the final mix callback instead.
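
Because this setting changes the sample rate the final mix callback receives, a callback whose processing depends on the rate (for example, a filter with rate-dependent coefficients) needs to know which setting is in effect. The sketch below models that; the enum and function names are hypothetical and are not the values accepted by AXSetDeviceUpsampleStage, and the 3 ms frame length is an assumption.

```c
#include <assert.h>

/* Hypothetical names for the two stage settings. */
typedef enum {
    STAGE_UPSAMPLE_BEFORE_CALLBACK,  /* default: callback sees 48 kHz data */
    STAGE_UPSAMPLE_AFTER_CALLBACK    /* callback sees mixing-rate data */
} UpsampleStage;

/* Samples per channel a final mix callback would receive per frame,
   assuming 3 ms frames and 48 kHz device output. */
unsigned callback_frame_len(UpsampleStage stage, unsigned mix_rate_hz)
{
    assert(mix_rate_hz == 32000u || mix_rate_hz == 48000u);
    unsigned rate = (stage == STAGE_UPSAMPLE_BEFORE_CALLBACK) ? 48000u : mix_rate_hz;
    return rate * 3u / 1000u;  /* 3 ms worth of samples */
}
```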

Final Mix Callback

After all channels and buses have been mixed for a given output device, AX invokes an optional final mix callback function. Applications can use it to modify the final mix, or add audio data to it, every frame for each output device.

Possible uses for this facility include applying virtual surround or other spatialization algorithms, or mixing audio from other sources (such as a proprietary audio codec).

Note that only the TV and DRC devices support the final mix callback. For more information about playing Wii Remote audio with AX, see the Programming Model's Wii Remotes section.
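
Mixing audio from another source (such as a proprietary codec) inside a final mix callback could look roughly like the body below. The real AX callback signature and final mix buffer format are not shown in this document; the function names, the PCM16 sample format, and the interleaved stereo input are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit sum back into the 16-bit range. */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical callback body: adds interleaved stereo PCM16 from another
   source into a device's left/right final mix channels, with saturation. */
void add_external_stereo(int16_t *left, int16_t *right,
                         const int16_t *ext_interleaved, size_t n)
{
    assert(left != NULL && right != NULL && ext_interleaved != NULL);
    for (size_t s = 0; s < n; s++) {
        left[s]  = sat16((int32_t)left[s]  + ext_interleaved[2 * s]);
        right[s] = sat16((int32_t)right[s] + ext_interleaved[2 * s + 1]);
    }
}
```

Summing in 32 bits before clamping prevents wraparound, which would otherwise turn loud passages into harsh noise.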

Volume Compressor

AX implements a volume compressor to help reduce the dynamic range of the audio. Applications can turn the compressor on or off; it is off by default. It is a basic compressor: if it detects that the output has exceeded the threshold, it applies a ramped attenuation to bring the output back below the threshold.

Because compression is a non-linear operation, turning on the compressor might result in distortion. Applications can experiment with their own threshold levels and compressor settings.

For more information on the compressor functions, see AXSetDeviceCompressor, AXMakeCompressorTable, and AXSetDeviceCompressorTable.
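
The threshold-and-ramp behavior described above can be sketched as a per-frame peak compressor. This is not AX's implementation, and it does not use the compressor table functions: the struct, the function name, the float samples, and the choice to ramp back toward unity gain on quiet frames are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a basic peak compressor. */
typedef struct {
    float threshold;  /* linear peak level, e.g. 0.5 */
    float gain;       /* gain reached at the end of the previous frame */
} BasicCompressor;

/* Processes one frame in place. If the frame's peak exceeds the threshold,
   the gain ramps linearly toward threshold / peak; otherwise it ramps back
   toward 1.0. Ramping avoids the clicks an instant gain change would cause. */
void compress_frame(BasicCompressor *c, float *buf, size_t n)
{
    assert(c != NULL && buf != NULL && n > 0);
    float peak = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float a = buf[i] < 0.0f ? -buf[i] : buf[i];
        if (a > peak) peak = a;
    }
    float target = (peak > c->threshold) ? c->threshold / peak : 1.0f;
    float start = c->gain;
    for (size_t i = 0; i < n; i++) {
        float g = start + (target - start) * (float)(i + 1) / (float)n;
        buf[i] *= g;
    }
    c->gain = target;
}
```

Because the gain is ramped, the first samples of a loud frame can still exceed the threshold briefly; the gain change itself is the non-linear step that can cause distortion.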

Channel Remix

AX natively mixes audio using the maximum number of channels supported by each output device. However, the number of channels actually selected by the user can vary for some devices (such as the TV).

AX provides a remix matrix that applications can set to handle any discrepancy between the number of channels being mixed and the number of channels available for output. For example, if the user sets the TV mode to STEREO, the application can still render 6 channels of audio and downmix them to two channels using the remix matrix. As of this writing, AX does not automatically downmix or upmix; the application is responsible for providing such a matrix if it needs a remix applied. For further details, see AXSetDeviceRemixMatrix and AXGetDeviceRemixMatrix.
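
A remix matrix maps input channels to output channels: each output sample is a weighted sum of the input channels. The sketch below applies a 6-to-2 downmix to one sample frame. The channel order follows the order listed in this document (not necessarily AX's buffer order), the coefficients are a common -3 dB choice rather than an AX default, and the names are hypothetical; it does not use the AXSetDeviceRemixMatrix format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical channel order: L, R, C, SL, SR, LFE. */
enum { CH_L, CH_R, CH_C, CH_SL, CH_SR, CH_LFE, NUM_IN_CH };

/* Example 6-to-2 downmix: center and surrounds at about -3 dB, LFE dropped. */
static const float kStereoDownmix[2][NUM_IN_CH] = {
    /*  L     R     C       SL      SR      LFE */
    { 1.0f, 0.0f, 0.707f, 0.707f, 0.0f,   0.0f },  /* output left  */
    { 0.0f, 1.0f, 0.707f, 0.0f,   0.707f, 0.0f },  /* output right */
};

/* out[o] = sum over i of kStereoDownmix[o][i] * in[i], for one sample frame. */
void remix_sample(const float in[NUM_IN_CH], float out[2])
{
    assert(in != NULL && out != NULL);
    for (int o = 0; o < 2; o++) {
        float acc = 0.0f;
        for (int i = 0; i < NUM_IN_CH; i++)
            acc += kStereoDownmix[o][i] * in[i];
        out[o] = acc;
    }
}
```

With full-scale left and center input, the left output reaches about 1.707, so an application might scale the matrix down to avoid clipping.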

Note that AX does not allow the application to change the system's sound mode; this parameter is configured by the user in the system settings. Developers can change this parameter with the System Config Tool.

Hardware Output Formatting

At this stage, the audio is formatted to be sent out to the device. This involves interleaving audio samples to be picked up by the audio DMA engines.
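
Interleaving turns per-channel (planar) buffers into frame-ordered samples, such as L R L R for stereo. The sketch below shows only that reordering; the real DMA buffer format (sample width, channel order, alignment) is hardware-specific and not described here, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave planar PCM16 (stored flat as [channel][sample]) into
   frame-ordered output: sample 0 of every channel, then sample 1, ... */
void interleave_pcm16(const int16_t *planar, size_t num_channels,
                      size_t frame_len, int16_t *out)
{
    assert(planar != NULL && out != NULL);
    for (size_t s = 0; s < frame_len; s++)
        for (size_t c = 0; c < num_channels; c++)
            out[s * num_channels + c] = planar[c * frame_len + s];
}
```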

Wii U supports only 5.1 linear PCM (LPCM) on the HDMI channel. When audio output is set to SURROUND, Wii U tells the connected HDMI audio device to switch to LPCM 5.1ch mode via an HDMI info packet. The amplifier then changes its audio mode to LPCM 5.1ch and outputs only 5.1 channels of sound.

Rendering Processors

The Cafe audio system uses a dedicated DSP for most sound rendering. However, because of the large number of channels, buses, and audio output devices, a software renderer running on the application CPU supplements the DSP. The software renderer ("PPC renderer") is functionally identical to the DSP renderer.

As of this writing, voices are rendered on the DSP by default. Extra voices may be explicitly directed to the PPC software renderer with AXSetVoiceRenderer, and the default renderer can be changed by calling AXSetDefaultRenderer.
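
One way an application might decide which voices are "extra" is a simple budget: keep voices on the DSP until an assumed per-frame budget is used up, then direct the rest to the PPC renderer. This is a hypothetical application-side policy, not an AX API; the cost units and budget are made up, and in practice the assignment itself would be made with AXSetVoiceRenderer.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { RENDERER_DSP, RENDERER_PPC } Renderer;

/* Greedy assignment in voice order. Returns how many voices go to the PPC. */
size_t assign_renderers(const unsigned *voice_cost, size_t num_voices,
                        unsigned dsp_budget, Renderer *out)
{
    assert(voice_cost != NULL && out != NULL);
    unsigned used = 0;
    size_t on_ppc = 0;
    for (size_t v = 0; v < num_voices; v++) {
        if (used + voice_cost[v] <= dsp_budget) {
            used += voice_cost[v];
            out[v] = RENDERER_DSP;
        } else {
            out[v] = RENDERER_PPC;  /* does not fit: overflow to software */
            on_ppc++;
        }
    }
    return on_ppc;
}
```

Later, cheaper voices can still fit on the DSP after an expensive one has overflowed, since every voice is checked against the remaining budget.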

See Also

Cafe Audio System Overview
Cafe Audio Programming Model

[Figure: Voice Generation]
[Figure: Bus-Level Mixing]
[Figure: Upsampling Stage]
[Figure: Final Mixing Callback]
[Figure: Sound Mode]

Revision History

2013/10/30 Copy-edits.
2013/09/03 Update to reflect Sound2 changes.
2013/05/08 Automated cleanup pass.
2013/05/06 Use real sections under See Also.
2012/08/01 Cleanup Pass.
2011/02/21 Initial version.