AX Overview


AX is the core API for playing audio on Cafe. This overview describes the audio features of the AX library. For an overview of the process that AX performs to render sounds, see the Rendering Model reference. For information about programming applications to use AX, see the Programming Model reference.

AX System Parameters

The following AX settings relate to all voices.

Device Mode

The audio device mode is set by users in their Cafe system's A/V settings. The audio mode of the TV may be set to mono, stereo, or 6 channel surround. The DRC may have its audio mode set to mono, stereo, or 4 channel surround. Wii Remotes only support monaural audio. This setting indicates how many channels of audio should be output for each device and cannot be changed by applications.

For more information, see the Output Mode reference.

Device Volume

AX provides a master device volume for each supported device. This setting affects the volume of all audio output for the specified device.

Default Voice Renderer

AX has two renderers: DSP and PPC. When a voice is rendered on DSP, its voice generation processing is offloaded to the DSP hardware. When a voice is rendered on PPC, all voice generation processing occurs on the CPU. The two renderers are functionally identical. The default renderer setting is applied to each newly acquired voice, but individual voices may have their renderer set to something other than the default. Upon initialization, the default renderer setting is configured to have voices prefer the DSP but allow either renderer (as assigned by the load balancer, discussed below).

Load Balancer

AX provides an automatic load balancer. This mechanism first predicts the time required to perform voice generation for each voice. Based on that estimate, it assigns generation of the voice to either the DSP or the PPC renderer. The load balancer loads the DSP renderer first if possible. If there is too much work for the DSP to do in an audio frame, some of it can be spilled over to the PPC renderer. If the predicted load of rendering all voices exceeds what AX can fit in a single audio frame, voices are forcibly dropped until the load is manageable. This prevents artifacts that are caused by an audio frame taking too long. The load balancing parameters are configurable, allowing applications to decrease the maximum load put on the DSP or PPC.

For more information, see the Load Balance Overview.
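
The assignment order described above can be sketched as a simple greedy pass. This is only an illustrative model, not the AX implementation: the VoiceLoad structure, the cost units, and the budget parameters are all hypothetical names introduced for this example, and the voices are assumed to be sorted by descending priority before the call.

```c
#include <stddef.h>

/* Hypothetical model types for this sketch; these are not AX structures. */
typedef enum { RENDER_NONE, RENDER_DSP, RENDER_PPC } Renderer;

typedef struct {
    int      priority;  /* higher value = more important */
    unsigned cost;      /* predicted generation time, arbitrary units */
    Renderer assigned;  /* filled in by balance_voices() */
} VoiceLoad;

/* Fill the DSP budget first, spill to the PPC budget, and drop any voice
 * that fits in neither. voices[] must be sorted by descending priority.
 * Returns the number of dropped voices. */
size_t balance_voices(VoiceLoad *voices, size_t count,
                      unsigned dsp_budget, unsigned ppc_budget)
{
    unsigned dsp_used = 0, ppc_used = 0;
    size_t dropped = 0;

    for (size_t i = 0; i < count; ++i) {
        if (dsp_used + voices[i].cost <= dsp_budget) {
            voices[i].assigned = RENDER_DSP;
            dsp_used += voices[i].cost;
        } else if (ppc_used + voices[i].cost <= ppc_budget) {
            voices[i].assigned = RENDER_PPC;
            ppc_used += voices[i].cost;
        } else {
            voices[i].assigned = RENDER_NONE;  /* dropped for this frame */
            ++dropped;
        }
    }
    return dropped;
}
```

In this sketch, lowering dsp_budget or ppc_budget plays the role of the configurable maximum load: a smaller budget pushes more voices to the other renderer or causes more of the low-priority voices to be dropped.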

AUX Buses

AX supports three auxiliary (AUX) buses per device for the TV and DRC. AX allows one callback to be set per AUX bus. Once per audio frame, each AUX bus's callback is invoked. At this time, one audio frame's worth of samples has been generated. Any such samples that have been mixed to an AUX buffer are available to that buffer's callback. This buffer of samples may be modified within the callback to apply special effects such as delay, reverb, or chorus.

The SDK provides a premade set of AUX effects in the AXFX library. For more information about these effects, see the AXFX Overview.
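
As an illustration of modifying a buffer in place, the following sketch applies a very short echo to one frame of samples. The callback signature, the DelayState structure, and the DELAY_SAMPLES constant are assumptions made for this example; the actual AUX callback prototype is defined by the SDK and may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define DELAY_SAMPLES 4  /* unrealistically short, for illustration only */

/* Hypothetical per-bus state that the application passes as context. */
typedef struct {
    int32_t history[DELAY_SAMPLES];  /* circular buffer of past input */
    size_t  pos;                     /* next slot to read and overwrite */
} DelayState;

/* Hypothetical callback shape: one channel of samples, modified in place. */
void aux_delay_callback(int32_t *samples, size_t count, void *context)
{
    DelayState *st = (DelayState *)context;

    for (size_t i = 0; i < count; ++i) {
        int32_t dry     = samples[i];
        int32_t delayed = st->history[st->pos];

        st->history[st->pos] = dry;               /* remember the input */
        st->pos = (st->pos + 1) % DELAY_SAMPLES;

        samples[i] = dry + delayed / 2;           /* add echo at half volume */
    }
}
```

Because the state lives in the context structure rather than in the callback, the echo carries across audio frames as long as the same DelayState is supplied on every invocation.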

DRC Virtual Surround Sound

DRC Virtual Surround is an optional effect that enables two surround sound channels for the DRC in addition to the standard left and right channels. Sound rendered to the surround channels will have a virtual surround effect applied that makes it appear to originate from behind the user, despite the DRC only having two speakers.

DRC virtual surround sound is enabled by default.

DRC virtual surround processing requires a significant amount of CPU time while enabled (approximately 2% of each audio frame). In applications that do not make use of this effect, it is preferable to disable it.

For more information, see the DRC Virtual Surround Overview.


Compressor

AX provides an optional compressor for the audio data that it renders. When the audio exceeds a specific threshold, the compressor temporarily attenuates the output signal to prevent clipping artifacts.

The compressor is off by default.
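
The general idea of threshold-based attenuation can be sketched as follows. This is a generic limiter written for illustration, not the algorithm AX uses; the Compressor structure and its threshold, release, and gain fields are hypothetical.

```c
#include <stddef.h>

/* Hypothetical limiter state for this sketch; not an AX structure. */
typedef struct {
    float threshold;  /* largest allowed output magnitude */
    float release;    /* gain recovered per sample once below threshold */
    float gain;       /* current gain, 1.0f means no attenuation */
} Compressor;

void compress(Compressor *c, float *samples, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        float mag = samples[i] < 0.0f ? -samples[i] : samples[i];

        /* Let the gain recover toward unity... */
        float g = c->gain + c->release;
        if (g > 1.0f)
            g = 1.0f;

        /* ...but pull it back down if this sample would still exceed
         * the threshold. */
        if (mag * g > c->threshold)
            g = c->threshold / mag;

        c->gain = g;
        samples[i] *= g;
    }
}
```

The attenuation is temporary because the gain climbs back to 1.0 by the release step on every sample that does not exceed the threshold.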

Remix Matrix

In some situations, it may be desirable to render audio to a number of channels that does not match the number of channels that the user has configured their Cafe system to output. This scenario is addressed by the AX Remix Matrix. This matrix maps the rendered audio of each channel to output audio channels.

For example, it is possible to render 6 channel audio for the TV at all times and use the remix matrix to downmix to 2 channels when a user has configured their system for stereo output. Alternatively, if an application does not support surround sound, the remix matrix may be used to upmix 2 channel audio into 6 channels.
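
A remix of this kind amounts to a matrix multiplication applied to each sample frame. The sketch below shows that operation for a single frame; the function name, the row-major matrix layout, and the channel ordering used in the usage note are assumptions for illustration and are not taken from the AX API.

```c
#include <stddef.h>

/* Multiply one frame of in_ch rendered channels by a remix matrix to get
 * out_ch output channels. matrix is row-major: matrix[o * in_ch + i] is
 * the gain from rendered channel i to output channel o. */
void remix_frame(const float *matrix, size_t out_ch, size_t in_ch,
                 const float *in, float *out)
{
    for (size_t o = 0; o < out_ch; ++o) {
        float sum = 0.0f;
        for (size_t i = 0; i < in_ch; ++i)
            sum += matrix[o * in_ch + i] * in[i];
        out[o] = sum;
    }
}
```

For a 6 to 2 downmix with a hypothetical channel order of front left, front right, center, LFE, surround left, surround right, a matrix such as {1, 0, 0.5, 0, 0.5, 0} for the left output and {0, 1, 0.5, 0, 0, 0.5} for the right output folds the center and surround channels into stereo. The coefficients here are example values only.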

Upsampler Mode

For Sound1, AX renders its voices at a sampling frequency of 32 kHz. The Cafe audio hardware only supports output to the TV and DRC at 48 kHz. To bridge this gap, an upsampler is built into AX that converts from 32 kHz to 48 kHz. This upsampler has two modes: Linear Interpolation and Polyphase FIR. The linear upsampler requires less computation, while the polyphase upsampler produces higher quality output.

By default, the polyphase upsampler is selected.

Starting from Sound2, a 48 kHz renderer is supported. The upsampler is not needed when the renderer is running at 48 kHz.
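
Converting 32 kHz to 48 kHz means producing three output samples for every two input samples. The following sketch shows how a linear interpolation mode could do that with integer arithmetic. It is an illustration of the technique only, not AX's upsampler; the function name is hypothetical and samples are assumed to stay within the 16-bit range so that the intermediate products fit in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>

/* Linear 2:3 upsampling (for example 32 kHz to 48 kHz).
 * out must have room for in_count * 3 / 2 samples.
 * Returns the number of samples written. */
size_t upsample_2_to_3_linear(const int32_t *in, size_t in_count,
                              int32_t *out)
{
    size_t out_count = in_count * 3 / 2;

    for (size_t n = 0; n < out_count; ++n) {
        /* Output sample n sits at input position (2 * n) / 3. */
        size_t  idx = (2 * n) / 3;
        int32_t rem = (int32_t)((2 * n) % 3);  /* fraction in thirds */

        int32_t a = in[idx];
        /* Hold the last sample when there is no right-hand neighbor. */
        int32_t b = (idx + 1 < in_count) ? in[idx + 1] : a;

        out[n] = (a * (3 - rem) + b * rem) / 3;
    }
    return out_count;
}
```

A polyphase FIR mode would replace the two-point weighted average with a longer filter evaluated at the same three fractional phases, which is where the extra computation and the higher quality come from.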

Voice Parameters

To play sounds, applications must acquire voices, which may be individually configured with the following voice parameters.


Priority

When acquiring a voice, a priority value must be provided. If the maximum number of voices has already been acquired, AX will look for the voice with the lowest priority. If that old voice's priority is lower than the priority requested for the new voice, the old voice will be forcibly freed to make room for the new one.

The priority of a voice is also used by the load balancer. Voices with higher priority are assigned a renderer first. If the maximum load is reached, voices with lower priority will be dropped.
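
The acquisition rule can be modeled with a small voice pool. This sketch is not the AX voice allocator; VoiceSlot and acquire_voice are hypothetical names, and a real application would also need to be notified when one of its voices is forcibly freed.

```c
#include <stddef.h>

/* Hypothetical pool entry for this sketch; not an AX structure. */
typedef struct {
    int in_use;
    int priority;  /* higher value = more important */
} VoiceSlot;

/* Returns the index of the acquired slot, or -1 if the request fails. */
int acquire_voice(VoiceSlot *pool, size_t count, int priority)
{
    size_t lowest = 0;

    if (count == 0)
        return -1;

    for (size_t i = 0; i < count; ++i) {
        if (!pool[i].in_use) {            /* a free voice is available */
            pool[i].in_use   = 1;
            pool[i].priority = priority;
            return (int)i;
        }
        if (pool[i].priority < pool[lowest].priority)
            lowest = i;                   /* track the weakest voice */
    }

    /* Pool is full: steal only from a strictly lower priority voice. */
    if (pool[lowest].priority < priority) {
        pool[lowest].priority = priority;
        return (int)lowest;
    }
    return -1;
}
```

Note that a request whose priority only equals the lowest priority in a full pool fails in this model, which matches the "lower than" wording above.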


Address

The address parameters of a voice control the memory location from which the voice's sample data is read. The address parameters include the sample buffer location and offsets for the current playback position, buffer end, and loop point.


State

A voice's state controls whether the voice is playing or is stopped. If a non-looping voice reaches the end of its sample buffer, it is automatically stopped.
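
The interaction between the playback position, the end and loop offsets, and the voice state can be sketched as a small cursor model. The VoiceCursor structure and its field names are hypothetical, and treating the end offset as the index of the last valid sample is an assumption made for this example.

```c
#include <stddef.h>

/* Hypothetical playback cursor for this sketch; not an AX structure. */
typedef struct {
    size_t current;     /* current playback position, in samples */
    size_t loop_start;  /* position to jump to when looping */
    size_t end;         /* last valid sample (assumed inclusive) */
    int    looping;     /* nonzero if the voice loops */
    int    playing;     /* nonzero while the voice is running */
} VoiceCursor;

/* Advance the cursor by one sample after the current sample is played. */
void advance_cursor(VoiceCursor *v)
{
    if (!v->playing)
        return;

    if (v->current >= v->end) {
        if (v->looping)
            v->current = v->loop_start;  /* wrap to the loop point */
        else
            v->playing = 0;              /* end of buffer: stop */
    } else {
        ++v->current;
    }
}
```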


ADPCM

The ADPCM parameters of a voice configure it for playback of DSP-ADPCM format audio. These parameters must be set according to the header of a DSP-ADPCM file. When playing PCM data, it is not necessary to set these parameters.

For more information about the DSP-ADPCM format, see the DSPADPCM Tool reference.

Voice Type

The type of a voice may be either normal or streamed. This parameter is only used if the voice is playing DSP-ADPCM data. When a normal DSP-ADPCM format voice loops, a couple of decompression parameters (yn1, yn2) are updated to ensure correct playback. When streaming, voices may use AX's looping mechanism to jump between sample buffers as audio is played. In the streaming case, the decompression parameters should not be updated when "looping" to another buffer. Setting a voice's type to streamed tells AX not to update these parameters when the voice's loop point is reached. This allows DSP-ADPCM voices to use AX's looping mechanism for streaming.

Volume Envelope

The volume envelope of a voice sets the input volume of an AX voice. This volume modifier applies before any other effects are applied. The volume envelope also allows control over volume ramping within a single audio frame to avoid zipping and popping artifacts.
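
Ramping the volume across the samples of a frame, rather than switching it at the frame boundary, is what avoids the audible step. A minimal sketch of a linear ramp follows; the function name and the use of floating-point volumes are assumptions for illustration and do not reflect how AX stores its envelope.

```c
#include <stddef.h>

/* Scale one frame of samples while moving the volume linearly from
 * start to target, so that the last sample is played at target. */
void apply_volume_ramp(float *samples, size_t count,
                       float start, float target)
{
    if (count == 0)
        return;

    float delta = (target - start) / (float)count;  /* step per sample */

    for (size_t i = 0; i < count; ++i) {
        float volume = start + delta * (float)(i + 1);
        samples[i] *= volume;
    }
}
```

Calling this with start equal to target leaves the volume constant for the frame, which is the steady-state case.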

Low Pass Filter

The low pass filter, or LPF, is a built-in filter that may be enabled for any number of individual voices. It suppresses the volume of the higher frequencies in a sound.

For more information, see the Filtering reference.

Biquad Filter

The biquad filter is a built-in filter that may be enabled for any number of individual voices. It suppresses different frequencies based on how it is configured. It is possible to configure a biquad filter to act as a low pass, high pass, or band pass filter.

For more information, see the Filtering reference.
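
A biquad filter computes each output sample from the current input, the two previous inputs, and the two previous outputs. The sketch below uses the common textbook (direct form I) convention with the feedback terms subtracted; the coefficient naming and sign convention that AX expects may differ, so the Biquad structure here is an assumption for illustration.

```c
#include <stddef.h>

/* Textbook direct form I biquad; not an AX structure. */
typedef struct {
    float b0, b1, b2;  /* feed-forward coefficients */
    float a1, a2;      /* feedback coefficients (a0 normalized to 1) */
    float x1, x2;      /* previous two inputs */
    float y1, y2;      /* previous two outputs */
} Biquad;

void biquad_process(Biquad *f, float *samples, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        float x = samples[i];
        float y = f->b0 * x + f->b1 * f->x1 + f->b2 * f->x2
                - f->a1 * f->y1 - f->a2 * f->y2;

        /* Shift the history for the next sample. */
        f->x2 = f->x1;
        f->x1 = x;
        f->y2 = f->y1;
        f->y1 = y;

        samples[i] = y;
    }
}
```

Which frequencies are suppressed depends entirely on the five coefficients. For example, b0 = b1 = 0.5 with the rest zero gives a two-point average, a very gentle low pass response.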

Sample Rate Conversion

Sample rate conversion, or SRC, controls the rate at which a voice plays back its samples. This can be used to correct the case where a sample buffer's sample rate does not match the AX mixing rate of 32 kHz or 48 kHz. It may also be used to perform pitch bending.
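
Both uses come down to a single playback ratio: the number of source samples consumed per mixed output sample. The sketch below computes such a ratio and steps through a buffer with it using linear interpolation. The function names and the pitch_scale parameter are hypothetical, and AX's own interpolation may differ.

```c
#include <stddef.h>

/* Source samples consumed per output sample. A pitch_scale of 1.0 means
 * no pitch bend, and 2.0 plays one octave higher. */
double src_ratio(double source_rate_hz, double mix_rate_hz,
                 double pitch_scale)
{
    return source_rate_hz / mix_rate_hz * pitch_scale;
}

/* Read from in[] at a fractional step of ratio, interpolating linearly.
 * Stops when out_max samples are written or the source runs out.
 * Returns the number of samples written. */
size_t resample_linear(const float *in, size_t in_count, double ratio,
                       float *out, size_t out_max)
{
    size_t n = 0;

    for (double pos = 0.0; n < out_max; pos += ratio) {
        size_t idx = (size_t)pos;
        if (idx + 1 >= in_count)
            break;  /* no right-hand neighbor left to interpolate with */

        float frac = (float)(pos - (double)idx);
        out[n++] = in[idx] + (in[idx + 1] - in[idx]) * frac;
    }
    return n;
}
```

For example, a 48 kHz sample buffer played on a 32 kHz mixer needs a ratio of 1.5 to sound at its original pitch.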


Mix

The mix of a voice controls its output volume for each channel of each bus of each device. After voice generation has completed, this controls the volume at which the voice is sent to the available output channels. For more information, see the Rendering Model's mixing section.

In other terms, a voice's mix controls how loudly it applies to each audio channel. AX provides only basic volume settings for each channel. For higher-level controls, including pan and surround pan, the SDK provides an optional abstraction layer called MIX. For more information, see the MIX Overview.
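
Conceptually, mixing adds the generated voice into each channel buffer scaled by that channel's volume. The following sketch shows this for a single bus; the function name, the floating-point gains, and the channel-major buffer layout are assumptions for illustration rather than the AX data format.

```c
#include <stddef.h>

/* Add one frame of a generated voice into a bus.
 * gains[ch] is the voice's volume for channel ch.
 * bus is channel-major: bus[ch * count + i] is sample i of channel ch. */
void mix_voice(const float *voice, size_t count,
               const float *gains, size_t channels, float *bus)
{
    for (size_t ch = 0; ch < channels; ++ch) {
        for (size_t i = 0; i < count; ++i)
            bus[ch * count + i] += voice[i] * gains[ch];
    }
}
```

A pan control of the kind a higher-level layer provides would simply derive the per-channel gains from a single pan value before a call like this.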

Remote Settings

AX handles Wii Remote devices slightly differently than TV and DRC devices. A few extra settings exist specifically for Wii Remotes.

For information about playing Wii Remote audio with AX, see the Programming Model reference.

Multi Voice

Using the Multi Voice model, up to six voices can be managed together with calls similar to those for the voice parameters above.


Profiling

AX provides a profiling API that gathers and reports information about the timing of each step of AX's processing. This is useful for measuring the computational load of each part of AX. For more information, see the AX Profiling Overview.

See Also

Programming Model
Load Balance Overview
MIX Overview
AX Profiling Overview

Revision History

2013/10/30 Copy-edits.
2013/09/03 Update to reflect Sound2 changes.
2013/05/08 Automated cleanup pass.