GX2 Texture Swizzle APIs

For best performance, it is important to swizzle surface data when reading multiple textures or writing multiple render targets.

MEM2 is organized into 2 memory channels and 8 memory banks per channel. Each bank is further subdivided into 32K pages, each of which is 4KB in size.

MEM2ChanBank.jpg

If consecutive reads occur to two different addresses that map to the same memory bank but different pages, then a memory bank conflict will occur. This conflict will result in a stall while the memory bank changes the page it can access. If the two different addresses map to different memory banks, then the stall is avoided and greater memory throughput is achieved.

There are certain situations where repeated bank conflicts can reduce effective memory throughput by up to 50%. For example, when using multiple textures of the same size that are accessed in the same pattern, bank conflicts can occur very frequently. A common example of this is when compositing the final image in deferred rendering from multiple render targets.
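The effect described above can be illustrated with a small model. This is only a sketch under simplifying assumptions, not a description of the actual memory controller: it assumes each bank keeps exactly one page open at a time, and it takes the bank and page of each access as inputs because the mapping from address to bank and page is not given here. The names BankAccess and count_page_switches are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BANKS    8u          /* banks per MEM2 channel, as stated in the text */
#define NO_OPEN_PAGE UINT32_MAX  /* sentinel: this bank has no page open yet */

/* One memory access, already resolved to a bank and a page (assumed inputs). */
typedef struct {
    uint32_t bank;
    uint32_t page;
} BankAccess;

/* Counts accesses that hit a bank whose currently open page is a different
   page. In this simplified model, each such access is one bank conflict. */
size_t count_page_switches(const BankAccess *seq, size_t n)
{
    uint32_t open_page[NUM_BANKS];
    size_t conflicts = 0;

    for (uint32_t b = 0; b < NUM_BANKS; ++b) {
        open_page[b] = NO_OPEN_PAGE;
    }
    for (size_t i = 0; i < n; ++i) {
        assert(seq[i].bank < NUM_BANKS);
        uint32_t *open = &open_page[seq[i].bank];
        if (*open != NO_OPEN_PAGE && *open != seq[i].page) {
            ++conflicts;  /* same bank, different page: the bank must switch pages */
        }
        *open = seq[i].page;
    }
    return conflicts;
}
```

In this model, alternating reads from two surfaces that land in the same bank but on different pages conflict on every access after the first, while the same reads spread over two banks never conflict.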

To avoid bank conflicts in these situations, the bank and pipe/channel swizzle bits can be used to spread the memory accesses across the memory banks. These swizzle bits are set per surface that is read by the texture unit or written by the color buffer (CB) block.

SurfaceAddressBits.jpg

When programming surface addresses into the hardware, the lower 8 bits of the address (bits 0 through 7) are always ignored. The swizzle bits are the next 3 bits: bit 8 is the pipe/channel swizzle bit, and bits 9 and 10 are the bank swizzle bits.
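The bit layout described here can be sketched in C. This is an illustration of the description only, not GX2 code: the names swizzle_field, decode_swizzle, compose_address, and SwizzleBits are hypothetical, and combining a 2K-aligned base address with the swizzle bits by a bitwise OR is an assumption made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define SWIZZLE_SHIFT 8u    /* bits 0-7 are ignored by the hardware */
#define SWIZZLE_MASK  0x7u  /* three swizzle bits: bits 8, 9 and 10 */

/* Hypothetical helper type: the swizzle split into its two parts. */
typedef struct {
    uint32_t pipe;  /* bit 8: pipe/channel swizzle, 0 or 1 */
    uint32_t bank;  /* bits 9-10: bank swizzle, 0 to 3 */
} SwizzleBits;

/* Returns the 3-bit swizzle value (0 to 7) held in bits 8-10. */
uint32_t swizzle_field(uint32_t programmed_addr)
{
    return (programmed_addr >> SWIZZLE_SHIFT) & SWIZZLE_MASK;
}

/* Splits the swizzle value into pipe and bank swizzle bits. */
SwizzleBits decode_swizzle(uint32_t programmed_addr)
{
    uint32_t s = swizzle_field(programmed_addr);
    SwizzleBits bits;
    bits.pipe = s & 0x1u;
    bits.bank = (s >> 1) & 0x3u;
    return bits;
}

/* Assumption: a 2K-aligned base leaves bits 0-10 clear, so the swizzle
   value can be placed into bits 8-10 with a bitwise OR. */
uint32_t compose_address(uint32_t aligned_base, uint32_t swizzle)
{
    assert((aligned_base & 0x7FFu) == 0);
    assert(swizzle <= SWIZZLE_MASK);
    return aligned_base | (swizzle << SWIZZLE_SHIFT);
}
```

For example, swizzle_field(0x10000500) yields 5, which decodes to a pipe swizzle bit of 1 and a bank swizzle value of 2.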

The swizzle bits configure how the data in memory is read by the GPU. In order to fetch the correct image from a texture using a particular swizzle value, the texture data needs to be loaded/created with that specific swizzle value.

Swizzle Format

Surfaces are broken up into macro tiles, which are composed of tiles. A macro tile is a set of tiles for each combination of pipes and banks. We are using a macro tile that is 4x2 tiles (4 banks and 2 pipes), or 32x16 pixels. Tiles are 8x8 pixels, regardless of the texel size. Swizzling affects how the tiles are arranged within the macro tiles; it does not affect how the pixels are arranged within the tiles. For example, a macro tile of a 32-bit texture is 2048 bytes (32x16x4), and the swizzle moves the 256-byte (8x8x4) tiles to different banks.
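The size arithmetic in this paragraph can be written out as a short C sketch. The function and constant names are hypothetical and exist only for this example; the dimensions are the ones quoted above (8x8-pixel tiles, 4x2 tiles per macro tile), and MSAA is not considered.

```c
#include <assert.h>
#include <stdint.h>

enum {
    TILE_WIDTH    = 8,  /* pixels, regardless of texel size */
    TILE_HEIGHT   = 8,
    MACRO_TILES_X = 4,  /* 4 banks */
    MACRO_TILES_Y = 2   /* 2 pipes */
};

/* Bytes in one 8x8 tile for a given texel size in bytes. */
uint32_t tile_bytes(uint32_t bytes_per_texel)
{
    assert(bytes_per_texel > 0);
    return (uint32_t)(TILE_WIDTH * TILE_HEIGHT) * bytes_per_texel;
}

/* Bytes in one 4x2-tile macro tile (32x16 pixels). */
uint32_t macro_tile_bytes(uint32_t bytes_per_texel)
{
    return tile_bytes(bytes_per_texel) * (uint32_t)(MACRO_TILES_X * MACRO_TILES_Y);
}
```

With 4 bytes per texel, tile_bytes returns 256 and macro_tile_bytes returns 2048, matching the figures in the text.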

The surface alignment (for 2D tiling) is based on the macro tile size, the 2D tiling mode, the size of each texel (BPP), and the number of samples (MSAA), with 2K being the minimum alignment. See the GX2 Surface Alignments Page for a detailed list of required alignments.

Surfaces that use 1D (GX2_TILE_MODE_1D_*) or LINEAR (GX2_TILE_MODE_LINEAR_*) tiling are not compatible with bank and pipe swizzling. If the surface's tiling mode is incompatible with swizzling, the function GX2GetSurfaceSwizzleOffset returns 0, indicating that the swizzle will not be used, even if the GX2Surface structure's swizzle value is non-zero. To get the actual GX2Surface structure swizzle, regardless of whether the tiling mode is compatible with swizzling, use the function GX2GetSurfaceSwizzle.

Swizzling Texture Data

The easiest way to swizzle texture data is to use the -swizzle command line argument in the texture converter (GX2 Tool: Texture Converter 2). The swizzle value specified should be between 0 and 7, inclusive. The converter then swizzles the texture data and sets the GX2Surface structure's swizzle value to match.

It is also possible to swizzle data at runtime by using GX2CopySurface together with GX2SetSurfaceSwizzle to set the proper swizzle. See the deferredRendering demo for an example.

When using multiple render targets, use GX2SetSurfaceSwizzle to set the swizzle on each render target. GPU7 will then render to each surface using the proper swizzle.

Picking Swizzle Values

Most of the performance advantage can be achieved by simply incrementing the swizzle value for each texture that is being used. Any variation is far better than none.
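A minimal sketch of this incrementing scheme follows. It only computes the values. Applying them to actual surfaces would go through GX2SetSurfaceSwizzle or the texture converter's -swizzle argument, which are not shown because their exact signatures are not given here. The name assign_swizzles is hypothetical, and wrapping back to 0 after 7 is an assumption for the case of more than eight surfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SWIZZLE_COUNT 8u  /* valid swizzle values are 0 to 7 */

/* Fills out[i] with an incrementing swizzle value for surface i,
   wrapping around after 7 (the wrap-around is an assumption). */
void assign_swizzles(uint32_t *out, size_t count)
{
    assert(out != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        out[i] = (uint32_t)(i % SWIZZLE_COUNT);
    }
}
```

For four render targets in a deferred renderer, this yields the swizzle values 0, 1, 2 and 3.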

If you don't mind seeing corrupted pixels (the texture data will no longer match the swizzle value it was created with), you can experiment with different swizzle settings at runtime to find the best values empirically.

Texture/Render Buffer Swizzle with MEM1

MEM1 is structured differently than MEM2. It has 8 channels instead of 2. Each channel has only 1 bank, but there are no page conflict issues.

Each MEM1 bank individually provides less bandwidth than a MEM2 bank, but since there are 8 channels, the overall bandwidth is much larger. Because there are no page conflicts, swizzling doesn't help MEM1 as much as it helps MEM2. It does still help when reads are not well scattered across the MEM1 channels.

The GPU easily saturates MEM2 with requests, so swizzling helps use MEM2 more efficiently. MEM1 has much more bandwidth, and if the GPU is running a complicated shader, the application will probably be GPU bound and will not hit the MEM1 bandwidth limits.

Advanced Swizzling

MEM2 has 8 memory banks, but the current HW setting (GB_TILING_CONFIG) is programmed for 4 memory banks to improve MEM1 performance. Because of this, there are only 2 bank swizzle bits available instead of 3. This is illustrated by the dotted line in the image above.

To change the 3rd memory bank bit (bit 11) for a surface with a 2K minimum alignment, use a different base address that is offset by 2K, which toggles bit 11.
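The address arithmetic behind this can be sketched as follows. The helper names are hypothetical, and the sketch only shows the bit manipulation; it assumes the caller has allocated enough memory for the surface to start 2K later.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_ALIGNMENT 0x800u  /* 2K, which equals 1 << 11 */

/* Returns the value of bit 11, the 3rd memory bank bit. */
uint32_t third_bank_bit(uint32_t addr)
{
    return (addr >> 11) & 0x1u;
}

/* Moves a 2K-aligned base address forward by 2K, which flips bit 11. */
uint32_t offset_base_by_2k(uint32_t base)
{
    assert((base & (MIN_ALIGNMENT - 1u)) == 0);  /* base must be 2K aligned */
    return base + MIN_ALIGNMENT;
}
```

Starting from a base of 0x10000000, where bit 11 is 0, the offset base is 0x10000800, where bit 11 is 1.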

