Cache Performance Implications of Shared Data

How data is shared among the cores can have a large effect on Cafe performance. This document describes the Cafe cache intervention mechanism and its performance implications.

Cache intervention improves the performance of the multiple caches on Cafe. When a cache miss occurs in the cache hierarchy of one core, cache intervention allows the cache hierarchy of another core to intervene and supply the missing cache line. Because the data comes from another core's cache hierarchy rather than from main memory, the latency is much lower.

Cache intervention can occur with accesses to shared data, for example, multiple cores accessing a global variable. Shared data can usually be classified as primarily read-shared or primarily write-shared. Read-shared data is written by one core and read by the other cores. Write-shared data is updated by more than one core (for example, a lock or a counter). The default Cafe configuration is optimized for read-sharing.

Shared Data Read and Data Transfer Between Caches

If multiple cores read shared data, the data is copied from one core's cache to another's, as shown in the following figures. Because the data stays resident in each core's cache, each core can read the shared data efficiently.

The following figures show the state of the caches and memory as the accompanying code is executed. A is a global variable.


Figure 1: A cache miss occurs when Core 0 reads A.

The cache line that contains A is copied from memory to cache, as shown in Figure 1.


Figure 2: A cache miss occurs when Core 1 reads A.

Cache intervention supplies the cache line to Core 1, as shown in Figure 2.


Figure 3: A cache hit occurs.

Read Shared Data

Typical applications rarely modify shared data but read it often. The default CPU configuration improves performance for this read-mostly case.

The following figures describe read-centered data sharing. When Core 1 reads the shared data (orange) that Core 0 has modified, the dirty data is first stored (orange arrow) to memory, and the cache line becomes clean (green). The line is then moved, rather than copied, to Core 1. If Core 0 subsequently reads the data, the line is copied, and it is available in both Core 0's and Core 1's caches.


Figure 4: A cache miss occurs when Core 0 reads A.

The cache line that contains A is copied from memory to cache, and Core 0 then modifies A, as shown in Figure 4.


Figure 5: A cache miss occurs when Core 1 reads A.

The dirty cache line is stored to memory and becomes clean. Cache intervention then moves the line to Core 1, which receives the line containing A in the Exclusive state, as shown in Figure 5.


Figure 6: A cache miss occurs when Core 0 reads A.

Cache intervention supplies the cache line to Core 0, as shown in Figure 6.


Figure 7: A cache hit occurs.

Write Shared Data

If multiple cores modify shared data frequently, the shared data moves from cache to cache repeatedly. Every time the data moves, it is also stored to memory. Because the data is about to be modified again by a different core, these stores to memory provide no benefit and waste bandwidth. If shared data is modified frequently within your application, the current CPU configuration may not be appropriate. Contact your local Nintendo developer support group.

Another example of this behavior is assigning each core a different set of vertices to transform than it handled in the previous frame; the cache lines holding those vertices must then migrate between cores.


Figure 8: A cache miss occurs when Core 0 modifies A.

The cache line that contains A is copied from memory to cache, and Core 0 then modifies A, as shown in Figure 8.


Figure 9: A cache miss occurs when Core 1 modifies A.

The cache line is stored to memory, moved to Core 1 by cache intervention, and then modified by Core 1, as shown in Figure 9.


Figure 10: A cache miss occurs when Core 0 modifies A.

The cache line is stored to memory, moved to Core 0, and then modified, as shown in Figure 10.


Figure 11: A cache miss occurs when Core 1 modifies A.

The cache line is stored to memory, moved to Core 1, and then modified, as shown in Figure 11.



Revision History

2013/05/08 Automated cleanup pass.
2013/03/20 Converted to HTML.


CONFIDENTIAL