How data is shared among the cores can have a large effect on Cafe performance. This document describes the Cafe cache intervention mechanism and its performance implications.
Cache intervention increases the performance of Cafe's multiple caches. When a cache miss occurs in the cache hierarchy of one core, cache intervention allows the cache hierarchy of another core to intervene and supply the missing cache line. When the data is supplied from another core's cache hierarchy, the latency is much lower than when fetching the data from main memory.
Cache intervention can occur on accesses to shared data, for example, when multiple cores access a global variable. Shared data can usually be classified as primarily read-shared or write-shared. Read-shared data is written by one core and read by the other cores. Write-shared data is updated by more than one core (for example, a lock or a counter). The default Cafe configuration is optimized for read-sharing.
If multiple cores read shared data, the data is copied from the cache of one core to another, as shown in the following figures. Because the data stays in the cache on each core, each core can read the shared data efficiently.
The figures show the state of the caches and memory when the code on the right is executed. A is a global variable.
The cache line that contains A is copied from memory to cache, as shown in Figure 1.
Cache intervention supplies the cache line to Core 1, as shown in Figure 2.
Typical applications read shared data often but rarely modify it. The default CPU configuration improves performance for this case.
The following figures describe read-centered data sharing. When Core 1 reads shared data (orange) that Core 0 has modified, the dirty data is first written back (orange arrow) to memory, and the cache line becomes clean (green). The line is moved rather than copied. If Core 0 then reads the data, the line is copied, making it available in both the Core 0 and Core 1 caches.
The cache line of A is copied from memory to cache, and then modified by Core 0, as shown in Figure 4.
The dirty cache line is written back to memory and becomes clean. Cache intervention then moves it to Core 1, which receives A in the exclusive state, as shown in Figure 5.
Cache intervention supplies the cache line to Core 0, as shown in Figure 6.
If multiple cores modify shared data frequently, the shared data moves from cache to cache repeatedly. Every time the line moves, it is also written back to memory. Because the data is about to be modified again by a different core, these memory writes provide no benefit and waste bandwidth. If shared data is modified frequently within your application, the current CPU configuration may not be appropriate. Contact your local Nintendo developer support group.
Another example of this behavior is assigning each core a different set of vertices to transform than it handled in the previous frame.
The cache line of A is copied from memory to cache, and then Core 0 modifies A, as shown in Figure 8.
The cache line is stored to memory, moved to Core 1 by cache intervention, and then modified by Core 1, as shown in Figure 9.
The cache line is stored to memory, moved to Core 0, and then modified, as shown in Figure 10.
The cache line is stored to memory, moved to Core 1, and then modified, as shown in Figure 11.
2013/05/08 Automated cleanup pass.
2013/03/20 Converted to HTML.