Avoiding Subtle RPL Issues

Avoid a few common mistakes when using the RPL system.


The RPL loading system that is used by the Wii U is powerful and flexible. It allows dynamic code linkage, reduces compile times, and even reduces application boot-up times. However, these advantages do come at a cost in terms of added complexity and some non-intuitive side effects that can occur when implementing RPL code.

RPL/RPX Background

Before the introduction of RPL/RPX files, the Wii used REL and RSO files for dynamic code linking. REL files enabled fast code access because all code offsets were known at link time. Because the offsets were fixed at link time, however, a REL file could be used only with a specific version of the main calling code. To keep the offsets up-to-date, the developer had to relink the modules whenever code changed. RSO files were another method of dynamic loading. They maintained their own symbol tables, allowing code and data to be looked up at runtime without relinking. This enabled RSO files to be updated independently of the main file, making it much easier to dynamically change code or data. However, the symbol table had to be bundled with the code, making obfuscation nearly impossible. In addition, dead-stripping of symbols was difficult because the symbols that were needed at runtime were not necessarily known. Finally, the lookup itself was relatively slow because either a search tree had to be built at runtime or a linear search was required to find symbol information.

With the Wii U came the introduction of Rel Plus (RPL) files, which combine the speed of the original REL file with correct dynamic linking. RPL files are similar to Win32 DLL files in terms of load/unload notifications, consistent entry points, and code that properly initializes itself even if the user does not define an entry point. In addition, RPL files provide true dynamic linking that does not require the relink steps that REL files required. However, as with Windows DLL files, developers need to be aware of a few challenges when using RPL files.

Dynamic Memory Allocation with RPL Files

By default, calls to malloc in individual RPLs use separate, dynamically growing heaps for each RPL. This operating system-like behavior makes it easy to hide memory access across different RPL boundaries and may make it easier to free memory when the RPL completes its tasks. Be aware, though, that if memory allocated from one RPL is accidentally freed in another, the memory is leaked because the free function cannot find the pointer in its local heap, or the heap may become corrupted if the pointer is partially found and the memory is illegally freed. Worse, this attempted free fails silently, without any warning that the pointer was leaked or the heap corrupted. Finally, if the allocation sizes are small, this sort of leak may not be caught for some time and may even disappear if that RPL is later unloaded, masking this serious and difficult-to-track problem. Track memory carefully to prevent leaks, and never attempt to free memory outside of the RPL that owns it. Use RAII-style coding, in which the class that allocates dynamic memory internally is ultimately responsible for freeing it without additional programmer intervention.
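The rule that memory must be freed by the RPL that allocated it can be sketched in standard C++ as follows. This is a minimal illustration, not SDK code: the namespace module_a stands in for one RPL, and the names Buffer, CreateBuffer, DestroyBuffer, LiveBufferCount, BufferPtr, and MakeBuffer are all hypothetical. The idea is that the allocating module exports both the create and the destroy function, and callers hold the result in a std::unique_ptr whose deleter is the owning module's destroy function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

// Hypothetical module "A" (stands in for one RPL). All names are illustrative.
namespace module_a {

// Counts live buffers so the sketch can show that nothing is leaked.
static int g_live_buffers = 0;

struct Buffer {
    std::size_t size;
    unsigned char* data;
};

// Would be exported by module A: allocates using module A's malloc.
Buffer* CreateBuffer(std::size_t size) {
    Buffer* b = static_cast<Buffer*>(std::malloc(sizeof(Buffer)));
    if (b == nullptr) {
        return nullptr;
    }
    b->data = static_cast<unsigned char*>(std::malloc(size));
    if (b->data == nullptr) {
        std::free(b);
        return nullptr;
    }
    b->size = size;
    ++g_live_buffers;
    return b;
}

// Would be exported by module A: frees using module A's free, so the
// pointer is always returned to the heap it came from.
void DestroyBuffer(Buffer* b) {
    if (b == nullptr) {
        return;
    }
    std::free(b->data);
    std::free(b);
    --g_live_buffers;
}

int LiveBufferCount() { return g_live_buffers; }

}  // namespace module_a

// Code in any other module: the deleter is module A's function, so the
// caller never calls free() on memory it does not own.
using BufferPtr =
    std::unique_ptr<module_a::Buffer, void (*)(module_a::Buffer*)>;

BufferPtr MakeBuffer(std::size_t size) {
    return BufferPtr(module_a::CreateBuffer(size), &module_a::DestroyBuffer);
}
```

A caller would write something like `BufferPtr p = MakeBuffer(64);` and let the pointer go out of scope. The release then runs through DestroyBuffer in the owning module, with no explicit free call on the caller's side.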

The easiest way to handle potential RPL-related malloc issues is to implement your own memory management system. For information on how to implement a memory management system, see the shared malloc demo that is located in the cafe_sdk\system\src\demo\examplemake\cafe_sdk\shared_malloc folder. This demo illustrates in detail how to properly manage memory with RPLs and is a must-read for anyone implementing RPL/RPX files. The topic Memory Allocation and RPL Usage offers valuable information that should be carefully reviewed.

There are additional issues to consider. When naked pointers are passed between RPLs, the pointers can become dangling pointers. If you use a pointer to some resource from an external RPL (or any source outside of your current class) and that resource is freed, your pointer is no longer valid, and attempting to access it causes memory corruption and/or segmentation faults. Instead, to ensure that the data remains valid, pass smart pointers (weak or shared) that reference count off the original resource, or use handles that allow you to call into the external RPL.
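The weak-pointer approach can be sketched with the standard library as follows. This is a minimal illustration in portable C++ and has not been verified against the SDK toolchain. The names Texture, TextureOwner, and SafeWidth are hypothetical. The owner keeps the only std::shared_ptr, consumers receive a std::weak_ptr, and a consumer must lock the weak pointer before each use so that a freed resource is detected instead of dereferenced.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical resource that is owned by one module.
struct Texture {
    std::string name;
    int width;
    int height;
};

// Owner side (for example, the module that loads and frees textures).
class TextureOwner {
public:
    // Creates the resource and hands out a non-owning reference to it.
    std::weak_ptr<Texture> Load(const std::string& name, int w, int h) {
        texture_ = std::make_shared<Texture>(Texture{name, w, h});
        return texture_;
    }

    // Frees the resource; outstanding weak references become expired.
    void Unload() { texture_.reset(); }

private:
    std::shared_ptr<Texture> texture_;
};

// Consumer side: returns the width, or -1 if the texture is already gone.
int SafeWidth(const std::weak_ptr<Texture>& ref) {
    if (std::shared_ptr<Texture> t = ref.lock()) {
        return t->width;  // The resource is kept alive while t is in scope.
    }
    return -1;  // The resource was freed; nothing is dereferenced.
}
```

After the owner calls Unload, `ref.expired()` is true and SafeWidth returns -1 rather than reading freed memory, which is the failure a naked pointer would produce. Because smart pointers are themselves templates, the template cautions in this document should be weighed before adopting this pattern across RPL boundaries.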

Finally, consider that the same function may be defined multiple times in different RPLs. For example, if care is not taken, malloc and other memory allocation routines may appear to be separate RPL entries for each RPL. The default Wii U make files handle this problem by creating a special RPL just for malloc and its derivatives that is called cos_dev_malloc. When cos_dev_malloc is created, all other RPLs use it for memory calls, reducing the memory overhead for the function in other modules by ensuring that only one version of the code is loaded. It also creates a single entry and exit point for memory allocation routines, making it easier to debug any memory issues that may surface. For even more control over memory allocation, see the shared malloc demo that is located in the cafe_sdk\system\src\demo\examplemake\cafe_sdk\shared_malloc folder.

Versioning and RPL Files

A single class may exist in multiple RPLs. If one class does exist in multiple RPLs, one version of the class may be edited, updated, and compiled in one RPL with new functionality while the older version of the class in other RPLs is left unchanged. Errors occur if one of the unmodified class versions is accessed for the purpose of using the new class functionality. If you are not careful with source build dependencies and version control, subtle and difficult-to-track bugs may be created.
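One way to surface such mismatches early is a version check at load time, sketched below. This is an assumption-laden illustration, not an SDK facility: the constant kInventoryInterfaceVersion, the namespace inventory_module, and the functions BuiltInterfaceVersion, IsCompatible, and CanUseInventoryModule are all hypothetical. Each module reports the version of the shared header it was compiled against, and the caller compares that value with its own before using the class.

```cpp
#include <cassert>
#include <cstdint>

// --- Shared header (hypothetical) ---
// Bump this value whenever the layout or behavior of the shared class changes.
constexpr std::uint32_t kInventoryInterfaceVersion = 3;

// --- Compiled into the module that implements the shared class ---
namespace inventory_module {

// Reports the header version this module was built with. In a real build
// this value is frozen when the module is compiled, so a stale module
// keeps reporting the old number.
std::uint32_t BuiltInterfaceVersion() { return kInventoryInterfaceVersion; }

}  // namespace inventory_module

// --- Caller side ---
// Exact match is the strictest policy; a project could relax it.
bool IsCompatible(std::uint32_t caller_version, std::uint32_t module_version) {
    return caller_version == module_version;
}

// Call once after loading the module and before touching the shared class.
bool CanUseInventoryModule() {
    return IsCompatible(kInventoryInterfaceVersion,
                        inventory_module::BuiltInterfaceVersion());
}
```

In this single-file sketch the two values always match. The check only becomes useful when the modules are built separately, where a module compiled against an older header reports a different number and the caller can refuse to use it.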

Templates and RPL Files

Template code is difficult to use correctly in RPL files and should be avoided if possible. There are several reasons for this. When a template dynamically allocates memory, it is difficult to control which RPL heap is used, which brings the malloc issues discussed above into play. Template code may also hide the code version issue discussed above. Finally, Multi, the Green Hills Software tool, assumes that all parts of template code are visible in the currently linked dynamic object. It also assumes that an imported function behaves the same way as the version of that template in the current RPL's source, and it attempts to optimize accordingly. In doing so, Multi does not handle the loader glue code correctly and may attempt to use registers that are destroyed by the call.

For example, the template function C<x>::A is exported from A.rpl. B.rpl has a source file named B.cpp, which uses instances of C<x>::A and C<x>::B. The object file for B.cpp contains the code for C<x>::A and C<x>::B. C<x>::B has code as follows:

    if (test) { C<x>::A(); }

The compiler optimizer converts the if statement above from a normal 24-bit offset branch:

    bne L1            ; Jump past if compare of test is false
    bl  C<x>::A()

into a short, 16-bit relative offset branch (because the compiler sees the implementation for C<x>::A close in memory):

    beql A.rpl:C<x>::A()

However, the RPL import tools do not allow 16-bit relative branches to be used as import symbols. The build fails, or the code may even attempt to jump to an invalid memory address.

It is possible to create code patterns that work, but doing so requires careful effort to ensure that all of the template class functions are exported, including compiler-generated copy constructors and similar functions, and that none of the template implementation is visible to importing code (that is, no inlining). Even though it is possible, using template code with RPLs is a poor choice and should be avoided whenever possible.
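The shape of such a pattern can be sketched in standard C++ with explicit instantiation. This is an illustration only: the class Accumulator is hypothetical, the sketch collapses the header and the exporting source file into one listing, and whether Multi and the RPL export tools treat an explicit instantiation declaration this way is an assumption that must be verified on the actual toolchain. The header declares every member, including the copy operations, without any bodies. One source file in the exporting RPL defines the bodies and explicitly instantiates the class. The export list for the RPL would still have to name each instantiated function.

```cpp
#include <cassert>

// --- Header seen by every module: declarations only, no function bodies ---
template <typename T>
class Accumulator {
public:
    Accumulator();
    Accumulator(const Accumulator& other);             // declared, not defaulted
    Accumulator& operator=(const Accumulator& other);  // declared, not defaulted
    ~Accumulator();
    void Add(T value);
    T Total() const;

private:
    T total_;
};

// Explicit instantiation declaration: importing files must not instantiate
// Accumulator<int> themselves and instead refer to the external symbols.
extern template class Accumulator<int>;

// --- Single source file inside the exporting module ---
template <typename T>
Accumulator<T>::Accumulator() : total_() {}

template <typename T>
Accumulator<T>::Accumulator(const Accumulator& other) : total_(other.total_) {}

template <typename T>
Accumulator<T>& Accumulator<T>::operator=(const Accumulator& other) {
    total_ = other.total_;
    return *this;
}

template <typename T>
Accumulator<T>::~Accumulator() {}

template <typename T>
void Accumulator<T>::Add(T value) { total_ += value; }

template <typename T>
T Accumulator<T>::Total() const { return total_; }

// Explicit instantiation definition: emits every member exactly once, here.
template class Accumulator<int>;
```

Because no member body appears in the header, an importing source file has nothing to inline and nothing to place close in memory, which is the condition the paragraph above describes. The cost is that only the instantiated types are usable, and each new type requires a change to the exporting RPL.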

Performance and RPL Files

Although the difference is not a large performance issue for users, it is better to have a few larger RPLs than many small RPLs. Each RPL that is loaded must be decompressed separately, and the startup and tear-down time for decompression can be amortized by combining multiple RPLs into a few larger RPLs. There is also a memory cost because each RPL creates its own heap on initialization, and some internal memory fragmentation occurs for each RPL that is loaded.

RPLs are ideal for speeding up the initial boot of your application because only the code and data necessary to start the application need to load. RPLs also help by reducing the amount of code and data lying around in memory when code blocks execute independently of other parts of an application, such as game data for a level that is not currently loaded or a pause menu that is not being displayed. Still, keeping your RPLs in larger logical units, for example, one for an entire menu or a game level, strikes a good balance for overall memory efficiency.

Conclusion

RPL files provide developers with an accessible way to implement dynamically linked library solutions, as long as the issues discussed here relating to dynamic memory allocation, versioning, and template avoidance are kept in mind. With care, RPLs can add a large amount of code efficiency and reusability.

See Also

RPX/RPL Cross-Referencing
RPX and RPL Overview

Revision History

2013/07/20 Initial version.


CONFIDENTIAL