Nintendo DS: Developing a callstack trace

In 2009, I was experimenting how I could retrieve a callstack trace or backtrace on the Nintendo DS, using the devkitPro toolchain. A callstack trace is a list of active subroutines of a computer program. This can be used to identify from which code-path a particular subroutine was called, which is enormously helpful when detecting logical errors at run-time.

The source code developed in this article can be downloaded here.

I tend to flood code with assertion checks. Assertion’s are used to identify logic errors during program development by implementing the expression argument to evaluate to “false” only when the program is operating incorrectly and then calling a function to report the error:

When any of the incoming arguments point to NULL, ASSERT will call a subroutine to report the error. Typically the report includes the filename, line number and functionname. This information is helpful, but lacks of information from where the function was called.

I use Vector2Add all over the place, how should I know what part in the program does not work, when I only know at some point a NULL argument is passed to Vector2Add?

Right, I need to know from where Vector2Add was called, I need a callstack trace!

My earlier instrumentation approach

Around 2005-2006 I implemented a callstack trace feature in HEL Library. HEL Library is a middleware solution to ease development of Game Boy Advance titles.

The callstack trace was displayed along with an assertion-report, that looked like:

Back then, I used function instrumentation (-finstrument-functions) to build a list of subroutine calls at run-time. It was basically a callback that was called for every function that is about to be entered (__cyg_profile_func_enter) and left (__cyg_profile_func_exit).

This approach came with a huge performance overhead and slowed down program-execution dramatically, but the callstack trace was so precious, that the performance penalty didn’t matter for the debug library.

Existing callstack libraries

When I started the Nintendo DS version, one goal was to use an approach which does not have any impact on run-time performance! I tried several libraries, unfortunalety I didn’t find anything that worked for me.

This includes libunwind, backtrace, several GCC builtin-keywords to get return addresses of different calldepths, just to name a few.

I gave all up. It was either a never ending story to integrate the library in to the project, or headers / functions were missing in libraries that come with devkitPro.

Hacking to success, then fail

Rather than giving up, I decided to accept the challenge and to come up with my own callstack trace code, which worked suprisingly well at the end.

The approach I picked is pretty simple. It’s interpreting a few instructions of interest to simulate the behaviour of the Program Counter (PC) and Stack Pointer (SP). The Program Counter indicates where the program is in its instruction sequence. Stack Pointer indicates the current top of stack memory. The stack is where local variables are located.

Armed with my own Program Counter and Stack Pointer variables, I can traverse program code and respond when Program Counter is assigned a new value, which basically means to jump to a different address and continue to operate there.


Let’s assume the Program Counter is located in SecondFunc, which was previously called by FirstFunc.

The thumb assembler version of this looks like:

I need to get my hands on the Link Register (LR) to return to SecondFunc’s caller (*1). Since LR was pushed on the stack, I know where to look! What is left is to traverse the program code in reversed order and interpret the “sub” and “push” instructions.

Moving the Program Counter in reversed order, which actually would be a backtrace, didn’t work out for some reasons and I wasn’t able to fix this issue. It did, however, worked in some cases, unfortunalety too unreliable.

When in doubt, try it out

I stayed away from this problem for week and then had the idea why not trying to simulate the Program Counter the same way it would when it continues program execution normally. It means rather then traversing program code backwards, I tried what happens, when I forward advance PC, but don’t call new subroutines, just opcodes to “go back in time”. Now I have to interpret “add” and “pop” instructions and this works suprisingly well!

From here on it took a few hours to come up with a version that is robust enough to work in all places that I tested with my code base, which consists of both, C and C++ source code.

Obviously, it does not mean it always works in every sitiuation with all source code in the world. For example, optimization level -O0 generates so many unconditional branches and places return code all over the place, that the Stacktrace function fails. It supports thumb code only and has problens with hand written/optimized assembler routines (when LR is not push’ed and PC not pop’ed).

I added several instructions Stacktrace handles, such as:

The unconditional branch instruction “b” could lead to an infinite loop in Stacktrace, such as “while(1) {}”. For this reason I added two special cases:

  • Limit how many instructions are allowed to touch
  • Branch only to the target address when it’s different from PC

When any of the conditions evaluate to true, Stacktrace aborts and outputs the callstack to this position only. It will, however, report an error occured.

Support for unconditional branch is important, because the compiler sometimes generates code like this:

When we return from “AnotherFunction” we must jump to “L90” to pop the return address from the stack and return to the caller. This is the reason why I added support for unconditional branches. I haven’t noticed other branches that are of importance for Stacktrace.

At this point, I was able to return to each function-caller and have the return addresses:


How to resolve a function name

In order to map a memory address to an actual function, we have to look up the address in a so called Map file. Map files are typically plain text files that indicate the relative offsets of functions for a given version of a compiled binary. This file is generated by the linker when you pass -Map to it, here is a part of it:

Line 1 specifies where the .text section (program code) starts and its size. Line 2 specifies the start address of the object file “main.o” followed by all functions it contains. While writing this article, I have learned static functions are not listed in the map file.

In order to find the function name of the memory address 0x02000458 (from screenshot above), we have to check in which range it is located. But it would be too simple when the .map file has everything sorted by addresses, so we have to do this instead.

Here is a sorted list by addresses:

The memory address 0x02000458 is between:

So 0x02000458 belongs to “SecondFunc”. While this approach seems to work, it is complicated and really only works when you have the .map file that was used to build the project. You can’t use a .map file of a different version of your code, this is very unlikely to work.

It would be so much better when the actual function name could be displayed automatically!

How to resolve a function name automatically

Fortunalety, the compiler has an option to insert function names in plain-text infront of each function. This can be activated by adding -mpoke-function-name to the CFLAGS variable in your makefile.

But it not only inserts the name, it also inserts a bit-pattern to recognize that there is a function name as well as a relative offset to the name from this pattern.

Stacktrace takes a memory address and then decreases this address until it detects the bit-pattern of a function name and we’re done.


Well, there is of course a special case again. If -mpoke-function-name has not been specified, the function name bit-pattern is missing, thus Stacktrace would try to find it until it reaches the start of the text section. While this makes little sense, I added a limit how many 32bit words the “GetFunctionnameTag” function is allowed to touch before it returns “did not find”.

Integration in your own project

The archive contains the entire source code, you only need two files:

  • stacktrace.h
  • stacktrace.c

Copy these files to your source code directory. Then open the makefile file which should be located in your project directory also. You have to adjust the following variables:

Now do a rebuild, rather than a regular build. You can do this by performing a “make clean” first, then a “make”. Don’t forget to include stacktrace.h in that file where you want to use Stacktrace.

Optimization level -O2 worked best with Stacktrace so far. When the compiler uses more aggressive optimization levels, it also starts to inline code heavily, which can be quite confusing when the Stacktrace output does not reflect what you see in your high level code.

How to use Stacktrace

The usage of Stacktrace function is fairly simple.

All it expects is an array of StacktraceEntry elements and the count of it. Here is the interface from stacktrace.h:

Use it like this:

Improving Stacktrace

If you want to improve Stacktrace, I recommend to read through the comments in stacktrace.c and enable STACKTRACE_VERBOSE, by setting it to 1.

This will, as far as “OutputDebugString” (also in stacktrace.c) uses a debug message type your Nintendo DS software emulator understands, output what instructions Stacktrace comes across and interprets.

I’ve added an opcode structure for all instructions interpreted by Stacktrace, as well as print-routines to dump the contents.

I used no$gba during development, which was of great help because of its “debugger“.


The presented approach to get a callstack trace…

  • works suprisingly well for being such a hack
  • does not need debugging information/symbols
  • it runs directly on the target device, not needed to be connected to GDB or a similar debugger
  • does not come with any run-time performance penalty
  • works best with optimization level -O2
  • is most informative when adding -mpoke-function-name to CFLAGS in your makefile
  • comes with full documented source code
  • uses very little Nintendo DS specifics, except the text section start and length (portable to other ARM devices?)
  • works with thumb code only
  • does not detect when switching from thumb to arm mode
  • depends on LR / PC being push’ed and pop’ed, hand-written assembler routine can cause trouble
  • is tested in/with my code base only


Q: Why do I see so weird function names, like _ZN13DEMOPARTQUEUE9CutToPartEj?

A: Because in C++, the function names go through name mangling. Name mangling is a technique used to solve various problems caused by the need to resolve unique names for programming entities.

What I didn’t tell you yet

*1) LR does not always contains the address from where it was being called, but it holds the address where to return when the function completes. When using code optimization levels above -O2 the compiler e.g. generates code that does not return to functions where it was the last instruction involved anyway.

Tweet about this on TwitterShare on RedditPin on PinterestShare on Facebook