
By Darcy Wronkiewicz

Debugging the software in a complex real-time system is particularly challenging because the manifestation of a bug is frequently far removed from the actual occurrence of the bug in both time and code. Furthermore, bugs are often the result of timing problems that are difficult to reproduce or diagnose, particularly when they involve device drivers and interrupt service routines that are responding to external stimuli.
The challenges are particularly acute with a virtual memory operating system like Linux because a memory corruption may manifest itself in a process whose virtual address space is distinct from that of the kernel in which the device driver was running. Consequently, a big-picture overview of the whole system is required to find the true source of a bug. Moreover, reliably diagnosing timing bugs requires that the system be debugged nonintrusively. Finally, the developer must have a virtual memory- and task-aware view. Achieving this level of real-time debugging requires the use of integrated, operating system-aware trace mechanisms, event-recording tools and debuggers.
Debugging solutions for desktop Linux application code are mature and widely used. Nonetheless, there are cases in which desktop debugging techniques are not sufficient to efficiently identify bugs and performance problems in embedded systems. In such instances, the developer may have to explore alternatives to handle the unique challenges of embedded-system debugging.
Basic Linux debug
Once a bug shows up on your radar, the first step in fixing it is to identify where in the software it is manifest. In an embedded Linux system, that means narrowing the search to the application and further to the process, thread or device driver exhibiting the malfunction. Once that is determined, you can begin to search for the bug's root cause.
One particular limitation of commonly used debugging solutions is a lack of thread awareness. Most debuggers provide a window that simply displays the output of the Linux top utility. This shows which processes are running but does not display any information about threads.
With a thread-aware debugger you can accomplish the following basic tasks:
See a list of threads associated with each process on your target;
View the call stack when the thread is halted;
View variables and registers;
Set breakpoints on virtual addresses within a thread without disturbing other threads in your system; and
Halt individual threads without affecting other threads.
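As a rough illustration of where a per-process thread list can come from, the sketch below counts the threads of a process by reading the /proc/<pid>/task directory, in which the Linux kernel exposes one numbered entry per thread. The function name count_threads and the convention that a pid of 0 means "the calling process" are assumptions made for this example; this is not how any particular debugger is implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the threads of process `pid` by listing
 * /proc/<pid>/task. A pid of 0 means "the calling process".
 * Returns -1 if the directory cannot be opened. */
int count_threads(pid_t pid)
{
    assert(pid >= 0);
    if (pid == 0)
        pid = getpid();

    char path[64];
    snprintf(path, sizeof path, "/proc/%ld/task", (long)pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Each numeric entry is one thread ID; skip "." and "..". */
        if (isdigit((unsigned char)entry->d_name[0]))
            count++;
    }
    closedir(dir);
    return count;
}
```

Calling count_threads(0) from a single-threaded program should report one thread, and the number grows as the program creates more. A thread-aware debugger goes much further than counting, of course, since it must also halt and inspect each thread individually.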
If your program uses the fork() system call to create a new child process, you need a debugger that gives you a separate debugger window for each of the parent and child processes once the call is made. In the absence of fork() support, you will be able to continue to debug the original parent process, but you will not have the same access to the child. With fork() support, you can debug the child process without making changes to your code. Debuggers that also support exec() further allow you to debug programs spawned by your application code.
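The split between parent and child is easiest to see in code. The following is a minimal sketch, not taken from any particular application: the helper run_in_child and the workload add_one are hypothetical names introduced for illustration. After fork() returns, two processes are executing the same function, and a debugger without fork() support follows only the parent's branch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run work(arg) in a child process and return
 * the low 8 bits of its result as seen by the parent, or -1 on failure. */
int run_in_child(int (*work)(int), int arg)
{
    assert(work != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                      /* fork() failed */

    if (pid == 0) {
        /* Child process: a debugger without fork() support
         * never stops on a breakpoint placed here. */
        _exit(work(arg) & 0xff);
    }

    /* Parent process: the only path a fork-unaware debugger follows. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Illustrative workload executed inside the child. */
int add_one(int x)
{
    return x + 1;
}
```

A call such as run_in_child(add_one, 41) runs add_one entirely in the child, so a bug inside it is visible only to a debugger that can attach to or follow the new process. In GDB, for example, the follow-fork-mode and detach-on-fork settings control this behavior; other debuggers expose it differently.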
Run-time error checking is helpful for debugging a thread or process that is exhibiting a system failure. Run-time error checking instruments code so that the debugger will halt the application whenever one of several error conditions is detected. These conditions can include a switch statement fall-through or an out-of-bounds array access. Specialized implementations for detecting memory errors include additional checks for memory leaks, NULL pointer dereferences and writes to unallocated memory.
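Conceptually, instrumenting an array access means wrapping it in a test that fires before the damage is done. The sketch below shows the idea by hand, under stated assumptions: checked_store, report_violation and the bounds_violations counter are hypothetical names, and the counter merely stands in for the point at which a real tool would halt the application in the debugger.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for "halt in the debugger": count violations. */
static int bounds_violations = 0;

static void report_violation(void)
{
    bounds_violations++;
}

/* Roughly what an instrumented `buf[i] = value` might expand to.
 * Returns 0 on success, -1 if the index is out of bounds. */
int checked_store(int *buf, size_t len, size_t i, int value)
{
    assert(buf != NULL);
    if (i >= len) {
        report_violation();     /* error detected before memory is touched */
        return -1;
    }
    buf[i] = value;
    return 0;
}
```

In practice the compiler or a dedicated tool inserts such checks automatically. The -fsanitize=address and -fsanitize=undefined options of GCC and Clang are one widely available example, though their coverage differs from the vendor-specific checks described above.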
In some cases, run-time and memory error checking are sufficient to identify the source of a bug. In other instances, identifying the point at which an error is manifest is just the first step in finding the real root of the problem.
Virtual memory debug
Linux uses a virtual memory model. Applications reference memory through virtual addresses that are translated by a processor's memory management unit into physical memory locations. Even in the Linux kernel, the MMU is enabled and physical memory is accessed through virtual addressing.
Each Linux process runs with its virtual addresses mapped into unique physical addresses. Consequently, one process cannot easily corrupt physical memory that is not assigned to it or explicitly shared. Threads within the same process, however, are not protected from one another, and application code is not protected from erroneous kernel code or device drivers running in kernel mode. Without a virtual-memory-aware debugger, memory corruptions can be incredibly difficult to debug because the resulting failure is often far removed from the offending code. The bug could be in a different thread or process, or in the kernel or device driver. The virtual address used by the erroneous code could be different from the one used by the thread in which the bug was manifest.
With a virtual-memory-aware debugger, a hardware breakpoint can be set on the corrupted virtual address. That breakpoint will then cause any process that writes to the physical location referenced by that virtual address to halt. Using this technique, you can identify the code causing the problem and more easily debug it, even if it is in a device driver, the kernel, a different thread or a different process.
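The fact that different virtual addresses can reference the same physical location can be demonstrated within a single process. The sketch below maps one page of a temporary file twice with MAP_SHARED, so that on Linux both mappings are normally backed by the same physical page. The function name map_page_twice is an assumption made for this example, and unmapping is left to the caller for brevity.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the same page of a temporary file at two
 * different virtual addresses. Returns 0 on success, -1 on failure. */
int map_page_twice(unsigned char **first, unsigned char **second)
{
    assert(first != NULL && second != NULL);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return -1;

    FILE *backing = tmpfile();          /* unnamed temporary file */
    if (backing == NULL)
        return -1;

    int fd = fileno(backing);
    if (ftruncate(fd, page) != 0) {     /* make the file one page long */
        fclose(backing);
        return -1;
    }

    void *a = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    void *b = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    fclose(backing);                    /* mappings remain valid after close */

    if (a == MAP_FAILED || b == MAP_FAILED) {
        if (a != MAP_FAILED) munmap(a, (size_t)page);
        if (b != MAP_FAILED) munmap(b, (size_t)page);
        return -1;
    }

    *first = a;
    *second = b;
    return 0;
}
```

After a successful call, a byte written through first is read back through second even though the two pointers differ. A watch keyed only to one of those virtual addresses may not see a write made through the other, which is why a breakpoint that tracks the underlying physical location is so useful.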
Debugging the kernel
If code in your kernel is the culprit, the ability to trace the kernel is indispensable. Debugging hardware collects the addresses of instructions executed and the data associated with those instructions (provided that your target CPU has on-chip trace capabilities). Debuggers with advanced trace analysis tools can reconstruct this data to provide a high-level view of the system. Using trace data, a view of the system can be obtained that clearly illustrates:
Which functions were called, and by whom;
The call stack at each point during program execution;
Which interrupt handlers were called;
How often interrupt handlers were called; and
Timing anomalies through statistical analysis.
These tools will help you get a high-level understanding of what was happening when your system failed. Of course, since you have a history of every executed instruction, you can also walk through and explore the kernel execution until you pinpoint the problem.
Profiling application code
The same tracing technology that can be applied to kernel debugging is also useful for tracing application code. In the absence of a trace collection device, however, tools that instrument application code provide a workable alternative. Such tools report the number of times each line of code was executed; which code was and was not executed; and what the actual call graph of the application was, all without trace data. Additionally, the Linux kernel provides a statistical profiling capability that takes samples of the program counter on a regular basis. This can be used to analyze the run of a noninstrumented program after the application exits, giving you a general idea of the program's flow of execution and some statistical profiling information.
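To make the idea of instrumentation concrete, the sketch below adds execution counters to a small function by hand. The PROBE macro, the probe_hits table and the clamp function are all hypothetical names chosen for this example; real instrumentation tools insert equivalent counters automatically and cover every line or branch.

```c
#include <assert.h>

/* One counter per instrumented location (names are illustrative). */
enum {
    PROBE_CLAMP_LOW,
    PROBE_CLAMP_HIGH,
    PROBE_CLAMP_PASS,
    PROBE_COUNT
};

static unsigned long probe_hits[PROBE_COUNT];

/* Record one execution of the location identified by `id`. */
#define PROBE(id) (probe_hits[(id)]++)

/* Example function with a probe on each of its three paths. */
int clamp(int value, int lo, int hi)
{
    assert(lo <= hi);
    if (value < lo) {
        PROBE(PROBE_CLAMP_LOW);
        return lo;
    }
    if (value > hi) {
        PROBE(PROBE_CLAMP_HIGH);
        return hi;
    }
    PROBE(PROBE_CLAMP_PASS);
    return value;
}
```

After a test run, a counter that is still zero identifies code that was never executed, while the nonzero counters give per-path execution counts. Tools such as gcov and gprof in the GNU toolchain produce this kind of report without manual edits to the source.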
Device drivers are sometimes implemented as dynamically loaded kernel modules, which are especially difficult to debug, owing to their nonstatic residence in memory. Problems in the setup functions of dynamically loaded kernel modules are common, yet few debuggers can debug them. The ability to debug the initialization of these modules is therefore a key debugger feature.
Shared objects are modules that can be loaded into memory and used by multiple programs at the same time, allowing several programs to use the same code to perform common tasks. The object is only loaded once, thus minimizing memory usage. These features make using shared objects in an embedded system advantageous, but if you're using shared objects, be sure you can debug them, since most debuggers will fail once you've stepped into shared object code.