M0AGX / LB9MG

Amateur radio and embedded systems

Debugging runtime memory corruption on Cortex-M

Runtime memory corruption is one of the worst classes of bugs a C/C++ application can have. I do not mean design problems like abuse of global variables, but seemingly correct code clobbering memory it should never touch (for example due to runaway pointers). Compared to "regular" crashes, which are obvious and much simpler to fix (even if they are rare, they leave a stack trace), memory corruption is often silent. It can go unnoticed for a long period and manifest itself in subtle ways, for example the application sometimes acts weirdly or a particular variable is sometimes wrong. Fortunately, Cortex-M3 and M4 cores are equipped with special hardware that can assist in catching rogue memory accesses.

An obvious approach is of course to use the data watchpoint feature of any decent debugger to catch the code that does the improper write. However, it is not always possible to keep the system running under debugger control on a desk for very long (if the corruption happens very sporadically), or the final device setup cannot be replicated (for example, it depends on a part of the customer's plant... which may be hard to fit into the office). In such cases the firmware itself needs to be instrumented enough to assist in debugging.

Memory protection unit

The MPU is a peripheral that is specifically designed to control access to various areas of the address space. When a protected area is accessed (read and/or write, depending on the MPU configuration) the MPU triggers a fault interrupt whose handler has to decide what to do with the memory protection fault. An MPU is, however, not a universal solution. First of all, it is an optional peripheral, so it may not be present in the MCU you are using. The firmware must also be specifically architected to take advantage of the MPU from day one. MPUs usually work only with larger blocks of memory (e.g. 256 bytes), so it is impractical to protect just a single variable.

If the firmware was not designed with the MPU in mind, restructuring it will change the placement of variables in memory, so instead of the known corruption of some variable(s), other data will be clobbered by the same buggy code. This may also make the initial problem untraceable, because clobbering other variables may lead to rarer or more subtle symptoms.
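To illustrate why a single variable is a poor fit for an MPU region, here is a minimal sketch of the region constraints. It assumes the ARMv7-M style MPU (region size is a power of two of at least 32 bytes, and the base address is aligned to the size); mpu_region_is_valid is a hypothetical helper name, not part of any library.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: checks the ARMv7-M MPU region constraints
 * (assumption: power-of-two size, at least 32 bytes, base aligned to size). */
static inline bool mpu_region_is_valid(uint32_t base, uint32_t size){
    bool power_of_two = (size != 0u) && ((size & (size - 1u)) == 0u);
    return power_of_two && (size >= 32u) && ((base & (size - 1u)) == 0u);
}
```

A lone uint32_t at an arbitrary address fails this check, so the variable would have to be moved into its own aligned block, which is exactly the kind of restructuring that shifts everything else around.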

Data watchpoint and trace unit

Cortex-M3 and M4 cores have a DWT unit. It can be used to set up breakpoints and watchpoints. A watchpoint (also called a data or memory breakpoint) is triggered when a particular address is read or written by the CPU. The code in this article uses 3 of the DWT's address comparators, which allows up to 3 watchpoints in total (the number of comparators actually implemented can be read from the NUMCOMP field of DWT->CTRL). When an address is matched, the DWT simply triggers a DebugMon interrupt. Due to its simplicity, it is the ideal tool to catch memory corruption at a late stage of firmware development.
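How many comparators a particular chip implements is reported by the hardware itself. As a small sketch, assuming the ARMv7-M register layout where NUMCOMP occupies bits [31:28] of DWT_CTRL, the field can be decoded like this. The function and macro names are hypothetical (CMSIS provides its own DWT_CTRL_NUMCOMP_Pos and DWT_CTRL_NUMCOMP_Msk); on the target you would pass in DWT->CTRL.

```c
#include <stdint.h>

/* Local, hypothetical names for the NUMCOMP field, bits [31:28] of DWT_CTRL */
#define NUMCOMP_SHIFT 28u
#define NUMCOMP_MASK  (0xFul << NUMCOMP_SHIFT)

/* Returns the number of DWT comparators encoded in a DWT_CTRL value */
static inline uint8_t dwt_numcomp_from_ctrl(uint32_t dwt_ctrl){
    return (uint8_t)((dwt_ctrl & NUMCOMP_MASK) >> NUMCOMP_SHIFT);
}
```

Checking this once at startup avoids programming a comparator that does not exist.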

Setting up watchpoints at runtime

The following code enables and disables a watchpoint for a uint32_t-type variable (i.e. the address is matched across all 32 bits), so it is useful for variables that occupy at least 4 bytes and are 4-byte aligned, for example a uint32_t, or an array or struct of uint32_t. It is best to check the address of the variable (and adjacent variables) in the map file or debugger to be sure that the watchpoints will not be triggered when accessing adjacent variables.

/* Needs <stdint.h> and the CMSIS device header of your MCU,
 * which provides CoreDebug, DWT and __BKPT(). */
void watchpoint_enable(uint8_t watchpoint_index, uint32_t *word_address){
    /* Set the bits without clobbering other DEMCR settings */
    CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk /*enable tracing*/ |
            CoreDebug_DEMCR_MON_EN_Msk /*enable debug interrupt*/;

    switch (watchpoint_index){
    case 0:
        DWT->COMP0 = (uint32_t)word_address;
        DWT->MASK0 = 0; //match all comparator bits, don't ignore any
        DWT->FUNCTION0 = (1 << 11)/*DATAVSIZE = 0b10 (word)*/
                    | (1 << 1) | (1 << 2)/*generate a watchpoint event on write*/;
        break;
    case 1:
        DWT->COMP1 = (uint32_t)word_address;
        DWT->MASK1 = 0; //match all comparator bits, don't ignore any
        DWT->FUNCTION1 = (1 << 11)/*DATAVSIZE = 0b10 (word)*/
                    | (1 << 1) | (1 << 2)/*generate a watchpoint event on write*/;
        break;
    case 2:
        DWT->COMP2 = (uint32_t)word_address;
        DWT->MASK2 = 0; //match all comparator bits, don't ignore any
        DWT->FUNCTION2 = (1 << 11)/*DATAVSIZE = 0b10 (word)*/
                    | (1 << 1) | (1 << 2)/*generate a watchpoint event on write*/;
        break;
    default:
        __BKPT(0); //there are only 3 hardware watchpoints!
    }
}

void watchpoint_disable(uint8_t watchpoint_index){
    switch (watchpoint_index){
    case 0:
        DWT->FUNCTION0 = 0; //disable everything
        break;
    case 1:
        DWT->FUNCTION1 = 0; //disable everything
        break;
    case 2:
        DWT->FUNCTION2 = 0; //disable everything
        break;
    default:
        __BKPT(0); //there are only 3 hardware watchpoints!
    }
}
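Since the code above assumes a 4-byte aligned variable of at least 4 bytes, it may be worth checking that assumption before arming a watchpoint. A minimal sketch (watchable_as_word is a hypothetical helper, not part of CMSIS):

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if the object starts on a 4-byte boundary
 * and is at least one 32-bit word long. */
static inline bool watchable_as_word(const void *address, size_t size){
    return (((uintptr_t)address & 0x3u) == 0u) && (size >= sizeof(uint32_t));
}
```

It could be called as watchable_as_word(&variable, sizeof variable) right before watchpoint_enable(), with a __BKPT(0) if the check fails.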

The handler

The handler is "just" a regular interrupt handler based on this very good hard fault handler. Whenever the watchpoint is hit all CPU state is available to be saved and analyzed later on. The code should be extended to save the debugging breadcrumbs, make the device safe and reboot.

void DebugMon_HandlerC(unsigned long *args){
    /* Attribute unused is to silence compiler warnings,
     * the variables are placed here so they can be inspected
     * by the debugger.
     */
    volatile unsigned long __attribute__((unused)) stacked_r0  = ((unsigned long)args[0]);
    volatile unsigned long __attribute__((unused)) stacked_r1  = ((unsigned long)args[1]);
    volatile unsigned long __attribute__((unused)) stacked_r2  = ((unsigned long)args[2]);
    volatile unsigned long __attribute__((unused)) stacked_r3  = ((unsigned long)args[3]);
    volatile unsigned long __attribute__((unused)) stacked_r12 = ((unsigned long)args[4]);
    volatile unsigned long __attribute__((unused)) stacked_lr  = ((unsigned long)args[5]);
    volatile unsigned long __attribute__((unused)) stacked_pc  = ((unsigned long)args[6]);
    volatile unsigned long __attribute__((unused)) stacked_psr = ((unsigned long)args[7]);

    // Configurable Fault Status Register
    // Consists of MMSR, BFSR and UFSR
    volatile unsigned long __attribute__((unused)) _CFSR = (*((volatile unsigned long *)(0xE000ED28))) ;

    // Hard Fault Status Register
    volatile unsigned long __attribute__((unused)) _HFSR = (*((volatile unsigned long *)(0xE000ED2C))) ;

    // Debug Fault Status Register
    volatile unsigned long __attribute__((unused)) _DFSR = (*((volatile unsigned long *)(0xE000ED30))) ;

    // Auxiliary Fault Status Register
    volatile unsigned long __attribute__((unused)) _AFSR = (*((volatile unsigned long *)(0xE000ED3C))) ;

    // Read the Fault Address Registers. These may not contain valid values.
    // Check BFARVALID/MMARVALID to see if they are valid values
    // MemManage Fault Address Register
    volatile unsigned long __attribute__((unused)) _MMAR = (*((volatile unsigned long *)(0xE000ED34))) ;
    // Bus Fault Address Register
    volatile unsigned long __attribute__((unused)) _BFAR = (*((volatile unsigned long *)(0xE000ED38))) ;

    volatile uint8_t __attribute__((unused)) watchpoint_number = 0xFF; //0xFF = no comparator reported a match
    if (DWT->FUNCTION0 & DWT_FUNCTION_MATCHED_Msk){
        watchpoint_number = 0;
    } else if (DWT->FUNCTION1 & DWT_FUNCTION_MATCHED_Msk){
        watchpoint_number = 1;
    } else if (DWT->FUNCTION2 & DWT_FUNCTION_MATCHED_Msk){
        watchpoint_number = 2;
    }

    __BKPT(0); //data watchpoint!
    //TODO: save data to debugging breadcrumbs and reboot
}

extern void DebugMon_Handler(void);
__attribute__((naked)) void DebugMon_Handler(void){
    __asm volatile (
            " movs r0,#4       \n"
            " movs r1, lr      \n"
            " tst r0, r1       \n"
            " beq _MSP2         \n"
            " mrs r0, psp      \n"
            " b _HALT2          \n"
            "_MSP2:               \n"
            " mrs r0, msp      \n"
            "_HALT2:              \n"
            " ldr r1,[r0,#20]  \n"
            " b DebugMon_HandlerC \n"
    );
}
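The TODO in the handler could be filled in many ways. Here is one minimal sketch of a breadcrumb record, under the assumption that it lives in a RAM area that is not zeroed at startup (how to arrange that depends on your linker script and toolchain, so it is not shown). All names here are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define BREADCRUMB_MAGIC 0xB4EADC4Bul /* arbitrary marker value */

typedef struct {
    uint32_t magic;             /* marks the record as written by the handler */
    uint32_t pc;                /* stacked_pc */
    uint32_t lr;                /* stacked_lr */
    uint32_t watchpoint_number; /* which comparator matched */
    uint32_t check;             /* XOR of the four fields above */
} breadcrumbs_t;

static uint32_t breadcrumbs_check(const volatile breadcrumbs_t *b){
    return b->magic ^ b->pc ^ b->lr ^ b->watchpoint_number;
}

/* Called from the fault handler just before rebooting */
static void breadcrumbs_save(volatile breadcrumbs_t *b, uint32_t pc,
                             uint32_t lr, uint32_t watchpoint_number){
    b->magic = BREADCRUMB_MAGIC;
    b->pc = pc;
    b->lr = lr;
    b->watchpoint_number = watchpoint_number;
    b->check = breadcrumbs_check(b);
}

/* Called after reboot to see if a previous run left a usable record */
static bool breadcrumbs_valid(const volatile breadcrumbs_t *b){
    return (b->magic == BREADCRUMB_MAGIC) && (b->check == breadcrumbs_check(b));
}
```

In DebugMon_HandlerC the __BKPT(0) could then be replaced by a call to breadcrumbs_save() followed by the CMSIS NVIC_SystemReset(). After the reboot the application checks breadcrumbs_valid() and reports the record over whatever channel is available.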

How to use it?

The basic pattern is:

watchpoint_disable(0);
my_clobbered_variable = something * 5; //legal operation
watchpoint_enable(0, &my_clobbered_variable);

Simple enough: allow write access to the variable only in the intended place(s). Now whenever the variable is written outside this section, the DebugMon interrupt will be triggered. The section that allows modification of the variable should be as short as possible to minimize the window in which the variable can be written unnoticed. If a section is too long, an interrupt or another RTOS task can clobber the variable without triggering the handler.

There can be multiple such sections for a single variable. However, there are only 3 hardware watchpoints, so you have to use different watchpoint numbers for different variables, and of course only 3 variables can be protected this way.
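One way to keep the writable window short and uniform is to wrap the pattern in a helper. The sketch below is only an illustration: guarded_write_u32 is a hypothetical name, and the two watchpoint functions are replaced by stand-ins that record the call order, so that the snippet compiles on its own.

```c
#include <stdint.h>

/* Stand-ins so that this sketch compiles on its own. They only record the
 * order of calls; on the target remove them and use the real
 * watchpoint_enable()/watchpoint_disable() from above. */
static char call_log[4];
static unsigned call_count;

static void log_call(char c){
    if (call_count < sizeof(call_log)){
        call_log[call_count++] = c;
    }
}

void watchpoint_disable(uint8_t watchpoint_index){
    (void)watchpoint_index;
    log_call('D');
}

void watchpoint_enable(uint8_t watchpoint_index, uint32_t *word_address){
    (void)watchpoint_index;
    (void)word_address;
    log_call('E');
}

/* Hypothetical helper: the only sanctioned way to write a protected word */
static inline void guarded_write_u32(uint8_t watchpoint_index,
                                     volatile uint32_t *variable,
                                     uint32_t value){
    watchpoint_disable(watchpoint_index);
    *variable = value; /* the unprotected window contains just this store */
    watchpoint_enable(watchpoint_index, (uint32_t *)variable);
}
```

With such a helper, every legal write looks like guarded_write_u32(0, &my_clobbered_variable, something * 5), and any calculation happens before the watchpoint is disabled.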

Example

#include <stdint.h>
#include <string.h>

typedef struct {
    char string[16];
    uint32_t my_variable; //this sometimes has an invalid value
} my_struct_t;

my_struct_t m;

void some_legal_operation(uint32_t x){
    watchpoint_disable(0);
    m.my_variable = x;
    watchpoint_enable(0, &m.my_variable);
}

void evil_copy(const char *s){
    //if a string is too long it will hit my_variable
    strcpy(m.string, s);
}

I used a struct to ensure the order of variables in memory. The legal operation unlocks the variable, does its job and locks it again. Another function copies a string but does not check the length. If the string is too long, the copy will overrun into the protected variable, which triggers the fault handler.
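Once the watchpoint has pointed at evil_copy(), the fix itself is usually the boring part. For completeness, a bounded version could look like the sketch below (bounded_copy is a hypothetical replacement, and silent truncation is an assumption about what the application wants).

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    char string[16];
    uint32_t my_variable;
} my_struct_t;

static my_struct_t m;

/* Hypothetical replacement for evil_copy(): copies at most 15 characters
 * and always terminates the destination. */
static void bounded_copy(const char *s){
    size_t n = strlen(s);
    if (n >= sizeof(m.string)){
        n = sizeof(m.string) - 1u; /* truncate instead of overrunning */
    }
    memcpy(m.string, s, n);
    m.string[n] = '\0';
}
```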

That was the easy part...

How to analyze the breadcrumbs?

The relevant variables in the fault handler are:

  • program counter (stacked_pc)
  • link register (stacked_lr)
  • watchpoint number

The watchpoint number is of course needed to know which variable is affected. The program counter tells which instruction tried to access the protected variable. What if it points to functions like memcpy, memset, strcpy, strncpy that are used all over the place? The link register tells where that function was called from.

The program counter (and link register) can be mapped back to source code via a disassembly file. Most debuggers (for example Ozone) can also show the disassembly view.
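One detail when looking up stacked_lr in a disassembly: on Cortex-M, a return address held in LR has bit 0 set to mark Thumb state, so the raw value is one higher than the address printed in the listing. Assuming stacked_lr still holds a return address, a tiny helper (hypothetical name) clears that bit:

```c
#include <stdint.h>

/* Hypothetical helper: clears bit 0 (Thumb state marker) so the value
 * matches the instruction addresses shown in a disassembly listing. */
static inline uint32_t thumb_code_address(uint32_t register_value){
    return register_value & ~(uint32_t)1u;
}
```

The resulting address is the instruction after the call, so the call itself is the instruction just before it in the listing.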

A more realistic example

Let's assume a simple data acquisition application that has the following features:

  1. ADC task that samples data
  2. Processing task that does some calculations on the ADC data
  3. UART task that allows reading the calculated values
  4. User interface task controlling a display and a keypad

The user reports that sometimes the displayed data is wrong and does not make physical sense (like 500% humidity or a temperature below absolute zero). You make a firmware release that has the particular variable protected with watchpoints. After 2 weeks of uptime it turns out that one of the variables in the processing code gets clobbered by UART code. For example, the UART stores received bytes not in its buffer but somewhere in the processing task's data.

If the UART code is obviously bad, then the fix is easy and you are "lucky". But what if the UART code is correct? For example, if it stores received bytes via a pointer, this can only mean that the UART variables are clobbered by something else. So you have to release yet another firmware that has the UART variables protected this time.

In the end it may turn out that, for example, the UI code was bad: it clobbered UART driver variables when a particular sequence of menus was entered too fast, which in turn led to good UART code destroying the measurements (and at the same time the UART was running correctly but with its buffers located in the wrong place).

In large and complex firmware projects such a chain of events can have several links, so the best approach is to move the watchpoint instrumentation from the obvious symptoms up to the root cause.