HardFault from printf - freertos 9.0.0 / stm32f0 / gcc-arm-eabi 7.2.0

zzattack wrote on Tuesday, November 20, 2018:

On my STM32F091 project I’m encountering a HardFault roughly once a week. I’m looking for help debugging it, as I haven’t been able to find the source yet.

The program jumps to the HardFault handler during a printf call; this is evident from the serial port debug output. When the HardFault occurs, the line
     sync_printf("State changed from %d -> %d\n", cur_state, new_state);
is interrupted partway through, and the output is
     State changed from HardFault_HandlerC():

For context, all my printf calls go through a mutex-protected wrapper, so that debug output from different tasks stays coherent:

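 #include <stdarg.h>
 #include "cmsis_os.h"   /* CMSIS-RTOS v1 API: osMutexWait(), osMutexRelease() */

 /* pf_lock is an osMutexId created elsewhere with osMutexCreate().
    print() is the formatter from printf-stdarg.c; a null output pointer
    makes it write to the console instead of into a string buffer. */
 int sync_printf(const char* format, ...) {
     va_list args;
     va_start(args, format);
     int v = 0;
     /* Serialize output so lines from different tasks do not interleave. */
     if (osMutexWait(pf_lock, osWaitForever) == osOK) {
         v += print(0, format, args);
         osMutexRelease(pf_lock);
     }
     va_end(args);
     return v;
 }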

I don’t believe this causes the problem, but figured it was worth mentioning.

Things I have tried so far to find the root cause:

  • switched to a different, malloc-free printf implementation (printf-stdarg, shown here); the HardFault happens in exactly the same place with newlib-nano’s printf
  • set configCHECK_FOR_STACK_OVERFLOW=2 and configUSE_MALLOC_FAILED_HOOK=1, and defined configASSERT
  • defined the vApplicationStackOverflowHook and vApplicationMallocFailedHook hooks; neither is ever called
  • increased my tasks’ stack sizes significantly to rule out a stack overflow
  • using heap scheme 4 (heap_4.c), so the C library’s malloc/free are not used
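For reference, the settings in the list above correspond to FreeRTOSConfig.h entries along the following lines. This is a sketch, not the poster’s actual file; the configASSERT() body is the common halt-in-place definition from the FreeRTOS documentation. With these settings the application must also provide vApplicationStackOverflowHook() and vApplicationMallocFailedHook().

```c
/* FreeRTOSConfig.h (excerpt, illustrative) */

/* Method 2: in addition to method 1's stack pointer check, verify at each
   context switch that the last 16 bytes of the task stack still hold the
   fill pattern written when the task was created. */
#define configCHECK_FOR_STACK_OVERFLOW  2

/* Call vApplicationMallocFailedHook() whenever pvPortMalloc() returns NULL. */
#define configUSE_MALLOC_FAILED_HOOK    1

/* Halt with interrupts disabled so a debugger can inspect the failed assert. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
```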

Here’s the register information from the stack when the HardFault occurs. I’m unsure how to interpret it; my knowledge of the architecture is minimal.

     stacked_r0: 0x00000020
     stacked_r1: 0x20006fe3
     stacked_r2: 0x20004405
     stacked_r3: 0x20004401
     stacked_r12: 0x36c398e5
     stacked_lr: 0x08005fa1
     stacked_pc: 0x08011a56
     stacked_psr: 0x01000000
----------------------------
         CFSR: 0x00000000
         HFSR: 0x00000000
         DFSR: 0x00000000
         AFSR: 0x00000000
         MMAR: 0x00000000
         BFAR: 0x00000000

The stacked_pc, 0x08011a56, points inside the ‘print’ function. The relevant part of the disassembly:

             if (*format == 'd') {
 0x08011a48 BB 68                ldr r3, [r7, #8]
 0x08011a4a 1B 78                ldrb r3, [r3, #0]
 0x08011a4c 64 2B                cmp r3, #100    ; 0x64
 0x08011a4e 0F D1                bne.n 0x8011a70 <print+312>
                 pc += printi(out, va_arg(args, int), 10, 1, width, pad, 'a');
 0x08011a50 7B 68                ldr r3, [r7, #4]
 0x08011a52 1A 1D                adds r2, r3, #4
 0x08011a54 7A 60                str r2, [r7, #4]
 0x08011a56 19 68                ldr r1, [r3, #0]
 0x08011a58 F8 68                ldr r0, [r7, #12]
 0x08011a5a 61 23                movs r3, #97    ; 0x61
 0x08011a5c 02 93                str r3, [sp, #8]
 0x08011a5e 01 96                str r6, [sp, #4]
 0x08011a60 00 95                str r5, [sp, #0]
 0x08011a62 01 23                movs r3, #1
 0x08011a64 0A 22                movs r2, #10
 0x08011a66 FF F7 EB FE          bl 0x8011840 <printi>
 0x08011a6a 03 00                movs r3, r0
 0x08011a6c E4 18                adds r4, r4, r3
                 continue;
 0x08011a6e 5F E0                b.n 0x8011b30 <print+504>
             }

The printing functions are taken from https://github.com/atgreen/FreeRTOS/blob/master/Demo/CORTEX_STM32F103_Primer_GCC/printf-stdarg.c

Any guidance on finding the root cause or obtaining more information in order to be able to do so is highly appreciated.

Frank

rtel wrote on Tuesday, November 20, 2018:

The instruction at the stacked program counter, 0x08011a56, is the
following (assuming that is the offending instruction; it might be the
instruction before it that generates the fault):

ldr r1, [r3, #0]

This instruction loads a 32-bit value from the address held in r3.
r3 in turn was loaded from [r7, #4], a 4-byte offset into the stack
frame (instruction at 0x08011a50). That address, 0x20004401, is not
aligned. Is that expected? I’m not sure whether that alone would cause
the fault (I can’t remember, but the ARM V7-M manual will tell you), but
it looks odd to load a 32-bit value from an odd (as opposed to even) address.

zzattack wrote on Tuesday, November 20, 2018:

LDR/STR instructions support unaligned accesses (but generate a UsageFault if CCR.UNALIGN_TRP is set, which in my case it isn’t).
In this case, the value to be printed is an enum value (a C++ enum class with no explicit underlying type) with global scope. So the value being read from the variadic argument list is that one, unless it was copied onto the stack during the printf call. Either way, its location is 4-byte aligned, so I’m not sure this could be the issue here.