Cortex M7 Hard Fault Handler - FreeRTOS Aware

Hi All,

We all know how hard it can be to track down the cause of hard faults, particularly random or intermittent ones that occur when the debugger isn’t connected, and particularly when using FreeRTOS in a complex system with many tasks, timers and queues. To help, I have written a hard fault handler that prints all the MCU core registers, the floating point registers (if used), the special registers, and the FreeRTOS task that was running when the hard fault occurred. It also provides memory ranges against which the program counter, link register and stack addresses can be checked, to tell whether an address is definitely invalid.

I have checked it on a range of fault types and it seems to work OK. I’m not an ARM assembler expert; in fact my assembler knowledge is probably the equivalent of Donald Trump’s knowledge of tariffs, so I would appreciate it if the assembler/MCU experts could critique it and suggest fixes and mods.

I have also written a small module that generates hard faults for testing purposes. To ease the use of this handler I have provided all the files required including my startup, interrupt handlers and linker script in the attached zip file.

The output of the handler from TeraTerm is:

==============================

***** HardFault Occurred *****
Link Reg Value (Lockup Addr): 0xFFFFFFED
Stack frame: PSP Process Stack Ptr (Thread mode)

Fault status registers:
HFSR (HardFault Status):          0x00000000
CFSR (Configurable Fault Status): 0x00000082
MMFAR (MemManage Fault Addr):     0x00000000
BFAR (BusFault Addr):             0x00000000
AFSR (Auxiliary Fault Status):    0x00000000

MemManage Fault:
  - MMFAR valid (0x00000000)
  - Data access violation

HardFault details:
R0 : 0x00000006
R1 : 0x00001F40
R2 : 0xDEADBEEF
R3 : 0x00000000
R12: 0xDD6CE856
Stacked PC:  0x900441A2
Stacked PSR: 0x41000000
Stacked LR:  0x900442DD

Special registers:
CONTROL: 0x00000000
PRIMASK: 0x00000000
BASEPRI: 0x00000000
FAULTMASK: 0x00000000

FPU register dump:
FPSCR: 0x00000000
S0-S1:  0x00000000 0x00000000
S2-S3:  0x00000000 0x00000000
S4-S5:  0x00000000 0x00000000
S6-S7:  0x00000000 0x00000000
S8-S9:  0x00000000 0x00000000
S10-S11: 0x00000000 0x3F800000
S12-S13: 0x4A989680 0x40000000
S14-S15: 0x4E64E1C0 0x1C9C3800

Double precision registers:
D0:  0.000000 (0x000000000000000lX)
D1:  0.000000 (0x000000000000000lX)
D2:  0.000000 (0x000000000000000lX)
D3:  0.000000 (0x000000000000000lX)
D4:  0.000000 (0x000000000000000lX)
D5:  0.007812 (0x000000000000000lX)
D6:  2.000001 (0x000000000000000lX)
D7:  0.000000 (0x000000000000000lX)
----------------------------

freeRTOS Task Status:
----------------------------
Task Name: monitorTask
Task State: 0
Task Priority: 24
Task Stack High Water Mark: 77
Task Handle Addr: 2405e5e0
==== Hard Fault Report End ====
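
The CFSR value in a report like the one above can be decoded by hand or in code. The following is a minimal sketch, not the code from the attached zip: it maps the CFSR bit positions defined for ARMv7-M to their architectural names. The function name `cfsr_decode` and the table layout are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One CFSR flag: its bit mask and its architectural name. */
typedef struct {
    uint32_t mask;
    const char *name;
} cfsr_flag_t;

static const cfsr_flag_t cfsr_flags[] = {
    /* MemManage Fault Status (bits 7:0) */
    { 1u << 0,  "IACCVIOL" },    { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },   { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },     { 1u << 7,  "MMARVALID" },
    /* BusFault Status (bits 15:8) */
    { 1u << 8,  "IBUSERR" },     { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" }, { 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },      { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },
    /* UsageFault Status (bits 31:16) */
    { 1u << 16, "UNDEFINSTR" },  { 1u << 17, "INVSTATE" },
    { 1u << 18, "INVPC" },       { 1u << 19, "NOCP" },
    { 1u << 24, "UNALIGNED" },   { 1u << 25, "DIVBYZERO" },
};

/* Writes the names of all set flags into out, separated by spaces.
 * Returns the string length written; stops early if out is too small. */
size_t cfsr_decode(uint32_t cfsr, char *out, size_t out_size)
{
    size_t used = 0;

    if (out_size == 0) {
        return 0;
    }
    out[0] = '\0';

    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if ((cfsr & cfsr_flags[i].mask) == 0) {
            continue;
        }
        size_t len = strlen(cfsr_flags[i].name);
        size_t need = len + (used ? 1u : 0u);   /* name plus separator */
        if (used + need + 1u > out_size) {
            break;                              /* no room left */
        }
        if (used) {
            out[used++] = ' ';
        }
        memcpy(out + used, cfsr_flags[i].name, len + 1u);
        used += len;
    }
    return used;
}
```

For the CFSR value 0x00000082 in the report, this yields DACCVIOL and MMARVALID, which matches the "Data access violation" and "MMFAR valid" lines.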

Info to use the handler may be found in the comment block at the start of HardFault_HandlerFreeRTOS.c

I hope people find this useful.
HardFaultHandlerM7_V1.0.zip (17.1 KB)

Thanks for the work and sharing, Rob, very much appreciated!

As I am sure you know, however, the faulting information may or may not be helpful. In general, a hard fault is a symptom, not a cause, so the original problem that caused the memory corruption has typically taken place hundreds to thousands of cycles before the fault, frequently in a processor context completely unrelated to the currently executing task.

The other thing (as discussed before) is that printing output requires at least a minimally functioning system as well as a valid stack in which the output is to occur. That may not be a valid assumption in all faulting scenarios.

So do not put too high hopes in your tool. It is certainly preferable to no diagnostics at all (and also not too prone to the Heisenberg effect as the damage is already done when it kicks in), but it will not revolutionize development for RTOS applications.

Yes, I agree. However, my experience using the FreeRTOS-aware functions in the CubeIDE has been that once I have found that the hard fault correlates precisely with the running task, I am well over halfway to finding the problem. The limitation of the IDE's FreeRTOS features, of course, is that you must be debugging to use this information.

I also avoid using heap memory for anything other than a TouchGFX GUI, and I use the FreeRTOS features of the CubeIDE to report stack usage and percentage run time for each task. Before CubeIDE had this feature, tracking stack usage had to be done by rummaging through memory, looking for the magic number.

One of my aims in writing the handler was to provide the information in a way that makes sense of the data. So I report the stack pointer in use as either “PSP Process Stack Ptr (Thread mode)” or “MSP Main Stack Ptr (Handler mode)”.
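
One way to pick between those two labels is to look at the EXC_RETURN value the core loads into LR on exception entry. The sketch below is an illustration rather than the handler's actual code; the type and function names are hypothetical. It assumes an ARMv7-M core with an FPU, where bit 2 selects the stack pointer that holds the stacked frame, bit 3 indicates a return to Thread mode, and bit 4 is clear when floating point state was also stacked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an EXC_RETURN value (hypothetical type for this sketch). */
typedef struct {
    bool frame_on_psp;      /* bit 2 set: frame was pushed on the PSP  */
    bool return_to_thread;  /* bit 3 set: exception returns to Thread  */
    bool fp_state_stacked;  /* bit 4 clear: extended (FPU) frame used  */
} exc_return_info_t;

exc_return_info_t exc_return_decode(uint32_t exc_return)
{
    exc_return_info_t info;

    info.frame_on_psp     = (exc_return & (1u << 2)) != 0;
    info.return_to_thread = (exc_return & (1u << 3)) != 0;
    info.fp_state_stacked = (exc_return & (1u << 4)) == 0;
    return info;
}

/* Picks the label text based only on which stack pointer holds the frame. */
const char *exc_return_stack_label(uint32_t exc_return)
{
    return exc_return_decode(exc_return).frame_on_psp
               ? "PSP Process Stack Ptr (Thread mode)"
               : "MSP Main Stack Ptr (Handler mode)";
}
```

The 0xFFFFFFED printed at the top of the report looks like such an EXC_RETURN value. Decoded this way, it gives a PSP frame, a return to Thread mode, and stacked floating point state.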

Unless you do a lot of assembly language programming you simply forget what the acronyms mean. Setting out the memory regions also makes memory violations quicker to interpret. ARM’s cryptic architecture and programming manuals fail to shed much light on what they mean in many cases. For instance, if anyone can tell me what the Link Register LOCKUP value means and why it is called this, I would be very grateful.

My approach to finding hard faults has been to (not necessarily in this order):

  • Work out whether it is a task problem
  • Do exclusions of code to see if removal gets rid of the hard fault,
  • Exclude a task from starting,
  • Check task and interrupt priorities to see if one task is having side effects on another,
  • Do timings with a scope using digital outputs,
  • Single stepping,
  • Look for silly things like not initializing a task, semaphore or whatever,
  • Forgetting to put some wait-type function in a task loop (vTaskDelay, taking a semaphore, waiting on a task notification, etc.),
  • Hitting an assert function’s while(1) loop. The CubeIDE debugger often disconnects on these for some reason. FreeRTOS asserts are a particular problem as they kill the system and tell you nothing about the problem, and they invariably kill the debugger, leaving no way of knowing what went wrong. I wish there was an easy way of providing a custom assert that would report file and line number.

Since you provide the definition for configASSERT, you can easily do that. The main issue is that you need some way to output information when FreeRTOS isn’t running.

My FreeRTOSConfig.h defines configASSERT as follows (and I think this is the default for FreeRTOS):

#ifdef NDEBUG
#define configASSERT(x)
#else
extern void vAssertCalled(char const* file, unsigned long line);
#define configASSERT( x )                   \
    if( ( x ) == 0 )                        \
    {                                       \
        vAssertCalled(__FILE__, __LINE__);  \
    }
#endif

and vAssertCalled is defined to disable interrupts and then print the values of file and line after resetting the serial driver to work in “system crashed” mode, which means it polls directly for the flag indicating that the transmitter has room for data.
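
A minimal sketch of that kind of crash-mode reporting follows. It is not the poster's actual code: `crash_uart_tx_ready`, `crash_uart_write` and `crash_report_assert` are hypothetical names, and the two UART hooks are stubbed to write into a RAM buffer so the example can be compiled on a host. On a target they would test the transmitter flag and write the data register directly. The line number is converted by hand so that nothing in the path depends on printf or the heap.

```c
#include <stddef.h>

/* Host-side stand-in for the UART: bytes land in this buffer. */
char crash_log[128];
static size_t crash_log_len;

/* Hypothetical hook. On target: return nonzero when the TX register has room. */
static int crash_uart_tx_ready(void)
{
    return 1;
}

/* Hypothetical hook. On target: write c to the UART data register. */
static void crash_uart_write(char c)
{
    if (crash_log_len < sizeof crash_log - 1u) {
        crash_log[crash_log_len++] = c;
        crash_log[crash_log_len] = '\0';
    }
}

/* Polled transmit of one character: no interrupts, no RTOS calls. */
static void crash_putc(char c)
{
    while (!crash_uart_tx_ready()) {
        /* busy-wait for room in the transmitter */
    }
    crash_uart_write(c);
}

static void crash_puts(const char *s)
{
    while (*s != '\0') {
        crash_putc(*s++);
    }
}

/* Prints an unsigned value in decimal without using printf. */
static void crash_put_ulong(unsigned long value)
{
    char digits[24];
    size_t n = 0;

    do {
        digits[n++] = (char)('0' + (value % 10u));
        value /= 10u;
    } while (value != 0u);

    while (n > 0u) {
        crash_putc(digits[--n]);
    }
}

/* Emits "ASSERT <file>:<line>" followed by CR LF. */
void crash_report_assert(const char *file, unsigned long line)
{
    crash_puts("ASSERT ");
    crash_puts(file);
    crash_putc(':');
    crash_put_ulong(line);
    crash_puts("\r\n");
}
```

A vAssertCalled built on this would disable interrupts first, call crash_report_assert(file, line), and then spin in a loop so the state can be inspected.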

I also try to have a “debugging mode” command implemented somewhere that gets the system status with uxTaskGetSystemState and prints it out, which is a good way to monitor stack usage.

Thank you for sharing! I looked at the assembly pieces of your code and they look good.