We all know how hard it often is to track down the cause of hard faults, particularly random or intermittent ones, when the debugger isn’t connected, and especially when using FreeRTOS in a complex system with many tasks, timers and queues. To help, I have written a hard fault handler that prints all the MCU registers, the floating-point registers (if used), the special registers and the FreeRTOS task that was running when the hard fault occurred. It also prints the valid memory ranges, so you can check whether a program counter, link register or stack address is definitely invalid.
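To illustrate what sits at the core of such a handler, here is a minimal sketch (not the code from the attached zip) of the common Cortex-M3/M4/M7 pattern. A naked assembly stub selects MSP or PSP from bit 2 of the EXC_RETURN value in LR and passes that pointer to C, which copies out the eight registers the hardware stacked. hard_fault_report() is a hypothetical function you would implement to print the values. The stub is only compiled for ARMv7-M targets, so the frame decoding can also be tried on a PC.

```c
#include <stdint.h>

/* The eight words every ARMv7-M core pushes on exception entry (basic frame).
   Their order and offsets are fixed by the architecture. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* LR of the interrupted code */
    uint32_t pc;    /* stacked return address; the faulting instruction for precise faults */
    uint32_t xpsr;
} stacked_frame_t;

/* sp is the MSP or PSP value on handler entry, i.e. it points at stacked R0. */
static void read_stacked_frame(const uint32_t *sp, stacked_frame_t *out)
{
    out->r0   = sp[0];
    out->r1   = sp[1];
    out->r2   = sp[2];
    out->r3   = sp[3];
    out->r12  = sp[4];
    out->lr   = sp[5];
    out->pc   = sp[6];
    out->xpsr = sp[7];
}

#if defined(__ARM_ARCH_7M__) || defined(__ARM_ARCH_7EM__)
/* HYPOTHETICAL reporting function. It must never return: returning would
   resume the faulting code and fault again. */
void hard_fault_report(const uint32_t *sp, uint32_t exc_return);

/* Naked stub: no compiler prologue, so MSP/PSP still point at the frame. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst   lr, #4            \n" /* EXC_RETURN bit 2 selects the stack */
        "ite   eq                \n"
        "mrseq r0, msp           \n" /* bit 2 clear: frame is on MSP */
        "mrsne r0, psp           \n" /* bit 2 set: frame is on PSP (a task) */
        "mov   r1, lr            \n" /* EXC_RETURN as the second argument */
        "b     hard_fault_report \n");
}
#endif
```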
I have tested it on a range of fault types and it seems to work OK. I’m not an ARM assembler expert; in fact, my assembler knowledge is probably the equivalent of Donald Trump’s knowledge of tariffs, so I would appreciate it if the assembler/MCU experts could critique it and suggest fixes and modifications.
I have also written a small module that generates hard faults for testing purposes. To make the handler easier to use, I have provided all the required files, including my startup code, interrupt handlers and linker script, in the attached zip file.
Thanks for the work and sharing, Rob, very much appreciated!
As I am sure you know, however, the fault information may or may not be helpful. In general, a hard fault is a symptom, not a cause: the original problem that caused the memory corruption typically took place hundreds to thousands of cycles before the fault, frequently in a processor context completely unrelated to the currently executing task.
The other thing (as discussed before) is that printing output requires at least a minimally functioning system as well as a valid stack in which the output is to occur. That may not be a valid assumption in all faulting scenarios.
So do not pin too many hopes on your tool. It is certainly preferable to no diagnostics at all (and not very prone to the Heisenberg effect, since the damage is already done by the time it kicks in), but it will not revolutionize development for RTOS applications.
Yes, I agree. However, my experience with the FreeRTOS-aware features of CubeIDE has been that once I have found that the hard fault correlates precisely with the running task, I am well over halfway to finding the problem. The limitation of the IDE’s FreeRTOS features, of course, is that you must be debugging to use this information.
I also avoid using heap memory for anything other than a TouchGFX GUI, and I use the FreeRTOS features of CubeIDE to report stack usage and percentage run time for each task. Before CubeIDE had this feature, tracking stack usage meant rummaging through memory looking for the magic number.
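The magic-number approach is easy to reproduce without the IDE. With the relevant options enabled (for example configCHECK_FOR_STACK_OVERFLOW set to 2), FreeRTOS fills each new task stack with 0xA5, and on a downward-growing stack the bytes at the low end that still hold that value are space the task has never used. The sketch below counts them from a pointer to the lowest stack address. On a live system uxTaskGetStackHighWaterMark() (with INCLUDE_uxTaskGetStackHighWaterMark set to 1) does the same job and reports the result in words.

```c
#include <stddef.h>
#include <stdint.h>

/* FreeRTOS fills new task stacks with this byte (tskSTACK_FILL_BYTE in tasks.c). */
#define STACK_FILL_BYTE 0xA5u

/* stack_base is the LOWEST address of the stack. Cortex-M stacks grow down,
   so untouched fill bytes remain at this end. The count returned is the
   least free space the task has ever had (its high-water mark, in bytes). */
static size_t stack_unused_bytes(const uint8_t *stack_base, size_t stack_size)
{
    size_t n = 0;
    while (n < stack_size && stack_base[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A used byte that happens to equal 0xA5 right next to the untouched region is counted as free, so treat the result as a close estimate rather than an exact figure.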
One of my aims in writing the handler was to present the information in a way that makes sense of the data. So I report MSP and PSP as “PSP Process Stack Ptr (Thread mode)” and “MSP Main Stack Ptr (Handler mode)”.
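The MSP/PSP choice, the mode and the frame type are all encoded in the EXC_RETURN value that the core loads into LR on exception entry, so labels like these can be derived directly from its bits. The sketch below is written for ARMv7-M (Cortex-M3/M4/M7). One point worth noting: MSP is not used only in Handler mode. Code that runs before the scheduler starts, such as main(), runs in Thread mode on the MSP, and that shows up as EXC_RETURN 0xFFFFFFF9.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN (ARMv7-M): bits 31..5 are all ones, and the low bits encode
   bit 4 = 0: extended frame including FPU state (S0-S15, FPSCR)
   bit 3 = 1: exception taken from Thread mode, 0: from Handler mode
   bit 2 = 1: frame is on the PSP,             0: frame is on the MSP
   Valid values: 0xFFFFFFF1, F9, FD and, with an FPU, E1, E9, ED. */
static bool exc_return_looks_valid(uint32_t lr)
{
    return (lr & 0xFFFFFFE0u) == 0xFFFFFFE0u;
}

static bool exc_frame_on_psp(uint32_t lr)  { return (lr & (1u << 2)) != 0u; }
static bool exc_from_thread(uint32_t lr)   { return (lr & (1u << 3)) != 0u; }
static bool exc_has_fpu_frame(uint32_t lr) { return (lr & (1u << 4)) == 0u; }

/* Human-readable label in the style used by the handler. */
static const char *exc_stack_label(uint32_t lr)
{
    if (exc_frame_on_psp(lr)) {
        return "PSP Process Stack Ptr (Thread mode)";
    }
    if (exc_from_thread(lr)) {
        return "MSP Main Stack Ptr (Thread mode, e.g. before the scheduler starts)";
    }
    return "MSP Main Stack Ptr (Handler mode)";
}
```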
Unless you do a lot of assembly language programming, you simply forget what the acronyms mean. Also, setting out the memory regions makes memory violations quicker to interpret. ARM’s cryptic architecture and programming manuals fail to shed much light on what they mean in many cases. For instance, I would be very grateful if anyone could tell me what the Link Register LOCKUP value means and why it is called that.
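Setting out the memory regions can also be turned into automatic plausibility checks. The sketch below uses the fixed ARMv7-M default memory map for coarse labels, plus flash and SRAM limits for a hypothetical part with 1 MB of flash at 0x08000000 and 128 KB of SRAM at 0x20000000. Those four limits are assumptions and should come from your own linker script. The same stack-pointer check is worth doing before a handler trusts a stack enough to print from it.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL limits (end = one past the last byte); take yours from the linker script. */
#define FLASH_START 0x08000000u
#define FLASH_END   0x08100000u
#define SRAM_START  0x20000000u
#define SRAM_END    0x20020000u

static bool in_range(uint32_t a, uint32_t lo, uint32_t hi)
{
    return a >= lo && a < hi;
}

/* A stacked PC should point at code and be halfword aligned (Thumb).
   Code executed from RAM would need the RAM range added here. */
static bool pc_plausible(uint32_t pc)
{
    return in_range(pc, FLASH_START, FLASH_END) && (pc & 1u) == 0u;
}

/* A stack pointer must lie in RAM and be word aligned. */
static bool sp_plausible(uint32_t sp)
{
    return in_range(sp, SRAM_START, SRAM_END) && (sp & 3u) == 0u;
}

/* Coarse ARMv7-M default memory map, for labelling any address. */
static const char *region_name(uint32_t a)
{
    if (a < 0x20000000u) return "Code";
    if (a < 0x40000000u) return "SRAM";
    if (a < 0x60000000u) return "Peripheral";
    if (a < 0xA0000000u) return "External RAM";
    if (a < 0xE0000000u) return "External device";
    if (a < 0xE0100000u) return "Private peripheral bus";
    return "Vendor system";
}
```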
My approach to hard-fault finding has been to (not necessarily in this order):
Work out whether it is a task problem,
Exclude sections of code to see whether removing them gets rid of the hard fault,
Exclude a task from starting,
Check task and interrupt priorities to see if one task is having side effects on another,
Do timings with a scope using digital outputs,
Single stepping,
Look for silly things like not initializing a task, semaphore or whatever,
Look for a missing blocking call in a task loop (vTaskDelay, xSemaphoreTake, ulTaskNotifyTake etc.),
Check whether an assert function’s while(1) loop is being hit (the CubeIDE debugger often disconnects on these for some reason). FreeRTOS asserts are a particular problem: they halt the system without telling you anything about the cause, and they invariably kill the debugger too, leaving no way of knowing what went wrong. I wish there were an easy way of providing a custom assert that reports the file and line number.
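For the scope-timing step, the cheapest marker is a single store to the port’s bit set/reset register. The sketch below assumes an STM32-style BSRR, where writing 1 to bit n drives pin n high and writing 1 to bit n+16 drives it low. gpio_port_t is a hypothetical stand-in so the example compiles anywhere; on a real STM32 you would pass the GPIO_TypeDef pointer (e.g. GPIOA) from the CMSIS device header, or call HAL_GPIO_WritePin(), which performs the same kind of write.

```c
#include <stdint.h>

/* HYPOTHETICAL register block standing in for the CMSIS GPIO_TypeDef. */
typedef struct {
    volatile uint32_t BSRR;   /* bit n: set pin n, bit n+16: reset pin n */
} gpio_port_t;

/* pin is 0..15. One store each, no read-modify-write, so these are safe in
   tasks and ISRs alike and cheap enough not to distort the measured timing. */
static inline void timing_pin_high(gpio_port_t *port, unsigned pin)
{
    port->BSRR = 1u << pin;
}

static inline void timing_pin_low(gpio_port_t *port, unsigned pin)
{
    port->BSRR = 1u << (pin + 16u);
}
```

Bracket the code under test with timing_pin_high() and timing_pin_low(); the pulse width on the scope is then its execution time, including any preemption by higher-priority tasks or interrupts.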
Since you provide the definition for configASSERT, you can easily do that. The main issue is that you need some way to output information when FreeRTOS isn’t running.
My FreeRTOSConfig.h defines configASSERT as (and I think this is the default for FreeRTOS):
and vAssertCalled disables interrupts, resets the serial driver into a “system crashed” mode and then prints the file and line values. In that mode the driver runs raw polling code, waiting on the flag that says the transmitter has room for more data.
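A minimal sketch of the polled “system crashed” output described above follows. board_putc_polled() is a hypothetical hook that, on the target, would spin on the UART’s transmit-empty flag and then write the data register; here it appends to a RAM buffer so the formatting can be checked on a PC. The line number is converted by hand rather than with printf, so nothing depends on the heap, on interrupts or on the RTOS. The configASSERT form in the comment is the one used in many FreeRTOS demo projects, not necessarily the exact definition in the post above.

```c
#include <stddef.h>

/* HYPOTHETICAL board hook: blocking, interrupt-free output of one character.
   On the target it would poll the UART "transmit empty" flag; here it
   appends to a RAM buffer so the logic can be exercised on a host. */
static char crash_log[128];
static size_t crash_log_len;

static void board_putc_polled(char c)
{
    if (crash_log_len < sizeof crash_log - 1u) {
        crash_log[crash_log_len++] = c;
        crash_log[crash_log_len] = '\0';
    }
}

static void crash_puts(const char *s)
{
    while (*s != '\0') {
        board_putc_polled(*s++);
    }
}

/* Decimal conversion without printf: digits are produced in reverse order. */
static void crash_put_ulong(unsigned long v)
{
    char digits[20];
    int n = 0;
    do {
        digits[n++] = (char)('0' + (v % 10u));
        v /= 10u;
    } while (v != 0u);
    while (n > 0) {
        board_putc_polled(digits[--n]);
    }
}

/* Emits e.g. "ASSERT tasks.c:1234\r\n". */
static void assert_report(const char *file, unsigned long line)
{
    crash_puts("ASSERT ");
    crash_puts(file);
    crash_puts(":");
    crash_put_ulong(line);
    crash_puts("\r\n");
}

/* Target side (sketch):
 *   #define configASSERT( x ) if( ( x ) == 0 ) vAssertCalled( __FILE__, __LINE__ )
 *
 *   void vAssertCalled( const char *file, unsigned long line )
 *   {
 *       taskDISABLE_INTERRUPTS();
 *       assert_report( file, line );   // after switching the UART to polled mode
 *       for( ;; ) { }
 *   }
 */
```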
I also try to implement a “debugging mode” command somewhere that gets the system status with uxTaskGetSystemState and prints it out, which is a good way to monitor stack usage.
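A sketch of such a command is below. In keeping with avoiding the heap, it uses a static array sized by a hypothetical DEBUG_MAX_TASKS. It assumes printf is retargeted to a UART and needs configUSE_TRACE_FACILITY set to 1, plus configGENERATE_RUN_TIME_STATS for the CPU column; without run-time stats the total is 0 and the percentages print as 0. The FreeRTOS part is wrapped in a hypothetical HAVE_FREERTOS flag so the percentage helper can be tried on a PC. The run-time total is taken as uint32_t; newer kernels type it as configRUN_TIME_COUNTER_TYPE, which defaults to uint32_t.

```c
#include <stdint.h>

/* Integer percentage; returns 0 when run-time stats are disabled (total == 0). */
static unsigned long percent_of(uint32_t part, uint32_t total)
{
    if (total == 0u) {
        return 0u;
    }
    return (unsigned long)(((uint64_t)part * 100u) / total);
}

#if defined(HAVE_FREERTOS)           /* HYPOTHETICAL project flag */
#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

#define DEBUG_MAX_TASKS 16u          /* HYPOTHETICAL upper bound on task count */

void debug_dump_tasks(void)
{
    static TaskStatus_t status[DEBUG_MAX_TASKS];
    uint32_t total_run_time = 0u;

    /* Returns 0 if the array is too small to hold every task. */
    UBaseType_t n = uxTaskGetSystemState(status, DEBUG_MAX_TASKS, &total_run_time);
    if (n == 0u) {
        printf("increase DEBUG_MAX_TASKS\r\n");
        return;
    }

    /* usStackHighWaterMark is the minimum free stack ever seen, in words. */
    printf("%-16s %4s %10s %5s\r\n", "Task", "Prio", "FreeWords", "CPU%");
    for (UBaseType_t i = 0; i < n; i++) {
        printf("%-16s %4lu %10lu %5lu\r\n",
               status[i].pcTaskName,
               (unsigned long)status[i].uxCurrentPriority,
               (unsigned long)status[i].usStackHighWaterMark,
               percent_of(status[i].ulRunTimeCounter, total_run_time));
    }
}
#endif
```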