HardFault, BusFault, UsageFault on ARM Cortex-M7

autoUser · October 25, 2024, 2:42pm

Hello FreeRTOS team,

I have big problems with FreeRTOS on a microcontroller with 1x Cortex M0+ and 2x Cortex M7 cores. FreeRTOS runs on the M0+ and on only ONE M7 core. The second M7 core is disabled.

I get sporadic HardFaults, BusFaults and UsageFaults. Unfortunately I don’t see a stack trace in my debugger and can’t see which instruction was executed last.

What I have already done:

Compared configuration of the priorities in FreeRTOS with the priorities of the interrupts
Increased the stack for the tasks (in case the StackOverflowHook cannot be called); by the way: configCHECK_FOR_STACK_OVERFLOW = 2
Stepped through the asm commands
- sometimes, but not always, there is a fault in xPortPendSVHandler with the instruction stmdb r0!, {r4-r11, r14} (why could this fault occur?)
- in xTaskRemoveFromEventList, the pointer pxUnblockedTCB->xEventListItem.pxContainer contains an invalid memory address (0x502F3A48), which causes a BusFault with the PRECISERR bit set
Read documentations and other forum topics
- Running the RTOS on a ARM Cortex-M Core - FreeRTOS™
- FreeRTOS stack usage and stack overflow checking - FreeRTOS™
- some forum topics (I’m new user and cannot insert links in the posts)
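One cheap check on a bogus pointer such as the 0x502F3A48 mentioned above is to see whether its bytes are printable ASCII. If they are, a string written past the end of a buffer is a plausible suspect, though by no means a proven one. The helper names below are made up for this sketch, and little-endian byte order is assumed (the usual configuration on Cortex-M).

```c
#include <stdbool.h>
#include <stdint.h>

/* Unpack a 32-bit word into its four bytes in little-endian memory order
 * (lowest address first) and NUL-terminate the result.
 * Hypothetical helper, not part of FreeRTOS. */
static void word_to_bytes_le(uint32_t word, char out[5])
{
    for (int i = 0; i < 4; i++) {
        out[i] = (char)((word >> (8 * i)) & 0xFFu);
    }
    out[4] = '\0';
}

/* True if every byte of the word is printable ASCII (0x20..0x7E).
 * Hypothetical helper, not part of FreeRTOS. */
static bool word_is_printable_ascii(uint32_t word)
{
    for (int i = 0; i < 4; i++) {
        uint32_t b = (word >> (8 * i)) & 0xFFu;
        if (b < 0x20u || b > 0x7Eu) {
            return false;
        }
    }
    return true;
}
```

For 0x502F3A48 the four bytes in memory order are 'H', ':', '/', 'P', which looks like the start of a path string rather than an address.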

Some further notes:

When I get a UsageFault, I can see that the UNALIGNED bit is set, but I don’t know how to find the exact place in the code that caused the fault
When I get a BusFault, I can see that IMPRECISERR is set (and sometimes, as mentioned above, PRECISERR is set too)

I would like to ask what approaches there are to debug such errors. Could you please give me some hints?

Thanks!

RAc · October 25, 2024, 2:46pm

Are your ports taken unmodified from the contrib repo on Git, or are they vendor-provided?

Did you ensure that the service ISR has the lowest priority?

Does a blinky app run without faults, i.e. can you pinpoint the problems to your application?

autoUser · October 28, 2024, 7:36am

Hi @RAc,

Thanks for your response.

The port is provided by the vendor of the MCU and I didn’t change anything.
I checked the priorities again. The kernel interrupt has the lowest priority. I also verified that the interrupts which call interrupt-safe API functions have a logical priority equal to or lower than configMAX_API_CALL_INTERRUPT_PRIORITY.
A blinky runs without any issues. But my current application sometimes runs fine, too. I noticed that after small changes anywhere in my code, the application either becomes more stable (it crashes less often) or crashes much more often. I think a small change alters the layout of the binary enough that the behaviour becomes very different.
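The priority comparison described above is easy to get backwards, because on Cortex-M a numerically larger priority value means a logically lower priority. A minimal sketch of the check, assuming 4 implemented priority bits and a syscall threshold of 5 (both values are hypothetical; the real ones come from the device header and FreeRTOSConfig.h):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values for illustration only. */
#define EXAMPLE_PRIO_BITS            4u  /* implemented priority bits */
#define EXAMPLE_MAX_SYSCALL_PRIORITY 5u  /* unshifted, 0 = most urgent */

/* Shift an unshifted priority into the implemented upper bits of the
 * 8-bit priority field, as the hardware stores it. */
static uint8_t raw_priority(uint32_t unshifted)
{
    return (uint8_t)((unshifted << (8u - EXAMPLE_PRIO_BITS)) & 0xFFu);
}

/* An ISR may call ...FromISR() API functions only if its priority is
 * logically equal to or lower than the syscall threshold, which means
 * numerically greater than or equal to it. */
static bool isr_may_call_from_isr_api(uint32_t isr_unshifted_priority)
{
    return raw_priority(isr_unshifted_priority)
           >= raw_priority(EXAMPLE_MAX_SYSCALL_PRIORITY);
}
```

With these example values, priorities 5 to 15 pass the check and priorities 0 to 4 do not. An interrupt left at its reset priority of 0 therefore fails it.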

I would like to know which debugging techniques would help me debug this issue more efficiently.

Thanks!

RAc · October 28, 2024, 8:11am

Debugging concurrent systems is almost by definition rather difficult, because the cause of an issue is frequently completely unrelated to the symptom, and the relationship between the two can change from one fault to the next.

From experience, I would suggest stripping down your application layer by layer until the problem disappears. Then you at least know which piece of your code causes the problem (though not necessarily why).

If you are willing to share your code, we can collectively have a look at it.

Also, a stack trace at fault time may help.

Some possible other issues to focus on are:

Memory corruption (overwriting statically or dynamically allocated memory) and/or violation of dynamic allocation rules (e.g. accessing memory after it has been freed)
An undersized interrupt stack (you can verify or rule this out fairly easily by filling your .stack section with a signature and checking at fault time how much of the signature is still intact)
Also be aware that the application stack overflow check is not 100% reliable, so you may want to record the location of all your task stacks at creation time and inspect them manually at fault time
Misuse of the OS API in interrupts (i.e. using OS services from ISRs with a priority above configMAX_SYSCALL_INTERRUPT_PRIORITY, or failing to call the xxxFromISR variants in ISRs)
Insufficient serialization of access to shared resources in a concurrent environment

All of the above issues can of course also occur in third party/vendor provided modules or libraries, so you should also reach out to the vendor of your BSP.
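The signature technique for the interrupt stack can be sketched as follows. This is a host-side illustration on a plain array. The function names and the 0xDEADBEEF pattern are arbitrary choices for this sketch, and it assumes a full-descending stack as on Cortex-M, so the untouched region sits at the lowest addresses.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SIGNATURE 0xDEADBEEFu  /* arbitrary pattern for this sketch */

/* Fill the whole stack region with the signature. On a real target this
 * has to happen before the stack is in use (e.g. in startup code), or
 * only below the current stack pointer. */
static void stack_paint(uint32_t *base, size_t words)
{
    for (size_t i = 0; i < words; i++) {
        base[i] = STACK_SIGNATURE;
    }
}

/* Count the words, starting at the lowest address, that still hold the
 * signature. Zero means the stack was used completely and has probably
 * overflowed. */
static size_t stack_unused_words(const uint32_t *base, size_t words)
{
    size_t n = 0;
    while (n < words && base[n] == STACK_SIGNATURE) {
        n++;
    }
    return n;
}
```

At fault time the same check can be done by eye in the debugger's memory view: if the signature is gone all the way down to the start of the region, the stack is too small.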

aggarg · October 28, 2024, 11:00am

In addition to what @RAc mentioned, you can also try to find the faulting instruction which may be helpful. This is a good document from ARM about how to do that - https://www.keil.com/appnotes/files/apnt209.pdf.
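As a rough sketch of what such a fault analysis does: on exception entry the core pushes r0-r3, r12, lr, pc and xPSR onto the active stack, and the CFSR register (at 0xE000ED28) records the cause. The code below decodes a copy of that frame and the CFSR bits mentioned in this thread. It is written to run on a host, so the stack pointer, CFSR and BFAR values are passed in as parameters. On a target they would come from MSP or PSP (selected by bit 2 of the EXC_RETURN value in LR), SCB->CFSR and SCB->BFAR. The type and function names are made up for this sketch, and the frame layout assumes no floating-point state was stacked.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Basic exception frame in the order the hardware stacks it. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} fault_frame_t;

static fault_frame_t fault_frame_from_sp(const uint32_t *sp)
{
    fault_frame_t f = { sp[0], sp[1], sp[2], sp[3],
                        sp[4], sp[5], sp[6], sp[7] };
    return f;
}

/* CFSR bits discussed in this thread (bit positions within CFSR). */
typedef struct { uint32_t mask; const char *name; } cfsr_flag_t;
static const cfsr_flag_t cfsr_flags[] = {
    { 1u << 9,  "PRECISERR"   },  /* stacked PC points at the access   */
    { 1u << 10, "IMPRECISERR" },  /* stacked PC may be past the access */
    { 1u << 15, "BFARVALID"   },  /* BFAR holds the faulting address   */
    { 1u << 24, "UNALIGNED"   },
};

/* Print the stacked PC/LR and the decoded flags; returns how many of
 * the flags above were set. */
static unsigned fault_report(const uint32_t *sp, uint32_t cfsr, uint32_t bfar)
{
    fault_frame_t f = fault_frame_from_sp(sp);
    unsigned hits = 0;

    printf("pc=0x%08lx lr=0x%08lx\n",
           (unsigned long)f.pc, (unsigned long)f.lr);
    for (size_t i = 0; i < sizeof cfsr_flags / sizeof cfsr_flags[0]; i++) {
        if (cfsr & cfsr_flags[i].mask) {
            printf("  %s\n", cfsr_flags[i].name);
            hits++;
        }
    }
    if (cfsr & (1u << 15)) {
        printf("  fault address 0x%08lx\n", (unsigned long)bfar);
    }
    return hits;
}
```

The stacked pc is the value to look up in the map file or disassembly. With only IMPRECISERR set, treat it as a hint rather than the exact instruction.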

autoUser · October 28, 2024, 1:17pm

Hi @RAc,

I think I have just found the cause.
In xPortStartScheduler, the priorities of the PendSV and SysTick interrupts are set to the lowest priority:

/* Make PendSV and SysTick the lowest priority interrupts. */
portNVIC_SHPR3_REG |= portNVIC_PENDSV_PRI;
portNVIC_SHPR3_REG |= portNVIC_SYSTICK_PRI;

In my application, even before the FreeRTOS kernel is started by calling vTaskStartScheduler(), I initialize the timer of the MCU and enable the SysTick interrupt with its default priority (the highest priority).
In this article I read that the SVC handler would cause a hard fault when the SysTick interrupt fires earlier. In my application, the SysTick interrupt is enabled with the highest priority before the FreeRTOS kernel is running. When I set the priority of the SysTick interrupt to the lowest priority before starting the FreeRTOS kernel, the application seems to run without any issues.

Can you please confirm my thoughts and what I have read in the linked article?
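For reference, the register layout behind the two quoted lines can be sketched like this. SHPR3 (at 0xE000ED20) holds the PendSV priority in bits 23:16 and the SysTick priority in bits 31:24, and both fields are 0 (the highest priority) after reset. The code below works on a plain variable instead of the real register so that it can run on a host. The function names and the raw value 0xF0 (lowest priority with 4 implemented bits) are assumptions for illustration.

```c
#include <stdint.h>

#define PENDSV_PRI_SHIFT  16u
#define SYSTICK_PRI_SHIFT 24u

static uint8_t pendsv_priority(uint32_t shpr3)
{
    return (uint8_t)((shpr3 >> PENDSV_PRI_SHIFT) & 0xFFu);
}

static uint8_t systick_priority(uint32_t shpr3)
{
    return (uint8_t)((shpr3 >> SYSTICK_PRI_SHIFT) & 0xFFu);
}

/* Return a copy of shpr3 with both kernel interrupt priorities set to
 * raw_priority. Unlike the |= in the quoted port code, this sketch
 * clears both fields first, so it also works if they were non-zero. */
static uint32_t shpr3_set_kernel_priorities(uint32_t shpr3, uint8_t raw_priority)
{
    shpr3 &= ~((0xFFu << PENDSV_PRI_SHIFT) | (0xFFu << SYSTICK_PRI_SHIFT));
    shpr3 |= ((uint32_t)raw_priority << PENDSV_PRI_SHIFT)
           | ((uint32_t)raw_priority << SYSTICK_PRI_SHIFT);
    return shpr3;
}
```

Starting from the reset value 0, setting both fields to 0xF0 yields 0xF0F00000. Until that happens, a SysTick interrupt enabled by application code runs at priority 0.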

RAc · October 28, 2024, 1:24pm

From my understanding, you should not enable the sys tick ISR before the scheduler is started at all, because the handler assumes that all data structures needed for scheduling are set up correctly. I would expect random crashes and unpredictable behavior due to race conditions even with the “correct” priority assigned.

Why is your sys tick interrupt configured and started before the OS starts? That should be a no-go; the service and sys tick interrupts must be completely under FreeRTOS control.

autoUser · October 28, 2024, 1:32pm

In my application I run some initialization routines before FreeRTOS is running, and these routines need a timer. Therefore the SysTick interrupt needs to be enabled before FreeRTOS runs. Once the initialization is completed, I create my two tasks and start the scheduler.

If I understand you correctly, EVERY application related code must run in FreeRTOS tasks, right?

RAc · October 28, 2024, 1:38pm

If you need a timer, do NOT use the sys tick timer. ARM Cortex-M MCUs provide programmable timers galore, so set up a separate timer for your initialization. You can NOT use the sys tick timer for purposes other than those defined by FreeRTOS.

Re your second question: Well yes, the OS can not give time slices to anything else but tasks, so even if you have code that does not use any OS services, there wouldn’t be a way for it to share the CPU if it is not embedded in a task context.

autoUser · October 28, 2024, 2:28pm

Thank you so much for your support and your hints. I will consider these points in my application and in my future projects with FreeRTOS. My application now runs stably without any issues.

richard-damon · October 28, 2024, 2:37pm

Another thing to think about is why that initialization needed to happen before bringing FreeRTOS up. Things that need timers are the sort of thing that works well under an RTOS. If some tasks need that initialization done before they start, have them wait for an event signalling that the initialization is done.

aggarg · October 29, 2024, 5:41am

As @RAc already suggested, you may be masking the real problem here. Consider using a different timer for your initialization and running it at a priority higher than configMAX_SYSCALL_INTERRUPT_PRIORITY so that it is not affected by FreeRTOS. Note that FreeRTOS masks interrupts when you call any API to create a FreeRTOS object (such as a task, queue, mutex, etc.), and they are unmasked when the scheduler is started.