HardFault - Can't figure out why

I am getting a hard fault in my code running on an STM32G0. It does not look like an overflow: all stacks are fine and ucHeap is large enough.
Sometimes, when it happens, the debugger shows that pTCB is completely corrupted.
How can I find out which call caused the hard fault?

You can use the following instructions to find the faulting instruction, which may give you a hint: Debugging and diagnosing hard faults on ARM Cortex-M CPUs
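For reference, here is a minimal sketch of the kind of handler that approach boils down to. I adapted it myself for the Cortex-M0+ core of the STM32G0 (ARMv6-M, which has no IT blocks and no TST with an immediate), so treat it as untested on your board. The names vCaptureFaultFrame, prvHardFaultC and the ulFault* variables are hypothetical. HardFault_Handler is the weak name that the STM32 startup files use. Only the portable frame-decoding part builds on a host; the handler itself is compiled only for ARMv6-M targets.

```c
#include <stdint.h>

/* Bit 2 of EXC_RETURN (the value in LR on exception entry) tells which
 * stack the core pushed the frame onto: 1 = PSP (a FreeRTOS task was
 * running), 0 = MSP (main/interrupt context). */
static inline int xFaultUsedPSP(uint32_t ulExcReturn)
{
    return (ulExcReturn & 0x4u) != 0u;
}

/* Copies of the stacked registers, volatile so they survive optimisation
 * and can be read in the debugger's watch window after the fault. */
volatile uint32_t ulFaultLR;   /* LR of the interrupted code */
volatile uint32_t ulFaultPC;   /* usually the instruction that faulted */
volatile uint32_t ulFaultPSR;  /* xPSR of the interrupted code */

/* pulFaultStack points at the hardware-stacked frame, in stacking order:
 * r0, r1, r2, r3, r12, lr, pc, xPSR. */
void vCaptureFaultFrame(const uint32_t *pulFaultStack)
{
    ulFaultLR  = pulFaultStack[5];
    ulFaultPC  = pulFaultStack[6];
    ulFaultPSR = pulFaultStack[7];
}

#if defined(__ARM_ARCH_6M__)
/* Target-only part for ARMv6-M (Cortex-M0/M0+, as on STM32G0). */
void prvHardFaultC(uint32_t *pulFaultStack)
{
    vCaptureFaultFrame(pulFaultStack);
    for (;;)
    {
        /* Halt here; look up ulFaultPC in the disassembly or .map file. */
    }
}

/* Overrides the weak HardFault_Handler from the startup file. */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "   movs r0, #4         \n"
        "   mov  r1, lr         \n"
        "   tst  r0, r1         \n" /* EXC_RETURN bit 2 set? */
        "   beq  1f             \n"
        "   mrs  r0, psp        \n" /* frame is on the task stack */
        "   b    2f             \n"
        "1: mrs  r0, msp        \n" /* frame is on the main stack */
        "2: ldr  r1, 3f         \n"
        "   bx   r1             \n"
        "   .align 2            \n"
        "3: .word prvHardFaultC \n"
    );
}
#endif
```

After the fault, the stacked PC (ulFaultPC) is usually more informative than LR, because LR can point almost anywhere once a task's context has been corrupted.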

Since you know that pxCurrentTCB is getting corrupted, you can define a variable right next to it and put a data breakpoint on it. Assuming the corruption affects more than just pxCurrentTCB, this will let you catch it the moment it happens.
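A minimal sketch of that idea, assuming GCC and a stock tasks.c. The guard names are hypothetical. The watchpoint goes on a neighbour rather than on pxCurrentTCB itself because pxCurrentTCB legitimately changes on every context switch, so a watchpoint on it would fire constantly.

```c
#include <stdint.h>

/* Hypothetical guard words, not part of FreeRTOS. In a real build, put
 * these two definitions in tasks.c directly before and after the existing
 * pxCurrentTCB definition. They are explicitly zero-initialised so that,
 * like pxCurrentTCB (= NULL), they go into .bss rather than .data or
 * COMMON. The toolchain may still reorder them, so confirm in the .map
 * file that they really are adjacent. */
volatile uint32_t ulTCBGuardBefore = 0u;
volatile uint32_t ulTCBGuardAfter = 0u;

/* Optional software check (for example from the tick hook) for runs
 * without a debugger. Returns non-zero while both guards are untouched. */
int xTCBGuardsIntact(void)
{
    return (ulTCBGuardBefore == 0u) && (ulTCBGuardAfter == 0u);
}
```

Then set a write watchpoint on one of the guards (in GDB: watch ulTCBGuardAfter). The Cortex-M0+ has only a couple of hardware watchpoints, so keep it to one or two. The core should halt on the store that overwrites the guard.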

I’ve noticed that when I set the optimization level to -Og (optimize for debugging), the hard fault is “gone”.
What could explain this?

Very likely a code problem such as undefined behaviour or a missing volatile qualifier where one is required; such bugs are often revealed by the optimizer.
I guess the program runs with -Og and the problems occur with -O2 or -Os.
You can find many, many such issues on the web 😉
Also, different optimizer / code generation flags produce different code, which of course usually differs in stack usage as well.
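A classic instance of the missing-volatile case, as a hedged illustration (the flag and function names are hypothetical): a flag written by an interrupt handler and polled by a task. Without volatile, the compiler may load the flag once and spin forever on the cached value at -O2/-Os. Lower optimisation levels are more likely to reload it on every iteration, so the bug seems to "go away".

```c
#include <stdbool.h>

/* Set from an ISR, read from a task: must be volatile so every loop
 * iteration re-reads it from memory. */
static volatile bool xRxComplete = false;

/* Hypothetical ISR standing in for whichever handler sets the flag. */
void vExampleRxISR(void)
{
    xRxComplete = true;
}

/* Busy-waits until the ISR has run, then clears the flag. */
void vWaitForRx(void)
{
    while (!xRxComplete)
    {
        /* spin */
    }
    xRxComplete = false;
}
```

In FreeRTOS code, a task notification or semaphore is usually a better fit than a polled flag. The flag version just makes the optimizer effect easy to see. Note that volatile only forces the reload; it does not make read-modify-write sequences atomic.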
Did you define configASSERT and enable stack checking (as strongly recommended)?
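For completeness, here is a typical FreeRTOSConfig.h excerpt for these checks, along the lines of the FreeRTOS documentation. The exact assert body is a choice, not a requirement. configCHECK_FOR_STACK_OVERFLOW and configUSE_MALLOC_FAILED_HOOK are real FreeRTOS settings. With them enabled, the application must also provide vApplicationStackOverflowHook() and vApplicationMallocFailedHook().

```c
/* FreeRTOSConfig.h excerpt (sketch). */

/* Trap failed kernel assertions: disable interrupts and stop, so the
 * debugger's call stack shows which check failed. */
#define configASSERT( x )                   \
    do {                                    \
        if( ( x ) == 0 )                    \
        {                                   \
            taskDISABLE_INTERRUPTS();       \
            for( ;; ) { }                   \
        }                                   \
    } while( 0 )

/* Method 2: in addition to the stack pointer check, verify on each
 * context switch that the fill pattern at the far end of the task's
 * stack is still intact. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* Call vApplicationMallocFailedHook() if pvPortMalloc() returns NULL. */
#define configUSE_MALLOC_FAILED_HOOK      1
```

Stack overflow checking covers task stacks only, not the main stack that interrupts run on.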

configASSERT and stack checking are both enabled.
The hard fault usually happens when trying to access an address within the ucHeap address range.
I’ve tried analyzing the saved stack registers after the hard fault, but this leads me nowhere.
The only thing that works is running in release or with -Og, meaning the program crashes only when running with no optimization.

Any ideas?

Hello Ran,

Did you try the method that Gaurav mentioned above? Can you set a data breakpoint?

- Aniruddha

It is difficult without looking at the code. Would it be possible for you to create a minimal sample demonstrating the problem?

You mean running in DEBUG or -Og?

As a side note: -Og is NOT the same as -O0; it does not disable all optimizations (cf. the GCC docs).

It is also possible that the problem is still there without optimization, but the timing conditions change, making the problem show up much less frequently.

Which memory manager are you using? Are you ensuring memory serialization correctly (either via pvPortMalloc() etc. using mutexes, or by implementing malloc_lock())?

The problem is that when I reduce code segments, the problem also disappears.
I’ve already tried the “comment out and test” path; it didn’t help.
I am going through the code with a fine-tooth comb now.

I am running with -Og and the problem goes away.
I am using pvPortMalloc from FreeRTOS.

It does not look like a data/unaligned access. I’ve inspected the CPU registers when the fault happens, and the LR points to a different function on every crash.

But which heap manager?

Again, you do not know whether the problem goes away or just shows up much less frequently unless you know what it is.

Heap 4.
I agree, I can’t really tell whether it goes away or the timing is just different.