portRESTORE_CONTEXT error in FreeRTOS_SWI_Handler function

Hi all,
I’m developing a project for a board based on the STM32MP135 chip.
The firmware starts 28 tasks.
When I run the firmware, everything is OK at the beginning, but when I start communication
over the USB port, after a few seconds execution hangs in the Default_handler()
function, and in the call stack I only see FreeRTOS_SWI_Handler().
No other task is visible in the STM32CubeIDE list. (?)

Part of the register list is:
r0 0xc0314fec (Hex)
r1 0
r2 24
r3 0x80000197 (Hex)
r4 67372036
r5 84215045
r6 101058054
r7 3223946832
r8 134744072
r9 151587081
r10 269488144
r11 286331153
r12 303174162
sp 0xdffff800
lr 0xc00256dc <FreeRTOS_SWI_Handler+92>
pc 0xc00256dc <FreeRTOS_SWI_Handler+92>
cpsr 2147484055
..
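
For reference, the decimal cpsr value in the dump is 0x80000197, and on ARMv7-A the low five bits of CPSR encode the processor mode. Below is a minimal sketch of decoding it. The helper names are made up for illustration and are not part of FreeRTOS or the ST HAL.

```c
#include <stdint.h>

/* ARMv7-A processor mode, taken from CPSR bits [4:0]. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";
    case 0x11: return "FIQ";
    case 0x12: return "IRQ";
    case 0x13: return "Supervisor";
    case 0x16: return "Monitor";
    case 0x17: return "Abort";
    case 0x1A: return "Hyp";
    case 0x1B: return "Undefined";
    case 0x1F: return "System";
    default:   return "Reserved";
    }
}

/* CPSR.I (bit 7) set means IRQs are masked. */
int cpsr_irq_masked(uint32_t cpsr)
{
    return (int)((cpsr >> 7) & 1u);
}

/* CPSR.F (bit 6) set means FIQs are masked. */
int cpsr_fiq_masked(uint32_t cpsr)
{
    return (int)((cpsr >> 6) & 1u);
}
```

For the value above, cpsr_mode_name(2147484055u) gives "Abort", with IRQs masked and FIQs not masked. That would suggest the core had already taken an abort exception when the debugger halted it, so the sp and lr shown are probably the banked Abort-mode registers rather than those of the interrupted code.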

If I point the debugger at the pc register, I see that the faulting instruction is
in the portRESTORE_CONTEXT macro, near the ‘pop {r1}’ instruction (see the attached image).

What could the problem be?
Any ideas?
Thanks!

It may be memory corruption. Can you try commenting out some of your tasks in order to narrow down the problematic task? You can also try increasing the stack size of each task, just to rule out a stack overflow.

Hello @aggarg,

thanks for the reply. I’ll try it soon.

I’ve set configCHECK_FOR_STACK_OVERFLOW to 2: should that be able to detect stack overflows?

It doesn’t detect all stack overflows.

As @RAc mentioned, it does not catch all stack overflows: the ones that lead to a fault before this check runs cannot be caught. That is why it is important to narrow down the problematic task and focus on that.

Also, if the application stack contains an uninitialized array in the area where the top of stack signature is maintained, that area may be left untouched, even though the stack overflows. We discussed that several times on this forum.
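
To illustrate the point about the untouched signature, here is a small host-side model. It is only a sketch and not the kernel's actual check: the 0xA5 fill pattern, the 16-byte checked region, and all function names are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BELOW_BYTES 64   /* memory below the stack limit, owned by other code */
#define STACK_BYTES 256  /* simulated task stack */
#define CHECK_BYTES 16   /* assumed size of the checked region at the limit */
#define FILL_BYTE   0xA5 /* assumed fill pattern written at task creation */

static uint8_t mem[BELOW_BYTES + STACK_BYTES];

/* Lowest valid address of the simulated (descending) stack. */
static uint8_t *stack_limit(void) { return &mem[BELOW_BYTES]; }

void sim_reset(void)
{
    memset(mem, 0x00, BELOW_BYTES);
    memset(stack_limit(), FILL_BYTE, STACK_BYTES);
}

/* Store one byte at an offset from the stack limit; negative means overflow. */
void sim_store(int offset, uint8_t value)
{
    if (offset >= -BELOW_BYTES && offset < STACK_BYTES)
        stack_limit()[offset] = value;
}

/* Pattern check: 1 if the region at the stack limit still holds the fill. */
int watermark_intact(void)
{
    for (size_t i = 0; i < CHECK_BYTES; i++)
        if (stack_limit()[i] != FILL_BYTE)
            return 0;
    return 1;
}

/* 1 if anything below the stack limit was overwritten. */
int neighbour_corrupted(void)
{
    for (size_t i = 0; i < BELOW_BYTES; i++)
        if (mem[i] != 0x00)
            return 1;
    return 0;
}

/* Overflow that writes every byte on its way down: the pattern is destroyed. */
void overflow_contiguous(void)
{
    for (int off = 31; off >= -8; off--)
        sim_store(off, 0x11);
}

/* A frame whose never-written local array spans offsets [-40, 24), covering
 * the checked region; only a 4-byte variable below the array is written. */
void overflow_skipping_watermark(void)
{
    for (int off = -41; off >= -44; off--)
        sim_store(off, 0x22);
}
```

After sim_reset() and overflow_contiguous(), watermark_intact() returns 0, so a pattern check would notice. After sim_reset() and overflow_skipping_watermark(), watermark_intact() still returns 1 although neighbour_corrupted() returns 1, which is the case described above.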

OK, thanks for the reply.

I cannot simply comment out a task because it is a communication task; without communication there is no crash.

I doubled the stack sizes of the tasks related to the communication, but the crash still occurs; sometimes, though rarely, the problem shows up in xTaskGenericNotifyFromISR().

Just my curiosity: the problem is on

c00256d8: ldr r0, [pc, #612] @ 0xc0025944

or at the

c00256dc: pop {r1} @ (ldr r1, [sp], #4)

line?

That is, is the problem with the pointer at pc+612 or with the sp register?
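
One way to reason about this, sketched under assumptions: the cpsr value in the dump (0x80000197) has mode bits 0x17, which is Abort mode on ARMv7-A, so the lr shown may be the banked Abort-mode lr. If that is what the debugger is displaying, the architecture defines lr on abort entry as the address of the aborted instruction plus 8 for a data abort, or plus 4 for a prefetch abort. The type and function names below are hypothetical.

```c
#include <stdint.h>

typedef enum { ABORT_PREFETCH, ABORT_DATA } abort_kind_t;

/* ARMv7-A abort entry: LR_abt = aborted instruction address + 4 for a
 * prefetch abort, + 8 for a data abort. This undoes that offset. */
uint32_t aborted_instruction_address(uint32_t lr_abt, abort_kind_t kind)
{
    return lr_abt - (kind == ABORT_DATA ? 8u : 4u);
}
```

With lr = 0xc00256dc this gives 0xc00256d4 for a data abort and 0xc00256d8 for a prefetch abort, so under these assumptions the instruction at 0xc00256dc itself would not be the one that faulted. Reading the DFSR/DFAR (or IFSR/IFAR) CP15 registers in the debugger should tell which kind of abort occurred and on which address.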

Is it possible to get a clue about the last task we were switching away from, which is probably the task that broke the stack?

Thanks!

If you know the TCB location, you can examine the TCB memory in the debugger and check task name.

When you look at those values, do they look suspicious?

This is the snapshot of the crash, and I have some doubts.

The PC register points to the ‘pop {r1}’ instruction. The R0 register should contain the ulPortTaskHasFPUContextConst address, loaded by the previous instruction, but it still contains the pxCurrentTCBConst address (?), as if that instruction had not been executed.

The sp value is valid, but I think it is the top of a stack. I found this in the map file:

            0x00000400                        __FIQ_STACK_SIZE = 0x400
            0x00000400                        __IRQ_STACK_SIZE = 0x400
            0x00000400                        __ABT_STACK_SIZE = 0x400
            0x00000400                        __SVC_STACK_SIZE = 0x400
            0x00000400                        __UND_STACK_SIZE = 0x400
            0xe0000000                        FIQ_STACK = (__MEM_START__ + __MEM_SIZE__)
            0xdffffc00                        IRQ_STACK = (FIQ_STACK - __FIQ_STACK_SIZE)
            0xdffff800                        ABT_STACK = (IRQ_STACK - __IRQ_STACK_SIZE)
            0xdffff400                        SVC_STACK = (ABT_STACK - __ABT_STACK_SIZE)
            0xdffff000                        UND_STACK = (SVC_STACK - __SVC_STACK_SIZE)
            0xdfffec00                        SYS_STACK = (UND_STACK - __UND_STACK_SIZE)
            0xc0000000                        . = __MEM_START__

So the pop is not possible, is it? The address refers to the ABT_STACK location … (?)
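
The arithmetic can be checked with a short sketch. It assumes that each symbol in the map excerpt is the initial stack pointer of a full-descending stack, so that a stack occupies [top - size, top); the startup code may define this differently. SYS_STACK is left out because its size is not shown in the excerpt, and the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t top;   /* initial SP, taken from the map file */
    uint32_t size;  /* stack size, taken from the map file */
} mode_stack_t;

static const mode_stack_t stacks[] = {
    { "FIQ", 0xe0000000u, 0x400u },
    { "IRQ", 0xdffffc00u, 0x400u },
    { "ABT", 0xdffff800u, 0x400u },
    { "SVC", 0xdffff400u, 0x400u },
    { "UND", 0xdffff000u, 0x400u },
};

#define STACK_COUNT (sizeof stacks / sizeof stacks[0])

/* Index of the stack whose initial (empty) SP equals sp, or -1. */
int stack_with_initial_sp(uint32_t sp)
{
    for (size_t i = 0; i < STACK_COUNT; i++)
        if (stacks[i].top == sp)
            return (int)i;
    return -1;
}

/* Index of the stack whose memory [top - size, top) contains addr, or -1. */
int stack_containing(uint32_t addr)
{
    for (size_t i = 0; i < STACK_COUNT; i++)
        if (addr >= stacks[i].top - stacks[i].size && addr < stacks[i].top)
            return (int)i;
    return -1;
}
```

Under that assumption, sp = 0xdffff800 matches the initial, still empty ABT stack pointer (index 2), while the word at address 0xdffff800 itself lies in the IRQ stack region (index 1). A pop from that sp would therefore read memory just above the ABT stack rather than data pushed on it.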

Again: R1 is NULL, which is the value stored at the pxCurrentTCBConst location (line 98). So the crash should be at line 99, where dereferencing [R1] is not possible?


Checking the R1 value before the SP assignment, I verified that R1 is NULL.

Now I need to understand why the pxCurrentTCB value is NULL.

This should not happen, but it may just be the debugger showing incorrect information.

Would you please increase all of the following stack sizes to something like 0x2000:

__FIQ_STACK_SIZE = 0x400
__IRQ_STACK_SIZE = 0x400
__ABT_STACK_SIZE = 0x400
__SVC_STACK_SIZE = 0x400
__UND_STACK_SIZE = 0x400

Are you calling FreeRTOS APIs from an ISR which runs at a priority higher than configMAX_SYSCALL_INTERRUPT_PRIORITY?

You are running into a common fallacy: trying to understand the complete chain of events that leads to the symptomatic failure.

See, the root cause is very likely some code somewhere in the firmware trampling over memory outside its allotted area. The symptomatic failure happens many, many cycles later, when other code that “owns” the overwritten memory stumbles across the corrupted values. Which code exactly is affected depends on the memory layout and frequently changes between executions (that is, you get different symptoms every time it crashes). If you are lucky, the crashes manifest themselves identically, in which case hardware breakpoints are a great help in pinpointing the root cause.

When developing for RTOSs, you need to adjust your way of thinking: do not look too closely at the symptoms, but know your usual suspects (there is generally just a handful). @aggarg is trying to point you to those.