Hard fault in xPortPendSVHandler at startup, STM32L476, STMCube

I previously had the same tasks working OK. After I added some functionality, I cannot even get the tasks running: osKernelStart() does not complete.

I tried to use the hard fault handler here, but it does not complete either.

The instruction that causes the fault is in xPortPendSVHandler:
ldmia.w r0!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}

It’s tripping on the second pass to this function after reset. None of the tasks are up yet.

Most of those registers hold 0xa5a5a5a5, except r7, which holds a valid RAM address.

lr holds 0x8010185, which oddly is not aligned.

r0 is set to the end of RAM, 0x20018000, which looks wrong. But my assembly is rusty.

Can anyone suggest what I’m looking for to fix this?

PendSV_Handler:
08010150:   mrs     r0, PSP
08010154:   isb     sy
08010158:   ldr     r3, [pc, #84]   ; (0x80101b0 <PendSV_Handler+96>)
0801015a:   ldr     r2, [r3, #0]
0801015c:   tst.w   lr, #16
08010160:   it      eq
08010162:   vstmdbeq        r0!, {s16-s31}
08010166:   stmdb   r0!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
0801016a:   str     r0, [r2, #0]
0801016c:   stmdb   sp!, {r0, r3}
08010170:   mov.w   r0, #80 ; 0x50
08010174:   msr     BASEPRI, r0
08010178:   dsb     sy
0801017c:   isb     sy
08010180:   bl      0x800e990 <vTaskSwitchContext>
08010184:   mov.w   r0, #0
08010188:   msr     BASEPRI, r0
0801018c:   pop     {r0, r3}
0801018e:   ldr     r1, [r3, #0]
08010190:   ldr     r0, [r1, #0]
08010192:   ldmia.w r0!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}

PendSV should not execute if the scheduler has not started, so I suspect the first task has already run. That in turn suggests osKernelStart() (which is NOT a function we provide, but wraps vTaskStartScheduler(), which IS a function we provide) has also executed correctly.

You don’t say what functionality you added. Are you sure the new functionality executes correctly without crashing, or that the new functionality does not use a lot more stack causing the task stack to overflow? Make sure to go through the following page to have things like stack overflow detection on, configASSERT() defined, etc: https://freertos.org/FAQHelp.html
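For reference, the settings mentioned above might look roughly like the fragment below. This is only a sketch of a FreeRTOSConfig.h excerpt plus an application hook, not a complete file. The hook body (mask interrupts and spin) is an assumption about what is convenient under a debugger, and the exact type of pcTaskName differs between kernel versions.

```c
/* FreeRTOSConfig.h excerpt (sketch only, not a complete configuration). */

/* 1 = check the stack pointer only; 2 = additionally check that the end of
 * the stack still holds the fill pattern. Both run at context switch time. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* Stop on a failed assertion so the call stack can be inspected. */
#define configASSERT( x )    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* Application code: called by the kernel when an overflow is detected.
 * Parameter types may differ slightly in older kernel versions. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char * pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;          /* inspect the name in the debugger */
    taskDISABLE_INTERRUPTS();
    for( ;; );                    /* assumption: halting here is acceptable */
}
```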

Inside the interrupt LR does not hold an address but the EXC_RETURN value. You may want to check the value is valid, but it does not have to be aligned.

0xa5a5a5a5 is the value used to fill the stack when a task is created, so it is normal for this value to be seen in any registers that have not been used since the task started running.
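The same fill pattern can be used to estimate how much of a stack has never been touched. The sketch below shows the idea on a plain byte buffer; unused_stack_bytes() is a hypothetical helper written for illustration, not a kernel function, and it assumes a stack that grows downward from the high address. In a real project the kernel's own uxTaskGetStackHighWaterMark() does this job (it reports words, not bytes, and needs INCLUDE_uxTaskGetStackHighWaterMark set to 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u   /* fill value described in the thread */

/* Hypothetical helper: count how many bytes at the low end of a descending
 * stack still hold the fill pattern, i.e. were never written by the task. */
size_t unused_stack_bytes(const uint8_t *stack_low, size_t stack_size)
{
    assert(stack_low != NULL);

    size_t n = 0;
    while (n < stack_size && stack_low[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A result at or near zero means the task has used (or overrun) its whole stack at some point.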

A couple of notes. On Arm processors, the LSB of a code address held in LR is the Thumb mode bit, not part of the address (the LSB of the PC itself is always 0). On Cortex-M machines that bit is always set, so the ‘unaligned’ value is correct.
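To make that concrete, masking off bit 0 of the reported LR gives 0x08010184, which in the listing is the instruction immediately after the bl to vTaskSwitchContext. That suggests LR still held the return address from that call when the fault hit, rather than an EXC_RETURN value. The helper names below are made up for illustration, and the EXC_RETURN test is a simplification for Cortex-M4 class parts:

```c
#include <assert.h>
#include <stdint.h>

#define THUMB_BIT 0x1u   /* bit 0 of a code address selects Thumb state */

/* Hypothetical helper: the instruction address a Thumb LR value points at. */
uint32_t thumb_target_address(uint32_t lr)
{
    return lr & ~(uint32_t)THUMB_BIT;
}

/* Hypothetical helper: true if the Thumb bit is set in the value. */
int has_thumb_bit(uint32_t lr)
{
    return (lr & THUMB_BIT) != 0u;
}

/* Hypothetical helper: EXC_RETURN values look like 0xFFFFFFxx, so they can
 * never be mistaken for a flash address such as 0x0801xxxx. */
int looks_like_exc_return(uint32_t lr)
{
    return (lr & 0xFFFFFF00u) == 0xFFFFFF00u;
}
```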

I suspect that 0x20018000 is the top of the main stack with nothing on it; that would fit a machine with 96k of internal RAM.
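The arithmetic behind that: 0x20000000 + 96 * 1024 = 0x20018000, which is one past the last byte of a 96k region. The faulting ldmia loads nine words (r4-r11 and lr, 36 bytes) upward from r0, so with r0 at that value the read would start outside the region. A small sketch of the check, assuming RAM is a single 96k block at 0x20000000 (the actual memory map should be confirmed against the device's reference manual); range_in_ram() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed memory map: one 96k RAM block starting at 0x20000000. */
#define RAM_BASE 0x20000000u
#define RAM_SIZE (96u * 1024u)
#define RAM_END  (RAM_BASE + RAM_SIZE)   /* one past the last valid byte */

/* Hypothetical helper: true if [addr, addr + len) lies entirely in RAM. */
int range_in_ram(uint32_t addr, uint32_t len)
{
    assert(len > 0u);
    return addr >= RAM_BASE && addr < RAM_END && len <= RAM_END - addr;
}
```

Under these assumptions range_in_ram(0x20018000u, 36u) is false, so a saved task context cannot start there.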

Thanks for the tips. I had implemented vApplicationStackOverflowHook() yet it never ran for this stack overflow. Some of the higher priority tasks were beginning, but not getting far. Other tasks never got a chance to run.

I was only able to diagnose by removing tasks one at a time until I could isolate the culprit.

Overflow detection detects the problem AFTER the overflow has occurred and the task gets switched out, and it is possible that the system will crash before that happens.

When the system crashes, you can look at pxCurrentTCB to see which task was running, which is one likely culprit for the problem. (It is still possible that some other task previously corrupted something in memory without directly overflowing its stack, and that this is the source of the problem.)
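The last three loads in the PendSV listing follow exactly this path: read pxCurrentTCB, read the first word of the TCB (the saved top-of-stack pointer), then pop the context from there. The sketch below mimics that with a cut-down stand-in for the TCB. MockTCB and saved_context_pointer() are hypothetical; only the fact that pxTopOfStack is the first member reflects the real kernel structure, and the name field's size and position are simplified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the kernel's task control block. */
typedef struct MockTCB {
    uint32_t *pxTopOfStack;   /* first member, as the port code relies on */
    char      pcTaskName[16]; /* simplified; real layout has more fields */
} MockTCB;

/* Mirrors "ldr r1, [r3]; ldr r0, [r1]": follow the current-TCB pointer,
 * then read the saved stack pointer stored at offset 0 of the TCB. */
const uint32_t *saved_context_pointer(MockTCB *const *current_tcb)
{
    assert(current_tcb != NULL && *current_tcb != NULL);
    return (*current_tcb)->pxTopOfStack;
}
```

In a debugger the equivalent is to inspect pxCurrentTCB, check that its first word points inside that task's stack, and read the task name to see which task was being switched in.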