SMP Port - Debugging xPortPendSVHandler on second core

Hey there!
Still on my port for my dual-core Cortex-M7! Core 0 is working without any trouble, and it handles all the tasks of my demo app (just “Hello my name is…” tasks). But core 1 goes into HardFault. I observed several things, but I don’t know what to conclude.
First, the fault report indicates that PENDSVACT is set to 1, and the address it reports is in xPortPendSVHandler, so I can say I made a mistake while reworking this handler. Unfortunately, I didn’t see it earlier: core 0 did not encounter any issue, neither now in the SMP configuration nor earlier, when I tried my modifications in single core (to check whether I got the same behavior).
Here is my modified version:

void xPortPendSVHandler( void )
{
    /* This is a naked function. */

    __asm volatile
    (
        "   mrs r0, psp                         \n"
        "   isb                                 \n"
        "                                       \n"
    #if configNUMBER_OF_CORES == 1
        "   ldr r3, pxCurrentTCBConst           \n" /* Get the location of the current TCB. */
        "   ldr r2, [r3]                        \n"
    #else /* configNUMBER_OF_CORES == 1 */
/* [A */"   push {r4}                           \n" /* save r4 */
/* [B */"   push {r0-r3, r12, lr}               \n" /* Save only these registers, as the others are "callee-saved" */
        "   bl LLD_V7M_SCB_GetCoreID            \n" /* Get the current core ID */
        "   mov r4, r0                          \n" /* the core ID is in r4, so we can restore r0 */
/* B] */"   pop {r0-r3, r12, lr}                \n" /* Restore the previously saved registers */
        "   ldr r3, pxCurrentTCBConst           \n" /* Get the location of the current TCB. */
        "   ldr r2, [r3, r4, lsl #2]            \n" 
/* A] */"   pop {r4}                            \n"
    #endif /* configNUMBER_OF_CORES == 1 */
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context?  If so, push high vfp registers. */
        "   it eq                               \n"
/* [C */"   vstmdbeq r0!, {s16-s31}             \n"
        "                                       \n"
/* [D */"   stmdb r0!, {r4-r11, r14}            \n" /* Save the core registers. */
        "   str r0, [r2]                        \n" /* Save the new top of stack into the first member of the TCB. */
        "                                       \n"
        /* ... */

(Sorry, I don’t know how to add line numbers.)

The /* [A */ comments were a tool for me to make sure all my pushes and pops are in the correct order. But let’s check the 4th line from the end: stmdb{r0-r3, r12, lr}. I kept it from the existing FreeRTOS port for the M7, so I don’t think it’s wrong to use it. From what I understand, this instruction pushes my registers onto the stack and decrements the value of r0. I observed the registers before executing this instruction:

  • r0, r4-r11 = 0x00000000
  • R14 = LR = 0xFFFFFFF9

r4-r11 and r14 have strange values, but why not. When the fault occurs, it’s the first time a task is started, so let’s assume it’s because they are not used yet. But why is r0 0x00000000? According to the first instruction, mrs r0, psp, r0 must hold the value of the process stack pointer. 0x00000000 is the base address of the stack for core 0, while the base address of the stack for core 1 is 0x34000000 (verified in the VTOR). But if I put a breakpoint at mrs... I see it actually returns 0x00000000.
This is what I don’t understand and where I need help. :frowning:
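
One thing that may help with reading the LR value listed above: inside a handler, LR holds an EXC_RETURN code, and on ARMv7-M its low bits say where the core will return to. Below is a minimal sketch that decodes it. The struct and function names are made up for illustration and are not part of FreeRTOS or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of an ARMv7-M EXC_RETURN value (the LR seen inside a handler). */
typedef struct {
    bool returns_to_thread; /* bit 3: 1 = return to Thread mode, 0 = Handler mode */
    bool uses_psp;          /* bit 2: 1 = frame is on the process stack, 0 = main stack */
    bool has_fp_frame;      /* bit 4: 0 = an extended (FPU) frame was stacked */
} exc_return_info_t;

/* Hypothetical helper, named for this example only. */
static exc_return_info_t decode_exc_return(uint32_t lr)
{
    /* EXC_RETURN values always have the upper bits set (0xFFFFFFxx). */
    assert((lr & 0xFFFFFF00u) == 0xFFFFFF00u);

    exc_return_info_t info;
    info.returns_to_thread = (lr & (1u << 3)) != 0u;
    info.uses_psp          = (lr & (1u << 2)) != 0u;
    info.has_fp_frame      = (lr & (1u << 4)) == 0u;
    return info;
}
```

With this decoding, 0xFFFFFFF9 reads as a return to Thread mode using the main stack (MSP) with no FPU frame, whereas a running FreeRTOS task would normally show 0xFFFFFFFD (Thread mode, PSP). That would be consistent with PSP never having been set on this core, though it is only a reading of the register value, not a confirmed diagnosis.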

Thanks for reading !!
– Edit –
Corrected a typo: pop{r0-r3, r12, lr} → stmdb{r0-r3, r12, lr}

If you are describing the pop instruction, it is not correct. This pop instruction pops the values from the stack into the registers r0-r3, r12, lr and updates the SP.

Is the Core 1 setup correctly? Are you able to run code on Core1 without FreeRTOS?

Hi, thanks for answering!

Indeed, I was not describing pop; I made a typo (now corrected). I was talking about stmdb r0!, {r4-r11, r14}, so “STore Multiple Decrement Before” (ARM Reference).
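
To make the “decrement before” behavior concrete, here is a small host-side C model of stmdb r0!, {r4-r11, r14}. It is only a sketch under stated assumptions: the function names are hypothetical, and it models the address arithmetic and the store order, not real bus behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* New base address after "stmdb rN!, {list}" with reg_count registers.
 * The base drops by 4 bytes per register; unsigned math wraps modulo 2^32. */
static uint32_t stmdb_new_base(uint32_t base, uint32_t reg_count)
{
    return (uint32_t)(base - 4u * reg_count);
}

/* Same operation on a mock stack held in a host array.
 * sp points one word past the current top; the return value is the
 * written-back base (what r0 holds after the instruction). */
static uint32_t *stmdb_model(uint32_t *sp, const uint32_t *regs, size_t count)
{
    assert(sp != NULL && regs != NULL);

    sp -= count;                        /* decrement before storing */
    for (size_t i = 0; i < count; i++)
    {
        sp[i] = regs[i];                /* lowest register at lowest address */
    }
    return sp;                          /* writeback ("!") */
}
```

With nine registers (r4-r11 plus r14) the base drops by 36 bytes, so a base of 0x00000000 wraps to 0xFFFFFFDC, which is unlikely to be writable stack memory.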

Yes, it was the first thing I did when I started this project. The core starts correctly, and if I don’t try to start the scheduler, its main is a loop sending “core 1 is alive” every second (on a dedicated UART line), using only the LLD.

Then your explanation is correct.

Are you making sure that the first task is started correctly on both cores? Take a look at the RP2040 port for reference: FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c (main branch of FreeRTOS/FreeRTOS-Kernel on GitHub).

Well, I can’t do a side-by-side comparison, as I’m not starting core 1 the way the RP2040 does. The senior devs here set (using DCF) the address from which the core fetches its code at startup. I placed the scatter-loading operations there (far shorter than for core 0), then it branches to what I want: the core 1 main function. Maybe the same thing is behind multicore_launch_core1? I see it enters the function prvDisableInterruptsAndPortStartSchedulerOnCore on core 1. In the implementation of this function, I see portDISABLE_INTERRUPTS(); is called, and I had indeed forgotten it. Corrected! A future issue avoided!!
But it doesn’t solve my fault. However, I have new observations!!
I wanted to see, on core 0, when and where psp is modified, to compare with core 1. After a short session of step-by-step execution, I found it’s set by vPortSVCHandler:

void vPortSVCHandler( void )
{
    __asm volatile (
        "   ldr r3, pxCurrentTCBConst2      \n" /*Takes the address pxCurrentTCB in r3 */
    #if configNUMBER_OF_CORES == 1
        "   ldr r1, [r3]                    \n" /* Takes the CurrentTCB itself in r1 based on the address pxCurrentTCB */
    #else /* configNUMBER_OF_CORES == 1 */
/* [A */"   push {r4}                       \n" /* Saving r4 */
/* [B */"   push {r0-r3, r12, lr}           \n" /* Save only these registers, as the others are "callee-saved" */
        "   bl LLD_V7M_SCB_GetCoreID        \n" /* Get the current core ID */
        "   mov r4, r0                      \n" /* the core ID is in r4, so we can restore r0 */
/* B] */"   pop {r0-r3, r12, lr}            \n" /* Restore the previously saved registers */
        "   ldr r1, [r3, r4, lsl #2]        \n"
/* A] */"   pop {r4}                        \n" /* Restoring r4 */
    #endif /* configNUMBER_OF_CORES == 1 */
        "   ldr r0, [r1]                    \n" /* Takes first item in pxCurrentTCB : this is the task top of stack. */
        "   ldmia r0!, {r4-r11, r14}        \n" /* Pop the registers to restore them */
        "   msr psp, r0                     \n" /* set psp as the current task top of stack. */
        "   isb                             \n"
        "   mov r0, #0                      \n"
        "   msr basepri, r0                 \n"



        "   bx r14                          \n"
        "                                   \n"
        "   .align 4                        \n"
    #if configNUMBER_OF_CORES  == 1
        "pxCurrentTCBConst2: .word pxCurrentTCB             \n"
    #else /* configNUMBER_OF_CORES == 1 */
        "pxCurrentTCBConst2: .word pxCurrentTCBs           \n"
    #endif /* configNUMBER_OF_CORES == 1 */
    );
}

Precisely, it’s the instruction msr psp, r0. r0 depends on r1, itself set by ldr r1, [r3, r4, lsl #2]. Here is the issue IMO: the word in r1 is loaded from address 0x30004334, which is in esRAM. That address is where I expect to find pxCurrentTCBs[1], according to the map produced during the build, but the loaded word is 0x00000000!! I don’t know why it has this value; if I use the memory view at this address, I don’t see the same values on the two cores. :open_mouth:
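
As a sanity check on the address itself, the effective address of ldr r1, [r3, r4, lsl #2] is r3 + (r4 << 2), i.e. the array base plus the core ID times the pointer size. The sketch below reproduces that arithmetic on a host machine. The helper name is hypothetical, and the base 0x30004330 is inferred from the 0x30004334 quoted above assuming 4-byte pointers; it is not read from the map file.

```c
#include <assert.h>
#include <stdint.h>

/* Effective address of "ldr rT, [rBase, rIndex, lsl #shift]".
 * Thumb-2 only allows a shift of 0..3 in this addressing mode. */
static uint32_t indexed_address(uint32_t base, uint32_t index, unsigned shift)
{
    assert(shift <= 3u);
    return base + (index << shift);
}
```

For core ID 1 this gives 0x30004330 + 4 = 0x30004334, so the handler is reading the expected slot; the open question is why that slot reads as zero from core 1.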

For core 1 :

address    +0x0     +0x4     +0x8     +0xC
0x30004314 00000000 00000000 00000000 00000000
0x30004324 00000000 00000000 00000000 00000000
0x30004334 00000000 00000000 30004340 FFFFFFFF
0x30004344 30004340 30004340 00000000 30004354
0x30004354 FFFFFFFF 30004354 30004354 00000006

For core 0 :

address    +0x0     +0x4     +0x8     +0xC
0x30004314 00000000 0000000A 0000000C 0000FFFF
0x30004324 00000000 00000000 00000000 30004680
0x30004334 300048E8 00000002 30004340 FFFFFFFF
0x30004344 30004684 300048EC 00000001 30004354
0x30004354 FFFFFFFF 30004B54 30004B54 00000006

And if I go check at 0x300048E8 (the address stored at 0x30004334), it’s the same observation: the contents differ completely between the two cores!

So I think my issue is about how my RAM is initialized?? I have another clue telling me my RAM is a mess: the disassembly in Ozone, in the file view, gives me the warning “Value may be incorrect: Runtime code areas (RAM) not yet initialized.”

So now my next steps are to see what this warning means, maybe it will give me an indication about what to correct. :smiley:

I think before jumping to PendSV handler, you should try to make sure that the first task on both the cores start successfully.

I think you are on the right track!