Privileged mode with SMP Configuration on secondary cores

Hey there !!

I’m investigating a fault that occurs when starting the first task on core 1 in an SMP configuration.

I have a UART trace in which my four tasks are supposed to print “Hi I’m <TASK_ID> - Current core : <CORE_ID>”. What I actually see is tasks 1, 3 and 4 printing their trace from core 0; task 2 never seems to run. My guess is that core 1 tried to run it but was stopped by the fault. As the OS is in cooperative mode (to avoid SysTick-related issues for now), core 0 never picks it up.
The fault report for core 1 indicates FORCED=1, INVSTATE=1 and PENDSVACT=1 (PendSV is always in my troubles! :open_mouth: ). The fault does not occur if I halt core 1 on entry to the PendSV handler, so that is probably the source of my issue.

After some tests, I figured out that the cores are always in privileged mode, even when running the tasks (observed on core 0). From what I understand of the Arm documentation, I’m supposed to switch between Thread mode and Handler mode. Is always being in privileged mode bad practice?
The first instruction of my task implementation is, according to the debugger, SUB SP, SP, #4. Could it cause something like a race condition with the PendSV handler? It’s the same for core 0 and core 1, but core 0 doesn’t run into any trouble, so I’m not sure. Are there specific requirements for privilege levels in an SMP system?

If it helps, here is my implementation of xPortPendSVHandler:

void xPortPendSVHandler( void )
{
    /* This is a naked function. */

    __asm volatile
    (
        "   dsb                                 \n"
        "   isb                                 \n"
        "   mrs r0, psp                         \n"
        "   isb                                 \n"
        "                                       \n"
    #if configNUMBER_OF_CORES == 1
        "   ldr r3, pxCurrentTCBConst           \n" /* Get the location of the current TCB. */
        "   ldr r2, [r3]                        \n"
    #else /* configNUMBER_OF_CORES == 1 */
/* [A */"   push {r4}                           \n" /* save r4 */
/* [B */"   push {r0-r3, r12, lr}               \n" /* Save only these registers as the other are "callee-saved" */
        "   bl LLD_V7M_SCB_GetCoreID            \n" /* Get the current core ID */
        "   mov r4, r0                          \n" /* the core ID is in r4, so we can restore r0 */
/* B] */"   pop{r0-r3, r12, lr}                 \n" /* Restore the previously saved registers */
        "   ldr r3, pxCurrentTCBConst           \n" /* Get the location of the current TCB. */
        "   ldr r2, [r3, r4, lsl #2]            \n" 
/* A] */"   pop {r4}                            \n"
    #endif /* configNUMBER_OF_CORES == 1 */
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context?  If so, push high vfp registers. */
        "   it eq                               \n"
/* [C */"   vstmdbeq r0!, {s16-s31}             \n"
        "                                       \n"
/* [D */"   stmdb r0!, {r4-r11, r14}            \n" /* Save the core registers. */
        "   str r0, [r2]                        \n" /* Save the new top of stack into the first member of the TCB. */
        "                                       \n"
/* [E */"   stmdb sp!, {r0, r3}                 \n"
        "   mov r0, %0                          \n"
        "   msr basepri, r0                     \n"
        "   dsb                                 \n"
        "   isb                                 \n"
    #if configNUMBER_OF_CORES == 1
        "   bl vTaskSwitchContext               \n"
    #else /* configNUMBER_OF_CORES == 1 */
/* [A */"   push {r4}                           \n" /* save r4 */
/* [B */"   push {r0-r3, r12, lr}               \n" /* Save only these registers as the other are "callee-saved" */
        "   bl LLD_V7M_SCB_GetCoreID            \n" /* Get the current core ID */
        // "   mov r0, r4                          \n" /* Place the core ID in r0, the first parameter of a function call */
        "   bl vTaskSwitchContext               \n" /* Actually calling the function */
/* B] */"   pop{r0-r3, r12, lr}                 \n" /* Restore the previously saved registers */
/* A] */"   pop {r4}                            \n"
    #endif /* configNUMBER_OF_CORES == 1 */         /* All as it was before after !!!! */
        "   mov r0, #0                          \n"
        "   msr basepri, r0                     \n"
/* E] */"   ldmia sp!, {r0, r3}                 \n"
        "                                       \n"
        "   ldr r1, [r3]                        \n" /* The first item in pxCurrentTCB is the task top of stack. */
        "   ldr r0, [r1]                        \n"
        "                                       \n"
/* D] */"   ldmia r0!, {r4-r11, r14}            \n" /* Pop the core registers. */
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context?  If so, pop the high vfp registers too. */
        "   it eq                               \n"
/* C] */"   vldmiaeq r0!, {s16-s31}             \n"
        "                                       \n"
        "   msr psp, r0                         \n"
        "   isb                                 \n"
        "                                       \n"
        "                                       \n"




        "   bx r14                              \n"
        "                                       \n"
        "   .align 4                            \n"
    #if configNUMBER_OF_CORES == 1
        "pxCurrentTCBConst: .word pxCurrentTCB  \n"
    #else /* configNUMBER_OF_CORES == 1 */
        "pxCurrentTCBConst: .word pxCurrentTCBs \n"
    #endif /* configNUMBER_OF_CORES == 1 */
        ::"i" ( configMAX_SYSCALL_INTERRUPT_PRIORITY )
    );
}

Thanks for reading

You do not need to worry about privileges as you are not using the Memory Protection Unit (MPU).
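
To make the distinction concrete: on Armv7-M, Thread/Handler mode and privileged/unprivileged are two separate things, so a task running in Thread mode can perfectly well be privileged. Here is a minimal sketch that decodes a CONTROL and IPSR value as you would read them in the debugger. The struct and function names are made up for illustration, they are not part of FreeRTOS or CMSIS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit fields of the Armv7-M CONTROL and IPSR registers. */
#define CONTROL_NPRIV_MASK     ( 1UL << 0 ) /* 0 = privileged Thread mode, 1 = unprivileged. */
#define CONTROL_SPSEL_MASK     ( 1UL << 1 ) /* 0 = MSP, 1 = PSP (Thread mode only). */
#define IPSR_EXCEPTION_MASK    ( 0x1FFUL )  /* 0 = Thread mode, otherwise the active exception number. */

/* Hypothetical helper type, for illustration only. */
typedef struct
{
    bool handlerMode;
    bool privileged;
    bool usesPsp;
} CoreState_t;

/* Decode register values read from the debugger (or via MRS on target). */
static CoreState_t decodeCoreState( uint32_t control, uint32_t ipsr )
{
    CoreState_t state;

    state.handlerMode = ( ipsr & IPSR_EXCEPTION_MASK ) != 0U;

    /* Handler mode is always privileged; Thread mode depends on nPRIV. */
    state.privileged = state.handlerMode || ( ( control & CONTROL_NPRIV_MASK ) == 0U );

    /* Handler mode always uses the MSP; Thread mode follows SPSEL. */
    state.usesPsp = ( !state.handlerMode ) && ( ( control & CONTROL_SPSEL_MASK ) != 0U );

    return state;
}
```

With CONTROL = 0x2 and IPSR = 0, which is what I would expect for a task in a port without MPU support, this gives Thread mode, privileged, running on the PSP.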

If the fault is forced, you may want to enable the usage fault and bus fault handlers so that the original fault is reported instead of being escalated to a HardFault. Refer to this document from ARM - https://www.keil.com/appnotes/files/apnt209.pdf.
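
Enabling them comes down to setting three bits in the System Handler Control and State Register (SHCSR, at 0xE000ED24). A minimal sketch, with the register passed in as a pointer so the logic can also be tried off-target; the function name is only an example. As far as I know the SCB is private to each core, so I would assume every core has to do this for itself.

```c
#include <stdint.h>

/* Enable bits in the Armv7-M SHCSR register. */
#define SHCSR_MEMFAULTENA    ( 1UL << 16 )
#define SHCSR_BUSFAULTENA    ( 1UL << 17 )
#define SHCSR_USGFAULTENA    ( 1UL << 18 )

/* Hypothetical helper. On target, pass ( volatile uint32_t * ) 0xE000ED24. */
static void enableConfigurableFaults( volatile uint32_t * shcsr )
{
    /* Read-modify-write so that the active/pended status bits are preserved. */
    *shcsr |= SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
}
```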

I actually do use it: I have a custom platform that can’t start with the cache disabled, so I created a non-cacheable / non-bufferable / shareable region for kernel data such as pxCurrentTCBs (that was the solution to my previous topic).
I hadn’t worried about it until now, as the “normal version” of my project (single-core software, developed by senior devs) uses the MPU without any precautions (or at least none that I can see); even the MPU wrappers are not used. I think I’ll have to ask them about it, as this is getting very specific to this hardware. :confused:

I’ll take a look, thanks! :smiley:

Even then, all of your tasks will run as privileged and you do not need to worry about it. Once you enable the usage fault, bus fault and mem fault handlers as I mentioned in my last response, it will be clear whether you are hitting a memory fault.

OK, some news!
The three faults are now enabled; this is confirmed by USGFAULTENA and co. being set to 1, so I no longer get a HardFault. I still have INVSTATE at 1, so I’m investigating that.
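
For reference, the three flags mentioned in this thread live in three different SCB registers: FORCED in HFSR (0xE000ED2C), INVSTATE in the UsageFault part of CFSR (0xE000ED28) and PENDSVACT in SHCSR (0xE000ED24). A small sketch of how the raw values can be decoded; the struct and function names are invented for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define HFSR_FORCED        ( 1UL << 30 ) /* A configurable fault was escalated to HardFault. */
#define CFSR_INVSTATE      ( 1UL << 17 ) /* UsageFault: instruction executed with EPSR.T == 0. */
#define SHCSR_PENDSVACT    ( 1UL << 10 ) /* The PendSV exception is active. */

/* Hypothetical summary type, for illustration only. */
typedef struct
{
    bool forced;
    bool invState;
    bool pendSvActive;
} FaultSummary_t;

/* Decode raw register values, e.g. copied from the debugger's register view. */
static FaultSummary_t summarizeFault( uint32_t hfsr, uint32_t cfsr, uint32_t shcsr )
{
    FaultSummary_t summary;

    summary.forced = ( hfsr & HFSR_FORCED ) != 0U;
    summary.invState = ( cfsr & CFSR_INVSTATE ) != 0U;
    summary.pendSvActive = ( shcsr & SHCSR_PENDSVACT ) != 0U;

    return summary;
}
```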

The PC is not relevant (it is already in the fault handler). I checked my NVIC_IABR registers (0 to 15) and all are equal to 0x00000000: I don’t have any active interrupt. I interpret this as “the PSP is in use when the fault occurs”, so I looked at the memory at the PSP and around it: nothing to report. As I had a doubt, I looked around the MSP too, and what I found is quite unexpected:

  • What was pushed onto it matches what is currently in R0-R3;
  • The stored LR and PC are equal, and when I check the disassembly at this address, I see DC32 0x72727543.

I’m not worried about the word itself: it matches a piece of a string the task uses. But as I understand it, DC32 denotes a data word, a directive rather than a regular instruction to be executed? Yet the core does try to execute it: the PC reaches it, and an execution breakpoint I set on it is triggered. When I step once after this breakpoint, the fault occurs, and in my call stack I see (before the “”) “@10001DFC()”, which is the address where I have the DC32 and the breakpoint!

I went to this address in the core 0 disassembly. I see the same DC32, but it is not “executed”: an execution breakpoint on it is never triggered. I suppose that is the expected behaviour?

So, for a reason I can’t figure out yet, the MSP is used instead of the PSP, and my PC points to an address it should not. Still investigating!!
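
A note on telling which stack holds the stacked frame: rather than inferring it from the NVIC registers, bit 2 of the EXC_RETURN value (the LR on entry to the fault handler) says whether the hardware pushed the frame onto the MSP or the PSP. Below is a rough sketch of that check together with the basic frame layout and a test of the Thumb bit in the stacked xPSR, which is the bit INVSTATE is about. The type and function names are illustrative only, and the layout assumes a basic frame without FPU state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Basic frame pushed by hardware on exception entry, lowest address first.
 * Assumption: no floating-point context was stacked. */
typedef struct
{
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} ExceptionFrame_t;

#define EXC_RETURN_SPSEL_MASK    ( 1UL << 2 )  /* 1 = frame is on the PSP, 0 = on the MSP. */
#define XPSR_T_MASK              ( 1UL << 24 ) /* Thumb state bit, must be 1 on Cortex-M. */

/* Pick the stack that holds the frame, based on the EXC_RETURN value. */
static const ExceptionFrame_t * selectFrame( uint32_t excReturn, const void * msp, const void * psp )
{
    return ( const ExceptionFrame_t * ) ( ( ( excReturn & EXC_RETURN_SPSEL_MASK ) != 0U ) ? psp : msp );
}

/* A stacked xPSR with T == 0 leads to INVSTATE once execution resumes from it. */
static bool frameHasThumbState( const ExceptionFrame_t * frame )
{
    return ( frame->xpsr & XPSR_T_MASK ) != 0U;
}
```

On target, the EXC_RETURN value has to be captured from LR in a small assembly stub at the very start of the fault handler, before any C code overwrites it.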

Seems like an invalid jump/branch to an incorrect address.

Yes, you are not supposed to execute data.

Good luck!

If the return address from the trap is pointing to data, and not code, then the likely cause is that the program somehow jumped into the data, perhaps (since LR points to it) as a return from a subroutine. One cause is that some task overwrote the stack, and corrupted the stack and its return addresses.

One thing you can do is look at the current TCB for the core that trapped, and see which task it was that crashed. That is most likely the task that overwrote the stack, but if it passes the address of its stack to some other task which is running on another core, that could have done it too.

Hey!!

I finally found the answer, in case anyone is interested! The issue was in xPortPendSVHandler. At the beginning of this handler, I correctly use the core ID to index pxCurrentTCBs, whose address is in r3. However, after the call to vTaskSwitchContext, which changes the address contained in pxCurrentTCBs[0] or pxCurrentTCBs[1], there is a pop that restores the address of pxCurrentTCBs into r3. My mistake was to use r3 directly without indexing the array, which resulted in core 1 trying to use the TCB and the stack of the task actually assigned to core 0. ^^
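
For anyone who lands here with the same symptom, here is the mistake modelled in plain C. This is only a sketch: SketchTCB_t is a stand-in for the kernel's real TCB, and both function names are made up. The unindexed version corresponds to "ldr r1, [r3]" after the switch; the indexed version corresponds to the scaled load used at the start of the handler ("ldr r2, [r3, r4, lsl #2]").

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's TCB: only the first member matters here. */
typedef struct
{
    uint32_t * pxTopOfStack; /* First member of the TCB, as the handler's comments state. */
} SketchTCB_t;

/* Buggy lookup, equivalent to "ldr r1, [r3]": always reads element 0,
 * i.e. the task selected for core 0, whichever core is executing. */
static uint32_t * topOfStackUnindexed( SketchTCB_t * const * currentTCBs )
{
    return currentTCBs[ 0 ]->pxTopOfStack;
}

/* Fixed lookup: index the per-core array with the ID of the executing core.
 * In assembly the index is scaled by the pointer size (lsl #2 on a 32-bit core). */
static uint32_t * topOfStackForCore( SketchTCB_t * const * currentTCBs, uint32_t coreId )
{
    return currentTCBs[ coreId ]->pxTopOfStack;
}
```

In the handler itself, that presumably means the load after vTaskSwitchContext needs the same indexed form as the one at the top, with the core ID held in a register that is still valid at that point (in the code above, r4 has already been popped back to the task's value by then).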

Thanks for the help :heart:

Thank you for reporting back!