Assert happens in prvCheckForRunStateChange() on SMP

Hi
I am porting FreeRTOS SMP to a Cortex-A32 (AArch32, 2 cores) and have encountered a problem. An assert fires in prvCheckForRunStateChange():

         /* Enabling interrupts should cause this core to immediately
         * service the pending interrupt and yield. If the run state is still
         * yielding here then that is a problem. */
        configASSERT( pxThisTCB->xTaskRunState != taskTASK_SCHEDULED_TO_YIELD );

The function that sets TCB->xTaskRunState to taskTASK_SCHEDULED_TO_YIELD is prvYieldCore():

            portYIELD_CORE( xCoreID );                                               \
            pxCurrentTCBs[ xCoreID ]->xTaskRunState = taskTASK_SCHEDULED_TO_YIELD;   \

In my implementation, portYIELD_CORE( xCoreID ) sends an IPI (an SGI on GICv2). The IPI handler sets “ulPortYieldRequired[ portGET_CORE_ID() ] = pdTRUE”, and the target core responds correctly.
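For readers unfamiliar with GICv2 SGIs, here is a minimal sketch of how the value written to the distributor's GICD_SGIR register (offset 0xF00) could be built for a single target core. The helper name ulBuildSgirValue and the use of SGI 0 as the yield IPI are assumptions made for illustration; they are not taken from the poster's port.

```c
#include <stdint.h>

/* GICv2 distributor Software Generated Interrupt Register (GICD_SGIR). */
#define GICD_SGIR_OFFSET               0xF00u
#define GICD_SGIR_TARGET_LIST_SHIFT    16u   /* CPUTargetList, bits 23:16 */
#define GICD_SGIR_INTID_MASK           0xFu  /* SGIINTID, bits 3:0 */

/* Assumption for illustration: SGI 0 is reserved as the yield IPI. */
#define YIELD_SGI_ID                   0u

/* Hypothetical helper: builds the GICD_SGIR value that targets one core.
 * TargetListFilter (bits 25:24) is left at 0b00, which means "forward to
 * the CPU interfaces named in CPUTargetList". */
uint32_t ulBuildSgirValue( uint32_t ulCoreID, uint32_t ulSgiID )
{
    uint32_t ulTargetList = ( 1u << ulCoreID ) << GICD_SGIR_TARGET_LIST_SHIFT;

    return ulTargetList | ( ulSgiID & GICD_SGIR_INTID_MASK );
}
```

A port's portYIELD_CORE( xCoreID ) would then write ulBuildSgirValue( xCoreID, YIELD_SGI_ID ) to the distributor base address plus GICD_SGIR_OFFSET. The base address is platform specific and is not shown here.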

Besides the tick interrupt and the IPI, there is only one other interrupt, which is used for IPC. The problem is not easy to reproduce, so I’m looking for suggestions on how to debug it.

Thanks!

Hi @Saiiijchan ,
One quick check would be to verify that the interrupt-to-core mapping in the GIC is correct. Are all the interrupts arriving on the core that is supposed to handle them?

When you enable interrupts on this line, the task should yield and change its state, shouldn’t it? When the assert hits, can you examine the value of pxThisTCB->xTaskRunState? In one such case that I helped someone debug before, the value was correct when examined in the debugger; there was a cache coherency problem in that case.

Hi @aggarg and @Shub
After the assert fires, I use J-Link to inspect pxCurrentTCBs on the asserting core. xTaskRunState is -2 and the TCB contents are intact, so it is probably not a cache coherency or stack overflow problem. I will check the IRQ settings and the IRQ handling flow.

Hi @aggarg & @Shub
The implementation of FreeRTOS_Tick_Handler causes the problem. After the IPI sets ulPortYieldRequired, a pending tick interrupt fires and clears ulPortYieldRequired. After modifying the tick handler as follows, it works well.

@@ -401,9 +401,7 @@ void FreeRTOS_Tick_Handler( void )
                /* Increment the RTOS tick. */
-                ulPortYieldRequired[ xCoreID ] = xTaskIncrementTick();
+               if ( xTaskIncrementTick() != pdFALSE ){
+                       ulPortYieldRequired[ xCoreID ] = pdTRUE;
+               }
        }
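The ordering problem described above can be reproduced on a host machine with a small single-core model. This is only a sketch: xStubIncrementTick, vIpiHandler, the two handler variants, and xYieldSurvives are hypothetical stand-ins, and the stub assumes a tick that does not itself require a context switch.

```c
#include <stdint.h>

typedef int BaseType_t;
#define pdFALSE    0
#define pdTRUE     1

/* Stand-in for the port's per-core flag; only one core is modelled. */
static volatile uint32_t ulPortYieldRequired[ 1 ] = { pdFALSE };

/* Hypothetical stub for xTaskIncrementTick(): this tick does not
 * unblock a higher-priority task, so no switch is requested. */
static BaseType_t xStubIncrementTick( void )
{
    return pdFALSE;
}

/* Models the IPI (SGI) handler requesting a yield on this core. */
static void vIpiHandler( void )
{
    ulPortYieldRequired[ 0 ] = pdTRUE;
}

/* Original behaviour: plain assignment overwrites an earlier request. */
void vTickHandlerOverwrite( void )
{
    ulPortYieldRequired[ 0 ] = ( uint32_t ) xStubIncrementTick();
}

/* Modified behaviour: the tick handler can set the flag but never clear it. */
void vTickHandlerSetOnly( void )
{
    if( xStubIncrementTick() != pdFALSE )
    {
        ulPortYieldRequired[ 0 ] = pdTRUE;
    }
}

/* Runs "IPI first, then tick" and reports whether the request survived. */
BaseType_t xYieldSurvives( void ( * pxTickHandler )( void ) )
{
    ulPortYieldRequired[ 0 ] = pdFALSE;
    vIpiHandler();
    pxTickHandler();

    return ( BaseType_t ) ulPortYieldRequired[ 0 ];
}
```

In this model, xYieldSurvives( vTickHandlerOverwrite ) returns pdFALSE because the yield request is lost, while xYieldSurvives( vTickHandlerSetOnly ) returns pdTRUE. A lost request matches the symptom in this thread: the run state stays at the yielding value because the core never performs the context switch.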

Thanks for your help :smiley:

Glad that you figured it out!

Dear Aggerwal
I ran into the case you mentioned. I am not familiar with cache coherency; could you please suggest some solutions? Thanks a lot.

When testing another case, I also encountered this problem. Should I add memory barrier instructions somewhere?

Is it possible to test with the caches turned off? Does the problem go away when the cache is turned off? Alternatively, can we move all the TCBs into memory that is not cached and then try again?

Was @aggarg's suggestion to disable the processor caches helpful? Do you need any further assistance with debugging?

If so, it might be better to create a new forum post describing your particular environment and problem you are debugging so we may better assist.

There are still some questions that I haven’t figured out yet.
I configured the MMU to map the heap (including the TCBs) as non-cacheable, and the problem still exists. I am now tracing the cache, MMU, and page table setup. In addition, the debugger's behavior is confusing: when I run "x <address>" in GDB on a virtual address, it is not clear whether the value shown comes from the cache or from physical memory.

I set a breakpoint at 0x60391a38, just before vAssertCalled is entered. At that point, r3 was 0xfffffffe (-2, taskTASK_YIELDING), which differed from the value I read back through the TCB address stored at [fp, #-8]. Is it possible that the value had been modified again by then, because the code was not in a critical section?

60391a18: ebffd4e9 bl 60386dc4
60391a1c: f1080080 cpsie i
60391a20: f57ff04f dsb sy
60391a24: f57ff06f isb sy
60391a28: e51b3008 ldr r3, [fp, #-8]
60391a2c: e5933034 ldr r3, [r3, #52] ; 0x34
60391a30: e3730002 cmn r3, #2
60391a34: 1a000003 bne 60391a48 <prvCheckForRunStateChange+0x128>
60391a38: e300129f movw r1, #671 ; 0x29f
60391a3c: e3040514 movw r0, #17684 ; 0x4514
60391a40: e346003d movt r0, #24637 ; 0x603d
60391a44: ebffd4de bl 60386dc4
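As a reading aid for the disassembly above: "cmn r3, #2" sets the Z flag when r3 + 2 wraps to zero, so the "bne" that skips the assert is not taken only when r3 holds -2. A small host-side sketch of that arithmetic follows. The helper names are hypothetical, and the value of taskTASK_YIELDING is taken from the post rather than from a kernel header.

```c
#include <stdint.h>

/* Value quoted in the post for the yielding run state. */
#define taskTASK_YIELDING    ( -2 )

/* Mirrors "cmn r3, #imm": Z is set when r3 + imm == 0 modulo 2^32. */
int xCmnSetsZeroFlag( uint32_t ulR3, uint32_t ulImmediate )
{
    return ( uint32_t ) ( ulR3 + ulImmediate ) == 0u;
}

/* Reinterprets the raw register value as the signed run state. The
 * conversion is implementation-defined in C, but it yields the
 * two's-complement value on the usual compilers for Arm and x86. */
int32_t lRunStateFromRegister( uint32_t ulR3 )
{
    return ( int32_t ) ulR3;
}
```

With r3 = 0xfffffffe, xCmnSetsZeroFlag( r3, 2 ) is true and lRunStateFromRegister( r3 ) equals taskTASK_YIELDING, so execution falls through to the assert call at 60391a38.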

The source code in tasks.c around line 671:

666 portENABLE_INTERRUPTS();
667
668 /* Enabling interrupts should cause this core to immediately
669 * service the pending interrupt and yield. If the run state is still
670 * yielding here then that is a problem. */
671 configASSERT( pxThisTCB->xTaskRunState != taskTASK_YIELDING );

This should not happen in a normal case, so if it is happening, we need to find out how that value is getting changed. Do you want to have a debug session to look into this? If yes, please send me your email and preferred times in a DM.

Thanks for your help. I noticed that v11.0.0 has been released, and I will try it.

Thanks! Let us know whatever you find.