FreeRTOS SMP Port to VS

Why do you think that queues cannot be used on SMP systems?

They can’t be used between cores on an AMP system, but SMP is fine.

If your queues are having problems, it is more likely that your SMP port isn’t doing what it needs to.

Your viewpoint is likely correct, but I am uncertain about the root cause. Let me describe the current state of affairs on my end:

The timer daemon task (Tmr Svc) is bound to core 0.

In the main function, the “Start For Core0 Task” is created and bound to core 0 with a priority of 5.

In the main function, the “Start For Core1 Task” is created and bound to core 1 with a priority of 5.

The “Start For Core0 Task” creates the “TestA” task, bound to core 0 with priority 10; TestA waits for semaphore A. A periodic Timer A is then created and, once creation succeeds, started. Next a one-shot Timer B is created. Timer A’s callback, once its count reaches 14, starts Timer B. Timer B’s callback gives semaphore A; when TestA obtains the semaphore, it sends a message to TestB.
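To make the described timer chain concrete, here is a minimal sketch; all names (prvTimerACallback, prvTimerBCallback, xTimerB, xSemaphoreA) are illustrative, not from the original code:

#include <stdint.h>
#include "FreeRTOS.h"
#include "timers.h"
#include "semphr.h"

static TimerHandle_t xTimerB;         /* one-shot Timer B (illustrative) */
static SemaphoreHandle_t xSemaphoreA; /* semaphore A (illustrative) */

/* Periodic Timer A: on the 14th expiry, start the one-shot Timer B. */
static void prvTimerACallback( TimerHandle_t xTimer )
{
    static uint32_t ulCount = 0;

    ( void ) xTimer;

    if( ++ulCount == 14 )
    {
        ( void ) xTimerStart( xTimerB, 0 );
    }
}

/* One-shot Timer B: give semaphore A so TestA unblocks and sends to TestB. */
static void prvTimerBCallback( TimerHandle_t xTimer )
{
    ( void ) xTimer;
    ( void ) xSemaphoreGive( xSemaphoreA );
}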

The “Start For Core1 Task” creates the “TestB” task, bound to core 1 with priority 7. TestB uses xStreamBufferIsEmpty to check whether the stream buffer holds any messages. If the buffer is empty, it prints “no message to process”; otherwise it uses xStreamBufferReceive to retrieve messages from TestA and process them.

#define portENTER_CRITICAL() vTaskEnterCritical()
#define portEXIT_CRITICAL() vTaskExitCritical()

configNUMBER_OF_CORES: 2
configUSE_CORE_AFFINITY: 1
configTIMER_SERVICE_TASK_CORE_AFFINITY: 0
configUSE_PREEMPTION: 0
configUSE_TIME_SLICING: 1
configRUN_MULTIPLE_PRIORITIES: 1

From the trace, the following task switch process is observed:

On core 0: IDLE0 → Tmr Svc → Start For Core0 Task → TestA task → Tmr Svc (periodic timer A) → TestA task →
… → Tmr Svc (one-shot timer B): gives semaphore → TestA task: sends message to TestB → …

On core 1: IDLE1 → Start For Core1 Task → TestB (prints “no message to process”) → TestB (prints “no message to process”) → TestB (prints “no message to process”) → …

Note: TestA continuously sends messages to TestB until the stream buffer is full, at which point TestA blocks. If TestB skips the xStreamBufferIsEmpty check and calls xStreamBufferReceive directly, TestB blocks immediately. If TestA and TestB communicate using a queue instead, both sides block right away. From the current behaviour, it appears that each core schedules only its own tasks, and the TestA task on core 0 cannot communicate with the TestB task on core 1; tasks on the same core, however, can still communicate with each other using semaphores.
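As an aside, a blocking receive with a bounded timeout avoids the busy-poll pattern entirely. A minimal sketch, assuming an illustrative handle xStreamBuffer:

#include <stdint.h>
#include "FreeRTOS.h"
#include "stream_buffer.h"

static StreamBufferHandle_t xStreamBuffer; /* illustrative handle */

static void prvReceiveOnce( void )
{
    uint8_t ucRxData[ 32 ];
    size_t xReceived;

    /* Block for up to 100 ms waiting for data instead of busy-polling. */
    xReceived = xStreamBufferReceive( xStreamBuffer,
                                      ucRxData,
                                      sizeof( ucRxData ),
                                      pdMS_TO_TICKS( 100 ) );

    if( xReceived == 0 )
    {
        /* Timed out - nothing to process this time around. */
    }
}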

In the current state, are there any good solutions? Or are there any good directions to try out?

When task A adds a message to the buffer, if there is a task ON ANY CORE waiting for that buffer, the kernel should move that task from the Blocked state to the Ready state. And if the readied task has a higher priority than a task currently executing on any core it is allowed to run on, the kernel should send the cross-core signal to force a reschedule on that core.
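A simplified sketch of that rule, using the kernel-internal names pxCurrentTCBs and portYIELD_CORE; the helper name is illustrative and this is not the actual kernel code:

/* Not the actual kernel code - a simplified sketch of the rule above. */
static void prvYieldCoreForReadiedTask( UBaseType_t uxAffinityMask,
                                        UBaseType_t uxReadiedPriority )
{
    BaseType_t xCoreID;

    for( xCoreID = 0; xCoreID < ( BaseType_t ) configNUMBER_OF_CORES; xCoreID++ )
    {
        /* The readied task may run on this core... */
        if( ( uxAffinityMask & ( ( UBaseType_t ) 1U << ( UBaseType_t ) xCoreID ) ) != 0U )
        {
            /* ...and outranks whatever the core is currently running. */
            if( pxCurrentTCBs[ xCoreID ]->uxPriority < uxReadiedPriority )
            {
                portYIELD_CORE( xCoreID ); /* the cross-core signal */
                break;
            }
        }
    }
}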

Your description sounds like the cross core signaling isn’t working, which would be a problem in the SMP port layer.

Could you take the stream buffer as an example: when TestA on core 0 sends a message to TestB on core 1 through the stream buffer, where in the kernel code does this cross-core signal take effect? I will debug around that section of code to find out what is preventing the cross-core signal from working properly.

It is implemented internally and uses the macro portYIELD_CORE. Check your implementation of this macro and see if you are able to interrupt core 1 from core 0.
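For example, a quick way to test the macro in isolation might look like the following sketch; portGET_CORE_ID and portYIELD_CORE are the standard SMP port macros, while the surrounding task is illustrative:

#include "FreeRTOS.h"
#include "task.h"

/* Illustrative test task: from core 0, fire the cross-core yield at core 1
 * and verify (breakpoint or trace) that core 1 re-runs its scheduler. */
static void prvYieldCoreTestTask( void * pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        if( portGET_CORE_ID() == 0 )
        {
            portYIELD_CORE( 1 );
        }

        vTaskDelay( pdMS_TO_TICKS( 1000 ) );
    }
}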

Does this mean that I should look for the macro portYIELD_CORE in the stream buffer API and then debug portYIELD_CORE? I don’t understand the stream buffer API well, so could you give me the exact code location of the portYIELD_CORE macro? That way I can debug the source more accurately and quickly, and trace the root cause from there downwards.

The Stream Buffer will call xTaskNotify to wake up the receiving task, to let it know data is present.

Notify will call taskYIELD_ANY_CORE_IF_USING_PREEMPTION, which will call prvYieldForTask, which in turn invokes portYIELD_CORE from the port layer when you are in SMP mode.
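Putting the two replies together, the chain to set breakpoints on might look like this; a sketch based on the names mentioned above, as the exact call sites vary by kernel version:

/*
 * xStreamBufferSend()                               stream_buffer.c
 *   -> sbSEND_COMPLETED() / task notification       wake the receiver
 *     -> taskYIELD_ANY_CORE_IF_USING_PREEMPTION()   macro in tasks.c
 *       -> prvYieldForTask()                        tasks.c
 *         -> portYIELD_CORE( xCoreID )              port layer: break here
 */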

Does this mean that configUSE_PREEMPTION should be set to 1? During actual debugging, however, I could not set configUSE_PREEMPTION to 1, because I sometimes hit a deadlock in which the IDLE task yields while the tick interrupt service routine makes both cores yield simultaneously. I therefore set configUSE_PREEMPTION to 0.

I would like to ask: is the preemption mechanism necessary when scheduling other cores? Can the other core be made to perform scheduling just by sending it a PendSV interrupt, without using the preemption mechanism?

When configUSE_PREEMPTION is set to 0, a context switch will only occur when the Running task enters the Blocked state, or the Running task explicitly yields (manually requests a re-schedule) by calling taskYIELD().

No, it is not. As explained above, the running task needs to either block or yield explicitly.
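For illustration, a minimal sketch of a task that cooperates under configUSE_PREEMPTION == 0 (the task body is illustrative):

#include "FreeRTOS.h"
#include "task.h"

/* With configUSE_PREEMPTION == 0 a task keeps its core until it blocks or
 * yields; without one of the calls below it would monopolise the core. */
static void prvCooperativeTask( void * pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        /* ... do a bounded amount of work ... */

        taskYIELD();                             /* explicit re-schedule request */
        /* or: vTaskDelay( pdMS_TO_TICKS( 10 ) ); - blocks, which also switches */
    }
}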

Thank you very much. Next, I will proceed to debug the reason why cross-core scheduling is not possible on SMP using xTaskNotify and taskYIELD_ANY_CORE_IF_USING_PREEMPTION.

1. After deeper debugging I now understand the earlier behaviour better, and I need to correct my previous description of the phenomenon.

The setup is unchanged from my earlier description: the timer daemon task on core 0, the “Start For Core0/Core1” tasks at priority 5, TestA on core 0 at priority 10, TestB on core 1 at priority 7, and the same critical-section macros and FreeRTOSConfig settings.

From the trace, the following task switch process is observed:

On core 0: IDLE0 → Tmr Svc → Start For Core0 Task → TestA task → Tmr Svc (periodic timer A) → TestA task →
… → Tmr Svc (one-shot timer B): gives semaphore → TestA task: sends message to TestB via xQueue → …

On core 1: IDLE1 → Start For Core1 Task → TestB (receives message via xQueue) → TestB (receives message via xQueue) → TestB (receives message via xQueue) → …

Note: The issue with the stream buffer is likely the same as with the queue, so I will use xQueue as the example since I am more familiar with it, and continue to use xQueue for message passing between cores. The earlier blocking problem with inter-core messaging over xQueue arose because TestB was created too early, before the queue itself had been created, which triggered an assert failure; after adjusting the code, that assert issue is resolved. TestB now receives via xQueueReceive with xTicksToWait set to portMAX_DELAY. Because TestB starts relatively early, the execution sequence on core 1 is roughly as follows:
TestB → xQueueReceive → vTaskPlaceOnEventList → vListInsert → prvAddCurrentTaskToDelayedList → taskYIELD_WITHIN_API in xQueueReceive → vTaskSwitchContext (core 1) → pxCurrentTCBs[1] = TestB, which causes TestB to be executed repeatedly. During the third execution (at which point no message has been sent to TestB yet), the loop in vListInsert,

for( pxIterator = ( ListItem_t * ) &( pxList->xListEnd ); pxIterator->pxNext->xItemValue <= xValueOfInsertion; pxIterator = pxIterator->pxNext )

never terminates: xListEnd’s pxNext points back at xListEnd itself, and xItemValue equals xValueOfInsertion. I don’t understand why TestB is not removed from the ready list and placed on the delayed list. I would expect that after TestB’s first execution it is removed from the ready list and IDLE1 runs, and that once a message arrives in the queue, TestB re-enters the ready list and executes.
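For reference, the receive pattern being described, as a minimal sketch; xTestQueue, Message_t, and prvTestBTask are illustrative names:

#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"

typedef struct { uint32_t ulPayload; } Message_t; /* illustrative type */
static QueueHandle_t xTestQueue;                  /* illustrative handle */

static void prvTestBTask( void * pvParameters )
{
    Message_t xMsg;

    ( void ) pvParameters;

    for( ;; )
    {
        /* Blocks until TestA sends. While waiting, TestB should sit on the
         * queue's event list and the delayed/suspended list, not the ready
         * list, so IDLE1 gets a chance to run. */
        if( xQueueReceive( xTestQueue, &xMsg, portMAX_DELAY ) == pdPASS )
        {
            /* process the message */
        }
    }
}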

2. The code snippet in prvAddCurrentTaskToDelayedList is as follows:

if( uxListRemove( &( pxCurrentTCB->xStateListItem ) ) == ( UBaseType_t ) 0 )
{
    portRESET_READY_PRIORITY( pxCurrentTCB->uxPriority, uxTopReadyPriority );
}

Why not use configNUMBER_OF_CORES to differentiate between the single-core and SMP cases? Why is the above code used directly for both?

3. Why isn’t the idle task created for each core bound to that specific core, and why does every idle task end up with core 0 as its core affinity?

This seems like the list or memory is corrupted.

Can you try to debug why that is not happening and if that is the cause of the corruption?

The code to remove a task from ready list would be same in single core and SMP. Why do you think that the code should be different?

There is no benefit of doing that.

What do you mean by this? Can you link the specific code line you are referring to?

Because single-core builds use pxCurrentTCB to obtain the current TCB, while SMP builds use pxCurrentTCBs to obtain the current TCB for each core. Why can pxCurrentTCB still be used within this function to obtain the TCB of each core under SMP? How is this achieved? Initially I thought this was the reason the TestB task on core 1 could not be removed from the ready state.

Because pxCurrentTCB is defined to xTaskGetCurrentTaskHandle() for SMP here - FreeRTOS-Kernel/tasks.c at main · FreeRTOS/FreeRTOS-Kernel · GitHub.
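Paraphrasing the relevant lines from tasks.c (check your kernel version for the exact form):

/* In an SMP build the single-core name resolves to a per-core lookup, so
 * pxCurrentTCB always means "the TCB of the core executing this code". */
#if ( configNUMBER_OF_CORES > 1 )
    #define pxCurrentTCB    xTaskGetCurrentTaskHandle()
#endif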

Okay, I will continue to debug in the direction you suggested. Currently, it seems that using uxQueueMessagesWaiting has bypassed the aforementioned issue, and TestB is able to start as planned. I may need to continue debugging to see if it meets the expected outcome.
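A sketch of the workaround being described, reusing the illustrative xTestQueue and xMsg names from the earlier sketch:

/* Poll the queue depth first so the receive never blocks; this sidesteps
 * the delayed-list path where the hang was observed. */
if( uxQueueMessagesWaiting( xTestQueue ) > 0U )
{
    ( void ) xQueueReceive( xTestQueue, &xMsg, 0 );
}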


In prvCreateIdleTasks, it calls

xTaskCreateStatic( pxIdleTaskFunction,
                   cIdleName,
                   uxIdleTaskStackSize,
                   ( void * ) NULL,
                   portPRIVILEGE_BIT,
                   pxIdleTaskStackBuffer,
                   pxIdleTaskTCBBuffer );

to create the Idle task for each core. However, in xTaskCreateStatic,

#if ( ( configNUMBER_OF_CORES > 1 ) && ( configUSE_CORE_AFFINITY == 1 ) )
{
    pxNewTCB->uxCoreAffinityMask = configTASK_DEFAULT_CORE_AFFINITY;
}
#endif

This results in both idle tasks having the core affinity set to core 0.

This is the reason why the task switching on core 1, when calling prvSelectHighestPriorityTask, includes the following code snippet:

if( pxTCB->xTaskRunState == taskTASK_NOT_RUNNING )
{
    #if ( configUSE_CORE_AFFINITY == 1 )
        if( ( pxTCB->uxCoreAffinityMask & ( ( UBaseType_t ) 1U << ( UBaseType_t ) xCoreID ) ) != 0U )
    #endif
    {
        pxCurrentTCBs[ xCoreID ]->xTaskRunState = taskTASK_NOT_RUNNING;

        #if ( configUSE_CORE_AFFINITY == 1 )
            pxPreviousTCB = pxCurrentTCBs[ xCoreID ];
        #endif

        pxTCB->xTaskRunState = xCoreID;
        pxCurrentTCBs[ xCoreID ] = pxTCB;
        xTaskScheduled = pdTRUE;
    }
}

This branch is never taken, and as a result pxCurrentTCBs[xCoreID] keeps pointing at TestB. So even after TestB is removed from pxReadyTasksLists, it is executed again once the scheduling completes. Therefore the core affinity of core 1’s idle task needs to be bound to core 1 itself.

The default value of configTASK_DEFAULT_CORE_AFFINITY is tskNO_AFFINITY and not 0 - FreeRTOS-Kernel/include/FreeRTOS.h at main · FreeRTOS/FreeRTOS-Kernel · GitHub. Are you setting configTASK_DEFAULT_CORE_AFFINITY to 0 in your FreeRTOSConfig.h?
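Paraphrasing the default from include/FreeRTOS.h:

/* Unless FreeRTOSConfig.h overrides it, newly created tasks (including the
 * idle tasks) may run on any core. */
#ifndef configTASK_DEFAULT_CORE_AFFINITY
    #define configTASK_DEFAULT_CORE_AFFINITY    tskNO_AFFINITY
#endif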

Yes, but if I configure it as 1, then both idle tasks are bound to core 1, which is also a problem: the idle task on core 0 does not continue to run after its first run.
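Worth noting: uxCoreAffinityMask is a bit mask rather than a core index, so the value 1 is bit 0. A sketch of the intended usage, with an illustrative xTaskHandle:

/* Affinity is expressed as one bit per core:
 *   ( 1U << 0 )    -> core 0 only
 *   ( 1U << 1 )    -> core 1 only
 *   tskNO_AFFINITY -> any core */
vTaskCoreAffinitySet( xTaskHandle, ( 1U << 1 ) ); /* pin to core 1 */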

Remove that setting from your FreeRTOSConfig.h so that the default value will be used.

Thank you, I will try the suggestion.