FreeRTOS SMP: port testing

Hi, I am trying to test my SMP port as suggested here (even though I didn’t mention that SMP was involved at all, my bad). I think the test implemented in IntQueue.c keeps failing because some of its assumptions may be violated. The error is reported at line 614:

if( xQueueSend( xNormallyFullQueue, &uxTxed, intqONE_TICK_DELAY ) != errQUEUE_FULL )
{
    /* Should only succeed when the higher priority task is suspended */
    if( eTaskGetState( xHighPriorityNormallyFullTask1 ) != eSuspended )
    {
        prvQueueAccessLogError( __LINE__ );
    }

    vTaskResume( xHighPriorityNormallyFullTask1 );
    uxLowPriorityLoops2++;
}

That condition seems like one that would hold only in a single-core environment.
Am I missing something about the test, or does it actually work only on a single core? If so, is there a set of tests for SMP ports?

Hi @Matth9814
You can take a look at these demos for reference.
They include the comprehensive demo as well, which is the one you are trying to run.

These tests were designed for single-core FreeRTOS, so I wouldn’t be surprised if you find an issue in the test when running it on SMP. Having said that, can you try setting configRUN_MULTIPLE_PRIORITIES to 0 in your FreeRTOSConfig.h?
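For reference, the suggested change is a one-line edit in FreeRTOSConfig.h. A minimal excerpt might look like this; the configNUMBER_OF_CORES value of 2 is an assumption (a dual-core target such as the ZC702's Cortex-A9 pair), and everything else in the file is omitted.

```c
/* FreeRTOSConfig.h (excerpt): only the SMP-related settings are shown. */

/* Assumption: a dual-core target. */
#define configNUMBER_OF_CORES            2

/* 0: the scheduler does not run tasks of different priorities at the same
 * time on different cores (idle tasks aside), approximating the single-core
 * behavior that tests such as IntQueue.c were written against. */
#define configRUN_MULTIPLE_PRIORITIES    0
```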

Thank you, that worked. Unfortunately, I am also having an issue at line 686, inside the second interrupt handler: it seems that the tasks that interact with the normally empty queue are draining it in the middle of the transmit-receive sequence:

timerNORMALLY_EMPTY_TX();
timerNORMALLY_EMPTY_TX();

timerNORMALLY_EMPTY_RX(); 
timerNORMALLY_EMPTY_RX();

Aside from this, all other tests pass without problems.

EDIT: Placing the code above in a critical section solves the problem, but I do not know whether it invalidates the test in some way.
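The effect of the critical section can be illustrated with a small deterministic model in plain C (no FreeRTOS involved): the handler performs TX, TX, RX, RX on the normally empty queue while a task on another core may receive from the same queue in between. QueueModel, model_send(), model_receive() and run_isr_sequence() are hypothetical names, and the queue length of 10 is arbitrary. On a real port the critical section inside the handler would presumably use taskENTER_CRITICAL_FROM_ISR() and taskEXIT_CRITICAL_FROM_ISR(); the post does not say which macro was used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the normally empty queue: only the item count matters. */
typedef struct
{
    unsigned count;
    unsigned capacity;
} QueueModel;

/* Mirrors a non-blocking send: fails when the queue is full. */
static bool model_send( QueueModel * q )
{
    assert( q != NULL );
    if( q->count == q->capacity )
    {
        return false;
    }
    q->count++;
    return true;
}

/* Mirrors a non-blocking receive: fails when the queue is empty. */
static bool model_receive( QueueModel * q )
{
    assert( q != NULL );
    if( q->count == 0U )
    {
        return false;
    }
    q->count--;
    return true;
}

/* Runs the handler's TX, TX, RX, RX sequence. `drains` is how many items a
 * task on another core receives while the handler runs. Without a critical
 * section those receives land between the TXs and the RXs; with one, they can
 * only happen after the sequence completes. Returns the number of failed RXs,
 * each of which the real test would log as an error. */
static unsigned run_isr_sequence( unsigned drains, bool in_critical_section )
{
    QueueModel q = { 0U, 10U }; /* starts empty; length 10 is arbitrary */
    unsigned failed_rx = 0U;
    unsigned i;

    assert( drains <= 2U ); /* the sequence only ever enqueues two items */

    ( void ) model_send( &q );
    ( void ) model_send( &q );

    if( !in_critical_section )
    {
        for( i = 0U; i < drains; i++ )
        {
            ( void ) model_receive( &q ); /* the other core wins the race */
        }
    }

    if( !model_receive( &q ) ) failed_rx++;
    if( !model_receive( &q ) ) failed_rx++;

    if( in_critical_section )
    {
        for( i = 0U; i < drains; i++ )
        {
            /* Deferred until the lock is released; may find the queue empty. */
            ( void ) model_receive( &q );
        }
    }

    return failed_rx;
}
```

For example, run_isr_sequence(1, false) returns 1 (one RX fails and would be logged), while run_isr_sequence(2, true) returns 0 because the other core's receives can only happen after the sequence completes.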

That seems okay, as it ensures that tasks running on other cores do not interfere with the queue accesses in this timer handler.


I have to correct my initial statement. Setting configRUN_MULTIPLE_PRIORITIES to 0 does not solve the problem. There are still cases that trigger the error, although they are really rare, and I cannot think of a scenario in which this could happen without multiple priorities running at the same time. For the moment, I can only observe the issue after running for a long time (30 minutes, 2 hours and even 6 hours so far) with all the tests enabled. At this point, the solution could be to simply remove those checks in the low priority tasks when configNUMBER_OF_CORES > 1. I want to stress that the interrupt frequencies I am using are the same as in the ZC702 demo.
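A minimal sketch of what removing the check only on multi-core builds could look like, using a preprocessor guard on configNUMBER_OF_CORES. So that it compiles without the kernel, eTaskState is redeclared as a local stand-in, and prvCheckAfterSuccessfulSend() and uxErrorCount are hypothetical; in IntQueue.c the check would stay inline and call prvQueueAccessLogError( __LINE__ ).

```c
#include <assert.h>

/* Assumption: on a real build this comes from FreeRTOSConfig.h. */
#ifndef configNUMBER_OF_CORES
    #define configNUMBER_OF_CORES    2
#endif

/* Local stand-in for FreeRTOS's eTaskState so the sketch compiles on its own. */
typedef enum
{
    eRunning,
    eReady,
    eBlocked,
    eSuspended,
    eDeleted,
    eInvalid
} eTaskState;

/* Hypothetical error counter standing in for prvQueueAccessLogError(). */
static unsigned uxErrorCount = 0U;

/* Hypothetical helper: the check that follows a successful xQueueSend() on
 * the normally full queue in the low priority task. */
static void prvCheckAfterSuccessfulSend( eTaskState eHighPriorityState )
{
    #if ( configNUMBER_OF_CORES == 1 )
        /* Single core: the send can only succeed while the higher priority
         * task is suspended, so any other state is an error. */
        if( eHighPriorityState != eSuspended )
        {
            uxErrorCount++; /* real code: prvQueueAccessLogError( __LINE__ ); */
        }
    #else
        /* SMP: a ready higher priority task may not have been switched in
         * yet, so its state proves nothing here; skip the check. */
        ( void ) eHighPriorityState;
    #endif
}
```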

@Matth9814 I mentioned in another thread that I was also working on this test, and I have it running now.

I am using #define configRUN_MULTIPLE_PRIORITIES 0, and I also needed to put these macros in a critical section:

timerNORMALLY_EMPTY_TX();
timerNORMALLY_EMPTY_TX();

timerNORMALLY_EMPTY_RX(); 
timerNORMALLY_EMPTY_RX();

So far I have not seen this error:

/* Should only succeed when the higher priority task is suspended */
if( eTaskGetState( xHighPriorityNormallyFullTask1 ) != eSuspended )
{
    prvQueueAccessLogError( __LINE__ );
}

But I have only run it for less than an hour so far. I’ll let it run over the weekend and see what happens.


I have finally captured a trace of the error; I am attaching it here in case someone wants to analyze it with Percepio View: trace_normallyEmpty_extended.bin.zip (418.9 KB)

The error is detected inside L2QRx (prvLowerPriorityNormallyFullTask) when both H1QTx (xHighPriorityNormallyFullTask1) and H2QTx (xHighPriorityNormallyFullTask2) are blocked. However, only the state of H1QTx is actually checked. The sequence of events is the following:

  1. H1QTx blocks itself for 140 ticks.

  2. H2QTx is blocked because xNormallyFullQueue is full.

  3. RegTask1 and RegTask4, two tasks with the idle priority, execute on Core#1 and Core#0 respectively, while an ISR (xSecondTimerHandler) executes on Core#0. The ISR prevents RegTask4 from yielding, while RegTask1 does yield.

  4. L2QRx starts executing on Core#1 even though, in the meantime, xSecondTimerHandler has received one item from xNormallyFullQueue, waking H2QTx from the blocked state. However, H2QTx cannot be switched in, because configRUN_MULTIPLE_PRIORITIES is set to 0 and RegTask4 is still technically running on Core#0.

  5. L2QRx successfully executes xQueueSend(). At this point the error could already be detected, because H1QTx is blocked rather than suspended; in the analyzed trace, however, L2QRx is preempted before the check runs, because the ISR on Core#0 ends.

  6. H2QTx is blocked again because the only available spot in the queue has been taken by L2QRx.

  7. When L2QRx is resumed the error is finally spotted.
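The sequence above can be replayed as a simplified, single-threaded C model that tracks only the queue fill level and the states of the two high priority tasks. The type and function names and the queue length of 4 are hypothetical, and scheduling is reduced to the order of the steps. It shows why the check fires: H2QTx becomes ready but is not switched in, so L2QRx takes the free slot, and H1QTx is blocked rather than suspended.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of task states, enough for this replay. */
typedef enum { eRunning, eReady, eBlocked, eSuspended } TaskStateModel;

typedef struct
{
    unsigned uxCount;       /* items in xNormallyFullQueue */
    unsigned uxCapacity;    /* queue length (arbitrary here) */
    TaskStateModel eH1;     /* H1QTx (xHighPriorityNormallyFullTask1) */
    TaskStateModel eH2;     /* H2QTx (xHighPriorityNormallyFullTask2) */
    bool xErrorLogged;      /* set where L2QRx would call prvQueueAccessLogError() */
} TraceModel;

/* Non-blocking send: fails when the queue is full. */
static bool prvModelSend( TraceModel * pxModel )
{
    if( pxModel->uxCount == pxModel->uxCapacity )
    {
        return false;
    }
    pxModel->uxCount++;
    return true;
}

/* Replays steps 1-7 of the trace in order. */
static TraceModel prvReplayTrace( void )
{
    TraceModel xModel = { 4U, 4U, eRunning, eRunning, false }; /* queue starts full */
    bool xL2Sent;

    xModel.eH1 = eBlocked;              /* 1. H1QTx delays itself for 140 ticks. */
    xModel.eH2 = eBlocked;              /* 2. H2QTx blocks: the queue is full. */

    assert( xModel.uxCount > 0U );
    xModel.uxCount--;                   /* 3-4. The ISR receives one item... */
    xModel.eH2 = eReady;                /*      ...H2QTx is ready, not switched in. */

    xL2Sent = prvModelSend( &xModel );  /* 5. L2QRx's xQueueSend() succeeds. */

    if( !prvModelSend( &xModel ) )      /* 6. H2QTx runs, finds the queue full... */
    {
        xModel.eH2 = eBlocked;          /*    ...and blocks again. */
    }

    /* 7. L2QRx resumes and checks only H1QTx's state. */
    if( xL2Sent && ( xModel.eH1 != eSuspended ) )
    {
        xModel.xErrorLogged = true;
    }

    return xModel;
}
```

On a single core, H2QTx would normally have preempted L2QRx as soon as the ISR made it ready and refilled the queue, so L2QRx's xQueueSend() would have returned errQUEUE_FULL instead.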

Basically, the error happens because the interrupted low priority task prevents H2QTx from being switched in before L2QRx. I don’t think it makes sense to make assumptions about the states of H1QTx and H2QTx in a multi-core environment. Even relaxing the check to also accept a blocked H1QTx (that is, logging an error only when it is neither suspended nor blocked) would not be safe, in my opinion, because as far as I know the same scenario could also happen with H1QTx in the ready state. Removing the check seems to solve the problem: the tests have now run for more than 15 hours without any error being detected.

I raised a PR to include the demo in the community-supported demos.

I think this may show a problem with the configRUN_MULTIPLE_PRIORITIES 0 code: if the interrupt readied a higher priority task, that task SHOULD be started right away, and all the other cores should also switch to a task of the same priority (which really shouldn’t exist unless several were woken in that ISR) or to an idle task.
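The expected behavior described above can be written down as a simplified selection rule: find the highest ready priority, give each core a ready task of that priority, and let any remaining core run its idle task. This is a hypothetical model of the expectation stated in this post, not the kernel's actual scheduling code; prvSelectTasksModel() and IDLE_TASK are made-up names.

```c
#include <assert.h>
#include <stddef.h>

#define IDLE_TASK    ( -1 )   /* hypothetical marker: the core runs its idle task */

/* Hypothetical model of single-priority selection: every core runs a ready
 * task of the highest ready priority, and cores left over run their idle
 * task. piPriorities[i] is the priority of ready task i; piCoreTask[c]
 * receives the index of the task chosen for core c, or IDLE_TASK. */
static void prvSelectTasksModel( const int * piPriorities, size_t xNumTasks,
                                 int * piCoreTask, size_t xNumCores )
{
    int iTopPriority = -1;
    size_t xCore = 0U;
    size_t i;

    assert( ( piPriorities != NULL ) || ( xNumTasks == 0U ) );
    assert( piCoreTask != NULL );

    for( i = 0U; i < xNumTasks; i++ )
    {
        if( piPriorities[ i ] > iTopPriority )
        {
            iTopPriority = piPriorities[ i ];
        }
    }

    /* Only tasks at the top priority may run alongside each other. */
    for( i = 0U; ( i < xNumTasks ) && ( xCore < xNumCores ); i++ )
    {
        if( piPriorities[ i ] == iTopPriority )
        {
            piCoreTask[ xCore++ ] = ( int ) i;
        }
    }

    while( xCore < xNumCores )
    {
        piCoreTask[ xCore++ ] = IDLE_TASK;
    }
}
```

With ready priorities { 0, 0, 0, 3 } on two cores, the first core gets the priority 3 task (index 3) and the second gets IDLE_TASK, so none of the priority 0 tasks keeps running next to it.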

I can well believe that the “hack” of limiting the scheduler to running just a single priority may have problems, as it exists only as a workaround for code with SMP bugs, so it might not get the same level of testing.

I agree with you: ideally, the higher priority task should be switched in on the other core as soon as it is ready, even if the interrupted task has a different (lower) priority. I can also confirm that setting configRUN_MULTIPLE_PRIORITIES to 0 is definitely a workaround for tests that were developed for single-core environments.

For what it’s worth, I ran it all weekend without seeing the error. I am also using the ZC702 demo as the basis. Here are some notes on my test setup:

There is apparently a bug in the newest version of the Xilinx drivers, so only 2 of the 6 timers are available. So I ran the test with only the 2000 Hz and 2001 Hz timers. The 20,000 Hz timer is not used for the test result anyway.
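For a sense of scale, here is the arithmetic behind these rates: 2000 Hz is a 500 µs period, 2001 Hz about 499.75 µs, and 20,000 Hz 50 µs. Because the first two differ by 1 Hz, their interrupts drift past each other and return to the same relative timing once per second. The helper names below are hypothetical, and the integer division truncates the 2001 Hz period to 499,750 ns.

```c
#include <assert.h>

/* Period of a periodic interrupt in nanoseconds (integer division truncates). */
static unsigned long prvPeriodNs( unsigned long ulHz )
{
    assert( ulHz > 0UL );
    return 1000000000UL / ulHz;
}

/* Two periodic interrupts at nearby rates return to the same relative phase
 * |f1 - f2| times per second (the beat frequency). */
static unsigned long prvBeatHz( unsigned long ulHz1, unsigned long ulHz2 )
{
    return ( ulHz1 > ulHz2 ) ? ( ulHz1 - ulHz2 ) : ( ulHz2 - ulHz1 );
}
```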

The Xilinx port, before modification, did not support using the FPU in interrupts, so I did not include the asserts for the FPU calculations.

What optimization flag are you using? I started seeing the error with -O2/-O3.
I used embeddedsw version 2024.1, and there was no problem with the TTC driver, so I enabled all three TTCs.
As for the FPU calculations, I also excluded them, because using vApplicationFPUSafeIRQHandler() with so many interrupts seems to overload the CPU so completely that nothing is printed on the UART. But that could also be because I did not use a thread-safe print function.

Here is the TTC driver issue. I guess it’s only a problem with the SDT drivers, which have had a lot of issues since they were introduced.

"Each TTC timer is supposed to have three timer instances. This flow is treating them as a single instance.

The last two timer instances for each timer are not available for use."

And my testing was with an -O2 build. Checking an -O3 build is still on my to-do list.

OK, at this point the only open questions are:

  1. Which FreeRTOS version are you using? I observed the error on FreeRTOS 11.1.0.
  2. How many tests are you running? I could observe the error only when enough tests were enabled.

The fact that you are not using the 20 kHz timer may also matter, because in my test it may have helped give L2QRx enough time to execute xQueueSend() before H2QTx was switched in.

I’m using the latest release, 11.2.

I’m running every task in main_full except for vStartMessageBufferAMPTasks, vUARTCommandConsoleStart, and vRegisterSampleCLICommands.

I agree; I want to implement the 20 kHz timer for completeness. If the next Xilinx release fixes the timer issue, I’ll update the test. Otherwise, I might set up a PL interrupt at 20 kHz instead.
