SMP Porting to Cortex R5

We are attempting to port the SMP version of FreeRTOS to the Cortex-R5. The work is inspired by the RP2040 port.
I'd appreciate a porting checklist.

  1. Tick interrupt: does it run only on the master core, identically on both (all) cores, or in some other scheme?
  2. Bring-up and initialization of the second core. Our plan is a simple loop that waits for the sp, pc (xPortStartSchedulerOnCore or xPortStartScheduler) and interrupt vector table address, and jumps to the given pc once they are set. Is this correct?
  3. What else do we need to know to implement multi-core-safe mutexes, locks, etc.? Is there some dependency on a processor SDK (outside the FreeRTOS source tree)?
  4. Is there any static data the kernel uses that we need to be aware of?
  5. If I understand correctly, both cores run the same scheduler, which picks tasks from a queue visible to both cores. Is that correct?

Our target is offloading computation tasks to the second core; we have no need for interrupt/peripheral access on the second core.

Any help would be appreciated.

Best regards

Good questions and we certainly need to create a porting checklist for SMP.

For now here you go:

  1. You will need a tick interrupt on 1 core. The tick handler will execute portYIELD_CORE(corenumber) when the other core(s) need to reschedule their work.
  2. You will need to do something specific to your architecture to launch the second core. The first core to run will call xPortStartScheduler() and your porting layer is expected to get all the cores running. On the RP2040 you can see it with an API call to multicore_launch_core1(functionPointerToLaunch).
  3. The key piece is to ensure the two LOCKS (below) are implemented with hardware.
  4. The kernel data is protected by two locks. These are the TASK lock and the ISR lock. You will need to ensure you have implemented portGET_TASK_LOCK(), portRELEASE_TASK_LOCK(), portGET_ISR_LOCK() and portRELEASE_ISR_LOCK(). These need to be implemented in such a way that NO other core can get past a held lock. On the RP2040 this is a hardware spinlock, so when the second core reaches the GET it will halt until the other core executes the RELEASE.
  5. Exactly right. You can lock tasks to specific cores with core affinity. Doing so may increase the execution cycles available to a specific task but it will also make SOME context switches require two checks of the waiting task queue because the highest priority task may not be allowed to run on the core.

Thank you very much for fast reply!
I need some clarification for question #1.
I want to make sure that I understand you correctly.
The tick interrupt runs only on the first (0) master core and triggers scheduling on the second core.
There is no tick interrupt on the slave cores.
Is that correct?

Best regards

That is correct.

You can see this behavior in the RP2040 port by looking at xPortStartSchedulerOnCore(). You will see that the first thing done in that function is to check the core number; if it is the primary core, it starts the tick timer interrupt.

The tick interrupt eventually calls portYIELD_CORE(corenumber), which is macro'ed to vYieldCore(corenumber) in portmacro.h (RP2040 port). The implementation is below, but the key pieces are:

  1. It will only yield a different core than the one running…
  2. It writes to a HW FIFO for intercore communications. That write triggers an interrupt that is used to cause the core to run the scheduler.
```c
void vYieldCore( int xCoreID )
{
    configASSERT( xCoreID != portGET_CORE_ID() );

    /* Non blocking, will cause interrupt on other core if the queue
     * isn't already full, in which case an IRQ must be pending */
    sio_hw->fifo_wr = 0;
}
```

OK. So instead of a tick interrupt, the master core sends a software interrupt to the slave core, right?

Another question about cache coherency.
We have 2 R5 cores whose caches are not coherent. How would you suggest ensuring RTOS memory coherency? Are memory barriers enough?

Yes. The master core takes the tick and issues a soft interrupt to the remaining cores. The XMOS port is a good example of going beyond 2 cores.

Any I-Cache will be fine. D-Cache could be used but you will need to take special care.

  1. Lock the affinity of tasks that will use the data cache so they stay on a single CPU.
  2. Ensure that cached data is not shared between tasks that are not affined to a single CPU.

If the cache were flushed on every context switch you could use the cache as you would in a single-core system, but because context switches can happen asynchronously, tasks that share data can easily run into cache issues.

For the interested observers the ARM technical documentation is here: Documentation – Arm Developer

Some thought experiments:

  1. Task X moves from core A to core B.
    — The context save should flush the cache and this should be fine.
  2. Task X shares data with Task Y. X is running in A and Y is running in B.
    — The shared data must have a mutex. The FreeRTOS mutex would need to be using non-cached RAM.
  3. FreeRTOS operations are running in both cores.
    — Critical memory is protected with HW spinlocks.
    — The Critical memory must either be non-cached… or,
    — The critical sections must flush the cache on release and invalidate the cache on get. This is not demonstrated in the current ports.

Posting a link to this page: Officially supported and contributed FreeRTOS code - some definitions, to show the path for contributing your port once it's running.

Once we get acceptable results and a clean port, we will upload the code to GitHub.

Hi Joseph,

I’m able to trigger an interrupt on core1 from void vYieldCore( int xCoreID ) when executed on core0.
When you say “cause the core to run the scheduler”, which function exactly must be invoked in the ISR executed on core1?



Going over RP2040/port.c: void prvFIFOInterruptHandler() again, my guess is that portYIELD_FROM_ISR() should be invoked in the ISR. Is that correct?

Another question: xPortStartSchedulerOnCore() is invoked on both core0 and core1, which suggests that the inter-core interrupt is registered on both cores. If so, core1 will also trigger an interrupt on core0 when vYieldCore() is invoked.

Is that correct?

Thanks a lot,


Yes, you will call portYIELD_FROM_ISR(TRUE). You can see that behavior here: FreeRTOS-Kernel/port.c at 4832377117b4198db43009f2b548497d9cdbf8da · FreeRTOS/FreeRTOS-Kernel · GitHub

Your second question is exactly correct. In the pico port, the FIFO interrupt is configured for both directions between the cores, and either core can trigger portYIELD_FROM_ISR(TRUE) for the opposite core. This allows core B to reschedule core A for reasons other than the tick, e.g. a write to a queue can cause a higher priority task to unblock.