I am debugging an issue on my custom Cortex-R5 FreeRTOS port. I’m currently using FreeRTOS Kernel V11.1.0.
I traced my wait API call OS_WaitForNotificationIndexed() down to xTaskNotifyWaitIndexed, which maps to xTaskGenericNotifyWait in tasks.c.
Observed behavior:
With cache enabled, the notification wait behaves incorrectly: the task never leaves the wait state.
With cache disabled, it appears to work.
After applying this change (snippet taken from V10.6.1):
--- a/src/os/freertos/tasks.c
+++ b/src/os/freertos/tasks.c
@@ -7769,6 +7769,12 @@ TickType_t uxTaskResetEventItemValue( void )
                 {
                     traceTASK_NOTIFY_WAIT_BLOCK( uxIndexToWaitOn );
                     prvAddCurrentTaskToDelayedList( xTicksToWait, pdTRUE );
+
+                    /* All ports are written to allow a yield in a critical
+                     * section (some will yield immediately, others wait until the
+                     * critical section exits) - but it is not something that
+                     * application code should ever do. */
+                    portYIELD_WITHIN_API();
                 }
                 else
                 {
After adding portYIELD_WITHIN_API at that point, behavior is stable again. My understanding is:
Once the task is moved to the delayed/blocked list, it must yield immediately.
portYIELD_WITHIN_API forces an immediate scheduler handover from inside the kernel API path.
On this Cortex-R5 port, it also includes ordering barriers (DSB/ISB) after triggering the software interrupt, which likely matters more when cache is enabled.
With cache disabled, slower execution can mask this race.
Is this the expected explanation, or is there another port-specific reason why cache-enabled mode exposes this when portYIELD_WITHIN_API is omitted?
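For readers following along, the shape of the macro described above (trigger a software interrupt, then DSB/ISB) can be sketched as follows. This is only an illustration under assumptions: the register is a plain variable standing in for the device-specific trigger register, the key value is made up, and none of the `example` names come from the port in question.

```c
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped software-interrupt trigger
 * register. A real port would write to the device-specific register. */
static volatile uint32_t ulMockSwiTriggerReg = 0u;

/* Hypothetical key value that requests the context-switch interrupt. */
#define EXAMPLE_SWI_REQUEST_KEY    ( 0x1u )

#if defined( __GNUC__ ) && defined( __arm__ )
    /* On a 32-bit Arm target, emit the real barrier instructions. */
    #define EXAMPLE_DSB()    __asm volatile ( "dsb" ::: "memory" )
    #define EXAMPLE_ISB()    __asm volatile ( "isb" ::: "memory" )
#else
    /* On a host build the barriers are no-ops; the volatile write
     * below is still performed. */
    #define EXAMPLE_DSB()    ( ( void ) 0 )
    #define EXAMPLE_ISB()    ( ( void ) 0 )
#endif

/* Request the context switch, then make sure the write has completed (DSB)
 * and the pipeline is refetched (ISB) before execution continues. */
#define examplePORT_YIELD_WITHIN_API()                     \
    do                                                     \
    {                                                      \
        ulMockSwiTriggerReg = EXAMPLE_SWI_REQUEST_KEY;     \
        EXAMPLE_DSB();                                     \
        EXAMPLE_ISB();                                     \
    } while( 0 )
```

Comparing the real port's `portYIELD_WITHIN_API()` and `portYIELD()` definitions against this shape is one way to see whether the barriers are present on both paths.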
Is your custom port an SMP port? When using SMP, proper cache usage becomes very important, as all the processors need to keep a coherent view of memory, and those barrier instructions are important for that.
Note that one of the big changes in the code from v10 to v11 was the built-in support for SMP.
To be precise, it is a port for the TMS570CL435. This processor is Arm Cortex-R5-based. As there is no FreeRTOS port for it, we are writing our own based on the R4 port and the code provided by HalCoGen (the code generator from Texas Instruments). As HalCoGen only supports older FreeRTOS versions, we had to adapt it ourselves.
When adapting this code we did not make any adaptations for SMP. I’m also not familiar with the implications of SMP on the port code.
The TMS570CL435 behaves like a single core. So my understanding is that this is also not necessary if I’m only using the single-core variant. Shouldn’t it then behave like before, without SMP?
taskYIELD_WITHIN_API is called here after resuming the scheduler and it should be enough. Attempting to yield when the scheduler is suspended does not seem like the right thing to do.
If your part is single core, you do not need to worry about SMP.
Unfortunately this port is not applicable for our MCU.
Yes, I also saw that taskYIELD_WITHIN_API() is called shortly after. Could the if clause in between take too much time and cause my problems? I see that this small change makes the difference between a working version and a non-working version, and I want to understand why.
This likely means that the issue is elsewhere and you are just masking the problem by making these small changes. Have you written this R5 port yourself or did you get it from some vendor? Can you do a quick comparison with the port I shared above and see if something is missing (like a barrier etc)?
As I mentioned above, it is a modified version of the Texas Instruments version. I cannot see any changes to the port that could cause this. I’ll upload the files if you want to have a look (I renamed the .asm file to .c to be able to upload it).
One thing I notice is that the port does not disable the MPU before programming it in the context restore code. This is needed for Cortex-M, but I am not sure about Cortex-R. Maybe check with TI whether they are aware of some gotcha when using this port?
Another thing to check is whether the settings of some MPU region (including the internal ones set in vPortStoreTaskMPUSettings) disable the cache unexpectedly.
I got some answers from the TI forum. I’m not sure how to implement them, but maybe they help you to understand the problem better. The answer from TI:
I used an internal AI to find the root cause and recommendations, and I found some useful information. Please refer to it:
What is happening?
With the L1‑D cache enabled the ISR that delivers a notification writes the
notification fields (ulNotifiedValue, ulNotificationState) into a cache
line that is not written back before the scheduler looks at them.
Consequently xTaskNotifyWait() never sees the change and the task stays
blocked.
Why does adding portYIELD_WITHIN_API() fix it?
The macro you added ends with a DSB/ISB (and, on the TMS570, a cache‑clean
of the whole data cache). The extra barrier forces the pending write‑back,
so the scheduler sees the updated TCB and the task wakes.
What needs to be done?
Add proper cache‑maintenance to every place where the ISR or the
scheduler touches a TCB or any other data structure that is also read by
the other context.
Add the required data‑synchronisation barriers (DSB/ISB) after every
write to the PendSV set‑register, after svc/eret, and after leaving
a critical section.
Make the RAM region that holds all FreeRTOS kernel objects (TCBs,
ready‑lists, event‑lists, etc.) non‑cacheable or configure the L1‑D cache
as write‑through for that region.
If the issue is that a write to a cache line “is not seen” by the scheduler, then the system must be in a multi-core mode, as a single core can’t “not see” the change in the cache line. That, or the problem is the AI you are asking doesn’t understand what is actually happening (as is normal for them) and is just answering about the more general most likely cause of something like this without knowing the details.
@richard-damon, Looking at the port files they shared, they are not using SMP. Given that, the only possibility I can think of is that the cache is being disabled for a memory region without first flushing it, causing cached values to be lost. I have encountered this before once when MPU settings for a memory region were marking it as non-cacheable, which resulted in values in that region being lost when the task was switched in.
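The "flush before marking a region non-cacheable" step mentioned above could look roughly like the sketch below. It assumes a 32-byte L1 data cache line (worth confirming for the actual device) and uses the ARMv7-R clean-by-address operation (DCCMVAC) on the target. On any other build the line operation is replaced by a stand-in that only records the address. All `example` names are hypothetical and not part of FreeRTOS or the TI port.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed L1 data cache line size in bytes; verify against the device TRM. */
#define EXAMPLE_DCACHE_LINE_SIZE    ( ( uintptr_t ) 32u )

#if defined( __GNUC__ ) && defined( __ARM_ARCH_7R__ )
    static void exampleCleanLineByMva( uintptr_t uxLineAddress )
    {
        /* DCCMVAC: clean one data cache line by address. */
        __asm volatile ( "mcr p15, 0, %0, c7, c10, 1" : : "r" ( uxLineAddress ) : "memory" );
    }
#else
    /* Host stand-in: remembers the last line address so the walk over the
     * range can be inspected without real cache hardware. */
    static uintptr_t uxLastLineTouched = 0u;

    static void exampleCleanLineByMva( uintptr_t uxLineAddress )
    {
        uxLastLineTouched = uxLineAddress;
    }
#endif

/* Clean every cache line that overlaps [uxStart, uxStart + uxLength).
 * Returns the number of lines visited. Intended to run before the MPU
 * attributes of that range are changed to non-cacheable. */
static size_t exampleCleanDCacheRange( uintptr_t uxStart, size_t uxLength )
{
    size_t uxLines = 0u;

    if( uxLength > 0u )
    {
        const uintptr_t uxMask = ~( EXAMPLE_DCACHE_LINE_SIZE - 1u );
        uintptr_t uxLine = uxStart & uxMask;
        const uintptr_t uxLast = ( uxStart + ( uintptr_t ) uxLength - 1u ) & uxMask;

        for( ; ; )
        {
            exampleCleanLineByMva( uxLine );
            uxLines++;

            if( uxLine == uxLast )
            {
                break;
            }

            uxLine += EXAMPLE_DCACHE_LINE_SIZE;
        }

        #if defined( __GNUC__ ) && defined( __ARM_ARCH_7R__ )
            /* Wait for the clean operations to complete. */
            __asm volatile ( "dsb" ::: "memory" );
        #endif
    }

    return uxLines;
}
```

For example, a 64-byte buffer starting at 0x1004 overlaps the lines at 0x1000, 0x1020 and 0x1040, so three lines are cleaned.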
In vPortStoreTaskMPUSettings() we set the stacks and KERNEL_DATA to normal, outer and inner write-back, write-allocate, non-shared, and privileged. The rest is just normal, outer and inner write-back, write-allocate, and non-shared.
The last region is set to normal, outer and inner non-cacheable, and shared.
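To double-check settings like these against what is actually programmed, a small decoder for the region access-control value can help. The sketch below assumes the ARMv7-R encoding (B in bit 0, C in bit 1, S in bit 2, TEX in bits 5:3) and should be verified against the Cortex-R5 TRM. The type and function names are hypothetical and do not exist in FreeRTOS or the TI port.

```c
#include <stdint.h>

typedef enum
{
    eExampleStronglyOrdered,
    eExampleDevice,
    eExampleNormalNonCacheable,
    eExampleNormalWriteThrough,
    eExampleNormalWriteBack,
    eExampleNormalSplitPolicy, /* TEX = 1BB: separate outer/inner policies. */
    eExampleReservedOrImpDef
} ExampleMemType_t;

/* Decode the memory type from the TEX, C and B fields of a region
 * access-control value. Other fields (AP, XN, S) are ignored here. */
static ExampleMemType_t exampleDecodeMemType( uint32_t ulAccessControl )
{
    const uint32_t ulTex = ( ulAccessControl >> 3 ) & 0x7u;
    const uint32_t ulC = ( ulAccessControl >> 1 ) & 0x1u;
    const uint32_t ulB = ulAccessControl & 0x1u;
    ExampleMemType_t eType = eExampleReservedOrImpDef;

    if( ( ulTex & 0x4u ) != 0u )
    {
        eType = eExampleNormalSplitPolicy;
    }
    else
    {
        switch( ( ulTex << 2 ) | ( ulC << 1 ) | ulB )
        {
            case 0x0u: eType = eExampleStronglyOrdered;    break; /* 000 0 0 */
            case 0x1u: eType = eExampleDevice;             break; /* 000 0 1, shareable device */
            case 0x2u: eType = eExampleNormalWriteThrough; break; /* 000 1 0 */
            case 0x3u: eType = eExampleNormalWriteBack;    break; /* 000 1 1, no write-allocate */
            case 0x4u: eType = eExampleNormalNonCacheable; break; /* 001 0 0 */
            case 0x7u: eType = eExampleNormalWriteBack;    break; /* 001 1 1, write-allocate */
            case 0x8u: eType = eExampleDevice;             break; /* 010 0 0, non-shareable device */
            default:   eType = eExampleReservedOrImpDef;   break;
        }
    }

    return eType;
}

/* The S bit only affects Normal memory; it is reported separately. */
static int exampleIsShareableBitSet( uint32_t ulAccessControl )
{
    return ( ( ulAccessControl >> 2 ) & 0x1u ) != 0u;
}
```

With this encoding, the write-back, write-allocate, non-shared setting corresponds to TEX=001, C=1, B=1, S=0, and the non-cacheable shared region to TEX=001, C=0, B=0, S=1. Running the decoder over the values written for each task's regions would show whether any region unexpectedly comes out as non-cacheable.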