Race in xTaskGenericNotifyWait() when cache enabled

Hi everyone,

I am debugging an issue on my custom Cortex-R5 FreeRTOS port. I’m currently using FreeRTOS Kernel V11.1.0.
I traced my wait API call OS_WaitForNotificationIndexed() down to xTaskNotifyWaitIndexed, which maps to xTaskGenericNotifyWait in tasks.c.

Observed behavior:

  1. With cache enabled, the notification wait behaves incorrectly: the task never leaves the wait state.

  2. With cache disabled, it appears to work.

After applying this change (snippet inserted from V10.6.1):

--- a/src/os/freertos/tasks.c
+++ b/src/os/freertos/tasks.c
@@ -7769,6 +7769,12 @@ TickType_t uxTaskResetEventItemValue( void )
{
traceTASK_NOTIFY_WAIT_BLOCK( uxIndexToWaitOn );
prvAddCurrentTaskToDelayedList( xTicksToWait, pdTRUE );
+
+ /* All ports are written to allow a yield in a critical
+ * section (some will yield immediately, others wait until the
+ * critical section exits) - but it is not something that
+ * application code should ever do. */
+ portYIELD_WITHIN_API();
}
else
{

After adding portYIELD_WITHIN_API at that point, behavior is stable again. My understanding is:

  1. Once the task is moved to the delayed/blocked list, it must yield immediately.

  2. portYIELD_WITHIN_API forces an immediate scheduler handover from inside the kernel API path.

  3. On this Cortex-R5 port, it also includes ordering barriers (DSB/ISB) after triggering the software interrupt, which likely matters more when cache is enabled.

  4. With cache disabled, slower execution can mask this race.

Is this the expected explanation, or is there another port-specific reason why cache-enabled mode exposes this when portYIELD_WITHIN_API is omitted?

Thanks!

Is your custom port an SMP port? With SMP, proper cache usage becomes very important, as all the processors need to keep a coherent view of memory, and those barrier instructions are important for that.

Note that one of the big changes in the code from v10 to v11 was the built-in support for SMP.

To be precise, it is a port for the TMS570LC4357. This processor is Arm Cortex-R5 based. As there is no FreeRTOS port for it, we are writing our own, based on the R4 port and the code provided by HALCoGen (the code generator from Texas Instruments). As HALCoGen only supports older FreeRTOS versions, we had to adapt it ourselves.

When adapting this code we did not make any adaptations for SMP. I’m also not familiar with the implications of SMP for the port code.

The TMS570LC4357 behaves like a single core. So my understanding is that this is also not necessary if I’m only using the single-core variant. Shouldn’t it then behave like before, without SMP?

Does this one not work for you: FreeRTOS-Kernel/portable/GCC/ARM_CR5 at main · FreeRTOS/FreeRTOS-Kernel · GitHub?

taskYIELD_WITHIN_API is called here after resuming the scheduler and it should be enough. Attempting to yield when the scheduler is suspended does not seem like the right thing to do.

If your part is single core, you do not need to worry about SMP.
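The ordering described above can be pictured with a small host-side toy model. It is not kernel code: every name below is hypothetical. It also assumes that a yield requested while the scheduler is suspended is only recorded as pending and then carried out when the scheduler resumes, which is my reading of the kernel rather than something confirmed in this thread.

```c
#include <assert.h>
#include <string.h>

/* Toy host-side model, NOT kernel code. All names are hypothetical. */
static char model_trace[128];
static int  model_suspended;      /* stands in for the scheduler-suspended count */
static int  model_yield_pending;  /* stands in for a pending-yield flag */

static void model_log(const char *step)
{
    /* Room for the step, the ';' separator and the terminating NUL. */
    assert(strlen(model_trace) + strlen(step) + 2 <= sizeof model_trace);
    strcat(model_trace, step);
    strcat(model_trace, ";");
}

static void model_yield(void)
{
    if (model_suspended > 0) {
        /* Assumption: a switch requested while suspended is only recorded. */
        model_yield_pending = 1;
        model_log("yield-deferred");
    } else {
        model_log("switch");
    }
}

static int model_resume_all(void)
{
    assert(model_suspended > 0);
    model_suspended--;
    model_log("resume");
    if (model_yield_pending) {
        model_yield_pending = 0;
        model_log("switch");
        return 1;   /* already yielded */
    }
    return 0;
}

/* with_extra_yield = 1 models the patch that yields right after the
 * delayed-list insert, while the scheduler is still suspended. */
static const char *model_notify_wait(int with_extra_yield)
{
    model_trace[0] = '\0';
    model_suspended = 0;
    model_yield_pending = 0;

    model_suspended++;
    model_log("suspend");
    model_log("delay");          /* stands in for the delayed-list insert */
    if (with_extra_yield) {
        model_yield();
    }
    if (model_resume_all() == 0) {
        model_yield();           /* the yield that follows the resume */
    }
    return model_trace;
}
```

In this model both variants end with exactly one context switch after the scheduler resumes. The patched variant only adds a deferred request in between. If the real kernel behaves like this, the extra yield changes timing and adds barriers, but not the logical outcome.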

Hi,

Unfortunately this port is not applicable for our MCU.

Yes, I also saw that taskYIELD_WITHIN_API() is called shortly after. Could the if clause in between take too much time and cause my problems? I see that this small change makes the difference between a working and a non-working version, and I want to understand why.

This likely means that the issue is elsewhere and you are just masking the problem by making these small changes. Have you written this R5 port yourself or did you get it from some vendor? Can you do a quick comparison with the port I shared above and see if something is missing (like a barrier etc)?

As I mentioned above, it is a modified version of the Texas Instruments port. I cannot see any changes to the port that could cause this. I’ll upload the files in case you want to have a look (I renamed the .asm file to .c to be able to upload it).

portasm.c (16.4 KB)

portmacro.h (15.5 KB)

port.c (18.5 KB)

One thing I notice is that the port does not disable the MPU before programming it in the context restore code. This is needed for Cortex-M, but I am not sure about Cortex-R. Maybe check with TI whether they are aware of any gotchas when using this port?

Thank you for taking a look at it. I will investigate that further and come back with my findings.

Another thing to check is whether the setting of some MPU region (including the internal ones set in vPortStoreTaskMPUSettings) is disabling the cache unexpectedly.

portMPU_STRONGLYORDERED_SHAREABLE       
portMPU_DEVICE_SHAREABLE                
portMPU_NORMAL_OIWTNOWA_NONSHARED       
portMPU_NORMAL_OIWBNOWA_NONSHARED       
portMPU_NORMAL_OIWTNOWA_SHARED          
portMPU_NORMAL_OIWBNOWA_SHARED          
portMPU_NORMAL_OINC_NONSHARED           
portMPU_NORMAL_OIWBWA_NONSHARED         
portMPU_NORMAL_OINC_SHARED              
portMPU_NORMAL_OIWBWA_SHARED            
portMPU_DEVICE_NONSHAREABLE             
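To audit the region settings, it can help to decode an attribute value into its memory type. The sketch below assumes the ARMv7-R region access control layout (TEX in bits 5:3, S in bit 2, C in bit 1, B in bit 0). The function and enum names are hypothetical, and the numeric values behind the macros above are not shown in this thread, so check them against your own portmacro.h.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    MEM_STRONGLY_ORDERED,
    MEM_DEVICE,
    MEM_NORMAL_NONCACHEABLE,
    MEM_NORMAL_CACHEABLE,
    MEM_NOT_DECODED          /* reserved, implementation defined, or TEX = 1xx */
} mem_kind_t;

/* Classify the TEX/S/C/B bits of a region attribute value.
 * Assumed layout: TEX[5:3], S[2], C[1], B[0]. The S bit does not change
 * the memory type, so it is ignored here. */
static mem_kind_t classify_region_attr(uint32_t attr)
{
    assert((attr & ~0x3Fu) == 0u);   /* only TEX/S/C/B bits are expected */

    uint32_t tex = (attr >> 3) & 0x7u;
    uint32_t c   = (attr >> 1) & 0x1u;
    uint32_t b   = attr & 0x1u;

    switch (tex) {
    case 0u:
        if (c == 0u) {
            return (b == 0u) ? MEM_STRONGLY_ORDERED : MEM_DEVICE;
        }
        /* Write-through or write-back, no write-allocate. */
        return MEM_NORMAL_CACHEABLE;
    case 1u:
        if (c == 0u && b == 0u) {
            return MEM_NORMAL_NONCACHEABLE;
        }
        if (c == 1u && b == 1u) {
            return MEM_NORMAL_CACHEABLE;   /* write-back, write-allocate */
        }
        return MEM_NOT_DECODED;
    case 2u:
        return (c == 0u && b == 0u) ? MEM_DEVICE : MEM_NOT_DECODED;
    default:
        /* TEX = 1xx (split inner/outer policies) is not decoded in this sketch. */
        return MEM_NOT_DECODED;
    }
}
```

Running the attribute bits of each region set in vPortStoreTaskMPUSettings through a helper like this shows quickly whether a region that holds kernel data ends up non-cacheable.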

Hi Gaurav,

I got some answers from the TI forum. I’m not sure how to implement that, but maybe it helps you to understand the problem better. The answer from TI:

I used an internal AI tool to find the root cause and recommendations, and found some useful information. Please refer to it:

  • What is happening?
    With the L1‑D cache enabled the ISR that delivers a notification writes the
    notification fields (ulNotifiedValue, ulNotificationState) into a cache
    line that is not written back before the scheduler looks at them.
    Consequently xTaskNotifyWait() never sees the change and the task stays
    blocked.

  • Why does adding portYIELD_WITHIN_API() fix it?
    The macro you added ends with a DSB/ISB (and, on the TMS570, a cache‑clean
    of the whole data cache). The extra barrier forces the pending write‑back,
    so the scheduler sees the updated TCB and the task wakes.

  • What needs to be done?

    1. Add proper cache‑maintenance to every place where the ISR or the
      scheduler touches a TCB or any other data structure that is also read by
      the other context.

    2. Add the required data‑synchronisation barriers (DSB/ISB) after every
      write to the PendSV set‑register, after svc/eret, and after leaving
      a critical section.

    3. Make the RAM region that holds all FreeRTOS kernel objects (TCBs,
      ready‑lists, event‑lists, etc.) non‑cacheable or configure the L1‑D cache
      as write‑through for that region.

From TMS570LC4357: TMS570LC4357 FreeRTOS Porting to FreeRTOS V11.1.0 - Arm-based microcontrollers forum - Arm-based microcontrollers - TI E2E support forums

Thank you and best regards

If the issue is that a write to a cache line “is not seen” by the scheduler, then the system must be in a multi-core mode, as a single core can’t “not see” a change in its own cache line. That, or the problem is that the AI you are asking doesn’t understand what is actually happening (as is normal for them) and is just describing the most likely general cause of something like this without knowing the details.

@richard-damon, Looking at the port files they shared, they are not using SMP. Given that, the only possibility I can think of is that the cache is being disabled for a memory region without first flushing it, causing cached values to be lost. I have encountered this before once when MPU settings for a memory region were marking it as non-cacheable, which resulted in values in that region being lost when the task was switched in.
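The failure mode described above can be illustrated with a deliberately simplified host-side model of one write-back cache line. It is a toy, not a description of the Cortex-R5 cache, and all names are hypothetical. The assumption is that an access made with a non-cacheable attribute goes straight to RAM and ignores a dirty line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of a single memory word with one write-back cache line. */
typedef struct {
    uint32_t ram;    /* backing memory */
    uint32_t line;   /* cached copy */
    int      valid;
    int      dirty;
} toy_cache_t;

/* A cacheable write lands in the cache line only (write-back policy). */
static void toy_write(toy_cache_t *m, int cacheable, uint32_t value)
{
    assert(m != NULL);
    if (cacheable) {
        m->line  = value;
        m->valid = 1;
        m->dirty = 1;
    } else {
        m->ram = value;
    }
}

/* Assumption: a non-cacheable read bypasses the cache entirely. */
static uint32_t toy_read(const toy_cache_t *m, int cacheable)
{
    assert(m != NULL);
    if (cacheable && m->valid) {
        return m->line;
    }
    return m->ram;
}

/* Clean: write a dirty line back to RAM. */
static void toy_clean(toy_cache_t *m)
{
    assert(m != NULL);
    if (m->valid && m->dirty) {
        m->ram   = m->line;
        m->dirty = 0;
    }
}
```

In this model, a value written while the region is cacheable is invisible to a later non-cacheable read until toy_clean() runs. That matches the symptom of values being lost when a task with different MPU settings is switched in.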

Indeed. As mentioned before we are not using SMP.
We have the following memory setup:

/* RAM */
STACKS      : origin = 0x08000000 length = 0x1800
KERNEL_DATA : origin = 0x08001800 length = 0xB000
RAM         : origin = 0x0800C800 length = 0x71800
SHARED_RAM  : origin = 0x0807E000 length = 0x2000

In vPortStoreTaskMPUSettings() we set STACKS and KERNEL_DATA to normal, outer and inner write-back, write-allocate, non-shared, and privileged. The rest is just normal, outer and inner write-back, write-allocate, and non-shared.
The last region is set to normal, outer and inner non-cacheable, and shared.

These region configurations do not seem correct. MPU regions must strictly adhere to power-of-two sizes and be aligned to their region size.
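A quick way to check the two rules is a small helper like the sketch below. The function names are hypothetical, the 32-byte minimum region size is an assumption, and it presumes that the linker ranges posted above are used directly as MPU regions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool is_power_of_two(uint32_t x)
{
    return x != 0u && (x & (x - 1u)) == 0u;
}

/* True if the size is a power of two (at least 32 bytes, an assumed
 * minimum) and the base address is aligned to that size. */
static bool mpu_region_is_valid(uint32_t base, uint32_t size)
{
    if (size < 32u || !is_power_of_two(size)) {
        return false;
    }
    return (base & (size - 1u)) == 0u;
}

/* Smallest power of two that is >= size, e.g. to find the region size
 * needed to cover a linker range. */
static uint32_t round_up_pow2(uint32_t size)
{
    assert(size != 0u && size <= 0x80000000u);
    uint32_t p = 1u;
    while (p < size) {
        p <<= 1;
    }
    return p;
}
```

Applied to the layout above, only SHARED_RAM (0x0807E000, 0x2000) passes. STACKS (0x1800), KERNEL_DATA (0xB000) and RAM (0x71800) are not powers of two, and a single region covering each would need 0x2000, 0x10000 and 0x80000 bytes respectively, with a base address aligned to that size.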