Intro
We are currently evaluating FreeRTOS v11.2.0 for use on a Cortex-M33 with the MPU enabled (STM32H563 or STM32H573, if relevant).
We are currently not using TrustZone, but this may change in the future.
To offload processing from the CPU, we plan to use DMA for many peripherals, e.g. UART communication.
Documentation
Regarding this topic, I looked at the documentation of xTaskCreateRestricted, which I think is incorrect:
It refers to xMemoryRegions.ulParameters using portXXX macros, but the code I saw indicates they should use the tskXXX macros defined in task.h.
Did I understand correctly that the documentation is incorrect in this regard?
Sharing with DMA
To me, it looks like the shareable setting for configurable MPU regions is hardcoded as non-shareable:
// portable/GCC/ARM_CM33_NTZ/non_secure/port.c l. 1953 ff
xMPUSettings->xRegionsSettings[ ulRegionNumber ].ulRBAR = ( ulRegionStartAddress ) |
( portMPU_REGION_NON_SHAREABLE );
However, to allow correct usage of other bus masters like the DMA, the corresponding setting must be portMPU_REGION_OUTER_SHAREABLE.
Did I understand correctly that using tskMPU_REGION_NORMAL_MEMORY currently will result in a setting that is incompatible with the usage of DMA (or another bus master) accessing this memory region?
Assuming I am right, I am surprised that I could not find more about this. I would have assumed that using DMA to write to an MPU-protected memory region is a common use case.
The non-secure MPU on my Cortex-M33 only supports 8 regions, 5 of which are already used by FreeRTOS itself (to protect the kernel and the task stack), which leaves only 3 configurable regions.
Assuming that a task which uses DMA-enabled hardware also needs access to a region with peripheral registers, this already uses up 2 regions (peripherals + DMA buffers), leaving only one region for “everything else”. This makes it likely that the region that includes the DMA buffers also includes other data with different access patterns.
I see the following options to solve the DMA problem:
Option (1): Device memory
We could use tskMPU_REGION_DEVICE_MEMORY instead of tskMPU_REGION_NORMAL_MEMORY, since the shareable attributes are ignored for device memory.
This will reduce the performance of accesses to this region, which is relevant both for large objects like communication buffers and for objects that are accessed very often.
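To make the distinction behind option (1) concrete, here is a small host-side sketch of how an Armv8-M MAIR attribute byte separates device from normal memory. The `EX_` macros and both function names are made up for illustration and are not FreeRTOS or CMSIS identifiers. The encodings (upper nibble zero means Device, 0x04 is Device-nGnRE, 0xFF is Normal write-back cacheable, 0x44 is Normal non-cacheable) reflect my understanding of the Armv8-M architecture and should be checked against the reference manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* Example MAIR attribute bytes (assumed Armv8-M encodings, names are hypothetical). */
#define EX_MAIR_DEVICE_nGnRE     0x04u /* Device memory, nGnRE */
#define EX_MAIR_NORMAL_WB_RWA    0xFFu /* Normal, write-back, read/write allocate */
#define EX_MAIR_NORMAL_NC        0x44u /* Normal, non-cacheable */

/* An attribute byte whose upper nibble is zero describes Device memory. */
bool ex_mair_is_device( uint8_t attr )
{
    return ( attr & 0xF0u ) == 0u;
}

/* The RBAR shareability field only has an effect for Normal memory;
 * for Device memory it is ignored. */
bool ex_sh_field_applies( uint8_t attr )
{
    return !ex_mair_is_device( attr );
}
```

With these helpers, `ex_sh_field_applies( EX_MAIR_DEVICE_nGnRE )` is false, which is why the hardcoded non-shareable setting would not matter for a region marked as device memory.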
Option (2): Always shared
Locally patch port.c so that it always uses portMPU_REGION_OUTER_SHAREABLE.
This would enable shareability for all regions, which will decrease performance for other tasks, while being better than (1) for the tasks accessing the buffers.
I have no idea which of options (1) or (2) is better performance-wise, and I also don’t know how relevant the performance difference really is.
Option (3): Additional flag
Add another setting like tskREGION_SHARED_MEMORY and set the shareability according to this flag.
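A minimal host-side sketch of what option (3) might look like. All `EX_`/`ex_` names are hypothetical stand-ins and not the port's identifiers; in particular, a shared-memory flag like the one proposed above does not exist in FreeRTOS today, and the bit chosen for it here is arbitrary. The SH field position and encodings follow my understanding of the Armv8-M MPU_RBAR layout (SH in bits [4:3], 0b00 non-shareable, 0b10 outer shareable).

```c
#include <stdint.h>

/* Assumed Armv8-M MPU_RBAR layout: BASE[31:5], SH[4:3], AP[2:1], XN[0]. */
#define EX_RBAR_BASE_MASK          0xFFFFFFE0u
#define EX_RBAR_SH_POS             3u
#define EX_RBAR_SH_NON             ( 0u << EX_RBAR_SH_POS )
#define EX_RBAR_SH_OUTER           ( 2u << EX_RBAR_SH_POS )

/* Hypothetical per-region flag for option (3); the bit position is arbitrary. */
#define EX_REGION_SHARED_MEMORY    ( 1u << 31 )

/* Build the address and shareability part of RBAR from the region start
 * address and the per-region parameters, instead of hardcoding
 * non-shareable. Access permission and XN bits are left out for brevity. */
uint32_t ex_build_rbar( uint32_t start, uint32_t params )
{
    uint32_t rbar = start & EX_RBAR_BASE_MASK;

    if( ( params & EX_REGION_SHARED_MEMORY ) != 0u )
    {
        rbar |= EX_RBAR_SH_OUTER;
    }
    else
    {
        rbar |= EX_RBAR_SH_NON;
    }

    return rbar;
}

/* Extract the 2-bit SH field from an RBAR value. */
uint32_t ex_rbar_sh( uint32_t rbar )
{
    return ( rbar >> EX_RBAR_SH_POS ) & 0x3u;
}
```

Regions that do not set the flag keep the current non-shareable behaviour, so only tasks that actually share buffers with another bus master would be affected.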
What approach would you recommend? Or is there another solution to this problem I haven’t thought about?
Cache and task switching
There is another potential problem which I encountered on a Cortex-M7 processor using SafeRTOS, which might also be relevant in this case, and I would like to know whether it is realistic that this problem occurs here as well.
Assume the following structure:
Region R contains both a DMA buffer B and some other data D.
The DMA writes to B.
ISR I writes to D. (I is not necessarily related to the DMA, but could also be e.g. a DMA transfer complete interrupt which copies some DMA status registers to D)
Task T1 reads B and D and has access to region R configured.
Task T2 does not use any of this data, and therefore does not have R configured.
When I fires while T2 executes, the MPU does not contain a setting for R, so I accesses D without an active MPU region.
If PRIVDEFENA=0, this would generate a fault, because no enabled region covers D.
If PRIVDEFENA=1, which is what the FreeRTOS port appears to set, I accesses D using the “default memory map” as a privileged background region, which may differ in cache settings from the setting in T1/R.
This difference in cache settings may prevent invalidation of the cache used by T1.
After a task switch, T1 may access outdated data from the cache.
On the other processor, we enabled the DMA buffer region for all tasks, so even while executing T2, there will still be an active cache setting for R (maybe with access set to privileged-only).
However, the low number of configurable regions for the M33 would be a problem for this approach.
Unfortunately, I have so far been unable to find documentation of the default cache settings for my processor, so I do not know which option (if any) would provide compatible caching and shareability settings.
Would this problem also occur on the FreeRTOS Cortex-M33 port, or does FreeRTOS implement some other measures (e.g. explicit cache invalidation on context switch)?