FreeRTOS STM32F4-7 TASK structs in DTCM

glenenglish wrote on Monday, April 24, 2017:

The subject title says it all.
Has anyone put the most frequently used FreeRTOS data structs into the DTCM memory of the STM32F4/7?

This would mean the processor doesn't have to pull the structures through the cache (possibly splatting the cache) during a context switch. I already put interrupt handlers in ITCM; it might also be useful to force FreeRTOS itself into ITCM, for when I have lots of tasks switching fast. It depends on the use case, of course.

FreeRTOS has very fast and cheap context switches… That’s the huge advantage over, say, Linux when task switch overhead is compared (not that I'm trying to compare Linux to FreeRTOS here; they are different use cases).

rtel wrote on Monday, April 24, 2017:

Never tried. If you statically allocate the structures then it should
be easy enough to put them in DTCM and experiment. How big is the DTCM?

glenenglish wrote on Tuesday, April 25, 2017:

STM32F429: "core coupled memory" (data), 64kB (the CPU doesn't have to go via AHB0 as it does for SRAM, which is good when dealing with heavy DMA users).
(STM32F4: no data cache. It has an instruction cache of sorts to deal with flash speed.)

STM32F7: instruction cache and data cache; size varies from 4 to 16kB.
TCM has a direct processor connection, like CCM.
DTCM varies from 64kB to 128kB,

which is plenty for my work!

STM32H7: 64kB ITCM, 128kB DTCM… the DTCM is actually 2 x 64kB and can be used dual issue (dual instruction, operand fetch) if you are on your game.

With the advent of cache, microcontroller programmers need to learn about that stuff and think carefully about how they write their variable accesses.
I learned about it in 2007 on Blackfin and found I had to completely rewrite my algorithms to avoid thrashing. Then I learned to love cache and embrace it.

I try to group all my frequently accessed globals so they pack into one cache line’s worth, etc.
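As a rough illustration of that grouping idea, the sketch below packs a few hot globals into one struct and aligns it to a cache line. The struct name, its fields, and `hot_globals()` are hypothetical; the 32-byte line size is that of the Cortex-M7 L1 cache (the F4 has no data cache, so this only matters on F7/H7 class parts). The alignment attribute is GCC/Clang syntax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cortex-M7 L1 cache line size in bytes. */
#define CACHE_LINE_BYTES 32u

/* Hypothetical set of frequently accessed globals, kept together so that
 * touching any one of them pulls the rest into the same cache line. */
typedef struct {
    volatile uint32_t tick_count;
    volatile uint32_t adc_latest;
    uint32_t filter_state[4];
    uint16_t flags;
    uint16_t error_count;
} hot_globals_t;

/* Fail the build if someone grows the struct past one line. */
static_assert(sizeof(hot_globals_t) <= CACHE_LINE_BYTES,
              "hot globals no longer fit in one cache line");

/* Align the instance so it starts on a line boundary and cannot straddle two. */
static hot_globals_t g_hot __attribute__((aligned(32)));

/* Accessor so other modules share the single instance. */
hot_globals_t *hot_globals(void)
{
    return &g_hot;
}
```

The compile-time check is the useful part: it turns "keep this small" from a convention into something the build enforces.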

The use of DTCM reduces the programmer's need to be so careful about cache misses.

glenenglish wrote on Tuesday, April 25, 2017:

Hmm, I probably want TWO different types of pvPortMalloc():

one for DTCM and one for everything else…

This is probably much more important for the F7, which has a data cache, than for the F4.
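One way a second, DTCM-only allocator could look is sketched below: a simple bump allocator over a static pool, leaving the normal FreeRTOS heap untouched for everything else. This is not part of FreeRTOS. `dtcm_malloc()`, `dtcm_bytes_free()`, the pool size, and the `.dtcm` section name are all hypothetical, and the section name assumes the linker script maps such a section onto DTCM (or CCM on the F4).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: the linker script provides a ".dtcm" output section located in
 * DTCM. On non-ARM host builds the attribute is dropped so this still compiles. */
#if defined(__arm__)
#define DTCM_DATA __attribute__((section(".dtcm")))
#else
#define DTCM_DATA
#endif

#define DTCM_POOL_BYTES 8192u   /* hypothetical pool size */
#define DTCM_ALIGN      8u      /* allocation granularity, power of two */

static_assert((DTCM_ALIGN & (DTCM_ALIGN - 1u)) == 0u,
              "DTCM_ALIGN must be a power of two");

DTCM_DATA static uint8_t dtcm_pool[DTCM_POOL_BYTES] __attribute__((aligned(8)));
static size_t dtcm_used;        /* bytes handed out so far */

/* Bump allocator: hands out memory from the pool and never frees it.
 * Not thread-safe as written; call it before the scheduler starts, or
 * guard it the way the FreeRTOS heap_x.c files guard pvPortMalloc(). */
void *dtcm_malloc(size_t bytes)
{
    size_t rounded;
    void *block;

    if (bytes == 0u || bytes > DTCM_POOL_BYTES) {
        return NULL;
    }
    /* Round up so every returned block stays DTCM_ALIGN-aligned. */
    rounded = (bytes + (DTCM_ALIGN - 1u)) & ~(size_t)(DTCM_ALIGN - 1u);
    if (rounded > DTCM_POOL_BYTES - dtcm_used) {
        return NULL;            /* pool exhausted */
    }
    block = &dtcm_pool[dtcm_used];
    dtcm_used += rounded;
    return block;
}

/* Remaining capacity, useful for sizing the pool during bring-up. */
size_t dtcm_bytes_free(void)
{
    return DTCM_POOL_BYTES - dtcm_used;
}
```

A never-free allocator fits the typical pattern of creating tasks and queues once at startup; if blocks must be released, a real heap implementation over the same pool would be needed instead.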

For example, with xTaskCreate() the stack could go in DTCM so that stack access doesn't have to fight the DMA masters.

I guess I could use xTaskCreateStatic() and supply it with memory pointers.

DMA-capable memory must go in normal SRAM.
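A minimal sketch of the static-allocation route suggested above might look like this. It assumes `configSUPPORT_STATIC_ALLOCATION` is 1, `INCLUDE_vTaskDelay` is 1, and that the linker script defines a `.dtcm` section in DTCM; the section name, `worker_task`, `start_worker`, and the stack size are hypothetical. The `__has_include` guard is only there so the file also compiles on a host without the FreeRTOS headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: ".dtcm" is an output section placed in DTCM by the linker script. */
#if defined(__arm__)
#define DTCM_DATA __attribute__((section(".dtcm")))
#else
#define DTCM_DATA
#endif

/* Pull in FreeRTOS only when its headers are on the include path. */
#if defined(__has_include)
#if __has_include("FreeRTOS.h")
#include "FreeRTOS.h"
#include "task.h"
#define HAVE_FREERTOS 1
#endif
#endif

#define WORKER_STACK_WORDS 256u   /* stack depth in words, not bytes */

static_assert(WORKER_STACK_WORDS * sizeof(uint32_t) <= 64u * 1024u,
              "worker stack must fit in a 64kB DTCM");

#ifdef HAVE_FREERTOS
/* Task stack and task control block both live in DTCM, so context switches
 * and stack accesses stay off the bus shared with the DMA masters.
 * Per the thread, buffers handed to DMA should stay in normal SRAM. */
DTCM_DATA static StackType_t  worker_stack[WORKER_STACK_WORDS];
DTCM_DATA static StaticTask_t worker_tcb;

static void worker_task(void *params)
{
    (void)params;
    for (;;) {
        vTaskDelay(pdMS_TO_TICKS(10));   /* placeholder workload */
    }
}

/* Creates the task from the DTCM buffers; returns NULL if either buffer is NULL. */
TaskHandle_t start_worker(void)
{
    return xTaskCreateStatic(worker_task, "worker", WORKER_STACK_WORDS,
                             NULL, tskIDLE_PRIORITY + 1,
                             worker_stack, &worker_tcb);
}
#endif /* HAVE_FREERTOS */
```

Because the buffers are ordinary statics, moving them back to SRAM for comparison is just a matter of removing `DTCM_DATA`, which makes the experiment cheap to run both ways.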