Initial comparison between SMP and AMP configurations

TheSemaphoreNoob · March 26, 2024, 4:31pm

Hey there!
I’m currently on an internship at… heh, not sure I can say it. I’m working on a chip with 2 Cortex-M7 cores, some esRAM, and some eNVM. My mission is to prototype a solution to use FreeRTOS on both cores. A great opportunity to improve my skills with FreeRTOS and multicore processors!!
First of all, I have to make a comparative study of AMP and SMP configurations. I have to find the pros and cons of both options, then present them.
My tutor will check these criteria against our chip and then make his choice.
I found several criteria: hardware requirements, ease of use (when designing the task assignments…), need to rewrite some macro functions (sbSEND_COMPLETED() for message buffers, for example), scalability, cache coherency…
From reading the docs, forums, and the few examples I found, I can’t think of more points, but it seems too light imo. Am I overlooking some other important criteria? If you had to make the same choice as qualified developers, would you look at other criteria?

Another question: in some topics I see the question “Are you writing a new port for SMP?” about some platforms. I also found a guide to do it. If I understand the guide (not allowed to link it), it means “write the portable files to make it work on my specific platform”, doesn’t it? Or something else / more??
If so, then I think it’s another criterion! I can find some topics about SMP on dual Cortex-M7, but I can’t find a publication on how it’s done. Maybe it’s something that companies don’t want to share?

Thanks for your help!

richard-damon · March 26, 2024, 5:21pm

I will comment that the first thing to look at when deciding between SMP and AMP is whether the processors ARE symmetric. A lot of these sorts of systems give each processor some private memory to do most of its work out of, so they aren’t competing all the time for shared RAM.

If the processors have private esRAM, then you need to give that up to work as SMP.

I find symmetric designs more common on “bigger” processors (which then have enough local cache to avoid the conflicts) than on the smaller M-series parts, which tend to have more local resources, and some shared ones for communication.

aggarg · March 27, 2024, 4:29am

Another thing to consider is isolation - AMP provides you better isolation as those are 2 separate instances of FreeRTOS.

Yes, that is correct.

We do not have an SMP port for Cortex-M7 but you can certainly write one (and upstream it too).

TheSemaphoreNoob · March 27, 2024, 9:11am

Hey there! Thanks for the answers!

If the processors have private esRAM, then you need to give that up to work as SMP.

(sorry, don’t know how to cite with names?)

Well, that’s my “hardware requirements” point: eNVM and esRAM are not private. But I have some concerns about cache and TCM. From what I see, cache and TCM are not inside the cores, but between the 2, within a “TCM matrix” and a “cache matrix”. According to the internal datasheet, these matrices can be split 70/30 or 100/0 (depending on the configuration). Here, I don’t really understand whether even these memories have to be symmetric. I think it’s necessary, for example if a task (with no affinity setting) is moved from core 0 to core 1 by the scheduler, with less cache available… Moreover, I don’t know if these matrix splits are imposed by hardware or by software. In the first case, we’ll give up on SMP, of course. But the other points are important: if they all point toward SMP, my tutor’s team does not rule out modifying the hardware for the next iterations, so I’m gathering the information!

Another thing to consider is isolation - AMP provides you better isolation as those are 2 separate instances of FreeRTOS.

Thanks, another trail to follow! It may be important for the safety component of the project.

We do not have an SMP port for Cortex-M7 but you can certainly write one (and upstream it too).

I’d like to! Free software is important to me. But it will depend on the company: I can’t share what I want without the approval of my tutor. I guess that’s normal in international companies. If they choose AMP, then it could be a great exercise to write the port in my free time. In that case there won’t be any issue if I share it.

richard-damon · March 27, 2024, 11:37am

Remember, Tightly Coupled Memory (TCM) is just Embedded Static RAM (esRAM) with a special path to the processor to be able to get single-cycle access (i.e. be tightly coupled). This tends to make it (mostly) private to the processor it is connected on, as arbitrating the accesses can slow down the connection.

It sounds like you can adjust the partitioning of TCM on this processor (e.g. so it can be used as just a single processor at times), but the other option of 70/30 makes me wonder if the processors aren’t actually equal either. Many dual-processor parts are set up with a high-efficiency processor for lowest-power operation and a second high-performance processor that is put to sleep when not needed but can be woken up on demand. This makes the processors asymmetrical and thus AMP a better model for them.

TheSemaphoreNoob · March 27, 2024, 1:55pm

Remember, Tightly Coupled Memory (TCM) is just Embedded Static RAM (esRAM) with a special path to the processor to be able to get single-cycle access (i.e. be tightly coupled). This tends to make it (mostly) private to the processor it is connected on, as arbitrating the accesses can slow down the connection.

/me taking notes
OK! I hadn’t seen it this way. I thought TCM was an extension of cache, so it had to be considered as cache, but with a RAM tech for lowering cost.

the other option of 70/30 makes me wonder if the processors aren’t actually equal either.

Initially, this choice was made to use core 0 as the main core, with its FreeRTOS, and core 1 as a bare-metal co-processor. So it was coherent to give more to core 0 than to core 1. I just asked my tutor: it’s imposed in silicon, so for this chip, it can’t be changed. So I added a constraint to my list for the SMP configuration: we should disable the TCM.
I also got confirmation that the 2 cores are strictly identical, including the cache split (I had misunderstood). That’s necessary for lockstep mode. So the only difference is the TCM size for the 2 cores.
So for now, TCM (given the possibility of disabling it) is not decisive. I have a first list of criteria; I’ll have to present them, then give them weights to rationally make a first suggestion. I guess I’ll give an important weight to the TCM point, as the “solution” will have an impact on performance…

Thanks for your help!!

richard-damon · March 27, 2024, 2:13pm

Tightly coupled memory is SRAM at a specific address range in the processor address space, and is used as program or data storage to allow the program to run faster (many processors have separate Code and Data TCM, optimized for code fetches or data fetches). Normally only the processor that it is tightly coupled to can access it (or access it normally).

Cache also tends to use SRAM technology, but a given cell of cache doesn’t have a totally fixed address. When the processor goes to fetch a word from some slower memory, it first checks whether it has “cached” a copy somewhere in the cache. If so, it gets it quickly from the cache; if not, the system brings in a block of memory from the slower memory, stores it in a line of the cache, and then uses that value. Since programs tend to refer to values near each other in memory, this means that you can frequently find the locations you want in the cache, and not need to wait for the longer access time of the more external memory.
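
As a rough illustration of that lookup, here is a toy direct-mapped cache model in C. It is only a sketch: the names (toy_cache_t, toy_cache_access, toy_cache_walk_words) and the 8-line size are made up for the example, and a real Cortex-M7 cache is set-associative and much larger. The 32-byte line length does match the M7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u /* bytes per cache line (32 on the Cortex-M7) */
#define NUM_LINES 8u  /* deliberately tiny, hypothetical size */

typedef struct {
    bool     valid[NUM_LINES]; /* does this line hold anything yet? */
    uint32_t tag[NUM_LINES];   /* which memory block the line holds */
    unsigned hits;
    unsigned misses;
} toy_cache_t;

static void toy_cache_init(toy_cache_t *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Look up one address. Returns true on a hit. On a miss, the whole
 * block containing the address is "brought in" by recording its tag. */
static bool toy_cache_access(toy_cache_t *c, uint32_t addr)
{
    assert(c != NULL);
    uint32_t block = addr / LINE_SIZE;  /* which 32-byte block of memory */
    uint32_t index = block % NUM_LINES; /* the only line it may live in  */
    uint32_t tag   = block / NUM_LINES; /* distinguishes competing blocks */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true; /* fill (or evict and refill) the line */
    c->tag[index]   = tag;
    c->misses++;
    return false;
}

/* Read `count` consecutive 32-bit words starting at `base`. */
static void toy_cache_walk_words(toy_cache_t *c, uint32_t base, unsigned count)
{
    for (unsigned i = 0; i < count; i++) {
        (void)toy_cache_access(c, base + 4u * i);
    }
}
```

Walking 16 consecutive words from a line-aligned address with toy_cache_walk_words() touches two lines, so it costs 2 misses and 14 hits. That is the "values near each other" effect. Note that nothing in the model ties a cache line to a fixed address, which is the difference from TCM.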

To use SMP mode, you wouldn’t need to “disable” TCM, just not use it (or only use it inside tasks that are pinned to that core, and never try to pass the address of something in that memory to anything not running on that core).
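
To make that rule concrete, here is a small host-side sketch of the check you would want to hold: a pointer into a core’s TCM may only be handed to a task pinned to that one core. Everything in it is an assumption for illustration. The address map and the 112 KiB / 48 KiB sizes are invented (your datasheet has the real ones), and tcm_owner() and pointer_safe_for_task() are hypothetical helpers, not FreeRTOS APIs. The affinity mask follows the same convention as the mask given to vTaskCoreAffinitySet() in FreeRTOS SMP (bit n set means the task may run on core n).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 2u

/* HYPOTHETICAL address map: replace with the values from the datasheet. */
typedef struct {
    uintptr_t base;
    size_t    size;
} tcm_region_t;

static const tcm_region_t core_tcm[NUM_CORES] = {
    { 0x20000000u, 0x1C000u }, /* core 0 TCM: 112 KiB (made-up 70% share) */
    { 0x2001C000u, 0x0C000u }, /* core 1 TCM:  48 KiB (made-up 30% share) */
};

/* Returns the core whose TCM contains addr, or -1 if addr is not in
 * any TCM (i.e. it lives in memory both cores can reach). */
static int tcm_owner(uintptr_t addr)
{
    for (unsigned core = 0; core < NUM_CORES; core++) {
        if (addr >= core_tcm[core].base &&
            addr - core_tcm[core].base < core_tcm[core].size) {
            return (int)core;
        }
    }
    return -1;
}

/* affinity_mask: bit n set means the task may run on core n.
 * A TCM address is safe only if the task is pinned to the owning core
 * and to no other core. Shared memory is safe for any task. */
static bool pointer_safe_for_task(uintptr_t addr, unsigned affinity_mask)
{
    assert(affinity_mask != 0u); /* a task must be allowed to run somewhere */
    int owner = tcm_owner(addr);
    if (owner < 0) {
        return true;
    }
    return affinity_mask == (1u << owner);
}
```

With this map, an address in core 0’s TCM is accepted for a task with mask 0x1 and rejected for masks 0x2 and 0x3, since an unpinned task could be migrated to core 1 by the scheduler. A check like this could sit in debug asserts around queue sends or task parameters.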

Cortex-M processors tend to get a significant performance boost from using TCM (rather than just counting on cache to keep access time down), which is one reason I suspect there isn’t an already-built SMP port for the CM7.