FreeRTOS SMP: trace the execution of multiple cores

Matth9814 · April 8, 2025, 12:10pm

Hi, I am currently trying to trace the execution of multiple cores without affecting the behavior of the running program too much. This obviously requires managing a shared resource in the trace macros (e.g. a UART peripheral), so I would need to make the trace sections critical sections before acquiring a “TRACE lock”. However, I do not know if this is possible, since the trace macros are scattered around the code. Do you have any suggestions?

richard-damon · April 8, 2025, 2:01pm

My first thought is that logging directly to the UART is just not going to be feasible due to the contention you mention. I would probably give each core its own area of memory to record logging information (perhaps with each log record including where the other processor was in its logging area), and then have some task output the results. Ideally that happens after the logging period, to avoid filling that core's log with the act of logging, or you find some way to skip most of the logging of the act of logging.
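A minimal sketch of the per-core log area, assuming a dual-core target and C11 atomics. The names (`trace_record`, `core_logs`), `TRACE_DEPTH` and the record layout are hypothetical. Each core writes only into its own log, so no lock is needed on the fast path, and each record stores how far the other core had logged at that moment so the two logs can be roughly interleaved when read out later. In this sketch a full log simply stops recording.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_CORES    2u     /* hypothetical: a dual-core SMP target */
#define TRACE_DEPTH  256u   /* hypothetical: records per core */

typedef struct {
    uint32_t event_id;        /* application-defined event code */
    uint32_t arg;             /* one argument, e.g. a task number */
    uint32_t other_core_pos;  /* how many records the other core had logged */
} trace_record_t;

typedef struct {
    trace_record_t records[TRACE_DEPTH];
    _Atomic uint32_t count;   /* records written so far; only the owner writes it */
} core_log_t;

/* Each core writes only to its own log, so no lock is needed here. */
static core_log_t core_logs[NUM_CORES];

bool trace_record(unsigned core, uint32_t event_id, uint32_t arg)
{
    assert(core < NUM_CORES);
    core_log_t *log = &core_logs[core];
    uint32_t n = atomic_load_explicit(&log->count, memory_order_relaxed);
    if (n == TRACE_DEPTH)
        return false;                          /* full: stop recording */

    trace_record_t *r = &log->records[n];
    r->event_id = event_id;
    r->arg = arg;
    /* Cross-reference: the other core's position at this instant. */
    r->other_core_pos = atomic_load_explicit(
        &core_logs[(core + 1u) % NUM_CORES].count, memory_order_relaxed);

    /* Publish the record only after its fields have been written. */
    atomic_store_explicit(&log->count, n + 1u, memory_order_release);
    return true;
}
```

A separate low-priority task (or a debugger, after the run) can then walk both logs and use `other_core_pos` to line them up.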

Matth9814 · April 8, 2025, 3:33pm

That was my backup plan too, but I was keeping it as a last resort because it has to be adapted too much to the traced program. Moreover, if the program implements a periodic schedule, I still need to ensure that the core that frees the log buffers is the only one accessing them; otherwise every now and then it could read some garbage. That’s why I am trying to output directly over the UART peripheral. The only problem I can think of with the approach I mentioned in my initial message is interrupts, because they can deadlock the program.

Structure of a trace macro:
portGET_TRACE_LOCK();      /* acquire the shared TRACE lock */
write_to_UART(...);        /* emit the trace record */
portRELEASE_TRACE_LOCK();  /* release the TRACE lock */

For example, suppose CORE1 owns both the ISR and the TASK locks and is blocked because it cannot get the TRACE lock, while CORE0 owns the TRACE lock and is using the UART. If an interrupt arrives on CORE0 at that moment and CORE0 tries to get the ISR/TASK lock (e.g. inside xTaskIncrementTick or vTaskSwitchContext), the program is deadlocked: each core waits for a lock the other one holds. Defining a trace macro as a critical section (or simply disabling interrupts) should prevent this scenario, but I would like someone who knows the OS better than me to tell me whether I am missing something or whether this is just not possible.
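A host-side sketch of the ordering proposed here: mask interrupts, take the TRACE lock, write, release, unmask. `irq_save`, `irq_restore`, `trace_lock_get`, `trace_lock_release` and `write_to_uart` are hypothetical stand-ins (on a real target the mask/unmask calls come from the FreeRTOS port). The idea is that TRACE becomes the innermost lock: while a core holds it, no interrupt on that core can try to take the TASK/ISR locks. It only holds if the UART write itself never calls a FreeRTOS API that takes those locks, and the other core still spins for the whole UART write, which is the serialization cost of this approach.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical host stand-ins for the port's "mask interrupts and return
   the previous state" / "restore previous state" primitives. */
static unsigned irq_enabled = 1u;
static unsigned irq_save(void)
{
    unsigned prev = irq_enabled;
    irq_enabled = 0u;
    return prev;
}
static void irq_restore(unsigned prev) { irq_enabled = prev; }

/* Hypothetical TRACE lock: a simple spinlock shared by all cores. */
static atomic_flag trace_lock = ATOMIC_FLAG_INIT;
static void trace_lock_get(void)
{
    while (atomic_flag_test_and_set_explicit(&trace_lock, memory_order_acquire)) {
        /* spin until the other core releases it */
    }
}
static void trace_lock_release(void)
{
    atomic_flag_clear_explicit(&trace_lock, memory_order_release);
}

/* Hypothetical UART output: counts bytes instead of driving hardware.
   It must not call any FreeRTOS API that could take the TASK or ISR lock. */
static unsigned long uart_bytes_sent;
static void write_to_uart(const char *msg)
{
    while (*msg != '\0') {
        uart_bytes_sent++;
        msg++;
    }
}

/* Trace hook: interrupts are masked before the TRACE lock is taken and
   restored only after it is released, so this core never requests the
   TASK/ISR locks while holding the TRACE lock. */
void trace_emit(const char *msg)
{
    unsigned prev = irq_save();
    assert(irq_enabled == 0u);   /* host check: masked while the lock is held */
    trace_lock_get();
    write_to_uart(msg);
    trace_lock_release();
    irq_restore(prev);
}
```

Restoring the saved state (rather than unconditionally re-enabling) lets the same hook be called from code that already runs with interrupts masked, such as an ISR or a kernel critical section.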

richard-damon · April 8, 2025, 5:47pm

Serializing the traces through ownership of the UART is going to have a much bigger effect than writing to in-memory per-core buffers, and, as you point out, it can lead to deadlock if you are not careful.

You don’t “free” the log buffers; they are statically allocated as circular buffers, and there are ways to use them that avoid the danger of looking at values that haven’t been written yet. Basically, you keep a pointer to the available data that isn’t updated until after you have written the data to the buffer and flushed it (if you need cache coherency); only then do you write the updated pointer.
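A minimal sketch of the "update the pointer only after the data is written" idea as a single-producer/single-consumer ring using C11 atomics. The type and function names and `RING_SIZE` are assumptions. Each traced core would own one ring and be its only producer, and only the draining core would call `ring_get`, so exactly one reader ever touches a given buffer. On a target without coherent caches the release store alone is not enough; the slot would also need to be flushed before the head is published.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size; must be a power of two so the slot index stays
   consistent when the 32-bit counters wrap around. */
#define RING_SIZE 64u

typedef struct {
    uint32_t event_id;
    uint32_t timestamp;
} trace_event_t;

typedef struct {
    trace_event_t slots[RING_SIZE];
    _Atomic uint32_t head;  /* written only by the traced core (producer) */
    _Atomic uint32_t tail;  /* written only by the draining core (consumer) */
} trace_ring_t;

/* Producer side: runs on the core being traced. */
bool ring_put(trace_ring_t *r, trace_event_t ev)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                  /* full: the caller decides what to do */
    r->slots[head % RING_SIZE] = ev;   /* 1. write the data */
    /* 2. only then publish it: release ordering makes the slot contents
          visible before the new head value */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Consumer side: runs on the core that drains the buffer to the UART. */
bool ring_get(trace_ring_t *r, trace_event_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                  /* nothing published yet */
    *out = r->slots[tail % RING_SIZE];
    /* hand the slot back to the producer only after it has been copied */
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```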

The problem with trying to allocate/free the buffers is that the point where tracing occurs isn’t a point where you can afford the interlocks needed to allocate memory, so you need to preallocate “enough” memory, with error handling in case you reach a trace point and there isn’t room left in the buffer. You can either stop logging and resume when the buffer gets emptied, or you can have the core spin-wait for room (unless it is the core that is doing the emptying).
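The two full-buffer policies can be sketched as an admission check run at each trace point. This is a simplified model under stated assumptions: the buffer is represented only by a `used` counter that the producer increments and the draining core decrements through `log_consumed`, and all names and `LOG_CAPACITY` are hypothetical. Under the spin policy, the core doing the emptying falls back to pausing, since spinning on a buffer only it can drain would never end.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define LOG_CAPACITY 32u   /* hypothetical per-core buffer capacity */

typedef enum { ON_FULL_PAUSE, ON_FULL_SPIN } full_policy_t;

typedef struct {
    _Atomic uint32_t used;  /* records in the buffer: +1 by producer, -1 by drainer */
    bool paused;            /* producer-only: logging suspended until empty */
    uint32_t dropped;       /* producer-only: trace points skipped */
} log_state_t;

/* Called at a trace point before writing one record into the buffer.
   Returns true if the record may be written. */
bool log_admit(log_state_t *s, full_policy_t policy,
               unsigned this_core, unsigned drain_core)
{
    if (s->paused) {
        if (atomic_load(&s->used) != 0u) {
            s->dropped++;            /* still draining: keep skipping */
            return false;
        }
        s->paused = false;           /* buffer emptied: resume logging */
    }
    if (atomic_load(&s->used) == LOG_CAPACITY) {
        if (policy == ON_FULL_SPIN && this_core != drain_core) {
            while (atomic_load(&s->used) == LOG_CAPACITY) {
                /* wait for the draining core to free a slot */
            }
        } else {
            /* Pause policy, or this core is the one doing the emptying and
               would spin forever: stop logging until the buffer is empty. */
            s->paused = true;
            s->dropped++;
            return false;
        }
    }
    atomic_fetch_add(&s->used, 1u);
    return true;
}

/* Called by the draining core after it has output one record. */
void log_consumed(log_state_t *s)
{
    assert(atomic_load(&s->used) > 0u);
    atomic_fetch_sub(&s->used, 1u);
}
```

Keeping a `dropped` count means the gap is at least visible in the output instead of silently missing.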

RAc · April 8, 2025, 8:48pm

Your initial requirement is a contradiction in terms. It is a well-known fact that serial debugging significantly changes the run-time behavior, frequently to the degree that problems that show up without it do not show up with it, and vice versa. The enforced serialization is only one aspect of it.
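To put a rough number on the intrusion, assume a 115200 baud UART with 8N1 framing (10 bit times per byte), a 16-byte trace record and a 1 kHz tick (configTICK_RATE_HZ = 1000). All three values are illustrative assumptions, not taken from the original setup:

```latex
t_{\text{byte}} = \frac{10\ \text{bit}}{115\,200\ \text{bit/s}} \approx 86.8\ \mu\text{s},
\qquad
t_{\text{record}} = 16 \times t_{\text{byte}} \approx 1.39\ \text{ms}
\;>\; T_{\text{tick}} = \frac{1}{1000\ \text{Hz}} = 1\ \text{ms}
```

Under those assumptions, a single record sent synchronously keeps the UART (and therefore the TRACE lock) busy for longer than one tick period, while copying the same 16 bytes into RAM takes only a few store instructions.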

I will second Richard’s suggestion that a much less invasive and more promising way is “silent runtime monitoring of events” that can be evaluated postmortem.
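One possible shape for this postmortem approach is a per-core "flight recorder" that overwrites its oldest entries, so that after one core hangs its buffer still holds its last events, plus a global sequence number so the logs can be merged afterwards. This is a sketch, not a FreeRTOS facility: the names, `FR_DEPTH` and the shared atomic counter are assumptions (a free-running hardware timer would avoid even the one atomic increment per event). `fr_dump` is meant to run only after the system has been halted, e.g. from a debugger or a fault handler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_CORES 2u   /* hypothetical dual-core target */
#define FR_DEPTH  8u   /* hypothetical: last 8 events kept per core */

typedef struct {
    uint32_t seq;       /* global order across cores; 0 = slot never used */
    uint32_t event_id;  /* application-defined event code */
} fr_entry_t;

typedef struct {
    fr_entry_t ring[FR_DEPTH];
    uint32_t next;      /* touched only by the owning core */
} fr_core_t;

static fr_core_t fr[NUM_CORES];
static _Atomic uint32_t fr_seq = 1u;   /* shared: one atomic increment per event */

/* Record an event on `core`. When the ring wraps, the oldest entry is
   overwritten, so each core always holds its most recent FR_DEPTH events. */
void fr_log(unsigned core, uint32_t event_id)
{
    assert(core < NUM_CORES);
    fr_core_t *c = &fr[core];
    fr_entry_t *e = &c->ring[c->next % FR_DEPTH];
    e->event_id = event_id;
    e->seq = atomic_fetch_add_explicit(&fr_seq, 1u, memory_order_relaxed);
    c->next++;
}

/* Postmortem only (system halted): print every surviving entry from all
   cores in global sequence order and return how many were printed. */
unsigned fr_dump(void)
{
    uint32_t last = 0u;
    unsigned printed = 0u;
    for (;;) {
        const fr_entry_t *best = NULL;
        unsigned best_core = 0u;
        /* Pick the smallest sequence number greater than the last printed. */
        for (unsigned k = 0u; k < NUM_CORES; k++) {
            for (unsigned i = 0u; i < FR_DEPTH; i++) {
                const fr_entry_t *e = &fr[k].ring[i];
                if (e->seq > last && (best == NULL || e->seq < best->seq)) {
                    best = e;
                    best_core = k;
                }
            }
        }
        if (best == NULL)
            break;
        printf("seq %u core %u event %u\n", (unsigned)best->seq, best_core,
               (unsigned)best->event_id);
        last = best->seq;
        printed++;
    }
    return printed;
}
```

Because nothing is drained while the program runs, there is no reader competing with the writers; the cost is that only the last `FR_DEPTH` events per core survive.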

Matth9814 · April 9, 2025, 10:37am

Thank you, I get what you are saying. I just thought that having “small” critical sections in some parts of the code would have a smaller impact on the execution than having a bigger time slot in which one core accesses a shared resource (i.e. the other cores’ buffers) to read it and output everything over UART. Obviously I am assuming the buffer is not big enough to avoid being completely filled at least once during the program execution. Even with a circular buffer of N entries per core, there is a problem if one core hangs but the other doesn’t (which is exactly the case I am trying to debug), because some useful entries may be lost before I am able to stop the execution.
Sorry for the confusion: when I said “free” I was not talking about allocated memory (I would have resorted to static allocation too); I meant reading the buffers so that unread entries are not overwritten.

Matth9814 · April 9, 2025, 10:48am

Thank you for the suggestion. I do not have a lot of experience tracing execution without a debugger (the error does not occur when I single-step through the instructions), but I was trying to see whether what I described before was doable, since it needs fewer program-specific adjustments.

aggarg · April 9, 2025, 11:25am

You may want to explore Tracealyzer, which supports SMP - Multicore Tracing on FreeRTOS 11 and TI AM62x - Percepio.

Matth9814 · April 9, 2025, 11:39am

Thank you, I was aware of it but I am working on my thesis and unless my university agrees to fund it I cannot afford any third-party software that is not free.

aggarg · April 9, 2025, 11:47am

Percepio recently released a limited capability version that is free - New free trace tool from Percepio.

Matth9814 · April 9, 2025, 11:52am

Thanks a lot, I didn’t know there was a “lite” version.