2024 RTOS performance report

A discussion [1] on the Zephyr Project GitHub repo raised concerns about Zephyr's poor performance, but looking at the results, FreeRTOS does not shine either. What especially puzzles me is the poor message-processing performance, since many inter-task communication facilities in FreeRTOS are built around message queues. What are your thoughts? Is anyone willing to try to replicate the results? Maybe the implementation of the tests, or even FreeRTOS itself, could be improved?

Let me add that FreeRTOS failed the “deterministic scheduling test”, so I suspect that something might be wrong with the test case.

[1] Discussion on Zephyr's Poor Real-Time Performance · zephyrproject-rtos/zephyr · Discussion #79785 · GitHub

Looking at the settings used: configUSE_TRACE_FACILITY is probably not without timing impact. Also, 56 priorities are far from the standard configuration.
Semaphores might still be a tad slow, but this is what direct-to-task notifications were made for.
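To illustrate the point, here is a sketch of replacing a binary semaphore with a direct-to-task notification (the task and ISR names are hypothetical; the API calls shown, vTaskNotifyGiveFromISR() and ulTaskNotifyTake(), are the real FreeRTOS ones, but this fragment obviously only compiles against the kernel):

```c
#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical handle of the task that processes the events. */
static TaskHandle_t xHandlingTask;

/* ISR side: "give" a notification instead of giving a semaphore. */
void vExampleISR( void )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    vTaskNotifyGiveFromISR( xHandlingTask, &xHigherPriorityTaskWoken );
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

/* Task side: "take" the notification.  Passing pdTRUE clears the
   notification count on exit, mimicking a binary semaphore. */
void vHandlingTask( void *pvParameters )
{
    ( void ) pvParameters;
    for( ;; )
    {
        if( ulTaskNotifyTake( pdTRUE, portMAX_DELAY ) > 0 )
        {
            /* Process the event here. */
        }
    }
}
```

Per the FreeRTOS documentation, unblocking a task this way is considerably faster than going through a semaphore object, which is exactly why it matters in a benchmark like this.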

One issue is that the code executed is not quite “in the open” as far as I can tell. At least I did not find it anywhere so far. So it’s a bit difficult to reproduce the results.

The paper has a link to the suite: threadx/utility/benchmarks/thread_metric at master · eclipse-threadx/threadx · GitHub

I will add that STM isn’t the best source for “well configured” code, and that shows in the settings they used.

The “non-deterministic” behavior of cooperative processing might have been due to the fact that their settings had pre-emption enabled, so it wasn’t purely cooperative.
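For what it’s worth, a purely cooperative run would need preemption (and time slicing) disabled in FreeRTOSConfig.h, roughly like this:

```c
/* FreeRTOSConfig.h excerpt - purely cooperative scheduling.
   Tasks only switch when they block or call taskYIELD(). */
#define configUSE_PREEMPTION    0
#define configUSE_TIME_SLICING  0
```

With configUSE_PREEMPTION left at 1, a higher-priority task becoming ready will preempt the running task, which would explain the “non-deterministic” timing they observed.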

Yes, I found this, but it is not written for FreeRTOS, just for ThreadX, and I am wondering how the “porting” to FreeRTOS was done; probably the FreeRTOS configuration was also not optimal.

The report is pretty meaningless unless you optimise the settings for performance first. Using configUSE_PORT_OPTIMISED_TASK_SELECTION, for example, switches between a single assembly instruction selecting the next task and a generic C algorithm selecting the next task. In that last link you can see that having configASSERT() defined will make a massive difference too. Stack overflow checking is the part of context switching that takes the most time; stack overflow checking and asserts are very useful during development and can save a lot of debugging time, but they don’t come for free and can be removed with simple updates to the configuration file.
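Put together, a performance-oriented FreeRTOSConfig.h for a benchmark build would look something like the following (a sketch only; whether the optimised task selection is available depends on the port):

```c
/* FreeRTOSConfig.h excerpt - settings for a performance measurement build. */
#define configUSE_PORT_OPTIMISED_TASK_SELECTION  1  /* single-instruction next-task selection
                                                       (e.g. CLZ on Cortex-M) instead of the
                                                       generic C algorithm                   */
#define configCHECK_FOR_STACK_OVERFLOW           0  /* stack checking adds context-switch cost */
#define configUSE_TRACE_FACILITY                 0  /* trace hooks are not free                */
#define configMAX_PRIORITIES                     5  /* close to the default, not 56            */
/* ...and leave configASSERT() undefined in the benchmark build. */
```

The development build can keep asserts and stack checking enabled; the point is simply that they should not be part of a timing measurement.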

In any case, the queue mechanism in FreeRTOS is comprehensive and can be optimised to make it faster. It was originally written to minimise size, so it has the event mechanism built in. It has also been formally proven (i.e. using formal methods) to be “correct”. For example, it contains a loop so that if a task is unblocked because there is data in the queue (or a semaphore is available), it will re-calculate its remaining block time and block again if another task consumes the data/semaphore before it actually gets to execute. That takes time, but like I say, it is “correct”. It could be made faster by passing the data item (or semaphore) directly to the blocked task, rather than placing the data in the queue and having the task remove it from the queue, but then it would be “incorrect” if a higher-priority task attempted to read the data in the meantime.

[edit]I completely disregarded the Zephyr results when I first read the report because I assumed Zephyr was doing a lot more than any of the other RTOSes, like full memory protection, stack guards, etc., whereas you need to use the specific MPU port to have the equivalents in FreeRTOS. You may as well compare Zephyr with real-time Linux and note how slow Linux is.[/edit]