Mutex/Priority Inheritance Performance on STM32F4-Discovery (Cortex-M4)

system · June 24, 2015, 3:57pm

amtsarztlenin wrote on Wednesday, June 24, 2015:

Hello,
I am currently experimenting with mutexes in FreeRTOS 8.2.0 on a Cortex-M4-based (168 MHz) STM32F4-Discovery board, using the GNU C compiler (gcc-arm) with optimisation level -O2.

I am interested in the performance of mutexes. In particular, I want to know how quickly they can resolve a priority inversion problem. To measure the time FreeRTOS needs to detect and resolve priority inversion, I came up with the following scenario:

There are three tasks, LowPriorityTask (LP), MiddlePriorityTask (MP) and HighPriorityTask (HP), and a mutex. MP and HP start out in a sleeping state. LP takes the mutex and wakes up HP, so LP is preempted by HP. HP then starts the time measurement, attempts to take the mutex, fails, and enters the Blocked state to wait for the mutex to become available. LP therefore continues to execute and wakes up MP, which would now theoretically preempt LP; this would delay HP further. The RTOS detects this situation and raises the priority of LP so that it can continue executing until it gives the mutex. After that, HP continues executing and takes the mutex itself, at which point the time measurement stops.

The same scenario in pseudocode:

function LowPriorityTask
    while true do
        Take_Semaphore
        Wakeup_HighPriorityTask
        Wakeup_MiddlePriorityTask
        Give_Semaphore
    end while
end function

function MiddlePriorityTask
    while true do
        DO_SOMETHING
    end while
end function

function HighPriorityTask
    while true do
        START_MEASUREMENT
        Take_Semaphore
        END_MEASUREMENT
    end while
end function

The time is measured by toggling a GPIO pin before and after the test sequence and observing the output on an oscilloscope.

Everything works as expected, but I have noticed one thing that I do not understand:

If I increase the stack size of the three tasks, the performance improves. For example, with a stack size of 128 words the measured time is consistently 31.3 µs, but with a stack size of 512 words it is 30.8 µs.
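Converting the improvement into core clock cycles gives a sense of how small it is. This is plain arithmetic on the two measured times, assuming the core runs at the stated 168 MHz throughout the measurement:

```latex
\begin{aligned}
\Delta t &= 31.3\,\mu\mathrm{s} - 30.8\,\mu\mathrm{s} = 0.5\,\mu\mathrm{s} \\
N_{\Delta} &= \Delta t \cdot f_{\mathrm{CPU}} = 0.5\times10^{-6}\,\mathrm{s} \times 168\times10^{6}\,\mathrm{Hz} = 84\ \text{cycles} \\
\frac{N_{\Delta}}{31.3\,\mu\mathrm{s}\cdot f_{\mathrm{CPU}}} &= \frac{84}{5258.4} \approx 1.6\,\%
\end{aligned}
```

So the larger stacks save roughly 84 cycles, about 1.6 % of the whole sequence.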

Does anyone have an idea why increasing the stack size of the tasks improves performance in this scenario? If you like, I can also supply the actual code I used.

I am looking forward to reading your responses!

rtel · June 24, 2015, 4:34pm

rtel wrote on Wednesday, June 24, 2015:

I don’t think the performance change has anything to do with FreeRTOS; it is more likely the hardware. The stack size itself makes no difference to FreeRTOS unless it is too small, but increasing the stack size moves the stacks around, and you may have hit an alignment that the hardware’s pipeline or memory fetches prefer.

Regards.

richard-damon · June 25, 2015, 4:43am

richard_damon wrote on Thursday, June 25, 2015:

Actually, your timeline is a bit off.

LP runs, takes mutex.
LP starts HP.
HP starts
HP tries to take the mutex; LP's priority is raised to that of HP, and HP blocks.
LP starts MP.
Since LP has a temporary priority above MP, LP continues to execute.
LP gives up the mutex, and its priority is dropped back.
HP gets the mutex and resumes.
HP finishes (presumably giving the mutex back for later and blocking on something).
MP can now start and do its work.

As to why the timing improved, one possibility is that the extra stack size shifts things slightly and changes cache performance (for example, by moving two heavily used items into different cache lines).

system · June 25, 2015, 10:33am

amtsarztlenin wrote on Thursday, June 25, 2015:

Thank you very much for your quick replies; these explanations seem very plausible to me!