Synchronisation in TrustZone secure world by callbacks to non-secure world

Jan · January 18, 2021, 11:48am

Hi,

I am using FreeRTOS on a Cortex-M33 where I use TrustZone-M to communicate
with a DSP. While the DSP is doing its work I want to switch the M33 to
another runnable task if present. I am trying to realize this with
callbacks to non-secure world and a semaphore in non-secure world.

When the DSP reports that it is finished, the interrupt handler performs
a callback to non-secure world, and in the callback function I execute:

xResult = xSemaphoreGiveFromISR(semaphore, &xHigherPriorityTaskWoken);

portEND_SWITCHING_ISR(xHigherPriorityTaskWoken);

The code in secure world waits on the DSP by also sending a callback
to non-secure world which executes:

xResult = xSemaphoreTake(semaphore, portMAX_DELAY);

The semaphore is created with xSemaphoreCreateCounting().

At first sight the code seems to work. However, when I run it for a longer
time, xSemaphoreGiveFromISR no longer unblocks the
xSemaphoreTake. Sometimes this happens after a few interactions with the DSP,
sometimes after many thousands. What could be wrong? In my debug tracing, I see
that xSemaphoreGiveFromISR returns but it does not unblock xSemaphoreTake.
There are no other tasks in the test code.

Jan

rtel · January 18, 2021, 5:19pm

Can you please give a little clarification as to what using TrustZone-M to communicate with a DSP means? I’m assuming you have a dual core system with a Cortex-M33 and a DSP as two separate cores (or two separate chips even) and you are communicating with the DSP from the secure side of the Cortex-M33. Is that correct?

Also, can you elaborate on what it means for the code in the secure world to wait on the DSP? Is the code that is waiting a FreeRTOS task?

Jan · January 18, 2021, 6:02pm

I am using TrustZone because the usage of the DSP needs security. To use the DSP from the non-secure world, the SW calls a non-secure callable function in secure world that sends a command to the DSP. Then in secure world we call a callback function in non-secure world that will do a ‘semaphore take’ so that it blocks on a semaphore until the DSP finishes the operation. When the DSP finishes the operation it will send an interrupt to the M33 that arrives in the secure-world. The secure world will then again call a callback function in non-secure world that does a ‘semaphore give’ operation. This unblocks the task that was waiting on the result of the DSP.

In an implementation with a spin-loop to wait on the DSP result everything seems to work fine. But that is a waste of CPU cycles and power. The semaphore implementation is sort of working. It can work for a few minutes or a few hours but then it deadlocks. The ‘give’ from the ISR does not unblock the task that is blocked by the ‘take’.

aggarg · January 18, 2021, 11:00pm

When the DSP finishes the operation it will send an interrupt to the M33 that arrives in the secure-world. The secure world will then again call a callback function in non-secure world that does a ‘semaphore give’ operation.

Are you calling the callback function in non-secure world from the secure ISR? If so, are you using the “FromISR” function for the ‘semaphore give’ operation?

Thanks.

Jan · January 19, 2021, 9:04am

Yes and Yes. I also do portEND_SWITCHING_ISR(xHigherPriorityTaskWoken).

And I verify with an assert that the semaphore operations return pdTRUE. The asserts are never failing.

aggarg · January 19, 2021, 8:15pm

I think this is what is happening: FreeRTOS enables secure exception priority boosting to ensure that the secure interrupts are not masked by FreeRTOS critical sections. As a result, a secure ISR cannot call FreeRTOS API functions, because it can interrupt a FreeRTOS critical section and can therefore potentially corrupt FreeRTOS internal data structures.
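For readers unfamiliar with the mechanism, here is a minimal sketch of what enabling that boosting amounts to, assuming it corresponds to the PRIS bit of the secure AIRCR register (the "boost bit" mentioned later in this thread). The helper name aircr_with_pris is hypothetical; the bit positions (PRIS at bit 14, write key 0x05FA in bits 31:16) come from the ARMv8-M architecture, not from this thread, and this is not the port's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* AIRCR field layout (ARMv8-M). */
#define AIRCR_VECTKEY_POS    16u
#define AIRCR_VECTKEY_MASK   (UINT32_C(0xFFFF) << AIRCR_VECTKEY_POS)
#define AIRCR_VECTKEY_WRITE  (UINT32_C(0x05FA) << AIRCR_VECTKEY_POS) /* key required on writes */
#define AIRCR_PRIS_MASK      (UINT32_C(1) << 14)                     /* prioritise secure exceptions */

/* Hypothetical helper: given the value read from AIRCR, compute the value to
 * write back so that PRIS is set and all other fields are preserved. */
uint32_t aircr_with_pris(uint32_t current)
{
    /* Drop the key field as read back, then insert the write key and PRIS. */
    uint32_t value = current & ~AIRCR_VECTKEY_MASK;
    value |= AIRCR_VECTKEY_WRITE | AIRCR_PRIS_MASK;

    /* Sanity check: a write without the correct key would be ignored. */
    assert((value >> AIRCR_VECTKEY_POS) == UINT32_C(0x05FA));
    return value;
}
```

On target this would presumably be used from secure privileged code as SCB->AIRCR = aircr_with_pris(SCB->AIRCR); with the CMSIS device header included.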

Instead of calling the semaphore give API from the secure ISR, would you please pend a non-secure interrupt using the NVIC_ISPRn_NS register and call the semaphore give API from the non-secure ISR? If you are using CMSIS, you can use the TZ_NVIC_SetPendingIRQ_NS function.

Note that you will need to ensure that the non-secure interrupt that you pend has a logical priority no higher than configMAX_SYSCALL_INTERRUPT_PRIORITY (i.e. a numerically equal or higher value). If you have configASSERT defined, it will be triggered if the priority configuration is not correct.
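A minimal sketch of this suggestion follows. The handler names, the IRQ number DSP_DONE_NS_IRQ and the semaphore variable are hypothetical. The block at the top contains host stand-ins so the sketch compiles without FreeRTOS or CMSIS; on target you would delete that block and include FreeRTOS.h, semphr.h and the device header instead, and the two handlers would live in the secure and non-secure images respectively.

```c
#include <assert.h>
#include <stdint.h>

/* ---- Host stand-ins (assumptions, NOT the real implementations) ---- */
typedef long BaseType_t;
#define pdFALSE ((BaseType_t)0)
#define pdTRUE  ((BaseType_t)1)
typedef int IRQn_Type;
typedef struct HostSemaphore { unsigned count; } *SemaphoreHandle_t;
#define portEND_SWITCHING_ISR(x) ((void)(x))

uint32_t ulHostPendingNsIrqs;   /* stands in for the NVIC_ISPRn_NS registers */

void TZ_NVIC_SetPendingIRQ_NS(IRQn_Type irq)
{
    ulHostPendingNsIrqs |= (UINT32_C(1) << irq);
}

BaseType_t xSemaphoreGiveFromISR(SemaphoreHandle_t sem, BaseType_t *woken)
{
    sem->count++;
    *woken = pdTRUE;
    return pdTRUE;
}
/* ---- End of host stand-ins ---- */

#define DSP_DONE_NS_IRQ ((IRQn_Type)5)  /* hypothetical spare non-secure IRQ */

struct HostSemaphore xHostSemaphoreStorage;
SemaphoreHandle_t xDspDoneSemaphore = &xHostSemaphoreStorage;

/* Secure image: DSP completion interrupt. It makes no FreeRTOS calls, so it
 * is harmless even if it preempts a FreeRTOS critical section. */
void DSP_Secure_IRQHandler(void)
{
    /* ... acknowledge the DSP interrupt here ... */
    TZ_NVIC_SetPendingIRQ_NS(DSP_DONE_NS_IRQ);
}

/* Non-secure image: handler of the software-pended interrupt. Its priority
 * must allow FreeRTOS API calls (see the note above). */
void DspDone_NS_IRQHandler(void)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    BaseType_t xResult;

    xResult = xSemaphoreGiveFromISR(xDspDoneSemaphore, &xHigherPriorityTaskWoken);
    assert(xResult == pdTRUE);
    (void)xResult;  /* avoids an unused warning when asserts are compiled out */
    portEND_SWITCHING_ISR(xHigherPriorityTaskWoken);
}
```

The point of the split is that the secure handler only touches the NVIC, while every FreeRTOS call happens in a non-secure handler that FreeRTOS critical sections can mask.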

Thanks.

Jan · January 20, 2021, 12:06pm

Thanks a lot! Your thinking makes sense and I assume that this is the problem. It is now running for a couple of hours without problems.

Instead of posting the interrupt from secure world, I still do a callback to non-secure world and in the non-secure world I post the interrupt.

Jan.

OliM · September 15, 2023, 8:19am

I realized today that I am also running into the problem described here. Since I can also prevent the secure ISRs from being blocked by choosing an appropriate priority, what is the sensible way to prevent FreeRTOS from enabling the secure priority boosting?

OliM · September 15, 2023, 9:37am

In my case it’s not IO. It’s the AES unit, to use secure memory for the keys, telling the OS the encryption was done.

aggarg · September 15, 2023, 11:17am

That is not an option currently. Does the solution I mentioned above work for you?

OliM · September 15, 2023, 12:19pm

To be honest my intermediate solution looks like this:

I do understand the idea of setting a non-secure IRQ as pending, but for what I am doing it seems to be a lot of effort without any benefit. I have more than one security-related driver on the secure side and expected them to unblock tasks via callbacks. Those callbacks can change depending on the piece of code currently using the peripheral. With the NS_IRQ solution I would need to implement all of their IRQs on the non-secure side as well, and I would need to invent a way to get the callback’s address (let alone its arguments) to those IRQs through TrustZone.

kstribrn · September 15, 2023, 10:55pm

I’m not familiar with this port so I’ll defer to @aggarg here. I’d assume he mentions it’s not possible as you can have interrupt priority clashes. Commenting out the priority boosting will likely lead to problems.

jefftenney · September 16, 2023, 7:33am

Hi Oliver. Your temporary solution should work OK. Theoretically it could be made into a FreeRTOS configuration option, even though it is not currently an option. (I think that’s what Gaurav meant.)

But a better fix would be to lower the priority of the secure exceptions that make FreeRTOS API calls. Their priority must be at or below (numerically equal to or higher than) the deprioritized max syscall priority. Here’s the Architecture Reference Manual’s visual indication of what happens to non-secure interrupts in a TZ application that uses FreeRTOS:

On STM32U5, there are 4 priority bits. Let’s say you set configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY to 5. So priorities 5 through 15 are good for FreeRTOS API calls. However, on the secure side, only 10 through 15 are good for FreeRTOS API calls. That’s because the deprioritized configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY is 10, not 5. Can you give that a try?

As a side note, unfortunately FreeRTOS cannot validate interrupt priorities of secure interrupts because the ARMv8-M architecture prevents the nonsecure FreeRTOS API function from determining which secure interrupt is active. So you’re on your own setting the correct interrupt priorities for secure interrupts that call FreeRTOS API functions.

OliM · September 16, 2023, 9:07am

Thank you, I will try this first thing when I’m back in the office. This also sounds like a better solution for the original problem.
My understanding before, especially considering the old solution, was that with the “boost bit” set, the secure IRQ trumps every non secure IRQ, no matter the original priority.

aggarg · September 18, 2023, 5:24am

That is a great suggestion @jefftenney. The mapping in your case should translate to the following:

+------------+--------+
| Non Secure | Secure |
+------------+--------+
| 0,1        | 8      |
+------------+--------+
| 2,3        | 9      |
+------------+--------+
| 4,5        | 10     |
+------------+--------+
| 6,7        | 11     |
+------------+--------+
| 8,9        | 12     |
+------------+--------+
| 10,11      | 13     |
+------------+--------+
| 12,13      | 14     |
+------------+--------+
| 14,15      | 15     |
+------------+--------+

Therefore, as Jeff suggested, if your configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY is 5, setting the secure interrupt’s priority to 10 or higher (numerically) should address your problem. Let us know what you find.
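The table above can be reproduced with a little arithmetic. The sketch below assumes 4 priority bits, as in the STM32U5 example; the function names are hypothetical. It halves the non-secure priority and moves it into the lower-priority half of the range, then checks whether a given secure priority falls at or below the deprioritized max syscall priority.

```c
#include <assert.h>

#define PRIO_BITS    4u                  /* STM32U5 example from this thread */
#define PRIO_LEVELS  (1u << PRIO_BITS)   /* 16 levels: 0..15 */

/* Priority of a non-secure interrupt as seen next to secure interrupts when
 * non-secure exceptions are deprioritized. */
unsigned deprioritized(unsigned ns_priority)
{
    assert(ns_priority < PRIO_LEVELS);
    return (PRIO_LEVELS / 2u) + (ns_priority / 2u);
}

/* Non-zero if a secure interrupt at secure_priority may call FreeRTOS
 * "FromISR" functions, given the non-secure max syscall priority. */
int secure_priority_may_call_freertos(unsigned secure_priority,
                                      unsigned max_syscall_ns)
{
    assert(secure_priority < PRIO_LEVELS);
    return secure_priority >= deprioritized(max_syscall_ns);
}
```

With a max syscall priority of 5, deprioritized(5) is 10, so secure priorities 10 through 15 pass the check and 0 through 9 do not.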

OliM · September 18, 2023, 8:25am

Works as intended (my test setup, which previously broke after about 500 rounds of NotifyWait->NotifyFromISR, did a few million rounds without any lockup). As I didn’t actually want to use all those levels anyway but only had two defines to be used for high or low priority interrupts below FreeRTOS, I just redefined those to 11 and 12 and everything worked.

aggarg · September 18, 2023, 9:00am

Thank you for reporting back! I will mark Jeff’s suggestion as the solution.

jefftenney · September 18, 2023, 3:53pm

This is a pretty subtle requirement – namely that secure interrupts that call FreeRTOS API functions must use priorities at or below the deprioritized max syscall priority. And on top of that, function vPortValidateInterruptPriority() is not able to detect violations.

I thought maybe we could add a secure function to help vPortValidateInterruptPriority() catch this configuration error, but that idea doesn’t seem to be panning out. Would be nice if we could improve the port somehow to catch this. Ideas welcome.

OliM · September 18, 2023, 4:42pm

Could you expand on why you don’t think a secure function for that check would work out?

jefftenney · September 18, 2023, 5:40pm

The check in vPortValidateInterruptPriority() is done by reading IPSR because that register indicates the exception number of the exception being handled. The code then looks up the priority of that interrupt.

When a secure ISR calls a nonsecure function (e.g., a FreeRTOS API function), the core changes IPSR to 1. The intent is to hide from the nonsecure side which interrupt is active. The value 1 is good enough for xPortIsInsideInterrupt(), but it’s no good for looking up the priority of the current interrupt.

I had hoped IPSR would automagically change back to the correct value if we jumped back over to the secure side, acting something like the state-banked registers. But what I found in the Architecture Manual shot that down. The BLXNS instruction, used to call the nonsecure function from the secure ISR, stacks the original IPSR value and changes IPSR to 1. IPSR is not a banked register. I didn’t even bother trying it on the hardware.
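To illustrate why an IPSR value of 1 defeats the check, here is a rough sketch of the kind of lookup described above. It is not the port's actual code: the function name and the irq_priority table (standing in for the NVIC priority registers) are hypothetical. Exception numbers below 16 are system exceptions and have no entry in the external-interrupt priority table, so the lookup has nothing to return for them.

```c
#include <stdint.h>

#define FIRST_USER_EXCEPTION 16u   /* exception numbers 0..15 are system exceptions */
#define PRIORITY_UNKNOWN     (-1)

/* ipsr:         value read from IPSR
 * irq_priority: table indexed by IRQ number (stand-in for the NVIC registers)
 * irq_count:    number of entries in the table */
int priority_of_active_exception(uint32_t ipsr,
                                 const uint8_t *irq_priority,
                                 uint32_t irq_count)
{
    uint32_t exception = ipsr & UINT32_C(0x1FF);  /* exception number field, bits 8:0 */
    uint32_t irq;

    if (exception < FIRST_USER_EXCEPTION) {
        /* Includes the value 1 seen by nonsecure code called from a secure ISR. */
        return PRIORITY_UNKNOWN;
    }

    irq = exception - FIRST_USER_EXCEPTION;
    if (irq >= irq_count) {
        return PRIORITY_UNKNOWN;
    }
    return irq_priority[irq];
}
```

With the real exception number the lookup finds the configured priority; with 1 it only knows that some interrupt is active, which is all xPortIsInsideInterrupt() needs.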