ulTaskNotifyTake: taskENTER_CRITICAL with portYIELD_WITHIN_API

wrong. See my previous explanation.

@Austin Do you actually have a problem in your current application, or is this a request for comment/clarification?

We have a problem in our application and this is one suspect area, but it is not confirmed yet.

We got the point Rac mentioned, but it does not fully align with the code. We also tried to find out whether “svc 0” is deferred by taskENTER_CRITICAL, but could not find any documentation on the behavior of “svc 0”.

We are using the port from FreeRTOS-Kernel/portable/GCC/ARM_CA53_64_BIT/, the yield is a “svc 0” call.

What do you mean “not fully align with the code?”

https://www.keil.com/support/man/docs/armasm/armasm_dom1361289909139.htm

This one reads that the SVC instruction “causes an exception.” I read this to mean that this is subject to the standard Cortex A9 exception handling which is documented in the “ARM Generic Interrupt Controller Architecture Specification” which you can download from ARM.

Note that the ability to schedule instead of unconditionally invoke an exception handler is at the very heart of each and every processor architecture. Without this, you were right; we would necessarily deadlock. If the SVC would indeed immediately and unconditionally invoke the handler with disregard to priorities and interrupt mask, we wouldn’t need an interrupt in the first place, though. We might as well call a subroutine.

BTW, what is this #ifdef GUEST all about?

Ok - there is a problem and you suspect that the FreeRTOS portable layer is the reason ?
Could you explain what exactly the problem is ?
Besides that could you add which FreeRTOS version on which hardware you’re using ?

What do you mean “not fully align with the code?”

Let me clarify. We are using the code from https://github.com/FreeRTOS/FreeRTOS-Kernel.
What I described above is from the folder portable/GCC/ARM_CA53_64_BIT. It is the port for the Cortex A53 core in AArch64.

In the code:
taskENTER_CRITICAL: it just masks IRQs; it won’t affect the SVC call (right?)
portYIELD_WITHIN_API ==> svc 0 ==> triggers an exception ==> FreeRTOS_SWI_Handler is called for the task switch (this is my understanding)

If the SVC would indeed immediately and unconditionally invoke the handler with disregard to priorities and interrupt mask

This is my understanding.

BTW, what is this #ifdef GUEST all about?

GUEST means it is running in EL1. In our case, GUEST is defined.

This is similar to the thread “Can a Switch to another task occur between the call “taskENTER_CRITICAL” and “taskEXIT_CRITICAL”?”, which has this comment:

but on a port such as CA9, the voluntary yield is done through an “svc” call, which steps into the SVC flow directly and does the switch at once. I did not see any condition that would block this switch,
so it seems the switch would happen and succeed on the CA9 port, which means a voluntary yield could be done inside the “taskENTER_xxxx” / “taskEXIT_xxxx” region.

there is a problem and you suspect that the FreeRTOS portable layer is the reason ?

I am not sure. I first looked at ulTaskNotifyTake, saw that there is a yield between taskENTER_CRITICAL/taskEXIT_CRITICAL, and then dug into the code.

In task.c, it looks like ulTaskNotifyTake/xTaskNotifyWait are the only cases that yield between taskENTER_CRITICAL/taskEXIT_CRITICAL.

Could you explain what exactly the problem is ?

The problem is that the task calling “ulTaskNotifyTake” is never notified, and the IRQ whose handler calls xTaskGenericNotifyFromISR is not triggered.

Besides that could you add which FreeRTOS version on which hardware you’re using ?

Answered in the post above.

If this WAS true, then you were right; there’d be an immediate deadlock. Yet I’m almost positive your understanding is wrong. If this were an unconditional jump to the ISR, it would completely bypass the exception handling architecture. Also there wouldn’t be a point in the SVC call; it would be exactly like a bl, possibly switching processor states.

It’s fairly easy to verify this: Set a breakpoint to the SVC call in the disassembly window, then do a single step. Do this with and without the critical section active. I’m almost 100% sure that in the first scenario, it’ll very simply jump over the instruction onto the next one and enter the context switch ISR only after the critical section is left, whereas in the second scenario, you’ll be right at the ISR. If that wasn’t the case, nothing would work because the scheduler crucially depends on the deferred invocation mechanism.

Let’s get on the same page: Your ISR isn’t (never ?) triggered and you think this is root caused by the ulTaskNotifyTake internal implementation ?
That would mean that the ARM_CA53_64_BIT FreeRTOS port is fundamentally broken… to be honest I can’t imagine that this slipped through the FreeRTOS release tests.
Is there any other possible reason why the ISR is not triggered as expected ?

It is very helpful to mention which port you are using in your original post. Another tip is to say what your problem actually is, rather than something about a workaround or investigation into the problem.

The FreeRTOS code is portable across more than 40 architectures - so it won’t be a surprise it has to cope with many different context switch mechanisms. On very simple architectures it can just be a function call - but generally there are two classes: Synchronous and asynchronous.

When it is asynchronous, a context switch is requested by pending a low priority interrupt. If that is done inside a critical section then the interrupt remains pending until the critical section is exited, so context switches only ever happen when interrupts are enabled (or unmasked if interrupts are never globally disabled).
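The asynchronous case can be sketched as a tiny host-side model. This is not FreeRTOS code; every name below is made up for illustration, and the only assumption is the behavior described above: a yield request merely pends, and the switch runs once the mask is lifted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state, not taken from any FreeRTOS port. */
static bool async_irq_masked = false;     /* true while inside the critical section */
static bool async_switch_pending = false; /* the pended low priority interrupt */
static int  async_switches_done = 0;      /* how many context switches actually ran */

/* Run the pended switch only if interrupts are currently unmasked. */
void async_deliver_pending(void)
{
    if (async_switch_pending && !async_irq_masked) {
        async_switch_pending = false;
        async_switches_done++;
    }
}

void async_enter_critical(void)
{
    async_irq_masked = true;
}

void async_exit_critical(void)
{
    async_irq_masked = false;
    async_deliver_pending();   /* the pended interrupt fires now */
}

void async_request_yield(void)
{
    async_switch_pending = true;
    async_deliver_pending();   /* no effect while masked */
}

void demo_async_yield(void)
{
    async_enter_critical();
    async_request_yield();
    assert(async_switches_done == 0);  /* still pending inside the section */
    async_exit_critical();
    assert(async_switches_done == 1);  /* switch happens on exit */
}
```

Calling demo_async_yield() from a test main() exercises the sequence; the asserts pass only if the switch is deferred until the critical section is left.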

When it is synchronous, as in your case where an SVC is used, then the context switch will occur immediately whether you are inside a critical section or not. In those cases the kernel stores the critical section nesting depth as part of the task’s context. If a task switches out from within a critical section (so with interrupts masked), the next time it is switched back in interrupts will be disabled again (when the critical nesting depth gets restored) before it starts to run - then it will exit the critical section to re-enable interrupts. The net effect is therefore the same as for the asynchronous method, but the context switch occurs on the other side of exiting the critical section. Likewise if a task switches out with interrupts enabled, the next time it is switched in interrupts will again be enabled even if it switched from a task that had interrupts disabled. The kernel is designed to work like this - it is not recommended for application code to yield in a critical section!
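A rough host-side model of the synchronous case may help here. It is only a sketch of the idea described above, not the port’s actual code: the struct, the globals and the function names are all hypothetical, and the one assumption is that the restore path sets the interrupt mask from the incoming task’s saved nesting depth.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: each task carries its own critical nesting depth
 * as part of its saved context. */
typedef struct {
    unsigned critical_nesting;
} task_context_t;

static bool     sync_irq_masked = false;   /* models the interrupt mask state */
static unsigned sync_critical_nesting = 0; /* depth of the task that is running */

void sync_enter_critical(void)
{
    sync_irq_masked = true;
    sync_critical_nesting++;
}

void sync_exit_critical(void)
{
    if (sync_critical_nesting > 0 && --sync_critical_nesting == 0) {
        sync_irq_masked = false;
    }
}

/* Synchronous switch (the "svc 0" path in this model): save the outgoing
 * task's depth, restore the incoming task's depth, set the mask to match. */
void sync_context_switch(task_context_t *out, task_context_t *in)
{
    out->critical_nesting = sync_critical_nesting;
    sync_critical_nesting = in->critical_nesting;
    sync_irq_masked = (sync_critical_nesting != 0);
}

void demo_sync_yield(void)
{
    task_context_t waiter = { 0 }, other = { 0 };

    sync_enter_critical();                 /* waiter masks interrupts */
    sync_context_switch(&waiter, &other);  /* waiter yields inside the section */
    assert(!sync_irq_masked);              /* other task runs with IRQs enabled */

    sync_context_switch(&other, &waiter);  /* waiter is switched back in */
    assert(sync_irq_masked);               /* it resumes inside its section */
    sync_exit_critical();
    assert(!sync_irq_masked);              /* and then leaves it normally */
}
```

The point of the model is the middle assert: the mask belongs to the task that entered the critical section, so it does not stay in force while another task is running.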


In that case I stand corrected, apologize for my misassessment and would like to thank Richard for his clarifying explanation!

If you are running at EL1, one thing to watch out for is the interrupt priority view of a non-secure group 1 interrupt. The following is from the GIC doc:

For Non-secure writes to a priority field of a Non-secure Group 1 interrupt, before storing the value:
• The value is right-shifted by one bit.
• Bit [7] of the value is set to 1.
This translation means the priority value for the Non-secure Group 1 interrupt is in the bottom half of the priority range.

If your hardware implements 5 priority bits (32 unique priorities) and you want to set a non-secure group 1 interrupt’s priority to 18 (it must be 16 or greater, because the priority has to fall in the bottom half of the range), you should use the following value:

( 18 << 4 ) & 0xFF

instead of

( 18 << 3 ) & 0xFF
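The arithmetic can be checked with a small sketch. The function name is made up for illustration; it only models the two translation steps quoted from the GIC doc above (shift right by one, then set bit 7) and assumes the translated value is the one the hardware actually uses.

```c
#include <assert.h>
#include <stdint.h>

/* Models the translation applied to a Non-secure write to the priority
 * field of a Non-secure Group 1 interrupt. Hypothetical helper name. */
uint8_t ns_stored_priority(uint8_t written)
{
    return (uint8_t)((written >> 1) | 0x80u);
}

void demo_priority_18(void)
{
    /* With 5 implemented priority bits, level 18 corresponds to 18 << 3. */
    const uint8_t wanted = (uint8_t)((18u << 3) & 0xFFu);   /* 0x90 */

    /* Writing ( 18 << 4 ) & 0xFF ends up as the wanted value ... */
    assert(ns_stored_priority((uint8_t)((18u << 4) & 0xFFu)) == wanted);

    /* ... whereas writing ( 18 << 3 ) & 0xFF ends up as 0xC8, i.e. level 25. */
    assert(ns_stored_priority((uint8_t)((18u << 3) & 0xFFu)) == 0xC8u);
    assert((0xC8u >> 3) == 25u);
}
```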

Thanks.

Richard, Thanks for the explanation.

I want to make this clear. Let’s say we use the Cortex A53 port, which uses “svc 0” to yield.

You mentioned

it is not recommended for application code to yield in a critical section!

Calling ulTaskNotifyTake from the application leads to exactly this situation, so is it not recommended to use it from the application? Am I right?

If that is not the case, do you think the usage model below is correct?
Are there any IRQ priority settings that need to be taken care of?

Task thread (bottom half of the IRQ):

void IRQ_bottom_half_task(void *pvParameters)
{
  (void) pvParameters;

  while (true) {
      /* Block until the ISR notifies this task. */
      ulTaskNotifyTake(pdTRUE, portMAX_DELAY);
      /* We got the notification from the ISR; do the remaining work here. */
  }
}

IRQ_handler

void my_irq_isr(void)
{
   /* Got an IRQ; notify the bottom half task. */
   xTaskGenericNotifyFromISR();  /* arguments omitted */
}

Thanks for the reminder. Will have a check

No. The official FreeRTOS API can be used, of course.
You just shouldn’t call (undocumented) portYIELD_WITHIN_API directly in your application code.
BTW, instead of calling the undocumented xTaskGenericNotifyFromISR, I’d recommend sticking to the official API, such as xTaskNotifyFromISR.

If a task switches out from within a critical section (so with interrupts masked), the next time it is switched back in interrupts will be disabled again (when the critical nesting depth gets restored) before it starts to run - then it will exit the critical section to re-enable interrupts. The net effect is therefore the same as for the asynchronous method, but the context switch occurs on the other side of exiting the critical section.

On Cortex A53 with “svc 0”, interrupts are masked when the context switch occurs. Until someone notifies the task and wakes it up, this seems to affect IRQ triggering. This is what puzzles me.

Hi
On Cortex A53, if the application calls ulTaskNotifyTake, it will call portYIELD_WITHIN_API, which yields inside the critical section via “svc 0”. Although the application doesn’t call portYIELD_WITHIN_API directly, it is called indirectly through ulTaskNotifyTake.

I know. Again FreeRTOS code takes care about the internals and (usually) works as specified. You’re still supposing that the FreeRTOS ulTaskNotifyTake implementation is broken and doesn’t work, right ?
You didn’t tell us the exact problem you have. Is the ISR never triggered, or only sometimes, so that you miss some interrupts? How did you verify that the ISR is not triggered as expected? Did you set a breakpoint in the debugger or use a test flag/counter?

I know. Again FreeRTOS code takes care about the internals and (usually) works as specified. You’re still supposing that the FreeRTOS ulTaskNotifyTake implementation is broken and doesn’t work, right ?

It is from code analysis. I had doubts about ulTaskNotifyTake on Cortex A53. The code switches out (svc 0) inside the critical section while IRQs are masked.

After digging into the code further with Richard’s statement in mind, it is clear to me now. More specifically, the IRQ is masked only for a short time; it is unmasked by the context switch that follows.

The code is well designed and understandable now. Thank you all for the answers.

If a task switches out from within a critical section (so with interrupts masked), the next time it is switched back in interrupts will be disabled again (when the critical nesting depth gets restored) before it starts to run - then it will exit the critical section to re-enable interrupts. The net effect is therefore the same as for the asynchronous method, but the context switch occurs on the other side of exiting the critical section.

Hi, Hartmut
The portable layer is Cortex A53 with GIC v4.
ulTaskNotifyTake calls taskENTER_CRITICAL, which masks interrupts. This means that all IRQs with a priority lower than configMAX_API_CALL_INTERRUPT_PRIORITY will not be delivered to the CPU from the GIC.
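To make the masking rule concrete, here is a minimal sketch. It assumes the usual GIC priority mask rule (an interrupt is signaled only if its priority is higher, i.e. numerically lower, than the mask value). The macro values are hypothetical and are not taken from our configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define UNMASK_VALUE      0xFFu        /* mask that lets every priority through */
#define EXAMPLE_API_MASK  (18u << 3)   /* hypothetical masked level, 5 priority bits */

/* An IRQ is signaled only if its priority is higher (numerically lower)
 * than the current priority mask. */
bool irq_is_signaled(uint8_t irq_priority, uint8_t priority_mask)
{
    return irq_priority < priority_mask;
}

void demo_priority_mask(void)
{
    const uint8_t low_prio_irq  = (uint8_t)(20u << 3);  /* lower priority than the mask */
    const uint8_t high_prio_irq = (uint8_t)(10u << 3);  /* higher priority than the mask */

    /* Inside the critical section only the higher priority IRQ gets through. */
    assert(!irq_is_signaled(low_prio_irq, EXAMPLE_API_MASK));
    assert(irq_is_signaled(high_prio_irq, EXAMPLE_API_MASK));

    /* With the mask opened up again, both are signaled. */
    assert(irq_is_signaled(low_prio_irq, UNMASK_VALUE));
    assert(irq_is_signaled(high_prio_irq, UNMASK_VALUE));
}
```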

Then, inside the critical section, portYIELD_WITHIN_API is called to switch to the highest priority ready task via vTaskSwitchContext. There is no place where the IRQ gets unmasked.

The timer interrupt would unmask interrupts, but the timer IRQ is also blocked in this case.

Is there any other place where interrupts are unmasked?

BTW, if we use the xQueueReceive/xQueueSendFromISR pair instead of ulTaskNotifyTake/vTaskNotifyGiveFromISR, the code works as expected.
Thanks.