Critical sections & FreeRTOS API calls

conara · May 3, 2023, 12:29pm

Hi,

An equivalent question has been asked earlier, but I still have some questions about critical sections & FreeRTOS API calls. In the documentation the following sentence is included: “FreeRTOS API functions must not be called from within a critical section.” However, the reason why this is not allowed is not documented. Is it just good practice to avoid this, or can something really break? I understand the following things:

Critical sections must be kept very short, otherwise they will affect system response times.
You should not block on a FreeRTOS primitive while in a critical section.

Questions:

If I use a FreeRTOS API call in a critical section, can this really break or crash? E.g. I use xQueueSend() without a block time (xTicksToWait = 0) in a critical section. I understand: A potential context switch (to a higher priority task) cannot happen and is thus delayed until the critical section is exited. Are there other problems I do not foresee?
I read the following in a blog post: “The ‘rule of thumb’ description is given because it is not only dependent of the API function in use, but also dependent on the FreeRTOS port in use”. Is this true? Is there an overview of which ports are susceptible? Can somebody explain what the root cause is?

I hope somebody can give a bit more insight in critical sections & FreeRTOS API calls.

Best regards,
Boris

RAc · May 3, 2023, 1:25pm

As long as the critical section is claimed, the scheduler is suspended because system tick interrupts are inhibited. Thus, regardless of whether a timeout is specified or not, a task that holds the critical section and then in turn becomes suspended will leave the system dead.
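The effect of masked tick interrupts on a timeout can be illustrated with a toy model in plain C. This is not FreeRTOS code: `toy_tick_interrupt`, `toy_wait_for_timeout`, and the `ticks_masked` flag are hypothetical names, and the model assumes a port where the tick interrupt stays masked for the whole critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are FreeRTOS APIs. */
static unsigned tick_count = 0;     /* models the kernel tick counter */
static bool ticks_masked = false;   /* true while "inside a critical section" */

/* Models the hardware timer firing once. The tick is only counted
   when the interrupt is not masked. */
static void toy_tick_interrupt(void)
{
    if (!ticks_masked) {
        tick_count++;
    }
}

/* Waits for timeout_ticks kernel ticks while the hardware timer fires
   hw_periods times. Returns true if the timeout expired. */
bool toy_wait_for_timeout(unsigned timeout_ticks, unsigned hw_periods)
{
    /* A zero timeout is the non-blocking case and is not modelled here. */
    assert(timeout_ticks > 0);

    unsigned start = tick_count;
    for (unsigned i = 0; i < hw_periods; i++) {
        toy_tick_interrupt();
        if (tick_count - start >= timeout_ticks) {
            return true;
        }
    }
    return false;
}
```

With `ticks_masked` set to `false`, a 5-tick wait expires well within 100 timer periods. With `ticks_masked` set to `true`, the counter never advances, so the same wait never expires no matter how many timer periods pass.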

richard-damon · May 3, 2023, 1:48pm

To my understanding it is made a blanket restriction because it is too difficult to fully define the behavior for all functions for all ports.

For instance, in your xQueueSend case, if you send data to a Queue that is being waited on by a higher priority task, most ports won’t switch to the task until the Critical Section ends, but some ports (some of the ports that don’t use an interrupt to call the scheduler) might “pause” the critical section and switch to that task, and the Critical Section will “resume” when this task gets switched back in. This sort of variability makes it very hard to fully document exactly how the API will work in a portable manner.

It is possible, by studying the port and the function being called, to figure out under what cases it might be OK to do this sort of thing, but the documentation doesn’t describe these cases.

conara · May 4, 2023, 11:20am

Thank you for your explanation. Do you have an example of a port that doesn’t work correctly?

richard-damon · May 4, 2023, 11:58am

What do you mean by “that doesn’t work correctly”? Ports that work differently, but still within the specifications, are still “correct”.

I remember the Pic24/dsPIC, at least years ago, didn’t use an interrupt for the scheduler, but it was just called when needed. A task could do things in a critical section that caused the scheduler to be called and another task switched in. This action caused the critical section to be “paused” as FreeRTOS remembered that that task was in a critical section, so the port layer would disable the interrupts again when that task got scheduled later.
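The “paused” critical section described above can be made concrete with a toy model in plain C. This is a sketch, not the PIC24 port or any FreeRTOS code: `SyncTask`, `sync_enter_critical`, `sync_exit_critical`, and `sync_switch_to` are hypothetical names. The only assumption taken from the post is that the critical nesting count is remembered per task and the interrupt mask is restored from it when the task is switched back in.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are FreeRTOS APIs. */
typedef struct {
    unsigned critical_nesting;   /* saved per task, like a register */
} SyncTask;

static bool sync_interrupts_enabled = true;   /* models the CPU interrupt mask */
static SyncTask *sync_current = NULL;         /* the task that is running */

static void sync_enter_critical(void)
{
    assert(sync_current != NULL);
    sync_interrupts_enabled = false;
    sync_current->critical_nesting++;
}

static void sync_exit_critical(void)
{
    assert(sync_current != NULL && sync_current->critical_nesting > 0);
    sync_current->critical_nesting--;
    if (sync_current->critical_nesting == 0) {
        sync_interrupts_enabled = true;
    }
}

/* The switch happens immediately, even inside a critical section.
   The interrupt mask is re-derived from the incoming task's saved count. */
static void sync_switch_to(SyncTask *next)
{
    sync_current = next;
    sync_interrupts_enabled = (next->critical_nesting == 0);
}
```

In this model, if task A enters a critical section and the scheduler then switches to task B, interrupts are enabled again while B runs, even though A never called its exit function. When A is switched back in, interrupts are masked again and A continues as if nothing had happened.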

Nothing was “broken” about the behavior; it was fully within the specifications.

The key is that if you are told not to do something, then if you do it, you don’t have the promises you might have had otherwise.

Some “illegal” actions are tested for, and can cause an assert (if enabled) to occur. Others just aren’t well-documented about what will happen.

rtel · May 4, 2023, 3:22pm

Providing additional detail to the already correct answers above, very approximately, and maybe not completely accurately:

Making any blocking call (by which I mean one with a timeout) in a critical section will result in a logic error as time stands still in critical sections (hence you can’t do anything relative to time).
Using the example above of sending to a queue that unblocks a higher priority task, but without a timeout from within a critical section, the behaviour will either:

a) Cause an immediate switch to the unblocked task on ports that use trap-like functionality to perform a context switch. For example, Cortex-A devices. I loosely call these “synchronous” ports as the context switch happens immediately regardless of other state. In this case the critical section state is part of the task context. You switch away from a task that is inside a critical section to a task that is (potentially) not in a critical section, causing all interrupts to be enabled again. Then, at some point you switch back to the original task, which re-instates the critical section state before it starts executing instructions again. The kernel is designed to work this way without any problems, but application code using the kernel might not be.

b) Will not cause an immediate switch to the unblocked task, but pend the switch until the critical section is exited on ports that use an interrupt to perform a context switch. For example, an ARM Cortex-M. I loosely call these “asynchronous” ports as the context switch doesn’t occur immediately upon request, but later when the critical section exits, causing interrupts to be unmasked. Note this still happens within the API function, so the API function doesn’t return until the task runs again. These ports don’t have the critical section state as part of the task context as they can only context switch outside of a critical section - so only restart execution from outside of a critical section the next time they execute.

c) Most likely have port specific and unwelcome behaviour for all other ports, such as PIC24 mentioned above which uses neither method a or b described above.
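Case (b) can be sketched as a toy model in plain C. It is not Cortex-M port code and none of the names are FreeRTOS APIs: `async_enter_critical`, `async_exit_critical`, `async_send_unblocking_high_prio`, and the `async_switch_pending` flag are hypothetical. The model assumes a single global nesting count and a context switch that is requested through an interrupt which can only be taken once the nesting count is back to zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are FreeRTOS APIs. */
static unsigned async_nesting = 0;          /* one global count, not per task */
static bool async_switch_pending = false;   /* models a pended switch interrupt */
static int async_running_task = 0;          /* 0 = low priority, 1 = high priority */

/* The pended "interrupt" can only be taken while nothing is masked. */
static void async_take_pending_switch(void)
{
    if (async_switch_pending && async_nesting == 0) {
        async_switch_pending = false;
        async_running_task = 1;
    }
}

static void async_enter_critical(void)
{
    async_nesting++;
}

static void async_exit_critical(void)
{
    assert(async_nesting > 0);
    async_nesting--;
    async_take_pending_switch();   /* unmasking lets the pended switch run */
}

/* Models a zero-block-time send that unblocks the higher priority task.
   The call protects its own data with a nested critical section. */
static void async_send_unblocking_high_prio(void)
{
    async_enter_critical();
    async_switch_pending = true;   /* request the switch */
    async_exit_critical();         /* taken here only if this was the outermost level */
}
```

Calling the send function from inside an outer critical section leaves the low priority task running with the switch still pending. The switch to the high priority task only happens when the outer critical section is exited.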

richard-damon · May 5, 2023, 2:15am

Actually, the (c) for the PIC24 isn’t really that different than (a), it’s just using a regular call instruction rather than a “trap”.

conara · May 5, 2023, 8:13am

Thank you very much for your extensive explanation!

conara · May 10, 2023, 6:13am

It should be safe to use FreeRTOS *FromISR API calls in a critical section, because yielding is outsourced to the caller. Is this correct? See the example below.

Example:

A function not called from an ISR that wants to send something to a queue and set a certain pin HIGH atomically. In this example it would be logical to use xQueueSend, because the function is not called from an ISR. However, the xQueueSend API function may not be called from a critical section due to the possible yield.

void example_non_isr_function(void) {
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    int myMessage = 42; // Define the message to send

    // Enter a critical section.
    taskENTER_CRITICAL();

    // Send the message to the queue. Use FromISR variant to make sure the function doesn't yield in a critical section.
    xQueueSendFromISR(myQueue, &myMessage, &xHigherPriorityTaskWoken);

    // Enable the I/O pin.
    set_io_pin_level(10, true);

    // Exit the critical section.
    taskEXIT_CRITICAL();

    // If sending the message unblocked a task with a higher priority, request a context switch.
    // Make sure the critical section is really exited.
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

What do you think? Does it work correctly? Or am I missing something? I agree that it can cause some confusion, but I am just trying to understand the whole thing.

hs2 · May 10, 2023, 6:54am

But why not just enclose only the really critical part (setting the pin in your case) in the critical section and use the API the right way, as documented?

RAc · May 10, 2023, 7:10am

No, this is not correct. It may or may not work, depending on the inner workings of the port.

I chime in with Hartmut. Why not use the APIs as documented and intended?

richard-damon · May 10, 2023, 11:39am

If you look at “Intertask communication performance - advanced topics”, it will seem that the FromISR functions are designed to be able to be used in this manner (see Example 3).

Note as a comment, this tends to lead to an overlong critical section. The alternative is to suspend the scheduler for the period (which still means you can’t block, so use 0 block time normal API calls).

RAc · May 11, 2023, 7:35am

Thanks for the link, Richard, I had not seen that before. I am actually very surprised about that chapter.

Is that part of the official documentation? If so, it does need revision; for example, the final sentence “The principles demonstrated here for accessing queues also apply when accessing semaphores and mutexes” does not make any sense whatsoever (by definition, there is not and cannot be a FromISR() variant of mutex access calls, so the previous discussion cannot apply to mutexes).

It is my impression that this chapter was added as a result of reverse engineering rather than as documentation of intended function usage. If the function set had indeed been designed or intended as a “lightweight” variation of the queue API, I am sure Richard B. would have named the calls differently than …FromISR().

I would strongly discourage the use of …FromISR() functions as task usable APIs and revise the documentation accordingly; at the very least, point out that task notifications are by far the preferred lightweight inter-task communication mechanism when performance is an issue. It needs to be very clear that ISRs and tasks behave so fundamentally differently that anything (by name) designed to be used within ISRs should under no circumstances be used in task contexts.

richard-damon · May 11, 2023, 11:59am

That wording has been there for a VERY long time. It points to the FromISR versions as a lighter-weight version of the API (as well as being the version usable inside an ISR), and that is a true statement.

There is nothing “port-specific” about this behavior. The routines will use the ISR critical section routines from the port layer, but that behavior is strictly defined.

The section points out the major drawback to using the routines in this manner, as it means you are disabling the interrupts for a somewhat lengthy period of time (longer than FreeRTOS’s own design guidelines allow) but that can be a conscious choice of the programmer.

Note, effectively, a critical section can be, in some sense, seen as entering a “private” ISR-like region of code, and such code sections should follow most of the rules of an ISR.

marcosatti · July 4, 2023, 3:55am

Came across this thread via searching…

Had this exact question while working on a Modbus RTU library involving a state machine - an interrupt is used for both the timer and UART peripherals, in which a critical section is used to protect the state machine while it’s being updated, and to queue the packet for processing on the timer task. This application is not time sensitive, so I’m not concerned about how long a critical section is used for (within reason).

In my case, I am using a Cortex-M, which follows the asynchronous context switch model, and using an API function under a critical section is fine. This was a super quick way of implementing this, and I didn’t need to spend time thinking too hard about it.

But…

Out of curiosity, how would you do this easily under a synchronous context switch model?

Off the top of my head, moving the queuing outside of the critical section works, but introduces way more complexity (separation of updating the state machine, tracking that a packet is pending exiting the critical section, some kind of buffer mechanism to copy into outside of shared state, etc).

This is where using API functions inside critical sections seems like an appropriate use case.

aggarg · July 4, 2023, 7:11am

Even in the synchronous model, the same code should work as long as you are not trying to block (i.e. block time parameter to the API is zero).

marcosatti · July 4, 2023, 7:27am

Quoting from the earlier post by Richard:

If this happens and the critical section is ended implicitly through the context switch, it will introduce a potential data race bug.

aggarg · July 4, 2023, 9:06am

Right. It would not be a problem if posting to the queue was last thing in your CS but otherwise, you are right.
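Why the position of the send inside the critical section matters on a synchronous port can be shown with a toy model in plain C. Nothing here is FreeRTOS code: `race_send`, `race_high_prio_task_runs`, and the two update functions are hypothetical, and the model simply assumes that the higher priority task runs immediately inside the send call, as in case (a) earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names are FreeRTOS APIs. */
typedef struct {
    int seq_a;
    int seq_b;   /* invariant outside the critical section: seq_a == seq_b */
} RaceShared;

static RaceShared race_shared = {0, 0};
static bool race_saw_torn_state = false;

/* Stand-in for the higher priority task, which inspects the shared state
   the moment it gets to run. */
static void race_high_prio_task_runs(void)
{
    if (race_shared.seq_a != race_shared.seq_b) {
        race_saw_torn_state = true;
    }
}

/* Stand-in for a zero-block-time send on a synchronous port:
   the switch to the unblocked task happens inside the call. */
static void race_send(void)
{
    race_high_prio_task_runs();
}

/* Send in the middle of the "critical section": the update is half done. */
bool race_update_send_in_middle(void)
{
    race_saw_torn_state = false;
    race_shared.seq_a++;
    race_send();
    race_shared.seq_b++;
    assert(race_shared.seq_a == race_shared.seq_b);   /* restored on exit */
    return race_saw_torn_state;
}

/* Send as the last step: the update is complete before any switch. */
bool race_update_send_last(void)
{
    race_saw_torn_state = false;
    race_shared.seq_a++;
    race_shared.seq_b++;
    race_send();
    assert(race_shared.seq_a == race_shared.seq_b);
    return race_saw_torn_state;
}
```

In this model `race_update_send_in_middle()` reports that the other task observed a half-updated state, while `race_update_send_last()` does not. On an asynchronous port the switch would be deferred until the critical section ends, so both orderings would look the same there.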

marcosatti · July 5, 2023, 1:50am

Gets even more risky when optimisations are turned on and all that…

Thinking a bit more about this last night, an easy way might be to use the FromISR functions everywhere inside critical sections (both task and ISR code) and delay the context switch until after the critical section.

For ISRs it’s also really easy to context switch right at the end, either by passing the variable around everywhere or by just always forcing a switch.

Are there any plans to officially support this way?

(edit: oops… this is pretty much exactly what example 3 is describing in this page)

aggarg · July 5, 2023, 5:08am

As you rightly said, for ISRs you can control when the context switch happens and you should do that at the end. FromISR functions are not supposed to be called from tasks and therefore, the question does not apply.