Critical section in xTaskResumeAll() appears to disable IRQs for too long

On a Cortex M0+ running at almost 30 MHz, I noticed that some interrupts are not served in due time. By patching taskENTER_CRITICAL()/taskEXIT_CRITICAL() to toggle a GPIO line and attaching a scope, I found that apparently IRQs are occasionally disabled for as long as 25µs, and this seems to happen in xTaskResumeAll().
However, searching the archives, I also found this discussion, which suggests that my measurement may be misleading. Is this true? I do not understand why it should be wrong.
If the measurement is correct, can anything be done to reduce the time spent in this critical section or can it safely be broken up into several shorter sections?

It depends on how you are making the measurement: depending on the port, a context switch can occur either within the critical section or immediately after the critical section is exited. I think on the Cortex-M0 a yield occurs immediately after the critical section is exited, so you would have to set the GPIO as the first instruction after entering the critical section and clear it as the last instruction before exiting it - otherwise you may also be measuring the time another task runs.
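For reference, the instrumented macros might look something like the sketch below. It is host-runnable for illustration: GPIO_ODR and PROBE_PIN are made-up stand-ins for the real port's output register and pin, the commented-out cpsid/cpsie lines mark where the real PRIMASK handling goes, and the nesting counter ensures the probe only toggles at the outermost enter/exit, so the scope sees exactly the masked interval.

```c
#include <stdint.h>

/* Fake GPIO output register so this sketch runs on a host; on the real
 * part this would be the port's output (or set/clear) register. */
static volatile uint32_t GPIO_ODR;
#define PROBE_PIN    ( 1UL << 5 )   /* assumed probe pin */

/* Mimics the port's critical-nesting count (uxCriticalNesting). */
static uint32_t uxNesting;

static void vInstrumentedEnterCritical( void )
{
    /* __asm volatile ( "cpsid i" );  -- mask IRQs via PRIMASK on CM0+ */
    if( uxNesting++ == 0U )
    {
        GPIO_ODR |= PROBE_PIN;      /* probe high: IRQs are now masked */
    }
}

static void vInstrumentedExitCritical( void )
{
    if( --uxNesting == 0U )
    {
        GPIO_ODR &= ~PROBE_PIN;     /* probe low: about to unmask IRQs */
        /* __asm volatile ( "cpsie i" ); */
    }
}
```

Toggling at the outermost nesting level matters: if the probe dropped on an inner taskEXIT_CRITICAL(), the scope would under-report the true masked time.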

That’s exactly where I placed the GPIO level changes for the measurement. So assuming that the measurement is correct, how can I speed this critical section up (use fewer tasks?) or break it up?
Or, if that is impossible, at least prevent it from being called in the first place during certain critical periods of time?

If the time measurement is correct (again – it’s very easy to get this part wrong), then you might be suspending the scheduler for too long. If many tasks become “ready” while the scheduler is suspended, or if many OS ticks occur while the scheduler is suspended, then xTaskResumeAll() will have a lot of work to do inside the critical section.
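As a rough illustration (this is not FreeRTOS source - the per-item costs below are made up, loosely in microseconds): xTaskResumeAll() moves every task readied while the scheduler was suspended back onto the ready lists and replays every missed tick, all inside one critical section, so its length scales with the accumulated backlog.

```c
#include <stdint.h>

/* Illustrative cost model only. Unit costs are assumptions, roughly in
 * microseconds, chosen to show how the backlog adds up. */
static uint32_t ulCriticalSectionCost( uint32_t ulTasksReadiedWhileSuspended,
                                       uint32_t ulPendedTicks )
{
    const uint32_t ulCostPerReadyTask = 3U;  /* assumed: re-queue one task        */
    const uint32_t ulCostPerTick      = 14U; /* assumed: one xTaskIncrementTick() */
    const uint32_t ulFixedCost        = 2U;  /* assumed: bookkeeping overhead     */

    return ulFixedCost
         + ( ulTasksReadiedWhileSuspended * ulCostPerReadyTask )
         + ( ulPendedTicks * ulCostPerTick );
}
```

With these made-up numbers, one pended tick plus three readied tasks already costs 2 + 9 + 14 = 25 units - the point being that a single tick plus a few readied tasks can dominate the critical section.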

I am never directly suspending the scheduler, but I make heavy use of queues and semaphores, which in turn call vTaskSuspendAll().
So would lowering configTICK_RATE_HZ help (or would that actually make it worse)?
What about deliberately suspending the scheduler before and resuming it only right after a critical interrupt was served in due time (at that point, I have almost a millisecond before it can fire again, so plenty of time)?

In that case, the critical section we’re talking about already has minimal work. Changing configTICK_RATE_HZ wouldn’t help (or hurt). And deliberately suspending the scheduler or trying to time your calls to the queue/semaphore API functions may result in fragile code.

A couple of thoughts.

  • 25µs in the critical section in xTaskResumeAll() seems a little high for a 30MHz Cortex M. It’s worth double checking both your CPU clock rate and where the time is actually being spent. The worst case is when the function has to process a tick inside that critical section. But since you don’t suspend the scheduler in your own code, xTaskResumeAll() should only ever have to process one tick there. You might also compare the time spent in the xTaskResumeAll() critical section to the time spent in xTaskIncrementTick(), just to see whether 25µs is ever plausible. The critical section in xTaskResumeAll() shouldn’t take much longer than xTaskIncrementTick() in your case.

  • If you need very low ISR latency, you may need to set the interrupt priority higher than configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY (a lower value numerically). Then don’t make any FreeRTOS API calls from that ISR. Better yet utilize other MCU resources (DMA, FIFO, automation, etc.) to alleviate the need for such low latency.

EDIT: Just remembered you’re on a CM0. You don’t have the option to use a higher priority interrupt that can interrupt FreeRTOS critical sections.

I have a high-priority UART IRQ which triggers on every single byte received, but I only need to handle it (and send a reply via UART) very quickly after the last byte of a particular sequence has been received. So the idea is to suspend the scheduler (or maybe even disable the SysTick?) after the first byte is received and resume it after the last one, because the next sequence is known not to arrive for almost a millisecond (please do not ask why my protocol’s requirements are so strict and so asymmetric - it’s out of my hands).
So I think I may be safe here - however, there are other IRQs and tasks in the system, which may be more fragile.

It’s about 14µs max for xTaskIncrementTick(), and 25µs max for the entire critical section. The former shows a fairly regular pattern; the latter only peaks at 25µs very occasionally, which suggests that the peak is not caused by xTaskIncrementTick().

Yes, the M0(+) is a bad choice in this system, mostly because of the simple PRIMASK mechanism. But that was discovered a bit too late into this particular project.

Apparently, it is not possible to call vTaskSuspendAll() from my UART IRQ while the interrupted tasks are accessing queues and semaphores (they run into a configASSERT( !( xTaskGetSchedulerState() == taskSCHEDULER_SUSPENDED ) )).

So you’re able to send (or start) the reply directly from the RX ISR?

That’s probably reasonable after all. When you happen to have a long xTaskIncrementTick() plus a task made ready while the scheduler was suspended, you get the combination of cases in xTaskResumeAll(), resulting in a 25µs critical section.

Right, there is no ...FromISR() version of vTaskSuspendAll(). You may need to add a tiny task for this purpose, at a higher priority than any other task. Notify that task when the critical event is imminent (e.g., 1ms away). Then the task would suspend the scheduler, spin-wait on the event (no FreeRTOS calls here), and then resume the scheduler. As you noted, this could cause real problems for the other tasks and other real-time responsibilities.
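A minimal sketch of that gatekeeper task, using the standard task-notification API (prvGateTask and xCriticalWindowClosed are made-up names, and this assumes the RX ISR signals both the start and the end of the window):

```c
#include "FreeRTOS.h"
#include "task.h"

/* Set from the UART RX ISR once the time-critical reply has been
 * started; polled here because no FreeRTOS API calls are allowed
 * while the scheduler is suspended. */
static volatile BaseType_t xCriticalWindowClosed = pdFALSE;

/* Hypothetical gatekeeper, created at ( configMAX_PRIORITIES - 1 ). */
static void prvGateTask( void *pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        /* Block until the RX ISR calls vTaskNotifyGiveFromISR() on
         * this task, i.e. the first byte of a request has arrived. */
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

        xCriticalWindowClosed = pdFALSE;
        vTaskSuspendAll();

        /* Spin-wait until the ISR reports the reply is on its way.
         * A hardware-timer bail-out would be prudent here so a lost
         * byte can't leave the scheduler suspended forever. */
        while( xCriticalWindowClosed == pdFALSE )
        {
        }

        ( void ) xTaskResumeAll();
    }
}
```

The RX ISR would then notify this task on the first byte of a sequence and set xCriticalWindowClosed to pdTRUE once the reply has been launched.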

Out of curiosity, what is the actual timing requirement?

Yes, but once in a while the reply is received a little (4-5µs) too late on the other side. The communication still works, of course, but a formal protocol test which specifically measures this timing fails.

That’s what I am currently trying to implement properly. First quick 'n dirty tests look promising.

The serial protocol (IO-Link) demands the reply to be sent within a maximum of 43.4µs (measured from the end of the received request).
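Assuming the link runs at the IO-Link COM3 rate of 230.4 kbit/s (the thread doesn’t say which COM rate is in use), that oddly specific 43.4µs works out to exactly 10 bit times, i.e. the entire reply budget is shorter than one 11-bit UART frame:

```c
/* Assumption: IO-Link COM3, 230.4 kbit/s. Returns the 10-bit-time
 * response window in microseconds (~43.4us). */
static double xReplyWindowUs( void )
{
    const double bitTimeUs = 1.0e6 / 230400.0;   /* ~4.34 us per bit */
    return 10.0 * bitTimeUs;                     /* 10 Tbit */
}
```

That framing makes it clear why a 25µs blackout from a single critical section can push the reply past the deadline.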

Can you use FIFOs or DMA transfers for your UART rather than trying to handle each character?

For outgoing, I use DMA. But for incoming, I have to handle the bytes individually, because otherwise I would not know when to start the reply and what to send back.
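A host-runnable sketch of that per-byte RX path - REQUEST_LEN, vStartReplyDma() and the fixed-length framing are assumptions standing in for the real sequence handling; the point is that the last received byte launches the prepared reply before the ISR returns:

```c
#include <stdint.h>
#include <stdbool.h>

#define REQUEST_LEN    4U          /* assumed fixed request length */

static uint8_t  ucRequest[ REQUEST_LEN ];
static uint32_t ulRxCount;
static bool     bReplyStarted;     /* stands in for "TX DMA kicked off" */

/* Hypothetical hook: build the reply and start the TX DMA channel.
 * On real hardware this would program the DMA registers directly so
 * the first reply byte leaves as soon as possible. */
static void vStartReplyDma( const uint8_t *pucRequest, uint32_t ulLen )
{
    ( void ) pucRequest;
    ( void ) ulLen;
    bReplyStarted = true;
}

/* Called from the UART RX ISR for every received byte. */
static void vOnRxByte( uint8_t ucByte )
{
    ucRequest[ ulRxCount++ ] = ucByte;

    if( ulRxCount == REQUEST_LEN )
    {
        /* Last byte of the request: the reply window starts now, so
         * launch the reply before leaving the ISR. */
        vStartReplyDma( ucRequest, REQUEST_LEN );
        ulRxCount = 0U;
    }
}
```

Keeping the reply launch entirely inside the ISR sidesteps the scheduler for the latency-critical direction, which is exactly the property the critical sections were threatening.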