Consecutive PendSV IRQ Handler Delays Other Interrupt Handler

Good day

The problem I am having is described in detail in my NXP Community Forum post.
I am posting an adapted version of it here with the goal of reaching another audience - one that is very clued up on FreeRTOS!

Now that NXP has fixed the MCUXpresso IDE’s RT1170 SWO tracing problem (as described in my previous post), I can use it to debug my problem.

I am running FreeRTOS on NXP’s RT1170 processor. When I take a trace, I see that some of the interrupt handlers are active for a very long time, in the range of 50 to 60 µs. There is no obvious reason why any of these interrupt handlers should take this long to execute (no while loops, etc.).

I have verified the delay of these interrupt handlers by setting a GPIO pin high when a certain interrupt handler starts, and setting the pin low again when the interrupt handler exits.

To ease debugging, we simplified our system by disabling many of the interrupts.
Here are two traces that show the problem clearly:


It is clear that GPT2’s IRQ handler is active for abnormally long. We have confirmed this with GPIO pin toggling and an oscilloscope. These ‘delayed’ interrupt handlers seem to precede a PendSV IRQ handler.

We believe the problem could be linked to when the PendSV handler is called multiple times, in close succession. In the first trace, one can see that the ‘delay’ of the GPT2 IRQ handler worsens each time the PendSV IRQ handler is called.

We would greatly appreciate any help/advice.

Kind regards

D_TTSA

I have not read the other post (answering on my cell phone), so sorry if I am repeating questions from there -

Are interrupts nesting, so making the preempted interrupt appear to run too long?

Is the PendSV interrupt the lowest priority, as it should be? Ideally the tick interrupt will be too.

Please post the outline of the interrupt that is taking too long.

Hi Richard

Thanks for your quick response.
I am not sure if the interrupts are nesting, but from the traces above, they don’t seem to be.

Both the FreeRTOS SysTick and PendSV interrupts have the lowest priority (unchanged from FreeRTOS source) - which is 0x0F in my case.

Here is the GPT2 IRQ handler’s code:

void GPT2_IRQHandler(void)
{
	GPIO_PinWrite(GPIO3, 13, 1);

	/* Clear interrupt flag.*/
	GPT_ClearStatusFlags(GPT2, kGPT_OutputCompare1Flag);
	McuRTOS_RunTimeCounter++; /* increment runtime counter */
	CyclusTimer100us++;

	GPIO_PinWrite(GPIO3, 13, 0);
	__DSB();
}

In the above traces, the GPT2 interrupt’s priority is 0x01 (first trace) and 0x0F (second trace).
Changing the interrupt priority therefore doesn’t have an effect on the problem.
I have also tried GPT2 interrupt priorities between configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY ( = 2) and 0x0F, but this has no effect on the problem.
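For readers following the priority values quoted above, here is a minimal sketch of how they relate on a Cortex-M core. It assumes 4 implemented priority bits (the usual `__NVIC_PRIO_BITS` value on this family, but check the device header) and no sub-priority grouping. The function names are hypothetical helpers for illustration, not part of FreeRTOS or CMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the part implements 4 priority bits (__NVIC_PRIO_BITS == 4). */
#define PRIO_BITS        4u
/* configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY as quoted in the post. */
#define MAX_SYSCALL_PRIO 2u

/* Logical priority (0..15) to the value held in the 8-bit NVIC priority
 * field: the implemented bits sit at the top of the byte. */
uint8_t nvic_raw_priority(uint32_t logical_prio)
{
    assert(logical_prio < (1u << PRIO_BITS));
    return (uint8_t)((logical_prio << (8u - PRIO_BITS)) & 0xFFu);
}

/* An ISR may use the FreeRTOS "FromISR" API only if its logical priority is
 * numerically at or above the max-syscall priority (lower urgency). */
int may_use_from_isr_api(uint32_t logical_prio)
{
    return logical_prio >= MAX_SYSCALL_PRIO;
}

/* Lower number means higher urgency. Equal priorities never preempt each
 * other, so the later interrupt waits (tail-chains) instead of nesting. */
int can_preempt(uint32_t incoming_prio, uint32_t running_prio)
{
    return incoming_prio < running_prio;
}
```

Under these assumptions, GPT2 at 0x01 can preempt PendSV (0x0F) but must not call the FreeRTOS API, while GPT2 at 0x0F can never nest with PendSV or SysTick at all.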

Please let me know if there is further information that you require.

Kind regards
D_TTSA

Good day @rtel

We have done further testing, and believe that we have narrowed down the problem.

To test if the problem was caused by interrupt nesting, we set all interrupt priorities to 0x0F (same priority as SysTick and PendSV). However, this worsened the interrupt delays:

It is worth noting that the problem is not specific to the GPT2 ISR, as I may have implied in my first post (above delay is in SysTick handler, for example).

We noticed that the SysTick and PendSV ISRs don’t call __DSB() in their last line (the other ISRs do).
To test whether this was related, we added __DSB() as the last line of the PendSV ISR. This caused the PendSV ISR to be delayed every time it was called:

Consequently, our hypothesis is that the __DSB() call is what is slowing down our interrupt handlers, intermittently. This could be caused by ARM’s non-sequential memory accesses.

Do you agree with our hypothesis?
Do you have advice for us?

Kind regards
D_TTSA

Hi there,

In your first graph, the third invocation of the PendSV interrupt takes about twice as long as the first two invocations. For a “regular” implementation of the ISR, this can only mean that some other ISR which is not subject to monitoring interrupts the PendSV ISR and thus delays it. Or there are “hidden” delays, such as can be found in HAL_xxx routines.

Have you tried running Tracealyzer with your setup? It should cover all ISR invocations.

This is a curious one.

Some thoughts, but as yet no conclusion:

  • It is not clear what is pending the PendSV interrupt. There is nothing in your GPT2 ISR code that pends a context switch, and the tick interrupt doesn’t appear to execute, so that is not pending a context switch either (pending a context switch being pending the PendSV interrupt). Can you determine why the PendSV executes after the GPT2 interrupt? That may highlight something running that is not in the picture.
  • The RT1170 is a fast MCU - is there a big discrepancy between the core’s clock speed and the GPIO clock speed? What is the bus between the core and the GPIO?
  • What does the trace look like if you toggle the GPIO in a loop from main() (without the kernel or any interrupts running)?
  • The requirement for barriers is nuanced. Exiting an exception itself acts as a barrier, but the Cortex-M7 has a longer write buffer than the M3 or M4 and it may be possible that writes to peripherals, such as clearing the interrupt, don’t take effect until the interrupt exits. That will depend on the chip’s internal wiring though.
  • Clutching at straws - are there any settings that may cause the chip to try to enter low power states, or are the core and peripherals running full speed all the time?

Hi @RAc and @rtel

Thank you for your replies.
I looked into your suggestions, but we stumbled upon the solution elsewhere.

Problem Explanation
By default, on the RT1176, the program’s code is stored in external QSPI FLASH memory. This FLASH has a clock speed of 133MHz. The connection between the FLASH and the processor is only 4 bits wide, so it takes 4 sequential accesses to the FLASH to retrieve one 16-bit instruction. The rate at which instructions can be retrieved from the FLASH is thus 33.25MHz, which is much slower than the (industrial) processor’s 792MHz clock. The processor therefore relies heavily on ARM’s performance-enhancing ‘speculative accesses’ (explanatory link in previous reply) to reduce the effect of this bottleneck. This feature copies the instructions that the processor predicts it will need into its level 1 cache, so that they are immediately available if the processor requires them.
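The arithmetic behind those figures can be checked with a small sketch. The constants are the values quoted in this post, and the function names are hypothetical. It ignores QSPI command/address overhead, burst reads and cache hits, so it only gives a rough feel for the raw mismatch between flash and core.

```c
#include <assert.h>
#include <stdint.h>

/* Figures quoted in the post; treat them as assumptions about the board. */
#define FLASH_CLOCK_HZ   133000000u /* QSPI flash clock                  */
#define FLASH_BUS_BITS   4u         /* width of the flash data link      */
#define INSTRUCTION_BITS 16u        /* one Thumb halfword                */
#define CORE_CLOCK_HZ    792000000u /* core clock of the industrial part */

/* Flash transfers needed to move one instruction (rounded up). */
uint32_t transfers_per_instruction(uint32_t instr_bits, uint32_t bus_bits)
{
    assert(bus_bits != 0u);
    return (instr_bits + bus_bits - 1u) / bus_bits;
}

/* Raw instruction fetch rate from flash, in Hz. */
uint32_t fetch_rate_hz(void)
{
    return FLASH_CLOCK_HZ /
           transfers_per_instruction(INSTRUCTION_BITS, FLASH_BUS_BITS);
}

/* Whole core clock cycles that elapse per instruction fetched from flash. */
uint32_t core_cycles_per_fetch(void)
{
    return CORE_CLOCK_HZ / fetch_rate_hz();
}
```

With these numbers the core runs roughly 23 cycles for every instruction that arrives uncached from flash, which is why a cache miss on ISR entry or exit is so visible in the trace.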

However, these ‘speculative accesses’ are inaccurate when interrupts occur, since the processor cannot predict when interrupts will occur. Therefore, whenever an ISR exits, “__DSB()” and/or “__ISB()” is called, and the processor’s instruction and data pipelines are flushed.

Solution
Code that is stored in the RT1170’s SRAM_ITC is never copied into cache. Since this memory is ‘tightly coupled’ to the processor (and can thus be accessed synchronously, with no delays), there is no benefit in caching its contents. Consequently, the ARM processor does not apply its ‘performance-enhancing’ speculative access techniques, or any other ‘optimisations’, to this code. Therefore, there are fewer (or no) instructions in the processor’s pipeline when returning from an ISR, so this delay is minimised.
Therefore, this problem is solved by storing as much of the ISR code as possible (preferably all) in the SRAM_ITC memory. This is in line with the recommendation found in the conclusion of NXP’s application note on the ARM Cortex-M7’s L1 cache.
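As a rough illustration of what "storing ISR code in SRAM_ITC" can look like with GCC, the sketch below tags a function with a section attribute. The section name `.itcm_text` and the macro `ISR_IN_ITCM` are hypothetical: the real name must match whatever the project's linker script maps to SRAM_ITC, and the startup code must copy that section from FLASH to RAM. The function is a stand-in for the counter updates in the GPT2 handler shown earlier, not the handler itself.

```c
#include <stdint.h>

/* Hypothetical section name: it must match an output section that the
 * linker script places in SRAM_ITC and that startup code copies from flash.
 * On a non-ARM host build the macro expands to nothing, so the logic can
 * still be compiled and exercised off-target. */
#if defined(__arm__)
#define ISR_IN_ITCM __attribute__((section(".itcm_text"), noinline))
#else
#define ISR_IN_ITCM
#endif

/* Counters named as in the GPT2 handler posted above. */
volatile uint32_t McuRTOS_RunTimeCounter;
volatile uint32_t CyclusTimer100us;

/* Stand-in for the body of GPT2_IRQHandler. On target, the handler itself
 * would carry ISR_IN_ITCM, and every function it calls must also live in
 * tightly coupled memory, otherwise those calls still fetch from flash. */
ISR_IN_ITCM void gpt2_tick_body(void)
{
    McuRTOS_RunTimeCounter++; /* increment runtime counter */
    CyclusTimer100us++;
}
```

Whether the function really landed in SRAM_ITC is best confirmed from the linker map file, by checking that its address falls inside the ITC region.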


Great info - thanks for taking the time to report back.