AM243x (Cortex-R5): critical section does not block interrupts

stx-sas · February 11, 2024, 9:05pm

Hi,
We have a problem with weird FreeRTOS behavior on AM243x (Cortex-R5).
The TI port uses “cpsid i” to block interrupts in a critical section.
We eventually found that disabling interrupts from a task, either with “cpsid i” or by setting the I bit in CPSR, does not block interrupts.

The CPU is in System mode in task context.

Has anyone faced something similar?

thanks
rasty

RAc · February 11, 2024, 9:20pm

If the interrupt is asserted by hardware right before you issue the cpsid call, there may be race conditions, I presume. Have you looked at barriers?

How exactly do you know interrupts are serviced while disabled? Do you have a deep trace?

stx-sas · February 12, 2024, 5:39am

hi
I just disabled interrupts (called enter_critical) in a task and put a breakpoint in the interrupt handler. This should disable the interrupt and leave the nesting count unbalanced, which means a deadlock.
But the debugger keeps stopping in the interrupt, showing that the I bit in CPSR is 1, which means the interrupt is masked. I do not expect to stop in the interrupt while the I bit in CPSR is 1.
What am I missing?
thanks

aggarg · February 12, 2024, 7:00am

Which interrupt is this? Can you share your definition of enter critical?

RAc · February 12, 2024, 7:45am

I would not trust debugger breakpoints. Instead, I would do something like

volatile int glbDebugBreak = 0; // global scope

if (glbDebugBreak) __asm("bkpt");

and set glbDebugBreak to 1 when I would want the bp to be hit, or alternatively, increment a global variable in the isr and monitor it in a real time watch window.
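The counter variant could look roughly like the following. This is a minimal sketch, not code from the TI port; the names `g_isrEntryCount`, `isr_entry_mark`, `isr_entry_snapshot` and `isr_entries_since` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical global; watch it in the debugger's live view. */
volatile uint32_t g_isrEntryCount = 0u;

/* Call this as the first statement of the ISR under test. */
void isr_entry_mark(void)
{
    g_isrEntryCount++;
}

/* Take a snapshot right after interrupts have been masked. */
uint32_t isr_entry_snapshot(void)
{
    return g_isrEntryCount;
}

/* Number of ISR entries since the snapshot (unsigned wrap-around safe).
 * A nonzero result while interrupts are masked means the ISR really ran. */
uint32_t isr_entries_since(uint32_t snapshot)
{
    return g_isrEntryCount - snapshot;
}
```

Comparing the counter against a snapshot avoids relying on where the debugger thinks it stopped.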

It is very very unlikely that something as fundamental to mcu operation as interrupt masking does not work.

stx-sas · February 12, 2024, 8:12am

I cannot explain this. How did the program end up in the ISR while I and F are 1 (“masked”) and the mode is System? Does the debugger lie? I think something must be wrong in the port to this architecture; maybe the mode of operation is not set correctly and writes to CPSR are silently ignored?
void vTaskEnterCritical( void )
{
    portDISABLE_INTERRUPTS();

    if( xSchedulerRunning != pdFALSE )
    {
        ( pxCurrentTCB->uxCriticalNesting )++;

        /* This is not the interrupt safe version of the enter critical
         * function so  assert() if it is being called from an interrupt
         * context.  Only API functions that end in "FromISR" can be used in an
         * interrupt.  Only assert if the critical nesting count is 1 to
         * protect against recursive calls if the assert function also uses a
         * critical section. */
        if( pxCurrentTCB->uxCriticalNesting == 1 )
        {
            portASSERT_IF_IN_ISR();
        }
    }
    else
    {
        mtCOVERAGE_TEST_MARKER();
    }
}


#define portDISABLE_INTERRUPTS()                  __asm__ volatile ( "CPSID	i" ::: "memory" )

RAc · February 12, 2024, 8:33am

The debugger does not “lie,” but optimizations may make it impossible for the debugger to correctly match source and assembler code. The comment in the screen shot suggests that the location you break at does nothing but return to the caller.

stx-sas · February 12, 2024, 10:33am

You can see the disassembly on the right in green. The comment is wrong or out of context. It is just an adjustment of LR.

RAc · February 12, 2024, 11:09am

Can you confirm that the address 0x70080b3c is stored in the ivt at the location belonging to the corresponding interrupt?

stx-sas · February 12, 2024, 12:02pm

This code is located in DDR.
The trampoline (below) is stored in the IVT.

RAc · February 12, 2024, 12:22pm

Which POD are you using exactly? Judging from the data sheets, this looks like a multi-core MCU. Could it be that you disable interrupts on one core but the interrupt gets asserted on another core?

stx-sas · February 12, 2024, 1:10pm

It is not an AMP system. The second core is parked, with no software on it. The debugger is attached to core 0.

RAc · February 12, 2024, 2:00pm

Ok, can you try to set up your system so that the problem shows without FreeRTOS? For example, in your main(), prior to starting the scheduler, set up your interrupt source, then add a cpsid instruction manually and see if the interrupt still fires. I agree with you that the isr should not be reached with the I flag set, but this should really be independent of FreeRTOS.
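Such a bare-metal check could be sketched as below. This assumes a GCC-style toolchain, a privileged mode (CPSID is ignored in User mode), and an ISR that increments the hypothetical counter `g_isr_hits`; the function names are made up for illustration. The `#else` branch is only a host stand-in so the logic can be compiled off-target.

```c
#include <stdint.h>

#define CPSR_I_BIT (1u << 7)   /* IRQ mask bit in CPSR */

/* Hypothetical counter; the ISR under test must increment it. */
volatile uint32_t g_isr_hits = 0u;

#if defined(__arm__) && (__ARM_ARCH_PROFILE == 'R')
static inline void irq_mask(void)
{
    __asm__ volatile ("CPSID i" ::: "memory");
}
static inline uint32_t cpsr_read(void)
{
    uint32_t v;
    __asm__ volatile ("MRS %0, CPSR" : "=r" (v));
    return v;
}
#else
/* Host stand-in (assumption for off-target builds): fake CPSR in System mode. */
static uint32_t fake_cpsr = 0x1Fu;
static inline void irq_mask(void)      { fake_cpsr |= CPSR_I_BIT; }
static inline uint32_t cpsr_read(void) { return fake_cpsr; }
#endif

/* Returns 0 if masking held, 1 if the I bit did not get set
 * (the CPSR write was ignored), 2 if the ISR ran while masked.
 * Interrupts are left masked on return. */
int check_irq_masking(uint32_t spin_iterations)
{
    irq_mask();
    if ((cpsr_read() & CPSR_I_BIT) == 0u) {
        return 1;
    }
    uint32_t before = g_isr_hits;
    for (volatile uint32_t i = 0u; i < spin_iterations; i++) {
        /* busy-wait long enough for several timer periods */
    }
    return (g_isr_hits != before) ? 2 : 0;
}
```

Calling `check_irq_masking()` from main() after the interrupt source is running, with a spin count that covers several interrupt periods, separates "the CPSR write was ignored" from "the ISR ran despite I = 1".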

Thanks!

stx-sas · February 12, 2024, 2:19pm

The problem may not be directly related to FreeRTOS, but rather to the CPU mode of operation (System, SVC, other) that FreeRTOS uses.
The port to R5 is pretty straightforward, even simpler than M3. But maybe some specific instructions required to gain access to CPSR are missing.
That is why I am searching for similar complaints that may shed some light.
Interestingly, we already made a POC 2 years ago with an IC from the same line, but with A cores in addition to R. We did not see anything like this.
I hope this is not an error in silicon.

RAc · February 12, 2024, 2:25pm

Just a shot in the dark: Is it possible that some other control path dispatches to the isr outside of an interrupt context? I do not know the R series too well, but on an M, I would expect something like 0xfffffffd in the lr if we are truly in an interrupt context. And wouldn’t the M field hold 0x12 if we are in IRQ mode (judging from your screen shot)?
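To read the mode field and mask bits out of a CPSR or SPSR value copied from the debugger, a small decoder helps. A minimal sketch using the ARMv7-R layout (M[4:0] in bits 4:0, F in bit 6, I in bit 7); the function names are hypothetical.

```c
#include <stdint.h>

#define CPSR_MODE_MASK 0x1Fu
#define CPSR_F_BIT     (1u << 6)   /* FIQ mask */
#define CPSR_I_BIT     (1u << 7)   /* IRQ mask */

/* Hypothetical helper: map the M[4:0] field to a mode name. */
const char *cpsr_mode_name(uint32_t cpsr)
{
    switch (cpsr & CPSR_MODE_MASK) {
    case 0x10u: return "User";
    case 0x11u: return "FIQ";
    case 0x12u: return "IRQ";
    case 0x13u: return "Supervisor";
    case 0x17u: return "Abort";
    case 0x1Bu: return "Undefined";
    case 0x1Fu: return "System";
    default:    return "unknown";
    }
}

/* Nonzero if IRQs are masked in the given status value. */
int cpsr_irq_masked(uint32_t cpsr)
{
    return (cpsr & CPSR_I_BIT) != 0u;
}

/* Nonzero if FIQs are masked in the given status value. */
int cpsr_fiq_masked(uint32_t cpsr)
{
    return (cpsr & CPSR_F_BIT) != 0u;
}
```

A value whose low five bits are 0b11111 decodes as System mode, whereas a core that is still in IRQ mode shows 0x12 there, unless the handler has switched modes itself.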

stx-sas · February 12, 2024, 3:50pm

It’s a good point. b1111 in the LSBs of CPSR in IRQ looks suspicious.
I need to explain some background.
We have a timer interrupt that fires every 125 µs.
As the first step, we confirmed that interrupt latency and jitter are good, first with the CPU timestamp counter and then with a digital output. This confirms that there are no spurious or missing interrupts.
Then we intended to confirm that task latency (give/take of a binary semaphore) is acceptable (a few µs).
Here we started to scratch our heads, because the picture we see is beyond our understanding. The number of events counted by the ISR and the task is pretty consistent, but the context switch time from ISR to task is pretty random.
We traced code execution from the return from interrupt back to the task switch and found that we had forgotten to remove “trace”, which has a lot of overhead.
We removed “trace” and confirmed that the return from the ISR resumes the right task, as expected.
But it did not help with understanding of latencies.
We started to search for anomalies such as interrupt priorities (max app priority vs. kernel priority), as in the M3 ports, but the R5 port is much simpler: it uses the global interrupt mask, which does not behave as expected, as I described in the first message.
I am starting to think that maybe CPSR.I is ignored in this implementation and replaced by the interrupt controller. Frankly, I would expect the TI guys to read the ARM/SoC documentation, not me.
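For reference, the ISR-to-task latency bookkeeping described above can be sketched like this. It assumes a free-running 32-bit timestamp counter sampled once in the ISR (just before the semaphore give) and once in the task (right after the take); the type and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical accumulator for ISR-to-task latency, in counter ticks. */
typedef struct {
    uint32_t min_ticks;
    uint32_t max_ticks;
    uint32_t samples;
} latency_stats_t;

void latency_init(latency_stats_t *s)
{
    s->min_ticks = UINT32_MAX;
    s->max_ticks = 0u;
    s->samples   = 0u;
}

/* isr_stamp: counter value sampled in the ISR before the give.
 * task_stamp: counter value sampled in the task after the take. */
void latency_record(latency_stats_t *s, uint32_t isr_stamp, uint32_t task_stamp)
{
    uint32_t delta = task_stamp - isr_stamp;   /* correct across counter wrap */

    if (delta < s->min_ticks) { s->min_ticks = delta; }
    if (delta > s->max_ticks) { s->max_ticks = delta; }
    s->samples++;
}

/* Spread between the worst and best observed latency; 0 with no samples. */
uint32_t latency_jitter(const latency_stats_t *s)
{
    return (s->samples != 0u) ? (s->max_ticks - s->min_ticks) : 0u;
}
```

Keeping only min, max and a sample count adds very little overhead to the measured path, unlike the “trace” mentioned above.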

RAc · February 12, 2024, 3:57pm

I do not think that an ARM core licensee has a chance to modify that behavior as that should be in the microcode provided by ARM, but I could be wrong. In any case, contacting TI sounds like a good idea.

One more thing that comes to mind as a possibility is some kind of stack popping back to the wrong context. Once you are at the breakpoint, can you skip to the end of the isr and do a single step to see where the code returns to?

stx-sas · February 12, 2024, 4:27pm

“… Once you are at the breakpoint, can you skip to the end of the isr and do a single step to see where the code returns to?”
As expected: it restores context and returns from the semaphore take, but this is one single run out of many.
Unfortunately, I could not find other R5 ports for reference.

RAc · February 12, 2024, 4:41pm

Why would you expect this? Interrupts are asynchronous, so unless the code that waits on the semaphore immediately prior to that enforces the interrupt condition, the isr could return to anywhere. In particular, it cannot return to “return from semaphore take” because FreeRTOS must execute some code to manage that (e.g. enforce a context switch, normally in the context of a service interrupt). So it can at best return to the beginning of the semaphore wait call sequence.

Can you share the relevant code snippets of the task that interacts with the isr and the isr itself?

skptak · February 12, 2024, 10:21pm

The R5 CPU has a separate SPSR for each exception mode, per ARM’s documentation. The SPSR that you’ve shown is for SVC mode. Have you ensured that the CPSR in User/System mode has the IRQ/FIQ disable bits set when you’re seeing this issue?