Wrong ELR value in portSAVE_CONTEXT of SMP

Hi!

I’m using FreeRTOS kernel V11.0.0.
I’m testing SMP on a dual-core ARMv8-A Cortex-A53 (AArch64).

FreeRTOS-Kernel-Partner-Supported-Ports/blob/dc3afc6e837426b4bda81bbb6cf45bfb6f34c7e9/TI/CORTEX_A53_64-BIT_TI_AM64_SMP/portASM.S

I am testing SMP on the two cores, using the code above as a reference.
I have created several tasks on Core0 only;
Core1 runs only an idle task.
Therefore, task context switches rarely occur between cores.

At a certain point, the ELR_EL3 value that the portSAVE_CONTEXT macro stores on the stack suddenly becomes 0x0.
Because the restored ELR_EL3 value is 0x0, the portRESTORE_CONTEXT macro ends up triggering the undef handler.
At that point, the stored SPSR_EL3 value is “0x0010000d”.

If the stack pointer is selected with MSR SPSEL, #0, M[3:0] should be b1100 (EL3t).
Why is it b1101 (EL3h)?
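
For reference, the stored value can be decoded field by field. The following is a minimal host-side sketch in C, assuming the AArch64 SPSR_ELx layout (M[3:0] in bits 3:0, M[4] in bit 4, F in bit 6, I in bit 7, IL in bit 20). The type spsr_fields_t and the function decode_spsr are hypothetical names used only for illustration; they are not part of FreeRTOS or the port.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type: the SPSR_ELx fields relevant to this thread. */
typedef struct {
    unsigned el;          /* M[3:2]: exception level the exception was taken from */
    bool     uses_sp_elx; /* M[0]: false = SP_EL0 ("t"), true = SP_ELx ("h") */
    bool     aarch32;     /* M[4]: false = AArch64, true = AArch32 */
    bool     fiq_masked;  /* F, bit 6 */
    bool     irq_masked;  /* I, bit 7 */
    bool     il;          /* IL, bit 20: illegal execution state */
} spsr_fields_t;

/* Split a raw SPSR_ELx value into the fields above (assumed bit positions). */
static spsr_fields_t decode_spsr(uint32_t spsr)
{
    spsr_fields_t f;
    f.el          = (spsr >> 2) & 0x3u;
    f.uses_sp_elx = (spsr & (1u << 0)) != 0;
    f.aarch32     = (spsr & (1u << 4)) != 0;
    f.fiq_masked  = (spsr & (1u << 6)) != 0;
    f.irq_masked  = (spsr & (1u << 7)) != 0;
    f.il          = (spsr & (1u << 20)) != 0;
    return f;
}
```

Calling decode_spsr(0x0010000dU) gives el = 3 with uses_sp_elx set, which is EL3h, with the I and F masks clear and the IL bit set. Since SPSR_EL3 holds the PSTATE captured when the exception was taken, this suggests the interrupted code was running at EL3 on SP_EL3 with IRQs unmasked at that moment.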

Do you have any ideas on how to debug?

One possible reason is stack memory corruption. Is there any chance that your stack is overflowing?

To be precise, the ELR_EL3 value is already 0x0 before it is stored on the stack…
I wonder why the ELR_EL3 value is 0x0.

I have another question.
Is it okay for the core to jump directly to the IRQ handler when an IRQ occurs while SWI_Handler(), which is entered through the SMC instruction, is still running?

Or should the IRQ handler run only after SWI_Handler() has returned to the task and IRQs have been enabled again?

Assuming it is the same issue as the one we are discussing in this post, can you try my suggestion there?

Is it okay to disable IRQs before calling vTaskSwitchContext in SWI_Handler()?
Or is it right to disable IRQs before entering SWI_Handler()?

Does it work correctly if an IRQ is taken before SWI_Handler() has written the LR value saved on the stack into ELR_EL3, and execution then returns to SWI_Handler()?

please answer me…^^

I think my previous comment was not correct. Based on my reading so far, on Cortex-A, interrupts are disabled as soon as you take an exception and are not re-enabled unless explicitly enabled.

How did you verify that an IRQ was taken by the core while handling SMC?

The following is from the architecture manual -

When taking an exception to EL3, ELR_EL3 holds the address to return to.

If ELR_EL3 is 0x0, that means the exception was taken from address 0x0. Is 0x0 a valid address, or is control somehow reaching an invalid address? Also, which interrupt is firing?

Would you please help me understand this question?

Does that mean interrupts are disabled automatically when the core takes the exception into SWI_Handler()?

I checked the sequence by writing a value to a specific debug register of the AP.

There are two problems I have encountered.
(1) wrong pxCurrentTCBs[ xCoreID ]
(2) wrong ELR

Both problems are caused by an interrupt occurring while FreeRTOS_SWI_Handler() is running, so that execution jumps to FreeRTOS_IRQ_Handler().

These problems do not occur with “configNUMBER_OF_CORES=1”. I wonder why they occur with “configNUMBER_OF_CORES=2”.

I think so, yes.

And how did you verify that the FreeRTOS_SWI_Handler() was interrupted by IRQ?

It is hard to make a guess. Do you want to have a debug session to debug this together? If yes, please drop me a DM.

When the processor takes an exception to AArch64 execution state, all of the PSTATE interrupt
masks are set automatically. This means that further exceptions are disabled. If software is to
support nested exceptions, for example, to allow a higher priority interrupt to interrupt the
handling of a lower priority source, then software needs to explicitly re-enable interrupts.

Your statement is right.
I had defined portCLEAR_INTERRUPT_MASK incorrectly in portMacro.h.
I think the problem was that IRQs were being enabled unconditionally.
The problem was solved by defining portCLEAR_INTERRUPT_MASK as portRESTORE_INTERRUPTS.
Thank you for your help.
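
For anyone hitting the same issue, the difference between unconditionally enabling IRQs and restoring the saved mask can be illustrated with a small host-side simulation in C. This is only a sketch and not the port's actual code: the variable sim_irq_masked stands in for the PSTATE.I bit, and every sim_* function name is hypothetical, chosen to mirror the roles of the port macros mentioned above.

```c
#include <stdbool.h>

/* Simulated PSTATE.I bit: true means IRQs are masked. */
static bool sim_irq_masked = false;

/* Models exception entry: hardware masks IRQs before the handler runs. */
static void sim_exception_entry(void)
{
    sim_irq_masked = true;
}

/* Masks IRQs and returns the previous mask state. */
static bool sim_set_interrupt_mask(void)
{
    bool previous = sim_irq_masked;
    sim_irq_masked = true;
    return previous;
}

/* Buggy variant: ignores the saved state and always unmasks IRQs. */
static void sim_clear_mask_unconditional(bool previous)
{
    (void)previous;
    sim_irq_masked = false;
}

/* Fixed variant: puts back exactly the state that was saved. */
static void sim_restore_interrupts(bool previous)
{
    sim_irq_masked = previous;
}

/* Runs a critical section inside a simulated handler and reports
 * whether IRQs are still masked when the critical section ends. */
static bool handler_keeps_irqs_masked(void (*clear_mask)(bool))
{
    sim_irq_masked = false;              /* task context: IRQs enabled */
    sim_exception_entry();               /* exception taken: IRQs masked */
    bool saved = sim_set_interrupt_mask();
    /* ... critical section, e.g. selecting the next task ... */
    clear_mask(saved);
    return sim_irq_masked;
}
```

In this model, handler_keeps_irqs_masked(sim_restore_interrupts) returns true, while handler_keeps_irqs_masked(sim_clear_mask_unconditional) returns false: the unconditional variant leaves IRQs unmasked in the middle of the handler, which is the window in which a nested IRQ could be taken.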

Thank you for sharing your solution.