Cortex-A9 port causes FreeRTOS_Undefined exception

system · November 23, 2018, 6:48pm

supergaute wrote on Friday, November 23, 2018:

I’m using FreeRTOS 10, with Xilinx’s Zynq 7000 Chip. This is running Linux on core 0 and FreeRTOS on core 1.

When I apply load to core 1, FreeRTOS eventually crashes into the FreeRTOS_Undefined exception handler.
This is the stack trace:

MyFreeRTOSApp	
	Thread #1 57005 (Suspended : Breakpoint)	
		FreeRTOS_Undefined() at port_asm_vectors.S:96 0x30000040	
		0x10101018

R14_und is 0x10101018. This looks like the initial contents of the stack (R10) as set in port.c.

How can I figure out what causes this exception?

rtel · November 23, 2018, 6:57pm

rtel wrote on Friday, November 23, 2018:

Do you know which instruction generated the fault (that is, the value of
the program counter at the time the fault occurred)? It should be
obtainable from within the exception handler. I have an example of how to
obtain the offending PC for Cortex-M code, but can’t recall how to do it
for Cortex-A.

Other than that - have you looked through the list of usual suspects
here: https://www.freertos.org/FAQHelp.html Pay particular attention to
the interrupt priority requirements.
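
On ARMv7-A, an undefined-instruction exception sets the banked link register (R14_und) to the address of the offending instruction plus 4 in ARM state, or plus 2 in Thumb state. The sketch below only shows that arithmetic. The function name `undef_fault_address` is hypothetical and not part of FreeRTOS, and it assumes the handler has already read R14_und and the T bit of SPSR_und.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not a FreeRTOS API.
 * lr_und: value of R14_und on entry to the Undefined handler.
 * thumb:  true if the T bit (bit 5) of SPSR_und is set. */
uint32_t undef_fault_address(uint32_t lr_und, bool thumb)
{
    /* The link register points past the faulting instruction:
     * 4 bytes in ARM state, 2 bytes in Thumb state. */
    return lr_und - (thumb ? 2u : 4u);
}
```

With the R14_und value from the stack trace (0x10101018) and ARM state, this gives 0x10101014.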

system · November 23, 2018, 7:10pm

supergaute wrote on Friday, November 23, 2018:

Yes, the PC was 0x10101014 at the time the fault occurred. This is way outside the valid program space, which is 0x30000000 - 0x3800000.

I’ve read through the general FAQ and the Cortex-A specific article.

0x10101014 is 1 word above 0x10101010 which is initialized in port.c.
If I change 0x10101010 to, for instance, 0x12301010 in port.c, this is reflected in the crash, where the PC is now 0x12301014.

rtel · November 23, 2018, 7:17pm

rtel wrote on Friday, November 23, 2018:

Right - agree that value almost certainly must have come from an
initialised register value - looks like it has been used to hold a byte,
hence the rest of the register is untouched. This could be a stack
issue then, where returning from a function or interrupt, etc. has
resulted in the wrong value being popped into the PC (by which I really
mean, the address used to pop the PC was wrong as the stack pointer was
wrong or stack corrupted).

I think the first thing to do is check which task was running at the
time, assuming it was a task, not an interrupt. You can do that by
adding “(tskTCB*)pxCurrentTCB” to the expressions window in the debugger.
That should then decode pxCurrentTCB as a task control block structure
that can be expanded to see the task’s name as a string. Alternatively,
if you store the handles of the tasks you create, the value of
pxCurrentTCB will equal the task’s handle.
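
The handle comparison described above can be sketched as a small lookup. This is a standalone illustration, not FreeRTOS code: `TaskHandle_t` is redefined here as a stand-in so the snippet compiles on its own, and `KnownTask` and `find_task_label` are hypothetical names for bookkeeping the application would have to do itself when it creates its tasks.

```c
#include <stddef.h>

/* Stand-in for FreeRTOS's TaskHandle_t so the sketch compiles on its own. */
typedef void *TaskHandle_t;

/* Hypothetical record: a handle saved at task creation plus a label. */
typedef struct {
    TaskHandle_t handle;
    const char *label;
} KnownTask;

/* Return the label whose saved handle equals `current` (for example the
 * value of pxCurrentTCB read in the debugger), or NULL if none matches. */
const char *find_task_label(const KnownTask *tasks, size_t count,
                            TaskHandle_t current)
{
    for (size_t i = 0; i < count; i++) {
        if (tasks[i].handle == current) {
            return tasks[i].label;
        }
    }
    return NULL;
}
```

A NULL result means the running task is not one the application recorded, which may point at a task created by the kernel itself.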

system · November 23, 2018, 7:25pm

supergaute wrote on Friday, November 23, 2018:

That was the idle task running.
Does that mean it has to be an interrupt?

rtel · November 23, 2018, 7:37pm

rtel wrote on Friday, November 23, 2018:

I wouldn’t say it ‘has’ to be an interrupt, but would agree it is very
likely to be an interrupt. Unless your application has added any
functionality to the idle task through an idle task hook function or a
trace macro?

system · November 23, 2018, 8:51pm

supergaute wrote on Friday, November 23, 2018:

No functionality has been added to the idle task.
I’ve only installed one interrupt handler on top of what the FreeRTOS port does (the tick handler). The interrupt handler is related to OpenAMP, used to communicate to the other core.
I’m not sure how to proceed with the debugging now; maybe you can give some tips?

system · November 25, 2018, 4:29pm

supergaute wrote on Sunday, November 25, 2018:

After some more investigation I found out the stack trace changes when I disable optimizations.

The stack trace now becomes this:

Thread #1 57005 (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)	
	FreeRTOS_Undefined() at port_asm_vectors.S:96 0x30000040	
	ucHeap() at 0x31400994

I also found that the line causing the problem is in tasks.c.
The macro traceTASK_CREATE( ) in the function prvAddNewTaskToReadyList() is defined by Tracealyzer, and it contains calls to portSET_INTERRUPT_MASK_FROM_ISR() and portCLEAR_INTERRUPT_MASK_FROM_ISR().

If I remove the portSET_INTERRUPT_MASK_FROM_ISR() and portCLEAR_INTERRUPT_MASK_FROM_ISR() calls, everything works ok.

If I don’t use Tracealyzer, I can mimic the same behaviour by adding

uint32_t irq_status = portSET_INTERRUPT_MASK_FROM_ISR();
portCLEAR_INTERRUPT_MASK_FROM_ISR(irq_status);

or

portDISABLE_INTERRUPTS();
portENABLE_INTERRUPTS();

after the traceTASK_CREATE() call.
It will execute around 1000-2000 times before it crashes.

What can be the issue here?

rtel · November 25, 2018, 5:56pm

rtel wrote on Sunday, November 25, 2018:

Are you saying the problem is in the trace macro? So if you remove the
trace macros altogether (by not defining them, which makes them take
their default empty implementation) everything runs ok?

system · November 25, 2018, 10:25pm

supergaute wrote on Sunday, November 25, 2018:

Yes, that is correct.

Would you consider it to be an issue if the trace macro calls portSET_INTERRUPT_MASK_FROM_ISR() and portCLEAR_INTERRUPT_MASK_FROM_ISR()?

system · November 25, 2018, 10:38pm

richarddamon wrote on Sunday, November 25, 2018:

One big thing I see here is that …FROM_ISR stuff is supposed to be called from inside an ISR, while the traceTASK_CREATE isn’t going to be called from an ISR, but from a task context, so the definition in that macro sounds incorrect.

system · November 26, 2018, 3:49pm

supergaute wrote on Monday, November 26, 2018:

If I disable Tracealyzer completely, and instead insert

		portENTER_CRITICAL();
		portEXIT_CRITICAL();

right after the traceTASK_SWITCHED_IN() call,
this will result in the same behaviour, the system crashes. It will execute a few thousand times before it crashes.

Should the system behave ok when doing this, or is it expected to crash?

rtel · November 26, 2018, 4:20pm

rtel wrote on Monday, November 26, 2018:

I would expect that to crash. traceTASK_SWITCHED_IN() is executed
inside an interrupt - and those macros are not interrupt safe. There
are two reasons I would not expect that to work properly: First exiting
the critical section could result in interrupts becoming enabled in a
part of the code where they should be disabled, and second those macros
are using a critical nesting count that is part of a task’s context -
each task has its own nesting count so using it in the interrupt (which
is not a task) doesn’t make sense - especially if you switch tasks
before exiting the critical section.

In this case it sounds like there could be an issue in the
implementation of the trace macro, which is provided by Percepio.
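
The first reason given above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the port's actual code: all names are hypothetical, interrupt masking is reduced to a single flag, and the nesting count is a plain variable standing in for the count stored in a task's context.

```c
#include <stdbool.h>

/* Toy model only; none of these names are FreeRTOS APIs. */
static bool irq_masked;         /* true while interrupts are masked */
static unsigned nesting_count;  /* stands in for a task's critical nesting count */

void toy_enter_critical(void)
{
    irq_masked = true;
    nesting_count++;
}

void toy_exit_critical(void)
{
    if (nesting_count > 0) {
        nesting_count--;
        if (nesting_count == 0) {
            irq_masked = false;  /* outermost exit unmasks interrupts */
        }
    }
}

/* Models a trace hook that runs inside the context-switch interrupt and
 * uses the task-level critical section. Returns the mask state afterwards. */
bool toy_switch_hook_leaves_irq_masked(void)
{
    irq_masked = true;   /* the interrupt entry code masked interrupts */
    nesting_count = 0;   /* the interrupted task was not in a critical section */
    toy_enter_critical();
    toy_exit_critical();
    return irq_masked;
}
```

In this model the hook returns with interrupts unmasked even though the surrounding interrupt code expected them to stay masked, because the exit path only looks at the nesting count and knows nothing about the interrupt context it was called from.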

system · November 26, 2018, 4:43pm

supergaute wrote on Monday, November 26, 2018:

Would you consider it to be an issue if the trace macro calls portSET_INTERRUPT_MASK_FROM_ISR() and portCLEAR_INTERRUPT_MASK_FROM_ISR()?

Because disabling those lines in the trace implementation fixes the problem.

I just need to figure out if the problem is on my part or Percepio’s.

rtel · November 26, 2018, 4:51pm

rtel wrote on Monday, November 26, 2018:

Not sure - they will enable global interrupts, but leave the interrupt
mask in the correct state, and the kernel only really uses the mask. In
that particular place (inside the context switch) they could well be
necessary.

system · November 27, 2018, 9:27pm

supergaute wrote on Tuesday, November 27, 2018:

Who can answer this?

I’ve replicated the issue on a Xilinx dev board with a “Xilinx lwIP TCP perf” example application, and the latest version of Tracealyzer, so I’m pretty sure my setup is not part of the problem.

rtel · November 28, 2018, 1:34am

rtel wrote on Wednesday, November 28, 2018:

If I recall correctly our last exchange on this was suggesting an issue
in the implementation of the trace macros, which are provided by
Percepio - so you could ask Percepio - they generally have responsive support.

system · December 4, 2018, 6:04pm

ldb wrote on Tuesday, December 04, 2018:

Not sure if it is related, but I had something similar on the Raspberry Pi with preemptive tasking, for a very interesting reason.

The trace functions are C code, and when you call C code under the ARM ABI the stack has to be 8-byte aligned, even though the normal alignment for a local variable push etc. is only 4. This means you can randomly come into the interrupt with the stack in an align-4 position. So check the stack alignment restrictions on your system, as this seems to be common with ARM.

I had to make sure I aligned the stack before calling out to C code … failing to do so would randomly crash some time later. So my IRQ handler ended up looking like this:

	/* Save the current context */
	portSAVE_CONTEXT

	/* the stack pointer is 4-byte aligned at all times, but it must be 8-byte aligned	*/
	/* to call external C code	*/
    mov r1, sp
    and r1, r1, #0x7									;@ Ensure 8-byte stack alignment
    sub sp, sp, r1										;@ adjust stack as necessary
    push {r1, lr}										;@ Store adjustment and LR_svc

	bl irqHandler										;@ Call irqhandler

	/* Reverse out 8 byte padding from above */
    pop {r1, lr}										;@ Restore LR_svc
    add sp, sp, r1										;@ Un-adjust stack

	/* restore context which includes a return from interrupt */
	portRESTORE_CONTEXT
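
The adjustment computed by the `and`/`sub` pair above can be checked in plain C. This is only a sketch of the arithmetic; the function names are hypothetical, and it assumes, as the post states, that the incoming stack pointer is at least 4-byte aligned.

```c
#include <stdint.h>

/* Hypothetical helper mirroring "and r1, r1, #0x7": the number of bytes
 * to subtract from sp so that it becomes 8-byte aligned. */
uint32_t stack_align_adjustment(uint32_t sp)
{
    return sp & 0x7u;
}

/* Mirrors "sub sp, sp, r1": the stack pointer after the adjustment. */
uint32_t align_stack_down(uint32_t sp)
{
    return sp - stack_align_adjustment(sp);
}
```

For a 4-byte-aligned stack pointer the adjustment is either 0 or 4. Pushing the two words {r1, lr} afterwards moves the stack pointer by another 8 bytes, so it is still 8-byte aligned when the C handler is called.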

system · December 12, 2018, 4:14pm

supergaute wrote on Wednesday, December 12, 2018:

Was this on the Cortex-A7 or the Cortex-A53 version of the Raspberry Pi?
Were you using this with FreeRTOS?
