Do you know which instruction generated the fault (that is, the value of
the program counter at the time the fault occurred)? It should be
obtainable from within the exception handler. I have an example of how to
obtain the offending PC for Cortex-M code, but can’t recall how to do it
for Cortex-A.
Other than that - have you looked through the list of usual suspects
here: https://www.freertos.org/FAQHelp.html Pay particular attention to
the interrupt priority requirements.
Yes, the PC was 0x10101014 at the time the fault occurred. This is way outside the valid program space, which is 0x30000000 - 0x3800000.
I’ve read through the general FAQ and the Cortex-A specific article.
0x10101014 is one word above 0x10101010, which is a value initialized in port.c.
If I change 0x10101010 to, for instance, 0x12301010 in port.c, this is reflected in the crash: the PC is now 0x12301014.
Right - agree that value almost certainly must have come from an
initialised register value - looks like it has been used to hold a byte,
hence the rest of the register is untouched. This could be a stack
issue then, where returning from a function or interrupt, etc. has
resulted in the wrong value being popped into the PC (by which I really
mean, the address used to pop the PC was wrong as the stack pointer was
wrong or stack corrupted).
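
To make that comparison repeatable, here is a minimal C sketch of the check described above: is the faulting PC outside the program space, and does it differ from a known fill value only in its low byte? The helper names are hypothetical (they are not part of FreeRTOS), and the fill value is whatever your copy of port.c actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if addr lies inside the half-open range [lo, hi). */
static bool pc_in_range(uint32_t addr, uint32_t lo, uint32_t hi)
{
    assert(lo < hi);
    return addr >= lo && addr < hi;
}

/* Hypothetical helper: true if pc and fill share their three upper bytes,
 * i.e. only the low byte of the register-sized value was modified. */
static bool only_low_byte_differs(uint32_t pc, uint32_t fill)
{
    return (pc & 0xFFFFFF00u) == (fill & 0xFFFFFF00u);
}
```

With the values from this thread, only_low_byte_differs(0x10101014u, 0x10101010u) is true, and it stays true for the modified pair 0x12301014u / 0x12301010u, which is what points at a popped register value rather than a real code address.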
I think the first thing to do is check which task was running at the
time, assuming it was a task, not an interrupt. You can do that by
adding “(tskTCB*)pxCurrentTCB” to the expressions window in the debugger
that should then decode pxCurrentTCB as a task control block structure
that can be expanded to see the task’s name as a string. Alternatively,
if you store the handles of the tasks you create, the value of
pxCurrentTCB will equal the task’s handle.
I wouldn’t say it ‘has’ to be an interrupt, but would agree it is very
likely to be an interrupt. Unless your application has added any
functionality to the idle task through an idle task hook function or a
trace macro?
No functionality has been added to the idle task.
I’ve only installed one interrupt handler on top of what the FreeRTOS port does (the tick handler). The interrupt handler is related to OpenAMP, used to communicate with the other core.
I’m not sure how to proceed with the debugging now, maybe you can give some tips?
After some more investigation I found out the stack trace changes when I disable optimizations.
The stack trace now becomes this:
Thread #1 57005 (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)
FreeRTOS_Undefined() at port_asm_vectors.S:96 0x30000040
ucHeap() at 0x31400994
I also found that the line causing the problem is in tasks.c.
The macro traceTASK_CREATE( ) in the function prvAddNewTaskToReadyList() is defined by Tracealyzer, and it contains calls to portSET_INTERRUPT_MASK_FROM_ISR() and portCLEAR_INTERRUPT_MASK_FROM_ISR().
If I remove the portSET_INTERRUPT_MASK_FROM_ISR() and portCLEAR_INTERRUPT_MASK_FROM_ISR() calls, everything works ok.
If I don’t use Tracealyzer, I can mimic the same behaviour by adding
Are you saying the problem is in the trace macro? So if you remove the
trace macros altogether (by not defining them, which makes them take
their default empty implementation) everything runs ok?
One big thing I see here is that the …FROM_ISR macros are supposed to be called from inside an ISR, while traceTASK_CREATE isn’t going to be called from an ISR but from a task context, so the definition of that macro sounds incorrect.
If I disable Tracealyzer completely, and instead insert
portENTER_CRITICAL();
portEXIT_CRITICAL();
right after the traceTASK_SWITCHED_IN() call,
this will result in the same behaviour, the system crashes. It will execute a few thousand times before it crashes.
Should the system behave ok when doing this, or is it expected to crash?
I would expect that to crash. traceTASK_SWITCHED_IN() is executed
inside an interrupt - and those macros are not interrupt safe. There
are two reasons I would not expect that to work properly: First exiting
the critical section could result in interrupts becoming enabled in a
part of the code where they should be disabled, and second those macros
are using a critical nesting count that is part of a task’s context -
each task has its own nesting count so using it in the interrupt (which
is not a task) doesn’t make sense - especially if you switch tasks
before exiting the critical section.
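
To illustrate both points, here is a toy model in C. It is only a sketch under assumptions: the names are made up, and it mimics the general enter/exit pattern (increment a nesting count, unmask when the count returns to zero) rather than reproducing the real port code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: illustrative names, not the real FreeRTOS port. */
typedef struct {
    unsigned nesting;        /* nesting count currently loaded (belongs to a task) */
    bool interrupts_masked;  /* true while interrupts are masked */
} ToyCpu;

static void toy_enter_critical(ToyCpu *cpu)
{
    cpu->interrupts_masked = true;
    cpu->nesting++;
}

static void toy_exit_critical(ToyCpu *cpu)
{
    assert(cpu->nesting > 0);
    cpu->nesting--;
    if (cpu->nesting == 0) {
        /* Outermost section ended: unmask, wherever we happen to be. */
        cpu->interrupts_masked = false;
    }
}

/* Models an ISR that runs with interrupts masked and calls the pair
 * part-way through. Returns whether interrupts are still masked afterwards. */
static bool toy_isr_still_masked_after_pair(unsigned interrupted_task_nesting)
{
    ToyCpu cpu = { .nesting = interrupted_task_nesting, .interrupts_masked = true };
    toy_enter_critical(&cpu);
    toy_exit_critical(&cpu);
    return cpu.interrupts_masked;
}
```

In this model, toy_isr_still_masked_after_pair(0) returns false: the pair leaves interrupts unmasked inside the handler. With a count of 1 it returns true, so the outcome depends on which task's count happens to be loaded, which is the second problem.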
In this case it sounds like there could be an issue in the
implementation of the trace macro, which is provided by Percepio.
Not sure - they will enable global interrupts, but leave the interrupt
mask in the correct state, and the kernel only really uses the mask. In
that particular place (inside the context switch) they could well be
necessary.
I’ve replicated the issue on a Xilinx dev board with a “Xilinx lwIP TCP perf” example application, and the latest version of Tracealyzer, so I’m pretty sure my setup is not part of the problem.
If I recall correctly our last exchange on this was suggesting an issue
in the implementation of the trace macros, which are provided by
Percepio - so you could ask Percepio - they generally have responsive support.
Not sure if it is related, but I had something similar on the Raspberry Pi with preemptive tasking, for a very interesting reason.
The trace functions are C code, and when you call C code under the ARM ABI the stack has to be 8-byte aligned, even though the normal alignment for a local variable, push, etc. is only 4 bytes. This means you can randomly come into the interrupt with the stack only 4-byte aligned. So check the stack alignment restrictions on your system, as this seems to be common with ARM.
I had to make sure I aligned the stack before calling out to C code … failing to do so would randomly crash some time later. So my IRQ handler ended up looking like this:
/* Save the current context */
portSAVE_CONTEXT
/* the stack pointer is 4-byte aligned at all times, but it must be 8-byte aligned */
/* to call external C code */
mov r1, sp
and r1, r1, #0x7 ;@ Ensure 8-byte stack alignment
sub sp, sp, r1 ;@ adjust stack as necessary
push {r1, lr} ;@ Store adjustment and LR_svc
bl irqHandler ;@ Call irqhandler
/* Reverse out 8 byte padding from above */
pop {r1, lr} ;@ Restore LR_svc
add sp, sp, r1 ;@ Un-adjust stack
/* restore context which includes a return from interrupt */
portRESTORE_CONTEXT
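
If you want to sanity check the alignment arithmetic outside the handler, here is a small C sketch that mirrors the "and" / "sub" pair above. The function names are hypothetical, and it assumes (as the handler comment does) that the incoming stack pointer is at least 4-byte aligned.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors "and r1, r1, #0x7": number of bytes to subtract from sp.
 * Assumes sp is already 4-byte aligned, so the result is 0 or 4. */
static uint32_t stack_adjustment(uint32_t sp)
{
    assert((sp & 0x3u) == 0u);
    return sp & 0x7u;
}

/* Mirrors "sub sp, sp, r1": the stack grows down, so subtracting the
 * adjustment rounds sp down to the next 8-byte boundary. */
static uint32_t align_stack_down(uint32_t sp)
{
    return sp - stack_adjustment(sp);
}
```

The push of {r1, lr} that follows stores two words (8 bytes), so the stack stays 8-byte aligned when irqHandler is called, and the saved r1 is what lets the handler undo the adjustment afterwards.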