Hi All
I’m looking for general thoughts/advice on trying to track down an unaligned memory access exception.
We are occasionally seeing this on an embedded system with only a few tasks, but nested interrupts and potentially many interrupts occurring. From the debugger, it looks like a misaligned address is being pushed to the stack within the dispatcher, and a later attempt to pop that is causing the exception.
We have watermarked the stack and (recently) enabled stack checking via configCHECK_FOR_STACK_OVERFLOW and added a hook for vApplicationStackOverflowHook(). So far that is not showing any problems.
FWIW we have FreeRTOS v10.4.6, but have our own dedicated ISR stack (we might revert that latter part as part of the debug).
We do not have any dynamic memory allocation and use static resource creation.
This is using an SoC CPU which is not part of the ‘blessed’ FreeRTOS ported ecosystem, which I realise doesn’t help. However we have not had issues previously, he said unhelpfully.
This is not something I have had to debug before; what kind of thing might we need to be looking for?
Actually, I realise I am able to add some more information regarding the port. We are using the (Synopsys) ARC_EM_HS port of FreeRTOS (found in /portable/ThirdParty/GCC/ARC_EM_HS/)
Are you sure your interrupt stack is large enough? Interrupts use the same stack as main(), so that stack is set up by your start-up (C run-time) code, often in the linker script (depending on the tools). Overflows in the stack used by interrupts will not be caught by the stack overflow detection.
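Since the kernel's overflow check only covers task stacks, one way to get similar visibility on a dedicated ISR stack is to paint it with a known pattern at start-up and later count how much of the pattern survives. The following is only a sketch: the buffer name isr_stack, its size and the fill value are hypothetical, and it assumes the stack grows downwards (from the highest index towards index 0), which should be checked against the port.

```c
#include <stddef.h>
#include <stdint.h>

#define ISR_STACK_WORDS  64u           /* hypothetical size, in 32-bit words */
#define ISR_STACK_FILL   0xA5A5A5A5u   /* arbitrary fill pattern */

/* Hypothetical dedicated ISR stack; in a real system this would be the
 * region the port or linker script actually assigns to interrupts. */
static uint32_t isr_stack[ISR_STACK_WORDS];

/* Paint the whole stack. Call once, before interrupts are enabled. */
void isr_stack_paint(void)
{
    for (size_t i = 0; i < ISR_STACK_WORDS; i++) {
        isr_stack[i] = ISR_STACK_FILL;
    }
}

/* Count the words at the low end that still hold the fill pattern.
 * With a downward-growing stack this is the remaining headroom;
 * a result of 0 means the stack has been used right to its limit. */
size_t isr_stack_unused_words(void)
{
    size_t unused = 0;
    while (unused < ISR_STACK_WORDS && isr_stack[unused] == ISR_STACK_FILL) {
        unused++;
    }
    return unused;
}
```

Polling isr_stack_unused_words() from a low-priority task (or inspecting it in the debugger after a failure) gives a high-water mark for the ISR stack, much like the task stack watermark.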
Can you provide more information about this? Which address is this? Is this a return address that is pushed to the stack and later an instruction (e.g. pop {pc}) tries to jump to the misaligned address? If yes, does the address look legitimate, i.e. can you look at the code around this address?
Hi Gurav
Yes, that seems to be broadly the situation. I had thought of the approach you suggest (look around the ‘bad’ address) but was unsure whether that was likely to be worthwhile. I will see if we can learn anything from that approach.
If the code around the address looks correct and we can find the function call, then it is likely an alignment issue. We would look for things like whether an alignment is being specified in the code, or whether code written in assembly is missing an alignment directive. On the other hand, if the code around the address does not make sense, then it is likely stack corruption.
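As a rough way to apply that reasoning to a value captured from the stack, the check can be written down as a small helper. This is a sketch only: the function name, the enum and the idea of passing in the code region bounds (for example from linker symbols) and the required alignment are all assumptions, not anything provided by FreeRTOS or the port.

```c
#include <stdint.h>

typedef enum {
    ADDR_PLAUSIBLE,     /* inside the code region and correctly aligned */
    ADDR_MISALIGNED,    /* inside the code region but not aligned */
    ADDR_OUTSIDE_CODE   /* not a code address at all */
} addr_verdict_t;

/* Classify a suspect return address.
 * text_start/text_end: bounds of the code region (end is exclusive).
 * insn_align: required instruction alignment in bytes for the CPU. */
addr_verdict_t classify_return_address(uintptr_t addr,
                                       uintptr_t text_start,
                                       uintptr_t text_end,
                                       uintptr_t insn_align)
{
    if (addr < text_start || addr >= text_end) {
        /* Does not point into code: more consistent with corruption. */
        return ADDR_OUTSIDE_CODE;
    }
    if ((addr % insn_align) != 0u) {
        /* Points into code but at an odd offset: worth checking
         * alignment directives around that location. */
        return ADDR_MISALIGNED;
    }
    return ADDR_PLAUSIBLE;
}
```

A value that lands outside the code region, or in the middle of data, points more towards stack corruption than towards a missing alignment directive.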
We are (currently) not getting the unaligned memory exception; this is after rearranging the relative priority of some tasks. Time will tell whether it returns…
We are still seeing a (related) problem, which I didn’t describe earlier. We get into a situation where there appear to be no tasks running (including IDLE). I thought that it was the misalignment which was causing this, but perhaps not.
We did a bit of tracking down and found that we are getting stuck inside xTaskIncrementTick(). Specifically we can trigger a configASSERT at this point in the FreeRTOS code:
/* tasks.c, ~Line 2790, in xTaskIncrementTick() */
else
{
/* The delayed list is not empty, get the value of the
* item at the head of the delayed list. This is the time
* at which the task at the head of the delayed list must
* be removed from the Blocked state. */
pxTCB = listGET_OWNER_OF_HEAD_ENTRY( pxDelayedTaskList ); /*lint !e9079 void * is used as this macro is used with timers and co-routines too. Alignment is known to be fine as the type of the pointer stored and retrieved is the same. */
--> configASSERT(pxTCB); <-- this gets asserted
xItemValue = listGET_LIST_ITEM_VALUE( &( pxTCB->xStateListItem ) );
I presume ‘delayed task list’ holds the list of Blocked tasks? Maybe there is some historical reason for the name.
Anyway, something is clearly going awry here. Any thoughts?
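One reading of that assert is that the list reports itself as non-empty while its head entry has no owner, i.e. the list's bookkeeping and its links disagree. The sketch below is a simplified model of a sentinel-based doubly linked list, not the kernel's actual List_t, with a hypothetical list_is_consistent() helper showing the kind of walk that can be used to catch such a mismatch early. All type and function names here are invented for illustration. The kernel also has a configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES option that may be worth enabling for this kind of hunt.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct node {
    struct node *next;
    struct node *prev;
    void *owner;          /* stands in for the TCB pointer; NULL on the end marker */
} node_t;

typedef struct {
    size_t count;         /* number of real entries the list believes it holds */
    node_t end;           /* sentinel: end.next is the head entry */
} list_t;

void list_init(list_t *l)
{
    l->count = 0;
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.owner = NULL;
}

void list_append(list_t *l, node_t *n, void *owner)
{
    n->owner = owner;
    n->next = &l->end;
    n->prev = l->end.prev;
    l->end.prev->next = n;
    l->end.prev = n;
    l->count++;
}

/* Walk the list and verify that links and count agree. */
bool list_is_consistent(const list_t *l)
{
    size_t seen = 0;
    const node_t *n = l->end.next;

    while (n != &l->end) {
        if (n == NULL || n->next == NULL) return false;   /* broken link */
        if (n->next->prev != n) return false;             /* back link mismatch */
        if (n->owner == NULL) return false;               /* entry without owner */
        if (++seen > l->count) return false;              /* more entries than counted */
        n = n->next;
    }
    return seen == l->count && l->end.next->prev == &l->end;
}
```

In this model, a count that has been incremented without the links being updated (or the reverse) gives exactly the symptom of a "non-empty" list whose head owner is NULL, which is what an insert or remove interrupted part-way through by another list operation could leave behind.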
I use taskENTER_CRITICAL_REGION_FROM_ISR() and taskEXIT_CRITICAL_REGION_FROM_ISR(). These are defined (task.h) in terms of portSET_INTERRUPT_MASK_FROM_ISR and portCLEAR_INTERRUPT_MASK_FROM_ISR.
My port has got definitions for portENTER_CRITICAL and portEXIT_CRITICAL … but not for portSET_INTERRUPT_MASK_FROM_ISR and portCLEAR_INTERRUPT_MASK_FROM_ISR.
That has to be wrong, surely?
It’s not causing a compilation error due to this in FreeRTOS.h:
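For readers without the header to hand: as far as I recall, FreeRTOS.h supplies fallback definitions roughly like the ones below when a port does not define these macros, so please check them against your own copy rather than trusting this reproduction. The function critical_read_from_isr() is a hypothetical addition, there only to show what the fallbacks expand to.

```c
/* Approximate reproduction of the FreeRTOS.h fallbacks (from memory). */
#ifndef portSET_INTERRUPT_MASK_FROM_ISR
    #define portSET_INTERRUPT_MASK_FROM_ISR()    0
#endif

#ifndef portCLEAR_INTERRUPT_MASK_FROM_ISR
    #define portCLEAR_INTERRUPT_MASK_FROM_ISR( uxSavedStatusValue )    ( void ) ( uxSavedStatusValue )
#endif

/* Hypothetical ISR-side read of a shared value. With the fallbacks in
 * place the "critical section" compiles cleanly but masks nothing:
 * the set macro is the constant 0 and the clear macro discards it. */
int critical_read_from_isr(const volatile int *shared)
{
    unsigned long saved = portSET_INTERRUPT_MASK_FROM_ISR();
    int copy = *shared;
    portCLEAR_INTERRUPT_MASK_FROM_ISR(saved);
    return copy;
}
```

That is why there is no compile error: the code builds, but a higher-priority interrupt can still pre-empt anything placed between the two macros.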
No, it’s the/a as there is no problem if it shares the lowest priority with other interrupts, and in fact for the ARM-M it typically does, as it shares that lowest priority with the tick interrupt, both having the priority of configKERNEL_INTERRUPT_PRIORITY.
Yes, this looks wrong. What are you using taskENTER_CRITICAL_REGION_FROM_ISR() and taskEXIT_CRITICAL_REGION_FROM_ISR() for? Is it possible to remove the functionality to confirm if that is the issue?
Hi Guarav, thanks for the comment. However, thinking a bit further, although this may be an issue in our port I don’t think it can be the cause of our problem.
The (single) use of taskENTER/EXIT_CRITICAL region in our firmware is in an ISR, to protect a small structure (a couple of entries). This structure gets updated with ‘return values’ by a task, and then read by the ISR to use when communicating externally (it’s an I2C ISR). So:
/* my_task.c */
taskENTER_CRITICAL();
// update fields in my critical region structure
taskEXIT_CRITICAL();
/* my_isr.c */
taskENTER_CRITICAL_REGION_FROM_ISR();
// read contents of critical region structure
taskEXIT_CRITICAL_REGION_FROM_ISR();
// use read contents
If the ENTER/EXIT...FROM_ISR() macros are null, then the only thing that will occur is that the values read from the critical region might be wrong. That is a ‘benign’ error for our case right now. I don’t see it causing the issue with the ‘null’ pxTCB?
The macro taskENTER_CRITICAL_FROM_ISR is called from within the kernel as well and as @richard-damon mentioned, if your port supports nested interrupts, then you need portSET_INTERRUPT_MASK_FROM_ISR and portCLEAR_INTERRUPT_MASK_FROM_ISR.
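To make the save and restore contract concrete, here is a host-side sketch of the behaviour a nesting-capable port is expected to provide. The variable sim_irq_level, the SIM_MASK_ALL value and the sim_* functions are stand-ins invented for illustration; a real ARC port would read and write the CPU's interrupt priority threshold instead. The two task-level macros are defined here only to mirror how task.h maps them onto the port macros.

```c
#include <stdint.h>

#define SIM_MASK_ALL  15u   /* hypothetical level that masks all kernel-aware interrupts */

/* Stand-in for the CPU's interrupt priority threshold register. */
static volatile uint32_t sim_irq_level = 0u;

/* Raise the mask and hand back the previous level so it can be restored. */
static uint32_t sim_set_interrupt_mask_from_isr(void)
{
    uint32_t previous = sim_irq_level;
    sim_irq_level = SIM_MASK_ALL;
    return previous;
}

/* Restore exactly the level that was in force before the matching set. */
static void sim_clear_interrupt_mask_from_isr(uint32_t previous)
{
    sim_irq_level = previous;
}

#define portSET_INTERRUPT_MASK_FROM_ISR()       sim_set_interrupt_mask_from_isr()
#define portCLEAR_INTERRUPT_MASK_FROM_ISR( x )  sim_clear_interrupt_mask_from_isr( x )

/* Mirrors the mapping in task.h. */
#define taskENTER_CRITICAL_FROM_ISR()           portSET_INTERRUPT_MASK_FROM_ISR()
#define taskEXIT_CRITICAL_FROM_ISR( x )         portCLEAR_INTERRUPT_MASK_FROM_ISR( x )

/* Hypothetical ISR-side read of a value shared with a task. */
uint32_t example_isr_read(const volatile uint32_t *shared)
{
    uint32_t saved = taskENTER_CRITICAL_FROM_ISR();
    uint32_t copy = *shared;          /* nothing kernel-aware can pre-empt here */
    taskEXIT_CRITICAL_FROM_ISR(saved);
    return copy;
}
```

The important property is that the exit restores the saved level rather than unconditionally unmasking, so a nested ISR leaving its critical section does not re-enable interrupts inside an outer one.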
P.S. - I guess taskENTER_CRITICAL_REGION_FROM_ISR is just a typo and it is actually taskENTER_CRITICAL_FROM_ISR. Or are you using a modified copy of the kernel?