Exception is noticed while executing xTaskRemoveFromEventList

Hi,
version: FreeRTOS V9.0.0

I get an exception when I stress my application, which eventually causes a watchdog reset on an LPC55S69 Cortex-M33 controller.
I captured the instruction that causes the exception, and it points to the first line of code in xTaskRemoveFromEventList(), which is invoked by xQueueGenericSend().
Could someone please let me know what could be the reason for this exception?

xQueueGenericSend() calls xTaskRemoveFromEventList() when the list of tasks waiting on the queue is not empty, but I am not sure why it is failing.

				/* If there was a task waiting for data to arrive on the
				queue then unblock it now. */
				if( listLIST_IS_EMPTY( &( pxQueue->xTasksWaitingToReceive ) ) == pdFALSE )
				{
					if( xTaskRemoveFromEventList( &( pxQueue->xTasksWaitingToReceive ) ) != pdFALSE )
					{

The assembly code generated for the first few lines of xTaskRemoveFromEventList() is as below:

xTaskRemoveFromEventList
0x20029928: b570 p. PUSH {r4-r6,lr}
0x2002992a: 68c0 .h LDR r0,[r0,#0xc]
0x2002992c: 68c5 .h LDR r5,[r0,#0xc]
0x2002992e: b155 U. CBZ r5,0x20029946 ; xTaskRemoveFromEventList + 30
0x20029930: f1050418 … ADD r4,r5,#0x18
0x20029934: 4620 F MOV r0,r4
0x20029936: f7fefaf5 … BL uxListRemove ; 0x20027f24

When the exception occurred, the program counter was pointing to 0x2002992c.

BaseType_t xTaskRemoveFromEventList( const List_t * const pxEventList )
{
	TCB_t *pxUnblockedTCB;
	BaseType_t xReturn;

	pxUnblockedTCB = ( TCB_t * ) listGET_OWNER_OF_HEAD_ENTRY( pxEventList );
	configASSERT( pxUnblockedTCB );
	( void ) uxListRemove( &( pxUnblockedTCB->xEventListItem ) );

Perhaps the FreeRTOS internal data has been corrupted?
Did you define configASSERT and enable stack checking?

Stack overflow is checked, and it is not an overflow issue.

I have not customized configASSERT; I am using the default implementation, which disables interrupts and waits indefinitely.

/* Define to trap errors during development. */
#define configASSERT(x) if(( x) == 0) {taskDISABLE_INTERRUPTS(); for (;;);}

The exception occurs before the configASSERT line, and I suspect it fails when it tries to load the head of the list. I would like to understand why the fetch fails.
#define listGET_OWNER_OF_HEAD_ENTRY( pxList ) ( (&( ( pxList )->xListEnd ))->pxNext->pvOwner )
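
For what it is worth, the macro expands to two dependent loads, and these appear to line up with the two LDR instructions in the disassembly above: the first reads pxList->xListEnd.pxNext, the second reads pvOwner through that pointer. If so, a fault at 0x2002992c would mean the head pointer stored in the list is itself invalid. The following is only a host-side sketch of that reasoning, not FreeRTOS code. It assumes 32-bit pointers and ticks and no list integrity check bytes; the types List32_t, ListItem32_t and MiniListItem32_t and the functions load32 and get_owner_of_head_entry are hypothetical names, and the byte array stands in for target memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 32-bit mirrors of the FreeRTOS list types. Pointers are kept
   as uint32_t so the offsets match the target even on a 64-bit host. */
typedef struct {
    uint32_t xItemValue;
    uint32_t pxNext;
    uint32_t pxPrevious;
    uint32_t pvOwner;      /* normally points at the owning TCB */
    uint32_t pvContainer;
} ListItem32_t;

typedef struct {
    uint32_t xItemValue;
    uint32_t pxNext;
    uint32_t pxPrevious;
} MiniListItem32_t;

typedef struct {
    uint32_t uxNumberOfItems;
    uint32_t pxIndex;
    MiniListItem32_t xListEnd;
} List32_t;

/* Both offsets evaluate to 12 (0xc), matching the #0xc in both LDRs. */
#define HEAD_NEXT_OFFSET ((uint32_t)(offsetof(List32_t, xListEnd) + \
                                     offsetof(MiniListItem32_t, pxNext)))
#define OWNER_OFFSET     ((uint32_t)offsetof(ListItem32_t, pvOwner))

/* Read one word from the simulated memory image. Returns -1 when the address
   lies outside the image, which stands in for a bus fault on the target. */
static int load32(const uint8_t *mem, size_t mem_size, uint32_t addr,
                  uint32_t *out)
{
    if (addr > mem_size || mem_size - addr < sizeof(uint32_t)) {
        return -1;
    }
    memcpy(out, mem + addr, sizeof(uint32_t));
    return 0;
}

/* Mimics listGET_OWNER_OF_HEAD_ENTRY as two loads.
   Returns 0 on success, 1 if the first load faults, 2 if the second does. */
int get_owner_of_head_entry(const uint8_t *mem, size_t mem_size,
                            uint32_t list_addr, uint32_t *owner)
{
    uint32_t head;

    assert(owner != NULL);
    /* LDR r0,[r0,#0xc]: head = pxList->xListEnd.pxNext */
    if (load32(mem, mem_size, (uint32_t)(list_addr + HEAD_NEXT_OFFSET), &head) != 0) {
        return 1;
    }
    /* LDR r5,[r0,#0xc]: owner = head->pvOwner */
    if (load32(mem, mem_size, (uint32_t)(head + OWNER_OFFSET), owner) != 0) {
        return 2;
    }
    return 0;
}
```

Note that listLIST_IS_EMPTY() only inspects uxNumberOfItems, so a list whose count is intact but whose xListEnd.pxNext has been overwritten still passes the emptiness check and then fails on the second load.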

Could it be that the queue you try to access is a NULL pointer? What is the exact call stack at corruption time?

And in addition to the suggestions above, can you try with the latest FreeRTOS version?

Oh yes.
@phaneesh86 Recent versions of FreeRTOS come with more useful assertions, which could help narrow down the root cause if the list is invalid or has been corrupted.
Another cause of internal data corruption might be invalid interrupt priorities when using the FreeRTOS (FromISR) API in ISRs.
When in doubt, see the FreeRTOS docs page "RTOS for ARM Cortex-M" and, for example, the forum thread "Understanding priority levels of ISR and FreeRTOS APIs" (post #16 by aggarg), which contains a pretty good explanation.
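
The priority rule behind that suggestion can be sketched as follows. On Cortex-M a numerically lower priority value is more urgent, so an ISR that calls a FromISR API needs a numeric priority greater than or equal to configMAX_SYSCALL_INTERRUPT_PRIORITY. This is only an illustration: the helper names priority_to_register_value and isr_may_call_from_isr_api are hypothetical, and the number of implemented priority bits is passed in as a parameter because it depends on the device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: convert a logical priority (0 = most urgent) into the
   value held in the 8-bit NVIC priority field, where only the top prio_bits
   bits are implemented. */
uint8_t priority_to_register_value(uint8_t priority, uint8_t prio_bits)
{
    assert(prio_bits >= 1 && prio_bits <= 8);
    assert(priority < (1u << prio_bits));
    return (uint8_t)(priority << (8u - prio_bits));
}

/* Hypothetical helper: true when an ISR at irq_priority may call FromISR
   APIs. Both arguments must use the same representation (both shifted
   register values, or both logical priorities). */
bool isr_may_call_from_isr_api(uint8_t irq_priority,
                               uint8_t max_syscall_priority)
{
    return irq_priority >= max_syscall_priority;
}
```

For example, with 3 priority bits and a max syscall priority of logical 5, an ISR at logical priority 2 must not call FromISR APIs, while one at 5, 6 or 7 may.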

It is production firmware, and I can’t move to another version of the OS.

I also suspect that it is trying to access a NULL pointer, but is that even possible? Before calling xTaskRemoveFromEventList(), the code checks that the list is not empty.

/* If there was a task waiting for data to arrive on the
queue then unblock it now. */
if( listLIST_IS_EMPTY( &( pxQueue->xTasksWaitingToReceive ) ) == pdFALSE )
{
if( xTaskRemoveFromEventList( &( pxQueue->xTasksWaitingToReceive ) ) != pdFALSE )
{

Regarding the call stack, I don’t have a debugger connected, so I can’t give the exact call stack, but based on the exception handler data and a code walkthrough, it is as below.

xQueueSendToBack → xQueueGenericSend → xTaskRemoveFromEventList

xQueueSendToBack() is invoked by the SPI send function to respond with some data to a command received from the master.

Can we see some code?

Did you check interrupt priorities as suggested by @hs2?

Is the send API called from a task or an ISR?

The send API is called from a task.

I added some code to read the fault status registers, and now the exception occurs in vListInsert() instead of xTaskRemoveFromEventList().

It looks like the RTOS heap memory used for the queue is getting corrupted, and an exception is thrown when the code tries to dereference it.

Is there any way to check whether the kernel heap and stack are overlapping?

There is little merit in trying to get more information about the fault because the real corruption problem has occurred many many cycles before the crash. As @aggarg pointed out, data breakpoints are very valuable tools in pinpointing the root cause. For example, if you happen to discover that the corruption always involves the invalid value 0xdeadbeef at address 0x24448888, define a data watch point to break into the debugger for a write of 0xdeadyyyy to that address. That kind of thing gets you to the root cause much quicker.

Did you already verify that the ISR priorities are valid, that the correct FromISR FreeRTOS API is used, and that the heap implementation is right?
Since FreeRTOS allocates task stacks from the heap (by default, or if configured accordingly), they can’t overlap. In addition, on Cortex-M33 the stack check (if enabled) is very reliable because it has HW support for it (stack limit registers).
You could also use a recent FreeRTOS version with the more sophisticated asserts and checks mentioned above, just for testing purposes, to find the problem and then apply the fix to your production application if you really can’t upgrade.

Finally, the issue is root-caused as a stack overflow. Unfortunately, the stack overflow hooks couldn’t catch it. After running the same command for 10k+ iterations, the code calls an mbedTLS API that requires an additional 1.2 KB of stack, which was not available. This caused the stack overflow, and an exception was thrown when the corrupted region was dereferenced.

Is there any standard way to check whether the allocated stack is sufficient for a given task under stress?

In my previous project, DEOS (Digital Engine Operating System), an RTOS, was used, and it had a datmon to measure all OS resources in the worst-case scenario by stressing the application.

Thank you all for your suggestions and support.

Thank you for reporting back. You can use the uxTaskGetStackHighWaterMark function to tune your stack size (see the FreeRTOS documentation).
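
To illustrate the idea behind a stack high water mark: the stack is pre-filled with a known byte (FreeRTOS uses 0xa5), and the number of bytes at the far end that still hold that value shows how much was never touched. The code below is a host-side simulation under that assumption, not the kernel implementation; stack_fill, simulate_usage and stack_high_water_mark are hypothetical names, and the result here is in bytes, whereas uxTaskGetStackHighWaterMark reports words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u /* same fill value FreeRTOS uses */

/* Pre-fill the whole stack buffer with the known pattern. */
void stack_fill(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Hypothetical stand-in for a running task: overwrite `used` bytes starting
   from the high-address end, since the stack grows downwards on Cortex-M. */
void simulate_usage(uint8_t *stack, size_t size, size_t used)
{
    assert(used <= size);
    memset(stack + (size - used), 0x00, used);
}

/* Count bytes at the low-address end that still hold the fill pattern.
   This is the smallest amount of free stack seen so far, in bytes. */
size_t stack_high_water_mark(const uint8_t *stack, size_t size)
{
    size_t untouched = 0;

    while (untouched < size && stack[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return untouched;
}
```

With a 2048-byte stack and a worst-case usage of 1200 bytes, the sketch reports 848 bytes of headroom. On the target, calling uxTaskGetStackHighWaterMark( NULL ) from the task after a long stress run gives the equivalent figure, provided INCLUDE_uxTaskGetStackHighWaterMark is set to 1. The number is only meaningful if the stress run actually exercised the deepest call path, such as the mbedTLS call described above.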

Am I missing something, @aggarg? Since it’s a Cortex-M33, I’d expect a stack overflow to be caught reliably if stack checking is enabled :thinking: Even with the older FreeRTOS version.

They said that they are using FreeRTOS version 9, which did not have a Cortex-M33 port with PSPLIM register support. My guess is that they are running the M4 port.

The Cortex-M33 ports always catch stack overflows immediately regardless of the value of configCHECK_FOR_STACK_OVERFLOW.
