Tasks list corruption / HardFault in xPortPendSVHandler()

dhemaius wrote on Wednesday, July 06, 2016:

FreeRTOS v8.2.3

The system had been running for a while when, after a vTaskSwitchContext() call inside xPortPendSVHandler(), a hard fault occurred while popping the core registers. These are the values that did not make sense at that point:

pxCurrentTCB = 0x00000005 ( traced back to the corrupted pxReadyTasksLists[6] )

pxReadyTasksLists[6].uxNumberOfItems = 0xFFFFFFFF
pxReadyTasksLists[6].pxIndex = 0x2001ADA8
pxReadyTasksLists[6].pxIndex->pxNext = 0x2001ADA8
pxReadyTasksLists[6].pxIndex->pxPrevious = 0x2001ADA8
pxReadyTasksLists[6].pxIndex->pvOwner = 0x00000005 ( located at xDelayedTaskList1.uxNumberOfItems )
pxReadyTasksLists[6].pxIndex->pvContainer = 0x2001ADBC ( points to xDelayedTaskList1.xListEnd )
pxReadyTasksLists[6].xListEnd @ 0x2001ADA8

xDelayedTaskList1.uxNumberOfItems = 0x00000005
xDelayedTaskList1.pxIndex = 0x2001ADBC
xDelayedTaskList1.xListEnd @ 0x2001ADBC

uxTopReadyPriority = 81 ( configMAX_PRIORITIES = 7 !!! )

Any idea how such a corruption can happen?

Furthermore, while digging around to find out how an empty list could still be used, I found something peculiar in the listGET_OWNER_OF_NEXT_ENTRY macro. Why does it assign ( pxConstList )->pxIndex->pxNext to ( pxConstList )->pxIndex again right after finding that the next item is the end-of-list marker, leaving ( pxConstList )->pxIndex->pvOwner invalid?

#define listGET_OWNER_OF_NEXT_ENTRY( pxTCB, pxList )										\
{																							\
List_t * const pxConstList = ( pxList );													\
	/* Increment the index to the next item and return the item, ensuring */				\
	/* we don't return the marker used at the end of the list.  */							\
	( pxConstList )->pxIndex = ( pxConstList )->pxIndex->pxNext;							\
	if( ( void * ) ( pxConstList )->pxIndex == ( void * ) &( ( pxConstList )->xListEnd ) )	\
	{																						\
		( pxConstList )->pxIndex = ( pxConstList )->pxIndex->pxNext;						\
	}																						\
	( pxTCB ) = ( pxConstList )->pxIndex->pvOwner;											\
}

edwards3 wrote on Wednesday, July 06, 2016:

These things are nearly always caused by interrupt priority problems. Are your interrupt priorities set below configMAX_SYSCALL_INTERRUPT_PRIORITY? Do you have configASSERT() defined? Do you have stack overflow checking (configCHECK_FOR_STACK_OVERFLOW) set to 2? Links to more information are at http://www.freertos.org/FAQHelp.html
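For reference, a minimal FreeRTOSConfig.h fragment turning on both checks might look like the following. The configASSERT() body is one common choice, not the only one; with method 2 the application must also provide vApplicationStackOverflowHook() (check your kernel version's headers for its exact signature).

```c
/* FreeRTOSConfig.h (fragment) */

/* Stop everything on a failed assertion so the debugger shows the caller.
   In the Cortex-M3/M4 ports, defining configASSERT() also enables the
   kernel's check that ISRs calling ...FromISR() functions have a valid
   priority (vPortValidateInterruptPriority()). */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* Method 2: check the stack pointer on each context switch and also verify
   the fill pattern at the end of the task's stack; an overflow calls the
   application's vApplicationStackOverflowHook(). */
#define configCHECK_FOR_STACK_OVERFLOW 2
```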

dhemaius wrote on Wednesday, July 06, 2016:

A colleague reviewed that at my request before I saw your post, and the answer is yes: some ISRs that use the API had priorities above (i.e. numerically lower than) configMAX_SYSCALL_INTERRUPT_PRIORITY. I should have thought of it earlier, because I have been down that road before.

In the past I added a configASSERT check inside uxListRemove() to trap an underflow of pxList->uxNumberOfItems, and it still fires.

The call stack is always as follows:

prvIdleTask
SysTick_Handler
xTaskIncrementTick
uxListRemove
assert_failed (aka configASSERT)

I guess I will personally check again for a wrong priority.