pxCurrentTCB gets NULL and trigger hard fault

pierrechevalier wrote on Tuesday, May 13, 2014:

Hi,

I am using cortex M3 port of FreeRtos V8 (but i had this problem with 7.5 also)for an stm32f207 pltaform.

The problem I am facing is that the pxCurrentTCB pointer gets null in particular conditions. I cannot find the reason why. It happens randomly in the time can be after 10minutes few seconds or 45 minutes.

I checked and rechecked the interrupt priorities and they seems correct (
-configMAX_SYSCALL_INTERRUPT_PRIORITY 0x10
-the highest prority that I am using is 3.
)

I added a configAssert to the vTaskSwitchContext to break when the pointer gets corrupted. Traces(with perceptio software) shows that the problem happens when a task gets ready just before an interrupt, and another task becomes ready during the interrupt (caused by a xSemaphoreGiveFromISR), after that a ready task becomes corrupted.
A total of 5 tasks are running (including idle and TmrSVC), and main source of interrupts are USART ( 2 usarts are particularly used at 115k and 230k )

Does anyone has an idea of what can be the cause of this ?

Thanks for your help !

rtel wrote on Tuesday, May 13, 2014:

Does anyone has an idea of what can be the cause of this ?

Nearly all support requests on STM32 parts come from incorrect interrupt priority assignment.

Did you take care of the instructions in bold red text half way down this page?

Also, if I don’t think 0x10 can be a valid value for configMAX_SYSCALL_INTERRUPT_PRIORITY as that particular constant takes a value in a format that is expected by the Cortex-M core itself, rather than any library function. This is also explained on the link above, and is particularly tricky on ARM Cortex parts. If you have configASSERT() defined and are using FreeRTOS V8.x.x then you should be hitting an assert if the configuration is wrong.

If there are 15 interrupt priorities on your STM32 (which there probably are) then only the top four bits of the Cortex-M interrupt registers are valid, so any value less that relies on the bottom four bits (as 0x01 does) will be ignored, and definitely cause a crash if interrupts are being used.

FreeRTOSConfig.h should contain something like the following, which first tests to see if the CMSIS _NVIC_PRIOR_BITS is defined, and if not, defines it manually. Then it derives the configMAX_SYSCALL_INTERRUPT_PRIORITY setting from a more natural (to a human) value:

/* Cortex-M specific definitions. */

#ifdef __NVIC_PRIO_BITS
/ __BVIC_PRIO_BITS will be specified when CMSIS is being used. /
#define configPRIO_BITS __NVIC_PRIO_BITS
#else
#define configPRIO_BITS 4 /* 15 priority levels */
#endif

/ The lowest interrupt priority that can be used in a call to a "set priority"
function.
/
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY 0xf

/ The highest interrupt priority that can be used by any interrupt service
routine that makes calls to interrupt safe FreeRTOS API functions. DO NOT CALL
INTERRUPT SAFE FREERTOS API FUNCTIONS FROM ANY INTERRUPT THAT HAS A HIGHER
PRIORITY THAN THIS! (higher priorities are lower numeric values.
/
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5

/ Interrupt priorities used by the kernel port layer itself. These are generic
to all Cortex-M ports, and do not rely on any particular library functions.
/
#define configKERNEL_INTERRUPT_PRIORITY ( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
/ !!! configMAX_SYSCALL_INTERRUPT_PRIORITY must not be set to zero !!!
See http://www.FreeRTOS.org/RTOS-Cortex-M3-M4.html.
/
#define configMAX_SYSCALL_INTERRUPT_PRIORITY ( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )

Regards.

rtel wrote on Tuesday, May 13, 2014:

Sorry the formatting in the above is all messed up, even when in HTML - try cutting and pasting into a C file - or just looking at a file in the FreeRTOS download for an example - for example: https://sourceforge.net/p/freertos/code/HEAD/tree/trunk/FreeRTOS/Demo/CORTEX_M4F_STM32F407ZG-SK/FreeRTOSConfig.h

Regards.

pierrechevalier wrote on Tuesday, May 13, 2014:

The configMAX_SYSCALL_INTERRUPT_PRIORITY is set to 0x10 in my case because as you said I have 4 bits priority, thus the calculation gives :
configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) = 1 <<(8-4) = 0x10
Because in my case I have set the configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY to 1( even if the max priority is actually 3 among all my interrupts, and 5 for the one that are using FreeRTOS functions).
configASSERT is defined indeed and there is no crash due to the priority, I actually tested using a lower priority for configMAX_SYSCALL_INTERRUPT_PRIORITY ( 6 ) to test the configASSERT feature and it crashed, so I can assume that my priorities are correctly set… no ?

To complete the description of my problem, all the tasks are set to tskIDLE_PRIORITY+1 ( except idle of course), and it seems that the pattern of the crash follows the following scenario :

  • T2 waits for semaphore S1
  • T1 becomes ready due to Tick.
  • Interrupt is triggered
  • The Interrupt gives the S1 semaphore, and T2 gets ready
  • T2 gets executed
  • T2 finishes its execution until it blocks for S1
  • At the next SVC, pxCurrentTCB is null after selecting highest priority task, T1 is not visible in any task list…

rtel wrote on Tuesday, May 13, 2014:

The configMAX_SYSCALL_INTERRUPT_PRIORITY is set to 0x10 in my case because

Sorry - discount the nonsensical part of my previous post - you are of course correct and I was suffering from brain fade.

Could you please post the interrupt service routine code.

Regards.

pierrechevalier wrote on Tuesday, May 13, 2014:

Usart 1 to 6 uses that kind of interrupt handler :

void USART1_IRQHandler() {
	static signed portBASE_TYPE xHigherPriorityTaskWoken;

	TRACE_ISR_ENTER(ISR_ID_UART1);
	xHigherPriorityTaskWoken = pdFALSE;
	USART_Isr(&UartDevices[COM0], &xHigherPriorityTaskWoken);
	TRACE_ISR_EXIT;

	portEND_SWITCHING_ISR( xHigherPriorityTaskWoken);
}

with Usart_ISR defined as below ( charReceivedEvent is a function pointer that can change depending on the usart but the usart concerned calls

void SensorSignalEventFromISR(signed portBASE_TYPE * xHigherPriorityTaskWoken){
	xSemaphoreGiveFromISR(sensorEvent,xHigherPriorityTaskWoken);
}

)
:

inline void USART_Isr(UART_DeviceConfig * puart,
		signed portBASE_TYPE * xHigherPriorityTaskWoken) {
	assert_param(puart);
	assert_param(puart->pUsart);

	volatile signed char cChar;

	/* USART transmit interrupt -------------------------------------------------*/
	if ((puart->eTxComMode == COM_TYPE_INTERRUPT)
			&& USART_GetITStatus(puart->pUsart, USART_IT_TXE ) == SET) {
		/* The interrupt was caused by the THR becoming empty.  Are there any
		 more characters to transmit? */
		if (Fifo_Pull(puart->txQueue, &cChar,1) > 0) {
			/* A character was retrieved from the buffer so can be sent to the
			 THR now. */
			USART_SendData(puart->pUsart, cChar);
		} else {
			USART_ITConfig(puart->pUsart, USART_IT_TXE, DISABLE);
		}
	}

	/* USART receive interrupt --------------------------------------------------*/
	if (puart->eRxComMode == COM_TYPE_INTERRUPT
			&& USART_GetITStatus(puart->pUsart, USART_IT_RXNE ) == SET) {
		if (USART_GetITStatus(puart->pUsart, USART_IT_ORE_RX ) == SET) {
			puart->overRunErrorCount++;
		}
		cChar = USART_ReceiveData(puart->pUsart);
		if (Fifo_Push(puart->rxQueue, &cChar,1) < 0) {
			puart->queueFullErrorCount++;
			// Queue full should not happen, used as statistic information
		}
		if (puart->charReceivedEvent) {
			puart->charReceivedEvent(xHigherPriorityTaskWoken);
		}
	} else if ((puart->eRxComMode == COM_TYPE_DMA)) {
		int curdatacounter = DMA_GetCurrDataCounter(puart->pRxDMAStream);
		int cnt=0;
		while ((puart->rxDMABufferCounter
				!= ( USART_DMA_RX_BUFFER_SIZE -curdatacounter))) {

			cChar = puart->rxDMABuffer[puart->rxDMABufferCounter];
			if (Fifo_Push(puart->rxQueue, &cChar,1) < 0) {
				puart->queueFullErrorCount++;
				// Queue full should not happen, used as statistic information
			}
			cnt++;
			puart->rxDMABufferCounter++;
			if (puart->rxDMABufferCounter >= USART_DMA_RX_BUFFER_SIZE) {
				puart->rxDMABufferCounter = 0;
			}
		}

		if (cnt!=0 && puart->charReceivedEvent) {
			puart->charReceivedEvent(xHigherPriorityTaskWoken);
		}
	}
}

Fifo function are simple fifo (doesn’t use any FreeRtos call, just character mode non blocking fifo)

rtel wrote on Tuesday, May 13, 2014:

So far, nothing obviously wrong. I will carry on just asking questions as we discuss it something may show up.

Has the code been proven to work in applications that don’t use FreeRTOS.
Does commenting out the TRACE_ISR_ENTER(ISR_ID_UART1); and TRACE_ISR_EXIT macros make any difference?

I imagine functions like USART_SendData() are just writing to registers directly, but is there a change one of them is doing something to the system’s interrupt status (BASEPRI or global interrupts - changing the peripheral’s interrupt enable/disable status should not make a difference - in fact changing the global interrupt status should not really make a difference either as FreeRTOS only uses BASEPRI).

Is there a chance the interrupt stack is overflowing? Interrupts re-use the stack that was originally used by the main() function, as after the scheduler is running there is no way back to main(). The size of the stack used by main is probably set in a project option or linker script.

Does commenting out portEND_SWITCHING_ISR( xHigherPriorityTaskWoken); resolve the issue? That would give a clue.

Regards.

pierrechevalier wrote on Wednesday, May 14, 2014:

Hi,

TRACE_ Macros are defined this way :

#define TRACE_ISR_ENTER(id)  vTraceStoreISRBegin(id);
#define TRACE_ISR_EXIT  vTraceStoreISREnd();

I tried to put them in critical sections but no change.
Send and receive functions are just basic register read/write indeed, no BASEPRI change.

I’ll check the interrupt stack overflow and try to comment the portEND_SWITCHING_ISR.

Thanks a lot for your help !

Best regards

pierrechevalier wrote on Wednesday, May 14, 2014:

I just tested to remove the portEND_SWITCHING_ISR and no change.
The stack didn’t overflow neither.

I’ll continue to find more clues on that problem…

pierrechevalier wrote on Wednesday, May 14, 2014:

After looking at older trace dump in which I didn’t identify the pattern before, I found out that both task could be NULL, either T1 or T2 in the example above.

I am currently testing without semaphore for waking the task and so far it works…

rtel wrote on Wednesday, May 14, 2014:

I would still somehow suspect the interrupt priorities, but you say you have FreeRTOS 8 and have configASSERT() defined? Does your definition of configASSERT() contain an infinite loop so there is no way out?

Have you made any changes to any code in the FreeRTOS/source directory - no matter how inconsequential you think it might be?

Could you try a different configMAX_SYSCALL_INTERRUPT_PRIORITY value, just in case there is some special case when it is 0x10 that I am not aware of. For example, try 0x70.

Regards.

pierrechevalier wrote on Wednesday, May 14, 2014:

According to other similar post on the forum I though that first, and since I noticed this problem I checked and rechecked the priority settings, but I never found a problem on that side.

I have made a single change to the source : adding configASSERT in task.c line 2149 :

taskSELECT_HIGHEST_PRIORITY_TASK();
configASSERT(pxCurrentTCB);
traceTASK_SWITCHED_IN();

configASSERT is set this way :

#define configASSERT(  x )     if( ( x ) == 0 ) vAssertCalled( __FILE__, __LINE__ )

void vAssertCalled( char * file, int line )
{
	DBG("ASSERT %s(%d)",file,line);
	taskDISABLE_INTERRUPTS();
	for( ;; );
}

As you see assert contains a loop and the I added a breakpoint at the first line, so no way out…

To check, i pasted the code chunk you gave me yesterday for setting interrupt priority, and i have set :

#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY	3

The problem is still present after that.

pierrechevalier wrote on Wednesday, May 14, 2014:

I added in application tick hook a trace marker.
And it seems that a tick happen just before the crash.

-It happens during usart interrupts
-A tick happen and change the current task to another
-When the running task gets blocked it goes back to the previous
-And the previous is null…

rtel wrote on Wednesday, May 14, 2014:

Can you please cut your project down to the absolute minimum amount of code that still exhibits this problem, ensure it builds without any reliance on absolute file paths, then send it (together with the project files, so not just the source files) to the ‘business contact’ address on the following page: http://www.freertos.org/contact

Regards.

pierrechevalier wrote on Thursday, May 15, 2014:

The problem is that the software runs on our custom platform, with interactions with other devices and cpu. The problem seems to appear when all the tasks are running.

A weird thing i just noticed :
In the pxReadyTasksListsArray i have for the priority 1 (the max priotrity of my tasks, so all the tasks should be at that level). if I look at the data structure to see the tasks, i have the following when i break randomly, before any crash happens :

*uxNumberOfItem = 2
*pxIndex
|...
\-pvOwner => points to T1
 -pxNext
 |...
 \-pvOwner => 0x00
  -pxNext
  |...
  \-pvOwner => points to T2
   -pxNext -> points to pxIndex

Does such a thing possible /correct ?

And attached a screenshot of what happens just before the crash.
There it happens not in usart interrupt, which shows that there is a link with the give semaphore, or the fact that a task gets ready from an interrupt, then immediatelly after, a tick happen ( tickEvent is triggered) and then an event show that a yield is pending…
The crash happens when comes the time to switch to the DSPCom task which is null…

pierrechevalier wrote on Thursday, May 15, 2014:

Am i doing something wrong in starting the tasks, as it seems that the list is weird ?

I do that :
-int main() creates Main task.
-vTaskStartScheduler()
-Main task starts 2 other tasks

is that right ?

rtel wrote on Thursday, May 15, 2014:

-pvOwner => 0x00

That is probably just the list end marker. You are starting to traverse the list from the pxIndex, which walks through the list. If you traversed the list from the start of the list you would probably find the null owner is the last in the list.

I do that :
-int main() creates Main task.
-vTaskStartScheduler()
-Main task starts 2 other tasks

You can create tasks both before and after the scheduler has started, so there should be no problem with that.

Regards.

pierrechevalier wrote on Thursday, May 15, 2014:

OK i see.

pierrechevalier wrote on Friday, May 16, 2014:

Another thing I noticed with traces, it seems interrupt nests inside the systick interrupt, that should not happen right ?
Systick priority is 15 and configMAX_SYSCALL_INTERRUPT_PRIORITY is 3 sot his seems to be ok, but what if in systick prvAddTaskToReadyList gets called, vListInsertEnd gets called too. If an interrupt raiss meanwhile in the middle of that call it can corrupt the list, this must be why you asking me to check for interrupt…
But then there must be something I don’t understand because systick should mask other interrupts right ?

rtel wrote on Friday, May 16, 2014:

The systick and yield interrupt handlers include critical sections that mask up to the max syscall interrupt priority to protect internal data structures that could also be accessed by an application interrupt.

Regards.