HardFault due to bad task handling

anonymous wrote on Wednesday, June 27, 2012:


I am implementing an application using FreeRTOS v7.1.1 on an STM32F2xx.
My main_Task is set at tskIDLE_PRIORITY + 1 and uses a binary semaphore to wait for an interrupt.
After a random runtime, the system HardFaults. This is because pxCurrentTCB ends up at 0x0.

I wondered why and found that, for a still unknown reason, my pxReadyTasksLists->pxIndex is still set to &(pxReadyTasksLists->xListEnd) after the task has been resumed from the ISR.

I use xSemaphoreGiveFromISR for this, and the ISR priority is configured between configMAX_SYSCALL_INTERRUPT_PRIORITY and configLIBRARY_KERNEL_INTERRUPT_PRIORITY.

Could someone help me find some new leads toward a solution?


anonymous wrote on Wednesday, June 27, 2012:

I forgot to mention that running a dummy_task at the same priority makes the system work fine. But that workaround does not really satisfy me :confused:

void dummy_task(void *pvParameters) {
	while(true) { /* loop body not preserved in the original post */ }
}

rtel wrote on Wednesday, June 27, 2012:

So, if I follow what you are saying, a task at priority 1 is being signaled by an interrupt via a semaphore, and this is working ok for some time before the system crashes.  When the system crashes, pxCurrentTCB is NULL, and the ready tasks list for tasks of priority 1 is empty.

Even with the task not referenced from the ready list, the scheduler should at least select the idle task to run, so pxCurrentTCB should never be zero.  This points to a corruption of the data structures somewhere.

It sounds like you have debugged this quite deeply, and are already familiar with the interrupt priority configuration parameters, so my usual reply of “it’s probably the interrupt priority settings” might not apply here.  However, it is still worth double checking that the interrupts are running at the priority you think they are, and that no interrupts you are not aware of are running.  These are the items documented on the http://www.freertos.org/FAQHelp.html and http://www.freertos.org/RTOS-Cortex-M3-M4.html web pages.

After that it is a matter of looking at any other potential source of corruption in your code, where are buffers being used, is any shared memory being used, are resources being used by multiple tasks, etc.

Can you keep cutting bits out of the code until the problem stops happening, so you have the smallest amount of code left to look at?

It is also possible to look at the task’s TCB itself, and see if you can work out where the task is.  If it is not in its ready list, is it referenced from another list somewhere?  You can do this by obtaining the task’s TCB from its handle, then looking at the generic list item’s container value.  It should always be one of the ‘state’ lists, be that one of the ready lists, the blocked list, etc.  In this case, though, that is probably the last resort option, as learning where your task is after the problem has occurred is not necessarily going to lead you directly to the actual cause of the problem.
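
To make the container check above concrete, here is a minimal sketch. Every type and name in it (Tcb, ListItem, StateList, task_location) is a hypothetical stand-in invented for illustration; the real TCB layout is private to the kernel, so on a target you would more likely inspect the equivalent fields in a debugger.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the real TCB and list types are private to the kernel. */
typedef struct { const char *name; } StateList;
typedef struct { void *container; } ListItem;   /* list currently holding the item */
typedef struct { ListItem state_item; } Tcb;

static StateList ready_prio1 = { "ready[1]" };
static StateList delayed     = { "delayed" };
static StateList suspended   = { "suspended" };

/* Return the name of the state list that holds the task, or "none". */
static const char *task_location(const void *task_handle)
{
    const Tcb *tcb = (const Tcb *)task_handle;  /* assumption: the handle points at the TCB */
    assert(tcb != NULL);
    StateList *known[] = { &ready_prio1, &delayed, &suspended };
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (tcb->state_item.container == known[i]) {
            return known[i]->name;
        }
    }
    return "none";  /* not in any known state list: worth suspecting corruption */
}
```

A result of "none" for a task that should be ready or blocked would point towards corrupted data structures rather than a scheduling decision.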


anonymous wrote on Thursday, June 28, 2012:

As I don’t need any interrupts that run without kernel calls, my kernel interrupt priority is set to the lowest (15 on Cortex-M3), and I allow all interrupts to use system calls, so they are all masked while the kernel is in a critical section.

In fact, when the system crashes, the task is in the ready tasks list for priority 1.
But the pxIndex of the list points to the xListEnd item. That’s why ->pvOwner is NULL and pxCurrentTCB gets the NULL pointer.
This gave me the idea to test with a dummy_task, just so that pxIndex always has a task to point to.
So I suspect that somewhere pxIndex is not updated correctly, but that seems too big a bug to be the real cause.

I will investigate this today.
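
For readers following along, here is a simplified model of a circular ready list with an end marker. It is a sketch only: the types and function names are stand-ins, not the FreeRTOS definitions, and the insert routine is deliberately simpler than the kernel's. It shows that a round-robin walk which skips the end marker once always returns a real owner while the list holds at least one item, and returns the end marker's (NULL) owner only when the list is effectively empty.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list types (NOT the real definitions). */
typedef struct Item {
    struct Item *next;
    struct Item *prev;
    void *owner;        /* the TCB owning this item; NULL for the end marker */
} Item;

typedef struct {
    size_t count;       /* number of real items, excluding the end marker */
    Item *index;        /* walk position, advanced on every selection */
    Item end;           /* end marker that closes the circular list */
} List;

static void list_init(List *l)
{
    assert(l != NULL);
    l->count = 0;
    l->end.next = &l->end;
    l->end.prev = &l->end;
    l->end.owner = NULL;
    l->index = &l->end;
}

/* Insert just before the end marker, i.e. at the tail. */
static void list_insert_end(List *l, Item *it)
{
    it->next = &l->end;
    it->prev = l->end.prev;
    l->end.prev->next = it;
    l->end.prev = it;
    l->count++;
}

static void list_remove(List *l, Item *it)
{
    it->prev->next = it->next;
    it->next->prev = it->prev;
    if (l->index == it) {
        l->index = it->prev;    /* keep the walk position valid */
    }
    l->count--;
}

/* Round-robin step: advance, skip the end marker once, return the owner. */
static void *list_owner_of_next(List *l)
{
    l->index = l->index->next;
    if (l->index == &l->end) {
        l->index = l->index->next;
    }
    return l->index->owner;
}
```

In this model, a NULL owner can only come out of a list that the caller wrongly believes to be non-empty, or one whose links were modified halfway through by something else, such as an interrupt that was not masked during the update.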

anonymous wrote on Thursday, June 28, 2012:


As everything looked fine in the list management (which I did not really doubt), I reviewed the FreeRTOS interrupt priority settings.
As mentioned in my previous post, I had:

#define configKERNEL_INTERRUPT_PRIORITY 	0xFF	/* Priority 15 */
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 	0x00		/* Priority 0 */

configMAX_SYSCALL_INTERRUPT_PRIORITY is written to BASEPRI on the Cortex-M3. I read the programming manual entry for this register, and it says (I quote):

BASEPRI Priority mask bits(1)
0x00: no effect
Nonzero: defines the base priority for exception processing.

I switched to 0x10 and everything seems to be fine now.

Thank you very much for your answer.
I’m sorry for the disturbance.
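
For anyone hitting the same problem, here is a small host-side model of the BASEPRI behaviour quoted above. It is a sketch under assumptions: PRIO_BITS is set to 4 on the assumption that the part implements four priority bits, and the function names are made up for illustration. It is not kernel or CMSIS code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u    /* assumption: 4 implemented priority bits */

/* Convert a priority level (0 = most urgent, 15 = least urgent) to the raw
   byte, where only the top PRIO_BITS bits are significant. */
static uint8_t prio_level_to_raw(uint8_t level)
{
    assert(level < (1u << PRIO_BITS));
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* Model of BASEPRI: 0 disables masking entirely; otherwise interrupts whose
   raw priority value is greater than or equal to BASEPRI are blocked. */
static bool basepri_masks(uint8_t basepri, uint8_t irq_raw_prio)
{
    if (basepri == 0u) {
        return false;
    }
    return irq_raw_prio >= basepri;
}
```

With a value of 0x00, basepri_masks() is false for every interrupt, so in this model the kernel's critical sections protect nothing. With 0x10 (priority level 1), every interrupt at level 1 or lower urgency is masked, and only level 0 stays unmasked.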


rtel wrote on Thursday, June 28, 2012:

Glad it works now (not setting a value of 0 is documented on this page: http://www.freertos.org/RTOS-Cortex-M3-M4.html )