STM32F4 loses task

dpursell wrote on Tuesday, October 01, 2013:

Hello,

I’m developing using Red Suite 5 and FreeRTOS 7.5.2 on the STM32F4. A certain task seems to be getting dropped from the task list after running for an hour or so, after which the program will either behave erratically or hardfault right away.

When it crashes, the program appears to be trying to swap in an invalid task address (pxCurrentTCB == 0x0) in the PendSV interrupt handler. The internal bookkeeping data is also inconsistent: uxCurrentNumberOfTasks == 9, but only 8 tasks exist in the list returned by uxTaskGetSystemState(). In addition, pxReadyTasksLists[3].uxNumberOfItems == 2 (the lost task and another task ran at the same priority), but only one item appears to be valid; the other is a pointer to pxReadyTasksLists[3].xListEnd.

I’m 99% sure it’s not a stack problem. I’m using “#define configCHECK_FOR_STACK_OVERFLOW 2”, and whether I call uxTaskGetSystemState() or check the stacks manually in the debugger, no task has ever come within 150 bytes of the end of its stack, and the task in question has never come within 500 bytes.

I’m guessing I have an interrupt priority problem, since that seems to be the most common issue on the Cortex-M chips, but as far as I can tell I’m doing everything correctly.

// Definitions in FreeRTOSConfig.h
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY			0xF
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY	5
#define configKERNEL_INTERRUPT_PRIORITY 		( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 	( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )

// Called when the program starts before any other NVIC functions
NVIC_SetPriorityGrouping(3);

// Calls to install each individual interrupt, where:
//   irq is the interrupt number
//   prio is the assigned priority; I only use (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY + 0) through (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY + 2)
NVIC_SetPriority(irq, prio);
NVIC_EnableIRQ(irq);
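
As a sanity check on the arithmetic above, here is a minimal host-side sketch of how the library-style priorities map to the raw 8-bit values. It assumes configPRIO_BITS is 4 (the usual value for STM32F4 parts, but not stated in the post), and the helper names encode_priority() and may_call_from_isr_api() are made up for illustration; they are not part of FreeRTOS or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 implemented priority bits (configPRIO_BITS == 4). */
#define PRIO_BITS                               4u
#define LIBRARY_LOWEST_INTERRUPT_PRIORITY       0xFu
#define LIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY  5u

/* Hypothetical helper: shift a library-style priority (0..15) into the
 * top bits of the 8-bit priority field, as the config macros do. */
static uint8_t encode_priority(uint32_t library_prio)
{
    assert(library_prio < (1u << PRIO_BITS));
    return (uint8_t)(library_prio << (8u - PRIO_BITS));
}

/* Hypothetical helper: true if an ISR at this library-style priority is
 * allowed to call FreeRTOS "FromISR" API functions. On Cortex-M a larger
 * number means a logically lower priority, so the priority must be
 * numerically equal to or greater than the max syscall priority. */
static bool may_call_from_isr_api(uint32_t library_prio)
{
    return library_prio >= LIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY
        && library_prio <= LIBRARY_LOWEST_INTERRUPT_PRIORITY;
}
```

Under that assumption the kernel priority works out to 0xF0 and the max syscall priority to 0x50, and the priorities 5 through 7 used above all fall inside the range from which API calls are permitted.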

Here’s information from the hard fault:
R0 = 2001C064
R1 = 0
R2 = 0
R3 = 20000F74
R12 = 200004BC
LR [R14] = 557CF342 subroutine call return address
PC [R15] = 557CF342 program counter
PSR = 6000000E
BFAR = E000ED38
CFSR = 00000001
HFSR = 40000000
DFSR = 00000001
AFSR = 00000000
SCB_SHCSR = 00000400
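
For readers who want to interpret a dump like this, the following is a rough host-side sketch that names the bits set in CFSR, HFSR and SHCSR. The bit positions follow the ARMv7-M System Control Block layout; only a subset is listed, and decode_faults() and its helper types are hypothetical names, not a FreeRTOS or CMSIS API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct { uint32_t mask; const char *name; } fault_bit_t;

/* Subset of CFSR flags (MemManage, BusFault, UsageFault). */
static const fault_bit_t cfsr_bits[] = {
    { 1uL << 0,  "IACCVIOL" },  { 1uL << 1,  "DACCVIOL" },
    { 1uL << 3,  "MUNSTKERR" }, { 1uL << 4,  "MSTKERR" },
    { 1uL << 8,  "IBUSERR" },   { 1uL << 9,  "PRECISERR" },
    { 1uL << 10, "IMPRECISERR" },
    { 1uL << 16, "UNDEFINSTR" }, { 1uL << 17, "INVSTATE" },
    { 1uL << 18, "INVPC" },
};

/* HFSR flags. */
static const fault_bit_t hfsr_bits[] = {
    { 1uL << 1, "VECTTBL" }, { 1uL << 30, "FORCED" },
};

/* Subset of SHCSR "handler active" flags. */
static const fault_bit_t shcsr_bits[] = {
    { 1uL << 7,  "SVCALLACT" }, { 1uL << 10, "PENDSVACT" },
    { 1uL << 11, "SYSTICKACT" },
};

/* Append the name of every set flag to out, separated by spaces. */
static void append_flags(uint32_t value, const fault_bit_t *table,
                         size_t count, char *out, size_t out_len)
{
    for (size_t i = 0; i < count; i++) {
        if ((value & table[i].mask) != 0u) {
            size_t used = strlen(out);
            snprintf(out + used, out_len - used, "%s%s",
                     used ? " " : "", table[i].name);
        }
    }
}

/* Hypothetical helper: write a space-separated list of set flags to out. */
static void decode_faults(uint32_t cfsr, uint32_t hfsr, uint32_t shcsr,
                          char *out, size_t out_len)
{
    if (out_len == 0u) {
        return;
    }
    out[0] = '\0';
    append_flags(cfsr, cfsr_bits, sizeof cfsr_bits / sizeof cfsr_bits[0], out, out_len);
    append_flags(hfsr, hfsr_bits, sizeof hfsr_bits / sizeof hfsr_bits[0], out, out_len);
    append_flags(shcsr, shcsr_bits, sizeof shcsr_bits / sizeof shcsr_bits[0], out, out_len);
}
```

Fed the values above (CFSR 0x00000001, HFSR 0x40000000, SHCSR 0x00000400), this reports an instruction access violation that was escalated to a hard fault while PendSV was active. That fits a jump to a garbage address such as the PC value shown during a context switch.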

Any insight into what might be going wrong would be greatly appreciated. I’ve been banging my head against this for a few days without much progress. If anything is unclear I would be happy to elaborate.

Thanks,
David

rtel wrote on Tuesday, October 01, 2013:

NVIC_SetPriorityGrouping(3);

This is almost certainly the cause of your problem. Change the “3” to a “4”. As you are using V7.5.2 I would also recommend defining configASSERT() as some additional asserts were added to catch exactly this problem.

If you have not already done so, take a look at:

Noting the red text ;o)

Regards.

dpursell wrote on Tuesday, October 01, 2013:

Thanks for the quick response Richard,

I should have pointed this out, but I’m using the function NVIC_SetPriorityGrouping(), not NVIC_PriorityGroupConfig() (note slightly different name). I am not using the standard peripheral library so NVIC_PriorityGroupConfig() and NVIC_PriorityGroup_4 are not available. Instead I found this function in my core_cm4.h file:

static __INLINE void NVIC_SetPriorityGrouping(uint32_t PriorityGroup)
{
  uint32_t reg_value;
  uint32_t PriorityGroupTmp = (PriorityGroup & (uint32_t)0x07);               /* only values 0..7 are used          */

  reg_value  =  SCB->AIRCR;                                                   /* read old register configuration    */
  reg_value &= ~(SCB_AIRCR_VECTKEY_Msk | SCB_AIRCR_PRIGROUP_Msk);             /* clear bits to change               */
  reg_value  =  (reg_value                                 |
                ((uint32_t)0x5FA << SCB_AIRCR_VECTKEY_Pos) |
                (PriorityGroupTmp << 8));                                     /* Insert write key and priority group */
  SCB->AIRCR =  reg_value;
}

I am fairly certain that my call to NVIC_SetPriorityGrouping(3) does the exact same thing as your suggested NVIC_PriorityGroupConfig(NVIC_PriorityGroup_4) since NVIC_PriorityGroup_4 is defined as 0x300.
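
The equivalence can be checked on a host machine with a small sketch that mirrors the register arithmetic of the CMSIS function quoted above without touching any hardware. The function names aircr_with_grouping() and prigroup_field() and the local macro names are assumptions made for this illustration.

```c
#include <stdint.h>

/* AIRCR field layout: VECTKEY in bits 31:16, PRIGROUP in bits 10:8. */
#define AIRCR_VECTKEY_POS   16u
#define AIRCR_VECTKEY_MSK   (0xFFFFuL << AIRCR_VECTKEY_POS)
#define AIRCR_PRIGROUP_POS  8u
#define AIRCR_PRIGROUP_MSK  (7uL << AIRCR_PRIGROUP_POS)

/* Hypothetical helper: compute the value NVIC_SetPriorityGrouping()
 * would write to AIRCR, given the old register contents. */
static uint32_t aircr_with_grouping(uint32_t old_aircr, uint32_t priority_group)
{
    uint32_t value = old_aircr;

    /* Clear the key and grouping fields, keep everything else. */
    value &= ~(AIRCR_VECTKEY_MSK | AIRCR_PRIGROUP_MSK);

    /* Insert the write key and the new priority grouping. */
    value |= (0x5FAuL << AIRCR_VECTKEY_POS)
           | ((priority_group & 7uL) << AIRCR_PRIGROUP_POS);
    return value;
}

/* Hypothetical helper: extract the PRIGROUP field, still in position. */
static uint32_t prigroup_field(uint32_t aircr)
{
    return aircr & AIRCR_PRIGROUP_MSK;
}
```

Passing 3 yields a PRIGROUP field of 0x300, the same value as NVIC_PriorityGroup_4. With PRIGROUP set to 3 the split between group priority and subpriority falls below bit 4, so on a part with four implemented priority bits all of them act as preemption priority.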

Additionally I have tried using 0 instead of 3, and also have the assert defined as:

#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }	

but it never registers any problems.

rtel wrote on Tuesday, October 01, 2013:

Ah yes, I didn’t notice that subtle change.

Are you using the STM32 standard peripheral library at all? If not, you probably do not need to call any of the priority grouping functions and can just leave the grouping at its default. That is how all the other Cortex-M ports work. There is only a special case for STM32 because the standard peripheral library silently breaks your code if the priority grouping is left at its default value (you will see why if you step through the STM32 version of “priority set”).

Regards.

dpursell wrote on Tuesday, October 01, 2013:

Ah, thanks for the clarification. As suggested, I’ve removed the NVIC_SetPriorityGrouping() call, and the configASSERT still never triggers, so I assume you are correct that the call was not necessary in my case. However, the original problem remains: the same task continues to disappear after running for a while.

I will try to keep troubleshooting and post results here if I discover anything, in case others are having a similar problem. If anything comes to mind you think I should try, or any specific variables I should examine in the debugger, please let me know.

Thanks,
David

dpursell wrote on Wednesday, October 02, 2013:

It looks like the problem was caused by interrupt nesting, but I’m still not sure why the nesting caused corruption of the internal FreeRTOS structures. I doubt it’s a bug in FreeRTOS since nobody else seems to have noticed this, so I’m probably doing something wrong, but at least for now I’ve worked around it by just assigning my interrupts to the same priority.

I’m attaching a FreeRTOS+ Trace clip showing the problem. My two tasks are named “MPU” and “Amulet”, and they have the same FreeRTOS priority. This exact sequence seems to be present immediately before each crash:

  1. Idle task is active
  2. SPI interrupt (prio = 7) releases a semaphore
  3. “MPU”, waiting on SPI semaphore, becomes ready
  4. Before “MPU” becomes active, UART5 interrupt (prio = 6, logically higher than SPI) pushes into a queue
  5. “Amulet”, waiting on UART5 queue, becomes ready
  6. “Amulet” becomes active
  7. “MPU” tries to become active, but somehow pxCurrentTCB is now NULL and the program crashes

If by chance anyone is experiencing a similar issue, my advice is to set your interrupts to the same priority. That seemed to work for me.

Thanks,
David