vListInsert() hang during multi-threaded stress scenarios on STM32F446

rksheth wrote on Tuesday, March 22, 2016:

I’m running FreeRTOS 8.2.1 on an STM32F446 MCU. Compiler: gcc-arm-none-eabi-4_9-2015q3

I’m aware that there are some common causes of this issue; I’ve investigated them but have not found any fundamental problems so far. I tend to see the hang crop up when I’ve got 5+ threads actively blocking and releasing each other on their respective queues. There are also some timer ISRs giving to semaphores and queues. The logic in each of these tasks has been tested to be stable when running alone - it’s only when I put it all together that I start seeing the hang. I see it most often in a thread that makes heavy use of a USART peripheral.

Here are the root causes that I’ve investigated, based on the comment in list.c above the hang location:

  1. Stack Overflow
    I’ve set stack overflow checking to level 2. To double-check, I took the pointer to the top of the current task’s stack from pxCurrentTCB after a hang and examined the memory around it. I see the 0xa5a5a5 striping at the top. I’m also using 1024 bytes of stack for my threads, and tried increasing it to 2048 to see if the hang went away - no luck.

  2. Incorrect Interrupt Priority Assignment
    I changed all interrupt priorities and subpriorities to 0xff with the exception of the systick handler, which is still at 0. I’m using NVIC priority grouping 4. To ensure I’m not forgetting to configure a priority for an interrupt that’s being enabled, I grep’d for every call of HAL_NVIC_SetPriority() and HAL_NVIC_EnableIRQ() to ensure that they’re always used together.

  3. Calling an API function from an ISR
    I checked all my interrupt handlers, and they are indeed using the FromISR functions for queue/semaphore ‘gives’.

  4. Using Uninitialized Queues or Semaphores
    All threads operate normally for some amount of time before the hang happens, and I explicitly initialize all queues/semaphores in one location before the scheduler is started.
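
For the stack check in item 1, here is a rough sketch of the idea behind the 0xa5 striping. FreeRTOS fills each task stack with 0xa5 bytes when the task is created, and because the Cortex-M stack grows downward, the bytes at the lowest addresses of the stack buffer stay untouched unless the stack has been used that deeply. The helper name below is hypothetical; on a real target one would normally call uxTaskGetStackHighWaterMark() (with INCLUDE_uxTaskGetStackHighWaterMark set to 1) rather than scanning memory by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill value FreeRTOS writes into a task stack when the task is created. */
#define STACK_FILL_BYTE 0xA5u

/*
 * Hypothetical helper: count how many bytes at the low end of a stack buffer
 * still hold the fill pattern. On a downward-growing stack this is the
 * amount of stack that has never been written to.
 */
static size_t stack_unused_bytes(const uint8_t *stack_base, size_t stack_size)
{
    assert(stack_base != NULL);

    size_t unused = 0;
    while (unused < stack_size && stack_base[unused] == STACK_FILL_BYTE) {
        unused++;
    }
    return unused;
}
```

A result of 0 means the pattern at the very end of the stack has been overwritten at some point. An intact pattern is a good sign but not proof: a large local buffer can move the stack pointer past the checked bytes without writing to them.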

If anyone can see a glaring flaw in my debug/thought process or has suggestions on how to proceed, I would appreciate it greatly. I’ve been banging my head at this for a few days now.


rtel wrote on Tuesday, March 22, 2016:

> I changed all interrupt priorities and subpriorities to 0xff with the exception of the systick handler, which is still at 0

The systick handler MUST be at the lowest possible priority; you have it set to the highest possible.
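
To make the numbers concrete, here is a small sketch of how Cortex-M priorities work on a part that implements 4 priority bits, which is the case for the STM32F4. Numerically lower values are more urgent, so 0 is the highest priority and 15 (0xF0 in the 8-bit priority register) is the lowest. The syscall threshold of 5 is only an assumed value for illustration and is project specific; the helper names are hypothetical and not part of FreeRTOS or the HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS            4u   /* priority bits implemented on STM32F4 */
#define LOWEST_PRIORITY      15u  /* least urgent level with 4 bits */
#define MAX_SYSCALL_PRIORITY 5u   /* assumed threshold, project specific */

/*
 * Shift a 0..15 priority level into the top bits of the 8-bit priority
 * register. This is the shifted form that configKERNEL_INTERRUPT_PRIORITY
 * and configMAX_SYSCALL_INTERRUPT_PRIORITY use.
 */
static uint8_t to_register_value(uint8_t level)
{
    assert(level <= LOWEST_PRIORITY);
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/*
 * An ISR may call the ...FromISR() functions only if it is no more urgent
 * than the syscall threshold, i.e. its level is numerically >= the threshold.
 */
static bool may_call_from_isr_api(uint8_t level)
{
    assert(level <= LOWEST_PRIORITY);
    return level >= MAX_SYSCALL_PRIORITY;
}
```

Under these assumptions, to_register_value(15) gives 0xF0, the lowest priority, and may_call_from_isr_api(0) is false: an interrupt left at level 0 sits above the threshold and must not use the FromISR API.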

dirkshrivels wrote on Wednesday, December 13, 2017:

rks… did you ever solve this?