FreeRTOS hangs when no task needs to run.

martin-etla wrote on Tuesday, June 18, 2019:

MCU: STM32F030
FreeRTOS: 10.2.1
Toolchain: gcc-arm-none-eabi-8-2018-q4-major-linux.tar.bz2
Debugger: Segger Ozone V2.62, Win 10 x64

I’m stumped. Consider this code:

void task1(void *x)
{
    (void)x;

    while (1)
    {
        HAL_GPIO_WritePin(GPIOC, LD3_Pin, GPIO_PIN_SET);
        vTaskDelay(configTICK_RATE_HZ);
        HAL_GPIO_WritePin(GPIOC, LD3_Pin, GPIO_PIN_RESET);
        vTaskDelay(configTICK_RATE_HZ);
    }
}

void task2(void *x)
{
    (void)x;

    while (1)
    {
        HAL_GPIO_WritePin(GPIOC, LD4_Pin, GPIO_PIN_SET);
        vTaskDelay(configTICK_RATE_HZ*1.3);
        HAL_GPIO_WritePin(GPIOC, LD4_Pin, GPIO_PIN_RESET);
        vTaskDelay(configTICK_RATE_HZ*1.3);
    }
}

void task3(void *x)
{
    (void)x;

    while (1)
    {
    }
}

int main()
{
    MX_GPIO_Init();

    xTaskCreate(task1, "test1", 100, 0, tskIDLE_PRIORITY, 0);
    xTaskCreate(task2, "test2", 100, 0, tskIDLE_PRIORITY, 0);
    xTaskCreate(task3, "test3", 100, 0, tskIDLE_PRIORITY, 0);
    xPortStartScheduler();
    while (1)
        ;
    return 0;
}

This works perfectly, but if I add a vTaskDelay in task3 or comment out the xTaskCreate for task3, the kernel gets stuck in xTaskRemoveFromUnorderedEventList (in tasks.c) according to the debugger. The disassembly at that point looks like this:

0800136c <vTaskSwitchContext>:
 800136c:       b580            push    {r7, lr}
 800136e:       b082            sub     sp, #8
 8001370:       af00            add     r7, sp, #0
 8001372:       4b22            ldr     r3, [pc, #136]  ; (80013fc <vTaskSwitchContext+0x90>)
 8001374:       681b            ldr     r3, [r3, #0]
 8001376:       2b00            cmp     r3, #0
 8001378:       d003            beq.n   8001382 <vTaskSwitchContext+0x16>
 800137a:       4b21            ldr     r3, [pc, #132]  ; (8001400 <vTaskSwitchContext+0x94>)
 800137c:       2201            movs    r2, #1
 800137e:       601a            str     r2, [r3, #0]
 8001380:       e037            b.n     80013f2 <vTaskSwitchContext+0x86>
 8001382:       4b1f            ldr     r3, [pc, #124]  ; (8001400 <vTaskSwitchContext+0x94>)
 8001384:       2200            movs    r2, #0
 8001386:       601a            str     r2, [r3, #0]
 8001388:       4b1e            ldr     r3, [pc, #120]  ; (8001404 <vTaskSwitchContext+0x98>)
 800138a:       681b            ldr     r3, [r3, #0]
 800138c:       607b            str     r3, [r7, #4]
 800138e:       e007            b.n     80013a0 <vTaskSwitchContext+0x34>
 8001390:       687b            ldr     r3, [r7, #4]
 8001392:       2b00            cmp     r3, #0
 8001394:       d101            bne.n   800139a <vTaskSwitchContext+0x2e>
 8001396:       b672            cpsid   i
 
 It's stuck in an infinite loop here:
 8001398:       e7fe            b.n     8001398 <vTaskSwitchContext+0x2c>
 
 800139a:       687b            ldr     r3, [r7, #4]
 800139c:       3b01            subs    r3, #1
 800139e:       607b            str     r3, [r7, #4]
 80013a0:       4919            ldr     r1, [pc, #100]  ; (8001408 <vTaskSwitchContext+0x9c>)

So it confirms that it’s stuck without being able to task switch, but I simply can’t figure out why. I’ve checked the vector table and it seems fine, with the correct vectors pointing at SVC_Handler, PendSV_Handler and SysTick_Handler.

If I add a breakpoint in xPortSysTickHandler, it triggers when task3 is running as an infinite loop, but it’s not hit even once if I either disable task3 or add a vTaskDelay in it.

So as I said, I’m stumped. And I’ve googled myself silly.

Help?

rtel wrote on Tuesday, June 18, 2019:

At address 8001398 it is branching to itself, right? Do you have
configASSERT() implemented as an infinite loop when the assert fails?
If so I would guess you have failed an assert test. Please post the C
code as well as the asm code so we can see.
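
For context, many FreeRTOS demo projects define configASSERT() in FreeRTOSConfig.h roughly as sketched below. Whether this project uses exactly this definition is an assumption, since the poster's FreeRTOSConfig.h is not shown in the thread. Disabling interrupts and then spinning would be consistent with the `cpsid i` followed by a branch-to-self in the disassembly above.

```c
/* FreeRTOSConfig.h (sketch only; the poster's actual definition is not shown) */

/* If the asserted expression is false, mask interrupts and spin forever
 * so that a debugger can be attached and the call site inspected. */
#define configASSERT( x ) \
    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
```

With a definition like this, a failed assert looks like a hang: the tick interrupt stops firing and the program counter sits on a single branch instruction.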

heinbali01 wrote on Tuesday, June 18, 2019:

Richard wrote:

Please post the C code as well as the asm code so we can see.

Or, if you can produce it, can you post an LSS file, which shows both C-code and assembler?

martin-etla wrote on Wednesday, June 19, 2019:

Hm. It hangs in tasks.c at line 2999 (“taskSELECT_HIGHEST_PRIORITY_TASK();”) which is a macro that expands to:

#define taskSELECT_HIGHEST_PRIORITY_TASK() \
{ \
 UBaseType_t uxTopPriority; \
 /* Find the highest priority list that contains ready tasks. */ \
 portGET_HIGHEST_PRIORITY( uxTopPriority, uxTopReadyPriority ); \
 configASSERT( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ uxTopPriority ] ) ) > 0 ); \
 listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopPriority ] ) ); \
} /* taskSELECT_HIGHEST_PRIORITY_TASK() */

So I found that there actually was a configASSERT defined and that it hung the MCU! Thank you for that. I removed the assert, but after that it goes into a HardFault exception instead, in startup_stm32f030x8.s at:

/**
 * @brief  This is the code that gets called when the processor receives an
 *         unexpected interrupt.  This simply enters an infinite loop, preserving
 *         the system state for examination by a debugger.
 *
 * @param  None
 * @retval : None
*/
    .section .text.Default_Handler,"ax",%progbits
Default_Handler:
Infinite_Loop:
  b Infinite_Loop

This is weird, never seen this before when I’ve used FreeRTOS. Surely FreeRTOS should just idle until a task is ready again?

I’ve objdumped both the working and broken version of the elfs and zipped them. The only difference between the two is that the version that hangs has a vTaskDelay(10) in task3, otherwise the code is identical.

martin-etla wrote on Wednesday, June 19, 2019:

See below. Yep, it was an assert in a macro that I missed, but now it goes into an exception instead. Doh.

rtel wrote on Wednesday, June 19, 2019:

The assert is checking for a condition that should never happen, and if it does happen, will cause the code to crash. Therefore, by removing the assert, you are removing the check, and the code continues to the point where it has already predicted that it will crash. In this case it has selected a priority at which there are no tasks - hence it can’t select a task. It looks like basic data corruption somewhere - most likely to do with stack overflow (do you have overflow checking on?) or an invalid interrupt priority.
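
A minimal sketch of switching on the overflow checking mentioned above, assuming a standard FreeRTOS 10.x project. configCHECK_FOR_STACK_OVERFLOW and vApplicationStackOverflowHook() are the kernel's own names; the body of the hook is only illustrative, and the two parts belong in different files as the comments indicate.

```c
/* --- In FreeRTOSConfig.h --- */
/* 1 = check the saved stack pointer only;
 * 2 = additionally check that the fill pattern at the end of the stack is intact. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* --- In application code (e.g. main.c) --- */
#include "FreeRTOS.h"
#include "task.h"

/* Called by the kernel when it detects that a task has overflowed its stack. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;   /* Inspect in the debugger to see which task overflowed. */

    taskDISABLE_INTERRUPTS();
    for( ;; )
    {
        /* Halt here so the overflow is noticed rather than silently corrupting data. */
    }
}
```

The check runs when a task is switched out, so it may not catch every overflow, but it usually points at the offending task long before an unrelated assert fires.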

martin-etla wrote on Thursday, June 20, 2019:

Doh.

I had already tried to increase the stack for the big task (in the “real” code, not in this simplified example) and didn’t see any difference. I didn’t even think of increasing the stack size in the little simple “blink an LED” task, because I couldn’t imagine that blinking an LED would need a stack bigger than 100. I guess that was really naïve.

Today’s lesson: NEVER assume that the stack is big enough. And always activate the overflow checking.
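
To put a number on "big enough": the stack depth passed to xTaskCreate() is counted in words, not bytes, so 100 is only 400 bytes on a 32-bit port such as this one. In a real project the kernel's uxTaskGetStackHighWaterMark() (with INCLUDE_uxTaskGetStackHighWaterMark set to 1) reports the smallest amount of free stack a task has had. The sketch below is a host-side model of that idea, not kernel code, and every function name in it is hypothetical. It assumes a stack that grows downward and is painted with 0xA5, the fill value FreeRTOS uses when the relevant options are enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u  /* same value as FreeRTOS's tskSTACK_FILL_BYTE */

/* Bytes occupied by a stack of `depth_words` 32-bit words. */
size_t stack_bytes(size_t depth_words)
{
    return depth_words * sizeof(uint32_t);
}

/* Paint the whole stack with the fill byte before the task first runs. */
void stack_paint(uint8_t *stack, size_t len)
{
    assert(stack != NULL);
    memset(stack, FILL_BYTE, len);
}

/* Count bytes that still hold the fill pattern, starting at the lowest
 * address. With a downward-growing stack this is the smallest amount of
 * free space the stack has had so far (the "high water mark"). */
size_t stack_high_water_mark(const uint8_t *stack, size_t len)
{
    assert(stack != NULL);
    size_t free_bytes = 0;
    while (free_bytes < len && stack[free_bytes] == FILL_BYTE) {
        free_bytes++;
    }
    return free_bytes;
}
```

A result close to zero means the task is about to overflow, or already has, and its stack depth should be raised.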

Thanks.