Strange behavior of vTaskDelay() when optimization is turned off (O0)

I am building a project for an STM32 based on libopencm3 and FreeRTOS with a single task that blinks an LED with a delay.

void vApplicationStackOverflowHook(TaskHandle_t *pxTask, signed portCHAR *pcTaskName)
{
    (void)pxTask;
    (void)pcTaskName;

    gpio_set(GPIOD, GPIO14);

    while (1);
}

void vLedToggle(void *args)
{
    while (1)
    {
        gpio_toggle(GPIOD, GPIO12);
        vTaskDelay(pdMS_TO_TICKS(1000));
    }
}

int main(void)
{
    clock_setup();
    gpio_setup();

    xTaskCreate(vLedToggle, "vLedToggle", configMINIMAL_STACK_SIZE, NULL, tskIDLE_PRIORITY + 1, NULL);

    vTaskStartScheduler();

    return 0;
}

The project is built with optimization disabled, and the resulting firmware misbehaves. (Yes, enabling optimization makes the problem go away, but I want to understand the reason for this behavior.)

I noticed a strange thing: initially the LED is off; after the microcontroller starts up, it turns on, but it never goes off again. It turns out that the program ends up in blocking_handler() because of a Hard Fault (blocking_handler() is the standard libopencm3 stub for certain exceptions). Why this happens is not clear. I debug in VSCode using J-Link. Unfortunately, the call stack shows nothing interesting (the last function is my attempt to get information about the error; if I remove that code, the program ends up in blocking_handler()):

[Screenshot: call stack]

The second strange thing: if I put a breakpoint at the beginning of the task and step through all the functions (stepping into each one), everything works perfectly.

No interrupts are used. The OS is configured according to the documentation and examples (I can publish the contents of the file, if necessary; configASSERT() is defined). I tried to get the contents of the registers according to the instructions on the FreeRTOS website, but to no avail - pc, for example, contains: 0xA5A5A5A4.

(If I replace vTaskDelay() with a simple busy loop, everything works.)

What could be the problem, and how can I deal with it without changing the optimization level?

Just as a quick experiment, you could try increasing the stack size for your task. Use, for example, configMINIMAL_STACK_SIZE * 2 in your call to xTaskCreate().

How is configCHECK_FOR_STACK_OVERFLOW defined in FreeRTOSConfig.h?

A stack overflow is unlikely to be the problem: vApplicationStackOverflowHook() is not called, and besides, I already tried increasing the stack size.

configCHECK_FOR_STACK_OVERFLOW is set to 2.

Just checking some basic things. Does your linker file provide some heap space and main stack too?

And for clarification, you’re saying you get hard fault in vTaskDelay, unless you step through it in the debugger?

Linker file contents: https://pastebin.com/pvibDCMN

Yes, if I go through all the stages (macros, function calls) sequentially in the debugger, everything works (maybe it is a cumulative error). During a normal run, however, a Hard Fault occurs.

Is configUSE_MALLOC_FAILED_HOOK defined?

I can’t make a lot of sense out of what you’re seeing. Did you start this project with a FreeRTOS example project? Which one?

configUSE_MALLOC_FAILED_HOOK is set to 0.

Not exactly. I used the libopencm3 template, in which I configured the Makefile and added FreeRTOS (source and header files, as well as the configuration file).

Set configUSE_MALLOC_FAILED_HOOK to 1, and add the corresponding handler vApplicationMallocFailedHook().

If nothing else (and if nobody else on the forum has any better ideas) you may need to start with an example project from FreeRTOS or at least carefully review one as a reference.

I defined the function, but it did not help: the program never gets there.

Is there a debugger capable of tracing assembly code (like x64dbg for Windows, for example)?

And one more thing: could this be due to the FPU?
When creating the project, I selected the port file for the M4F core, but perhaps it needs some configuration to work correctly (the microcontroller does contain an FPU)?

Getting a trace buffer on STM32 requires an upgraded debugger (like J-Trace).

Possibly. Are you using the FreeRTOS port for Cortex M4? And is your STM32 a Cortex M4? You definitely have to use the port that matches your MCU. Which STM32 is it?

I don’t know anything about libopencm3, and it may need some configuration too. There are so many things that could go wrong, that’s why both ST and FreeRTOS provide working example projects. I use CubeIDE from ST (though I don’t use the code generator), newlib, and J-Link, and for what it’s worth they work fine, especially for the price. :grinning:

I am using an STM32F407VGT6 (STM32F4-DISC1). I took the port file for the GCC compiler and the M4F architecture.

I’m not sure I can help. There are compiler and linker settings, runtime library configuration questions, and you are not starting from a known-working demo project. If I were you I would switch to CubeIDE and start with a demo project. Then slowly replace their pieces with your pieces until you find out what breaks the project.

If you don’t do that, I think the best lead you have going for you is that the code works when you step through it in the debugger. Explore the space between single-stepping through the code and not using the debugger at all. (Lots of space in between those two extremes by placing breakpoints strategically.) You’ll need to understand how FreeRTOS works in order to do that, but that knowledge is worthwhile anyway.

First check the LED on and off functions execute as you expect by simply having a sequence of on, off, on, off in main() before you do anything else, then step over each call to turn the LED on and off in the debugger.

Which heap implementation are you using?

FreeRTOS requires a couple of interrupt handlers - SVC and PendSV. If you enter the first task ok then the required SVC handler must be installed ok, as SVC is used to start the first task. If the first task runs, but vTaskDelay() crashes, then it is possible you don’t have the correct PendSV handler installed. You said when you replace the call to vTaskDelay() with a null loop (one that doesn’t switch away from the task) it runs ok - when you do that does the tick interrupt execute? The simplest way to tell is to break in the debugger then inspect the value of xTickCount in tasks.c - if the tick interrupt is not executing it could mean PendSV is not installed.

I made several toggles in main() and everything is fine (it also works with a busy loop).

I’m using the fourth heap implementation (heap_4).

All handlers are installed and working (first SVC, then PendSV twice; the Hard Fault occurs on the last call. When pend_sv_handler is called the second time, xTickCount has increased from 0 to 1000):

[Screenshot]

Can you post the disassembly for pend_sv_handler(), with and without optimization? Curious about main stack (mis-)use.

Surprisingly, the functions turned out to be exactly the same, apart from the addresses, of course (https://pastebin.com/mB5RvM93 and https://pastebin.com/h0jVYEUX).

By the way, I noticed a stack overflow check (a macro) in the context-switching code. I looked there to find out exactly how the check works (in the context of the IDLE task, of course). I got the stack pointer and checked each value manually, and it only became more confusing: the first value matched, but the other three differ in the last byte. So why is the overflow handler not called? How can 128 bytes not be enough for such a simple task? Then again, perhaps I made a mistake somewhere, because the values do not match even before vLedToggle is entered.

Another thing: after the context switch and the return to vTaskDelay, the task ends up in xTaskResumeAll, where it appears to get stuck on the taskEXIT_CRITICAL macro, although in fact the mysterious address 0xa5a5a5a4 is again at the top of the stack (I did not see it on the stack when I checked manually), and the pc register holds this address. If I let the program run, it ends up in the Hard Fault handler; otherwise it simply never leaves this section.

It’s 128 32-bit words, and it’s the minimum stack required for any task.

Sounds like the stack pattern you are checking is not the overflow fence but the pattern for finding the high-water mark (perhaps you have INCLUDE_uxTaskGetStackHighWaterMark set to 1).

The disassembly you posted isn’t for pend_sv_handler(). It’s for xPortPendSVHandler(), which of course doesn’t change with optimization because it’s a naked pure-asm function.

Can you post the disassembly for pend_sv_handler(), with and without optimization? If it stacks the LR (or anything else) before calling xPortPendSVHandler(), that’s a bad thing, because that function doesn’t return. xPortPendSVHandler() is intended to be the ISR itself, not to be called by the ISR.

So, as many as 512 bytes are allocated per task?

What does that mean? INCLUDE_uxTaskGetStackHighWaterMark is disabled.

Does it not change because it is written in assembler or because it is declared naked?

Do you mean that xPortPendSVHandler should be placed directly in the vector table, and not called manually?

Without optimization:

08000290 <pend_sv_handler>:
 8000290:	b580      	push	{r7, lr}
 8000292:	af00      	add	r7, sp, #0
 8000294:	f000 fbe4 	bl	8000a60 <xPortPendSVHandler>
 8000298:	bf00      	nop
 800029a:	bd80      	pop	{r7, pc}

With optimization:

0800024c <pend_sv_handler>:
 800024c:	b508      	push	{r3, lr}
 800024e:	f000 fa07 	bl	8000660 <xPortPendSVHandler>
 8000252:	bd08      	pop	{r3, pc}

A minimum of 512 bytes are allocated per task, yes.
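
The arithmetic behind that figure, as a minimal host-side sketch: the stack depth passed to xTaskCreate() counts stack words, not bytes, and on a 32-bit Cortex-M port one stack word is 4 bytes. The names stack_word_t and stack_bytes are illustrative and are not FreeRTOS identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-in for the stack word type on a 32-bit Cortex-M port. */
typedef uint32_t stack_word_t;

/* Convert a stack depth given in words (as passed to xTaskCreate())
 * into the number of bytes reserved for the task stack. */
size_t stack_bytes(size_t depth_words)
{
    return depth_words * sizeof(stack_word_t);
}
```

With a depth of 128 words this gives 512 bytes, and doubling the depth gives 1024 bytes.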

When configCHECK_FOR_STACK_OVERFLOW is 2, or when INCLUDE_uxTaskGetStackHighWaterMark is 1, a task’s stack area is filled with a pattern before the task starts. This pattern helps FreeRTOS detect a “high-water” mark, or the maximum amount of stack the task actually uses. In the case of configCHECK_FOR_STACK_OVERFLOW, the pattern also helps FreeRTOS notice that the task used too much stack.
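
The idea can be sketched on a host machine, outside FreeRTOS. The sketch below is a simulation under stated assumptions, not the kernel's code: the fill value 0xA5 matches FreeRTOS's tskSTACK_FILL_BYTE, while fill_stack, use_stack, unused_bytes and pattern_as_pc are hypothetical helper names. It also shows one plausible reading of the 0xA5A5A4 address seen earlier in the thread: a word of untouched fill pattern loaded into pc, with bit 0 (the Thumb bit) cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FILL_BYTE 0xA5u /* same value as FreeRTOS's tskSTACK_FILL_BYTE */

/* Fill a simulated task stack with the known pattern, as is done
 * before the task first runs. */
void fill_stack(uint8_t *stack, size_t size)
{
    memset(stack, FILL_BYTE, size);
}

/* Simulate the task using `used` bytes. The stack grows downward on
 * Cortex-M, so usage overwrites the highest addresses first. */
void use_stack(uint8_t *stack, size_t size, size_t used)
{
    assert(used <= size);
    memset(stack + (size - used), 0x00, used);
}

/* Count the bytes at the low end that still hold the fill pattern:
 * the smallest amount of free stack seen so far. */
size_t unused_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == FILL_BYTE) {
        n++;
    }
    return n;
}

/* A word of untouched stack reads as 0xA5A5A5A5; with bit 0 cleared
 * it becomes 0xA5A5A5A4. */
uint32_t pattern_as_pc(void)
{
    return 0xA5A5A5A5u & ~1u;
}
```

For a 512-byte simulated stack of which 200 bytes have been used, unused_bytes() reports 312 bytes still holding the pattern.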

The compiler doesn’t try to optimize code inside __asm statements. Naked functions contain only __asm statements, so that code doesn’t change with/without optimization.

Yes, exactly.

The disassembly you posted shows the problem. Each time pend_sv_handler() executes, it pushes two registers onto the main stack. However, because xPortPendSVHandler() never returns, they never get cleaned up. The main stack continues to grow until it overflows and corrupts something important, perhaps in the heap or elsewhere.
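
The growth described above can be put into numbers with a small host-side sketch. It only models the bookkeeping (two 32-bit registers pushed per entry, never popped); the helper names are hypothetical, and the main stack size used in the note below is an assumption, since the thread does not state it.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes pushed by `push {r7, lr}` or `push {r3, lr}`: two 32-bit registers. */
#define BYTES_PER_ENTRY 8u

/* Main-stack bytes left behind after `entries` PendSV entries, if the
 * matching pop is never reached. */
size_t leaked_bytes(size_t entries)
{
    return entries * BYTES_PER_ENTRY;
}

/* Number of PendSV entries after which the leak has consumed a main
 * stack region of `main_stack_bytes`. */
size_t entries_until_full(size_t main_stack_bytes)
{
    assert(main_stack_bytes % BYTES_PER_ENTRY == 0);
    return main_stack_bytes / BYTES_PER_ENTRY;
}
```

Assuming, purely for illustration, a 1024-byte main stack, the leak would consume it after 128 PendSV entries.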

Installing xPortPendSVHandler directly in the vector table will solve the problem. FreeRTOS does this typically with #define, but you can probably edit the vector table directly if you want. Be sure to install vPortSVCHandler() the same way. xPortSysTickHandler() doesn’t have the same requirement.
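
A minimal sketch of the #define approach, assuming libopencm3's handler names for the Cortex-M system exceptions. pend_sv_handler appears earlier in this thread; sv_call_handler and sys_tick_handler are assumed from libopencm3's vector table and should be checked against the version in use. These lines would go in FreeRTOSConfig.h, and any hand-written wrapper functions with the same names would have to be removed so that the port's functions end up in the vector table directly.

```c
/* FreeRTOSConfig.h fragment (sketch): give the FreeRTOS port handlers the
 * names that libopencm3's vector table expects, so they are installed
 * directly instead of being called from a C wrapper. */
#define vPortSVCHandler     sv_call_handler   /* assumed libopencm3 name */
#define xPortPendSVHandler  pend_sv_handler   /* name seen in this thread */
#define xPortSysTickHandler sys_tick_handler  /* assumed libopencm3 name */
```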

One last mystery though. Why is your app running the PendSV handler so much? Do you have preemption turned off (configUSE_PREEMPTION)?