Target: Nucleo F411RE (Cortex-M4)
Toolchain: arm-none-eabi-gcc (multiple versions)
FreeRTOS: v26.4 (also reproduces on v24.4)
Optimization flags: works with -Og / -Os / -O1 / -O2, fails with -O0
Problem summary
I have a minimal FreeRTOS project with two tasks. One task toggles an LED every 500 ms using vTaskDelay(pdMS_TO_TICKS(500)). The other does almost nothing.
When compiled with -O0:
- LED turns ON once
- LED never turns OFF (the task blocks forever after the first vTaskDelay)
When compiled with any higher optimization (-Og, -Os, -O1, -O2), everything works correctly.
The same behavior occurs with FreeRTOS v24.4 and v26.4.
Root cause analysis (debugger + disassembly)
Stepping through vTaskDelay at -O0 shows:
- taskYIELD_WITHIN_API() writes PENDSVSET_BIT to ICSR.
- Immediately after that, the function epilogue executes:

      adds r7, #16
      mov sp, r7
      pop {r7, pc}

- R7 is modified in the epilogue (it is used as a frame pointer at -O0).
- PendSV does not fire immediately. It fires much later, after vTaskDelay has already returned and the task continues running.
- When PendSV finally saves the context, the saved R7 (and possibly SP) is corrupted.
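To confirm the "pended but not yet taken" state described above, the ICSR value can be decoded at the point where the epilogue runs. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the bit positions are the ARMv7-M ones (ICSR at 0xE000ED04, PENDSVSET at bit 28, VECTPENDING in bits 20:12, VECTACTIVE in bits 8:0). The helpers take the register value as a parameter so they can also be checked on a host.

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M Interrupt Control and State Register (ICSR). */
#define ICSR_ADDRESS           0xE000ED04u
#define ICSR_PENDSVSET         (1u << 28)   /* PendSV is pending */
#define ICSR_VECTPENDING_POS   12u
#define ICSR_VECTPENDING_MASK  (0x1FFu << ICSR_VECTPENDING_POS)
#define ICSR_VECTACTIVE_MASK   0x1FFu
#define PENDSV_EXCEPTION_NUM   14u

/* Hypothetical helper: true when the PendSV pending bit is set. */
static inline bool icsr_pendsv_is_pending(uint32_t icsr)
{
    return (icsr & ICSR_PENDSVSET) != 0u;
}

/* Hypothetical helper: exception number of the highest-priority
 * pending exception reported by ICSR (0 means none). */
static inline uint32_t icsr_vectpending(uint32_t icsr)
{
    return (icsr & ICSR_VECTPENDING_MASK) >> ICSR_VECTPENDING_POS;
}

/* Hypothetical helper: exception number currently being serviced
 * (0 means Thread mode, i.e. ordinary task code). */
static inline uint32_t icsr_vectactive(uint32_t icsr)
{
    return icsr & ICSR_VECTACTIVE_MASK;
}

/* True when PendSV has been requested while the CPU is still running
 * task code: the state this report claims to see during the epilogue. */
static inline bool pendsv_requested_but_not_taken(uint32_t icsr)
{
    return icsr_pendsv_is_pending(icsr) && icsr_vectactive(icsr) == 0u;
}
```

On the target the value would come from `*(volatile uint32_t *)ICSR_ADDRESS`; in GDB the same word can be dumped with `x/wx 0xE000ED04` while stepping through the epilogue.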
At -Og or higher:
- the compiler stops using R7 as a frame pointer (-fomit-frame-pointer)
- the epilogue becomes much shorter or disappears entirely
- the race window disappears
Why this is not just a debug-build issue
The race condition is still present in optimized builds. It only becomes easier to reproduce at -O0 because:
- the epilogue is longer and explicit
- R7 is used as a frame pointer
- more instructions exist between the PendSV request and exception entry
If PendSV latency is high (due to other interrupts or system load), the same corruption could theoretically occur even at -Os or -O2, though much more rarely.
This is not about priority: even with the highest PendSV priority, the exception does not fire immediately after the ICSR write. The CPU continues executing the epilogue before entering PendSV.
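To rule priority and masking in or out at the moment of the ICSR write, the PendSV priority field in SHPR3 and the current BASEPRI/PRIMASK values can be compared. This is a sketch under stated assumptions: the helper names are hypothetical, the layout is the ARMv7-M one (SHPR3 at 0xE000ED20, PendSV priority in bits 23:16, SysTick priority in bits 31:24), and the masking rule is the architectural one (a nonzero BASEPRI blocks every exception whose priority value is numerically greater than or equal to it; PRIMASK blocks all configurable-priority exceptions).

```c
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M System Handler Priority Register 3. */
#define SHPR3_ADDRESS 0xE000ED20u

/* Hypothetical helper: PendSV priority byte (exception 14). */
static inline uint8_t shpr3_pendsv_priority(uint32_t shpr3)
{
    return (uint8_t)((shpr3 >> 16) & 0xFFu);
}

/* Hypothetical helper: SysTick priority byte (exception 15). */
static inline uint8_t shpr3_systick_priority(uint32_t shpr3)
{
    return (uint8_t)((shpr3 >> 24) & 0xFFu);
}

/* Hypothetical helper: would an exception with this priority value be
 * held off by the current mask registers? Lower numeric value means
 * higher urgency on Cortex-M. */
static inline bool exception_is_masked(uint8_t priority,
                                       uint8_t basepri,
                                       bool primask_set)
{
    if (primask_set) {
        return true;                 /* PRIMASK masks everything configurable */
    }
    return basepri != 0u && priority >= basepri;
}
```

For example, with PendSV at 0xF0 and BASEPRI read back as 0x50 (an illustrative value, not taken from this project), `exception_is_masked(0xF0, 0x50, false)` is true, which would mean the request stays pending until BASEPRI is cleared. If the same check gives false at the ICSR write, masking can be excluded as the cause of the delay.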
Historical context from the FreeRTOS forum
I have reviewed several old threads with similar symptoms (BusFault, 0xa5a5a5a5 stack pattern, return to 0x000000):
- Thread 4257 (2013), STM32F4 + -O0: “Cannot access memory at address 0xa5a5a5a5”. No root cause was identified at the time.
- Thread 4736 (2013), XMC4500 (Cortex-M4F): BusFault on critical section exit. Root cause was a wrong PendSV vector definition (veneer vs .long).
- Thread 1485 (2014), LPC43xx M0 core: missing SysTick + wrong port (M3/4 port used on M0).
None of those threads describes the mechanism reported here. This report provides the missing detail: R7 frame pointer corruption in the function epilogue, between the PendSV request and the exception entry, visible only at -O0.
Minimal reproducible example
A complete project with build/flash instructions is available.
Due to forum restrictions I cannot post a direct link, but the repository can be found by searching for nucleo on GitHub under user nikolay-rogovoy.
Steps to reproduce:

    git clone <repository_url>
    cd nucleo
    git submodule update --init --recursive libs/STM32CubeF4
    xpm run rebuild   # uses -Os by default
    xpm run flash

To see the failure: edit Makefile, change -Os to -O0, rebuild, flash, and observe the LED (stays ON).
Hardware: Nucleo F411RE
Debug probe: OpenOCD + GDB (multiarch)
Build system: XPM + GNU Make
Questions for the community
- Is this a known limitation of using -O0 with FreeRTOS on Cortex-M? If so, should it be mentioned in the documentation?
- Would marking vTaskDelay (and similar API functions that call taskYIELD) as __attribute__((naked)) be a correct fix? This would prevent the compiler from generating an epilogue after the yield request.
- Would a better fix be to implement taskYIELD via SVC instead of directly setting PendSV? That would at least move the problematic epilogue before the exception entry.
- Has anyone else observed PendSV context corruption at -O0 that disappears at -Og/-Os/-O2? I suspect this is a long-standing but rarely documented issue.
Workaround for now
Use -Og, -Os, -O1, or -O2 in production builds.
Reserve -O0 only for single‑stepping debugging where you never rely on blocking API calls.
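To make an accidental -O0 build visible, the firmware can report how it was compiled. A minimal sketch, assuming GCC (which defines `__OPTIMIZE__` for every optimizing level, including -Og, and leaves it undefined at -O0); the function name is hypothetical:

```c
#include <stdbool.h>

/* Hypothetical helper: true when this translation unit was compiled
 * without optimization (GCC leaves __OPTIMIZE__ undefined at -O0). */
static inline bool build_is_unoptimized(void)
{
#ifdef __OPTIMIZE__
    return false;
#else
    return true;
#endif
}
```

The result could be printed over the debug UART at startup or used to blink a distinct LED pattern, so that a board flashed with an -O0 image is recognizable before the blocking-call failure shows up.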
Thank you for any insights. I am happy to provide additional disassembly, debug logs, or test on other STM32F4 devices if needed.