Race condition when using -O0 with GCC on Cortex-M4 (PendSV context corruption)

Target: Nucleo F411RE (Cortex-M4)
Toolchain: arm-none-eabi-gcc (multiple versions)
FreeRTOS: v26.4 (also reproduces on v24.4)
Optimization flags: works with -Og / -Os / -O1 / -O2, fails with -O0


Problem summary

I have a minimal FreeRTOS project with two tasks. One task toggles an LED every 500 ms using vTaskDelay(pdMS_TO_TICKS(500)). The other does almost nothing.
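
For reference, the 500 ms delay is converted to a tick count before the task blocks. Below is a minimal host-side sketch of that arithmetic, modelled on the pdMS_TO_TICKS macro. The function name ms_to_ticks is hypothetical, and the tick rates in the note that follows are illustrative assumptions, not values taken from the project.

```c
#include <stdint.h>

/* Host-side model of the millisecond-to-tick conversion.
 * ms_to_ticks is a hypothetical name; on target, pdMS_TO_TICKS
 * together with configTICK_RATE_HZ plays this role. */
static uint32_t ms_to_ticks(uint32_t time_ms, uint32_t tick_rate_hz)
{
    /* Widen to 64 bits so the multiplication cannot overflow,
     * then divide; the result truncates toward zero. */
    return (uint32_t)(((uint64_t)time_ms * tick_rate_hz) / 1000u);
}
```

Assuming a 1 kHz tick, ms_to_ticks(500, 1000) yields 500 ticks. With a 100 Hz tick the same delay is 50 ticks, and any delay shorter than one tick period truncates to 0.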

When compiled with -O0:

  • LED turns ON once

  • LED never turns OFF (the task blocks forever after the first vTaskDelay)

When compiled with any higher optimization (-Og, -Os, -O1, -O2), everything works correctly.

The same behavior occurs with FreeRTOS v24.4 and v26.4.


Root cause analysis (debugger + disassembly)

Stepping through vTaskDelay at -O0 shows:

  1. taskYIELD_WITHIN_API() writes PENDSVSET_BIT to ICSR.

  2. Immediately after that, the function epilogue executes:

    adds r7, #16
    mov sp, r7
    pop {r7, pc}
    
  3. R7 is modified in the epilogue (it is used as a frame pointer at -O0).

  4. PendSV does not fire immediately. It fires much later – after vTaskDelay has already returned and the task continues running.

  5. When PendSV finally saves the context, the saved R7 (and possibly SP) is corrupted.
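
To make step 5 concrete, here is a sketch of what a Cortex-M context switch saves. It assumes the basic 8-word exception frame with no floating-point state. The struct and enum names are hypothetical and only model the layout. The hardware stacks R0-R3, R12, LR, PC and xPSR on exception entry, and the context-switch handler stores at least R4-R11 (R7 included) in software.

```c
#include <stddef.h>
#include <stdint.h>

/* Words pushed by the hardware on exception entry (basic frame),
 * lowest address first. Hypothetical type name, layout model only. */
typedef struct {
    uint32_t r0, r1, r2, r3;
    uint32_t r12;
    uint32_t lr;    /* task's LR at the point of preemption */
    uint32_t pc;    /* instruction the task resumes at */
    uint32_t xpsr;
} hw_exception_frame_t;

/* Callee-saved registers that the context-switch handler stores
 * itself, lowest address first. Hypothetical type name. */
typedef struct {
    uint32_t r4, r5, r6, r7, r8, r9, r10, r11;
} sw_saved_regs_t;

_Static_assert(sizeof(hw_exception_frame_t) == 32, "basic frame is 8 words");
_Static_assert(sizeof(sw_saved_regs_t) == 32, "R4-R11 is 8 words");

/* Word index of R7 inside the software-saved block. */
enum { SW_R7_WORD_INDEX = offsetof(sw_saved_regs_t, r7) / sizeof(uint32_t) };
```

Under this layout assumption, the saved R7 of a switched-out task sits at word index 3 of the software-saved block, so it can be compared in the debugger with the value the task held just before the switch.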

At -Og or higher, the compiler:

  • stops using R7 as a frame pointer (-fomit-frame-pointer)

  • generates a much shorter epilogue, or none at all

As a result, the race window disappears.


Why this is not just a debug-build issue

The race condition is still present in optimized builds. It only becomes easier to reproduce at -O0 because:

  • the epilogue is longer and explicit

  • R7 is used as a frame pointer

  • more instructions exist between PendSV request and exception entry

If PendSV latency is high (due to other interrupts or system load), the same corruption could theoretically occur even at -Os or -O2, though much more rarely.

This is not about priority – even with PendSV at the highest priority, the exception does not fire immediately after the ICSR write. The CPU continues executing the epilogue before entering PendSV.
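
The yield request discussed here can be sketched as follows. The sketch is modelled on how the Cortex-M ports pend PendSV: a write of PENDSVSET (bit 28) to ICSR at 0xE000ED04, followed by DSB and ISB barriers. The names request_context_switch, ICSR_REG, YIELD_BARRIERS and fake_icsr are hypothetical, and the off-target branch exists only so that the sketch also compiles on a host.

```c
#include <stdint.h>

#define PENDSVSET_BIT (UINT32_C(1) << 28)   /* ICSR.PENDSVSET */

#if defined(__ARM_ARCH_7EM__) || defined(__ARM_ARCH_7M__)
/* On target: the real Interrupt Control and State Register. */
#define ICSR_REG (*(volatile uint32_t *)0xE000ED04UL)
#define YIELD_BARRIERS()                       \
    do {                                       \
        __asm volatile ("dsb" ::: "memory");   \
        __asm volatile ("isb");                \
    } while (0)
#else
/* Off target (assumption for host builds): a plain variable
 * stands in for the register and no barriers are issued. */
static volatile uint32_t fake_icsr;
#define ICSR_REG fake_icsr
#define YIELD_BARRIERS() do { } while (0)
#endif

/* Hypothetical helper: pend PendSV to request a context switch. */
static inline void request_context_switch(void)
{
    ICSR_REG = PENDSVSET_BIT;
    YIELD_BARRIERS();
}
```

Setting the bit only pends the exception. PendSV is taken once no exception of equal or higher priority is active and PRIMASK/BASEPRI do not mask it, so reading ICSR in the debugger right after the write shows whether the request is still pending.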


Historical context from the FreeRTOS forum

I have reviewed several old threads with similar symptoms (BusFault, 0xa5a5a5a5 stack pattern, return to 0x000000):

  • Thread 4257 (2013) – STM32F4 + -O0: “Cannot access memory at address 0xa5a5a5a5”. No root cause was identified at the time.

  • Thread 4736 (2013) – XMC4500 (Cortex-M4F): BusFault on critical section exit. Root cause was a wrong PendSV vector definition (veneer vs .long).

  • Thread 1485 (2014) – LPC43xx M0 core: missing SysTick + wrong port (M3/4 port used on M0).

In all those cases, the exact mechanism was not fully described. This report provides the missing detail:

R7 frame pointer corruption in the function epilogue, between the PendSV request and the exception entry, visible only at -O0.


Minimal reproducible example

A complete project with build/flash instructions is available.
Due to forum restrictions I cannot post a direct link, but the repository can be found by searching for nucleo on GitHub under user nikolay-rogovoy.

Steps to reproduce:

    git clone <repository_url>
    cd nucleo
    git submodule update --init --recursive libs/STM32CubeF4
    xpm run rebuild   # uses -Os by default
    xpm run flash

To see the failure: edit the Makefile, change -Os to -O0, rebuild, flash, and observe the LED (it stays ON).

Hardware: Nucleo F411RE
Debug probe: OpenOCD + GDB (multiarch)
Build system: XPM + GNU Make


Questions for the community

  1. Is this a known limitation of using -O0 with FreeRTOS on Cortex-M? If so, should it be mentioned in the documentation?

  2. Would marking vTaskDelay (and similar API functions that call taskYIELD) as __attribute__((naked)) be a correct fix? This would prevent the compiler from generating an epilogue after the yield request.

  3. Would a better fix be to implement taskYIELD via SVC instead of directly setting PendSV? That would at least move the problematic epilogue before the exception entry.

  4. Has anyone else observed PendSV context corruption at -O0 that disappears at -Og/-Os/-O2? I suspect this is a long‑standing but rarely documented issue.


Workaround for now

Use -Og, -Os, -O1, or -O2 in production builds.
Reserve -O0 for single-step debugging sessions that never rely on blocking API calls.


Thank you for any insights. I am happy to provide additional disassembly, debug logs, or test on other STM32F4 devices if needed.

vPortSVCHandler and xPortPendSVHandler are naked functions written in assembly. They must not be wrapped in a C function; they must be installed directly as the SVC and PendSV handlers. Please make the following changes:

  1. Delete this definition of SVC_Handler.
  2. Delete this definition of PendSV_Handler.
  3. Add the following lines to FreeRTOSConfig.h:
    #define vPortSVCHandler    SVC_Handler
    #define xPortPendSVHandler PendSV_Handler
    
  4. Use a timer other than SysTick to drive the HAL tick.