I thought I would share this bug I reported on gcc that may affect FreeRTOS users and see if there were any thoughts or ideas on mitigating it.
I was able to reproduce unexpected behavior on stock FreeRTOS V11.2.0 with the GCC/ARM_CM4F port and the following program snippet when compiling with -O3 -flto: the waiter function is compiled into an infinite loop instead of re-reading flag on each iteration. Every version of gcc I tried on godbolt showed similar behavior with -O1 -fwhole-program on a simple example with an explicit "memory" clobber. In FreeRTOS, I believe the "memory" clobber within portYIELD should be sufficient to tell the compiler that it must reload any memory it accessed earlier rather than assume it is unchanged, but that’s not what occurs.
int flag;                   /* deliberately neither volatile nor _Atomic */
SemaphoreHandle_t mutex;    /* protects flag */

static void setter(void *)
{
    xSemaphoreTake(mutex, portMAX_DELAY);
    flag = 1;
    xSemaphoreGive(mutex);
}

static void waiter(void *)
{
    for (;;)
    {
        xSemaphoreTake(mutex, portMAX_DELAY);
        int done = flag;    /* expected to be re-read on every iteration */
        xSemaphoreGive(mutex);
        if (done)
            break;
    }
}
If I wanted that code to work then I would declare flag with type volatile int, or if compiling in C++ with type std::atomic<int>. I guess the issue is that the interprocedural analysis that the linker does doesn’t understand the context switch code invoked by xSemaphoreTake. Even if it did, other examples could be constructed whereby the context switch is triggered by an interrupt, making it invisible to the linker when it analyses these functions.
Nevertheless, I see your point that portYIELD uses a gcc memory clobber, and one would hope that this information is preserved in the files read by the linker, and taken account of.
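The _Atomic suggestion above might look roughly like the following in C11. This is only a sketch: the names flag_atomic, publish_flag, and flag_is_set are made up for illustration, and the FreeRTOS calls are left out so that it compiles on a host toolchain.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the thread's `flag`, declared atomic. */
static _Atomic int flag_atomic;

/* What the setter task would do inside its critical section. */
void publish_flag(void)
{
    atomic_store(&flag_atomic, 1);
}

/* One iteration of the waiter loop. In practice GCC performs the atomic
   load on every call rather than reusing a previously loaded value. */
int flag_is_set(void)
{
    return atomic_load(&flag_atomic) != 0;
}
```

In the original snippet, waiter would call something like flag_is_set() in place of the plain read of flag. This sidesteps the problem rather than explaining why the mutex alone is not enough.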
volatile or _Atomic will prevent the read of flag from being optimized out, but the point is that synchronization primitives such as mutexes should also suffice to let multiple tasks manipulate data that is neither volatile nor _Atomic, provided they do so while holding the mutex. The expectation is that these accesses will not be moved outside the section where the mutex is held, but that’s exactly what happens.
See the linked bug thread for more details that explain why this should be the case, but the tl;dr is that portYIELD has a "memory" clobber on its inline assembly memory barrier instructions, at least for the Arm ports in common use, but gcc is ignoring these when doing inter-procedural analysis. gcc’s documentation is clear that values need to be re-loaded across this boundary, but that’s not happening.
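For reference, here is a minimal host-side sketch of the behavior the "memory" clobber is expected to give. It is not the FreeRTOS port code: compiler_barrier, shared_flag, set_shared_flag, and poll_shared_flag are hypothetical names, and the asm body is empty, whereas the real portYIELD also issues barrier instructions.

```c
/* Hypothetical stand-in for the thread's non-volatile `flag`. */
static int shared_flag;

/* Compiler-only barrier using GCC extended asm: it emits no instructions,
   but the "memory" clobber tells the compiler that memory may have changed. */
static inline void compiler_barrier(void)
{
    __asm__ volatile("" ::: "memory");
}

void set_shared_flag(int value)
{
    shared_flag = value;
}

/* Polls the flag up to max_polls times. The expectation is that
   shared_flag is loaded from memory again after every barrier. */
int poll_shared_flag(int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        compiler_barrier();
        if (shared_flag)
            return 1;
    }
    return 0;
}
```

Run single-threaded like this, the sketch only illustrates the documented contract. The report in this thread is that the reload can be dropped once interprocedural analysis (-flto or -fwhole-program) is involved.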
I think the problem is that GCC’s “full program” analysis doesn’t handle inline assembly well; there have been several problems in that area.
Basically, the “clobber” in the inline assembly doesn’t reach the full-program optimizer, so it sees no way for the variable to change, because it can’t see that the flow gets interrupted here.