Where I got confused was this sentence from Mastering the FreeRTOS™ Real Time Kernel, “The xEventGroupWaitBits() API Function”:
The xClearOnExit parameter is provided to avoid these potential race conditions. If
xClearOnExit is set to pdTRUE, then the testing and clearing of event bits appears to the
calling task to be an atomic operation (uninterruptable by other tasks or interrupts).
The part that I missed is that xEventGroupSetBits is a single step, and one that unblocks all waiting tasks in this case. The fact that they all atomically clear the bit if xClearOnExit is pdTRUE doesn’t help because they’re all already running. So, a second atomic step, xEventGroupClearBits, is needed to select a winner and send the others back to xEventGroupWaitBits.
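To make the "select a winner" step concrete, here is a small host-side model in plain C. It is not FreeRTOS code and not the FreeRTOS+FAT implementation: every name in it (LOCK_BIT, bits_t, the model_* functions) is made up for illustration, and it runs single-threaded, so the "atomic" steps are just ordinary function calls. The one property it borrows from the real API is that xEventGroupClearBits is documented to return the value of the event group before the bits were cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical convention for this sketch: bit set means "the lock is free". */
#define LOCK_BIT 0x01u

typedef uint32_t bits_t;

/* Model of a "set bits" call. In the scenario discussed here, this is the
 * single step that wakes every waiter at once. */
static void model_set_bits(bits_t *group, bits_t mask)
{
    *group |= mask;
}

/* Model of a "clear bits" call that returns the value BEFORE clearing.
 * In a real kernel this read-and-clear would be one atomic step. */
static bits_t model_clear_bits(bits_t *group, bits_t mask)
{
    bits_t before = *group;
    *group &= ~mask;
    return before;
}

/* Broken idea: every woken task trusts the value it saw when it was
 * unblocked. All of them saw the bit set, so all of them "win". */
static int model_winners_trusting_wait(bits_t seen_on_wake, int woken_tasks)
{
    int winners = 0;
    for (int i = 0; i < woken_tasks; i++) {
        winners += (seen_on_wake & LOCK_BIT) != 0;
    }
    return winners;
}

/* Second step: each woken task clears the bit and checks what the clear
 * returned. Only the first caller sees the bit still set. */
static int model_try_claim(bits_t *group)
{
    return (model_clear_bits(group, LOCK_BIT) & LOCK_BIT) != 0;
}

static int model_winners_using_clear(bits_t *group, int woken_tasks)
{
    int winners = 0;
    for (int i = 0; i < woken_tasks; i++) {
        winners += model_try_claim(group);
    }
    return winners;
}

/* Release: the bit must still be clear while the lock is held. */
static void model_release(bits_t *group)
{
    assert((*group & LOCK_BIT) == 0);
    model_set_bits(group, LOCK_BIT);
}
```

With three tasks woken by one set, the first counting function reports three winners, while the clear-and-check version reports exactly one; the losers would loop back and wait again. The assert in model_release is only a stand-in for the kind of sanity check one might add on real hardware.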
I agree, but it’s really the old, old method. I don’t know who wrote it in the first place, but they knew what they were doing (although I do wonder about the efficiency of this locking mechanism).
You mean, recreating the problem with the broken code on a single core? It is not easy to come up with good stress tests for this. Hard to get the timing right so that the first task doesn’t always proceed before there are any other waiters. With this particular bug, the problem only showed up when there was contention. Adding a configASSERT to FF_LockFAT helps with early detection.