Hi, I get semi-random crashes on an ESP32S3 dual-core platform (ESP-IDF v4.4.2) with the following message:
assert failed: xQueueSemaphoreTake queue.c:1624 (xInheritanceOccurred == ( ( BaseType_t ) 0 ))
This happens irregularly, but always on the semaphore protecting the logging function (see log_freertos.c in ESP-IDF). This semaphore is taken very frequently from various threads. In most cases it is given back quickly because the logs are disabled, but the semaphore is still needed to query the enabled/disabled state. The tasks run on both cores. I’m having difficulty reproducing this in a minimal example; it only happens in a complex code base that communicates with PC software over USB, and I suspect the particular timing induced by the USB communication triggers it.
The code in xQueueSemaphoreTake looks like this:
/* For inheritance to have occurred there must have been an
* initial timeout, and an adjusted timeout cannot become 0, as
* if it were 0 the function would have exited. */
#if ( configUSE_MUTEXES == 1 )
{
configASSERT( xInheritanceOccurred == pdFALSE );
}
#endif /* configUSE_MUTEXES */
However, I think that comment is wrong. If the thread currently holding the semaphore has a lower priority than the current one, priority inheritance is performed and xInheritanceOccurred is set to pdTRUE. If the other thread has not released the semaphore by the time the wait expires, we run into the timeout (queue.c:1719). The queue is then unlocked (on ESP: the spinlock is released). At this precise moment, the other thread might release the mutex, so the return (queue.c:1758) does not happen. Instead, the outermost loop is restarted, but xTicksToWait is now 0 while xInheritanceOccurred is still pdTRUE.
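To make the sequence concrete, here is a minimal single-threaded C model of the loop as I read it. It is only a sketch, not the real queue.c code: model_take, scenario_t and the two flags are hypothetical names, blocking is reduced to "the wait expires", and the other tasks are replaced by scripted events. As far as I can tell, the assert is only reached if the mutex is taken again by some other task (plausible on two cores) before the restarted loop re-enters its critical section, so the model assumes that as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for the model (not FreeRTOS constants). */
typedef enum { TAKE_OK, TAKE_TIMEOUT, TAKE_ASSERT_FAILED } take_result_t;

/* Scripted behaviour of the other tasks (all names are assumptions). */
typedef struct {
    int  count;                /* 1 = mutex available, 0 = mutex held      */
    bool release_after_unlock; /* holder gives the mutex right after the
                                  timed-out path unlocks the queue         */
    bool retake_before_retry;  /* another task takes it again before the
                                  loop re-enters its critical section      */
} scenario_t;

/* Simplified model of the control flow in xQueueSemaphoreTake. */
static take_result_t model_take(scenario_t *s, int ticks_to_wait)
{
    bool inheritance_occurred = false;
    int  ticks_elapsed = 0;

    assert(ticks_to_wait >= 0);

    for (;;) {
        /* --- critical section --- */
        if (s->count > 0) {
            s->count--;
            return TAKE_OK;
        }
        if (ticks_to_wait == 0) {
            /* Stand-in for configASSERT( xInheritanceOccurred == pdFALSE ). */
            return inheritance_occurred ? TAKE_ASSERT_FAILED : TAKE_TIMEOUT;
        }
        /* --- end of critical section --- */

        if (ticks_elapsed < ticks_to_wait) {
            /* Not timed out: the holder has lower priority, so it
             * inherits ours, and we block until the wait expires. */
            inheritance_occurred = true;
            ticks_elapsed = ticks_to_wait;
        } else {
            /* Timed out: the remaining wait time becomes 0 and the
             * queue is unlocked. */
            ticks_to_wait = 0;
            if (s->release_after_unlock) {
                s->count = 1;               /* holder releases in the window */
                s->release_after_unlock = false;
            }
            if (s->count == 0) {
                return TAKE_TIMEOUT;        /* normal timeout return */
            }
            /* Mutex looks available, so the loop restarts ... */
            if (s->retake_before_retry) {
                s->count = 0;               /* ... but someone else grabs it */
            }
        }
    }
}
```

With the mutex initially held, a one-tick wait and both flags set, model_take returns TAKE_ASSERT_FAILED. With only release_after_unlock set it returns TAKE_OK, and with neither flag set it returns TAKE_TIMEOUT, which is the path the comment in queue.c seems to assume is the only one.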
I think this assertion should simply be removed, as the reasoning behind it is wrong (“For inheritance to have occurred there must have been an initial timeout, and an adjusted timeout cannot become 0, as if it were 0 the function would have exited” - not true, the loop can also restart if the semaphore was just returned). Instead, we should just return errQUEUE_EMPTY (queue.c:1632).
What do you think about this reasoning? Is this indeed a bug or is something else wrong?
The line numbers refer to components/freertos/queue.c in ESP-IDF v4.4.2. Sorry, I can’t post links or I’d have linked them…