JacobApium
(Jacob Christ)
January 8, 2022, 7:29am
1
I have a single task that blocks on xStreamBufferReceive for 10 ms (xBlockTime = pdMS_TO_TICKS( 10 )).
receivedBytes = xStreamBufferReceive( _streamBufferHandle, ( void * ) ucRxData, ucRxDataSize, xBlockTime );
The task is fed by a serial port interrupt that sends UART data into the stream via xStreamBufferSendFromISR.
xBytesSent = xStreamBufferSendFromISR( xStreamBufferUsart2, data, len, &xHigherPriorityTaskWoken);
This has been working for years, but recently we have been trying to push as much data as possible through this port. Once we started doing this, the task that calls xStreamBufferReceive works for a few seconds and then becomes permanently blocked. I’ve narrowed it down to a call to xStreamBufferReceive that never returns from the 10 ms block. The stream buffer is 8 KB, and once the task blocks I can watch the buffer fill up over about a 4 second period before calls to xStreamBufferSendFromISR start failing because the buffer is full. This seems related to this issue (although the author claims it should not happen from xStreamBufferSendFromISR):
opened 10:35 PM, 16 Mar 2020 UTC
The FreeRTOS StreamBuffer (and thus, also MessageBuffer) has a race condition in versions V10.2.1 and below.
**Symptom:**
When calling
```cpp
xStreamBufferReceive(xStreamBuffer, pvRxData, xBufferLengthBytes, xTicksToWait);
```
the function will sometimes return immediately with a value of zero, indicating zero bytes read. It is as if the `xTicksToWait` timeout had expired with no data; however, the timeout is not respected (the calling Task does not block for `xTicksToWait` ticks waiting for data). Depending on the application code, this could look like false hardware timeouts or, in time-sensitive synchronous data streams, like a broken data stream.
Here is an example result we produced on an ARM system, with two tight-looping Tasks and a 100 KB StreamBuffer, reading data samples at approximately 1 kHz. We could reproduce this intermittently, roughly one out of every ten test runs.
In the example timeline below, "block to wait for a sample" means calling `xStreamBufferReceive()` to wait for data. That eventually results in a call to `portYIELD_WITHIN_API()`, which is shown as a "CONTEXT SWITCH" below.
"Write sample" and "notify the blocked Reader task" means calling `xStreamBufferSend()`, the function where both of those things occur. The race condition is within that function (`xStreamBufferSend()`), between lines 562 and 571.
```
Reader Task: Writer Task:
------------- -------------
(Kickoff writing loop)
(Kickoff reading loop)
block to wait for a sample
[CONTEXT SWITCH via portYIELD_WITHIN_API()]
write sample 1 into the StreamBuffer (stream_buffer.c:562)
notify the blocked Reader task of sample 1 (stream_buffer.c:571)
(loop / repeat for another sample)
write sample 2 into the StreamBuffer (stream_buffer.c:562)
[CONTEXT SWITCH via SysTick_Handler()]
read sample 1
(loop / repeat for another sample)
read sample 2 -- StreamBuffer is now empty
(loop / repeat for another sample)
block to wait for a sample
[CONTEXT SWITCH via portYIELD_WITHIN_API()]
notify the blocked Reader task of sample 2 (stream_buffer.c:571)
(loop / repeat for another sample)
[CONTEXT SWITCH via SysTick_Handler()]
error: got a TaskNotify, but the StreamBuffer is empty!
```
`xStreamBufferSendFromISR()` doesn't have this issue because the SysTick interrupt has the lowest priority (it's 15 on ARM) and so won't preempt your application's ISR.
**Root Cause:**
`xStreamBufferSend()` attempts to guard against the error case above with this conditional:
```cpp
if( prvBytesInBuffer( pxStreamBuffer ) >= pxStreamBuffer->xTriggerLevelBytes )
{
sbSEND_COMPLETED( pxStreamBuffer );
}
```
This protects against notifying when the data has already been consumed by the reading thread. Unfortunately, after the sending thread calls `prvBytesInBuffer()`, a SysTick interrupt can occur that wakes the reader thread to consume the buffer before the notification is sent via `sbSEND_COMPLETED` (which calls `xTaskNotify()`).
**Workaround:**
Here is the default (and most commonly used) definition of `sbSEND_COMPLETED`, which is the user-customizable macro that sends the `xTaskNotify()` to the receiving thread:
```cpp
#ifndef sbSEND_COMPLETED
#define sbSEND_COMPLETED( pxStreamBuffer ) \
vTaskSuspendAll(); \
{ \
if( ( pxStreamBuffer )->xTaskWaitingToReceive != NULL ) \
{ \
( void ) xTaskNotify( ( pxStreamBuffer )->xTaskWaitingToReceive, \
( uint32_t ) 0, \
eNoAction ); \
( pxStreamBuffer )->xTaskWaitingToReceive = NULL; \
} \
} \
( void ) xTaskResumeAll();
#endif /* sbSEND_COMPLETED */
```
Note that this macro protects itself against a SysTick race by using `vTaskSuspendAll()`/`xTaskResumeAll()`, which prevents any Task switching within this block of code. The race condition is fixed by moving the `prvBytesInBuffer()` check into this protected section, so that the SysTick can't trigger the race:
```cpp
#ifndef sbSEND_COMPLETED
#define sbSEND_COMPLETED( pxStreamBuffer ) \
vTaskSuspendAll(); \
{ \
if( ( pxStreamBuffer )->xTaskWaitingToReceive != NULL && \
prvBytesInBuffer( pxStreamBuffer ) >= pxStreamBuffer->xTriggerLevelBytes) \
{ \
( void ) xTaskNotify( ( pxStreamBuffer )->xTaskWaitingToReceive, \
( uint32_t ) 0, \
eNoAction ); \
( pxStreamBuffer )->xTaskWaitingToReceive = NULL; \
} \
} \
( void ) xTaskResumeAll();
#endif /* sbSEND_COMPLETED */
```
Existing users can fix their deployment of FreeRTOS by customizing the macro above within their build system, either using the corrected default version shown above or, if the macro has already been customized, by adding the `prvBytesInBuffer()` test to their pre-existing customized version.
For a permanent fix, the upstream FreeRTOS source code should update the default `sbSEND_COMPLETED` to the example above, and also remove the unprotected and extraneous call to `prvBytesInBuffer()` (and its conditional `if` statement) from `xStreamBufferSend()`.
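In practice, "customizing the macro within your build system" usually means defining it in the application's FreeRTOSConfig.h, since stream_buffer.c only defines the default under `#ifndef sbSEND_COMPLETED`. A placement sketch (untested here; note `prvBytesInBuffer()` is private to stream_buffer.c, which works only because the macro is expanded inside that file -- verify against your kernel version):

```c
/* FreeRTOSConfig.h (application side) -- override the default
 * sbSEND_COMPLETED with the corrected version from the issue above. */
#define sbSEND_COMPLETED( pxStreamBuffer )                                   \
    vTaskSuspendAll();                                                       \
    {                                                                        \
        if( ( ( pxStreamBuffer )->xTaskWaitingToReceive != NULL ) &&         \
            ( prvBytesInBuffer( pxStreamBuffer ) >=                          \
                  ( pxStreamBuffer )->xTriggerLevelBytes ) )                 \
        {                                                                    \
            ( void ) xTaskNotify( ( pxStreamBuffer )->xTaskWaitingToReceive, \
                                  ( uint32_t ) 0,                            \
                                  eNoAction );                               \
            ( pxStreamBuffer )->xTaskWaitingToReceive = NULL;                \
        }                                                                    \
    }                                                                        \
    ( void ) xTaskResumeAll();
```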
At this point I kind of feel like I’m on my own here and will need to run down the issue in FreeRTOS, but I at least wanted to post my findings in case anyone has suggestions on where to look, or as a warning to others who may be seeing similar issues.
As in the GitHub issue, we are using V10.2.1 as packaged with STM32CubeMX.
Jacob
rtel
(Richard Barry)
January 8, 2022, 4:43pm
2
It looks like there was a long discussion on this before. I’m not in a position to digest it right now (on my cell phone), but have you confirmed you get the same behavior with the latest kernel version?
JacobApium
(Jacob Christ)
January 8, 2022, 6:52pm
3
I think I have been confusing myself: the bit I was watching to determine where the task last ran was inverted in my head, and I am pretty sure my task is still running but blocking in a different place, for a different reason, unrelated to FreeRTOS. If this turns out to be a non-issue I will update this thread.
JacobApium
(Jacob Christ)
January 8, 2022, 7:07pm
4
Yeah, I was fooling myself, nothing to see here. Move along.
Jacob
rtel
(Richard Barry)
January 8, 2022, 9:53pm
5
Thanks for taking the time to report back.