Possible race condition in TCP stack

Hi,

I believe I’ve found a very tight race condition within the TCP stack. In the function uxStreamBufferAdd() both the head and the front pointers are updated, but the two updates are not performed atomically with respect to each other.

Excerpt from uxStreamBufferAdd:

        if( uxOffset == 0U )
        {
            /* ( uxOffset == 0 ) means: write at uxHead position */
            uxNextHead += uxCount;

            if( uxNextHead >= pxBuffer->LENGTH )
            {
                uxNextHead -= pxBuffer->LENGTH;
            }

            pxBuffer->uxHead = uxNextHead;
        }

        /* ----> context switch to a task that reads this socket causes failure <---- */

        if( xStreamBufferLessThenEqual( pxBuffer, pxBuffer->uxFront, uxNextHead ) != pdFALSE )
        {
            /* Advance the front pointer */
            pxBuffer->uxFront = uxNextHead;
        }

If, in the unlikely case that the timer interrupt occurs between updating the head and the front pointer, a task that reads from this socket is switched in, then I believe the following failure occurs:

  1. The read from the socket works, and the tail pointer is incremented
  2. If in that same task a write is attempted to the socket, in the window update logic, the front pointer is now behind the tail pointer
  3. This causes the stack to erroneously state the TCP receive window is 0

I think the updates to these queue pointers should happen within a critical section (or with the scheduler suspended). Let me know your thoughts, please.
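
For illustration, wrapping the two updates from the excerpt above in a critical section would look something like this; it is only a sketch of the idea using the standard taskENTER_CRITICAL()/taskEXIT_CRITICAL() macros, not a tested patch:

    taskENTER_CRITICAL();
    {
        if( uxOffset == 0U )
        {
            /* ( uxOffset == 0 ) means: write at uxHead position */
            uxNextHead += uxCount;

            if( uxNextHead >= pxBuffer->LENGTH )
            {
                uxNextHead -= pxBuffer->LENGTH;
            }

            pxBuffer->uxHead = uxNextHead;
        }

        if( xStreamBufferLessThenEqual( pxBuffer, pxBuffer->uxFront, uxNextHead ) != pdFALSE )
        {
            /* Advance the front pointer together with the head pointer. */
            pxBuffer->uxFront = uxNextHead;
        }
    }
    taskEXIT_CRITICAL();

A critical section also masks interrupts, whereas suspending the scheduler only prevents other tasks from running; since both the reader and the writer of this stream are tasks, either approach should be enough to close the window described above.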

Thanks for your report. What is calling uxStreamBufferAdd() in the first step of the described scenario - that is, which task gets preempted between the head and front pointers being updated? I ask because, in your three steps, the same task both reads from and writes to the socket. A socket can only have one reader and one writer (which can be the same task or different tasks). That means that if any task other than the task running the TCP/IP stack were calling uxStreamBufferAdd(), then there would be a usage error.

@shayan_mukhtar, thank you for reporting this.

First a short explanation: every socket has an RX stream and a TX stream, i.e. one for reception and one for transmission of data. These are independent objects.

These streams are only created when needed. And like @rtel mentioned: “a socket can only have one reader and one writer”, which may be the same task.

An RX stream uses 3 pointers: uxTail - uxHead - uxFront
A user task reads from the RX stream with FreeRTOS_recv()
The IP-task writes to this stream in a “private” function lTCPAddRxdata()

A TX stream also uses 3 pointers: uxTail - uxMid - uxHead
A user task writes to the TX stream with FreeRTOS_send()
The IP-task reads from the TX stream in a private function prvTCPPrepareSend().
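
For reference, the stream descriptor behind these pointers looks roughly like the sketch below; the authoritative definition is StreamBuffer_t in FreeRTOS_Stream_Buffer.h, so treat the comments here as a summary of the description above rather than the exact source:

    /* Simplified sketch of the stream descriptor. */
    typedef struct xSTREAM_BUFFER
    {
        volatile size_t uxTail;  /* Read position: advanced when data is consumed. */
        volatile size_t uxMid;   /* Used by the TX stream (see above). */
        volatile size_t uxHead;  /* Write position: advanced when data is added. */
        volatile size_t uxFront; /* Used by the RX stream for the window / low-water calculation. */
        size_t LENGTH;           /* Size of the circular data area. */
        uint8_t ucArray[ sizeof( size_t ) ]; /* Data area, allocated larger than declared here. */
    } StreamBuffer_t;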

The problem that you describe has to do with the RX-stream, right?

Data has been received and is being added to the RX stream when lTCPAddRxdata() calls uxStreamBufferAdd().
Theory: just before uxFront is advanced, there is a context switch and FreeRTOS_recv() compares uxTail with uxFront, and concludes erroneously that the RX socket has no space.
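
Spelled out, the suspected interleaving would be something like this (a sketch using the names from the excerpt above):

    /* Suspected interleaving between the IP-task (writer) and the user task (reader):
     *
     *   IP-task:    pxBuffer->uxHead = uxNextHead;        - head advanced
     *   -- tick interrupt, context switch --
     *   user task:  FreeRTOS_recv() consumes data, so uxTail can move past the stale uxFront
     *   user task:  the space calculation sees uxFront "behind" uxTail and reports no room
     *   -- context switch back --
     *   IP-task:    pxBuffer->uxFront = uxNextHead;       - too late
     */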

I think this was not reported before because most developers assign a different (usually higher) priority to the IP-task.
In your project the IP-task and the user task seem to have the same priority and the clock tick will lead to a task switch, correct?
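
For context, the IP-task priority is normally set through ipconfigIP_TASK_PRIORITY in FreeRTOSIPConfig.h; the fragment below only illustrates the usual arrangement and is not taken from your project:

    /* Illustrative FreeRTOSIPConfig.h fragment. The value is commonly set
     * near the top of the priority range, e.g. ( configMAX_PRIORITIES - 2 ). */
    #define ipconfigIP_TASK_PRIORITY    ( configMAX_PRIORITIES - 2 )

    /* Tasks that call FreeRTOS_send() / FreeRTOS_recv() would then be created
     * at a lower priority, so they cannot preempt the IP-task between the two
     * pointer updates. */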

You wrote:

  1. The read from the socket works, and the tail pointer is incremented

Right, that happens in FreeRTOS_recv()

  2. If in that same task a write is attempted to the socket, in the window update logic, the front pointer is now behind the tail pointer

I don’t understand this. Writing to a socket with FreeRTOS_send() has nothing to do with reading from a socket. uxFront only plays a role in the RX stream.

The only place where the user task reads uxFront is in uxStreamBufferFrontSpace(), which is called here:

    /* We had reached the low-water mark, now see if the flag
     * can be cleared */
    size_t uxFrontSpace = uxStreamBufferFrontSpace( pxSocket->u.xTCP.rxStream );

    if( uxFrontSpace >= pxSocket->u.xTCP.uxEnoughSpace )
    {
        pxSocket->u.xTCP.bits.bLowWater = pdFALSE;
        pxSocket->u.xTCP.bits.bWinChange = pdTRUE;
    }

The “window update logic” is performed by the IP-task. Seen from the IP-task, the value of uxFront is up-to-date.
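
To make the comparison concrete: the front space is essentially the circular-buffer distance from uxFront back to uxTail. The helper below is a hypothetical illustration of that idea only, not the library’s uxStreamBufferFrontSpace() implementation:

    #include <stddef.h>

    /* Hypothetical sketch: how many bytes can still be accepted before uxFront
     * catches up with uxTail in a circular buffer of uxLength bytes. */
    static size_t uxFrontSpaceSketch( size_t uxTail, size_t uxFront, size_t uxLength )
    {
        size_t uxUsed;

        if( uxFront >= uxTail )
        {
            uxUsed = uxFront - uxTail;
        }
        else
        {
            /* uxFront has wrapped around (or is stale): the forward distance
             * from uxTail to uxFront covers almost the whole buffer. */
            uxUsed = uxLength - ( uxTail - uxFront );
        }

        /* One byte is typically kept free to distinguish full from empty. */
        return ( uxLength - 1U ) - uxUsed;
    }

With a stale uxFront that has fallen behind uxTail, the wrapped distance makes the buffer look almost full, which would match the erroneous zero-window described in the report.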

  3. This causes the stack to erroneously state the TCP receive window is 0

That can happen in case the tasks have equal priority and preemption / time-slicing is enabled.
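
In other words, the combination of scheduler settings under which the race would surface with equal priorities looks something like this (illustrative FreeRTOSConfig.h fragment):

    /* With these settings a tick interrupt can switch between two ready tasks
     * of equal priority, which opens the window described above. */
    #define configUSE_PREEMPTION      1
    #define configUSE_TIME_SLICING    1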

How easy is it for you to replicate the problem?

If it occurs easily, you could try suspending the scheduler while updating uxHead and uxFront:

+    vTaskSuspendAll();
+    {
         if( uxOffset == 0U )
         {
             /* ( uxOffset == 0 ) means: write at uxHead position */
             uxNextHead += uxCount;
         
             if( uxNextHead >= pxBuffer->LENGTH )
             {
                 uxNextHead -= pxBuffer->LENGTH;
             }
         
             pxBuffer->uxHead = uxNextHead;
         }
         
         if( xStreamBufferLessThenEqual( pxBuffer, pxBuffer->uxFront, uxNextHead ) != pdFALSE )
         {
             /* Advance the front pointer */
             pxBuffer->uxFront = uxNextHead;
         }
+    }
+    xTaskResumeAll();

Thanks

Thanks @rtel and @htibosch for the replies. Yes, placing a critical section around those areas should fix it. The reason I brought this up is because we’ve seen this issue in the field, with a zero window being erroneously reported.

The socket is only read from and written to in one place each, so I don’t believe there’s any usage error. Furthermore, we only interact with the socket using the send and recv functions.

For several reasons, the IP task has a low priority in our system. I can reliably recreate this event by placing a vTaskDelay() between the updates of those two pointers. I can provide pcaps of what this looks like if that would be helpful.
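
For anyone wanting to reproduce this, the forced delay described above would sit between the two pointer updates in uxStreamBufferAdd(), along these lines (fault injection for reproduction only, not a fix; the 10 ms value is arbitrary):

            pxBuffer->uxHead = uxNextHead;
        }

        /* Fault injection: widen the race window so the reading task is all
         * but guaranteed to run between the head and front pointer updates. */
        vTaskDelay( pdMS_TO_TICKS( 10 ) );

        if( xStreamBufferLessThenEqual( pxBuffer, pxBuffer->uxFront, uxNextHead ) != pdFALSE )
        {
            /* Advance the front pointer */
            pxBuffer->uxFront = uxNextHead;
        }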

Thanks again,
Shayan

Hello Shayan,

You said:

  For several reasons, the IP task has a low priority in our system.

The IP stack is designed from the ground up with these priorities in mind, and it is best if these priorities are used:
prio( EMAC task ) > prio( IP task ) > prio( tasks using sockets )
Other tasks that do not use the IP stack can have any priority they like.

Thus, if you set the priority of the IP task lower than the priority of the tasks using sockets, there can be other race conditions as well that I am not aware of. In all of those cases you would need to add a critical section.

Thanks for the reply. Is that called out anywhere in the documentation? It seems like a rather important system constraint.

I made an edit to the previous post. I am not sure whether the TCP stack was designed like that, or whether this is just a bug.
I’ll look into it and get back to you.

In either case: if this is a bug, we’ll fix it; if it is a design decision, then we’ll update the documentation.

Thanks

Appreciate it! I’ve gone ahead and added the critical sections as mentioned by @htibosch in our source. Just wanted to report this in case anyone else ever faces a similar issue.


Hello @shayan_mukhtar,

We have concluded that this is a bug. The priority order is just a very strong recommendation.
We’ll raise a PR to fix this quickly. Thank you for finding this bug.

Thanks!


I created this PR to fix this: https://github.com/FreeRTOS/FreeRTOS-Plus-TCP/pull/517