TCP: timing may cause socket to enter invalid state which cannot reconnect

I’m running FreeRTOS in a virtual environment that creates unusual timing issues, which is probably why I ran into this, but I believe it could also happen on real hardware.

The scenario is that I lose a connection and try to shut down the socket and reconnect. This fails because the socket is already in the wrong state: FreeRTOS_shutdown returns pdFREERTOS_ERRNO_EOPNOTSUPP, which causes the AWS stack to abort execution. Reconnecting later then fails because the socket was never fully shut down.

pdFREERTOS_ERRNO_EOPNOTSUPP is returned because a socket that loses its connection can have its state change away from established before FreeRTOS_shutdown is called.

    BaseType_t FreeRTOS_shutdown( Socket_t xSocket,
                                  BaseType_t xHow )
    {
        FreeRTOS_Socket_t * pxSocket = ( FreeRTOS_Socket_t * ) xSocket;
        BaseType_t xResult;

        if( prvValidSocket( pxSocket, FREERTOS_IPPROTO_TCP, pdTRUE ) == pdFALSE )
        {
            /*_RB_ Is this comment correct?  The socket is not of a type that
             * supports the listen() operation. */
            xResult = -pdFREERTOS_ERRNO_EOPNOTSUPP;
        }
        else if( pxSocket->u.xTCP.ucTCPState != ( uint8_t ) eESTABLISHED )
        {
            /*_RB_ Is this comment correct?  The socket is not of a type that
             * supports the listen() operation. */
            xResult = -pdFREERTOS_ERRNO_EOPNOTSUPP;
        }

The condition else if( pxSocket->u.xTCP.ucTCPState != ( uint8_t ) eESTABLISHED ) is entered.

The comment inside the condition makes no sense to me. It is fine for FreeRTOS_shutdown to do nothing in this case, but it should return a more useful error, perhaps pdFREERTOS_ERRNO_ENOTCONN. That way the stack can disregard the error, interpret it as "shutdown not needed", and continue safely instead of aborting. Right now SOCKETS_Shutdown in lib/amazon_freertos/libraries/abstractions/secure_sockets/freertos_plus_tcp/iot_secure_sockets.c has no way to distinguish between the two cases and passes the error further up the stack, which aborts execution and leaves the socket in an unusable state.
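The caller-side handling suggested here could look roughly like the sketch below. It is only an illustration: the helper name is hypothetical, and the two `SKETCH_` constants are stand-ins that are assumed to mirror pdFREERTOS_ERRNO_EOPNOTSUPP and pdFREERTOS_ERRNO_ENOTCONN so the snippet compiles off-target. A real build should take those values from the FreeRTOS headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the FreeRTOS errno values (assumed to be 95 and 128).
 * In a real project use pdFREERTOS_ERRNO_EOPNOTSUPP and
 * pdFREERTOS_ERRNO_ENOTCONN from the FreeRTOS headers instead. */
#define SKETCH_EOPNOTSUPP    95
#define SKETCH_ENOTCONN      128

/* Hypothetical helper: decide whether the value returned by a shutdown
 * call should abort the reconnect sequence. */
static bool xSketchShutdownResultIsFatal( int32_t lResult )
{
    if( lResult == 0 )
    {
        /* The shutdown was started; nothing went wrong. */
        return false;
    }

    if( lResult == -SKETCH_ENOTCONN )
    {
        /* The socket was no longer connected, so there is nothing to
         * shut down. The caller can go straight to closing it. */
        return false;
    }

    /* Anything else (for example -SKETCH_EOPNOTSUPP for an invalid or
     * non-TCP socket) is treated as a real error. */
    return true;
}
```

With a check like this, a "not connected" result would let the caller continue to close the socket and reconnect, while an invalid socket would still be reported as an error.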

Apologies for the delay in your post showing up - it went into moderation. It will take a short while to investigate what you are reporting - thanks for the detail.

Hello, I just replied to your post reporting an issue on github.

Does that answer your question?

To add to what @htibosch has said, if SOCKETS_Shutdown returns a zero value, it means that the shutdown is in progress. A sequence of events has to happen before the connection actually closes. Thus, to actively shut down a connection, one must use code similar to what Hein added to the issue. I am adding his code here (with a minor change) for simplicity:

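    TickType_t xTicksToWait = MAX_TIME_TO_WAIT;
    TimeOut_t xTimeOut;
    /* Scratch buffer for data that still arrives; the size is arbitrary. */
    char pcBuffer[ 128 ];

    /* Start the shutdown; the FIN exchange completes asynchronously. */
    FreeRTOS_shutdown( xSocket, FREERTOS_SHUT_RDWR );
    vTaskSetTimeOutState( &xTimeOut );

    while( xTaskCheckForTimeOut( &xTimeOut, &xTicksToWait ) == pdFALSE )
    {
        int32_t rc = FreeRTOS_recv( xSocket, pcBuffer, sizeof( pcBuffer ), 0 );

        if( ( rc < 0 ) && ( rc != -pdFREERTOS_ERRNO_EAGAIN ) )
        {
            /* The TCP connection is broken. */
            break;
        }
    }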

In the above example, you can replace FreeRTOS_xxx() calls with the corresponding SOCKETS_xxx() calls.

To actually close a socket, one then needs to call SOCKETS_Close()/FreeRTOS_closesocket().
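The exit conditions of the wait loop above can be checked off-target with a small model. This is a sketch, not FreeRTOS code: the receive call is replaced by a hypothetical callback, the tick-based timeout by a simple poll budget, and `SKETCH_EAGAIN` is a stand-in assumed to mirror pdFREERTOS_ERRNO_EAGAIN. All `Sketch` names are made up for illustration.

```c
#include <stdint.h>

/* Stand-in for pdFREERTOS_ERRNO_EAGAIN (assumed value 11); a real build
 * should use the definition from the FreeRTOS headers. */
#define SKETCH_EAGAIN    11

/* Hypothetical callback standing in for a blocking receive call. */
typedef int32_t ( * SketchRecv_t )( void * pvContext );

typedef enum
{
    eSketchConnectionClosed, /* recv reported an error other than EAGAIN. */
    eSketchGaveUp            /* The poll budget ran out first. */
} SketchDrainResult_t;

/* Keep receiving after a shutdown request until the connection is
 * reported as gone or the poll budget is used up. */
static SketchDrainResult_t eSketchDrain( SketchRecv_t pxRecv,
                                         void * pvContext,
                                         uint32_t ulMaxPolls )
{
    uint32_t ulPoll;

    for( ulPoll = 0; ulPoll < ulMaxPolls; ulPoll++ )
    {
        int32_t rc = pxRecv( pvContext );

        if( ( rc < 0 ) && ( rc != -SKETCH_EAGAIN ) )
        {
            /* Same test as in the loop above: the connection is gone. */
            return eSketchConnectionClosed;
        }

        /* rc >= 0 (late data, discarded) or a timeout: keep waiting. */
    }

    return eSketchGaveUp;
}

/* Scripted peer used to drive the model: returns the listed results in
 * order, then reports a timeout forever. */
typedef struct
{
    const int32_t * plResults;
    uint32_t ulCount;
    uint32_t ulNext;
} SketchScript_t;

static int32_t lSketchScriptedRecv( void * pvContext )
{
    SketchScript_t * pxScript = ( SketchScript_t * ) pvContext;

    if( pxScript->ulNext < pxScript->ulCount )
    {
        return pxScript->plResults[ pxScript->ulNext++ ];
    }

    return -SKETCH_EAGAIN;
}
```

Feeding the model a script such as "some data, a timeout, then a non-EAGAIN error" should end with eSketchConnectionClosed, which is the point at which the real code would go on to close the socket. A peer that never answers ends with eSketchGaveUp, corresponding to the timeout case in the loop above.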

Let me know if that helps.

Thank you @kanherea for making the example complete. It must be said that the code assumes that the call to FreeRTOS_recv() is blocking, i.e. FREERTOS_SO_RCVTIMEO has been set to a non-zero value.

I just found out that @paul-szczepanek-arm is using older code. Today’s version is like this:

    if( prvValidSocket( pxSocket, FREERTOS_IPPROTO_TCP, pdTRUE ) == pdFALSE )
    {
        /*_RB_ Is this comment correct?  The socket is not of a type that
         * supports the listen() operation. */
        xResult = -pdFREERTOS_ERRNO_EOPNOTSUPP;
    }
    else if( pxSocket->u.xTCP.ucTCPState != ( uint8_t ) eESTABLISHED )
    {
        /* The socket is not connected. */
        xResult = -pdFREERTOS_ERRNO_ENOTCONN;
    }
    else

The comment in the first if statement can be extended to:

    /* Either "pxSocket" is not a valid pointer, or the
     * socket is not bound, or it is not a TCP socket. */

But once again: FreeRTOS_shutdown() should only be called on a connected socket. The shutdown is not immediate; it takes time to exchange the FIN packets.
While the socket is shutting down, FreeRTOS_recv() might still return data received from the peer.

Ah, I didn’t realise this change had already been made. I guess we came to the same conclusion independently. Apologies that this escaped my attention.

It’s not my code that was failing; it’s the AWS OTA agent code. Thank you.