FreeRTOS connect and accept functions (in FreeRTOS+TCP) do not return even after the TCP connection closes

First, here is the current behavior
When I call FreeRTOS_connect, the connection handshake is triggered, and the current task blocks until a connection is established. But even if the connection fails and times out (state changes from CONNECT_SYN to CLOSE_WAIT), the task remains blocked, and the FreeRTOS_connect does not return.

Here’s the behavior I expect
If the connection times out or is canceled while waiting for the TCP connection to happen (the state changes from CONNECT_SYN to CLOSE_WAIT or CLOSED), the FreeRTOS_connect function should be woken up, and it should return to the user with an error message.

Here’s what I observed
When FreeRTOS_recv is called, the call/task blocks and waits for signals - eSOCKET_RECEIVE, eSOCKET_CLOSED, eSOCKET_INTR. Hence, if the connection closes while the socket is waiting for a read, the task unblocks and the FreeRTOS_recv function returns an error. If FreeRTOS_connect or FreeRTOS_accept is called, the task only waits for the eSOCKET_CONNECT or eSOCKET_ACCEPT signals.
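
To make the difference concrete, here is a small off-target sketch of the wait masks described above. It is not the library code: the `sketch_` names are made up for illustration, and the bit values are placeholders rather than the definitions from the FreeRTOS+TCP headers. It only models the rule that a blocked task wakes when a posted event overlaps the bits it waits on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit values for illustration only; the real definitions
 * are in the FreeRTOS+TCP headers. */
enum
{
    SKETCH_SOCKET_RECEIVE = 0x01,
    SKETCH_SOCKET_CONNECT = 0x08,
    SKETCH_SOCKET_CLOSED  = 0x20,
    SKETCH_SOCKET_INTR    = 0x40
};

/* What a receive call waits on, as described in the post. */
const uint32_t sketchRecvMask =
    SKETCH_SOCKET_RECEIVE | SKETCH_SOCKET_CLOSED | SKETCH_SOCKET_INTR;

/* What a connect call waits on today, as described in the post. */
const uint32_t sketchConnectMask = SKETCH_SOCKET_CONNECT;

/* The suggested mask: also wake up when the socket closes. */
const uint32_t sketchProposedConnectMask =
    SKETCH_SOCKET_CONNECT | SKETCH_SOCKET_CLOSED;

/* A blocked task wakes only if a posted event overlaps its wait mask. */
bool sketch_task_wakes( uint32_t waitMask, uint32_t postedBits )
{
    return ( waitMask & postedBits ) != 0u;
}

/* Usage: post a "closed" event and see which waiters wake up. */
void sketch_demo( void )
{
    assert( sketch_task_wakes( sketchRecvMask, SKETCH_SOCKET_CLOSED ) );
    assert( !sketch_task_wakes( sketchConnectMask, SKETCH_SOCKET_CLOSED ) );
    assert( sketch_task_wakes( sketchProposedConnectMask, SKETCH_SOCKET_CLOSED ) );
}
```

In this model a receive waiter reacts to a close event, while a connect waiter with the current mask does not.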

Here’s what I suggest
FreeRTOS_connect and FreeRTOS_accept should also wait for an eSOCKET_CLOSED signal, and the task calling these functions should unblock if the connection times out or closes.

@AmPaschal Thanks for the report. I’m attempting to reproduce the issue. Is there a particular platform that you have experienced this on?

Apologies for the late response @AmPaschal. I was looking at this issue earlier and got distracted by some other work.

For FreeRTOS_connect, looking at the code, it seems that you are right. If the timeout occurs and the socket gets put into the CLOSE_WAIT state, the FreeRTOS_connect call is not notified right away. But I don’t think that the calling task would be stalled forever, since the event group is waited on for a certain time only. The API call would eventually wake up.
But unless the state of the socket is beyond CLOSE_WAIT (i.e., eCLOSING, eLAST_ACK or eTIME_WAIT), the call will start waiting again, which seems inefficient. This can be fixed if we modify the code to wait on the eSOCKET_CLOSED signal and also add a check to break out of the loop if the eSOCKET_CLOSED bit was set.

The FreeRTOS_accept API has a similar issue.

To summarize, after just looking at the code and without reproducing the issue, I agree with your analysis of the issue and also the fix (with a small modification). What do you think @htibosch and @PaulB-AWS?

If all seems well, would you like to create a pull request (PR) to the FreeRTOS+TCP repository @AmPaschal? If you are not sure how to do that, we’ll be more than glad to help you along the way.

Thanks,
Aniruddha

Thank you for your response @kanherea.

I can make the needed modifications and send a pull request (I already modified for FreeRTOS_connect to enable the functionality I was working on, and it worked. I’ll update it according to your suggestion and do a similar modification for FreeRTOS_accept).

On your statement, “I don’t think that the calling task should be stalled forever since the event group is waited on for a certain time only.” While this is true, the event group can also be configured to wait indefinitely. This is because it uses the same wait timeout as the FreeRTOS_recv call, and for some scenarios, the FreeRTOS_recv could be configured to wait indefinitely (not sure if this is a correct configuration, but it is possible).

I run the FreeRTOS Linux port demo application on my Linux system.

What I did is the following: I connected from an embedded device to my laptop. The first time, it works well. The second time, the echo-server is not running, so connect() reaches a timeout.

I attached a PCAP, the source code, and the logging.

freertos_tcp_connect.zip (2.5 KB)

Maybe you want to have a detailed look and see why this example works well.

Note that connect() gets a timeout of 20 seconds, quite long.

Okay, thank you for your reply.
I noticed you used two different timeouts: one when starting the connection, and a second after the connection has been established. In my own code, I configured only one timeout at the start of the connection (following the convention I saw in the demo application and many sources online).

Can we still make FreeRTOS_connect and FreeRTOS_accept return once the connection or socket closes without having them wait for a timeout (since the socket is already closed and we can know when the socket closes)? That way, even if users configure longer timeouts, those system calls would still return if the socket or connection gets closed.

In my first response, I had read too quickly, and didn’t get the point. I’m sorry about that.

Yes feel free to come up with a proposal. We will review and test it.
Of course, existing applications should not get affected by the change.

I must say that in my personal projects, I rarely call APIs in a blocking way. My favourite is to use ipconfigSOCKET_HAS_USER_SEMAPHORE: it allows you to connect a semaphore to a socket. The semaphore will be given after any important event. The task using +TCP will block on a call to xSemaphoreTake().

There is one thing to know about the implementation of FreeRTOS_connect() : the process of getting connected doesn’t stop after a user time-out (defined by SO_SNDTIMEO) has been reached. However, you can abort the attempt by calling FreeRTOS_closesocket(), which is synchronised with the IP-task.

I will also attempt to change the behaviour and come back to it.

@AmPaschal, if you like, I prepared some changes that affect FreeRTOS_connect() that you find here.

It has 2 changes:

The first one, in vTCPStateChange(), sets the wakeup bit:

if( ( xPreviousState == eCONNECT_SYN ) && ( eTCPState == eCLOSE_WAIT ) )
{
    /* The application is waiting for a connect(), let's wake it up. */
    FreeRTOS_printf( ( "vTCPStateChange: Setting the 'eSOCKET_CLOSED' bit. Before/after: %d %d\n",
                       ( int ) bBefore,
                       ( int ) bAfter ) );
    pxSocket->xEventBits |= ( EventBits_t ) eSOCKET_CLOSED;
    #if ( ipconfigSUPPORT_SELECT_FUNCTION == 1 )
    {
        if( ( pxSocket->xSelectBits & ( EventBits_t ) eSELECT_EXCEPT ) != 0U )
        {
            pxSocket->xEventBits |= ( ( EventBits_t ) eSELECT_EXCEPT ) << SOCKET_EVENT_BIT_COUNT;
        }
    }
    #endif
}

The second one, in FreeRTOS_connect(), makes sure that xEventGroupWaitBits() will also unblock on the event eSOCKET_CLOSED.

It also introduces a “bug” that shows that the changes are working as expected:

if( pxSocket->u.xTCP.eTCPState == eCONNECT_SYN )
{
    ucExpect |= tcpTCP_FLAG_SYN;
    /* Adding a bug here. */
    FreeRTOS_printf( ( "Code for testing only\n" ) );
    ucTCPFlags &= ~( ( uint8_t ) tcpTCP_FLAG_SYN );
}

Do you have an easy setup to test this event?

I am curious to see if you had thought of more changes. Thanks

Hi,

I’ve looked at your changes, and I think they look good. Those were the changes I talked about.

I’ll test your branch, but then, I think even after the eSOCKET_CLOSED bit gets set, the indefinite loop in the FreeRTOS_connect function may still not return. The loop is designed to break only when the socket gets connected or the timeout period expires.

One approach will be to also check if the socket is closed and break from the loop if the socket is closed. That’s what I plan to implement in the PR I’ll send.

Yes, you are right, FreeRTOS_connect() should stop looping when the eSOCKET_CLOSED event is received.
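
The point about the loop can be illustrated with a small off-target simulation. This is only a sketch under stated assumptions, not the FreeRTOS+TCP implementation: every `sketch_` and `SKETCH_` name is hypothetical, the bit values are placeholders, and each array entry stands for the event bits posted during one wait period. `SKETCH_WAIT_FOREVER` stands in for an indefinite timeout such as portMAX_DELAY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CONNECT_BIT     0x08u      /* placeholder bit values */
#define SKETCH_CLOSED_BIT      0x20u
#define SKETCH_WAIT_FOREVER    UINT32_MAX /* stands in for portMAX_DELAY */

typedef enum
{
    SKETCH_CONNECTED     = 0,
    SKETCH_TIMED_OUT     = -1,
    SKETCH_CLOSED        = -2,
    SKETCH_STILL_BLOCKED = -3 /* the simulated trace ended while the caller was still waiting */
} SketchResult_t;

/* events[ i ] holds the bits posted during the i-th wait period (0 = nothing).
 * maxWaits is the number of wait periods allowed before a timeout.
 * checkClosed selects the proposed behaviour (nonzero) or the old one (zero). */
SketchResult_t sketch_connect_wait( const uint32_t * events,
                                    uint32_t eventCount,
                                    uint32_t maxWaits,
                                    int checkClosed )
{
    const uint32_t waitMask =
        SKETCH_CONNECT_BIT | ( checkClosed ? SKETCH_CLOSED_BIT : 0u );

    assert( ( events != NULL ) || ( eventCount == 0u ) );

    for( uint32_t i = 0u; i < eventCount; i++ )
    {
        if( ( maxWaits != SKETCH_WAIT_FOREVER ) && ( i >= maxWaits ) )
        {
            return SKETCH_TIMED_OUT;
        }

        /* Only bits in the wait mask can wake the caller. */
        const uint32_t bits = events[ i ] & waitMask;

        if( ( bits & SKETCH_CONNECT_BIT ) != 0u )
        {
            return SKETCH_CONNECTED;
        }

        if( ( bits & SKETCH_CLOSED_BIT ) != 0u )
        {
            /* The extra check: stop looping once the socket has closed. */
            return SKETCH_CLOSED;
        }
    }

    if( ( maxWaits != SKETCH_WAIT_FOREVER ) && ( eventCount >= maxWaits ) )
    {
        return SKETCH_TIMED_OUT;
    }

    return SKETCH_STILL_BLOCKED;
}
```

With a trace where the socket closes in the second wait period and an indefinite timeout, the model without the check never returns an error, while the model with the check returns `SKETCH_CLOSED` right away. With a finite timeout, the model without the check only returns once the timeout has expired.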

I just pushed the latest changes, please have a look. I introduced a variable:

BaseType_t xTCP_Introduce_bug;

When it is made nonzero, an error will be simulated in the SYN phase.

As for FreeRTOS_accept(), I wonder why it should return with NULL, after eSOCKET_CLOSED has happened? This means that in the SYN phase there was a fatal error.
I often see code where FreeRTOS_accept() is called with a timeout of portMAX_DELAY, while the return code of FreeRTOS_accept() is not tested. The reasoning is: if it returns, there must be a connection. A timeout on portMAX_DELAY is not allowed.

Why would you like FreeRTOS_accept() to return after a SYN problem and before a timeout?
For FreeRTOS_connect() I understand.

I think my problem was actually with FreeRTOS_connect (I just looped in FreeRTOS_accept). Yeah, FreeRTOS_accept does not need to return until there is a successful connection or it times out.

I finally found time to solve the above-mentioned problems in PR #559.

It also addresses the issues mentioned in this post.

Thank you for reporting and for helping to find a solution.