Failing configASSERT on FREERTOS_INVALID_SOCKET

I have an app that is creating a TCP Server in much the same way as the TCP Echo Server Demo. 99.9% of the time, it’s working perfectly.

At what seems to be complete random, the configASSERT will fail on the FreeRTOS_accept.

/* Wait for a client to connect. */
xConnectedSocket = FreeRTOS_accept( xListeningSocket, &xClient, &xSize );
configASSERT( xConnectedSocket != FREERTOS_INVALID_SOCKET );

Is there a better way to handle this situation? Is a simple

if ( xConnectedSocket == FREERTOS_INVALID_SOCKET ) { /* loop back to FreeRTOS_accept() - pseudocode */ }

an acceptable way to handle the invalid socket, or does the failure indicate something more sinister?

Thank You!

Just a wild guess: you’re running out of heap sporadically. Any chance you can increase the heap? And check the heap usage, too?
Edit: Then it’d return NULL. I was wrong, sorry.

Hello @Pilot!

Yes, you can handle the error in that way. It should work fine. You can skip the configASSERT :slight_smile:

Although I wonder why FREERTOS_INVALID_SOCKET is being returned. You can set a breakpoint at the statement pxClientSocket = FREERTOS_INVALID_SOCKET; (it appears twice) in the FreeRTOS_accept() function to see what exactly is going wrong.

EDIT:
Maybe the backlog parameter of FreeRTOS_listen needs to be increased?
Exceeding the backlog won’t return an invalid socket, though; it will just send an RST to the client.
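For reference, the backlog is the second argument to FreeRTOS_listen(). A minimal sketch of the server setup with the backlog raised - the port number and backlog value here are illustrative, not taken from the original post:

```c
/* Sketch only - assumes the standard FreeRTOS+TCP sockets API. */
Socket_t xListeningSocket;
struct freertos_sockaddr xBindAddress = { 0 };

xListeningSocket = FreeRTOS_socket( FREERTOS_AF_INET,
                                    FREERTOS_SOCK_STREAM,
                                    FREERTOS_IPPROTO_TCP );
configASSERT( xListeningSocket != FREERTOS_INVALID_SOCKET );

xBindAddress.sin_port = FreeRTOS_htons( 7 ); /* Illustrative port. */
FreeRTOS_bind( xListeningSocket, &xBindAddress, sizeof( xBindAddress ) );

/* A larger backlog lets more handshakes complete before accept() runs. */
FreeRTOS_listen( xListeningSocket, 10 );
```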

If this is like most TCP implementations, there is a global pool of sockets to draw from. If connections are opened and closed in rapid succession, it may happen that the pool is exhausted, i.e. all connections are technically closed, but the underlying sockets linger in one of the wait states (such as TIME_WAIT) before they are deallocated. In that case, your accept wouldn’t be able to allocate a socket for the connection.

It seems to me that @hs2 was right initially.
If the memory (heap) is exhausted, the malloc will return NULL, which is checked, and FREERTOS_INVALID_SOCKET is returned.

See these lines: FreeRTOS-Plus-TCP/FreeRTOS_Sockets.c at FreeRTOS/FreeRTOS-Plus-TCP (github.com)

The FreeRTOS_socket call is made from prvHandleListen (here) to create a duplicate of the listening socket.

Thanks everyone. I’ll investigate a little further. It doesn’t seem to be a heap problem - I have tons of it available, and no sign of a memory leak. After 16 hours of operating, the heap is showing:

Current free heap 297424 bytes, minimum ever free heap 286744 bytes

I also have a pretty big pool of network buffer descriptors (60) available, so it’s unlikely to be a lack of available buffers.

For now, I’ll just replace the Assert with an if(), but I will keep investigating.

Changed my code to:

for( ;; )
{
    /* Wait for a client to connect. */
    xConnectedSocket = FreeRTOS_accept( xListeningSocket, &xClient, &xSize );

    if( ( xConnectedSocket != FREERTOS_INVALID_SOCKET ) && ( xConnectedSocket != NULL ) )
    {
        xTaskCreate( CommandRecv, "CommandRecv", CONFIG_COMMANDRECV_STACK,
                     ( void * ) xConnectedSocket,
                     tskIDLE_PRIORITY + CONFIG_COMMANDRECV_PRIORITY, NULL );
    }
    else
    {
        FreeRTOS_printf( ( "FREERTOS_INVALID_SOCKET DETECTED. HEAP AVAILABLE: %u BUFFERS AVAILABLE: %u\n",
                           ( unsigned ) xPortGetFreeHeapSize(),
                           ( unsigned ) uxGetNumberOfFreeNetworkBuffers() ) );
    }
}

Threw the printf in there in case it is a heap/buffer problem - hopefully I can catch it. I’ll be removing the printf once development is done.

This more closely mimics the behavior of the demo TCP Web and FTP Servers, so hopefully it will cure the problem. Still would be interesting to know what is causing an invalid socket to be returned though.
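One note on the NULL check in a loop like the one above: FreeRTOS_accept() returns NULL (not FREERTOS_INVALID_SOCKET) when it times out without a connection arriving, and that timeout is governed by the listening socket’s receive timeout. A hedged sketch of making accept() block indefinitely so the NULL case never occurs, assuming the standard FreeRTOS+TCP setsockopt API:

```c
/* Sketch only: make FreeRTOS_accept() block forever on the listening socket. */
TickType_t xTimeout = portMAX_DELAY;

FreeRTOS_setsockopt( xListeningSocket,
                     0,                      /* Level - unused, pass 0. */
                     FREERTOS_SO_RCVTIMEO,   /* Receive timeout also gates accept(). */
                     &xTimeout,
                     sizeof( xTimeout ) );
```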

Thanks again.

What did the code look like before you changed it? What is the change?

Code was

for( ;; )
{
    /* Wait for a client to connect. */
    xConnectedSocket = FreeRTOS_accept( xListeningSocket, &xClient, &xSize );
    configASSERT( xConnectedSocket != FREERTOS_INVALID_SOCKET );

    xTaskCreate( CommandRecv, "CommandRecv", CONFIG_COMMANDRECV_STACK,
                 ( void * ) xConnectedSocket,
                 tskIDLE_PRIORITY + CONFIG_COMMANDRECV_PRIORITY, NULL );
}

Basically, if the socket wasn’t valid, the assert caused a breakpoint to fire within the debug environment. It wasn’t the end of the world, but I like to investigate the causes of all asserts because they shouldn’t be happening.

My suspicion is that you have a socket leak, meaning not all control paths in your CommandRecv() task function close the socket passed into it.
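For reference when chasing that down: the usual pattern is that every exit path of the receiving task performs a graceful shutdown before closing the socket. A hedged sketch of what a CommandRecv() task body might look like - the buffer size and command handling are assumptions, not the original code:

```c
/* Sketch only - FreeRTOS+TCP graceful close pattern; details are illustrative. */
static void CommandRecv( void *pvParameters )
{
    Socket_t xSocket = ( Socket_t ) pvParameters;
    char cRxBuffer[ 128 ];
    BaseType_t xReceived;

    for( ;; )
    {
        xReceived = FreeRTOS_recv( xSocket, cRxBuffer, sizeof( cRxBuffer ), 0 );

        if( xReceived <= 0 )
        {
            break; /* Error or peer closed - stop processing. */
        }

        /* ... handle the received command here ... */
    }

    /* Initiate a graceful shutdown, then drain until the stack reports an error. */
    FreeRTOS_shutdown( xSocket, FREERTOS_SHUT_RDWR );

    while( FreeRTOS_recv( xSocket, cRxBuffer, sizeof( cRxBuffer ), 0 ) >= 0 )
    {
        /* Waiting for the shutdown to complete. */
    }

    FreeRTOS_closesocket( xSocket );
    vTaskDelete( NULL ); /* A FreeRTOS task must delete itself rather than return. */
}
```

If any early-return or error path in the real task skips the FreeRTOS_closesocket() call, sockets will leak exactly the way suspected above.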

I’ve chased that down pretty hard. Was my first thought before I even posted here.

Nothing seems to get left hanging. I have even run a script that connects, sends some data and disconnects - repeated a few hundred thousand times.

Sometimes the script runs all the way though, other times the ASSERT gets hit.

If a socket is being left hanging, I sure can’t find where.

ok, I checked with my old files, and I believe I have seen a scenario that may explain what you are seeing.

When a client connects to your service and the listen backlog is not exhausted, TCP will perform the three-way handshake even if your code has not accept()ed the connection yet. The peer has no way to know that, so it will start the protocol right away, meaning it will either (very likely, and hopefully with a timeout) wait for the server to send something, or send something itself and then (again hopefully with a timeout) wait for some return data.

If the timeout expires, the peer will likely close the connection, which will make your local TCP stack invalidate the socket. Thus, if for some reason the time between the behind-the-curtains handshake (which simply puts the socket on a queue for your next accept() to honor) and the accept() exceeds the peer’s timeout period, your accept() will fail because the connection in the queue has gone away in the meantime.

These kinds of things can be analyzed in a Wireshark trace. If it’s that kind of thing, it’s benign and can safely be ignored.