FreeRTOS+TCP socket: Long delay before successful reconnect

Dear forum users,

I am writing an application that connects my device to an MQTT broker using the FreeRTOS+TCP and coreMQTT libraries. Connecting to the broker over TCP and exchanging MQTT messages works fine, except for one issue.

Below is simplified code of the task that handles the TCP connection.

static void Ethernet_Task_MQTT()
{
	int result = Ethernet_TCPConnect(address, port);
	if ((result == -pdFREERTOS_ERRNO_NONE) || (result == -pdFREERTOS_ERRNO_EISCONN))
	{
		if (Ethernet_MQTTConnect() == 0)
		{
			if (Ethernet_MQTTSubscribe() == 0)
			{
				while(Ethernet_MQTTProcess() == 0)
				{
					/* Send reading frames from queues over MQTT */
					/* Break loop when socket is disconnected (Ethernet_MQTTProcess() != 0) */
				
				}
				Ethernet_MQTTUnsubscribe();
			}
			Ethernet_MQTTDisconnect();
		}
	}

	// If connection not in progress, call disconnect to make sure TCP socket is closed before new connection
	if (result != -pdFREERTOS_ERRNO_EINPROGRESS)
	{
		Ethernet_TCPDisconnect();
	}
}

int Ethernet_TCPConnect(const char* address, const uint16_t port)
{
	assert(address != NULL);

	// Recreate socket for each new connection, socket must be closed before
	networkContext.socket = FreeRTOS_socket(FREERTOS_AF_INET, FREERTOS_SOCK_STREAM, FREERTOS_IPPROTO_TCP);
	assert(networkContext.socket != FREERTOS_INVALID_SOCKET);

	// Set server address
	struct freertos_sockaddr serverAddress;
	serverAddress.sin_port = FreeRTOS_htons(port);
	serverAddress.sin_addr = FreeRTOS_gethostbyname(address);

	assert(networkContext.socket != NULL);
	return FreeRTOS_connect(networkContext.socket, &serverAddress, sizeof(serverAddress));
}

int Ethernet_TCPDisconnect()
{
	if (networkContext.socket != NULL)
	{
		uint8_t data = 0;
		uint8_t timeout_count = 3;

		// Disable read/write operations on socket
		FreeRTOS_shutdown(networkContext.socket, FREERTOS_SHUT_RDWR);

		// Wait for the socket to disconnect (indicated by FreeRTOS_recv()), giving up after a few tries
		while ((FreeRTOS_recv(networkContext.socket, &data, 1, 0) != -pdFREERTOS_ERRNO_EINVAL) && timeout_count--);

		// Close the socket
		return FreeRTOS_closesocket(networkContext.socket);
	}

	return 0;
}

On the first attempt after start-up, the application connects immediately if the server is running. But when the server is stopped (the app exits the “while” loop and tries to reconnect), Ethernet_TCPConnect() returns -pdFREERTOS_ERRNO_ETIMEDOUT multiple times. The connection succeeds after ~2 minutes every time. I close the socket in Ethernet_TCPDisconnect() before every reconnect. I tried with both a Mosquitto broker and a Python server socket, and the result is the same. Does anyone know why the socket cannot reconnect instantly and why it takes so long? Is there a related setting in FreeRTOSIPConfig.h?

Thank you for any help in advance,
Adam

Hello @a-luczak, welcome to the FreeRTOS forums.

I will take a look at this issue. While I do that, I have a few questions for you - which platform are you using? And is there a GitHub repository which I can clone to test out the problem?

Also, do you keep calling Ethernet_TCPConnect as it returns failure? I am asking because it seems to allocate a new socket every time and then attempts to connect. That I think would be undesirable as it would lead to creation of a lot of sockets.

Let me know if I misunderstood something.

Thanks,
Aniruddha

Hello @kanherea, thank you for the response.

Unfortunately, we are using a platform that we designed and developed ourselves. It is based on an STM32F4 connected to an ENC28J60 Ethernet module.

Yes, I keep calling Ethernet_TCPConnect, but each failure is followed by Ethernet_TCPDisconnect, where the socket is closed (and then destroyed by the IP task). I am constantly monitoring my heap usage and it does not rise on connect retries, so I believe that is not the problem. When I comment out Ethernet_TCPDisconnect as a test, sockets are allocated over and over again and memory leaks. Of course you are right: I am aware that the code at this stage lacks guard clauses to prevent a memory leak on socket allocation, and I will add them.
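
A guard clause of the kind mentioned above could look roughly like the sketch below. It is not taken from the actual project: conn_slot_t, conn_slot_reopen and the fake open/close functions are hypothetical names, and the open/close callbacks stand in for wrappers around FreeRTOS_socket() and FreeRTOS_closesocket() so that the idea can be tried on a host PC.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the real socket type and calls, so the
 * sketch compiles off-target. On the device, the callbacks would wrap
 * FreeRTOS_socket() and FreeRTOS_closesocket(). Note that the real
 * FreeRTOS_socket() reports failure with FREERTOS_INVALID_SOCKET, not
 * NULL, so a real wrapper would have to translate that value. */
typedef void *conn_handle_t;
typedef conn_handle_t (*conn_open_fn)(void);
typedef void (*conn_close_fn)(conn_handle_t handle);

typedef struct
{
    conn_handle_t handle; /* NULL while no socket is allocated */
} conn_slot_t;

/* Guard clause: release any socket still held in the slot before
 * allocating a new one, so repeated connect attempts cannot leak.
 * Returns 0 on success, -1 if the allocation failed. */
static int conn_slot_reopen(conn_slot_t *slot, conn_open_fn open_fn, conn_close_fn close_fn)
{
    assert(slot != NULL && open_fn != NULL && close_fn != NULL);

    if (slot->handle != NULL)
    {
        close_fn(slot->handle);
        slot->handle = NULL;
    }

    slot->handle = open_fn();
    return (slot->handle != NULL) ? 0 : -1;
}

/* Fake open/close used only to exercise the sketch on a host PC.
 * They count how many "sockets" are currently alive. */
static int fake_live_count = 0;

static conn_handle_t fake_open(void)
{
    fake_live_count++;
    return &fake_live_count;
}

static void fake_close(conn_handle_t handle)
{
    (void)handle;
    fake_live_count--;
}
```

With this shape, calling conn_slot_reopen() several times in a row leaves at most one live handle, even if the explicit disconnect step was skipped.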

Adam

Hello Adam,

Yes, I missed that detail. Thanks for the clarification.
I will try to replicate the issue. For now I shall try it without an MQTT server, since this seems to be a TCP-specific issue not connected with the MQTT library.
I shall get back to you soon :)

Aniruddha

Yes, it is only related to the TCP connection. Communication using coreMQTT after a successful TCP connection works without any problems.

There are some details I noticed during testing:

  1. The device is able to connect again after ~120 s, and this time seems to be constant.
  2. The long reconnection issue occurs only when the client connection is dropped by the server (e.g. I stop the TCP server application on my PC and run it again). When my device disconnects from the server by calling Ethernet_TCPDisconnect, it is able to connect again immediately.

Thanks for your help,
Adam

What do you do to clean up a broken connection on the client/device side?
I guess you also properly close the socket on a failed (ECONNRESET) recv.

Hello again Adam,

I tried to replicate your problem, without any success I must say: the connection was established immediately.
I used a Zynq board as the server and my Windows PC, running FreeRTOS+TCP, as the client. Note that both sides were using FreeRTOS+TCP. I realize that this setup is the reverse of what you described; maybe that is why my connection succeeded (I find that unlikely but not impossible).

Would you mind running Wireshark on your PC (which is running the server) to see what data, if any, is sent to the server after the TCP device disconnects? That would help us figure out the issue. You can attach the Wireshark file to your reply.

Aniruddha

Hello @hs2,

Every failed/dropped connection is followed by closing and destroying the socket in the function Ethernet_TCPDisconnect. After that, a new socket is created at the same pointer for the new connection.

Is there a Wireshark log you can share?

I also think the next step should be to capture the disconnect / re-connect sequence in Wireshark, since everything you do looks fine so far.

Hello, here is the Wireshark log:
https://www.dropbox.com/s/8n5iiwlmnf6qvu6/wireshark_reconnect_log.pcap?dl=0

192.168.0.101 - client/device
192.168.0.103 - server/PC

Timeline:
11:46:42 - device started and connected successfully
11:47:42 - published a message over MQTT with a command to reconnect; device breaks the while loop and reconnects
11:48:49 - server is stopped and started again immediately
11:50:49 - after 2 minutes, device connected successfully

There are a lot of [TCP port numbers reused] errors when the device disconnects itself and when the server is stopped and drops the connection. What causes this? Should the device change its local port on every reconnection?

Adam

Seems like the TCP server / MQTT broker is just not listening for a new connection for 2 minutes after closing the connection to the client/device. Bug or feature ?
It’s not a client / FreeRTOS+TCP issue since the device tries hard to establish a new connection but doesn’t get a SYN-ACK from the server.

Edit: I wonder why the client local port doesn’t change for the new connection.
Silly question: Your xApplicationGetRandomNumber implementation is working, right ?

The TCP server is listening for new connections; the actual cause is that the device keeps trying to connect from the same local port (as the TCP errors indicate).

@hs2, your “silly” question guided me to my mistake. I used simple rand() pseudo-random number generation in xApplicationGetRandomNumber, and the seed was set at the beginning of the FreeRTOS task that handles the Ethernet connection. It looks like the seed can’t be set inside the task, because every call of xApplicationGetRandomNumber gave me the same number… After moving the seed to the main() function, rand() works properly and the client connects with different ports. Now my client is able to connect instantly after the server is stopped and started again.
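
For reference, the arrangement described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's actual code: App_SeedRandom is a hypothetical helper, the seed value is left to the caller, and the BaseType_t/pdTRUE definitions are stand-ins so the sketch compiles on a host PC (on the device they come from FreeRTOS.h). Keep in mind that rand() is only a pseudo-random generator; a hardware RNG is preferable when one is available.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-ins so the sketch compiles off-target; on the device these
 * come from FreeRTOS.h and must not be redefined. */
typedef long BaseType_t;
#define pdTRUE ((BaseType_t)1)

/* Hypothetical helper: call once from main(), before the scheduler
 * starts and before any task can ask for a random number. The seed
 * should differ between boots; where it comes from is up to the
 * application. */
void App_SeedRandom(unsigned int seed)
{
    srand(seed);
}

/* Callback that FreeRTOS+TCP calls when it needs a random number. */
BaseType_t xApplicationGetRandomNumber(uint32_t *pulNumber)
{
    assert(pulNumber != NULL);

    /* rand() only guarantees 15 random bits per call (RAND_MAX >= 32767),
     * so build the 32-bit value from three calls: 15 + 15 + 2 bits. */
    uint32_t value = (uint32_t)rand() & 0x7FFFu;
    value = (value << 15) | ((uint32_t)rand() & 0x7FFFu);
    value = (value << 2) | ((uint32_t)rand() & 0x3u);

    *pulNumber = value;
    return pdTRUE;
}
```

In main(), App_SeedRandom() would be called once before vTaskStartScheduler(); after that, consecutive calls to xApplicationGetRandomNumber() should return different values.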

Thank you for pointing this out to me.
Adam

Since you’re using an STM32F4 as I do, this is my rand() using the RNG peripheral:
(omitting the includes of the required CMSIS headers)

//! get 32 bit random number - overriding libc::rand() by early linkage
int     rand( )
{
    assert(RCC->AHB2ENR & RCC_AHB2ENR_RNGEN);   // RNG component is (clock) enabled during early init

    decltype(RNG->SR) SR = 0;
    do
    {
        SR = RNG->SR & (RNG_SR_SEIS | RNG_SR_CEIS | RNG_SR_DRDY);
        if ( SR & (RNG_SR_SEIS | RNG_SR_CEIS) ) { RNG->CR = 0; RNG->SR = 0; RNG->CR = RNG_CR_RNGEN; }
    } while ( SR != RNG_SR_DRDY );

    return RNG->DR;
}

Just in case … :)


It will definitely be useful, thanks!