Keep alive understanding

Hello,

Sometimes I see keep-alives in the middle of a running TCP transfer, although I have disabled them (for testing). What triggers those keep-alives, and why do they appear in the middle of an ongoing TCP transfer?

Those keep-alives during TCP transfers are causing an issue on my HTTP server. As soon as such a keep-alive is acknowledged by the client, my server wants to shut down the connection, regardless of whether the client has received the requested data or not.

You can see in the picture that the FIN is out of order: it should occur only after the acknowledgement for the frame with 152 bytes has been received.

As far as I know, the FreeRTOS_send(…) function blocks until the acknowledgement(?). In my case it seems that any acknowledgement unblocks FreeRTOS_send(…), which starts the FIN handshake too early.

@niconrok , thanks a lot for your report about this.

I have also just disabled the keep-alive messages:

#define ipconfigTCP_KEEP_ALIVE      0

and made a telnet connection to the device. But I don’t see the same behaviour.

If your web page doesn’t contain sensitive information, would you mind attaching a zipped PCAP file that shows the complete conversation(s)?

You have probably seen that Wireshark can filter packets, e.g. tcp.port==50718 or tcp.stream eq 1?

Q : does it happen all the time? Or does it only happen right after connecting?

Neither; it happens from time to time when doing HTTP GET requests for JSON objects. I was able to avoid the issue by blocking the task for 1 ms before doing the shutdown, but I would say that is rather a bad solution.

I have zipped 2 PCAP files. One shows the client side and one shows the server side (not captured directly on the server, but I have a sniffer there). Filter for 192.168.1.112.
I am sending you both sides because Wireshark shows that the SYN packets are not in order on the server side. Could that trigger a keep-alive?

Interestingly, the problem only happens with a certain LAN/Wi-Fi topology. It occurs when client and server are connected as follows:
client <-wireless-> router <-cable-> mesh node <-wireless-> mesh node <-cable-> server

It never occurs with this topology, and no keep-alives appear there either:
client <-wireless-> mesh node <-cable-> server

So could a misordering of the SYN packets lead to this behaviour?

I’m happy to provide more information if you need it :) Thanks a lot in advance, htibosch!

TCP_IP.7z (118.1 KB)

@niconrok wrote:

Neither; it happens from time to time when doing HTTP GET requests for JSON objects.

Good choice, working with JSON objects!

Off topic: I have often used JSON objects to create a remote GUI for a device. I used expressions like this:

/audio_set?room=3&chlist=
[
{"name":"Aux-1L","channel":1,"freq":0},
{"name":"Aux-2L","channel":3,"freq":0},
{"name":"Aux-3L","channel":5,"freq":0},
{"name":"Aux-4L","channel":7,"freq":0},
{"name":"MP3-Stream","channel":17,"freq":0},
{"name":"92.00","channel":9,"freq":9200},
{"name":"107.10","channel":9,"freq":10710}
]

I was able to avoid the issue by blocking the task for 1 ms before doing the shutdown, but I would say that is rather a bad solution.

I don’t follow the logic here: right after the SYN you see a message that looks like a keep-alive packet. Why would inserting a delay before the shutdown (at the FIN) solve that?

I am sending you both sides because Wireshark shows that the SYN packets are not in order on the server side. Could that trigger a keep-alive?

Very good to send a PCAP from both sides! Two packets indeed arrive in a different order!!

[image: server]

Interestingly, the problem only happens with a certain LAN/Wi-Fi topology. It occurs when client and server are connected as follows:
client <-wireless-> router <-cable-> mesh node <-wireless-> mesh node <-cable-> server

You probably understand this better than I do :)

So could a misordering of the SYN packets lead to this behaviour?

I think so, one packet arrived too early.

I’m ready if you need more information.

The keep-alive packet seems misplaced, but it is a logical response from the stack.

But is it a problem? Does it disturb the communication? The actual conversation looks OK:

GET /get_pcu_rev_sw HTTP/1.1
Host: 192.168.1.112
Connection: keep-alive
<snip>
Accept-Language: de-DE,de;q=0.9,en-US;q=0.8,en;q=0.7

HTTP/1.1 200 OK
Connection: close
Content-Type: application/json

{
	"pcu_rev_sw":"01000000"
}

Or does it disturb the communication?

Normally, TCP is insensitive to ordering problems. When packet B arrives earlier than packet A, the server will send a Selective ACK (SACK).
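
To illustrate why ordering is usually harmless: a receiver can tell from the sequence number alone whether a segment is the next one, an old one, or one that arrived ahead of a missing segment. Below is a minimal sketch in plain C. The names classify_segment and SegClass_t are made up for this example and are not part of FreeRTOS+TCP; a real stack also takes the segment length and the receive window into account.

```c
#include <stdint.h>

typedef enum
{
    SEG_IN_ORDER, /* starts exactly at the next expected byte */
    SEG_OLD,      /* starts before it: duplicate, retransmission,
                   * or a keep-alive probe (sequence = next - 1) */
    SEG_AHEAD     /* starts after it: an earlier segment is missing */
} SegClass_t;

/* Classify a segment by comparing its sequence number with the next
 * expected one. TCP sequence numbers are modulo 2^32, so the difference
 * is interpreted as a signed 32-bit value to survive wrap-around. */
SegClass_t classify_segment( uint32_t ulExpected, uint32_t ulSequence )
{
    int32_t lDiff = ( int32_t ) ( ulSequence - ulExpected );

    if( lDiff == 0 )
    {
        return SEG_IN_ORDER;
    }

    return ( lDiff < 0 ) ? SEG_OLD : SEG_AHEAD;
}
```

A segment classified as SEG_AHEAD is the case where the receiver would answer with a duplicate ACK or a SACK, so that the sender knows which part is still missing.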

OT: Indeed, JSON is great! In this case the objects are used for production purposes, such as reporting the device serial number, MAC address, revision numbers etc. But on the customer side JSON is also used in the MQTT communication to publish device information, receive commands and even for mass firmware updates.

To come back to the issue: that’s completely right, it all looks OK until the shutdown/FIN handshake:


Both sides keep retransmitting the same message, up to 1000 times. The client retransmits an acknowledgement, while the server wants to shut down and retransmits a FIN. This repeats until the server resets the connection.

From the server’s point of view it looks like this. The client requests the JSON object and the server responds with FreeRTOS_send(…) in my http_computeProdRequest(…) function:

After returning from the function the server starts the fin handshake:

It seems that the server starts the FIN handshake too early, and whenever that happens a keep-alive is involved. If I now block the task for a fixed 1 ms before the shutdown function, everything works fine.

The server doesn’t have to do a shutdown at all. The client makes a request, and when it is done, it will initiate closing the connection.
All the server has to do is keep on reading ( FreeRTOS_recv() ) from the socket until it returns an error.
Can you try that?

I’m not using Content-Length or chunked encoding in the HTTP header; that’s why I’m doing the shutdown, to show the client that it has received all the data I have. According to the RFC that’s a valid approach:
https://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.4.4

I’m using this approach because I don’t have a lot of RAM on my device: I have to send bigger files in fragments, and the complete size is not known at the beginning.

But I see that either Content-Length for small messages/objects like JSON, or chunked encoding for bigger web pages, should be the solution.
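
For reference, chunked encoding fits the low-RAM case because each fragment is framed on its own and the total size is never needed. A rough sketch in plain C of how one fragment could be framed follows. The name http_format_chunk and the macro HTTP_LAST_CHUNK are made up for this example, and the response header would additionally have to announce "Transfer-Encoding: chunked".

```c
#include <stdio.h>
#include <string.h>

/* The terminating chunk that tells the client the body is complete. */
#define HTTP_LAST_CHUNK    "0\r\n\r\n"

/* Frame one fragment of the body as an HTTP/1.1 chunk:
 *     <length in hex> CRLF <data> CRLF
 * Returns the number of bytes written to pcOut, or 0 when the chunk does
 * not fit. A zero-length fragment is refused, because a chunk of size 0
 * would mark the end of the body. */
size_t http_format_chunk( char * pcOut,
                          size_t uxOutSize,
                          const void * pvData,
                          size_t uxLength )
{
    int iHeader;
    size_t uxTotal;

    if( uxLength == 0U )
    {
        return 0U;
    }

    /* snprintf() reports the length it needs even when it truncates. */
    iHeader = snprintf( pcOut, uxOutSize, "%lx\r\n", ( unsigned long ) uxLength );

    if( iHeader < 0 )
    {
        return 0U;
    }

    uxTotal = ( size_t ) iHeader + uxLength + 2U;

    if( uxTotal > uxOutSize )
    {
        return 0U;
    }

    memcpy( pcOut + iHeader, pvData, uxLength );
    memcpy( pcOut + iHeader + uxLength, "\r\n", 2U );

    return uxTotal;
}
```

Each fragment would be framed like this and sent as it becomes available; after the last one, HTTP_LAST_CHUNK is sent and the connection can stay open for the next request.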

Ah, you mean that you use "Connection: close" ?
( I am so used to using "Connection: keep-alive" )

In that case, I recommend that you use the socket option FREERTOS_SO_CLOSE_AFTER_SEND instead of calling FreeRTOS_shutdown(). That avoids calling vTaskDelay().

For example like this:

  if( ( uxBytesSent + uxCount ) == uxFileSize )
  {
      BaseType_t xTrueValue = 1;
      /* Make sure that the next call to FreeRTOS_send() will set
       * the FIN flag in the packet. */
      FreeRTOS_setsockopt( xSocket,
                           0,
                           FREERTOS_SO_CLOSE_AFTER_SEND,
                           ( void * ) &xTrueValue, sizeof( xTrueValue ) );
  }
  BaseType_t rc = FreeRTOS_send( xSocket, pcBuffer, uxCount, 0 );
  if( rc > 0 )
  {
      uxBytesSent += ( size_t ) rc;
  }

With the above code, the last bytes will be sent along with the FIN flag. The client will receive the packet and reply immediately with a FIN+ACK.

I added this option when developing the FTP server. When it sends a file, I want it to finish the connection as quickly as possible. You see it applied here in the FTP server.

Can you try setting the socket option? With this, it is not necessary to call shutdown() anymore.

Yes, exactly, I’m using "Connection: close" at the moment. In the future I will probably change to chunked encoding to have a persistent TCP connection.

I just tried your option recommendation to set “FREERTOS_SO_CLOSE_AFTER_SEND” and that works like a charm! Thanks a lot!

Hi @niconrok, very good! Thanks for reporting back.

I am still curious about the initial reversal of packets. Does that happen more often? And does it disturb the communication?

FYI :
In FreeRTOS+TCP, the application determines the lifetime of a socket. When closesocket() is called, the space is freed immediately, and the connection ceases to exist. When new packets arrive for this port number, the device will reply with a RST packet.

In your application, you will have to take these actions:

  • Set the socket option FREERTOS_SO_CLOSE_AFTER_SEND
  • Send the last bytes
  • Keep on reading from the socket until recv() returns an error
  • Call closesocket() in order to free the space taken by the socket and its buffers.
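
The four steps above can be sketched as follows. To keep the example compilable without the FreeRTOS headers, the socket calls are hidden behind function pointers. Transport_t, send_last_and_close, the uxMaxReads limit and the fake socket are all made up for this sketch; on a target the hooks would wrap FreeRTOS_setsockopt() with FREERTOS_SO_CLOSE_AFTER_SEND, FreeRTOS_send(), FreeRTOS_recv() and FreeRTOS_closesocket().

```c
#include <stddef.h>

/* Hypothetical hooks standing in for the FreeRTOS+TCP socket calls. */
typedef struct
{
    void * pvSocket;
    int ( * pxSetCloseAfterSend )( void * pvSocket );
    long ( * pxSend )( void * pvSocket, const void * pvData, size_t uxLength );
    long ( * pxRecv )( void * pvSocket, void * pvBuffer, size_t uxLength );
    void ( * pxClose )( void * pvSocket );
} Transport_t;

/* Send the final bytes together with the FIN, wait for the peer, then free
 * the socket. Returns what the send hook returned, or -1 when the option
 * could not be set. */
long send_last_and_close( const Transport_t * pxT,
                          const void * pvData,
                          size_t uxLength,
                          unsigned uxMaxReads )
{
    char cScratch[ 32 ];
    long lSent = -1;

    /* 1. Ask the stack to set the FIN flag on the next packet sent. */
    if( pxT->pxSetCloseAfterSend( pxT->pvSocket ) == 0 )
    {
        /* 2. Send the last bytes. */
        lSent = pxT->pxSend( pxT->pvSocket, pvData, uxLength );

        /* 3. Keep reading until recv reports an error (negative value).
         * A return of 0 is treated as a time-out and the loop continues;
         * uxMaxReads is an assumed safety limit for a silent peer. */
        while( ( uxMaxReads-- > 0U ) &&
               ( pxT->pxRecv( pxT->pvSocket, cScratch, sizeof( cScratch ) ) >= 0 ) )
        {
        }
    }

    /* 4. Always free the socket and its buffers. */
    pxT->pxClose( pxT->pvSocket );

    return lSent;
}

/* A fake socket for trying the sequence on a PC: recv returns 0 for
 * iReadsUntilError calls and -1 afterwards, as if the peer had closed. */
typedef struct
{
    int iFinRequested;
    int iReadsUntilError;
    int iClosed;
} FakeSocket_t;

static int prvFakeSetOpt( void * pv )
{
    ( ( FakeSocket_t * ) pv )->iFinRequested = 1;
    return 0;
}

static long prvFakeSend( void * pv, const void * pvData, size_t uxLength )
{
    ( void ) pv;
    ( void ) pvData;
    return ( long ) uxLength;
}

static long prvFakeRecv( void * pv, void * pvBuffer, size_t uxLength )
{
    FakeSocket_t * pxFake = pv;

    ( void ) pvBuffer;
    ( void ) uxLength;
    return ( pxFake->iReadsUntilError-- > 0 ) ? 0 : -1;
}

static void prvFakeClose( void * pv )
{
    ( ( FakeSocket_t * ) pv )->iClosed = 1;
}

Transport_t fake_transport( FakeSocket_t * pxFake )
{
    Transport_t xT = { pxFake, prvFakeSetOpt, prvFakeSend, prvFakeRecv, prvFakeClose };

    return xT;
}
```

The point of the ordering is that the socket is only freed after the peer has had the chance to answer the FIN, so no late packet hits a port that no longer exists.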

In a big OS, however, after closesocket() the socket continues to exist “internally” and the OS will try to deliver or receive the remaining data. You can also see sockets lingering in, e.g., the TIME_WAIT state.

One last detail about FreeRTOS_shutdown(): the parameter xHow is ignored. The IP-stack will deliver all bytes that are queued for transmission. Also, it will wait for any missing incoming data. Only when this is ready, FreeRTOS_recv() will return -pdFREERTOS_ERRNO_ENOTCONN, not connected.

Hi @htibosch, sorry for my late response.

Regarding the reversal of the packets: it does not happen that often, and it only happens if the packets have to go through a router. I assume it is a normal phenomenon, especially if the packets have to make a lot of hops through many routers.

The communication only gets disturbed with the approach I tried first, calling FreeRTOS_shutdown() manually after receiving an acknowledgement from the client, and always when a keep-alive was involved.

Thanks for your action list, I implemented everything as you have suggested with full success.