FreeRTOS+TCP retransmission timer incorrect?

Short introduction:
We have an embedded device, running FreeRTOS with FreeRTOS+TCP.
The embedded device communicates with an application on a Windows 10 PC.

Communication uses a proprietary protocol running on TCP/IP.
The embedded device is the client, the Windows 10 PC the server.

This proprietary protocol carries application data and, if there is no application data, a proprietary ‘live’ message is sent every 30 seconds to show that the device is running correctly.

My problem is that this ‘live’ message is retransmitted each time by FreeRTOS+TCP.
I found out that the ‘SRTT’ is calculated in FreeRTOS_TCP_WIN.c and used to detect if a retransmission is needed.
However, in my case this ‘SRTT’ is calculated and tuned when both parties are exchanging a lot of data at startup. The ‘SRTT’ gets a value of about 70ms.

After exchanging a lot of data there are only ‘live’ messages.

The embedded device sends the ‘live’ message, but Windows 10 only sends the TCP/IP ACK for it after 200ms.
This is because of the default delayed TCP/IP ACK of 200ms in Windows 10. Note: we don’t want to change this default value.

Because the ‘SRTT’ timer is only 70ms, FreeRTOS+TCP retransmits the ‘live’ message.
Windows 10 is TCP/IP ACKing after that.
A new ‘SRTT’ isn’t calculated because it is a retransmission? (line 1917 FreeRTOS_TCP_WIN.c).

The next time a ‘live’ message is sent, the same thing happens. The embedded device sends the ‘live’ message. It is retransmitted. Windows 10 TCP/IP ACKs the message, but nothing happens with the internal ‘SRTT’ timer.
This keeps on forever, with each ‘live’ message.
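
The loop described above can be reproduced with a small toy model. This is only a sketch and not FreeRTOS+TCP code: the names `toy_sender_t`, `toy_send_segment` and `toy_count_resends` are hypothetical, the timeout is assumed to be equal to the SRTT, and the 7/8 smoothing weight is an assumption made for illustration. It only shows why a timer that is never re-sampled after a retransmission stays stuck below the peer's ACK delay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names come from FreeRTOS+TCP. */
typedef struct
{
    uint32_t srtt_ms; /* Smoothed RTT, assumed here to be used directly as the timeout. */
} toy_sender_t;

/* Send one segment whose ACK arrives after ack_delay_ms.
 * Returns true when the timeout fires first, i.e. the segment is resent. */
static bool toy_send_segment( toy_sender_t * sender, uint32_t ack_delay_ms )
{
    assert( sender != NULL );

    bool retransmitted = ( ack_delay_ms > sender->srtt_ms );

    if( !retransmitted )
    {
        /* Only a segment that was sent exactly once yields an RTT sample.
         * The 7/8 weighting is an assumption for this sketch. */
        sender->srtt_ms = ( ( 7u * sender->srtt_ms ) + ack_delay_ms ) / 8u;
    }

    /* After a retransmission srtt_ms is left untouched, so nothing adapts. */
    return retransmitted;
}

/* Count how many of n identical 'live' messages end up being resent. */
static unsigned toy_count_resends( uint32_t start_srtt_ms,
                                   uint32_t ack_delay_ms,
                                   unsigned n )
{
    toy_sender_t sender = { start_srtt_ms };
    unsigned resends = 0u;

    for( unsigned i = 0u; i < n; i++ )
    {
        if( toy_send_segment( &sender, ack_delay_ms ) )
        {
            resends++;
        }
    }

    return resends;
}
```

Calling `toy_count_resends( 70u, 200u, 10u )` models the situation in this post: an SRTT of about 70ms against a 200ms delayed ACK, where every ‘live’ message is resent.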

Wireshark logging, where ‘172.16.43.1’ is the embedded device and ‘172.16.128.113’ is the Windows 10 PC.
Packet 64 is the live packet, packet 65 is the retransmission and packet 66 is the TCP/IP ACK of Windows 10.
64 12.538979 172.16.43.1 172.16.128.113 TCP 99 42832→15042 [PSH, ACK] Seq=1 Ack=46 Win=1460 Len=45 2021-11-19 12:27:55.754532
65 12.692310 172.16.43.1 172.16.128.113 TCP 99 [TCP Retransmission] 42832→15042 [PSH, ACK] Seq=1 Ack=46 Win=1460 Len=45 2021-11-19 12:27:55.907863
66 12.692700 172.16.128.113 172.16.43.1 TCP 66 15042→42832 [ACK] Seq=46 Ack=46 Win=32766 Len=0 SLE=1 SRE=46 2021-11-19 12:27:55.908253

Note that when the ‘SRTT’ timer is large, and there is no retransmission, it is seen clearly that Windows 10 is TCP/IP ACKing after 200ms.

I did a quick read of RFC6298 about the RTO (retransmission timeout) and SRTT.
If I understand it correctly, the RTO should never become less than 1 second.
Section 2, rule (2.4): Whenever RTO is computed, if it is less than 1 second, then the RTO SHOULD be rounded up to 1 second.
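
To make the RFC rule concrete, here is a minimal sketch of the RTO calculation from RFC 6298 section 2 (alpha = 1/8, beta = 1/4, K = 4, 1 second minimum), using integer milliseconds. The type and function names (`rfc6298_state_t`, `rfc6298_init`, `rfc6298_on_sample`) are hypothetical, and integer division rounds down, so results can differ slightly from a floating-point implementation. This is not how FreeRTOS+TCP computes its timer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RFC6298_MIN_RTO_MS    1000 /* Rule (2.4): round RTO up to 1 second. */

/* Hypothetical state holder for this sketch. */
typedef struct
{
    bool has_sample;
    int32_t srtt_ms;
    int32_t rttvar_ms;
    int32_t rto_ms;
} rfc6298_state_t;

/* Rule (2.1): until an RTT measurement is made, RTO is 1 second. */
static void rfc6298_init( rfc6298_state_t * s )
{
    assert( s != NULL );
    s->has_sample = false;
    s->srtt_ms = 0;
    s->rttvar_ms = 0;
    s->rto_ms = RFC6298_MIN_RTO_MS;
}

/* Feed one RTT sample r_ms, taken from a segment that was NOT retransmitted.
 * granularity_ms is the clock granularity G from the RFC. */
static void rfc6298_on_sample( rfc6298_state_t * s, int32_t r_ms, int32_t granularity_ms )
{
    assert( s != NULL );

    if( !s->has_sample )
    {
        /* Rule (2.2): first measurement. */
        s->srtt_ms = r_ms;
        s->rttvar_ms = r_ms / 2;
        s->has_sample = true;
    }
    else
    {
        /* Rule (2.3): RTTVAR is updated first, using the old SRTT. */
        int32_t err = s->srtt_ms - r_ms;

        if( err < 0 )
        {
            err = -err;
        }

        s->rttvar_ms = ( ( 3 * s->rttvar_ms ) + err ) / 4;
        s->srtt_ms = ( ( 7 * s->srtt_ms ) + r_ms ) / 8;
    }

    int32_t margin = 4 * s->rttvar_ms;

    if( margin < granularity_ms )
    {
        margin = granularity_ms;
    }

    s->rto_ms = s->srtt_ms + margin;

    if( s->rto_ms < RFC6298_MIN_RTO_MS )
    {
        s->rto_ms = RFC6298_MIN_RTO_MS;
    }
}
```

With RTT samples in the range seen here (70ms to 200ms), the computed value stays well below 1 second, so under the RFC rule the RTO would simply be 1000ms and a 200ms delayed ACK would never trigger a retransmission.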

I have also seen that I can change the define ‘winSRTT_CAP_mS’ from 50ms to a higher value, but I don’t like changing third-party source code.

So my question is: does FreeRTOS+TCP have the correct behavior with respect to retransmission timers, or should the RTO always have a minimum value of 1 second, as the RFC describes?
If the behavior is correct and FreeRTOS+TCP is fine, what is the best way to fix our problem? Changing ‘winSRTT_CAP_mS’ to a higher value that is tuned to the Windows 10 timing, or another solution?

Before I read your entire message, could you attach a PCAP file that shows the communication?
When you are new on this forum, you might not be allowed to attach files.
In that case can you send it to me directly: hein [at] htibosch [dot] net ?

Here are two PCAP captures.
The first one, FreeRTOSTCPRetransmissions.pcap, contains the retransmission behavior mentioned in my first post.

The second one, FreeRTOSTCPNoRetransmissionsWithLargerSRTT_CAP_mS.pcap, was created with a larger value for the define ‘winSRTT_CAP_mS’, which does solve the problem. It was changed from the original 50ms to 200ms.
As I said in my first post, this is a solution, but I don’t like changing third-party library software, and I also think the current TCP/IP retransmit time calculation in FreeRTOS+TCP isn’t correct.

Here is detailed info about the captures. (p) means packet number in the capture.
‘172.16.43.1’ is the (embedded) device and ‘172.16.128.113’ is the Windows 10 PC server.
Please note that all data is encrypted, so you cannot use the data content, only the flow.

–[FreeRTOSTCPRetransmissions.pcap]–
(p0-p110) device starts up and exchanges a lot of application data with the server. Internally, FreeRTOS+TCP calculates the ‘SRTT’ timer and sets it to about 70ms.
(p113) first ‘live’ message device → server (because of 30seconds no application data)
(p114) resend of first ‘live’ message (This is the problem I am talking about!)
(p115) TCP/IP ACK of the server
(p116) first ‘live’ message server → device (because of 30seconds no application data)
(p117) TCP/IP ACK of device
(p120) second ‘live’ message
(p121) resend of second ‘live’ message (This is the problem I am talking about!)
(p122) TCP/IP ACK of the server
and so it continues…
(p128) ‘live’ message, (p129) resend of ‘live’ message, (p130) TCP/IP ACK server
(p133) ‘live’ message, (p134) resend of ‘live’ message, (p135) TCP/IP ACK server
(p141) ‘live’ message, (p142) resend of ‘live’ message, (p143) TCP/IP ACK server
etc, etc…

–[FreeRTOSTCPNoRetransmissionsWithLargerSRTT_CAP_mS.pcap]–
(p0-p123) device starts up and exchanges a lot of application data with the server in a short time. Internally, FreeRTOS+TCP calculates the ‘SRTT’ timer and sets it to about 70ms.
(p126) first ‘live’ message device → server (because of 30seconds no application data)
(p127) TCP/IP ACK of the server, delayed ACK 200ms after receiving above ‘live’ message.
(p128) first ‘live’ message server → device (because of 30seconds no application data)
(p129) TCP/IP ACK of device
(p132) second ‘live’ message
(p133) TCP/IP ACK of the server, delayed ACK 200ms after receiving above ‘live’ message.
and so it continues…
(p137) ‘live’ message, (p138) TCP/IP ACK server
(p143) ‘live’ message, (p144) TCP/IP ACK server
(p150) ‘live’ message, (p151) TCP/IP ACK server
etc, etc…

Note: I indeed cannot upload files, so I will send them by email.
Maybe you can attach them in case other users want to see the captures.

Please find attached the ZIP file from @nlangenb:

FreeRTOSTCPRetransmissionLogging.zip (23.0 KB)

Possibly in your next post you will be able to attach ZIP files.

First of all, thank you very much for reporting this, and thanks for the analysis.

You wrote:

A new ‘SRTT’ isn’t calculated because it is a retransmission? (line 1917 FreeRTOS_TCP_WIN.c).

That is right: the response time is only measured for data that was sent once, not for retransmitted data.
RFC6298 says: “RTT samples MUST NOT be made using segments that were retransmitted”

I did a quick read of RFC6298 about the RTO (retransmission timeout) and SRTT. If I understand it correctly the RTO should never become less than 1 second. Chapter 2, 2.4: Whenever RTO is computed, if it is less than 1 second, then the RTO SHOULD be rounded up to 1 second.

The reason we allowed quicker retransmissions was that some projects would show a “hiccup” when a single packet got lost. In one such project, each press of a button is transmitted, and when the button is turning up the volume, you could damage your ears :-)

You are right about the minimum of 1 second: “Research suggests that a large minimum RTO is needed to keep TCP conservative and avoid spurious retransmissions [AP99].”

It is also perfectly normal that the Windows host sends a delayed ACK after 200 ms. FreeRTOS+TCP also delays the sending of TCP ACKs, but using a shorter time of 50 ms ( see tcpDELAYED_ACK_LONGER_DELAY_MS ).

Possible solution:

I never like to make a change that affects existing projects, so the default of winSRTT_CAP_mS should stay the same.

Let’s instead add a new option called ipconfigTCP_SRTT_MINIMUM_VALUE_MS, with a default value of 50 ms.
It can be set in the project’s FreeRTOSIPConfig.h, and it will be copied to the existing macro in FreeRTOS_TCP_WIN.c:

#define winSRTT_CAP_mS  ipconfigTCP_SRTT_MINIMUM_VALUE_MS /**< Cap in milliseconds. */
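
As a sketch of how this proposal could look from the project side: the value 200 is only an example chosen to match the Windows 10 delayed ACK discussed in this thread, the `#ifndef` defaulting pattern is an assumption about how the default of 50 ms would be supplied, and `example_apply_srtt_floor` is a hypothetical helper that assumes the cap is applied as a lower bound on the SRTT. Check the actual change in the library before relying on any of this.

```c
#include <assert.h>
#include <stdint.h>

/* In the project's FreeRTOSIPConfig.h: example value for a peer that
 * delays its ACKs by 200 ms. */
#define ipconfigTCP_SRTT_MINIMUM_VALUE_MS    200

/* Assumed defaulting pattern on the library side: only used when the
 * project does not define the option itself. */
#ifndef ipconfigTCP_SRTT_MINIMUM_VALUE_MS
    #define ipconfigTCP_SRTT_MINIMUM_VALUE_MS    50
#endif

/* In FreeRTOS_TCP_WIN.c, as proposed above. */
#define winSRTT_CAP_mS    ipconfigTCP_SRTT_MINIMUM_VALUE_MS /**< Cap in milliseconds. */

/* Hypothetical helper, not library code: shows the assumed effect of the
 * cap, namely that a measured SRTT never drops below it. */
static int32_t example_apply_srtt_floor( int32_t measured_srtt_ms )
{
    assert( measured_srtt_ms >= 0 );

    if( measured_srtt_ms < winSRTT_CAP_mS )
    {
        return winSRTT_CAP_mS;
    }

    return measured_srtt_ms;
}
```

With this setting, an SRTT measured at about 70 ms during the startup burst would be held at 200 ms instead, which matches the second capture where the retransmissions disappear.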

Would that be OK for you?

Thank you for your investigation on this issue.

I understand that FreeRTOS+TCP is aimed more at embedded projects and therefore has a quicker retransmission. That is perfectly fine if both parties are running FreeRTOS+TCP. The problem arises if the other party has a delayed ACK larger than 50ms, like Windows 10 with 200ms.

For us, it would be great if it’s configurable within FreeRTOSIPConfig.h. We can tweak it for our project with a Windows 10 server and we don’t have to change the original 3rd party source code.

Can you indicate when this change will be made and for which release(s)?
Currently we are on FreeRTOS LTS 202012.02. I just saw FreeRTOS LTS 202012.03 is released.

Thank you in advance

I will first make the change in the main branch, and when that is accepted we can apply the same change to the LTS release.

One minor remark: you are sending packets as a sign of life.
Why doesn’t the server reply to them, for example by sending an OK? That would avoid the timeout problem. Just an idea.

Thanks again!

If it’s accepted in the main branch I will copy those changes to our project. That way, if it’s ever integrated in the LTS release, the define names and code changes will be the same.

The reason we didn’t make the server reply is that, at the time, we decided there should be no difference between server and client at that level (application layer). Of course there is a ‘client’ and a ‘server’ when initiating the connection, but after that both parties are equal and each sends its own ‘live’ message, which the other party has to check. The simple reason there is no ‘OK’ response to a ‘live’ message is to save bandwidth. If there had been one, we would indeed not have run into this issue with FreeRTOS+TCP.
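
For readers who want to picture the symmetric scheme, here is a minimal sketch of the bookkeeping each side could run. It is not our actual protocol code: `live_state_t`, `live_should_send` and `live_peer_alive` are hypothetical names, the 30 second interval is the one mentioned in this thread, and the peer timeout is a made-up parameter because it is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LIVE_INTERVAL_MS    30000u /* 30 seconds without application data. */

/* Hypothetical per-connection state; both client and server keep one. */
typedef struct
{
    uint32_t last_tx_ms; /* Time of the last message we sent (data or 'live'). */
    uint32_t last_rx_ms; /* Time of the last message we received from the peer. */
} live_state_t;

/* True when nothing was sent for a full interval, so a 'live' message is due. */
static bool live_should_send( const live_state_t * s, uint32_t now_ms )
{
    assert( s != NULL );
    /* Unsigned subtraction keeps working when the millisecond counter wraps. */
    return ( uint32_t ) ( now_ms - s->last_tx_ms ) >= LIVE_INTERVAL_MS;
}

/* True while the peer has been heard from within peer_timeout_ms.
 * The timeout value is an assumption left to the caller. */
static bool live_peer_alive( const live_state_t * s,
                             uint32_t now_ms,
                             uint32_t peer_timeout_ms )
{
    assert( s != NULL );
    return ( uint32_t ) ( now_ms - s->last_rx_ms ) < peer_timeout_ms;
}
```

Because neither side answers a ‘live’ message at the application layer, the only thing coming back is the TCP ACK, which is why the delayed ACK timing of the peer matters so much in this setup.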

Can you let me know when it’s accepted in the main branch so I can copy that exact change into our software? Or do you advise just checking the main branch in a week or two?

Have you checked PR #387 already?

Just checked it! Looks good! Going to add your changes in our project.

Thanks!