FreeRTOS-Plus-TCP port: poor TCP throughput

Hi,
I am porting the FreeRTOS-Plus-TCP stack (git tag V3.1.0) to my Ethernet driver. For testing I am using the iperf port I got from the community,
from the forum post freertos-tcp-iperf3-server.

In my test on a 100 Mbps link, I get 95 Mbps for UDP but only 70 Mbps for TCP. I want to improve this and get TCP throughput similar to UDP.

From my debugging, I suspect that delayed ACKs might be the cause. To test this, I edited the stack (just as an experiment). This seemed to help, and I was able to get around 94 Mbps throughput.

edit:

diff --git a/source/FreeRTOS_TCP_Transmission.c b/source/FreeRTOS_TCP_Transmission.c
index a113d8f9..de58f633 100644
--- a/source/FreeRTOS_TCP_Transmission.c
+++ b/source/FreeRTOS_TCP_Transmission.c
@@ -1362,7 +1362,9 @@
    /* Normally a delayed ACK should wait 200 ms for a next incoming
     * packet.  Only wait 20 ms here to gain performance.  A slow ACK
     * for full-size message. */
-   pxSocket->u.xTCP.usTimeout = ( uint16_t ) pdMS_TO_TICKS( tcpDELAYED_ACK_LONGER_DELAY_MS );
+   /*pxSocket->u.xTCP.usTimeout = ( uint16_t ) pdMS_TO_TICKS( tcpDELAYED_ACK_LONGER_DELAY_MS );*/
+   pxSocket->u.xTCP.usTimeout = ( uint16_t ) tcpDELAYED_ACK_SHORT_DELAY_MS;
+   /*pxSocket->u.xTCP.usTimeout = ( uint16_t ) 5;*/
 
    if( pxSocket->u.xTCP.usTimeout < 1U ) /* LCOV_EXCL_BR_LINE, the second branch will never be hit */
    {

Any suggestions on where I should be looking? Is there a way to fix this without editing the stack code? What am I missing?

@MuhammedZamroodh

Ideally, UDP should be much faster than TCP because there are no packet retransmissions, acknowledgement, or congestion control for UDP. Provided that you are on a 100M connection, TCP throughput of 70Mbps seems normal to me.

As per RFC 9293, section 3.8.6.3 (Delayed Acknowledgments: When to Send an ACK Segment), a delayed ACK must be sent with a delay of less than 0.5 seconds. tcpDELAYED_ACK_LONGER_DELAY_MS is already 20 ms, which is much lower than the maximum limit. Decreasing it further might introduce unnecessary ACKs, wasting bandwidth in real network scenarios, and in some cases might even lower throughput.
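
One detail worth checking when experimenting with this timeout: usTimeout is counted in RTOS ticks, not milliseconds, so the effective delay depends on configTICK_RATE_HZ. The sketch below mirrors the arithmetic of pdMS_TO_TICKS in plain C so it can be run on a host. The helper names (ms_to_ticks, delayed_ack_ticks, effective_delay_ms) are hypothetical, the tick rates are example values, and the clamp to one tick is assumed from the `usTimeout < 1U` check visible in the diff above.

```c
#include <stdint.h>

/* Host-side stand-in for pdMS_TO_TICKS(): ms * tick_rate / 1000, rounded down. */
static uint32_t ms_to_ticks( uint32_t ms, uint32_t tick_rate_hz )
{
    return ( uint32_t ) ( ( ( uint64_t ) ms * tick_rate_hz ) / 1000U );
}

/* Delayed-ACK timeout in ticks, clamped to at least one tick
 * (assumed behaviour of the `usTimeout < 1U` branch in the stack). */
static uint32_t delayed_ack_ticks( uint32_t delay_ms, uint32_t tick_rate_hz )
{
    uint32_t ticks = ms_to_ticks( delay_ms, tick_rate_hz );

    return ( ticks < 1U ) ? 1U : ticks;
}

/* Delay actually obtained, in milliseconds, after the round trip through ticks.
 * tick_rate_hz must be non-zero. */
static uint32_t effective_delay_ms( uint32_t delay_ms, uint32_t tick_rate_hz )
{
    return ( delayed_ack_ticks( delay_ms, tick_rate_hz ) * 1000U ) / tick_rate_hz;
}
```

With a 100 Hz tick, for example, a 2 ms request rounds down to 0 ticks and is clamped to 1 tick, which is 10 ms. The tick rate therefore sets a floor on how short the delayed ACK can be.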

If your use case is just to measure performance via IPERF, I don’t think it’s beneficial to introduce another configuration (ipconfig) macro to make this a user-configurable setting. Do you have any use cases outside of IPERF where changing this value seems necessary?

We have a Linux port for the same hardware, and the reason for my investigation is that we get better results on Linux (around 94 Mbps). Are you sure that 70 Mbps is the expected figure?
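
As a reference point for what "expected" could mean, the ceiling for payload throughput on Ethernet can be estimated from the per-frame overhead. This is only a sketch: it assumes a standard 1500-byte MTU, IPv4 without options, no VLAN tag and no TCP options, and the function names are made up for illustration.

```c
#include <stdint.h>

#define ETH_MTU_BYTES        1500U /* IP packet carried per frame (assumed) */
#define ETH_WIRE_OVERHEAD    38U   /* preamble+SFD 8, header 14, FCS 4, inter-frame gap 12 */
#define IPV4_HEADER_BYTES    20U
#define TCP_HEADER_BYTES     20U   /* no TCP options assumed */
#define UDP_HEADER_BYTES     8U

/* Best-case payload rate in kbit/s: link rate scaled by payload bytes
 * over the bytes each full-size frame occupies on the wire. */
static uint32_t max_goodput_kbps( uint32_t link_kbps, uint32_t l4_header_bytes )
{
    uint32_t payload = ETH_MTU_BYTES - IPV4_HEADER_BYTES - l4_header_bytes;
    uint32_t wire = ETH_MTU_BYTES + ETH_WIRE_OVERHEAD;

    return ( uint32_t ) ( ( ( uint64_t ) link_kbps * payload ) / wire );
}

static uint32_t max_tcp_goodput_kbps( uint32_t link_kbps )
{
    return max_goodput_kbps( link_kbps, TCP_HEADER_BYTES );
}

static uint32_t max_udp_goodput_kbps( uint32_t link_kbps )
{
    return max_goodput_kbps( link_kbps, UDP_HEADER_BYTES );
}
```

For a 100 Mbit/s link this gives roughly 94.9 Mbit/s for TCP and 95.7 Mbit/s for UDP, so under these assumptions the gap between the two protocols at line rate is below 1 Mbit/s.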

Please share your iPerf commands and results. In my experience iPerf is often used wrongly and then gives misleading information.

Apart from that, I am also only getting around 300Mbit/s TCP throughput on a 1000M interface with FreeRTOS+TCP.

Besides seeing a PCAP of the iPerf session, I would like to hear on what platform you ran FreeRTOS, and on what hardware you ran Linux.
Please compress the PCAP well. You should have the right to upload a file by now.

I have an STM32H7, also at 100 Mbps, which also shows high performance. But I cannot tell what is really different. Faster memories, perhaps?

I would NOT dare to play with the delayed ACK times. Let’s have a look first at what is happening (the PCAP).

EDIT
Don’t forget to have a look at FREERTOS_SO_WIN_PROPERTIES, a socket option by which you determine the size of the sliding window, and also the maximum buffer size the socket will allocate.
The defaults may give a poor performance.

pcap.7z (346.6 KB)
I am attaching the PCAP here. I can see some packet loss. Do you see anything suspicious?

I played with FREERTOS_SO_WIN_PROPERTIES and it helped.
The following configuration seems to bring the throughput to 86 Mbps:

#ifndef IPERF_TASK_H_

#define IPERF_TASK_H_

#include <FreeRTOSConfig.h>

#define ipconfigIPERF_PRIORITY_IPERF_TASK ( configMAX_PRIORITIES - 3 )

#define USE_IPERF                               1
//#define ipconfigIPERF_DOES_ECHO_UDP             0

#define ipconfigIPERF_VERSION                   3
#define ipconfigIPERF_STACK_SIZE_IPERF_TASK     (configMINIMAL_STACK_SIZE + 680)

#define ipconfigIPERF_TX_BUFSIZE                ( 256 * ipconfigTCP_MSS )
#define ipconfigIPERF_TX_WINSIZE                ( 128 )
#define ipconfigIPERF_RX_BUFSIZE                ( 256 * ipconfigTCP_MSS )
#define ipconfigIPERF_RX_WINSIZE                ( 128 )

/* The iperf module declares a character buffer to store its send data. */
#define ipconfigIPERF_RECV_BUFFER_SIZE          ( 16 * ipconfigTCP_MSS )

#define ipconfigIPERF_USE_ZERO_COPY 0

void vIPerfInstall( void );

#endif

Have you tried enabling zero copy for IPERF?

Yes, enabling ipconfigIPERF_USE_ZERO_COPY could make iPERF’s reception run faster.

About the TCP window size: normally, when WIN increases, more packets can be sent “in one go”. An acknowledgement is only expected after a whole batch of packets of 1460 bytes each.
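
A rough way to see how the window bounds throughput: with at most W bytes in flight per round trip, throughput cannot exceed W / RTT. The sketch below is illustrative only. The round-trip time is an assumed input that has to be measured on the actual network, and the helper names are hypothetical.

```c
#include <stdint.h>

#define MSS_BYTES    1460U /* full-size segment payload */

/* Upper bound on throughput in kbit/s when at most `win_segments`
 * full-size segments may be in flight per round trip.
 * rtt_us must be non-zero. */
static uint64_t window_limited_kbps( uint32_t win_segments, uint32_t rtt_us )
{
    uint64_t bits_per_rtt = ( uint64_t ) win_segments * MSS_BYTES * 8U;

    /* bits per microsecond equals Mbit/s; times 1000 gives kbit/s. */
    return ( bits_per_rtt * 1000U ) / rtt_us;
}

/* Smallest window, in segments, that can keep a link of `link_kbps` busy
 * for the given round-trip time (the bandwidth-delay product, rounded up). */
static uint32_t segments_to_fill_link( uint32_t link_kbps, uint32_t rtt_us )
{
    /* link_kbps * rtt_us / 1000 is bits in flight; divide by 8 for bytes. */
    uint64_t bdp_bytes = ( ( uint64_t ) link_kbps * rtt_us ) / 8000U;

    return ( uint32_t ) ( ( bdp_bytes + MSS_BYTES - 1U ) / MSS_BYTES );
}
```

With an assumed round trip of 1 ms, about 9 segments in flight already cover a 100 Mbit/s link, while a window of 2 segments would cap throughput near 23 Mbit/s.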

Two remarks:

  1. When the peer is on the Internet, do not use such long WIN sizes. Much safer to use a few outstanding packets.
  2. Also on a LAN I wouldn’t use such long sizes. Have a try with smaller WIN sizes: it uses much less RAM and the performance doesn’t really decrease.
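
To put numbers on the RAM cost: the stream buffer sizes in the configuration posted earlier are given in bytes as multiples of ipconfigTCP_MSS. The sketch assumes an MSS of 1460 bytes and that each connected socket allocates one RX and one TX stream buffer of the configured size, ignoring allocator and bookkeeping overhead. The helper names are hypothetical, and the smaller value of 16 segments is an arbitrary example, not a recommendation.

```c
#include <stdint.h>

#define MSS_BYTES    1460U /* assumed value of ipconfigTCP_MSS */

/* Approximate stream-buffer RAM for one socket whose RX and TX buffers hold
 * `rx_segments` and `tx_segments` full-size segments respectively. */
static uint32_t socket_buffer_bytes( uint32_t rx_segments, uint32_t tx_segments )
{
    return ( rx_segments + tx_segments ) * MSS_BYTES;
}

/* RAM saved per socket when both buffers shrink from `big_segments`
 * to `small_segments`. */
static uint32_t ram_saved_bytes( uint32_t big_segments, uint32_t small_segments )
{
    return socket_buffer_bytes( big_segments, big_segments )
           - socket_buffer_bytes( small_segments, small_segments );
}
```

Two 256-segment buffers come to 747,520 bytes per socket, against 46,720 bytes for two 16-segment buffers.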

If your device has something like 220 outstanding packets, things will become very difficult when one packet is missing.

Have you optimised your application, using -Os or -O2? If you want, attach your FreeRTOSConfig.h as well.

I just looked at iperf using zero-copy, I think it needs a change:

See the latest iperf_task_v3_0h.c.

This changed:

+const BaseType_t xRecvSize = 0x10000;
 xRecvResult = FreeRTOS_recv( pxClient->xServerSocket, /* The socket. */
                              &pcRecvBuffer,           /* A pointer to a pointer */
-                             sizeof pcRecvBuffer,     /* Any size is OK here. */
+                             xRecvSize,               /* Any size is OK here. */
                              FREERTOS_ZERO_COPY );

Here, pcRecvBuffer is not a real buffer but a pointer. The variable will point to an internal stream buffer of the TCP socket.

The received message will be freed in this call:

FreeRTOS_recv( pxClient->xServerSocket, /* The socket being received from. */
               NULL,                    /* The buffer isn't used. */
               xRecvResult,             /* This is important now. */
               0 );
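
The two-step pattern (borrow a pointer, then release the bytes) can be illustrated with a toy model that runs on a host. Everything in it is hypothetical: ToyStream_t, toy_push, toy_recv_zero_copy and toy_release are made-up stand-ins for the socket's internal stream buffer and for the two FreeRTOS_recv() calls, not the library's implementation. It also shows why the length argument matters, assuming the call never returns more than the length it is given.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a socket's internal RX stream buffer (linear, not circular). */
typedef struct
{
    uint8_t data[ 64 ];
    size_t head; /* index of the next unread byte */
    size_t tail; /* index of the next free byte */
} ToyStream_t;

/* Store incoming bytes as a driver would; returns the number actually stored. */
static size_t toy_push( ToyStream_t * s, const void * src, size_t len )
{
    size_t space = sizeof( s->data ) - s->tail;

    if( len > space )
    {
        len = space;
    }

    memcpy( &s->data[ s->tail ], src, len );
    s->tail += len;
    return len;
}

/* Step 1: hand out a pointer into the internal buffer, without copying.
 * At most max_len bytes are reported, so a tiny max_len means tiny reads. */
static size_t toy_recv_zero_copy( ToyStream_t * s, uint8_t ** buffer, size_t max_len )
{
    size_t available = s->tail - s->head;

    *buffer = &s->data[ s->head ];
    return ( available < max_len ) ? available : max_len;
}

/* Step 2: tell the buffer that len bytes were consumed and can be reused. */
static void toy_release( ToyStream_t * s, size_t len )
{
    size_t available = s->tail - s->head;

    s->head += ( len < available ) ? len : available;

    if( s->head == s->tail )
    {
        /* Buffer drained: rewind so the full space is usable again. */
        s->head = 0;
        s->tail = 0;
    }
}
```

In this model, asking for `sizeof` a pointer yields only 4 or 8 bytes per call even when more data is waiting, whereas a large length such as 0x10000 returns everything that is available.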

The PCAP you sent shows 44 packets merged into a single packet, as though the packet were 64240 bytes long (44 × 1460), which it isn’t.
Would you know a way of disabling this offloading on the laptop?
Maybe it goes together with checksum offloading?

Hi @htibosch,
Sorry for the late response. I am only revisiting this now.

There are new findings. I want to summarise them and get some input from you.
There is a known issue with the hardware which causes a lot of packet drops. When we run a TCP iperf3 test with the target booted into Linux, we observe the following:

command in host: iperf3 -c 192.168.1.11 -w 128k -l 64k --port 5001 --bytes 1000M

  • there is a large number of packet drops: each periodic throughput report shows 200+ retransmissions
  • the throughput is about 200-240 Mbps in each periodic report
  • the congestion window is about 20 KB
  • overall we get 200+ Mbps throughput

The same command on the host, but with the target running FreeRTOS+TCP:

  • here too there are many packet drops, but fewer: 40-50 retransmissions
  • the throughput in each report interval is 1-1.5 Mbps
  • the congestion window is less than 1 KB
  • overall throughput is 1 Mbps

I am attaching the packet dumps captured with the tcpdump utility here as a zip (it contains 2 files: one captured when profiling Linux and the other for FreeRTOS).
tcp_dumps.zip (230.5 KB)

My thoughts/queries:

  • The congestion window is smaller with the FreeRTOS stack. Is the congestion mitigation algorithm causing this?
  • How can I validate that?
  • What is the best course of action if I expect a lot of packet drops from the hardware?
  • Or is there some other issue, and I am going after the wrong clues?

Before I get into details, would it be possible for you to send a .pcap or whatever type of file can be read by Wireshark? That makes it much easier to understand the sequence of events. Thank you.

I captured the pcap files and am attaching them here:
pcap.7z (1.5 MB)

The logs from iperf3 on the host system:

Linux
iperf3 -c 192.168.1.11 -w 128k -l 64k --port 5001 --bytes 100M
Connecting to host 192.168.1.11, port 5001
[ 5] local 192.168.1.100 port 33696 connected to 192.168.1.11 port 5001
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 28.8 MBytes 242 Mbits/sec 462 8.48 KBytes
[ 5] 1.00-2.00 sec 29.5 MBytes 248 Mbits/sec 478 11.3 KBytes
[ 5] 2.00-3.00 sec 30.0 MBytes 252 Mbits/sec 450 8.48 KBytes
[ 5] 3.00-3.40 sec 11.7 MBytes 243 Mbits/sec 157 14.1 KBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-3.40 sec 100 MBytes 247 Mbits/sec 1547 sender
[ 5] 0.00-3.41 sec 99.8 MBytes 246 Mbits/sec receiver

FreeRTOS
iperf3 -c 192.168.1.11 -w 128k -l 64k --port 5001 --bytes 100M
Connecting to host 192.168.1.11, port 5001
[ 5] local 192.168.1.100 port 44982 connected to 192.168.1.11 port 5001
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 512 KBytes 4.19 Mbits/sec 53 8.55 KBytes
[ 5] 1.00-2.00 sec 271 KBytes 2.22 Mbits/sec 30 2.85 KBytes
[ 5] 2.00-3.00 sec 257 KBytes 2.10 Mbits/sec 30 2.85 KBytes
[ 5] 3.00-4.00 sec 171 KBytes 1.40 Mbits/sec 26 4.28 KBytes
[ 5] 4.00-5.00 sec 257 KBytes 2.10 Mbits/sec 31 5.70 KBytes
[ 5] 5.00-6.00 sec 171 KBytes 1.40 Mbits/sec 30 2.85 KBytes
[ 5] 6.00-7.00 sec 271 KBytes 2.22 Mbits/sec 34 2.85 KBytes
[ 5] 7.00-8.00 sec 257 KBytes 2.10 Mbits/sec 34 1.43 KBytes
[ 5] 8.00-9.00 sec 171 KBytes 1.40 Mbits/sec 24 4.28 KBytes
[ 5] 9.00-10.00 sec 171 KBytes 1.40 Mbits/sec 30 2.85 KBytes
[ 5] 10.00-11.00 sec 171 KBytes 1.40 Mbits/sec 31 2.85 KBytes
[ 5] 11.00-12.00 sec 171 KBytes 1.40 Mbits/sec 32 2.85 KBytes
[ 5] 12.00-13.00 sec 257 KBytes 2.10 Mbits/sec 38 1.43 KBytes
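
A rough cross-check on these logs, assuming the steady-state approximation throughput ≈ cwnd / RTT holds (it is only an approximation, and the Cwnd column is a momentary sample): dividing the reported window by the reported bitrate gives the time that one window effectively takes. The helper name is hypothetical. The inputs are taken from the per-second reports above, with KBytes converted at 1024 bytes.

```c
#include <stdint.h>

/* Time that one congestion window effectively takes, in microseconds:
 * cwnd_bytes * 8 bits divided by the bitrate. With the bitrate in kbit/s
 * this is cwnd_bytes * 8000 / bitrate_kbps. bitrate_kbps must be non-zero. */
static uint32_t implied_window_time_us( uint32_t cwnd_bytes, uint32_t bitrate_kbps )
{
    return ( uint32_t ) ( ( ( uint64_t ) cwnd_bytes * 8000U ) / bitrate_kbps );
}
```

For the first Linux interval (8.48 KBytes, about 8684 bytes, at 242 Mbits/sec) this gives roughly 0.29 ms per window. For the second FreeRTOS interval (2.85 KBytes, about 2918 bytes, at 2.22 Mbits/sec) it gives roughly 10.5 ms. If the approximation is anywhere near right, the FreeRTOS connection spends most of its time waiting rather than sending, which would point at the recovery from lost packets more than at the window size itself.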

A short response:

I am curious if the slowness is also seen when you add the -R option, which will receive from the server.

Also, I wonder if TCP Segmentation Offload (TSO) or Large Send Offload (LSO) plays a role here?

It looks like none of the offloaded packets are seen by your DUT, so the communication becomes extremely slow: all transmissions depend on SACKs.

Could you switch off TSO/LSO?

And once more the question: what platform are you using? Does your NIC support TSO/LSO?

Regards,

  • I will try -R and get back to you.
  • The target we use has an XGMAC from Synopsys.
  • Yes, we can turn off TSO. We use a USB-to-Ethernet converter, and the Linux host system was able to control TSO using ethtool. I am attaching the latest capture with TSO off:
    pcap_tso_off.7z (261.8 KB)

Also, this didn’t improve the throughput.

I tried running the TCP test with -R.
But to get it working, I had to do some modifications in the code.

I commented out some lines, see the patch below:

diff --git a/FreeRTOS/Demo/CORTEX_A55_AGILEX_5_SOC/main_freertosplus_basic/iperf_task_v3_0d.c b/FreeRTOS/Demo/CORTEX_A55_AGILEX_5_SOC/main_freertosplus_basic/iperf_task_v3_0d.c
index 9eb714f1..915cdfda 100644
--- a/FreeRTOS/Demo/CORTEX_A55_AGILEX_5_SOC/main_freertosplus_basic/iperf_task_v3_0d.c
+++ b/FreeRTOS/Demo/CORTEX_A55_AGILEX_5_SOC/main_freertosplus_basic/iperf_task_v3_0d.c
@@ -313,21 +313,21 @@ BaseType_t xResult = 0;
 	{
 		size_t uxMaxSpace = (size_t) FreeRTOS_tx_space( pxClient->xServerSocket );
 		size_t uxSize = (size_t)FreeRTOS_min_uint32( uxMaxSpace, ( int32_t )sizeof( pcSendBuffer ) );
-		if( pxClient->bits.bTimed != pdFALSE_UNSIGNED )
-		{
-			if( xTaskCheckForTimeOut( &( pxClient->xTimeOut ), &( pxClient->xRemainingTime ) ) != pdFALSE )
-			{
-				/* Time is up. */
-				if( pxClient->bits.bTimedOut == pdFALSE_UNSIGNED )
-				{
-					FreeRTOS_shutdown( pxClient->xServerSocket, FREERTOS_SHUT_RDWR );
-					pxClient->bits.bTimedOut = pdTRUE_UNSIGNED;
-				}
-				break;
-			}
-		}
-
-		uxSize = FreeRTOS_min_uint32( uxSize, pxClient->ulAmount );
+		/*if( pxClient->bits.bTimed != pdFALSE_UNSIGNED )*/
+		/*{*/
+			/*if( xTaskCheckForTimeOut( &( pxClient->xTimeOut ), &( pxClient->xRemainingTime ) ) != pdFALSE )*/
+			/*{*/
+				/*[> Time is up. <]*/
+				/*if( pxClient->bits.bTimedOut == pdFALSE_UNSIGNED )*/
+				/*{*/
+					/*FreeRTOS_shutdown( pxClient->xServerSocket, FREERTOS_SHUT_RDWR );*/
+					/*pxClient->bits.bTimedOut = pdTRUE_UNSIGNED;*/
+				/*}*/
+				/*break;*/
+			/*}*/
+		/*}*/
+
+		/*uxSize = FreeRTOS_min_uint32( uxSize, pxClient->ulAmount );*/
 		if( uxSize <= 0 )
 		{
 			break;

With that I was getting a throughput measurement of about 220 Mbps.

Thank you Muhammed for further testing.

Although the performance hasn’t increased dramatically, things are starting to go better.
What surprises me now, is that some of the RX packets just don’t arrive!

Packet 18: send   37 bytes seq   1 arrived
Packet 20: send 1460 bytes seq  38 gets lost

Why would that be? The quality of your network is good, as we see when Linux is using it.
Packets may get lost when the DUT is lacking resources, or when the DUT is too slow.
At what data speeds did you configure the EMAC in case of FreeRTOS?

So you use a Synopsys Ethernet XGMAC IP, and the platform is presumably an A55.

Can you show the network interface that you are using? Did you write it yourself? Is it public?

Now when there are dropped packets, I think it is better to solve that first, rather than changing the TCP congestion behaviour/parameters.

Could there be problems caused by the A55 data cache?
Do you have logging enabled (FreeRTOS_printf() and FreeRTOS_debug_print())?