TCP TPUT drops after a few hours, possibly due to sequence number wrap

Hello,

When I run a TCP iperf client, it achieves the expected throughput (TPUT) of 6.5 Mbps for approximately 3 hours. After that, the TPUT drops to ~654 Kbps and eventually to 0, and I see many “prvTCPWindowFastRetransmit” messages in the log.

TCP: active 8066 => 1.1.168.192 port 5001 set ESTAB (scaling 1)
Socket 8066 -> [192.168.1.1]:5001 State eCONNECT_SYN->eESTABLISHED
--->Time = 2149901
--->ulSequenceNumber= 4280409148, ulFirstSequenceNumber= 2580287303, ulFirst= 4280410608
prvTCPWindowFastRetransmit: Requeue sequence number 1700121845 < 1700123305
ICMPv6_recv 133 (ROUTER_SOL) from 0x2000ea50ip to 0x2000ea60ip end-point = 192.168.1.2
ICMPv6_recv 133 (ROUTER_SOL) from 0x2000eca0ip to 0x2000ecb0ip end-point = 192.168.1.2
ICMPv6_recv 133 (ROUTER_SOL) from 0x2000eb30ip to 0x2000eb40ip end-point = 192.168.1.2
--->Time = 12094922
--->ulSequenceNumber= 3968841002, ulFirstSequenceNumber= 2580287303, ulFirst= 3968842462
prvTCPWindowFastRetransmit: Requeue sequence number 1388553699 < 1388555159
--->Time = 12094923
--->ulSequenceNumber= 3968843922, ulFirstSequenceNumber= 2580287303, ulFirst= 3968845382
prvTCPWindowFastRetransmit: Requeue sequence number 1388556619 < 1388558079
--->Time = 12095030
--->ulSequenceNumber= 3968841002, ulFirstSequenceNumber= 2580287303, ulFirst= 3968845382
prvTCPWindowFastRetransmit: Requeue sequence number 1388553699 < 1388558079
--->Time = 12095040
--->ulSequenceNumber= 3968851222, ulFirstSequenceNumber= 2580287303, ulFirst= 3968852682
prvTCPWindowFastRetransmit: Requeue sequence number 1388563919 < 1388565379
--->Time = 12095046
--->ulSequenceNumber= 3968854142, ulFirstSequenceNumber= 2580287303, ulFirst= 3968855602
prvTCPWindowFastRetransmit: Requeue sequence number 1388566839 < 1388568299
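
The numbers in the Requeue lines appear to be the absolute sequence numbers with ulFirstSequenceNumber subtracted, for example 4280409148 - 2580287303 = 1700121845. The sketch below reproduces that arithmetic with unsigned 32-bit subtraction. `relative_seq` is a hypothetical helper written for this illustration, not a function of the stack. Note also that the absolute value logged at Time = 12094922 (3968841002) is lower than the one at Time = 2149901 (4280409148), which suggests the 32-bit sequence number wrapped at least once in between.

```c
#include <stdint.h>

/* Hypothetical helper: offset of a sequence number from the initial one.
 * Unsigned 32-bit subtraction is modulo 2^32, so the offset is still
 * computed correctly after the absolute sequence number has wrapped. */
static uint32_t relative_seq(uint32_t seq, uint32_t first_seq)
{
    return seq - first_seq;
}
```

With the logged values, `relative_seq(3968841002u, 2580287303u)` gives 1388553699, matching the second Requeue line.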

Has anyone run iperf for a long duration with the FreeRTOS+TCP stack? I think this happens when the sequence number wraps around UINT32_MAX. I have tried the same test with the LWIP stack and it works. Any help would be appreciated.
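
For context, TCP sequence numbers are 32 bits wide and are expected to wrap, so they are normally compared modulo 2^32 rather than with a plain `<`. The following is a generic sketch of such a comparison, not the code FreeRTOS+TCP uses; `seq_before` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when sequence number a comes before b, modulo 2^32.
 * Only meaningful while the two values are less than 2^31 apart. */
static bool seq_before(uint32_t a, uint32_t b)
{
    /* The top bit of the modular difference acts as a sign bit. */
    return ((uint32_t)(a - b) & 0x80000000u) != 0u;
}
```

For example, `seq_before(0xFFFFFFF0u, 0x00000010u)` is true even though the plain comparison `0xFFFFFFF0u < 0x00000010u` is false.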

I am using the following configs:

#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    10
#define ipconfigUSE_TCP_WIN                       ( 1 )
#define ipconfigNETWORK_MTU                       1500U
#define ipconfigTCP_WIN_SEG_COUNT                 240
#define ipconfigTCP_RX_BUFFER_LENGTH              ( 20000 )
#define ipconfigTCP_TX_BUFFER_LENGTH              ( 20000 )
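
For orientation, the sketch below works out how many full-size segments fit in the 20000-byte TX buffer. It assumes IPv4 with no TCP options, giving an MSS of 1500 - 40 = 1460 bytes, which matches the 1460-byte steps between ulSequenceNumber and ulFirst in the log above. The `EXAMPLE_` names and functions are hypothetical and not part of FreeRTOSIPConfig.h.

```c
#include <stdint.h>

#define EXAMPLE_MTU            1500u   /* ipconfigNETWORK_MTU from the post */
#define EXAMPLE_IP_TCP_HEADERS 40u     /* assumption: 20-byte IPv4 + 20-byte TCP header */
#define EXAMPLE_TX_BUFFER      20000u  /* ipconfigTCP_TX_BUFFER_LENGTH from the post */

/* Maximum segment size under the assumption above. */
static uint32_t example_mss(void)
{
    return EXAMPLE_MTU - EXAMPLE_IP_TCP_HEADERS;
}

/* Number of full-size segments that fit in the TX buffer. */
static uint32_t example_segments_in_tx_buffer(void)
{
    return EXAMPLE_TX_BUFFER / example_mss();
}
```

Under these assumptions the TX buffer holds 13 full-size segments.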

Thanks.

Hi @sthomas,

Which version of FreeRTOS+TCP library and IPERF implementation are you using? Also, is there any IPERF-specific configuration that you use?

Will try to see if we can reproduce the issue.

Hello @sthomas, it looks like you have sent 4,294,967,296 bytes (2^32) in 3 hours, exceeding the range of uint32_t.
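
As a rough cross-check, the time a 32-bit byte counter takes to wrap follows directly from the throughput. The sketch below assumes a steady rate and that "Mbps" means 10^6 bits per second; `seconds_to_wrap_u32` is a hypothetical helper for illustration.

```c
/* Seconds until a counter of 2^32 bytes wraps at the given bit rate. */
static double seconds_to_wrap_u32(double bits_per_second)
{
    const double bytes_per_wrap = 4294967296.0; /* 2^32 */
    return (bytes_per_wrap * 8.0) / bits_per_second;
}
```

At the 6.5 Mbps reported in the first post this gives roughly 5286 seconds, about 88 minutes, so around two wraps would fit into 3 hours.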

In a later version, the iperf server uses a counter of type uint64_t; please have a look here.

Now, when I start iperf with --bytes 8192M, I see the following in the logging:

{
    "tcp":true,
    "omit":0,
    "num":8589934592,  // two times uint32_t
    "parallel":1,
    "reverse":true,
    "len":131072,
    "client_version":"3.1.3"
}

@tony-josi-aws asked:

Which version of FreeRTOS+TCP library and IPERF implementation are you using?

Earlier versions of iperf_task.c indeed had a 32-bit counter which overflowed in your test. The latest version is v3.0f.
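
To illustrate the difference between the two counter widths, here is a minimal sketch. The function names are hypothetical and are not taken from iperf_task.c; the point is only that a uint32_t total silently restarts from 0 once it passes 4,294,967,295, while a uint64_t total can hold a value such as the 8589934592 requested above.

```c
#include <stdint.h>

/* Add a chunk to a 32-bit byte counter; the sum wraps modulo 2^32. */
static uint32_t add_bytes_u32(uint32_t total, uint32_t chunk)
{
    return total + chunk;
}

/* Same accumulation with a 64-bit counter; no wrap in practice. */
static uint64_t add_bytes_u64(uint64_t total, uint32_t chunk)
{
    return total + chunk;
}
```

For example, adding 1 byte to a total of 4294967295 yields 0 with the 32-bit counter and 4294967296 with the 64-bit one.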

Hi @tony-josi-aws, @htibosch,

Thanks for your reply.

I am using FreeRTOS+TCP V4.0.0. To be exact, commit:

$ git log
commit c9e63fcbe819bf5f423970e8cc965c331aff2de7 (HEAD -> main, origin/main, origin/HEAD)
Author: Emil Popov <evpopov@gmail.com>
Date:   Wed Sep 20 06:11:02 2023 -0400

    Fixes the TCP zero-copy functionality... (#1018)

I am using a modified version of LWIP iperf v2, which uses the socket APIs (FreeRTOS_send()), for my testing. I have tested this with the LWIP stack and it works.

When I switch over to the FreeRTOS+TCP stack, the TCP client stops after ~3 hrs, possibly because the sequence number wraps. The TCP server has the same issue. I see that the sequence numbers are uint32_t in FreeRTOS_TCP_WIN.c and FreeRTOS_TCP_Transmission.c (refer to freertos-plus-tcp/source/include/FreeRTOS_TCP_WIN.h:120). Is there any chance of changing the TCP windowing scheme to use uint64_t?
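
One point worth noting on the uint64_t question: the sequence and acknowledgment fields in the TCP header are 32 bits wide on the wire, so a stack cannot simply widen them. Wrap-around is normally handled with modulo-2^32 arithmetic instead. A minimal sketch of a window check done that way follows; `seq_in_window` is a hypothetical helper for illustration, not a FreeRTOS+TCP function.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when seq lies in [start, start + length), modulo 2^32.
 * The unsigned subtraction keeps the test valid across a wrap. */
static bool seq_in_window(uint32_t seq, uint32_t start, uint32_t length)
{
    return (uint32_t)(seq - start) < length;
}
```

For example, with a 256-byte window starting at 0xFFFFFFF0, sequence number 5 is inside the window even though it is numerically smaller than the start.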

In the meantime, I will try to test with the iperf utility from freertos_plus_projects; however, this may take some time.

Thanks

@sthomas,

Thanks for the reply.

Here is a sample project branch, for reference, that integrates the IPERF implementation by @htibosch and runs on an STM32F429ZI board.

@sthomas

I have been running IPERF on the sample project listed above with the latest FreeRTOS+TCP mainline for a few hours now. It just crossed the 3-hour mark without any throughput drops:

[  4] 11111.01-11112.00 sec  4.88 MBytes  41.2 Mbits/sec
[  4] 11112.00-11113.00 sec  4.75 MBytes  39.8 Mbits/sec
[  4] 11113.00-11114.01 sec  4.62 MBytes  38.5 Mbits/sec
[  4] 11114.01-11115.00 sec  4.75 MBytes  40.1 Mbits/sec
[  4] 11115.00-11116.00 sec  4.62 MBytes  38.7 Mbits/sec
[  4] 11116.00-11117.01 sec  4.75 MBytes  39.7 Mbits/sec
[  4] 11117.01-11118.01 sec  4.75 MBytes  39.7 Mbits/sec

11117.01 seconds / 3600 ≈ 3.09 hrs

EDIT:

Final stats for 11500 seconds:

.
.
.
[  4] 11498.01-11499.00 sec  4.75 MBytes  40.4 Mbits/sec
[  4] 11499.00-11500.00 sec  4.62 MBytes  38.8 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-11500.00 sec  39.3 GBytes  29.4 Mbits/sec                  sender
[  4]   0.00-11500.00 sec  3.33 GBytes  2.49 Mbits/sec                  receiver
CPU Utilization: local/sender 0.6% (0.2%u/0.4%s), remote/receiver 0.0% (0.0%u/0.0%s)

iperf Done.
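
The sender and receiver totals in this summary differ by roughly a factor of ten. One possible reading, consistent with the 32-bit counter mentioned earlier in the thread, is that the receiver-side byte count wrapped: 39.3 GBytes modulo 4 GBytes leaves about 3.3 GBytes. The sketch below checks that arithmetic only. It assumes iperf's "GBytes" means 2^30 bytes, and the helper names are hypothetical; it is not a confirmed diagnosis.

```c
#include <stdint.h>

/* Value a 32-bit byte counter would show after total_bytes were counted. */
static uint32_t wrapped_u32(uint64_t total_bytes)
{
    return (uint32_t)(total_bytes & 0xFFFFFFFFu);
}

/* Convert a byte count to units of 2^30 bytes. */
static double to_gib(uint64_t bytes)
{
    return (double)bytes / 1073741824.0;
}
```

Feeding in 39.3 * 2^30 bytes, `to_gib(wrapped_u32(...))` comes out at about 3.3, close to the 3.33 GBytes shown on the receiver line.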

Hi @sthomas,

Was the data provided by @tony-josi-aws helpful? Is there any other assistance we can provide?