TCP performance

I have FreeRTOS+TCP running on the STM32H743 EVAL board. Using iperf3 (thanks, @htibosch!) I am seeing the following performance:

$ iperf3 -c 192.168.0.127 --port 5001 --bytes 100M
Connecting to host 192.168.0.127, port 5001
[ 5] local 192.168.0.112 port 48898 connected to 192.168.0.127 port 5001
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 151 KBytes 1.24 Mbits/sec 22 1.43 KBytes
[ 5] 1.00-2.00 sec 288 KBytes 2.36 Mbits/sec 27 1.43 KBytes
[ 5] 2.00-3.00 sec 114 KBytes 935 Kbits/sec 21 1.43 KBytes
[ 5] 3.00-4.00 sec 163 KBytes 1.33 Mbits/sec 24 2.85 KBytes
[ 5] 4.00-5.00 sec 279 KBytes 2.29 Mbits/sec 31 2.85 KBytes
[ 5] 5.00-6.00 sec 91.2 KBytes 747 Kbits/sec 20 1.43 KBytes

I see others getting 80 Mbps. Any pointers as to why these results are so poor?

I am using the STM32H7 Ethernet driver that comes with the +TCP stack:
FreeRTOS-Plus-TCP/portable/NetworkInterface/STM32Hxx/NetworkInterface.c
…as opposed to the Ethernet driver that comes with the ST HAL. Which Ethernet driver should I be using?

Thanks,
-Chris

Basically, it's better to avoid the HAL when striving for performance and efficiency :wink:
In your case the bottleneck is almost certainly not the driver. The question is: what is it?
Is there something else eating up the CPU? Did you configure the stack for performance, with appropriate priorities, enough buffers, a suitable MTU size, etc.? Which allocation scheme do you use? Perhaps post your FreeRTOSIPConfig.h.
Do you have Wireshark at hand, and have you had a look at the traffic? Could you verify that the PHY came up with a 100 MBit full-duplex link? …
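
For context, these are the kinds of FreeRTOSIPConfig.h options referred to above; a minimal sketch with illustrative values, not a recommendation. The zero-copy options only help if the driver supports them (the STM32Hxx interface does), and the "allocation scheme" is the choice between BufferAllocation_1.c (static) and BufferAllocation_2.c (heap-based):

#define ipconfigNETWORK_MTU                       1500   /* use a full Ethernet MTU */
#define ipconfigUSE_TCP_WIN                       1      /* enable TCP sliding windows */
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS    64     /* enough network buffers */
#define ipconfigZERO_COPY_TX_DRIVER               1      /* DMA transmits straight from network buffers */
#define ipconfigZERO_COPY_RX_DRIVER               1      /* DMA receives straight into network buffers */
#define ipconfigIP_TASK_PRIORITY                  ( configMAX_PRIORITIES - 2 )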

When I tested iperf3 on my STM32H747, I was also happy with the results, and I used the same driver as you do.

In my testing project, the following was defined in FreeRTOSIPConfig.h:

#define USE_IPERF                               1

#define ipconfigIPERF_DOES_ECHO_UDP             0

#define ipconfigIPERF_VERSION                   3
#define ipconfigIPERF_STACK_SIZE_IPERF_TASK     680

#define ipconfigIPERF_TX_BUFSIZE                ( 24 * ipconfigTCP_MSS )
#define ipconfigIPERF_TX_WINSIZE                ( 12 )
#define ipconfigIPERF_RX_BUFSIZE                ( 24 * ipconfigTCP_MSS )
#define ipconfigIPERF_RX_WINSIZE                ( 12 )

/* The iperf module declares a character buffer to store its received data. */
#define ipconfigIPERF_RECV_BUFFER_SIZE          ( 24 * ipconfigTCP_MSS )

EDIT: The original iperf settings in this post were not the right ones. The settings above give much better performance, but they assume that there is plenty of RAM available for buffering.
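
For anyone tuning their own sockets rather than the iperf module: the four size macros above map onto FreeRTOS+TCP's per-socket window properties. A sketch of the equivalent socket option (it must be set before the connection is made):

#include "FreeRTOS_Sockets.h"

static void prvApplyWindowProperties( Socket_t xSocket )
{
    WinProperties_t xWinProps;

    xWinProps.lTxBufSize = 24 * ipconfigTCP_MSS;  /* cf. ipconfigIPERF_TX_BUFSIZE */
    xWinProps.lTxWinSize = 12;                    /* cf. ipconfigIPERF_TX_WINSIZE, in units of MSS */
    xWinProps.lRxBufSize = 24 * ipconfigTCP_MSS;  /* cf. ipconfigIPERF_RX_BUFSIZE */
    xWinProps.lRxWinSize = 12;                    /* cf. ipconfigIPERF_RX_WINSIZE, in units of MSS */

    /* Must be called before FreeRTOS_connect() / FreeRTOS_listen(). */
    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_WIN_PROPERTIES,
                         ( void * ) &xWinProps, sizeof( xWinProps ) );
}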

When the traffic is this slow, it would be interesting to look at the actual iperf session in a PCAP. Could you make a PCAP, zip it, and attach it?
In case you cannot attach a ZIP file, you can also email it to me:
hein [at] htibosch [dot] net.
But I have asked the moderator to allow you to attach files to your posts.

You can filter the iperf packets with tcp.port==5001.
And can you please also look at the questions from Hartmut?

About priorities:

  • The task in NetworkInterface: higher priority
  • The IP-task: normal priority
  • All tasks that make use of the IP-stack: lower priority

So the iperf task gets a lower priority. For example:

#define niEMAC_HANDLER_TASK_PRIORITY         5
#define ipconfigIP_TASK_PRIORITY             4
#define ipconfigIPERF_PRIORITY_IPERF_TASK    3
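
For completeness, a sketch of how the iperf module is typically started, assuming Hein's iperf_task.c is part of the build; the priority and stack-size macros above are picked up inside vIPerfInstall():

#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "iperf_task.h"

/* Called by FreeRTOS+TCP when the network goes up or down. */
void vApplicationIPNetworkEventHook( eIPCallbackEvent_t eNetworkEvent )
{
    static BaseType_t xTasksCreated = pdFALSE;

    if( ( eNetworkEvent == eNetworkUp ) && ( xTasksCreated == pdFALSE ) )
    {
        /* Creates the iperf server task, using
         * ipconfigIPERF_PRIORITY_IPERF_TASK and ipconfigIPERF_STACK_SIZE_IPERF_TASK. */
        vIPerfInstall();
        xTasksCreated = pdTRUE;
    }
}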

In the meantime, I will try out iperf once again.

@chris I’ve granted you the ability to attach files to your posts.

The testing project that I used can be found here on GitHub.

The Cortex-M7 is running at 400 MHz ( maximum 480 MHz ).

The IDE is STM32CubeIDE, Version: 1.4.2, Build: 7643.

The test shows the following performance:

  • Receiving TCP data at 80 Mbits/sec or better
  • Sending TCP data at 90 Mbits/sec or better ( using iperf -R )

Both the laptop and the STM32 board are connected to a (100/1000 Mbps) network switch.
The PHY/EMAC of the STM32 board has a speed of 100 Mbps.
During the test, the CPU had nothing else to do but work on the iperf session.

My iperf client has version 3.1.3.

#define  ipconfigIP_TASK_PRIORITY           4
#define  niEMAC_HANDLER_TASK_PRIORITY       5
#define  ipconfigIPERF_PRIORITY_IPERF_TASK  6

EDIT: The above priority scheme is in contrast to what I recommended earlier: here the iperf application task gets the highest priority.

When testing, make sure that:

  • Wireshark is not running, as it may slow down traffic (!)
  • the LAN has enough available bandwidth, i.e. no video streaming
  • iperf3 is running on a physical laptop or PC, not a virtual machine.

Thanks for the help, guys! I have attached a PCAP of about 5 seconds' worth of iperf3 testing. Here is the console output during this test (note that my speeds have increased to almost 2 Mbps now that I have taken the test device off my busy LAN and instead connected it directly to a laptop via Ethernet):

$ iperf3 -c 192.168.0.200 --port 5001 --bytes 100M
Connecting to host 192.168.0.200, port 5001
[ 5] local 192.168.0.100 port 59544 connected to 192.168.0.200 port 5001
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 257 KBytes 2.10 Mbits/sec 22 4.28 KBytes
[ 5] 1.00-2.00 sec 218 KBytes 1.79 Mbits/sec 21 2.85 KBytes
[ 5] 2.00-3.00 sec 214 KBytes 1.75 Mbits/sec 20 1.43 KBytes
[ 5] 3.00-4.00 sec 158 KBytes 1.30 Mbits/sec 18 1.43 KBytes
[ 5] 4.00-5.00 sec 188 KBytes 1.54 Mbits/sec 27 4.28 KBytes
^C[ 5] 5.00-5.76 sec 197 KBytes 2.12 Mbits/sec 17 2.85 KBytes


[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-5.76 sec 1.20 MBytes 1.75 Mbits/sec 125 sender
[ 5] 0.00-5.76 sec 0.00 Bytes 0.00 bits/sec receiver
iperf3: interrupt - the client has terminated

iperf_test.pcapng.zip (1.0 MB)

I saw a slight improvement in speed when I changed the priority of the iperf3 task to be the highest priority task (55 in my setup). I switched the priority back in the attached FreeRTOSIPConfig.h file, as I was trying different things.

Here is my FreeRTOSIPConfig.h file:
FreeRTOSIPConfig.h (20.8 KB)

I am suspicious of my memory setup. In my linker script, I set up an area for the Ethernet buffers (which are surprisingly large, especially once the iperf server starts):
.ethernet_data :
{
    . = ABSOLUTE(0x30000000);
    PROVIDE_HIDDEN (__ethernet_data_start = .);
    KEEP (*(SORT(.ethernet_data.*)))
    KEEP (*(.ethernet_data))
    PROVIDE_HIDDEN (__ethernet_data_end = .);
} >RAM_D2
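
For reference, this section only receives data if the driver's buffers are actually placed in it. A sketch of the matching C declaration (the real one lives in NetworkInterface.c; niTOTAL_BUFFER_SIZE is a hypothetical name for the combined buffer size):

/* Place the packet buffers in .ethernet_data so they land in RAM_D2 at
 * 0x30000000, a region the H7's Ethernet DMA can access. */
static uint8_t ucNetworkPackets[ niTOTAL_BUFFER_SIZE ]
    __attribute__( ( section( ".ethernet_data" ), aligned( 32 ) ) );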

And I also set up the MPU to disable caching for this memory region:
(see attached image of mpu.png)
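
In case the screenshot is hard to read, the equivalent MPU setup in code would look roughly like this (a sketch using the standard HAL MPU API; the region number and the 256 KB size are assumptions):

/* Mark the Ethernet buffer region at 0x30000000 as non-cacheable, so the
 * CPU and the Ethernet DMA always see the same data. */
static void prvMPUConfig( void )
{
    MPU_Region_InitTypeDef xRegion = { 0 };

    HAL_MPU_Disable();

    xRegion.Enable           = MPU_REGION_ENABLE;
    xRegion.Number           = MPU_REGION_NUMBER0;      /* assumed free region */
    xRegion.BaseAddress      = 0x30000000;
    xRegion.Size             = MPU_REGION_SIZE_256KB;   /* assumed region size */
    xRegion.AccessPermission = MPU_REGION_FULL_ACCESS;
    xRegion.IsBufferable     = MPU_ACCESS_BUFFERABLE;
    xRegion.IsCacheable      = MPU_ACCESS_NOT_CACHEABLE;
    xRegion.IsShareable      = MPU_ACCESS_NOT_SHAREABLE;
    xRegion.TypeExtField     = MPU_TEX_LEVEL0;
    xRegion.DisableExec      = MPU_INSTRUCTION_ACCESS_DISABLE;
    xRegion.SubRegionDisable = 0x00;

    HAL_MPU_ConfigRegion( &xRegion );
    HAL_MPU_Enable( MPU_PRIVILEGED_DEFAULT );
}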

This device is also running all three ADCs at 60 kHz, as well as running the TinyUSB stack (composite MSC device + VCP). The device only uses Ethernet or USB, never both (a config option at boot). In my testing, the USB stack was not running, so it should not be impacting this performance.

Thanks,
-Chris

Thanks for the PCAP file.

The most interesting thing that I see is recurring lost packets. Your PC waits 200 ms before doing a retransmission.

Why would they get lost? Is there perhaps an under-run of DMA buffers?

Could you also attach your copy of
“CM7\Core\Inc\stm32h7xx_hal_conf.h”?

I am curious about these defines:

#if( ipconfigZERO_COPY_TX_DRIVER != 0 )
    #define ETH_TX_DESC_CNT    4U  /* number of Ethernet Tx DMA descriptors */
#else
    #define ETH_TX_DESC_CNT    1U  /* number of Ethernet Tx DMA descriptors */
#endif

#define ETH_RX_DESC_CNT        4U  /* number of Ethernet Rx DMA descriptors */

EDIT: in the IPERF project I used higher values:

#define ETH_TX_DESC_CNT         14U /* number of Ethernet Tx DMA descriptors */
#define ETH_RX_DESC_CNT         8U  /* number of Ethernet Rx DMA descriptors */

And in FreeRTOSIPConfig.h 64 network buffers are defined:

#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS	( 64 )
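
One way to check whether buffer starvation explains the drops: the stack's BufferManagement API keeps a low-water mark that can be polled at runtime. A sketch:

#include "FreeRTOS_IP.h"
#include "NetworkBufferManagement.h"

/* If the minimum ever approaches zero, the stack ran out of network
 * buffers at some point and packets were dropped, which would explain
 * the retransmissions seen in the PCAP. */
void vReportBufferUsage( void )
{
    FreeRTOS_printf( ( "Network buffers: %u free now, %u minimum ever\n",
                       ( unsigned ) uxGetNumberOfFreeNetworkBuffers(),
                       ( unsigned ) uxGetMinimumFreeNetworkBuffers() ) );
}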

PS: You can find all the configuration files that I used here.

Would it be worth testing iperf without running the ADCs and without the other tasks?

Try tracing the execution to see where the time goes. Google: TLS Protocol Analysis Using IoTST—An IoT Benchmark Based on Scheduler Traces.
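
A lighter-weight first step is FreeRTOS's run-time statistics, which give a per-task CPU breakdown; a sketch, assuming the run-time stats options and a stats timer are configured:

#include <stdio.h>
#include "FreeRTOS.h"
#include "task.h"

/* Requires in FreeRTOSConfig.h:
 *   #define configGENERATE_RUN_TIME_STATS          1
 *   #define configUSE_STATS_FORMATTING_FUNCTIONS   1
 * plus a high-resolution timer as the run-time counter. */
void vPrintCPUUsage( void )
{
    static char pcStatsBuffer[ 1024 ];

    /* Fills the buffer with one line per task: name, run time, percentage. */
    vTaskGetRunTimeStats( pcStatsBuffer );
    printf( "%s\n", pcStatsBuffer );
}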