STM32H743 FreeRTOS+TCP issue with ZERO Copy

Not so much to use less RAM, but to get an MSS equal to your sample buffer size. That can be more efficient.

If you can share a PCAP fragment, maybe we can say more about the efficiency.

ucNetworkPackets includes the whole TX packet, right?

Yes, that starts with the Ethernet header, and it ends with the last payload byte.

All network buffers come from the same memory pool: ucNetworkPackets in your case, or pvPortMalloc() when BufferAllocation_2.c is used.
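
To make these sizes concrete, here is a minimal sketch that derives the MSS and the size of a full TX frame from the MTU. It assumes IPv4 and TCP headers without options (20 bytes each) and a 14-byte Ethernet header; the macro and function names are illustrative and are not part of FreeRTOS+TCP.

```c
#include <assert.h>
#include <stddef.h>

/* Header sizes assumed here: IPv4 and TCP without options. */
#define ETH_HEADER_BYTES     14u
#define IPV4_HEADER_BYTES    20u
#define TCP_HEADER_BYTES     20u

/* Largest TCP payload that fits in one IP packet of the given MTU. */
static size_t mss_from_mtu( size_t mtu )
{
    assert( mtu > IPV4_HEADER_BYTES + TCP_HEADER_BYTES );
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES;
}

/* Bytes one network buffer must hold for a full segment:
 * from the Ethernet header up to and including the last payload byte. */
static size_t frame_bytes_from_mtu( size_t mtu )
{
    return ETH_HEADER_BYTES + mtu;
}
```

With an MTU of 1440 this gives an MSS of 1400 bytes and a frame of 1454 bytes; with the standard MTU of 1500 the MSS is 1460 bytes.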

I am currently traveling, but next week I will make a PCAP and send it to you, no issue

Below you can see a PCAP fragment (is this what you meant?):

45 0.007149 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=40601 Ack=1 Win=5600 Len=1400
46 0.007463 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=42001 Ack=1 Win=5600 Len=1400
47 0.007482 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=43401 Win=4112 Len=0
48 0.007649 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=43401 Ack=1 Win=5600 Len=1400
49 0.007960 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=44801 Ack=1 Win=5600 Len=1400
50 0.007984 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=46201 Win=4112 Len=0
51 0.008143 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=46201 Ack=1 Win=5600 Len=1400
52 0.008462 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=47601 Ack=1 Win=5600 Len=1400
53 0.008481 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=49001 Win=4112 Len=0
54 0.008642 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=49001 Ack=1 Win=5600 Len=1400
55 0.008949 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=50401 Ack=1 Win=5600 Len=1400
56 0.008970 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=51801 Win=4107 Len=0
57 0.009138 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=51801 Ack=1 Win=5600 Len=1400
58 0.009443 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=53201 Ack=1 Win=5600 Len=1400
59 0.009449 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=54601 Win=4096 Len=0
60 0.009635 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=54601 Ack=1 Win=5600 Len=1400
61 0.009926 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=56001 Ack=1 Win=5600 Len=1400
62 0.009957 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=57401 Win=4085 Len=0
63 0.010132 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=57401 Ack=1 Win=5600 Len=1400
64 0.010293 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=58801 Win=4112 Len=0
65 0.010436 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=58801 Ack=1 Win=5600 Len=1400
66 0.010450 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=60201 Win=4112 Len=0
67 0.010690 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=60201 Ack=1 Win=5600 Len=1400
68 0.010937 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=61601 Ack=1 Win=5600 Len=1400
69 0.010955 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=63001 Win=4112 Len=0
70 0.011128 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=63001 Ack=1 Win=5600 Len=1400
71 0.011432 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=64401 Ack=1 Win=5600 Len=1400
72 0.011448 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=65801 Win=4112 Len=0
73 0.011630 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=65801 Ack=1 Win=5600 Len=1400
74 0.011931 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=67201 Ack=1 Win=5600 Len=1400
75 0.011949 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=68601 Win=4112 Len=0
76 0.012124 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=68601 Ack=1 Win=5600 Len=1400
77 0.012424 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=70001 Ack=1 Win=5600 Len=1400
78 0.012434 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=71401 Win=4112 Len=0
79 0.012621 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=71401 Ack=1 Win=5600 Len=1400
80 0.012904 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=72801 Ack=1 Win=5600 Len=1400
81 0.012913 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=74201 Win=4112 Len=0
82 0.013117 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=74201 Ack=1 Win=5600 Len=1400
83 0.013381 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=75601 Ack=1 Win=5600 Len=1400
84 0.013391 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=77001 Win=4112 Len=0
85 0.013617 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=77001 Ack=1 Win=5600 Len=1400
86 0.013869 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=78401 Ack=1 Win=5600 Len=1400
87 0.013877 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=79801 Win=4112 Len=0
88 0.014110 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=79801 Ack=1 Win=5600 Len=1400
89 0.014372 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=81201 Ack=1 Win=5600 Len=1400
90 0.014385 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=82601 Win=4112 Len=0
91 0.014611 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=82601 Ack=1 Win=5600 Len=1400
92 0.014866 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=84001 Ack=1 Win=5600 Len=1400
93 0.014879 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=85401 Win=4112 Len=0
94 0.015107 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=85401 Ack=1 Win=5600 Len=1400
95 0.015372 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=86801 Ack=1 Win=5600 Len=1400
96 0.015385 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=88201 Win=4112 Len=0
97 0.015607 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=88201 Ack=1 Win=5600 Len=1400
98 0.015868 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=89601 Ack=1 Win=5600 Len=1400
99 0.015882 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=91001 Win=4112 Len=0
100 0.016102 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=91001 Ack=1 Win=5600 Len=1400
101 0.016368 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=92401 Ack=1 Win=5600 Len=1400
102 0.016383 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=93801 Win=4112 Len=0
103 0.016603 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=93801 Ack=1 Win=5600 Len=1400
104 0.016885 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=95201 Ack=1 Win=5600 Len=1400
105 0.016905 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=96601 Win=4112 Len=0
106 0.017099 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=96601 Ack=1 Win=5600 Len=1400
107 0.017387 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=98001 Ack=1 Win=5600 Len=1400
108 0.017404 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=99401 Win=4112 Len=0
109 0.017597 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=99401 Ack=1 Win=5600 Len=1400
110 0.017894 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=100801 Ack=1 Win=5600 Len=1400
111 0.017917 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=102201 Win=4112 Len=0
112 0.018091 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=102201 Ack=1 Win=5600 Len=1400
113 0.018396 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=103601 Ack=1 Win=5600 Len=1400
114 0.018414 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=105001 Win=4112 Len=0
115 0.018589 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=105001 Ack=1 Win=5600 Len=1400
116 0.018890 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=106401 Ack=1 Win=5600 Len=1400
117 0.018906 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=107801 Win=4112 Len=0
118 0.019087 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=107801 Ack=1 Win=5600 Len=1400
119 0.019391 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=109201 Ack=1 Win=5600 Len=1400
120 0.019405 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=110601 Win=4112 Len=0
121 0.019584 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=110601 Ack=1 Win=5600 Len=1400
122 0.019883 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=112001 Ack=1 Win=5600 Len=1400
123 0.019894 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=113401 Win=4112 Len=0
124 0.020079 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=113401 Ack=1 Win=5600 Len=1400
125 0.020367 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=114801 Ack=1 Win=5600 Len=1400
126 0.020379 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=116201 Win=4112 Len=0
127 0.020578 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=116201 Ack=1 Win=5600 Len=1400
128 0.020857 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=117601 Ack=1 Win=5600 Len=1400
129 0.020869 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=119001 Win=4112 Len=0
130 0.021074 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=119001 Ack=1 Win=5600 Len=1400
131 0.021353 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=120401 Ack=1 Win=5600 Len=1400
132 0.021364 192.168.4.111 192.168.4.155 TCP 54 54499 → 55151 [ACK] Seq=1 Ack=121801 Win=4112 Len=0
133 0.021573 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=121801 Ack=1 Win=5600 Len=1400
134 0.021844 192.168.4.155 192.168.4.111 TCP 1454 55151 → 54499 [PSH, ACK] Seq=123201 Ack=1 Win=5600 Len=1400
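
As a rough check on a listing like this, the payload throughput can be estimated from the sequence numbers and timestamps of two data segments. The sketch below is only an illustration with a hypothetical function name; it counts TCP payload bytes and ignores the header overhead on the wire.

```c
#include <assert.h>
#include <stdint.h>

/* Payload throughput in Mbit/s between two data segments of one TCP stream.
 * seq_* are the (relative) sequence numbers, t_* the capture times in seconds. */
static double payload_mbit_per_sec( uint32_t seq_first, double t_first,
                                    uint32_t seq_last, double t_last )
{
    assert( t_last > t_first );

    /* Unsigned subtraction also gives the right distance across a 32-bit wrap. */
    uint32_t bytes = seq_last - seq_first;

    return ( ( double ) bytes * 8.0 ) / ( ( t_last - t_first ) * 1e6 );
}
```

For frames 45 and 134, `payload_mbit_per_sec( 40601, 0.007149, 123201, 0.021844 )` comes to roughly 45 Mbit/s of payload, or about 5.6 MB/s, over this 15 ms window.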

I am also sending you my FreeRTOSIPConfig.h in case you see a way to optimize speed:

#ifndef FREERTOS_IP_CONFIG_H

#define FREERTOS_IP_CONFIG_H

/* When non-zero, the buffers passed to the SEND routine may be passed
* to DMA. As soon as sending is ready, the buffers must be released by
* calling vReleaseNetworkBufferAndDescriptor(), */
#define ipconfigZERO_COPY_TX_DRIVER ( 1 )

/* This define doesn't mean much to the driver, except that it makes
* sure that pxPacketBuffer_to_NetworkBuffer() will be included. */
#define ipconfigZERO_COPY_RX_DRIVER ( 1 )
#define ipconfigDRIVER_INCLUDED_TX_IP_CHECKSUM 1

/* Prototype for the function used to print out. In this case it prints to the
console before the network is connected then a UDP port after the network has
connected. */
extern void vLoggingPrintf( const char *pcFormatString, ... );
/* Set to 1 to print out debug messages. If ipconfigHAS_DEBUG_PRINTF is set to
1 then FreeRTOS_debug_printf should be defined to the function used to print
out the debugging messages. */
#define ipconfigHAS_DEBUG_PRINTF 0

#if( ipconfigHAS_DEBUG_PRINTF == 1 )
    #define FreeRTOS_debug_printf(X) vLoggingPrintf X
#endif

/* Set to 1 to print out non debugging messages, for example the output of the
FreeRTOS_netstat() command, and ping replies. If ipconfigHAS_PRINTF is set to 1
then FreeRTOS_printf should be set to the function used to print out the
messages. */

#define ipconfigHAS_PRINTF 0

#if( ipconfigHAS_PRINTF == 1 )
    #define FreeRTOS_printf(X) vLoggingPrintf X
#endif

/* Define the byte order of the target MCU (the MCU FreeRTOS+TCP is executing
on). Valid options are pdFREERTOS_BIG_ENDIAN and pdFREERTOS_LITTLE_ENDIAN. */

#define ipconfigBYTE_ORDER pdFREERTOS_LITTLE_ENDIAN

/* If the network card/driver includes checksum offloading (IP/TCP/UDP checksums)
then set ipconfigDRIVER_INCLUDED_RX_IP_CHECKSUM to 1 to prevent the software
stack repeating the checksum calculations. */

#define ipconfigDRIVER_INCLUDED_RX_IP_CHECKSUM 1

/* Several API's will block until the result is known, or the action has been
performed, for example FreeRTOS_send() and FreeRTOS_recv(). The timeouts can be
set per socket, using setsockopt(). If not set, the times below will be
used as defaults. */

#define ipconfigSOCK_DEFAULT_RECEIVE_BLOCK_TIME ( 5000 )

#define ipconfigSOCK_DEFAULT_SEND_BLOCK_TIME ( 5000 )

/* Include support for LLMNR: Link-local Multicast Name Resolution
(non-Microsoft) */

#define ipconfigUSE_LLMNR ( 0 )

/* Include support for NBNS: NetBIOS Name Service (Microsoft) */

#define ipconfigUSE_NBNS ( 0 )

/* Include support for DNS caching. For TCP, having a small DNS cache is very
useful. When a cache is present, ipconfigDNS_REQUEST_ATTEMPTS can be kept low
and also DNS may use small timeouts. If a DNS reply comes in after the DNS
socket has been destroyed, the result will be stored into the cache. The next
call to FreeRTOS_gethostbyname() will return immediately, without even creating
a socket. */

#define ipconfigUSE_DNS_CACHE ( 1 )

#define ipconfigDNS_CACHE_NAME_LENGTH ( 16 )

#define ipconfigDNS_CACHE_ENTRIES ( 4 )

#define ipconfigDNS_REQUEST_ATTEMPTS ( 2 )

/* The IP stack executes in its own task (although any application task can make
use of its services through the published sockets API). ipconfigIP_TASK_PRIORITY
sets the priority of the task that executes the IP stack. The priority is a
standard FreeRTOS task priority so can take any value from 0 (the lowest
priority) to (configMAX_PRIORITIES - 1) (the highest priority).
configMAX_PRIORITIES is a standard FreeRTOS configuration parameter defined in
FreeRTOSConfig.h, not FreeRTOSIPConfig.h. Consideration needs to be given as to
the priority assigned to the task executing the IP stack relative to the
priority assigned to tasks that use the IP stack. */

#define ipconfigIP_TASK_PRIORITY ( configMAX_PRIORITIES - 1 )

/* The size, in words (not bytes), of the stack allocated to the FreeRTOS+TCP
task. This setting is less important when the FreeRTOS Win32 simulator is used
as the Win32 simulator only stores a fixed amount of information on the task
stack. FreeRTOS includes optional stack overflow detection, see:
http://www.freertos.org/Stacks-and-stack-overflow-checking.html */

#define ipconfigIP_TASK_STACK_SIZE_WORDS ( configMINIMAL_STACK_SIZE * 5 )

/* ipconfigRAND32() is called by the IP stack to generate random numbers for
things such as a DHCP transaction number or initial sequence number. Random
number generation is performed via this macro to allow applications to use their
own random number generation method. For example, it might be possible to
generate a random number by sampling noise on an analogue input. */

extern UBaseType_t uxRand( void );

#define ipconfigRAND32() uxRand()

/* If ipconfigUSE_NETWORK_EVENT_HOOK is set to 1 then FreeRTOS+TCP will call the
network event hook at the appropriate times. If ipconfigUSE_NETWORK_EVENT_HOOK
is not set to 1 then the network event hook will never be called. See
http://www.FreeRTOS.org/FreeRTOS-Plus/FreeRTOS_Plus_UDP/API/vApplicationIPNetworkEventHook.shtml
*/

#define ipconfigUSE_NETWORK_EVENT_HOOK 1

/* Sockets have a send block time attribute. If FreeRTOS_sendto() is called but
a network buffer cannot be obtained then the calling task is held in the Blocked
state (so other tasks can continue to execute) until either a network buffer
becomes available or the send block time expires. If the send block time expires
then the send operation is aborted. The maximum allowable send block time is
capped to the value set by ipconfigUDP_MAX_SEND_BLOCK_TIME_TICKS. Capping the
maximum allowable send block time prevents a deadlock occurring when
all the network buffers are in use and the tasks that process (and subsequently
free) the network buffers are themselves blocked waiting for a network buffer.
ipconfigUDP_MAX_SEND_BLOCK_TIME_TICKS is specified in RTOS ticks. A time in
milliseconds can be converted to a time in ticks by dividing the time in
milliseconds by portTICK_PERIOD_MS. */

#define ipconfigUDP_MAX_SEND_BLOCK_TIME_TICKS  ( 5000 / portTICK_PERIOD_MS )

/* If ipconfigUSE_DHCP is 1 then FreeRTOS+TCP will attempt to retrieve an IP
address, netmask, DNS server address and gateway address from a DHCP server. If
ipconfigUSE_DHCP is 0 then FreeRTOS+TCP will use a static IP address. The
stack will revert to using the static IP address even when ipconfigUSE_DHCP is
set to 1 if a valid configuration cannot be obtained from a DHCP server for any
reason. The static configuration used is that passed into the stack by the
FreeRTOS_IPInit() function call. */

#define ipconfigUSE_DHCP 0

/* When ipconfigUSE_DHCP is set to 1, DHCP requests will be sent out at
increasing time intervals until either a reply is received from a DHCP server
and accepted, or the interval between transmissions reaches
ipconfigMAXIMUM_DISCOVER_TX_PERIOD. The IP stack will revert to using the
static IP address passed as a parameter to FreeRTOS_IPInit() if the
re-transmission time interval reaches ipconfigMAXIMUM_DISCOVER_TX_PERIOD without
a DHCP reply being received. */

#define ipconfigMAXIMUM_DISCOVER_TX_PERIOD ( 120000 / portTICK_PERIOD_MS )

/* The ARP cache is a table that maps IP addresses to MAC addresses. The IP
stack can only send a UDP message to a remote IP address if it knows the MAC
address associated with the IP address, or the MAC address of the router used to
contact the remote IP address. When a UDP message is received from a remote IP
address the MAC address and IP address are added to the ARP cache. When a UDP
message is sent to a remote IP address that does not already appear in the ARP
cache then the UDP message is replaced by an ARP message that solicits the
required MAC address information. ipconfigARP_CACHE_ENTRIES defines the maximum
number of entries that can exist in the ARP table at any one time. */

#define ipconfigARP_CACHE_ENTRIES 6

/* ARP requests that do not result in an ARP response will be re-transmitted a
maximum of ipconfigMAX_ARP_RETRANSMISSIONS times before the ARP request is
aborted. */

#define ipconfigMAX_ARP_RETRANSMISSIONS ( 5 )

/* ipconfigMAX_ARP_AGE defines the maximum time between an entry in the ARP
table being created or refreshed and the entry being removed because it is stale.
New ARP requests are sent for ARP cache entries that are nearing their maximum
age. ipconfigMAX_ARP_AGE is specified in tens of seconds, so a value of 150 is
equal to 1500 seconds (or 25 minutes). */

#define ipconfigMAX_ARP_AGE 150

/* Implementing FreeRTOS_inet_addr() necessitates the use of string handling
routines, which are relatively large. To save code space the full
FreeRTOS_inet_addr() implementation is made optional, and a smaller and faster
alternative called FreeRTOS_inet_addr_quick() is provided. FreeRTOS_inet_addr()
takes an IP in decimal dot format (for example, "192.168.0.1") as its parameter.
FreeRTOS_inet_addr_quick() takes an IP address as four separate numerical octets
(for example, 192, 168, 0, 1) as its parameters. If
ipconfigINCLUDE_FULL_INET_ADDR is set to 1 then both FreeRTOS_inet_addr() and
FreeRTOS_inet_addr_quick() are available. If ipconfigINCLUDE_FULL_INET_ADDR is
not set to 1 then only FreeRTOS_inet_addr_quick() is available. */

#define ipconfigINCLUDE_FULL_INET_ADDR 1

/* ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS defines the total number of network buffers that
are available to the IP stack. The total number of network buffers is limited
to ensure the total amount of RAM that can be consumed by the IP stack is capped
to a pre-determinable value. */

#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS 168//160//124//84

/* A FreeRTOS queue is used to send events from application tasks to the IP
stack. ipconfigEVENT_QUEUE_LENGTH sets the maximum number of events that can
be queued for processing at any one time. The event queue must be a minimum of
5 greater than the total number of network buffers. */

#define ipconfigEVENT_QUEUE_LENGTH ( ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS + 5 )

/* The address of a socket is the combination of its IP address and its port
number. FreeRTOS_bind() is used to manually allocate a port number to a socket
(to 'bind' the socket to a port), but manual binding is not normally necessary
for client sockets (those sockets that initiate outgoing connections rather than
wait for incoming connections on a known port number). If
ipconfigALLOW_SOCKET_SEND_WITHOUT_BIND is set to 1 then calling
FreeRTOS_sendto() on a socket that has not yet been bound will result in the IP
stack automatically binding the socket to a port number from the range
socketAUTO_PORT_ALLOCATION_START_NUMBER to 0xffff. If
ipconfigALLOW_SOCKET_SEND_WITHOUT_BIND is set to 0 then calling FreeRTOS_sendto()
on a socket that has not yet been bound will result in the send operation being
aborted. */

#define ipconfigALLOW_SOCKET_SEND_WITHOUT_BIND 1

/* Defines the Time To Live (TTL) values used in outgoing UDP packets. */

#define ipconfigUDP_TIME_TO_LIVE 128

#define ipconfigTCP_TIME_TO_LIVE 128 /* also defined in FreeRTOSIPConfigDefaults.h */

/* USE_TCP: Use TCP and all its features */

#define ipconfigUSE_TCP ( 1 )

/* USE_WIN: Let TCP use windowing mechanism. */

#define ipconfigUSE_TCP_WIN ( 1 )

/* The MTU is the maximum number of bytes the payload of a network frame can
contain. For normal Ethernet V2 frames the maximum MTU is 1500. Setting a
lower value can save RAM, depending on the buffer management scheme used. If
ipconfigCAN_FRAGMENT_OUTGOING_PACKETS is 1 then (ipconfigNETWORK_MTU - 28) must
be divisible by 8. */

#define ipconfigNETWORK_MTU 1440//1500

/* Set ipconfigUSE_DNS to 1 to include a basic DNS client/resolver. DNS is used
through the FreeRTOS_gethostbyname() API function. */

#define ipconfigUSE_DNS 1

/* If ipconfigREPLY_TO_INCOMING_PINGS is set to 1 then the IP stack will
generate replies to incoming ICMP echo (ping) requests. */

#define ipconfigREPLY_TO_INCOMING_PINGS 1

/* If ipconfigSUPPORT_OUTGOING_PINGS is set to 1 then the
FreeRTOS_SendPingRequest() API function is available. */

#define ipconfigSUPPORT_OUTGOING_PINGS 0

/* If ipconfigSUPPORT_SELECT_FUNCTION is set to 1 then the FreeRTOS_select()
(and associated) API function is available. */

#define ipconfigSUPPORT_SELECT_FUNCTION 1

/* If ipconfigFILTER_OUT_NON_ETHERNET_II_FRAMES is set to 1 then Ethernet frames
that are not in Ethernet II format will be dropped. This option is included for
potential future IP stack developments. */

#define ipconfigFILTER_OUT_NON_ETHERNET_II_FRAMES 1

/* If ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES is set to 1 then it is the
responsibility of the Ethernet interface to filter out packets that are of no
interest. If the Ethernet interface does not implement this functionality, then
set ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES to 0 to have the IP stack
perform the filtering instead (it is much less efficient for the stack to do it
because the packet will already have been passed into the stack). If the
Ethernet driver does all the necessary filtering in hardware then software
filtering can be removed by using a value other than 1 or 0. */

#define ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES 1

/* The windows simulator cannot really simulate MAC interrupts, and needs to
block occasionally to allow other tasks to run. */

#define configWINDOWS_MAC_INTERRUPT_SIMULATOR_DELAY ( 20 / portTICK_PERIOD_MS )

/* Advanced only: in order to access 32-bit fields in the IP packets with
32-bit memory instructions, all packets will be stored 32-bit-aligned, plus 16-bits.
This has to do with the contents of the IP-packets: all 32-bit fields are
32-bit-aligned, plus 16-bit(!) */

#define ipconfigPACKET_FILLER_SIZE 2

/* Define the size of the pool of TCP window descriptors. On the average, each
TCP socket will use up to 2 x 6 descriptors, meaning that it can have 2 x 6
outstanding packets (for Rx and Tx). When using up to 10 TCP sockets
simultaneously, one could define TCP_WIN_SEG_COUNT as 120. */

#define ipconfigTCP_WIN_SEG_COUNT 240

/* Each TCP socket has circular buffers for Rx and Tx, which have a fixed
maximum size. Define the size of Rx buffer for TCP sockets. */

#define ipconfigTCP_RX_BUFFER_LENGTH ( 11680 ) // 5840

/* Define the size of Tx buffer for TCP sockets. */

#define ipconfigTCP_TX_BUFFER_LENGTH ( 35040 ) // 11680

/* When using call-back handlers, the driver may check if the handler points to
real program memory (RAM or flash) or just has a random non-zero value. */

#define ipconfigIS_VALID_PROG_ADDRESS(x) ( (x) != NULL )

/* Include support for TCP hang protection. All sockets in a connecting or
disconnecting stage will timeout after a period of non-activity. */

#define ipconfigTCP_HANG_PROTECTION ( 1 )

#define ipconfigTCP_HANG_PROTECTION_TIME ( 30 )

/* Include support for TCP keep-alive messages. */

#define ipconfigTCP_KEEP_ALIVE ( 1 )

#define ipconfigTCP_KEEP_ALIVE_INTERVAL ( 20 ) /* in seconds */

/* If ipconfigUSE_DHCP is 1 then FreeRTOS+TCP will attempt to retrieve an IP
address, netmask, DNS server address and gateway address from a DHCP server. If
ipconfigUSE_DHCP is 0 then FreeRTOS+TCP will use a static IP address. The
stack will revert to using the static IP address even when ipconfigUSE_DHCP is
set to 1 if a valid configuration cannot be obtained from a DHCP server for any
reason. The static configuration used is that passed into the stack by the
FreeRTOS_IPInit() function call. */

//#define ipconfigUSE_DHCP 1

#define ipconfigDHCP_REGISTER_HOSTNAME 1

//#define ipconfigUSE_DHCP_HOOK 1

#define ipconfigUSE_LINKED_RX_MESSAGES 1

#define portINLINE __inline

#endif /* FREERTOS_IP_CONFIG_H */

Yes, almost: could you ZIP the PCAP file and attach it to your post? You can use the upload button for this.

I just edited your post that contains FreeRTOSIPConfig.h. When you want to insert C code, you can either use the code button in the editor

or you can attach it as a file.

I will have a think about optimising the speed.


Thanks for the info. I’ve just attached a ZIP file with both the PCAP as well as the FreeRTOSIPConfig.h
PCAP+FreeRTOSIPConfig.zip (3.7 MB)

Hello Javier,

Thank you for the ZIP. That is much easier for me because WireShark helps a lot when analysing network traffic. Have you seen this option?

Statistics -> TCP Stream Graphs -> Time Sequence (tcptrace)

It shows this graph for your TCP connection.

image

The PCAP looks good to me. There is a constant stream of data with a speed of 6 MB/sec ( about 51 Mbit/sec ). The PHY can handle up to 100 Mbps.

I think that you can easily increase the speed if you play with the window and buffer sizes.

In case you are using IPERF3, you could define the following in your FreeRTOSIPConfig.h:

#define ipconfigIPERF_TX_BUFSIZE   ( 20 * ipconfigTCP_MSS ) /* Units of bytes. */
#define ipconfigIPERF_TX_WINSIZE   ( 10 )                   /* Size in units of MSS */
#define ipconfigIPERF_RX_BUFSIZE   ( 20 * ipconfigTCP_MSS ) /* Units of bytes. */
#define ipconfigIPERF_RX_WINSIZE   ( 10 )                   /* Size in units of MSS */

In case it is not an IPERF connection, you can use the socket option FREERTOS_SO_WIN_PROPERTIES. It lets you set all buffer sizes and the maximum WIN size for reception and transmission:

    WinProperties_t xWinProperties;

    /* Start from a zeroed structure so that no field is left uninitialised. */
    memset( &xWinProperties, '\0', sizeof xWinProperties );

    xWinProperties.lTxBufSize   = 20 * ipconfigTCP_MSS; /* Units of bytes. */
    xWinProperties.lTxWinSize   = 10;                   /* Size in units of MSS */
    xWinProperties.lRxBufSize   = 20 * ipconfigTCP_MSS; /* Units of bytes. */
    xWinProperties.lRxWinSize   = 10;                   /* Size in units of MSS */

    FreeRTOS_setsockopt( xSocket,
                         0,
                         FREERTOS_SO_WIN_PROPERTIES,
                         ( void * ) &( xWinProperties ),
                         sizeof( xWinProperties ) );

You can set FREERTOS_SO_WIN_PROPERTIES on the listening (parent) socket. All properties will be inherited by the child sockets.

Task priorities:

You have defined:

#define ipconfigIP_TASK_PRIORITY      ( configMAX_PRIORITIES - 1 )  // FreeRTOSIPConfig.h
#define niEMAC_HANDLER_TASK_PRIORITY  ( configMAX_PRIORITIES - 1 )  // default

The demo task prvServerWorkTask() is running on priority 1.

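I would lower ipconfigIP_TASK_PRIORITY to ( configMAX_PRIORITIES - 2 ).

It is recommended to give the lowest of these priorities to the task that is using TCP/IP: the EMAC handler task runs highest, the IP-task just below it, and the application task below both.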
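
As a sketch of that ordering, the fragment below puts the EMAC handler task highest, the IP-task one step below it, and the application task lowest. configMAX_PRIORITIES is given an example value of 7 here only so that the fragment stands alone (in a real project it comes from FreeRTOSConfig.h), and mainSERVER_WORK_TASK_PRIORITY is a hypothetical name for the priority of the task that calls the socket API.

```c
#include <assert.h> /* for static_assert (C11) */

/* Assumption: example value; normally defined in FreeRTOSConfig.h. */
#ifndef configMAX_PRIORITIES
    #define configMAX_PRIORITIES    7
#endif

#define niEMAC_HANDLER_TASK_PRIORITY     ( configMAX_PRIORITIES - 1 ) /* Highest: serves the MAC. */
#define ipconfigIP_TASK_PRIORITY         ( configMAX_PRIORITIES - 2 ) /* IP-task, below the EMAC handler. */
#define mainSERVER_WORK_TASK_PRIORITY    ( 1 )                        /* Hypothetical name: task using the sockets. */

/* Compile-time check of the intended ordering. */
static_assert( niEMAC_HANDLER_TASK_PRIORITY > ipconfigIP_TASK_PRIORITY,
               "EMAC handler must outrank the IP-task" );
static_assert( ipconfigIP_TASK_PRIORITY > mainSERVER_WORK_TASK_PRIORITY,
               "IP-task must outrank the application task" );
```

The static assertions make the build fail if someone later changes one priority and breaks the ordering.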

#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS 168//160//124//84

Mind you that is a lot. I was using 64 buffers in my STM32Hx demo project.

In FreeRTOS_IP.c you will find vPrintResourceStats(). It is called from prvEMACHandlerTask() in case ipconfigHAS_PRINTF is defined:

#define ipconfigHAS_PRINTF       1
#if( ipconfigHAS_PRINTF == 1 )
    #define FreeRTOS_printf(X)   vLoggingPrintf X
#endif

The function will print warnings to the log when resources are running low, e.g.:

Network buffers: 21 lowest 10
Heap: current 326000 lowest 25000

This logging can be very useful because it warns about shortages.
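
The idea behind such a low-water-mark log can be sketched in a few lines. This is not the code of vPrintResourceStats(); the type and function names below are hypothetical, and the sketch only shows how a "current / lowest" pair is tracked and reported when a new minimum is reached.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Current and lowest-ever value of one resource counter. */
typedef struct
{
    size_t current;
    size_t lowest;
} ResourceStat_t;

static void resource_stat_init( ResourceStat_t * stat, size_t initial )
{
    assert( stat != NULL );
    stat->current = initial;
    stat->lowest = initial;
}

/* Record a new reading. Logs a line and returns 1 when a new minimum
 * is reached, otherwise returns 0 without logging. */
static int resource_stat_update( ResourceStat_t * stat, const char * name, size_t value )
{
    assert( stat != NULL );
    stat->current = value;

    if( value < stat->lowest )
    {
        stat->lowest = value;
        printf( "%s: %lu lowest %lu\n", name,
                ( unsigned long ) stat->current,
                ( unsigned long ) stat->lowest );
        return 1;
    }

    return 0;
}
```

Calling the update function periodically with the number of free network buffers and the free heap size gives one log line each time a resource reaches a new low, which keeps the log quiet during normal operation.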

Regards,


Hello Hein,

first of all, thanks again for your time and support!

I have tried to improve the TCP speed by changing the buffer sizes and the maximum WIN size, but I see no effect. I might already be at the limit without real ZERO copy (I have yet to evaluate how I can write my payloads directly into the ucNetworkPackets buffer). In my application I am reading 16 signals over SAI buses via DMA, each one at 50 kSPS. In the ISR I also have to do some work to rearrange the data into the payload. There is optimization potential for sure, but this is where I am today. That data, along with other data that I read over SPI, is what I send in the packets.
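
For a feeling of the numbers involved, the raw data rate of such an acquisition can be worked out as below. The sample width is not stated in this post, so the 4 bytes per sample used in the note after the code is purely an assumption, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Raw sample data rate in bytes per second. */
static uint32_t sample_bytes_per_sec( uint32_t channels,
                                      uint32_t samples_per_sec,
                                      uint32_t bytes_per_sample )
{
    assert( bytes_per_sample > 0u );
    return channels * samples_per_sec * bytes_per_sample;
}

/* TCP segments per second needed to carry that rate with the given MSS,
 * rounded up. */
static uint32_t segments_per_sec( uint32_t bytes_per_sec, uint32_t mss )
{
    assert( mss > 0u );
    return ( bytes_per_sec + mss - 1u ) / mss;
}
```

With 16 channels at 50 kSPS and an assumed 4 bytes per sample this is 3,200,000 bytes/s (25.6 Mbit/s), or 2286 segments of 1400 bytes per second, before adding the SPI data.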

As a performance comparison, judging by the sample rates that I manage to send over Ethernet: without an OS and with the raw API of LWIP I get around 25% more performance than with FreeRTOS+TCP (without real ZERO copy yet). The STM32H7 ETH+LWIP implementation is, however, completely unreliable for my application and therefore a no-go, as it stops sending packets after some hours. My application needs continuous high-speed TCP transmission over weeks, in some cases months. Even if it is slower in my current case, I am completely happy with FreeRTOS+TCP. Honestly speaking, before discovering it I was about to drop the project completely.

Regards,

Javier

Hola Javier, three questions: how did you test the performance, using IPERF3 or using your own TCP code?

Would you like to share any TCP code? You can also do so by sending me a private message.

Do you have a new PCAP for me? You can truncate it, just 2 seconds of data is enough.

I am still hopeful about your project, because I just saw these performances using IPERF3 on my STM32H747:

C:\>iperf3 -c 192.168.2.114 --port 5001 --bytes 100M
Connecting to host 192.168.2.114, port 5001
[  4] local 192.168.2.5 port 3324 connected to 192.168.2.114 port 5001
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  10.5 MBytes  87.7 Mbits/sec
[  4]   1.00-2.01   sec  10.5 MBytes  87.9 Mbits/sec
[  4]   2.01-3.00   sec  10.5 MBytes  88.3 Mbits/sec
[  4]   3.00-4.00   sec  10.4 MBytes  87.1 Mbits/sec
[  4]   4.00-5.00   sec  10.5 MBytes  88.3 Mbits/sec
[  4]   5.00-6.00   sec  10.4 MBytes  86.8 Mbits/sec
...

C:\>iperf3 -c 192.168.2.114 --port 5001 --bytes 100M -R
Connecting to host 192.168.2.114, port 5001
Reverse mode, remote host 192.168.2.114 is sending
[  4] local 192.168.2.5 port 3335 connected to 192.168.2.114 port 5001
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  10.1 MBytes  84.4 Mbits/sec
[  4]   1.00-2.00   sec  10.0 MBytes  84.1 Mbits/sec
[  4]   2.00-3.01   sec  9.80 MBytes  81.1 Mbits/sec
[  4]   3.01-4.00   sec  10.6 MBytes  90.2 Mbits/sec
[  4]   4.00-5.00   sec  9.92 MBytes  83.3 Mbits/sec
[  4]   5.00-6.00   sec  10.9 MBytes  91.5 Mbits/sec
...

That is 85% of the available bandwidth of 100 Mbps.

Hola Heinz,

I've tested with my own TCP code (Labview), but in both cases (no OS + RAW LWIP API and FreeRTOS+TCP) it was exactly the same setup. I can try to generate PCAPs for both configs and send them to you.

Regarding the code, let me see what I can share via a private message.

Just for completeness: here are 2 PCAP files of reception and transmission: iperf_results.7z (138.2 KB)

And this is the testing project that I used.

You find the TCP driver here.

Note that when Wireshark is running, the transmission speed will drop a few percent.

Hello Hein,

for the time being I have to take back my statement about the performance difference. A fair comparison would be to test FreeRTOS+LWIP against FreeRTOS+TCP. In my tests w/o OS the setup is slightly different, and today I have seen that I also have buffer overruns (the SAI buses deliver data faster than the ETH driver can send it out in packets).

In conclusion, for now I cannot tell whether there is a difference in performance between LWIP and FreeRTOS+TCP. I will try to find some time in the upcoming weeks to integrate LWIP into my FreeRTOS project branch and do some comparison tests.

BR,

Javier

Hello Hein,

some news on my side. I have been able to implement the same approach as in your FTP example for mostly ZERO copy. Now I am calling FreeRTOS_get_tx_head to get the head of the buffer and I write my payload there. When I detect that there is not sufficient space in the buffer, I switch to a separate one for that packet.

I have however one question. I see that FreeRTOS_get_tx_head returns addresses which are not within the ucNetworkPackets buffer but in the heap. The speed is indeed slightly better than before but I am wondering if I am really doing ZERO copy (mostly) now.

Regards,

Javier

I see that FreeRTOS_get_tx_head returns addresses which are not within the ucNetworkPackets buffer but in the heap

That is as expected. The TCP stream buffers are allocated in the heap, they're allocated by the macro pvPortMallocLarge():

#ifndef pvPortMallocLarge
    #define pvPortMallocLarge( x )    pvPortMalloc( x )
#endif

The speed is indeed slightly better than before but I am wondering if I am really doing ZERO copy (mostly) now.

Normally, two memcpy's are necessary for TCP transmission: once by FreeRTOS_send(), and once by the TCP driver. The latter copies data from the TX stream to a network buffer.

Only UDP can be done 100% zero-copy.

In my experience, the effect of zero-copy techniques is often not really overwhelming.

You call FreeRTOS_get_tx_head(), which returns a location within the outgoing stream buffer. Without that technique, FreeRTOS_send() would have to copy the same data. Now you copy it directly into the buffer.

Mind you that memcpy can be very slow when the buffers have a different alignment. Suppose you have the following pointers:

    const size_t uxLength = 1460;
    uint8_t * pointer_A = ( uint8_t * ) 0x2000; /* 4-byte aligned destination */
    uint8_t * pointer_B = ( uint8_t * ) 0x3001; /* source is off by one byte */
    memcpy( pointer_A, pointer_B, uxLength );

This will be a very slow copy, most likely: byte-after-byte. Whereas copying between 2 well-aligned buffers happens a lot quicker.
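A quick way to see whether two buffers can be copied word by word is to compare their offsets within a word. The helper below is only a sketch: the name xSameAlignment and the 4-byte word size are assumptions for illustration, not part of FreeRTOS+TCP, and whether memcpy really falls back to a byte-wise copy depends on the C library in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not a FreeRTOS+TCP function.
 * Returns 1 when both pointers have the same offset within a word of
 * uxWordSize bytes, 0 otherwise. uxWordSize must be a power of two. */
static int xSameAlignment( const void * pvA,
                           const void * pvB,
                           size_t uxWordSize )
{
    uintptr_t uxMask;

    /* A power of two has exactly one bit set. */
    assert( ( uxWordSize != 0U ) && ( ( uxWordSize & ( uxWordSize - 1U ) ) == 0U ) );

    uxMask = ( uintptr_t ) uxWordSize - 1U;

    return ( ( ( uintptr_t ) pvA & uxMask ) == ( ( uintptr_t ) pvB & uxMask ) ) ? 1 : 0;
}
```

For the two pointers above (0x2000 and 0x3001) the helper returns 0 with a word size of 4. Moving the source to 0x3000 or 0x3004 would make it return 1.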

Thanks a lot for the explanation Hein. I have implemented the improvements that you've sent me and did a check with (mostly) and w/o ZERO-Copy:

Without ZERO-Copy: I reach 58 Mbps. Considering that I have a very high load in the uC (among others, 2x SAI interrupts every 20 us where I have to do stuff), I think it is not bad.

With ZERO-Copy (mostly): I reach 68 Mbps, which again, for the high load of the uC, is very good. It would be great to be able to always use the buffer pointed to by FreeRTOS_get_tx_head, as I think I might get even higher speeds. Is there a way to control the size of the available heap for the ETH TCP buffer so that it is a multiple of the packet size? That way I think I would always have space as long as I do not overflow. Alternatively, a way to move forward with the buffer until it reaches the 0 position would be very useful.

Regards,

Javier

You can tune the socket send buffer size using FreeRTOS_setsockopt( .. FREERTOS_SO_SNDBUF .. ) according to your usage profile.

Thanks Hartmut, you are right. I was actually already using it via FREERTOS_SO_WIN_PROPERTIES.

I think what would be very useful would be to be able, via f.i. a function call, to make the TCP circular buffer appointed by FreeRTOS_get_tx_head bend to the first position if the remaining space is lower than the desired payload. This way I could avoid switching in these cases to an external buffer, which reduces the TCP throughput as it is "less" ZERO-Copy.

Congratulations, and thanks for keeping us up-to-date.

FREERTOS_SO_WIN_PROPERTIES is indeed useful, because it sets both the stream buffer sizes and the WIN sizes.

Yes it would be efficient if you can always write the ADC sample data directly into the stream buffer.

As you know, that is a circular buffer. Even if there is space for 1400 bytes, you might need 2 memcpy's to write all data. Would that be possible in the driver?

I am buffering the payload of the packets (total 80 packets, each payload 1400 bytes) in a big array in D2 RAM

If data comes in packets of 1400 bytes each, it would be a good idea to create a TX buffer which is a multiple of 1400 bytes. The driver will automatically round it up to a multiple of 4 bytes. Thus e.g. 10 * 1400 bytes will become 14004 bytes.

The reason is that you want a well-aligned length, and you need 1 byte extra, which remains unused when the buffer is full. A stream buffer is said to be full when N-1 bytes are written to it.
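The size rule can be written down as a small calculation. This is a sketch that models the rule as described in this post (one extra byte, then rounding up to a multiple of 4); the function names are made up for illustration and the code is not taken from the library source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the rule described above, not library code.
 * One byte is added (it stays unused when the buffer is "full"),
 * then the result is rounded up to the next multiple of 4. */
static size_t uxAllocatedStreamSize( size_t uxRequested )
{
    size_t uxLength;

    assert( uxRequested <= ( SIZE_MAX - 4U ) ); /* guard against overflow */

    uxLength = uxRequested + 1U;
    uxLength = ( uxLength + 3U ) & ~( ( size_t ) 3U );

    return uxLength;
}

/* A buffer of N bytes is full when N - 1 bytes are stored in it. */
static size_t uxUsableStreamSize( size_t uxRequested )
{
    return uxAllocatedStreamSize( uxRequested ) - 1U;
}
```

With this model a request for 10 * 1400 = 14000 bytes gives an allocated length of 14004 bytes, of which 14003 can hold data.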

You wrote:

to make the TCP circular buffer appointed by FreeRTOS_get_tx_head() bend to the first position

Not sure what you mean here. FreeRTOS_get_tx_head() always returns the HEAD position of the buffer, the location where the next byte will be written. It also returns the number of bytes that can be written at that location.

As I wrote, you might need 2 memcpy's to store one buffer. In this example the buffer has space for 6 blocks:

image

The data in the blue area will need to be copied in two calls to memcpy.
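The two-step copy can be sketched on a plain byte array. The function uxCircularWrite below is a hypothetical stand-alone model, not the stream buffer code of FreeRTOS+TCP; in a real driver the second destination address would come from asking the stack for the head position again, and the caller must already know that enough space is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: copy uxCount bytes into a circular buffer of
 * uxLength bytes, starting at index uxHead, and return the new head.
 * Free space is NOT checked here, only the wrap-around is handled. */
static size_t uxCircularWrite( uint8_t * pucBuffer,
                               size_t uxLength,
                               size_t uxHead,
                               const uint8_t * pucData,
                               size_t uxCount )
{
    size_t uxFirst;

    assert( uxHead < uxLength );
    assert( uxCount < uxLength );

    /* Number of bytes that fit before the physical end of the buffer. */
    uxFirst = uxLength - uxHead;

    if( uxFirst > uxCount )
    {
        uxFirst = uxCount;
    }

    /* First memcpy: from head up to the end of the buffer. */
    memcpy( &( pucBuffer[ uxHead ] ), pucData, uxFirst );

    /* Second memcpy: the remainder goes to the start of the buffer. */
    if( uxCount > uxFirst )
    {
        memcpy( pucBuffer, &( pucData[ uxFirst ] ), uxCount - uxFirst );
    }

    return ( uxHead + uxCount ) % uxLength;
}
```

Writing 4 bytes at head index 6 of an 8-byte buffer puts 2 bytes at indexes 6 and 7 and 2 bytes at indexes 0 and 1, and the new head is 2.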

Thinking ... thinking harder ... if you do not mind leaving 1400 bytes unused, you could give it a length of e.g. 10 x 1400 - 4 ( = 13996 ). It will be rounded up to 14000.

Then if you only write blocks of 1400 bytes, you will never need a second call to memcpy(). When the buffer is "full", FreeRTOS_get_tx_head() will inform you that 1399 bytes can be written. Your driver will have to wait.
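The behaviour with a 14000-byte buffer can be checked with a small model. The structure and functions below are hypothetical and only imitate a circular buffer in which one byte always stays unused; they are not the FreeRTOS+TCP stream buffer implementation. The contiguous space at the head is taken as the smaller of the total free space and the distance to the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a circular TX buffer, not library code. */
typedef struct
{
    size_t uxLength; /* allocated size N */
    size_t uxHead;   /* index where the next byte will be written */
    size_t uxTail;   /* index of the oldest byte not yet released */
} ModelStream_t;

/* Total free space: one byte always stays unused. */
static size_t uxModelSpace( const ModelStream_t * pxStream )
{
    return ( pxStream->uxTail + pxStream->uxLength - pxStream->uxHead - 1U ) % pxStream->uxLength;
}

/* Bytes that can be written at head without wrapping around. */
static size_t uxModelContiguous( const ModelStream_t * pxStream )
{
    size_t uxSpace = uxModelSpace( pxStream );
    size_t uxToEnd = pxStream->uxLength - pxStream->uxHead;

    return ( uxSpace < uxToEnd ) ? uxSpace : uxToEnd;
}

/* The producer has written uxCount bytes at head. */
static void vModelAdvanceHead( ModelStream_t * pxStream, size_t uxCount )
{
    assert( uxCount <= uxModelContiguous( pxStream ) );
    pxStream->uxHead = ( pxStream->uxHead + uxCount ) % pxStream->uxLength;
}

/* The consumer has released uxCount bytes at tail. */
static void vModelAdvanceTail( ModelStream_t * pxStream, size_t uxCount )
{
    size_t uxUsed = ( pxStream->uxHead + pxStream->uxLength - pxStream->uxTail ) % pxStream->uxLength;

    assert( uxCount <= uxUsed );
    pxStream->uxTail = ( pxStream->uxTail + uxCount ) % pxStream->uxLength;
}
```

In this model, after 9 blocks of 1400 bytes the head is at 12600 and the contiguous space is 1399 bytes, so a tenth block has to wait. As soon as one block is released at the tail, a full 1400 bytes are available at the head again, and the head is always a multiple of 1400.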

One more thought: your driver doesn't want to "poll" for space in the stream buffer. You can have it block on a semaphore, which will be given as soon as the amount of space has increased.
See FREERTOS_SO_SET_SEMAPHORE, which connects a semaphore to a socket.

The semaphore will also be given on other important events: new data arrived, the connection got closed, an error occurred, and more.

Thanks for the inputs Hein. Regarding "bending" of the buffer, some explanation below of what I am doing now (I think it is similar to your FTP example):

1- FreeRTOS_get_tx_head() reports >= 1400 bytes. Inside the SAI ISR I start filling the 1400 bytes starting at head. Note that I cannot fill everything at once using memcpy, because in each ISR call only a chunk of 140 bytes becomes ready.

2- FreeRTOS_get_tx_head() reports < 1400 bytes. Splitting the packet could be done, but it would probably be complicated and time-intensive: each ISR would have to track how much of the 1400-byte packet can still be written before the buffer reaches its end, and afterwards call FreeRTOS_get_tx_head() again to get the start of the buffer and fill in the remaining bytes until I reach 1400. What I do today is switch to a secondary buffer for these 1400 bytes (now in D2, but I will put it in AXISRAM as well).

3- After 10 ISR calls the payload is complete, very frequently in the TCP buffer in the heap, in some cases in my secondary buffer, and I notify the sending task. If the packet is in the TCP buffer in the heap I call FreeRTOS_send with NULL as the address (ZERO-Copy); otherwise I call it with the start address of my secondary buffer (non ZERO-Copy).

I will have to rethink point 2 and maybe try to always use the TCP buffer in the heap, even if that is more time-intensive in the ISR. I can then benchmark what works best.

I will also try to move most of the work out of the ISR and notify a task after each ISR call (using the ISR-to-task notification system), not just when the packet is completely filled. I do not really know if this would help, because at the end of the day the task would still have to be triggered once every 20 us.