Slow iPerf throughput

I am running FreeRTOS 202212.01 on a custom board with a 1 GHz RISC-V processor and 1 GB of LPDDR4, and I am observing slow iPerf throughput:

  • Device recv: 180Mbps
  • Device send: 182Mbps

With lwIP, I get around 300Mbps for both recv and send. As a new user, I cannot upload attachments, so my FreeRTOSIPConfig.h is pasted below:

/*
 * FreeRTOS+TCP V2.3.2
 * Copyright (C) 2020 Amazon.com, Inc. or its affiliates.  All Rights Reserved.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy of
 * this software and associated documentation files (the "Software"), to deal in
 * the Software without restriction, including without limitation the rights to
 * use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
 * the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
 * FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 * COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
 * IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
 * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * 
 * 
 */

#ifndef FREERTOS_IP_CONFIG_H
#define FREERTOS_IP_CONFIG_H


#include <FreeRTOS.h>

/* Prototype for the function used to print out.  In this case it prints to the
console before the network is connected then a UDP port after the network has
connected. */
extern void vLoggingPrintf( const char *pcFormatString, ... );

/* Set to 1 to print out debug messages.  If ipconfigHAS_DEBUG_PRINTF is set to
1 then FreeRTOS_debug_printf should be defined to the function used to print
out the debugging messages. */

#define ipconfigHAS_DEBUG_PRINTF	0
#if( ipconfigHAS_DEBUG_PRINTF == 1 )
	#define FreeRTOS_debug_printf(X)	bsp_printf X
#endif


/* Set to 1 to print out non debugging messages, for example the output of the
FreeRTOS_netstat() command, and ping replies.  If ipconfigHAS_PRINTF is set to 1
then FreeRTOS_printf should be set to the function used to print out the
messages. */

#define ipconfigHAS_PRINTF			1
#if( ipconfigHAS_PRINTF == 1 )
	#define FreeRTOS_printf(X)			bsp_printf X
#endif

/* Define the byte order of the target MCU (the MCU FreeRTOS+TCP is executing
on).  Valid options are pdFREERTOS_BIG_ENDIAN and pdFREERTOS_LITTLE_ENDIAN. */
#define ipconfigBYTE_ORDER pdFREERTOS_LITTLE_ENDIAN

/* The checksums will be checked and calculated by the STM32F4x ETH peripheral. */
#define ipconfigDRIVER_INCLUDED_TX_IP_CHECKSUM		( 0 )
#define ipconfigDRIVER_INCLUDED_RX_IP_CHECKSUM		( 1 )

/* Several API's will block until the result is known, or the action has been
performed, for example FreeRTOS_send() and FreeRTOS_recv().  The timeouts can be
set per socket, using setsockopt().  If not set, the times below will be
used as defaults. */
#define ipconfigSOCK_DEFAULT_RECEIVE_BLOCK_TIME	( 5000 )
#define	ipconfigSOCK_DEFAULT_SEND_BLOCK_TIME	( 5000 )

#define ipconfigZERO_COPY_RX_DRIVER			( 0 )
#define ipconfigZERO_COPY_TX_DRIVER			( 0 )

/* Include support for LLMNR: Link-local Multicast Name Resolution
(non-Microsoft) */
#define ipconfigUSE_LLMNR					( 1 )

/* Include support for NBNS: NetBIOS Name Service (Microsoft) */
#define ipconfigUSE_NBNS					( 1 )

/* Include support for DNS caching.  For TCP, having a small DNS cache is very
useful.  When a cache is present, ipconfigDNS_REQUEST_ATTEMPTS can be kept low
and also DNS may use small timeouts.  If a DNS reply comes in after the DNS
socket has been destroyed, the result will be stored into the cache.  The next
call to FreeRTOS_gethostbyname() will return immediately, without even creating
a socket. */
#define ipconfigUSE_DNS_CACHE				( 1 )
#define ipconfigDNS_CACHE_NAME_LENGTH		( 16 )
#define ipconfigDNS_CACHE_ENTRIES			( 4 )
#define ipconfigDNS_REQUEST_ATTEMPTS		( 4 )

/* The IP stack executes in its own task (although any application task can make
use of its services through the published sockets API). ipconfigIP_TASK_PRIORITY
sets the priority of the task that executes the IP stack.  The priority is a
standard FreeRTOS task priority so can take any value from 0 (the lowest
priority) to (configMAX_PRIORITIES - 1) (the highest priority).
configMAX_PRIORITIES is a standard FreeRTOS configuration parameter defined in
FreeRTOSConfig.h, not FreeRTOSIPConfig.h. Consideration needs to be given as to
the priority assigned to the task executing the IP stack relative to the
priority assigned to tasks that use the IP stack. */
#define ipconfigIP_TASK_PRIORITY			( configMAX_PRIORITIES - 2 )

/* The size, in words (not bytes), of the stack allocated to the FreeRTOS+TCP
task.  This setting is less important when the FreeRTOS Win32 simulator is used
as the Win32 simulator only stores a fixed amount of information on the task
stack.  FreeRTOS includes optional stack overflow detection
*/
#define ipconfigIP_TASK_STACK_SIZE_WORDS	( configMINIMAL_STACK_SIZE * 5 )

/* ipconfigRAND32() is called by the IP stack to generate random numbers for
things such as a DHCP transaction number or initial sequence number.  Random
number generation is performed via this macro to allow applications to use their
own random number generation method.  For example, it might be possible to
generate a random number by sampling noise on an analogue input. */
extern UBaseType_t uxRand(void);
#define ipconfigRAND32()	uxRand()

/* If ipconfigUSE_NETWORK_EVENT_HOOK is set to 1 then FreeRTOS+TCP will call the
network event hook at the appropriate times.  If ipconfigUSE_NETWORK_EVENT_HOOK
is not set to 1 then the network event hook will never be called.
*/
#define ipconfigUSE_NETWORK_EVENT_HOOK 1

/* Sockets have a send block time attribute.  If FreeRTOS_sendto() is called but
a network buffer cannot be obtained then the calling task is held in the Blocked
state (so other tasks can continue to execute) until either a network buffer
becomes available or the send block time expires.  If the send block time expires
then the send operation is aborted.  The maximum allowable send block time is
capped to the value set by ipconfigMAX_SEND_BLOCK_TIME_TICKS.  Capping the
maximum allowable send block time prevents a deadlock occurring when
all the network buffers are in use and the tasks that process (and subsequently
free) the network buffers are themselves blocked waiting for a network buffer.
ipconfigMAX_SEND_BLOCK_TIME_TICKS is specified in RTOS ticks.  A time in
milliseconds can be converted to a time in ticks by dividing the time in
milliseconds by portTICK_PERIOD_MS. */
#define ipconfigUDP_MAX_SEND_BLOCK_TIME_TICKS ( 5000 / portTICK_PERIOD_MS )

/* If ipconfigUSE_DHCP is 1 then FreeRTOS+TCP will attempt to retrieve an IP
address, netmask, DNS server address and gateway address from a DHCP server.  If
ipconfigUSE_DHCP is 0 then FreeRTOS+TCP will use a static IP address.  The
stack will revert to using the static IP address even when ipconfigUSE_DHCP is
set to 1 if a valid configuration cannot be obtained from a DHCP server for any
reason.  The static configuration used is that passed into the stack by the
FreeRTOS_IPInit() function call. */
#define ipconfigUSE_DHCP				0
#define ipconfigDHCP_REGISTER_HOSTNAME	1
#define ipconfigDHCP_USES_UNICAST       1

/* When ipconfigUSE_DHCP is set to 1, DHCP requests will be sent out at
increasing time intervals until either a reply is received from a DHCP server
and accepted, or the interval between transmissions reaches
ipconfigMAXIMUM_DISCOVER_TX_PERIOD.  The IP stack will revert to using the
static IP address passed as a parameter to FreeRTOS_IPInit() if the
re-transmission time interval reaches ipconfigMAXIMUM_DISCOVER_TX_PERIOD without
a DHCP reply being received. */
#define ipconfigMAXIMUM_DISCOVER_TX_PERIOD		( pdMS_TO_TICKS( 30000 ) )

/* The ARP cache is a table that maps IP addresses to MAC addresses.  The IP
stack can only send a UDP message to a remote IP address if it knows the MAC
address associated with the IP address, or the MAC address of the router used to
contact the remote IP address.  When a UDP message is received from a remote IP
address the MAC address and IP address are added to the ARP cache.  When a UDP
message is sent to a remote IP address that does not already appear in the ARP
cache then the UDP message is replaced by an ARP message that solicits the
required MAC address information.  ipconfigARP_CACHE_ENTRIES defines the maximum
number of entries that can exist in the ARP table at any one time. */
#define ipconfigARP_CACHE_ENTRIES		6

/* ARP requests that do not result in an ARP response will be re-transmitted a
maximum of ipconfigMAX_ARP_RETRANSMISSIONS times before the ARP request is
aborted. */
#define ipconfigMAX_ARP_RETRANSMISSIONS ( 5 )

/* ipconfigMAX_ARP_AGE defines the maximum time between an entry in the ARP
table being created or refreshed and the entry being removed because it is stale.
New ARP requests are sent for ARP cache entries that are nearing their maximum
age.  ipconfigMAX_ARP_AGE is specified in tens of seconds, so a value of 150 is
equal to 1500 seconds (or 25 minutes). */
#define ipconfigMAX_ARP_AGE			150

/* Implementing FreeRTOS_inet_addr() necessitates the use of string handling
routines, which are relatively large.  To save code space the full
FreeRTOS_inet_addr() implementation is made optional, and a smaller and faster
alternative called FreeRTOS_inet_addr_quick() is provided.  FreeRTOS_inet_addr()
takes an IP in decimal dot format (for example, "192.168.0.1") as its parameter.
FreeRTOS_inet_addr_quick() takes an IP address as four separate numerical octets
(for example, 192, 168, 0, 1) as its parameters.  If
ipconfigINCLUDE_FULL_INET_ADDR is set to 1 then both FreeRTOS_inet_addr() and
FreeRTOS_inet_addr_quick() are available.  If ipconfigINCLUDE_FULL_INET_ADDR is
not set to 1 then only FreeRTOS_inet_addr_quick() is available. */
#define ipconfigINCLUDE_FULL_INET_ADDR	1

/* ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS defines the total number of network buffer that
are available to the IP stack.  The total number of network buffers is limited
to ensure the total amount of RAM that can be consumed by the IP stack is capped
to a pre-determinable value. */
#if( ipconfigZERO_COPY_RX_DRIVER != 0 )
	/* _HT_ Actually we should know the value of 'configNUM_RX_DESCRIPTORS' here. */
	#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS		( 25 + 6 )
#else
	#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS		48
#endif

/* A FreeRTOS queue is used to send events from application tasks to the IP
stack.  ipconfigEVENT_QUEUE_LENGTH sets the maximum number of events that can
be queued for processing at any one time.  The event queue must be a minimum of
5 greater than the total number of network buffers. */
#define ipconfigEVENT_QUEUE_LENGTH		( ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS + 5 )

/* The address of a socket is the combination of its IP address and its port
number.  FreeRTOS_bind() is used to manually allocate a port number to a socket
(to 'bind' the socket to a port), but manual binding is not normally necessary
for client sockets (those sockets that initiate outgoing connections rather than
wait for incoming connections on a known port number).  If
ipconfigALLOW_SOCKET_SEND_WITHOUT_BIND is set to 1 then calling
FreeRTOS_sendto() on a socket that has not yet been bound will result in the IP
stack automatically binding the socket to a port number from the range
socketAUTO_PORT_ALLOCATION_START_NUMBER to 0xffff.  If
ipconfigALLOW_SOCKET_SEND_WITHOUT_BIND is set to 0 then calling FreeRTOS_sendto()
on a socket that has not yet been bound will result in the send operation being
aborted. */
#define ipconfigALLOW_SOCKET_SEND_WITHOUT_BIND 1

/* Defines the Time To Live (TTL) values used in outgoing UDP packets. */
#define ipconfigUDP_TIME_TO_LIVE		128
#define ipconfigTCP_TIME_TO_LIVE		128 /* also defined in FreeRTOSIPConfigDefaults.h */

/* USE_TCP: Use TCP and all its features */
#define ipconfigUSE_TCP				( 1 )

/* USE_WIN: Let TCP use windowing mechanism. */
#define ipconfigUSE_TCP_WIN			( 1 )

/* The MTU is the maximum number of bytes the payload of a network frame can
contain.  For normal Ethernet V2 frames the maximum MTU is 1500.  Setting a
lower value can save RAM, depending on the buffer management scheme used.  If
ipconfigCAN_FRAGMENT_OUTGOING_PACKETS is 1 then (ipconfigNETWORK_MTU - 28) must
be divisible by 8. */

#define ipconfigNETWORK_MTU					1500

/* Set ipconfigUSE_DNS to 1 to include a basic DNS client/resolver.  DNS is used
through the FreeRTOS_gethostbyname() API function. */
#define ipconfigUSE_DNS								1

/* If ipconfigREPLY_TO_INCOMING_PINGS is set to 1 then the IP stack will
generate replies to incoming ICMP echo (ping) requests. */
#define ipconfigREPLY_TO_INCOMING_PINGS				1

/* If ipconfigSUPPORT_OUTGOING_PINGS is set to 1 then the
FreeRTOS_SendPingRequest() API function is available. */
#define ipconfigSUPPORT_OUTGOING_PINGS				1

/* If ipconfigSUPPORT_SELECT_FUNCTION is set to 1 then the FreeRTOS_select()
(and associated) API function is available. */
#define ipconfigSUPPORT_SELECT_FUNCTION				1

/* If ipconfigFILTER_OUT_NON_ETHERNET_II_FRAMES is set to 1 then Ethernet frames
that are not in Ethernet II format will be dropped.  This option is included for
potential future IP stack developments. */
#define ipconfigFILTER_OUT_NON_ETHERNET_II_FRAMES  1

/* If ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES is set to 1 then it is the
responsibility of the Ethernet interface to filter out packets that are of no
interest.  If the Ethernet interface does not implement this functionality, then
set ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES to 0 to have the IP stack
perform the filtering instead (it is much less efficient for the stack to do it
because the packet will already have been passed into the stack).  If the
Ethernet driver does all the necessary filtering in hardware then software
filtering can be removed by using a value other than 1 or 0. */
#define ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES	0

/* The windows simulator cannot really simulate MAC interrupts, and needs to
block occasionally to allow other tasks to run. */
#define configWINDOWS_MAC_INTERRUPT_SIMULATOR_DELAY ( 2 / portTICK_PERIOD_MS )

/* Advanced only: in order to access 32-bit fields in the IP packets with
32-bit memory instructions, all packets will be stored 32-bit-aligned, plus
16-bits.  This has to do with the contents of the IP-packets: all 32-bit fields
are 32-bit-aligned, plus 16-bit(!). */
#define ipconfigPACKET_FILLER_SIZE 2

/* Define the size of the pool of TCP window descriptors.  On the average, each
TCP socket will use up to 2 x 6 descriptors, meaning that it can have 2 x 6
outstanding packets (for Rx and Tx).  When using up to 10 TCP sockets
simultaneously, one could define TCP_WIN_SEG_COUNT as 120. */
#define ipconfigTCP_WIN_SEG_COUNT 64

/* Each TCP socket has a circular buffers for Rx and Tx, which have a fixed
maximum size.  Define the size of Rx buffer for TCP sockets. */
#define ipconfigTCP_RX_BUFFER_LENGTH			( 3 * 1460 )

/* Define the size of Tx buffer for TCP sockets. */
#define ipconfigTCP_TX_BUFFER_LENGTH			( 2 * 1460 )

/* When using call-back handlers, the driver may check if the handler points to
real program memory (RAM or flash) or just has a random non-zero value. */
#define ipconfigIS_VALID_PROG_ADDRESS(x) ( (x) != NULL )

/* Include support for TCP hang protection.  All sockets in a connecting or
disconnecting stage will timeout after a period of non-activity. */
#define ipconfigTCP_HANG_PROTECTION				( 1 )
#define ipconfigTCP_HANG_PROTECTION_TIME		( 30 )

/* Include support for TCP keep-alive messages. */
#define ipconfigTCP_KEEP_ALIVE				( 1 )
#define ipconfigTCP_KEEP_ALIVE_INTERVAL		( 20 ) /* in seconds */

/* Set to 1 or 0 to include/exclude FTP and HTTP functionality from the standard
server task. */
#define ipconfigUSE_FTP						1
#define ipconfigUSE_HTTP					1

/* Buffer and window sizes used by the FTP and HTTP servers respectively.  The
FTP and HTTP servers both execute in the standard server task. */
#define ipconfigFTP_TX_BUFSIZE				( 4 * ipconfigTCP_MSS )
#define ipconfigFTP_TX_WINSIZE				( 2 )
#define ipconfigFTP_RX_BUFSIZE				( 8 * ipconfigTCP_MSS )
#define ipconfigFTP_RX_WINSIZE				( 4 )
#define ipconfigHTTP_TX_BUFSIZE				( 3 * ipconfigTCP_MSS )
#define ipconfigHTTP_TX_WINSIZE				( 2 )
#define ipconfigHTTP_RX_BUFSIZE				( 4 * ipconfigTCP_MSS )
#define ipconfigHTTP_RX_WINSIZE				( 4 )

/* When set to 1, the application writer must provide the implementation of a
function with the following name and prototype:
BaseType_t xApplicationDNSQueryHook( const char *pcName );
The function must return pdTRUE if pcName matches a test name assigned to the
device, and pdFALSE in all other cases.  */
#define ipconfigDNS_USE_CALLBACKS			1
#define ipconfigSUPPORT_SIGNALS				1

/* This demo creates a virtual network connection by accessing the raw Ethernet
or WiFi data to and from a real network connection.  Many computers have more
than one real network port, and configNETWORK_INTERFACE_TO_USE is used to tell
the demo which real port should be used to create the virtual port.  The ports
available are displayed on the console when the application is executed.  For
example, on my development laptop setting configNETWORK_INTERFACE_TO_USE to 4
results in the wired network being used, while setting
configNETWORK_INTERFACE_TO_USE to 2 results in the wireless network being
used. */
#define configNETWORK_INTERFACE_TO_USE 4L


//#define configECHO_SERVER_ADDR0	192
//#define configECHO_SERVER_ADDR1 168
//#define configECHO_SERVER_ADDR2 0
//#define configECHO_SERVER_ADDR3 222

/* Default MAC address configuration.  The demo creates a virtual network
connection that uses this MAC address by accessing the raw Ethernet/WiFi data
to and from a real network connection on the host PC.  See the
configNETWORK_INTERFACE_TO_USE definition above for information on how to
configure the real network connection to use. */
#define configMAC_ADDR0		0x00
#define configMAC_ADDR1		0x11
#define configMAC_ADDR2		0x22
#define configMAC_ADDR3		0x33
#define configMAC_ADDR4		0x44
#define configMAC_ADDR5		0x41

/* Default IP address configuration.  Used in ipconfigUSE_DNS is set to 0, or
ipconfigUSE_DNS is set to 1 but a DNS server cannot be contacted. */
#define configIP_ADDR0		192
#define configIP_ADDR1		168
#define configIP_ADDR2		31
#define configIP_ADDR3		55

/* Default gateway IP address configuration.  Used in ipconfigUSE_DNS is set to
0, or ipconfigUSE_DNS is set to 1 but a DNS server cannot be contacted. */
#define configGATEWAY_ADDR0	192
#define configGATEWAY_ADDR1	168
#define configGATEWAY_ADDR2	31
#define configGATEWAY_ADDR3	65

/* Default DNS server configuration.  OpenDNS addresses are 208.67.222.222 and
208.67.220.220.  Used in ipconfigUSE_DNS is set to 0, or ipconfigUSE_DNS is set
to 1 but a DNS server cannot be contacted.*/
#define configDNS_SERVER_ADDR0 	8
#define configDNS_SERVER_ADDR1 	8
#define configDNS_SERVER_ADDR2 	8
#define configDNS_SERVER_ADDR3 	8

/* Default netmask configuration.  Used in ipconfigUSE_DNS is set to 0, or
ipconfigUSE_DNS is set to 1 but a DNS server cannot be contacted. */
#define configNET_MASK0		255
#define configNET_MASK1		255
#define configNET_MASK2		255
#define configNET_MASK3		0

/* The UDP port to which print messages are sent. */
#define configPRINT_PORT	( 15000 )

#define Speed_1000Mhz		0x04
#define Speed_100Mhz		0x02
#define Speed_10Mhz			0x01

/* The maximum time to wait for a closing socket to close. */
#define tcpechoSHUTDOWN_DELAY	( pdMS_TO_TICKS( 5000 ) )

/* The standard echo port number. */
#define tcpechoPORT_NUMBER		7

/* The example IP trace macros are included here so the definitions are
available in all the FreeRTOS+TCP source files. */
//#include "DemoIPTrace.h"

/* Simple UDP client and server task parameters. */
#define mainSIMPLE_UDP_CLIENT_SERVER_TASK_PRIORITY		( tskIDLE_PRIORITY )
#define mainSIMPLE_UDP_CLIENT_SERVER_PORT				( 7 )

/* Echo client task parameters - used for both TCP and UDP echo clients. */
#define mainECHO_CLIENT_TASK_STACK_SIZE 				( configMINIMAL_STACK_SIZE * 2 )	/* Not used in the Windows port. */
#define mainECHO_CLIENT_TASK_PRIORITY					( tskIDLE_PRIORITY + 1 )

/* Echo server task parameters. */
#define mainECHO_SERVER_TASK_STACK_SIZE					( configMINIMAL_STACK_SIZE * 2 )	/* Not used in the Windows port. */
#define mainECHO_SERVER_TASK_PRIORITY					( tskIDLE_PRIORITY + 1 )

/* Define a name that will be used for LLMNR and NBNS searches. */
#define mainHOST_NAME			"RISCV"
#define mainDEVICE_NICK_NAME		"FreeRTOS"


#endif /* FREERTOS_IP_CONFIG_H */


iPerf related parameters:

/* Put the TCP server at this port number: */
#ifndef ipconfigIPERF_TCP_ECHO_PORT
	/* 5001 seems to be the standard TCP server port number. */
	#define ipconfigIPERF_TCP_ECHO_PORT				5001
#endif

/* Put the UDP server at this port number: */
#ifndef ipconfigIPERF_UDP_ECHO_PORT
	/* 5001 seems to be the standard UDP server port number. */
	#define ipconfigIPERF_UDP_ECHO_PORT				5001
#endif

#ifndef ipconfigIPERF_STACK_SIZE_IPERF_TASK
	/* Stack size needed for vIPerfTask(), a bit of a guess: */
	#define ipconfigIPERF_STACK_SIZE_IPERF_TASK		680
#endif

#ifndef ipconfigIPERF_PRIORITY_IPERF_TASK
	/* The priority of vIPerfTask(). Should be lower than the IP-task
	and the task running in NetworkInterface.c. */
	#define	ipconfigIPERF_PRIORITY_IPERF_TASK		3
#endif

#ifndef ipconfigIPERF_RECV_BUFFER_SIZE
	/* Only used when ipconfigIPERF_USE_ZERO_COPY = 0.
	Buffer size when reading from the sockets. */
	#define ipconfigIPERF_RECV_BUFFER_SIZE			( 24 * ipconfigTCP_MSS )
#endif

#ifndef ipconfigIPERF_LOOP_BLOCKING_TIME_MS
	/* Let the mainloop wake-up so now and then. */
	#define ipconfigIPERF_LOOP_BLOCKING_TIME_MS		5000UL
#endif

#ifndef ipconfigIPERF_HAS_TCP
	/* A TCP server socket will be created. */
	#define ipconfigIPERF_HAS_TCP					1
#endif

#ifndef ipconfigIPERF_HAS_UDP
	/* A UDP server socket will be created. */
	#define ipconfigIPERF_HAS_UDP					1
#endif

#ifndef ipconfigIPERF_DOES_ECHO_UDP
	/* By default, this server will echo UDP data as required by iperf. */
	#define ipconfigIPERF_DOES_ECHO_UDP				0
#endif

#ifndef ipconfigIPERF_USE_ZERO_COPY
	#define ipconfigIPERF_USE_ZERO_COPY				1
#endif

#ifndef ipconfigIPERF_TX_BUFSIZE
	#define ipconfigIPERF_TX_BUFSIZE				( 24 * ipconfigTCP_MSS )	/* Units of bytes. */
	#define ipconfigIPERF_TX_WINSIZE				( 12 )			/* Size in units of MSS */
	#define ipconfigIPERF_RX_BUFSIZE				( 24 * ipconfigTCP_MSS )	/* Units of bytes. */
	#define ipconfigIPERF_RX_WINSIZE				( 12 )			/* Size in units of MSS */
#endif

#ifndef ARRAY_SIZE
	#define ARRAY_SIZE(x)	(BaseType_t)(sizeof(x)/sizeof(x)[0])
#endif

#define ipconfigIPERF_VERSION					3

I have also captured the pcap. Any help would be appreciated.

Does the information in the following posts help -

Can anyone grant permission to attach files to @h3ikichi please?

That would help to understand why the performance is slow.

It must be said that running a capture program like WireShark may influence the throughput: things will go slower.

Have you tried both the active and the passive modes?

    // Send data to the DUT
    iperf3 -c 192.168.2.114 -4 --port 5001 --bytes 100M
    // Receive data from the DUT
    iperf3 -c 192.168.2.114 -4 --port 5001 --bytes 100M -R

EDIT Sorry, you already answered this question above.

Are you running the client program iperf3 on a real (non-virtual) host?

Are there other tasks in the DUT that could influence the Ethernet performance?

EDIT Can you tell which Network Interface you are using? Is it publicly accessible?

@h3ikichi should now be able to attach files.

Yes, I am running the client on a Windows laptop.

No, iPerf is the only task that is running.

Did you mean my DUT? It is a prototype custom FPGA board with a hardened RISC-V core.

Do you see any mistake or misconfiguration in my config?

I have sent the pcap to your email at hein [at] htibosch [point] net since it is too large. Appreciate if you can take a look.

Please find attached your PCAP file: iperf_fpga_test.7z (322.1 KB)
The 7-Zip format gives good results when compressing PCAP files.

Are you using huge frames, i.e. more than 1500 bytes? Or is there some offloading/grouping?

I see packet sizes like 16060 and 17520, which are possibly 11 and 12 packets.
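The "11 and 12 packets" guess can be checked with a little arithmetic. The sketch below assumes plain IPv4 and TCP headers of 20 bytes each (no options), so a 1500-byte MTU leaves an MSS of 1460 bytes; the helper name ulFullSegments() is made up for this illustration. Lengths that are exact multiples of the MSS would be consistent with several segments being grouped before the capture sees them, although that remains an assumption until the capture setup is known.

```c
#include <stdint.h>

#define MTU_BYTES           1500U
#define IP_HEADER_BYTES     20U   /* IPv4 header without options (assumed) */
#define TCP_HEADER_BYTES    20U   /* TCP header without options (assumed) */
#define MSS_BYTES           ( MTU_BYTES - IP_HEADER_BYTES - TCP_HEADER_BYTES )

/* Hypothetical helper: return how many full-size segments make up
 * ulCapturedLength, or 0 when the length is not an exact multiple
 * of the MSS. */
static uint32_t ulFullSegments( uint32_t ulCapturedLength )
{
    if( ( ulCapturedLength % MSS_BYTES ) != 0U )
    {
        return 0U;
    }

    return ulCapturedLength / MSS_BYTES;
}
```

With these assumptions, ulFullSegments( 16060 ) gives 11 and ulFullSegments( 17520 ) gives 12.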

I would like to have a look at your driver to see how things are implemented.

You are using these TCP parameters for IPerf:

    #define ipconfigIPERF_TX_BUFSIZE  ( 24 * ipconfigTCP_MSS ) /* Units of bytes. */
    #define ipconfigIPERF_TX_WINSIZE  ( 12 ) /* Size in units of MSS */
    #define ipconfigIPERF_RX_BUFSIZE  ( 24 * ipconfigTCP_MSS ) /* Units of bytes. */
    #define ipconfigIPERF_RX_WINSIZE  ( 12 ) /* Size in units of MSS */

If there is enough RAM, you could try to double these values. If you make a new PCAP, 5 seconds is enough.
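For reference, doubling the four values quoted above would look roughly like this. The fallback definition of ipconfigTCP_MSS (1460 bytes, assuming a 1500-byte MTU minus 40 bytes of IPv4 and TCP headers) is only there to make the arithmetic visible; in a real build the value comes from the stack's own configuration.

```c
/* Fallback for illustration only: normally provided by FreeRTOS+TCP. */
#ifndef ipconfigTCP_MSS
    #define ipconfigTCP_MSS    1460U
#endif

/* The original values, doubled. */
#define ipconfigIPERF_TX_BUFSIZE    ( 48 * ipconfigTCP_MSS ) /* Units of bytes. */
#define ipconfigIPERF_TX_WINSIZE    ( 24 )                   /* Size in units of MSS */
#define ipconfigIPERF_RX_BUFSIZE    ( 48 * ipconfigTCP_MSS ) /* Units of bytes. */
#define ipconfigIPERF_RX_WINSIZE    ( 24 )                   /* Size in units of MSS */
```

With an MSS of 1460, each of the two stream buffers then takes 48 x 1460 = 70080 bytes per iPerf socket.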

I am using 1500 MTU. Only RX_IP CHECKSUM is done by the MAC.

I have doubled those values, and I get:

  • recv: ~160Mbps
  • send: ~200Mbps

I have attached the 7z pcap taken after doubling the values, together with the network driver.
NetworkInterface.c (13.0 KB)
recv2.7z (363.0 KB)

Hi @htibosch, did you have a chance to take a look at my driver and pcap? Do you see any problem?

Sorry h3ikichi, I was a bit busy this week.

Thank you for sending your NetworkInterface.c, it is well written!

Can you please make sure that the following FreeRTOS priorities are assigned:

  • Highest for prvEMACDeferredInterruptHandlerTask(): configMAX_PRIORITIES-1
  • Medium for ipconfigIP_TASK_PRIORITY : configMAX_PRIORITIES-2
  • Lower for the task that is using the IP-stack, ipconfigIPERF_PRIORITY_IPERF_TASK: configMAX_PRIORITIES-3
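As a configuration sketch, the three priorities could be written as below. The macro name niEMAC_HANDLER_TASK_PRIORITY is an assumption (use whatever name your NetworkInterface.c passes to xTaskCreate()), and the fallback value for configMAX_PRIORITIES is a placeholder that normally comes from FreeRTOSConfig.h.

```c
/* Placeholder for illustration only: normally defined in FreeRTOSConfig.h. */
#ifndef configMAX_PRIORITIES
    #define configMAX_PRIORITIES    7
#endif

/* Highest: the task that services the EMAC (name is hypothetical). */
#define niEMAC_HANDLER_TASK_PRIORITY         ( configMAX_PRIORITIES - 1 )

/* Medium: the IP-task. */
#define ipconfigIP_TASK_PRIORITY             ( configMAX_PRIORITIES - 2 )

/* Lower: the task that uses the IP-stack, here the iPerf server. */
#define ipconfigIPERF_PRIORITY_IPERF_TASK    ( configMAX_PRIORITIES - 3 )
```

Note that the configuration posted earlier sets ipconfigIPERF_PRIORITY_IPERF_TASK to a fixed 3, so whether it already sits below the IP-task depends on the value of configMAX_PRIORITIES.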

Your function prvEMACDeferredInterruptHandlerTask() polls an interrupt line without ever sleeping / blocking:

    int Poll_Interrupt( void )
    {
        if(descriptors0[cur_des].status & 0x3FFFFFFF) {
            return 1;
        } else {
            return 0;
        }
    }

    static void prvEMACDeferredInterruptHandlerTask()
    {
        for( ;; )
        {
            while(!(Poll_Interrupt()));
            ...
        }
    }

Because the loop above never blocks, it would starve all lower-priority tasks.

Please make it blocking, e.g. like in this example:

To wake up the task:

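    void userInterrupt()
    {
        BaseType_t xHigherPriorityTaskWoken = pdFALSE;

        /* Set the RX event bit in the handler task's notification value. */
        xTaskNotifyFromISR( xEMACTaskHandle,
                            EMAC_IF_RX_EVENT,
                            eSetBits,
                            &( xHigherPriorityTaskWoken ) );

        /* Switch to the handler task right away if it was woken. */
        portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
    }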

To get notified:

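    static void prvEMACDeferredInterruptHandlerTask( void *pvParameters )
    {
        /* Example timeout only (an assumed value), so the task also wakes
        up now and then for housekeeping. */
        const TickType_t ulMaxBlockTime = pdMS_TO_TICKS( 100U );

        ( void ) pvParameters;

        for( ;; )
        {
            uint32_t ulISREvents = 0U;

            /* Wait for a new event. The task is blocked while waiting. */
            xTaskNotifyWait( 0U,                /* ulBitsToClearOnEntry */
                            EMAC_IF_ALL_EVENT, /* ulBitsToClearOnExit */
                            &( ulISREvents ),  /* pulNotificationValue */
                            ulMaxBlockTime );

            /* Handle the events flagged in ulISREvents here. */
        }
    }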

Use ulMaxBlockTime if you want to get woken up to do some regular tasks like checking the Link Status of the PHY.

This way all tasks sleep until a packet comes in. prvEMACDeferredInterruptHandlerTask() then wakes up and receives as many packets as it finds in the queue.
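The "receive as many packets as are waiting" step could be sketched as follows. This is plain C without any FreeRTOS calls; the descriptor layout, the ring length and the name prvDrainRxRing() are assumptions made for illustration, and only the status mask is taken from the Poll_Interrupt() function quoted earlier. In the handler task it would be called once after every wake-up from xTaskNotifyWait().

```c
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE      8U           /* hypothetical ring length */
#define RX_STATUS_MASK    0x3FFFFFFFUL /* same mask as in Poll_Interrupt() */

/* Hypothetical descriptor: status is non-zero (under the mask) when a
 * received frame is waiting. */
typedef struct
{
    volatile uint32_t status;
} RxDescriptor_t;

/* Handle every ready descriptor, starting at *pxIndex, and return how
 * many were handled. The count is capped at one full pass of the ring so
 * that other tasks get a chance to run under sustained load. */
static size_t prvDrainRxRing( RxDescriptor_t * pxRing, size_t * pxIndex )
{
    size_t uxHandled = 0U;

    while( ( uxHandled < RX_RING_SIZE ) &&
           ( ( pxRing[ *pxIndex ].status & RX_STATUS_MASK ) != 0U ) )
    {
        /* A real driver would pass the frame to the IP-task here. */
        pxRing[ *pxIndex ].status = 0U; /* give the descriptor back */
        *pxIndex = ( *pxIndex + 1U ) % RX_RING_SIZE;
        uxHandled++;
    }

    return uxHandled;
}
```

Draining the whole ring per wake-up means a burst of frames costs one task switch instead of one per frame, which matters at the packet rates iPerf generates.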