SAM4E - Ethernet Speed Issues

tecnpoli2 wrote on Friday, February 26, 2016:

Good morning,

My configuration is as follows:
Board: SAM4E-EK (Atmel development kit)
MCU: Atmel SAM4E16E (Cortex-M4)
IDE: Atmel Studio 6.2
Compiler: ARM GCC v4.8.1143
OS: FreeRTOS 8.2
Ethernet Stack: FreeRTOS+UDP
Ethernet Driver: GMAC driver provided by Atmel (ASF 3.14)
Ethernet Transceiver: KSZ8051MNL

I’m using the FreeRTOS demo on the board, but when I send UDP packets from the PC, the board loses about 50% of them.
I have configured the demo project as follows:

FreeRTOS:

  • heap_4.c

FreeRTOS+UDP:

  • BufferAllocation_2.c
  • xGMACOptions.uc_copy_all_frame = 1;
  • I have modified ethernet_phy.c to configure the GMAC register (GMAC_NCFGR) for 100Mbps and FullDuplex, because the “ethernet_phy_set_link” function sets the register to 10Mbps and HalfDuplex.

The Ethernet transceiver of my board is already set to 100Mbps, but I still lose UDP packets.
Why do I lose so many packets?
Should it be possible to reach 80% (or higher) efficiency?

Thanks in advance.
Best regards
Fabio

heinbali01 wrote on Friday, February 26, 2016:

Hi Fabio,

You might want to have a look at FreeRTOS+TCP (unless you need fragmentation of packets, which has been dropped in +TCP).

The GMAC driver in FreeRTOS+TCP:

FreeRTOS-Plus-TCP\portable\NetworkInterface\ATSAM4E\*.c

was heavily optimised.

I would advise using a single TX DMA buffer of e.g. 1500 bytes (MTU), and as many RX buffers as possible.

> 100Mbps and FullDuplex

After making these settings, make sure that the auto-negotiation is actually carried out:

  1. Decide what properties to advertise (100Mbps and FullDuplex)
  2. Start the auto negotiation
  3. Program the negotiated values in the registers
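
Step 3 above can be sketched in plain C. This is only an illustration, not the ASF or FreeRTOS+TCP driver code: `resolve_link` and all the macro names are hypothetical. The ability bits follow the standard layout of PHY registers 4 (advertisement) and 5 (link partner ability). The speed and duplex bit positions for GMAC_NCFGR are an assumption that should be checked against the SAM4E datasheet. Reading the PHY registers over MDIO is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Ability bits as laid out in PHY registers 4 and 5 (IEEE 802.3 auto-negotiation). */
#define ABILITY_10_HALF   0x0020u
#define ABILITY_10_FULL   0x0040u
#define ABILITY_100_HALF  0x0080u
#define ABILITY_100_FULL  0x0100u

/* ASSUMPTION: speed and duplex bit positions in GMAC_NCFGR; verify in the datasheet. */
#define NCFGR_SPD_100     (1u << 0)
#define NCFGR_FULL_DUPLEX (1u << 1)

/* Pick the best mode that both link partners support and update the speed and
 * duplex bits of *ncfgr accordingly. Returns false, leaving *ncfgr untouched,
 * when the two sides have no mode in common. */
static bool resolve_link(uint16_t advertised, uint16_t partner, uint32_t *ncfgr)
{
    uint16_t common;
    uint32_t value;

    assert(ncfgr != NULL);
    common = (uint16_t)(advertised & partner);
    value = *ncfgr & ~(NCFGR_SPD_100 | NCFGR_FULL_DUPLEX);

    if (common & ABILITY_100_FULL) {
        value |= NCFGR_SPD_100 | NCFGR_FULL_DUPLEX;
    } else if (common & ABILITY_100_HALF) {
        value |= NCFGR_SPD_100;
    } else if (common & ABILITY_10_FULL) {
        value |= NCFGR_FULL_DUPLEX;
    } else if (!(common & ABILITY_10_HALF)) {
        return false;   /* nothing in common: do not touch the register value */
    }
    /* 10 Mbps half duplex leaves both bits cleared. */
    *ncfgr = value;
    return true;
}
```

The point of the sketch is that the register is programmed from what was actually negotiated, not from what was requested. Forcing 100Mbps full duplex in the MAC while the PHY settled on something else gives a duplex mismatch, which typically shows up as heavy packet loss.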

> Should it be possible to reach the 80% (or higher) of efficiency?

What do you mean by ‘80%’?

In my experience, with an MCU like the SAM4E16E, you can use at least 30% of the available bandwidth. This is about 3 MByte net per second on a 100 Mbit LAN.
It is important to use some kind of acknowledgement, either by ACK-ing UDP messages yourself, or by using the TCP protocol.
The GMAC has registers to store MAC addresses. Do not use promiscuous mode (receive ALL packets) unless you really need it.

Regards.

tecnpoli2 wrote on Friday, February 26, 2016:

Hi Hein
Thanks for your quick answer.

In my application, I cannot use the TCP stack because my board has to communicate with another one which uses the UDP protocol.

The two boards will have a point-to-point link.

I’ve already tried to modify the DMA buffer size (TX and RX) but this hasn’t improved the performance.
I’m sure that the GMAC registers are set correctly.
Since a point-to-point link is used, I would like to reach 80% (or higher) of the available bandwidth and, above all, I don’t want to lose two or more consecutive packets.

OK for the ACK, but this could slow down my Ethernet communication.

thanks

regards

heinbali01 wrote on Friday, February 26, 2016:

Sending out UDP packets at high speed and without feedback (ACK’s) is always problematic.

A fast host (Linux, iOS or Windows) is able to use up to 100% of the bandwidth (of a 100 Mbit LAN). Any MCU running faster than 500 MHz can do that too. But a Cortex-M4 running at 120 MHz will never be able to receive so much data and do something sensible with it.

May I ask what type of data you are receiving? Is it maybe audio or video data? Measurements?

Do you have external RAM, to implement some buffering?

Sending ACK’s might indeed slow down the communication, unless you send delayed ACK’s to the sending host. You might want to send a single short reply after every 8 or 16 UDP packets, telling which packets have arrived correctly?
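
The delayed-ACK idea can be sketched as a small receiver-side tracker. This is a minimal illustration, assuming the sender puts a 32-bit sequence number in every UDP payload; `ack_state_t`, `ack_report_t`, `ack_on_packet` and `ACK_EVERY` are hypothetical names and not part of FreeRTOS+UDP or FreeRTOS+TCP. Sequence-number wrap-around is ignored to keep the sketch short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACK_EVERY 8u   /* send one report per 8 received packets */

typedef struct {
    uint32_t highest_seq;  /* highest sequence number seen so far */
    uint16_t mask;         /* bit n set: packet (highest_seq - n) has arrived */
    uint8_t  since_ack;    /* packets received since the last report */
    bool     started;
} ack_state_t;

typedef struct {
    uint32_t highest_seq;
    uint16_t mask;
} ack_report_t;

/* Record the arrival of packet 'seq'. Returns true when a report is due;
 * in that case *out holds the data to send back to the sender. */
static bool ack_on_packet(ack_state_t *s, uint32_t seq, ack_report_t *out)
{
    assert(s != NULL && out != NULL);

    if (!s->started) {
        s->started = true;
        s->highest_seq = seq;
        s->mask = 1u;
    } else if (seq > s->highest_seq) {
        uint32_t shift = seq - s->highest_seq;
        /* Slide the window forward; older packets fall off the top. */
        s->mask = (shift >= 16u) ? 0u : (uint16_t)(s->mask << shift);
        s->mask |= 1u;
        s->highest_seq = seq;
    } else {
        uint32_t age = s->highest_seq - seq;
        if (age < 16u) {
            s->mask |= (uint16_t)(1u << age);   /* late or reordered packet */
        }
    }

    if (++s->since_ack < ACK_EVERY) {
        return false;
    }
    s->since_ack = 0u;
    out->highest_seq = s->highest_seq;
    out->mask = s->mask;
    return true;
}
```

The sender can compare each report with what it transmitted and resend only the packets whose bits are clear, so the return traffic stays at one short datagram per eight received packets.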

It might still be interesting to find out where your packets get lost. Was there a lack of Network Buffers, or a lack of DMA buffers? Were there any errors?

tecnpoli2 wrote on Friday, February 26, 2016:

Hi

I know that the MCU cannot handle all the incoming UDP packets, but I don’t believe that it should lose so many of them.
If the MCU executes a single instruction about every 8 ns (1/120 MHz) and a UDP packet with 1520 bytes of payload arrives about every 122 us at 100Mbps (100 Mbps / 8 bits per byte = 12.5 MBps ==> 1520 bytes / 12.5 MBps ≈ 122 us), I suppose that the MCU must be able to manage the UDP packets if this is its only task.
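
The timing budget described here can be checked with a few lines of C. This is a rough sketch and the helper names are made up for illustration. The overhead constants assume standard Ethernet framing without VLAN tags: 8 bytes preamble, 14 bytes header, 4 bytes FCS and 12 bytes inter-frame gap, plus 28 bytes of IPv4 and UDP headers.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_OVERHEAD_BYTES   38u  /* preamble 8 + header 14 + FCS 4 + inter-frame gap 12 */
#define IP_UDP_HEADER_BYTES  28u  /* IPv4 header 20 + UDP header 8 */

/* Bytes that one UDP datagram with the given payload occupies on the wire. */
static uint32_t udp_wire_bytes(uint32_t payload_bytes)
{
    return payload_bytes + IP_UDP_HEADER_BYTES + ETH_OVERHEAD_BYTES;
}

/* Time in nanoseconds needed to transfer 'bytes' at 'link_bps' bits per second. */
static uint64_t wire_time_ns(uint32_t bytes, uint64_t link_bps)
{
    assert(link_bps > 0u);
    return ((uint64_t)bytes * 8u * 1000000000ULL) / link_bps;
}

/* CPU clock cycles that elapse during 'time_ns' at a clock of 'cpu_hz'. */
static uint64_t cycles_available(uint64_t time_ns, uint64_t cpu_hz)
{
    return (time_ns * cpu_hz) / 1000000000ULL;
}
```

For 1520 bytes at 100 Mbps, `wire_time_ns` gives 121600 ns. A full-size datagram (1472 bytes of payload, 1538 bytes on the wire) takes 123040 ns, which leaves about 14764 clock cycles per packet at 120 MHz. That budget has to cover the interrupt, the buffer handling, the IP task and any copying, and a clock cycle is not necessarily one completed instruction.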

I have to transfer the data log at high speed (if it is possible).

I’ve noticed that the packets are already missing at the level of GMAC_Handler.

Could you tell me if there are FreeRTOS+UDP settings that could improve the performance?
I’ve already tried changing the DMA buffers and the task priorities.

thanks

heinbali01 wrote on Saturday, February 27, 2016:

Good morning Fabio,

Good news for you!

I have done some tests with several methods of data reception in FreeRTOS+TCP (code attached).

There are several ways to receive UDP- or TCP-messages:

  1. The normal BSD-compatible way, in which a message is copied to a buffer supplied to FreeRTOS_recvfrom():
    char pcRecvBuffer[1460];
    struct freertos_sockaddr xAddress;
    uint32_t xAddressLength = sizeof( xAddress );
    xRecvResult = FreeRTOS_recvfrom( xSocket,
        ( void * ) pcRecvBuffer,
        sizeof( pcRecvBuffer ),
        0, &xAddress, &xAddressLength );
  2. The zero-copy method: passing FREERTOS_ZERO_COPY in the flags of FreeRTOS_recvfrom(), so that a pointer to the received data is returned instead of a copy:
    static char *pcRecvPointer;
    struct freertos_sockaddr xAddress;
    uint32_t xAddressLength = sizeof( xAddress );
    xRecvResult = FreeRTOS_recvfrom( xSocket,
        ( void * ) &pcRecvPointer,
        sizeof( pcRecvPointer ),
        FREERTOS_ZERO_COPY, &xAddress, &xAddressLength );
    /* Once the data has been used, hand the buffer back to the stack
    with FreeRTOS_ReleaseUDPPayloadBuffer( pcRecvPointer ). */
  3. The callback method: bind a receive function to a socket, which will be called upon reception of each packet.

The third method is able to receive 119700630 Bytes/sec, which is about 95.8 Mbit per second. None of the packets got lost in my test with iperf.

One must be careful when using this method: the call-back function is called from an unusual context: the IP-task. It is allowed to reply immediately from within the handler.
This call-back method is hardly documented on purpose: when not used properly, it may lead to nasty bugs.
If your handler decides that intensive processing is necessary, please defer it to another task.

    static BaseType_t xOnUdpReceive( Socket_t xSocket, void * pvData, size_t xLength,
        const struct freertos_sockaddr *pxFrom, const struct freertos_sockaddr *pxDest )
    {
        /* 'xLength' bytes have been received in the buffer 'pvData' */
        return 1;
    }

    {
        F_TCP_UDP_Handler_t xHandler;

        /* Clear all handlers, then install only the UDP receive handler. */
        memset( &xHandler, '\0', sizeof( xHandler ) );
        xHandler.pOnUdpReceive = xOnUdpReceive;
        FreeRTOS_setsockopt( xUDPServerSocket, 0, FREERTOS_SO_UDP_RECV_HANDLER,
            ( void * ) &xHandler, sizeof( xHandler ) );
    }
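
One way to keep the handler short and defer heavy processing is to copy each payload into a small ring of slots and let a worker task drain it. The sketch below is plain C and only stands in for whatever a real project would use (for example a FreeRTOS queue plus a task notification to wake the worker). `packet_ring_t`, `ring_push` and `ring_pop` are hypothetical names, and the sketch assumes exactly one producer and one consumer on a single core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOT_COUNT 8u      /* must be a power of two */
#define SLOT_BYTES 1472u   /* largest UDP payload with a 1500-byte MTU */

typedef struct {
    uint16_t length;
    uint8_t  data[SLOT_BYTES];
} slot_t;

typedef struct {
    slot_t slots[SLOT_COUNT];
    volatile uint32_t head;  /* written only by the producer (receive handler) */
    volatile uint32_t tail;  /* written only by the consumer (worker task) */
    uint32_t dropped;        /* packets rejected because the ring was full */
} packet_ring_t;

/* Producer side: copy one payload into the ring and return quickly. */
static bool ring_push(packet_ring_t *r, const void *data, size_t length)
{
    assert(r != NULL && data != NULL);
    if (length == 0u || length > SLOT_BYTES || (r->head - r->tail) == SLOT_COUNT) {
        r->dropped++;
        return false;
    }
    slot_t *s = &r->slots[r->head & (SLOT_COUNT - 1u)];
    memcpy(s->data, data, length);
    s->length = (uint16_t)length;
    r->head++;   /* publish the slot only after it has been filled */
    return true;
}

/* Consumer side: copy the oldest payload out. Returns 0 when the ring is empty. */
static size_t ring_pop(packet_ring_t *r, void *out, size_t out_size)
{
    assert(r != NULL && out != NULL);
    if (r->head == r->tail) {
        return 0u;
    }
    const slot_t *s = &r->slots[r->tail & (SLOT_COUNT - 1u)];
    size_t n = (s->length < out_size) ? s->length : out_size;
    memcpy(out, s->data, n);
    r->tail++;
    return n;
}
```

In this arrangement the receive handler would call `ring_push()` with `pvData` and `xLength` and return immediately, while the worker task calls `ring_pop()` in its loop. The `dropped` counter shows whether the worker keeps up with the incoming rate.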

The iperf command that I used for testing was:

    iperf -u -p 5001 -c 192.168.2.109 -b 10M
    iperf -u -p 5001 -c 192.168.2.109 -b 20M
    ...
    iperf -u -p 5001 -c 192.168.2.109 -b 100M

Please report back to the list on your experiences with FreeRTOS+TCP.
If you have any questions while migrating, please ask.

Regards.

heinbali01 wrote on Saturday, February 27, 2016:

Correction: I messed up some numbers in the previous post.

The test had run for 10 seconds and within that period 119700630 Bytes were received by the SAM4E16E.
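
For reference, the conversion from the received byte count to a throughput figure is a one-liner. The helper name below is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Average throughput in Mbit/s for 'bytes' received in 'seconds'. */
static double throughput_mbit(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return ((double)bytes * 8.0) / seconds / 1e6;
}
```

`throughput_mbit(119700630, 10.0)` evaluates to roughly 95.76, i.e. about 11.97 MByte per second.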

The throughput was close to 100%, as you can see below:

    hein@laptop:~$ iperf -u -p 5001 -c 192.168.2.109 -b 100M
    ------------------------------------------------------------
    Client connecting to 192.168.2.109, UDP port 5001
    Sending 1470 byte datagrams
    UDP buffer size: 64.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 192.168.2.3 port 60981 connected with 192.168.2.109 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec   114 MBytes  95.7 Mbits/sec
    [  3] Sent 81986 datagrams

Regards.

tecnpoli2 wrote on Monday, February 29, 2016:

Hi Hein

Thanks so much.

I’ve tried to reproduce your test, but the FreeRTOS+TCP demo doesn’t compile.

		make: *** No rule to make target 'src/FreeRTOS+TCP/protocols/FreeRTOS_ftp_server.o', needed by 'RTOSDemo.elf'.  Stop.
	Done executing task "RunCompilerTask" -- FAILED.


Done building target "CoreBuild" in project "RTOSDemo.cproj" -- FAILED.
Done building project "RTOSDemo.cproj" -- FAILED.

Build FAILED.
========== Rebuild All: 0 succeeded, 1 failed, 0 skipped ==========

How can I compile the FreeRTOS+TCP demo?

Can I try your test with FreeRTOS+UDP?

The GMAC_Handler already calls a callback function (prvGMACRxCallback()) for the incoming packets.
This callback is used by FreeRTOS+UDP to wake up the “prvGMACDeferredInterruptHandlerTask” task, which has the higher priority.

When I count the UDP packets that arrive at the GMAC_Handler, they are only about half of those sent.

static void prvGMACRxCallback( uint32_t ulStatus )
{
BaseType_t xHigherPriorityTaskWoken = pdFALSE;

	configASSERT( xMACEventHandlingTask );

	/* Unblock the deferred interrupt handler task if the event was an Rx. */
	if( ( ulStatus & GMAC_RSR_REC ) != 0 )
	{
		vTaskNotifyGiveFromISR( xMACEventHandlingTask, &xHigherPriorityTaskWoken );
	}

	portEND_SWITCHING_ISR( xHigherPriorityTaskWoken );
}
static void prvGMACDeferredInterruptHandlerTask( void *pvParameters )
{
xNetworkBufferDescriptor_t *pxNetworkBuffer = NULL;
xIPStackEvent_t xRxEvent = { eEthernetRxEvent, NULL };
static const TickType_t xBufferWaitDelay = 1500UL / portTICK_RATE_MS;
uint32_t ulReturned;

	/* This is a very simple but also inefficient implementation. */

	( void ) pvParameters;

	for( ;; )
	{
		/* Wait for the GMAC interrupt to indicate that another packet has been
		received.  A while loop is used to process all received frames each time
		this task is notified, so it is ok to clear the notification count on the
		take (hence the first parameter is pdTRUE ). */
		ulTaskNotifyTake( pdTRUE, xBufferWaitDelay );

		ulReturned = GMAC_OK;
		while( ulReturned == GMAC_OK )
		{
			/* Allocate a buffer to hold the data if one is not already held. */
			if( pxNetworkBuffer == NULL )
			{
				pxNetworkBuffer = pxNetworkBufferGet( ipTOTAL_ETHERNET_FRAME_SIZE, xBufferWaitDelay );
			}

			if( pxNetworkBuffer != NULL )
			{
				/* Attempt to read data. */
				ulReturned = gmac_dev_read( &xGMACStruct, pxNetworkBuffer->pucEthernetBuffer, ipTOTAL_ETHERNET_FRAME_SIZE, ( uint32_t * ) &( pxNetworkBuffer->xDataLength ) );

				if( ulReturned == GMAC_OK )
				{
					#if ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES == 1
					{
						if( pxNetworkBuffer->xDataLength > 0 )
						{
							/* If the frame would not be processed by the IP
							stack then don't even bother sending it to the IP
							stack. */
							if( eConsiderFrameForProcessing( pxNetworkBuffer->pucEthernetBuffer ) != eProcessBuffer )
							{
								pxNetworkBuffer->xDataLength = 0;
							}
						}
					}
					#endif

					if( pxNetworkBuffer->xDataLength > 0 )
					{
						/* Store a pointer to the network buffer structure in
						the	padding	space that was left in front of the Ethernet
						frame.  The pointer is needed to ensure the network
						buffer structure can be located when it is time for it
						to be freed if the Ethernet frame gets used as a zero
						copy buffer. */
						*( ( xNetworkBufferDescriptor_t ** ) ( ( pxNetworkBuffer->pucEthernetBuffer - ipBUFFER_PADDING ) ) ) = pxNetworkBuffer;

						/* Data was received and stored.  Send it to the IP task
						for processing. */
						xRxEvent.pvData = ( void * ) pxNetworkBuffer;
						if( xQueueSendToBack( xNetworkEventQueue, &xRxEvent, ( TickType_t ) 0 ) == pdFALSE )
						{
							/* The buffer could not be sent to the IP task. The
							frame will be dropped and the buffer reused. */
							iptraceETHERNET_RX_EVENT_LOST();
						}
						else
						{
							iptraceNETWORK_INTERFACE_RECEIVE();

							/* The buffer is now owned by the IP task - a new
							buffer is needed the next time around. */
							pxNetworkBuffer = NULL;
						}
					}
					else
					{
						/* The buffer does not contain any data so there is no
						point sending it to the IP task.  Re-use the buffer on
						the next loop. */
						iptraceETHERNET_RX_EVENT_LOST();
					}
				}
				else
				{
					/* No data was received, keep the buffer for re-use.  The
					loop will exit as ulReturned is not GMAC_OK. */
				}
			}
			else
			{
				/* Left a frame in the driver as a buffer was not available.
				Break out of loop. */
				ulReturned = GMAC_INVALID;
			}
		}
	}
}

I’m using your first FreeRTOS_recvfrom() method, in which a message is copied to a buffer.

Thank you
regards

northeastnerd wrote on Monday, July 11, 2016:

This is an old thread but the build error is still in the latest download. The file name for the FTP server is lower cased in the project file. If you edit the RTOSDemo.cproj and search for “ftp_server” and change the name from “FreeRTOS_ftp_server.c” to “FreeRTOS_FTP_server.c” it will build correctly.