+TCP +FAT FTP server slow transfer

Hello,

I am running the +TCP and +FAT libraries in my project which is running on a Xilinx Zynq 7000. I have the latest versions of both those libraries in my project. I am also using the FTP server code from FreeRTOS. The problem I am having is that the file transfer speed gets progressively worse during longer transfers. See image:

I am using a KSZ9563R Ethernet switch as my PHY, which is configured to run in 100BASE-T. Any ideas why this could be happening? I have tried pausing all the other threads in the program and am still seeing this issue. I have also attached my IPConfig file: FreeRTOSIPConfig.h (17.8 KB)

Update: If I stop the transfer from the FTP client and restart it, the speed does not go back up. However, if I power-cycle my device, the speed picks up again for a few transfers.

Thank you!

If you have enough memory I’d try to considerably bump ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS and give it a shot.
The slowdown might be caused by a shortage of network buffers. I’d also have a look at the traffic with Wireshark.

Okay I will give this a shot.

One more thing - when I set ipconfigFTP_TX_ZERO_COPY to 1, FTP stops sending altogether…

Increasing the network buffers didn’t help. This is the response from thread stats. I saw that if I keep running thread stats, the FTPServer usage starts going down, but slowly:

IDLE            304956248       76
FTPServer       79193429        19
IP-task         1110856         <1
ConfigSer       48426           <1
Task0           2142            <1
EMAC            439923          <1
ServerLis       54              <1
Task1           24534           <1
Tmr Svc         10747543        2
ServerLis       4593            <1
ServerLis       10967           <1
Task2           385499          <1
Logger Ta       767             <1
Task3           5               <1
ServerLis       57              <1

I would be curious to see a recording of the FTP-session with Wireshark. It would be interesting to see a normal transfer, and a slow one. Could you ZIP a PCAP file and attach it to your post?

As you probably know, the latest releases of the FTP and HTTP servers can be found here.
The FAT library is available from here.
Note that both the FTP/HTTP-servers, as well as the +FAT library are still in the Labs stage.
And needless to say that the latest +TCP library can be found here.

Some remarks about your FreeRTOSIPConfig.h:

#define ipconfigNETWORK_MTU     1526

Unless you have a special reason to make MTU so big, I would recommend setting it to a more compatible size of 1500 bytes. MTU is the maximum transmission unit: the largest packet size, not counting the 14-byte Ethernet header and the (invisible) trailing 4-byte Ethernet checksum. So with an MTU of 1500, frames of up to 1518 bytes can be transmitted.

#define ipconfigTCP_MSS         1460

1460 is indeed the largest (TCP) segment size, but I would recommend not to define it, because the default definition of ipconfigTCP_MSS will follow the size of MTU.
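The numbers above follow from simple header arithmetic. Here is a minimal sketch of it in plain C, assuming standard 20-byte IPv4 and TCP headers without options and no VLAN tag. The function names are made up for illustration and are not part of FreeRTOS+TCP.

```c
#include <assert.h>
#include <stdint.h>

/* Standard header sizes in bytes (no IP/TCP options, no VLAN tag). */
#define ETH_HEADER_BYTES     14u
#define ETH_FCS_BYTES        4u
#define IPV4_HEADER_BYTES    20u
#define TCP_HEADER_BYTES     20u

/* Largest TCP payload that fits in one IPv4 packet of the given MTU. */
uint32_t mss_from_mtu( uint32_t mtu )
{
    /* The MTU must at least hold the two headers. */
    assert( mtu >= ( IPV4_HEADER_BYTES + TCP_HEADER_BYTES ) );

    return mtu - ( IPV4_HEADER_BYTES + TCP_HEADER_BYTES );
}

/* Size of a full Ethernet frame: header + MTU + trailing checksum. */
uint32_t frame_bytes_from_mtu( uint32_t mtu )
{
    return ETH_HEADER_BYTES + mtu + ETH_FCS_BYTES;
}
```

With an MTU of 1500 this gives an MSS of 1460 and a frame of 1518 bytes. With an MTU of 1526 the frame would be 1544 bytes, which is larger than a standard Ethernet frame.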

Xilinx has this logging function:

extern void xil_printf ( const char8 *ctrl1, ...);

If you use xil_printf(), make sure it is re-entrant, or use it from a single task only.

I am missing the TCP settings for the FTP server, e.g.:

#define ipconfigFTP_TX_BUFSIZE   ( 24 * ipconfigTCP_MSS )
#define ipconfigFTP_TX_WINSIZE   ( 18 )
#define ipconfigFTP_RX_BUFSIZE   ( 24 * ipconfigTCP_MSS )
#define ipconfigFTP_RX_WINSIZE   ( 18 )

Without these defines, the FTP server will use the defaults, which also come from your FreeRTOSIPConfig.h:

/* Each TCP socket has circular buffers for Rx and Tx, which have a fixed
maximum size.  Define the size of Rx buffer for TCP sockets. */
#define ipconfigTCP_RX_BUFFER_LENGTH   ( 0x4000 )
/* Define the size of Tx buffer for TCP sockets. */
#define ipconfigTCP_TX_BUFFER_LENGTH   ( 0x4000 )

This means that the sockets will have stream buffers that can hold 16 KB in each direction.

When I played with FTP on Zynq, I gave it very large TCP buffers:

#define ipconfigFTP_TX_BUFSIZE    ( 256 * 1024 )
#define ipconfigFTP_TX_WINSIZE    ( 24 )
#define ipconfigFTP_RX_BUFSIZE    ( 24 * 1024 )
#define ipconfigFTP_RX_WINSIZE    ( 12 )
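To get a feeling for these numbers, here is a small sketch that converts them to bytes and segments. It assumes an MSS of 1460 bytes and that the WINSIZE values are in units of MSS, as the comments on the iperf settings in this thread state. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed segment size: MTU 1500 minus 40 bytes of IPv4 + TCP headers. */
#define MSS_BYTES    1460u

/* Window size in bytes, for a window expressed in units of MSS. */
uint32_t window_bytes( uint32_t win_size_mss )
{
    /* Guard against overflow of the 32-bit result. */
    assert( win_size_mss <= ( UINT32_MAX / MSS_BYTES ) );

    return win_size_mss * MSS_BYTES;
}

/* Number of full-size segments that fit in a stream buffer. */
uint32_t segments_in_buffer( uint32_t buf_bytes )
{
    return buf_bytes / MSS_BYTES;
}
```

A TX window of 24 MSS is 35040 bytes. The default 16 KB ( 0x4000 ) stream buffer holds 11 full segments, while a 256 KB buffer holds 179.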

Hello Hein,

I have attached a pcap file here: ftp_fast_slow.zip (355.0 KB)

Since the speed of the transfer is not predictable, the PCAP has a lot of data since I was waiting for the slow transfer. Here are hints for you to find the fast and slow transfer in the PCAP:

The fast transfer begins at frame number 12519 and goes to port 58062 of my computer. Then, around frame 14071, the slower transfer begins, to port 58063 of my computer. Here is a screenshot from the serial terminal to show you where I got those port numbers from and how I correlated them to the speed:

I will make some changes to my IPConfig file based on what you said and see if that makes a difference. If it matters at all, the FTP task priority is set to configMAX_PRIORITIES - 3.

Also in a recent run, I faced this but have not been able to reproduce it consistently. This happens when I use my raspberry pi to download files from the FTP server instead of my dev machine.

Thanks,
Sid

the PCAP has a lot of data since I was waiting for the slow transfer

Tip: If you want to exclude other data, you can put a filter like: ip.addr==192.168.0.5, to only see traffic with your device, and then use “File->Export specified packets” to only write displayed packets.

If you open the PCAP, and set a filter called “tcp.stream eq 14”, you will see the FTP session that went slow.
What I see is a period of non-activity of exactly 5 seconds.

During those 5 seconds, between packets 14761 and 14773, there is no network activity, and x.x.x.100 (PC) is waiting for x.x.x.5 (Xilinx device).

What could cause the non-activity in the Zynq for 5 full seconds? In the network data I do not see a reason for this break.

I ran the program again and saw that all the slower transfers have that 5s gap. But all the 5s gaps also have traffic coming from the other device on the network (Raspberry Pi). Do you think the Zynq is not sending data because the host is busy? Or is it that the host is not busy, which is why we are seeing the additional traffic?

Please pardon my lack of knowledge in this domain, it is something I am trying to get a better grip over…

Edit: I have answered my own question - I disconnected the Pi and I still see the non-activity for 5s…

I cannot quite figure out what would cause non-activity on the IP task for 5s consistently. All my other timed tasks are lower priority than the IP task and FTP task. And the gap between the slow transfers is not periodic either. Any ideas?

Just a quick experiment, what if you change the following time-outs in FreeRTOSIPConfig.h from:

#define ipconfigSOCK_DEFAULT_RECEIVE_BLOCK_TIME    ( 5000 )
#define ipconfigSOCK_DEFAULT_SEND_BLOCK_TIME       ( 5000 )

to for instance 3000, and 4000? Does the silent period also change to 3 or to 4 seconds?

PS. Normally a macro describing a time-out should end with either _MS or _TICKS. These are very old macros, and we didn’t want to disturb existing users by changing them.

But the two macros above are in units of clock-ticks, so the definition should be like:

#define ipconfigSOCK_DEFAULT_RECEIVE_BLOCK_TIME    pdMS_TO_TICKS( 5000u )
#define ipconfigSOCK_DEFAULT_SEND_BLOCK_TIME       pdMS_TO_TICKS( 5000u )

In your project, configTICK_RATE_HZ is most probably 1000, so in your project, the result will be the same.

I was in the middle of doing this same experiment when you sent this message :sweat_smile: unfortunately, it did not change anything. My project’s configTICK_RATE_HZ is set to 100 so I guess I was setting the default timeout to 50s and not 5s so it couldn’t have been the problem anyway. Do you recommend configTICK_RATE_HZ to be 1000? Is there any benefit of that?
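The arithmetic behind that observation can be sketched as follows. The two helpers are stand-ins written for this example; FreeRTOS itself provides pdMS_TO_TICKS(), which uses the project's configTICK_RATE_HZ instead of an explicit parameter.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for pdMS_TO_TICKS(), with the tick rate passed explicitly
 * so that 100 Hz and 1000 Hz can be compared side by side. */
uint32_t ms_to_ticks( uint32_t ms, uint32_t tick_rate_hz )
{
    return ( uint32_t ) ( ( ( uint64_t ) ms * tick_rate_hz ) / 1000u );
}

/* Time in milliseconds represented by a raw tick count. */
uint32_t ticks_to_ms( uint32_t ticks, uint32_t tick_rate_hz )
{
    assert( tick_rate_hz > 0u );

    return ( uint32_t ) ( ( ( uint64_t ) ticks * 1000u ) / tick_rate_hz );
}
```

A raw value of 5000 ticks is 5000 ms at 1000 Hz, but 50000 ms (50 seconds) at 100 Hz. Converting 5000 ms properly gives 500 ticks at 100 Hz and 5000 ticks at 1000 Hz.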

What I also did just now was turn off ALL the tasks that my application was launching except the FTP server and IP task and I am still seeing the same 5s silent period which makes me think that it is something in my settings that is causing this. Or something with the PHY or switch…

Yes, as you have a very fast CPU, it doesn’t hurt to put configTICK_RATE_HZ to 1000, and have more precise timings. All timers in the TCP/IP stack are based on the FreeRTOS clock-tick.
I did test the stack with a 100 Hz clock-tick, but that is long ago.

The IP-task sleeps when it waits for a message in the queue, but with a maximum time that is defined as:

#ifndef ipconfigMAX_IP_TASK_SLEEP_TIME
    #define ipconfigMAX_IP_TASK_SLEEP_TIME    ( pdMS_TO_TICKS( 10000UL ) )
#endif

So as long as the stack is idle, it will wake-up every 10 seconds.

Could you let it wake-up every second:

#define ipconfigMAX_IP_TASK_SLEEP_TIME    ( pdMS_TO_TICKS( 1000UL ) )

and have it send a log-line every time it wakes-up from xQueueReceive()?
I would expect a log-line every second or more often.

Or something with the PHY or switch…

Do you see any logging from NetworkInterface, like prvEMACHandlerTask: PHY LS now 0?

FWIW it’s working very well with a 100 Hz tick (I’m using).

Yes, I see the print message 1/sec.

Changing configTICK_RATE_HZ may have reduced the occurrence of the silent periods (not sure if this is fully true, will have to revert to quantify) but in a transfer of 57x 1MB files, I still see a silent period 3-4 times. I used to take close to 2mins to transfer that data and now it is taking less than 1 but I still think that silent period will be an issue once I scale up to 1000s of files.

@hs2 wrote:

FWIW it’s working very well with a 100 Hz tick (I’m using).

Thanks for informing.

but I still think that silent period will be an issue once I scale up to 1000s of files.

while it should never occur.

I will test it myself, and come back to it. Thanks so far.

Okay thank you for your help Hein, I really appreciate it. If you are unable to reproduce this, let me know and I will do my best to help.

Do you think I should make a new post for this issue?

I only see this when I run my Python based FTP server code (using ftplib) on my Raspberry Pi that is also on the same network as my device I am testing. When I run the Python program from my Windows 10 dev machine, I do not see these errors.

Do you think I should make a new post for this issue?

No, this post is good.

, I do not see these errors.

The dark image is not readable this time. Could you send it as plain text, or a better image?

Sid, I just tested my Xilinx/Zynq project while using your FreeRTOSIPConfig.h. I did not see the 5-second period of inactivity, but I did see many irregularities in the TCP communication.

When I restored my own FreeRTOSIPConfig.h, which I uploaded here: FreeRTOSIPConfig.h (22.6 KB), FTP ran perfectly in both directions. I used a 1MB file to test with. I up- and downloaded it 100 times, using FileZilla.
I also ran iperf, which had the usual performance.

Please also make sure that the net hardware ( switch, cables ) is OK.

I just compared our FreeRTOSIPConfig.h files:

    #define ipconfigNIC_N_TX_DESC ( 4 )   => 32
    #define ipconfigNIC_N_RX_DESC ( 16 )  => 32

    #define ipconfigNETWORK_MTU     1526  => 1500
    #define ipconfigTCP_MSS         1460  // please drop this definition

    #define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS   ( 64 ) => 96

Please add:

    #define ipconfigFTP_TX_BUFSIZE   ( 256 * 1024 )
    #define ipconfigFTP_TX_WINSIZE   ( 12 )
    #define ipconfigFTP_RX_BUFSIZE   ( 256 * 1024 )
    #define ipconfigFTP_RX_WINSIZE   ( 12 )

For running iperf I used larger TCP windows:

    #define ipconfigIPERF_TX_BUFSIZE    ( 128 * 1024 )  /* Units of bytes. */
    #define ipconfigIPERF_TX_WINSIZE    ( 48 )          /* Size in units of MSS */
    #define ipconfigIPERF_RX_BUFSIZE    ( 128 * 1024 )  /* Units of bytes. */
    #define ipconfigIPERF_RX_WINSIZE    ( 48 )          /* Size in units of MSS */

    #define ipconfigIPERF_STACK_SIZE_IPERF_TASK    680
    #define ipconfigIPERF_VERSION                  3

My guess is that the problem was here:

#define ipconfigNIC_N_TX_DESC ( 4 )
#define ipconfigNIC_N_RX_DESC ( 18 )

I am still not sure whether the above changes will also help in your case. For me, they gave reliable, fast transfers, both for FTP and for iperf.


I have found the problem that occurs on your board! The 5-second delay. You should have seen it when logging is enabled as: "emacps_send_message: Time-out waiting for TX buffer".

XStatus emacps_send_message( xemacpsif_s * xemacpsif,
                             NetworkBufferDescriptor_t * pxBuffer,
                             int iReleaseAfterSend )
{
    int head = xemacpsif->txHead;
    int iHasSent = 0;
    uint32_t ulBaseAddress = xemacpsif->emacps.Config.BaseAddress;
    TickType_t xBlockTimeTicks = pdMS_TO_TICKS( 5000U );
    ...
    if( xSemaphoreTake( xTXDescriptorSemaphore, xBlockTimeTicks ) != pdPASS )

You see the 5000 ms?

All DMA descriptors for transmission were occupied, and none came available within 5 seconds. Which is a bit strange.

xTXDescriptorSemaphore is a counting semaphore that keeps track of the available TX DMA descriptors. When a transmission has completed, the semaphore is given.

emacps_send_message() is called from xNetworkInterfaceOutput(), which is called from the IP-task. The task prvEMACHandlerTask() should be woken up ( EMAC_IF_TX_EVENT ) as soon as a transmission has completed, and it will then give the counting semaphore.

We will have to change this code: the IP-task should never wait for 5 seconds. But I would also like to understand why no TX descriptor became available within 5 seconds.
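To illustrate the accounting described above, here is a plain-C model of a counting semaphore that guards a pool of TX descriptors. It is only a sketch, not the driver code: the type and function names are hypothetical, and where the real driver blocks in xSemaphoreTake(), this model just returns false.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the counting semaphore for TX DMA descriptors. */
typedef struct
{
    unsigned free_count; /* descriptors currently available */
    unsigned max_count;  /* total descriptors in the ring */
} tx_desc_pool_t;

/* All descriptors start out free. */
void tx_pool_init( tx_desc_pool_t * pool, unsigned descriptor_count )
{
    pool->free_count = descriptor_count;
    pool->max_count = descriptor_count;
}

/* Sender side (IP-task in the real driver): claim one descriptor.
 * Returns false at the point where the real driver would block. */
bool tx_pool_take( tx_desc_pool_t * pool )
{
    if( pool->free_count == 0u )
    {
        return false;
    }

    pool->free_count--;
    return true;
}

/* TX-complete side (EMAC handler task in the real driver):
 * hand one descriptor back to the pool. */
void tx_pool_give( tx_desc_pool_t * pool )
{
    assert( pool->free_count < pool->max_count );
    pool->free_count++;
}
```

With a pool of 4 descriptors, the fifth tx_pool_take() fails until tx_pool_give() has been called. If the TX-complete side never runs, for whatever reason, the sender stays stuck at that point until its time-out expires.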