[FreeRTOS+TCP] Long delay between two split frames

We are running a TCP server task on a Zynq-7000 SoC.
This TCP server task sends multiple chunks of data with a fixed size of 1554 bytes to a TCP client running on Windows.
As a result, each chunk is split into two frames:
Frame 1: 1514 bytes (1460 bytes of raw data)
Frame 2: 148 bytes (94 bytes of raw data)
I captured the traffic with Wireshark on the Windows side and found that the delay between the two frames is usually almost 0, but sometimes it is ~60 ms, which causes a performance issue in our application.
What could be the causes of this 60 ms delay?
Thank you

Hi @trantam_cdt92, thanks for reporting this.

If possible, can you send a zipped PCAP file? I would like to know which party causes the 60 ms delay.

It can also be useful to do the same test with a different TCP client, e.g. ncat (nc or netcat).

When possible, you could also try to send 2 full packets, each carrying 1460 bytes, by just adding zeros to your send buffer. See if that makes a difference.
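Something along these lines, assuming the payload sits in a plain byte buffer (the names xSocket, pucPayload and xSendPaddedChunk below are placeholders, not your actual code):

/* Sketch only: pad each 1554-byte chunk with zeros up to 2 * 1460 bytes so
 * that both TCP segments become full, MSS-sized segments. */
#include <string.h>

#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_Sockets.h"

#define PAYLOAD_SIZE    1554U
#define PADDED_SIZE     ( 2U * 1460U )

BaseType_t xSendPaddedChunk( Socket_t xSocket, const uint8_t * pucPayload )
{
    static uint8_t ucPadded[ PADDED_SIZE ];

    memset( ucPadded, 0, sizeof( ucPadded ) );
    memcpy( ucPadded, pucPayload, PAYLOAD_SIZE );

    /* Note: FreeRTOS_send() may accept fewer bytes than requested;
     * a real application should loop until everything has been sent. */
    return FreeRTOS_send( xSocket, ucPadded, sizeof( ucPadded ), 0 );
}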

Hi @htibosch
Thank you for the quick response.
Since I am a new user, I can’t upload the file. Please download the PCAP log file from this 1YvZm8rh69Jus85tq0v4NBiKQZJYKWIOi/view

Please get the link from this picture; I am not allowed to paste a link either.

splitted_frame.zip (376.1 KB)

You can use this filter to see where the delay time is about 60 ms:
(frame.len == 1514 || frame.len == 148) && frame.time_delta > 0.05
Also, the time display format is “Seconds since previous captured packet”.

Moderator, could you please give permission to @trantam_cdt92 to attach files?

For now, I inserted a link to your PCAP file “pcapng”.

I will check the file and come back to it.

I don’t appear to have rights to do this. I’ll ping @aggarg to see if he can update permissions for Tam.

Done. Should be able to post links now.

Note that TCP packets are sent from the function prvTCPSendRepeated().

This function will stop because:

  • There is no more data to send (however, 1460 + 94 bytes are expected to be sent)
  • No resources are available (only one network buffer)
  • The MPU is kept busy for 100 ms at a priority higher than that of the IP-task

Find attached splitted_fram_samples.zip (978 Bytes)

Packet 10181/10182: The strange thing is that the second packet
carrying 94 bytes is sent twice. As though packet 10181 was ready to send but got held up for 102 ms.

So maybe prvTCPSendRepeated() has sent both parts (1460 + 94 bytes) without delay, but the second part got held up for some reason.

I wonder if the 100 ms is related to the xmit timeout, see ipconfigTCP_SRTT_MINIMUM_VALUE_MS.

Would it be worth showing some code that sends 1,554 bytes continuously, even if they’re fake bytes?

Hello @trantam_cdt92, do you have to wait a long time before the mentioned problem occurs?

I just tried the following on my Zynq/Zybo.

I wrote a TCP server (see attachment below) which listens on port 20000 at IP address 192.168.2.127.

This server will send chunks of 1554 bytes to its clients every 6 ms.

I also added handling of the commands sent by the TCP client, e.g. “thank you”.

I wrote this Python script that acts as the client:

import socket

server_ip = "192.168.2.127"
port = 20000

with socket.create_connection((server_ip, port)) as s:
    while True:
        buf = s.recv(1554)
        if not buf:
            break
        # process buf here, or ignore
        s.sendall(b"thank you")

print("Program ended")

This test went well: accurate delivery of all packets, without exception.

Here is the code that runs on my Zybo:

tcp_test_server.zip (3.5 KB)

The CPU was busy for only 2.3 % of the time.
Could you try this on your hardware?

I started the testing task as:

    vStartSimple2TCPServerTasks( 1024u, 2u );

after the network is up and running.
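In case it helps, a typical place to do that is the network event hook, roughly like the sketch below (this assumes the single-endpoint vApplicationIPNetworkEventHook() and guesses the prototype of vStartSimple2TCPServerTasks()):

#include "FreeRTOS.h"
#include "task.h"
#include "FreeRTOS_IP.h"

/* Prototype assumed here; the function itself comes from the attached tcp_test_server sources. */
extern void vStartSimple2TCPServerTasks( uint16_t usStackSize, UBaseType_t uxPriority );

/* Called by FreeRTOS+TCP on network up/down events
 * (requires ipconfigUSE_NETWORK_EVENT_HOOK to be 1). */
void vApplicationIPNetworkEventHook( eIPCallbackEvent_t eNetworkEvent )
{
    static BaseType_t xTasksCreated = pdFALSE;

    if( ( eNetworkEvent == eNetworkUp ) && ( xTasksCreated == pdFALSE ) )
    {
        vStartSimple2TCPServerTasks( 1024u, 2u );
        xTasksCreated = pdTRUE;
    }
}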

Thanks

Hi @htibosch

Sorry for the late reply.

do you have to wait a long time before the mentioned problem occurs?
No. It occurs after a few seconds.

I have tried with your test code and the issue didn’t occur.

Btw, when I change ipconfigTCP_SRTT_MINIMUM_VALUE_MS to a smaller value, such as 10 ms, the issue doesn’t occur with my code. But somehow, if I double the data size (1554 * 2), the bad performance issue in my application still occurs.
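For reference, I change it like this in my FreeRTOSIPConfig.h:

    #define ipconfigTCP_SRTT_MINIMUM_VALUE_MS    ( 10 )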
For my application, I am using QTcpSocket from the Qt framework for the TCP client. Do you think QTcpSocket caused the issue?

It seems that this issue is related to QTcpSocket. Please let me continue investigating this issue and I will update you once I have more information. Thank you very much.

I have tried with your test code and the issue didn’t occur.

Thanks for testing, we’re one step further.

Do you think QTcpSocket caused the issue?

Not sure; I think that Qt is also a stable product.

Your test with ipconfigTCP_SRTT_MINIMUM_VALUE_MS is interesting.

RTT is Round Trip Time
SRTT is Smoothed RTT

Your PCAP showed that the first segment (1460 bytes) was sent immediately.
I also believe that the smaller segment (94 bytes) was queued to be processed by DMA.
However, the driver doesn’t manage to actually start the transmission.

I think the driver should wait for xemacpsif->txBusy to become false before sending a new packet. See the changes that I made to the function emacps_send_message() in x_emacpsif_dma.c:

x_emacpsif_dma_hein.zip (7.7 KB)

The field txBusy was added to struct xemacpsif_s for this purpose; I do not know if and when the driver stopped polling it.
In the attached version, I have the driver spin-wait for txBusy to become low. We can make that nicer later. Please try this first.

Hi @htibosch
Can you please try to reproduce the issue by following the steps below?

  1. Update your test code so that it only sends the 1554 bytes of data when the TCP client sends a command. Basically, comment out the 6 ms delay code and change lResult >= 0 to lResult > 0.

  2. Use this Python test script to run the test:

test_server.7z (422 Bytes)

I can easily see the issue with the above update.

Many FreeRTOS functions are only included when a macro has been defined. When using xTaskDelayUntil(), you’ll need to add the following to your FreeRTOSConfig.h:

    #define INCLUDE_xTaskDelayUntil      ( 1 )

When doing that, the statement if ( lResult > 0 ) can stay there.
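For reference, the pacing inside the test task looks roughly like this (a simplified sketch, not the exact attached code):

#include "FreeRTOS.h"
#include "task.h"

/* Simplified sketch of a fixed-rate loop using xTaskDelayUntil();
 * it only illustrates the 6 ms pacing. */
static void prvPacedSendTask( void * pvParameters )
{
    TickType_t xLastWakeTime = xTaskGetTickCount();

    ( void ) pvParameters;

    for( ;; )
    {
        /* Block until exactly 6 ms after the previous wake-up time. */
        xTaskDelayUntil( &xLastWakeTime, pdMS_TO_TICKS( 6 ) );

        /* ...send the next 1554-byte chunk here... */
    }
}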

Remember that Wireshark has an interesting option:

Statistics -> TCP stream graphs -> Time Sequence (tcptrace)

After enabling xTaskDelayUntil(), can you try “my version” of tcp_test.py again?

test_server.py works for me as well; although it sometimes shows a buf size of 1460, the PCAP looks good.

But most importantly, have the TCP retransmissions gone?

@htibosch
Sorry, I didn’t explain clearly. I commented out xTaskDelayUntil() to make the logic the same as in my application, not because I couldn’t use it.
My application logic is:

  1. The TCP client sends a command.
  2. The TCP server receives the command and then sends 1554 bytes of data back to the TCP client (see the sketch after this list).
  3. The TCP client reads until it has received all 1554 bytes.
  4. Steps 1, 2 and 3 are repeated many times.

If you have not tried changing your test code as described above, could you please try again?
And please use this filter to check for the issue:
(frame.len == 1514 || frame.len == 148) && frame.time_delta > 0.05
Time display format: “Seconds since previous captured packet”
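A simplified sketch of that server-side request/response loop (not the actual test code; socket setup and error handling are omitted, and the names are placeholders):

#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_Sockets.h"

#define CHUNK_SIZE    1554

/* Sketch of steps 1 and 2: wait for a command, then answer with one
 * 1554-byte chunk of data. */
static void prvServeClient( Socket_t xClientSocket )
{
    static uint8_t ucChunk[ CHUNK_SIZE ];
    char cCommand[ 32 ];
    BaseType_t lResult;

    for( ;; )
    {
        /* Step 1: wait for a command from the TCP client. */
        lResult = FreeRTOS_recv( xClientSocket, cCommand, sizeof( cCommand ), 0 );

        if( lResult <= 0 )
        {
            break;   /* Connection closed or an error occurred. */
        }

        /* Step 2: reply with one 1554-byte chunk. */
        ( void ) FreeRTOS_send( xClientSocket, ucChunk, sizeof( ucChunk ), 0 );
    }
}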

That is most important. And if they haven’t gone, make sure the new code is executed.

Hi @htibosch

Let me summarize the results.

  • test_tcp.py + your original test code => neither the delay nor TCP retransmissions occurred
  • test_server.py + your original test code => neither the delay nor TCP retransmissions occurred
  • test_server.py + my updated test code => both the delay and TCP retransmissions occurred
  • test_server.py + my updated test code + updated x_emacpsif_dma.c => both the delay and TCP retransmissions still occurred

Didn’t you see any problem with test_server.py + my updated test code?

Hi @trantam_cdt92,

I get the same results as you (!), except that the updated x_emacpsif_dma.c solves the problem.

Can you make sure that you modified the right instance of x_emacpsif_dma.c? Sometimes I am mistaken myself: I change a function, but in a source file that isn’t compiled.
I even set #warning statements to make sure the compiler gets there.

When I run the test with your modifications, I get loads of packets when filtering:

    tcp.analysis.retransmission || tcp.analysis.ack_lost_segment

They show the same pattern as in your original splitted_frame.pcapng.

In my setup, the modification in x_emacpsif_dma.c makes a big difference: almost no problems, and the few that remain show a different pattern.

So I can replicate this one:

  • test_server.py + my updated test code => both the delay and TCP retransmissions occurred

I added a configASSERT() to see if txBusy gets low:

 if( xemacpsif->txBusy == pdTRUE )
 {
     eventLogAdd( "Start" );
 
     // As a test, spin-wait here for the previous
     // transmission to be ready.
     uint32_t uxCounter = 10000;
 
     while( uxCounter > 0 )
     {
         // Test if 'volatile int txBusy' has been reset.
         // Should be found reset after 750 loops
         if( xemacpsif->txBusy == pdFALSE )
         {
             break;
         }
 
         uxCounter--;
     }
+    configASSERT( xemacpsif->txBusy == pdFALSE );
     eventLogAdd( "End %u", ( unsigned ) ( 10000 - uxCounter ) );
 }

EDIT
Here I use eventLogAdd() to write log lines to memory, along with a timestamp. During or after the test, I can inspect and clear the log lines.
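In case it is useful, a minimal version of such an in-memory logger could look like the sketch below; the real eventLogAdd() used above may well differ:

#include <stdarg.h>
#include <stdio.h>

#include "FreeRTOS.h"
#include "task.h"

/* Minimal in-memory event logger: a ring buffer of timestamped messages
 * that can be inspected in a debugger during or after a test. */
#define LOG_ENTRY_COUNT     128
#define LOG_ENTRY_LENGTH    48

typedef struct
{
    TickType_t xTimestamp;
    char cMessage[ LOG_ENTRY_LENGTH ];
} LogEntry_t;

static LogEntry_t xLogEntries[ LOG_ENTRY_COUNT ];
static volatile uint32_t ulLogIndex = 0;

void eventLogAdd( const char * pcFormat, ... )
{
    LogEntry_t * pxEntry = &( xLogEntries[ ulLogIndex % LOG_ENTRY_COUNT ] );
    va_list xArgs;

    pxEntry->xTimestamp = xTaskGetTickCount();

    va_start( xArgs, pcFormat );
    vsnprintf( pxEntry->cMessage, sizeof( pxEntry->cMessage ), pcFormat, xArgs );
    va_end( xArgs );

    ulLogIndex++;
}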

Here the logging showed that the txBusy flag is cleared after about 650 loops. Once tested, I’ll make it interrupt-driven, of course.
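One possible way to make it interrupt-driven (an idea only, not the final patch) is to let the GEM TX-complete interrupt give a binary semaphore that emacps_send_message() takes before queuing a new frame:

#include "FreeRTOS.h"
#include "semphr.h"

/* Sketch: replace the spin-wait on txBusy with a binary semaphore. */
static SemaphoreHandle_t xTxDoneSemaphore;

void vTxDoneSemaphoreInit( void )
{
    xTxDoneSemaphore = xSemaphoreCreateBinary();
    /* Start in the 'given' state: no transmission is pending yet. */
    ( void ) xSemaphoreGive( xTxDoneSemaphore );
}

/* Called in place of the spin-wait, before queuing a new frame. */
void vWaitForPreviousTx( void )
{
    if( xSemaphoreTake( xTxDoneSemaphore, pdMS_TO_TICKS( 10 ) ) == pdFALSE )
    {
        /* Time-out: the previous transmission never completed. */
    }
}

/* Called from the TX-complete part of the EMAC interrupt handler. */
void vTxDoneFromISR( void )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    xSemaphoreGiveFromISR( xTxDoneSemaphore, &xHigherPriorityTaskWoken );
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}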

Hi @htibosch

Sorry, my earlier check was wrong. With the updated x_emacpsif_dma.c, it also works well in my environment: no delay and no TCP retransmissions.
However, my application still performs badly, but I think that may be an issue in my application itself.

Are you going to commit your modification to x_emacpsif_dma.c?

Btw, thank you very much for your efforts. You helped me a lot.

My application’s performance is good now, after I updated its internal logic.
It would be great if you could share the final update of x_emacpsif_dma.c with me. Thank you in advance.

When I ran tcp_test_server, the CPU was busy for only 2.3 % of the time. The Ethernet also has a lot of spare bandwidth.
You could consider using FreeRTOS_select().
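A minimal sketch of how that could look (this requires ipconfigSUPPORT_SELECT_FUNCTION to be set to 1; socket creation is omitted and the function name is a placeholder):

#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_Sockets.h"

/* Block until 'xSocket' has data to read, or until the 100 ms timeout expires. */
void vWaitForData( Socket_t xSocket )
{
    SocketSet_t xSocketSet = FreeRTOS_CreateSocketSet();

    /* Wake up when the socket becomes readable. */
    FreeRTOS_FD_SET( xSocket, xSocketSet, eSELECT_READ );

    for( ;; )
    {
        if( FreeRTOS_select( xSocketSet, pdMS_TO_TICKS( 100 ) ) != 0 )
        {
            if( FreeRTOS_FD_ISSET( xSocket, xSocketSet ) != 0 )
            {
                /* Data is available: call FreeRTOS_recv() here. */
            }
        }
    }
}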

The patch for x_emacpsif_dma.c is on my list for next week.