FreeRTOS TCP/IP - TCP send priority

nndpkz wrote on Friday, May 25, 2018:

Hello everyone,

we have just made a basic TCP/IP server application on a ZYNQ-7000 using FreeRTOS v10.0.0 and its TCP/IP stack. The LAN on our ZYNQ board is 1 Gb. We have two tasks (of equal priority = configMAX_PRIORITIES - 3), one for sending data to and the other for receiving data from a PC (Windows 7) over a single TCP socket. The PC and the ZYNQ are connected via a dedicated Ethernet cable.

We are measuring throughput in our Windows TCP client. We get around 24 MB/s in total. However, even though we expect the throughput to be divided between the Rx and Tx sides (for example Rx - 13 MB/s and Tx - 11 MB/s), it seems that the Rx side (PC Rx <- ZYNQ Tx) is dominant, i.e. has higher priority. While this Rx side is active, the other side seems completely inactive, so we get 24 MB/s for the Rx side, and once the Rx side is finished, Tx “gets activated” and we get 24 MB/s for Tx. Of course, Tx and Rx are implemented concurrently (as threads) in the Windows TCP client app, i.e. they are always working at the same time. We exchange data in chunk sizes of 4 kB.
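
For reference, the setup described above corresponds roughly to the sketch below. All names ( prvTxTask, prvRxTask, xConnectedSocket ) are placeholders for illustration, not the actual project code; the socket is assumed to have been accepted already with FreeRTOS_accept().

    /* Sketch of the two-task arrangement described above ( assumed names ). */
    #include "FreeRTOS.h"
    #include "task.h"
    #include "FreeRTOS_Sockets.h"

    #define serverTASK_PRIORITY    ( configMAX_PRIORITIES - 3 )

    static void prvTxTask( void *pvParameters )
    {
        Socket_t xSocket = ( Socket_t ) pvParameters;
        static uint8_t ucTxData[ 4096 ];    /* 4 kB chunks, as in the test. */

        for( ;; )
        {
            /* FreeRTOS_send() blocks until there is space in the TX stream buffer. */
            FreeRTOS_send( xSocket, ucTxData, sizeof( ucTxData ), 0 );
        }
    }

    static void prvRxTask( void *pvParameters )
    {
        Socket_t xSocket = ( Socket_t ) pvParameters;
        static uint8_t ucRxData[ 4096 ];

        for( ;; )
        {
            /* FreeRTOS_recv() blocks until data arrives on the socket. */
            FreeRTOS_recv( xSocket, ucRxData, sizeof( ucRxData ), 0 );
        }
    }

    void vStartTcpTasks( Socket_t xConnectedSocket )
    {
        /* Both tasks share the single connected socket and run at equal priority. */
        xTaskCreate( prvTxTask, "TcpTx", 1024, ( void * ) xConnectedSocket, serverTASK_PRIORITY, NULL );
        xTaskCreate( prvRxTask, "TcpRx", 1024, ( void * ) xConnectedSocket, serverTASK_PRIORITY, NULL );
    }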

It seems like the TCP send function has higher priority than the TCP receive function in the FreeRTOS TCP/IP stack. I say “it seems” because it is just my guess; I have no real clue what is going on. Maybe the problem is on the Windows side. We are investigating this.

Does anyone have any idea what might be the cause of this, or which methods to use in order to investigate it? If any more information is needed, let me know.

Best regards,
Nenad

nndpkz wrote on Friday, May 25, 2018:

Hi guys,

after some magic experimenting, we decided to send and receive data in chunks of 1 kB on the Windows side (client), while keeping chunks of 4 kB for receiving and sending on the ZYNQ side (server). It seems this solved the “priority” issue, as you can see from the attachment I posted.

I hope someone can explain to me what is going on. I would really like to know the real cause of this behaviour.

Best regards,
Nenad

heinbali01 wrote on Friday, May 25, 2018:

We are measuring throughputs … We have around 24MB/s in total

Is that 24 MByte, or around 200 Mbps? Is that what you need, or do you want more speed?

A Zynq 7000 on a 1 Gb LAN should be able to transport quite a bit more data. But the actual throughput depends, of course, on what you do with the data and how much processing is involved.

What TCP parameters do you use? What are the buffer sizes and TCP window sizes? Have you used the socket option FREERTOS_SO_WIN_PROPERTIES? That can help a lot to increase the transfer speeds in case you’re sending lots of data continuously.
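
For reference, a minimal sketch of how that socket option is typically applied, before FreeRTOS_connect() or FreeRTOS_listen() is called; the sizes below are example values only, not a recommendation:

    WinProperties_t xWinProps;

    xWinProps.lTxBufSize = 6 * ipconfigTCP_MSS;   /* TX stream buffer, in bytes ( example value ). */
    xWinProps.lTxWinSize = 3;                     /* TX window, in units of MSS ( example value ). */
    xWinProps.lRxBufSize = 6 * ipconfigTCP_MSS;   /* RX stream buffer, in bytes ( example value ). */
    xWinProps.lRxWinSize = 3;                     /* RX window, in units of MSS ( example value ). */

    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_WIN_PROPERTIES,
                         ( void * ) &xWinProps, sizeof( xWinProps ) );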

it seems that Rx side (PC Rx ← ZYNQ Tx) is more dominant

Yes, that is possible. I found the same in (iperf) performance tests between a laptop and FreeRTOS+TCP. When it comes to receiving data, +TCP is dependent on (the efficiency of) the host. When it comes to sending, FreeRTOS+TCP will send (queue) packets non-stop until the TCP window is full.
But I would not expect that one direction stops (starves) while the other is active! Have you also tried the same thing with two separate sockets ( connections )?

FreeRTOS+TCP won’t give priority to data travelling in a particular direction.

One thing that is important to check is the task priorities:

  • idle priority : the idle task
  • low priority : the tasks that make use of +TCP
  • medium priority : the IP-task (ipconfigIP_TASK_PRIORITY)
  • high priority : prvEMACHandlerTask(), i.e. the driver at e.g. configMAX_PRIORITIES - 1

The above priorities normally work nicely.
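
For illustration, that layout usually comes down to settings like the ones below. ipconfigIP_TASK_PRIORITY is the real +TCP configuration macro; the EMAC-handler macro name and the exact values are only an example and may differ per driver version:

    /* FreeRTOSIPConfig.h : priority of the IP-task. */
    #define ipconfigIP_TASK_PRIORITY          ( configMAX_PRIORITIES - 2 )

    /* NetworkInterface.c ( Zynq driver ) : priority of prvEMACHandlerTask(). */
    #define niEMAC_HANDLER_TASK_PRIORITY      ( configMAX_PRIORITIES - 1 )

    /* Application tasks that use the +TCP API run below the IP-task, e.g. : */
    xTaskCreate( prvTcpTask, "TCP", 1024, NULL, configMAX_PRIORITIES - 3, NULL );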

You could also consider doing both TX and RX from a single task. In that case, you can have that task sleep either by calling FreeRTOS_select() or by blocking on a semaphore, which you can bind to a socket ( in case ipconfigSOCKET_HAS_USER_SEMAPHORE = 1 ):

    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_SET_SEMAPHORE, ( void * ) &xSemaphore, sizeof( xSemaphore ) );

Your task can block on the semaphore and, after waking up, it can check for transmission (FreeRTOS_tx_space()) or reception (FreeRTOS_rx_size()). Maybe that helps against the starvation of one task.
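
A rough sketch of what such a combined task could look like, assuming ipconfigSOCKET_HAS_USER_SEMAPHORE = 1 and that xSemaphore is the binary semaphore bound with FREERTOS_SO_SET_SEMAPHORE above; the buffer handling is only indicative:

    static SemaphoreHandle_t xSemaphore;    /* Created with xSemaphoreCreateBinary() and
                                             * bound to the socket with FREERTOS_SO_SET_SEMAPHORE. */

    static void prvCombinedTcpTask( void *pvParameters )
    {
        Socket_t xSocket = ( Socket_t ) pvParameters;
        static uint8_t ucData[ 4096 ];
        BaseType_t xReceived;

        for( ;; )
        {
            /* Sleep until the IP-task gives the semaphore, i.e. until something
             * of interest has happened to the socket. */
            xSemaphoreTake( xSemaphore, pdMS_TO_TICKS( 100U ) );

            /* Reception : only read what has already arrived. */
            if( FreeRTOS_rx_size( xSocket ) > 0 )
            {
                xReceived = FreeRTOS_recv( xSocket, ucData, sizeof( ucData ), FREERTOS_MSG_DONTWAIT );
                /* Process 'xReceived' bytes here. */
                ( void ) xReceived;
            }

            /* Transmission : only write when a whole chunk fits in the TX buffer. */
            if( FreeRTOS_tx_space( xSocket ) >= ( BaseType_t ) sizeof( ucData ) )
            {
                FreeRTOS_send( xSocket, ucData, sizeof( ucData ), FREERTOS_MSG_DONTWAIT );
            }
        }
    }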

On the Windows side, you can also use a single thread and call select() to block.

nndpkz wrote on Friday, May 25, 2018:

Hi Hein, thank you for your detailed answer.

Is that 24 MByte, or around 200 Mbps? Is that what you need, or do you want more speed?

Yes, that is the speed. We would like to have more speed. This is the bare minimum we could use, but we are not sure how much additional processing will make this throughput go down.

But the actual throughput depends of course, on what you do with the data, how much processing is involved.

Right now, we don’t have any processing at all! We just exchange already prepared buffer data generated in the applications themselves, nothing else besides that.

What TCP parameters do you use? What are the buffer sizes and TCP window sizes? Have you used the socket option FREERTOS_SO_WIN_PROPERTIES?

We use all the configurations mentioned in the chapter “TCP/IP Stack Configuration to Maximise Throughput” of the FreeRTOS article “TCP/IP stack configuration examples for RTOS applications”, i.e. both the macro defines and the WinProps settings. Just to mention, we don’t have any problem with memory, as we have 1 GB of RAM, so we should be able to achieve greater speeds.

Have you also tried the same thing with two separate sockets ( connections )?

I tried this; nothing changed at all.

One thing is important to check, the task priorities…

We will check the priorities once more. What do you consider to be low priority? As I mentioned, our tasks that use +TCP both have priority configMAX_PRIORITIES - 3. That should be lower than the priority of the IP task for sure, but I will double-check.

You could also consider doing both TX and RX from a single task.

We will definitely consider this and will probably try this solution as well. But I think we will also try to avoid it if possible, as it will make our design more complicated.

What do you think about my first reply to this post? We noticed that when we decrease the chunk size of the send function on the Windows side, we get more normal behaviour, and the “priority” issue seems to disappear. So there is definitely something going on with the chunk sizes of the send, and maybe receive, functions.

heinbali01 wrote on Saturday, May 26, 2018:

We would like to have more speed. This is the bare minimum we could use,
but we are not sure how much additional processing will make this throughput go down.

The possible throughput will be a lot higher than 200 Mbps.

We use all the configurations mentioned in chapter TCP/IP Stack Configuration
to Maximise Throughput of this FreeRTOS article:…

The next step is to start up Wireshark and study the TCP conversation. Where do you see delays? How many packets are being sent without waiting for acknowledgement? Are all packets totally filled ( a TCP payload of 1460 bytes )?

Would you mind posting a ( compressed ) PCAP file here that shows the “priority problem”?

We will definitely consider this and probably try this solution as well.
But, I think we will also try to avoid it if possible, as this will make
our design more complicated.

I think that it is worth spending time on the implementation side. Using select() is the most compatible option, whereas using a semaphore is easier to implement.

The FTP and HTTP servers ( see Protocols directory ) are also using a single task. That task sleeps by calling FreeRTOS_select(). When waking-up, it will check all clients and communicate in both directions.
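
A minimal sketch of the select-based variant is shown below ( requires ipconfigSUPPORT_SELECT_FUNCTION = 1; the names and the 1-second block time are illustrative only ):

    SocketSet_t xSocketSet = FreeRTOS_CreateSocketSet();

    /* Wake up when the socket is readable or writable. */
    FreeRTOS_FD_SET( xSocket, xSocketSet, eSELECT_READ | eSELECT_WRITE );

    for( ;; )
    {
        if( FreeRTOS_select( xSocketSet, pdMS_TO_TICKS( 1000U ) ) != 0 )
        {
            EventBits_t xBits = FreeRTOS_FD_ISSET( xSocket, xSocketSet );

            if( ( xBits & eSELECT_READ ) != 0 )
            {
                /* Call FreeRTOS_recv() here. */
            }

            if( ( xBits & eSELECT_WRITE ) != 0 )
            {
                /* Call FreeRTOS_send() here. */
            }
        }
    }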

What do you think about my first reply to this post?

I can imagine that it changes the flow. But won’t using small 1 KB buffers also decrease the overall speed?

If you want, I can do a simulation here ( Laptop <=> Zynq ) of what you’re trying to achieve. Should I think of 2 streams with constant data? Or does the data come in chunks of so many KB? Is it like e.g. streaming audio?

heinbali01 wrote on Saturday, May 26, 2018:

Hi Nenad, I have good news for you: iperf3 shows exactly the same pattern: sending from the Zynq to a host gets absolute priority over receiving data.

From host to device, high speed:

    [  4]   0.00-1.00   sec  50.8 MBytes   426 Mbits/sec
    [  4]   1.00-2.00   sec  56.4 MBytes   472 Mbits/sec

Now I also start iperf -R, which will receive data from the Zynq, and the data rate drops to 2.1 Mbps:

    [  4]   2.00-3.00   sec  21.6 MBytes   181 Mbits/sec
    [  4]   3.00-4.00   sec   256 KBytes  2.10 Mbits/sec
    [  4]   4.00-5.00   sec   256 KBytes  2.10 Mbits/sec
    [  4]   5.00-6.00   sec   256 KBytes  2.10 Mbits/sec
    [  4]   6.00-7.00   sec   512 KBytes  4.19 Mbits/sec
    [  4]   7.00-8.00   sec   128 KBytes  1.05 Mbits/sec
    [  4]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec
    [  4]   9.00-10.00  sec  2.12 MBytes  17.8 Mbits/sec

iperf -R is ready and the communication resumes:

    [  4]  10.00-11.00  sec  14.8 MBytes   124 Mbits/sec
    [  4]  11.00-12.00  sec  55.5 MBytes   466 Mbits/sec
    [  4]  12.00-13.00  sec  55.4 MBytes   465 Mbits/sec
    [  4]  13.00-14.00  sec  57.5 MBytes   483 Mbits/sec
    [  4]  14.00-15.00  sec  55.9 MBytes   469 Mbits/sec
    [  4]  15.00-15.51  sec  28.8 MBytes   472 Mbits/sec

These are the iperf -R results:

    [  4] local 192.168.2.5 port 49111 connected to 192.168.2.108 port 5001
    [ ID] Interval           Transfer     Bandwidth
    [  4]   0.00-1.00   sec  61.4 MBytes   515 Mbits/sec
    [  4]   1.00-2.00   sec  60.8 MBytes   510 Mbits/sec
    [  4]   2.00-3.00   sec  61.1 MBytes   512 Mbits/sec
    [  4]   3.00-4.00   sec  60.8 MBytes   510 Mbits/sec
    [  4]   4.00-5.00   sec  60.0 MBytes   503 Mbits/sec
    [  4]   5.00-6.00   sec  61.8 MBytes   519 Mbits/sec
    [  4]   6.00-6.55   sec  34.1 MBytes   522 Mbits/sec

It has a constant speed of > 500 Mbps.

I will come back to this. H

heinbali01 wrote on Saturday, May 26, 2018:

Before creating a solution in the library, I am wondering if you can arrange the flow control at a higher level:

Suppose that the TCP server running on the Zynq becomes the master. The master-slave communication will include commands and data, for instance:

    Zynq command   : please send me your raw data
    Client command : I have 100 KB to send
    Client data    : 100 KB
    Zynq command   : I have 80 KB to send
    Zynq data      : 80 KB
    Zynq command   : please send me your raw data

In the above case, the data speeds will be really high in both directions, as your communication is “half-duplex”.
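
Purely as an illustration ( not an existing protocol ), such a half-duplex exchange could be framed by a small header that precedes every command or data block:

    /* Hypothetical header sent in front of each command / data block. */
    typedef struct xTRANSFER_HEADER
    {
        uint32_t ulCommand;       /* e.g. CMD_REQUEST_DATA, CMD_DATA_FOLLOWS ( made-up values ). */
        uint32_t ulLengthBytes;   /* Number of payload bytes that follow this header. */
    } TransferHeader_t;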

If that protocol is not possible…

what I just tried is to use asymmetric TCP parameters: small buffers for sending and large buffers for the reception of data:

    #define ipconfigIPERF_TX_BUFSIZE   ( 2 * ipconfigTCP_MSS )   /* Units of bytes. */
    #define ipconfigIPERF_TX_WINSIZE   ( 2 )                     /* Size in units of MSS */
    #define ipconfigIPERF_RX_BUFSIZE   ( 32 * ipconfigTCP_MSS )  /* Units of bytes. */
    #define ipconfigIPERF_RX_WINSIZE   ( 16 )                    /* Size in units of MSS */

With these parameters, the device will send at a low speed ( 80 Mbps ) while the host can easily send at full speed ( 360 Mbps ) concurrently.
You may be able to find an optimum somewhere, but I don’t really like this type of solution.

The case is not yet closed ( i.e. I will keep on thinking of a solid solution ).

nndpkz wrote on Saturday, May 26, 2018:

Would you mind posting a ( compressed ) PCAP file here that shows the “priority problem”?

I have posted our Wireshark capture in the attachments. It is however really big, even compressed: around 19 MB. Is there any other way to post it? PC IP: 172.16.0.200, ZYNQ IP: 172.16.0.215.

How many packets are being sent without waiting for acknowledgement? Are all packets totally filled ( a TCP payload of 1460 bytes )?

I would say around 8 packets are sent without waiting for an ACK. As you can see, most of the TCP payloads are indeed 1460 bytes or 1176 bytes. It is also obvious that only one direction is working at a time: first only ZYNQ Tx → PC Rx, while PC Tx only sends ACKs (packets of 54 bytes), and once this side is finished, PC Tx → ZYNQ Rx gets activated.

I am wondering if you can arrange the flow control at a higher level…

We will take this change into consideration as well, but we will need more time to implement it.

what I just tried is to use asymmetric TCP parameters, small buffers for sending and large buffers for the reception of data…

I tried this, but I changed these parameters through WinProps for our application. With these settings, Rx and Tx had almost equal priorities, but the overall speed was really low, around 10 MB/s.

Thank you for performing the iperf tests. I tried this as well, but I am having difficulties running iperf in our ZYNQ app. I call vIPerfInstall() from vApplicationIPNetworkEventHook() after the network is up, but I always get an assert on line 4566 in tasks.c - configASSERT( xTaskToNotify );. I tried calling it outside this event hook, after the network is up, but the result is the same. I am using iperf_task_v3_0c.c.

heinbali01 wrote on Tuesday, May 29, 2018:

Hi Nenad, what we found is that the Zynq seems to drop incoming packets as soon as it sends lots of data. The device will send SACKs ( Selective ACKs ) and the host retransmits the missing packet after a time-out of 300 ms.

What I tried is decreasing the TCP MSS from the default ( 1460 ) to 1 KB:

    /* Define the Maximum Segment Size. */
    #define ipconfigTCP_MSS    ( 1024 )

This gives a net TCP payload of 1024 bytes; the Ethernet packets will be 1064 bytes long. With this setting there are very few dropped RX packets, and the speed is acceptable in both directions.

I’m not sure if anyone recognises this problem: a Zynq dropping incoming packets under heavy traffic in both directions?

I’d like to see a better solution than decreasing the TCP packet size.

nndpkz wrote on Friday, July 06, 2018:

Hi Hein,

after some time, we also tried the same thing with the lwIP raw TCP/IP stack, and the behaviour is the same. The Zynq TX side is once again the dominant one.

I started counting MAC send and receive handler invocations, and it is obvious that when both connections are active the send handlers are completely dominant.

At one point there is:
    tx_handler: 3759422
    rx_handler: 864940

I will check if there are any priorities for these interrupts. I have already written to the Xilinx support forum and am waiting for their reply.

Best regards,
Nenad

P.S. I forgot to write that decreasing TCP_MSS to 1024 in the FreeRTOS TCP/IP setup was one possible solution, but it wasn’t good enough for our application.