STM32H743 FreeRTOS+TCP issue with ZERO Copy

So, last update and probably final here. I had to slightly modify the file “FreeRTOS_Sockets.c” so that the TCP buffer on the heap is used 100% of the time instead of my secondary one. If I set lTxBufSize to my payload size (1400) minus 4, the buffer size was rounded up to a multiple of the MSS; see “FreeRTOS_Sockets.c”, line 1722:

            if( lOptionName == FREERTOS_SO_SNDBUF )
            {
                /* Round up to nearest MSS size */
                ulNewValue = FreeRTOS_round_up( ulNewValue, ( uint32_t ) pxSocket->u.xTCP.usMSS );
                pxSocket->u.xTCP.uxTxStreamSize = ulNewValue;
            }

I have commented out the rounding, and now the TCP TX buffer is always a multiple of my payload size and I never have to fall back to my secondary buffer (it is still there in case there is some issue).
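The effect of the rounding can be reproduced with a few lines of arithmetic. The sketch below is not the library code: round_up_to_multiple() is a hypothetical stand-in for FreeRTOS_round_up(), and both the MSS of 1460 bytes and the request of ten payloads minus 4 bytes are assumed values chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define PAYLOAD_SIZE    1400U   /* application payload, as in the post */
#define ASSUMED_MSS     1460U   /* assumption: typical MSS on Ethernet/IPv4 */

/* Hypothetical stand-in for FreeRTOS_round_up():
 * round 'a' up to the next multiple of 'd'. */
static uint32_t round_up_to_multiple( uint32_t a, uint32_t d )
{
    assert( d != 0U );
    return d * ( ( a + d - 1U ) / d );
}

/* Bytes left over when the rounded size is cut into whole payloads.
 * A non-zero result means the buffer is not a multiple of the payload. */
static uint32_t leftover_after_rounding( uint32_t requested )
{
    uint32_t rounded = round_up_to_multiple( requested, ASSUMED_MSS );

    return rounded % PAYLOAD_SIZE;
}
```

Under these assumptions, a request of 10 * 1400 - 4 = 13996 bytes is rounded up to 14600 bytes, which leaves 600 bytes that cannot hold a whole payload.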

I manage to get 72Mbps TCP stream now and, considering the high load (the 20us ISRs), I think it is quite fine for my application. I will eventually do a benchmark against lwIP, but for now I am really satisfied and I will move on to the areas of my application that I have to improve.

Thanks for your support!

I manage to get 72Mbps TCP stream now

You make my day! Thank you for your patience. I will check the piece of code that you mention.

Great achievement :+1:
I agree with you that you’re probably on the edge of your system with the pretty high interrupt rate of the data source. I would be surprised if you’d get comparable performance with lwIP.

By the way, I’ve modified FreeRTOS_Sockets.c/.h and added an advanced function that, together with FreeRTOS_get_tx_head, helps me fully use the ETH buffer in the heap:

/**
 * @brief Get a direct pointer to the first element of the buffer.
 *
 * @param[in] xSocket: The socket owning the buffer.
 *
 * @return First element of the circular transmit buffer if all checks pass. Or else, NULL
 *         is returned.
 */
    uint8_t * FreeRTOS_get_first_buff_element( ConstSocket_t xSocket )
    {
        uint8_t * pucReturn = NULL;
        const FreeRTOS_Socket_t * pxSocket = ( const FreeRTOS_Socket_t * ) xSocket;
        StreamBuffer_t * pxBuffer = NULL;

        /* Confirm that this is a TCP socket before dereferencing structure
         * member pointers. */
        if( prvValidSocket( pxSocket, FREERTOS_IPPROTO_TCP, pdFALSE ) == pdTRUE )
        {
            pxBuffer = pxSocket->u.xTCP.txStream;

            if( pxBuffer != NULL )
            {
                pucReturn = &( pxBuffer->ucArray[ 0 ] );
            }
        }

        return pucReturn;
    }

As I fill in the buffer progressively, I start writing my data at “FreeRTOS_get_tx_head() + Payload size” and I pass NULL to the FreeRTOS_send function so that the payload which is already complete gets sent. When I see that the head is one payload away from the end, I call my function and start writing data at the beginning of the buffer. FreeRTOS_send with a NULL argument then sends the previous, already complete payload that lies at the end of the buffer.

Just my 2 cents: it might be worth considering getting this into the official code for advanced users. The speeds I get now are crazy (>80Mbps), but then other tasks stop working properly, so I actually have to stay at max 72Mbps for my application to work properly.

As I fill in the buffer progressively, I start writing my data in “FreeRTOS_get_tx_head() + Payload size”

So you create your own temporary HEAD pointer in the buffer? And I guess that you write packets of 140 bytes each?

I pass NULL to the FreeRTOS_send() function so that the payload which is already complete gets sent.

So now, without further copying, the IP-task sends like 1400 bytes?

When I see that the head is one payload away from the end, I call my function and start writing data at the beginning of the buffer. FreeRTOS_send with a NULL argument then sends the previous, already complete payload that lies at the end of the buffer.

I assume that you found that calling FreeRTOS_send() for each small block (140 bytes) is much slower than calling it only for large blocks (of 1400 bytes)?

The speeds I get now are crazy (>80Mbps)…

Very good, that is also what I found!

Just my 2 cents: it might be worth considering getting this into the official code for advanced users.

Yes, I don’t mind adding that function. I would propose some minor (non-functional) changes:

/**
 * @brief Get a pointer to the first element of the TX stream buffer.
 *
 * @param[in] xSocket: The socket owning the buffer.
 *
 * @return First element of the circular transmit buffer if all checks pass. Or else, NULL
 *         is returned.
 */
uint8_t * FreeRTOS_get_tx_base( ConstSocket_t xSocket )
{
    uint8_t * pucReturn = NULL;
    const FreeRTOS_Socket_t * pxSocket = ( const FreeRTOS_Socket_t * ) xSocket;

    /* Confirm that this is a TCP socket before dereferencing structure
     * member pointers. */
    if( prvValidSocket( pxSocket, FREERTOS_IPPROTO_TCP, pdFALSE ) == pdTRUE )
    {
        StreamBuffer_t * pxBuffer = pxSocket->u.xTCP.txStream;

        if( pxBuffer != NULL )
        {
            pucReturn = pxBuffer->ucArray;
        }
    }

    return pucReturn;
}

I create my own temporary head pointer where I write chunks of 140 bytes (the data gathered in one ISR call). I trigger FreeRTOS_send when the payload is full → after 10 ISRs = 1400 bytes.

Correct, I trigger FreeRTOS_send() with NULL as the buffer pointer and 1400 as the length. The IP task then sends the payload, which is already complete. At the same time I start constructing my next payload at HEAD + one payload (1400 bytes), or at BASE if my HEAD is one payload away from the end of the buffer.
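The write-position logic described here can be sketched in isolation. This is only an illustration and not code from the project: next_payload_offset() and chunks_per_payload() are hypothetical helpers, offsets are counted from the pointer that FreeRTOS_get_tx_base() returns, and the buffer length is assumed to be an exact multiple of the payload size.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE      140U    /* bytes gathered in one ISR call */
#define PAYLOAD_SIZE    1400U   /* bytes handed over per send call */

/* Hypothetical helper: number of ISR chunks that make up one payload. */
static size_t chunks_per_payload( void )
{
    return PAYLOAD_SIZE / CHUNK_SIZE;
}

/* Hypothetical helper: given the offset of the current head (the payload
 * that is about to be sent), return the offset at which the next payload
 * is assembled. If the head is one payload away from the end of the
 * buffer, the next payload starts again at the base (offset 0). */
static size_t next_payload_offset( size_t head, size_t length )
{
    size_t next = head + PAYLOAD_SIZE;

    /* Assumption: the buffer holds a whole number of payloads. */
    assert( ( length % PAYLOAD_SIZE ) == 0U );

    if( ( next + PAYLOAD_SIZE ) > length )
    {
        next = 0U;
    }

    return next;
}
```

For a buffer of four payloads (5600 bytes), the assembly offsets cycle through 1400, 2800, 4200 and back to 0, so a payload never straddles the end of the buffer.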

Thanks for the changes, I have taken them over into my code as well.

Hi Hein,

I’m also evaluating performance with iperf3 on an STM32H757 using your code.

Would you know why I get so many retries when I run the test in reverse mode?
Do you get the same?

Thank you for the amazing work you’re doing!

Stefano

Reverse mode, remote host 10.41.16.253 is sending
[  4] local 10.41.16.29 port 52098 connected to 10.41.16.253 port 5001
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  11.1 MBytes  93.0 Mbits/sec
[  4]   1.00-2.00   sec  11.0 MBytes  92.1 Mbits/sec
[  4]   2.00-3.00   sec  11.0 MBytes  92.2 Mbits/sec
[  4]   3.00-4.00   sec  11.0 MBytes  92.1 Mbits/sec
[  4]   4.00-5.00   sec  11.0 MBytes  92.1 Mbits/sec
[  4]   5.00-6.00   sec  11.0 MBytes  92.2 Mbits/sec
[  4]   6.00-7.00   sec  11.0 MBytes  92.2 Mbits/sec
[  4]   7.00-8.00   sec  11.0 MBytes  92.2 Mbits/sec
[  4]   8.00-9.00   sec  11.0 MBytes  92.1 Mbits/sec
[  4]   9.00-9.09   sec  1.03 MBytes  91.8 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-9.09   sec  37.0 Bytes  32.5 bits/sec  4294967295             sender
[  4]   0.00-9.09   sec   100 MBytes  92.2 Mbits/sec                  receiver

iperf Done.

C:\Users\ugo64915\Downloads\iperf-3.1.3-win64>

Hi Stefano,

why I get so many retries

Do you mean “Retr 4294967295” ?

The number looks like (a 32-bit) -1. I don’t know what it stands for.
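A quick check of that reading, as a sketch: converting -1 to an unsigned 32-bit integer gives exactly the value that iperf printed. The helper names as_unsigned32() and check_retr_value() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Made-up helper: reinterpret a signed 32-bit value as unsigned.
 * In C the conversion is defined modulo 2^32, so -1 becomes 2^32 - 1. */
static uint32_t as_unsigned32( int32_t value )
{
    return ( uint32_t ) value;
}

/* Made-up helper: confirms that -1 maps to the printed "Retr" value. */
static void check_retr_value( void )
{
    assert( as_unsigned32( -1 ) == 4294967295U );
}
```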

When I look at a PCAP of an iperf connection, I don’t see any retries.

Your iperf results are even better than what I got here.
Thanks

Hi Hein,

Yes, by retries I meant Retr in the iperf printout. In your previous example it was omitted, but I expect it was the same.

With a little more digging, it looks like this is because of the partial implementation of iperf.

In iperf_task_v3_0d.c

			ulLength = snprintf( pcResponse + 4, sizeof( pcResponse ) - 4,
				"{"
					"\"cpu_util_total\":0,"
					"\"cpu_util_user\":0,"
					"\"cpu_util_system\":0,"
					"\"sender_has_retransmits\":-1,"
					"\"streams\":["
						"{"
							"\"id\":1,"
							"\"bytes\":%lu,"
							"\"retransmits\":-1,"
							"\"jitter\":0,"
							"\"errors\":0,"
							"\"packets\":0"
						"}"
					"]"
				"}\xe",
				ulCount );

The response to the iperf server has hardcoded values.
“"sender_has_retransmits":-1,”
“"retransmits":-1,”

This is why the iperf server is printing Retr 4294967295.
I guess it can be ignored if PCAP doesn’t show retries.

Changing

"\"sender_has_retransmits\":-1,"

to

"\"sender_has_retransmits\":0,"

would stop the server from printing the Retr value.

Thank you
Cheers

Very good of you, I will change the code accordingly.

As you see, I never implemented the ‘retransmits’ feature.
I analysed the protocol by running two iperf instances, talking with each other. There I saw the JSON expression:
sender_has_retransmits:-1, which I just copied.

“Retr” is only printed when the -R option is used: the embedded server sends data.

Thanks

I created PR #544 which adds the new function FreeRTOS_get_tx_base().

Would you be able to check it?

I tested it as follows:

{
    BaseType_t xLength;
    uint8_t * pucBase = FreeRTOS_get_tx_base( xSocket );
    uint8_t * pucHead = FreeRTOS_get_tx_head( xSocket, &( xLength ) );
    FreeRTOS_printf( ( "httpTest: TX base = %p head = %p diff %u\n",
                       ( void * ) pucBase,
                       ( void * ) pucHead,
                       ( unsigned ) ( pucHead - pucBase ) ) );
}

When called before the first FreeRTOS_send(), the functions will create the TX buffer.
They will return NULL in case there was not enough heap.

Thanks Hein, I will integrate and test it. It will take some time as I am currently solving some other topics in my application, but I will certainly do it and give feedback.