So, last update and probably the final one here. I had to slightly modify the file “FreeRTOS_Sockets.c” to make 100% use of the heap TCP buffer instead of my secondary one. If I set lTxBufSize to my payload size (1400) minus 4, the buffer size was rounded because of the MSS size, see “FreeRTOS_Sockets.c” row 1722:
I have commented out the rounding and now the TCP Tx Buffer is always a multiple of my payload size and I never have to get into my secondary buffer (still there in case there is some issue).
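To illustrate why the rounding gets in the way, here is a minimal sketch of the arithmetic only. It is not the code from FreeRTOS_Sockets.c: `round_up_to_multiple()` and `leftover_bytes()` are hypothetical helpers, and the MSS of 1460 bytes is an assumption (a common value on Ethernet), not a number taken from this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the library's code: round a up to the next
 * multiple of d. */
static uint32_t round_up_to_multiple( uint32_t a, uint32_t d )
{
    assert( d != 0u );
    return d * ( ( a + d - 1u ) / d );
}

/* Bytes left over when a buffer of ulBufferSize bytes is cut into whole
 * payloads of ulPayload bytes. Zero means the payloads tile the buffer. */
static uint32_t leftover_bytes( uint32_t ulBufferSize, uint32_t ulPayload )
{
    assert( ulPayload != 0u );
    return ulBufferSize % ulPayload;
}
```

With the assumed MSS of 1460, a request of 1400 - 4 = 1396 bytes would round up to 1460, which leaves 60 bytes that do not fit a whole 1400-byte payload. Without the rounding, the size stays a multiple of the payload and nothing is left over.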
I now get a 72 Mbps TCP stream, and considering the high load (the 20 µs ISRs) I think that is quite fine for my application. I will eventually do a benchmark against lwIP, but for now I am really satisfied and I will move on to the areas of my application that I still have to improve.
Great achievement
I agree with you that you’re probably on the edge of your system with the pretty high interrupt rate of the data source. I would be surprised if you’d get comparable performance with lwIP.
By the way, I’ve modified FreeRTOS_sockets.c/.h and added an advance function that helps me fully use the ETH buffer in the heap together with FreeRTOS_get_tx_head:
/**
 * @brief Get a direct pointer to the first element of the buffer.
 *
 * @param[in] xSocket: The socket owning the buffer.
 *
 * @return First element of the circular transmit buffer if all checks pass. Or else, NULL
 *         is returned.
 */
uint8_t * FreeRTOS_get_first_buff_element( ConstSocket_t xSocket )
{
    uint8_t * pucReturn = NULL;
    const FreeRTOS_Socket_t * pxSocket = ( const FreeRTOS_Socket_t * ) xSocket;
    StreamBuffer_t * pxBuffer = NULL;

    /* Confirm that this is a TCP socket before dereferencing structure
     * member pointers. */
    if( prvValidSocket( pxSocket, FREERTOS_IPPROTO_TCP, pdFALSE ) == pdTRUE )
    {
        pxBuffer = pxSocket->u.xTCP.txStream;

        if( pxBuffer != NULL )
        {
            pucReturn = &( pxBuffer->ucArray[ 0 ] );
        }
    }

    return pucReturn;
}
As I fill in the buffer progressively, I start writing my data at “FreeRTOS_get_tx_head() + Payload size” and I pass NULL to the FreeRTOS_Send function so that the payload which is already complete gets sent. When I see that the head is one payload away from the end, I call my function and I start writing data at the beginning of the buffer. FreeRTOS_Send with a NULL argument then sends the previous, already complete payload that lies at the end of the buffer.
Just my 2 cents, it might be worth considering getting this into the official code for advanced users. The speeds I get now are crazy (>80Mbps), but then other tasks stop working properly, so I actually have to stay at max 72Mbps for my application to work properly.
As I fill in the buffer progressively, I start writing my data in “FreeRTOS_get_tx_head() + Payload size”
So you create your own temporary HEAD pointer in the buffer? And I guess that you write packets of 140 bytes each?
I pass NULL to the FreeRTOS_Send() function to get the payload which is already complete get sent.
So now, without further copying, the IP-task sends like 1400 bytes?
When I see that the head is one payload away from the end, I call my function and I start writing data at the beginning of the buffer. FreeRTOS_Send with NULL argument then sends the previous payload already complete that lays at the end of the buffer.
I assume that you found that calling FreeRTOS_Send() for each small block ( 140 bytes ) is much slower than calling it for large blocks ( of 1400 bytes ) only?
The speeds I get now are crazy (>80Mbps)…
Very good, that is what I also found!
Just my 2 cents, might be worth considering getting this into the official code for the advance users.
Yes, I don’t mind adding that function. I would propose some minor (non-functional) changes:
/**
 * @brief Get a pointer to the first element of the TX stream buffer.
 *
 * @param[in] xSocket: The socket owning the buffer.
 *
 * @return First element of the circular transmit buffer if all checks pass. Or else, NULL
 *         is returned.
 */
uint8_t * FreeRTOS_get_tx_base( ConstSocket_t xSocket )
{
    uint8_t * pucReturn = NULL;
    const FreeRTOS_Socket_t * pxSocket = ( const FreeRTOS_Socket_t * ) xSocket;

    /* Confirm that this is a TCP socket before dereferencing structure
     * member pointers. */
    if( prvValidSocket( pxSocket, FREERTOS_IPPROTO_TCP, pdFALSE ) == pdTRUE )
    {
        StreamBuffer_t * pxBuffer = pxSocket->u.xTCP.txStream;

        if( pxBuffer != NULL )
        {
            pucReturn = pxBuffer->ucArray;
        }
    }

    return pucReturn;
}
I create my own temporary head pointer where I write chunks of 140 bytes (data gathered in one ISR call). I trigger FreeRTOS_Send when the payload is full → after 10 ISRs = 1400 bytes
Correct, I trigger FreeRTOS_Send() with NULL as the buffer pointer and 1400 as the length. The IP task then sends the payload which is already complete. At the same time I start constructing my next payload at HEAD + one payload (1400 bytes), or at BASE if my HEAD is one payload away from the end of the buffer.
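The pointer bookkeeping described above can be sketched as a small host-side model. This is only an illustration under stated assumptions: `TxModel_t`, `vModelInit()` and `xModelChunkWritten()` are hypothetical names, offsets are counted from the buffer base (what FreeRTOS_get_tx_base() would return), and the model does not check whether the region being overwritten has already been transmitted and released by the stack.

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_SIZE      140u    /* bytes gathered in one ISR call (from the thread) */
#define PAYLOAD_SIZE    1400u   /* bytes handed over per send call (from the thread) */

/* Hypothetical model of the writer side; not part of FreeRTOS+TCP. */
typedef struct
{
    size_t uxBufferSize;  /* TX stream size, a multiple of PAYLOAD_SIZE */
    size_t uxWriteOffset; /* offset from the base where the next chunk goes */
} TxModel_t;

static void vModelInit( TxModel_t * pxModel, size_t uxBufferSize )
{
    /* At least two payloads are needed: one being sent, one being filled. */
    assert( uxBufferSize >= ( 2u * PAYLOAD_SIZE ) );
    assert( ( uxBufferSize % PAYLOAD_SIZE ) == 0u );
    pxModel->uxBufferSize = uxBufferSize;
    pxModel->uxWriteOffset = 0u;
}

/* Account for one chunk written at uxWriteOffset. Returns 1 when this chunk
 * completes a payload. That is the point where the real code would call
 * FreeRTOS_send() with a NULL buffer pointer and a length of PAYLOAD_SIZE. */
static int xModelChunkWritten( TxModel_t * pxModel )
{
    int xPayloadComplete = 0;

    pxModel->uxWriteOffset += CHUNK_SIZE;

    if( ( pxModel->uxWriteOffset % PAYLOAD_SIZE ) == 0u )
    {
        xPayloadComplete = 1;

        if( pxModel->uxWriteOffset == pxModel->uxBufferSize )
        {
            /* The payload just completed lies at the end of the buffer,
             * so the next one is built starting at the base. */
            pxModel->uxWriteOffset = 0u;
        }
    }

    return xPayloadComplete;
}
```

Because the buffer size is a multiple of the payload size, a payload never straddles the wrap point, which is why the rounding discussed earlier in the thread matters for this scheme.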
Thanks for the changes, I have taken them over into my code as well
Very good of you, I will change the code accordingly.
As you see, I never implemented the 'retransmits' feature.
I analysed the protocol by running two iperf instances, talking with each other. There I saw the JSON expression: sender_has_retransmits:-1, which I just copied.
“Retr” is only printed when the -R option is used: the embedded server sends data.
Thanks Hein, I will integrate and test it. It will take some time as I am currently solving some other topics in my application, but I will certainly do it and give feedback
Hi Hein, sorry for giving my feedback so late, the modifications work well. Because I fill the FreeRTOS TCP buffer directly, I just had to make some changes for my application in freertos_sockets.c (line 1760):
btw, I believe this has not been mentioned here, so for completeness’ sake: On the Cortex M, accessing internal memory is dramatically faster than accessing external memory, which the driver apparently takes into consideration. Anyone who relocates the dynamic heap to external memory may experience a drastic performance degradation.
If you are looking for more ways to fine-tune your system, you may want to inspect your map file to see if any data accessed by the buffer chain resides in external memory and, if so, find a way to relocate it.
Along the same lines, you may also want to experiment with speed optimization if you have ample flash left (needless to say, this will not affect anything transferred by DMA). Speed-optimized code, even more so when run from internal flash, can bloat to a multiple of the size of size-optimized code but may yield throughput improvements of a comparable factor.