Hello, a weird one. I am using FreeRTOS+TCP on a RISC-V soft processor running on an FPGA at 100 MHz. The link runs over a UART at 921600 baud using SLIP encoding. The software is simple: it runs a TCP server supporting a single TCP connection; data sent to the server is written out over a different serial port, and data coming in from that serial port is written to the connected socket. Given that this link is ~1 Mbps physically from server to client and ~100 kbps physically from client to server, high TCP performance is not a requirement.
In the ideal lab environment (little latency) I was able to achieve ~600 kbps TCP throughput. This was considered “good enough”, but as we have transitioned to a more operational setup (400 ms RTT), the throughput has dropped to ~150 kbps. I am shocked at the drop in performance from just adding latency, as I would expect the windowing mechanism to compensate for it (we never get close to the bytes in flight equaling the TX window/buffer size). Based on the Wireshark captures and time references, it almost seems as if the stack is waiting for the ACK before creating/sending the next packets. I have attached the “good enough” Wireshark capture as a reference, but unfortunately I cannot attach the capture from the more operational-like setup.
I guess the ask is whether the issue stands out to anyone, or whether I am misunderstanding TCP. Looking at the time references, there seems to be significant time spent creating these TX packets. I would expect (based on others’ reported throughput) that it would be much faster, and also that the 600 kbps would only drop if we were hitting the maximum window size.
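As a rough sanity check on that expectation, the bandwidth-delay product at the operational RTT works out to

$$ R \times \mathrm{RTT} = 600\,\mathrm{kbit/s} \times 0.4\,\mathrm{s} = 240\,\mathrm{kbit} \approx 30\,\mathrm{kB}, $$

so sustaining ~600 kbps over a 400 ms RTT needs roughly 30 kB of unacknowledged data in flight, and the TX window/buffer would have to be at least that large.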
I am also using heap_3.c and Buffer Allocation scheme 2. The IP task is the highest-priority task and should be receiving about 5 KB at a time from the sender task.
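For reference, the relevant FreeRTOSIPConfig.h options look roughly like the sketch below (the macro names are the standard FreeRTOS+TCP ones, but the values shown are illustrative rather than my exact settings):

/* FreeRTOSIPConfig.h - illustrative values, not the exact project settings. */

/* IP task runs at the highest task priority in this project. */
#define ipconfigIP_TASK_PRIORITY                   ( configMAX_PRIORITIES - 1 )

/* Enable TCP sliding windows so more than one segment can be outstanding. */
#define ipconfigUSE_TCP_WIN                        ( 1 )

/* Pool of network buffer descriptors shared by the RX and TX paths
   (BufferAllocation_2.c allocates the actual buffers from the heap). */
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS     ( 60 )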
How are the RX packets handled? I believe you have a separate handler task that forwards the RX packets from the interface to the IP task? If so, ideally that task should have a higher priority than the IP task.
This might be a possible reason why ACKs are delayed.
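Something along these lines when creating that handler task - a sketch only, with a placeholder task name; the key point is that its priority sits above ipconfigIP_TASK_PRIORITY (which may mean lowering the IP task by one notch if it currently sits at configMAX_PRIORITIES - 1):

/* Sketch: create the UART/SLIP RX handler above the IP task's priority so
   received frames (and therefore ACKs) reach the stack without waiting behind
   the IP task. prvSlipRxHandlerTask is a placeholder for the existing handler. */
xTaskCreate( prvSlipRxHandlerTask,          /* Task function. */
             "SLIPRx",                      /* Task name. */
             configMINIMAL_STACK_SIZE * 2,  /* Stack depth in words. */
             NULL,                          /* No parameters. */
             ipconfigIP_TASK_PRIORITY + 1,  /* One above the IP task. */
             NULL );                        /* Handle not needed. */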
Hey @tony-josi-aws, thanks for the reply. I will test a build tomorrow with the RX handler task set to a higher priority to see if I can achieve even higher throughput. The good news is that in the short time since my original post I got the high-latency environment to produce the nominal 600 kbps, which should be “good enough” (I would note that anything north of 850 kbps is likely unobtainable because of the link).
After staring at the Wireshark capture long enough and using the FreeRTOS API to see the TX queue space, it was clear that we were hitting the max TX queue length. After dramatically increasing the TX window/buffer size, performance is back up!
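For anyone finding this later, a minimal sketch of how the per-socket buffer and window sizes can be raised with FREERTOS_SO_WIN_PROPERTIES - the sizes below are placeholders rather than the values we ended up using, xServerSocket is a placeholder for the server socket, and the option has to be applied before the connection is established:

/* Sketch: enlarge the TX/RX stream buffers and windows for a socket.
   The values below are illustrative only. */
WinProperties_t xWinProps;

xWinProps.lTxBufSize = 16 * ipconfigTCP_MSS;   /* TX stream buffer size in bytes. */
xWinProps.lTxWinSize = 8;                      /* TX window size in units of MSS. */
xWinProps.lRxBufSize = 4 * ipconfigTCP_MSS;    /* RX stream buffer size in bytes. */
xWinProps.lRxWinSize = 2;                      /* RX window size in units of MSS. */

FreeRTOS_setsockopt( xServerSocket,
                     0,
                     FREERTOS_SO_WIN_PROPERTIES,
                     ( void * ) &xWinProps,
                     sizeof( xWinProps ) );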
I will still try your suggestion to see if I can squeeze in some more performance. Thanks!
It looks like you took the PCAP at the host 192.168.1.2, is that true?
Would it be possible to record a PCAP at the other end?
I have always found that when I look at the host side, my DUT seems to be awfully slow, but when I create a PCAP file on the DUT itself, the remote host seems to be the slow one.
Unfortunately, with our setup we are not able to capture at the DUT (that would make debugging a lot easier). The device is deeply embedded with no way to sniff the link.
We are running into another unintended side effect of increasing the window size. During a TCP download from the DUT, at some point it stops working and we are no longer able to ping the device (ICMP echo request). The device isn’t “broken” (it is still retransmitting the TCP packets), but obviously the RX side is not working, so it is not processing the ACKs or ICMP requests. When I revert back to the lower window sizes, we do not see this issue.
I’m not ruling out an issue in our receive logic, but I was hoping someone might be able to explain the relationship between network buffers and the window size. It seems to me that we are not handling some memory or config item properly. Is it possible that we are exhausting all network buffers with the increased window size and are then no longer able to process the RX packets? Could my event queue be too small? Etc.
I think that your intuition is correct: when increasing the WIN size, more network buffers are needed for TCP.
There is a function vPrintResourceStats() that you could call at a regular interval, it prints the availability of the most important resources. I normally call it from the main loop in the network interface task.
The function vPrintResourceStats() needs a working implementation of FreeRTOS_printf().
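A minimal sketch of what that can look like, assuming FreeRTOS_printf() is routed to whatever output the target has (my_uart_printf is a placeholder, and the one-second delay is arbitrary):

/* In FreeRTOSIPConfig.h, map the stack's logging macro to an available printf:
      #define ipconfigHAS_PRINTF    1
      #define FreeRTOS_printf( X )  my_uart_printf X
*/

#include "FreeRTOS.h"
#include "task.h"
#include "FreeRTOS_IP.h"

/* Sketch of the network interface task's main loop, reporting resource usage
   once per second; the actual SLIP receive handling is elided. */
static void prvNetworkInterfaceTask( void *pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        /* ... handle the SLIP UART here ... */

        /* Prints the network buffer / heap minima when they change. */
        vPrintResourceStats();
        vTaskDelay( pdMS_TO_TICKS( 1000 ) );
    }
}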
Quick question - in FreeRTOS+TCP, when setting the window size and buffer size, where does the memory come from? Is it from ipconfigIP_TASK_STACK_SIZE_WORDS, or is it malloc’d from the heap when a connection is made?
The interesting part about this issue is that I never see my handler task (the one that sends packets to the IP task) set any of the “error” LEDs. I have this task set an LED if it ever has to drop a packet because it could not get a network buffer, or drops one for any other reason. See code below.
I also reduced my TX window size to 25 packets and RX to 5, and increased the number of network buffers to 60, but still ran into the same issue. So I am not sure the issue is depleting network buffers… Any thoughts?
I can start working on getting vPrintResourceStats() implemented; it is just a little more involved to break out the print messages to a peripheral in its current state.
// Allocate a network buffer large enough for the received frame.
// Note: xStackRxEvent is assumed to be initialised elsewhere with
// eEventType = eNetworkRxEvent.
pxNetworkBuffer = pxGetNetworkBufferWithDescriptor( rx_buf_len, 100 );

// Only proceed if a buffer was obtained
if( pxNetworkBuffer != NULL )
{
    // Copy the received data into the network buffer
    memcpy( pxNetworkBuffer->pucEthernetBuffer, rx_buf, rx_buf_len );
    pxNetworkBuffer->xDataLength = rx_buf_len;

    // Determine whether the frame needs to be processed
    if( eConsiderFrameForProcessing( pxNetworkBuffer->pucEthernetBuffer ) == eProcessBuffer )
    {
        // Point the event at the network buffer
        xStackRxEvent.pvData = ( void * ) pxNetworkBuffer;

        // Send the event to the IP task
        if( xSendEventStructToIPTask( &( xStackRxEvent ), 100 ) == pdFALSE )
        {
            vReleaseNetworkBufferAndDescriptor( pxNetworkBuffer );
            iptraceETHERNET_RX_EVENT_LOST();
            GPIO_set_output( &g_gpio_out, LED, GPIO_DRIVE_HIGH );
        }
        else
        {
            // Track the RX event
            iptraceNETWORK_INTERFACE_RECEIVE();
        }
    }
    else
    {
        // Frame not wanted - release the buffer
        vReleaseNetworkBufferAndDescriptor( pxNetworkBuffer );
        GPIO_set_output( &g_gpio_out, LED, GPIO_DRIVE_HIGH );
    }
}
else
{
    // No network buffer available - record the lost RX event
    iptraceETHERNET_RX_EVENT_LOST();
    GPIO_set_output( &g_gpio_out, LED, GPIO_DRIVE_HIGH );
}