+TCP window update

Any thoughts about why the window doesn’t reflect reality here? For example, apply the filter
tcp.stream eq 1
in the Wireshark capture: from frame 75 to 77 the window drops from 5276 to 1460, then stays there until it zeroes out. 10.0.0.53 is the FreeRTOS+TCP device (an STM32F769NI).

I feel fairly confident that I am servicing this socket, so I’m a little unclear on why the window is acting this way. This is version 2.3.3; 2.2.0 did essentially the same thing.

  xWinProps.lTxBufSize = 4 * ipconfigTCP_MSS; /* Unit: bytes */
  xWinProps.lTxWinSize = 2; /* Unit: MSS */
  xWinProps.lRxBufSize = 16 * ipconfigTCP_MSS; /* Unit: bytes */
  xWinProps.lRxWinSize = 8; /* Unit: MSS */
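
For completeness, these properties are normally handed to the socket with the FREERTOS_SO_WIN_PROPERTIES option before connecting; a minimal sketch, assuming a freshly created TCP socket and omitting error handling:

    #include "FreeRTOS.h"
    #include "FreeRTOS_Sockets.h"

    static void vApplyWindowProperties( Socket_t xSocket )
    {
        WinProperties_t xWinProps;

        xWinProps.lTxBufSize = 4 * ipconfigTCP_MSS;  /* Unit: bytes */
        xWinProps.lTxWinSize = 2;                    /* Unit: MSS */
        xWinProps.lRxBufSize = 16 * ipconfigTCP_MSS; /* Unit: bytes */
        xWinProps.lRxWinSize = 8;                    /* Unit: MSS */

        /* Apply the buffer and window sizes to the socket. */
        FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_WIN_PROPERTIES,
                             &xWinProps, sizeof( xWinProps ) );
    }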

EM.filtered.zip (138.2 KB)

Hi Erik,

As I’m sure you know, the window size is essentially the shock absorber between what the peer stuffs into the socket’s receive buffer and what the target application removes from it. So if the peer stuffs data into the connection faster than the application can remove it, the window shrinks.
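
A conceptual sketch of that relationship (illustrative only, not the stack’s actual code; the type and function names are made up):

    #include <stddef.h>

    typedef struct
    {
        size_t uxBufferSize;   /* total RX buffer size in bytes */
        size_t uxBytesQueued;  /* received bytes not yet read with recv() */
    } RxBufferState_t;

    /* The advertised window roughly tracks the free space left in the
     * RX buffer, capped by the configured window size. */
    static size_t uxAdvertisedWindow( const RxBufferState_t * pxState,
                                      size_t uxMaxWindow )
    {
        size_t uxFree = pxState->uxBufferSize - pxState->uxBytesQueued;

        return ( uxFree < uxMaxWindow ) ? uxFree : uxMaxWindow;
    }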

When you write that you are “servicing the socket,” are you sure that you service it quickly enough? Or could it be that the application gets starved out by other tasks (possibly even the network task receiving the data) and thus fails to remove the data from the socket in time?

Changing the ipconfigTCP_WIN_SEG_COUNT from 64 to 256 has changed this behaviour.

I’m not completely satisfied not understanding why.

The process isn’t really CPU starved; it’s essentially on a recv() timeout of 10 ms. This is my CPU task list; symetrix is the task this socket runs in.

Task Name       | CPU% | Stack
MzDiscovery     | 00.0 | 87
guiTasks        | 15.8 | 199
Cmd             | 00.0 | 267
Tcp putty       | 00.0 | 313
displayL        | 00.4 | 129
displayM        | 00.4 | 129
displayR        | 00.7 | 129
faders          | 00.7 | 225
Logging         | 00.0 | 151
qspiTasks       | 00.0 | 201
HttpWork        | 00.0 | 493
Config          | 00.0 | 285
controlTasks    | 00.4 | 239
dsp-manage      | 00.0 | 139
symetrix        | 01.7 | 249
Wireshark       | 00.0 | 229
Sntp            | 00.0 | 97
IDLE            | 79.2 | 91

It doesn’t make sense to me that the window doesn’t really follow the byte count of what came in. It changes all of a sudden, so apparently there is some window housekeeping mechanism rather than the real-time window I would expect.

You might want to monitor your timing behavior with Tracealyzer or a comparable utility; that normally gives you the clue.

By “starve” I mean that the task that does the recv() calls gets starved out by other tasks. The recv() timeout does not account for any of that. What are your task priorities? If the network task has a higher priority than your processing task, it may be the one that steals the cycles.

could it be that the application gets starved out by other tasks (possibly even the network task receiving the data) and thus fails to remove the data from the socket in time?

Thanks @RAc for these well-phrased explanations. I was thinking in the same direction. The value of “win” is measured just before sending a packet. When it is zero, it means that the reception stream buffer is still full.
You can also see that a while later (300 ms), win has increased again.

Changing the ipconfigTCP_WIN_SEG_COUNT from 64 to 256 has changed this behaviour.

The macro with the cryptic name ipconfigTCP_WIN_SEG_COUNT has received very little attention.
A value of 64 means that your application can have at most 64 outstanding TCP segments.
When the application is sending, it must keep track of which segments have been acknowledged, and which ones haven’t.
When receiving, it must check if there are segments missing. If so, it will store the received segments and send a selective ACK.

The function xTCPWindowNew() creates such a segment descriptor. When the stack runs out of segment descriptors, it will print a warning: “Error: all segments occupied”. This is like a task’s stack: there should never be a shortage.
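
Raising the limit is a one-line change in FreeRTOSIPConfig.h; a sketch (256 being the value Erik moved to):

    /* Each descriptor tracks one outstanding TCP segment: unacknowledged
     * TX data or out-of-order RX data, counted across all sockets. */
    #define ipconfigTCP_WIN_SEG_COUNT    256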

Did you see the above logging?

I just measured: two parallel FTP sessions were sending data. As the reception on a LAN is perfect, no segments are needed for reception: no packet is out-of-order.
For transmission there was a maximum of 30 outstanding segments. That is determined by xWinProps.lTxWinSize, the transmission window size.

And finally, have you checked the task priorities? It is preferable to assign these priorities:

  1. Higher : The EMAC deferred handler task ( niEMAC_HANDLER_TASK_PRIORITY )
  2. Normal : The IP-task ( ipconfigIP_TASK_PRIORITY )
  3. Lower : The tasks that make use of TCP/IP

All other tasks are free to select their priority.
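
One possible set of defines that follows this ordering (illustrative values only; mainTCP_APP_TASK_PRIORITY is a made-up name for whatever tasks call the TCP/IP API):

    #define niEMAC_HANDLER_TASK_PRIORITY    ( configMAX_PRIORITIES - 1 )  /* highest */
    #define ipconfigIP_TASK_PRIORITY        ( configMAX_PRIORITIES - 2 )  /* below the EMAC task */
    #define mainTCP_APP_TASK_PRIORITY       ( configMAX_PRIORITIES - 3 )  /* tasks that use TCP/IP */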

Maybe you want to share a PCAP of the TCP traffic, either here or privately?

@hein I did attach a pcap above in that zip. I totally get that I cannot let that process starve. In this case, I can only present circumstantial evidence that it is not starved, as this problem only happens on the customer’s network, even though I have the exact same equipment. BTW, these captures are taken at the FreeRTOS NIC.

If it was indeed starved, then it seems like every incoming packet should decrease the window size, which it does not. Note in the pcap that the window size decreases by approximately double the MSS from one packet to the next after a long period of staying the same (frames 75 and 77). In my way of thinking, this would point to the window size calculation being starved in the IP task. Also note that the window size does not update for almost 200 ms at frame 93.

Now, if you are indeed correct about CPU starving, I would suggest that the stack should still handle this differently and make instant window changes rather than using the method it does now. In my situation, this is a very low-latency network, with typical round-trip times well under 5 ms, and my gut feeling is that moving from 64 to 256 only masks the problem in that it allows the subsystem time to do housekeeping on the used segments.

Here are the application priorities

#ifndef PRIORITIES_H
#define PRIORITIES_H

// FreeRTOS priorities, highest = 10

#define PRIORITY_RTOS_QSPI      (configMAX_PRIORITIES - 2)
#define PRIORITY_RTOS_IP        (configMAX_PRIORITIES - 2)
#define PRIORITY_RTOS_MAC       (configMAX_PRIORITIES - 1)
#define PRIORITY_RTOS_DISCOVERY ( 1 )
#define PRIORITY_RTOS_CMD       ( 1 )
#define PRIORITY_RTOS_PUTTY     ( 1 )
#define PRIORITY_RTOS_DISPLAY   (configMAX_PRIORITIES - 1)
#define PRIORITY_RTOS_GUI       (configMAX_PRIORITIES - 3)
#define PRIORITY_RTOS_CONTROL   ( 1 )
#define PRIORITY_RTOS_DSPMAN    ( 1 )
#define PRIORITY_RTOS_DSP       ( 1 )
#define PRIORITY_RTOS_LOGS      ( 2 )
#define PRIORITY_RTOS_FADERS    (configMAX_PRIORITIES)
#define PRIORITY_RTOS_INPUT     ( 1 )
#define PRIORITY_RTOS_IPERF     ( 2 )

//Interrupt priorities Highest = 0
#define PRIORITY_DISPLAY_DMA    (3)
#define PRIORITY_DISPLAY_DMA2D  (3)
#define PRIORITY_USART1_RX      (2)
#define PRIORITY_USART1_TX_DMA  (3)
#define PRIORITY_TIMER1_PWM     (2)
#define PRIORITY_INPUT_I2C      (3)
#define PRIORITY_QUADSPI_INT    (3)
#define PRIORITY_QUADSPI_DMA    (3)
#define PRIORITY_ETH_INT        (3)
#define PRIORITY_EUIMAC         (5)

#endif

Only commenting on the code in the last post, rather than the issue being discussed:

Note that the highest priority a task can have is ( configMAX_PRIORITIES - 1 ), as the lowest priority is 0 (rather than 1). If you attempt to set a higher priority, the actual priority is capped to ( configMAX_PRIORITIES - 1 ). So in your case the PRIORITY_RTOS_FADERS, PRIORITY_RTOS_DISPLAY and PRIORITY_RTOS_MAC tasks all have the same priority.
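
A minimal sketch of that clamping rule (a hypothetical helper for illustration, not the kernel’s own code, which applies the same check internally when a task is created):

    #include "FreeRTOS.h"
    #include "task.h"

    /* Any requested priority at or above configMAX_PRIORITIES ends up
     * as the highest valid priority, configMAX_PRIORITIES - 1. */
    static UBaseType_t uxClampPriority( UBaseType_t uxRequested )
    {
        if( uxRequested >= ( UBaseType_t ) configMAX_PRIORITIES )
        {
            uxRequested = ( UBaseType_t ) configMAX_PRIORITIES - 1U;
        }

        return uxRequested;
    }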

What happens if you do not filter the trace, Erik?

It is possible that at the customer’s site there is a broadcast storm interfering with the user data stream, keeping the TCP stack busy during short bursts. I’ve seen this many, many times.

These storms may be due to legitimate uses (e.g. network management software that aggressively scans the entire subnet), misconfigured routers, or malicious software.

This ended up being what people suggested to me. I went to prove otherwise, and ended up seeing that I was waiting on a queue elsewhere instead of servicing the socket.

I do still feel like I don’t understand why the windowing is doing what it is.

I believe it is fairly specification conformant:

There are recommendations on page 43 about the window management. To me that looks consistent with the behavior you see.

Page 43 gives room for deferring window updates going up, but I don’t see any mention of deferring updates going down.

The problem with the way it is done in the above pcap is that the sender could have sent a bunch of packets based on the 5267 window in frame 75, when in reality the window appears to have been smaller than that, based on it dropping to 1460 in frame 77. Do you mind explaining why the stack did it this way?

Apparently I can’t (assuming you addressed me) because I wasn’t involved in the development of the network stack, sorry…

I am aware that other network stacks do gradually decrease window sizes (at least I remember having seen that behavior). It should be straightforward enough to locate the responsible piece of code. Since most of my customers use lwIP and not +TCP, I would have to look at the code base from scratch, so I’m not inclined to do so myself right now. Possibly code comments hint at the rationale behind it.

Hello and thanks for this interesting discussion!

When I look at the TCP statistics, I don’t see a big problem:

[image: macro_level_red]

And when I zoom into the first 120 samples, I see this:

[image: micro_level]

which is indeed around packet 75.

Note that the TCP peer must do some calculation: when it sees win=5267, it must subtract the packets that are on their way, so the “real win” equals “win - outstanding”. For example, with two full-sized segments (2 x 1460 bytes) still in flight, the effective window would be 5267 - 2920 = 2347 bytes.
Actually the receiving end is reporting: “at the moment I acknowledged packet 18741, I had a buffer space of 5267”.

Please note also that the time stamps in Wireshark may not always agree with reality. Under Windows, it might use a time clock that runs at 18.2 Hz ( good-old MSDOS ). It just estimates the high-precision time-stamps.

Often while testing I created a PCAP-1 under Windows, and also a PCAP-2 in the FreeRTOS application ( using a fast Zynq with a disk ).
Looking at PCAP-1, I concluded that the Zynq was very slow.
Looking at PCAP-2, I concluded that Windows had huge delays!

While developing FreeRTOS+TCP, I looked at ( the behaviour of ) other TCP/IP stacks, especially Linux and Windows. And yes, I saw slow-start and other tricks to handle congestion, Nagle’s algorithm, and more. We studied PCAPs taken both from a LAN and from a poor-quality Internet connection with a high number of dropped packets.

These studies resulted in two features related to TCP flow-control:

  1. A low- and high-water system. The value of “win” can drop to zero once the low-water mark has been reached. It becomes non-zero again as soon as the high-water mark has been reached.
    ( see FREERTOS_SO_SET_LOW_HIGH_WATER and the sketch after this list ).

  2. A stop-and-go system: this is the manual version. It was developed for audio streaming. As soon as the DAC driver ( Digital-to-Analogue Converter ) has enough data, it sends a STOP. As soon as the DAC buffer drops below 50%, it sends a START again.
    ( see FREERTOS_SO_STOP_RX ).
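
A minimal usage sketch of both options, assuming a connected TCP socket and that the option names and the LowHighWater_t layout match the shipping FreeRTOS_Sockets.h; the water-mark values are arbitrary examples:

    #include "FreeRTOS.h"
    #include "FreeRTOS_Sockets.h"

    static void vConfigureFlowControl( Socket_t xSocket )
    {
        /* 1. Automatic low/high water marks: advertise win = 0 once the
         *    free RX-buffer space drops below uxLittleSpace, and re-open
         *    the window once at least uxEnoughSpace is free again. */
        LowHighWater_t xWaterMarks;

        xWaterMarks.uxLittleSpace = 2 * ipconfigTCP_MSS;
        xWaterMarks.uxEnoughSpace = 8 * ipconfigTCP_MSS;
        FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_SET_LOW_HIGH_WATER,
                             &xWaterMarks, sizeof( xWaterMarks ) );

        /* 2. Manual stop-and-go: pass pdTRUE to announce win = 0, and
         *    pdFALSE later to let the peer send again. */
        BaseType_t xStop = pdTRUE;
        FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_STOP_RX,
                             &xStop, sizeof( xStop ) );
    }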

As these are non-standard features, they are hardly documented. I will document them in more detail and post that first here, then on freertos.org.

FreeRTOS+TCP follows the same philosophy as the kernel: it is kept as simple as possible in order to:

  • keep the memory footprint small.
  • keep it easy to understand.
  • keep it robust and fast.
  • facilitate debugging of your TCP/IP application.

I just described how to use the two flow-control mechanisms in a PDF file: plus_tcp_flow_control.zip (82.0 KB)

What triggers the win recalc?

What triggers the win recalc?

There is no recalculation; there are win updates.
They happen when the socket receives data from the peer and when the application reads from the TCP socket. Both events may lead to a win update.

Erik wrote:

This ended up being what people suggested to me. I went to prove otherwise, and ended up seeing that I was waiting on a queue elsewhere instead of servicing the socket.

Have you made a new PCAP since then? And does it look better now?

I do still feel like I don’t understand why the windowing is doing what it is.

I just did a “tcptrace” analysis of your PCAP file, very interesting:

[image: tcptrace]

It shows the data that are transferred ( about 20,000 bytes/sec ), and also the estimated free buffer space ( “win” ) at the receiving end.

At first “win” is 11680 all the time: that means that the application picks up every packet immediately. This corresponds with your declaration:

    xWinProps.lRxWinSize = 8; /* Unit: MSS, 8 x 1460 = 11680 */

Then the RX stream buffer starts filling up, because the application is not reading from it. Luckily the buffer is bigger than “win”:

    xWinProps.lRxBufSize = 16 * ipconfigTCP_MSS; /* Unit: bytes */

Slowly win decreases until it stays at 1460, and then it becomes zero for a short while. At that moment, the socket has 23360 bytes stored in the RX buffer.

Finally at 1.8 seconds, the application reads from the RX buffer, and everything becomes normal again.
After that the application reads all data immediately, and “win” stays at 11680.

Thanks for explaining this.

I think what confused me is that it seems like it should look like my red line here.

I’ve made new pcaps, and the window never goes down now. Basically the problem was the initial burst, which fired off queues to the UI; the queues filled up, among other things. It took fiddling with Tracealyzer and priorities, among other optimizations, to iron this out.

About packet order: we spent a bunch of time on this project trying to figure out why one of the port-mirrored switches kept showing out-of-order packets. In the end we rejected this as switch-related, as we had proof from both ends showing otherwise. One advantage of integrating Wireshark capture into NetworkInterface.c is that you know exactly how +TCP sees everything.