FreeRTOS+TCP slow to accept TCP connections

I have a TCP server demo running on an STM32H7 Nucleo board. It seems to be working correctly for the most part: I can flood-ping with no packet loss, ARP works correctly, and so on. But for some reason, when I try to connect to the stack over TCP, it takes a long and variable time to reply with SYN-ACK: anywhere from a few ignored SYN packets to perhaps ten of them, before the stack suddenly ACKs and the connection opens. Once the TCP connection is established, everything seems fine again. Here is a packet trace from tcpdump; 192.168.15.1 is a Linux PC and 192.168.15.2 is the Nucleo board running FreeRTOS+TCP:

09:52:03.380033 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.15.2 tell 192.168.15.2, length 46
09:52:10.363374 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.15.2 tell 192.168.15.1, length 28
09:52:10.363685 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.15.2 is-at 00:11:22:33:44:55, length 50
09:52:10.363696 IP (tos 0x0, ttl 64, id 46606, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xe57b), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797594838 ecr 0,nop,wscale 7], length 0
09:52:11.365802 IP (tos 0x0, ttl 64, id 46607, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xe190), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797595841 ecr 0,nop,wscale 7], length 0
09:52:12.389672 IP (tos 0x0, ttl 64, id 46608, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xdd90), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797596865 ecr 0,nop,wscale 7], length 0
09:52:13.413779 IP (tos 0x0, ttl 64, id 46609, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xd990), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797597889 ecr 0,nop,wscale 7], length 0
09:52:14.437746 IP (tos 0x0, ttl 64, id 46610, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xd590), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797598913 ecr 0,nop,wscale 7], length 0
09:52:15.462046 IP (tos 0x0, ttl 64, id 46611, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xd190), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797599937 ecr 0,nop,wscale 7], length 0
09:52:17.511420 IP (tos 0x0, ttl 64, id 46612, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xc98e), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797601987 ecr 0,nop,wscale 7], length 0
09:52:21.541423 IP (tos 0x0, ttl 64, id 46613, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xb9d0), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797606017 ecr 0,nop,wscale 7], length 0
09:52:29.989755 IP (tos 0x0, ttl 64, id 46614, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0x98d0), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797614465 ecr 0,nop,wscale 7], length 0
09:52:33.218868 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.15.2 tell 192.168.15.2, length 46
09:52:35.109489 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.15.2 tell 192.168.15.1, length 28
09:52:35.109871 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.15.2 is-at 00:11:22:33:44:55, length 50
09:52:46.373435 IP (tos 0x0, ttl 64, id 46615, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0x58d0), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797630849 ecr 0,nop,wscale 7], length 0
09:53:03.060471 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.15.2 tell 192.168.15.2, length 46
09:53:18.629464 IP (tos 0x0, ttl 64, id 46616, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [S], cksum 0x9f82 (incorrect -> 0xdacf), seq 2741833246, win 64240, options [mss 1460,sackOK,TS val 797663105 ecr 0,nop,wscale 7], length 0
09:53:18.630286 IP (tos 0x0, ttl 128, id 0, offset 0, flags [none], proto TCP (6), length 52)
    192.168.15.2.23 > 192.168.15.1.37436: Flags [S.], cksum 0x238c (correct), seq 536886496, ack 2741833247, win 2920, options [mss 1460,nop,wscale 0,nop,nop,sackOK], length 0
09:53:18.630400 IP (tos 0x0, ttl 64, id 46617, offset 0, flags [DF], proto TCP (6), length 40)
    192.168.15.1.37436 > 192.168.15.2.23: Flags [.], cksum 0x6dc9 (correct), seq 1, ack 1, win 502, length 0

and here is the debugging output from the stack, on my serial terminal:

FreeRTOS_AddEndPoint: MAC: 44-55 IPv4: c0a80f02ip
main: starting scheduler
prvIPTask started
PHY ID 7C130
TCP listen: in task func
TCP listen: socket created
TCP listen: sockopts set
xPhyReset: phyBMCR_RESET 0 ready
+TCP: advertise: 01E1 config 3100
Autonego ready: 00000004: full duplex 100 mbit high status
TCP listen: socket bound
TCP listen: in loop
pxEasyFit: ARP c0a80f01ip -> c0a80f02ip
ipARP_REQUEST from c0a80f01ip to c0a80f02ip end-point c0a80f02ip
pxEasyFit: ARP c0a80f01ip -> c0a80f02ip
ipARP_REQUEST from c0a80f01ip to c0a80f02ip end-point c0a80f02ip
Gain: Socket 23 now has 1 / 20 child me: 20004ad0 parent: 20003fb8 peer: 20004ad0
prvSocketSetMSS: 1460 bytes for c0a80f01ip port 37436
prvWinScaleFactor: uxRxWinSize 2 MSS 1460 Factor 0
TCP: passive 23 => 1.15.168.192 port 37436 set ESTAB (scaling 1)
prvAcceptWaitClient: client 20004ad0 parent 20003fb8
TCP listen: accepted! creating task

(The lines starting with “TCP listen:” are ones I added.) There is a minute-long delay between the ARP messages and the “Gain: Socket 23 now has 1 / 20 child” line, during which the stack ignores TCP SYN packets, from timestamp 09:52:10.4 until 09:53:18.6, when it finally ACKs and the connection is suddenly established. From then on there is essentially zero latency and everything is perfectly snappy.

Any idea what might be going on, and how to debug it? Thanks.

What’s the priority of the tasks in your system? Do you have any other tasks running at a priority higher than the FreeRTOS+TCP tasks that might block the network tasks?

I’m seeing this:

name            state   prio    stack   num
Green LED       X       1       272     1
Yellow LED      R       1       100     2
Red LED         R       1       100     3
IDLE            R       0       104     6
EMAC_STM32      B       6       272     7
IP-task         B       5       2274    5
TCP Debug Liste B       1       551     4

(The “LED” tasks are simple blinkys with varying delays. The task list is obtained and printed as part of the “green LED” task.)
So I’m not seeing anything unexpected here. The IP-task runs at the highest priority except for the EMAC task, and it is waiting on something for many seconds. During that time it is ignoring SYN packets, yet it will, for example, respond to an unlimited number of ICMP pings with practically zero latency (I see about 0.2 ms) and no dropped packets.

It does seem that other activity on the network “modulates” the deafness of the stack to TCP SYN packets. For example, if I run pings and then try to connect, it typically takes longer to SYN-ACK, and the connection often times out completely. When trying to connect without pings running, it usually takes between 30 and 60 seconds.

Is there some way to turn up the debugging verbosity so I can see incoming packets traversing the various processing stages inside the stack?

See here: FreeRTOS-Plus-TCP Configuration - FreeRTOS™.

So basically, make all of the trace macros print something? A bit of a chore, but probably worth it. I’ll report back next week with findings.

No, that is not what I meant. I wanted you to define ipconfigHAS_DEBUG_PRINTF and ipconfigHAS_PRINTF to 1.

Oh, but I’ve already done that, as you can see from the trace messages above. I was asking if there was any additional logging / verbosity that I could turn on, since the basic debugging output isn’t hinting at why the stack goes deaf for TCP SYN packets for a while. Maybe I need to add some debug output in the lowest level code in NetworkInterface.c…

Yes, that seems like a good idea. It would help us determine if the SYN packet was received by the stack in time or not.

Ok, so far I’ve found that every packet is causing a call to HAL_ETH_RxLinkCallback() in STM32 NetworkInterface.c. There, prvAcceptPacket() is called and returns pdTRUE as expected. So the packets are making it through the software filters, but a lot of TCP SYN packets are failing to progress beyond that point. (ICMP and ARP have no trouble.)

HAL_ETH_RxLinkCallback() does nothing else except manipulate the network buffer descriptors. I don’t understand the driver well enough to know what is supposed to happen next. I expected a function call of some sort to trigger the next stage of rx packet processing, but no. Where should I look to see if the packet is “coming out the other side”?

[Small update: I dug into the HAL driver (the one provided in the FreeRTOS+TCP distribution) and added some debug output in HAL_ETH_ReadData(), after HAL_ETH_RxLinkCallback() is called. There is a while loop which seems to execute multiple times, then eventually a call to ETH_UpdateDescriptor(), all of which seems to be happening on the ignored packets as well as the one which finally “clicks” and sets up the connection.]

Paging @htibosch :slight_smile:

Ugh, please disregard (maybe?) most of what I wrote above. I thought the problem was likely to be in the low-level code and spent too much time tracing through it when I should have started looking at the IP task itself.

I’ve managed to trace the problem to prvProcessEthernetPacket(), sort of… I added debugging output to prvSendRxEvent() and xSendEventStructToIPTask(), to see which event types were being queued for the IP task. Everything looked normal, i.e. there were eNetworkRxEvent events being queued for all of the TCP SYN packets that were subsequently ignored by the stack.

I then turned my attention to prvProcessEthernetPacket(), and added a FreeRTOS_debug_printf call just before these lines:

    /* Interpret the received Ethernet packet. */
    switch( pxEthernetHeader->usFrameType )

to show me the frame type. As soon as I added the debug printf, everything started working perfectly. I commented it out, and was back to the same symptoms as before. In fact, adding the debug printf anywhere inside prvProcessEthernetPacket(), even at the end, makes the stack behave correctly.

I’m not sure where to go from here. Am I seeing a weird race condition? I tried replacing the debug printf with a short busy-wait, and also with a vTaskDelay(2), and those did not “fix” the problem like the printf. I thought maybe it had to do with the serial port output (not sure why), so I disabled that part; the printf was then just filling a buffer and returning. And it still made the stack work. So I’m baffled. Any ideas?

I may need to remove the printfs and use GPIO debugging in prvProcessEthernetPacket(), assuming that doesn’t magically make it work like the printf does…

Can you try turning off any compiler optimizations? Do you have any other tasks running at the same priority as the IP task? Using Percepio Tracealyzer may be beneficial here to understand what is going on.

@mark03

As soon as I added the debug printf, everything started working perfectly.

Could it be that accidentally, your .ethernet_data memory is cacheable? And that your debugging statement tries to access the cached memory?

So this could be enough to solve the problem:

if( pxEthernetHeader->usFrameType )
{
    __NOP();
}

Just calling vTaskDelay(2) didn’t help, possibly because it didn’t try to access the cached memory?

Here is how I defined/used the memories, just as an example:

STM32H747XIHX_FLASH.ld
main.c

Note that AXI_RAM is used to store Ethernet packets.


@mark03, would it be possible to attach the source code of your application, minus the private code, if any?

I do have an STM32H747, and I would like to try to reproduce the problem.

Apologies for the slow reply; I am traveling and not working every day.

Compiler optimizations are already turned off (-O0).

@htibosch good guess on the caching, but when I tried that experiment (with the __NOP()) it didn’t make any difference the way the printf does. That is not conclusive, of course.

The project I’m working on does not allow me to share code, but if all else fails, I will see if I can put together a minimal example. Here is my MPU configuration:

ARM_MPU_SetRegion(
    ARM_MPU_RBAR(0, __eth_sec_start),
    ARM_MPU_RASR(1,               // disable instruction fetch
                 ARM_MPU_AP_FULL, // full access
                 1,               // TEX = 001 (see Table B3-13 in ARMv7M arch manual)
                 0,               // not shareable
                 0,               // not cacheable
                 0,               // not bufferable
                 0,               // sub-region disable field
                 ARM_MPU_REGION_SIZE_128KB)
);
ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk); // | MPU_CTRL_HFNMIENA_Msk);

and linker script:

MEMORY
{
  RAM_D1 (xrw)   : ORIGIN = 0x24000000, LENGTH =  512K
  FLASH   (rx)   : ORIGIN = 0x08000000, LENGTH = 1024K    /* Memory is divided. Actual start is 0x08000000 and actual length is 2048K */
  DTCMRAM (xrw)  : ORIGIN = 0x20000000, LENGTH =  128K
  RAM_ETH (xrw)  : ORIGIN = 0x30000000, LENGTH =  128K    /* 20jul25: carved out SRAM1 (128k) for Ethernet buffers */
  RAM_D2 (xrw)   : ORIGIN = 0x30020000, LENGTH =  160K
  RAM_D3 (xrw)   : ORIGIN = 0x38000000, LENGTH =   64K
  ITCMRAM (xrw)  : ORIGIN = 0x00000000, LENGTH =   64K
}

[…]

.eth_sec :
{
    . = ALIGN(32);  /* just in case---should check if the ETH periph has a requirement */
    PROVIDE_HIDDEN (__eth_sec_start = .);
    KEEP (*(SORT(.RxDescripSection)))
    KEEP (*(SORT(.TxDescripSection)))
    KEEP (*(SORT(.EthBuffersSection)))
    PROVIDE_HIDDEN (__eth_sec_end = .);    
} >RAM_ETH

and NetworkInterface.c:

#define niEMAC_TX_DESC_SECTION            ".TxDescripSection"
#define niEMAC_RX_DESC_SECTION            ".RxDescripSection"
#define niEMAC_BUFFERS_SECTION            ".EthBuffersSection"

[…]

static ETH_DMADescTypeDef xDMADescTx[ ETH_TX_DESC_CNT ] __ALIGNED( portBYTE_ALIGNMENT ) __attribute__( ( section( niEMAC_TX_DESC_SECTION ) ) );
static ETH_DMADescTypeDef xDMADescRx[ ETH_RX_DESC_CNT ] __ALIGNED( portBYTE_ALIGNMENT ) __attribute__( ( section( niEMAC_RX_DESC_SECTION ) ) );

[…]

void vNetworkInterfaceAllocateRAMToBuffers( NetworkBufferDescriptor_t pxNetworkBuffers[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ] )
{
    static uint8_t ucNetworkPackets[ ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ][ niEMAC_TOTAL_BUFFER_SIZE ] __ALIGNED( niEMAC_BUF_ALIGNMENT ) __attribute__( ( section( niEMAC_BUFFERS_SECTION ) ) );

[…]

Zooming out, it appears that “successful” STM32H7 ports use a variety of MPU settings, often copied from one another without, in most cases, a clear understanding of what they do or why. This made it difficult to know how to proceed. I have tried to keep things as simple as possible, with the result that my version again differs from all previous ones :slight_smile: and therefore it’s quite possible something is wrong. If so, I hope we can pinpoint exactly what is wrong, instead of doing “try this, it worked for me” style debugging! I’d certainly like to understand it better.

That said, I would be surprised if a faulty MPU config explained the symptoms I’m seeing. Why does the problem affect only TCP packets, not ARP or ICMP packets (of any size; I checked…)? Why does it happen exactly once, on the initial connection attempt, and then never again, neither for data packets nor for subsequent disconnections and reconnections?

BTW the FreeRTOS task priorities can be seen in my debugging output up-thread. There are no other tasks running on the system. System core clock is 300 MHz. Interrupt priorities are as follows:

void SetInterruptPriorities()
{
  // Ethernet should be at the highest priority that still allows FreeRTOS syscalls
  NVIC_SetPriority(ETH_IRQn, 5);

  // Debug serial port (no FreeRTOS interaction)
  //
  // USART character match (RX end-of-line detector)
  NVIC_SetPriority(USART3_IRQn, 4);
  // DMA USART3 RX (only happens on RX buffer overflow)
  NVIC_SetPriority(DMA1_Stream0_IRQn, 4);
  // DMA USART3 TX (happens routinely)
  NVIC_SetPriority(DMA1_Stream1_IRQn, 4);

  // Run-time stats counter (no FreeRTOS interaction)
  NVIC_SetPriority(TIM6_DAC_IRQn, 0);
}

and configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY is the default “5” on my system.

I am using the ST HAL Ethernet driver from the FreeRTOS+TCP repository. I have only changed it in a few places to eliminate dependencies on other HAL components. (I try to use as little of the HAL as I possibly can. That is why, e.g., my MPU configuration goes direct to the CMSIS function calls instead of detouring through the HAL obfuscation layer.)


I discovered that I was not accessing the Ethernet buffers region address correctly, and thus the MPU was not set up for the correct region. I now have this in my linker script:

  .eth_sec :
  {
    . = ALIGN(32);  /* just in case---should check if the ETH periph has a requirement */
    __eth_sec_start = .;
    KEEP (*(SORT(.RxDescripSection)))
    KEEP (*(SORT(.TxDescripSection)))
    KEEP (*(SORT(.EthBuffersSection)))
  } >RAM_ETH

and this in the MPU config code:

ARM_MPU_SetRegion(
    ARM_MPU_RBAR(0, (uint32_t) (&__eth_sec_start)),
    ARM_MPU_RASR(1,               // disable instruction fetch
                 ARM_MPU_AP_FULL, // full access
                 1,               // TEX = 001 (see Table B3-13 in ARMv7M arch manual)
                 0,               // not shareable
                 0,               // not cacheable
                 0,               // not bufferable
                 0,               // sub-region disable field
                 ARM_MPU_REGION_SIZE_128KB)
);
ARM_MPU_Enable(MPU_CTRL_PRIVDEFENA_Msk); // | MPU_CTRL_HFNMIENA_Msk);

so the base address is now 0x30000000 as it should be. Unfortunately, the behavior of my code is unchanged. Perhaps this suggests the MPU config is still the culprit, and there is another bug in the above?

Also, FWIW, I’ve never enabled Icache or Dcache in SCB.

Hi, Mark. Try enabling maximum optimization for GCC:

-O3 -flto=auto

At the same time, enable all common and extra warnings. If the problem isn’t the MPU itself, the link-time optimizer will analyze the code much more deeply, and its warnings may give you a clue about what’s wrong with the code flow, the arrays, etc.

P.S. I usually enable full optimization as soon as I create a project. In my experience, the maximum optimization level during development is the best tester of code integrity and reveals a lot of latent errors.

Hi Mark, it looks as if most incoming SYN packets are discarded due to a bad checksum. That looks odd. Can you run a parallel Wireshark trace to see whether the checksum on the wire is the same as the one seen by your DUT?

If not, the packets may be getting corrupted on reception. I have seen that kind of thing when the crystal that drives the PHY does not meet the PHY manufacturer’s precision requirement, resulting in frames being corrupted beyond a certain frame length.

Edit: An easy test of whether packet size influences the issue would be to ping the DUT with successively larger packet sizes (use the -s option of the ping command).

Another edit: Your observation that adding a debug print solves the problem hints at a bug in the Ethernet driver. Are you using zero-copy, possibly implying that an incoming packet must not be reused by the DMA engine until fully processed by the application?

When programming an STM32H7, I always used my own network interface, the one that is now called “Legacy”.

But I like the new STM32 generic driver written by HTRamsey, although I haven’t had the time yet to test all components.

Last week I decided to test the latest version, but I have not been lucky.

Sending out data is easy; that always seems to work well. I use UDP for debugging/logging, and that works perfectly.

I see many problems when receiving packets, prvAcceptPacket() complains very often:

HAL_ETH_RxLinkCallback: Buffer Dropped

As a test, I run ping continuously, and it works in waves: 10 seconds bad, 10 seconds good.

I keep on searching.

EDIT

Removing the following test makes everything run:

if( ( ulRxDesc & ETH_IP_PAYLOAD_MASK ) == ETH_IP_PAYLOAD_UNKNOWN )
{
    iptraceETHERNET_RX_EVENT_LOST();
    break;
}

Now ICMP is perfect, and also iperf3 works in two directions at a great speed.

@mark03, aren’t you looking at the same problem?
In short: for me ARP didn’t work. Once an IP address has been looked up, a TCP connection can do its thing.


Hi everyone,

Good news: I believe I have found the source of the problem. @Nikolay-Po’s suggestion was instrumental, though not because of any code flaw per se. Rather, it (by chance?) allowed the error to keep happening even in the presence of debug printf statements. That, in turn, let me drill much deeper into the code and find where the SYN packets were failing to be handled.

The problem turned out to be in my initial sequence number generator. I am using the STM32H7 RNG peripheral, but I had forgotten to select and enable the kernel clock feeding the RNG. It defaults to HSI48, but I didn’t have that oscillator turned on. As a consequence, the function always returned zero, causing prvHandleListen() (in FreeRTOS_TCP_State_Handling_IPv4.c) to fail silently.

I can’t explain why my connection requests sometimes worked (after a long delay at first, and then immediately on subsequent connects), or why inserting a printf in certain builds also made it work. But bad sequence numbers do seem to be the root cause.

Thanks for the suggestions!


@htibosch, I went to HTRamsey’s github and all I saw there was a fork of the STMicro HAL drivers. If someone actually has a quality, clean-room implementation of an Ethernet driver for STM32, that would be fantastic. All of my HAL dependencies scare me. Knock on wood… but the “non-legacy” HAL driver in the FreeRTOS repos seems to be OK so far. (The problem is that functionality vs non-functionality is one of the worst ways to evaluate a chunk of code…)