Heavy ping loss rate on my STM32F107 board with FreeRTOS+TCP

lizhang · November 19, 2020, 3:23pm

I adapted the existing STM32F4 network driver for my STM32F107 board. Ping works only intermittently. The same board works fine with lwIP, so the hardware is fine.
Here are some of my findings. The heap allocation failure hook is never called, so the heap is not exhausted.
I also found that the IP-task is sometimes invoked 3 times within one second, and only once in other periods. So I believe that when the IP-task is invoked 3 times in quick succession, the ping is answered correctly, and when I see the IP-task switched in only once, something is wrong.
To debug the error, I have reduced things to only about 3 task switches to and fro. See the attached task switch log: task_switch.zip (900 Bytes). The second column is the switch-in time in ms; the final column is the task priority.
It is interesting that the board can recover after losing some ping packets. See the logs below. Sorry for the Chinese characters, but I think you can guess that sometimes ping packets are lost and sometimes they pass through.

来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.100 的回复: 字节=32 时间=1218ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间=1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.100 的回复: 字节=32 时间=1216ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间=1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间=1ms TTL=64
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.100 的回复: 字节=32 时间=1402ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.2 的回复: 无法访问目标主机。
来自 192.168.1.100 的回复: 字节=32 时间=1399ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64
来自 192.168.1.100 的回复: 字节=32 时间<1ms TTL=64

Could anyone suggest a way forward?
BR
/Li

rtel · November 19, 2020, 5:29pm

What is the time between each incoming ping?
How are you signaling the TCP task from the Ethernet MAC?
Which versions of the FreeRTOS kernel and TCP stack are you using?
Do you have configASSERT() defined?

lizhang · November 19, 2020, 11:02pm

Richard,
The ping was sent once every second.
I am using the file FreeRTOS-Plus/Source/FreeRTOS-Plus-TCP/portable/NetworkInterface/STM32Fxx/NetworkInterface.c for the network interface. I haven't read much of it.
The kernel version is 10.4.0.
I have added code in configASSERT to make sure an RTT log is printed. During my ping testing, no assertion failed.
BR
/Li

htibosch · November 20, 2020, 8:11am

Reading this, I would think of a resource problem. Packets are probably dropped because all buffers are busy.

This may be related to heavy CPU usage, for example high-priority tasks that take too much CPU time.
Or maybe you didn't reserve enough resources? Think of the heap size and the number of network buffers ( ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ).

Do you have FreeRTOS_printf() enabled? Do you see its output somewhere?

Would you mind trying this adapted driver? It has more complete logging of the resources.

Sorry for the chinese characters, but I think you
can guess sometimes ping packets are lost

No problem, it was easy to have it translated on the web:

Reply from 192.168.1.2: Unable to reach the target host

htibosch · November 20, 2020, 8:15am

Just to make sure: besides defining the function vApplicationMallocFailedHook(), have you also defined:

#define configUSE_MALLOC_FAILED_HOOK   1

in your FreeRTOSConfig.h?

Could you share your copy of FreeRTOSIPConfig.h? Maybe you can attach it as a file?

lizhang · November 20, 2020, 11:37am

Hein,
Thank you for looking into my problem again.
These are the config files.
FreeRTOSConfig.h (12.0 KB) FreeRTOSIPConfig.h (22.5 KB)
There is only 64K of SRAM on the STM32F107, so I have reduced some of the network memory settings to make sure memory is still available for the application.
I have FreeRTOS_printf enabled, and I updated NetworkInterface.c to the version you pointed to. But it seems the added FreeRTOS_printf calls do not log anything for ping packets. For the other two files, stm32fxx_hal_eth.c and stm32fxx_hal_eth.h, I didn't merge your changes in. As I am using an STM32F1, my driver is not the same as the STM32F4 base. But if you need that information, I can merge them and capture logs.
I think my STM32F1 driver is not ported correctly. But networking is new to me, and I am wondering whether anyone else has met this problem. The network is unstable, which is very strange.
BR
/Li

lizhang · November 20, 2020, 2:03pm

I began to review my code, and I think the __HAL_LOCK macro may cause a problem. It simply makes the calling function return if the lock cannot be obtained. But it seems the FreeRTOS community is still using this code. Really interesting.

#if (USE_RTOS == 1U)
/* Reserved for future use */
#error "USE_RTOS should be 0 in the current HAL release"
#else
#define __HAL_LOCK(__HANDLE__)                                           \
                                do{                                        \
                                    if((__HANDLE__)->Lock == HAL_LOCKED)   \
                                    {                                      \
                                       return HAL_BUSY;                    \
                                    }                                      \
                                    else                                   \
                                    {                                      \
                                       (__HANDLE__)->Lock = HAL_LOCKED;    \
                                    }                                      \
                                  }while (0U)

#define __HAL_UNLOCK(__HANDLE__)                                          \
                                  do{                                       \
                                      (__HANDLE__)->Lock = HAL_UNLOCKED;    \
                                    }while (0U)
#endif /* USE_RTOS */
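
To illustrate the early return discussed here, the following is a minimal, self-contained sketch. The macro bodies copy the logic quoted above, but the names DEMO_HAL_LOCK, DEMO_HAL_UNLOCK, DemoHandle_t and xDemoOperation are hypothetical and exist only for this example. The enum values mirror those of the ST HAL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Enum values mirror the ST HAL; everything named "Demo" is hypothetical. */
typedef enum { HAL_OK = 0, HAL_ERROR = 1, HAL_BUSY = 2, HAL_TIMEOUT = 3 } HAL_StatusTypeDef;
typedef enum { HAL_UNLOCKED = 0, HAL_LOCKED = 1 } HAL_LockTypeDef;

typedef struct
{
    HAL_LockTypeDef Lock;
    uint32_t ulCallsCompleted; /* counts calls that got past the lock */
} DemoHandle_t;

/* Same logic as __HAL_LOCK(): return from the CALLING function when locked. */
#define DEMO_HAL_LOCK( pxHandle )                  \
    do {                                           \
        if( ( pxHandle )->Lock == HAL_LOCKED )     \
        {                                          \
            return HAL_BUSY;                       \
        }                                          \
        ( pxHandle )->Lock = HAL_LOCKED;           \
    } while( 0 )

#define DEMO_HAL_UNLOCK( pxHandle )                \
    do {                                           \
        ( pxHandle )->Lock = HAL_UNLOCKED;         \
    } while( 0 )

/* A stand-in for any HAL function that guards its body with the lock. */
HAL_StatusTypeDef xDemoOperation( DemoHandle_t *pxHandle )
{
    assert( pxHandle != NULL );

    DEMO_HAL_LOCK( pxHandle );      /* may return HAL_BUSY right here */
    pxHandle->ulCallsCompleted++;   /* skipped entirely when the lock is held */
    DEMO_HAL_UNLOCK( pxHandle );

    return HAL_OK;
}
```

If the lock is already held when xDemoOperation() is called, the work is silently skipped and the caller only learns about it if it checks for HAL_BUSY.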

htibosch · November 21, 2020, 12:33am

The __HAL_LOCK() macro that you show is indeed problematic because it never blocks, so it could make your application hang.

The macros __HAL_LOCK() and __HAL_UNLOCK() won’t do much unless you’re accessing the same handle from multiple tasks.
Within the NetworkInterface there is a rule: the EMAC peripheral and the PHY may be accessed by the IP-task until the prvEMACHandlerTask() has started and takes control.

In order to save RAM you can disable the TCP sliding window mechanism:

#define ipconfigUSE_TCP_WIN    0

You can decrease the size of a maximum packet:

#define ipconfigNETWORK_MTU    512

Use BufferAllocation_2.c instead of BufferAllocation_1.c.

You can play with ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS. The optimal value should be assessed by running your application while checking uxGetMinimumFreeNetworkBuffers() and uxGetNumberOfFreeNetworkBuffers().
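
As a rough illustration of why the MTU and the descriptor count matter on a 64K part, here is a back-of-the-envelope sketch. It assumes fixed-size buffers, one per descriptor. The function name and the per-buffer overhead parameter are hypothetical; the real overhead (Ethernet header, alignment, padding) depends on the port and must be checked in the sources.

```c
#include <assert.h>
#include <stddef.h>

/* Rough RAM estimate for fixed-size network buffers.
 * uxPerBufferOverhead is a hypothetical catch-all for header bytes,
 * alignment and padding; it is NOT a value taken from FreeRTOS+TCP. */
size_t uxEstimateBufferRam( size_t uxMtu,
                            size_t uxDescriptorCount,
                            size_t uxPerBufferOverhead )
{
    assert( uxDescriptorCount > 0 );

    return uxDescriptorCount * ( uxMtu + uxPerBufferOverhead );
}
```

With an assumed overhead of 32 bytes and 10 descriptors, an MTU of 1500 gives 15320 bytes, while an MTU of 512 gives 5440 bytes.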

You could define the following in your FreeRTOSIPConfig.h:

#define iptraceFAILED_TO_OBTAIN_NETWORK_BUFFER()    configASSERT( 0 )

which means that the application will halt as soon as it is running out of network buffers.
Only define this macro while testing, not in a real-life application.

About the network driver: I wonder if it would be possible to add support for STM32F107 in the existing STM32Fx driver.
And if not, I would like to check the driver that you are using.

lizhang · November 21, 2020, 12:41am

Hein,
Just a quick response.
Compared with the F4 series, the MAC on the F107 lacks support for the enhanced DMA feature. So the main thing I did was remove the code related to enhanced DMA.
I will check your other suggestions later and reply when I have found something.
BR
/Li

htibosch · November 21, 2020, 1:02am

Thanks for your quick reply.

It would be nice if the code can be used with a few extra #ifdef STM32F1xx statements, or with more clarity:

#if( HAS_ADVANCED_DMA != 0 )
#else
#endif

lizhang · November 21, 2020, 1:35am

Yes, that would be easy, as the changes amount to only about 10 or so places. But currently it is not stable. I can post the code if someone is interested.

htibosch · November 21, 2020, 4:33am

Sure, please post it, and also the ST source please ( stm32f1xx_hal_eth.{c,h} )

lizhang · November 21, 2020, 10:24am

Hein,
These are the driver sources: stm32f1xx_hal_eth.c (51.3 KB) stm32f1xx_hal_eth.h (103.3 KB)
The header has not been modified since it was generated by CubeMX.
There is also a small modification in NetworkInterface.c to include the F1 header.
BR
/Li

htibosch · November 21, 2020, 2:58pm

Thank you!

Please find the two drivers in the attachment:
stm32l_eth_drivers.zip (19.7 KB)

If you compare the 2 files, you will see the exact differences.

Indeed, EnhancedDescriptorFormat is not implemented for the STM32F1xx part.

I wrote some _HT_ comments in stm32f1xx_hal_eth.c, they might be important.

lizhang · November 23, 2020, 12:30am

Hein,
I added some printf calls in the IRQ handler just for debugging. As I am using RTT for stdout, this seems fine.
As the connection is unstable, I tried lowering the speed to 10M, but there was no improvement.
Then I logged the RX packets. It seems that some packets are always received with a checksum error, even during the periods when ping responses are correct.
Here are the logs:

648 dmasr = 0x40
RDES0 0x400381
01 02 03 04 05 06 00 e1 4c 68 07 8f 08 06 00 01 08 00 06 04 00 01 00 e1 4c 68 07 8f c0 a8 01 02 01 02 03 04 05 06 c0 a8 01 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 94 3c 97 9c
RDES0 0x400381 (this indicates an error during RX: IPv4 header checksum error)
ff ff ff ff ff ff 00 e1 4c 68 07 8f 08 06 00 01 08 00 06 04 00 01 00 e1 4c 68 07 8f c0 a8 01 02 00 00 00 00 00 00 c0 a8 01 fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90 16 cc fc
863 dmasr = 0x40
RDES0 0x4e0320 (this is a correct RX)
01 02 03 04 05 06 00 e1 4c 68 07 8f 08 00 45 00 00 3c 28 b3 00 00 40 01 ce 57 c0 a8 01 02 c0 a8 01 64 08 00 3c 80 00 01 10 db 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 61 62 63 64 65 66 67 68 69 25 0d 6c 65
865 dmasr = 0x1
648 dmasr = 0x40
648 dmasr = 0x40
RDES0 0x400381
01 02 03 04 05 06 00 e1 4c 68 07 8f 08 06 00 01 08 00 06 04 00 01 00 e1 4c 68 07 8f c0 a8 01 02 01 02 03 04 05 06 c0 a8 01 64 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 94 3c 97 9c
RDES0 0x400381
ff ff ff ff ff ff 00 e1 4c 68 07 8f 08 06 00 01 08 00 06 04 00 01 00 e1 4c 68 07 8f c0 a8 01 02 00 00 00 00 00 00 c0 a8 01 fe 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 90 16 cc fc
881 dmasr = 0x40
RDES0 0x4e0320
01 02 03 04 05 06 00 e1 4c 68 07 8f 08 00 45 00 00 3c 28 b4 00 00 40 01 ce 56 c0 a8 01 02 c0 a8 01 64 08 00 3c 7f 00 01 10 dc 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f 70 71 72 73 74 75 76 77 61 62 63 64 65 66 67 68 69 26 75 b3 3e
883 dmasr = 0x1
648 dmasr = 0x40
648 dmasr = 0x40

This indicates that something is wrong with the RX DMA (the DMA engine differs between the F1 and the F4), so I also tried disabling ipconfigZERO_COPY_RX_DRIVER. Still no improvement.

Can you see anything in the rejected packets? I have no experience with Ethernet packets.
BR
/Li
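
One thing that can be read directly from the dumps above is the EtherType, which sits at byte offsets 12 and 13 of an untagged Ethernet II frame. The frames logged with RDES0 0x400381 carry 08 06 there (ARP), while the frames logged with RDES0 0x4e0320 carry 08 00 (IPv4). The helper below is a minimal sketch for pulling that field out of a captured buffer; the function name is hypothetical and VLAN-tagged frames are not handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETHERTYPE_IPV4    0x0800u
#define ETHERTYPE_ARP     0x0806u

/* Returns the EtherType of an untagged Ethernet II frame,
 * or 0 when the buffer is too short to contain a full header. */
uint16_t usGetEtherType( const uint8_t *pucFrame, size_t uxLength )
{
    assert( pucFrame != NULL );

    if( uxLength < 14u )
    {
        return 0u;
    }

    /* The field is big-endian: high byte at offset 12, low byte at 13. */
    return ( uint16_t ) ( ( ( uint16_t ) pucFrame[ 12 ] << 8 ) | pucFrame[ 13 ] );
}
```

Feeding it the first 14 bytes of each dumped frame is enough to tell ARP frames from IPv4 frames.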

htibosch · November 23, 2020, 4:05am

@lizhang : would it be possible for you to make a PCAP file, using Wireshark or so? That makes it so much easier to analyse the data.

About the checksums: does FreeRTOS+TCP see a checksum error? Or do you see them in Wireshark? The latter might be caused by checksum offloading.
( I often disable all checksum offloading in my LAN driver: IP, UDP, and TCP ).

While using the driver that I sent here above, I saw a mistake, so here is a new version:
stm32l_eth_drivers_v2.zip (20.0 KB)

I see that you printed hclk, how much is it?

lizhang · November 23, 2020, 6:46am

Hein,
The log is added in prvNetworkInterfaceInput, based on pxDMARxDescriptor->Status. I think the checksum error is calculated by the STM32 ETH DMA engine.
hclk is 72 MHz, the same as my configuration in CubeMX. It is also the maximum frequency for the STM32F107.
I used Wireshark several years ago. At that time, I used it to capture network traffic going into the machine running Wireshark. But now my PC is connected to a wireless router, and so is my STM32F107 board. Both are connected by wire; no Wi-Fi is used. Do you think I can use Wireshark to capture the traffic between the router and the board? Is there any setting I need to check?
BR
/Li

htibosch · November 23, 2020, 5:05pm

It is not yet clear to me: with what device is the STM32F communicating? To another STM32F? In that case it is difficult to make a real PCAP.

lizhang · November 23, 2020, 11:53pm

Hein,
My setup is PC <> Wi-Fi router <> STM32F107 board. The connections are wired LAN.
I have a project running lwIP, and there I saw no IP header error on the “ff ff ff ff” messages. So I decided to check what differs between lwIP and my driver. That should point directly to the cause.
I believe you are also interested in the root cause. Stay tuned, I will update you later.
BR
/Li

lizhang · November 25, 2020, 12:33am

Hein,
I think I have found the root cause. It is the DMA engine. Here is the description of the DMA descriptor from the ST manual for the F1. The bit position is not correct in the table.
From the log, we can see that the rejected packets fall into the category: Type frame which is neither IPv4 or IPv6 (checksum offload bypasses the checksum check completely). After some investigation, I found that the rejected packets are ARP packets.
So I added some code as below. It seems the bug is fixed.

--- a/FreeRTOS-Plus/Source/FreeRTOS-Plus-TCP/portable/NetworkInterface/STM32Fxx/NetworkInterface.c
+++ b/FreeRTOS-Plus/Source/FreeRTOS-Plus-TCP/portable/NetworkInterface/STM32Fxx/NetworkInterface.c
@@ -908,13 +908,20 @@ uint8_t *pucBuffer;
   hold a complete Ethernet packet (1536 bytes).
   Therefore, two sanity checks: */
   configASSERT( xReceivedLength <= ETH_RX_BUF_SIZE );

   //printf("RDES0 0x%x\n", pxDMARxDescriptor->Status);
   //  hex_string_dump(pxDMARxDescriptor->Buffer1Addr, xReceivedLength+4);
   if( ( pxDMARxDescriptor->Status & ( ETH_DMARXDESC_CE | ETH_DMARXDESC_IPV4HCE | ETH_DMARXDESC_FT ) ) != ETH_DMARXDESC_FT )
   {
       /* Not an Ethernet frame-type or a checmsum error. */
       xAccepted = pdFALSE;
   }
   else
   // reverse-filter the non-IPV4, IPV6 ethernet packet for STM32F107
   if( ( pxDMARxDescriptor->Status & ( ETH_DMARXDESC_MAMPCE | ETH_DMARXDESC_IPV4HCE | ETH_DMARXDESC_FT ) ) ==
       (ETH_DMARXDESC_MAMPCE | ETH_DMARXDESC_IPV4HCE ))
   {
       xAccepted = pdTRUE;
   }
   if(xAccepted == pdTRUE)
   {
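
The intent of the change can be checked against the two RDES0 values from the earlier logs with a small stand-alone sketch. The mask values are the ones the ST HAL Ethernet header assigns to ETH_DMARXDESC_MAMPCE, ETH_DMARXDESC_CE, ETH_DMARXDESC_FT and ETH_DMARXDESC_IPV4HCE; they are redefined under shorter names so the example compiles on its own. The function names are hypothetical, and the way the two tests are combined is an assumption about the intent, not a copy of the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Same values as the ETH_DMARXDESC_* masks in the ST HAL header. */
#define RDES0_MAMPCE     0x00000001u
#define RDES0_CE         0x00000002u
#define RDES0_FT         0x00000020u
#define RDES0_IPV4HCE    0x00000080u

/* The test in the unmodified driver: frame-type bit set, no CRC error
 * and no IPv4 header checksum error. */
bool bAcceptedByOriginalCheck( uint32_t ulStatus )
{
    return ( ulStatus & ( RDES0_CE | RDES0_IPV4HCE | RDES0_FT ) ) == RDES0_FT;
}

/* The extra pattern from the patch: bits 0 and 7 set while the
 * frame-type bit is clear, as seen on the ARP frames. */
bool bMatchesF107NonIpPattern( uint32_t ulStatus )
{
    return ( ulStatus & ( RDES0_MAMPCE | RDES0_IPV4HCE | RDES0_FT ) ) ==
           ( RDES0_MAMPCE | RDES0_IPV4HCE );
}

/* Assumed combination: accept when either test passes. */
bool bAcceptFrame( uint32_t ulStatus )
{
    return bAcceptedByOriginalCheck( ulStatus ) ||
           bMatchesF107NonIpPattern( ulStatus );
}
```

With these definitions, 0x400381 (the ARP frames) fails the original test but matches the extra pattern, and 0x4e0320 (the ping request) passes the original test.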

Thank you for all your help. There are still some memory-saving options you mentioned above that I am going to try.
Originally, I was only looking for some advice. But you went deep into the code, and I cannot thank you enough. I hope you understand what I want to say.
BR
/Li