STM32F207 FreeRTOS-Plus-TCP losing pings

I am trying to get FreeRTOS+TCP (FreeRTOS 10.4.1 with corresponding FreeRTOS+TCP 2.2.2) running on a NUCLEO-F207ZG board. The board uses the STM32F207ZG and LAN8742A PHY (RMII connection between the two).

The network stack is running and successfully gets an IP address using DHCP but a simple ping test (64 bytes) to the NUCLEO board is unreliable. It will successfully respond for a number of cycles and then fail for a while before responding again.

I have tried both BufferAllocation_1 and BufferAllocation_2, with zero copy both enabled and disabled, and nothing seems to affect the behaviour.

Enabling both ipconfigHAS_PRINTF and ipconfigHAS_DEBUG_PRINTF, along with adding a debug statement to prvProcessICMPEchoRequest(), shows that on the cycles that fail this function doesn’t get called.

The readme.txt file for the STM32Fxx Ethernet driver seems a bit vague as to whether it has been tested on the STM32F2xx devices.

Any guidance on where to poke next?

-Andy.

Hi Andy,

The readme.txt file for the STM32Fxx Ethernet driver seems a bit vague as to whether it has been tested on the STM32F2xx devices

Well, there is this text:

It is assumed that one of these words are defined:

    STM32F7xx
    STM32F407xx
    STM32F417xx
    STM32F427xx
    STM32F437xx
    STM32F429xx
    STM32F439xx

The STM32F2xx is not included.

When pings get lost, I would always check the resources. pvPortMalloc() should never fail, and neither should pxGetNetworkBufferWithDescriptor().
You might want to define:

/* In FreeRTOSConfig.h */
#define configUSE_MALLOC_FAILED_HOOK              1
/* In FreeRTOSIPConfig.h */
#define iptraceFAILED_TO_OBTAIN_NETWORK_BUFFER()  configASSERT( 0 )

and define this function somewhere:

void vApplicationMallocFailedHook( void )
{
    configASSERT( 0 );
}

Does your logging work? Via serial/USB?

When ipconfigHAS_PRINTF is defined, the driver will call vPrintResourceStats() regularly, which will warn for resources running low.
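If that call is missing in a given driver version, you can poll the same numbers from any task. A minimal sketch, assuming heap_4 (which provides xPortGetMinimumEverFreeHeapSize()) and that FreeRTOS_printf() is mapped to your logger:

#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "NetworkBufferManagement.h"

/* Periodically report the network-buffer and heap low-water marks. */
void vReportResources( void )
{
    FreeRTOS_printf( ( "Network buffers: %lu lowest %lu\n",
                       ( unsigned long ) uxGetNumberOfFreeNetworkBuffers(),
                       ( unsigned long ) uxGetMinimumFreeNetworkBuffers() ) );
    FreeRTOS_printf( ( "Heap: %lu lowest %lu\n",
                       ( unsigned long ) xPortGetFreeHeapSize(),
                       ( unsigned long ) xPortGetMinimumEverFreeHeapSize() ) );
}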

Did you include the STM32Fxx driver as-is?

Two typical cases for the behavior you observe:

  1. Not enough RX DMA buffers allocated in the buffer chain (possibly in conjunction with challenging network load such as broadcast storms)

  2. An application task with higher priority that starves out the TCP message pump.

You may want to check on those…
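For point 2, a sketch of a sane priority layout (ipconfigIP_TASK_PRIORITY is the real config option; the application task name below is hypothetical):

/* Typical layout: the EMAC handler task created by NetworkInterface.c runs
 * highest, the IP task just below it, application tasks below that. */
#define ipconfigIP_TASK_PRIORITY    ( configMAX_PRIORITIES - 2 )

/* Hypothetical application task priority, kept below the IP task. */
#define mainAPP_TASK_PRIORITY       ( tskIDLE_PRIORITY + 2 )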

To try and answer all of the questions in one go…

The logging is through a UART and appears to work OK, other than that the small-footprint printf() implementation it uses doesn’t support “%lxip”, so the vDHCPProcess messages come out as hex values rather than in dotted notation.
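A possible workaround (sketch, assuming FreeRTOS_inet_ntoa() is available in this +TCP version) is to convert the address to dotted notation before printing:

#include "FreeRTOS_IP.h"
#include "FreeRTOS_Sockets.h"

/* Print the device's own IP address without relying on "%lxip". */
static void vPrintOwnAddress( void )
{
    uint32_t ulIPAddress;
    char pcAddress[ 16 ];

    FreeRTOS_GetAddressConfiguration( &ulIPAddress, NULL, NULL, NULL );
    FreeRTOS_inet_ntoa( ulIPAddress, pcAddress );
    FreeRTOS_printf( ( "IP address: %s\n", pcAddress ) );
}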

I am using the Ethernet driver as-is. The implementation in FreeRTOS+TCP 2.2.2 outputs the metrics directly in prvEMACHandlerTask() rather than calling vPrintResourceStats(), which the implementation shipped with the FreeRTOS 202012.00 release seems to do.

The only output that seems to have been generated is maybe one or two occurrences of “Network buffers: 1 lowest 1” or “Network buffers: 2 lowest 1”.

configUSE_MALLOC_FAILED_HOOK is defined and the hook implementation outputs a message to the debug UART before hanging the system. That never happens, so I am assuming there aren’t any issues allocating memory.
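For reference, the hook is along these lines (sketch, not the exact code):

void vApplicationMallocFailedHook( void )
{
    /* Report once, then halt so the failure cannot go unnoticed. */
    FreeRTOS_printf( ( "pvPortMalloc() failed\n" ) );
    taskDISABLE_INTERRUPTS();

    for( ; ; )
    {
    }
}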

In terms of buffers I have tried the recommended values from the readme.txt file:

#define ETH_RXBUFNB                                   3 or 4
#define ETH_TXBUFNB                                   2 or 3

I have also tried not defining them and allowing the driver to use its default values which are 5 and 5.

At this point in the debug cycle, none of the application tasks are running. If I break the execution using the debugger, the only three tasks present are IDLE, EMAC and IP-Task. In addition, Timer 2 triggers every millisecond to call the HAL_IncTick() and xPortSysTickHandler() functions.
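For context, the arrangement described above is along these lines (simplified sketch using the usual HAL callback; the exact wiring in the project may differ):

#include "FreeRTOS.h"
#include "task.h"
#include "stm32f2xx_hal.h"

extern void xPortSysTickHandler( void );

/* TIM2 provides the 1 ms time base for both the HAL and the RTOS tick. */
void HAL_TIM_PeriodElapsedCallback( TIM_HandleTypeDef * pxTimHandle )
{
    if( pxTimHandle->Instance == TIM2 )
    {
        HAL_IncTick();

        if( xTaskGetSchedulerState() != taskSCHEDULER_NOT_STARTED )
        {
            xPortSysTickHandler();
        }
    }
}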

-Andy.

The obvious next step would be to evaluate trace flow counters (count how many inbound packets can be seen in the Wireshark trace vs how many arrive at the DMA layer vs how many arrive in the +TCP stack, and so on).

Hi RAc,

I’ve sorted out the issue with needing to use Timer 2 rather than SysTick to drive calling xPortSysTickHandler(). That was the same issue that was reported in post startup-freertos-9-0-0-stm32f4-interrupt-before-starting-scheduler-hang/7105 on these forums (it won’t let me post a link). As a result I have one less hardware interrupt being fired.

The output of the latest test shows:

--- 192.168.10.179 ping statistics ---
807 packets transmitted, 597 packets received, 26.0% packet loss
round-trip min/avg/max/stddev = 0.406/0.552/0.740/0.056 ms

In the debug output there is one entry “Network buffers: 0 lowest 0”, which is concerning, but I doubt that accounts for 210 packets being lost.

I’ll break Wireshark out tomorrow and see if it gives any clues!

-Andy.

Frankly, I doubt it will, but any information is useful in this context… What you really want to find out is where this sequence of events is broken:

  1. The ICMP request packet arrives at the PHY.

  2. The packet is copied from the RX DMA descriptor ring into an internal network buffer.

  3. The packet is processed in the +TCP stack.

  4. The response packet is computed.

  5. The response packet is scheduled onto the TX DMA descriptor ring.

  6. The PHY puts the response out onto the net.
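A rough way to instrument that chain (illustrative sketch; the counter names are hypothetical, the placement comments name functions already mentioned in this thread):

#include <stdint.h>

/* Hypothetical counters, one per stage, to compare against the Wireshark
 * capture afterwards. */
volatile uint32_t ulRxDescriptorCount = 0; /* frames taken from the RX DMA ring */
volatile uint32_t ulRxDeliveredCount = 0;  /* frames handed to the IP task */
volatile uint32_t ulIcmpSeenCount = 0;     /* prvProcessICMPEchoRequest() entries */
volatile uint32_t ulTxQueuedCount = 0;     /* replies handed to the TX DMA ring */

/* Increment ulRxDescriptorCount at the top of prvNetworkInterfaceInput(),
 * ulRxDeliveredCount after the RX event has been queued to the IP task,
 * ulIcmpSeenCount inside prvProcessICMPEchoRequest(), and
 * ulTxQueuedCount inside xNetworkInterfaceOutput(). */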

That was the same issue that was reported in post startup-freertos-9-0-0-stm32f4-interrupt-before-starting-scheduler-hang/7105

The mentioned post appeared first on SourceForge, but there is a copy here

I have also tried not defining them and allowing the driver to use its default values which are 5 and 5.

Note that you can safely set ETH_TXBUFNB to 1, and ETH_RXBUFNB to many, say 9 if you have the RAM. When sending, the driver will wait patiently for a free TX DMA descriptor. That happens in this line:

if( xSemaphoreTake( xTXDescriptorSemaphore, xBlockTimeTicks ) != pdPASS )
{
    /* Time-out waiting for a free TX descriptor. */
    break;
}

It will wait for at most 50 ms, which gives enough time for the previous packet to be transmitted.
So I recommend:

#define ETH_RXBUFNB      9
#define ETH_TXBUFNB      1

Note that each RX DMA buffer needs one Network Buffer. When no Network Buffer is available, an incoming packet will be dropped in prvNetworkInterfaceInput().
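If an assert is too drastic, the same trace macros can be pointed at counters instead (sketch; the defines go in FreeRTOSIPConfig.h, the variables in any C file):

extern volatile uint32_t ulNoBufferCount;    /* pxGetNetworkBufferWithDescriptor() returned NULL */
extern volatile uint32_t ulRxEventLostCount; /* RX event could not be queued to the IP task */

#define iptraceFAILED_TO_OBTAIN_NETWORK_BUFFER()    ( ulNoBufferCount++ )
#define iptraceETHERNET_RX_EVENT_LOST()             ( ulRxEventLostCount++ )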

About versions, have you tried the latest version of FreeRTOS+TCP?

And a last thought: I still wonder if there is a slight difference between the EMACs of an STM32F2 and STM32F4.

@sdcandy wrote:

break Wireshark out tomorrow and see if it gives any clues!

@RAc replied:

Frankly, I doubt it will, but any information is useful in this context…

Maybe unlikely, but it might give a clue. Maybe there are other packets on the LAN that disturb the pings.
If you want, you can send an unfiltered PCAP to my email: hein [at] htibosch [dot] net

I just need to see a few answered pings, and a few with missing answers.

Thanks

Looking at what is going on in the prvIPTask() function: when a ping gets a reply, the case eNetworkRxEvent: part of the switch statement is executed. When the ping gets no reply, each time the IP task runs it goes down the case eNoEvent: code path.

It looks to me like the IP task processes the ICMP request packet correctly when it receives one, but the requests aren’t always getting that far.

I think I need to start debugging a level lower down to try to determine which of the first two steps in that chain (“packet arrives at the PHY” versus “packet copied from the RX DMA descriptor ring into a network buffer”) is going wrong.

-Andy.

You mean the responses? Maybe there is a NULL return from pxGetNetworkBufferWithDescriptor()?

I am trying to make sense of what is going on in prvNetworkInterfaceInput() in NetworkInterface.c and I am seeing a lot of the calls to xMayAcceptPacket() returning pdFALSE.

ipconfigETHERNET_DRIVER_FILTERS_PACKETS isn’t set, so all xMayAcceptPacket() is doing is checking for ipARP_FRAME_TYPE or ipIPv4_FRAME_TYPE.

I don’t think there is a one-to-one mapping between a failing ping and a packet not being accepted, as there seems to be some packet rejection even when the ping is successfully replied to.

-Andy.

Are your RX descriptor buffers properly aligned?

The RX and TX descriptors are defined with `__attribute__( ( aligned( 32 ) ) )`.

I have followed what is in the driver’s readme.txt which states:

#define ETH_RX_BUF_SIZE                               ( ipconfigNETWORK_MTU + 36 )
#define ETH_TX_BUF_SIZE                               ( ipconfigNETWORK_MTU + 36 )

The optimal value of 'ETH_RX_BUF_SIZE' and 'ETH_TX_BUF_SIZE' depends on the actual value
of 'ipconfigNETWORK_MTU'.

When MTU is 1500, MTU+36 becomes a well-aligned buffer of 1536 bytes ( 0x600 ).
When MTU is 1200, MTU+48 will make 1248 ( 0x4E0 ), which is also well aligned.

ipconfigNETWORK_MTU is set to 1500 in FreeRTOSIPConfig.h
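For completeness, the aligned declarations look like this (sketch based on the stock STM32Fxx NetworkInterface.c; the exact names in the actual source may differ):

#include "stm32fxx_hal_eth.h"  /* the driver's own HAL Ethernet header */

/* DMA descriptor tables, 32-byte aligned as required by the EMAC. */
__attribute__( ( aligned( 32 ) ) ) ETH_DMADescTypeDef DMARxDscrTab[ ETH_RXBUFNB ];
__attribute__( ( aligned( 32 ) ) ) ETH_DMADescTypeDef DMATxDscrTab[ ETH_TXBUFNB ];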

-Andy

Yeah, from what Andy wrote recently, it almost looks as if sometimes RX DMA buffers are being considered valid where they aren’t… @AndyP, have you checked the reference manual of your POD against the driver’s source code, in particular with regard to the descriptor buffer structure? Maybe you need different header files?

Summarising the PCAP file that Andy just emailed me:
22 bad, 3 good
26 bad, 4 good
Pings are sent at a rate of 1 Hz, and the device receives no other packets. The LAN is quiet the whole time.

@RAc wrote:

Maybe there is a NULL return from pxGetNetworkBufferWithDescriptor()?

That is why I proposed to add the following define to FreeRTOSIPConfig.h:

    #define iptraceFAILED_TO_OBTAIN_NETWORK_BUFFER() \
            configASSERT( 0 )

I like an assert(0) better than a line of logging because you will notice that for sure.

But after seeing the PCAP file, it looks like the network buffers are not exhausted.

Would it be an idea to connect the STM32 directly to a laptop and let both use static IP-addresses?
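A sketch of that setup, assuming DHCP is simply disabled in FreeRTOSIPConfig.h and the standard FreeRTOS_IPInit() call is used (all addresses and the MAC below are examples only):

/* In FreeRTOSIPConfig.h: */
#define ipconfigUSE_DHCP    0

/* At start-up: */
static const uint8_t ucIPAddress[ 4 ] = { 192, 168, 10, 179 };
static const uint8_t ucNetMask[ 4 ] = { 255, 255, 255, 0 };
static const uint8_t ucGatewayAddress[ 4 ] = { 192, 168, 10, 1 };
static const uint8_t ucDNSServerAddress[ 4 ] = { 192, 168, 10, 1 };
static const uint8_t ucMACAddress[ 6 ] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

FreeRTOS_IPInit( ucIPAddress, ucNetMask, ucGatewayAddress,
                 ucDNSServerAddress, ucMACAddress );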

I am seeing a lot of the calls to xMayAcceptPacket() returning pdFALSE.

That is interesting. Can you find out why packets are dropped?

ipconfigETHERNET_DRIVER_FILTERS_PACKETS is set to 0 and so the implementation of xMayAcceptPacket() is:

static BaseType_t xMayAcceptPacket( uint8_t * pcBuffer )
{
    const ProtocolPacket_t * pxProtPacket = ( const ProtocolPacket_t * ) pcBuffer;

    switch( pxProtPacket->xTCPPacket.xEthernetHeader.usFrameType )
    {
        case ipARP_FRAME_TYPE:
            /* Check it later. */
            return pdTRUE;

        case ipIPv4_FRAME_TYPE:
            /* Check it here. */
            break;

        default:
            /* Refuse the packet. */
            FreeRTOS_debug_printf( ( "usFrameType: 0x%x\n", pxProtPacket->xTCPPacket.xEthernetHeader.usFrameType ) );
            return pdFALSE;
    }
    return pdTRUE;
}

The call to FreeRTOS_debug_printf() is mine. Doing some analysis of the output, the packets that are dropped all have one of three values of pxProtPacket->xTCPPacket.xEthernetHeader.usFrameType:

usFrameType: 0x7574
usFrameType: 0x7473
usFrameType: 0x8073

I can’t seem to find any definition of what these frame types are, even if I byte-swap them (this is a little-endian system and networking is generally big-endian).
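For reference, the on-wire value can be logged alongside the raw one with a byte swap (sketch, using the stack’s FreeRTOS_ntohs() macro):

uint16_t usType = pxProtPacket->xTCPPacket.xEthernetHeader.usFrameType;

/* Print the value as stored (CPU order) and as it appears on the wire. */
FreeRTOS_debug_printf( ( "usFrameType: raw 0x%x, on-wire 0x%x\n",
                         usType, FreeRTOS_ntohs( usType ) ) );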

-Andy.

In a quick test, that seems to work without issue :slight_smile:

257 packets transmitted, 257 packets received, 0% packet loss

-Andy.

A wild guess: those may be split buffers that overwrite the frame headers of chained descriptor blocks. The data you see could be readable characters (‘s’, ‘t’, ‘u’) from packet payloads. Again, this looks like an indication of something the two different POD families do differently, e.g. handling oversized frames.

Wireshark also doesn’t recognise the packets, it only says “Ethernet II”.
Note that the frame types logged should be byte swapped: 0x7574 becoming 0x7475.
These packets are indeed broadcast packets, aimed at the all-ones MAC address.

In a quick test, that seems to work without issue :slight_smile:
257 packets transmitted, 257 packets received, 0% packet loss

Thanks for testing this. Now at least we see that the Network Interface works for the STM32F2x family!
What I can do is “replay” Andy’s PCAP file on my LAN to see the effect of the “Ethernet II” packets on the IP-stack. As the packets are dropped, I don’t expect a big effect.

In this ZIP file I collected Ethernet II packets:

eth_2_packets.zip (650 Bytes)

In one such packet I read:

0030 42 42 34 34 33 45 02 09 65 74 68 74 75 6e 6e 65 BB443E..ethtunne
0040 6c l

Looks like an Ethernet tunnel. I have never encountered that yet, only protocol tunnels.