FreeRTOS+TCP STM32H7 UDP Ping Lockup RX Buffer Starvation

Hi All,

I’m looking for some pointers on debugging a FreeRTOS+TCP issue on an STM32H7.

Using FreeRTOS: 11.2.0

With FreeRTOS+TCP: 4.3.3

The unit receives requests and sends responses via UDP.

On the whole it performs really well; however, the stack seems to occasionally lock up, requiring a reboot to recover.

I’ve recently managed to reliably reproduce this, having added file offloads via UDP: the unit responds to a UDP request with typically 8MB worth of UDP packets, sent back to back using the zero-copy mechanism.

During a transfer, pinging the unit appears to be enough to trigger the lockup. Without pinging it, multiple (>50) back-to-back 200MB transfers completed without issue. While pinging it, the first transfer doesn’t typically complete before it becomes unresponsive.

I checked a few instances from the host side; while anecdotal, it seems the crash occurs if a UDP packet from the host arrives immediately before the ping request, i.e. while the application code is somewhere between picking up the request and sending its response.

The highlighted row is a request from the desktop to the MCU. The packets before it are the end of the last data block; the blue packets after the ping are retries due to the lack of response from the MCU.

Using the included ethernet driver, with the following CMake snippet:

# Add FreeRTOS-Plus-TCP
set(FREERTOS_PLUS_TCP_BUFFER_ALLOCATION "1" CACHE STRING "" FORCE)
set(FREERTOS_PLUS_TCP_NETWORK_IF "STM32" CACHE STRING "" FORCE)
set(FREERTOS_PLUS_TCP_STM32_IF_DRIVER "H7" CACHE STRING "" FORCE) # We're going to use the HAL driver
add_subdirectory(lib/FreeRTOS-Plus-TCP)
target_compile_definitions(freertos_plus_tcp_network_if PRIVATE
    STM32H7
)
target_compile_options(freertos_plus_tcp PRIVATE
    # Prevent memcpy and memset being replaced by the compiler which can lead to unaligned access issues
    -fno-builtin-memcpy -fno-builtin-memset
)
target_link_libraries(freertos_plus_tcp_network_if PRIVATE
    stm32cubemx
)

Using the following FreeRTOSIPConfig.h contents:

#ifndef FREERTOS_IP_CONFIG_H
#define FREERTOS_IP_CONFIG_H

/* Set to 1 to print out debug messages. If ipconfigHAS_DEBUG_PRINTF is set to
 * 1 then FreeRTOS_debug_printf should be defined to the function used to print
 * out the debugging messages. */
#define ipconfigHAS_DEBUG_PRINTF (ipconfigENABLE)
extern void IP_SUPPORT_DebugPrintf( const char *pcFormatString, ... );
#define FreeRTOS_debug_printf( X ) IP_SUPPORT_DebugPrintf X

/* Set to 1 to print out non debugging messages, for example the output of the
 * FreeRTOS_netstat() command, and ping replies. If ipconfigHAS_PRINTF is set to 1
 * then FreeRTOS_printf should be set to the function used to print out the
 * messages. */
#define ipconfigHAS_PRINTF (ipconfigENABLE)
extern void IP_SUPPORT_Printf( const char *pcFormatString, ... );
#define FreeRTOS_printf( X ) IP_SUPPORT_Printf X

/* Define the byte order of the target MCU (the MCU FreeRTOS+TCP is executing
 * on). Valid options are pdFREERTOS_BIG_ENDIAN and pdFREERTOS_LITTLE_ENDIAN. */
#define ipconfigBYTE_ORDER (pdFREERTOS_LITTLE_ENDIAN)

/* STM32 hardware can perform checksum offloading */
#define ipconfigDRIVER_INCLUDED_TX_IP_CHECKSUM (ipconfigENABLE)
#define ipconfigDRIVER_INCLUDED_RX_IP_CHECKSUM (ipconfigENABLE)

#if 0 /* Hardware filtering is supported but doesn't appear to work correctly with the STM32H7. */

/* STM32 hardware supports packet filtering */
#define ipconfigETHERNET_DRIVER_FILTERS_PACKETS (ipconfigENABLE)

#endif

/* STM32 driver when using IPv4 and IPv6 has an unused variable, making the define below a nop */
#define ipconfigETHERNET_DRIVER_FILTERS_FRAME_TYPES (ipconfigDISABLE)

/* Suppress warnings related to the settings above */
#define ipconfigPORT_SUPPRESS_WARNING (ipconfigENABLE)

/* STM32 hardware supports zero-copy */
#define ipconfigZERO_COPY_TX_DRIVER (ipconfigENABLE)
#define ipconfigZERO_COPY_RX_DRIVER (ipconfigENABLE)

/* Support linking of RX messages */
#define ipconfigUSE_LINKED_RX_MESSAGES (ipconfigENABLE)

/* Define task priority and stack size */
#define ipconfigIP_TASK_PRIORITY         (configMAX_PRIORITIES - 2U)
#define ipconfigIP_TASK_STACK_SIZE_WORDS (configMINIMAL_STACK_SIZE * 5U)

/** Support for network down event */
#define ipconfigSUPPORT_NETWORK_DOWN_EVENT (ipconfigENABLE)

/* Call network event hook when a network event occurs. */
//#define ipconfigUSE_NETWORK_EVENT_HOOK (ipconfigENABLE)

/* Support static IP addressing only */
#define ipconfigUSE_DHCP (ipconfigDISABLE)

/* Include support for FreeRTOS_inet_addr as well as FreeRTOS_inet_addr_quick() */
#define ipconfigINCLUDE_FULL_INET_ADDR (ipconfigENABLE)

/* ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS defines the total number of network buffers that
 * are available to the IP stack. The total number of network buffers is limited
 * to ensure the total amount of RAM that can be consumed by the IP stack is capped
 * to a pre-determinable value. */
#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS (64)

/* Do not use the TCP windowing mechanism; as we're on a local network we expect good reliability */
#define ipconfigUSE_TCP_WIN (ipconfigDISABLE)

/* Disable DNS support we'll be dealing with IPs only */
#define ipconfigUSE_DNS (ipconfigDISABLE)

/* If ipconfigSUPPORT_SELECT_FUNCTION is set to 1 then the FreeRTOS_select()
 * (and associated) API function is available. */
#define ipconfigSUPPORT_SELECT_FUNCTION (ipconfigENABLE)

/* Each TCP socket has a circular buffers for Rx and Tx, which have a fixed
 * maximum size.*/
#define ipconfigTCP_RX_BUFFER_LENGTH (1 * 1024)
#define ipconfigTCP_TX_BUFFER_LENGTH (1 * 1024)

/* Include support for TCP keep-alive messages. */
#define ipconfigTCP_KEEP_ALIVE          (ipconfigENABLE)
#define ipconfigTCP_KEEP_ALIVE_INTERVAL (20) /* in seconds */

/* Enable socket callbacks */
#define ipconfigUSE_CALLBACKS (ipconfigENABLE)

/* Monitor transmission, reception and buffer allocation failures */
#include "diag.h"
#define iptraceNETWORK_INTERFACE_RECEIVE()    DIAG_IncEthernetRx()
#define iptraceNETWORK_INTERFACE_TRANSMIT()   DIAG_IncEthernetTx()
#define iptraceFAILED_TO_OBTAIN_NETWORK_BUFFER() DIAG_IncEthernetBufferAllocFailed()

/* How often to check link status */
#define ipconfigPHY_LS_HIGH_CHECK_TIME_MS (2000U)

/* Allow extended waits for UDP buffers */
#define ipconfigUDP_MAX_SEND_BLOCK_TIME_TICKS (pdMS_TO_TICKS(2000))

#endif

Application-wise, UDP packets are being sent with the following sequence in a tight loop.

// Receive packet, to obtain source address

// Process request

/* Allocate buffer for outgoing data */
uint8_t *pabyDataOut = (uint8_t*)FreeRTOS_GetUDPPayloadBuffer_Multi(OUTPUT_BUFFER_LEN, OUTPUT_BUFFER_ALLOC_TIMEOUT, ipTYPE_IPv4);

// Encode response

/* Send UDP response */
if (0U == FreeRTOS_sendto(xSocket,
                          pabyDataOut, stOutputStream.bytes_written,
                          FREERTOS_ZERO_COPY,
                          pstSourceAddr, uiSourceAddrLength)) {
    /* Send failed, we still own buffer, release it */
    FreeRTOS_ReleaseUDPPayloadBuffer(pabyDataOut);
    pabyDataOut = NULL;
}

For receiving, the socket is opened, a notification callback is registered and the socket is placed in non-blocking mode before being bound to its port:

        /* Register callback */
        F_TCP_UDP_Handler_t stHandlers = {
            .pxOnUDPReceive = HandleUDPPacketReceivedNotification,
        };
        FreeRTOS_setsockopt(xSocket, 0, FREERTOS_SO_UDP_RECV_HANDLER, (void*)&stHandlers, sizeof(stHandlers));

        /* Set to non-blocking mode */
        BaseType_t xTimeout = 0;
        FreeRTOS_setsockopt(xSocket, 0, FREERTOS_SO_RCVTIMEO, (void*)&xTimeout, sizeof(xTimeout));

The function HandleUDPPacketReceivedNotification simply sends a direct-to-task notification to the task handling network communication amongst other background activities.
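For reference, a minimal sketch of what that callback might look like. The handler signature here is approximated and the FreeRTOS call is stubbed as a counter so the snippet stands alone; the real version uses xTaskNotifyGive() from task.h and the FOnUDPReceive_t signature from FreeRTOS_Sockets.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the FreeRTOS types and the notification call, so the sketch
 * compiles standalone. In the real project these come from FreeRTOS.h,
 * task.h and FreeRTOS_Sockets.h. */
typedef void * Socket_t;
typedef void * TaskHandle_t;
typedef long BaseType_t;

static int iNotifyCount = 0; /* counts notifications, for testing only */
static void xTaskNotifyGive( TaskHandle_t xTask ) { ( void ) xTask; iNotifyCount++; }

static TaskHandle_t xNetworkTaskHandle; /* set when the network task is created */

/* Runs in the IP task's context, so it does nothing but wake the worker
 * task. Returning 0 (pdFALSE) means "not consumed", leaving the datagram
 * queued on the socket for the task to collect with FreeRTOS_recvfrom(). */
static BaseType_t HandleUDPPacketReceivedNotification( Socket_t xSocket,
                                                       void * pvData,
                                                       size_t uxLength,
                                                       const void * pxFrom,
                                                       const void * pxDest )
{
    ( void ) xSocket; ( void ) pvData; ( void ) uxLength;
    ( void ) pxFrom; ( void ) pxDest;

    xTaskNotifyGive( xNetworkTaskHandle );
    return 0;
}
```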

Any debugging pointers would be much appreciated.

Thanks,

Phil

Your tight loop always bears the danger of starving lower-priority tasks if the sendto function immediately returns with an error. What are your task priorities? How do you handle error conditions?

Thanks for the reply,

Priority-wise the task sits one higher than the idle task. The MAC and IP tasks themselves are higher priority.

If a send does fail it aborts the transfer and logs an error to a serial console (there’s a bit of housekeeping not shown above for brevity), which I haven’t seen trigger.

I was occasionally failing to obtain a UDP buffer, which I traced to the limited 20ms timeout the IP stack applies to the user’s requested timeout. Having increased it to 1000ms I’ve not seen any buffer allocation failures since. The behavior is the same as a send failure: the transfer is aborted until the next request from the desktop arrives.

It feels like I’ve triggered some strange race or deadlock inside the network stack itself. I tried with a debug build; it’s slower as you’d expect, but it doesn’t seem to lock up. Pausing a release build after the stack has got stuck, everything looks normal: all the tasks are just waiting on their respective queues or notifications, but with no further communication.

Thanks,

Phil

Minor update,

It appears it’s just the receive path that’s broken.

Once ping replies and UDP request replies stop I can still see ARP announcements from the device.

If I attempt to periodically send a UDP packet to the desktop or another device, I can see the ARP requests going out. The remote device replies to the ARP request but the MCU doesn’t appear to receive it. It then just sits timing out the ARP cache and placing the same request over and over.

I think I’ve found the root cause, though I’m not quite sure how to fix it yet. @RAc was right, it is a resource starvation issue, somewhat caused by me, which the stack seemingly can’t recover from.

Had a deeper read through the reference manual’s description of the RX DMA ring buffer and the driver implementation: when a packet is received, HAL_ETH_RxAllocateCallback is called to allocate a new DMA buffer.

In my case while attempting to fill the TX queue, this callback can sometimes fail to retrieve a buffer from the allocator.

I’ve 64 statically allocated buffers, with a 32-deep RX ring buffer and a 32-deep TX ring buffer.

When idle, a call to uxGetNumberOfFreeNetworkBuffers returns 32, as expected: of the 64 buffers in total, 32 are assigned to the Ethernet RX path waiting for inbound packets, leaving 32 free.

After each RX in HAL_ETH_ReadData, the private function ETH_UpdateDescriptor is called, which in turn calls HAL_ETH_RxAllocateCallback for each DMA descriptor that needs a new buffer.

Aside from when the controller is starting up ETH_UpdateDescriptor is only called following a successful RX.

Therefore, after each of the 32 buffers completes reception (say from someone pinging the unit, ARPs, etc.), if new buffers can’t be acquired the system deadlocks, never calling ETH_UpdateDescriptor again to retry acquiring a buffer.

As a quick test I made ETH_UpdateDescriptor public and began calling it from the main loop of the MAC task whenever a buffer allocation had failed, as indicated by a new flag hacked into HAL_ETH_RxAllocateCallback. The function itself has protection against being called at the wrong time.
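In sketch form, the hack looks something like this. The allocator and descriptor refill are stubbed as counters so the control flow stands alone; in the real code HAL_ETH_RxAllocateCallback lives in NetworkInterface.c and ETH_UpdateDescriptor in the HAL Ethernet driver.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

static volatile bool xRxAllocFailed = false; /* the flag hacked into the callback */

/* Stand-ins so the sketch compiles standalone. */
static int iFreeBuffers = 0;       /* pretend pool level (empty here) */
static int iUpdateCalls = 0;       /* counts descriptor refill attempts */
static uint8_t ucBufferMem[ 1536 ];

static uint8_t * prvAllocatorGetBuffer( void )
{
    if( iFreeBuffers > 0 ) { iFreeBuffers--; return ucBufferMem; }
    return NULL;
}

static void ETH_UpdateDescriptor( void ) { iUpdateCalls++; } /* would refill the RX ring */

/* Called by the HAL for each RX descriptor that needs a fresh buffer. */
void HAL_ETH_RxAllocateCallback( uint8_t ** ppucBuff )
{
    *ppucBuff = prvAllocatorGetBuffer();

    if( *ppucBuff == NULL )
    {
        xRxAllocFailed = true; /* remember that the ring is short a buffer */
    }
}

/* Added to the MAC task's main loop: once buffers may have been returned
 * to the pool, retry the refill. ETH_UpdateDescriptor guards itself
 * against being called at the wrong time. */
void vMacTaskPoll( void )
{
    if( xRxAllocFailed )
    {
        xRxAllocFailed = false;
        ETH_UpdateDescriptor();
    }
}
```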

This got things working again: I can hammer the unit with pings while streaming UDP packets off it, although as expected not every ping gets a reply.

How to fix it properly, ideally without having to maintain my own fork of the IP stack, is the question now.

I clearly need to tweak my application to be slightly less aggressive, or at least get it to deal with (or drop) any inbound UDP packets that happen to be pending, taking up precious buffers.

At the same time, I’d like to know that I’m keeping the TX DMA fully fed with packets. While ideally not starving the RX DMA from replacing its used buffers.

Is it because 32 buffers are not enough to keep up with the incoming traffic? Does increasing the number of buffers help?

That is likely not the right thing to do.

Thanks for taking a look.

When it comes to sending UDP as fast as possible the problem appears to be due to the shared TX/RX buffer pool in combination with a lack of back pressure from the stack.

For example, with my pool of 64 buffers and 32 x RX / 32 x TX DMA descriptor rings.

Up to 32 buffers may be in the TX ring awaiting transmission, with 32 buffers in the RX ring awaiting reception, so no buffers are free. When a packet comes in, the EMAC task adds it to the IP stack’s event queue, then attempts to obtain a free buffer for the RX ring. There aren’t any available, so it locks up.

If I increased the buffer pool size or reduced the RX ring size there would be free buffers in the pool. However as my application is attempting to send as fast as possible it would inadvertently consume them and queue them in the IP stack for transmission. Leading to the same problem.

What I’ve implemented for now as a solution, although not a general one and a tad ugly, is this:

BaseType_t NETWORK_INTERFACE_STM32_CheckForFreeTxDesc(TickType_t xTicksToWait)
{
    BaseType_t xResult = xSemaphoreTake(xTxDescSem, xTicksToWait);
    if (pdTRUE == xResult)
    {
        /* Return the semaphore as this function only checks for availability */
        xSemaphoreGive( xTxDescSem );
    }

    return xResult;
}

Before my application requests a buffer to prepare its next UDP packet, it checks that at least one transmit DMA descriptor is available. Thereby preventing it from consuming all of the free buffers in the pool. It’s a bit race condition prone, in that a higher priority task (the IP stack in my case) might step in and use the buffer, but limits the app from consuming all the buffers.
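In outline, the send path now looks like the sketch below. The FreeRTOS+TCP calls are stubbed as counters so the flow stands alone; xCheckForFreeTxDesc, prvGetPayloadBuffer and prvSendTo are stand-ins for my NETWORK_INTERFACE_STM32_CheckForFreeTxDesc, FreeRTOS_GetUDPPayloadBuffer_Multi and FreeRTOS_sendto respectively.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles standalone. */
static int iTxDescsFree = 0;     /* pretend TX DMA descriptor availability */
static int iBuffersClaimed = 0;  /* counts pool buffers handed to the app */
static uint8_t ucPayloadMem[ 1472 ];

static bool xCheckForFreeTxDesc( void ) { return iTxDescsFree > 0; }
static uint8_t * prvGetPayloadBuffer( void ) { iBuffersClaimed++; return ucPayloadMem; }
static void prvSendTo( uint8_t * pucBuf ) { ( void ) pucBuf; iTxDescsFree--; }

/* Only claim a pool buffer once a TX descriptor is free to carry it, so
 * the application can never drain the pool faster than the MAC empties it. */
bool xSendNextBlock( void )
{
    if( !xCheckForFreeTxDesc() )
    {
        return false; /* back-pressure: leave the pool buffers for RX refills */
    }

    uint8_t * pucDataOut = prvGetPayloadBuffer();
    /* ...encode the response into pucDataOut... */
    prvSendTo( pucDataOut );
    return true;
}
```

As noted above this is race-prone (another task may claim the descriptor between the check and the send), but it bounds how many buffers the application can hold at once.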

I’ve also reduced the RX DMA ring size from 32 to 8, helping to ensure there are free buffers in the pool, while also setting ipconfigUDP_MAX_RX_PACKETS to limit the number of inbound buffers waiting on my UDP socket.

There is still a chance that the pool may be exhausted, causing the lockup, but I’ve not been able to trigger it after tens of gigabytes of transfers from the unit with continuous 100ms pings.

As I’ve been tinkering with the stack, I’m tempted to keep the RX recovery mechanism in place as a failsafe. I’ve certainly seen the issue while debugging before, but couldn’t work out the root cause at the time.

I haven’t looked at EMAC drivers for RTOSes in a while, but if that’s the way it is implemented, it would be a bug in the driver and needs to be fixed. There is always the possibility of DMA RX buffers overflowing, e.g. in the case of broadcast storms. If a driver isn’t able to recover gracefully from that (by simply dropping the frame and indicating some kind of overflow condition somewhere), then any software utilizing that driver would frequently and reliably fail in the field.

In zero-copy systems, the responsibility to release the buffers may effectively be delegated to the application. At the end of the day, some part of the system must somehow ensure that ingress packets eventually have a chance to travel into the software (unless of course the ingress data rate constantly and continuously exceeds the device’s processing capability, in which case there is no solution to the problem but to employ target hardware with higher processing capability).

Just in case our fairly complex project had some other hidden gremlins, I setup a project on an ST NUCLEO dev board.

The fault’s really easily reproducible now and explains how I triggered it before while debugging.

  • Setup a UDP socket that’s listening, with no queued datagram limit.
    • Don’t read from the socket; let the queue build.
  • Send UDP packets to the socket from the host until it stops responding to pings.
    • All buffers, or near enough all of them, should now be waiting in the UDP socket queue.
  • Repeatedly read the UDP socket to free all buffers.
  • The stack is dead: reception never resumes (indicated by ping continuing to fail) despite there now being buffers available.

I’ve pushed this to GitHub here: GitHub - pgreenland/Nucleo-H7_FreeRTOS_TCP at udp_hoard_buffers

There’s notes in the README on how it behaves….essentially as above, with the UDP reads triggered by the onboard button.

My workaround for now is here: Retry RX buffer allocation on failure · pgreenland/FreeRTOS-Plus-TCP@d31d519 · GitHub - without editing ST’s HAL driver, it seemed to be the best way to deal with it.

I also found a second issue, having enabled some of the stack debug while tracking down the above.

ST’s upstream HAL driver has a fix for incorrect DMA reception tail pointer management, discussed here, with commit that fixes it here.

In our project I was using the upstream ST driver, as it was newer… and newer is better, right?

However while running my high load tests with debug enabled I frequently see: “vReleaseNetworkBufferAndDescriptor: 0x20006e70 ALREADY RELEASED (now 32)” from the allocator. Which eventually ends up with corrupt transmissions and a progressively broken IP stack.

In NetworkInterface.c for the STM32, HAL_ETH_RxAllocateCallback should set the pointer it’s passed to NULL if it can’t get a buffer; it doesn’t. In my high-load case ETH_UpdateDescriptor needs to re-allocate multiple used buffers in a single call. Provided the first allocation succeeds, a failure on the second will lead to the first buffer being mistakenly reused and allocated to two or more RX ring entries at the same time.
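The shape of the fix is essentially one line: always NULL the output pointer on failure. A hedged sketch, with the pool allocator stubbed so it stands alone (in the real NetworkInterface.c the buffer comes from the stack’s network buffer pool):

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins so the sketch compiles standalone. */
static int iPoolLevel = 0;            /* pretend free-buffer count */
static uint8_t ucDmaBuffer[ 1536 ];

static uint8_t * prvGetNetworkBuffer( void ) /* stand-in for the pool allocator */
{
    if( iPoolLevel > 0 ) { iPoolLevel--; return ucDmaBuffer; }
    return NULL;
}

void HAL_ETH_RxAllocateCallback( uint8_t ** ppucBuff )
{
    uint8_t * pucNew = prvGetNetworkBuffer();

    if( pucNew != NULL )
    {
        *ppucBuff = pucNew;
    }
    else
    {
        /* The crucial part: without this, the caller reuses whatever stale
         * pointer was left in *ppucBuff, attaching the same buffer to two
         * or more RX descriptors. */
        *ppucBuff = NULL;
    }
}
```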

I’ve prepared a PR to fix this one here: Fix buffer allocation failure handling in STM32 NetworkInterface by pgreenland · Pull Request #1307 · FreeRTOS/FreeRTOS-Plus-TCP · GitHub

It happens with ST’s latest driver but not the included one, as the old code appears to effectively only allow a single one of the configured ring buffer entries to be active at a time. As such, the allocation failure is handled correctly. With the new driver, up to ring length - 1 entries may be pending reception at a time, leading to multiple re-allocations, which can trigger the fault.

The classic faults hiding faults. 300GB and counting of error free transfers overnight now though, I think I may have finally fixed it.


Hi Phil,

first of all, thank you for both your very diligent and hard work in tracing this down and for sharing your results, that is much appreciated!

I’m not that deep in the code, so this comment may be barking up the wrong tree, for which I would like to apologize beforehand, but maybe it is useful.

As I mentioned before, in zero-copy systems it is my understanding that the system and the application must cooperate to prevent memory leaks. Because the hardware assumes that a DMA buffer is not free to fill unless explicitly freed (which in zero-copy systems may be the responsibility of the application), if the application fails to release a buffer, then the driver must lock up.

I looked at the code in your github repo and believe that the infinite loop of your Example task can be reduced to this:

/* Task loop */
for (;;)
{
    int32_t iReceivedBytes = FreeRTOS_recvfrom(xSocket,
                                               &pabyRxBuffer, 0,
                                               FREERTOS_MSG_DONTWAIT | FREERTOS_ZERO_COPY,
                                               &stSourceAddress, &uiSourceAddressLength);
    if (iReceivedBytes <= 0)
    {
        /* No more packets */
        break;
    }

    /* Free the received packet */
    FreeRTOS_ReleaseUDPPayloadBuffer(pabyRxBuffer);
    pabyRxBuffer = NULL;
}

This assumes that pabyRxBuffer != 0 if AND ONLY IF iReceivedBytes > 0. Is it possible that FreeRTOS_recvfrom() returns failure (<= 0) but the buffer is still valid (in which case there would be a buffer leak)? Can you check on that?

Again, that is a shot in the dark, but I am hesitant to believe that a feature so well tested in the field as a network stack should have such an obvious bug.

Thank you for reporting these test results.

One remark, a bit off-topic, but also important: when ipconfigUDP_MAX_RX_PACKETS is defined in FreeRTOSIPConfig.h, the IP-task will make sure that a UDP socket will never contain more than ipconfigUDP_MAX_RX_PACKETS packets waiting.

There is also a socket option called FREERTOS_SO_UDP_MAX_RX_PACKETS. It lets you set the maximum waiting queue manually.

The length check in FreeRTOS_UDP_IPv4.c:

    #if ( ipconfigUDP_MAX_RX_PACKETS > 0U )
    {
        if( xReturn == pdPASS )
        {
            if( listCURRENT_LIST_LENGTH( &( pxSocket->u.xUDP.xWaitingPacketsList ) ) >= pxSocket->u.xUDP.uxMaxPackets )
            {
                FreeRTOS_debug_printf( ( "xProcessReceivedUDPPacket: buffer full %ld >= %ld port %u\n",
                                         listCURRENT_LIST_LENGTH( &( pxSocket->u.xUDP.xWaitingPacketsList ) ),
                                         pxSocket->u.xUDP.uxMaxPackets, pxSocket->usLocalPort ) );
                xReturn = pdFAIL; /* we did not consume or release the buffer */
            }
        }
    }
    #endif /* if ( ipconfigUDP_MAX_RX_PACKETS > 0U ) */

This setting, along with the limitation on TCP clients (the “backlog” parameter of FreeRTOS_listen), helps protect you against network buffer exhaustion.

Beside this, we must make sure that there will be no over- or underflow and no assert().

I will respond to “RX Buffer Starvation” in a separate post.

Hey @htibosch ,

I did see the socket option, although it looks like it’s only enabled once ipconfigUDP_MAX_RX_PACKETS is set; so, being lazy with few sockets, I set it once via the config option.

Haven’t been using TCP in this project, but it’s good to see there’s a similar option there :) and thanks for looking at my PR.

Regards,

Phil

You’re completely right: zero is a valid UDP packet length that may return a buffer. I should be checking specifically for less than zero.
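The corrected drain loop then looks like the sketch below. FreeRTOS_recvfrom and the buffer release are stubbed so it stands alone: the stub hands out a few queued datagram lengths, including an empty one, then returns -1 for “no more packets”.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for FreeRTOS_recvfrom() in zero-copy mode: hands out a buffer
 * for every queued datagram, including zero-length ones, then returns -1. */
static int32_t lQueuedLengths[] = { 100, 0, 5 };
static size_t uxNextPacket = 0;
static int iReleasedBuffers = 0;
static uint8_t ucPayload[ 100 ];

static int32_t prvRecvFromZeroCopy( uint8_t ** ppucBuffer )
{
    if( uxNextPacket >= sizeof( lQueuedLengths ) / sizeof( lQueuedLengths[ 0 ] ) )
    {
        *ppucBuffer = NULL;
        return -1; /* no more packets */
    }
    *ppucBuffer = ucPayload;
    return lQueuedLengths[ uxNextPacket++ ];
}

static void prvReleaseBuffer( uint8_t * pucBuffer )
{
    ( void ) pucBuffer;
    iReleasedBuffers++;
}

/* Drain the socket. A return of 0 is a valid empty datagram that still
 * owns a buffer, so only a negative return means "stop"; any non-NULL
 * buffer is always released. Returns the number of datagrams consumed. */
int iDrainSocket( void )
{
    int iCount = 0;

    for( ;; )
    {
        uint8_t * pucRxBuffer = NULL;
        int32_t lReceived = prvRecvFromZeroCopy( &pucRxBuffer );

        if( lReceived < 0 )
        {
            break; /* no more packets (or a real error) */
        }

        if( pucRxBuffer != NULL )
        {
            prvReleaseBuffer( pucRxBuffer );
        }

        iCount++;
    }

    return iCount;
}
```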

Tested it with a one-line Python script and a bit of bash to send a bunch of empty packets.

for i in `seq 64`; do python3 -c 'import socket,sys; socket.socket(socket.AF_INET,socket.SOCK_DGRAM).sendto(b"", (sys.argv[1] if len(sys.argv)>1 else "127.0.0.1", int(sys.argv[2]) if len(sys.argv)>2 else 12345))' 192.168.1.63 5001; done

Before making the change, the buffers would be exhausted in the socket; pushing the button would leak rather than free them, without reporting anything.

For my original test I was using non-zero-length packets, so the issue didn’t occur.

With your fix in place, I can send zero-length packets and still lock it up.

Example of the new output when hitting the button:

...
[    MAIN] [INFO ] [015536]: Received UDP packet from 192.168.1.79:58759, length 0
[    MAIN] [INFO ] [015543]: Received UDP packet from 192.168.1.79:60884, length 0
[    MAIN] [INFO ] [015550]: Received UDP packet from 192.168.1.79:59707, length 0
[    MAIN] [INFO ] [015557]: Received UDP packet from 192.168.1.79:54932, length 0
[    MAIN] [INFO ] [015564]: Received UDP packet from 192.168.1.79:60078, length 0
[    MAIN] [INFO ] [015572]: Received UDP packet from 192.168.1.79:58932, length 0
[    MAIN] [INFO ] [015579]: Received UDP packet from 192.168.1.79:62791, length 0
...

The problem occurs because when the RX DMA runs out of buffers and isn’t given a replacement it stalls, by design. There’s no code in the NetworkInterface driver to re-start it again when it gets into this state.

It’s not only zero-length packets. I think there may be fringe scenarios in which the function returns an error but did allocate a buffer.

You may want to add something like the following change to verify that:

if (iReceivedBytes <= 0)
{
    /* No more packets */
    ASSERT(pabyRxBuffer == 0);
    break;
}

and if you hit the assert, change the control flow such that non-zero buffers always get freed regardless of the return value of the recvfrom() function.

If that can happen I’d class that as a separate fault, a serious API design one.

Having had a read of the function, following the zero-copy route: if there is an issue with the packet in the buffer, it’s released within the API call as expected. I can’t see any way for it to return a < 0 result with a buffer.

Is there a specific section of it that looks suspicious?

It wouldn’t hurt to check whether the buffer is non-NULL on return regardless of the result, provided it was set to NULL before the call.

Expanding my full trace above, there are 57 packets queued, and I’ve got 64 buffers in the pool in total. Given I’m using the FreeRTOS+TCP-included ST HAL Ethernet driver with the DMA tail bug, there will be 7 unused buffers stuck in the RX DMA ring. When the last packet came in, there was no free buffer to replace it, so the tail didn’t get incremented and the DMA stalled.

Replacing the driver with the upstream ST HAL version and including the fix from my PR brings the queued buffers up to 63, with virtually the entire RX ring being used. The hardware ring buffer doesn’t have a counter, so it uses the old trick of always keeping one slot back to indicate the ring is full.
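That trick in miniature (a self-contained sketch, not the driver’s actual code): with head == tail meaning empty, a ring of N entries can only ever hold N - 1, because advancing head onto tail would be indistinguishable from empty.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_LEN 32u /* matches the RX descriptor ring depth */

static size_t uxHead = 0; /* next slot the producer fills */
static size_t uxTail = 0; /* next slot the consumer empties */

static bool bRingEmpty( void ) { return uxHead == uxTail; }

/* "Full" with one slot deliberately kept back. */
static bool bRingFull( void ) { return ( ( uxHead + 1u ) % RING_LEN ) == uxTail; }

static bool bRingPush( void )
{
    if( bRingFull() ) { return false; }
    uxHead = ( uxHead + 1u ) % RING_LEN;
    return true;
}

static bool bRingPop( void )
{
    if( bRingEmpty() ) { return false; }
    uxTail = ( uxTail + 1u ) % RING_LEN;
    return true;
}
```

Which matches the numbers above: a 32-entry ring tops out at 31 in-flight buffers.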

Thank you all for this discussion. @pgreenland, what is the final conclusion at this moment?

I have a remark about the number of descriptors:

Could you tell how many TX packets are claimed at once? I expect that when the CPU sends the 4th packet, the first descriptor is already available.

And likewise I would be very curious about the RX statistics: could you log the highest number of occupied RX slots?

My prediction is that 4 TX descriptors and say 15 or 20 RX descriptors are enough. And if not, we should recheck the whole model.

@pgreenland: I reread your posts and realized that the driver you are using is supplied by the ST HAL. There is a wide consensus in this forum that the ST HAL is a substandard piece of software with a number of known shortcomings, which has led many to abandon the HAL altogether.

I remember a customer installation in which the driver got out of sync with the hardware-supplied linked list of DMA buffers, failing to realize that there were available buffers to deliver to the software, which in turn caused the chip to fill up all of its buffers. That was one of the reasons we decided to let go of the HAL. I do not remember the exact sequence that led to this driver-induced lockup; very likely some race condition.

You may want to reach out to ST support to see if that may be a driver issue.

Hey @htibosch ,

The conclusion at the moment: with FreeRTOS+TCP on the STM32 you have to be careful not to exhaust the buffer pool, or RX may stall.

A mitigation is applying the queued-buffer limits we discussed; ipconfigUDP_MAX_RX_PACKETS in my case, for UDP traffic.

Without limits, all buffers could end up queued on a socket, stopping RX. If buffers are then made available, for example by consuming packets from the socket, TX becomes possible again but RX doesn’t resume on its own.

With the limits applied I haven’t seen any issues; however, as a mitigation I’ve patched resumption of RX on buffer allocation failure in my fork of FreeRTOS+TCP: Retry RX buffer allocation on failure · pgreenland/FreeRTOS-Plus-TCP@d31d519 · GitHub

Happy to submit that as a PR to the mainline, but it feels a little dirty: effectively, after a buffer allocation failure, it retries RX periodically until a buffer is allocated, allowing reception to resume.

Before the limit, I was able to trigger the fault by simply stepping through the task which consumes the UDP traffic. While paused, our host application would retry its requests until all the RX buffers were used and the IP stack stalled.

Would be good to get the ST HAL Ethernet drivers updated in the repo too, I’m happy to prepare a PR but don’t have dev boards available to test the other families.

Thanks,

Phil

The HAL isn’t the best and certainly has a lot of rough edges, but it still beats implementing every driver from scratch.

The FreeRTOS+TCP implementation for STM32 makes use of the HAL in NetworkInterface.c. It includes a copy of the HAL’s Ethernet driver for various families directly in the repo (see FreeRTOS-Plus-TCP/source/portable/NetworkInterface/STM32/Drivers at main · FreeRTOS/FreeRTOS-Plus-TCP · GitHub )

The issue isn’t in the HAL (on this occasion) but NetworkInterface.c, the integration layer between FreeRTOS+TCP and the HAL driver.

See my reply above. The issue I found can be mitigated by careful buffer management or resolved by a patch.

Hi Phil, I created a simple UDP server on an STM32H755.

A Python client will send packets of 1000 bytes each, to UDP port 7000.

The results look great:

Sending 5000 packets (1000 bytes each)...
------------------------------
Results:
Total Sent: 4882.81 KB
Time Taken: 0.4153 seconds
Avg Rate: 11.48 MB/s
Throughput: 96.31 Mbps
------------------------------

This is the UDP server code:

UDP server task:

for( ;; )
{
    uint8_t * pucReceivedUDPPayload;
    int32_t lReturned = FreeRTOS_recvfrom(xSocket,
        &pucReceivedUDPPayload,
        0,
        FREERTOS_ZERO_COPY,
        &xRxAddress,
        &xAddressLength);
    if( lReturned > 0 )
    {
        FreeRTOS_ReleaseUDPPayloadBuffer( pucReceivedUDPPayload );
    }
}

Here is the Python script that I used:

measure.zip (944 Bytes)

Mind you, beside serving UDP, the MCU had nothing to do.

EDIT: I also observe the problem that you mentioned: a ping or other non-related activity causes trouble. I will get into that.

It stopped in the macro iptraceFAILED_TO_OBTAIN_NETWORK_BUFFER.