TCP IP Stack stops working

Hello,

We have the FreeRTOS TCP Stack (v4.0.0) running on an STM32H7.

After some time, 1 day or 2, the stack stops replying (no ping reply, not possible to open a socket connection, and the debug printout stops).
During this period, there are no clients connected, so the stack only replies to ARP requests. The PHY is connected to the lab network, which is why there are many ARP requests.

The system also has other tasks running and they are not affected, no hardfault is triggered, and vApplicationMallocFailedHook is not called.

prvProcessIPEventsAndTimers still gets executed (we can breakpoint).

In the debug printout we see the network buffer count going down and eventually reaching 0.
Is this normal?

How could we debug the issue further?

Thanks
Stefano

Similar post: TCP-IP stack stops working

FreeRTOSIPConfig.h (18.8 KB)
debug_printout.zip (420.4 KB)

You wrote:

network buffer count going down and eventually reaching 0.
Is this normal?

That looks like a network buffer leak: buffers are allocated and they don’t get released, which is not normal.

One possible cause is UDP sockets: these sockets store incoming packets in Network Buffers, and when the application forgets to call FreeRTOS_recvfrom(), you get a network buffer leak.

How could we debug the issue further?

You can debug it but it is a bit of work.

I used to have a version of BufferAllocation_1.c that stored the owner of every buffer. I added an extra parameter to this function:

NetworkBufferDescriptor_t * pxGetNetworkBufferWithDescriptor(
    size_t xRequestedSizeBytes,
    TickType_t xBlockTimeTicks,
    const char *pcOwner );  // New parameter

and I added a new member to NetworkBufferDescriptor_t called:

const char *pcOwner;

Now try to compile the project, and the compiler will show you where the new parameter must be added, for instance here:

pxBufferDescriptor = pxGetNetworkBufferWithDescriptor(
    uxSize, uxBlockTimeTicks, "STM32Hxx" );

pxGetNetworkBufferWithDescriptor() sets pcOwner, and vReleaseNetworkBufferAndDescriptor() clears it (assigns NULL).

And the next step is: run the program until the problem occurs. Use the debugger to inspect this array in BufferAllocation_1.c:

static NetworkBufferDescriptor_t xNetworkBuffers[
    ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS ];

Now I would be very curious about who has allocated all network buffers.
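
The owner-tagging idea can be sketched in isolation. This is only a simplified stand-in, not the real BufferAllocation_1.c: Descriptor_t, xPool, POOL_SIZE, pxGetDescriptor(), vReleaseDescriptor() and uxCountOwnedBy() are hypothetical names invented for the illustration. It shows how a per-buffer owner string lets you count who is holding buffers once the pool runs dry.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS. */
#define POOL_SIZE    8

/* Hypothetical, heavily simplified stand-in for NetworkBufferDescriptor_t. */
typedef struct
{
    const char * pcOwner; /* NULL while the descriptor is free. */
} Descriptor_t;

/* Static storage is zero-initialised, so every descriptor starts free. */
static Descriptor_t xPool[ POOL_SIZE ];

/* Hand out the first free descriptor and record who asked for it. */
Descriptor_t * pxGetDescriptor( const char * pcOwner )
{
    assert( pcOwner != NULL );

    for( size_t i = 0; i < POOL_SIZE; i++ )
    {
        if( xPool[ i ].pcOwner == NULL )
        {
            xPool[ i ].pcOwner = pcOwner;
            return &xPool[ i ];
        }
    }

    return NULL; /* Pool exhausted. */
}

/* Give a descriptor back and clear its owner tag. */
void vReleaseDescriptor( Descriptor_t * pxDescriptor )
{
    if( pxDescriptor != NULL )
    {
        pxDescriptor->pcOwner = NULL;
    }
}

/* Count how many descriptors are currently held by pcOwner. */
size_t uxCountOwnedBy( const char * pcOwner )
{
    size_t uxCount = 0;

    for( size_t i = 0; i < POOL_SIZE; i++ )
    {
        if( ( xPool[ i ].pcOwner != NULL ) &&
            ( strcmp( xPool[ i ].pcOwner, pcOwner ) == 0 ) )
        {
            uxCount++;
        }
    }

    return uxCount;
}
```

In a debugger you would simply inspect the array; a counting helper like uxCountOwnedBy() is only useful if you prefer to print the tally from a diagnostic task.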

Is it possible that you are using the zero-copy interface and the network buffers are not released after they are received, or not given back to the stack after being obtained via FreeRTOS_GetUDPPayloadBuffer_Multi / FreeRTOS_GetUDPPayloadBuffer?

Hi @zugo83,

Please check whether the buffers are being added back to the free list. I suspect the issue is a combination of what Tony and Hein suggested: a buffer leak caused by used buffers not being added back to the free list.
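
One way to check this from the application side is to sample the current free-buffer count periodically (FreeRTOS+TCP exposes uxGetNumberOfFreeNetworkBuffers() for that) and flag the case where it never climbs back to its idle level. The sketch below is only an illustration of that idea: LeakWatch_t, vLeakWatchInit() and xLeakWatchSample() are hypothetical helpers, and the baseline and limit values are assumptions you would tune for your own system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical watcher state; not part of FreeRTOS+TCP. */
typedef struct
{
    size_t uxBaseline;   /* Free count expected while the stack is idle. */
    size_t uxLimit;      /* Consecutive low samples before reporting. */
    size_t uxLowSamples; /* Current run of samples below the baseline. */
} LeakWatch_t;

void vLeakWatchInit( LeakWatch_t * pxWatch,
                     size_t uxBaseline,
                     size_t uxLimit )
{
    assert( pxWatch != NULL );
    assert( uxLimit > 0 );

    pxWatch->uxBaseline = uxBaseline;
    pxWatch->uxLimit = uxLimit;
    pxWatch->uxLowSamples = 0;
}

/* Feed one sample of the current free count. Returns true once the count
 * has stayed below the baseline for uxLimit consecutive samples, which
 * suggests buffers are not coming back to the free list. */
bool xLeakWatchSample( LeakWatch_t * pxWatch,
                       size_t uxFreeNow )
{
    assert( pxWatch != NULL );

    if( uxFreeNow >= pxWatch->uxBaseline )
    {
        pxWatch->uxLowSamples = 0; /* Recovered: restart the run. */
    }
    else
    {
        pxWatch->uxLowSamples++;
    }

    return pxWatch->uxLowSamples >= pxWatch->uxLimit;
}
```

Sampling the current count rather than the low-water mark matters here: a minimum that once touched 0 only says the pool was briefly empty, while a current count that stays low says buffers are not being returned.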

Hi everyone and thank you very much for your suggestions.

I haven’t found the leak yet but I have some more details and questions:

  1. I tried what @htibosch suggested, instrumenting the code to check who calls pxGetNetworkBufferWithDescriptor.
    The only 2 callers I see in the full xNetworkBuffers are:
    34 times prvNetworkInterfaceInput
    27 times vNDAgeCache
    I think prvNetworkInterfaceInput is called all the time, so if it didn’t free memory the issue would appear almost immediately.
    vNDAgeCache, on the other hand, is called more sporadically, so it could be the source of the issue. (BTW there is also a comment in the source code saying that a memory leak is suspected in that function?!)

  2. About UDP sockets: the system does not use them, so that cannot be the cause, right? The issue happens while there are no open connections or sockets.

  3. @tony-josi-aws Yes we are using zero copy. I’m running an experiment to see if the issue disappears when not using it.

  4. @Shub, how can I check if buffers are added back to the free list?

  5. Even after uxMinimumFreeNetworkBuffers reaches 0, the stack still works for a while (some hours).

Would you be so kind as to test your application with the attached version of FreeRTOS_ND.c?

The change will avoid a possible network buffer leak.

And maybe you want to put a breakpoint on this function call in vNDSendNeighbourSolicitation():

if( ( pxDescriptor != NULL ) && ( xReleased == pdFALSE ) )
{
    vReleaseNetworkBufferAndDescriptor( pxDescriptor );
}
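
The pattern behind that check can be sketched on its own. This is not the actual FreeRTOS_ND.c change, just an illustration under assumed names: Buffer_t, pxAllocBuffer(), vReleaseBuffer(), xTrySend() and vSendRequest() are hypothetical. The point is that a function which allocates a buffer must release it on every path where nobody else took ownership.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_BUFFERS    4 /* Hypothetical pool size. */

/* Hypothetical buffer type: only tracks whether it is handed out. */
typedef struct
{
    bool xInUse;
} Buffer_t;

static Buffer_t xBuffers[ NUM_BUFFERS ];

Buffer_t * pxAllocBuffer( void )
{
    for( size_t i = 0; i < NUM_BUFFERS; i++ )
    {
        if( xBuffers[ i ].xInUse == false )
        {
            xBuffers[ i ].xInUse = true;
            return &xBuffers[ i ];
        }
    }

    return NULL; /* Pool exhausted. */
}

void vReleaseBuffer( Buffer_t * pxBuffer )
{
    assert( pxBuffer != NULL );
    pxBuffer->xInUse = false;
}

size_t uxFreeBuffers( void )
{
    size_t uxFree = 0;

    for( size_t i = 0; i < NUM_BUFFERS; i++ )
    {
        if( xBuffers[ i ].xInUse == false )
        {
            uxFree++;
        }
    }

    return uxFree;
}

/* Hypothetical send: takes ownership of the buffer, and therefore
 * releases it, only when it succeeds. */
bool xTrySend( Buffer_t * pxBuffer,
               bool xCanSend )
{
    if( xCanSend == false )
    {
        return false; /* Caller still owns the buffer. */
    }

    vReleaseBuffer( pxBuffer );
    return true;
}

/* Allocate, try to send, and release on the failure path as well. */
void vSendRequest( bool xCanSend )
{
    Buffer_t * pxBuffer = pxAllocBuffer();
    bool xReleased = false;

    if( pxBuffer != NULL )
    {
        xReleased = xTrySend( pxBuffer, xCanSend );
    }

    if( ( pxBuffer != NULL ) && ( xReleased == false ) )
    {
        vReleaseBuffer( pxBuffer ); /* Without this, each failure leaks one buffer. */
    }
}
```

If the final check is left out, every failed send keeps one buffer allocated, so the pool drains at a rate that depends on how often the failure path is taken.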

Hello @htibosch and happy new year!

I tried your solution and the leak seems to be fixed. I ran it under the same network conditions as before for 2 days, and the network buffers now never reach 0, and it is always possible to open a new socket.

Would you be able to tell how such a leak went unnoticed? Is it because it depends on the network traffic?

It would be nice to know if you can also reproduce it so you could validate the fix and open a PR.

Thank you

Best wishes to you too, @zugo83. Thanks for testing it. I will turn it into a PR.