+TCP buffer exhaustion STM32F7

I’m having an issue in the field with what I suspect is network buffer exhaustion, or some type of buffer leak. It takes about a week to show up. The symptoms are a slower and slower network until the network is lost completely. FreeRTOS_SendPingRequest fails, I’m thinking because it cannot obtain a buffer. The only thing that fixes the connection is a device reboot.

+TCP version is V2.3.3.
I’m using standard FreeRTOS V10.2.1, not the HAL version.
The device is an STM32F769NIH using the standard NetworkInterface and BufferAllocation_1.

Some questions then…

  1. Does closing a socket always release all pending i/o buffers?
  2. Does the HTTP server always reserve 8 (or some other defined value) buffers? If not, why would FreeRTOS_netstat always show 8 buffers in use regardless?

Here are some settings for review, do these look ok?

#define ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS		( 25 )

#define ipconfigUSE_DHCP                      1
#define ipconfigDHCP_REGISTER_HOSTNAME        1
#define ipconfigDHCP_USES_UNICAST             1
#define ipconfigUSE_DHCP_HOOK                 1

#define ipconfigTCP_WIN_SEG_COUNT 256
#define ipconfigTCP_RX_BUFFER_LENGTH			( 4 * 1460 )
#define ipconfigTCP_TX_BUFFER_LENGTH			( 2 * 1460 )
#define ipconfigTCP_KEEP_ALIVE				( 1 )
#define ipconfigTCP_KEEP_ALIVE_INTERVAL		( 20 ) 
#define ipconfigFTP_TX_BUFSIZE				( 4 * ipconfigTCP_MSS )
#define ipconfigFTP_TX_WINSIZE				( 2 )
#define ipconfigFTP_RX_BUFSIZE				( 8 * ipconfigTCP_MSS )
#define ipconfigFTP_RX_WINSIZE				( 4 )
#define ipconfigHTTP_TX_BUFSIZE				( 3 * ipconfigTCP_MSS )
#define ipconfigHTTP_TX_WINSIZE				( 2 )
#define ipconfigHTTP_RX_BUFSIZE				( 4 * ipconfigTCP_MSS )
#define ipconfigHTTP_RX_WINSIZE				( 4 )

Hi @friesen ,

Thank you for reaching out. Could you elaborate a little more on what your project does and what +TCP APIs you use? Is your project based on any existing FreeRTOS demos?

My project is essentially an audio control surface for an external DSP. The core communication with the external DSP happens in a single thread, using standard FreeRTOS TCP socket calls. When the connection is lost, the socket is closed and reconnection is attempted. My testing bandwidth is rx of around 12k bytes/second; around 6k would probably be more typical.

There is also an HTTP web server built on top of the FreeRTOS HTTP example, a UDP discovery tool using a single bound listen socket, an SNTP tool, and a CLI server on TCP that handles a single socket at a time.

I’m not finding any non-recv issue, although I suppose some type of burst could be problematic in some unknown way. However, recycling the sockets should clear those buffers, which doesn’t appear to be happening.

Review this one, any comments? Note that lengths on this capture are not correct.

Questions I have are:

  1. Is my capture code somewhat correct?
  2. Any thoughts about why there seem to be a minimum of 8 continual buffers in use at all times?
  3. If there is a leaked buffer, will it move around the way this dump code works?
  4. These LLC protocol packets look suspicious to me; they appear to be static/leaked. Comments?

buffers (8).zip (3.5 KB)


My capture code is like this

uint8_t *getNetBuffCapture(int32_t *length) {
  extern NetworkBufferDescriptor_t *debug_xNetworkBuffers(void);
  extern int isBufferFree(NetworkBufferDescriptor_t *pxNetworkBuffer);
  int i;
  NetworkBufferDescriptor_t * bd = debug_xNetworkBuffers();
  pcap_hdr_t GlobalHeader;
  pcaprec_hdr_t Hdr;
  GlobalHeader.magic_number = 0xA1B2C3D4;
  GlobalHeader.version_major = 2;
  GlobalHeader.version_minor = 4;
  GlobalHeader.thiszone = 0;
  GlobalHeader.sigfigs = 0;
  GlobalHeader.snaplen = 65535;
  GlobalHeader.network = 1; // LINKTYPE_ETHERNET
  // Room for the global header plus, per descriptor, one record header and one full frame
  uint8_t *buf = x_malloc(sizeof(GlobalHeader) + ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS * (sizeof(Hdr) + ETH_MAX_PACKET_SIZE), XMEM_HEAP_SDRAM);

  *length = 0; // do not rely on the caller having zeroed the length
  if (buf == NULL) {
    return NULL; // allocation failed, nothing captured
  }
  memcpy((void *)buf, &GlobalHeader, sizeof(GlobalHeader));
  *length += sizeof(GlobalHeader);
  // Iterate through the descriptors and dump every buffer that is not on the free list
  time_t StartTime = time(NULL);
  unsigned int nBytes;
  Hdr.ts_sec = StartTime;
  for (i = 0; i < ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS; i++) {
    if (!isBufferFree(&bd[i])) {
      nBytes = bd[i].xDataLength;
      if (nBytes > ETH_MAX_PACKET_SIZE) {
        nBytes = ETH_MAX_PACKET_SIZE; // never copy more than the space reserved per buffer
      }
      Hdr.ts_usec = i; // buffer index, stored in the timestamp field so it is visible in Wireshark
      Hdr.orig_len = nBytes;
      Hdr.incl_len = nBytes;
      memcpy(&buf[*length], &Hdr, sizeof(Hdr));
      *length += sizeof(Hdr);
      memcpy(&buf[*length], bd[i].pucEthernetBuffer, nBytes);
      *length += nBytes;
    }
  }

  return buf;
}
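
The capture code above uses pcap_hdr_t and pcaprec_hdr_t without showing their definitions. As a minimal sketch, assuming the classic libpcap file layout (24-byte global header, 16-byte per-record header), they could look like the following. The helper pcapAppendRecord is hypothetical and not part of the original code; it only illustrates clamping each record to the space that is left in the output buffer.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Classic libpcap global file header, 24 bytes */
typedef struct {
    uint32_t magic_number;   /* 0xA1B2C3D4, written in the writer's byte order */
    uint16_t version_major;  /* 2 */
    uint16_t version_minor;  /* 4 */
    int32_t  thiszone;       /* GMT to local correction, normally 0 */
    uint32_t sigfigs;        /* timestamp accuracy, normally 0 */
    uint32_t snaplen;        /* maximum stored length of a packet */
    uint32_t network;        /* link type, 1 = LINKTYPE_ETHERNET */
} pcap_hdr_t;

/* Classic libpcap per-record header, 16 bytes */
typedef struct {
    uint32_t ts_sec;         /* timestamp, seconds */
    uint32_t ts_usec;        /* timestamp, microseconds */
    uint32_t incl_len;       /* number of bytes stored in the file */
    uint32_t orig_len;       /* length of the packet on the wire */
} pcaprec_hdr_t;

/* Hypothetical helper: append one record at 'offset', storing at most the
 * bytes that still fit in 'capacity'. Returns the new offset, or the old
 * offset unchanged when not even the record header fits. */
static size_t pcapAppendRecord(uint8_t *out, size_t offset, size_t capacity,
                               uint32_t seconds, uint32_t index,
                               const uint8_t *frame, uint32_t frameLen)
{
    pcaprec_hdr_t hdr;

    if (capacity < offset || capacity - offset < sizeof(hdr)) {
        return offset;
    }
    size_t room = capacity - offset - sizeof(hdr);
    uint32_t stored = (frameLen <= room) ? frameLen : (uint32_t)room;

    hdr.ts_sec = seconds;
    hdr.ts_usec = index;      /* buffer index, as in the capture code above */
    hdr.incl_len = stored;    /* what is really in the file */
    hdr.orig_len = frameLen;  /* what the descriptor claimed */
    memcpy(&out[offset], &hdr, sizeof(hdr));
    memcpy(&out[offset + sizeof(hdr)], frame, stored);
    return offset + sizeof(hdr) + stored;
}
```

Keeping incl_len and orig_len separate lets Wireshark flag a truncated record instead of misreading the bytes that follow it.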

The shims in BufferAllocation_1.c are as follows

NetworkBufferDescriptor_t *debug_xNetworkBuffers(void) {
  return xNetworkBuffers;
}

int isBufferFree(NetworkBufferDescriptor_t *pxNetworkBuffer) {
  return listIS_CONTAINED_WITHIN(&xFreeBuffersList, &(pxNetworkBuffer->xBufferListItem));
}

Hello @friesen,

Can you enable debugging on the device to see what exactly goes wrong here? Because guessing what is happening might take a long time to debug. To expedite the process, maybe you can reduce the value of ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS to 5 (or maybe even less)?

Is there a public repository which has your code?

And to answer your question about

  1. Does closing a socket always release all pending i/o buffers?

Yes, it does. All buffers are released back into the pool of free network buffers.

In the meanwhile, I’ll take a look at the HTTP server example to understand your problems better.

Regards,
Aniruddha

Also, can you point me to the example of FreeRTOS http server that you are using so that we are on the same page?

Thanks,

Mine is rather custom. So far I have this debugged down to what I suspect is an issue with the STM32F7 NetworkInterface.c

Use any example, doesn’t matter.

By the code, these packets should never make it into a reserved buffer. I think it’s happening in prvNetworkInterfaceInput.

This example is LLC, but I think any unhandled type will do the same. It doesn’t happen every time, but after about 20 tries it’ll save one.

Here is my test code; you’ll have to compile it with gcc on a Linux machine.

sendRawEth.zip (1.3 KB)

Thanks for this information. I am very curious to know what happens if you reduce the number of network buffers. That will allow us to pinpoint the cause better.

The unhandled types of packets (such as LLC) are dropped as you can see here. And in case any packet is accepted (IPv4 or ARP), and sent further up for processing, if there is no socket corresponding to the IP-port number tuple, then that packet will also be dropped.

EDIT: In case of ST, the packet will be rejected even before it reaches the IP-task here.

Hello Erik,

Thinking back, there was another case similar to yours. That person’s code would run fine for a while and then suddenly stop. The issue there was an unhandled interrupt which kept triggering. STM chips have a counter which counts frames that are sent/received. This counter triggers an interrupt when it is half full. It might be useful in some cases - however if left unhandled it can wreak havoc in the system. Thus, it might also be a good idea to mask counter interrupts.

You should be able to do that by adding the below piece of code here:

heth->Instance->MMCRIMR = ETH_MMCRIMR_RGUFM | ETH_MMCRIMR_RFAEM | ETH_MMCRIMR_RFCEM;
heth->Instance->MMCTIMR = ETH_MMCTIMR_TGFM | ETH_MMCTIMR_TGFMSCM | ETH_MMCTIMR_TGFSCM;

Let me know if that works for you.

I am guessing that your issue is different since you notice a gradual reduction in network performance instead of a sudden stop. Still it would be beneficial to add this piece of code to your driver. I shall create a PR for this soon.

Regards,
Aniruddha

What is reducing the number of buffers going to tell me?

It’s somewhat random, but I can see these buffers getting occupied and staying so. Any comments on how this is possible, seeing as these are rejected?

What is the purpose of the __DSB(); in the code? Is there absolutely no way this code can fall through and reserve a buffer on a xAccepted = false?

It is going to allow you to reach the “no-available-buffer” condition faster.

If you can see that buffers are getting occupied, can you see what kind of data they have? Can you print a buffer number when it is allocated and when it is freed? I am not sure what exactly the data in the buffers is or why they are being occupied; it is hard to tell without more details.

It is a data barrier which makes sure that all memory access has completed before the next instruction in program order executes. You can see more detail here.

I took a quick look at the code and think that there should not be a way to reserve a buffer when xAccepted is false.

This code more reliably causes the issue when repeated about 1000 times. Call like this ./sendRawEthRpt eth0 1000
sendRawEthRpt.zip (1.3 KB)

I don’t think it’s happening because xAccepted is true; rather, there is some subtle fallthrough somewhere reserving these in the list, or misplacing them with another packet.

I already know what is getting in the buffer: it is LLC packets in this case. The above Wireshark capture is a dump of the reserved buffers; it’s not a standard Wireshark capture.

Can you put breakpoints in the code? That would allow you to step through the code and see why the packet is not being dropped. It would help to pinpoint the exact location that is messing up the logic.

The packets are being dropped, yes I can put breakpoints in my code, or what? Even though the packets are dropped, they end up occupying the buffer.

Arg. I had left the 2.0.11 NetworkInterface.c driver in place due to a wireshark shim.

It would be nice if pertinent comments could somehow bubble up to the FreeRTOS repo. There is no mention of this fix or why it was made, but it is important. Otherwise, accepted packets followed by rejects will cause this issue. Any comments? @htibosch

+TCP V2.3.3 has the fix that you mentioned in the above PR, as that PR was merged in January whereas V2.3.3 was released in July.

Yes, I would follow the packet by stepping through the code to see why the occupied buffer is not being freed.

Hi Erik, I am sorry that I am dropping into this discussion so late. We are on holiday and my laptop has been closed for a long time. Aniruddha wrote me a text, asking me to have a look at your conversation so far.

Before I go on, Aniruddha, thank you very much for investigating this problem. I would have had very much the same questions and suggestions.

Erik, would you mind attaching the following files to your post:

stm32fxx_hal_eth.h
FreeRTOSConfig.h
FreeRTOSIPConfig.h

You wrote:

+TCP version is V2.3.3.
I’m using standard FreeRTOS V10.2.1, not the HAL version.

You write “not the HAL version”. Are you using ST’s SPL ?
And if not, what do you mean?

Does closing a socket always release all pending i/o buffers?

and Aniruddha wrote:

Yes, it does. All buffers are released back into the pool of free network buffers.

To be more precise: a UDP socket can hold network buffers. A TCP socket has two stream buffers, but it cannot hold or possess a network buffer.

So when you think there is a leakage of network buffers, pay extra attention to UDP sockets. They have a List_t of network buffers containing received packets.

Does the HTTP server always reserve 8(or some other defined value) buffers, or why would FreeRTOS_netstat always show 8 buffers in use regardless?

The HTTP server doesn’t need network buffers directly, as I wrote here above. TCP sockets have stream buffers.

However, recycling the sockets should clear those buffers

What do you mean by “recycling”? Calling closesocket() and socket()?

Questions I have are:

Is my capture code somewhat correct?

It is difficult to say if the code for creating a PCAP is totally correct.

What I have often done is record who is allocating each Network Buffer. By “who” I mean the function from which it was allocated.
Now when I run out of buffers, I can inspect all network buffers and see from which function they were allocated. That method has helped me several times.
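
A minimal sketch of that bookkeeping idea, written as a standalone side table so it can be tried on a host machine. All names here (bufferOwner, trackAlloc, trackFree, countOwnedBy, dumpOwners) are hypothetical and not part of FreeRTOS+TCP. The sketch assumes each buffer can be identified by its index in the descriptor array, and that the calls would be added by hand next to the places where buffers are obtained and released.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same count as ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS in this thread */
#define TRACKED_BUFFERS 25

/* Hypothetical side table: name of the allocating function, NULL when free */
static const char *bufferOwner[TRACKED_BUFFERS];

/* Record that buffer 'index' was allocated by 'who' (e.g. pass __func__) */
static void trackAlloc(size_t index, const char *who)
{
    if (index < TRACKED_BUFFERS) {
        bufferOwner[index] = who;
    }
}

/* Record that buffer 'index' went back to the free pool */
static void trackFree(size_t index)
{
    if (index < TRACKED_BUFFERS) {
        bufferOwner[index] = NULL;
    }
}

/* Count how many buffers are currently held by a given function name */
static size_t countOwnedBy(const char *who)
{
    size_t n = 0;
    for (size_t i = 0; i < TRACKED_BUFFERS; i++) {
        if (bufferOwner[i] != NULL && strcmp(bufferOwner[i], who) == 0) {
            n++;
        }
    }
    return n;
}

/* Print every buffer that is still held, with its recorded owner */
static void dumpOwners(void)
{
    for (size_t i = 0; i < TRACKED_BUFFERS; i++) {
        if (bufferOwner[i] != NULL) {
            printf("buffer %u held by %s\n", (unsigned)i, bufferOwner[i]);
        }
    }
}
```

When the pool runs dry, calling dumpOwners() shows which call sites hold the outstanding buffers. A table like this only sees buffers that pass through the instrumented allocation path.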

Any thoughts about why there seem to be a minimum of 8 continual buffers in use at all times?

If you configured your driver to be zero-copy, a certain number of Network Buffers are constantly allocated. Their packet buffers are assigned to RX DMA. See macro ETH_RXBUFNB.
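
As a rough illustration of what that means for the buffer budget, the sketch below subtracts the buffers held by RX DMA from the total. The value 8 is an assumption standing in for ETH_RXBUFNB, chosen only because it matches the 8 buffers reported by FreeRTOS_netstat in this thread; the real value has to be read from stm32fxx_hal_eth.h. The function name is hypothetical.

```c
/* ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS from the first post */
#define NUM_NETWORK_BUFFERS 25u

/* ASSUMPTION: stands in for ETH_RXBUFNB, check stm32fxx_hal_eth.h */
#define RX_DMA_BUFFERS 8u

/* Buffers left for the rest of the stack once RX DMA holds its share */
static unsigned buffersLeftForStack(unsigned total, unsigned heldByRxDma)
{
    return (total > heldByRxDma) ? (total - heldByRxDma) : 0u;
}
```

Under these assumptions, 17 of the 25 buffers remain for everything else, and the 8 buffers that are always in use are expected, not leaked.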

If there is a leaked buffer, will it move around the way this dump code works?

Yes the idea sounds good. You dump the contents of the buffers that are allocated?

The PCAP does look strange and corrupted to me.

It’s somewhat random, but I can see these buffers getting occupied and staying so

And what are the contents of those buffers? Are those all TCP packets? Or UDP?

Regards,

Some notes and clarifications.

I did re-find the bug, and the problem was found and fixed. The above-mentioned “Move local variables etc” commit is the fix. I’m not sure if it’s possible, but I’d like to suggest changing the commit message to indicate that it fixes a buffer leak, as I had reviewed the commit messages and missed this one. Without this change, buffers leak when good packets are followed by unhandled packets in the same handling loop.
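
The real fix lives in the STM32Fxx NetworkInterface.c. What follows is only a hypothetical, host-runnable simulation of one way this class of leak can arise, not the actual driver code. It assumes that the variable in question is a “replacement buffer” reference declared outside the receive loop, so that a rejected frame inherits the value left behind by the previous accepted frame. All names (Sim_t, simReceive, and so on) are made up for the illustration.

```c
#include <stdbool.h>

#define POOL_SIZE 8   /* hypothetical number of network buffers */
#define RING_SIZE 4   /* hypothetical number of RX DMA slots */

typedef struct {
    bool inUse[POOL_SIZE];  /* true while a buffer is allocated */
    int  ring[RING_SIZE];   /* buffer index owned by each RX DMA slot */
} Sim_t;

/* Take the first free buffer from the pool, or return -1 */
static int simAlloc(Sim_t *s)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!s->inUse[i]) {
            s->inUse[i] = true;
            return i;
        }
    }
    return -1;
}

/* Empty pool, then give every RX DMA slot its own buffer */
static void simInit(Sim_t *s)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        s->inUse[i] = false;
    }
    for (int slot = 0; slot < RING_SIZE; slot++) {
        s->ring[slot] = simAlloc(s);
    }
}

/* Handle one frame per slot. accepted[slot] is the packet filter result.
 * resetPerFrame == false models the stale variable, true models the fix. */
static void simReceive(Sim_t *s, const bool *accepted, bool resetPerFrame)
{
    int newBuf = -1;  /* declared outside the loop */

    for (int slot = 0; slot < RING_SIZE; slot++) {
        if (resetPerFrame) {
            newBuf = -1;  /* fresh state for every frame */
        }
        if (accepted[slot]) {
            newBuf = simAlloc(s);
            if (newBuf >= 0) {
                /* Filled buffer goes to the stack, which frees it when done */
                s->inUse[s->ring[slot]] = false;
            }
        }
        if (newBuf >= 0) {
            /* With a stale newBuf, a rejected frame's buffer is orphaned here */
            s->ring[slot] = newBuf;
        }
    }
}

/* Count buffers that are allocated but no longer referenced by any slot */
static int simLeaked(const Sim_t *s)
{
    int leaked = 0;
    for (int i = 0; i < POOL_SIZE; i++) {
        bool referenced = false;
        for (int slot = 0; slot < RING_SIZE; slot++) {
            if (s->ring[slot] == i) {
                referenced = true;
            }
        }
        if (s->inUse[i] && !referenced) {
            leaked++;
        }
    }
    return leaked;
}
```

In this model, one accepted frame followed by three rejected frames orphans three buffers when the variable is stale, and none when it is reset per frame. No allocation call is made for the rejected frames, which is consistent with buffers becoming occupied without pxGetNetworkBufferWithDescriptor being called for them.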

You write “not the HAL version”. Are you using ST’s SPL ?
And if not, what do you mean?

I’m using the stock FreeRTOS APIs and code, not the ST HAL, CMSIS, etc.

With the nature of this bug, this wouldn’t have worked, because pxGetNetworkBufferWithDescriptor was never called, yet these buffers were getting occupied.

Yes, it’s pretty strange, because it’s dumping the raw contents of the non-free buffers, so the lengths aren’t correct without further work, and there is no TCP continuity. I was only using Wireshark as a tool here to view what was occupying those buffers. It took about a day of running before these started to show up. The only thing I don’t really understand here is why there are always 8 buffers reserved at all times. The packets in those 8 are always changing, but they are always non-free.

Ah, the problem was solved already by using the newer version. There was so much text, that I failed to see that. You might want to change the status of this post to solved, we can not do that for you.

That problem existed for only a very short time. It was introduced by a PR about complexity or MISRA (not sure which), and as I use STM32Fxx very often, I discovered it quickly.

Anyway, I hope that your application now works smoothly. I often work in the same field: audio, DSP’s, sound filtering, streaming audio. Also, I am a fan of HTML user interfaces running JS/JQuery.

Thanks