Mutex issue when receiving multiple frames

I’m having some difficulty where FreeRTOS+TCP hangs when trying to transmit an ARP reply frame, because the mutex for the Ethernet driver is still held by the code that is in the middle of receiving multiple frames.

I am using a KSZ8851 Ethernet chip with my own driver, and have a FreeRTOS mutex controlling access to the chip. When receiving frames, it goes through the following steps:

  • Take Mutex
  • Get number of Frames
  • Get Frame, and give to the network stack with xSendEventStructToIPTask
    • Repeat the above until there are no more frames
  • Reset interrupt
  • Release Mutex

This process runs in its own task and waits for a notification from an interrupt line that is triggered by the KSZ8851 when a frame arrives.

I haven’t had any issues with this, but I’ve been using a minimal network setup (either just a switch and a couple of devices with static IPs, or a simple router and 3 or 4 devices). Now I’m working from home, and the device is plugged into my home network, with all the associated traffic. I suppose this is an excellent test of the system.

In my case, on a regular basis, the router (a Netgear R6400v2) will send a bunch of ARP requests to its list of devices. These are shown in Wireshark as “Who has 192.168.1.x? Tell 192.168.1.1”. This flood of packets comes through the stack on my device very quickly (I get two calls to the interrupt, one with 10 packets in the queue and the other with 7), and when it gets to the one for the device’s IP (.19), it attempts to send an ARP reply. Then the IP thread hangs, waiting for the mutex in the transmit path, and the receive thread hangs partway through the receive and never continues the receive process…

The two processes (receive packets and transmit packets) are in different threads, so at first glance this should resolve normally (isn’t that what mutexes are designed for?). But looking at it again, the receive thread calls pxGetNetworkBufferWithDescriptor and the code is hanging there (probably on the line if( xSemaphoreTake( xNetworkBufferSemaphore, xBlockTimeTicks ) == pdPASS ) ?). I do have the timeout set to portMAX_DELAY.

Is there some way to prevent this deadlock without completely redoing the program flow in the network driver?

Thanks for any help.

You ran out of netbufs. I’m afraid there is no clean way to escape, especially with many ARPs, except increasing the number of available netbufs or just dropping Ethernet/ARP packets by specifying a reasonable timeout for the netbuf budget semaphore. I had a similar problem last time and did both to mitigate the ARP burst issue.
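
To illustrate the "drop instead of block" idea, here is a minimal, portable sketch. It is only a model, not FreeRTOS code: buffer_pool_t, pool_try_take, rx_stats_t and receive_burst are hypothetical names, and a plain counter stands in for the netbuf counting semaphore. In a real driver the equivalent would presumably be calling pxGetNetworkBufferWithDescriptor with a finite block time instead of portMAX_DELAY and discarding the frame when it returns NULL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a fixed pool of network buffers. In FreeRTOS+TCP the
 * pool is guarded by a counting semaphore; a plain counter stands in here. */
typedef struct {
    size_t free_buffers; /* buffers currently available */
} buffer_pool_t;

/* Non-blocking allocation: models asking for a buffer with a zero or short
 * timeout rather than portMAX_DELAY. Returns false when the pool is empty. */
static bool pool_try_take(buffer_pool_t *pool)
{
    if (pool->free_buffers == 0) {
        return false;
    }
    pool->free_buffers--;
    return true;
}

typedef struct {
    size_t delivered; /* frames handed to the IP task */
    size_t dropped;   /* frames discarded because no buffer was free */
} rx_stats_t;

/* Drain frames_pending frames from the chip. A frame with no buffer available
 * is dropped, so the loop always finishes and the driver mutex can always be
 * released, which lets the transmit side make progress. */
static rx_stats_t receive_burst(buffer_pool_t *pool, size_t frames_pending)
{
    rx_stats_t stats = { 0, 0 };

    /* (real driver: take the driver mutex here) */
    for (size_t i = 0; i < frames_pending; i++) {
        if (pool_try_take(pool)) {
            stats.delivered++; /* real driver: copy frame, send to IP task */
        } else {
            stats.dropped++;   /* real driver: flush the frame from the chip */
        }
    }
    /* (real driver: reset the interrupt, release the driver mutex) */

    /* Every pending frame was either delivered or dropped. */
    assert(stats.delivered + stats.dropped == frames_pending);
    return stats;
}
```

With a pool of 4 free buffers and a burst of 10 frames, this model delivers 4 and drops 6 instead of stalling on the fifth. Dropped ARP requests are normally harmless, since the sender will ask again.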

Thanks for the heads up. After I sent the email and had a chance to mull it over, I had an inkling that it might be that.