I’m having a problem where FreeRTOS+TCP hangs when trying to transmit an ARP reply frame, because the mutex for the Ethernet driver is still held by the code that is in the middle of receiving multiple frames.
I am using a KSZ8851 Ethernet chip with my own driver, and have a FreeRTOS mutex controlling access to the chip. When receiving frames, it goes through the following steps:
- Take Mutex
- Get number of Frames
- Get Frame, and give to the network stack with xSendEventStructToIPTask
- Repeat the above until there are no more frames
- Reset interrupt
- Release Mutex
This process runs in its own task, which waits for a notification from an interrupt line that is triggered by the KSZ8851 when a frame arrives.
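The receive steps listed above can be sketched as a small self-contained model. This is not the actual driver code: `driver_model_t` and the helper functions are hypothetical stand-ins for the FreeRTOS mutex calls, the KSZ8851 register accesses and xSendEventStructToIPTask, so that the flow can be read and compiled on its own.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the driver state, so the sketch compiles on its own.
 * In the real driver these fields correspond to the FreeRTOS mutex, the
 * KSZ8851 receive frame count and the chip's interrupt status. */
typedef struct {
    bool chip_mutex_held;
    int  frames_in_chip;
    int  frames_sent_to_stack;
    bool interrupt_pending;
} driver_model_t;

/* Stand-in for taking the driver mutex (xSemaphoreTake in the real code). */
static void take_chip_mutex(driver_model_t *d)
{
    assert(!d->chip_mutex_held);
    d->chip_mutex_held = true;
}

/* Stand-in for releasing the driver mutex (xSemaphoreGive in the real code). */
static void give_chip_mutex(driver_model_t *d)
{
    assert(d->chip_mutex_held);
    d->chip_mutex_held = false;
}

/* Stand-in for reading one frame from the chip and handing it to the
 * stack (xSendEventStructToIPTask in the real code). */
static void read_frame_and_send_to_stack(driver_model_t *d)
{
    d->frames_in_chip--;
    d->frames_sent_to_stack++;
}

/* One pass of the receive task, run after the interrupt notification.
 * Returns the number of frames handed to the stack. */
static int receive_pass(driver_model_t *d)
{
    int handled = 0;

    take_chip_mutex(d);                    /* Take Mutex */
    while (d->frames_in_chip > 0) {        /* Get number of Frames */
        read_frame_and_send_to_stack(d);   /* Get Frame, give to stack */
        handled++;
    }
    d->interrupt_pending = false;          /* Reset interrupt */
    give_chip_mutex(d);                    /* Release Mutex */
    return handled;
}
```

The point to notice is that the mutex is held for the whole burst, including every hand-off to the stack, and is only released once the chip reports no more frames.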
I haven’t had any issues with this so far, but I’ve been using a minimal network setup (either just a switch and a couple of devices with static IPs, or a simple router and 3 or 4 devices). Now I’m working from home, and the device is plugged into my home network, with all the associated traffic. I suppose this is an excellent test of the system.
In my case, the router (a Netgear R6400v2) regularly sends a bunch of ARP requests to its list of devices. These are shown in Wireshark as “Who has 192.168.1.x? Tell 192.168.1.1”. This flood of packets comes through the stack on my device very quickly (I get two calls to the interrupt, one with 10 packets in the queue and the other with 7), and when it gets to the one for the device’s IP (.19), it attempts to send an ARP reply. The IP thread then hangs, waiting for the mutex in the transmit path, and the receive thread hangs in the receive, not continuing the receive process…
The two processes (receive packets and transmit packets) are in different threads, so at first glance this should resolve normally (isn’t that what mutexes are designed for?). Looking at it again, though, the receive thread calls pxGetNetworkBufferWithDescriptor and the code is hanging there (probably on the line if( xSemaphoreTake( xNetworkBufferSemaphore, xBlockTimeTicks ) == pdPASS )). I do have the timeout set to portMAX_DELAY.
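To make the suspected circular wait concrete, here is a small step-by-step model in plain C. It is only a sketch under assumptions that are not confirmed above: that pxGetNetworkBufferWithDescriptor blocks because the buffer pool is empty, that the IP task frees one buffer per frame it finishes, and that the receive task holds the chip mutex for the whole burst. All names (`simulate_burst`, `pool_size`, `arp_index`, and so on) are hypothetical and the numbers are made up.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RUNNING, BLOCKED_ON_MUTEX, BLOCKED_ON_BUFFER, DONE } task_state_t;

typedef struct {
    task_state_t rx_state;  /* receive task */
    task_state_t ip_state;  /* IP task */
} outcome_t;

/* Hypothetical model: the receive task holds the chip mutex for the whole
 * burst and needs one free buffer per frame. The IP task frees one buffer
 * per processed frame, except that the frame at arp_index needs a reply,
 * which requires the chip mutex. */
static outcome_t simulate_burst(int pool_size, int burst_len, int arp_index)
{
    assert(pool_size > 0 && burst_len > 0);
    assert(arp_index >= 0 && arp_index < burst_len);

    int free_buffers = pool_size;
    int next_rx = 0;  /* next frame the receive task will read */
    int next_ip = 0;  /* next queued frame the IP task will process */
    outcome_t o = { RUNNING, RUNNING };

    for (;;) {
        bool progressed = false;

        /* Receive task step (assumed to block forever, as with portMAX_DELAY). */
        if (o.rx_state != DONE) {
            if (next_rx == burst_len) {
                o.rx_state = DONE;            /* burst finished, mutex released */
                progressed = true;
            } else if (free_buffers > 0) {
                free_buffers--;
                next_rx++;
                o.rx_state = RUNNING;
                progressed = true;
            } else {
                o.rx_state = BLOCKED_ON_BUFFER;
            }
        }

        /* IP task step. */
        if (o.ip_state != DONE) {
            if (next_ip == burst_len) {
                o.ip_state = DONE;
                progressed = true;
            } else if (next_ip < next_rx) {
                if (next_ip == arp_index && o.rx_state != DONE) {
                    o.ip_state = BLOCKED_ON_MUTEX;  /* cannot transmit the reply */
                } else {
                    free_buffers++;
                    next_ip++;
                    o.ip_state = RUNNING;
                    progressed = true;
                }
            }
        }

        if (!progressed) {
            break;  /* either both tasks finished or neither can move */
        }
    }
    return o;
}

/* True when each task is waiting on something only the other can release. */
static bool is_deadlocked(outcome_t o)
{
    return o.rx_state == BLOCKED_ON_BUFFER && o.ip_state == BLOCKED_ON_MUTEX;
}
```

In this model, `simulate_burst(4, 10, 2)` ends with the receive task waiting for a buffer and the IP task waiting for the mutex, while `simulate_burst(20, 10, 2)` completes because the pool is larger than the burst. Whether the real hang follows this pattern depends on the assumptions stated above.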
Is there some way to prevent this deadlock without completely redoing the program flow in the network driver?
Thanks for any help.