FreeRTOS-Plus-TCP ARP cache behavior question

I have a FreeRTOS-Plus-TCP device that has a few custom network features, and I noticed it’s sending a lot of ARP queries. I don’t want to get into my custom protocol handlers here, but let’s just say my device sees broadcast/multicast traffic from almost all other devices on the network. That appears to cause the ARP cache to fill up fast, and as the entries age, the stack tries to maintain the cache by spitting out ARP queries for the aging entries.

I’m struggling to understand the reasoning behind this…
Correct me if I’m wrong, but if there is any traffic with a remote host, its ARP cache entry will get refreshed every time a packet comes in, and if there is no traffic with a remote host, the entry is supposed to expire. Why is the stack desperately trying to save this aging entry?

In essence, the stack is trying to have the MAC addresses of all devices on the network whether it’s communicating with them or not. Why?
There are hundreds of other devices on the networks that my devices live on, and I don’t want to waste RAM on an overly large ARP cache.

Someone might point out that even when the cache is full, new entries overwrite the oldest entry, so no error is generated. But the oldest entry may in fact be one that is still in use. Overwriting it would force an ARP query before communication could continue… and why? Just so we can prevent an entry from expiring?
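To make the eviction concern concrete, here is a toy fixed-size cache. It is a hypothetical model, not the stack’s own cache code: `ArpEntry`, `cache_insert()` and the sizes are illustrative. When every slot is taken, the victim is chosen purely by remaining age, so a host that is still in active use can lose its entry.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ARP cache; names and sizes are illustrative. */
#define CACHE_SIZE    4U
#define MAX_AGE       150U   /* ticks; mirrors the default ipconfigMAX_ARP_AGE */

typedef struct
{
    uint32_t ip;    /* 0 means "slot unused" */
    uint8_t  age;   /* counts down; 0 means expired */
} ArpEntry;

static ArpEntry cache[ CACHE_SIZE ];

/* Insert or refresh an entry and return the index of the slot written.
 * With no free or expired slot left, the entry with the lowest remaining
 * age (the oldest) is overwritten, whether or not it is still in use. */
static size_t cache_insert( uint32_t ip )
{
    size_t i;
    size_t oldest = 0U;

    /* 1. Known host: just refresh its age. */
    for( i = 0U; i < CACHE_SIZE; i++ )
    {
        if( cache[ i ].ip == ip )
        {
            cache[ i ].age = MAX_AGE;
            return i;
        }
    }

    /* 2. Otherwise take a free/expired slot, or else the oldest one. */
    for( i = 0U; i < CACHE_SIZE; i++ )
    {
        if( ( cache[ i ].ip == 0U ) || ( cache[ i ].age == 0U ) )
        {
            oldest = i;
            break;
        }

        if( cache[ i ].age < cache[ oldest ].age )
        {
            oldest = i;
        }
    }

    cache[ oldest ].ip = ip;
    cache[ oldest ].age = MAX_AGE;
    return oldest;
}
```

If host 1 holds the oldest entry and a fifth host shows up, host 1 is overwritten even if it was about to send again, which then costs a fresh ARP round trip.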

It just doesn’t make sense. Please enlighten me.

Thanks

Hello @epopov,

Which version of the FreeRTOS+TCP stack are you using?

For UDP, I think new entries are added to the cache only when there is a corresponding socket for it with the same port number. Otherwise nothing is added. Why are all these entries getting added to your ARP cache? Is your device receiving (as in processing) the packets from all the peers?

If that is the case, then ARP seems to be doing its job properly. It is designed to be preemptive. Rather than wait for an entry to expire, it will send an ARP request just in case the user wants to send some data to the peer. If the peer has lost power/shut down, then the ARP entry will expire and will not get added again until and unless the user tries to send some data to the peer or vice versa.

Did I understand your question correctly? Or did I miss something?
Let me know if that is the case and I can improve on my answer.

Thanks,
Aniruddha

@kanherea : I couldn’t have answered it better.

You asked:

Which version of the FreeRTOS+TCP stack are you using?

That matters because at certain points we changed the behaviour of ARP, after doing extensive protocol testing about a year ago. These rules were then implemented:

  • The DUT may only process an incoming packet once it has confirmed the sender’s address by sending out an ARP request.
  • While waiting for the ARP reply, the DUT can store one “waiting packet” in:
    NetworkBufferDescriptor_t * pxARPWaitingNetworkBuffer;
  • When receiving subsequent packets from a “known” IP-address, the ARP cache entry will be touched so it becomes the youngest entry.
  • We have been careful not to store IP-addresses that are not of interest for the DUT.
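The one-slot “waiting packet” rule can be pictured as follows. `Buffer`, `park_packet()` and `release_parked_packet()` are hypothetical stand-ins for `NetworkBufferDescriptor_t` and the stack’s handling of `pxARPWaitingNetworkBuffer`; that a second packet is dropped while the slot is occupied is an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical stand-in for NetworkBufferDescriptor_t. */
typedef struct
{
    unsigned id;
} Buffer;

/* Plays the role of pxARPWaitingNetworkBuffer: room for exactly one packet. */
static Buffer * waiting = NULL;

/* Returns 1 if the packet was parked until the ARP reply arrives, 0 if the
 * slot was already occupied (in this sketch the caller then drops it). */
static int park_packet( Buffer * b )
{
    if( waiting == NULL )
    {
        waiting = b;
        return 1;
    }

    return 0;
}

/* Called once the sender's address is resolved: hand the parked packet
 * back for normal processing and free the slot. */
static Buffer * release_parked_packet( void )
{
    Buffer * b = waiting;

    waiting = NULL;
    return b;
}
```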

So the following happens for every incoming IP-packet;

if( xCheckRequiresARPResolution( pxNetworkBuffer ) == pdTRUE )
{
    /* Store the packet and wait for ARP resolution. */
}
else
{
    /* When refreshing the ARP cache with received UDP packets we must be
     * careful;  hundreds of broadcast messages may pass and if we're not
     * handling them, no use to fill the ARP cache with those IP addresses.
     */
    vARPRefreshCacheEntry( xSourceAddress, ulSourceIPAddress );
}

The above code is only executed when there is a socket bound to that target port.
When no socket is found, the source port number is compared with the LLMNR/NBNS/DNS port numbers. When they match, the ARP cache is updated.
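These receive-side rules boil down to a small decision. This is a hypothetical summary, not stack code: `classify_udp()` and `UdpAction` are illustrative names, `socket_is_bound` stands in for the socket lookup, the ports are the standard DNS (53), NBNS (137) and LLMNR (5355) numbers, and the LLMNR/NBNS checks follow the snippet quoted further down in this thread (destination or source port). Both non-drop outcomes refresh the ARP cache.

```c
#include <stdbool.h>
#include <stdint.h>

/* Well-known ports, in host byte order for readability. */
#define PORT_DNS      53U
#define PORT_NBNS     137U
#define PORT_LLMNR    5355U

typedef enum
{
    UDP_DROP,            /* no socket and no name-service match */
    UDP_TO_SOCKET,       /* ARP cache refreshed, delivered to the socket */
    UDP_TO_NAME_SERVICE  /* ARP cache refreshed, handled by DNS/LLMNR/NBNS */
} UdpAction;

/* Hypothetical model of what happens to an incoming UDP packet. */
static UdpAction classify_udp( uint16_t dst_port,
                               uint16_t src_port,
                               bool socket_is_bound )
{
    if( socket_is_bound )
    {
        return UDP_TO_SOCKET;
    }

    if( ( src_port == PORT_DNS ) ||
        ( dst_port == PORT_LLMNR ) || ( src_port == PORT_LLMNR ) ||
        ( dst_port == PORT_NBNS ) || ( src_port == PORT_NBNS ) )
    {
        return UDP_TO_NAME_SERVICE;
    }

    return UDP_DROP;
}
```

Any query arriving on port 5355 or 137 ends up in the name-service branch, whether or not the device ends up answering it.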

Could you check if the latter happens very often in your case?
I mean this code, which occurs in three places:

    vARPRefreshCacheEntry( xSourceAddress, .ulSourceIPAddress );
    xReturn = ( BaseType_t ) ulDNSHandlePacket( pxNetworkBuffer );

Or, maybe, could you point out what entries in your ARP cache should not be there?
Thanks

Thanks for responding, guys.
First of all, I’d like to apologize that I cannot currently test the latest commit on “main”. Right now I’m using what is equivalent to 2.4.0 (b6eac0ca7df8c8935cb840fd9642f6533a0da2e6). I’m also using a heavily modified version of the stack that handles IGMP, has IGMP snooping and also has a quick-and-dirty port of an mDNS responder. That being said, I’m doing my best to comment out as much of this custom code as possible when testing for this issue. Thanks for understanding.

I fully agree with the idea that ARP should be preemptive in order to minimize the need for sending queries. However, when you combine that with the active maintenance of aging entries, you end up with entries that never expire even though there is no need for them. Here’s an example: a host pings our DUT and stops, but never again attempts to communicate with the DUT. The result is that the DUT will never forget the host’s address even though it will never need it again. Therefore it is my personal opinion that aging entries should not be proactively maintained. If “proper” preemptive learning and updating is used, there should be nothing wrong with letting an entry expire. Think about it: if it expires, that means there has not been a single packet between this remote host and the DUT for the default 25 minutes.
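The passive scheme argued for here can be sketched in a few lines: received packets reset the age, the periodic timer counts it down, and nothing is transmitted to keep an idle entry alive. The 25-minute figure follows from the defaults, assuming a 10-second ARP timer and `ipconfigMAX_ARP_AGE` = 150; `ArpEntry`, `on_packet_received()`, `passive_age_tick()` and `idle_lifetime_seconds()` are hypothetical names.

```c
#include <stdint.h>

/* Assumed defaults: a 10 s ARP timer and ipconfigMAX_ARP_AGE = 150. */
#define ARP_TIMER_PERIOD_S    10U
#define MAX_AGE               150U

typedef struct
{
    uint32_t ip;    /* 0 means empty/expired */
    uint8_t  age;   /* remaining timer ticks */
} ArpEntry;

/* A packet from this host was received, so it is clearly still in use. */
static void on_packet_received( ArpEntry * e )
{
    e->age = MAX_AGE;
}

/* Periodic tick: count down and forget the host once it has been silent
 * for MAX_AGE ticks. No ARP request is sent to keep the entry alive. */
static void passive_age_tick( ArpEntry * e )
{
    if( e->age > 0U )
    {
        e->age--;

        if( e->age == 0U )
        {
            e->ip = 0U;
        }
    }
}

/* Idle time after which an entry disappears: 150 * 10 s = 1500 s = 25 min. */
static unsigned idle_lifetime_seconds( void )
{
    return MAX_AGE * ARP_TIMER_PERIOD_S;
}
```

A host that keeps talking to the DUT never reaches zero, because each received packet restores the full lifetime; only a host that has been silent for the whole period is dropped.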

Parallel to the above, please look at the following code:

                if( ucProtocol != ( uint8_t ) ipPROTOCOL_UDP)
                {
                    if( xCheckRequiresARPResolution( pxNetworkBuffer ) == pdTRUE )
                    {
                        eReturn = eWaitingARPResolution;
                    }
                    else
                    {
                        /* Refresh the age of this cache entry since a packet was received. */
                        vARPRefreshCacheEntryAge( &( pxIPPacket->xEthernetHeader.xSourceAddress ), pxIPHeader->ulSourceIPAddress );
                    }
                }

The above, in combination with xCheckRequiresARPResolution(), results in the senders of all IGMPv1 and v2 reports for 224.0.0.252 being entered into the ARP cache, given of course that ipconfigUSE_LLMNR = 1.
I’d also like to know the reasoning for excluding UDP from the if() above, but that is not that important.

Another thing that I identified as problematic is this:

        #if ( ipconfigUSE_LLMNR == 1 )
            /* A LLMNR request, check for the destination port. */
            if( ( usPort == FreeRTOS_ntohs( ipLLMNR_PORT ) ) ||
                ( pxUDPPacket->xUDPHeader.usSourcePort == FreeRTOS_ntohs( ipLLMNR_PORT ) ) )
            {
                vARPRefreshCacheEntry( &( pxUDPPacket->xEthernetHeader.xSourceAddress ), pxUDPPacket->xIPHeader.ulSourceIPAddress );
                xReturn = ( BaseType_t ) ulDNSHandlePacket( pxNetworkBuffer );
            }
            else
        #endif /* ipconfigUSE_LLMNR */

        #if ( ipconfigUSE_NBNS == 1 )
            /* a NetBIOS request, check for the destination port */
            if( ( usPort == FreeRTOS_ntohs( ipNBNS_PORT ) ) ||
                ( pxUDPPacket->xUDPHeader.usSourcePort == FreeRTOS_ntohs( ipNBNS_PORT ) ) )
            {
                vARPRefreshCacheEntry( &( pxUDPPacket->xEthernetHeader.xSourceAddress ), pxUDPPacket->xIPHeader.ulSourceIPAddress );
                xReturn = ( BaseType_t ) ulNBNSHandlePacket( pxNetworkBuffer );
            }
            else
        #endif /* ipconfigUSE_NBNS */
        {
            xReturn = pdFAIL;
        }

The result from the above is that ANY LLMNR ( 224.0.0.252 ) or NBNS ( local net broadcast ) lookup will add the querier’s MAC address to the cache regardless of whether it needs to be there or not. Now couple that with actively maintaining aging entries and that table just keeps growing.

A better approach (in my opinion) would be to add entries only if the LLMNR query needs a response, or if the NBNS query is for our name. I have implemented such solutions but will refrain from posting them here, as I’m still evaluating them.
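A minimal sketch of that idea, assuming the queried name has already been decoded from the LLMNR or NBNS packet: the sender is handed to the ARP cache only when the name matches ours, i.e. only when a reply will be sent. `handle_name_query()`, `names_equal()` and `refresh_arp_stub()` are hypothetical; the stub merely counts calls where real code would call `vARPRefreshCacheEntry()`. This is not the poster’s implementation, which was not posted.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdint.h>

/* Counts simulated ARP cache insertions (stand-in for vARPRefreshCacheEntry). */
static unsigned arp_refreshes = 0U;

static void refresh_arp_stub( uint32_t sender_ip )
{
    ( void ) sender_ip;
    arp_refreshes++;
}

/* Case-insensitive comparison; LLMNR and NetBIOS names are case-insensitive. */
static bool names_equal( const char * a, const char * b )
{
    while( ( *a != '\0' ) && ( *b != '\0' ) )
    {
        if( tolower( ( unsigned char ) *a ) != tolower( ( unsigned char ) *b ) )
        {
            return false;
        }

        a++;
        b++;
    }

    return ( *a == '\0' ) && ( *b == '\0' );
}

/* Returns true when the query should be answered; only then is the
 * sender's address added to the ARP cache. */
static bool handle_name_query( const char * queried_name,
                               const char * my_name,
                               uint32_t sender_ip )
{
    if( !names_equal( queried_name, my_name ) )
    {
        return false;               /* not for us: leave the ARP cache alone */
    }

    refresh_arp_stub( sender_ip );  /* we will reply, so we need its MAC */
    return true;
}
```

Queries for other names, which are the bulk of the traffic on a busy network, then no longer consume cache slots.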

Let me know what you think, and sorry for the lengthy post.

I am sorry to hear that the recent reorganisation of source files makes it difficult for you to synchronise with “main”. Functions were moved around, some were renamed, and some new functions were added. For me personally that is also time consuming.

What we hope is that after all these changes, the FreeRTOS+TCP library will be easier to maintain, and that the quality of the code increases.

@epopov wrote:

The result is that DUT will never forget the host’s address even though it will never need it again

That is true and I agree that it is not a good idea.

I am playing with extreme values to observe the behaviour of ARP renewal:

#define arpMAX_ARP_AGE_BEFORE_NEW_ARP_REQUEST    3
#define ipconfigMAX_ARP_AGE                      5

and I see the same behaviour.
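With these extreme values the renewal cycle can be traced tick by tick. The model below is a simplified reading of the behaviour discussed in this thread, not the stack’s actual aging function: each tick decrements the age, an entry at or below `RENEW_THRESHOLD` (standing in for `arpMAX_ARP_AGE_BEFORE_NEW_ARP_REQUEST`) triggers an ARP request, and a reply from a live peer resets the age to `MAX_AGE` (standing in for `ipconfigMAX_ARP_AGE`). `age_tick()` and `send_arp_request()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* The extreme test values from above. */
#define MAX_AGE            5U
#define RENEW_THRESHOLD    3U

typedef struct
{
    uint32_t ip;
    uint8_t  age;   /* 0 means empty/expired */
} ArpEntry;

static unsigned requests_sent = 0U;

/* Stand-in for transmitting an ARP request. A live peer answers, and in
 * this model the reply refreshes the entry to MAX_AGE. */
static void send_arp_request( ArpEntry * e, bool peer_alive )
{
    requests_sent++;

    if( peer_alive )
    {
        e->age = MAX_AGE;
    }
}

/* One periodic timer tick for a single entry. */
static void age_tick( ArpEntry * e, bool peer_alive )
{
    if( e->age == 0U )
    {
        return;                               /* nothing to age */
    }

    e->age--;

    if( e->age == 0U )
    {
        e->ip = 0U;                           /* expired: wipe the entry */
    }
    else if( e->age <= RENEW_THRESHOLD )
    {
        send_arp_request( e, peer_alive );    /* try to keep it alive */
    }
}
```

In this model a live peer cycles 5, 4, 3 (request, reply, back to 5) indefinitely without any application traffic, which reproduces the “never forget” effect; a silent peer costs three requests and is wiped on the fifth tick.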

Let me know what you think and sorry for the lengthy post

No need to say sorry, thank you for your post!

I also agree about the incoming LLMNR/mDNS/NBNS messages: the ARP cache table can get polluted when many devices do look-ups.

Do I understand correctly that your ARP cache also contains entries with multicast addresses? That should not happen, of course.
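A guard against caching multicast entries could look like this. IPv4 multicast destinations (224.0.0.0/4, which includes 224.0.0.252 for LLMNR) map directly to 01:00:5e MAC addresses without any ARP exchange, so such entries are never useful. `ipv4_is_multicast()` and `arp_may_cache()` are hypothetical helpers, and addresses are taken in host byte order for readability.

```c
#include <stdbool.h>
#include <stdint.h>

/* 224.0.0.0/4: the top four bits are 1110. */
static bool ipv4_is_multicast( uint32_t ip )
{
    return ( ip & 0xF0000000UL ) == 0xE0000000UL;
}

/* Hypothetical check before creating an ARP cache entry: skip the
 * unspecified address, the limited broadcast address and multicast. */
static bool arp_may_cache( uint32_t ip )
{
    return ( ip != 0U ) && ( ip != 0xFFFFFFFFUL ) && !ipv4_is_multicast( ip );
}
```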

Unless you are very eager to create a PR, I will write a proposal for changes of the ARP behaviour. Is that OK for you?

Hein,
It’s not that hard for me to get on main; I just have more important things right now. Reorganization is usually a good thing.

Noooo, that would be hilarious 😃 My main issue was that I noticed my ARP cache getting full, and with that comes the possibility of overwriting entries that are actually in use. That’s what got me digging deeper and questioning parts of the code.

I’m not in a hurry at all. The current code works, so I’m in no rush. I have actually learned quite a bit and uncovered a few more issues that I want to think about for a few days. So if you don’t mind, I’d like to follow up with a proposal of my own that will include all my findings, thoughts and reasoning: things that you and the team may want to consider when deciding what actions are needed.