I’m working with a slightly unusual setup: I have a device with two network interfaces using FreeRTOS-Plus-TCP, each interface on a different subnet. Each interface has a TCP server socket listening on its own subnet. During normal operation there is a separate client device on each subnet. Each client connects (through a switch) to its interface on the FreeRTOS device. Both of these client devices have the same MAC address, but they are different physical devices. We are not using DHCP. See the attached diagram.
The problem arises when there is IP traffic on both interfaces. Interface 1 will initially send an ARP request for its client’s IP and record the MAC address. Interface 2 will do the same. When interface 2 receives a reply it will overwrite the ARP cache entry with a different IP (as its client is a different device). Interface 1 will then do the same and overwrite it back. The net result of all this is that almost all IP packets require an ARP request first, since the corresponding ARP cache row is constantly being evicted by the other interface. This is causing network bandwidth issues and is not ideal in general.
More detailed sequence of events:
Interface 1 sends ARP request for 169.254.11.1
Stack receives ARP reply for 169.254.11.1 with MAC X. The cache now contains an entry associating 169.254.11.1 with MAC X
Unrelated but concurrently, interface 2 sends ARP request for 169.254.21.1
Stack receives ARP reply for 169.254.21.1 MAC X
vARPProcessPacketReply. The reply is destined for us, so call vARPRefreshCacheEntry.
The address is local (to Interface 2’s endpoint), so look it up with prvFindCacheEntry.
prvFindCacheEntry looks at the cache and will find a matching MAC (as it has been recorded from Interface 1), but no matching IP. xLocation.xMacEntry is populated but xLocation.xIpEntry is not.
Since there was a matching MAC, we will overwrite interface 1’s IP association to that MAC with the incoming packet’s source IP.
This same sequence happens again when Interface 1 next transmits an IP packet, rewriting the cache entry back to its client’s IP.
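The ping-pong described in the sequence above can be reproduced with a small toy model. This is only a sketch of the rule as described in this post (reuse a row when the MAC matches but the IP does not), not the FreeRTOS-Plus-TCP source. All names here (`ToyArpRow`, `toy_refresh`, `toy_has_ip`, `IP4`) are hypothetical, and the "overwrite row 0 when full" policy is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_ROWS 4
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical, simplified cache row; not the library's real structure. */
typedef struct {
    int      inUse;
    uint32_t ip;
    uint8_t  mac[6];
} ToyArpRow;

static ToyArpRow toyCache[TOY_ROWS];

static void toy_reset(void)
{
    memset(toyCache, 0, sizeof(toyCache));
}

/* Returns 1 if the cache currently holds a row for this IP, else 0. */
static int toy_has_ip(uint32_t ip)
{
    for (int i = 0; i < TOY_ROWS; i++) {
        if (toyCache[i].inUse && toyCache[i].ip == ip) {
            return 1;
        }
    }
    return 0;
}

/* Refresh with the rule described in the post: when no row matches the IP
 * but one matches the MAC, that row is reused and its old IP is lost.
 * Returns 1 if an existing IP mapping was evicted, else 0. */
static int toy_refresh(const uint8_t mac[6], uint32_t ip)
{
    int ipRow = -1, macRow = -1, freeRow = -1;

    for (int i = 0; i < TOY_ROWS; i++) {
        if (!toyCache[i].inUse) {
            if (freeRow < 0) freeRow = i;
        } else {
            if (toyCache[i].ip == ip) ipRow = i;
            if (memcmp(toyCache[i].mac, mac, 6) == 0) macRow = i;
        }
    }

    int evicted = 0;
    int row;
    if (ipRow >= 0) {
        row = ipRow;                 /* plain refresh of an existing IP */
    } else if (macRow >= 0) {
        row = macRow;                /* MAC matched: old IP is overwritten */
        evicted = 1;
    } else if (freeRow >= 0) {
        row = freeRow;
    } else {
        row = 0;                     /* cache full: toy policy (assumption) */
        evicted = 1;
    }

    assert(row >= 0 && row < TOY_ROWS);
    toyCache[row].inUse = 1;
    toyCache[row].ip = ip;
    memcpy(toyCache[row].mac, mac, 6);
    return evicted;
}
```

Refreshing 169.254.11.1, then 169.254.21.1, then 169.254.11.1 again with the same MAC reports an eviction on the second and third calls, which is the behaviour that forces a fresh ARP request before almost every packet.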
I think the correct thing to do here is to partition the ARP cache lookup based on interface, so we effectively have N independent caches, one per interface. The cache rows already hold an endpoint pointer, so this could be used when looking up entries to filter based on the associated interface. Windows and Linux seem to handle this network topology fine so I imagine this is what they do. Before I start work on a fix I’d like to know if this is the correct approach or if there are any other considerations that should be taken into account.
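A rough sketch of what partitioning the lookup by interface could look like, again as a toy model rather than a patch against the real cache. Here `ifIndex` is a hypothetical stand-in for the endpoint pointer that the real cache rows hold, and `PartArpRow`, `part_refresh` and `part_lookup` are made-up names. The reuse-on-MAC-match rule is kept, but it only considers rows that belong to the same interface.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PART_ROWS 4
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical row layout for illustration only. */
typedef struct {
    int      inUse;
    int      ifIndex;  /* stand-in for the row's endpoint pointer (assumption) */
    uint32_t ip;
    uint8_t  mac[6];
} PartArpRow;

static PartArpRow partCache[PART_ROWS];

static void part_reset(void)
{
    memset(partCache, 0, sizeof(partCache));
}

/* Refresh scoped to one interface: rows owned by other interfaces are never
 * matched on IP or MAC. Returns 1 if an existing mapping was evicted. */
static int part_refresh(int ifIndex, const uint8_t mac[6], uint32_t ip)
{
    int ipRow = -1, macRow = -1, freeRow = -1;

    for (int i = 0; i < PART_ROWS; i++) {
        if (!partCache[i].inUse) {
            if (freeRow < 0) freeRow = i;
        } else if (partCache[i].ifIndex == ifIndex) {
            if (partCache[i].ip == ip) ipRow = i;
            if (memcmp(partCache[i].mac, mac, 6) == 0) macRow = i;
        }
    }

    int evicted = 0;
    int row;
    if (ipRow >= 0) {
        row = ipRow;
    } else if (macRow >= 0) {
        row = macRow;
        evicted = 1;
    } else if (freeRow >= 0) {
        row = freeRow;
    } else {
        row = 0;                     /* cache full: toy policy (assumption) */
        evicted = 1;
    }

    assert(row >= 0 && row < PART_ROWS);
    partCache[row].inUse = 1;
    partCache[row].ifIndex = ifIndex;
    partCache[row].ip = ip;
    memcpy(partCache[row].mac, mac, 6);
    return evicted;
}

/* Copies the MAC for (ifIndex, ip) into macOut; returns 1 if found, else 0. */
static int part_lookup(int ifIndex, uint32_t ip, uint8_t macOut[6])
{
    for (int i = 0; i < PART_ROWS; i++) {
        if (partCache[i].inUse && partCache[i].ifIndex == ifIndex &&
            partCache[i].ip == ip) {
            memcpy(macOut, partCache[i].mac, 6);
            return 1;
        }
    }
    return 0;
}
```

With this scoping, the two clients sharing MAC X end up in separate rows and neither refresh evicts the other. The rows still share one table, so a busy interface could in principle crowd out another; truly independent per-interface tables would be a different design choice.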
MAC addresses are DEFINED as GLOBALLY unique identifiers, so your setup is non-conforming. Personally, I would change your setup to not have two interfaces with the same MAC address.
Yes, of course it is preferable that all MAC addresses are unique. Unfortunately there is a limitation (bug) with the client devices such that we cannot persistently change their MAC away from default. This network is completely internal to a system so it’s not an issue aside from the ARP problem.
These are link-local IPs. The Windows client devices may be connected to the outside world through other network interfaces. We cannot use private IPs as these may conflict with the (unknown) external LAN.
Regardless of duplicated MAC addresses, isn’t the existing logic still an issue if, for example, a device presents multiple IP addresses with a single MAC address? In this case, the ARP cache should point both IPs to the same MAC. As it stands, I believe it will evict the entry for the other IP causing the two IPs to “fight” over the ARP cache slot.
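To illustrate the multi-IP case, here is a minimal sketch of a cache keyed on IP only, where a MAC match alone never causes a row to be reused. It is a toy model and an assumption about one possible behaviour, not a description of what the library does or should do. The names (`IpKeyedRow`, `ipk_refresh`, `ipk_count_for_mac`) and the second address 169.254.11.2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IPK_ROWS 4
#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Hypothetical row layout for illustration only. */
typedef struct {
    int      inUse;
    uint32_t ip;
    uint8_t  mac[6];
} IpKeyedRow;

static IpKeyedRow ipkCache[IPK_ROWS];

static void ipk_reset(void)
{
    memset(ipkCache, 0, sizeof(ipkCache));
}

/* Refresh keyed on IP only: an existing row is reused only when its IP
 * matches. Returns the index of the row that was written. */
static int ipk_refresh(const uint8_t mac[6], uint32_t ip)
{
    int ipRow = -1, freeRow = -1;

    for (int i = 0; i < IPK_ROWS; i++) {
        if (!ipkCache[i].inUse) {
            if (freeRow < 0) freeRow = i;
        } else if (ipkCache[i].ip == ip) {
            ipRow = i;
        }
    }

    int row;
    if (ipRow >= 0) {
        row = ipRow;
    } else if (freeRow >= 0) {
        row = freeRow;
    } else {
        row = 0;                     /* cache full: toy policy (assumption) */
    }

    assert(row >= 0 && row < IPK_ROWS);
    ipkCache[row].inUse = 1;
    ipkCache[row].ip = ip;
    memcpy(ipkCache[row].mac, mac, 6);
    return row;
}

/* Counts how many cached IPs currently resolve to the given MAC. */
static int ipk_count_for_mac(const uint8_t mac[6])
{
    int count = 0;
    for (int i = 0; i < IPK_ROWS; i++) {
        if (ipkCache[i].inUse && memcmp(ipkCache[i].mac, mac, 6) == 0) {
            count++;
        }
    }
    return count;
}
```

In this model two IPs announced with the same MAC occupy two rows and both stay resolvable, so they do not fight over a single slot.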
Which is actually a misuse of the 169.254.0.0/16 Automatic Private IP Addressing range, which is DEFINED to have a netmask of 255.255.0.0.
Something very much smells about this configuration, as devices should not have a “default MAC address” except in very limited lab testing environments before a product is released, and if they get an APIPA address, they should have a 255.255.0.0 netmask as a result of their probing the APIPA address space for an unused address, and that probe is defined to be to the 169.254.0.0 network.
We are using the /24 subnet mask because we need to differentiate between the two LANs (169.254.11.0/24 and 169.254.21.0/24). DHCP is explicitly disabled on the network and each device/interface has a static hardcoded IP, so they are never probing for APIPA addresses. These networks are part of a single larger system and its topology is expected to be eternally constant. Either way this is beside the point of the original post.
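For reference, the effect of the prefix length on these two addresses can be checked with a few lines of arithmetic. This is just an illustrative sketch; `IP4`, `network_of` and `same_network` are hypothetical helper names, and `prefixLen` is assumed to be in the range 0 to 32.

```c
#include <stdint.h>

#define IP4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Returns the network part of ip for the given prefix length (0..32). */
static uint32_t network_of(uint32_t ip, unsigned prefixLen)
{
    /* A shift by 32 is undefined in C, so /0 is handled separately. */
    uint32_t mask = (prefixLen == 0) ? 0u : (0xFFFFFFFFu << (32u - prefixLen));
    return ip & mask;
}

/* Returns 1 if a and b fall in the same network for this prefix length. */
static int same_network(uint32_t a, uint32_t b, unsigned prefixLen)
{
    return network_of(a, prefixLen) == network_of(b, prefixLen);
}
```

With a /24 prefix, 169.254.11.1 and 169.254.21.1 land in different networks (169.254.11.0 and 169.254.21.0), whereas with the /16 prefix both fall in 169.254.0.0.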
devices should not have a “default MAC address” except in very limited lab testing environments
Yes, as I mentioned above I am well aware of this. The problem is a silicon bug in these adapters which means we cannot change their MACs.
Changing the netmask on the APIPA network means that you cannot allow ANY other device to attach to that physical network, as you have broken that protocol, and at that point you might as well just put the ARP information in ROM and never query for it. By changing the netmask, your devices won’t respond to APIPA queries from some other device, and thus you are subjecting yourself to a possible address conflict.
IF the adapters are under your control, the short-term inefficiency while you fix the bug shouldn’t be that bad. I hope you have no plans to release such a fatal bug out of a very controlled lab, at which point the need for the broken APIPA protocol shouldn’t be present, as you should be able to define some normal private nets to be used.
Ok, I think this thread has been derailed a bit. The MAC address silicon bug is out of my control as it’s made by Microchip. Rather than argue about an unrelated topic of IP addresses, I will maintain a local fix and not attempt to upstream anything. Cheers