STM32F207 FreeRTOS-Plus-TCP losing pings

sdcandy · August 17, 2021, 8:12am

Looking at the entries in the wireshark capture (sent to Hein) where the protocol isn’t being recognised they all appear to be sent from MAC addresses listed as BSkyB_xx:xx:xx. Analysis of the capture seems to show 4 unique MAC addresses.

They will be the SkyQ boxes that we have for satellite TV. There is one “main” box that has a wired connection to the 192.168.10.xxx network and then two “remote” boxes that communicate with the main box using their own 5G mesh network. I assume the 4 MAC addresses correspond to the wired connection and the three wireless interfaces.

As the remote boxes are able to stream catch-up TV directly from the internet (as well as live TV via the dish) it looks like they have implemented some kind of magic to make it work.

I’ll try running a couple of quick tests firstly with just the main box disconnected from the network and then, if needed, with them all powered off and see what happens to the STM32F207.

BTW - this doesn’t look to be version specific as I have updated FreeRTOS and FreeRTOS+TCP to v202012.01 (current LTS version as of writing) and the problem persists.

-Andy.

RAc · August 17, 2021, 8:26am

Interesting stuff!

If I understand it correctly, those boxes generate traffic that to the STM node looks like a broadcast storm, using up most of the RX descriptors which in turn drops many of the incoming ICMP request packets (or, worse, any relevant network traffic once there is host communication with your unit)? That means that the target address of all those packets is a broadcast MAC address?

Or is there another chain of events that drops the packets?

sdcandy · August 17, 2021, 8:35am

As @htibosch pointed out, most of the packets are sent to the broadcast address but there are also a proportion of them that are sent to a specific MAC address (01:00:c1:ab:00:1c).

Wireshark identifies that as Madge_ab:00:1c but I have no idea what it is. It doesn’t appear to be anything on the local network that has asked for an IP address using DHCP.
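A side note on that address: in an Ethernet MAC address the least significant bit of the first octet is the group bit, so 01:00:c1:ab:00:1c is a multicast destination rather than a single station. That may be why nothing with that address ever asked for a DHCP lease. A minimal sketch of the check, with a helper name made up for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not part of FreeRTOS+TCP): returns true when the
 * address is a group (multicast or broadcast) address, i.e. the I/G bit
 * in the first octet is set. */
static bool prvIsGroupAddress( const uint8_t pucMAC[ 6 ] )
{
    return ( pucMAC[ 0 ] & 0x01U ) != 0U;
}
```

With this check, 01:00:c1:ab:00:1c counts as a group address and 80:12:34:56:78:90 as a unicast one.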

-Andy.

sdcandy · August 17, 2021, 8:47am

Pulled the network cable out of the main Sky Q box and the packets being rejected by xMayAcceptPacket() but the ping test is still failing.

-Andy.

htibosch · August 17, 2021, 8:53am

Another test that I always like is flood ping: “sudo ping -f 192.168.10.179”, which sends as many pings per second as possible.

but the ping test is still failing

One would like to know if the packets really arrive in the EMAC or not. Maybe you can count the number of incoming frames, for instance increase a counter in the function HAL_ETH_RxCpltCallback().
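A rough sketch of such a counter is below. The counter name is made up, and the typedef is only a stand-in so the fragment compiles on its own; in a real project the type comes from the STM32 HAL headers and the callback already exists in the network driver, so only the increment would be added there. Note that one RX-complete interrupt may cover more than one frame, so the count is a lower bound.

```c
#include <stdint.h>

/* Stand-in for the HAL type, only so this sketch is self-contained.
 * Remove it when building against the real STM32 HAL. */
typedef struct
{
    int unused;
} ETH_HandleTypeDef;

/* Hypothetical diagnostic counter: bumped on every RX-complete callback.
 * Inspect it from a debugger or print it from a task. */
volatile uint32_t ulRxCallbackCount = 0U;

void HAL_ETH_RxCpltCallback( ETH_HandleTypeDef * heth )
{
    ( void ) heth;
    ulRxCallbackCount++;

    /* ...the driver's existing code to wake the EMAC task stays here... */
}
```

If the counter keeps going up while pings are lost, the frames reach the EMAC and the loss is further up; if it stalls, they never arrive.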

htibosch · August 17, 2021, 9:23am

I just played back your PCAP file, but pinging was not disturbed by the broadcast messages.
( setup: a Zynq playing the PCAP file, and an STM32F4x for testing pings )

A long time ago I was also looking for lost packets, and it turned out that simple LAN packets got routed to the wrong interface. I made a simple batch command to clear the routing and ARP tables, and that helped for a while.

I noticed that when my device did not respond in time (because of a breakpoint), Windows started looking for alternative routes to the 192.168.x.x address. I would be curious to know if that happens to you.
At that time I added the so-called gratuitous ARP request, just to let all hosts on the LAN update their ARP and routing tables. That seemed to help in some situations.

Note that the period of these messages is long by default:

    #define arpGRATUITOUS_ARP_PERIOD    ( pdMS_TO_TICKS( 20000U ) )

and that 20 seconds is about the time that your STM32F2x cannot be reached:

22 bad, 3 good  // 22 seconds
26 bad, 4 good  // 26 seconds

What you can try is add this declaration to your FreeRTOSIPConfig.h:

    #define arpGRATUITOUS_ARP_PERIOD    ( pdMS_TO_TICKS( 1000U ) )

… which is every second, for a test. I think that ping will work better, but maybe not perfect.

RAc · August 17, 2021, 10:09am

Makes sense, but wouldn’t you see that in the pcap log by the target MAC address of the packets not seen by the peer? If all of the outbound ICMP requests are correctly built, including the unit’s MAC address, then the drops must occur in the unit, right?

Since Andy reports that his unit physically sees the packets but rejects them because of the unexpected packet type signature, the packets must pass the PHY’s MAC address filter. So it does appear to be an RX buffer overflow problem, no? I suspect that your setup won’t show the problem because it can’t play back in real time, or the pcap file won’t play back the packets it won’t decode or something like that?

sdcandy · August 17, 2021, 12:16pm

htibosch:

What you can try is add this declaration to your FreeRTOSIPConfig.h:
    #define arpGRATUITOUS_ARP_PERIOD    ( pdMS_TO_TICKS( 1000U ) )
… which is every second, for a test. I think that ping will work better, but maybe not perfect.

This seems to be correct: with arpGRATUITOUS_ARP_PERIOD defined to be 1 second the ping is better, but I am still getting around 10% packet loss.

I’m going to have a look this afternoon at what is going on in HAL_ETH_RxCpltCallback().

-Andy.

sdcandy · August 17, 2021, 5:40pm

By email, @htibosch asked me to filter the earlier Wireshark capture using icmp || arp.isgratuitous == 1.
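For anyone wanting to do the same check in code rather than in Wireshark: as far as I understand it, that filter matches ARP packets whose sender and target IP addresses are equal. A minimal sketch, assuming a plain Ethernet/IPv4 ARP payload (28 bytes, starting right after the Ethernet header); the function and macro names are made up:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout of an Ethernet/IPv4 ARP payload: 8 bytes of fixed header, then
 * sender MAC (6), sender IP (4), target MAC (6), target IP (4). */
#define exampleARP_IPV4_LENGTH    28U
#define exampleARP_OFFSET_SPA     14U /* sender protocol (IP) address */
#define exampleARP_OFFSET_TPA     24U /* target protocol (IP) address */

/* Hypothetical helper: true when sender IP and target IP are identical,
 * which is the usual definition of a gratuitous ARP. */
static bool prvIsGratuitousARP( const uint8_t * pucARP,
                                size_t uxLength )
{
    if( uxLength < exampleARP_IPV4_LENGTH )
    {
        return false; /* too short to be a complete ARP payload */
    }

    return memcmp( &pucARP[ exampleARP_OFFSET_SPA ],
                   &pucARP[ exampleARP_OFFSET_TPA ],
                   4U ) == 0;
}
```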

Looking at the output that generated, there were ARP announcements for two different IP addresses using the same MAC address. The MAC address in question (80:12:34:56:78:90) is the default one I use for FreeRTOS+TCP projects, which should be overwritten when a device is set up correctly.

It appears that there is a device (also running FreeRTOS+TCP) from a previous project sat on the network that hasn’t been properly configured as it is still using the default MAC address.

Reconfiguring the STM32F207 board to have a unique MAC address has resolved the problem. A simple ping test works without any lost responses and ping -f dropped two packets out of around 71,000 sent.
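One way to avoid ever shipping the default address is to derive the MAC from the MCU's 96-bit unique device ID. The sketch below is only an illustration, not what I actually did: the helper name is made up, the three ID words are passed in by the caller (read them from the location given in the reference manual), and the folding is arbitrary, so collisions are unlikely but not impossible.

```c
#include <stdint.h>

/* Hypothetical helper: fold a 96-bit device ID (three 32-bit words) into
 * a 6-byte MAC address. */
static void prvMacFromUniqueID( const uint32_t pulUID[ 3 ],
                                uint8_t pucMAC[ 6 ] )
{
    /* Mix the first and last word, keep the middle word as it is. */
    const uint32_t ulHigh = pulUID[ 0 ] ^ pulUID[ 2 ];
    const uint32_t ulLow = pulUID[ 1 ];

    pucMAC[ 0 ] = ( uint8_t ) ( ulHigh >> 8 );
    pucMAC[ 1 ] = ( uint8_t ) ( ulHigh );
    pucMAC[ 2 ] = ( uint8_t ) ( ulLow >> 24 );
    pucMAC[ 3 ] = ( uint8_t ) ( ulLow >> 16 );
    pucMAC[ 4 ] = ( uint8_t ) ( ulLow >> 8 );
    pucMAC[ 5 ] = ( uint8_t ) ( ulLow );

    /* Set the locally-administered bit (0x02) and clear the group bit
     * (0x01) so the result is a valid unicast address. */
    pucMAC[ 0 ] = ( uint8_t ) ( ( pucMAC[ 0 ] | 0x02U ) & 0xFEU );
}
```

The resulting six bytes would then be handed to the stack at start-up in place of the hard-coded default.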

Thanks for your input!

-Andy.

htibosch · August 18, 2021, 2:41am

Andy, thank you very much for reporting back.

This does remind me that the IP-stack should at least warn if there is a conflict of either IP- or hardware addresses. That would save time.

Regards,

hs2 · August 18, 2021, 7:21am

That’s a good idea, Hein! Maybe with a user-defined, optional macro like configASSERT?
Some devices don’t provide logging/printing capabilities.
Especially duplicate MAC addresses are a potential issue with embedded devices using SW defined MAC addresses.

RAc · August 18, 2021, 7:31am

? That could only work on the node level for multiple IP addresses on the same MAC as the MAC filtering mechanism wouldn’t let packets with a different MAC address pass through to the MCU? Well, better than nothing, I guess…

hs2 · August 18, 2021, 7:42am

There is no silver bullet, but I think it could help. Duplicate addresses can cause weird symptoms. Non-working devices are easier to debug and fix than halfway working devices.

RAc · August 18, 2021, 7:48am

Absolutely! I just wonder how a device should be able to detect IP address conflicts for its own IP address if the MAC filter prevents it from even seeing those weird packets (directed, by definition, to a different device)?

Of course, if the device sees multiple ARP mappings for a REMOTE IP address, that can and should be detected (should also be fairly straightforward to implement via the ARP cache).
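To illustrate what I mean by doing it via the ARP cache, here is a toy sketch. It is not the FreeRTOS+TCP cache; the table, its size, and all names are invented for the example. The update function simply reports when a known IP address shows up with a different MAC. A change can of course be legitimate (replaced hardware, failover), so this would be a warning at most.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define exampleCACHE_ENTRIES    8U

typedef struct
{
    uint32_t ulIPAddress;
    uint8_t ucMAC[ 6 ];
    bool bValid;
} ExampleArpEntry_t;

/* Toy cache, zero-initialised so every entry starts out invalid. */
static ExampleArpEntry_t xCache[ exampleCACHE_ENTRIES ];

/* Stores the mapping and returns true when the IP address was already
 * cached with a different MAC address. */
static bool prvCacheUpdate( uint32_t ulIPAddress,
                            const uint8_t pucMAC[ 6 ] )
{
    ExampleArpEntry_t * pxFree = NULL;

    for( size_t i = 0; i < exampleCACHE_ENTRIES; i++ )
    {
        if( xCache[ i ].bValid && ( xCache[ i ].ulIPAddress == ulIPAddress ) )
        {
            const bool bChanged = memcmp( xCache[ i ].ucMAC, pucMAC, 6U ) != 0;
            memcpy( xCache[ i ].ucMAC, pucMAC, 6U );
            return bChanged;
        }

        if( !xCache[ i ].bValid && ( pxFree == NULL ) )
        {
            pxFree = &xCache[ i ]; /* remember the first free slot */
        }
    }

    if( pxFree != NULL )
    {
        pxFree->ulIPAddress = ulIPAddress;
        memcpy( pxFree->ucMAC, pucMAC, 6U );
        pxFree->bValid = true;
    }

    return false; /* new entry (or cache full): nothing to compare with */
}
```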

hs2 · August 18, 2021, 8:00am

You’re right. But for diagnostics one could set the MAC e.g. to promiscuous mode.
We’ll see if it can be implemented in a useful way. It’s surely not so easy…
I guess the standard duplicate address detection for the IPv6 is already implemented.

RAc · August 18, 2021, 8:25am

That kind of thing sounds very much like a poster case for the Heisenberg effect, as promiscuous mode will almost certainly exhaust the RX buffer descriptor pool in a typically loaded network and thus change the rules dramatically. But as you correctly point out: Let’s see!

I’ve been pondering the idea of a network analyzer box that can be plugged between any two devices, works in prom. mode and analyses (possibly blocks) every packet in realtime. Something like a virus scanner in hardware. That could also be used to detect IP address conflicts without affecting the nodes themselves.

htibosch · August 18, 2021, 2:23pm

Hartmut wrote:

You’re right. But for diagnostics one could set the MAC e.g. to promiscuous mode.

I think that is not necessary. We are already using the gratuitous ARP request; this is like saying: “I use IP address a.b.c.d and this is my MAC address.”
I should look it up, but I think that any other device carrying the same IP-address should respond and protest. I am sure there is an RFC about this subject.
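As a first idea of what the check on incoming ARP packets could look like, here is a sketch. All names are made up and nothing like this exists in the library yet. It assumes the driver never hands the device’s own transmissions back to the stack, so any received ARP carrying our own MAC as sender must come from another device.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef enum
{
    eExampleNoConflict,
    eExampleIPConflict, /* someone else claims our IP address */
    eExampleMACConflict /* someone else sends with our MAC address */
} eExampleConflict_t;

/* Hypothetical check, to be run on the sender fields of every received
 * ARP packet. */
static eExampleConflict_t prvCheckARPSender( uint32_t ulSenderIP,
                                             const uint8_t pucSenderMAC[ 6 ],
                                             uint32_t ulOwnIP,
                                             const uint8_t pucOwnMAC[ 6 ] )
{
    const bool bSameMAC = memcmp( pucSenderMAC, pucOwnMAC, 6U ) == 0;

    if( bSameMAC )
    {
        /* A frame we did not send, carrying our own hardware address. */
        return eExampleMACConflict;
    }

    if( ulSenderIP == ulOwnIP )
    {
        return eExampleIPConflict;
    }

    return eExampleNoConflict;
}
```

Andy’s case (same MAC, different IP addresses) would come out as a MAC conflict, provided the other device’s broadcast ARP actually reaches the board.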

RAc wrote:

as promiscuous mode will very certainly exhaust the RX buffer descriptor pool in a typically loaded network

I think that the LAN hardware, i.e. the switch, is already filtering a lot of traffic. I should try it, but my STM32F4 won’t see the streaming video that I am watching.

I’ve been pondering the idea of a network analyzer box that can be plugged between any two devices

Yes, me too! I’d like to find a board with a Xilinx ( lots of memory and high speed ), which functions as a bridge and records every packet in a PCAP file on the SD-card. Then I can finally see what a FreeRTOS device is seeing, and it will also show the exact timings.

That could also be used to detect IP address conflicts without affecting the nodes themselves

Not sure if it could, because the network switches are filtering all traffic.

It is time to find the RFC about IP- and MAC-address collisions. I think that many problems can be avoided if the IP-stack checks and responds to these collisions.

hs2 · August 18, 2021, 2:37pm

@htibosch Didn’t you already implement duplicate IPv6 address detection (likely following the corresponding RFC) for link-local auto-configuration?

htibosch · August 18, 2021, 4:49pm

Yes, you are right about that. When using RA ( Router Advertisement ), the device gets a network prefix, and it can choose its own IP-address. It will try 3 times to advertise this IP-address, and wait to see if any other device on the LAN complains.

Within the IPv4 branch there is also something like that: after DHCP has failed, it can fall back to using a link-local address. This happens when ipconfigDHCP_FALL_BACK_AUTO_IP is defined.

Now we’re looking at a situation in which two devices either have the same MAC- or the same IP-address. I think we can at least give a warning.
Calling configASSERT() is possible, but not in a real-life application.
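The warning could be an optional hook macro along these lines. This is only a sketch of the idea: the macro name is invented and is not an existing FreeRTOS+TCP option. The application defines the hook (here it just counts); if it does not, the stack falls back to a no-op, so devices without logging lose nothing.

```c
#include <stdint.h>

/* Application side (would normally live in FreeRTOSIPConfig.h).
 * Hypothetical hook: here it only counts the conflicts it is told about. */
static volatile uint32_t ulConflictsSeen = 0U;
#define exampleADDRESS_CONFLICT_HOOK( ulIPAddress )    ( ulConflictsSeen++ )

/* Stack side: fall back to a no-op when the application has not defined
 * the hook. */
#ifndef exampleADDRESS_CONFLICT_HOOK
    #define exampleADDRESS_CONFLICT_HOOK( ulIPAddress )    do {} while( 0 )
#endif

/* Hypothetical place in the stack where a detected conflict is reported. */
static void prvReportConflict( uint32_t ulIPAddress )
{
    ( void ) ulIPAddress; /* unused when the hook ignores its argument */
    exampleADDRESS_CONFLICT_HOOK( ulIPAddress );
}
```

During development the hook could map to configASSERT( 0 ) or a log call; in the field it could set a status flag or blink an LED.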

hs2 · August 18, 2021, 7:29pm

Thanks for the explanation regarding autoconf !

Umm… right. I had only lab/development issues in mind. But sure, it’d be even more valuable to have an indication in the real world.