Real-time Ethernet Sending

Hello. I am working on a real-time application that is responsible for collecting data and sending it via a UDP packet at some sample rate >1kHz. I absolutely love using FreeRTOS_Plus for the TCP/IP stack capability and am hoping to keep using it, however I am between a rock and a hard place.

My FreeRTOS tick runs at a 1 ms rate, but say the data I’m collecting is at 5 kHz. My goal is to send UDP packets out the door at a 5 kHz rate. Is this possible? When I call FreeRTOS_sendto(), the UDP packets get queued up and then 5 packets go out every millisecond. Is there a simple modification, or anything at all, that lets me send packets from interrupts, or immediately when I call sendto()? I am basically trying to send Ethernet packets faster than the tick rate.

Thank you,

What is the priority of the task running the TCP/IP stack relative to the task calling sendto()? To minimise latency at the cost of more context switches, it is recommended to have any MAC handling tasks at the highest priority (tasks that are handling packets going into or out of the Ethernet hardware), the TCP task at the next highest priority, and the application tasks below that. That way, as soon as you unblock the TCP task by sending data to it, the kernel will switch to the TCP task because it is unblocked and has a priority higher than the task calling sendto(). If you run out of network descriptors or packets then the task will block again to wait for more to free up - but keeping the TCP task at a high priority will ensure resources are freed earlier too.

The IP-task (prvIPTask in FreeRTOS_IP), created by FreeRTOS_IPInit(), has a priority of ( configMAX_PRIORITIES - 1 ); this is defined in FreeRTOSIPConfig.h. configMAX_PRIORITIES is 8, and this is on a Zynq 7000.

All of the tasks that I create, including the data processing task, which is my highest priority task, range from 2 to 6. So the TCP/IP task is running higher than all of my tasks.

I believe the issue is that I just can’t call FreeRTOS_sendto() from an ISR. The data collecting ISR puts the data in a queue and then this wakes up the data processing task that calls FreeRTOS_sendto.

Make certain then that your ISR DOES actually wake up the task by activating the scheduler at the end, and does not just end and wait for the tick to activate the scheduler.

In addition to Richard’s remark, you should use the portYIELD_FROM_ISR( xHigherPriorityTaskWoken ); or taskYIELD_FROM_ISR(); mechanism, as documented e.g. in the FreeRTOS xQueueSendFromISR() interrupt safe API function description.
Otherwise the context switch occurs with the next systick, so in the worst case 1 ms later.
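
A minimal sketch of that mechanism, under stated assumptions: vSampleISR and xSampleQueue are hypothetical names, and the block marked "host-only stand-ins" replaces the few FreeRTOS names used here with trivial definitions purely so the fragment compiles on a PC. On target, that block would be deleted and the real FreeRTOS.h and queue.h included instead.

```c
#include <stdint.h>

/* --- Host-only stand-ins (assumption). On target, delete this block and
 *     #include "FreeRTOS.h" and "queue.h" instead. --- */
typedef long BaseType_t;
struct HostQueue { uint32_t last_item; unsigned count; };
typedef struct HostQueue *QueueHandle_t;
#define pdFALSE 0
#define pdTRUE  1
static int yield_requested = 0;

static BaseType_t xQueueSendToBackFromISR(QueueHandle_t q, const void *item,
                                          BaseType_t *woken)
{
    q->last_item = *(const uint32_t *)item;
    q->count++;
    if (woken != 0) {
        *woken = pdTRUE;   /* pretend a higher-priority task was waiting */
    }
    return pdTRUE;
}
#define portYIELD_FROM_ISR(x) do { if (x) { yield_requested = 1; } } while (0)
/* --- end of stand-ins --- */

static QueueHandle_t xSampleQueue;   /* hypothetical queue, created elsewhere */

/* Hypothetical data-acquisition ISR. */
void vSampleISR(uint32_t sample)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    /* Pass the address of the flag; with NULL the ISR never learns that the
     * send unblocked a higher-priority task. */
    xQueueSendToBackFromISR(xSampleQueue, &sample, &xHigherPriorityTaskWoken);

    /* Ask for a context switch on ISR exit if such a task was unblocked,
     * instead of waiting for the next tick. */
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}
```

With this in place the processing task can run as soon as the ISR returns, rather than up to one tick period later.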

Thanks team for all these great answers!

@Maty : the priority of the EMAC driver is the highest by default, and the IP-task’s priority is one lower:

#ifndef niEMAC_HANDLER_TASK_PRIORITY
    /* Define the priority of the task prvEMACHandlerTask(). */
    #define niEMAC_HANDLER_TASK_PRIORITY    ( configMAX_PRIORITIES - 1 )
#endif

#define ipconfigIP_TASK_PRIORITY            ( configMAX_PRIORITIES - 2 )

So if a task uses the IP-stack, it should run at ( configMAX_PRIORITIES - 3 ) or lower.
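
As a sketch of that layout, an application-side priority could be derived from the same macro. The name appSENDER_TASK_PRIORITY is hypothetical and not part of FreeRTOS+TCP. With configMAX_PRIORITIES set to 8, as mentioned earlier in the thread, and the defaults shown above, the EMAC handler would run at 7, the IP-task at 6, and the sending task at 5 or below.

```c
/* Hypothetical application setting (not a FreeRTOS+TCP name): keep tasks that
 * call into the IP-stack below both the EMAC handler and the IP-task. */
#define appSENDER_TASK_PRIORITY    ( configMAX_PRIORITIES - 3 )
```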

Reading your constraint of sending 5,000 UDP packets per second, I wonder if you can send bigger packets, at a lower frequency? That would be much more efficient.
When using IPv4, and an MTU of 1500, the UDP payload may contain 1,472 bytes.
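
The 1,472 figure follows from subtracting the 20-byte IPv4 header (without options) and the 8-byte UDP header from the MTU. A small sketch of the arithmetic, and of what batching would do to the packet rate; the function names and the 64-byte sample size used in the note below are illustrative assumptions, not values from this thread.

```c
#include <stddef.h>

#define MTU_BYTES          1500u
#define IPV4_HEADER_BYTES    20u  /* IPv4 header without options */
#define UDP_HEADER_BYTES      8u

/* Largest UDP payload that fits in one unfragmented IPv4 packet. */
static size_t udp_max_payload(void)
{
    return MTU_BYTES - IPV4_HEADER_BYTES - UDP_HEADER_BYTES;
}

/* How many whole samples of a given size fit in one packet. */
static size_t samples_per_packet(size_t sample_bytes)
{
    return (sample_bytes == 0u) ? 0u : udp_max_payload() / sample_bytes;
}

/* Packets per second needed to carry sample_rate_hz samples, rounded up. */
static size_t packets_per_second(size_t sample_rate_hz, size_t per_packet)
{
    return (per_packet == 0u) ? 0u
                              : (sample_rate_hz + per_packet - 1u) / per_packet;
}
```

For example, with 64-byte samples at 5 kHz, 23 samples fit per packet and about 218 packets per second would suffice instead of 5,000. The cost is added latency, since the first sample in a packet waits for the rest to be collected.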

Also you may have seen the possibility to call sendto() with the FREERTOS_ZERO_COPY flag? You obtain a network buffer from the stack, fill it with data, and pass it to the IP-task, which will pass it directly to DMA for transmission. That should be the fastest method.

Remember to enable optimisation ( -Os or -O2 ), disable “expensive” options like stack checking, enable zero-copy in the network driver, use checksum offloading, anything else?

If you want you can attach source code and/or your config files to this post, and I will look at them.
If you want to do measurements, the module “hr_gettime.c” may be useful. It returns the running time in µs, so you can set time-stamps at certain points.
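
A small helper along these lines could turn such time-stamps into a check of whether packets really leave evenly spaced or in bursts. This is only a sketch: gap_stats_t and its functions are hypothetical names, and the caller is assumed to supply a timestamp in µs from whatever timer is available (the interface of hr_gettime.c is not shown here).

```c
#include <stdint.h>

/* Tracks the smallest and largest gap between consecutive marks. */
typedef struct {
    uint64_t last_us;
    uint64_t min_gap_us;
    uint64_t max_gap_us;
    uint32_t count;
} gap_stats_t;

static void gap_stats_init(gap_stats_t *s)
{
    s->last_us = 0u;
    s->min_gap_us = UINT64_MAX;
    s->max_gap_us = 0u;
    s->count = 0u;
}

/* Call once per event (e.g. right after each send) with the current time. */
static void gap_stats_mark(gap_stats_t *s, uint64_t now_us)
{
    if (s->count > 0u) {
        uint64_t gap = now_us - s->last_us;
        if (gap < s->min_gap_us) { s->min_gap_us = gap; }
        if (gap > s->max_gap_us) { s->max_gap_us = gap; }
    }
    s->last_us = now_us;
    s->count++;
}
```

At a steady 5 kHz both gaps should stay near 200 µs; a minimum near zero together with a maximum near 1,000 µs would point to packets being released once per tick.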

Hi Maty,

another thing to consider of course is network infrastructure. To accomplish your goal, you must ensure that the network segment your device runs in is never subject to problems arising from things like broadcast storms (those are more frequent than one would hope to believe, e.g. incurred by certain network management packets). If your network load is not predictable, the network stack may starve out your system, in particular WHEN your network has the highest priority on IRQ or task level or both (in those cases, penetration tests will likely fail).

Which reminds me of the use of network sniffers: the communication with a host may become considerably slower as soon as you start monitoring the network traffic with a program like WireShark.

Actually I haven’t observed this yet, but I can see why that must be so (the PHY must be configured promiscuous for a sniffer to run, which naturally affects the network load on the sniffing machine). I normally try to run the sniffer on a separate machine which I hook up to a hub (yes, I keep the old dumb Netgear boxes like a treasure, as every “modern” switch will not forward every packet on the segment to all ports). Yet there are follow-up problems stemming from this (the network setup might not like the sniffing machine in the segment). The last desktop OS I was able to reconfigure to be absolutely silent on the net was NT4.0, so I used to carry a stone age Satellite Pro laptop running NT4 with Microsoft’s Netmon around (boy, do I remember spending hours in high security areas on airports due to that gizmo…).

That would be yet another case for a standalone sniffer box, preferably running FreeRTOS ( :kissing_heart:)… any takers for that project? :grin:

@RAc wrote:

That would be yet another case for a standalone sniffer box, preferably running FreeRTOS ( :kissing_heart:)… any takers for that project?

I would like to develop the software for it, but I don’t have an affordable 1 Gbps board with 2 x EMAC + PHY, plus an SD-card for storage.

When I developed an iperf3 server, I often created a PCAP file on both sides.

That gave a new insight:

  • on the host side, it seemed that its iperf client was super fast
  • at the embedded side, I saw the opposite: my iperf server was very fast and mostly waiting for the host

Actually, the setup I have in mind would have THREE EMACs+PHYs - two for the 1:1 tunnel between the net and the machine to be logged, and one maintenance/logging interface over which the log would be pumped to a separate logger (a little like a debug probe with trace interface). That would eliminate the need for an SD card and allow for some intelligence (such as letting the logger formulate and send downstream filtering rules so that the box would also act as a remote controlled “hard firewall.”)

I remember looking at MCUs with more than one EMAC and PHY around two years ago, and there were one or two that would fit the bill. Now “all it takes” would be to get an eval board for one of those PODs, and off we could go… :wink:

That’s a good strategy in many use cases (as I’m sure you know - I don’t think I’d ever contradict you in anything you write… :slightly_smiling_face:) - for example, in some setups, nasty firewalls may close an existing connection in ugly ways, such as faking an RST/ACK/FIN sequence to one side but leaving the other side completely in the dark. I’ve also encountered buggy virtualizers or routers that would at some time simply drop certain packets and/or miscompute packet checksums while translating packets to NAT layers. Those kinds of things can only be pinpointed with traces on both sides…

Thank you everyone for your help!! I’ve seen this xHigherPriorityTaskWoken parameter in all “…FromISR()” calls, the one that I’m doing is xQueueSendToBackFromISR(), but I’ve always set it as NULL. I’m still fairly new at the intricacies of FreeRTOS. Using that parameter and switching some of my priorities around, I’m now able to send UDP packets faster than 1ms.

In the spirit of what has been written here before, here’s a little magician’s trick to impress your 6 year old with - I just learned about this today:

Assume an off-the-shelf setup in which your, say, FreeRTOS+TCP based unit communicates against a host PC, both on your desk. It works as expected, but in order to trace a routing problem, you modify the subnet mask on your FreeRTOS+TCP board such that the outbound packets go through the default gateway (an active third machine not important for the discussion here). Since this is a routing problem, some node in the path between the default gateway and your host PC drops the packets from your board, so the communication won’t work.

So you open Wireshark to at least try to figure out what’s going on, and lo and behold, KAMBAZAM! … the communication works fine. Just by opening Wireshark. Close Wireshark and the disturbance is back. Try this at home if you want to!

After some head-scratching, I found a completely logical and crystal clear explanation. Wireshark’s PCAP driver will (MUST) put the local PHY into promiscuous mode, meaning that the unit’s MAC address filter will be disabled and ALL packets on the net will reach the app layer - that’s trivially necessary for Wireshark to work, right?

Yet in return, this not only puts a heavy performance burden on your host PC, as Hein pointed out earlier, BUT it also means that the local TCP stack will now receive and process ALL IP packets seen on the net.

For most packets, that means that the additional IP address filter in the network stack will drop the packets whose target IP address does not match any locally allocated IP address. However, the routed packets to your host PC will still contain the “correct” target IP address (in fact they’ll look identical to the packets directed to the unit directly), but were initially directed to a different MAC address - the one of the default gateway! Since the MAC address is stripped anyways after the packet is passed up the MAC layer, your IP stack can’t tell the difference between the routed and non-routed packets and will treat them exactly as if they had gone directly to the host.

Phew! Lesson reconfirmed: Do NOT sniff the traffic on a machine that is involved in the communication you want to trace. Heisenberg effect.

Thank you @RAc for this interesting and funny story! It teaches that we really need an independent device for sniffing network data.

One detail:

Since the MAC address is stripped anyways after the packet is passed up the MAC layer

Uh? Is that true? Who cannot see the MAC-address anymore?
FreeRTOS+TCP still inspects all MAC-addresses to filter packets.

EDIT : or are you talking about a TCP application/server that is running on the host?

Thanks, Hein

Hi there Hein,

I am referring to the PC implementation.

Obviously, any TCP stack must on the lowest level still inspect the MAC address to distinguish broadcasts from dedicated packets (assuming the default case of non-promiscuous handling). My explanation for the behavior I see is that the PC TCP stack apparently does not re-check non-broadcast frames (that would also sort of breach ISO/OSI layering because the MAC address should not be known at that layer anymore) but instead executes the following pseudocode:

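/* Pseudocode: ptr points to the complete Ethernet frame delivered by DMA. */
void mac_input(Eth_Frame_Ptr ptr)
{
    /* Drop the Ethernet header; the upper layers only ever see ptr2. */
    IP_Packet_Ptr ptr2 = StripEthHeader(ptr);
    /* Broadcasts are identified by the destination MAC address. */
    if (IS_BROADCAST(ptr->DstAddr))
        HandleBroadcast(ptr2);
    else
        Dedicated_Input(ptr2);
}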

Thus, the upper layer (Dedicated_Input()) does not know anymore what the original destination MAC was and thus will pass packets coming through the path exactly like those that passed the MAC filter.

Come to think of it, it pretty much must be this way because the packets that arrive at Dedicated_Input() may also come from an interface that doesn’t know about MAC addresses such as PPP lines.
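
The hypothesis above can be illustrated with a small host-side model. It is a sketch of the described behaviour only, not code from any real PC stack: frame_t, host_accepts and the other names are hypothetical, and a single 32-bit destination address stands in for the whole IP packet.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  dst_mac[6];
    uint8_t  src_mac[6];
    uint32_t dst_ip;      /* stands in for the encapsulated IP packet */
} frame_t;

static bool is_broadcast(const uint8_t mac[6])
{
    static const uint8_t bcast[6] = { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF };
    return memcmp(mac, bcast, 6) == 0;
}

/* MAC-level filter: in promiscuous mode every frame passes. */
static bool mac_filter_pass(const frame_t *f, const uint8_t local_mac[6],
                            bool promiscuous)
{
    return promiscuous || is_broadcast(f->dst_mac) ||
           memcmp(f->dst_mac, local_mac, 6) == 0;
}

/* IP-level input: sees only the IP destination, never the MAC address. */
static bool dedicated_input(uint32_t dst_ip, uint32_t local_ip)
{
    return dst_ip == local_ip;
}

/* True if the frame ends up being processed by the local IP stack. */
static bool host_accepts(const frame_t *f, const uint8_t local_mac[6],
                         uint32_t local_ip, bool promiscuous)
{
    if (!mac_filter_pass(f, local_mac, promiscuous)) {
        return false;
    }
    return dedicated_input(f->dst_ip, local_ip);
}
```

In this model, a frame addressed to the gateway’s MAC but carrying the host’s IP address is rejected with the MAC filter active and accepted in promiscuous mode, which matches the effect described for Wireshark.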