FreeRTOS ISR Routine Handling, Callback Function

“The one thing you need to make sure of, since this is video, is that if it is supposed to be ‘live’, this rate is sufficient for passing it at the live rate, and you don’t want TOO much buffering, to limit lag (if that matters)”

Thanks! Yes, I am trying to have a much faster rate in the whole chain than the video fps.

" will admit that I am not familiar with this particular driver, but normally, the Rx routine should have a way to tell the machine that is sending that “I didn’t get that packet, I was too busy”, to get the transmitter to back off and resend it in a little bit. With that, every thing will run at the average speed of the slowest link in the chain."
So as far as I understand, the Rx DMA / TCP/IP send-back task should be informing the host machine, or perhaps the Tx DMA / TCP/IP recv task? I thought that in the second case “losing” packets and re-sending them would be illegal, right?

Thank you very much,
Best Regards,
Theo

When the Rx gets a packet and there is no buffer available to save the packet in, it needs to reject the packet. I think with TCP/IP, if it just drops the packet, it then won’t ACK the packet, the transmitter will see the packet as lost (since it doesn’t get an ACK) and resend it, and, seeing this happen a lot, perhaps drop its sending rate on the assumption of a congested link. This is basic TCP/IP protocol behavior. This is driver-level stuff, but if you are implementing Rx and Tx DMA, you are already working at part of the driver level.
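
To illustrate where that rejection happens, here is a minimal sketch written in the style of the FreeRTOS+TCP network interface porting examples (vHandleReceivedFrame is a made-up name, and lwIP has an analogous path where pbuf_alloc() fails); it is only a sketch of the idea, not your driver:

```c
/* Rx path sketch, FreeRTOS+TCP style. The point: when no buffer is free,
 * the frame is simply dropped here, the peer gets no ACK for the TCP
 * segment it carried, and it will retransmit (and back off) later. */
#include <string.h>
#include <stdint.h>
#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_IP_Private.h"
#include "NetworkBufferManagement.h"

void vHandleReceivedFrame( const uint8_t *pucFrame, size_t xBytesReceived )
{
    NetworkBufferDescriptor_t *pxBufferDescriptor;
    IPStackEvent_t xRxEvent;

    /* Try to get a buffer descriptor without blocking (this may run in a
     * deferred-interrupt handler task). */
    pxBufferDescriptor = pxGetNetworkBufferWithDescriptor( xBytesReceived, 0 );

    if( pxBufferDescriptor == NULL )
    {
        /* No buffer available: reject the packet. */
        iptraceETHERNET_RX_EVENT_LOST();
        return;
    }

    memcpy( pxBufferDescriptor->pucEthernetBuffer, pucFrame, xBytesReceived );
    pxBufferDescriptor->xDataLength = xBytesReceived;

    /* Hand the frame to the IP task; if its event queue is full, release
     * the buffer and drop the frame just the same. */
    xRxEvent.eEventType = eNetworkRxEvent;
    xRxEvent.pvData = ( void * ) pxBufferDescriptor;

    if( xSendEventStructToIPTask( &xRxEvent, 0 ) == pdFALSE )
    {
        vReleaseNetworkBufferAndDescriptor( pxBufferDescriptor );
        iptraceETHERNET_RX_EVENT_LOST();
    }
}
```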

I would think that this would be at the TCP protocol level, which sits between your network packet driver and your application processing, as that is where the ACKs would be generated.

Hi Theo,

I’m still not quite sure about some aspects of your design, but if you expect your network communication to be real time, you’ll be in for some nasty surprises. TCP/IP cannot and will not make any guarantees about packet turnaround, and in practice you should never rely on any assumption about turnaround behavior. The network may very well be the performance bottleneck, and there is nothing you can do about that.

The other thing is that either side, as well as intermediate instances like firewalls or providers, may at any time terminate existing network connections, and your and your peer’s code must be able to recover from these, e.g. resync the data stream. That’s one of the things that the framing protocol Hartmut and I mentioned is needed for.

@TheokonT
I’d like to add that instead of just making use of TCP’s internal flow control protocol to auto-adjust the data transfer rate, it might be better (later on) to add an application-layer flow control protocol (maybe a simple XON/XOFF protocol) to have better control over the data stream/rate and to keep it as fast and fluid as possible. TCP retransmissions might cause considerable lags/hiccups outside of your control.
That is, provided the network connection itself is not the problem. If it is, then, as RAc explained, you can’t do anything.
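
Just to illustrate what such an application-level XON/XOFF scheme could look like on the MCU (receiving) side; all names and thresholds below are made up for the sketch, and the single control byte could travel back over the same TCP socket:

```c
/* Hypothetical application-level XON/XOFF flow control on the MCU side.
 * The host only sends image data while "XON" is in effect; the MCU sends
 * a 1-byte control message when its buffer pool fills up or drains again. */
#include <stdint.h>
#include <stdbool.h>

#define APP_XON    ( ( uint8_t ) 0x11 )   /* "host may send more frames" */
#define APP_XOFF   ( ( uint8_t ) 0x13 )   /* "host must pause sending"   */

#define XOFF_THRESHOLD  1u   /* pause when <= 1 free image buffer remains */
#define XON_THRESHOLD   2u   /* resume when >= 2 buffers are free again   */

static bool xPaused = false;

/* Hypothetical helper that writes one control byte to the socket
 * (e.g. via send() / FreeRTOS_send() / lwIP's socket API). */
extern void vSendCtrlByte( uint8_t ucByte );

/* Called by the image-buffer manager whenever a buffer is taken or freed. */
void vUpdateFlowControl( unsigned uxFreeImageBuffers )
{
    if( ( xPaused == false ) && ( uxFreeImageBuffers <= XOFF_THRESHOLD ) )
    {
        vSendCtrlByte( APP_XOFF );   /* getting full: ask host to pause  */
        xPaused = true;
    }
    else if( ( xPaused == true ) && ( uxFreeImageBuffers >= XON_THRESHOLD ) )
    {
        vSendCtrlByte( APP_XON );    /* drained again: host may continue */
        xPaused = false;
    }
}
```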

Concerning your application, did I get it right that you basically have an image co-processor in an FPGA connected to a PC via LAN?
So the data flow is: PC(LAN) → image to MCU → to FPGA(TxDMA) → processing in FPGA → back to MCU(RxDMA) → back to PC(LAN), and repeat with the next image? Where do you expect the bottleneck?

@richard-damon I think this is getting clearer now. But in my case I will have 2 sockets, so 2 ports on my host machine: one to transmit the video and one to take it back. So will the transmitter get an ACK?

@RAc @hs2 thank you both for helping! The connection will not use DHCP and there is no router in between (I tried that in the beginning and the throughput was very bad). So it is a local Ethernet link between host and FPGA. Do you think this will still be a problem concerning connection termination, for example?
@hs2 My application is for demonstration purposes and it is a little custom. The hardware IPs will be telecom FECs, so I would like to demonstrate the video correction depending on the SNR. I think you are right about the dataflow.
The bottleneck, I think, is IP-dependent… Some IPs will be very slow for sure, so TCP/IP will not be the problem. But what I am trying to do is avoid the “serial latency” as much as I can, so that’s why I had the idea of 2 sockets. Do you think it will be better?

“I’d like to add that instead of just making use of TCP’s internal flow control protocol to auto-adjust the data transfer rate, it might be better (later on) to add an application-layer flow control protocol (maybe a simple XON/XOFF protocol) to have better control over the data stream/rate and to keep it as fast and fluid as possible.”

Will this protocol be between my Tx and Rx DMA in the MCU?

Thanks for the help guys!

At the TCP protocol layer, every packet that is sent gets an ACK sent back when it is received. That is what makes it a ‘reliable’ transport. You get positive confirmation that every block made it to at least the next step in the chain, and that step will do the same for its next step.

Because of this, there is a latency in the system that can be a bit unpredictable.

As hs2 mentioned, one issue with this is that the timeout for an ACK may be longer than you want if you are trying to keep a tight time schedule for the data.

If this is just an ‘offline’ accelerator, then it might not be an issue; it just affects the overall speed.


It could. If there are any nodes in the same segment that take up bandwidth (for example, PCs involved in Zoom sessions or people downloading cat videos as if there was no tomorrow), you will suffer throughput issues.

If you wanted to roll out your software to customers, could you tell them to exclusively use closed LANs that they have full control over? I.e. no other nodes in the same segment, no switching or routing hardware in between, not even located in VPNs?

Another issue is that you will need to implement timeouts to prevent your app from freezing, and that will (positively, trust me on that one) cause follow-up issues you’ll need to deal with.
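
As a concrete illustration of the timeout point: if you end up on FreeRTOS+TCP, per-socket receive/send timeouts can be set like this (the 2-second value is only a placeholder; lwIP’s socket API offers SO_RCVTIMEO/SO_SNDTIMEO for the same purpose):

```c
/* Sketch: give the socket a bounded blocking time so a stalled peer or a
 * dropped link cannot freeze the application task forever. */
#include "FreeRTOS.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_Sockets.h"

void vConfigureSocketTimeouts( Socket_t xSocket )
{
    TickType_t xTimeout = pdMS_TO_TICKS( 2000 );   /* placeholder value */

    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_RCVTIMEO,
                         &xTimeout, sizeof( xTimeout ) );
    FreeRTOS_setsockopt( xSocket, 0, FREERTOS_SO_SNDTIMEO,
                         &xTimeout, sizeof( xTimeout ) );

    /* After this, FreeRTOS_recv()/FreeRTOS_send() return instead of
     * blocking forever, and the caller can decide to retry, resync the
     * stream, or tear the connection down. */
}
```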


“It could. If there are any nodes in the same segment that take up bandwidth (for example, PCs involved in Zoom sessions or people downloading cat videos as if there was no tomorrow), you will suffer throughput issues.”

The connection is an Ethernet cable from the FPGA to the host machine (really old-fashioned way :slight_smile: )

“If you wanted to roll out your software to customers, could you tell them to exclusively use closed LANs that they have full control over? I.e. no other nodes in the same segment, no switching or routing hardware in between, not even located in VPNs?”

Yes these are considerable issues…

“Another issue is that you will need to implement timeouts to prevent your app from freezing, and that will (positively, trust me on that one) cause follow-up issues you’ll need to deal with.”

Hmmm… I had no idea… So does this depend on how long the communication will exist, for example?

So, finally, TCP/IP is probably not the best solution… right?

Many thanks,
Best Regards,
Theo

Considering raw throughput, UDP is a bit faster and requires a bit less computing power and RAM. But it can be lossy, i.e. there might be dropped packets (also on the host). This might cause problems if parts of the image data are lost. Also, you have to implement an application protocol, as mentioned, because there is no flow control either, and that is needed if the FPGA processing is slower than the network.
For a tech demo I’d stick to TCP and basically start with the simple chained processing described earlier.
Depending on the (buffering) resources of the MCU and the FPGA and its design, you could pipeline some actions. Even if the FPGA can only process 1 image at a time, you could receive the next image from LAN while transmitting the last processed one back to the host, as you already mentioned, if you have enough RAM on the MCU to buffer 2 images at the same time. Then indeed you could split: receive from LAN, prepare RxDMA and TxDMA, transmit to FPGA, (optionally wait for TxDMA completion if needed,) wait for RxDMA completion, signal the LAN transmit task to send the processed image, and immediately start over to receive the next image from the host.
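
A rough sketch of that loop, double-buffered and using FreeRTOS task notifications as the completion signals; xReceiveImageFromHost(), vStartTxDMAtoFPGA(), vStartRxDMAfromFPGA() and xLanTxTaskHandle are placeholder names for your own stack/driver glue:

```c
/* Pipeline sketch: host -> MCU -> FPGA -> MCU -> host, double-buffered.
 * The FPGA RxDMA-complete ISR is assumed to call vTaskNotifyGiveFromISR()
 * on this task; the notification calls themselves are standard FreeRTOS. */
#include <stddef.h>
#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

#define IMAGE_SIZE_BYTES    ( 320u * 240u * 2u )      /* placeholder size   */

static uint8_t ucRawImage[ IMAGE_SIZE_BYTES ];        /* host -> FPGA       */
static uint8_t ucProcessedImage[ IMAGE_SIZE_BYTES ];  /* FPGA -> host       */

extern TaskHandle_t xLanTxTaskHandle;                 /* sends results back */
extern BaseType_t xReceiveImageFromHost( uint8_t *pucBuf, size_t uxLen );
extern void vStartTxDMAtoFPGA( const uint8_t *pucBuf, size_t uxLen );
extern void vStartRxDMAfromFPGA( uint8_t *pucBuf, size_t uxLen );

void vImagePipelineTask( void *pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        /* 1. Pull the next complete image off the TCP socket. */
        if( xReceiveImageFromHost( ucRawImage, sizeof( ucRawImage ) ) != pdPASS )
        {
            continue;   /* timeout / reconnect handling omitted */
        }

        /* 2. Arm RxDMA for the result first, then push the raw image in.
         * A real implementation must also make sure the LAN transmit task
         * has finished sending ucProcessedImage from the previous round
         * before re-arming RxDMA on it (second notification or a third
         * buffer). */
        vStartRxDMAfromFPGA( ucProcessedImage, sizeof( ucProcessedImage ) );
        vStartTxDMAtoFPGA( ucRawImage, sizeof( ucRawImage ) );

        /* 3. Block until the RxDMA-complete ISR notifies this task. */
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

        /* 4. Wake the LAN transmit task to send the processed image and
         * loop straight back to receive the next one from the host. */
        xTaskNotifyGive( xLanTxTaskHandle );
    }
}
```
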
Since TCP sockets are full-duplex, 1 socket connection would be sufficient in the first place. You can send to and receive from the same socket at the same time, which simplifies the application.
I think it depends on the TCP stack implementation how well this approach works out. TCP basically supports ACK packets (when receiving data) that also carry send data. This would be much more efficient when doing full-duplex transfers, but I’m unsure if this is supported by the FreeRTOS+TCP stack. However, you’ll easily see what’s going on in Wireshark.
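
To illustrate the full-duplex point, a skeleton with FreeRTOS+TCP: one task blocks in FreeRTOS_recv() on the socket while a second task writes processed data back with FreeRTOS_send() on the very same socket (FreeRTOS+TCP allows one reader task and one writer task per socket). Buffer management and error/reconnect handling are omitted; the buffer names and chunk size are placeholders:

```c
#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"
#include "FreeRTOS_IP.h"
#include "FreeRTOS_Sockets.h"

/* The single connected socket, shared by both tasks (one reader, one
 * writer). How it gets created/connected is omitted here. */
extern Socket_t xImageSocket;

static uint8_t ucRxChunk[ 1460 ];   /* raw image data, host -> MCU        */
static uint8_t ucTxChunk[ 1460 ];   /* processed image data, MCU -> host  */

/* host -> MCU direction: blocks in FreeRTOS_recv() on the shared socket. */
void vLanReceiveTask( void *pvParameters )
{
    ( void ) pvParameters;
    for( ;; )
    {
        BaseType_t xReceived = FreeRTOS_recv( xImageSocket, ucRxChunk,
                                              sizeof( ucRxChunk ), 0 );
        if( xReceived > 0 )
        {
            /* ...append to the current raw-image buffer, start TxDMA to
             * the FPGA once a full image has arrived...                  */
        }
    }
}

/* MCU -> host direction: woken by the pipeline task, then writes to the
 * same socket with FreeRTOS_send(). */
void vLanTransmitTask( void *pvParameters )
{
    ( void ) pvParameters;
    for( ;; )
    {
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );
        FreeRTOS_send( xImageSocket, ucTxChunk, sizeof( ucTxChunk ), 0 );
    }
}
```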

Thanks very much hs2. Really helpful! Some last questions:

“Even if the FPGA can only process 1 image at a time, you could receive the next image from LAN while transmitting the last processed one back to the host, as you already mentioned, if you have enough RAM on the MCU to buffer 2 images at the same time. Then indeed you could split: receive from LAN, prepare RxDMA and TxDMA, transmit to FPGA, (optionally wait for TxDMA completion if needed,) wait for RxDMA completion, signal the LAN transmit task to send the processed image, and immediately start over to receive the next image from the host.”
My initial thought was (in case the FPGA processing would not be the bottleneck and would be faster) to start over and receive the next image after TxDMA completion (and not after RxDMA completion), to “steal” more time. But perhaps this cannot happen because it does not guarantee correct flow control, right? I thought that in this way the stream to the FPGA IP would be in some way “continuous”, without the delay of LAN processing, so taking advantage of all the “throughput” of the IP within the total fps requirements.

“Since TCP sockets are full-duplex, 1 socket connection would be sufficient in the first place. You can send to and receive from the same socket at the same time, which simplifies the application.”

Yes, but I started the project with lwIP… And I have read that lwIP cannot have recv and transmit in different tasks (and if these functions are in the same task, then it could not be full-duplex because it would be serialized, right?). However, I read in the forum that FreeRTOS+TCP fulfils the full-duplex requirement with 1 socket. Should I perhaps change my design to FreeRTOS+TCP?

Sincerely,
Theo

Seems I’ve forgotten my own previous proposal :laughing:
Well, you’re right, if you have enough RAM to buffer 2 images at the same time and, along with that, your FPGA design is capable of having 2 images in flight. The 1st MCU buffer is dedicated to receiving an image from the host right after the previous one has been completely transmitted to the FPGA (if possible), and the 2nd one to RxDMA the processed image, which is then sent back to the host after completion. I also think that this would be better, if supported by the FPGA.
I guess this can be done in a more sophisticated way by using scatter-gather Tx/RxDMA, transferring chunks (smaller rectangles?) of image data, for more continuous receiving/sending and processing of the image data. This would obviously also reduce the required total buffer sizes. It’s then closer to what real GPUs do with graphics processing.
Edit: I can’t imagine that lwIP is not able to receive and send data from 2 tasks. I think that might be a misunderstanding or wrong information. However, there is lwIP documentation you have to browse through anyway. Good luck :+1:


Thanks very, very much!
I hope I will not bother you again :slight_smile:

Best Regards,
Theo