FreeRTOS ISR Handling and Callback Functions

Hello,
I’m quite new to FreeRTOS. I am working with FreeRTOS via the Xilinx SDK on a Zynq UltraScale+ device (freertos10_xilinx), and I am trying to set up DMA communication. I have already instantiated my own interrupt handler correctly, but that handler didn’t need a callback routine. In Xilinx’s examples, the DMA Tx and Rx interrupt handlers have “a Callback pointer to the TX channel of the DMA engine”.
As far as I know, running code in an ISR should be avoided; thus, we use semaphores, task notifications, etc. From that I understand that I should run the callback function inside the task woken from the ISR, right?
In my previous custom interrupt I successfully used vTaskNotifyGiveFromISR() and everything was fine. But now, if I understand correctly, I need to “pass” the *Callback pointer somehow to the task that is responsible for the interrupt. How should I do that? Do I need to use a queue to pass that pointer? Is it possible with the TaskNotify API?

Sorry if my question is silly,
Thanks in advance,
Theo

You could use xTaskNotifyFromISR() with the eAction parameter set to eSetValueWithOverwrite and the ulValue parameter set to the pointer value. That will unblock the task and send it the pointer value in one go. There is a risk another interrupt occurs before the value has been read though, so the task that receives the notification should loop until it has processed all interrupts (it should not assume only one is pending).

Is the pointer value 32 bits or 64 bits? ulValue is only 32 bits, so the notification value might need to change to a size_t rather than a uint32_t.
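As a sketch of that approach (assuming 32-bit pointers, as discussed above; the task handle xDmaTaskHandle and both function names are made up for illustration):

```c
#include "FreeRTOS.h"
#include "task.h"

/* Hypothetical handle for the task that defers the interrupt work,
 * created elsewhere with xTaskCreate(). */
extern TaskHandle_t xDmaTaskHandle;

/* ISR: forward the DMA callback pointer as the notification value. */
void vDmaIsrWrapper( void *pvCallbackRef )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    xTaskNotifyFromISR( xDmaTaskHandle,
                        ( uint32_t ) pvCallbackRef, /* assumes 32-bit pointers */
                        eSetValueWithOverwrite,
                        &xHigherPriorityTaskWoken );

    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

/* Task: block until the ISR sends a value, then recover the pointer. */
void vDmaHandlerTask( void *pvParameters )
{
    uint32_t ulValue;

    ( void ) pvParameters;

    for( ;; )
    {
        if( xTaskNotifyWait( 0, 0, &ulValue, portMAX_DELAY ) == pdTRUE )
        {
            void *pvCallbackRef = ( void * ) ulValue;
            /* ...process, looping until no more interrupts are pending... */
            ( void ) pvCallbackRef;
        }
    }
}
```

Because eSetValueWithOverwrite drops older values, the task must query the hardware for all pending work each time it wakes, rather than assume one notification equals one interrupt.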


Hello @rtel ,
Thanks for the fast response. The Callback argument is a void * (a CallbackRef pointer); as I understand it, it actually points to a struct. Would that be a problem?

Also, regarding “so the task that receives the notification should loop until it has processed all interrupts (it should not assume only one is pending)”: how should I do that? Perhaps I should use a counting semaphore to “count” all the interrupts that came in?

And one last question: should I disable the interrupt every time or not? I am a bit confused about this link: FreeRTOS - 64-bit demo on UltraScale MPSoC Cortex-A53 core
It says that the interrupt source should be cleared. Does this mean that I should call vPortDisableInterrupt( INTERRUPT_ID )?

Sincerely,
Theo

Yes, you could use a counting semaphore.

By “process all the interrupts” I mean sit in a loop checking the interrupt status of each source by querying the hardware, and only block again once the hardware indicates that all pending interrupts have been processed.
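That drain pattern can be sketched like this; the hardware query is simulated here with a plain counter (ulReadPendingStatus() and vProcessOneEvent() stand in for real register accesses such as XAxiDma_BdRingGetIrq()):

```c
#include <stdint.h>

/* Simulated interrupt-status "register"; in real code this would be a
 * hardware read, not a variable. */
static uint32_t ulPendingEvents = 3;

static uint32_t ulReadPendingStatus( void ) { return ulPendingEvents; }
static void vProcessOneEvent( void )        { ulPendingEvents--; }

/* After being unblocked once, keep processing until the hardware
 * reports nothing pending; only then go back to blocking. */
int lDrainAllPending( void )
{
    int lProcessed = 0;

    while( ulReadPendingStatus() != 0 )
    {
        vProcessOneEvent();
        lProcessed++;
    }

    return lProcessed; /* the task would block again only at this point */
}
```

This way a task woken once still handles the case where several interrupts fired while it was busy.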


Missed the last point. I think the reference to clearing the interrupt is within the entry point for the IRQ, before the individual handler is called, and is relevant if you want to use nested interrupts - that is - allow other interrupts to be accepted while you are still processing a pre-existing interrupt.


Thanks very much for helping @rtel !
I think I will try the counting semaphore. But regardless, I have already tried without tasks and it seems to work… These ISRs are the DMA ISRs and I think they are very fast.
Would it be possible to run only the ISRs, without deferred interrupt processing, or could this cause problems?
I found a way to measure the time spent in the ISR (with the xtime_l.h functions) and found it is 2~4 µs. Do you think this will be a problem? My tick rate is the default 100 Hz.
Or do you think the counting semaphore is simply the better way?
Below is the ISR code; it does some required DMA housekeeping and just increments a counter:

void RxIntrHandler(void *Callback)
{
    /* Error, RxDone, AxiDma and RESET_TIMEOUT_COUNTER are globals
     * defined elsewhere, as in the Xilinx example code. */
    XAxiDma_BdRing *RxRingPtr = (XAxiDma_BdRing *) Callback;
    u32 IrqStatus;
    int TimeOut;

    /* Read pending interrupts. */
    IrqStatus = XAxiDma_BdRingGetIrq(RxRingPtr);

    /* Acknowledge pending interrupts. */
    XAxiDma_BdRingAckIrq(RxRingPtr, IrqStatus);

    /* If no interrupt is asserted, do nothing. */
    if (!(IrqStatus & XAXIDMA_IRQ_ALL_MASK)) {
        return;
    }

    /*
     * If the error interrupt is asserted, raise the error flag, reset
     * the hardware to recover from the error, and return with no
     * further processing.
     */
    if ((IrqStatus & XAXIDMA_IRQ_ERROR_MASK)) {

        XAxiDma_BdRingDumpRegs(RxRingPtr);

        Error = 1;

        TimeOut = RESET_TIMEOUT_COUNTER;

        while (TimeOut) {
            if (XAxiDma_ResetIsDone(&AxiDma)) {
                break;
            }
            TimeOut -= 1;
        }

        /* Return here even on reset timeout, so the error path never
         * falls through into the completion handling below. */
        return;
    }

    /*
     * If the completion interrupt is asserted, handle the processed
     * BDs and then raise the corresponding flag.
     */
    if ((IrqStatus & (XAXIDMA_IRQ_DELAY_MASK | XAXIDMA_IRQ_IOC_MASK))) {
        int BdCount;
        XAxiDma_Bd *BdPtr;

        /* Get finished BDs from hardware. */
        BdCount = XAxiDma_BdRingFromHw(RxRingPtr, XAXIDMA_ALL_BDS, &BdPtr);
        RxDone += BdCount;
    }
}

The question is: what do you need to do with the DMA buffers?
Descriptor ring maintenance is not much work in itself and can be done in the ISR, provided you have a way to refresh/recycle the DMA buffers there.
A dedicated pool of pre-allocated, fixed-size buffers can be used for fast buffer management, e.g. using a consumed-buffer queue signalled to a task and a free-buffer queue signalled back to the ISR.
If you don’t have a way to manage DMA buffers in the ISR, you need to forward the DMA events to a task where you can use e.g. a (thread-safe) heap.
In that case the post-processing latency (in the signalled task) is handled, or at least mitigated, by tuning the length of the descriptor ring(s) accordingly.
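A minimal sketch of that two-queue pool (all names and sizes here are illustrative, not from any Xilinx API):

```c
#include "FreeRTOS.h"
#include "queue.h"

#define POOL_DEPTH 8
#define BUF_SIZE   2048

static uint8_t ucPool[ POOL_DEPTH ][ BUF_SIZE ];
static QueueHandle_t xFreeBuffers;   /* ISR takes replacements from here */
static QueueHandle_t xFilledBuffers; /* ISR posts consumed buffers here  */

void vPoolInit( void )
{
    uint8_t *pucBuf;

    xFreeBuffers   = xQueueCreate( POOL_DEPTH, sizeof( uint8_t * ) );
    xFilledBuffers = xQueueCreate( POOL_DEPTH, sizeof( uint8_t * ) );

    /* Every buffer starts out free. */
    for( int i = 0; i < POOL_DEPTH; i++ )
    {
        pucBuf = ucPool[ i ];
        xQueueSend( xFreeBuffers, &pucBuf, 0 );
    }
}

/* In the Rx ISR: xQueueReceiveFromISR() a fresh buffer from xFreeBuffers,
 * hand it to the DMA descriptor, and xQueueSendFromISR() the just-filled
 * buffer to xFilledBuffers. The processing task does the mirror image:
 * receive from xFilledBuffers, consume, send back to xFreeBuffers. */
```

Only pointers move through the queues, so both sides stay fast and no allocation happens in the ISR.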


I agree with Hartmut. There must be some (generally low pri) logic to process the DMA buffers. That may include copying the buffer from DMA to application buffers in the ISR. You may want to study ST ethernet drivers that also use DMA ring buffers.

Furthermore, also look into portYIELD_FROM_ISR(). If you use some kind of signalling mechanism but do not call portYIELD_FROM_ISR(), the task that processes the signal may not be woken until the sys tick handler terminates the current time slice.
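The usual shape of that, sketched with a hypothetical counting semaphore xDmaDoneSem (created at init with xSemaphoreCreateCounting()):

```c
#include "FreeRTOS.h"
#include "semphr.h"

extern SemaphoreHandle_t xDmaDoneSem; /* hypothetical, created elsewhere */

void vDmaDoneIsr( void *pvCallbackRef )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    ( void ) pvCallbackRef;

    /* ...acknowledge the interrupt in the hardware first... */

    xSemaphoreGiveFromISR( xDmaDoneSem, &xHigherPriorityTaskWoken );

    /* Switch to the unblocked task as soon as the ISR exits, instead
     * of waiting for the next tick interrupt to do it. */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}
```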


Hello @hs2 ,
Thanks for the fast response!
I just want to pass the DMA data to lwip_send() to send image frames (video, perhaps at some fps) over Ethernet to the host. As far as I understand, the pool is for the case where multiple interrupts arrive before processing has finished, right? I don’t know if my thinking is correct, but I believe my process will be somewhat “serial”: first lwip_recv() takes the data, then the same task writes the data to the DMA, receives the data back from the DMA, and finally lwip_send() (still in the same task) sends it to the host (my assumption is that this will be faster than the gap before the next frame arrives). My thought was to use one task notification per ISR (Tx, Rx); when all processed BDs from the hardware (both Tx and Rx) are done, the nested ulTaskNotifyTake() calls (for Rx and Tx) would allow lwip_send() to send the data.
So perhaps the pool/consumed-buffer logic is for more complicated cases, right? Also, there will be no other processing of the data, so no other delay.
Do you think this could work?

Many thanks for the help,
really appreciated,
Theo

Well, I’m not Hartmut, but from what I can decipher from your description, I do see some potentially serious problems. For example, it is not a good idea to send a video frame verbatim via Ethernet; you’ll need some kind of layer 4+ protocol to wrap your video frames into frames that the receiver app can make sense of (for example, adding length information). If you don’t do that, you risk the network data stream getting hopelessly out of sync when DMA buffers get corrupted (for whatever reason) or contain inconsistent data. So you must do some kind of manipulation of your frames before passing them on. In-place manipulation in the DMA buffers does not sound like a good idea, so you’ll need to copy the data anyway before passing it on.

If your overall processing is serial, things are much simpler to implement.
If I got it right, you do some image processing in the FPGA, i.e. you receive a frame from a host via TCP, forward it to the FPGA using TxDMA, wait for processing-complete signalled by RxDMA (signalled from the ISR to the task, e.g. with a simple task notification), and send the processed image back to the host, right?
In this case you don’t even need descriptor rings, given that you can transfer an image with one DMA transfer. A single descriptor + buffer would be sufficient.
I also agree with RAc that you may need some frame format wrapped around the raw image data and/or an application protocol, for robustness reasons, to recover properly from corrupted transfers etc. Remember that TCP is just a stream of bytes in the first place.

Sorry @RAc I thought I was replying to @hs2 .

“If I got it right, you do some image processing in the FPGA, i.e. you receive a frame from a host via TCP, forward it to the FPGA using TxDMA, wait for processing-complete signalled by RxDMA (signalled from the ISR to the task, e.g. with a simple task notification), and send the processed image back to the host, right?”

Yes, this is exactly what I want to do. You read my mind :slight_smile:
“In this case you don’t even need descriptor rings, given that you can transfer an image with one DMA transfer. A single descriptor + buffer would be sufficient.”

Currently I am just looping back through a stream buffer for testing purposes, but later the real HW application will need packets of a different size than the image packet. So I thought I should split the image frame into the X BDs my HW needs to process one image, and interrupt only when the image processing is finished.

“I also agree with RAc that you may need some frame format wrapped around the raw image data and/or an application protocol, for robustness reasons, to recover properly from corrupted transfers etc. Remember that TCP is just a stream of bytes in the first place.”

I thought that lwIP’s TCP/IP ensures this robustness… Could you please advise me where to read about this topic? I thought TCP/IP would ensure the stability of the mechanism because of retransmission and not losing packets, right? What extra format should be used, and for what purpose?

Thanks again both of you guys,
Really helped me,
Theo

Sure, you can make use of scatter/gather DMA and use multiple descriptors, each transferring a chunk of the complete data packet.
However, that doesn’t change the relatively simple, straightforward overall design if you’re able to receive a complete frame, fire-and-forget the TxDMA (maybe using N descriptors pointing to chunks of the receive buffer), wait for the RxDMA completion notification, and send the frame back to the host.
You’re right, TCP ensures reliable transfer of data, but just of data bytes. When dealing with varying frame sizes it’s useful to transfer some header information, like the size of the frame data. That’s more flexible.
A simple error-recovery protocol might be to just close the socket on any FPGA/HW error. Then the host knows that something went very wrong, discards the possibly partially received data, closes the socket, and re-connects to restart image processing.
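As a concrete (made-up) example of such a header: three big-endian fields, a fixed marker for resynchronization, the payload length, and a sequence number, serialized byte by byte so it is independent of host endianness:

```c
#include <stdint.h>

/* Illustrative header sketch; the field names and magic value are
 * invented, not part of any standard or library. */
#define FRAME_MAGIC 0x46524D31u /* "FRM1" */

typedef struct {
    uint32_t ulMagic;   /* constant marker to detect desync     */
    uint32_t ulLength;  /* payload bytes that follow the header */
    uint32_t ulSeq;     /* frame sequence number                */
} FrameHeader_t;

static void vPackU32( uint8_t *pucOut, uint32_t ulVal )
{
    pucOut[ 0 ] = ( uint8_t )( ulVal >> 24 );
    pucOut[ 1 ] = ( uint8_t )( ulVal >> 16 );
    pucOut[ 2 ] = ( uint8_t )( ulVal >> 8 );
    pucOut[ 3 ] = ( uint8_t )( ulVal );
}

/* Serialize the header into a 12-byte wire buffer, big-endian. */
void vPackHeader( const FrameHeader_t *pxHdr, uint8_t pucWire[ 12 ] )
{
    vPackU32( &pucWire[ 0 ], pxHdr->ulMagic );
    vPackU32( &pucWire[ 4 ], pxHdr->ulLength );
    vPackU32( &pucWire[ 8 ], pxHdr->ulSeq );
}
```

The receiver reads 12 bytes, checks the magic, and then knows exactly how many payload bytes to expect; on a mismatch it can close the socket and reconnect.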


That’s nice!
I think now things are much clearer! Many thanks to all! Your help was very important!
(My first post. I think I closed the topic; if I haven’t, please notify me.)
Best Regards,
Theo

Hello @hs2 ,
After finishing my application I figured out that the “serial” overhead of the software (lwIP plus writing buffers to the DMA for receive and transmit and back) is too much for my specs, and it depends on the image frame size (which may be variable). I would like to give as much time as possible to the HW; the “software time” should be as small as possible so I can evaluate the hardware specs. I was thinking of creating 2 sockets (I would like to send and receive in parallel to minimize the software overhead). But my problem is whether I should separate lwip_recv - TxDMA (and the inverse) into two tasks or keep them in the same task.
Is there any way to do “flow control” with a queue, e.g. between the TxDMA and lwip_recv tasks, or should they run serially in the same task?
Perhaps just giving priority to the DMA task solves the problem? My fear is that the queue will overflow from the continuously received lwIP frames…
I know this is not strictly a FreeRTOS problem, but any advice would be appreciated.

P.S.: Should I create a new post, or is it OK here?

Thanks in Advance,
Theo

I know too little about your application, but usually a DMA signals a completion interrupt. So in e.g. the network/image receive task you could start the TxDMA and after that wait on a TxDMA completion notification or a binary semaphore, which is signaled by the TxDMA completion ISR. Then the task can proceed with receiving the next image and so on. This makes use of the TCP flow control.
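The serial pipeline described above might look like this; lSocket, the buffers, and vStartTxDma() are hypothetical placeholders, and the lwIP socket calls are schematic:

```c
#include "FreeRTOS.h"
#include "task.h"
#include "lwip/sockets.h"

/* All of these are assumed to be set up elsewhere. */
extern int lSocket;
extern uint8_t *pucRxBuf, *pucResultBuf;
extern uint32_t ulFrameBytes;
extern void vStartTxDma( uint8_t *pucBuf, uint32_t ulBytes ); /* hypothetical */

void vImagePipelineTask( void *pvParameters )
{
    ( void ) pvParameters;

    for( ;; )
    {
        /* 1. Receive one complete frame from the host; TCP flow control
         *    throttles the sender while the steps below are running. */
        lwip_recv( lSocket, pucRxBuf, ulFrameBytes, MSG_WAITALL );

        /* 2. Start the TxDMA to the FPGA, then block until the RxDMA
         *    completion ISR gives a task notification. */
        vStartTxDma( pucRxBuf, ulFrameBytes );
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

        /* 3. Send the processed frame back to the host. */
        lwip_send( lSocket, pucResultBuf, ulFrameBytes, 0 );
    }
}
```

Because step 1 does not run while steps 2 and 3 are in progress, the host is naturally held back by TCP’s window, so no extra queueing is needed.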


Hello @hs2 ,
Many thanks for the fast response.
You’re right, sorry for not explaining my application.
I would like to send and receive (loopback) video images (30 fps) through a custom IP implemented in the FPGA, but for now I am testing with a FIFO stream loopback to ensure the stability of the upper mechanism, so there is no HW complexity yet.

“the network/image receive task you could start the TxDMA and after that wait on a TxDMA completion notification or a binary semaphore, which is signaled by the TxDMA completion ISR. Then the task can proceed with receiving the next image and so on”:

So that’s nice! As far as I understand, it will be one task which receives → sends → waits for send completion, and back again.

Concerning the other side (the DMA receiver), I am trying to understand your earlier advice about the “dedicated pool of pre-allocated buffers”. Perhaps the DMA receive ISR delivers 2 frames before my network/image send task catches up to send the 1st frame; is that why I should use it? In that situation, how can I ensure the buffers do not overflow (because of a faster Rx DMA)? Or should I implement this and “tune” the total buffer count based on tests after implementing the whole design?

Many thanks again,
Best Regards,
Theo

If you get data faster than you can send it, you will ALWAYS eventually overflow, so you have to control the receive rate to be no faster than what you can send. Buffering can absorb a short-term transmission slowdown that is caught up later, but it can’t help if you simply can’t send fast enough.

The number of buffers you need is a function of how far the transmitter can fall behind and still catch up reasonably.


And will the receive rate usually be controlled by my network/image receive + TxDMA task, or by my host PC through the TCP/IP connection?

Thanks

I will admit that I am not familiar with this particular driver, but normally the Rx routine should have a way to tell the sending machine “I didn’t get that packet, I was too busy”, to get the transmitter to back off and resend it in a little while. With that, everything will run at the average speed of the slowest link in the chain.

The one thing you need to make sure of, since this is video: if it is supposed to be “live”, this rate must be sufficient for passing it at the live rate, and you don’t want TOO much buffering, in order to limit lag (if that matters).
