NXP iMXRT1052 intermittent UART Rx Overrun

aggarg · February 10, 2021, 5:12am

NXP advises me that Rx Overrun is caused by receiving a new character while the UART hardware buffer has NOT been read yet.

The above was the hint for me that the software is probably not reading UART fast enough.

When FreeRTOS_IPInit is called, the last two tasks are created and run and the Rx overruns occur.
When FreeRTOS_IPInit is NOT called, Rx overruns do NOT happen.

This indicated that either the IP task or the Ethernet task was interfering with the task reading the UART. That is why I asked for task priorities.

So I think the reason is that both the IP and Ethernet tasks had higher priorities than task2, which caused task2 to starve.

Thanks.

hs2 · February 10, 2021, 7:24am

Although the ethernet interface might be very important for your application, it doesn’t necessarily require the highest priorities (IRQ and maybe task).
Especially the driver IRQ priority is mainly determined by the HW interface and the associated driver.
So in the case that the UART interface requires that every received byte MUST be fetched from the RX latch before the next one is shifted in (i.e. you can’t use DMA doing that for you) it has a hard deadline. The ethernet interface in turn is DMA based and is much more flexible regarding driver timing due to RX/TX DMA descriptor rings.
All in all it can be reasonable to give the UART IRQ the highest prio to ensure its deadline is met. The driver internal UART RX data buffer relaxes the timing constraints of the UART task depending on the configured buffer size.
In addition, a badly designed (ethernet) driver, especially one with lengthy ISR code, is always a problem regarding (task) responsiveness. Even if its IRQ prio is not the highest and nested interrupts are possible and enabled, which allows other driver ISRs with higher prio IRQs to run, the ISR preempts/stops task execution similar to a critical section and hence should be kept as short as possible.
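
To put a number on the hard deadline mentioned above: when the line is saturated, a new byte arrives once per frame time, so the ISR has roughly one frame time to fetch the previous byte. A minimal sketch of that arithmetic follows. The function name is hypothetical, the baud rates in the usage note are assumed examples (the thread does not state the actual rate), and any hardware FIFO is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Time between consecutive bytes on a saturated UART line, in nanoseconds.
 * bits_per_frame counts start + data + parity + stop bits (10 for 8N1). */
static uint64_t uart_byte_period_ns(uint32_t baud, uint32_t bits_per_frame)
{
    assert(baud != 0u); /* a zero baud rate is a configuration error */
    return ((uint64_t)bits_per_frame * 1000000000ull) / baud;
}
```

For an assumed 115200 baud 8N1 link this gives 86805 ns, i.e. about 87 µs per byte; at 9600 baud it is roughly 1.04 ms.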

jamesk · February 10, 2021, 6:25pm

Thanks @aggarg & @hs2 .
I agree with you both.
However, I honestly have NOT found true evidence that UART ISR is being starved. What startles me the most is that the issue is resolved only when I changed the priority of my UART task (from 6 to 9 which is same as the FreeRTOS IP task)
1) Keeping it at priority 6, I also tried changing the UART ISR priority from 5 to 3, but the problem persists.
(IP-Task at 9, Ethernet-ISR at 4, UART-Task at 6, UART-ISR at 3)

With these priority settings, it might be reasonable to then see RxRingBufferOverrun (as opposed to RxOverrun), since UART-Task may be getting starved and not reading its Rx ring buffer fast enough. Even then, I think that’s only reasonable if my Rx ring buffer were not big enough.
But I am seeing RxOverrun when there is barely any ethernet activity.
How can one say that this is a priority issue? Yes, I know but puzzlingly, changing the priority of UART-Task resolves the issue. Is there a way to prove this convincingly?

My suspicion that the issue may be somewhere else is backed up by my logic analyzer captures.

As you can see, the UART ISR (Channel D6) would trigger upon every UART byte received (Channel D0).
When the issue occurs, UART (Rx) interrupts fail to trigger even though it appears that there is plenty of time during which the interrupt is NOT disabled by a FreeRTOS critical section or other high priority interrupts.

How can one think that this is because UART-Task was starved for not having a high enough priority? If that were the case, it would have resulted in RxRingBufferOverrun, because the UART ISR should overwrite the ring buffer if the ring buffer was not read fast enough by UART-Task. Obviously that is only if the UART ISR did trigger to read from its buffers and copy them over to the ring buffer.
But what I see is that the UART ISR is somehow starved by something but it does not look like it’s because of interrupt being disabled (as the capture shows).
What’s really puzzling is how this happens only when the FreeRTOS IP stack is enabled, although practically there is only minimal ethernet activity (i.e. there is absolutely no application specific network traffic).
BTW, I also checked the Ethernet ISR on the logic analyzer and there isn’t much activity either.

aggarg · February 10, 2021, 7:55pm

My suspicion is that your UART Receive function disables UART interrupt and when the UART task is starved, the UART interrupt is disabled for too long resulting in Rx hardware buffer overrun.

I will need to look at your code to confirm that but from what I could find on GitHub - LPUART_RTOS_Receive calls LPUART_TransferReceiveNonBlocking which disables UART RX interrupt by calling LPUART_DisableInterrupts(base, kLPUART_RxDataRegFullInterruptEnable) and later enables it by calling LPUART_EnableInterrupts(base, kLPUART_RxDataRegFullInterruptEnable). If a high priority task (IP or Ethernet) preempts the UART task (application task2 in your case) after the UART Rx interrupt is disabled and before it is re-enabled, the UART Rx interrupt can stay disabled for too long, resulting in a hardware buffer overrun.
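
The effect described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the NXP driver: the function name and all parameters are hypothetical, the ISR is assumed to drain the hardware instantly whenever the Rx interrupt is unmasked, and the Rx interrupt is masked for one contiguous window (standing in for a task that was preempted between the disable and enable calls).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: bytes arrive every byte_period_us; the hardware can hold
 * hw_depth unread bytes; the Rx interrupt is masked during
 * [mask_start_us, mask_start_us + mask_len_us).
 * Returns the number of bytes lost to hardware overrun. */
static uint32_t simulate_rx_overruns(uint32_t total_bytes,
                                     uint32_t byte_period_us,
                                     uint32_t hw_depth,
                                     uint32_t mask_start_us,
                                     uint32_t mask_len_us)
{
    uint32_t pending = 0u;   /* unread bytes sitting in the hardware */
    uint32_t overruns = 0u;

    assert(hw_depth > 0u);   /* the hardware holds at least one byte */

    for (uint32_t i = 1u; i <= total_bytes; i++) {
        uint64_t t = (uint64_t)i * byte_period_us;  /* arrival of byte i */
        int masked = (t >= mask_start_us) &&
                     (t - mask_start_us < mask_len_us);

        if (!masked) {
            /* Interrupt enabled: any backlog was drained when the mask
             * was lifted, and this byte is fetched right away. */
            pending = 0u;
        } else if (pending < hw_depth) {
            pending++;       /* byte is held, waiting for the ISR */
        } else {
            overruns++;      /* nowhere to put it: Rx overrun */
        }
    }
    return overruns;
}
```

With an assumed 87 µs byte period and a one-byte hardware buffer, a 50 µs masked window loses nothing, while a 500 µs window loses 5 bytes (2 bytes if the hardware could hold 4). The point of the model is that the loss depends only on how long the interrupt stays masked, which in turn depends on how long the masking task is kept off the CPU.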

Thanks.

hs2 · February 10, 2021, 8:20pm

OMG … if this UART driver is indeed implemented as Gaurav described, then this would explain the very odd behavior and to be honest, it’d be a very poor design.
Get rid of it, as you did with the seemingly also poorly written ethernet driver, and your life will be better.
A reasonable UART driver isn’t that hard to implement especially if it’s without DMA.
Fetch the data byte from the Rx latch in the ISR and push it (preferably) into a stream buffer large enough and leave the IRQ enabled. Give the IRQ a high prio to ensure meeting the deadline given by the chosen baud rate and post-process the UART stream by just waiting for stream buffer data in the UART task.
Maybe find a good student to implement and test the driver and you’ll be fine.
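
The "ISR pushes, task pops, IRQ stays enabled" design described above can be sketched as follows. On target, a FreeRTOS stream buffer (xStreamBufferSendFromISR in the ISR, xStreamBufferReceive in the task) would play this role; the sketch below swaps in a plain C11 single-producer/single-consumer ring so that it builds on a host without FreeRTOS. All names and the buffer size are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_RING_SIZE 256u  /* hypothetical size; must be a power of two */
static_assert((RX_RING_SIZE & (RX_RING_SIZE - 1u)) == 0u,
              "RX_RING_SIZE must be a power of two");

typedef struct {
    uint8_t     data[RX_RING_SIZE];
    atomic_uint head;     /* advanced only by the ISR (producer) */
    atomic_uint tail;     /* advanced only by the task (consumer) */
    atomic_uint dropped;  /* bytes discarded because the ring was full */
} rx_ring_t;

/* Call from the Rx ISR with the byte just read from the data register. */
static bool rx_ring_push_from_isr(rx_ring_t *r, uint8_t byte)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (head - tail == RX_RING_SIZE) {
        /* Software ring overrun: the task is too slow, but the hardware
         * was still read in time, so no hardware overrun occurs. */
        atomic_fetch_add_explicit(&r->dropped, 1u, memory_order_relaxed);
        return false;
    }
    r->data[head % RX_RING_SIZE] = byte;
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    return true;
}

/* Call from the UART task. It never touches the interrupt enable bits. */
static size_t rx_ring_pop(rx_ring_t *r, uint8_t *out, size_t max_len)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    size_t n = 0;

    while (n < max_len && tail != head) {
        out[n++] = r->data[tail % RX_RING_SIZE];
        tail++;
    }
    atomic_store_explicit(&r->tail, tail, memory_order_release);
    return n;
}
```

Because the ISR only writes head and the task only writes tail, the task never needs to mask the Rx interrupt. If the task is starved, the symptom becomes a growing dropped count (a ring buffer overrun) rather than a hardware Rx overrun.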

NoMaY-jp · February 10, 2021, 10:49pm

Hello James,

This reply is based on my experience of Renesas RX and RL78. (i.e. I don’t know i.MX but the reason might be the same…)

Q) Does your ‘RingBuffer’ implementation on the task side have the following steps to avoid simultaneous operation between the task side and the ISR side?

(1) Mask the UART interrupt by setting ‘UART interrupt mask register’ = ‘1’
(2) Do some operations (such as getting data bytes from RingBuffer)
(3) Unmask the UART interrupt by setting ‘UART interrupt mask register’ = ‘0’

(*) The reason why the ‘UART interrupt mask register’ is used is to avoid a period with interrupts globally disabled. I think that this method is widely used in non-RTOS based applications.

But the following steps (0) and (4) are necessary in an RTOS based application.

(0) Enter critical section
(1) Mask the UART interrupt by setting ‘UART interrupt mask register’ = ‘1’
(2) Do some operations (such as getting data bytes from RingBuffer)
(3) Unmask the UART interrupt by setting ‘UART interrupt mask register’ = ‘0’
(4) Exit critical section

If the implementation lacks steps (0) and (4), the task doing these steps may be switched out for another task during step (2), and then the UART interrupt may stay masked for a very long time.

I wonder whether this might be the reason for your problem.

Best regards,
NoMaY

jamesk · February 10, 2021, 11:49pm

Wow I am speechless.
I guess for two reasons.

Two thumbs up to @aggarg for his excellent debugging skill.
Shame on NXP for providing this kind of driver to their customers.

The UART driver is from NXP SDK 2.6. It is really, really crazy to write a driver whose ISR behavior is affected by application task priority.
When I reached out to them, they acted very innocent obviously saying that there is no known issue.
I guess this is NOT a bug as long as you keep your UART task(s) as the highest priority task(s).
I think this is insane. I never thought that NXP would write their driver in such manner.

Thanks @aggarg and @hs2 for your help.