Does task switching mask hardware interrupts?

joehinkle wrote on Sunday, August 19, 2018:

I have two CPUs communicating with each other over a UART at 1,500,000 baud.

The process is the master programming the slave and expects a 5 byte reply from the slave after every sector of flash is programmed.

I recently added a task that runs at a higher priority than the programming task and started to experience random uart message receive failures – not all 5 bytes were being received.

I used a logic probe to monitor uart traffic and I toggled an I/O pin when inside the uart’s interrupt routine.

I have the uart’s hardware interrupt priority set at 4 (ARM cpu), the highest hardware priority in the application.

The failure is that the uart’s IRQ routine is not fired on acquisition of a byte (random) – hence my message byte count is not reached – and I declare an error.

To me – the only way the uart’s interrupt should not fire is if a higher priority interrupt was active for the complete duration of receiving 1 byte at 1500000 bits/sec or if hardware interrupts were disabled.

The issue goes away if I suspend the higher priority task, which prompts my question: are hardware interrupts masked when FreeRtos is performing a scheduled task switch?

Thanks in advance for any insights.

Joe

richarddamon wrote on Sunday, August 19, 2018:

Tasks should not block interrupts unless they use critical sections. There will be some very short critical sections when switching tasks inside of FreeRTOS, but at 1,500,000 baud, you would need a critical section of at least 6 us to lose data (assuming the UART is a normal single-buffered UART, with no FIFO but with a receive data register separate from the data shift register). I wouldn’t think that the FreeRTOS critical sections would be that long if the machine was properly chosen for this task.
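
The 6 us figure can be checked with a little arithmetic. The sketch below assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits per character); the thread does not state the framing actually used, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Time for one UART character in nanoseconds (rounded down).
 * bits_per_frame is 10 for 8N1 framing, which is an assumption here. */
uint32_t uart_char_time_ns(uint32_t baud, uint32_t bits_per_frame)
{
    assert(baud > 0u);
    return (uint32_t)(((uint64_t)bits_per_frame * 1000000000ull) / baud);
}

/* CPU cycles that elapse while one character is being received. */
uint32_t cycles_per_char(uint32_t cpu_hz, uint32_t baud, uint32_t bits_per_frame)
{
    assert(baud > 0u);
    return (uint32_t)(((uint64_t)cpu_hz * bits_per_frame) / baud);
}
```

With these assumptions, `uart_char_time_ns(1500000, 10)` gives 6666 ns, and a 120 MHz core (the clock mentioned later in the thread) would execute 800 cycles in that time. With a single receive register, the ISR has to read each byte within roughly that window.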

joehinkle wrote on Sunday, August 19, 2018:

I looked into task.c and there are multiple critical sections.

I did not take the time today, but will insert my I/O pin probe into them to find out which one is causing the issue.

Suspending the task during cpu programming solved the immediate concern, but now I’m concerned about normal run time. I’m running a tick of 5 ms and the issue I identified could corrupt a uart message under normal use.

The cpu I’m using is a Kinetis K64 (Arm Cortex-M4) with a clock of 120 MHz.

What’s the difference between your critical section implementation and (“cpsid i”)?

richarddamon wrote on Sunday, August 19, 2018:

FreeRTOS critical sections on the Arm do not just disable interrupts, but raise the interrupt mask so that very high priority interrupts (those that don’t interact with FreeRTOS) can still occur. I doubt the critical sections in task.c could cause the problem, as they are all very short (it is one of the design requirements in FreeRTOS).
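
In the Cortex-M3/M4 ports this masking is done by writing configMAX_SYSCALL_INTERRUPT_PRIORITY to the BASEPRI register instead of executing "cpsid i". The following is a host-side model of that rule, not port code. It assumes 4 implemented priority bits (check `__NVIC_PRIO_BITS` in the device header), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u  /* assumed number of implemented priority bits */

/* Convert a logical priority level (0 = most urgent) to the value held in
 * the 8-bit priority/BASEPRI registers, where the implemented bits are the
 * most significant ones. */
uint8_t prio_to_reg(uint8_t level)
{
    assert(level < (1u << PRIO_BITS));
    return (uint8_t)(level << (8u - PRIO_BITS));
}

/* Model of the BASEPRI rule: 0 disables masking; otherwise every interrupt
 * whose priority value is numerically >= BASEPRI is held pending. */
bool irq_masked_by_basepri(uint8_t basepri_reg, uint8_t irq_level)
{
    if (basepri_reg == 0u) {
        return false;
    }
    return prio_to_reg(irq_level) >= basepri_reg;
}
```

In this model, a UART interrupt at level 4 keeps running through a FreeRTOS critical section only if configMAX_SYSCALL_INTERRUPT_PRIORITY corresponds to level 5 or a numerically higher level. If it corresponds to level 4 or below, the UART interrupt is held off for the length of every critical section. Interrupts that run above that threshold must not call FreeRTOS API functions.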

The question is if YOUR task creates critical sections that last for a long time.

Also, I am presuming that your ISR actually takes the data out of the uart, and doesn’t just try to pass off that action to a task, and the task processes from some sort of buffer the ISR fills.
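
A minimal sketch of that arrangement, written so it can be exercised on a host: the ISR stores each byte in a ring buffer and the task drains it later. All names are hypothetical, and on target the byte passed to `uart_rx_isr_byte` would come from the UART data register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RX_BUF_SIZE 64u
static_assert((RX_BUF_SIZE & (RX_BUF_SIZE - 1u)) == 0u,
              "RX_BUF_SIZE must be a power of two for the index mask");

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;  /* written only by the ISR  */
static volatile uint32_t rx_tail;  /* written only by the task */

/* Called from the ISR with the byte just read from the data register.
 * Returns false if the buffer is full and the byte was dropped. */
bool uart_rx_isr_byte(uint8_t byte)
{
    if ((rx_head - rx_tail) >= RX_BUF_SIZE) {
        return false;
    }
    rx_buf[rx_head & (RX_BUF_SIZE - 1u)] = byte;
    rx_head++;
    return true;
}

/* Called from the task. Returns false if no byte is waiting. */
bool uart_rx_task_get(uint8_t *out)
{
    if (rx_head == rx_tail) {
        return false;
    }
    *out = rx_buf[rx_tail & (RX_BUF_SIZE - 1u)];
    rx_tail++;
    return true;
}
```

Because each index has a single writer, this single-producer, single-consumer layout needs no critical section on a single-core part, and the ISR stays short.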

joehinkle wrote on Sunday, August 19, 2018:

I don’t use a critical section as I never have the possibility of the ISR and task attempting to modify the same data at the same time.

The uart messages have a length embedded in them which the ISR uses to set a flag telling the task the message is available - so no critical section required.
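
A scheme like this might look roughly as follows. The sketch assumes the first byte of each message is the payload length, which the thread does not confirm, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_MAX 32u

typedef struct {
    uint8_t       data[MSG_MAX];
    uint8_t       expected;   /* payload length taken from the first byte */
    uint8_t       count;      /* payload bytes stored so far */
    bool          have_len;
    volatile bool ready;      /* set by the ISR, cleared by the task */
} msg_rx_t;

/* Feed one received byte (ISR context). Returns true when the byte
 * completes a message. */
bool msg_rx_feed(msg_rx_t *rx, uint8_t byte)
{
    assert(rx != NULL);
    if (rx->ready) {
        return false;               /* previous message not yet consumed */
    }
    if (!rx->have_len) {
        if (byte == 0u || byte > MSG_MAX) {
            return false;           /* implausible length: ignore the byte */
        }
        rx->expected = byte;
        rx->count = 0u;
        rx->have_len = true;
        return false;
    }
    rx->data[rx->count++] = byte;
    if (rx->count == rx->expected) {
        rx->have_len = false;
        rx->ready = true;
        return true;
    }
    return false;
}

/* Task side: release the buffer so the ISR can start the next message. */
void msg_rx_release(msg_rx_t *rx)
{
    assert(rx != NULL);
    rx->ready = false;
}
```

The flag acts as the handoff, so no critical section is needed, but in this sketch any byte that arrives while `ready` is still set is dropped. If the task is delayed by a higher priority task, that window grows.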

richarddamon wrote on Sunday, August 19, 2018:

I am fairly sure that ordinary task switching isn’t going to cause the issue you are seeing, as the FreeRTOS kernel won’t need to disable the interrupt for anywhere near the length of time needed to cause your problem.

Thinking a bit about it, one long shot possibility is that you have some low power enabling code that is causing this sort of issue (I don’t think the default Tickless Idle could cause that big of a block out), but then I would expect that turning on another task should help not hurt.

From your answers, I am still not sure that the higher priority task (the one whose suspension fixes the problem) has been carefully inspected for use of critical sections. If it uses a critical section that lasts for 6 usec, that could cause the issue. (Or could it disable the serial interrupt, or enable an interrupt that takes a long time to run?)

Another thought: when your serial port routine flags that a message is ready, does it disable the interrupts, which are then re-enabled when the message is processed? That would open a window in which you could miss an interrupt, especially if the higher priority task blocks you from processing the block quickly.

joehinkle wrote on Sunday, August 19, 2018:

Thanks for your suggestions Richard, but I do not use SDKs or third party software for the exact reasons you are suggesting: unknown code causing difficult to find issues.

I use FreeRtos, its sister FreeRtos+IP, and wolf SSL. All other drivers and code are mine so I know them inside and out.

The few critical sections in my code have already been tested and are not causing the issue.

I have not tested the TCP stack yet. One of the three tasks I suspended to resolve the issue runs SSL on top of TCP. I have searched for uses of (“cpsid i”) and found none. I need to identify the instructions that could be used as a critical section and search for them. When found, I will instrument them to see if they are the culprit.

I have found FreeRtos’s implementation of a critical section. What other ways are there on an ARM M4 to suspend interrupts for a while?

Thanks for your thoughts on this matter.

richarddamon wrote on Monday, August 20, 2018:

The only ways I know of to disable an interrupt are:

In the device itself, change the interrupt enable bits
In the NVIC, change the interrupt enable bits
Set the current interrupt priority level (like the FreeRTOS critical section)
Be in an interrupt that is higher level or which doesn’t enable nesting
Globally disable interrupts

If the serial port supports it, you might try having it test for overrun and, if it detects one, just hang, then check with a debugger where you are coming from. If it is code that is enabling interrupts, then you have the culprit.
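
A rough sketch of such an overrun trap, modelled so it can run on a host. The flag masks are placeholders (the real ones come from the device header), and `overrun_trap` stands in for whatever halts the target, such as a breakpoint instruction or an endless loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit masks; take the real ones from the device header. */
#define RX_FLAG_DATA_READY 0x20u
#define RX_FLAG_OVERRUN    0x08u
static_assert((RX_FLAG_DATA_READY & RX_FLAG_OVERRUN) == 0u,
              "flag masks must not overlap");

static volatile bool overrun_trapped;

/* Stand-in for the halt: on target this could be a breakpoint or an
 * endless loop, so the debugger can show the interrupted context. */
void overrun_trap(void)
{
    overrun_trapped = true;
}

/* Start of the receive ISR, given a snapshot of the status register.
 * Returns true if a byte should be read from the data register. */
bool uart_rx_isr_check(uint8_t status)
{
    if ((status & RX_FLAG_OVERRUN) != 0u) {
        overrun_trap();
    }
    return (status & RX_FLAG_DATA_READY) != 0u;
}
```

When the trap fires on target, the stacked return address identifies the code that was running when the late interrupt was finally taken, which is the place to look for whatever held it off.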