I am running FreeRTOS in an AMP configuration on each of the two RPUs (Cortex-R5) of a Zynq UltraScale+ MPSoC.
On rare occasions, the tick counter (xTickCounts) of one of the two RPUs stops incrementing.
What could be the cause?
The FreeRTOS version is 10.0.0, as provided by AMD Xilinx.
I have already confirmed the following:
The tick counter stops incrementing, but tasks keep running and context switching works properly.
The “FreeRTOS_Tick_Handler” function itself is not corrupted: no write instruction is executed in the memory area where the interrupt handler “FreeRTOS_Tick_Handler” is stored.
The XScuGic_ConfigTable.HandlerTable entry that stores the function pointer to “FreeRTOS_Tick_Handler” is not corrupted.
I suspect your tick interrupt is not firing. If this is the case, you’ll still see tasks running, and event-based context switches will still work (for example, unblocking a task by giving a semaphore or posting to a queue). Time-based context switches (for example, a task completing a delay) will not work.
As richard-damon has suggested, you can use a breakpoint to check whether the tick increment function is being called. I’d also check your CPU to see what the interrupt configuration is when this starts occurring.
Thank you for your reply, richard-damon.
For debugging, I temporarily changed the tick interrupt source from the Triple Timer Counter to the FPGA, added an interrupt count register to the FPGA, and checked it.
The FPGA interrupt counter register was incremented, but xTickCounts was not. Therefore, it seems that “FreeRTOS_Tick_Handler” was not called even though the tick interrupt itself was received.
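The comparison described here (interrupt count advancing while the tick count stays put) can also be automated in firmware. The following is only a minimal sketch under stated assumptions: `tick_stall_detector_t` and `tick_stall_check` are hypothetical names, the interrupt count would come from the FPGA counter register, and the tick count from `xTaskGetTickCount()`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper state: remembers the previous sample of both counters. */
typedef struct {
    uint32_t last_irq_count;   /* last value read from the hardware interrupt counter */
    uint32_t last_tick_count;  /* last value read from the RTOS tick counter */
    uint32_t stalled_samples;  /* consecutive samples where only the IRQ count moved */
} tick_stall_detector_t;

/*
 * Feed one sample of both counters. Returns true once the interrupt counter
 * has advanced on `threshold` consecutive samples while the tick counter
 * has not moved at all.
 */
bool tick_stall_check(tick_stall_detector_t *d,
                      uint32_t irq_count,
                      uint32_t tick_count,
                      uint32_t threshold)
{
    bool irq_advanced  = (irq_count  != d->last_irq_count);
    bool tick_advanced = (tick_count != d->last_tick_count);

    if (tick_advanced) {
        d->stalled_samples = 0u;          /* tick is alive: start over */
    } else if (irq_advanced) {
        d->stalled_samples++;             /* interrupt seen, but no tick */
    }

    d->last_irq_count  = irq_count;
    d->last_tick_count = tick_count;

    return d->stalled_samples >= threshold;
}
```

Because time-based delays stop working once the tick stalls, a check like this would have to be driven from something that does not depend on the tick, such as another interrupt that is known to keep firing.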
I have a limited number of JTAG debuggers, and the phenomenon occurs so infrequently that it sometimes does not reproduce even when I set up 20 operating environments and run them continuously for 3 days…
What could be disabling the tick interrupt?
The CPU interrupt settings have not been changed since they were set via the driver at FreeRTOS startup. I have also confirmed that interrupts other than tick are operating normally.
As I told richard-damon, “FreeRTOS_Tick_Handler” does not seem to be called.
You are correct, and I first noticed this phenomenon when a task did not return from vTaskDelay.
The CPU interrupt configuration should not have been changed after FreeRTOS was started, but what specific configurations should be checked after the phenomenon occurs?
Thank you for your reply, aggarg.
As you guessed, the tick interrupt signal is coming from the FPGA to the CPU, but the tick interrupt handler is not being called.
Other interrupt handlers seem to be called fine, so as richard-damon says, it’s possible that only the tick interrupt is being disabled by something.
Is it possible to stop the system in this state and then use the debugger to examine if the interrupts are disabled? This would confirm the hypothesis and then you’d be able to focus on finding where the interrupt is getting disabled.
I have a limited number of JTAG debuggers, and I do not know whether this phenomenon can be reproduced in an operating environment with a debugger connected, so for the time being I will add debug code that checks the state of the interrupt mask register when the phenomenon occurs.
I am not familiar with the R5 processor, but often the “Interrupt Controller” isn’t actually part of the “CPU” but a module attached to it, and it has a number of registers that control each of the interrupts going into it. Unless you read back those registers to confirm that no “wild write” has changed them, you don’t know that they haven’t changed.
“Wild write” means “unintentional write”, doesn’t it?
I’ve got it. As you suggested, I will check the registers that control interrupts when the phenomenon occurs.
A “wild write” is a write through a pointer that points to the wrong address, or to an array with a bad index. It’s harder to have a write that you didn’t intend to occur, but not hard to end up writing where you didn’t intend if pointers/indexes get messed up.
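As a small illustration of the bad-index case, here is a sketch that models memory as one flat array so the stray write stays well-defined when run on a PC. Everything in it is hypothetical: the layout (a four-entry table followed by a word standing in for an interrupt-enable register) and the names `unchecked_store` and `checked_store` are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define TABLE_LEN   4u   /* entries 0..3 belong to the table */
#define IRQ_ENABLE  4u   /* the word right after the table models a control register */
#define MEM_WORDS   5u   /* total size of the modelled memory */

/* Writes wherever the index points; a bad index silently lands on a neighbour. */
void unchecked_store(uint32_t mem[MEM_WORDS], uint32_t index, uint32_t value)
{
    mem[index] = value;
}

/* Same store, but refuses indexes outside the table. */
bool checked_store(uint32_t mem[MEM_WORDS], uint32_t index, uint32_t value)
{
    if (index >= TABLE_LEN) {
        return false;    /* would have been a wild write */
    }
    mem[index] = value;
    return true;
}
```

With `mem[IRQ_ENABLE]` set to 1, calling `unchecked_store(mem, 4u, 0u)` clears it even though the caller only meant to update the table, whereas `checked_store(mem, 4u, 0u)` rejects the write.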
Referring to the AMD Xilinx Zynq UltraScale+ MPSoC TRM, the register map, and ARM’s interrupt controller manual, I added debug code that checks whether the following registers, which seem relevant to the phenomenon, differ before and after it occurs, and then ran a continuous operation test.
Interrupt Set-Enable Registers
Interrupt Priority Registers
Interrupt Priority Mask Registers
Interrupt Configuration Registers
Interrupt Set-Pending Registers
As a result, although the phenomenon itself occurred, there was no change in the register settings before and after the phenomenon.
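For readers who want to repeat this kind of check, a before/after comparison of the registers listed above might look roughly like the sketch below. It is not the debug code actually used here. The register offsets follow the ARM GIC architecture specification; the base addresses are passed in by the caller (on the target they would come from the BSP), and the type and function names are hypothetical.

```c
#include <stdint.h>

/* Byte offsets from the ARM GIC architecture specification. */
#define GICD_ISENABLER   0x100u  /* distributor: 1 enable bit per interrupt  */
#define GICD_ISPENDR     0x200u  /* distributor: 1 pending bit per interrupt */
#define GICD_IPRIORITYR  0x400u  /* distributor: 8 priority bits per interrupt */
#define GICD_ICFGR       0xC00u  /* distributor: 2 config bits per interrupt */
#define GICC_PMR         0x004u  /* CPU interface: priority mask */

/* Hypothetical snapshot of the settings that affect one interrupt ID. */
typedef struct {
    uint32_t enabled;        /* 1 if the set-enable bit is set */
    uint32_t pending;        /* 1 if the set-pending bit is set (informational) */
    uint32_t priority;       /* 8-bit priority field */
    uint32_t config;         /* 2-bit configuration field */
    uint32_t priority_mask;  /* CPU interface priority mask */
} irq_snapshot_t;

static uint32_t reg_read(const volatile uint32_t *base, uint32_t byte_offset)
{
    return base[byte_offset / 4u];
}

/* Read the fields belonging to interrupt `id` from the given base addresses. */
irq_snapshot_t irq_snapshot_take(const volatile uint32_t *dist,
                                 const volatile uint32_t *cpu,
                                 uint32_t id)
{
    irq_snapshot_t s;
    uint32_t bit_word = 4u * (id / 32u);

    s.enabled  = (reg_read(dist, GICD_ISENABLER + bit_word) >> (id % 32u)) & 1u;
    s.pending  = (reg_read(dist, GICD_ISPENDR + bit_word) >> (id % 32u)) & 1u;
    s.priority = (reg_read(dist, GICD_IPRIORITYR + 4u * (id / 4u))
                  >> (8u * (id % 4u))) & 0xFFu;
    s.config   = (reg_read(dist, GICD_ICFGR + 4u * (id / 16u))
                  >> (2u * (id % 16u))) & 3u;
    s.priority_mask = reg_read(cpu, GICC_PMR) & 0xFFu;
    return s;
}

/* Returns 1 if any static setting differs. The pending bit is ignored
 * because it changes during normal operation. */
int irq_config_changed(const irq_snapshot_t *before, const irq_snapshot_t *after)
{
    return (before->enabled != after->enabled) ||
           (before->priority != after->priority) ||
           (before->config != after->config) ||
           (before->priority_mask != after->priority_mask);
}
```

One snapshot would be taken right after the scheduler starts and another once the tick is seen to have stopped; the pending field of the second snapshot shows whether the interrupt is waiting at the controller even though its handler is not running.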
Since I changed the source of the tick interrupt back to TTC (Triple Timer Counter), I also checked the TTC registers, but found no particular problems.
If there is anything else I should check, please let me know.
That is strange, as the ISR is not executing even though the interrupt is not masked. Since this seems specific to the hardware, can you also reach out to Xilinx and ask them?
Thank you for your reply, aggarg.
I checked the AMD Xilinx support page to see if a similar problem had been reported, but I couldn’t find anything. I’m going to ask about this issue on the support page myself.