I am implementing a watchdog that, among other things, checks that the kernel is still functioning normally. “Normal” here means that the tick value has changed since the last time the check was performed.
As the watchdog functionality runs at the highest interrupt priority in the system, it can’t call xTaskGetTickCountFromISR() directly. What I have done instead is to save a copy of the tick count from the tick interrupt.
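/* Copy of the tick count: written in the tick hook, read by the watchdog ISR.
   volatile so the reader always fetches it from memory.
   The hook is only called when configUSE_TICK_HOOK is set to 1. */
static volatile TickType_t last_tick = 0u;
void vApplicationTickHook(void) {
    // Overwrite last_tick with a fresh value
    last_tick = xTaskGetTickCountFromISR();
}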
This works, but seems a bit overkill. What would actually be the harm in allowing the tick value to be read from any IRQ level if the read of xTickCount inside xTaskGetTickCountFromISR() can be done atomically?
Hi Martin, calling xTaskGetTickCountFromISR() from an interrupt priority that is “too high” for API calls isn’t recommended/supported (as you know). But if the FreeRTOS tick type on your platform is an atomic type, then it is probably safe and probably will remain safe.
But you won’t need to do that at all if you tweak your strategy a little bit. The high priority interrupt is important to your design so the watchdog can reset the system “no matter what”. But feeding/tickling the watchdog can be done from a lower-priority interrupt (or even a task). The tickle procedure could simply be to copy the system tick count to a shared variable. The high priority watchdog interrupt could then compare the value in this shared variable to its own private copy of the previous value it saw there.
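The shared-variable idea above could look roughly like the following. This is only a sketch: tick_t is a stand-in for TickType_t so that it builds without FreeRTOS, and the names watchdog_tickle() and watchdog_tick_advanced() are made up for illustration. It also assumes that a tick-sized read or write is a single access on the target.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for TickType_t so the sketch builds without FreeRTOS. */
typedef uint32_t tick_t;

/* Written by the low-priority "tickle" context, read by the watchdog ISR. */
static volatile tick_t shared_tick = 0u;

/* Called from the low-priority context with the current tick count,
 * e.g. the value returned by xTaskGetTickCountFromISR(). */
void watchdog_tickle(tick_t now)
{
    shared_tick = now;
}

/* Called from the high-priority watchdog interrupt.
 * Returns true if the shared value changed since the previous check. */
bool watchdog_tick_advanced(void)
{
    static tick_t previous = 0u;    /* private copy, only touched here */
    tick_t current = shared_tick;   /* one read of the shared variable */
    bool advanced = (current != previous);

    previous = current;
    return advanced;
}
```

The high-priority interrupt never calls a FreeRTOS API here; it only reads a variable that somebody else keeps fresh.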
This strategy can/should be extended to all of your tasks and/or important features to make a fairly robust software-based watchdog.
In this use case I would not even worry about atomicity. Since your high pri int only reads from the timer tick and never writes to it, I do not see a potential for data corruption, at worst potential for an incorrect read which can be dealt with.
The design of the high prio interrupt is not to reset the system but rather to prevent the activated HW watchdog from doing so. All tasks, except the Idle task, make promises about how much time can pass before they need to make a new promise. If they don't fulfill that, it is detected in the high priority ISR, which then stops feeding the HW watchdog.
The problem I have with xTaskGetTickCountFromISR is the checks that make it impossible to call it without triggering asserts, which absolutely makes sense in many use cases.
One of the features I want apart from the actual reset is to be able to do a post mortem on what has caused the watchdog reset. If I were to leave it to the tasks to update a variable from xTaskGetTickCount then I would not be able to determine whether the tasks had hung or whether it was the systick/kernel.
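A minimal sketch of such a promise scheme is below. Everything in it is an assumption made for illustration: the names, the fixed task count, and the choice to count budgets in supervisor ISR periods rather than in system ticks.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_MONITORED_TASKS 3u   /* hypothetical number of monitored tasks */

typedef struct {
    volatile bool     armed;      /* set once the task has made a promise */
    volatile uint32_t remaining;  /* supervisor periods left on the promise */
} promise_t;

static promise_t promises[NUM_MONITORED_TASKS];

/* Called by a task: "I will be back within `periods` supervisor periods". */
void task_promise(uint32_t task_id, uint32_t periods)
{
    if (task_id < NUM_MONITORED_TASKS) {
        promises[task_id].remaining = periods;
        promises[task_id].armed = true;
    }
}

/* Called once per period from the high-priority ISR.
 * Returns -1 if every armed promise is still valid, otherwise the index
 * of the first task whose promise has expired. */
int supervisor_check_promises(void)
{
    for (uint32_t i = 0u; i < NUM_MONITORED_TASKS; i++) {
        if (!promises[i].armed) {
            continue;                 /* task has not promised anything yet */
        }
        if (promises[i].remaining == 0u) {
            return (int)i;            /* promise broken: stop feeding */
        }
        promises[i].remaining--;
    }
    return -1;
}
```

The ISR would feed the HW watchdog only while this returns -1. Returning the task index rather than a plain flag keeps the information that a later post mortem would need.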
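One way to keep that distinction for the post mortem is to record a cause code just before the ISR stops feeding the watchdog. The sketch below is hypothetical throughout (names, magic value, record layout). On real hardware the record would have to live in memory that survives the reset, for example a no-init RAM section, which is not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

#define WD_RECORD_MAGIC 0x57444F47u   /* arbitrary marker for a valid record */

typedef enum {
    WD_CAUSE_NONE = 0,
    WD_CAUSE_TICK_STALLED,    /* systick/kernel stopped advancing */
    WD_CAUSE_TASK_PROMISE     /* a task failed to renew its promise */
} wd_cause_t;

typedef struct {
    uint32_t   magic;
    wd_cause_t cause;
    uint32_t   task_id;       /* only meaningful for WD_CAUSE_TASK_PROMISE */
} wd_postmortem_t;

/* Assumption: in a real system this is placed in reset-surviving RAM. */
static wd_postmortem_t wd_record;

/* Called from the high-priority ISR before it stops feeding the watchdog. */
void wd_record_failure(wd_cause_t cause, uint32_t task_id)
{
    wd_record.cause = cause;
    wd_record.task_id = task_id;
    wd_record.magic = WD_RECORD_MAGIC;   /* written last: marks record valid */
}

/* Called early after boot. Returns true and copies the record if one exists,
 * then invalidates it so it is reported only once. */
bool wd_read_postmortem(wd_postmortem_t *out)
{
    if (wd_record.magic != WD_RECORD_MAGIC) {
        return false;
    }
    *out = wd_record;
    wd_record.magic = 0u;
    return true;
}
```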
I was just about to write my own version that allowed access when I noticed that the enter/exit critical calls in xTaskGetTickCount() depend on atomicity: they are compiled out when the tick type is atomic on the port. This means that I can use xTaskGetTickCount() from my high priority interrupt, but I can’t use xTaskGetTickCountFromISR().
I do not understand the reasoning here, I am afraid. Again, atomicity to me is not a relevant factor, so I do not see why you could not use xTaskGetTickCountFromISR() to obtain the tick count.
Other than that, your architecture looks sound to me. It is a fairly common and well-proven way to combine live counters and watchdogs.
I could possibly do that, but then I would need to introduce an extra ISR to run at that level.
As weird as it looks, the call to xTaskGetTickCount() works for my purposes.
I have other ISRs at priorities higher than configMAX_SYSCALL_INTERRUPT_PRIORITY, and I also want to make sure that if they cause a hang/starvation, I can record it and do a post mortem.
What if your timer isr is starved, resulting in your system timer not incrementing? Will your supervisor isr notice that condition or simply compute the deltas between scheduled event expiration and current time stamp? If the latter, your safety guard would never kick in if the sys timer is starved.
Would be easy to fix this, of course (simply keep a local sw wd in your supervisor that is disarmed once a delta is found in the sys tick).
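Such a local software watchdog might look like the sketch below. The names and the stall limit are made up for illustration, and tick_t again stands in for TickType_t. The supervisor counts its own periods in which the system tick did not move, and any observed change resets that count.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t tick_t;          /* hypothetical stand-in for TickType_t */

#define TICK_STALL_LIMIT 3u       /* hypothetical: periods tolerated without a new tick */

/* Called once per supervisor period with the most recently observed tick.
 * Returns false once the tick has been unchanged for TICK_STALL_LIMIT
 * consecutive calls, true otherwise. */
bool supervisor_tick_alive(tick_t tick_now)
{
    static tick_t   last_seen = 0u;
    static uint32_t stalled_periods = 0u;

    if (tick_now != last_seen) {
        last_seen = tick_now;
        stalled_periods = 0u;     /* delta found: local watchdog disarmed */
        return true;
    }
    if (stalled_periods < TICK_STALL_LIMIT) {
        stalled_periods++;        /* saturating count of stalled periods */
    }
    return stalled_periods < TICK_STALL_LIMIT;
}
```

Because the count is kept in the supervisor's own periods, it keeps running even when the system tick itself is starved.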
Not sure I follow you, so I will describe what I do in more detail. I start a HW based watchdog (IWDG on an STM32G4) and use a HW timer (TIM16) to generate periodic interrupts. These are generated at the highest priority in the system. Inside this ISR, I read the system tick count and compare it with a statically stored value from the last call of the ISR.
If my Timer ISR is starved, that would mean that there is either a lockup within the Timer ISR itself (it never returned from the last call, I forgot to clear the IRQ bit, or it is spinning in a loop before evaluating the ticks on the next iteration) or some other cause that is unknown to me, as there should not be a higher prio context. In that case I will not be able to record the information and an IWDG reset will occur.
That looks sound to me. I was just speculating that your “event monitor” might be implemented such that only the deadlines of scheduled events would be monitored by means of the system tick, in which case a starved system tick would remain unnoticed.
The only remaining remark here is that in terms of safety, an MCU-internal watchdog might not be considered acceptable, but you are probably not concerned about the highest safety certification levels.
No, for this system I don't need the highest safety certifications. And while I know that external watchdogs are normally considered to be safer, to me, it does not always feel like they are. If the entire system logic is located within the same MCU, having that protected by the MCU's own watchdog seems safer to me. It is of course a different thing if the logic is more spread out.