High prio ISR access tick count

MartinB · August 12, 2024, 12:49pm

Hi

I am implementing a watchdog and it, among other things, is checking that the kernel is still functioning as normal. And “normal” here is defined as the tick value is not the same as it was the last time that the check was performed.
As the watchdog functionality is done in the highest priority of the system, it can’t call xTaskGetTickCountFromISR() from the function. What I have done instead is to save a copy of it from the tick interrupt.

/* Also read from the high-priority watchdog ISR, hence volatile */
static volatile TickType_t last_tick = 0u;
void vApplicationTickHook(void) {
	// Overwrite last_tick with a fresh value
	last_tick = xTaskGetTickCountFromISR();
}

This works, but seems a bit overkill. What would actually be the harm in allowing reads of the tick value from any IRQ level if the read of xTickCount inside xTaskGetTickCountFromISR() can be done atomically?

Best regards
Martin

jefftenney · August 12, 2024, 5:07pm

Hi Martin, calling xTaskGetTickCountFromISR() from an interrupt priority that is “too high” for API calls isn’t recommended/supported (as you know). But if the FreeRTOS tick type on your platform is an atomic type, then it is probably safe and probably will remain safe.

But you won’t need to do that at all if you tweak your strategy a little bit. The high priority interrupt is important to your design so the watchdog can reset the system “no matter what”. But feeding/tickling the watchdog can be done from a lower-priority interrupt (or even a task). The tickle procedure could simply be to copy the system tick count to a shared variable. The high priority watchdog interrupt could compare the value in this shared variable to its own private copy of the previous value it saw in the shared variable.

This strategy can/should be extended to all of your tasks and/or important features to make a fairly robust software-based watchdog.
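
A minimal sketch of that split, assuming a 32-bit tick type whose aligned reads and writes are atomic on the target. The names heartbeat_publish() and heartbeat_advanced() are made up for illustration, and TickType_t is redefined here only so the snippet compiles on its own:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the FreeRTOS type so the sketch is self-contained. */
typedef uint32_t TickType_t;

/* Written by the lower-priority context, read by the high-priority ISR. */
static volatile TickType_t shared_tick = 0u;

/* Call from the tick hook, a low-priority ISR or a task, passing the
 * current tick count (e.g. the result of xTaskGetTickCountFromISR()). */
void heartbeat_publish(TickType_t now)
{
    shared_tick = now;
}

/* Call from the high-priority watchdog interrupt.
 * Returns true if the shared value changed since the previous call. */
bool heartbeat_advanced(void)
{
    static TickType_t previous = 0u;    /* private copy, only touched here */
    TickType_t current = shared_tick;   /* single read of the shared value */
    bool advanced = (current != previous);

    previous = current;
    return advanced;
}
```

The high-priority interrupt never calls a FreeRTOS API here; it only reads one variable, so its priority relative to configMAX_SYSCALL_INTERRUPT_PRIORITY does not matter for this check.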

RAc · August 12, 2024, 5:20pm

In this use case I would not even worry about atomicity. Since your high pri int only reads from the timer tick and never writes to it, I do not see a potential for data corruption; at worst there is potential for an incorrect read, which can be dealt with.

MartinB · August 13, 2024, 9:56am

Hi Jeff and RAc and thanks for your responses.

The purpose of the high prio interrupt is not to reset the system but rather to prevent the activated HW watchdog from doing so. All tasks, except the Idle task, make promises about how much time can pass until they need to make a new promise. If they don't fulfill that, this is detected in the high priority ISR, which then stops feeding the HW watchdog.
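
The promise scheme could look roughly like the sketch below. This is an illustration only, not the actual implementation: wd_promise(), wd_all_promises_kept() and WD_MAX_CLIENTS are hypothetical names, TickType_t is a stand-in typedef, and a single-core MCU is assumed so that the ISR sees the task's writes in program order.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TickType_t;          /* stand-in for the FreeRTOS type */

#define WD_MAX_CLIENTS 4u             /* hypothetical number of monitored tasks */

typedef struct {
    volatile TickType_t promised_at;  /* tick at which the promise was made */
    volatile TickType_t allowed;      /* ticks allowed until the next promise */
    volatile bool active;             /* false while unused or being updated */
} wd_promise_t;

static wd_promise_t wd_promises[WD_MAX_CLIENTS];

/* Called by a task: "I will check in again within `allowed` ticks". */
bool wd_promise(uint32_t id, TickType_t now, TickType_t allowed)
{
    if (id >= WD_MAX_CLIENTS) {
        return false;
    }
    wd_promises[id].active = false;   /* ISR skips a half-written entry */
    wd_promises[id].promised_at = now;
    wd_promises[id].allowed = allowed;
    wd_promises[id].active = true;
    return true;
}

/* Called from the high-priority ISR. Unsigned subtraction keeps the
 * comparison correct across tick counter wrap-around. */
bool wd_all_promises_kept(TickType_t now)
{
    for (uint32_t id = 0u; id < WD_MAX_CLIENTS; id++) {
        const wd_promise_t *p = &wd_promises[id];

        if (p->active && (TickType_t)(now - p->promised_at) > p->allowed) {
            return false;             /* this task is overdue */
        }
    }
    return true;
}
```

A task would call wd_promise() with its own id before each potentially long operation, and the ISR would stop feeding the HW watchdog as soon as wd_all_promises_kept() returns false.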

The problem I have with xTaskGetTickCountFromISR() is the checks that make it impossible to call it without triggering asserts, which absolutely makes sense in many use cases.

One of the features I want apart from the actual reset is to be able to do a post mortem on what caused the watchdog reset. If I were to leave it to the tasks to update a variable from xTaskGetTickCount() then I would not be able to determine whether the tasks had hung or whether it was the systick/kernel.

I was just about to write my own version that allowed access when I noticed that xTaskGetTickCount() has enter/exit critical checks that depend on atomicity: they are only needed when the tick type is not atomic. This means that, with an atomic tick type, I can use xTaskGetTickCount() from my high priority interrupt, but I can’t use xTaskGetTickCountFromISR().
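
To illustrate the pattern, here is a stand-alone model of it. It is not the kernel source: every MODEL_/model_ name is invented for this sketch, with MODEL_TICK_TYPE_IS_ATOMIC playing the role of the port's portTICK_TYPE_IS_ATOMIC setting and model_tick_count playing the role of xTickCount. The assumption is that, when the tick type is atomic, the guards around the read expand to nothing.

```c
#include <stdint.h>

typedef uint32_t TickType_t;              /* stand-in for the FreeRTOS type */

/* Model only: counts how often a critical section would be entered. */
static unsigned model_critical_entries = 0u;
static volatile TickType_t model_tick_count = 0u;

#define MODEL_TICK_TYPE_IS_ATOMIC 1       /* 1: tick read is a single access */

#if (MODEL_TICK_TYPE_IS_ATOMIC == 0)
    /* Non-atomic tick type: the read must be protected. */
    #define MODEL_TICK_ENTER_CRITICAL() (model_critical_entries++)
    #define MODEL_TICK_EXIT_CRITICAL()  ((void)0)
#else
    /* Atomic tick type: the guards compile to nothing. */
    #define MODEL_TICK_ENTER_CRITICAL() ((void)0)
    #define MODEL_TICK_EXIT_CRITICAL()  ((void)0)
#endif

TickType_t model_get_tick_count(void)
{
    TickType_t ticks;

    MODEL_TICK_ENTER_CRITICAL();
    ticks = model_tick_count;
    MODEL_TICK_EXIT_CRITICAL();
    return ticks;
}
```

With the atomic setting the function reduces to a plain read, which is why no assert or critical section is hit regardless of the calling interrupt's priority. Whether relying on that is acceptable is a separate question, since it depends on kernel internals staying the same.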

Regards
Martin

RAc · August 13, 2024, 10:15am

I do not understand the reasoning here, I am afraid. Again, atomicity to me is not a relevant factor, so I do not see why you could not use xTaskGetTickCountFromISR() to obtain the tick count.

Other than that, your architecture looks sound to me; it is a fairly common and well-proven way to combine live counters and watchdogs.

aggarg · August 13, 2024, 10:29am

I guess this assert is getting triggered.

@MartinB, Is it not possible to run the interrupt that calls xTaskGetTickCountFromISR() at configMAX_SYSCALL_INTERRUPT_PRIORITY?

RAc · August 13, 2024, 10:32am

Oh I see, sorry I missed that, thanks for pointing it out.

My solution in this case would probably be to add another “read as is with all risks accepted” access function for the tick count variable.

MartinB · August 13, 2024, 10:37am

Hi aggarg

I could possibly do that, but then I would need to introduce an extra ISR to run at that level.
As weird as it looks, the call to xTaskGetTickCount() works for my purposes.

regards,
Martin

aggarg · August 13, 2024, 5:45pm

Just for understanding, why can you not run the ISR that pats the watchdog at that priority?

MartinB · August 13, 2024, 8:48pm

I have other ISRs at priorities higher than configMAX_SYSCALL_INTERRUPT_PRIORITY and I also want to make sure that if they cause a hang/starvation, I can record it and do a post mortem.

Regards
Martin

RAc · August 14, 2024, 5:28am

What if your timer isr is starved, resulting in your system timer not incrementing? Will your supervisor isr notice that condition or simply compute the deltas between scheduled event expiration and current time stamp? If the latter, your safety guard would never kick in if the sys timer is starved.

Would be easy to fix this, of course (simply keep a local sw wd in your supervisor that is disarmed once a delta is found in the sys tick).
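
A minimal sketch of such a local software watchdog, assuming the supervisor passes in whatever tick value it observed on each run. sup_tick_alive() and SUP_MAX_STALLED_CHECKS are hypothetical names, and the tolerance of 3 is an arbitrary example value:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TickType_t;        /* stand-in for the FreeRTOS type */

#define SUP_MAX_STALLED_CHECKS 3u   /* hypothetical tolerance */

/* Call once per supervisor interrupt with the tick value it observed.
 * Returns false once the tick has not moved for more than
 * SUP_MAX_STALLED_CHECKS consecutive calls. */
bool sup_tick_alive(TickType_t observed)
{
    static TickType_t last_observed = 0u;
    static uint32_t stalled_checks = 0u;

    if (observed != last_observed) {
        last_observed = observed;
        stalled_checks = 0u;        /* delta found: disarm */
    } else if (stalled_checks <= SUP_MAX_STALLED_CHECKS) {
        stalled_checks++;           /* no delta: count towards the limit */
    }
    return stalled_checks <= SUP_MAX_STALLED_CHECKS;
}
```

The tolerance lets the supervisor run faster than the system tick without raising a false alarm, while a tick that never advances is still caught after a bounded number of checks.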

MartinB · August 14, 2024, 8:39am

Hi RAc

Not sure I follow you, so I will describe what I do in more detail. I start a HW based watchdog (IWDG on an STM32G4) and use a HW timer (TIM16) to generate periodic interrupts. These are generated at the highest priority in the system. Inside this ISR, I read the system tick count and compare it with a statically stored value from the last call of the ISR.

If my timer ISR is starved, that would mean either that there is a lockup somewhere within the timer ISR (it never returned from the last call, I forgot to clear the IRQ bit, or it is spinning in a loop before evaluating the ticks on the next iteration) or that there is some other reason unknown to me, as there should not be a higher prio context. In that case I will not be able to record the information and an IWDG reset will occur.
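
One pass of such a supervisor ISR, including the post-mortem record, might be sketched as follows. All pm_ names are hypothetical, the hardware accesses (clearing the TIM16 flag, refreshing the IWDG) are left out, and the timer period is assumed to be longer than one tick period so that an unchanged tick really indicates a fault:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TickType_t;   /* stand-in for the FreeRTOS type */

typedef enum {
    PM_CAUSE_NONE = 0,
    PM_CAUSE_TICK_STALLED,     /* kernel tick did not advance */
    PM_CAUSE_TASK_OVERDUE      /* a task broke its promise */
} pm_cause_t;

typedef struct {
    uint32_t magic;            /* marks the record as valid after reset */
    pm_cause_t cause;
    TickType_t tick;           /* tick value seen when the fault was found */
} pm_record_t;

#define PM_MAGIC 0x504D5744u   /* arbitrary marker value */

/* On a real target this would be placed in RAM that survives the reset
 * and cleared at boot after being read (target specific, not shown). */
static pm_record_t pm_record;

/* Returns true if the HW watchdog should be refreshed on this pass. */
bool pm_supervisor_step(TickType_t tick, bool tasks_ok)
{
    static TickType_t last_tick = 0u;
    pm_cause_t cause = PM_CAUSE_NONE;

    if (pm_record.magic == PM_MAGIC) {
        return false;          /* fault already latched: keep starving */
    }
    if (tick == last_tick) {
        cause = PM_CAUSE_TICK_STALLED;
    } else if (!tasks_ok) {
        cause = PM_CAUSE_TASK_OVERDUE;
    }
    last_tick = tick;

    if (cause != PM_CAUSE_NONE) {
        pm_record.cause = cause;
        pm_record.tick = tick;
        pm_record.magic = PM_MAGIC;   /* written last, validates the record */
        return false;
    }
    return true;
}
```

Checking the tick before the task promises is what separates a stalled systick/kernel from a hung task in the recorded cause.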

Regards,
Martin

RAc · August 14, 2024, 8:53am

That looks sound to me. I was just speculating that your “event monitor” might be implemented such that only the deadlines of scheduled events would be monitored by means of the system tick, in which case a starved system tick would remain unnoticed.

The only remaining remark here is that in terms of safety, an MCU-internal watchdog might not be considered acceptable, but you are probably not concerned about the highest safety certification levels.

MartinB · August 14, 2024, 9:45am

No, for this system I don't need the highest safety certifications. And while I know that external watchdogs are normally considered to be safer, to me it does not always feel like they are. If the entire system logic is located within the same MCU, having that protected by the MCU's own watchdog seems safer to me. It is of course a different thing if the logic is more spread out.

But we are getting quite off topic by now.

Regards
Martin