Where should I update or clear the watchdog with FreeRTOS?

DominusDRR · October 26, 2021, 1:39pm

Hi.

This is the first time I’m using FreeRTOS for a project.

In all the projects that we commercialize, we activate the watchdog and always restart it in the main super loop of the code as follows:

int main ( void )
{
    /* Initialize all modules, including application(s). */
    SYS_Initialize ( NULL );

   
    while ( true )
    {
        /* Maintain state machines of all polled modules. */
        SYS_Tasks ( );
        WDTCONbits.WDTCLRKEY =  0x5743; // <-------- reset WDT
    }

    /* Execution should not come here during normal operation */

    return ( EXIT_FAILURE );
}

With FreeRTOS, I don’t know where the super loop, or something similar, is.

I’ve tried looking for the ‘while’ of the Scheduler, but can’t find it.

An option, not so good, would be to put it in the loop of one of the tasks:

/* Handle for the APPTCPIPSERVER_Tasks. */
TaskHandle_t xAPPTCPIPSERVER_Tasks;

void _APPTCPIPSERVER_Tasks(  void *pvParameters  )
{   
    while(1)
    {
        WDTCONbits.WDTCLRKEY =  0x5743; // <-------- reset WDT
        APPTCPIPSERVER_Tasks();
    }
}

Any comment or suggestion is welcome.

RAc · October 26, 2021, 2:03pm

Impossible to say in general. It strongly depends on your system architecture - if you can guarantee a time limit on all CPU bound computations in your system, it’s much easier than if any computation has a high variance in computational spikes.

A few rules, but none of that is cast in stone:

If you retrigger the watchdog in an ISR, you won’t catch infinite CPU-bound loops within tasks, which account for a lot of the problems that the WD is supposed to address in the first place.
Try to place the WD retrigger in a task that has a priority above all tasks that can legitimately hold the CPU for indeterminable amounts of time, such as the network task (network loads are by definition nondeterministic).
You may want to combine your WD with software watchdogs, i.e. there could be one task that will reset the unit unless retriggered by your worker tasks (provided again that your worker tasks only execute deterministic computations in the regular case). To keep that task from being starved itself, it is also a good place to disarm the MCU WD.

Those are only a few rough sketches, for good reason: we are talking about system architecture here.
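A minimal sketch of the software watchdog idea from the third rule, written so it compiles on a host. All names here (sw_wdt_retrigger, supervisor_poll, kick_hardware_watchdog, the timeout table) are hypothetical, and the hardware access is replaced by a counter; on the original poster's part the kick would be the WDTCLRKEY write shown earlier in the thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_WORKERS 2u   /* hypothetical number of supervised worker tasks */

/* Number of supervisor passes each worker may stay silent (hypothetical). */
static const uint32_t sw_wdt_timeout[NUM_WORKERS] = { 3u, 5u };

/* Remaining passes before each software watchdog expires. */
static volatile uint32_t sw_wdt_left[NUM_WORKERS];

/* Stand-in for the real hardware write, counted so it can be observed. */
static uint32_t hw_kicks;
static void kick_hardware_watchdog(void) { hw_kicks++; }

/* Called by a worker task whenever it has finished a unit of work. */
void sw_wdt_retrigger(unsigned id)
{
    if (id < NUM_WORKERS) {
        sw_wdt_left[id] = sw_wdt_timeout[id];
    }
}

/* One pass of the supervisor task. The hardware watchdog is kicked only
 * while no software watchdog has expired; returns false otherwise. */
bool supervisor_poll(void)
{
    bool all_alive = true;
    for (unsigned i = 0; i < NUM_WORKERS; i++) {
        if (sw_wdt_left[i] == 0u) {
            all_alive = false;      /* worker i missed its deadline */
        } else {
            sw_wdt_left[i]--;
        }
    }
    if (all_alive) {
        kick_hardware_watchdog();
    }
    return all_alive;
}
```

In a real system supervisor_poll() would run from a task loop at a fixed period, and the decrement would need protecting (for example with a critical section) if a worker can preempt the supervisor.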

jjulich · October 26, 2021, 5:17pm

The perfect solution is going to be dependent upon your overall architecture, but a common solution is to put the watchdog update in the idle task. This can easily be done by adding a function, vApplicationIdleHook(), to your application. The idle hook will run when all the higher priority tasks are blocked or sleeping. Sometimes the idle dog strategy is not appropriate, but many applications delay or block waiting for events often enough that the idle task is able to keep the dog fed.

For more information about the idle hook, read this page: FreeRTOS - RTOS hook (callback) functions for task stack overflows, tick interrupts, idle task, daemon task startup, and malloc failure (pvPortMalloc() returning NULL)
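As a rough illustration of the idle hook approach: FreeRTOS calls vApplicationIdleHook() from the idle task when configUSE_IDLE_HOOK is set to 1 in FreeRTOSConfig.h, and the hook must never block. The helper kick_hardware_watchdog() and its counter are hypothetical stand-ins so the sketch compiles without the device headers; on the PIC32 from the first post the body would be the WDTCLRKEY write instead.

```c
#include <stdint.h>

/* Hypothetical stand-in for the hardware write. On the original poster's
 * device this would be: WDTCONbits.WDTCLRKEY = 0x5743; */
static uint32_t wdt_kick_count;
static void kick_hardware_watchdog(void)
{
    wdt_kick_count++;
}

/* Called by the FreeRTOS idle task on each pass of its loop when
 * configUSE_IDLE_HOOK is 1. It only runs when no higher priority task
 * is ready, so a task that hogs the CPU starves the watchdog. */
void vApplicationIdleHook(void)
{
    kick_hardware_watchdog();
}
```

This keeps the kernel sources untouched, but it only works if the idle task is guaranteed to run more often than the watchdog timeout.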

rtel · October 26, 2021, 6:46pm

All the FreeRTOS demos in the main download are self-monitoring: a high priority task runs periodically to check that all the other tasks are still running (they keep counters) and not in an error state (they can stop incrementing their counters if they have an error). Stop kicking the watchdog if that self-monitoring task finds a problem.
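The counter scheme can be sketched roughly as follows. This is not the demo code itself, only an illustration of the idea; the names task_report_progress, all_tasks_progressed and NUM_MONITORED_TASKS are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_MONITORED_TASKS 3u   /* hypothetical task count */

/* Each monitored task increments only its own slot while it is healthy. */
static volatile uint32_t task_counter[NUM_MONITORED_TASKS];

/* Snapshot taken by the check task on its previous pass. */
static uint32_t last_seen[NUM_MONITORED_TASKS];

/* Called from a monitored task's loop while it runs without error. */
void task_report_progress(unsigned id)
{
    if (id < NUM_MONITORED_TASKS) {
        task_counter[id]++;
    }
}

/* Called periodically from the high priority check task.
 * Returns true only if every counter moved since the previous call. */
bool all_tasks_progressed(void)
{
    bool ok = true;
    for (unsigned i = 0; i < NUM_MONITORED_TASKS; i++) {
        uint32_t now = task_counter[i];
        if (now == last_seen[i]) {
            ok = false;   /* task i is stalled or has flagged an error */
        }
        last_seen[i] = now;
    }
    return ok;
}
```

The check task would kick the hardware watchdog only when all_tasks_progressed() returns true, so a single stalled task eventually causes a reset.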

Xavier · November 1, 2021, 7:37am

Never ever feed the dog inside a periodic task or interrupt. Ever!
Before you feed the dog be sure the system is in a valid state.
The task that feeds the dog should run in the lowest possible priority, so you might want to use the idle hook for this endeavor.

Take a look at this post; although it’s written in Spanish, the source code is mostly in English. There I address the problem you’re facing.

richard-damon · November 1, 2021, 11:55am

I wouldn’t say NEVER feed the watchdog in a periodic task. Some watchdogs are actually very fussy about how often you need to kick them, and thus a periodic timer might be needed. The key is this operation, wherever it is, needs to really check that everything that needs to be running is actually running.

When you have a watch dog, all essential services need to leave a flag somewhere as proof that they are still working properly. Either setting a ‘running’ flag that the dog clears, or a ‘last ran’ timestamp that the dog makes sure is current ‘enough’ to satisfy system requirements.

Note that with this sort of system, the actual hardware watchdog might only get tripped by a total system failure that blocks this monitoring software dog. The software dog can report a lot of system error conditions, which has the advantage that you can get a good report of what is actually wrong, as opposed to just knowing that something killed the watchdog system.
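One possible shape for the "running" flag variant, including the reporting side. The names service_mark_running, dog_collect_missing and NUM_SERVICES are hypothetical, and the sketch uses one flag per service so that each service writes only its own slot.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SERVICES 3u   /* hypothetical number of essential services */

/* One 'running' flag per service; each service writes only its own slot. */
static volatile bool service_running[NUM_SERVICES];

/* Called by an essential service each time it completes useful work. */
void service_mark_running(unsigned id)
{
    if (id < NUM_SERVICES) {
        service_running[id] = true;
    }
}

/* Called by the software dog once per check period. Clears every flag and
 * returns a bitmask of the services that did NOT report (0 means healthy),
 * so the failure can be logged before the system is reset. */
uint32_t dog_collect_missing(void)
{
    uint32_t missing = 0;
    for (unsigned i = 0; i < NUM_SERVICES; i++) {
        if (!service_running[i]) {
            missing |= (uint32_t)1u << i;
        }
        service_running[i] = false;
    }
    return missing;
}
```

The dog would kick the hardware watchdog only when the returned mask is 0; a non-zero mask says which service went quiet, which is the extra diagnostic information a bare hardware reset cannot give.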

RAc · November 1, 2021, 12:11pm

Agreed, and this doesn’t only apply to window watchdogs. I was wondering what Xavier meant by “periodic tasks.” In my understanding, >90% of all FreeRTOS tasks are periodic in that they run as infinite loops. And since WDs are required to be disarmed periodically AND ISRs are out of the question, not a whole lot would be left if “periodic tasks” are outlawed as well.

DominusDRR · November 1, 2021, 2:15pm

Hi

Thanks for all the comments.

I used the Idle task to restart the WDT.

/* The Idle task.
 * ----------------------------------------------------------
 *
 * The portTASK_FUNCTION() macro is used to allow port/compiler specific
 * language extensions.  The equivalent prototype for this function is:
 *
 * void prvIdleTask( void *pvParameters );
 *
 */
static portTASK_FUNCTION( prvIdleTask, pvParameters )
{
	/* Stop warnings. */
	( void ) pvParameters;

	/** THIS IS THE RTOS IDLE TASK - WHICH IS CREATED AUTOMATICALLY WHEN THE
	SCHEDULER IS STARTED. **/

	/* In case a task that has a secure context deletes itself, in which case
	the idle task is responsible for deleting the task's secure context, if
	any. */
	portALLOCATE_SECURE_CONTEXT( configMINIMAL_SECURE_STACK_SIZE );

	for( ;; )
	{
		
		WDTCONbits.WDTCLRKEY = 0x5743; // reset WDT
        /* See if any tasks have deleted themselves - if so then the idle task
		is responsible for freeing the deleted task's TCB and stack. */
		prvCheckTasksWaitingTermination();

		#if ( configUSE_PREEMPTION == 0 )
         ....

        }
        ....
}

RAc · November 1, 2021, 2:22pm

Hi Fabian,

Again, this is perfectly fine provided that your system’s timing guarantees that your idle task gets cycles to disarm the WD reliably and frequently enough; otherwise you’ll get false positives and thus unjustified resets.

aggarg · November 1, 2021, 6:12pm

You could use the idle task hook to avoid changing FreeRTOS code: FreeRTOS - RTOS hook (callback) functions for task stack overflows, tick interrupts, idle task, daemon task startup, and malloc failure (pvPortMalloc() returning NULL)

Thanks.

Xavier · November 4, 2021, 4:52am

“With constant period”, like when using vTaskDelay() or vTaskDelayUntil().

The Idle task is also periodic, but its period isn’t constant, and that’s a good thing for a WD. Besides, we can take advantage of its low priority, better off with windowed WDs. If some task keeps the CPU to itself for a long time, then the Idle task isn’t executed, so the WD isn’t fed.

Xavier · November 4, 2021, 4:59am

Here is an implementation that takes into account when a critical task has either missed a deadline or lost its way.

RAc · November 4, 2021, 10:32am

Sorry Xavier, all of this is complete nonsense. May I suggest re(?)-reading the previous contributions in this thread to understand that disarming a watchdog in the idle task is at most useful in sandbox applications like blinky. Apparently you also misinterpret the concept of a window watchdog.

Doing what you suggest can only work in a real life scenario if you set the WD period to a very long interval that is guaranteed not to generate false positives in any valid scenario (which in turn implies that scenarios in which the WD SHOULD fire will be delayed by this very long interval), or if you don’t care about false positives and thus unjustified asynchronous resets. In systems that need to prove themselves in the field, neither of the two is generally acceptable.

Again, the best place and time to determine where a WD should be disarmed is highly dependent on the system in question. The best strategy would be to study the system behavior using Tracealyzer or a compatible tool.

Xavier · November 4, 2021, 3:46pm

I don’t know you, but I respect you, honestly. But I guess you don’t get it. You can’t use any periodic task or interrupt (both with constant period) to feed the dog. And of course you must set a deadline to fully cover the average scenario, and that’s application dependent.

Not all tasks in the system are critical. And all critical tasks must show a healthy state before feeding the dog.

When using the idle task to feed the dog, then your overall system must assure that this task will be executed before the deadline expires. Of course that’s a good thing. If some critical task takes longer than expected (a critical real time task has missed its deadline) or gets hung, then a reset is fired. That’s what we want!

Please see this post (look for the paragraphs that begin with “Build a watchdog that monitors…” and “Any system that uses an ISR…”). When Ganssle mentions “build a watchdog task…” think about the idle task.

RAc · November 4, 2021, 5:09pm

Well, I won’t comment on your assessment, but I’ve been architecting, implementing, testing, debugging and supporting RTOS based systems for 20+ years now, and I have spent many, many hours tracing watchdog resets.

To me it looks as if you don’t understand the concept, thus there isn’t a point in continuing this debate.