How can a lower-priority task recover from starvation?

Hi,

I was writing example code using the xTaskNotify() and xTaskNotifyWait() functions, where TaskP is the producer and TaskC is the consumer. I’m using this mechanism to send a single 32-bit value from one task to another. TaskC has a lower priority than TaskP.

I deliberately make TaskP fail so that TaskC can recover after a timeout (avoiding starvation). TaskP enters an infinite loop without doing anything useful. However, because TaskP has the higher priority, TaskC never reaches the timeout code. taskYIELD() doesn’t work in this scenario.

The known solution is for TaskP to raise TaskC’s priority before it fails (or lower its own). However, the point here is that if TaskP fails, TaskC must carry on, and we don’t know when or how TaskP is going to fail.

Is there a better way for a lower-priority task that is waiting on a higher-priority task to recover from starvation, other than playing with priorities? How can we use the timeout parameter of xTaskNotifyWait() if it only works with tasks of the same priority?

Here is my test code. Thank you!

#include <FreeRTOS.h>
#include <task.h>

// consumer task handle
TaskHandle_t consumidor_h;

// producer task:
void Productor_task( void* pvParameters )
{
    uint32_t cont = 1;

    uint8_t cont_to_fail = 10;

    TickType_t last_wake_time = xTaskGetTickCount();

    while( 1 )
    {
        vTaskDelayUntil( &last_wake_time, pdMS_TO_TICKS( 1000 ) );

        Serial.println( "P" );

        xTaskNotify( consumidor_h, cont, eSetValueWithOverwrite );

        ++cont;
        if( cont > 3 ) cont = 1;

        // MAKE THE PRODUCER TASK FAIL ON PURPOSE:
        if( --cont_to_fail == 0 ){ 

           while( 1 ) taskYIELD();
        }
    }
}

// consumer task:
void Consumidor_task( void* pvParameters )
{
    pinMode( 13, OUTPUT );

    uint32_t blinks;

    while( 1 )
    {
        if( xTaskNotifyWait( 0x00, 0x00, &blinks, pdMS_TO_TICKS( 2000 ) ) == pdPASS ){

           Serial.println( blinks );

           for( uint8_t i = 0; i < blinks; ++i ){

               digitalWrite( 13, HIGH );
               vTaskDelay( pdMS_TO_TICKS( 100 ) );
               digitalWrite( 13, LOW );
               vTaskDelay( pdMS_TO_TICKS( 100 ) );

           }
         } else{ // timeout:

            // THIS CODE IS NEVER REACHED

            digitalWrite( 13, HIGH );

            while( 1 ){
               Serial.println( "Error" );
               vTaskDelay( pdMS_TO_TICKS( 1000 ) );
            }
         }
    }
}

void setup()
{
    Serial.begin( 115200 );

    xTaskCreate( Productor_task, "PROD",  128, NULL, tskIDLE_PRIORITY + 1, NULL );

    xTaskCreate( Consumidor_task, "CONS", 128, NULL, tskIDLE_PRIORITY, &consumidor_h );

    vTaskStartScheduler();
}

void loop() 
{
}

Basically, you have to prevent a high(est) priority task from failing in a way that livelocks your system. That is a fatal error. Often a watchdog is used to restart the system if something like this happens.
Also, the whole application might become useless if a task stops doing its job…

taskYIELD() just runs the scheduler, and the scheduler does not choose a lower priority task when a higher priority task is in the ready state.

In general, to prevent starvation you have 2 options:

  1. Design your system so that each task has the same priority.
  2. Make sure that every task (other than the ones with the lowest priority) blocks.

The first one is quite limiting in terms of system design, so going for the second one is more logical.

So, taskYIELD() is only useful for switching between equal priority tasks. It does not block the calling task. The calling task remains in ready state, and prevents the lower priority tasks from running.
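
To make that rule concrete, here is a toy model in plain C of a fixed-priority "pick the next task" decision. It is only a sketch and not FreeRTOS source code: model_task_t, pick_next(), model_yield() and model_block() are hypothetical names invented for this illustration, and the model ignores round-robin sharing between tasks of equal priority.

```c
#include <stddef.h>

/* Toy model only: these types and functions are hypothetical and are
   not part of FreeRTOS. */
typedef enum { MODEL_READY, MODEL_BLOCKED } model_state_t;

typedef struct {
    const char   *name;
    unsigned      priority;   /* larger number = more important */
    model_state_t state;
} model_task_t;

/* Return the index of the highest-priority READY task, or -1 if no
   task is ready. Blocked tasks are never considered. */
int pick_next(const model_task_t *tasks, size_t count)
{
    int best = -1;
    for (size_t i = 0; i < count; ++i) {
        if (tasks[i].state != MODEL_READY) {
            continue;
        }
        if (best < 0 || tasks[i].priority > tasks[(size_t)best].priority) {
            best = (int)i;
        }
    }
    return best;
}

/* Yielding leaves the caller in the ready state. */
void model_yield(model_task_t *t) { t->state = MODEL_READY; }

/* Blocking (or suspending) removes the caller from the ready set. */
void model_block(model_task_t *t) { t->state = MODEL_BLOCKED; }
```

With a producer at priority 1 and a consumer at priority 0, calling model_yield() on the producer still makes pick_next() return the producer. Only after model_block() on the producer does the consumer get selected.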

In your case however, the solution is simple. Instead of

while( 1 ) taskYIELD();

just use

vTaskSuspend(NULL);

This places the calling task into the suspended state, so the lower priority ones can run. You need INCLUDE_vTaskSuspend defined as 1 in your FreeRTOSConfig.h file.

Basic fact: a lower priority task CAN’T force itself to get time when a higher priority task is using all the time (except by changing priorities), as that is fundamental to how task priorities are defined.

This is one application for defining a ‘watchdog’, as either a very high priority task or an interrupt that monitors task operations and detects system stalls.
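
One possible shape for the monitoring logic is a heartbeat counter per task, sketched below in plain C. Everything here is an assumption made for illustration: MONITORED_TASKS, task_check_in() and supervisor_poll() are hypothetical names, and the sketch leaves out the RTOS glue. On a real target the supervisor would have to run from a context that cannot be starved (a highest-priority task that blocks between polls, or a timer interrupt), and access to the counters may need protection where 32-bit reads and writes are not atomic.

```c
#include <stddef.h>
#include <stdint.h>

#define MONITORED_TASKS 2u   /* hypothetical: 0 = producer, 1 = consumer */

/* Each monitored task increments its own slot once per loop iteration. */
static volatile uint32_t heartbeat[MONITORED_TASKS];

/* Counter values seen by the supervisor at its previous poll. */
static uint32_t last_seen[MONITORED_TASKS];

/* Called by a monitored task to report that it is still making progress. */
void task_check_in(size_t id)
{
    if (id < MONITORED_TASKS) {
        heartbeat[id]++;
    }
}

/* Called periodically by the supervisor. Returns a bitmask in which
   bit i is set when task i has not checked in since the previous poll. */
uint32_t supervisor_poll(void)
{
    uint32_t stalled = 0;
    for (size_t i = 0; i < MONITORED_TASKS; ++i) {
        uint32_t now = heartbeat[i];
        if (now == last_seen[i]) {
            stalled |= (uint32_t)1 << i;
        }
        last_seen[i] = now;
    }
    return stalled;
}
```

The poll period has to be longer than the slowest expected check-in interval of any monitored task, otherwise a healthy task is reported as stalled. What to do with a non-zero result (log it, reset the system, stop kicking a hardware watchdog) is a design decision left open here.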

Ultimately the best recovery action is to let the programmer know that some task is taking more time than it should so they can fix the bug that is causing that (or re-evaluate the whole system design).

I simulated the failure with an infinite loop; however, I was thinking more of cosmic rays or power glitches that make the higher priority task misbehave (so of course we can’t call either taskYIELD() or vTaskSuspend()).

Setting the same priority for all the tasks involved solves the problem, but it limits the overall design.

I think there is a better solution for a more robust system, such as resetting the whole system through a hardware watchdog.

Anyway, you have all shown me the light. Thank you!

The sort of upsets you are describing basically mean you can’t trust ANYTHING the computer is doing, and the only real protection is a very simple and robust hardware watchdog that totally resets the system. With such a hardware watchdog, no software loop can block the watchdog; in fact, a stuck loop makes it more likely that the dog will be tripped. Safety-critical systems will even separate that hardware (to the point of powering it from an alternate source) to better isolate it from such problems.

Software level watchdogs are meant to handle software bugs, perhaps triggered by much more mundane hardware errors that were not covered by the test plan.

Any such protection method ALWAYS needs to be built with a solid idea of what you are protecting against. For most hardware, adding code to try to catch misbehavior from cosmic rays is apt to DECREASE the reliability of the system, because the chance of introducing program errors is higher than the odds of such an upset actually occurring.