How to gracefully stop multiple tasks if an error occurs in any of them?

Hi there,

I have the following multi-task setup and wonder how to gracefully stop my tasks if an error occurs in any one of them. In particular, the setup looks as follows:

  1. The user presses a button that starts multiple worker tasks
  2. Ideally, the worker tasks do their job without errors
  3. The user presses another button that stops each worker task (and waits for them to acknowledge the stop)

Unfortunately, things can go wrong on the worker tasks, especially on those that communicate via Wi-Fi. Taking Wi-Fi disconnects as an example, I want to handle those in a simple way:

If a Wi-Fi disconnect (or any other error) occurs on any worker task, I want all worker tasks to stop gracefully, indicate the problem to the user (e.g. blink an LED), and return to the state where all worker tasks are stopped and the user can start them via the start button again.

The direct approach I have in mind to solve this is to implement code in each worker task that stops the other worker tasks in case of an error. But then, each worker task would have to know the other tasks. Also, I could imagine that problems would arise if multiple worker tasks crashed at the same time.

Alternatively, I was thinking of having the worker tasks notify the “main” task that started them and letting that task stop all worker tasks. But currently, the starting task is the one that handles button events, so I cannot block and wait for error notifications there, not least because I would no longer notice the stop button press.

So maybe I have to run a separate main task that waits for the start button to start the worker tasks, and then waits for either a worker error or the stop button to stop them. But before refactoring my code, I want to ask you:

Is this the way to go? Or is there some design pattern to handle my problem? Also, are there recommended design patterns for such multi-task problems, specifically for embedded development?

Thanks a lot in advance

Personally, I rarely kill a task. My application starts up and all tasks remain active (although mostly sleeping) until the power goes off, or until a reboot.

When a WiFi task is failing, it can report that it failed and you can have it respond to a “retry” message.

You can have your main task do multiple things. Create a loop that runs forever and have it sleep on a call to xTaskNotifyWait(). When it returns, that function tells you which notification bits have been set.

You can program the GPIO of the button to trigger an ISR, which in turn can call xTaskNotifyFromISR(). That will wake up your main task so it can respond.

Other tasks can also “ring the bell” by calling xTaskNotify().

Besides task notifications, you could have a look at queues.
They can be very handy for turning a task into a kind of “server”.
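
To make the “server” idea concrete, here is a minimal sketch of the message handling only. It is plain C without any RTOS calls; in a real application each WorkerMsg would arrive through a call such as xQueueReceive(), which copies the item out of the queue. All names here (WorkerMsg, ServerState, server_handle, all_stopped) are hypothetical and not taken from FreeRTOS or FreeRTOS+TCP.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message types a worker can send to the server task. */
typedef enum { MSG_WORKER_ERROR, MSG_WORKER_STOPPED } MsgType;

typedef struct {
    MsgType type;
    uint8_t worker_id;  /* which worker sent the message (0..31) */
    int32_t code;       /* error code, meaning defined by the application */
} WorkerMsg;

typedef struct {
    uint32_t failed_mask;   /* bit n set: worker n reported an error */
    uint32_t stopped_mask;  /* bit n set: worker n acknowledged the stop */
    int32_t  first_error;   /* code of the first error seen, 0 if none */
} ServerState;

/* Handle one message. The server task would call this after each
 * successful receive from its queue. */
void server_handle(ServerState *st, const WorkerMsg *msg)
{
    uint32_t bit = 1u << (msg->worker_id & 31u);

    switch (msg->type) {
    case MSG_WORKER_ERROR:
        if (st->failed_mask == 0u) {
            st->first_error = msg->code;  /* remember the first failure */
        }
        st->failed_mask |= bit;
        break;
    case MSG_WORKER_STOPPED:
        st->stopped_mask |= bit;
        break;
    }
}

/* True once every worker in all_workers has acknowledged the stop. */
bool all_stopped(const ServerState *st, uint32_t all_workers)
{
    return (st->stopped_mask & all_workers) == all_workers;
}
```

Compared with a bare notification, a queue item can carry a payload (here the worker id and an error code), and messages from several workers are kept in order instead of being merged.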

I mention two examples taken from FreeRTOS+TCP:

An example of xTaskNotifyWait(), used in a network interface.

An example of xQueueReceive(), used in a TCP/IP server that handles messages.


Thanks a lot for the suggestions!

I should have noted that I don’t delete tasks either. By “starting” and “stopping” my tasks, I mean “put them into the running state” and “put them into the waiting state”.

So, you would also go with a central “main” task that waits for both a stop button signal and “task crashed” signals. And to differentiate between “stop” and “error”, you would use the notification value. I will try to implement that.
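
A minimal sketch of how the main task could tell “stop” from “error” by looking at the notification value. This is only the decision logic in plain C, so it can be tried on a host; the bit assignment, the state names and supervisor_step() are assumptions made for illustration. On the target, bits would be the value handed back by a call such as xTaskNotifyWait().

```c
#include <stdint.h>

/* Hypothetical meaning of the bits in the 32-bit notification value. */
#define EVT_START_BUTTON  (1u << 0)
#define EVT_STOP_BUTTON   (1u << 1)
#define EVT_WORKER_ERROR  (1u << 2)

typedef enum { SYS_STOPPED, SYS_RUNNING } SysState;

typedef enum {
    ACT_NONE,
    ACT_START_WORKERS,
    ACT_STOP_WORKERS,
    ACT_STOP_WORKERS_AND_BLINK
} Action;

/* Decide what the main task should do for one wake-up.
 * Updates *state and returns the action to carry out. */
Action supervisor_step(SysState *state, uint32_t bits)
{
    if (*state == SYS_RUNNING) {
        /* An error takes priority over the stop button,
         * so the user always gets the LED indication. */
        if (bits & EVT_WORKER_ERROR) {
            *state = SYS_STOPPED;
            return ACT_STOP_WORKERS_AND_BLINK;
        }
        if (bits & EVT_STOP_BUTTON) {
            *state = SYS_STOPPED;
            return ACT_STOP_WORKERS;
        }
        return ACT_NONE;  /* start button while running: ignore */
    }

    /* SYS_STOPPED: late stop or error bits are stale, ignore them. */
    if (bits & EVT_START_BUTTON) {
        *state = SYS_RUNNING;
        return ACT_START_WORKERS;
    }
    return ACT_NONE;
}
```

If the workers set EVT_WORKER_ERROR with xTaskNotify() and the eSetBits action, two workers failing at almost the same time simply set the same bit twice, so the main task still sees one error and stops everything once.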

I was also thinking about using an event group or a task sync, but I’m glad to go with notifications if they are sufficient.

Also, thanks for the examples.

Hi @Tobias

So, you would also go with a central “main” task

Sure, a main task that knows everything. The other tasks are only helpers, each one doing its own thing.

That is my personal preference.

You are free to assign a meaning to the bits in a task notification.

Task notification is the cheapest and most efficient way of notifying and in many cases it can replace event-groups, semaphores or queues. You can read more about it here.

FreeRTOS+TCP uses event groups: the IP-task can communicate to the tasks using event groups.

Every socket has an event group. In this way, the IP-task can notify the user task that owns a socket by triggering an event, e.g. “socket is connected”, “data has been sent”, or “data has been received”. Several bits can be “on” at the same time.

But there are many roads to Rome.

My opinion is you can’t give one task enough knowledge about what another task is doing to make it really safe to actively halt it.

The best way to handle a task getting “stuck” is to make sure it can’t happen. Never use “block forever”, but always check the return value of your operations, and handle errors.

Where possible, potentially long operations should have check-points to see if the operation should be “aborted” and that part of the system returned to a stable state.
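
As an illustration of such check-points, here is a minimal sketch in plain C. A long operation is split into chunks, and an abort flag is tested before each chunk. The names (AbortFlag, process_in_chunks) and the checksum used as stand-in work are assumptions for the example. On a real target the check could equally be a non-blocking xTaskNotifyWait() with a zero timeout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { OP_DONE, OP_ABORTED } OpResult;

/* Hypothetical abort flag: set by the main task, polled by the worker. */
typedef struct {
    volatile bool abort_requested;
} AbortFlag;

/* Sum the bytes of data in chunks of at most chunk bytes.
 * Before every chunk the abort flag is checked (the check-point).
 * *done reports how many bytes were processed, *sum the partial result,
 * so the caller knows exactly where the operation stopped. */
OpResult process_in_chunks(const uint8_t *data, size_t len, size_t chunk,
                           const AbortFlag *flag,
                           size_t *done, uint32_t *sum)
{
    *done = 0;
    *sum = 0;
    if (chunk == 0) {
        chunk = 1;  /* avoid an endless loop on a bad argument */
    }

    while (*done < len) {
        if (flag->abort_requested) {
            return OP_ABORTED;  /* leave in a known, stable state */
        }
        size_t remaining = len - *done;
        size_t n = (remaining < chunk) ? remaining : chunk;
        for (size_t i = 0; i < n; i++) {
            *sum += data[*done + i];
        }
        *done += n;
    }
    return OP_DONE;
}
```

The worker decides for itself when it is safe to stop, which fits the point above that another task does not know enough to halt it safely from the outside.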

Sometimes, if things are going very badly, the only thing that makes sense is to totally restart, especially if something that “can’t happen” has occurred, as that means your assumptions don’t hold, and it is VERY hard to figure out what to do. If you do restart, try to have a way to log what happened so it can be reported.