Task notification stops working

fxrb · September 9, 2024, 10:07am

After a random amount of time one of my tasks does not seem to change to Ready state (according to this diagram) when notified with either xTaskNotify() or xTaskNotifyFromISR(). Here’s the simplified task code:

void worker_task(void * params) {
	uint32_t ev;

	while (true) {
		xTaskNotifyWait(ULONG_MAX, 0, &ev, portMAX_DELAY);
		if (ev & DO_JOB_1) {
			vTaskSuspend(NULL);
			...
			vTaskResume(worker_task_handle);
			...
			vTaskSuspend(NULL);
			...
			vTaskResume(worker_task_handle);
		}
		if (ev & DO_JOB_2) {
			...
		}
		// uncomment the next line and things work like a charm
//		xTaskNotifyStateClear(h_worker_task);
	}
}

And these are the notification calls I’m using at different places to trigger the task:
xTaskNotify(worker_task_handle, DO_JOB_1, eSetBits);
xTaskNotifyFromISR(worker_task_handle, DO_JOB_2, eSetBits, NULL);

Using xTaskNotifyStateClear() before calling xTaskNotifyWait() seems to ‘solve’ the problem (see comment in code above). But I don’t feel good with that kind of hack as I seem to be missing something important.

The documentation for xTaskNotifyWait() states the following:

If the receiving RTOS task was already Blocked waiting for a notification when the notification it is waiting for arrives the receiving RTOS task will be removed from the Blocked state and the notification cleared.

As the calls to either xTaskNotify() or xTaskNotifyFromISR() are non-deterministic, the above will happen from time to time. Does removing the task from the Blocked state mean putting it into the Ready state, then?
Is the use of xTaskNotifyStateClear() as shown above correct (and if so, why?), or does it just hide some other (yet unknown) bug in my code?

hs2 · September 9, 2024, 1:01pm

Where and how is the suspended task resumed? I also wonder how a suspended task can resume itself.
Why do you use task suspend/resume at all, since your task properly waits for incoming events/notifications?
As mentioned many times, task suspend/resume is special and prone to race conditions.
It might not be the right tool for the things you want to do…

fxrb · September 9, 2024, 1:22pm

The task is resumed by another subsystem (I2C in this case). The task can’t resume itself; that was a bit too much simplification (the NULLs). I will fix the code.

I just tried to simplify the code. Suspend and resume are not called directly by the worker_task but by some code executed by the worker_task. In my case this is a subsystem accessing a temperature sensor on the I2C bus. The worker_task for instance calls a function ReadPCBTemperature() which in turn makes calls to suspend and resume.

hs2 · September 9, 2024, 1:34pm

And why not also use task notifications from the I2C handler (or handler task?) to wake up this task?
You know, resuming a non-suspended task does nothing.
So if your current I2C handler resumes the task before the task has actually suspended itself, the resume is lost and the task then suspends with nothing left to wake it… race condition.
It’s simply not the right tool for task synchronization.

richard-damon · September 9, 2024, 2:12pm

Totally agree with Hartmut here. Suspend/Resume synchronization has an inherent race condition that makes it unreliable without incredible care, and when the direct-to-task notifications were added to FreeRTOS, the need to use suspend/resume basically went away except for legacy applications.

danielglasser · September 9, 2024, 9:28pm

To put what others have said a bit differently: There is no need to suspend the task if it’s already using a call to xTaskNotifyWait(). The task will block until a notification arrives (if the notification has already arrived when the call is made, the call returns immediately and no blocking takes place).

In most cases, getting removed from the Blocked state inserts the task on the runnable list in priority order.