Anyone seen this? (suspend-resume)

How does this happen?
This is FreeRTOS V10.2.1 on an STM32F4-series MCU.

The task is suspended.
Then vTaskResume is called for that task.
This happens many times, each time the system goes to standby and wakes up.

After about 10 standby-wakeup cycles, the task stops responding and:

state = eTaskGetState((TaskHandle_t)myTask);
state	eTaskState	eReady


task_handle	TaskHandle_t	0x20009198 <ucHeap+29948>	
	pxTopOfStack	volatile StackType_t *	0x20009114 <ucHeap+29816>	
	xStateListItem	ListItem_t	{...}	
		xItemValue	TickType_t	0	
		pxNext	struct xLIST_ITEM *	0x20008d34 <ucHeap+28824>	
		pxPrevious	struct xLIST_ITEM *	0x20007ccc <ucHeap+24624>	
		pvOwner	void *	0x20009198 <ucHeap+29948>	
		pvContainer	struct xLIST *	0x2000f9a0 <xSuspendedTaskList>	

Looks like the scheduler doesn’t run that task any more.

I don’t know why you need to suspend/resume a task when doing power saving.
However, do you suspend/resume from a control task to ensure it’s properly synchronized?

After boot, all tasks are created.
Then, a small set of tasks is resumed to find out which of the two modes the system is entering.
Then the tasks are suspended and the appropriate tasks are resumed.

That’s handled by a common “task template”.
And yes, there is a controller task that suspends and resumes tasks according to tables that list the tasks for each mode.

There’s a counting semaphore, and each task signals it once it has been created. When all signals have been received, the control task gets to continue. The created tasks then suspend themselves.
After that, the controller task starts suspending and resuming tasks as needed.

When a task is resumed, it enters the main loop and waits for a notification.

After resume, the controller gives a task a notification.

After the tasks are started, the notifications are sent by a timer or other tasks.
Could that affect it? (I think not; the code looks like it shouldn’t.)

I guess that there is a subtle race condition somewhere, and I think you overuse suspending/resuming tasks. This is not the right tool for task synchronization because the mechanism is asynchronous. Externally suspending a task might stop it right in the middle of a sequence which shouldn’t be interrupted (e.g. a HW access), resuming a ready/running task (silently) does nothing, etc.
Why do you think you need it when you are also using semaphores and task notifications?
I’d get rid of suspend/resume and stick to a synchronization/signaling mechanism which is designed for this purpose like task notifications.
This will lead to more robust and reliable applications.


Like I tried to describe, suspend/resume is used for enabling/disabling tasks.
The synchronization uses notifications.

Another example:

			cnt = 0;
			task_current_state = eTaskGetState((TaskHandle_t)(task->impl));
			while (task_current_state != eSuspended) {
				/* Poll again; vTaskSuspend() was called just before this snippet. */
				task_current_state = eTaskGetState((TaskHandle_t)(task->impl));
				cnt++;
				if (cnt > 10)
					break; /* breakpoint here */
			}
			cnt = 0;

cnt	uint32_t	11	
task_current_state	eTaskState	eReady	

task_handle	TaskHandle_t	0x20003050 <ucHeap+5044>	
	pxTopOfStack	volatile StackType_t *	0x20002fbc <ucHeap+4896>	
	xStateListItem	ListItem_t	{...}	
		xItemValue	TickType_t	0	
		pxNext	struct xLIST_ITEM *	0x2000f9a8 <xSuspendedTaskList+8>	
		pxPrevious	struct xLIST_ITEM *	0x20004f8c <ucHeap+13040>	
		pvOwner	void *	0x20003050 <ucHeap+5044>	
		pvContainer	struct xLIST *	0x2000f9a0 <xSuspendedTaskList>	
	xEventListItem	ListItem_t	{...}	
		xItemValue	TickType_t	1	
		pxNext	struct xLIST_ITEM *	0x0	
		pxPrevious	struct xLIST_ITEM *	0x0	
		pvOwner	void *	0x20003050 <ucHeap+5044>	
		pvContainer	struct xLIST *	0x0	
	uxPriority	UBaseType_t	6	
	pxStack	StackType_t *	0x20002048 <ucHeap+940>	
	pcTaskName	char [16]	0x20003084 <ucHeap+5096>	
	uxBasePriority	UBaseType_t	6	
	uxMutexesHeld	UBaseType_t	0	
	ulNotifiedValue	volatile uint32_t	0	
	ucNotifyState	volatile uint8_t	0 '\0'	
	ucStaticallyAllocated	uint8_t	0 '\0'

The code stopped at the breakpoint.
If the xStateListItem belongs to xSuspendedTaskList, why does eTaskGetState return eReady?

‘Disabling/enabling’ tasks with suspend/resume is almost always the wrong approach.
Better to use a real task signaling/event mechanism to control tasks (acting as state machines).
Also, the task state polling code seems broken, or unreliable at least.

This is in fact a busy loop that may not give up the CPU (depending on priorities) and may or may not succeed. The cnt guard is also pretty fragile, isn’t it? How do you know that the magic limit of 10 polls is (always) right?
IMHO this approach will cause problems now and later on…

I wonder what would be an appropriate way of enabling/disabling tasks?
That is, the disabled tasks should not react to any notifications and such.
Dynamic task creation/deletion is not an option here.
(And a bad policy in most smaller embedded systems.)

And 10 polls should be enough. I think I mentioned (even if it doesn’t show in the code) that the loop is preceded by vTaskSuspend. The loop just waits until the suspend takes effect, if it eventually does.
The code snippet is from the control task code.

But I got that state polling fixed (the debug code). I’m still wondering what puts the task into the suspended state. The control task doesn’t do that. And it does seem to be suspended, because the xEventListItem is empty and

	uxMutexesHeld	UBaseType_t	0	
	ulNotifiedValue	volatile uint32_t	0	
	ucNotifyState	volatile uint8_t	0 '\0'	

So, it’s not blocked.

Tasks usually wait for some events to react to. If you want a task to do nothing, i.e. sleep/block, just don’t signal events to it or notify it :wink:


Like I mentioned before, there are different tasks running in different modes.
Tasks inappropriate to the current mode must not run, but you can switch the mode.

The whole concept of Task A doing something to Task B so that it ignores the requests from Task C just triggers my spidey sense.

If conditions are such that Task B shouldn’t act on Task C’s request, Task C should know about it and not ask, or at least B should get the request and reply with a NO.

If you’re in a mode where Task B should step back and not be acting, it should know about that and act to achieve that goal.

Real time systems need to be cooperative, in the sense that you establish a set of ground rules that everyone needs to follow.

Trying to enforce them in that sort of manner almost always ends up creating races and deadlocks that cause your systems problems like you are seeing.

I think I found the problem. A terribly long delay between starting a task and getting the first notification. The task signals a semaphore and waits for notification. The control task waits for semaphore from the task and then gives it a notification.

The “template” is not made by me. I just have to try to live with it.

It looks like the OS has a lot to do between notification and scheduling the task.

The system has 3 sets of tasks - some of them shared with each other.
A minimal set enough to probe the environment to see which application mode is requested.
When that is clear, the first set is disabled, the right task list is fetched and the tasks on it are started. The task lists are - basically - configuration files.
The same SW is used with several product variants.

You may want to run Tracealyzer or a compatible tool to determine what causes the delay. It is very likely NOT the OS itself. FreeRTOS never spends a significant number of cycles on anything OS related; it really can’t. Linux, for example, is notorious for taking potentially several MINUTES to start up into an operational state, mostly because Linux relies on a mounted file system to do anything meaningful. FreeRTOS doesn’t rely on anything to become operational, so anything causing a significant delay after the scheduler has started can’t be blamed on FreeRTOS.

I found the problem. The framework just trusted that no scheduling takes place while it’s starting a task. Well, scheduling did take place, and the control task got to resume the task before that task got to suspend itself. So the resume was lost.

This is one of the BIG issues with suspend/resume that task notifications can get rid of. Rather than suspending yourself, you wait to be notified. If you got notified first, it still gets remembered.
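To make the difference concrete, here is a toy model in plain C. It is not FreeRTOS code: the `model_*` names are made up for illustration, and each function only mimics the behaviour described in this thread for vTaskResume, vTaskSuspend(NULL), xTaskNotifyGive and ulTaskNotifyTake. It replays the ordering from the bug (the controller acts before the task parks itself) once with suspend/resume and once with a notification.

```c
#include <stdbool.h>

/* Toy model of one task's wake-up state. This is NOT FreeRTOS code. */
typedef struct {
    bool suspended;      /* task has parked itself and awaits a resume       */
    bool notify_pending; /* a notification arrived and was not consumed yet  */
} model_task_t;

/* Mimics vTaskResume(): has no effect unless the task is suspended now. */
static void model_resume(model_task_t *t) { t->suspended = false; }

/* Mimics vTaskSuspend(NULL): the task parks itself unconditionally. */
static void model_suspend_self(model_task_t *t) { t->suspended = true; }

/* Mimics xTaskNotifyGive(): the event is latched even if nobody waits yet. */
static void model_notify(model_task_t *t) { t->notify_pending = true; }

/* Mimics ulTaskNotifyTake(): returns true if the task can carry on at once,
 * false if it would have to block. Consumes the latched event. */
static bool model_wait_notify(model_task_t *t)
{
    bool ready = t->notify_pending;
    t->notify_pending = false;
    return ready;
}

/* Controller wins the race: the resume arrives before the self-suspend. */
static bool stuck_with_suspend_resume(void)
{
    model_task_t t = { false, false };
    model_resume(&t);        /* no effect, the task is not suspended yet   */
    model_suspend_self(&t);  /* task parks; nobody is left to resume it    */
    return t.suspended;      /* true: the resume was lost                  */
}

/* Same ordering with a notification: the early event is remembered. */
static bool stuck_with_notification(void)
{
    model_task_t t = { false, false };
    model_notify(&t);               /* arrives early and is latched        */
    return !model_wait_notify(&t);  /* false: the wait returns immediately */
}
```

In the model, `stuck_with_suspend_resume()` reports a parked task while `stuck_with_notification()` does not, which is the "it still gets remembered" property.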

The thing is, the not-used tasks (the tasks left suspended) should not respond to anything, except resume from the control task.

The framework is in use in other projects as well, and you can’t just go and change things as you like. Filed a bug report, though.

As mentioned, this can be solved in many other, more robust ways. For example, those tasks could enter an idle state when signaled to do so, and just wait there until they are signaled to move back to an active state later on.
Even if a rework is too late for your current project it’s worth keeping this in mind next time.
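A minimal sketch of that idle/active idea, written as plain C so it can be tried on a PC. The event names, `worker_t` and `worker_handle` are hypothetical and not taken from the framework discussed here. In a real task, the main loop would block on whatever signaling mechanism is in use and pass each received event to a handler like this one.

```c
/* Hypothetical events a worker task might receive. */
typedef enum { EV_ACTIVATE, EV_DEACTIVATE, EV_WORK } task_event_t;

/* The two modes of the worker's state machine. */
typedef enum { MODE_IDLE, MODE_ACTIVE } task_mode_t;

typedef struct {
    task_mode_t mode;      /* current mode of the worker               */
    unsigned    work_done; /* counts work requests that were acted on  */
} worker_t;

/* Handle one event. While idle, everything except EV_ACTIVATE is ignored,
 * so a "disabled" task never reacts to stray work requests. */
static void worker_handle(worker_t *w, task_event_t ev)
{
    switch (ev) {
    case EV_ACTIVATE:
        w->mode = MODE_ACTIVE;
        break;
    case EV_DEACTIVATE:
        w->mode = MODE_IDLE;
        break;
    case EV_WORK:
        if (w->mode == MODE_ACTIVE)
            w->work_done++;   /* stands in for the real job */
        break;
    }
}
```

The task itself decides what to ignore, so it is never stopped from outside in the middle of a sequence, and the controller only has to send EV_ACTIVATE or EV_DEACTIVATE according to its mode tables.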

Yes, I’d do it differently. BTW, we are not using FreeRTOS queues for messages.
Also, the boot time is somewhat critical, in the sense that some tasks need to be ready to act upon input quite early after start.

Anyway, the problem really was a false assumption about when the “task initializations” were complete. The same thing could have happened in other ways as well: sending something to a task that wasn’t yet ready to receive it.

Not exactly - queues have state/memory (analogous to semaphores and notifications). A message sent will reach its destination, as opposed to suspend/resume, which is a fire-and-forget mechanism.

Unless you have absurdly short power-on-to-operation times, task and semaphore creation is quite quick (faster if using static memory), so it normally makes sense to do all of that before starting the scheduler. The biggest problem with stuff you create after starting the scheduler is the need to make sure you have handled all the possible races, and that often costs you MORE time than just doing it first.

To me, the only reason to put off creating something is that I don’t know some required parameter until the scheduler is started (often, if I need the thing at all).

I had a similar situation on an STM32 project for which I was using a port of Kelvin Lawson’s “AtomThreads”. I created a new synchronization object I call a “gate” (“switch” might be better). A gate has 2 states, open and closed. When a thread calls the “wait” interface requesting a state that is not the current one, it is blocked until the state changes. When the gate state changes, all blocked threads are released. This requires that all controlled threads call the gate wait function every cycle (or even mid-cycle). The controlling task controls the gates used by various groups of threads. It is not intended as a resource management method, but is great for handling mode-specific thread groups. A similar thing could be created for FreeRTOS.
In my case, it was used for power-sequencing, configuration, and operating mode changes.
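For anyone who wants to experiment with the gate idea on a desktop first, below is a rough sketch using POSIX threads. It is neither the AtomThreads implementation nor FreeRTOS code; `gate_t`, `gate_init`, `gate_set`, `gate_wait` and the demo helpers are names invented for this example. A mutex and condition variable stand in for whatever blocking primitive the RTOS offers.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical "gate": two states; threads block until the wanted one. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool            open;
} gate_t;

static void gate_init(gate_t *g, bool open)
{
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->changed, NULL);
    g->open = open;
}

/* Controller side: change the state and release every blocked thread. */
static void gate_set(gate_t *g, bool open)
{
    pthread_mutex_lock(&g->lock);
    g->open = open;
    pthread_cond_broadcast(&g->changed);
    pthread_mutex_unlock(&g->lock);
}

/* Worker side: return at once if the gate is in the wanted state,
 * otherwise block until the controller changes it. */
static void gate_wait(gate_t *g, bool wanted)
{
    pthread_mutex_lock(&g->lock);
    while (g->open != wanted)
        pthread_cond_wait(&g->changed, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Demo: one worker blocks on a closed gate until the controller opens it. */
static gate_t demo_gate;
static int demo_cycles = 0;

static void *demo_worker(void *arg)
{
    (void)arg;
    gate_wait(&demo_gate, true);  /* blocks while the gate is closed   */
    demo_cycles++;                /* runs only once the gate is open   */
    return NULL;
}

static int gate_demo(void)
{
    pthread_t worker;
    gate_init(&demo_gate, false);
    demo_cycles = 0;
    if (pthread_create(&worker, NULL, demo_worker, NULL) != 0)
        return -1;
    gate_set(&demo_gate, true);   /* controller opens the gate */
    pthread_join(worker, NULL);
    return demo_cycles;
}
```

Because the state is stored in the gate, it does not matter whether the worker reaches `gate_wait` before or after the controller calls `gate_set`; that is the property suspend/resume lacks. On older toolchains this may need to be built with `-pthread`.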