About semaphore deadlock and idle task hook

MervynZong · December 18, 2024, 6:27am

Hello,

Can the following problem occur in task scheduling? Please confirm.

Task A obtains the semaphore and is then preempted by an interrupt just as it is about to release the semaphore.
After the interrupt handler finishes, execution switches to Task B instead of returning to Task A.
Task B also needs to obtain the same semaphore.
As a result, Task A and Task B both need the same semaphore and neither can proceed, forming a deadlock.

Please also confirm whether the following conclusion is correct.

The project clears the WDT in the idle task hook.
If the deadlock described above occurs, will it trigger a WDT reset?
A dummy test showed that when Task A obtains the semaphore but does not release it, and Task B then tries to obtain the same semaphore, a deadlock occurs.
At that point the application tasks appear to be blocked, but the idle task hook is not affected.
Therefore, the expected WDT reset did not occur.

BR

aggarg · December 18, 2024, 8:02am

No, this should not happen. Are you facing a problem? Semaphores are usually meant for inter-task synchronization - if the same task is releasing it, you likely want to use a mutex.

Technically it is not a deadlock but starvation - Task B is starved by Task A never releasing the semaphore. This would not impact the idle task, which will continue to run if there is no other runnable task.

MervynZong · December 18, 2024, 11:50pm

Thanks for your reply.

Yes, the semaphore is used as a mutex to prevent the same resource from being used by different tasks at the same time.
We now have a bug, and this problem appears to be its cause.
Is there any logic in the FreeRTOS kernel that can prevent task switching? Please confirm.

In this case, neither Task A nor Task B can obtain the semaphore, and both are blocked.
Can this be considered a deadlock?
How can this problem be prevented, or how can the system recover after it happens? Please confirm.
The original idea was to recover through a WDT reset, but in this case the idle task is not affected, so the WDT never fires.

BR

richard-damon · December 19, 2024, 3:17am

Tasks stuck waiting for a mutex don’t use any CPU time, so they won’t block the idle task and thus won’t affect your watchdog. Just always kicking the watchdog in the idle task does not verify that your program isn’t stuck; it only shows that no task is using up all the CPU time. You generally need every “important” task to periodically do something visible to the watchdog kicker, so that it can make sure everyone is making progress.
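One way to arrange that, as a minimal sketch: each supervised task sets its own bit in a shared mask after every successful pass through its loop, and the kicker clears the WDT only when all bits are set, then resets the mask. The names here (`wd_checkin`, `wd_should_kick`, the task bits) are hypothetical, and the sketch is plain C so the logic can be checked off-target. On a real system the read-modify-write on the mask needs protection, for example a critical section or an event group.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IDs: one bit per task that must prove it is alive. */
enum { WD_TASK_A = 1u << 0, WD_TASK_B = 1u << 1 };
#define WD_ALL_TASKS ((uint32_t)(WD_TASK_A | WD_TASK_B))

static uint32_t wd_alive_mask;  /* bits set since the last kick */

/* Called by each supervised task once per successful loop iteration. */
void wd_checkin(uint32_t task_bit)
{
    wd_alive_mask |= task_bit;
}

/* Called periodically by the kicker (for example from the idle hook).
 * Returns true only if every supervised task has checked in since the
 * last kick; in that case the mask is cleared for the next period. */
bool wd_should_kick(void)
{
    if ((wd_alive_mask & WD_ALL_TASKS) != WD_ALL_TASKS) {
        return false;   /* somebody is stuck: let the WDT expire */
    }
    wd_alive_mask = 0;
    return true;
}
```

A task that is blocked forever on a semaphore never calls `wd_checkin`, so the kicker stops clearing the WDT and the reset happens. Tasks that legitimately wait a long time would need a wait timeout shorter than the watchdog period so they can still check in.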

As to your “deadlock” question: what you are describing doesn’t sound like what is normally considered a deadlock, just some bug that made somebody not release a resource. The thing normally called “deadlock” is where task A gets resource A and then tries to get resource B, but can’t because task B holds it; task B can’t finish and release it because it now needs resource A, and task A won’t free that until it gets resource B.

My preferred solution to your problem is not to use an infinite block, especially while you are holding a resource, but to block with a definite timeout; if you reach that timeout, you “abort” your operation and give back your resources.
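A minimal sketch of that pattern follows. To keep it checkable off-target, the lock is hidden behind two hypothetical callbacks, `take` and `give`; on FreeRTOS they would wrap `xSemaphoreTake( sem, pdMS_TO_TICKS( ms ) ) == pdTRUE` and `xSemaphoreGive( sem )`. The function name `use_two_resources`, the fake lock, and the 50 ms timeout are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock interface so the logic can be tested on a host. */
typedef struct {
    bool (*take)(void *lock, unsigned timeout_ms); /* true on success */
    void (*give)(void *lock);
} lock_ops;

/* Host-side fake: a lock is a flag; taking a held lock "times out". */
static bool fake_take(void *lock, unsigned timeout_ms)
{
    bool *held = lock;
    (void)timeout_ms;
    if (*held) {
        return false;
    }
    *held = true;
    return true;
}

static void fake_give(void *lock)
{
    *(bool *)lock = false;
}

/* Take both locks with a bounded wait. If the second take times out,
 * give the first one back and report failure instead of blocking forever. */
bool use_two_resources(const lock_ops *ops, void *first, void *second,
                       void (*work)(void))
{
    const unsigned timeout_ms = 50; /* assumed value; tune per application */

    if (!ops->take(first, timeout_ms)) {
        return false;               /* nothing held, nothing to undo */
    }
    if (!ops->take(second, timeout_ms)) {
        ops->give(first);           /* abort: release what we hold */
        return false;
    }
    if (work != NULL) {
        work();                     /* the protected operation */
    }
    ops->give(second);              /* release in reverse order */
    ops->give(first);
    return true;
}
```

The caller decides what "abort" means: retry later, log the failure, or deliberately stop feeding the watchdog.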

Another rule that eliminates the problem is to define an order for your resources and never go backwards in that order when requesting them. If one task takes A before B, no task may take B and then A; a task that might need A must take it before taking B.
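The ordering rule can be checked mechanically, for example by giving each resource a fixed rank and rejecting any request whose rank is not above everything the task already holds. The sketch below is plain C with hypothetical names (`lock_order_state`, `order_check_take`, `order_check_give`); it only tracks ranks and would sit next to the real take and give calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-task record of held resources, one bit per rank (0..31).
 * Hypothetical helper: each task owns its own instance. */
typedef struct {
    uint32_t held;
} lock_order_state;

/* Returns true if taking a resource of this rank respects the global
 * order, i.e. the rank is strictly above every rank already held. */
bool order_check_take(lock_order_state *s, unsigned rank)
{
    if (rank >= 32u) {
        return false;                       /* rank out of range */
    }
    uint32_t at_or_above = ~((UINT32_C(1) << rank) - 1u);
    if ((s->held & at_or_above) != 0u) {
        return false;                       /* would go backwards */
    }
    s->held |= UINT32_C(1) << rank;
    return true;
}

/* Records that the resource of this rank has been given back. */
void order_check_give(lock_order_state *s, unsigned rank)
{
    if (rank < 32u) {
        s->held &= ~(UINT32_C(1) << rank);
    }
}
```

With A as rank 0 and B as rank 1, taking A then B passes the check, while taking B then A is rejected before the task can block.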

aggarg · December 19, 2024, 3:52pm

Assuming you mean that you are using the xSemaphoreCreateMutex API, you are good.

Are you disabling interrupts in your application, for example by calling taskENTER_CRITICAL?

@richard-damon has already pointed out the flaw in your watchdog implementation and a potential solution.

Try to remove other logic from your tasks and increment a variable or blink an LED and see if those are getting scheduled.

MervynZong · December 20, 2024, 1:25am

No, the xSemaphoreCreateMutex API is not used.
The Semaphore being used is a counting semaphore used for resource management.

The application does disable interrupts, for example by executing "CPSID I". I think disabling interrupts has little to do with task scheduling.

richard-damon · December 20, 2024, 3:42am

Disabling interrupts has a LOT to do with task scheduling, as without the tick interrupt, time doesn’t pass.

hs2 · December 20, 2024, 9:34am

Note that a mutex is the right tool to protect resources. I’d propose to use it instead of a counting semaphore, which is or should be used to count and signal events.

MervynZong · March 13, 2025, 1:24am

Yes, we are facing this problem, as shown in the trace result below.

1) Task A (task priority 1) is acquiring the resource using a semaphore.
2) The SysTick_Handler interrupt fires and preemption is performed.
3) As a result, execution is dispatched to prvTimerTask (task priority 3).
4) It then transitions to the next-highest-priority task, Task B (task priority 2).
5) Task B waits to acquire the semaphore.
6) From there, one result is that the OS keeps calling xSemaphoreTake to try to obtain the resource, so Task A cannot execute and an exception occurs; even prvTimerTask stops working (it is unclear why FreeRTOS does not switch tasks).
7) The other result is that the OS switches back to Task A, which releases the semaphore, and then switches to Task B for execution, so no exception occurs.

So I want to ask why there are two different results, and under what circumstances the problem in 6) will occur.
Note: Since project information is involved, the function calls in the project are hidden.

richard-damon · March 13, 2025, 4:14am

B should only occur if Task B does a take with zero timeout and then just keeps trying to take the semaphore.

It might also occur if you have done something to corrupt the scheduler state, so that it is trying to block TaskB but getting stuck. Reasoning about corruption is generally hard; if that is the problem, you want to find the source of it, which can be hard. Do you have configASSERT defined to trap the system on a detected error?
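For reference, a commonly used definition in FreeRTOSConfig.h looks like the following. This is a sketch of the usual pattern and not necessarily what a given vendor port ships with. It masks interrupts and spins, so a debugger can show which check failed.

```c
/* FreeRTOSConfig.h: halt with interrupts masked when a kernel check fails. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
```

Because the loop never yields, the idle task stops running as well, so a watchdog that is cleared from the idle hook would expire in this case.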

RAc · March 13, 2025, 6:54am

This does not make sense. As @richard-damon pointed out, task B is suspended, so neither task B nor “the OS on behalf of task B” repeatedly calls xSemaphoreTake. What makes you think that this is the sequence of events (it certainly is not)?

Is Task B waiting for the semaphore with a timeout, and you are ignoring the return value?

Is this the official FreeRTOS distribution or some custom version?

aggarg · March 13, 2025, 7:27am

Just to add to @richard-damon and @RAc, it would be really helpful if you can share your code snippets.

MervynZong · March 13, 2025, 7:31am

I don’t think so, because the timeout is currently set to portMAX_DELAY, so I don’t think a timeout will occur.
But as mentioned in 6), I want to confirm under what conditions prvTimerTask would stop working.
If prvTimerTask does not work, what problems will that cause?

aggarg · March 13, 2025, 7:39am

What do you mean by “prvTimerTask not work”? The timer task only runs when it needs to process an expired software timer. Do you have software timers in your application that are not firing?

Again, instead of describing your application code in text, can you please share the code of tasks that you consider problematic.

MervynZong · March 13, 2025, 7:41am

I understand that Task B will not be called.
Someone else captured this sequence, but I have not been able to reproduce it, so I want to confirm it.

No, as mentioned above, the semaphore timeout is set to portMAX_DELAY, so a timeout is not expected to occur.

The kernel is taken from the official website, version 10.6.1, but the port layer is provided by the chip manufacturer.

MervynZong · March 13, 2025, 7:51am

I mean that prvTimerTask is no longer executed after the problem sequence occurs.

Sorry, because project information is involved, the source code cannot be provided, so I can only describe how the problem occurs according to 1) to 7).

aggarg · March 13, 2025, 11:19am

As I said, the timer task only runs when it has something to do, i.e. to process expired timers or application-submitted commands. Are you using software timers in your application?

You can still provide a code snippet without providing any confidential info. Something like the following:

void Task1( void * param )
{
    for( ;; )
    {
        xSemaphoreTake( xSemA, portMAX_DELAY );
        {
            /* Do confidential stuff. */
        }
        xSemaphoreGive( xSemA );
    }
}

MervynZong · March 14, 2025, 12:08am

Yes, there are software timers in the application.
Before the problem occurred, the timers were firing with a period of 1 ms; after the problem occurred, they stopped working.

The program logic is as follows, please refer to it.
Prerequisites:

All interfaces use the same semaphore.
TaskA’s priority is 1, TaskB’s priority is 2.

/* Copy DATA1 out to the caller's buffer while holding xSemA. */
void getTest1Data(void * param)
{
	BaseType_t ret;

	ret = xSemaphoreTake( xSemA, portMAX_DELAY );
	if (ret == pdPASS)
	{
		memcpy(param, DATA1, DataLength1);
		xSemaphoreGive( xSemA );
	}
}

/* Copy DATA2 out to the caller's buffer while holding xSemA. */
void getTest2Data(void * param)
{
	BaseType_t ret;

	ret = xSemaphoreTake( xSemA, portMAX_DELAY );
	if (ret == pdPASS)
	{
		memcpy(param, DATA2, DataLength2);
		xSemaphoreGive( xSemA );
	}
}

/* Copy the caller's buffer into DATA1 while holding xSemA. */
void setTest1Data(void * param)
{
	BaseType_t ret;

	ret = xSemaphoreTake( xSemA, portMAX_DELAY );
	if (ret == pdPASS)
	{
		memcpy(DATA1, param, DataLength1);
		xSemaphoreGive( xSemA );
	}
}

/* Copy the caller's buffer into DATA2 while holding xSemA. */
void setTest2Data(void * param)
{
	BaseType_t ret;

	ret = xSemaphoreTake( xSemA, portMAX_DELAY );
	if (ret == pdPASS)
	{
		memcpy(DATA2, param, DataLength2);
		xSemaphoreGive( xSemA );
	}
}

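void TaskA(void * param)
{
	int test1data, test2data;
	BaseType_t ret;

	for( ;; )	/* a task function must never return */
	{
		/* Block until a message arrives (pkdata is declared elsewhere). */
		ret = xQueueReceive(stQueueHandle, (void *)&pkdata, portMAX_DELAY);
		if (ret == pdPASS)
		{
			getTest1Data(&test1data);
			getTest2Data(&test2data);
		}
	}
}

void TaskB(void * param)
{
	int testdata = 0;
	EventBits_t ret;

	for( ;; )	/* a task function must never return */
	{
		/* Wait at most 1 tick for any of the event bits, clearing them on exit. */
		ret = xEventGroupWaitBits(stEventHandle, 0xffff, pdTRUE, pdFALSE, 1);
		setTest1Data(&testdata);
		setTest2Data(&testdata);

		switch(ret)
		{
			case XXXX:
			  break;
			default:
			  break;
		}
	}
}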

aggarg · March 16, 2025, 4:54pm

Are you using xSemaphoreCreateMutex to create xSemA?

MervynZong · March 17, 2025, 12:14am

No, xSemA is created by xSemaphoreCreateCountingStatic(1,1,&StaticSemaphore_tBuffer).