Can the following problem occur in task scheduling? Please confirm.
After Task A obtains the semaphore, it is interrupted before it can release the semaphore.
When the interrupt handler finishes, execution switches to Task B instead of returning to Task A.
Task B also needs to obtain the semaphore.
Task A and Task B now both need the same semaphore resource, forming a deadlock.
Please confirm whether the following is correct.
Our project clears the watchdog timer (WDT) in the idle task hook.
If the deadlock described above occurs, will it trigger a WDT reset?
A dummy test showed that when Task A takes the semaphore but does not release it, and Task B then tries to take the same semaphore, a deadlock occurs.
The application tasks appear to be blocked, but the idle task hook keeps running.
As a result, the expected WDT reset did not occur.
No, this should not happen. Are you actually seeing a problem? Semaphores are usually meant for inter-task synchronization; if the same task takes and releases it, you likely want a mutex.
Technically this is not deadlock but starvation: Task B is starved because Task A never releases the semaphore. It does not affect the idle task, which keeps running whenever no other task is ready to run.
Yes. To prevent different tasks from using the same resource at the same time, we use the semaphore as a mutex.
We now have a bug, and this problem appears to be the cause.
Is there any logic in the FreeRTOS kernel that would prevent tasks from switching? Please confirm.
In this case, neither Task A nor Task B can obtain the semaphore, and both are blocked.
Can this be considered a deadlock?
How can we prevent this problem from happening, or recover once it has happened? Please confirm.
Our original idea was to recover through a WDT reset, but in this case the idle task is unaffected, so the WDT never fires.
Tasks stuck waiting for a mutex don’t use any CPU time, so they won’t block the idle task and thus won’t affect your watchdog. Always kicking the watchdog in the idle task does not verify that your program isn’t stuck; it only shows that no task is using up all the CPU time. You generally need every “important” task to periodically do something visible to the watchdog kicker, so that the kicker can confirm every task is making progress.
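A minimal sketch of that check-in idea, in plain C11 so it can run off-target. The task IDs, the mask, and the function names are hypothetical. The bitmask uses C11 atomics on the assumption that the target supports them. On a core without lock-free atomics, a short critical section (taskENTER_CRITICAL()/taskEXIT_CRITICAL()) would do the same job.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical IDs for the tasks that must prove they are alive. */
enum { TASK_ID_A, TASK_ID_B, TASK_ID_COUNT };
#define ALL_TASKS_MASK ((1u << TASK_ID_COUNT) - 1u)

/* One bit per monitored task: set by the task, cleared by the checker. */
static atomic_uint alive_flags;

/* Each monitored task calls this once per pass through its main loop. */
void task_checkin(unsigned task_id)
{
    assert(task_id < TASK_ID_COUNT);
    atomic_fetch_or(&alive_flags, 1u << task_id);
}

/* Called by the watchdog kicker. Returns true, and starts a new monitoring
   window, only if every monitored task has checked in since the last
   successful check. Partial check-ins are kept, not discarded. */
bool watchdog_all_alive(void)
{
    unsigned seen = atomic_load(&alive_flags);
    if ((seen & ALL_TASKS_MASK) != ALL_TASKS_MASK) {
        return false; /* some task is stuck: do not kick, let the WDT expire */
    }
    atomic_fetch_and(&alive_flags, ~seen);
    return true;
}
```

In a FreeRTOS build, vApplicationIdleHook() would kick the hardware watchdog only when watchdog_all_alive() returns true. The kick routine itself is a hypothetical, vendor-specific function here. Under this design, the hardware WDT timeout must be longer than the longest loop period of any monitored task. Otherwise a slow but healthy task would cause a reset.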
As to your “deadlock” question, what you describe doesn’t sound like what is normally considered a deadlock, just a bug that made some task not release a resource. What is normally called a “deadlock” is this: Task A gets resource A and then tries to get resource B, but can’t, because Task B holds it. Task B can’t finish and release resource B because it now needs resource A, and Task A won’t free resource A until it gets resource B.
My preferred solution to your problem is not to block indefinitely, especially while holding a resource, but to block with a finite timeout. If the timeout expires, “abort” the operation and give back any resources you hold.
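A minimal sketch of the bounded-wait-and-abort pattern. It uses POSIX mutexes instead of FreeRTOS so that it can run on a desktop. In FreeRTOS, the equivalent calls would be xSemaphoreTake(handle, pdMS_TO_TICKS(timeout_ms)), checking for pdTRUE, and xSemaphoreGive(handle). The function name update_shared_data() and the two-resource layout are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* pthread_mutex_timedlock() takes an absolute CLOCK_REALTIME deadline. */
static struct timespec deadline_in_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Hypothetical operation needing two resources. Every wait is bounded; if
   the second resource cannot be obtained in time, the first one is given
   back and the caller gets an error instead of blocking forever. */
bool update_shared_data(pthread_mutex_t *res_a, pthread_mutex_t *res_b,
                        long timeout_ms)
{
    struct timespec dl = deadline_in_ms(timeout_ms);
    if (pthread_mutex_timedlock(res_a, &dl) != 0) {
        return false;                /* failed or timed out: nothing held */
    }
    dl = deadline_in_ms(timeout_ms);
    if (pthread_mutex_timedlock(res_b, &dl) != 0) {
        pthread_mutex_unlock(res_a); /* abort: release what we already hold */
        return false;
    }
    /* ... work on the protected data ... */
    pthread_mutex_unlock(res_b);
    pthread_mutex_unlock(res_a);
    return true;
}
```

A false return is the point where the caller can log the failure, retry later, or escalate, for example by withholding its watchdog check-in. This is preferable to sitting silently in a portMAX_DELAY wait.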
Another rule that eliminates the problem is to define an order for your resources and never request them out of that order. If one task takes A before B, no task may take B and then A; a task that might need A must take it before taking B.
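One way to enforce that ordering rule is to give every resource a fixed rank and always take resources in ascending rank, whatever order the caller names them in. This breaks the circular wait in the classic deadlock described above. The sketch below uses POSIX mutexes so it can run on a desktop. The resource_t type, the rank values, and the helper names are hypothetical. On FreeRTOS, the two lock calls would be xSemaphoreTake() on mutex handles in the same ranked order.

```c
#include <assert.h>
#include <pthread.h>

/* Each shared resource carries a fixed rank; ranks must be unique. */
typedef struct {
    int rank;
    pthread_mutex_t lock;
} resource_t;

/* Take two resources in global rank order, regardless of argument order,
   so two tasks can never each hold one while waiting for the other. */
void lock_pair(resource_t *x, resource_t *y)
{
    assert(x != y && x->rank != y->rank);
    resource_t *first = (x->rank < y->rank) ? x : y;
    resource_t *second = (first == x) ? y : x;
    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

/* Release in reverse rank order (conventional; not needed for correctness). */
void unlock_pair(resource_t *x, resource_t *y)
{
    resource_t *first = (x->rank < y->rank) ? x : y;
    resource_t *second = (first == x) ? y : x;
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
}
```

A task that calls lock_pair(&b, &a) and another that calls lock_pair(&a, &b) both end up taking the lower-ranked resource first, so they contend on the same mutex instead of deadlocking.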
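Note that a mutex is the right tool for protecting resources. I’d suggest using one instead of a counting semaphore, which is meant for counting and signalling events. A FreeRTOS mutex (created with xSemaphoreCreateMutex()) also provides priority inheritance, which semaphores do not.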
Yes, we are facing this problem, as shown in the trace below.
1) Task A (task priority 1) acquires the resource by taking the semaphore.
2) The SysTick_Handler interrupt fires and preemption is performed.
3) As a result, prvTimerTask (task priority 3) is dispatched.
4) Execution then switches to the next highest priority task, Task B (task priority 2).
5) Task B waits to acquire the semaphore.
6) From there, in one outcome the OS keeps calling SemaphoreTake to try to obtain the resource, so Task A can never run and an exception occurs. Even prvTimerTask stops working (it is unclear why FreeRTOS does not switch tasks).
7) In the other outcome, the OS switches back to Task A, which releases the semaphore, and then switches back to Task B, so no exception occurs.
So why are there two different results, and under what circumstances does the problem in 6) occur?
※Since it involves project information, the function calls in the project are hidden.
Step 6) should only occur if Task B does a take with a zero timeout and then keeps retrying the take.
It might also occur if something has corrupted the scheduler state, so that the scheduler gets stuck while trying to block Task B. Reasoning about corruption is generally hard; if that is the problem, you need to find its source, which can also be hard. Do you have configASSERT defined to trap the system when an error is detected?
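For reference, the FreeRTOS documentation gives a configASSERT definition along the lines below for FreeRTOSConfig.h. Stack overflow checking is another setting that often catches scheduler corruption; it requires the application to supply vApplicationStackOverflowHook(). On Cortex-M ports, defining configASSERT also enables a check that any interrupt calling the FromISR() API functions has a logical priority at or below configMAX_SYSCALL_INTERRUPT_PRIORITY. Misconfigured interrupt priorities are a frequent source of exactly this kind of corruption. Because this port comes from the chip vendor, it is an assumption to verify whether it includes that check.

```c
/* FreeRTOSConfig.h (debug builds) */

/* Halt on any failed kernel assertion so a debugger shows where it fired. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* Method 2 checks the stack pointer at each context switch and also checks
   that a known fill pattern at the end of each task stack is intact. */
#define configCHECK_FOR_STACK_OVERFLOW 2
```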
This does not make sense. As @richard-damon pointed out, task B is suspended, so neither task B nor “the OS on behalf of task B” repeatedly calls xSemaphoreTake. What makes you think that this is the sequence of events (it certainly is not)?
Is Task B waiting for the semaphore with a timeout, and you are ignoring the return value?
Is this the official FreeRTOS distribution or some custom version?
I don’t think so: the timeout is set to portMAX_DELAY, so no timeout should occur.
But regarding 6), I want to confirm: under what conditions does prvTimerTask stop working?
If prvTimerTask stops working, what problems will that cause?
What do you mean by “prvTimerTask not work”? The timer task only runs when it needs to process an expired software timer. Do you have software timers in your application that are not firing?
Again, instead of describing your application code in text, can you please share the code of the tasks you consider problematic?
I understand that Task B will not run.
Someone else observed this sequence, but I have not been able to reproduce it, so I want to confirm it.
No. As mentioned above, the semaphore timeout is set to portMAX_DELAY, so a timeout is not expected.
The kernel is taken from the official website (version 10.6.1), but the port layer is provided by the chip manufacturer.
I mean that prvTimerTask no longer executes after the problem sequence occurs.
Sorry, because project information is involved, I cannot provide the source code, so I can only describe how the problem occurs through steps 1) to 7).
As I said, the timer task only runs when it has something to do, i.e. process expired timers or process commands submitted by the application. Are you using software timers in your application?
You can still provide a code snippet without providing any confidential info. Something like the following:
Yes, there are software timers in the application.
Before the problem occurred, the timers were running with a 1 ms period; after the problem occurred, they stopped.
The program logic is as follows:
Prerequisites: