SMP kernel spinlock deadlock (ABBA deadlock)

eroy.yang · January 7, 2025, 3:04am

I'm using the SMP version of FreeRTOS on a platform with two Cortex-A7 cores,
and I've run into a deadlock.

The spin_lock() API is implemented with load-exclusive instructions, similar to Linux.
portGET_TASK_LOCK() is implemented as a recursive spinlock.

Core0:
Calls vTaskSuspendAll(), which returns with interrupts restored and only the task lock held.
…
An interrupt occurs, and its ISR calls spin_lock(&lock_A) to acquire lock_A.

Core1:
A task acquires lock_A with spin_lock(&lock_A) and succeeds.
…
The task then calls any kernel API that invokes vTaskEnterCritical() to get the task lock.

This flow causes a deadlock (ABBA type).
Are there any suggestions for preventing this situation?
Thanks

aggarg · January 7, 2025, 5:08am

What is lock_A? Can you share a small code snippet to help us understand the problem? Which port are you using, or are you writing an SMP port?

eroy.yang · January 7, 2025, 5:39am

Hi, aggarg.

I’m writing an SMP port for a platform with two Cortex-A7 cores (AArch32 environment).

lock_A is a spinlock that protects a global resource.

Core0 executing sequence:

thread0_entry
{
    vTaskSuspendAll();    /* gets the task lock */
    …
    do something
    …
    xTaskResumeAll();
}

ISR
{
    spin_lock(&lock_A);    /* tries to get lock_A */
    …
    access global resource
    …
    spin_unlock(&lock_A);
}

After vTaskSuspendAll() returns, an interrupt occurs and its handler tries to acquire the spinlock lock_A, but lock_A is already held by thread1_entry on Core1.

Core1 executing sequence:

thread1_entry
{
    spin_lock(&lock_A);    /* gets lock_A */
    …
    eTaskGetState();       /* tries to get the task lock */
    …
    access global resource
    …
    spin_unlock(&lock_A);
}

eTaskGetState() invokes vTaskEnterCritical() to get the task lock, but the task lock is already held by thread0_entry on Core0.

So I end up in a deadlock: each core is waiting for a lock that the other core holds.
Both the Core0 and Core1 flows are common usage patterns and can't simply be prohibited.
I'd appreciate any suggestions for resolving this situation.

richard-damon · January 7, 2025, 2:10pm

As you describe it, your design has that deadlock as an essential feature. As you point out, you need to avoid having both "lock A then lock B" and "lock B then lock A" sequences in the program. It would be hard to make taking the task lock also take lock_A, because that is a system action, so you need to enforce the ordering in your own code: while holding lock_A, either you must not do anything that needs the task lock, or you must take the task lock before taking lock_A.
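One way to enforce such an ordering rule during development is a small per-core lock-order checker. The following is only a sketch of that idea, not anything provided by FreeRTOS: the rank values, `lock_tracker_t`, and the `tracker_*` helpers are all hypothetical names invented for illustration. The rule it encodes is "task lock first, then lock_A".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ranks: a lock may only be taken while every lock
 * already held on this core has a strictly lower rank. */
enum lock_rank { RANK_TASK_LOCK = 0, RANK_LOCK_A = 1 };

/* Hypothetical per-core record of which ranks are currently held. */
typedef struct {
    uint32_t held_mask;
} lock_tracker_t;

/* True if taking a lock of this rank respects the ordering rule. */
bool tracker_can_acquire(const lock_tracker_t *t, enum lock_rank rank)
{
    /* Bits for every rank at or above the requested one. */
    uint32_t at_or_above = ~((UINT32_C(1) << rank) - 1u);
    return (t->held_mask & at_or_above) == 0u;
}

/* Record an acquisition; asserts if it would violate the order. */
void tracker_acquire(lock_tracker_t *t, enum lock_rank rank)
{
    assert(tracker_can_acquire(t, rank));
    t->held_mask |= UINT32_C(1) << rank;
}

/* Record a release; asserts if the lock was not recorded as held. */
void tracker_release(lock_tracker_t *t, enum lock_rank rank)
{
    assert((t->held_mask & (UINT32_C(1) << rank)) != 0u);
    t->held_mask &= ~(UINT32_C(1) << rank);
}
```

With these ranks, the Core0 sequence (task lock, then lock_A in the ISR) passes the check, while the Core1 sequence (lock_A, then a kernel call needing the task lock) is flagged. The sketch does not model recursive acquisition of the task lock.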

The only other alternative that I know of is to let the acquisition of lock_A fail, either by timeout or by detecting the deadlock, and then handle that failure where you try to take it (which means the ISR might not be able to update the shared resource).
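A minimal sketch of the timeout idea, assuming a bounded spin count is an acceptable stand-in for a real timeout. It uses portable C11 atomics rather than the load-exclusive instructions of the actual port, and `spinlock_t`, `spin_lock_timeout()`, and `isr_update_shared()` are hypothetical names, not the poster's API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical spinlock type built on C11 atomics. */
typedef struct {
    atomic_bool held;
} spinlock_t;

void spin_lock_init(spinlock_t *lock)
{
    atomic_init(&lock->held, false);
}

/* Try to take the lock, giving up after max_spins failed attempts.
 * Returns true if the lock was acquired. */
bool spin_lock_timeout(spinlock_t *lock, uint32_t max_spins)
{
    for (uint32_t i = 0; i < max_spins; i++) {
        /* exchange returns the previous value: false means we got it */
        if (!atomic_exchange_explicit(&lock->held, true,
                                      memory_order_acquire)) {
            return true;
        }
    }
    return false;
}

void spin_unlock(spinlock_t *lock)
{
    assert(atomic_load_explicit(&lock->held, memory_order_relaxed));
    atomic_store_explicit(&lock->held, false, memory_order_release);
}

/* Hypothetical ISR body: skip the update instead of spinning forever.
 * Returns false if the update was dropped and must be retried later. */
bool isr_update_shared(spinlock_t *lock, int *shared, int value)
{
    if (!spin_lock_timeout(lock, 1000u)) {
        return false;
    }
    *shared = value;
    spin_unlock(lock);
    return true;
}
```

In the scenario above, the ISR on Core0 would give up and return, the interrupted task would eventually call xTaskResumeAll(), and Core1 could then get the task lock. The cost is that the ISR needs some way to defer or retry the dropped update.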

aggarg · January 7, 2025, 2:26pm

Can you avoid calling eTaskGetState() after taking lock_A? This would remove the dependency on task lock.
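A rough model of that restructuring: do the kernel query first, then take lock_A and use the saved result. This is a single-threaded sketch under stated assumptions. `kernel_query_state()` is a stand-in for a call such as eTaskGetState(), and `task_state_t`, `holding_lock_a`, and `thread1_step()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical task states, standing in for the kernel's own enum. */
typedef enum { STATE_RUNNING, STATE_READY, STATE_BLOCKED } task_state_t;

static atomic_bool lock_a;          /* zero-initialized: not held */
static bool holding_lock_a;         /* debug flag for this task only */
static int shared_resource;

/* Stand-in for a kernel API that needs the task lock internally. */
task_state_t kernel_query_state(void)
{
    /* Rule being enforced: never enter the kernel while holding lock_a. */
    assert(!holding_lock_a);
    return STATE_READY;
}

void thread1_step(void)
{
    /* 1. Kernel call first, while no application lock is held. */
    task_state_t state = kernel_query_state();

    /* 2. Only now take lock_a; no kernel calls until it is released. */
    while (atomic_exchange_explicit(&lock_a, true, memory_order_acquire)) {
        /* spin */
    }
    holding_lock_a = true;

    shared_resource = (int)state;   /* access the global resource */

    holding_lock_a = false;
    atomic_store_explicit(&lock_a, false, memory_order_release);
}
```

The trade-off is that the saved state may be stale by the time lock_A is held, so this only works if the code can tolerate a slightly old value.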

kstribrn · January 8, 2025, 6:29pm

Richard said it best. Your options are:

Add a timeout mechanism to your lock acquisition function (this will prevent a permanent deadlock).
Rearrange the order in which locks are acquired in each task.