SMP kernel spinlock deadlock (ABBA deadlock)

I am using the SMP version of FreeRTOS on a platform with two Cortex-A7 cores,
and I have run into a deadlock.

The spin_lock() API is implemented with load-exclusive/store-exclusive instructions, similar to Linux.
portGET_TASK_LOCK() is implemented as a recursive spinlock.

Core0:
Calls vTaskSuspendAll(), which restores interrupts and holds only the task lock.

An interrupt then occurs, and its ISR calls spin_lock(&lock_A) to acquire lock_A.

Core1:
Acquires lock_A with spin_lock(&lock_A) in a task, which succeeds.

Then calls a kernel API that invokes vTaskEnterCritical() to get the task lock.

This sequence causes an ABBA deadlock.
Are there any suggestions for preventing this situation?
Thanks.

What is lock_A? Can you share a small code snippet to help us understand the problem? Which port are you using, or are you writing an SMP port?

Hi, aggarg.

I’m writing an SMP port for a platform with two Cortex-A7 cores (AArch32 environment).

lock_A is a spinlock that protects a global resource.

Core0 executing sequence:

thread0_entry
{
vTaskSuspendAll(); <— get task lock

do something

xTaskResumeAll();
}

ISR
{
spin_lock(&lock_A); <— try to get lock_A

access global resource

spin_unlock(&lock_A);
}

After vTaskSuspendAll(), an interrupt occurs on Core0 and its handler tries to acquire the spinlock lock_A, but lock_A is already held by thread1_entry on Core1.

Core1 executing sequence:

thread1_entry
{
spin_lock(&lock_A); <— get lock_A

eTaskGetState(); <— try to get task lock

access global resource

spin_unlock(&lock_A);
}

eTaskGetState() invokes vTaskEnterCritical() to get the task lock, but the task lock is already held by thread0_entry on Core0.

So I end up with a deadlock: each core is waiting for a lock that the other core holds.
Both the Core0 flow and the Core1 flow are common usage patterns and cannot simply be prohibited.
I would appreciate any suggestions for resolving this.

As you describe it, your design has that deadlock as an essential feature. As you point out, you need to avoid having both "lock A then lock B" and "lock B then lock A" orderings in the program. Taking the task lock is a system action, so it would be hard to make it take lock_A as well. That means you need to enforce the ordering in your own code: if you take lock_A, either you must not do anything that needs the task lock while holding it, or you must take the task lock before taking lock_A.
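One way to enforce that ordering rule during development is a debug-time lock-order checker. The following is only a minimal sketch under stated assumptions: the task lock is given rank 0 and lock_A rank 1 (matching the Core0 flow, where the ISR takes lock_A while the task lock is held), and a core may only acquire a lock whose rank is not lower than any rank it already holds. The names order_note_acquire, order_note_release, and the rank constants are hypothetical and are not part of FreeRTOS.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CORES 2u
#define NUM_RANKS 2u

/* Hypothetical ranks: locks must be taken in non-decreasing rank order. */
enum lock_rank { RANK_TASK_LOCK = 0, RANK_LOCK_A = 1 };

/* How many times each core currently holds a lock of each rank.
 * A counter (not a flag) is used because the task lock is recursive. */
static unsigned hold_depth[NUM_CORES][NUM_RANKS];

/* Record that 'core' is about to take a lock of 'rank'.
 * Returns false (and records nothing) if the core already holds a
 * higher-ranked lock, i.e. the acquisition would invert the order. */
bool order_note_acquire(unsigned core, enum lock_rank rank)
{
    assert(core < NUM_CORES && (unsigned)rank < NUM_RANKS);
    for (unsigned r = (unsigned)rank + 1u; r < NUM_RANKS; r++) {
        if (hold_depth[core][r] != 0u) {
            return false; /* ABBA candidate detected */
        }
    }
    hold_depth[core][rank]++;
    return true;
}

/* Record that 'core' has released a lock of 'rank'. */
void order_note_release(unsigned core, enum lock_rank rank)
{
    assert(core < NUM_CORES && (unsigned)rank < NUM_RANKS);
    assert(hold_depth[core][rank] > 0u);
    hold_depth[core][rank]--;
}
```

In a debug build, the port could call order_note_acquire() just before each real lock acquisition and trap when it returns false. With these ranks the Core0 sequence (task lock, then lock_A) passes, while the Core1 sequence (lock_A, then task lock) is flagged.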

The only other alternative that I know of is to make the acquisition of lock_A able to fail, either by timeout or by detecting the deadlock, and then to handle that failure where you try to take it (which means the ISR might not be able to update the shared resource).
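A failing acquisition could look roughly like the sketch below. It uses C11 atomics rather than the LDREX/STREX code of the actual port, and every name in it (spinlock_t, spin_lock_timeout, isr_update, deferred_updates, the spin limit of 1000) is an assumption made for illustration. The ISR gives up after a bounded number of attempts and records that the update was skipped instead of spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical spinlock type; the real port would use exclusive
 * load/store instructions instead of C11 atomics. */
typedef struct {
    atomic_bool held;
} spinlock_t;

/* Try to take the lock at most max_spins times.
 * Returns true on success, false if the lock stayed busy. */
bool spin_lock_timeout(spinlock_t *lock, uint32_t max_spins)
{
    for (uint32_t i = 0; i < max_spins; i++) {
        if (!atomic_exchange_explicit(&lock->held, true,
                                      memory_order_acquire)) {
            return true;
        }
    }
    return false;
}

void spin_unlock(spinlock_t *lock)
{
    /* Releasing a lock that is not held is a programming error. */
    assert(atomic_load_explicit(&lock->held, memory_order_relaxed));
    atomic_store_explicit(&lock->held, false, memory_order_release);
}

/* Zero-initialised static storage leaves the lock in the free state. */
spinlock_t lock_A;
int shared_counter;   /* stands in for the global resource */
int deferred_updates; /* updates the ISR had to skip */

/* Hypothetical ISR body: update the resource if lock_A can be taken
 * within the spin budget, otherwise note the missed update. */
void isr_update(void)
{
    if (spin_lock_timeout(&lock_A, 1000u)) {
        shared_counter++;
        spin_unlock(&lock_A);
    } else {
        deferred_updates++;
    }
}
```

The cost is the one mentioned above: when the acquisition fails, the ISR leaves the shared resource untouched, so something else has to pick up the deferred work later.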


Can you avoid calling eTaskGetState() after taking lock_A? This would remove the dependency on the task lock.

Richard said it best. Your options are:

  1. Add a timeout mechanism to your lock acquisition function (this prevents a permanent deadlock).
  2. Rearrange the order in which locks are acquired in each task.