I’m not sure if this is severe enough to lodge a proper issue on GitHub, or if I should instead file a bug with the pico-sdk folk (I don’t know who’s in charge of the rp2040 port in FreeRTOS). I’m working around it by switching to heap_4 and keeping the application heap separate from the FreeRTOS heap.
In my application I launch an “init” task that launches a bunch of other tasks and then deletes itself. One of those other tasks makes a bunch of malloc calls to allocate application memory at the beginning of the task, before its main loop (yes, I know I can just allocate this before starting FreeRTOS and pass pointers around). Apparently, in my application these malloc calls happen right around the time the IDLE task on the other core is reaping the now-defunct “init” task.
In the pico-sdk, malloc and free are “wrapped”, and have a mutex associated with them. The root of the problem is a weird interplay between the FreeRTOS task lock, multiple cores, and rp2040 hardware spinlocks. When the IDLE0 task (on core 0) is freeing “init”, the heap_3 implementation grabs the task lock via vTaskSuspendAll, then eventually calls into free and into mutex_exit, where it tries to grab the core synchronization spinlock. However, my other task (on core 1) wins the race and grabs the core synchronization spinlock first (in mutex_enter_blocking in the pico-sdk), but then tries to grab the task lock (which IDLE0 claimed with vTaskSuspendAll in heap_3) and hangs. Because my task holds the core synchronization spinlock, the IDLE0 task in turn hangs waiting for that spinlock. Each core is holding the lock the other one needs, so neither can make progress.
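To make the ordering problem concrete, here is a minimal single-threaded model of the interleaving described above. It is only a sketch: `model_lock_t`, `model_try_acquire`, and both `model_replay_*` functions are hypothetical names made up for illustration, not pico-sdk or FreeRTOS API, and the "locks" are plain owner fields rather than hardware spinlocks. A failed try-acquire stands in for a core that would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model, not pico-sdk or FreeRTOS code. */
#define LOCK_FREE (-1)

typedef struct {
    int owner; /* core number holding the lock, or LOCK_FREE */
} model_lock_t;

/* Non-blocking stand-in for a spinlock acquire: succeeds only if the lock is free. */
static bool model_try_acquire(model_lock_t *lock, int core)
{
    assert(core >= 0);
    if (lock->owner != LOCK_FREE) {
        return false; /* on real hardware the core would spin here */
    }
    lock->owner = core;
    return true;
}

/* Replays the order from the two backtraces; returns true if both cores are stuck. */
bool model_replay_deadlock(void)
{
    model_lock_t task_lock = { LOCK_FREE };      /* taken by vTaskSuspendAll */
    model_lock_t mutex_spinlock = { LOCK_FREE }; /* spinlock guarding malloc_mutex */

    /* Core 0 (IDLE0): heap_3 suspends the scheduler, taking the task lock. */
    bool core0_has_task_lock = model_try_acquire(&task_lock, 0);
    /* Core 1 (application task): mutex_enter_blocking takes the spinlock. */
    bool core1_has_spinlock = model_try_acquire(&mutex_spinlock, 1);
    /* Core 0: mutex_exit now wants the spinlock that core 1 holds. */
    bool core0_gets_spinlock = model_try_acquire(&mutex_spinlock, 0);
    /* Core 1: the wait path now wants the task lock that core 0 holds. */
    bool core1_gets_task_lock = model_try_acquire(&task_lock, 1);

    return core0_has_task_lock && core1_has_spinlock
        && !core0_gets_spinlock && !core1_gets_task_lock;
}

/* Same two locks, but both cores take the task lock first.
 * Returns true if core 0 can finish while core 1 waits holding nothing. */
bool model_replay_same_order(void)
{
    model_lock_t task_lock = { LOCK_FREE };
    model_lock_t mutex_spinlock = { LOCK_FREE };

    bool core0_has_task_lock = model_try_acquire(&task_lock, 0);
    bool core1_has_task_lock = model_try_acquire(&task_lock, 1); /* fails, waits */
    bool core0_gets_spinlock = model_try_acquire(&mutex_spinlock, 0);

    return core0_has_task_lock && !core1_has_task_lock && core0_gets_spinlock;
}
```

The second function shows the usual generic remedy for this pattern, a single agreed acquisition order. It is not a claim about how either the pico-sdk or the FreeRTOS port should actually be changed.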
core 0 backtrace:
0x200002f2 in spin_lock_unsafe_blocking (lock=0xd0000140) at /home/gabriel/Projects/pico-sdk/src/rp2_common/hardware_sync/include/hardware/sync.h:265
265 while (__builtin_expect(!*lock, 0)) {
(gdb) bt
#0 0x200002f2 in spin_lock_unsafe_blocking (lock=0xd0000140) at /home/gabriel/Projects/pico-sdk/src/rp2_common/hardware_sync/include/hardware/sync.h:265
#1 spin_lock_blocking (lock=0xd0000140) at /home/gabriel/Projects/pico-sdk/src/rp2_common/hardware_sync/include/hardware/sync.h:291
#2 mutex_exit (mtx=mtx@entry=0x20002d60 <malloc_mutex>) at /home/gabriel/Projects/pico-sdk/src/common/pico_sync/mutex.c:179
#3 0x10019458 in __wrap_free (mem=0x20013b68) at /home/gabriel/Projects/pico-sdk/src/rp2_common/pico_malloc/pico_malloc.c:92
#4 0x10019714 in vPortFree (pv=0x20013b68) at /home/gabriel/Projects/FreeRTOS-Kernel/portable/MemMang/heap_3.c:89
#5 0x1001b044 in prvDeleteTCB (pxTCB=pxTCB@entry=0x20015b70) at /home/gabriel/Projects/FreeRTOS-Kernel/tasks.c:6502
#6 0x1001c1c8 in prvCheckTasksWaitingTermination () at /home/gabriel/Projects/FreeRTOS-Kernel/tasks.c:6175
#7 0x1001c1e2 in prvIdleTask (pvParameters=<optimized out>) at /home/gabriel/Projects/FreeRTOS-Kernel/tasks.c:5839
#8 0x10019828 in xPortPendSVHandler () at /home/gabriel/Projects/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:640
core 1 backtrace:
vPortRecursiveLock (ulLockNum=ulLockNum@entry=1, pxSpinLock=pxSpinLock@entry=0xd000013c, uxAcquire=uxAcquire@entry=1) at /home/gabriel/Projects/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/include/portmacro.h:232
232 while( __builtin_expect( !*pxSpinLock, 0 ) )
(gdb) bt
#0 vPortRecursiveLock (ulLockNum=ulLockNum@entry=1, pxSpinLock=pxSpinLock@entry=0xd000013c, uxAcquire=uxAcquire@entry=1) at /home/gabriel/Projects/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/include/portmacro.h:232
#1 0x1001c06a in vTaskEnterCritical () at /home/gabriel/Projects/FreeRTOS-Kernel/tasks.c:7035
#2 0x1001cd5c in xTaskGetSchedulerState () at /home/gabriel/Projects/FreeRTOS-Kernel/tasks.c:6622
#3 0x10019e46 in xEventGroupWaitBits (xEventGroup=0x2001342c <xStaticEventGroup>, uxBitsToWaitFor=uxBitsToWaitFor@entry=65792, xClearOnExit=xClearOnExit@entry=1, xWaitForAllBits=xWaitForAllBits@entry=0,
xTicksToWait=xTicksToWait@entry=4294967295) at /home/gabriel/Projects/FreeRTOS-Kernel/event_groups.c:326
#4 0x10019b9c in vPortLockInternalSpinUnlockWithWait (pxLock=pxLock@entry=0x20002d60 <malloc_mutex>, ulSave=<optimized out>) at /home/gabriel/Projects/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:1008
#5 0x20000244 in mutex_enter_blocking (mtx=mtx@entry=0x20002d60 <malloc_mutex>) at /home/gabriel/Projects/pico-sdk/src/common/pico_sync/mutex.c:44
#6 0x10019420 in __wrap_malloc (size=2508) at /home/gabriel/Projects/pico-sdk/src/rp2_common/pico_malloc/pico_malloc.c:37
#7 0x1001948e in operator new (n=n@entry=2508) at /home/gabriel/Projects/pico-sdk/src/rp2_common/pico_standard_link/new_delete.cpp:15
#8 0x10000838 in std::make_unique<random_controller> () at /usr/lib/gcc/arm-none-eabi/13/include/g++-v13/bits/unique_ptr.h:1070
#9 0x100008ca in hid_task () at /home/gabriel/Projects/snes_controllers_to_usb/firmware/src/main.cpp:62
#10 0x10019828 in xPortPendSVHandler () at /home/gabriel/Projects/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:640
I need to sleep on this to see if I can think of a fix, and at what level the fix makes the most sense (pico-sdk or FreeRTOS). The answer here might just be that we don’t need two levels of mutexes: the pico-sdk may not need to provide its own mutex around malloc/free when using heap_3, so long as the rp2040 port of FreeRTOS provides a good implementation of portGET_TASK_LOCK (which is used in vTaskSuspendAll), which I think it does.