New Lab-Project-FreeRTOS-FAT Concurrency Problems

I have a project (carlk3/FreeRTOS-FAT-CLI-for-RPi-Pico ) running on the official SMP FreeRTOS RP2040 port (FreeRTOS/FreeRTOS-Kernel) on Raspberry Pi Pico.
I have been making some changes lately, and decided to do some git submodule updates. Now I am having problems running the old vMultiTaskStdioWithCWDTest from FreeRTOS-Plus/FreeRTOS-Plus/Demo/Common/FreeRTOS_Plus_FAT_Demos/test/ff_stdio_tests_with_cwd.c. It runs OK with #define fsTASKS_TO_CREATE 2, but with #define fsTASKS_TO_CREATE 3 it fails quickly with

assertion "( xEventGroupGetBits( pxIOManager->xEventGroup ) & FF_FAT_LOCK_EVENT_BITS ) == 0" failed: file "/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c", line 297, function: FF_UnlockFAT

at:

rp2040.core0:
vPortRecursiveLock@0x10006892 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192)
vTaskEnterCritical@0x10006892 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/tasks.c:5278)
ulTaskGenericNotifyTake@0x10007810 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/tasks.c:5665)
spi_transfer@0x1001010a (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/spi.c:101)
sd_spi_transfer@0x1000fb68 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_spi.c:85)
sd_spi_write@0x1000fba6 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_spi.c:96)
sd_wait_ready@0x100104c4 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:343)
sd_write_block@0x10010c1a (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:889)
in_sd_write_blocks@0x10010cfa (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:936)
sd_write_blocks@0x10010f60 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:980)
prvWrite@0x1000fccc (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/ff_sddisk.c:63)
FF_BlockWrite@0x1000d35e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_ioman.c:651)
FF_ClearCluster@0x1000a334 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:605)
FF_MkDir@0x10009ee2 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:3345)
ff_mkdir@0x1000ee86 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_stdio.c:700)
prvFileSystemAccessTask@0x10003f2a (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/example/tests/ff_stdio_tests_with_cwd.c:1249)
xPortPendSVHandler@0x10007f78 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:401)

rp2040.core1:
my_assert_func@0x100111d6 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/src/my_debug.c:154)
FF_UnlockFAT@0x1000e5b2 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c:299)
FF_GetChainLength@0x1000a93c (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:1310)
FF_InitEntryFetch@0x10008a32 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:1080)
FF_FindEntryInDir@0x10008fec (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:392)
FF_MkDir@0x10009e76 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:3296)
ff_mkdir@0x1000ee86 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_stdio.c:700)
prvFileSystemAccessTask@0x10003f2a (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/example/tests/ff_stdio_tests_with_cwd.c:1249)
xPortPendSVHandler@0x10007f78 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:401)

If I cd Lab-Project-FreeRTOS-FAT; git checkout d725a4c14a and rebuild, the test runs fine. So, it seems that something changed between

and

that broke my code.

Where do I start looking?

By brute force, I’ve narrowed it down to somewhere in here:

Maybe a change that I suggested? D’oh!

Hi Carl, thanks for reporting this. I was still hoping that you’d come with the solution :slight_smile:

I will look at the differences and see if there was a suspect change.

Thanks

I thought maybe there was a conflict on use of notification values, so I switched my use of Direct To Task Notifications to notification value index 1:

    /* Send a notification directly to the task to which interrupt processing is
     being deferred. */
    vTaskNotifyGiveIndexedFromISR(
        pSPI->owner,  // The handle of the task to which the
                      // notification is being sent.
        1,  // uxIndexToNotify: The index within the target task's array of
            // notification values to which the notification is to be sent.
        &xHigherPriorityTaskWoken);

    /* Wait until master completes transfer or time out has occured. */
    rc = ulTaskNotifyTakeIndexed(
        1, pdFALSE, pdMS_TO_TICKS(timeOut));  // Wait for notification from ISR

but the test still quickly fails:

core1: stdio Task: vMultiTaskStdioWithCWDTest(pcMountPath=/sd0, usStackSizeWords=768)
core1: FS0: prvFileSystemAccessTask()
core0: FS1: prvFileSystemAccessTask()
core0: FS2: prvFileSystemAccessTask()
core0: FS3: prvFileSystemAccessTask()
core0: FS4: prvFileSystemAccessTask()
core0: FS0: Task 20004E08, FS0, /sd0/0
core0: FS1: Task 20005EC8, FS1, /sd0/1
core0: FS2: Task 20006F88, FS2, /sd0/2
core1: FS3: Task 20008048, FS3, /sd0/3
core0: FS4: Task 20009108, FS4, /sd0/4
core0: FS3: FS3: assertion "( xEventGroupGetBits( pxIOManager->xEventGroup ) & FF_FAT_LOCK_EVENT_BITS ) == 0" failed: file "/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c", line 297, function: FF_UnlockFAT

core0:

vPortRecursiveLock@0x10006882 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192)
vTaskEnterCritical@0x10006882 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/tasks.c:5278)
ulTaskGenericNotifyTake@0x100077fc (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/tasks.c:5665)
spi_transfer@0x100100e6 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/spi.c:80)
sd_spi_write@0x1000fbd6 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_spi.c:98)
sd_write_block@0x10010c62 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:881)
in_sd_write_blocks@0x10010d62 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:935)
sd_write_blocks@0x10010fc8 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/sd_card.c:979)
prvWrite@0x1000fcfc (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/portable/RP2040/ff_sddisk.c:63)
FF_BlockWrite@0x1000d34e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_ioman.c:651)
FF_ClearCluster@0x1000a324 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:605)
FF_MkDir@0x10009ed2 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:3345)
ff_mkdir@0x1000ee76 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_stdio.c:700)
prvFileSystemAccessTask@0x10003f1e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/example/tests/ff_stdio_tests_with_cwd.c:1249)
xPortPendSVHandler@0x10007f68 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:401)

core1:

my_assert_func@0x1001123e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/src/my_debug.c:154)
FF_UnlockFAT@0x1000e5a2 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c:299)
FF_GetChainLength@0x1000a92c (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:1310)
FF_InitEntryFetch@0x10008a22 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:1080)
FF_FindEntryInDir@0x10008fdc (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:392)
FF_MkDir@0x10009e66 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:3296)
ff_mkdir@0x1000ee76 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_stdio.c:700)
prvFileSystemAccessTask@0x10003f1e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/example/tests/ff_stdio_tests_with_cwd.c:1249)
xPortPendSVHandler@0x10007f68 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:401)

core0 is executing this line of code:

    /* Ensure this task does not already have a notification pending by calling
     ulTaskNotifyTake() with the xClearCountOnExit parameter set to pdTRUE, and
     a block time of 0 (don't block). */
    BaseType_t rc = ulTaskNotifyTake(pdTRUE, 0);

Hmm, I should have switched that to indexed, too.

Sill fails. This time, core0 is at this line:

    rc = ulTaskNotifyTakeIndexed(
        1, pdFALSE, pdMS_TO_TICKS(timeOut));  // Wait for notification from ISR

and inside that it’s in a spin lock:

    static inline void vPortRecursiveLock(uint32_t ulLockNum, spin_lock_t *pxSpinLock, BaseType_t uxAcquire) {
        static uint8_t ucOwnedByCore[ portMAX_CORE_COUNT ];
        static uint8_t ucRecursionCountByLock[ portRTOS_SPINLOCK_COUNT ];
        configASSERT(ulLockNum >= 0 && ulLockNum < portRTOS_SPINLOCK_COUNT );
        uint32_t ulCoreNum = get_core_num();
        uint32_t ulLockBit = 1u << ulLockNum;
        configASSERT(ulLockBit < 256u );
        if( uxAcquire )
        {
            if( __builtin_expect( !*pxSpinLock, 0 ) )
            {
                if( ucOwnedByCore[ulCoreNum] & ulLockBit )
                {
                    configASSERT(ucRecursionCountByLock[ulLockNum] != 255u );
                    ucRecursionCountByLock[ulLockNum]++;
                    return;
                }
                while ( __builtin_expect( !*pxSpinLock, 0 ) );

in the while loop, and core1 is in here:

void FF_UnlockFAT( FF_IOManager_t * pxIOManager )
{
    if( xTaskGetSchedulerState() != taskSCHEDULER_RUNNING )
    {
        /* Scheduler not yet active. */
        return;
    }

    configASSERT( ( xEventGroupGetBits( pxIOManager->xEventGroup ) & FF_FAT_LOCK_EVENT_BITS ) == 0 );
    pxIOManager->pvFATLockHandle = NULL;
    xEventGroupSetBits( pxIOManager->xEventGroup, FF_FAT_LOCK_EVENT_BITS );
}

If I comment out the configASSERT at line 297 in ff_locking.c, then it fails on this one:

assertion "( pxIOManager->pvFATLockHandle != NULL ) && ( pxIOManager->pvFATLockHandle == handle )" failed: file "/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c", line 247, function: FF_Assert_Lock

at

my_assert_func@0x10011216 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS+FAT+CLI/src/my_debug.c:154)
FF_Assert_Lock@0x1000e502 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c:247)
FF_putFATEntry@0x1000a4b6 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:896)
FF_FindFreeCluster@0x1000a836 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:1221)
FF_CreateClusterChain@0x1000a8aa (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_fat.c:1259)
FF_MkDir@0x10009ea6 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_dir.c:3323)
ff_mkdir@0x1000ee4e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_stdio.c:700)
prvFileSystemAccessTask@0x10003f1e (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/example/tests/ff_stdio_tests_with_cwd.c:1249)
xPortPendSVHandler@0x10007f68 (/home/carlk/pi/pico/FreeRTOS+FAT+CLI/FreeRTOS-Kernel/portable/ThirdParty/GCC/RP2040/port.c:401)

EDIT:
Further up the stack of the assert, this bit of code in FF_FindFreeCluster is interesting:

    BaseType_t xTakeLock = FF_Has_Lock( pxIOManager, FF_FAT_LOCK ) == pdFALSE;

    if( xTakeLock )
    {
        FF_LockFAT( pxIOManager );
    }

In this case, xTakeLock is 0. So, apparently the pxIOManager has the lock at that point in FF_FindFreeCluster, but has lost it by the time FF_FindFreeCluster calls FF_putFATEntry.

EDIT 2:
If I replace

        xBits = xEventGroupWaitBits( pxIOManager->xEventGroup,
                                     FF_FAT_LOCK_EVENT_BITS, /* uxBitsToWaitFor */
                                     pdTRUE,                 /* xClearOnExit */
                                     pdFALSE,                /* xWaitForAllBits n.a. */
                                     FF_TIME_TO_WAIT_FOR_EVENT_TICKS );

with

        xBits = xEventGroupWaitBits( pxIOManager->xEventGroup,
                                     FF_FAT_LOCK_EVENT_BITS, /* uxBitsToWaitFor */
                                     pdFALSE,                 /* xClearOnExit */
                                     pdFALSE,                /* xWaitForAllBits n.a. */
                                     FF_TIME_TO_WAIT_FOR_EVENT_TICKS );
        xBits = xEventGroupClearBits( pxIOManager->xEventGroup, FF_FAT_LOCK_EVENT_BITS );

in FF_LockFAT, and

        xBits = xEventGroupWaitBits( pxIOManager->xEventGroup,
                                     FF_DIR_LOCK_EVENT_BITS, /* uxBitsToWaitFor */
                                     pdTRUE,                 /* xClearOnExit */
                                     pdFALSE,                /* xWaitForAllBits n.a. */
                                     FF_TIME_TO_WAIT_FOR_EVENT_TICKS );

with

        xBits = xEventGroupWaitBits( pxIOManager->xEventGroup,
                                     FF_DIR_LOCK_EVENT_BITS, /* uxBitsToWaitFor */
                                     pdFALSE,                 /* xClearOnExit */
                                     pdFALSE,                /* xWaitForAllBits n.a. */
                                     FF_TIME_TO_WAIT_FOR_EVENT_TICKS );
        xBits = xEventGroupClearBits( pxIOManager->xEventGroup, FF_DIR_LOCK_EVENT_BITS );

in FF_LockDirectory, the test runs fine.

I can’t explain that.

I’ve expanded the configASSERTs in ff_locking.c to try to get some more information.

   320	#if 0
   321	    configASSERT( ( xEventGroupGetBits( pxIOManager->xEventGroup ) & FF_FAT_LOCK_EVENT_BITS ) == 0 );
   322	#else
   323	    EventBits_t x = xEventGroupGetBits( pxIOManager->xEventGroup );
   324	    if (( x & FF_FAT_LOCK_EVENT_BITS ) != 0) {
   325	        task_printf("xEventGroupGetBits( pxIOManager->xEventGroup ): 0x%lX\r\n", x);
   326	        configASSERT(false);
   327	    }
   328	#endif

gives

core0: FS1: prvFileSystemAccessTask()
core0: FS2: prvFileSystemAccessTask()
core0: FS3: prvFileSystemAccessTask()
core0: FS4: prvFileSystemAccessTask()
core0: FS0: Task 20004D38, FS0, /sd0/0
core0: FS1: Task 20005DF8, FS1, /sd0/1
core0: FS2: Task 20006EB8, FS2, /sd0/2
core0: FS3: Task 20007F78, FS3, /sd0/3
core0: FS4: Task 20009038, FS4, /sd0/4
core1: FS2: xEventGroupGetBits( pxIOManager->xEventGroup ): 0x7
core1: FS2: FS2: assertion "false" failed: file "/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c", line 326, function: FF_UnlockFAT

I don’t understand what this configASSERT is asserting. Is it asserting that there should be no other tasks waiting to lock the FAT when this task is unlocking the FAT?

   255	#if 0
   256	        configASSERT( ( pxIOManager->pvFATLockHandle != NULL ) && ( pxIOManager->pvFATLockHandle == handle ) );        
   257	#else
   258	    void *h = pxIOManager->pvFATLockHandle;
   259	    if (!( ( h != NULL ) && ( h == handle ) )) {
   260	        task_printf("pxIOManager->pvFATLockHandle: %p (%s), handle: %p (%s)\r\n", 
   261	            h, pcTaskGetName(h), handle, pcTaskGetName(handle));
   262	        configASSERT(false);
   263	    }
   264	#endif

gives

core0: FS0: prvFileSystemAccessTask()
core0: FS1: prvFileSystemAccessTask()
core1: FS2: prvFileSystemAccessTask()
core0: FS3: prvFileSystemAccessTask()
core0: FS4: prvFileSystemAccessTask()
core0: FS0: Task 20004D38, FS0, /sd0/0
core0: FS1: Task 20005DF8, FS1, /sd0/1
core0: FS2: Task 20006EB8, FS2, /sd0/2
core0: FS3: Task 20007F78, FS3, /sd0/3
core0: FS4: Task 20009038, FS4, /sd0/4
core1: FS1: pxIOManager->pvFATLockHandle: 20006EB8 (FS2), handle: 20005DF8 (FS1)
core1: FS1: FS1: assertion "false" failed: file "/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c", line 262, function: FF_Assert_Lock

So, this is saying that task FS1 is FF_Assert_Locking that it owns the lock, but actually it is owned by FS2?

If I stick some FF_Assert_Locks into FF_FindFreeCluster:

  1144	            /* Start from a sector where the first free entry is expected,
  1145	             * and iterate through every FAT sector. */
  1146	            for( ulFATSector = ( ulFATOffset / pxIOManager->xPartition.usBlkSize );
  1147	                 ulFATSector < pxIOManager->xPartition.ulSectorsPerFAT;
  1148	                 ulFATSector++ )
  1149	            {
  1150	
  1151	FF_Assert_Lock( pxIOManager, FF_FAT_LOCK );
  1152	
  1153	                pxBuffer = FF_GetBuffer( pxIOManager, pxIOManager->xPartition.ulFATBeginLBA + ulFATSector, FF_MODE_READ );
  1154	
  1155	FF_Assert_Lock( pxIOManager, FF_FAT_LOCK );
  1156	
  1157	                if( pxBuffer == NULL )

it asserts on the one after FF_GetBuffer:

core0: FS0: prvFileSystemAccessTask()
core0: FS1: prvFileSystemAccessTask()
core0: FS2: prvFileSystemAccessTask()
core0: FS3: prvFileSystemAccessTask()
core0: FS4: prvFileSystemAccessTask()
core0: FS0: Task 20004D38, FS0, /sd0/0
core0: FS1: Task 20005DF8, FS1, /sd0/1
core1: FS2: Task 20006EB8, FS2, /sd0/2
core0: FS3: Task 20007F78, FS3, /sd0/3
core0: FS4: Task 20009038, FS4, /sd0/4
vCreateAndVerifyExampleFiles(pcMountPath=/sd0/0)
prvCreateDemoFilesUsing_ff_fwrite(pcMountPath=/sd0/0)
core0: FS0: Creating file root001.txt in /sd0/0
vCreateAndVerifyExampleFiles(pcMountPath=/sd0/4)
prvCreateDemoFilesUsing_ff_fwrite(pcMountPath=/sd0/4)
core0: FS4: Creating file root001.txt in /sd0/4
vCreateAndVerifyExampleFiles(pcMountPath=/sd0/1)
prvCreateDemoFilesUsing_ff_fwrite(pcMountPath=/sd0/1)
core1: FS0: pxIOManager->pvFATLockHandle: 20007F78 (FS3), handle: 20004D38 (FS0)
core1: FS0: FS0: assertion "false" failed: file "/home/carlk/pi/pico/FreeRTOS+FAT+CLI/Lab-Project-FreeRTOS-FAT/ff_locking.c", line 262, function: FF_Assert_Lock

OK, I think I understand it now. Yup, my suggested change broke the code.

In FF_LockFAT, for example, xEventGroupWaitBits and xEventGroupClearBits must be done as two separate steps. There is a meaningful difference between

		xEventGroupWaitBits( pxIOManager->xEventGroup,
			FF_FAT_LOCK_EVENT_BITS, /* uxBitsToWaitFor */
			pdFALSE,                /* xClearOnExit */
			pdFALSE,                /* xWaitForAllBits n.a. */
			pdMS_TO_TICKS( 10000UL ) );
		xBits = xEventGroupClearBits( pxIOManager->xEventGroup, FF_FAT_LOCK_EVENT_BITS );

and

        xBits = xEventGroupWaitBits( pxIOManager->xEventGroup,
                                     FF_FAT_LOCK_EVENT_BITS, /* uxBitsToWaitFor */
                                     pdTRUE,                 /* xClearOnExit */
                                     pdFALSE,                /* xWaitForAllBits n.a. */
                                     FF_TIME_TO_WAIT_FOR_EVENT_TICKS );

When xEventGroupSetBits( pxIOManager->xEventGroup, FF_FAT_LOCK_EVENT_BITS ); is called, it will unblock all tasks waiting on FF_FAT_LOCK_EVENT_BITS in xEventGroupWaitBits. If xClearOnExit is pdTRUE, they will all clear the bit, but that doesn’t help anything. xEventGroupClearBits needs to be done as a second atomic step. Only a single task will get back FF_FAT_LOCK_EVENT_BITS in xBits, and after that task has atomically cleared the bit, all the other tasks will get back zero. So, only one task succeeds in the following test:

        if( ( xBits & FF_FAT_LOCK_EVENT_BITS ) != 0 )
        {
            /* This task has cleared the desired bit.
             * It now 'owns' the resource. */
            pxIOManager->pvFATLockHandle = xTaskGetCurrentTaskHandle();
            break;
        }

and the others just loop back around to wait again.

I will make a Pull Request.

I must admit that I fully agreed with mentioned changes to ff_locking.c
The new code was shorter and more elegant.
Now I wonder if the locking also shows problems when running on a single core. I will test that next week.

A fragment from event_groups.c:

xWaitConditionMet = prvTestWaitCondition();
if( xWaitConditionMet != pdFALSE )
{
    /* The wait condition has already been met so there is no need to
   block. */
   uxReturn = uxCurrentEventBits;
    xTicksToWait = ( TickType_t ) 0; /* Clear the wait bits if requested to do so. */
    if( xClearOnExit != pdFALSE )
    {
        pxEventBits->uxEventBits &= ~uxBitsToWaitFor;
    }
}

The above code made me think that the new method is also safe.
The first task who sees that the condition is met will clear bit while either suspending the task-handler, or in a critical section.

Where I got confused was this sentence from Mastering the FreeRTOS™ Real Time Kernel, “The xEventGroupWaitBits() API Function”,

The xClearOnExit parameter is provided to avoid these potential race conditions. If
xClearOnExit is set to pdTRUE, then the testing and clearing of event bits appears to the
calling task to be an atomic operation (uninterruptable by other tasks or interrupts).

The part that I missed is that

xEventGroupSetBits is a single step, and one that unblocks all waiting tasks in this case. The fact that they all atomically clear the bit if xClearOnExit is pdTRUE doesn’t help because they’re all already running. So, a second atomic step, xEventGroupClearBits, is needed to select a winner and send the others back to xEventGroupWaitBits.

I agree, but it’s really the old, old method. I don’t know who wrote it in the first place, but they knew what they were doing (although I do wonder about the efficiency of this locking mechanism).

You mean, recreating the problem with the broken code on a single core? It is not easy to come up with good stress tests for this. Hard to get the timing right so that the first task doesn’t always proceed before there are any other waiters. With this particular bug, the problem only showed up when there was contention. Adding a configASSERT to FF_LockFAT helps with early detection.

This is not correct, There is not A winner to a call to xEventGroupSetBits, an EventGroup is NOT like a semaphore where each set activates a single get. Event Groups are DEFINED so that ALL tasks that are waiting fora given event get released, not just a single one.

I’m not saying that there is.

I am saying that xEventGroupClearBits is used to select a winner in this case. Assuming that xEventGroupClearBits is atomic*, only one task will find the bit to be set in xBits after this call**:

xBits = xEventGroupClearBits(pxIOManager->xEventGroup, FF_FAT_LOCK_EVENT_BITS);

* However, I can’t find any documentation specifying that it is guaranteed to be atomic.
** Context: https://github.com/carlk3/Lab-Project-FreeRTOS-FAT/blob/fix_locking/ff_locking.c#:~:text=test%20%26%20set%20operation%3A%20*/-,xBits%20%3D%20xEventGroupClearBits(pxIOManager->xEventGroup%2C,-FF_FAT_LOCK_EVENT_BITS)%3B

I opened a Pull Request:

if anyone wants to review it.

That is a good point. It does look like it would work. Maybe there is an important relevant difference between vTaskSuspendAll and taskENTER_CRITICAL in the SMP environment?

However, wouldn’t this approach be depending on behavior that is not documented? Is that implementation of xEventGroupWaitBits even correct if xEventGroupSetBits is semantically guaranteed to unblock all tasks that are waiting for a given event?