+FAT: Optimization bug in ff_locking.c? xEventGroupGetBits has side effects?

I’m having problems with a Release build that defines NDEBUG, which effectively compiles configASSERT away:

#ifdef NDEBUG           /* required by ANSI standard */
# define configASSERT(__e) ((void)0)
#else
//...

A simple dir command resulted in
Error: ff_findfirst() failed.
followed by
FF_GetBuffer[0x5EC0]: failed mode 0x1

I started experimenting with variations of the configASSERT macro and found that if I use a version that still evaluates the expression (allowing side effects)*, the problem goes away. For example, this works fine:

#define assert_se(expr)     \
     do {                   \
        if (!(expr))        \
             { for( ;; ); } \
     } while (false)
#define configASSERT( x ) assert_se(x)

* See also: Examine ASSERT Statements

The problem seems to lie in ff_locking.c. There are a couple of calls like this:

void FF_UnlockDirectory( FF_IOManager_t *pxIOManager )
{
// ...
	configASSERT( ( xEventGroupGetBits( pxIOManager->xEventGroup ) & FF_DIR_LOCK_EVENT_BITS ) == 0 );
	xEventGroupSetBits( pxIOManager->xEventGroup, FF_DIR_LOCK_EVENT_BITS );
}

and

void FF_UnlockFAT( FF_IOManager_t *pxIOManager )
{
// ...
	configASSERT( ( xEventGroupGetBits( pxIOManager->xEventGroup ) & FF_FAT_LOCK_EVENT_BITS ) == 0 );
	pxIOManager->pvFATLockHandle = NULL;
	xEventGroupSetBits( pxIOManager->xEventGroup, FF_FAT_LOCK_EVENT_BITS );
}

Could xEventGroupGetBits( pxIOManager->xEventGroup ) be having side effects? It doesn’t look like it:

#define xEventGroupGetBits( xEventGroup ) xEventGroupClearBits( xEventGroup, 0 )

but I don’t know how else to explain what I’m seeing.

Everything works if I replace configASSERT in ff_locking.c with assert_se, but leave the optimized configASSERTs everywhere else.

EDIT: I’m compiling with:

arm-none-eabi-gcc.exe -mcpu=cortex-m4 -mfloat-abi=softfp -mfpu=fpv4-sp-d16 -mthumb 
 -I..\..\printf
 -I..\FreeRTOS\FreeRTOS\Source\include
 -I..\FreeRTOS\FreeRTOS\Source\portable\GCC\ARM_CM4F
 -I.
 -IGenerated_Source\PSoC6
 -IGenerated_Source\PSoC6\pdl\cmsis/include/
 -IGenerated_Source\PSoC6\pdl\devices/psoc6/include/
...
 -IGenerated_Source\PSoC6\pdl\middleware/emWin/code/include/osnts_softfp/
 -IGenerated_Source\PSoC6\pdl\utilities/ 
-Wa,-alh=.\CortexM4\ARM_GCC_Generic\Release/ff_locking.lst -D NDEBUG -D CY_CORE_ID=0 -D HEAP_NEWLIB=1 -D CY_PSOC_CREATOR_USED=1 -D CY8C6347BZI_BLD53 -Wall -ffunction-sections -ffat-lto-objects -O2 -Wno-unused-parameter -Wno-unused-variable -fdata-sections -g3 -c ff_locking.c -o .\CortexM4\ARM_GCC_Generic\Release\ff_locking.o

"C:\Program Files (x86)\GNU Arm Embedded Toolchain\10 2020-q4-major\bin\arm-none-eabi-gcc.exe" --version
arm-none-eabi-gcc.exe (GNU Arm Embedded Toolchain 10-2020-q4-major) 10.2.1 20201103 (release)
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

I’ve done some more digging, trying to narrow down this issue. Now, I am seeing something different.

Does

// There should be no more notifications pending:
configASSERT(!ulTaskNotifyTake(pdTRUE, 0));

have side effects (assuming the assert does not trigger)? My impression was that if ulTaskNotifyTake returns 0, then 0 was the value of the task’s notification value before it was decremented or cleared, so xClearCountOnExit set to pdTRUE would do nothing.

However, I’m finding that it seems to be necessary in my code.

// uint32_t ulTaskNotifyTake( BaseType_t xClearCountOnExit, TickType_t xTicksToWait );   
rc = ulTaskNotifyTake(pdFALSE, pdMS_TO_TICKS(timeOut)); // Wait for notification from ISR

// Deal with notification...

// There should be no more notifications pending:
configASSERT(!ulTaskNotifyTake(pdTRUE, 0));

I had assumed that the first ulTaskNotifyTake, with xClearCountOnExit set to pdFALSE, would decrement the RTOS task’s notification value before ulTaskNotifyTake() exited.

But, if I change

// There should be no more notifications pending:
configASSERT(!ulTaskNotifyTake(pdTRUE, 0));

to

assert_se(!ulTaskNotifyTake(pdTRUE, 0));

everything works when NDEBUG is defined. (Previously, when configASSERT did not evaluate the expression, i.e., when NDEBUG was defined, things failed in ugly ways.)

I’ve attached my code in case you’d like to take a look. See sd_spi_write and sd_spi_transfer. sd_spi_transfer is a slightly more complicated case in that it is waiting for two notifications from the ISR.
sd_spi.c (10.1 KB)
spi.c (10.9 KB)

Agreed. So it does have a side effect, but only in a case where the assert would have fired had it been enabled.

Can you try defining configASSERT() to be a memory barrier, in case the compiler is rearranging the code in the helpful way it sometimes does, as per this line: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/tasks.c#L2143 If portMEMORY_BARRIER is not defined for your port, you should be able to define it like this:

#define portMEMORY_BARRIER()    __asm volatile ( "" ::: "memory" ) 

I struggled with the syntax a bit, and wound up with this in FreeRTOS.h:

#ifdef NDEBUG           /* required by ANSI standard */
#  define configASSERT(__e) {__asm volatile ( "" ::: "memory" );}    
#else
#  define configASSERT(__e) ((__e) ? (void)0 : my_assert_func (__FILE__, __LINE__, \
    __func__, #__e))
#endif
#define assert_se(x) configASSERT(x)

which I think does the same thing.

In the intermediate file, sd_spi_write now expands to this:

uint8_t sd_spi_write(sd_card_t *this, const uint8_t value) {
 uint8_t received = 0xFF;
 sd_spi_acquire(this);
 {__asm volatile ( "" ::: "memory" );};
 {__asm volatile ( "" ::: "memory" );};
 {__asm volatile ( "" ::: "memory" );};
 BaseType_t rc = ulTaskGenericNotifyTake( ( ( 0 ) ), ( ( ( BaseType_t ) 1 ) ), ( 0 ) );
 {__asm volatile ( "" ::: "memory" );};
 this->spi->base->INTR_M_MASK = 0x200UL;
 uint32_t n = Cy_SCB_SPI_Write(this->spi->base, value);
 {__asm volatile ( "" ::: "memory" );};
 uint32_t timeOut = 1000UL;
 rc = ulTaskGenericNotifyTake( ( ( 0 ) ), ( ( ( BaseType_t ) 0 ) ), ( ( ( TickType_t ) ( ( ( TickType_t ) ( timeOut ) * ( TickType_t ) 1000u ) / ( TickType_t ) 1000U ) ) ) );
 if (!rc) {

                                                         ;
  sd_spi_release(this);
  return
# 258 "sd_spi.c" 3 4
    0
# 258 "sd_spi.c"
         ;
 }
 {__asm volatile ( "" ::: "memory" );};
 {__asm volatile ( "" ::: "memory" );};
 {__asm volatile ( "" ::: "memory" );};
 {__asm volatile ( "" ::: "memory" );};
 n = Cy_SCB_SPI_Read(this->spi->base);
 {__asm volatile ( "" ::: "memory" );};
 received = n;
 {__asm volatile ( "" ::: "memory" );};
 {__asm volatile ( "" ::: "memory" );};
 sd_spi_release(this);
 return received;
}

but I am seeing the same results.

You can probably guess from the number of configASSERTs that this piece of code has been troublesome for me. I went back and reviewed the documentation for this SPI implementation and found this: “Note: To avoid being triggered by events from previous transactions, whenever the firmware enables an interrupt mask register bit, it should clear the interrupt request register in advance.” I wasn’t doing that, so I tried it, and now I can run successfully with the original disappearing configASSERTs. Apparently, stale interrupts from previous transactions were sending notifications to the task that I (at least) wasn’t expecting.

Still curious about the ulTaskNotifyTake(pdTRUE, 0)s, though.

Glad you found the cause - and yes I’m curious too.

Could it have something to do with interrupt priorities?

I have

//PSoC 6 (ARM Cortex M4) __NVIC_PRIO_BITS = 3
#define configKERNEL_INTERRUPT_PRIORITY         0xFF
#define configMAX_SYSCALL_INTERRUPT_PRIORITY    0x2F 
#define configMAX_API_CALL_INTERRUPT_PRIORITY 	configMAX_SYSCALL_INTERRUPT_PRIORITY

The SPI interrupt is at ARM CM4 Priority (0-7): 7 (as are all of my user-defined hardware interrupts).

I would not have thought it would be related to this - 7 is effectively the lowest priority in your case.

I guess it’s just a timing thing.

Thanks for your help!