Cortex-M4F: Problems with xSemaphoreGiveFromISR / portYIELD_FROM_ISR

damien_d wrote on Friday, October 12, 2018:

Dear All,

I am currently attempting to debug a problem I have running stock FreeRTOS v10.1.0 on an NXP S32K144 EVK (ARM Cortex-M4F) using GCC 7.3.1:

/opt/gcc-arm-none-eabi-7-2018-q2-update/bin/arm-none-eabi-gcc --version
arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2018-q2-update) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]

In this application, I have a sensor sampling task triggered by a HW timer servicing 3 sensors at 3200Hz, two comms tasks (a UART and CAN), and a general debugging task triggered from the tick hook.

The problem I have is in the sensor task. The three sensors are each on their own SPI bus, and the buses operate almost simultaneously. At the end of each SPI transaction an interrupt is generated, and its handler wakes the main sensor task with a semaphore, i.e.

void SensorN_SPI_Interrupt(void)
{
    // <snip>
    // Check if the transaction is complete
    if (m_spi_txIndex >= m_spi_numTransactions)
    {
        #if (SPI_BUSY_WAIT)
            m_spi_ready = 1;
        #else
            // Flag the main loop
            BaseType_t xHigherPriorityTaskWoken = pdFALSE;
            xSemaphoreGiveFromISR(m_spi_semaphore, &xHigherPriorityTaskWoken);
            portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
        #endif
    }
    else
    {
        // Stuff more data in the SPI port
    }
}

(The ISR is implemented three times. Each instance points to its own data structures and has its own interrupt vector and priority.)

The sensor loop waits on the sensor as follows:

void SensorN_SPI_TransactionSequence_Wait(void)
{
    #if (SPI_BUSY_WAIT)
        while (!m_spi_ready) { /* spin */ }
    #else
        const TickType_t xMaxExpectedBlockTime = portMAX_DELAY;
        xSemaphoreTake(m_spi_semaphore, xMaxExpectedBlockTime);
    #endif
}

So far, I have observed the following behavior:

a) If I have exactly one sensor operating, then the system operates indefinitely.

b) If I have all three sensors running, then it crashes after anywhere between 10 seconds and 3 minutes, with no apparent activity in any thread or interrupt. configASSERT() is defined, but appears not to execute.

c) If I define SPI_BUSY_WAIT, which uses a busy wait rather than the semaphore mechanism, then the system operates indefinitely.

I suspect therefore it has something to do with yielding when multiple interrupts are pending execution.

I have checked the usual advice with respect to Cortex-M interrupt priorities. The NXP S32K144 implements 4 priority bits. The SPI interrupts each operate at (unshifted) priority 9, although changing the priorities to 7, 8 and 9 across the three sensors makes no difference.
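As a sanity check on priority numbers like these, the sketch below shows how an unshifted priority maps to the 8-bit value the NVIC holds when only 4 priority bits are implemented, and how it compares against the ceiling for ISRs that use FromISR calls. The ceiling of 5 is a placeholder assumption, not the value from this project's FreeRTOSConfig.h, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The S32K144 implements 4 priority bits (stated in the post). */
#define PRIO_BITS 4u

/* ASSUMPTION: placeholder ceiling. Substitute the project's own
   configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY. */
#define ASSUMED_MAX_SYSCALL_PRIO 5u

/* Hypothetical helper: the value as held in an 8-bit NVIC priority field,
   where the implemented bits are the most significant ones. */
static inline uint8_t nvic_raw_priority(uint8_t unshifted)
{
    assert(unshifted < (1u << PRIO_BITS));
    return (uint8_t)(unshifted << (8u - PRIO_BITS));
}

/* Hypothetical helper: on Cortex-M a numerically larger value is a logically
   lower priority, so an ISR may use FromISR calls only if its priority
   number is greater than or equal to the ceiling. */
static inline bool may_call_from_isr_api(uint8_t unshifted,
                                         uint8_t max_syscall_unshifted)
{
    return unshifted >= max_syscall_unshifted;
}
```

With the placeholder ceiling, priorities 7, 8 and 9 would all pass the check, while a priority of 3 would not.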

Stack overflow detection is enabled, and there is no evidence of any task getting close to its stack limit.

/* The lowest interrupt priority that can be used in a call to a "set priority"
function. */

/* The highest interrupt priority that can be used by any interrupt service
routine that makes calls to interrupt safe FreeRTOS API functions.  DO NOT CALL
INTERRUPT SAFE FREERTOS API FUNCTIONS FROM ANY INTERRUPT THAT HAS A HIGHER
PRIORITY THAN THIS! (higher priorities are lower numeric values.) */

/* Interrupt priorities used by the kernel port layer itself.  These are generic
to all Cortex-M ports, and do not rely on any particular library functions. */

/* !!!! configMAX_SYSCALL_INTERRUPT_PRIORITY must not be set to zero !!!!
See */

I have been following this thread with interest as it appears to also have some problems with task switching behavior:

Is there anything you might suggest that I try to help debug what may be happening?

Kind regards,

damien_d wrote on Friday, October 12, 2018:

Interesting, the initial workaround on the forum topic linked does NOT work for me, although the lock-up does seem to take much longer to manifest. That is, whereas previously it would take 1 to 3 minutes to manifest, it will now run for as long as 10 minutes without locking up. Not having the task switching (i.e. using SPI_BUSY_WAIT in the original description) runs overnight without problems.

So, it seems to help, but it’s not the full story.

        /* Select a new task to run using either the generic C or port
        optimised asm code. */
        __asm ("CPSID  i");
        taskSELECT_HIGHEST_PRIORITY_TASK(); /*lint !e9079 void * is used as this macro is used with timers and co-routines too.  Alignment is known to be fine as the type of the pointer stored and retrieved is the same. */
        __asm ("CPSIE  i");

rtel wrote on Friday, October 12, 2018:

I think the conclusion of that thread was that the workaround you
posted would not actually fix the issue. In the other thread the issue
manifests itself (and can be explained) when run time stats are turned
on - do you have run time stats on, or any other code inserted into the
context switch code (such as a trace macro implemented)?

damien_d wrote on Monday, October 15, 2018:

Strangely enough, no. I’m still looking for any particular configuration that triggers it more (or less) often, or not at all.

My baseline FreeRTOS config looks like this:

#define configUSE_PREEMPTION                     1
#define configSUPPORT_STATIC_ALLOCATION          1
#define configSUPPORT_DYNAMIC_ALLOCATION         1
#define configUSE_IDLE_HOOK                      0
#define configUSE_TICK_HOOK                      1
#define configCPU_CLOCK_HZ                       ( 80000000UL )
#define configTICK_RATE_HZ                       ((TickType_t)1000)
#define configMAX_PRIORITIES                     ( 7 )
#define configMINIMAL_STACK_SIZE                 (128)
#define configTOTAL_HEAP_SIZE                    ((size_t) 4096)
#define configMAX_TASK_NAME_LEN                  ( 16 )
#define configUSE_16_BIT_TICKS                   0
#define configUSE_MUTEXES                        1
#define configQUEUE_REGISTRY_SIZE                8
#define configUSE_TIME_SLICING                   0

/* Co-routine definitions. */
#define configUSE_CO_ROUTINES                    0
#define configMAX_CO_ROUTINE_PRIORITIES          ( 2 )

/* Set the following definitions to 1 to include the API function, or zero
to exclude the API function. */
#define INCLUDE_vTaskPrioritySet        1
#define INCLUDE_uxTaskPriorityGet       1
#define INCLUDE_vTaskDelete             1
#define INCLUDE_vTaskCleanUpResources   1
#define INCLUDE_vTaskSuspend            1
#define INCLUDE_vTaskDelayUntil         1
#define INCLUDE_vTaskDelay              1

Note that there is no configGENERATE_RUN_TIME_STATS defined.

In this “baseline” configuration, it will generally last about 15 minutes before locking up.

If I add the following line, then it generally takes 1 to 3 minutes to lock up:

#define configCHECK_FOR_STACK_OVERFLOW           1

which is the only extra line that appears to add code to the FreeRTOS task switching routine.

damien_d wrote on Wednesday, October 17, 2018:

After spending a few days debugging, the best I can come up with is as follows:

  1. The lock-up is an imprecise hard fault. Setting DISDEFWBUF turns it into a precise fault.
  2. The HFSR reads 0x40000000, i.e. only the FORCED bit (bit 30) is set.
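For anyone reproducing this, a small decoder for the fault status registers can make register dumps easier to read. The bit positions below follow the ARMv7-M architecture (HFSR at 0xE000ED2C, CFSR at 0xE000ED28). The struct and function names are made up for this sketch, and it takes raw register values as arguments so it does not touch hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARMv7-M HardFault Status Register bits. */
#define HFSR_VECTTBL      (1UL << 1)   /* fault on a vector table read */
#define HFSR_FORCED       (1UL << 30)  /* a configurable fault was escalated */
#define HFSR_DEBUGEVT     (1UL << 31)  /* debug event */

/* Bus-fault bits as they appear in the Configurable Fault Status Register. */
#define CFSR_PRECISERR    (1UL << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR  (1UL << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID    (1UL << 15)  /* BFAR holds the faulting address */

/* The value reported in the post is exactly the FORCED bit. */
static_assert(HFSR_FORCED == 0x40000000UL, "FORCED is bit 30");

/* Hypothetical summary type for this sketch. */
typedef struct {
    bool forced;
    bool vector_table;
    bool precise_bus;
    bool imprecise_bus;
    bool bfar_valid;
} fault_summary_t;

/* Decode raw HFSR and CFSR values into individual flags. */
static inline fault_summary_t decode_fault(uint32_t hfsr, uint32_t cfsr)
{
    fault_summary_t s;
    s.forced        = (hfsr & HFSR_FORCED) != 0;
    s.vector_table  = (hfsr & HFSR_VECTTBL) != 0;
    s.precise_bus   = (cfsr & CFSR_PRECISERR) != 0;
    s.imprecise_bus = (cfsr & CFSR_IMPRECISERR) != 0;
    s.bfar_valid    = (cfsr & CFSR_BFARVALID) != 0;
    return s;
}
```

When FORCED is set, the CFSR is the register that says which configurable fault was escalated. If BFARVALID is set after a precise bus fault, the BFAR register holds the offending address.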

I have tried all sorts of things to work out what may be causing the problem, including:

  1. Removing all FPU computations (they exist only in one task) to remove the possibility it may have to do with lazy stacking of the FPU registers
  2. All FreeRTOS objects are static. Padding queue and stack buffers with extra data, and ensuring all FreeRTOS objects are aligned to 8 bytes, makes no difference.
  3. Based on (2), inspecting the memory shows no evidence of any structure over-running its allotted bounds.
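One way to make the 8-byte alignment in item 2 explicit, rather than relying on where the linker happens to place a buffer, is to request it in the declaration and check it at start-up. This is a minimal sketch: the buffer name and size are placeholders, and uint32_t stands in for StackType_t, which is 32 bits wide on the Cortex-M ports.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder size in 32-bit words; not taken from the original project. */
#define SENSOR_STACK_WORDS 256u

/* alignas(8) asks the compiler for 8-byte alignment (C11). */
static alignas(8) uint32_t sensor_task_stack[SENSOR_STACK_WORDS];

/* Keeping the size a multiple of 8 bytes keeps the end of the buffer
   8-byte aligned as well. */
static_assert((sizeof sensor_task_stack % 8u) == 0u,
              "stack buffer size should be a multiple of 8 bytes");

/* Hypothetical helper: true if p is aligned to an 8-byte boundary. */
static inline bool is_aligned_8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0u;
}
```

A start-up check such as configASSERT(is_aligned_8(sensor_task_stack)) would then catch any buffer that ends up misaligned.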

There are only two ways I can stop the lock-up from occurring:

  1. Replace the binary semaphores with busy loops (see the original post); or
  2. Re-work the task to use direct-to-task notifications instead of binary semaphores.
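The second option can be sketched roughly as follows. vTaskNotifyGiveFromISR(), ulTaskNotifyTake(), portYIELD_FROM_ISR() and configASSERT() are real FreeRTOS names, but everything between the "host stand-ins" markers is a fake written only so the sketch compiles and runs off-target. On the S32K those lines would be replaced by FreeRTOS.h and task.h. m_sensor_task and the two function names are placeholders, not the original project's identifiers, and the actual rework may well have differed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ---- Host stand-ins (ASSUMPTION: fakes, not the real kernel). On target,
   delete this section and include "FreeRTOS.h" and "task.h". ---- */
typedef long BaseType_t;
typedef uint32_t TickType_t;
typedef void *TaskHandle_t;
#define pdFALSE               ((BaseType_t)0)
#define pdTRUE                ((BaseType_t)1)
#define portMAX_DELAY         ((TickType_t)0xffffffffUL)
#define portYIELD_FROM_ISR(x) ((void)(x))
#define configASSERT(x)       assert(x)

static uint32_t fake_notification_value;

static void vTaskNotifyGiveFromISR(TaskHandle_t task, BaseType_t *woken)
{
    (void)task;
    fake_notification_value++;
    if (woken != NULL) {
        *woken = pdTRUE;
    }
}

static uint32_t ulTaskNotifyTake(BaseType_t clear_on_exit, TickType_t wait)
{
    (void)wait; /* the fake never blocks */
    uint32_t previous = fake_notification_value;
    if (clear_on_exit != pdFALSE) {
        fake_notification_value = 0;
    } else if (previous > 0u) {
        fake_notification_value--;
    }
    return previous;
}
/* ---- End of host stand-ins ---- */

/* Placeholder: handle of the sensor task, saved when the task is created. */
static TaskHandle_t m_sensor_task;

/* End-of-transaction path of an SPI interrupt handler. */
void SensorN_SPI_TransactionComplete_FromISR(void)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    vTaskNotifyGiveFromISR(m_sensor_task, &xHigherPriorityTaskWoken);
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

/* Called from the sensor task. Consumes one completion and returns the
   number that were pending before the call. */
uint32_t SensorN_SPI_WaitForCompletion(void)
{
    uint32_t pending = ulTaskNotifyTake(pdFALSE, portMAX_DELAY);
    /* With portMAX_DELAY the call should only return once notified. */
    configASSERT(pending > 0u);
    return pending;
}
```

Passing pdFALSE makes the notification value behave like a counting semaphore, so three ISRs giving to the same task are each consumed by one call to the wait function. This matters because a task has only one notification value in this FreeRTOS version, unlike the three separate binary semaphores in the original design.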

I am at a loss to explain how or why this is occurring, or why the binary semaphores seem to be the common theme here. Even more troubling is that the application was a port from an STM32 to an S32K (both Cortex-M4F) using the same FreeRTOS primitives, but (of course) using device-specific drivers for CAN, SPI, timers, etc. From FreeRTOS’s perspective, only two things changed:

  1. Upgrade from FreeRTOS 10.0.0 to 10.1.0, and
  2. Change from dynamically allocated structures to statically allocated structures.

I’m happy to test ideas, but for the moment, I will proceed with the task notification based “semaphores” as it seems to solve my problem.

– Damien.