vTaskDelay() hard fault on priority higher than idle task

I’ve been battling for a while with a hard fault issue on my system. Before I explain the issue, here is some important information:
Port: PIC32
Compiler: XC32

FreeRTOSConfig.h relevant info:

#define configUSE_PREEMPTION					1
#define configUSE_PORT_OPTIMISED_TASK_SELECTION	1
#define configUSE_IDLE_HOOK						0
#define configTICK_RATE_HZ						( ( TickType_t ) 1000 )
#define configMAX_PRIORITIES					( 5UL )
#define configMINIMAL_STACK_SIZE				( 190 )
#define configISR_STACK_SIZE					( 400 )
#define configIDLE_SHOULD_YIELD					1
#define configCHECK_FOR_STACK_OVERFLOW			3 /* Three also checks the system/interrupt stack. */
#define configUSE_TASK_FPU_SUPPORT				1
#define configKERNEL_INTERRUPT_PRIORITY			0x01
#define configMAX_SYSCALL_INTERRUPT_PRIORITY	0x03
#define configASSERT( x ) if( ( x ) == 0  ) vAssertCalled( __FILE__, __LINE__ )
#define configSUPPORT_STATIC_ALLOCATION 1
#define configSUPPORT_DYNAMIC_ALLOCATION 0
#define configLIST_VOLATILE volatile

Basically, I have a driver for an SPI device which required some delays between transaction when the device is resetting. Something like this pseudo-code:

set_register(0x00,0x81);
vTaskDelay(2);
set_register(0x00, 0x00);
vTaskDelay(11);

What I observed however, is that when the priority of the calling task is set to tskIDLE_PRIORITY, it works as expected. When the priority of the calling task is tskIDLE_PRIORITY+1, there seems to be undetected stack memory corruption eventually turning into a “tlb refill exception”.

One interesting thing to note, is that this port supports interrupts nesting. It is expected that interrupts keep coming while the task is delayed.

How did you determine that it is stack corruption? Does it go away when you increase the task’s stack size?

I did increase the stack sizes of all tasks, and the ISR stack, but no difference.

Basically I found the memory corruption with a data breakpoint in my debugger. I added a data breakpoint on my variable (allocated on the task stack) to know when it is modified.

The corruption happens in the portSAVE_FPU_REGS assembly macro which is called at the beginning of each interrupt. In my case, the task does have an FPU context because floating-point math is used elsewhere in the task.

The program flow is the following:
Task n creates an data structure for the SPI transfer. The data structure contains the number of bytes to transmit and a “transmit done” flag. A pointer to this structure is placed on an Queue and the SPI interrupts are enabled. The task will then yield until the “done” flag is true before continuing the program.

The SPI interrupts get the pointer from the queue and sends the bytes on the bus, decrementing the number of bytes until it reaches 0, then set the “done” flag to true.

What I observe however, is that sometimes (seems to be related to task priority) the data structure gets corrupted and the SPI driver will eventually try to de-reference an invalid pointer.

Please post the code instead of describing it.

I’m always hesitant to post code directly, because there is a lot of stuff in the code which is not relevant to the issue. After removing the delays the issue still happen, so I do not think it is related to vTaskDelay anymore, but rather some error when the context is saved and reloaded. Somehow, if the debugger can be trusted, when the context is saved before jumping into interrupt code, it sometimes overwrites data that is still needed by another task.

I did make an interesting finding though: when I place a breakpoint in my interrupt, I see the variable ulTaskHasFPUContext is always 1 when my task has a priority of 1. When I lower the priority of my task to 0, the variable ulTaskHasFPUContext alternates between 0 and 1.

But let me try anyway to post the [simplified] code here:

Typedefs:

typedef struct
{
    void * tx_buffer;
    void * rx_buffer;
    uint32_t tx_buffer_size;
    uint32_t rx_buffer_size;
    bool done;
} spi_xfer_t;

typedef struct
{
    volatile uint32_t * IECxCLR;
    volatile uint32_t * IECxSET;
    volatile uint32_t * IFSxCLR;
    const uint32_t      FAULT_INT_MASK;
    const uint32_t      RX_INT_MASK;
    const uint32_t      TX_INT_MASK;
    
    volatile uint32_t * SPIxCON;
    volatile uint32_t * SPIxBUF;
    volatile uint32_t * SPIxBRG;
    volatile uint32_t * SPIxSTAT;
    
    const uint32_t      _SPIxSTAT_SPITBF_MASK;
    const uint32_t      _SPIxSTAT_SPIRBE_MASK;
    const uint32_t      _SPIxSTAT_SPITBE_MASK;
    
    volatile uint32_t * SPIxSTATCLR;
    const uint32_t      _SPIxSTAT_SPIROV_MASK;

    volatile uint32_t * SPIxCONSET;
    volatile uint32_t * SPIxCONCLR;
    const uint32_t      _SPIxCON_ON_MASK;

    spi_xfer_t          *tx_xfer;
    spi_xfer_t          *rx_xfer;
    
    StaticQueue_t xSpiMasterxQueueBuffer;
    uint8_t ucSpiMasterxQueueStorage[ SPIMASTERx_QUEUE_LENGTH * 
                                      sizeof( spi_xfer_t* ) ];
    
    QueueHandle_t xSpiMasterxQueue;

} spi_master_common_ctx_t;

Task code:

    int set(unsigned int regAddr, unsigned int regMask, unsigned int regValue)
    {   
        uint8_t newValue;
        if (regMask >= 0xFF)
            newValue = regValue&0xFF;
        else 
        {
            /* perform read-modify-write */
            //read
            unsigned int oldValue;
            get(devId, regAddr, &oldValue);

            //modify
            newValue = (regValue & regMask) | (oldValue & ~regMask);
        }
        //write
        uint8_t spibuffer[3];
        spibuffer[0] = (regAddr&0x7F00)>>8;
        spibuffer[1] = regAddr&0xFF;
        spibuffer[2] = newValue;
        
        spi_xfer_t xfer;
        xfer.tx_buffer = spibuffer;
        xfer.rx_buffer = NULL;
        xfer.tx_buffer_size = 3;
        xfer.rx_buffer_size = 0;
        
        SPIMasterx_Blocking_Xfer( &ctx, &xfer );
        
        return 0;
    }

int SPIMasterx_Blocking_Xfer( spi_master_common_ctx_t *ctx, spi_xfer_t *xfer )
{
    xfer->done = false;

    xQueueSendToBack( ctx->xSpiMasterxQueue, &xfer, portMAX_DELAY );
    
    /* Wait for the transfer to be done */
    while ( !xfer->done ) taskYIELD();
    
    return 0;
}


Interrupts code:

void SPIMasterx_TX_InterruptHandler( spi_master_common_ctx_t *ctx )
{
    portBASE_TYPE xHigherPriorityTaskWoken = pdFALSE;
    if (ctx->tx_xfer != NULL)
    {
        if ( ctx->tx_xfer->tx_buffer_size == 0 )
        {
            ctx->tx_xfer = NULL;
        }
    }
    
    if (ctx->tx_xfer == NULL)
    {
        //fetches the next transfer
        if ( xQueueIsQueueEmptyFromISR( ctx->xSpiMasterxQueue ) )
        {
            /* No pending transactions, turn off the interrupt */
            *ctx->IECxCLR = ctx->TX_INT_MASK;
            /* Clear the transmit interrupt flag */
            *ctx->IFSxCLR = ctx->TX_INT_MASK;
            return;
        } else
        {
            xQueueReceiveFromISR( ctx->xSpiMasterxQueue, 
                                             &ctx->tx_xfer, 
                                             &xHigherPriorityTaskWoken);
        }
    }
    
    /* Here, tx_xfer is not NULL. If there are more words to be 
     * transmitted, then transmit them here and keep track of the count. Push
     * bytes into the fifo while it is not full
     */

     while ( (bool)( *ctx->SPIxSTAT & ctx->_SPIxSTAT_SPITBF_MASK) == false && 
                     ctx->tx_xfer->tx_buffer_size > 0 )
    {
        *ctx->SPIxBUF = *((uint8_t *)ctx->tx_xfer->tx_buffer++);
        ctx->tx_xfer->tx_buffer_size--;
    }
    
    /* Clear the transmit interrupt flag */
    *ctx->IFSxCLR = ctx->TX_INT_MASK;

    portEND_SWITCHING_ISR( xHigherPriorityTaskWoken );
}

void SPIMasterx_RX_InterruptHandler( spi_master_common_ctx_t *ctx )
{
    uint8_t rx_data;
    
    if ( ctx->rx_xfer == NULL )
    {
        ctx->rx_xfer = ctx->tx_xfer;
    }
    
    if ( ctx->rx_xfer != NULL )
    {
        /* Check if the receive buffer is empty or not */
        while ( (bool)(*ctx->SPIxSTAT & ctx->_SPIxSTAT_SPIRBE_MASK) == false )
        {
            /* Receive buffer is not empty. Read the received data. */
            rx_data = *ctx->SPIxBUF;
            
            if ( ctx->rx_xfer->rx_buffer_size > 0 )
            {
                if ( ctx->rx_xfer->rx_buffer != NULL )
                    *((uint8_t *)ctx->rx_xfer->rx_buffer++) = rx_data;
                
                ctx->rx_xfer->rx_buffer_size--;
            }
            
            if ( ctx->rx_xfer->tx_buffer_size == 0 &&
                 ctx->rx_xfer->rx_buffer_size == 0 )
            {
                /* Transfer complete */
                ctx->rx_xfer->done = true;
                ctx->rx_xfer = NULL;
                
                /* Enable the tx interrupt to fetch the next transfer */
                *ctx->IECxSET = ctx->TX_INT_MASK;
                break;
            }
        }
    
    }
    else
    {
        while ( (bool)(*ctx->SPIxSTAT & ctx->_SPIxSTAT_SPIRBE_MASK) == false )
        {
            /* Discard data when no RX is required */
            rx_data = *ctx->SPIxBUF;
            (void)rx_data;
        }
    }

    /* Clear SPIx RX Interrupt flag */
    /* This flag should cleared only after reading buffer */
    *ctx->IFSxCLR = ctx->RX_INT_MASK;
}

In the above code, you can assument that ctx is a static variable containing all the required register addresses for that specific SPI peripheral. I use the same SPI driver on 5 different SPI interfaces, but only one (which happens to need an FPU context) is making the system crash.

Why? I do not think that the assumption is necessarily correct. You may want to add an assert here. Also, DO check the return value of every API call including xQueueReceiveFromISR().

Because the queue is checked to be non-empty right right before, the xQueueReceiveFromISR() is guaranteed to be successful. I was actually checking the return value of the API before instead of checking for the array to be non-empty, but I refactored as a performance improvement after looking at the queue source code.

Besides, as I mentionned before the same SPI driver is used by multiple tasks, but only when the priority level is raised does it cause corruption.

Why is that? xQueueReceiveFromISR() is non blocking, why is it guaranteed that there is always something deposited into it?

I do understand that there appears to be a correlation with the priorities, but that may be coincidental.

First of all thank you very much for helping me out here.

xQueueReceiveFromISR() is non blocking, why is it guaranteed that there is always something deposited into it?

It is guaranteed to return pdPASS because it is only called after checking the return value of xQueueIsQueueEmptyFromISR AND because this is the only place where the queue elements are read.

This is guaranteed, at least in the current version of FreeRTOS, because the presence of elements in the queue is the only condition that can change the return value. Here’s the code from queue.c

BaseType_t xQueueReceiveFromISR( QueueHandle_t xQueue,
                                 void * const pvBuffer,
                                 BaseType_t * const pxHigherPriorityTaskWoken )
{
    BaseType_t xReturn;
    UBaseType_t uxSavedInterruptStatus;
    Queue_t * const pxQueue = xQueue;

    configASSERT( pxQueue );
    configASSERT( !( ( pvBuffer == NULL ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) ) );

    /* RTOS ports that support interrupt nesting have the concept of a maximum
     * system call (or maximum API call) interrupt priority.  Interrupts that are
     * above the maximum system call priority are kept permanently enabled, even
     * when the RTOS kernel is in a critical section, but cannot make any calls to
     * FreeRTOS API functions.  If configASSERT() is defined in FreeRTOSConfig.h
     * then portASSERT_IF_INTERRUPT_PRIORITY_INVALID() will result in an assertion
     * failure if a FreeRTOS API function is called from an interrupt that has been
     * assigned a priority above the configured maximum system call priority.
     * Only FreeRTOS functions that end in FromISR can be called from interrupts
     * that have been assigned a priority at or (logically) below the maximum
     * system call interrupt priority.  FreeRTOS maintains a separate interrupt
     * safe API to ensure interrupt entry is as fast and as simple as possible.
     * More information (albeit Cortex-M specific) is provided on the following
     * link: https://www.FreeRTOS.org/RTOS-Cortex-M3-M4.html */
    portASSERT_IF_INTERRUPT_PRIORITY_INVALID();

    uxSavedInterruptStatus = portSET_INTERRUPT_MASK_FROM_ISR();
    {
        const UBaseType_t uxMessagesWaiting = pxQueue->uxMessagesWaiting;

        /* Cannot block in an ISR, so check there is data available. */
        if( uxMessagesWaiting > ( UBaseType_t ) 0 )
        {
            const int8_t cRxLock = pxQueue->cRxLock;

            traceQUEUE_RECEIVE_FROM_ISR( pxQueue );

            prvCopyDataFromQueue( pxQueue, pvBuffer );
            pxQueue->uxMessagesWaiting = uxMessagesWaiting - ( UBaseType_t ) 1;

            /* If the queue is locked the event list will not be modified.
             * Instead update the lock count so the task that unlocks the queue
             * will know that an ISR has removed data while the queue was
             * locked. */
            if( cRxLock == queueUNLOCKED )
            {
                if( listLIST_IS_EMPTY( &( pxQueue->xTasksWaitingToSend ) ) == pdFALSE )
                {
                    if( xTaskRemoveFromEventList( &( pxQueue->xTasksWaitingToSend ) ) != pdFALSE )
                    {
                        /* The task waiting has a higher priority than us so
                         * force a context switch. */
                        if( pxHigherPriorityTaskWoken != NULL )
                        {
                            *pxHigherPriorityTaskWoken = pdTRUE;
                        }
                        else
                        {
                            mtCOVERAGE_TEST_MARKER();
                        }
                    }
                    else
                    {
                        mtCOVERAGE_TEST_MARKER();
                    }
                }
                else
                {
                    mtCOVERAGE_TEST_MARKER();
                }
            }
            else
            {
                /* Increment the lock count so the task that unlocks the queue
                 * knows that data was removed while it was locked. */
                configASSERT( cRxLock != queueINT8_MAX );

                pxQueue->cRxLock = ( int8_t ) ( cRxLock + 1 );
            }

            xReturn = pdPASS;
        }
        else
        {
            xReturn = pdFAIL;
            traceQUEUE_RECEIVE_FROM_ISR_FAILED( pxQueue );
        }
    }
    portCLEAR_INTERRUPT_MASK_FROM_ISR( uxSavedInterruptStatus );

    return xReturn;
}

As we can see from the code above, the interrupts are disabled before the check for an empty queue which is a sizeable performance hit on the PIC32. In contrast, in the function xQueueIsQueueEmptyFromISR the interrupts are not disabled which makes the check very fast.

This indicates towards a stack oveflow. What are your stack sizes?

That will depend on the outgoing and incoming task. .Do you have other tasks in your application?

The stack sizes are 590 words for the task, and 400 words for the ISR. I tried increasing the stack sizes to 2000+ but it made no difference.

I normally do have multiple tasks, but in order to find the root cause of this issue, I disabled all of them except one, and I still see the issue.

Oh I see, sorry for not seeing that. As a side note, your call to xQueueIsEmptyFromISR() appears to be redundant in that scenario, you culd use the return of the receive to cover all cases of your control flow.

That observation is just a side not and most likely not related to your priblem.

This is indeed what how my original implementation did things, but FreeRTOS changes the interrupt priorities before making the check in this function, which takes time.

Was the call stack at the data breakpoint hit was still the same? Can you check the stack pointer register value when the data breakpoint hits and see if it seems out of range for the task?

Sorry, but that does not appear to make sense. Are you saying that the subsequent call to xQueueReceiveFromISR() will behave differently depending on a previous call to xQueueIsEmptyFromISR()? That sounds extremly odd and even if that was the case, I would not write code that relies on interdependencies like that as the implementation of any API call may change at any time.

If that is not the case, all your check does (as I am sure you are aware of) is burn cycles in your ISR…

No this is not what I am saying. Because of interrupt nesting support, the interrupts are never truly off. There is complicated code in xQueueReceiveFromISR function that disable interrupts globally, checks for interrupt priorities, then re-enable those above configMAX_SYSCALL_INTERRUPT_PRIORITY at the beginning, then does the opposite operation at the end of the function.

Since the check for data is done inside this “critical section” block, it slows down the check. xQueueIsEmptyFromISR simply makes the check and returns immediately.

well, yes, but the same code path that xQueueReceiveFromISR() is taken regardless of whether you do the empty check before or not, right? So what exactly do you gain by doing the empty check before the receive?..

I gain ~300ns each time the queue is empty. Since this code has to run at the SPI bus speed, even at only 1MHz, this would waste 5% of the time I have to process the interrupt.

And the best part about this is, because the PIC32MZ has cache, it only slows down the interrupt by ~100ns when the queue contains data. But in that case, the timing is no longer critical since this happens in between consecutive transfers, rather than in the middle of a transfer.

I checked the TCB of the calling task. I increased the size of the stack to eliminate overflow as a possible cause.

Stack size = 2190 words = 8760 bytes

pxCurrentTCB->pxStack = 0x80013488
End of stack = 0x80013488 + 8760 = 0x800156C0

When I hit the data breakpoint, the stack pointer register shows 0x80018E08, which is outside this range, it is actually part of the xISRStack.

The actual address where memory gets corrupted is 0x800155B0 (and 0x800155B4).

Is it possible that the ISR stack is overflowing? Can you trying increasing the size of the ISR stack?