Incorrect pxReadyTasksLists state causes fault

I have an M4 MCU (STM32F476) and I’m using FreeRTOS V10.2.1. Somehow I have hit the perfect set of circumstances to cause a usage fault. Unfortunately this is extremely timing sensitive: in almost all cases, adding breakpoints or changing the firmware prevents the issue from happening, so I can only debug post-mortem.

Digging in, the fault occurs when the ldmia instruction executes in xPortPendSVHandler (the moment the core registers are popped during a context switch, after pxCurrentTCB is updated by the vTaskSwitchContext call).

During this fault, the value of pxCurrentTCB is not sensible (0x4, which would definitely cause the usage fault). When I inspected the task lists, I noticed something wrong in the highest-priority list in pxReadyTasksLists. Its uxNumberOfItems is 1 and pxIndex points to the xListEnd item, which is not usual, but the xListEnd item’s pxNext and pxPrevious both point back to xListEnd itself, which means the list should actually be empty. When the list end item is interpreted as a normal item, the resulting pvOwner value is 0x4.
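For readers following along, the failure mode described above can be modelled on a host machine. This is only a rough sketch under stated assumptions: ModelItem, ModelList and the model_* functions are hypothetical stand-ins, not the kernel's list.h types. In the real kernel xListEnd is a MiniListItem_t with no pvOwner field at all, so the "owner" read from it is whatever bytes happen to follow it in memory; the sketch imitates that by letting the caller choose those bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified model of a kernel list; NOT the real list.h definitions. */
typedef struct ModelItem {
    struct ModelItem *next;
    struct ModelItem *prev;
    void *owner;                 /* the TCB pointer in the real kernel */
} ModelItem;

typedef struct {
    uint32_t count;              /* models uxNumberOfItems */
    ModelItem *index;            /* models pxIndex */
    ModelItem end;               /* models xListEnd */
} ModelList;

/* Initialise an empty list: the end marker points at itself.
 * bytes_after_end imitates the memory that follows xListEnd (assumption). */
void model_list_init(ModelList *list, void *bytes_after_end)
{
    list->count = 0;
    list->index = &list->end;
    list->end.next = &list->end;
    list->end.prev = &list->end;
    list->end.owner = bytes_after_end;
}

/* Link an item in just before the current index and bump the count. */
void model_list_insert_end(ModelList *list, ModelItem *item)
{
    ModelItem *idx = list->index;
    item->next = idx;
    item->prev = idx->prev;
    idx->prev->next = item;
    idx->prev = item;
    list->count++;
}

/* Advance the index, skip the end marker once, return the owner found there. */
void *model_next_owner(ModelList *list)
{
    list->index = list->index->next;
    if (list->index == &list->end) {
        list->index = list->index->next;
    }
    return list->index->owner;
}
```

With one item properly linked in, model_next_owner() keeps returning that item's owner. If the list is structurally empty but the count claims one item (the state seen in the crash dump), skipping the end marker lands on the end marker again, and the value returned is the junk passed as bytes_after_end, which would correspond to the 0x4 observed here.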

There’s no evidence of stack overflows that could explain the corrupted list. Overflow checking is enabled, and I can also inspect the stacks post-mortem, which do not show evidence of overflows.

Are there other possible ways for the list to get corrupted? I’m assuming it’s not some random firmware accidentally writing to a very specific spot in the task lists under specific circumstances.

Possibly related, both uxCriticalNesting and uxSchedulerSuspended are 0 at the time of crash, which I think is what would be expected. I also checked my configuration:

  • No interrupts have a higher priority than configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY
  • I did notice some tasks are created with a priority higher than configMAX_PRIORITIES-1, but looking at the kernel code, I don’t think this is a problem: the kernel will limit the priority.

Thanks for the help!

Hi Sebastien,

Can you share your FreeRTOSConfig.h?

Is this a custom or a standard port? In a custom port, typical pitfalls to look out for are faulty implementations of the critical section calls or misconfigured sys tick timer and port service interrupt priorities.

Other than that, experience has shown that in sporadic time sensitive fault conditions, there is little merit in looking too closely at the system list state at fault time as the root problem originated many many cycles before the fault and can literally cause anything to break, depending on the memory layout in effect at the time of the root problem. You may want to go through the usual list of suspects.

On ARM Cortex-M cores, it is also possible to set DWT data breakpoints. You can configure one that hits when pxCurrentTCB is assigned the value 4, which should get you much closer to the root cause.

@jefftenney As a new forum user, I can’t upload the FreeRTOSConfig.h file, so I copied it below. @RAc it’s a standard ARM CM4F port. I’ll look into DWT, that’s a great suggestion.

#ifndef FREERTOS_CONFIG_H
#define FREERTOS_CONFIG_H


#define configUSE_PREEMPTION                    1
#define configUSE_PORT_OPTIMISED_TASK_SELECTION 1
#define configUSE_TICKLESS_IDLE                 0
#define configCPU_CLOCK_HZ                      180000000
#define configTICK_RATE_HZ                      1000
#define configMAX_PRIORITIES                    5
#define configMINIMAL_STACK_SIZE                128
#define configMAX_TASK_NAME_LEN                 16
#define configUSE_16_BIT_TICKS                  0
#define configIDLE_SHOULD_YIELD                 1
#define configUSE_TASK_NOTIFICATIONS            1
#define configUSE_MUTEXES                       1
#define configUSE_RECURSIVE_MUTEXES             1
#define configUSE_COUNTING_SEMAPHORES           0
#define configUSE_ALTERNATIVE_API               0 /* Deprecated! */
#define configQUEUE_REGISTRY_SIZE               10
#define configUSE_QUEUE_SETS                    0
#define configUSE_TIME_SLICING                  1
#define configUSE_NEWLIB_REENTRANT              0
#define configENABLE_BACKWARD_COMPATIBILITY     0
#define configNUM_THREAD_LOCAL_STORAGE_POINTERS 5
#define configSTACK_DEPTH_TYPE                  uint16_t
#define configMESSAGE_BUFFER_LENGTH_TYPE        size_t
#define configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES 0



/* Memory allocation related definitions. */
#define configSUPPORT_STATIC_ALLOCATION         1
#define configSUPPORT_DYNAMIC_ALLOCATION        0
//#define configTOTAL_HEAP_SIZE                   10240
//#define configAPPLICATION_ALLOCATED_HEAP        1



/* Hook function related definitions. */
#define configUSE_IDLE_HOOK                     1
#define configUSE_TICK_HOOK                     1
#define configCHECK_FOR_STACK_OVERFLOW          1
#define configUSE_MALLOC_FAILED_HOOK            0
#define configUSE_DAEMON_TASK_STARTUP_HOOK      0



/* Run time and task stats gathering related definitions. */
#define configGENERATE_RUN_TIME_STATS           1
#define configUSE_TRACE_FACILITY                1
#define configUSE_STATS_FORMATTING_FUNCTIONS    0



/* Co-routine related definitions. */
#define configUSE_CO_ROUTINES                   0
#define configMAX_CO_ROUTINE_PRIORITIES         1



/* Software timer related definitions. */
#define configUSE_TIMERS                        1
#define configTIMER_TASK_PRIORITY               3
#define configTIMER_QUEUE_LENGTH                10
#define configTIMER_TASK_STACK_DEPTH            configMINIMAL_STACK_SIZE



/* Cortex-M specific definitions. */
#ifdef __NVIC_PRIO_BITS
/* __NVIC_PRIO_BITS will be specified when CMSIS is being used. */
#define configPRIO_BITS       		__NVIC_PRIO_BITS
#else
#define configPRIO_BITS       		4        /* 15 priority levels */
#endif

/* The lowest interrupt priority that can be used in a call to a "set priority"
function. */
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY			0xf

/* The highest interrupt priority that can be used by any interrupt service
routine that makes calls to interrupt safe FreeRTOS API functions.  DO NOT CALL
INTERRUPT SAFE FREERTOS API FUNCTIONS FROM ANY INTERRUPT THAT HAS A HIGHER
PRIORITY THAN THIS! (higher priorities are lower numeric values.) */
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY	5

/* Interrupt priorities used by the kernel port layer itself.  These are generic
to all Cortex-M ports, and do not rely on any particular library functions. */
#define configKERNEL_INTERRUPT_PRIORITY 		( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
/* !!!! configMAX_SYSCALL_INTERRUPT_PRIORITY must not be set to zero !!!!
See http://www.FreeRTOS.org/RTOS-Cortex-M3-M4.html. */
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 	( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )




/* Define to trap errors during development. */
//#define configASSERT( ( x ) ) if( ( x ) == 0 ) vAssertCalled( __FILE__, __LINE__ )



/* FreeRTOS MPU specific definitions. */
#define configINCLUDE_APPLICATION_DEFINED_PRIVILEGED_FUNCTIONS 0



/* Optional functions - most linkers will remove unused functions anyway. */
#define INCLUDE_vTaskPrioritySet                1
#define INCLUDE_uxTaskPriorityGet               1
#define INCLUDE_vTaskDelete                     1
#define INCLUDE_vTaskSuspend                    1
#define INCLUDE_xResumeFromISR                  1
#define INCLUDE_vTaskDelayUntil                 1
#define INCLUDE_vTaskDelay                      1
#define INCLUDE_xTaskGetSchedulerState          1
#define INCLUDE_xTaskGetCurrentTaskHandle       1
#define INCLUDE_uxTaskGetStackHighWaterMark     0
#define INCLUDE_xTaskGetIdleTaskHandle          0
#define INCLUDE_eTaskGetState                   0
#define INCLUDE_xEventGroupSetBitFromISR        1
#define INCLUDE_xTimerPendFunctionCall          1
#define INCLUDE_xTaskAbortDelay                 0
#define INCLUDE_xTaskGetHandle                  0
#define INCLUDE_xTaskResumeFromISR              1


#include "stopwatch.h"
#define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS() while(0)
#define portGET_RUN_TIME_COUNTER_VALUE() StopwatchTicks()

#endif /* FREERTOS_CONFIG_H */

stopwatch is simply a peripheral timer used as a high-resolution run-time counter. I also use it to measure elapsed times. __NVIC_PRIO_BITS is 4.

I noticed that configASSERT() is not defined – the definition is commented out. If you use a very basic definition (like the one below), it might help you catch unexpected violations of configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY. That’s a common cause of corruption in kernel data structures. I realize you’ve checked for it already, but sometimes one will slip through – especially if you’re using CubeMX.

/* Normal assert() semantics without relying on the provision of an assert.h
header file. */
#define configASSERT( x ) if ((x) == 0) {taskDISABLE_INTERRUPTS(); for( ;; );}
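To make the priority rule concrete, here is a small host-side sketch of the arithmetic, using the values from the config posted above (4 priority bits, lowest priority 0xf, max syscall priority 5). The function names are hypothetical and exist only for illustration. As far as I know, the check the CM4F port performs once configASSERT() is defined (vPortValidateInterruptPriority) makes a comparable comparison against the hardware priority registers, but treat this as a simplified model, not the port's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the FreeRTOSConfig.h posted in this thread. */
#define PRIO_BITS                              4u
#define LIBRARY_LOWEST_INTERRUPT_PRIORITY      0xFu
#define LIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5u

/* The NVIC keeps the priority in the top PRIO_BITS of an 8-bit field,
 * so a "library" priority is shifted left to get the hardware value. */
uint8_t to_hw_priority(uint8_t library_priority)
{
    return (uint8_t)(library_priority << (8u - PRIO_BITS));
}

/* Hypothetical helper: may an ISR at this priority use the FromISR API?
 * Lower numbers are MORE urgent, so the ISR's value must be numerically
 * greater than or equal to the max syscall priority. */
bool isr_may_call_from_isr_api(uint8_t library_priority)
{
    return to_hw_priority(library_priority) >=
           to_hw_priority(LIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY);
}
```

With this config, library priorities 5 to 15 (hardware values 0x50 to 0xF0) pass and 0 to 4 fail. Note that NVIC priority registers reset to 0, so an interrupt whose priority is never set explicitly sits at the most urgent level and fails the check.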

Bingo! You were right, there was an interrupt priority I had missed, and it is in recently added code (from around when this started happening). If I change only that priority, I don’t have the issue anymore. Thank you very much!
