ARM Cortex M7 fault exception and stack corruption


I am working on a C++ project using a Microchip SAMV71 (Cortex-M7) and I am seeing stack corruption when I use floating point.
During development I encountered INVSTATE or IACCVIOL exceptions and found that they come from stack corruption: the return address was set to 0xA5A5A5A4 (the default stack fill pattern, 0xA5A5A5A5, with bit 0 cleared).
I identified the task that triggered the exception, and it was doing floating-point calculations. As a test, I implemented one of the math tasks from the demo application, which uses float computation. I ran it as a high-priority task, and it triggers the exception every time (no other tasks are running at that point).
The FPU is enabled at start-up, and the task code runs fine in the main context (either before starting the scheduler or from the tick hook).

Below are the register values on entry to the memory fault exception, the task handle structure values, and the stack contents in memory.

R0      0x00000001
R1      0x20402144
R2      0x10000000
R3      0xA5A5A5A5
R4      0xA5A5A5A5
R5      0xA5A5A5A5
R6      0x00400000
R7      0x00000064
R8      0xA5A5A5A5
R9      0xA5A5A5A5
R10     0xA5A5A5A5
R11     0xA5A5A5A5
R12     0x00400000
SP      0x20404E18
PC      0x00400308
xPSR    0x61000004
MSP     0x20404E18
PSP     0x2040A9E0
Stack memory
2040A860    00000208  FFFFFFFC  A5A5A5A5  A5A5A5A5
2040A870    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A880    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A890    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A8A0    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A8B0    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A8C0    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A8D0    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A8E0    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A8F0    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A900    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A910    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A920    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A930    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A940    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A950    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A960    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A970    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A980    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A990    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040A9A0    A5A5A5A5  A5A5A5A5  A5A5A5A5  00000001
2040A9B0    2040AA4C  00400000  00000064  A5A5A5A5
2040A9C0    A5A5A5A5  A5A5A5A5  A5A5A5A5  004003B9
2040A9D0    00000000  20402144  10000000  E000E000
2040A9E0    00000001  20402144  10000000  A5A5A5A5
2040A9F0    00400000  004039C5  A5A5A5A4  61000000
2040AA00    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040AA10    A5A5A5A5  A5A5A5A5  A5A5A5A5  A5A5A5A5
2040AA20    00000064  00000000  00403899  00000001
2040AA30    2040AA4C  00403A8B  004039F9  20402120
2040AA40    20402134  0040D1E1  A5A5A5A5  00000064
2040AA50    A5A5A5A5  0040D25B  A5A5A5A5  00404511
stack handle = {
    .pxStack = 0x2040A868,
    .pxTopOfStack = 0x2040A9AC,
    .ucStaticallyAllocated =2,
    .uxBasePriority = 7,
    .uxPriority = 7,
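
For anyone following the dump: on ARMv7-M the hardware pushes eight words (R0-R3, R12, LR, PC, xPSR) at exception entry, and those first eight words have the same layout with or without FP state. The words at PSP (0x2040A9E0) can therefore be read as a frame. A minimal host-side sketch follows; the type and function names are made up for illustration and are not part of FreeRTOS or CMSIS.

```c
#include <stdbool.h>
#include <stdint.h>

/* First eight words of an ARMv7-M exception frame, lowest address first.
 * The type name is illustrative only. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, xpsr;
} basic_frame_t;

/* Copy eight stacked words into a frame structure. */
static basic_frame_t decode_frame(const uint32_t words[8])
{
    basic_frame_t f = { words[0], words[1], words[2], words[3],
                        words[4], words[5], words[6], words[7] };
    return f;
}

/* True when an address is the 0xA5 fill pattern, ignoring bit 0
 * (the Thumb bit, which is not kept in a stacked PC). */
static bool is_fill_pattern(uint32_t addr)
{
    return (addr | 1u) == 0xA5A5A5A5u;
}

/* Words copied from the dump rows 0x2040A9E0 and 0x2040A9F0. */
static const uint32_t dump_at_psp[8] = {
    0x00000001u, 0x20402144u, 0x10000000u, 0xA5A5A5A5u,
    0x00400000u, 0x004039C5u, 0xA5A5A5A4u, 0x61000000u
};
```

Decoding dump_at_psp gives a stacked PC of 0xA5A5A5A4, for which is_fill_pattern() returns true: the faulting return address was loaded from a stack slot that still held the fill pattern, not from a slot overwritten with real data.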

I recently moved to static task creation, but the behaviour did not change much. I am using the newlib implementation and switched to heap_3 (instead of heap_4) so that the whole system uses the same heap.

I checked for stack overflow, but it doesn’t seem to be that. I filled the heap with a DEADBEEF pattern and checked the stack memory before and after task creation, and after the exception. I saw that PSP and pxTopOfStack are not up to date, but I assumed that comes from branching to the ISR.
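
The manual check described above can be sketched as a scan for the fill pattern, which is roughly what uxTaskGetStackHighWaterMark() reports (it is disabled in the configuration listed in this post). This is a host-side illustration only; unused_stack_words and STACK_FILL_WORD are hypothetical names, and the scan assumes a downward-growing stack filled with the 0xA5 byte.

```c
#include <stddef.h>
#include <stdint.h>

/* FreeRTOS fills task stacks with the byte 0xA5 at creation. */
#define STACK_FILL_WORD 0xA5A5A5A5u

/* Count untouched words starting from the lowest stack address.
 * The stack grows downward on Cortex-M, so the unused part sits at
 * the low end (pxStack in the TCB). */
static size_t unused_stack_words(const uint32_t *stack_low, size_t depth_words)
{
    size_t n = 0;
    while (n < depth_words && stack_low[n] == STACK_FILL_WORD) {
        n++;
    }
    return n;
}
```

A result of zero means the lowest word was written, which is the classic sign of an overflow. A large result only shows that the low end was never written; it says nothing about corruption caused elsewhere, for example by a bad context save or restore.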

I followed the ARM FreeRTOS guidelines and tried the solutions from similar issues on this forum.
I am trying to reproduce my bug on the evaluation board with the demo application so that I can share more with you. By the way, I took the demo code from a recent GitHub commit, and the configured minimal stack size is not enough for the “Notified” task (the stack overflow hook is triggered).

Here is my FreeRTOS configuration:

#ifdef __NVIC_PRIO_BITS
#define configPRIO_BITS     __NVIC_PRIO_BITS
#else
#define configPRIO_BITS     3   // 8 priority levels (0 to 7).
#endif  // __NVIC_PRIO_BITS

#define configMAX_PRIORITIES                    8

#define configUSE_TICKLESS_IDLE                 0
#define configCPU_CLOCK_HZ                      300000000
#define configSYSTICK_CLOCK_HZ                  150000000
#define configTICK_RATE_HZ                      1000

#define configUSE_PREEMPTION                    1
#define configSTACK_DEPTH_TYPE                  uint32_t
#define configMINIMAL_STACK_SIZE                128
#define configMAX_TASK_NAME_LEN                 32
#define configUSE_16_BIT_TICKS                  0
#define configIDLE_SHOULD_YIELD                 1
#define configUSE_TASK_NOTIFICATIONS            1
#define configUSE_MUTEXES                       1
#define configUSE_RECURSIVE_MUTEXES             1
#define configUSE_COUNTING_SEMAPHORES           1
#define configUSE_ALTERNATIVE_API               0 
#define configQUEUE_REGISTRY_SIZE               10
#define configUSE_QUEUE_SETS                    1
#define configUSE_TIME_SLICING                  1
#define configUSE_NEWLIB_REENTRANT              1
#define configUSE_MINI_LIST_ITEM                1
#define configMESSAGE_BUFFER_LENGTH_TYPE        uint16_t

#define configSUPPORT_STATIC_ALLOCATION         1
#define configSUPPORT_DYNAMIC_ALLOCATION        1
#define configTOTAL_HEAP_SIZE                   32768
#define configHEAP_CLEAR_MEMORY_ON_FREE         1
#define configAPPLICATION_ALLOCATED_HEAP        1

#define configUSE_IDLE_HOOK                     1
#define configUSE_TICK_HOOK                     1
#define configCHECK_FOR_STACK_OVERFLOW          2
#define configUSE_MALLOC_FAILED_HOOK            1
#define configUSE_DAEMON_TASK_STARTUP_HOOK      0
#define configUSE_SB_COMPLETED_CALLBACK         0

#define configGENERATE_RUN_TIME_STATS           0
#define configUSE_TRACE_FACILITY                1

#define configUSE_CO_ROUTINES                   0
#define configMAX_CO_ROUTINE_PRIORITIES         1

#define configUSE_TIMERS                        1
#define configTIMER_TASK_PRIORITY               6
#define configTIMER_QUEUE_LENGTH                5
#define configTIMER_TASK_STACK_DEPTH            (2 * configMINIMAL_STACK_SIZE)

#define configKERNEL_INTERRUPT_PRIORITY         (\
    7 << (8 - configPRIO_BITS) \
)
#define configMAX_SYSCALL_INTERRUPT_PRIORITY    (\
    4 << (8 - configPRIO_BITS) \
)

#define configASSERT(x) if ((x) == 0) { taskDISABLE_INTERRUPTS(); for(;;); }

#define configTOTAL_MPU_REGIONS                                8
#define configTEX_S_C_B_FLASH                                  0x07ul
#define configTEX_S_C_B_SRAM                                   0x07ul
#define configENFORCE_SYSTEM_CALLS_FROM_KERNEL_ONLY            0 
#define configENABLE_ERRATA_837070_WORKAROUND                  0

#define secureconfigMAX_SECURE_CONTEXTS         1
#define INCLUDE_eTaskGetState                   1
#define INCLUDE_vTaskDelay                      1
#define INCLUDE_vTaskDelete                     1
#define INCLUDE_vTaskPrioritySet                1
#define INCLUDE_vTaskSuspend                    1
#define INCLUDE_uxTaskGetStackHighWaterMark     0
#define INCLUDE_uxTaskGetStackHighWaterMark2    0
#define INCLUDE_uxTaskPriorityGet               1
#define INCLUDE_xTaskAbortDelay                 1
#define INCLUDE_xTaskDelayUntil                 1
#define INCLUDE_xTaskGetHandle                  1
#define INCLUDE_xTaskGetCurrentTaskHandle       1
#define INCLUDE_xTaskGetIdleTaskHandle          1
#define INCLUDE_xTaskGetSchedulerState          1
#define INCLUDE_xTaskResumeFromISR              1
#define INCLUDE_xResumeFromISR                  1
#define INCLUDE_xEventGroupSetBitFromISR        1
#define INCLUDE_xQueueGetMutexHolder            1
#define INCLUDE_xSemaphoreGetMutexHolder        1
#define INCLUDE_xTimerPendFunctionCall          1
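
As a side note on the two shift expressions in the listing above, here is a small sketch of the values they produce with 3 priority bits. PRIO_BITS and hw_priority are hypothetical names used only for this illustration.

```c
#include <stdint.h>

#define PRIO_BITS 3u  /* stands in for configPRIO_BITS from the listing */

/* Shift a logical priority (0 = most urgent) into the top bits of the
 * 8-bit priority field, as the expressions in the listing do. */
static uint8_t hw_priority(uint32_t logical)
{
    return (uint8_t)(logical << (8u - PRIO_BITS));
}
```

hw_priority(7) is 0xE0, the lowest urgency, used for the kernel interrupts. hw_priority(4) is 0x80; in a FreeRTOS Cortex-M configuration that value normally belongs to configMAX_SYSCALL_INTERRUPT_PRIORITY, and it matches the `mov.w r3, #128` written to BASEPRI in the vPortExitCritical disassembly later in this thread. Interrupts that call FromISR API functions must use a numeric priority of 0x80 or higher.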

And the task code:

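// Reference result, computed once before entering the task loop.
float d1_ = 123.4567;
float d2_ = 2345.6789;
float d3_ = -918.222;
const float expected = (d1_ + d2_) * d3_;
while (1)
{
    // Repeat the same computation on every iteration.
    float d1 = 123.4567;
    float d2 = 2345.6789;
    float d3 = -918.222;
    float d4 = (d1 + d2) * d3;
    if (fabs(d4 - expected) > 0.001)
    {
        // Set LED
    }
    else
    {
        // Clear LED
    }
}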

Did you try increasing the stack size of the task? Can you share code where you create your task?

Which demo application and evaluation board are you using?

Yes, I already tried increasing the minimal stack size, but I don’t see any improvement. Also, the stack memory is still filled with 0xA5, so it does not seem to be an overflow.

I used the “CORTEX_M7_SAMV71_Xplained_AtmelStudio” demo on an ATSAMV71-XULT board. For the demo test I am on the main branch (8f3233e0), but my own application is based on kernel 10.5.1 with the CMake modification for target configuration (215a5418).

Can you follow these instructions to find the faulting instruction: Debugging and diagnosing hard faults on ARM Cortex-M CPUs?

I sanitized my project. Is there a way to share some code less publicly?

I got the stack context on exception:

r0  0x00000001
r1  0x204005cc
r2  0x10000000
r3  0x00000000
r12 0x00400000
lr  0x00401f09
ret 0x00000000
psr 0x60000000
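
One detail worth pointing out in this frame is the stacked psr. Cortex-M cores only execute Thumb code, so bit 24 (the T bit) must be set; a BX or BLX to an address with bit 0 clear, such as 0, clears it, and the next instruction raises INVSTATE. A minimal check, with XPSR_T_BIT and thumb_bit_set as illustrative names:

```c
#include <stdbool.h>
#include <stdint.h>

#define XPSR_T_BIT (1u << 24)  /* Thumb state bit in the stacked xPSR */

/* Returns true when the stacked xPSR still has the Thumb bit set.
 * A clear bit means the core tried to leave Thumb state. */
static bool thumb_bit_set(uint32_t stacked_xpsr)
{
    return (stacked_xpsr & XPSR_T_BIT) != 0u;
}
```

With the values from this thread, thumb_bit_set(0x60000000) is false for the frame above, while the xPSR from the first dump (0x61000000) still has the bit set. That is consistent with a branch to address 0 with bit 0 clear.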

The LR register points into scheduler code, in xTaskResumeAll, where it exits the critical section.

  401eda:       f3bf 8f4f       dsb     sy
  401ede:       f3bf 8f6f       isb     sy
  401ee2:       2401            movs    r4, #1
  401ee4:       e00e            b.n     401f04 <xTaskResumeAll+0x128>
  401ee6:       3c01            subs    r4, #1
  401ee8:       d007            beq.n   401efa <xTaskResumeAll+0x11e>
  401eea:       4b12            ldr     r3, [pc, #72]   @ (401f34 <xTaskResumeAll+0x158>)
  401eec:       4798            blx     r3
  401eee:       2800            cmp     r0, #0
  401ef0:       d0f9            beq.n   401ee6 <xTaskResumeAll+0x10a>
  401ef2:       4b0d            ldr     r3, [pc, #52]   @ (401f28 <xTaskResumeAll+0x14c>)
  401ef4:       2201            movs    r2, #1
  401ef6:       601a            str     r2, [r3, #0]
  401ef8:       e7f5            b.n     401ee6 <xTaskResumeAll+0x10a>
  401efa:       4b0d            ldr     r3, [pc, #52]   @ (401f30 <xTaskResumeAll+0x154>)
  401efc:       2200            movs    r2, #0
  401efe:       601a            str     r2, [r3, #0]
  401f00:       e7e2            b.n     401ec8 <xTaskResumeAll+0xec>
  401f02:       2400            movs    r4, #0
  401f04:       4b0c            ldr     r3, [pc, #48]   @ (401f38 <xTaskResumeAll+0x15c>)
  401f06:       4798            blx     r3              @ <== call to vPortExitCritical
  401f08:       4620            mov     r0, r4
  401f0a:       bd38            pop     {r3, r4, r5, pc}
  401f0c:       20400c98        .word   0x20400c98  (uxSchedulerSuspended)
  401f10:       00402b4d        .word   0x00402b4d  (vPortEnterCritical)
  401f14:       20400cc0        .word   0x20400cc0  (uxCurrentNumberOfTasks)
  401f18:       20400cf0        .word   0x20400cf0  (xPendingReadyList)
  401f1c:       20400cb8        .word   0x20400cb8  (uxTopReadyPriority)
  401f20:       20400d34        .word   0x20400d34  (pxReadyTasksLists)
  401f24:       20400dd4        .word   0x20400dd4  (pxCurrentTCB)
  401f28:       20400cac        .word   0x20400cac  (xYieldPending)
  401f2c:       004014a1        .word   0x004014a1  (prvResetNextTaskUnblockTime)
  401f30:       20400cb0        .word   0x20400cb0  (xPendedTicks)
  401f34:       00401c4d        .word   0x00401c4d  (xTaskIncrementTick)
  401f38:       00402b99        .word   0x00402b99  (vPortExitCritical)

The R3 register is NULL, so the jump failed. Is the register not being saved or restored correctly?

This tells us that the PC somehow became NULL while the code was in vPortExitCritical (because LR has been updated). Please share the assembly of vPortExitCritical.

Also, can you analyze the fault registers to find out which fault this is?

Exception: UsageFault INVSTATE
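
For reference, a fault name like the one above is usually read from the Configurable Fault Status Register (CFSR, at 0xE000ED28 on ARMv7-M). The sketch below decodes a few of its bits on the host; the enum and function names are invented for illustration, and on the target the value would be read from the register, for example through SCB->CFSR in CMSIS.

```c
#include <stdint.h>

/* Selected CFSR bits (MemManage status in bits 7:0, UsageFault status
 * in bits 31:16). */
#define CFSR_IACCVIOL   (1u << 0)   /* instruction access violation */
#define CFSR_UNDEFINSTR (1u << 16)  /* undefined instruction */
#define CFSR_INVSTATE   (1u << 17)  /* invalid state, e.g. T bit clear */
#define CFSR_INVPC      (1u << 18)  /* invalid PC load on exception return */
#define CFSR_NOCP       (1u << 19)  /* coprocessor (FPU) access denied */

typedef enum {
    FAULT_NOT_DECODED,
    FAULT_IACCVIOL,
    FAULT_UNDEFINSTR,
    FAULT_INVSTATE,
    FAULT_INVPC,
    FAULT_NOCP
} fault_kind_t;

/* Return the first matching fault kind; several bits can be set at once,
 * so real code may want to report all of them. */
static fault_kind_t decode_cfsr(uint32_t cfsr)
{
    if (cfsr & CFSR_INVSTATE)   return FAULT_INVSTATE;
    if (cfsr & CFSR_IACCVIOL)   return FAULT_IACCVIOL;
    if (cfsr & CFSR_INVPC)      return FAULT_INVPC;
    if (cfsr & CFSR_UNDEFINSTR) return FAULT_UNDEFINSTR;
    if (cfsr & CFSR_NOCP)       return FAULT_NOCP;
    return FAULT_NOT_DECODED;
}
```

The two faults named in this thread, INVSTATE and IACCVIOL, correspond to bit 17 and bit 0. The actual CFSR value was not posted, so the inputs to use with this function are left to the reader.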

00402b98 <vPortExitCritical>:
  402b98:       4b0a            ldr     r3, [pc, #40]   @ (402bc4 <vPortExitCritical+0x2c>)
  402b9a:       681b            ldr     r3, [r3, #0]
  402b9c:       b953            cbnz    r3, 402bb4 <vPortExitCritical+0x1c>
  402b9e:       f04f 0380       mov.w   r3, #128        @ 0x80
  402ba2:       b672            cpsid   i
  402ba4:       f383 8811       msr     BASEPRI, r3
  402ba8:       f3bf 8f6f       isb     sy
  402bac:       f3bf 8f4f       dsb     sy
  402bb0:       b662            cpsie   i
  402bb2:       e7fe            b.n     402bb2 <vPortExitCritical+0x1a>
  402bb4:       3b01            subs    r3, #1
  402bb6:       4a03            ldr     r2, [pc, #12]   @ (402bc4 <vPortExitCritical+0x2c>)
  402bb8:       6013            str     r3, [r2, #0]
  402bba:       b90b            cbnz    r3, 402bc0 <vPortExitCritical+0x28>
  402bbc:       f383 8811       msr     BASEPRI, r3
  402bc0:       4770            bx      lr
  402bc2:       bf00            nop
  402bc4:       204000f0        .word   0x204000f0  (uxCriticalNesting)

Nothing looks wrong here. Please share your project.

I don’t have the right to share attachments or links here (new user). How can I share it with you?

You should be able to share now.

Thanks, here is My Project

This probably has nothing to do with that problem you are having; however, we use the SAME70 and SAMV71 with FreeRTOS too and there are a few things worth noting:

  • The Arm Cortex M7 core revision used by these chips is r1p1 (assuming you are using the B versions of these chips) which suffers from erratum 1259864. This means that you can’t safely use write-through caching. It’s difficult to use DMA with write-back caching, so we work around this by allocating all DMA buffers in an area of RAM that we set to non-cacheable in the MPU. A minor complication is that processor accesses to non-cacheable RAM must be aligned. We had to replace a few functions from newlib (in particular memcpy) to ensure aligned access.
  • The examples from Microchip that use the Atmel Software Framework (ASF) won’t work reliably if you enable data memory caching, because the ASF doesn’t take account of the interaction between cache and DMA. This affects at least the HSMCI, CAN, USB and Ethernet drivers and associated code that declares buffers used by these drivers. I haven’t checked whether the Harmony code generators have the same issue.

I note that the difference between .pxStack and .pxTopOfStack in the data you supplied is quite small, so I suspect insufficient stack may be part of the problem. Bear in mind that saving the state when the FPU is in use uses an extra 132 bytes of stack.

HTH David

Thanks David, I saw the erratum a few days ago.

I tried disabling the cache and also setting the MPU regions to “non-cacheable”, but the issue is the same.
I tried increasing the stack again, but nothing changed. As you said, the FPU context takes space, but since the stack memory is still filled with 0xA5A5A5A5, an overflow doesn’t seem to have happened.

I am unable to download that. Can you put it on GitHub and share a link?

Re the stack still being filled with the 0xA5 pattern, bear in mind that the CPU uses lazy stacking of FPU registers. This means that 132 bytes of stack will be reserved for the FPU registers, but unless the ISR uses the FPU then the registers will not be saved and the corresponding stack space will still contain the 0xA5 pattern.
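
To make the frame sizes concrete: whether the hardware reserved room for FP state can be read from bit 4 of the EXC_RETURN value held in LR inside the handler (0 means an extended frame). The sketch below is a host-side illustration under that assumption; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 4: 1 = basic frame, 0 = extended frame with FP state. */
#define EXC_RETURN_BASIC_FRAME (1u << 4)

#define BASIC_FRAME_WORDS    8u   /* R0-R3, R12, LR, PC, xPSR */
#define EXTENDED_FRAME_WORDS 26u  /* basic + S0-S15, FPSCR, one reserved word */

/* True when the exception frame includes space for FP registers. */
static bool frame_has_fp_state(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME) == 0u;
}

/* Bytes pushed (or reserved, with lazy stacking) by the hardware. */
static uint32_t hw_frame_bytes(uint32_t exc_return)
{
    uint32_t words = frame_has_fp_state(exc_return)
                         ? EXTENDED_FRAME_WORDS
                         : BASIC_FRAME_WORDS;
    return words * 4u;
}
```

For a task without FP context, EXC_RETURN is typically 0xFFFFFFFD (32 bytes); with FP context it is 0xFFFFFFED (104 bytes, so 72 extra). As far as I know, the FreeRTOS Cortex-M7 port additionally saves S16-S31 (64 bytes) in software when that bit is clear, which lands in the same range as the 132 bytes quoted above.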

With Gaurav’s help, we found the issue. It came from the way I called the FreeRTOS handlers (PendSV, SysTick, SVCall).

void SVCall_Handler(void)

void PendSV_Handler(void)

void SysTick_Handler(void)

The functions were not naked, so the compiler added a prologue before branching to the RTOS handler.
Adding the naked attribute alone was not enough to resolve the issue, so the classic method of using #define inside the FreeRTOS configuration was required.
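
For readers who have not seen the #define method: the idea is to rename the port's handlers to the vector-table names in FreeRTOSConfig.h, so that no wrapper function exists at all. The sketch below uses the handler names from this thread; the function body is only a host-side stand-in for the real port code, and pendsv_ran is a hypothetical flag used to show the effect.

```c
#include <stdbool.h>

/* In FreeRTOSConfig.h: map the port's handler names onto the names
 * used in the vector table. */
#define vPortSVCHandler     SVCall_Handler
#define xPortPendSVHandler  PendSV_Handler
#define xPortSysTickHandler SysTick_Handler

/* Stand-in for the port's definition. After preprocessing, this
 * function is literally named PendSV_Handler, so the vector table
 * reaches it directly, with no wrapper and no extra prologue. */
static bool pendsv_ran = false;

void xPortPendSVHandler(void)
{
    pendsv_ran = true;
}
```

Calling PendSV_Handler() in this sketch runs the body written under the name xPortPendSVHandler, which is the point of the method: LR still holds the EXC_RETURN value when the port code starts, because nothing ran in between.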

But due to project constraints I found an alternative solution: an assembly instruction that branches directly to the handler, and it works. Here is the working code:

__attribute__((naked, section(".startup")))
void SVCall_Handler(void)
{
    asm volatile("b vPortSVCHandler");
}

__attribute__((naked, section(".startup")))
void PendSV_Handler(void)
{
    asm volatile("b xPortPendSVHandler");
}

__attribute__((naked, section(".startup")))
void SysTick_Handler(void)
{
    asm volatile("b xPortSysTickHandler");
}