EFR32 - TCB is corrupted after Context Switch

martin1 wrote on Friday, June 07, 2019:

Hello everyone,

I run FreeRTOS v10.2.0 on a Silicon Labs EFR32MG12 MCU and I use the port GCC / Cortex-M3.

My application reads measurement data from an IMU (inertial measurement unit) and outputs the data via UART. I have two tasks, an IMU task and a UART task, which run asynchronously. The two tasks communicate via a queue.

First the IMU task becomes active and then the UART task. After the context switch, the TCB of the UART task appears to be corrupted (I check the pxCurrentTCB in tasks.c). The task name is partially overwritten by 0xA5’s and the TCB element pxTopOfStack points to 0xA5A5A5A5 which is outside of the RAM area. I mention the value 0xA5 because it is also used to initialize the stack. The TCBs of both tasks are fine after creation, i.e. they have correct task names with the trailing characters being \0’ed out and the stack pointers are correct.
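The "points outside of the RAM area" check described above can be sketched as a small helper. This is only an illustration: the RAM base and size below are assumed values, and the helper names are hypothetical, not part of FreeRTOS. Take the real origin and length from your own linker script.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; use the RAM origin and length
   from your own linker script. */
#define APP_RAM_BASE 0x20000000u
#define APP_RAM_SIZE 0x00040000u /* 256 KiB */

/* Returns true if addr lies inside [base, base + size). */
static bool address_in_region(uint32_t addr, uint32_t base, uint32_t size)
{
    assert(size > 0u);
    return (addr >= base) && ((addr - base) < size);
}

/* Hypothetical helper: a saved stack pointer is only plausible
   if it points into RAM. */
static bool saved_stack_pointer_is_plausible(uint32_t top_of_stack)
{
    return address_in_region(top_of_stack, APP_RAM_BASE, APP_RAM_SIZE);
}
```

With these assumed limits, a value such as 0xA5A5A5A5 is rejected, which matches the symptom: the saved stack pointer has been overwritten with the stack fill pattern.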

Things I have tried so far:

  • Create only one of the tasks to rule out any errors of the IMU driver or the UART driver. Result: everything works fine, no TCB corruption.

  • Create a simple multi-tasking demo for my hardware to rule out any configuration error. Result: everything works fine, no TCB corruption.

Once I integrate both tasks on my target, I get the behavior described above. Any ideas what might cause the TCB corruption? I do not use any FreeRTOS calls inside of an ISR.

Best regards,
Martin

rtel wrote on Friday, June 07, 2019:

First off - have a look through https://www.freertos.org/FAQHelp.html
paying particular attention to:

  1. Having configASSERT() defined as…
  2. The most common cause of corruption is invalid interrupt priorities,
    a lot of which will be caught by asserts.
  3. Having stack overflow detection turned on.

If you still have issues after reading through the above then post again.

martin1 wrote on Tuesday, June 11, 2019:

Thank you for your reply.

I have read and checked the points above. My configuration is based on the FreeRTOS demo CORTEX_EFM32_Giant_Gecko_Simplicity_Studio. I have only changed the CPU frequency and enabled the function uxTaskGetStackHighWaterMark(). Otherwise, my configuration is identical to the demo configuration.

  1. configASSERT() is defined as follows:
#define configASSERT( x )	if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
  2. The configuration of the interrupt priorities is defined as follows:
#define configKERNEL_INTERRUPT_PRIORITY		 ( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY			0x07
#define configPRIO_BITS		       __NVIC_PRIO_BITS // = 0x03

From what I understand, this should only be relevant if calls to FreeRTOS functions are made from an ISR. However, this is not the case. I do not call any FreeRTOS function from an ISR.
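As a worked example of the macros quoted above: Cortex-M devices implement only the most significant bits of the 8-bit priority field, which is why the library-level priority is shifted left by (8 - configPRIO_BITS). The sketch below reproduces that arithmetic on the host; the function name is hypothetical and the values are the ones given in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Converts a library-level priority (0 = highest) into the value that is
   written to the 8-bit priority register, given the number of priority
   bits the device implements. */
static uint8_t priority_register_value(uint8_t library_priority,
                                       uint8_t prio_bits)
{
    assert(prio_bits >= 1u && prio_bits <= 8u);
    return (uint8_t)(library_priority << (8u - prio_bits));
}
```

For the configuration shown here, priority_register_value(0x07, 3) evaluates to 0xE0, i.e. the lowest priority expressible with 3 implemented bits.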

  3. I have stack overflow detection enabled but it does not work properly due to the TCB corruption. The TCB is already corrupted after the first context switch and at that time the pointer pxTCB->pxStack is zero (in file tasks.c, function uxTaskGetStackHighWaterMark(), line 3807). The TCB is OK after task creation.
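To illustrate why a zeroed pxStack breaks the high water mark check: the stack is filled with 0xA5 at task creation, and the unused space can be estimated by counting how many fill bytes are still untouched, starting at the low end of the stack. The following is a simplified host-side sketch of that idea, not the kernel's actual code; the function and macro names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u /* fill pattern mentioned in this thread */

/* Counts the fill bytes that are still intact at the low end of a
   descending stack. stack_low_end plays the role of pxStack here. */
static size_t count_untouched_stack_bytes(const uint8_t *stack_low_end,
                                          size_t stack_size)
{
    size_t n = 0;

    assert(stack_low_end != NULL);
    while ((n < stack_size) && (stack_low_end[n] == STACK_FILL_BYTE)) {
        n++;
    }
    return n;
}
```

If the pointer to the start of the stack has been overwritten with zero, a scan like this no longer starts inside the task's stack, so its result says nothing about real stack usage.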

As stated above, the same configuration works on other projects without any problems. Assuming the interrupt configuration is correct, are there any other pitfalls that can cause this behavior? Or ideas how to find the root cause of this problem?

martin1 wrote on Thursday, June 13, 2019:

I have (likely) resolved the issue, and it has to do with floating-point operations. Here is a similar post which put me on the right track:

https://www.freertos.org/FreeRTOS_Support_Forum_Archive/April_2017/freertos_Float_and_double_cause_hardfault_handler_on_STM32F417_6f77a241j.html

The third-party driver library that I use for the IMU sensor uses floating-point operations. My MCU (= ARM Cortex-M4) does not have an FPU, which is why I use the ARM Cortex-M3 port. This should be fine. However, I noticed that the option “Enable Hardware Floating Point (-mfpu=)” was active in my IDE, and this adds the compiler flag “-mfloat-abi=softfp”. I think this option causes the OS to store the FPU registers, which are non-existent, and so my program ends up in a fault handler. This explains why the error always surfaced during a context switch and also why it did not show up in a bare-metal program (i.e. a non-RTOS program). Sorry for the confusion.
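One possible way to catch this kind of mismatch at build time is a preprocessor guard. The sketch below assumes that the compiler follows the ARM C Language Extensions and defines __ARM_FP when it is allowed to generate hardware floating-point instructions (as far as I know, GCC does this for -mfloat-abi=softfp with an -mfpu= option). APP_USES_CM3_PORT is a hypothetical macro that the project itself would have to define; it is not part of FreeRTOS.

```c
#include <stdbool.h>

/* APP_USES_CM3_PORT is a hypothetical project-level macro, e.g. passed as
   -DAPP_USES_CM3_PORT when the GCC Cortex-M3 port is linked. */
#if defined(APP_USES_CM3_PORT) && defined(__ARM_FP)
#error "Hardware floating point is enabled, but the Cortex-M3 port is in use."
#endif

/* Reports whether this translation unit was compiled with hardware
   floating-point code generation enabled (assumption: __ARM_FP is
   defined in that case). */
static bool compiler_targets_hardware_fp(void)
{
#if defined(__ARM_FP)
    return true;
#else
    return false;
#endif
}
```

With the guard in place, a build that combines the Cortex-M3 port with hardware floating-point code generation stops with an error instead of failing at run time.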

rtel wrote on Friday, June 14, 2019:

Appreciate you taking the time to report back.