Solution (?): FreeRTOS+PIC32MZEF-Harmony floating point crash

andrewgt wrote on Friday, February 08, 2019:

Recently I encountered a problem with a math- and interrupt-intensive application running under FreeRTOS: it appears that at least one thread somehow corrupts the floating-point processing after some hours of operation (normally only a few). long double variables, both locals and once-written globals, become corrupted on the fly.
Unfortunately, I have no time to produce a stripped-down, bare-bones application that reproduces the situation for your reference, but the problem definitely exists.
The problem shows up mostly after 4-5 hours of operation (at 200 MHz); however, increasing the baud rate of the external hardware sometimes makes the crash (unfortunately, not a reboot) appear within a few minutes.
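Because some of the corrupted values were once-written globals, a software check can report the moment they change, independently of the debugger. A minimal sketch, assuming the value is written exactly once at start-up: keep a bitwise-inverted shadow copy of its bytes next to it and compare the two periodically (for example from a low-priority task). The names `guarded_ld_t`, `guard_set`, `guard_ok` and `guard_check` are hypothetical helpers, not FreeRTOS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: a once-written long double plus an inverted
   shadow copy of its bytes, used to detect later overwrites. */
typedef struct {
    long double value;
    unsigned char shadow[sizeof(long double)];
} guarded_ld_t;

/* Write the value once and record the bitwise complement of its bytes. */
static void guard_set(guarded_ld_t *g, long double v)
{
    unsigned char raw[sizeof(long double)];
    g->value = v;
    memcpy(raw, &g->value, sizeof raw);
    for (size_t i = 0; i < sizeof raw; i++)
        g->shadow[i] = (unsigned char)~raw[i];
}

/* Return true while the stored bytes still match their shadow copy. */
static bool guard_ok(const guarded_ld_t *g)
{
    unsigned char raw[sizeof(long double)];
    memcpy(raw, &g->value, sizeof raw);
    for (size_t i = 0; i < sizeof raw; i++)
        if ((unsigned char)~raw[i] != g->shadow[i])
            return false;
    return true;
}

/* Call periodically; on the target, configASSERT() could replace assert(). */
static void guard_check(const guarded_ld_t *g)
{
    assert(guard_ok(g) && "once-written value was overwritten");
}
```

Since the shadow is stored inverted, a stray write of the same byte pattern over both the value and its shadow (e.g. zero-filling) is still detected.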

The debugging tools at hand (ICD4, RealIce) hardly helped to locate the error. In particular, the h/w debugger's data access breakpoint for some reason doesn't fire (it fires quite well during the initialisation phase, but not when something writes the wrong value [BTW, it might be that nothing writes the wrong values; maybe the stack pointer itself points the wrong way? Hardly: neighbouring variables were quite normal]). Stopping and inspecting the suspect variable once the problem shows up in the debugging output (for debugging I use an RS-232 port with my own printf-like routines) confirms the wrong values:

x = y = 2.0;
z = x * y;
/* next statement */

was quite normal before the problem appeared.
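One way to catch the onset of this symptom is to have each task periodically redo the same trivial product and flag a mismatch. A minimal sketch; `fpu_self_test` and `task_loop_iteration` are hypothetical names. The operands are `volatile` so the compiler cannot fold `2.0 * 2.0` at compile time, meaning the multiplication really executes with the calling task's current floating-point state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical self-test: recompute the product from the snippet above. */
static bool fpu_self_test(void)
{
    volatile long double x = 2.0L;
    volatile long double y = 2.0L;
    volatile long double z = x * y;
    return z == 4.0L; /* exact: 2 * 2 is representable without rounding */
}

/* Example call site inside a task's main loop. */
static void task_loop_iteration(void)
{
    assert(fpu_self_test()); /* configASSERT() on the target */
    /* ... rest of the task's work ... */
}
```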

I should state that all threads:

  1. Have generous stacks (increasing them and doubling the interrupt stack was my first attempt at a cure).
  2. Call portTASK_USES_FLOATING_POINT(); as their first statement (I am afraid that, in the end, all threads do so, even those not using the FPU).
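Whether the stacks really are generous can be measured with uxTaskGetStackHighWaterMark() (INCLUDE_uxTaskGetStackHighWaterMark set to 1). As I understand it, when this support is enabled the kernel fills each new stack with the byte 0xA5 and later counts how many fill bytes at the far end of the stack are still untouched. The host-side sketch below reproduces that idea for a downward-growing stack, as on PIC32; `stack_fill`, `stack_free_words` and `FILL_BYTE` are hypothetical names, and the `StackType_t` typedef is a stand-in for the port's definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t StackType_t; /* stand-in: 32-bit stack word, as on PIC32 */
#define FILL_BYTE 0xA5u       /* same value as FreeRTOS tskSTACK_FILL_BYTE */

/* Paint the whole stack with the fill pattern before the task runs. */
static void stack_fill(StackType_t *stack, size_t depth_words)
{
    assert(depth_words > 0);
    memset(stack, FILL_BYTE, depth_words * sizeof(StackType_t));
}

/* The stack grows down from stack[depth - 1], so the deepest use so far is
   found by counting untouched fill bytes upward from stack[0].
   Returns the minimum free space seen so far, in words. */
static size_t stack_free_words(const StackType_t *stack, size_t depth_words)
{
    const uint8_t *p = (const uint8_t *)stack;
    size_t total = depth_words * sizeof(StackType_t);
    size_t count = 0;
    while (count < total && p[count] == FILL_BYTE)
        count++;
    return count / sizeof(StackType_t);
}
```

The estimate can be slightly optimistic if the deepest word a task wrote happens to start with 0xA5 bytes, which is an inherent limit of the fill-pattern approach.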

In the meantime I upgraded the entire toolchain to the latest releases (MPLAB-X 5.1, FreeRTOS 10.1.1, XC32 2.15): no help.
Manipulating interrupt priorities (all h/w interrupts (UARTx, SPIx, CANx, ETH) were initialised to priority 1, subpriority 1): no help.

Suspecting a problem with the stack manipulation done by the FreeRTOS scheduler, I came to a solution which at least does no harm to anybody, and apparently cures the issue:

I propose to stop using heap allocation for any thread stack, and instead to declare the stack storage static and aligned to an 8-byte boundary (the ISR stack in the FreeRTOS source already follows this). This is done as follows:

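/* File scope: static storage for the task's TCB and its 8-byte-aligned stack */
StaticTask_t UDPServTCB;
__attribute__ ((aligned(8))) StackType_t UDPServSTK[DISIZE];

/* At task-creation time (DISIZE is the stack depth in words, not bytes) */
TaskHandle_t h = xTaskCreateStatic(UDPServer, "UDPS", DISIZE, NULL, 5, UDPServSTK, &UDPServTCB);
configASSERT(h);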
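Note that xTaskCreateStatic() is only available when configSUPPORT_STATIC_ALLOCATION is set to 1 in FreeRTOSConfig.h, and that setting also requires the application to supply the idle task's memory through vApplicationGetIdleTaskMemory() (plus vApplicationGetTimerTaskMemory() if software timers are used). A sketch in the same style as the code above, giving the idle task an 8-byte-aligned static stack as well. The typedefs and macros in the marked block at the top are host-only stand-ins so the fragment compiles off-target; on the real build they come from FreeRTOS.h and FreeRTOSConfig.h, and the configMINIMAL_STACK_SIZE value is a placeholder.

```c
#include <assert.h>
#include <stdint.h>

/* ---- Host-only stand-ins; on target these come from FreeRTOS headers ---- */
typedef uint32_t StackType_t;
typedef struct { uint8_t opaque[128]; } StaticTask_t; /* size is a placeholder */
#define configMINIMAL_STACK_SIZE 190                  /* placeholder value */
#define configASSERT(x) assert(x)
/* -------------------------------------------------------------------------- */

/* Required when configSUPPORT_STATIC_ALLOCATION == 1:
   hands the kernel the idle task's TCB and stack buffers. */
void vApplicationGetIdleTaskMemory(StaticTask_t **ppxIdleTaskTCBBuffer,
                                   StackType_t **ppxIdleTaskStackBuffer,
                                   uint32_t *pulIdleTaskStackSize)
{
    static StaticTask_t xIdleTaskTCB;
    static __attribute__((aligned(8))) StackType_t uxIdleTaskStack[configMINIMAL_STACK_SIZE];

    *ppxIdleTaskTCBBuffer = &xIdleTaskTCB;
    *ppxIdleTaskStackBuffer = uxIdleTaskStack;
    *pulIdleTaskStackSize = configMINIMAL_STACK_SIZE; /* in words, not bytes */
}
```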

Yours
Andy

rtel wrote on Friday, February 08, 2019:

Thanks for the information.

Checking the PIC32MZ port, I notice that portBYTE_ALIGNMENT is already
set to 8 for that port, and the alignment is checked with an assert
within the kernel code. Could your problem have something to do with
where in memory the stack is being placed? Allocating statically
would move the stack some way away from the FreeRTOS heap, where
perhaps the characteristics of the memory are different, or perhaps
whatever was clobbering the memory is now just missing anything vital.
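For reference, the alignment handling mentioned above works roughly as sketched below. This is my simplified reading of the task-initialisation code in tasks.c for a downward-growing stack, not the kernel source: the last word of the supplied buffer is taken as the top of stack, its address is rounded down to a multiple of portBYTE_ALIGNMENT, and the result is asserted. So a buffer whose end is not 8-byte aligned just loses a word at the top rather than producing a misaligned initial stack pointer. The helper name `aligned_top_of_stack` is hypothetical, and the typedef and macros are stand-ins for the port's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t StackType_t;                       /* stand-in for PIC32 */
#define portBYTE_ALIGNMENT      8u                  /* value used by the PIC32MZ port */
#define portBYTE_ALIGNMENT_MASK (portBYTE_ALIGNMENT - 1u)

/* Simplified: choose the last word of the buffer as the top of stack,
   round its address down to the port alignment, then check it. */
static StackType_t *aligned_top_of_stack(StackType_t *stack, size_t depth_words)
{
    assert(depth_words > 0);
    StackType_t *top = &stack[depth_words - 1];
    top = (StackType_t *)((uintptr_t)top & ~(uintptr_t)portBYTE_ALIGNMENT_MASK);
    assert(((uintptr_t)top & portBYTE_ALIGNMENT_MASK) == 0); /* configASSERT() in the kernel */
    return top;
}
```

For example, with an 8-byte-aligned 32-word buffer the last word sits at byte offset 124, so the top of stack is rounded down to offset 120, i.e. `&buf[30]`.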