Concurrency hardfaults on Cortex M33 (LPC55S69) with GNU Tools for Arm Embedded Processors 9-2019-q4-major

dnadler · October 22, 2020, 1:48pm

Hi Folks - I’m looking at colleague @victorcanoz’s issue with hardfaults on the latest GCC toolchain (FreeRTOS, GCC, newlib). He reduced it to a very simple stress test that works fine on arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 8-2019-q3-update) 8.3.1 20190703 (release) [gcc-8-branch revision 273027] but crashes promptly on arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 9-2019-q4-major).

I’ve checked all the usual suspects: there is absolutely enough stack space (FreeRTOS tasks and ISR stack), and all configuration settings look sane. I reproduced the problem on an LPC55S28-EVK (I don’t have the later processor here). Project attached. It crashes when built with MCUXpresso version 11.2 (but not with the earlier version 11.1).

I don’t see the problem running the stress test on a Kinetis K64F using the later toolchain and FreeRTOS 10.0.1, so it may be related to the M33 port of FreeRTOS, or to building that port with the latest GCC.

@richard-damon, @rtel - anyone able to take a look?
Looks like a real problem…

Thanks in advance,
Best Regards, Dave

PS: Some additional discussion here:

20201021b_failure_lpcxpresso55s28_freertos_generic.zip (798.9 KB)

rtel · October 22, 2020, 3:50pm

So there are five tasks all of which don’t do much, or even interact with each other…

void hello_task(void *pvParameters)
{
    int cnt = 0;
    for (;;)
    {
        for (int j = 0; j < 50; j++)
        {
            char buf[100] = {0};

            //xSemaphoreTake(semaphore, portMAX_DELAY);
            snprintf(buf, 100, "%d", cnt);
            //xSemaphoreGive(semaphore);
            cnt++;
        }
        vTaskDelay(1);
    }
}

… but they are making a call into NewLib (which I know you know a lot about, but I have to ask), given this quote from the NXP forum:

I confirm that the EXACT same project work flawlessly on McuXpresso V11.1.1, and crashes immediatly when compiled on McuXpresso V12.0.

can you confirm there are no differences in the implementation of snprintf() between the working and non-working projects, and that the function is configured to be thread safe?

–
Next, how does the crash manifest itself? Does it end up in a fault handler? Can you see where the crash occurs?
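One way to answer the "where does the crash occur" question is to look at the register frame the core pushes on exception entry. The following is only a sketch: `fault_frame_t` and `decode_fault_frame` are hypothetical names, not part of FreeRTOS or the NXP SDK, and it assumes the basic eight-word Cortex-M frame (the FPU-extended frame starts with the same eight words).

```c
#include <stdint.h>

/* Registers pushed automatically by the core on exception entry,
 * in ascending address order starting at the stacked SP. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12;
    uint32_t lr;    /* link register of the interrupted code */
    uint32_t pc;    /* return address: at or near the faulting instruction */
    uint32_t xpsr;  /* program status register at the time of the fault */
} fault_frame_t;

/* Hypothetical helper: copy the eight stacked words into a struct so they
 * can be inspected in a debugger or logged. */
fault_frame_t decode_fault_frame(const uint32_t *stacked_sp)
{
    fault_frame_t f;
    f.r0   = stacked_sp[0];
    f.r1   = stacked_sp[1];
    f.r2   = stacked_sp[2];
    f.r3   = stacked_sp[3];
    f.r12  = stacked_sp[4];
    f.lr   = stacked_sp[5];
    f.pc   = stacked_sp[6];
    f.xpsr = stacked_sp[7];
    return f;
}
```

In a real HardFault handler the pointer would be MSP or PSP, selected by bit 2 of the EXC_RETURN value in LR; that part is target-specific assembly and is left out here.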

–
If the compiler version makes a difference, can you play with the command line a bit to see whether any particular option makes a difference? Currently the command line, excluding the include path options, is:

-std=gnu99 -DCPU_LPC55S28JBD100 -DCPU_LPC55S28JBD100_cm33 -DSERIAL_PORT_TYPE_UART=1 -DFSL_RTOS_FREE_RTOS -DSDK_DEBUGCONSOLE=1 -DCR_INTEGER_PRINTF -DPRINTF_FLOAT_ENABLE=0 -D__MCUXPRESSO -D__USE_CMSIS -DDEBUG -D__NEWLIB__ -I"C:\temp\delete\20201021b_failure_lpcxpresso55s28_freertos_generic\lpcxpresso55s28_freertos_generic\board"  -O0 -fno-common -g3 -Wall -mcpu=cortex-m33  -c  -ffunction-sections  -fdata-sections  -ffreestanding  -fno-builtin -fmerge-constants -fmacro-prefix-map="../$(@D)/"=. -mcpu=cortex-m33 -mfpu=fpv5-sp-d16 -mfloat-abi=hard -mthumb -D__NEWLIB__ -fstack-usage

…so maybe try without the -fstack-usage or without -fmerge-constants, etc.

victorcanoz · October 22, 2020, 5:27pm

Hello @rtel, @dnadler

We finally found the culprit behind all the problems we’ve encountered: configENABLE_FPU is set to 0 in the FreeRTOSConfig.h provided in the NXP examples.

Whether we use the HARD FPU or the SOFT FPU, this flag enables FPU thread safety. You must definitely set it to 1 on the Cortex-M33, even if you don’t do any obvious floating-point math.

We think the newest version of MCUXpresso crashed with our example because the snprintf function most probably uses float operations even when we pass only integers to it.

Since we changed this flag, we no longer have the problem with any of the MCUXpresso versions.

To the FreeRTOS developers: if possible, it would be great to add a warning or an assert when this flag is set to 0. This could save a lot of time for other teams hunting a very strange bug.
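For illustration, the fix plus a possible build-time guard could look like the fragment below. This is only a sketch: the `#error` guard is hypothetical and not something FreeRTOS provides. It assumes `__ARM_FP`, the ACLE macro that GCC and Clang predefine when the target build uses hardware floating point.

```c
/* FreeRTOSConfig.h (excerpt) */

/* Per this thread: must be 1 on the Cortex-M33 so the port handles FPU
 * state safely across tasks, even without obvious float math. */
#define configENABLE_FPU 1

/* Hypothetical guard, not part of FreeRTOS: fail the build if the compiler
 * is allowed to use the FPU but the port is not configured for it.
 * __ARM_FP is only defined when hardware floating point is enabled. */
#if defined(__ARM_FP) && (configENABLE_FPU != 1)
#error "Compiler may use FPU registers, but configENABLE_FPU is not 1"
#endif
```

The guard costs nothing at run time and stays silent for builds that really do have the FPU disabled in the compiler options.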

Many thanks,
Victor

rtel · October 22, 2020, 5:44pm

Great you found the solution - and, like most things once you root cause them it makes perfect sense.

Yes, it is likely snprintf() is going to touch the flop registers, and yes that is likely to change between compiler versions.

To your request about a warning if the flag is not set: I’m wondering if we even need that flag in the Cortex-M33 port. Are there any M33 parts that don’t have a floating point unit?

hs2 · October 22, 2020, 7:49pm

It would be a serious compiler bug if it emitted FPU instructions and/or used FPU registers when the FPU is not enabled, i.e. when compiling with the corresponding compiler flag -mcpu=cortex-m33+nofpu (not sure about the DSP extension).
Did you disable FPU usage for compilation?

Edit: I should add that it’s a common compiler optimization to make use of FPU/SIMD registers and instructions even in non-floating-point code unless it’s explicitly disabled.

brendonsNXP · October 23, 2020, 12:28am

Hi Victor,

FYI, I’ve asked our SDK developers to look into this issue. I am hoping we can resolve for our January release.

Regards,
Brendon
NXP Semiconductors Inc.

dnadler · October 23, 2020, 1:47pm

@brendonsNXP - NXP is still including heap4 (rather than heap_useNewlib) in the generic example I pulled for the LPC55S28. It would be great if you can fix this, and also ensure configUSE_NEWLIB_REENTRANT is set to 1.
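As a minimal sketch, the newlib setting requested above is a single line in FreeRTOSConfig.h. With it set, FreeRTOS gives each task its own newlib reentrancy structure; the heap implementation is selected separately, by which heap source file is built into the project.

```c
/* FreeRTOSConfig.h (excerpt) */

/* Allocate a newlib reentrancy structure per task, so that newlib calls
 * made from different tasks do not share library state. */
#define configUSE_NEWLIB_REENTRANT 1
```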
Thanks,
Best Regards, Dave

brendonsNXP · November 11, 2020, 8:26pm

Sorry for the slow reply - thanks for the feedback. Unfortunately we can’t make this heap setting change for our January release, but it’s been added to the list for review for the July release.

brendonsNXP · November 11, 2020, 8:29pm

FYI, the FPU issue has been fixed and will be in our SDK2.9 release in mid January 2021.