LPC4350 Cortex M4 Hard Fault at start of xPortPendSVHandler on "mrs r0, psp" instruction

gregd29 wrote on Thursday, February 15, 2018:

I have been using FreeRTOS (currently v9.0.0) in our LPC4350 application for several years. It runs on the M4 core but not on the M0. I have had three boards running real time clock calibration tests in an environmental chamber for the past several days with no issues. When I checked on the tests this morning, I noticed that the M4 processor on all three boards had locked up. I connected each one to my J-Link and IAR EWARM and attached to the running target to determine what was going on. All three boards were getting continuous hard fault exceptions at the following spot:

mrs r0, psp // <-- Hard Fault Exception occurred here

The code that was running when this occurred was in the middle of a delay function, where xTaskGetTickCount() was repeatedly called while waiting for a timeout and taskYIELD was called to allow other tasks to run. From the call stack, it looks like taskYIELD was triggering xPortPendSVHandler when the hard fault exception occurred. The IAR Fault Exception Viewer indicated that the fault was caused by an invalid instruction.

Looking in the ARMv7-M Architecture Reference Manual, it shows that the “mrs r0, psp” instruction should not cause any exceptions. I could see the temperature in the environmental chamber causing the issue if one board had a problem, but this issue occurred on three different boards. The temperature was left at a fixed 140F overnight during this test. These boards are normally 100% tested from -40 to 185F with no issues, so I am not convinced that temperature caused this issue.

I brought each board into my office and connected it to the J-Link and EWARM. Each board showed that it was stopped at the same point, and my hard fault exception handler showed the same address in the 8 item hard fault info array that I keep. With each board, after single stepping through the code during the analysis, it seemed to recover and take off running normally. I suppose the mrs r0, psp instruction could have been executing from cache, with corruption in the cached copy of that instruction memory location. Maybe the cache was cleared as I ran in the debugger.
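For readers following along, a fault log of the kind described here could look roughly like the sketch below. The actual handler was not posted, so every name in it (fault_log, fault_log_record, FAULT_LOG_LEN) is hypothetical. The only architectural fact it relies on is that a Cortex-M exception stack frame holds r0, r1, r2, r3, r12, lr, pc and xPSR in that order, which puts the faulting address at word index 6.

```c
#include <stdint.h>

/* Hypothetical names; the poster's real handler is not shown in the thread. */
#define FAULT_LOG_LEN    8u   /* matches the 8 item array described above */
#define STACKED_PC_INDEX 6u   /* frame order: r0, r1, r2, r3, r12, lr, pc, xPSR */

static uint32_t fault_log[FAULT_LOG_LEN];
static uint32_t fault_log_next;

/* 'frame' points at the stacked exception frame (MSP or PSP, as selected
 * by bit 2 of the EXC_RETURN value in LR on handler entry). */
void fault_log_record(const uint32_t *frame)
{
    /* Store the stacked PC, wrapping around once the array is full. */
    fault_log[fault_log_next % FAULT_LOG_LEN] = frame[STACKED_PC_INDEX];
    fault_log_next++;
}
```

With a log like this, a handler that returns to the same faulting instruction over and over overwrites every slot with the same address within eight faults.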

Any ideas on what could be going on here?

Greg Dunn

rtel wrote on Friday, February 16, 2018:

It would seem odd to fault on that instruction, as it should work whatever value was in PSP. Are you executing from RAM or Flash? If RAM, can you be sure the instruction has not been corrupted in RAM? That is, if you read the contents of RAM back, does it decode to the expected mrs r0, psp instruction?

Alternatively, could it be an imprecise fault, so the fault actually occurred on an earlier instruction? There is some code at the bottom of this https://www.freertos.org/Debugging-Hard-Faults-On-Cortex-M-Microcontrollers.html page that describes how to catch imprecise faults more easily, but the problem is that it will also change the timing of your application, so it may just prevent the problem occurring in the first place, thwarting the attempt to debug it.
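One way to tell these cases apart is to look at the Configurable Fault Status Register (SCB->CFSR, at address 0xE000ED28) when the fault is caught. The following is a minimal sketch, not the code from the linked page: the bit positions are the ARMv7-M ones, while classify_cfsr and the fault_kind_t enum are hypothetical names introduced only for illustration.

```c
#include <stdint.h>

/* CFSR bit positions as defined by the ARMv7-M architecture. */
#define CFSR_IBUSERR     (1u << 8)   /* bus error on instruction fetch */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */
#define CFSR_UNDEFINSTR  (1u << 16)  /* undefined instruction executed */
#define CFSR_INVSTATE    (1u << 17)  /* invalid execution state (e.g. Thumb bit clear) */

/* Hypothetical classification used only in this sketch. */
typedef enum {
    FAULT_OTHER,
    FAULT_UNDEF_INSTR,
    FAULT_INVALID_STATE,
    FAULT_IMPRECISE_BUS,
    FAULT_PRECISE_BUS,
    FAULT_INSTR_BUS
} fault_kind_t;

/* Return the first matching cause, checked in the order listed below. */
fault_kind_t classify_cfsr(uint32_t cfsr)
{
    if (cfsr & CFSR_UNDEFINSTR)  { return FAULT_UNDEF_INSTR; }
    if (cfsr & CFSR_INVSTATE)    { return FAULT_INVALID_STATE; }
    if (cfsr & CFSR_IMPRECISERR) { return FAULT_IMPRECISE_BUS; }
    if (cfsr & CFSR_PRECISERR)   { return FAULT_PRECISE_BUS; }
    if (cfsr & CFSR_IBUSERR)     { return FAULT_INSTR_BUS; }
    return FAULT_OTHER;
}
```

On the target, the argument would be the value read from SCB->CFSR inside the fault handler. FAULT_IMPRECISE_BUS would mean the stacked PC does not necessarily point at the instruction that caused the fault, whereas FAULT_UNDEF_INSTR would point at what the core actually fetched and decoded.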

gregd29 wrote on Friday, February 16, 2018:

The code is running from external quad SPI flash. I agree that this is a strange place for a fault to occur. That particular instruction is not supposed to cause any exceptions. The error message indicated an invalid instruction, so I would agree that it must be some type of corruption. The IAR IDE showed the proper instruction and machine code in the disassembler. I assumed that it read the code from target memory, but it may actually read from source code. I should have thought to open up a memory window and compare the bytes as you suggested. Unfortunately, I don’t have the board in that state right now. If I can repeat the problem again, I will investigate further. I will review your link related to imprecise faults, thanks. I did step through the fault exception, and it would return to the same instruction and then fault again in a loop. My fault exception handler logs the fault address in a short array. Each element of the array was filled with the offending address when I first looked at it after stopping the debugger. I did notice, however, that after I started looking at different things it would mysteriously recover and continue running normally. That is why I was suspecting something related to caching of the code. Maybe the disassembler window just didn’t show the corrupted cached version.
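The byte comparison mentioned above could also be done in code the next time the fault appears. This is a sketch under stated assumptions: the Thumb-2 encoding of mrs r0, psp is the halfword pair 0xF3EF, 0x8009, each stored little-endian, and first_mismatch is a hypothetical helper name, not part of FreeRTOS or any vendor library.

```c
#include <stddef.h>
#include <stdint.h>

/* "mrs r0, psp" encodes as halfwords 0xF3EF, 0x8009; in memory each
 * halfword is stored little-endian, giving the byte sequence below. */
static const uint8_t MRS_R0_PSP[4] = { 0xEFu, 0xF3u, 0x09u, 0x80u };

/* Hypothetical helper: compare four bytes at 'mem' with the expected
 * encoding. Returns -1 on a match, else the index of the first bad byte. */
int first_mismatch(const uint8_t *mem)
{
    for (size_t i = 0; i < sizeof MRS_R0_PSP; i++) {
        if (mem[i] != MRS_R0_PSP[i]) {
            return (int)i;
        }
    }
    return -1;
}
```

On the target, 'mem' would be the address of the faulting instruction with the Thumb bit (bit 0) cleared. Note that this read is a data access, so it may not see exactly what the instruction fetch saw; a match here would not by itself rule out a corrupted fetch.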