Intermittent CM4 FAULT! Bus Fault! in floating point app

OK, here is this morning’s fault:

CM4 FAULT!!

SCB->CFSR = 0x00008200
Bus Fault!
Fault address = 0x30303032
r0 = 0x00000000
r1 = 0xffffffff
r2 = 0x08045a18
r3 = 0x7fd7fbeb
r12 = 0x100aa23f
lr = 0x100aaf30
pc = 0x81010000
psr = 0x3e025e8c

(This does seem to be happening with some regularity on overnight runs. Maybe I need to print out the time.)
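As a side note on reading the status word above, here is a minimal sketch of decoding the bus-fault bits of SCB->CFSR. It assumes the standard ARMv7-M layout, where the BusFault status byte occupies bits 8 to 15. The struct and function names are hypothetical and not part of the PDL or CMSIS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bus-fault status bits inside SCB->CFSR (standard ARMv7-M layout). */
#define CFSR_IBUSERR     (1u << 8)   /* instruction bus error */
#define CFSR_PRECISERR   (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10)  /* imprecise data bus error */
#define CFSR_UNSTKERR    (1u << 11)  /* fault while unstacking on exception return */
#define CFSR_STKERR      (1u << 12)  /* fault while stacking on exception entry */
#define CFSR_LSPERR      (1u << 13)  /* fault during lazy FP state preservation */
#define CFSR_BFARVALID   (1u << 15)  /* SCB->BFAR holds the faulting address */

/* Hypothetical helper type: one flag per bus-fault bit. */
typedef struct {
    bool instruction;
    bool precise;
    bool imprecise;
    bool unstacking;
    bool stacking;
    bool lazy_fp;
    bool bfar_valid;
} bus_fault_flags_t;

/* Split a raw CFSR value into its individual bus-fault flags. */
static void decode_bus_fault(uint32_t cfsr, bus_fault_flags_t *out)
{
    assert(out != NULL);
    out->instruction = (cfsr & CFSR_IBUSERR) != 0u;
    out->precise     = (cfsr & CFSR_PRECISERR) != 0u;
    out->imprecise   = (cfsr & CFSR_IMPRECISERR) != 0u;
    out->unstacking  = (cfsr & CFSR_UNSTKERR) != 0u;
    out->stacking    = (cfsr & CFSR_STKERR) != 0u;
    out->lazy_fp     = (cfsr & CFSR_LSPERR) != 0u;
    out->bfar_valid  = (cfsr & CFSR_BFARVALID) != 0u;
}
```

Fed the 0x00008200 from the report, this sets only `precise` and `bfar_valid`: a precise data bus error, with BFAR flagged as valid. Whether the value printed as "Fault address" was read from BFAR depends on the handler code, which is not shown here.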

The call stack, such as it is according to GDB:

0 Cy_SysLib_ProcessingFault() main_cm4.c 401 0x10088B5A (All) 
1 Cy_SysLib_FaultHandler(const uint32_t * faultStackAddr = <optimized out>) Generated_Source\PSoC6\pdl\drivers/peripheral/syslib/cy_syslib.c 444 0x10082DB8 (All) 
2 UsageFault_Handler() gcc/startup_psoc6_01_cm4.S 455 0x1008034E (All) 
3 <signal handler called>() ?????? ?????? 0xFFFFFFED (All) 
4 prvPortStartFirstTask() ..\FreeRTOS\FreeRTOS\Source\portable\GCC\ARM_CM4F/port.c 267 0x100869E4 (All) 
5 xPortStartScheduler() ..\FreeRTOS\FreeRTOS\Source\portable\GCC\ARM_CM4F/port.c 379 0x1008B24E (All) 

Registers in Cy_SysLib_ProcessingFault: r0=0x00000000, r1=0x90000000, r2=0x00000000, r3=0x08026B18, r4=0x080278A8, r5=0x08026B20, r6=0x00000001, r7=0x080477CC, r8=0x6E85E9E1, r9=0xBFEBB995, r10=0x00000001, r11=0x00000000, r12=0xFFFFFFFF, sp=0x080477CC, lr=0x1009F433, pc=0x10088B5A, xpsr=0x01010005, msp=0x080477CC, psp=0x0803F618

sp=0x080477CC is a location in SRAM:

|0x0804776c|00|00|00|00|b0|f8|03|08|........|
|---|---|---|---|---|---|---|---|---|---|
|0x08047774|df|df|df|da|00|00|00|80|........|
|0x0804777c|d1|4b|c0|3f|00|00|00|80|.K.?....|
|0x08047784|d1|4b|c0|3f|a8|78|02|08|.K.?.x..|
|0x0804778c|20|6b|02|08|01|00|00|00| k......|
|0x08047794|cc|77|04|08|e1|e9|85|6e|.w.....n|
|0x0804779c|95|b9|eb|bf|01|00|00|00|........|
|0x080477a4|74|6e|02|08|20|6b|02|08|tn.. k..|
|0x080477ac|01|00|00|00|cc|77|04|08|.....w..|
|0x080477b4|e1|e9|85|6e|01|f4|09|10|...n....|
|0x080477bc|18|6b|02|08|a8|78|02|08|.k...x..|
|0x080477c4|20|6b|02|08|59|8b|08|10| k..Y...|
|0x080477cc|d4|77|04|08|b9|2d|08|10|.w...-..|  <------- SP
|0x080477d4|04|00|00|00|4f|03|08|10|....O...|
|0x080477dc|ed|ff|ff|ff|00|00|00|00|........|
|0x080477e4|00|00|f0|00|34|ef|00|e0|....4...|
|0x080477ec|00|00|00|c0|f0|02|00|00|........|
|0x080477f4|4f|b2|08|10|e4|69|08|10|O....i..|
|0x080477fc|00|00|0f|61|00|00|00|00|...a....|
|0x08047804|00|00|00|a0|06|01|00|00|........|
|0x0804780c|00|0e|00|14|10|04|00|08|........|
|0x08047814|c6|6d|aa|dc|fd|10|64|8e|.m....d.|
|0x0804781c|98|60|07|ac|ac|ea|7d|05|.`....}.|
|0x08047824|ff|1e|81|77|1e|77|af|f7|...w.w..|
|0x0804782c|c4|b3|c7|0d|8e|ae|54|4c|......TL|

or

|0x0804778c|08026b20|00000001|080477cc|6e85e9e1| k.......w.....n|
|---|---|---|---|---|---|
|0x0804779c|bfebb995|00000001|08026e74|08026b20|........tn.. k..|
|0x080477ac|00000001|080477cc|6e85e9e1|1009f401|.....w.....n....|
|0x080477bc|08026b18|080278a8|08026b20|10088b59|.k...x.. k..Y...|
|0x080477cc|080477d4|10082db9|00000004|1008034f|.w...-......O...| <----SP
|0x080477dc|ffffffed|00000000|00f00000|e000ef34|............4...|
|0x080477ec|c0000000|000002f0|1008b24f|100869e4|........O....i..|
|0x080477fc|610f0000|00000000|a0000000|00000106|...a............|
|0x0804780c|14000e00|08000410|dcaa6dc6|8e6410fd|.........m....d.|
|0x0804781c|ac076098|057deaac|77811eff|f7af771e|.`....}....w.w..|
|0x0804782c|0dc7b3c4|4c54ae8e|50dd3780|67a06567|......TL.7.Pge.g|

If I subtract 32 from SP, I see 10088b59, which is in here?

 401:     while(1);
0x10088B5A E7FE     b.n	10088b5a <Cy_SysLib_ProcessingFault+0x7e>

lr=0x1009F433 points to around here:

0x1009F42E F000FB37 bl	1009faa0 <__retarget_lock_release_recursive>
0x1009F432 4628     mov	r0, r5
0x1009F434 BD38     pop	{r3, r4, r5, pc}
0x1009F436 BF00     nop

If I look at psp=0x0803F618, that should be in SRAM:

|0x0803f558|100aad91|100aad89|0803f5b4|0803f5b0|................|
|---|---|---|---|---|---|
|0x0803f568|0803b3f8|08045ac0|a0000000|3fc05b06|.....Z.......[.?|
|0x0803f578|a0000000|3fc05b06|0000001a|00000000|.....[.?........|
|0x0803f588|00000006|00000001|00000000|00000001|................|
|0x0803f598|00000000|00000001|00000005|08045aa0|.............Z..|
|0x0803f5a8|000003fc|20000000|3fc02dd7|000003fe|....... .-.?....|
|0x0803f5b8|80000000|fffffffd|00000016|80000000|................|
|0x0803f5c8|3fc04bd1|3fc04bd1|3fc04bd1|ffffffed|.K.?.K.?.K.?....|
|0x0803f5d8|0803f8b0|dadfdfdf|80000000|3fc04bd1|.............K.?|
|0x0803f5e8|80000000|3fc04bd1|fecf5fff|facfffff|.....K.?._......|
|0x0803f5f8|fe5ffff7|ffdfdf57|fefbd7fe|ffde5ff3|.._.W........_..|
|0x0803f608|bedfdff7|feddd5f7|fed7f5ff|bedfdff7|................|
|0x0803f618|30303032|00000000|ffffffff|08045a18|2000.........Z..| <--PSP
|0x0803f628|7fd7fbeb|100aa23f|100aaf30|81010000|....?...0.......|
|0x0803f638|3e025e8c|dffff7a7|fd57f7ee|fadf55f7|.^.>......W..U..|
|0x0803f648|bedbd5b9|367fffff|f7dff7b7|fa5fd9b7|.......6......_.|
|0x0803f658|fccfd7c7|fedbd7f6|7edfd7f2|ffdfc7bf|...........~....|
|0x0803f668|368753a9|3d2531a0|80000000|3fc04bd1|.S.6.1%=.....K.?|
|0x0803f678|00000010|00000001|08026b20|08045aa0|........ k...Z..|
|0x0803f688|00000001|100aa23f|0803f6e4|0803f6e0|....?...........|
|0x0803f698|00000005|08045be0|80000000|3fc04bd1|.....[.......K.?|
|0x0803f6a8|80000000|3fc04bd1|00000018|ffffffff|.....K.?........|
|0x0803f6b8|00000004|00000001|00000000|00000001|................|
|0x0803f6c8|00000000|ffffffed|00000003|dadfdfdf|................|

I’ve been reading this: https://interrupt.memfault.com/blog/cortex-m-rtos-context-switching to try to educate myself about how this works. I found this tidbit there:

WARNING: Over the years I’ve seen a lot of nasty stack overflows arise here which can be tricky to track down. As soon as an FPU instruction is used an additional 132 bytes will be pushed on the stack, which can lead to unexpected overflows of small embedded stacks

So, I guess a lot of what I’m looking at there is floating point registers?

I need to back up 132+32 bytes (0xA4) to find the previous PSP? 0x0803f618 - 0xA4 is 0x0803F574:
0x0803f4c8 08039e50 0803fa00 0000000b 0803f4e0 P...............

|0x08039dd0|08039598|20454c42|6b736154|00000000|....BLE Task....|
|---|---|---|---|---|---|
|0x08039de0|00000000|00000008|00000000|00000013|................|
|0x08039df0|00000000|00000000|00000000|00000000|................|
|0x08039e00|00000000|00000000|00016e57|00000000|........Wn......|
|0x08039e10|00000000|00000000|00000000|80000010|................|
|0x08039e20|08039e30|00000000|00000000|80000020|0........... ...|
|0x08039e30|00000003|08026728|08039ea8|01dacc00|....(g..........|
|0x08039e40|00000000|00000000|00000000|80000058|............X...|
|0x08039e50|00000000|08039e50|00000000|00000000|....P...........|
|0x08039e60|00000000|08039e68|ffffffff|08039e68|....h.......h...|
|0x08039e70|08039e68|00000000|08039e7c|ffffffff|h.......|.......|
|0x08039e80|08039e7c|08039e7c|00000001|00000001||...|...........|
|0x08039e90|00000000|0000ffff|00000000|00000004|................|
|0x08039ea0|00000000|800004a8|1009b951|1009ba6d|........Q...m...|

I think I need to do some more reading.

I got some new hardware, and I have been running three instances of this project. However, I only have one LCD module, so only one instance is running the UITask operations that create and update the display. I haven’t seen any Bus Faults from the other instances, lately, so I think the problem is probably in the display stuff.

So you have seen that UITask was running a couple of times when the fault happened. We can try to determine whether its stack is overflowing or whether it is corrupting SnTTask’s stack. Use the xTaskCreateStatic API to create these tasks so that you can supply your own stack buffer.

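static StackType_t xStack[ STACK_SIZE + 1 ]; /* One extra word: xStack[0] acts as a guard. */
static StaticTask_t xTaskBuffer;             /* Holds the task's data structure (TCB). */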

/* Pass the address to the 1st item as opposed to the 0th item 
 * to be used as the stack buffer. */
xHandle = xTaskCreateStatic(
                      vTaskCode,       /* Function that implements the task. */
                      "NAME",          /* Text name for the task. */
                      STACK_SIZE,      /* Number of indexes in the xStack array. */
                      NULL,            /* Parameter passed into the task. */
                      tskIDLE_PRIORITY,/* Priority at which the task is created. */
                      &xStack[1],      /* Array to use as the task's stack. */
                      &xTaskBuffer );  /* Variable to hold the task's data structure. */

Put a data breakpoint on &( xStack[0] ) to stop the program when that word changes. This will break into the debugger whenever the stack overflows, which will help to determine the exact place where memory corruption is happening (assuming it is happening).
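If no hardware watchpoint is free, the same guard word can also be checked in software. The following is only a sketch of that idea: the sentinel value, the `stack_word_t` stand-in for StackType_t, and both function names are assumptions made for illustration, not FreeRTOS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Arbitrary sentinel, chosen only for illustration. */
#define GUARD_PATTERN 0xDEADBEEFu

/* Stand-in for StackType_t on a 32-bit Cortex-M port (assumption). */
typedef uint32_t stack_word_t;

/* Write the sentinel into element 0, the word just below the real stack. */
static void guard_init(stack_word_t *buffer)
{
    assert(buffer != NULL);
    buffer[0] = GUARD_PATTERN;
}

/* Returns false once something has written over the guard word. */
static bool guard_intact(const stack_word_t *buffer)
{
    assert(buffer != NULL);
    return buffer[0] == GUARD_PATTERN;
}
```

A check like `guard_intact(xStack)` could be polled from an idle hook or a low-priority task. It only reports the damage after the fact, so the data breakpoint remains the better tool for catching the offending write itself.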

Thanks.

Decoding bits of the dump:

|0x0803f628|7fd7fbeb|100aa23f|100aaf30|81010000|....?...0.......|

Note the 81010000, which matches the pc value printed in the fault report; this is the PC saved in the dump. The 100aaf30 is the LR and 100aa23f is R12.

Now note that the stack grows DOWN, so to back up over the floating-point registers you ADD that offset, but only if the FP registers have actually been saved here (I am not sure they have been yet; I think that happens only at task switch time).
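One way to check whether FP state is part of the exception frame is the EXC_RETURN value, which GDB shows as 0xFFFFFFED at the `<signal handler called>` frame. The sketch below assumes the standard ARMv7-M encoding with the FP extension: bit 4 clear means an extended frame, bit 2 set means the frame is on the PSP. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXC_RETURN_USES_PSP    (1u << 2)  /* 1: frame on PSP, 0: frame on MSP */
#define EXC_RETURN_BASIC_FRAME (1u << 4)  /* 1: basic frame, 0: extended (FP) frame */

#define BASIC_FRAME_BYTES    32u   /* r0-r3, r12, lr, pc, xPSR */
#define EXTENDED_FRAME_BYTES 104u  /* basic frame + s0-s15, FPSCR, one reserved word */

/* True when the hardware reserved room for FP registers in the frame. */
static bool exc_return_has_fp_frame(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME) == 0u;
}

/* True when the frame was pushed on the process stack (a task's stack). */
static bool exc_return_uses_psp(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_USES_PSP) != 0u;
}

/* Address just past the hardware-stacked frame. The stack grows down, so
 * the interrupted code's own stack data starts at a HIGHER address. */
static uint32_t frame_end(uint32_t frame_start, uint32_t exc_return)
{
    /* Sanity check: the upper bits of an EXC_RETURN value are all ones. */
    assert((exc_return & 0xFFFFFFE0u) == 0xFFFFFFE0u);
    return frame_start + (exc_return_has_fp_frame(exc_return)
                              ? EXTENDED_FRAME_BYTES
                              : BASIC_FRAME_BYTES);
}
```

With psp=0x0803f618 and 0xFFFFFFED, this reports an extended frame on the PSP that ends at 0x0803f680. The hardware may insert one extra padding word for 8-byte alignment (flagged by bit 9 of the stacked xPSR), which this sketch ignores. The 132-byte figure quoted earlier covers all of s0-s31 plus FPSCR; of those, s16-s31 are saved by the context-switch code rather than by the hardware.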

Scanning upwards in the stack I see the line:

|0x0803f688|00000001|100aa23f|0803f6e4|0803f6e0|....?...........|

Again, we have the 0x100aa23f that was in R12 at the crash. That might be an address worth checking, to see if it makes sense for the program to have been running there. I would expect to find an instruction that makes a call of some sort just before it.

That is a very clever idea, and I will try it here shortly.

Looking around there in the Disassembly:

0x100AA21E D997     bls.n	100aa150 <_dtoa_r+0x180>
0x100AA220 2300     movs	r3, #0
0x100AA222 2601     movs	r6, #1
0x100AA224 9324     str	r3, [sp, #90]	; 0x90
0x100AA226 9325     str	r3, [sp, #94]	; 0x94
0x100AA228 F04F33FF mov.w	r3, #ffffffff
0x100AA22C 960C     str	r6, [sp, #30]	; 0x30
0x100AA22E 930A     str	r3, [sp, #28]	; 0x28
0x100AA230 9B0A     ldr	r3, [sp, #28]	; 0x28
0x100AA232 9310     str	r3, [sp, #40]	; 0x40
0x100AA234 2100     movs	r1, #0
0x100AA236 6461     str	r1, [r4, #44]	; 0x44
0x100AA238 4620     mov	r0, r4
0x100AA23A F000FE71 bl	100aaf20 <_Balloc>
0x100AA23E 9003     str	r0, [sp, #c]
0x100AA240 2800     cmp	r0, #0
0x100AA242 F0008644 beq.w	100aaece <_dtoa_r+0xefe>
0x100AA246 9B03     ldr	r3, [sp, #c]
0x100AA248 6423     str	r3, [r4, #40]	; 0x40
0x100AA24A 9B0A     ldr	r3, [sp, #28]	; 0x28
0x100AA24C 2B0E     cmp	r3, #e
0x100AA24E F2008101 bhi.w	100aa454 <_dtoa_r+0x484>

What does str do? “STR instructions store a register value into memory.” Looks like it’s trying to put something on the stack? (Just a guess, seeing the “sp” there). I guess dtoa could well be part of printf; double to ascii.

STR is ‘Store’: the instruction at 100AA23E stores the contents of r0 into the location addressed by sp+0xC (i.e. an automatic variable).

The 0x100aa23f is the return address for the bl _Balloc instruction (the LSB being set indicates a Thumb instruction), so my first guess is that _Balloc needs to be carefully examined to see if it has a problem, or it could be code shortly after that call, before the next one.
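The address arithmetic behind that observation can be written out as a small sketch. It assumes the call was a 32-bit Thumb-2 `bl` (4 bytes), as in the disassembly above; a register-indirect `blx` would be 2 bytes instead. Both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Clear the Thumb state bit to get the real instruction address. */
static uint32_t thumb_code_address(uint32_t value)
{
    return value & ~1u;
}

/* Address of the 4-byte BL that produced this return address. */
static uint32_t preceding_bl_address(uint32_t return_value)
{
    /* A Thumb return address is expected to have bit 0 set. */
    assert((return_value & 1u) != 0u);
    return thumb_code_address(return_value) - 4u;
}
```

For 0x100aa23f this gives an instruction address of 0x100aa23e (the `str r0, [sp, #c]`) and a call site of 0x100aa23a, which is the `bl 100aaf20 <_Balloc>` line in the listing.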

I would decode the program to something like:
local_90 = 0;
local_94 = 0;
local_30 = 1;
local_28 = -1;
local_40 = local_28;
member_44 = 0;
local_0c = _Balloc(this, 0);
if (local_0c) {
    member_40 = local_0c;
    /* ... */
}

All of this code is part of the function _dtoa_r.

Thinking about it, this is a helper function for *printf, so check the calls, especially to sprintf, to see if you might generate a string that is too long (sprintf really should be snprintf to avoid this).
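A minimal sketch of the snprintf pattern, including the truncation check that is easy to forget. The function name and the "%.3f" format are made up for illustration; the point is comparing snprintf's return value against the buffer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Format a reading into buf. Returns true only if the whole string fit;
 * on truncation the buffer still holds a valid, NUL-terminated prefix. */
static bool format_reading(char *buf, size_t size, double value)
{
    assert(buf != NULL && size > 0u);
    /* snprintf never writes more than size bytes and returns the length
     * the full string would have needed, excluding the terminator. */
    int needed = snprintf(buf, size, "%.3f", value);
    return needed >= 0 && (size_t)needed < size;
}
```

Passing `sizeof buf` for a real array (not a pointer) keeps the size argument in step with the declaration. Note that this bounds only the output buffer: it does not reduce the stack or heap that the formatting code itself uses for %f.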

Well, now I get an instant Bus Fault. I can’t get xTaskCreateStatic to work at all. The problem seems to come up around these lines in prvInitialiseNewTask in tasks.c:

#if( tskSET_NEW_STACKS_TO_KNOWN_VALUE == 1 )
{
	/* Fill the stack with a known value to assist debugging. */
	( void ) memset( pxNewTCB->pxStack, ( int ) tskSTACK_FILL_BYTE, ( size_t ) ulStackDepth * sizeof( StackType_t ) );
}
#endif /* tskSET_NEW_STACKS_TO_KNOWN_VALUE */

That changes uxCurrentNumberOfTasks from 0 to 0xA5A5A5A5 and following code just rolls with that.

I don’t know what the heck is going on.

I couldn’t get the UITask one to run, so I thought maybe you’re not supposed to mix Static and non-Static, and I changed them all, and now the very first one fails. What’s wrong with this?

static StackType_t xStack[1024]; 
static StaticTask_t xTaskBuffer;
xTaskCreateStatic(UARTTask, "UART Task", sizeof xStack, 0, 2, xStack, &xTaskBuffer);

I’m probably missing some mundane detail, as usual.

sizeof xStack is the number of bytes; you want the number of elements.

Either define a #define symbol with the number and use it in both places, or use sizeof xStack / sizeof (xStack[0]) as the stack size parameter.
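The difference is easy to see with the numbers from the failing call. This is a host-side sketch: `stack_word_t` stands in for StackType_t and is assumed to be 4 bytes, as on the Cortex-M4 port; `ARRAY_LEN` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for StackType_t; 4 bytes on the ARM_CM4F port (assumption). */
typedef uint32_t stack_word_t;

/* Number of elements in a real array (not a pointer). */
#define ARRAY_LEN(a) (sizeof (a) / sizeof ((a)[0]))

static stack_word_t uart_stack[1024];

/* What the stack depth parameter expects: a count of words. */
static size_t stack_depth_words(void)
{
    return ARRAY_LEN(uart_stack);
}

/* What sizeof yields: a count of bytes. */
static size_t stack_size_bytes(void)
{
    return sizeof uart_stack;
}

/* Bytes the known-value memset writes for a given depth argument. */
static size_t fill_bytes(size_t depth_words)
{
    assert(depth_words <= SIZE_MAX / sizeof(stack_word_t));
    return depth_words * sizeof(stack_word_t);
}
```

With a 1024-word array the correct depth is 1024, while sizeof gives 4096. Passing 4096 as the depth makes the fill in prvInitialiseNewTask write 16384 bytes into a 4096-byte buffer, which would be consistent with uxCurrentNumberOfTasks turning into 0xA5A5A5A5.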

D’oh! Thanks.

I’ve got it running now, with a Variable Watchpoint set for write on xStack[0] for UITask.

The only sprintf()s I can find are in

  • +FAT tests that aren’t running now
  • File-related-CLI-commands that aren’t being used at the moment
  • in tasks.c

So, the only ones that I think could matter would be in tasks.c, and I hope those are well-tested.

Find results for 'sprintf':
---------------------------
C:\Program Files (x86)\Cypress\PSoC Creator\4.3\PSoC Creator\import\gnu\arm\9.3.1\arm-none-eabi\include\stdio.h - (line 244, col 5): int	sprintf (char *__restrict, const char *__restrict, ...)
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\CreateAndVerifyExampleFiles.c - (line 384, col 2): 	sprintf( pcFileName, "%s.txt", pcDirectory2 );
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\ff_stdio_tests_with_cwd.c - (line 322, col 4): 			sprintf( pcExpectedString, "%s %d\n", pcStringStart, iString );
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\ff_stdio_tests_with_cwd.c - (line 333, col 4): 			sprintf( pcExpectedString, "%s %d\n", pcStringStart, iString );
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 300, col 3): 		sprintf(pcWriteBuffer, "In: ");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 304, col 3): 		sprintf(pcWriteBuffer, "Error");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 388, col 3): 		sprintf(pcWriteBuffer, "%s was deleted", pcParameter);
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 390, col 3): 		sprintf(pcWriteBuffer, "Error.  %s was not deleted", pcParameter);
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 420, col 3): 		sprintf(pcWriteBuffer, "%s was deleted", pcParameter);
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 422, col 3): 		sprintf(pcWriteBuffer, "Error.  %s was not deleted", pcParameter);
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 467, col 3): 		sprintf(pcWriteBuffer, "Source file does not exist");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 477, col 4): 			sprintf(pcWriteBuffer, "Error: Destination is a directory not a file");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 483, col 4): 			sprintf(pcWriteBuffer, "Error: Destination file already exists");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 491, col 4): 			sprintf(pcWriteBuffer, "Copy made");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 493, col 4): 			sprintf(pcWriteBuffer, "Error during copy");
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\Turb_Mon.cydsn\File-related-CLI-commands.c - (line 619, col 2): 	sprintf(pcBuffer, "%s [%s] [size=%lu]", pxFindStruct->pcFileName, pcAttrib, (unsigned long) pxFindStruct->ulFileSize);
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\FreeRTOS\FreeRTOS\Source\tasks.c - (line 4467, col 5): 				sprintf( pcWriteBuffer, "\t%c\t%u\t%u\t%u\r\n", cStatus, ( unsigned int ) pxTaskStatusArray[ x ].uxCurrentPriority, ( unsigned int ) pxTaskStatusArray[ x ].usStackHighWaterMark, ( unsigned int ) pxTaskStatusArray[ x ].xTaskNumber ); /*lint !e586 sprintf() allowed as this is compiled with many compilers and this is a utility function only - not part of the core kernel implementation. */
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\FreeRTOS\FreeRTOS\Source\tasks.c - (line 4569, col 8): 							sprintf( pcWriteBuffer, "\t%u\t\t%u%%\r\n", ( unsigned int ) pxTaskStatusArray[ x ].ulRunTimeCounter, ( unsigned int ) ulStatsAsPercentage ); /*lint !e586 sprintf() allowed as this is compiled with many compilers and this is a utility function only - not part of the core kernel implementation. */
C:\Users\carlk\Documents\PSoC Creator\Turbmon.cydsn\FreeRTOS\FreeRTOS\Source\tasks.c - (line 4585, col 8): 							sprintf( pcWriteBuffer, "\t%u\t\t<1%%\r\n", ( unsigned int ) pxTaskStatusArray[ x ].ulRunTimeCounter ); /*lint !e586 sprintf() allowed as this is compiled with many compilers and this is a utility function only - not part of the core kernel implementation. */

Matches found: 19

You previously mentioned that this task uses snprintf a lot, you might want to double check the size of all the string buffers and the size passed to snprintf, to see if you have a too small buffer anywhere.

The other option is to scan further through the stack, as I did, to find the call into the *printf function, and look at it more carefully.

My guess, from what I see, is that the issue may not be a normal ‘stack overflow’, where you grow the stack past its bounds (the stack seems well behaved at the crash), but an overflow of a buffer on the stack clobbering other items on the stack, like a return address.

In addition to Richard’s hints: it’s possible that the stack did not overflow linearly.
I once encountered a stack overflow caused by (s)printf that left a gap where the stack data was unchanged, with corruption only at a certain offset.
This was caused by a serious underestimation of the stack needed.

It ran happily all night. [Of course: change one thing and the problem goes away… for a while.] But then, this morning I decided to interact with it using its keypad, and triggered an assert out of the blue

EDIT: Turns out it was an extraneous configASSERT. I’ve been sprinkling them around a bit too freely in my efforts to find this darn Bus Fault.

I hope I don’t cause you a fruitless detour here, but I think PSoC Creator uses newlib for the C runtime. Newlib does need some care and feeding.

I noticed in your FreeRTOSConfig.h:

#define configUSE_NEWLIB_REENTRANT 0

Most likely you’ll need to change that to 1. This tells FreeRTOS to maintain a global pointer used by newlib to find its current “context” data. If you call a newlib function reentrantly (eg, from more than one task) without this, it can cause all kinds of problems (corruptions, faults, etc).

Note also that newlib uses heap memory – and not the FreeRTOS heap because newlib doesn’t know anything about FreeRTOS. That’s fine as long as you provide the heap, along with the protections required when that heap runs out of space, along with the hook functions to make the malloc() family threadsafe. Cypress may be already doing much of this for you, but you should check. Maybe your “other” heap is running out of space or getting corrupted.

There is good detailed information about all of the above here: http://www.nadler.com/embedded/newlibAndFreeRTOS.html though it is not geared directly to PSoC Creator.
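To make the "protections required when that heap runs out of space" concrete, here is a host-runnable sketch of a bounded sbrk-style allocator. It is not the Cypress or heap_useNewlib implementation: a small static array stands in for the linker-defined heap region, and the name `sketch_sbrk` is hypothetical so it cannot clash with a real `_sbrk`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Tiny on purpose, so that exhaustion is easy to demonstrate. */
#define SKETCH_HEAP_BYTES 64u

/* Stand-in for the region between the linker's heap base and limit. */
static uint8_t sketch_heap[SKETCH_HEAP_BYTES];
static size_t sketch_heap_used;

/* Move the program break by incr bytes and return the previous break,
 * or (void *)-1 with errno = ENOMEM when the request does not fit. */
static void *sketch_sbrk(ptrdiff_t incr)
{
    assert(sketch_heap_used <= SKETCH_HEAP_BYTES);
    if (incr < 0) {
        /* Shrinking: refuse to move the break below the heap base. */
        if ((size_t)(-incr) > sketch_heap_used) {
            errno = ENOMEM;
            return (void *)-1;
        }
    } else if ((size_t)incr > SKETCH_HEAP_BYTES - sketch_heap_used) {
        /* Growing: refuse to move the break past the heap limit. */
        errno = ENOMEM;
        return (void *)-1;
    }
    void *previous_break = &sketch_heap[sketch_heap_used];
    sketch_heap_used = (size_t)((ptrdiff_t)sketch_heap_used + incr);
    return previous_break;
}
```

The bounds check is what turns heap exhaustion into a failed malloc() rather than a silent overwrite of whatever follows the heap. Thread safety is a separate concern: newlib calls `__malloc_lock()` and `__malloc_unlock()` around its allocator, and those hooks have to be implemented for the RTOS in use.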


Looking at your original message, it looks like p got its value changed to point into your program’s string constant area, in particular at the string message for the assert. I don’t know if you just got the wrong level of the call stack or if this is a sign of some of the memory mis-writes that seem to be plaguing your code.

Gak! I’ve fallen into a nasty pitfall. That sure seems like an exact match for my problem. I’ve now set #define configUSE_NEWLIB_REENTRANT 1. Then I had to do some fiddling around to get stack and heap to fit in the SRAM, because, exactly as Dave warns in his excellent article, “configUSE_NEWLIB_REENTRANT can chew up considerable memory!”

I’ve got it running now, and we’ll see if this solves the Bus Fault. I am interested in trying his heap_useNewlib, but heap_useNewlib_NXP.c doesn’t find HEAP_SIZE in cy8c6xx7_cm4_dual.ld, and I know nothing about linker scripts. (Attached: cy8c6xx7_cm4_dual.zip, 3.9 KB)

  • Having FreeRTOS provide and manage the REENT structures is a great diagnostic at the very least. There are ways to avoid them later if needed (and recover that RAM).
  • Your code does appear to call newlib function(s) that use heap memory – a post above indicates _Balloc() is executing. As a diagnostic, you might use a breakpoint there (or better yet in _sbrk() or _sbrk_r()) to determine what newlib function(s) are using the heap, and then temporarily stop calling those functions. For the moment I suspect the malloc family of functions are not threadsafe and probably not out-of-memory safe in your config.

With those two things in place (REENT structures via FreeRTOS and avoiding newlib heap usage) you should be able to tell if this whole idea was a wild goose chase.

I’m pretty sure newlib concurrency problems were the issue. It ran fine all night with

#define configUSE_NEWLIB_REENTRANT 1

It all makes sense, now. As Dave says: “The most common functions that bite unsuspecting embedded developers are sprintf using %f…”. I started having problems when I started fleshing out the vibration analysis part of my project and began using a lot of, you guessed it, (various)printf using %f. Interestingly, I had three hardware instances running, but two were only logging to SD card, while a third was also updating an LCD display. Only the one with the display had the Bus Faults. Maybe it has something to do with the display being updated concurrently with logging, both using printf %f.

Now, I am experimenting with heap_useNewlib_NXP.c. I have it working, but some things are mysterious. I changed cy8c6xx7_cm4_dual.ld from:

.heap (NOLOAD):  
{  
    __HeapBase = .;  
    __end__ = .;  
    end = __end__;  
    KEEP(*(.heap*))  
    __HeapLimit = .;  
} > ram  

to:

HEAP_SIZE = 0x7000; /* 28 kB */

.heap (NOLOAD):
{
    __HeapBase = .;
    __end__ = .;
    end = __end__;
    KEEP(*(.heap*))
    . = ALIGN(8);
    . = . + HEAP_SIZE;
    . = ALIGN(8);
    __HeapLimit = .;
} > ram

FreeRTOSConfig.h from:

#define configTOTAL_HEAP_SIZE                   (28*1024)
#define configAPPLICATION_ALLOCATED_HEAP        0

to:

#define configTOTAL_HEAP_SIZE                   0 //(28*1024)
#define configAPPLICATION_ALLOCATED_HEAP        1

That ran… until it tried to do a printf %f, which immediately gets a Usage Fault. On a whim, I changed:

#define configUSE_NEWLIB_REENTRANT 1

back to

#define configUSE_NEWLIB_REENTRANT 0

and that seems to run fine! But I have no idea why, and it feels like I’m living dangerously.

You don’t need to define either of these symbols because you should remove heap_n.c from the project now. I think those symbols are used only in heap_n.c.

Can you use a breakpoint in the debugger (at printf %f) to find out more precisely where the fault occurs using heap_useNewlib when configUSE_NEWLIB_REENTRANT is 1? You said it happens immediately, and that should make it easier to find the root cause. The new fault is problematic because the code should always work with configUSE_NEWLIB_REENTRANT set to 1, and with care it can be made to work with configUSE_NEWLIB_REENTRANT set to 0.