Hardfault in xQueueGenericReceive() at certain optimisation levels on STM32 ARM Cortex M4, FreeRTOS v9.0.0

stuartbrown wrote on Tuesday, March 20, 2018:

Hi

I’m in need of some help. I have a FreeRTOS project generated from CubeMX (v4.25.0) targeting an STM32L432. The project is generated for, compiled in, and debugged in Atollic TrueStudio, so the toolchain is gcc. When I compile with optimisation level -O2 or -O3 I get hard faults in xQueueGenericReceive(). If I use other levels (-O0, -O1, -Og, -Os, -Of) then it runs fine.

According to the fault analyser in Atollic TrueStudio, the PC when the hard fault occurs is on this line in queue.c, in function xQueueGenericReceive():

for( ;; )
	{
		taskENTER_CRITICAL();
		{
			const UBaseType_t uxMessagesWaiting = pxQueue->uxMessagesWaiting; <--- HARD FAULT HERE

I have a screenshot of the disassembly from the Atollic Fault Analyser here, showing the error location to be the following instruction:
ldr r6, [r4, #56]

This causes a precise bus fault because r4 contains address 0x320000, so the instruction is trying to access address 0x320038 (0x320000 plus the offset of 56, i.e. 0x38).
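
For anyone following along, the "precise bus fault" diagnosis can be cross-checked by hand. The sketch below is a minimal illustration, assuming the raw value of the Configurable Fault Status Register (SCB->CFSR) has been captured in the HardFault handler. The bit positions are the Cortex-M3/M4 ones; the type and function names are hypothetical and not part of FreeRTOS or the ST HAL.

```c
#include <stdbool.h>
#include <stdint.h>

/* BusFault status bits inside SCB->CFSR (bits 15:8 are the BusFault byte). */
#define CFSR_IBUSERR      (1UL << 8)   /* bus error on an instruction fetch   */
#define CFSR_PRECISERR    (1UL << 9)   /* precise data bus error              */
#define CFSR_IMPRECISERR  (1UL << 10)  /* imprecise data bus error            */
#define CFSR_BFARVALID    (1UL << 15)  /* BFAR holds the faulting address     */

/* Hypothetical summary of the bus fault flags we care about. */
typedef struct {
    bool precise;
    bool imprecise;
    bool address_valid;
} bus_fault_info_t;

/* Decode the bus fault flags from a raw CFSR value. */
bus_fault_info_t decode_bus_fault(uint32_t cfsr)
{
    bus_fault_info_t info;
    info.precise       = (cfsr & CFSR_PRECISERR) != 0;
    info.imprecise     = (cfsr & CFSR_IMPRECISERR) != 0;
    info.address_valid = (cfsr & CFSR_BFARVALID) != 0;
    return info;
}

/* Address accessed by `ldr rX, [rN, #offset]`: base register plus offset. */
uint32_t ldr_effective_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}
```

With a precise fault and BFARVALID set, the address in BFAR should equal ldr_effective_address(0x320000, 56), which is 0x320038, the address reported above.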

I have spent the last few days reading about hard faults and debugging, but I’m still a bit lost, being new to STM32. I have followed the advice on the FreeRTOS site about Cortex-M3/M4 processors, interrupt priorities and debugging hard faults.

I have configASSERT defined:
#define configASSERT( x ) if ((x) == 0) { taskDISABLE_INTERRUPTS(); __asm volatile("BKPT #01"); for( ;; ); }

My interrupt config is generated by CubeMX as follows:
#define configPRIO_BITS 4
#define configLIBRARY_LOWEST_INTERRUPT_PRIORITY 15
#define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY 5
#define configKERNEL_INTERRUPT_PRIORITY (configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
#define configMAX_SYSCALL_INTERRUPT_PRIORITY (configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )
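
As a sanity check on these macros, the shift arithmetic can be reproduced on a host machine. This is only a sketch: PRIO_BITS mirrors configPRIO_BITS above, and the function name is hypothetical.

```c
#include <stdint.h>

#define PRIO_BITS 4u  /* mirrors configPRIO_BITS above */

/* Move a library-style priority (0..15 with 4 priority bits) into the
   most significant bits of the 8-bit hardware priority field. */
uint32_t priority_to_register_value(uint32_t library_priority)
{
    return (library_priority << (8u - PRIO_BITS)) & 0xFFu;
}
```

With these settings configKERNEL_INTERRUPT_PRIORITY works out to 15 << 4 = 240 (0xF0) and configMAX_SYSCALL_INTERRUPT_PRIORITY to 5 << 4 = 80 (0x50). The second value is the one written to BASEPRI in the disassembly posted later in this thread (mov.w r3, #80 followed by msr BASEPRI, r3).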

My application is not using any interrupts. It is configured without preemption, and has 3 tasks that wait on a FreeRTOS queue, blocking indefinitely until there is something to process. The 3 tasks are:

  1. An application task running a state machine. It waits on a queue that contains state machine events.
  2. A message processing task for handling communications from a PC application. The PC app is not being used when the fault occurs, so this task is simply blocked.
  3. And finally a task processing bytes from the GPS UART. This is running when the fault occurs. The GPS UART is configured for DMA to a circular buffer, and the buffer is emptied in the idle task hook, where all bytes are read out and pushed into the FreeRTOS queue.
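
To make the third item concrete, here is a rough host-side sketch of draining a circular DMA buffer. It is not the actual project code. It assumes the DMA transfer counter counts down from the buffer size, so the write position is the size minus the remaining count, and all names are hypothetical. In the firmware each copied byte would instead be pushed to the queue with a zero block time, because the idle hook must not call anything that can block.

```c
#include <stddef.h>
#include <stdint.h>

/* Copy unread bytes from a circular DMA buffer into `out`.
   `tail` is the caller's read index and is advanced as bytes are consumed.
   Never writes more than `out_capacity` bytes; returns the number copied. */
size_t drain_circular(const uint8_t *ring, size_t ring_size, size_t *tail,
                      size_t dma_remaining, uint8_t *out, size_t out_capacity)
{
    /* Assumed: the DMA counter runs from ring_size down to 1 and reloads,
       so the next write position is ring_size - dma_remaining, wrapped. */
    size_t head = (ring_size - dma_remaining) % ring_size;
    size_t copied = 0;

    while (*tail != head && copied < out_capacity) {
        out[copied++] = ring[*tail];
        *tail = (*tail + 1u) % ring_size;  /* wrap at the end of the ring */
    }
    return copied;
}
```

The explicit out_capacity limit is deliberate: every write into a fixed-size array is bounded, and any bytes that do not fit stay in the ring for the next call.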

There are 2 timers running; their callbacks push timer-expired events into the state machine queue. The state machine logic starts the timers as appropriate.

At the time of the hard fault, the application task (HSM) is spinning in a loop waiting for a GPS fix: it checks the fix flag and, if there is no fix, calls vTaskDelay(). It seems that the error is somehow related to this delay causing a context switch. The same happens if I replace the delay with a taskYIELD().

Please can someone help me debug this further? I’m really stuck.

thanks

Stuart

rtel wrote on Tuesday, March 20, 2018:

Curious. Can you tell me the FreeRTOS and GCC versions you are using?

stuartbrown wrote on Wednesday, March 21, 2018:

Richard

FreeRTOS is v9.0.0 - it is what CubeMX spits out. I have not compared to the “official” download.

gcc is the Atollic version:

 .\arm-atollic-eabi-gcc.exe --version
arm-atollic-eabi-gcc.exe (GNU Tools for ARM Embedded Processors (Build 17.03)) 6.3.1 20170215 (release) [ARM/embedded-6-branch revision 245512]
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rtel wrote on Wednesday, March 21, 2018:

I will see if I can get the same GCC version and try it myself.

Can you look at the assembly and see how r4 is getting set? It might
give a clue as to why it seems to hold the wrong value when it is used.

stuartbrown wrote on Wednesday, March 21, 2018:

Richard

I did a compare on the CubeMX generated FreeRTOS source code and did not find any significant differences.

You can download the tool chain (complete IDE) for free from Atollic website, STM have bought them and now make the tool available for STM32 for free - https://atollic.com/resources/download/. BTW, I’m using the Windows version.

Here is the disassembly for the -O2 optimised version. Things seem to have been moved around a bit…

1238        {
            xQueueGenericReceive:
0x08010d28:   stmdb   sp!, {r4, r5, r6, r7, r8, r9, r10, lr}
0x08010d2c:   sub     sp, #16
0x08010d2e:   str     r2, [sp, #4]
1244        	configASSERT( pxQueue );
0x08010d30:   cmp     r0, #0
0x08010d32:   beq.w   0x8010e5a <xQueueGenericReceive+306>
1245        	configASSERT( !( ( pvBuffer == NULL ) && ( pxQueue->uxItemSize != ( UBaseType_t ) 0U ) ) );
0x08010d36:   cmp     r1, #0
0x08010d38:   beq.w   0x8010e9c <xQueueGenericReceive+372>
0x08010d3c:   mov     r4, r0
0x08010d3e:   mov     r9, r3
0x08010d40:   mov     r8, r1
1248        		configASSERT( !( ( xTaskGetSchedulerState() == taskSCHEDULER_SUSPENDED ) && ( xTicksToWait != 0 ) ) );
0x08010d42:   bl      0x8011868 <xTaskGetSchedulerState>
0x08010d46:   cbnz    r0, 0x8010d60 <xQueueGenericReceive+56>
0x08010d48:   ldr     r5, [sp, #4]
0x08010d4a:   cbz     r5, 0x8010d62 <xQueueGenericReceive+58>
0x08010d4c:   mov.w   r3, #80 ; 0x50
0x08010d50:   msr     BASEPRI, r3
0x08010d54:   isb     sy
0x08010d58:   dsb     sy
1248        		configASSERT( !( ( xTaskGetSchedulerState() == taskSCHEDULER_SUSPENDED ) && ( xTicksToWait != 0 ) ) );
0x08010d5c:   bkpt    0x0001
0x08010d5e:   b.n     0x8010d5e <xQueueGenericReceive+54>
0x08010d60:   movs    r5, #0
1258        		taskENTER_CRITICAL();
0x08010d62:   bl      0x8010434 <vPortEnterCritical>
1260        			const UBaseType_t uxMessagesWaiting = pxQueue->uxMessagesWaiting;
0x08010d66:   ldr     r6, [r4, #56]   ; 0x38
1401        					portYIELD_WITHIN_API();
0x08010d68:   ldr.w   r10, [pc, #332] ; 0x8010eb8 <xQueueGenericReceive+400>
1371        		prvLockQueue( pxQueue );
0x08010d6c:   movs    r7, #0
1264        			if( uxMessagesWaiting > ( UBaseType_t ) 0 )
0x08010d6e:   cmp     r6, #0
0x08010d70:   bne.n   0x8010dfc <xQueueGenericReceive+212>
1343        				if( xTicksToWait == ( TickType_t ) 0 )
0x08010d72:   ldr     r3, [sp, #4]
0x08010d74:   cmp     r3, #0
0x08010d76:   beq.w   0x8010e7e <xQueueGenericReceive+342>
1351        				else if( xEntryTimeSet == pdFALSE )
0x08010d7a:   cbnz    r5, 0x8010d82 <xQueueGenericReceive+90>
1355        					vTaskSetTimeOutState( &xTimeOut );
0x08010d7c:   add     r0, sp, #8
0x08010d7e:   bl      0x80117b0 <vTaskSetTimeOutState>
1365        		taskEXIT_CRITICAL();
0x08010d82:   bl      0x8010478 <vPortExitCritical>
1370        		vTaskSuspendAll();
0x08010d86:   bl      0x801138c <vTaskSuspendAll>
1371        		prvLockQueue( pxQueue );
0x08010d8a:   bl      0x8010434 <vPortEnterCritical>
0x08010d8e:   ldrb.w  r3, [r4, #68]   ; 0x44
0x08010d92:   cmp     r3, #255        ; 0xff
0x08010d94:   it      eq
0x08010d96:   strbeq.w        r7, [r4, #68]   ; 0x44
0x08010d9a:   ldrb.w  r3, [r4, #69]   ; 0x45
0x08010d9e:   cmp     r3, #255        ; 0xff
0x08010da0:   it      eq
0x08010da2:   strbeq.w        r7, [r4, #69]   ; 0x45
0x08010da6:   bl      0x8010478 <vPortExitCritical>
1374        		if( xTaskCheckForTimeOut( &xTimeOut, &xTicksToWait ) == pdFALSE )
0x08010daa:   add     r1, sp, #4
0x08010dac:   add     r0, sp, #8
0x08010dae:   bl      0x80117d0 <xTaskCheckForTimeOut>
0x08010db2:   cmp     r0, #0
0x08010db4:   bne.n   0x8010e40 <xQueueGenericReceive+280>
1918        	taskENTER_CRITICAL();
0x08010db6:   bl      0x8010434 <vPortEnterCritical>
1920        		if( pxQueue->uxMessagesWaiting == ( UBaseType_t )  0 )
0x08010dba:   ldr     r3, [r4, #56]   ; 0x38
0x08010dbc:   cmp     r3, #0
0x08010dbe:   bne.n   0x8010e2e <xQueueGenericReceive+262>
1929        	taskEXIT_CRITICAL();
0x08010dc0:   bl      0x8010478 <vPortExitCritical>
1382        					if( pxQueue->uxQueueType == queueQUEUE_IS_MUTEX )
0x08010dc4:   ldr     r3, [r4, #0]
0x08010dc6:   cmp     r3, #0
0x08010dc8:   beq.n   0x8010e6e <xQueueGenericReceive+326>
1397        				vTaskPlaceOnEventList( &( pxQueue->xTasksWaitingToReceive ), xTicksToWait );
0x08010dca:   ldr     r1, [sp, #4]
0x08010dcc:   add.w   r0, r4, #36     ; 0x24
0x08010dd0:   bl      0x801168c <vTaskPlaceOnEventList>
1398        				prvUnlockQueue( pxQueue );
0x08010dd4:   mov     r0, r4
0x08010dd6:   bl      0x8010974 <prvUnlockQueue>
1399        				if( xTaskResumeAll() == pdFALSE )
0x08010dda:   bl      0x801139c <xTaskResumeAll>
0x08010dde:   cbnz    r0, 0x8010df0 <xQueueGenericReceive+200>
1401        					portYIELD_WITHIN_API();
0x08010de0:   mov.w   r3, #268435456  ; 0x10000000
0x08010de4:   str.w   r3, [r10]
0x08010de8:   dsb     sy
0x08010dec:   isb     sy
0x08010df0:   movs    r5, #1
1258        		taskENTER_CRITICAL();
0x08010df2:   bl      0x8010434 <vPortEnterCritical>
1260        			const UBaseType_t uxMessagesWaiting = pxQueue->uxMessagesWaiting;
0x08010df6:   ldr     r6, [r4, #56]   ; 0x38
1264        			if( uxMessagesWaiting > ( UBaseType_t ) 0 )
0x08010df8:   cmp     r6, #0
0x08010dfa:   beq.n   0x8010d72 <xQueueGenericReceive+74>
1270        				prvCopyDataFromQueue( pxQueue, pvBuffer );
0x08010dfc:   mov     r1, r8
0x08010dfe:   mov     r0, r4
1268        				pcOriginalReadPosition = pxQueue->u.pcReadFrom;
0x08010e00:   ldr     r5, [r4, #12]
0x08010e02:   bl      0x801094c <prvCopyDataFromQueue>
1272        				if( xJustPeeking == pdFALSE )

stuartbrown wrote on Wednesday, March 21, 2018:

OK, so I was able to trap it in the debugger. Basically, R4 gets loaded from R0, which is the first argument to xQueueGenericReceive, that is QueueHandle_t xQueue. If I add the following line to the start of xQueueGenericReceive if (xQueue == 0x320000) __asm volatile("BKPT #01"); then the debugger stops just before the hard fault is generated. But if I go back in the stack trace, the call has a valid address for xQueue:
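
A more general version of that trap is to check that the handle at least looks like a RAM pointer, rather than comparing against one specific bad value. The sketch below is illustrative only. The RAM bounds are an assumption (64 KB of contiguous SRAM starting at 0x20000000) and should be taken from the linker script of the actual part; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: contiguous SRAM from 0x20000000, 64 KB long. Check these
   against the linker script before relying on them. */
#define ASSUMED_RAM_START 0x20000000u
#define ASSUMED_RAM_END   0x20010000u  /* exclusive upper bound */

/* True if `address` lies inside the assumed RAM window and is word aligned,
   which is the minimum a pointer to a queue structure should satisfy. */
bool is_plausible_ram_pointer(uint32_t address)
{
    return address >= ASSUMED_RAM_START
        && address <  ASSUMED_RAM_END
        && (address & 0x3u) == 0u;
}
```

On the target this could be used as configASSERT( is_plausible_ram_pointer( ( uint32_t ) xQueue ) ); at the top of the function. The corrupted value 0x320000 fails the range check, so the assert would fire before the faulting load.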

Stack:
Thread #1 (Suspended : Signal : SIGTRAP:Trace/breakpoint trap)
xQueueGenericReceive() at queue.c:1,239 0x8010e62
gps_task() at gps.c:196 0x80136f6
uxListRemove() at list.c:238 0x8010374

Looking at gps_task() it contains the following block of code

for (;;)
    {
        uint8_t new_byte;
        BaseType_t status = xQueueReceive(g_gps_uart_queue_handle, &new_byte, portMAX_DELAY);
        if (pdTRUE == status)
        {

Disassembled this gives

196                BaseType_t status = xQueueReceive(g_gps_uart_queue_handle, &new_byte, portMAX_DELAY);
0x080136e6:   movs    r3, #0
0x080136e8:   mov.w   r2, #4294967295
0x080136ec:   add.w   r1, r7, #11
0x080136f0:   ldr     r0, [r4, #0]
0x080136f2:   bl      0x8010d28 <xQueueGenericReceive>
197                if (pdTRUE == status)
0x080136f6:   cmp     r0, #1
0x080136f8:   bne.n   0x80136e6 <gps_task+22>

After the load register instruction at 0x080136f0 (ldr r0, [r4, #0]), r0 = 0x320000, even though r4 = 0x2000000a. So why is r0 wrong?

rtel wrote on Wednesday, March 21, 2018:

Reading this on my phone at the moment so can’t read the code properly. Will look in more detail tomorrow. If you get a chance, post the disassembly of the code at that point too.

stuartbrown wrote on Wednesday, March 21, 2018:

Thanks Richard.

I think the disassembly you asked for is there. It seems something is corrupting the queue handle passed into xQueueReceive(). I have added a watchpoint on it, and the only times it gets touched are at init time and when the queue is created. Yet stepping through in assembly clearly shows it is wrong. It works for a while and then suddenly goes wrong.

I will try another compiler tomorrow, the AC6 System Workbench one.

I’m baffled.

stuartbrown wrote on Wednesday, March 21, 2018:

I have tried the AC6 toolchain v2.4 (http://www.openstm32.org/HomePage). It uses the same compiler version (6.3.1, but a slightly later build) and also generates the hard fault.

./arm-none-eabi-gcc --version
arm-none-eabi-gcc.exe (GNU Tools for ARM Embedded Processors 6-2017-q2-update) 6.3.1 20170620 (release) [ARM/embedded-6-branch revision 249437]
Copyright (C) 2016 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

stuartbrown wrote on Thursday, March 22, 2018:

Thanks for your help, Richard. I found the problem, and as I expected it was in my code. I found an array overrun that was clobbering the queue handle. It seems that the optimisation settings changed what got clobbered when the overrun happened. It was just coincidence that it always clobbered the queue handle for levels -O2 and -O3.
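
For anyone who lands here with a similar symptom, this is a minimal sketch of guarding a fixed-size buffer against that kind of overrun. It is not the code from the project: the buffer size, the type and the function name are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size buffer that tracks how many bytes it holds. */
typedef struct {
    uint8_t data[8];
    size_t  length;
} bounded_buffer_t;

/* Append one byte, refusing the write when the buffer is full instead of
   running past the end of the array into whatever the linker placed next. */
bool bounded_append(bounded_buffer_t *buffer, uint8_t byte)
{
    if (buffer->length >= sizeof buffer->data) {
        return false;  /* full: caller decides whether to drop or report */
    }
    buffer->data[buffer->length++] = byte;
    return true;
}
```

An unchecked write past the end of data would land on whichever variable happens to sit after the array in memory. That placement can change with the optimisation level, which matches the behaviour described above.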