Cortex M3 Hard Faults - xTimerCreate

smith84 wrote on Friday, May 03, 2013:

Hi everyone,

I have a system which is running fine with 2 tasks and 2 timers with FreeRTOS 7.4.0 on a Cortex-M3 (Energy Micro EFM32LG232). Compiler is GNU Tools ARM Embedded 4.7 2012q4. Tickless idle has been enabled and is implemented using the on-chip RTC.

I have added a new module which uses another timer, and now the system hard faults straight away when the OS starts.

I have commented out as much code as I can, but it appears to be the call to xTimerCreate (which happens before OS Startup) which makes all the difference. Even if the timer is never started, the micro will hit a hard fault after the OS is started. Commenting out the new call to xTimerCreate stops the hard fault from occurring.

The HardFault handler from has been copied in.

When the hard fault occurs,
- The “Tmr Svc” task is running (according to the Task Table in Eclipse)
- HFSR.FORCED = 1 (all other bits 0)
- CFSR.INVSTATE = 1 (all other bits 0)
- The stacks do not appear to have overflowed as there is plenty of space filled with the poison value.
- pulFaultStackAddress = 0x20007fc0
- stacked_r0 = 0
- stacked_r1 = 0
- stacked_r2 = 0x10000000
- stacked_r3 = 0xE000ED04
- stacked_r12 = 0xA5A5A5A5
- stacked_lr = 0xFFFFFFFD
- stacked_pc = 0xbe00 (no instructions here, it is all 0xFF)
- stacked_psr = 0x6000000E (PendSV ISR active?)
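
For anyone reading the values above, here is a minimal sketch of how the stacked frame and fault status bits can be decoded. The helper names are hypothetical (not from FreeRTOS or the posted handler); the bit positions are the Cortex-M3 ones: IPSR in xPSR bits 8:0, the Thumb bit at xPSR bit 24, INVSTATE at CFSR bit 17 and FORCED at HFSR bit 30. It is an aid for interpreting the numbers, not a diagnosis.

```c
#include <stdint.h>

/* Order in which the Cortex-M3 stacks registers on exception entry. */
enum { FRAME_R0, FRAME_R1, FRAME_R2, FRAME_R3, FRAME_R12, FRAME_LR, FRAME_PC, FRAME_PSR };

#define XPSR_IPSR_MASK   0x1FFUL          /* active exception number */
#define XPSR_T_BIT       ( 1UL << 24 )    /* Thumb state bit */
#define CFSR_INVSTATE    ( 1UL << 17 )    /* UFSR.INVSTATE as seen through CFSR */
#define HFSR_FORCED      ( 1UL << 30 )    /* escalated configurable fault */

/* Exception number active when the frame was stacked (0 means thread mode). */
static uint32_t ulActiveException( const uint32_t *pulFrame )
{
    return pulFrame[ FRAME_PSR ] & XPSR_IPSR_MASK;
}

/* Non-zero if the stacked xPSR still has the Thumb bit set. */
static int xThumbBitSet( const uint32_t *pulFrame )
{
    return ( pulFrame[ FRAME_PSR ] & XPSR_T_BIT ) != 0UL;
}

/* Non-zero if the value has the 0xFFFFFFFx pattern of an EXC_RETURN code. */
static int xIsExcReturn( uint32_t ulValue )
{
    return ( ulValue & 0xFFFFFFF0UL ) == 0xFFFFFFF0UL;
}

/* Non-zero if the fault is an INVSTATE usage fault escalated to a hard fault. */
static int xIsForcedInvState( uint32_t ulHfsr, uint32_t ulCfsr )
{
    return ( ( ulHfsr & HFSR_FORCED ) != 0UL ) && ( ( ulCfsr & CFSR_INVSTATE ) != 0UL );
}
```

Applied to the values listed above, the active exception number is 14 (PendSV), the stacked LR is an EXC_RETURN code rather than a code address, and the Thumb bit in the stacked xPSR is clear, which is the condition INVSTATE reports.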

I have analysed the memory contents before and after the call to xTimerCreate and nothing there appears to be writing anything where it shouldn’t.

Not sure what is going on here. Funny thing is that the error that causes the hard fault looks like it is coming from the SVC Handler ISR.

I hope that someone else here who has used the CM3 with Timers on FreeRTOS can make some suggestions. Many thanks!

rtel wrote on Friday, May 03, 2013:

You say the code was running fine, and that this problem started when you added more modules.  Could it be that the added modules have exhausted your FreeRTOS heap, and that something you attempted to create was not created, resulting in a null pointer dereference?

If that is a possibility then you can check the return value of each API call used to create a task, queue, timer, semaphore or mutex to trap any allocation failures. A blunter but simpler method is to define a malloc failed hook and see whether it ever gets called: set configUSE_MALLOC_FAILED_HOOK to 1 in FreeRTOSConfig.h, then define a function with the prototype

void vApplicationMallocFailedHook( void )

and put a break point in the function to see if it is ever hit.
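
As a rough sketch of the first suggestion (checking every create call), a small wrapper can record which object failed to allocate. Everything here is hypothetical bookkeeping, not part of the FreeRTOS API, and it assumes handles can be passed as void pointers, as is the case for the handle typedefs in the 7.x releases.

```c
#include <stddef.h>

/* Hypothetical bookkeeping, inspected from the debugger after start-up. */
static const char *pcFirstFailedObject = NULL;  /* label of the first failed create */
static unsigned int uxFailedCreateCount = 0;    /* total number of failed creates */

/* Wraps the handle returned by a create call. A NULL handle is counted and
   the label of the first failure is remembered. The handle is returned
   unchanged so existing assignments can stay as they are. */
static void *pvCheckCreated( void *pvHandle, const char *pcLabel )
{
    if( pvHandle == NULL )
    {
        if( uxFailedCreateCount == 0U )
        {
            pcFirstFailedObject = pcLabel;
        }
        uxFailedCreateCount++;
    }

    return pvHandle;
}
```

In use, the result of each create call (for example the new module's xTimerCreate call) would be passed through pvCheckCreated with a descriptive label, and uxFailedCreateCount checked before the scheduler is started.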

If the error comes from within the tickless implementation then I’m afraid I can’t help as you provided that yourself, but there are two example implementations in the V7.4.1 download that you can use as a template (and presumably have already used as a template).  Note there was an error in the SAM4L and RX100 tickless code in the line:

	ulAlarmValue = ( ulAlarmValueForOneTick * ( xExpectedIdleTime - 1UL ) );

which should have been simply:

	ulAlarmValue = ulAlarmValueForOneTick * xExpectedIdleTime;

Otherwise the partially complete tick period is compensated for twice.
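
To put numbers on the difference between the two lines, here is a small sketch that wraps each expression in a function. The function names and the example figures are made up for illustration; only the two expressions come from the code quoted above.

```c
#include <stdint.h>

/* The expression as originally shipped: one tick's worth of counts is dropped. */
static uint32_t ulAlarmValueOriginal( uint32_t ulAlarmValueForOneTick, uint32_t xExpectedIdleTime )
{
    return ( uint32_t ) ( ulAlarmValueForOneTick * ( xExpectedIdleTime - 1UL ) );
}

/* The corrected expression: the full expected idle time is converted to counts. */
static uint32_t ulAlarmValueCorrected( uint32_t ulAlarmValueForOneTick, uint32_t xExpectedIdleTime )
{
    return ulAlarmValueForOneTick * xExpectedIdleTime;
}
```

With an assumed 32 timer counts per tick and an expected idle time of 10 ticks, the original line gives 288 counts and the corrected line gives 320, so the two differ by exactly one tick period.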


smith84 wrote on Friday, May 03, 2013:

The malloc failure hook is currently implemented with an endless loop. I am therefore quite sure that lack of heap is not the issue. Oddly I have been able to call xTimerCreate in another place before OS start and it seems to be fine (probably should have mentioned that!)

I probably should try disabling tickless idle to see if that makes a difference. I will do that after the Bank Holiday and post results. You are correct that the tickless idle functionality is custom, but it was done based on the documentation (before 7.4.1 was released).

smith84 wrote on Tuesday, May 07, 2013:

Unfortunately disabling the tickless idle has not made any change. I will have to debug further into this. Quite frustrating.

rtel wrote on Tuesday, May 07, 2013:

Ok - it is good that the tickless idle implementation has been eliminated as a cause.  From your original post it appears that the new module you added is key, either because there is something in the implementation of the module itself that is causing some kind of corruption, or because the act of adding the code into your build is shifting code in flash or ram around, and that is highlighting a problem.

Are you 100% sure your linker script is correct for the device being programmed?

Before delving into the new module, have you checked that removing the new module again gets you back to a working state?

How big is the new module?  Are you able to post its code here?


smith84 wrote on Tuesday, May 07, 2013:

I have just double-checked the linker script and all appears well, certainly the memory sizes are correct. (I have used the unmodified EFM32LGxxx series linker file from Energy Micro.)

After some more detailed analysis, I am no longer certain that all is well even when the new timer call is left out. At this stage I have commented out the offending module’s Init function, so there may be a memory size influence, but there should certainly be no new code running.

I have done some step-by-step debugging to try and understand what is going on during the OS startup through the prvPortStartFirstTask and vPortSVCHandler calls, and would like to ask the following (possibly silly) question:
Should the port.c code initialise the priority of the SVCall Exception? I see that it sets the priority of the PendSV and SysTick (though it probably shouldn’t do SysTick if tickless idle is active), but could not see anything for SVCall and so its priority remains at 0 (highest). Although I guess if my reading is correct it is only used to set up the first task and so would probably not make a difference.

rtel wrote on Tuesday, May 07, 2013:

In that port SVC is only used to start the scheduler, so its priority does not matter.
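
For reference, a minimal sketch of where these priority fields sit. The helper names are hypothetical; the field positions are the ARMv7-M ones: SVCall in bits 31:24 of SHPR2 (0xE000ED1C), PendSV in bits 23:16 and SysTick in bits 31:24 of SHPR3 (0xE000ED20). The functions work on plain values so they can be tried off-target.

```c
#include <stdint.h>

#define SHPR2_SVCALL_SHIFT    24U   /* SVCall priority field in SHPR2 */
#define SHPR3_PENDSV_SHIFT    16U   /* PendSV priority field in SHPR3 */
#define SHPR3_SYSTICK_SHIFT   24U   /* SysTick priority field in SHPR3 */

/* Extracts the SVCall priority from a value read from SHPR2. */
static uint8_t ucSVCallPriority( uint32_t ulShpr2 )
{
    return ( uint8_t ) ( ulShpr2 >> SHPR2_SVCALL_SHIFT );
}

/* Returns the SHPR3 value with the same priority ORed into both the PendSV
   and SysTick fields, leaving the remaining bits as they were. */
static uint32_t ulWithKernelPriorities( uint32_t ulShpr3, uint8_t ucPriority )
{
    ulShpr3 |= ( ( uint32_t ) ucPriority ) << SHPR3_PENDSV_SHIFT;
    ulShpr3 |= ( ( uint32_t ) ucPriority ) << SHPR3_SYSTICK_SHIFT;
    return ulShpr3;
}
```

A reset value of 0 in SHPR2 decodes to an SVCall priority of 0, which matches the observation that it stays at the highest priority when nothing writes to it.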


smith84 wrote on Wednesday, May 08, 2013:

I have inadvertently managed to reproduce the fault in another way without the new module while trying to work back to a fully stable system.
It seems that my Tickless Idle implementation might be contributing after all - the fault occurs when the tickless idle is enabled and disappears when it is disabled.
Oddly, the stacked LR, PC and IPSR are all the same as in the previous incarnation. The PC is particularly strange, as there are no valid instructions at its location.
Will post more once I have figured something out.

smith84 wrote on Thursday, May 09, 2013:

The problem went away after commenting out a register sync spin loop that runs after the RTC is configured.
Once the system was stable I built the troublesome module back in, and all seems to work again.
Putting the loop back in as an experiment did not cause the fault to recur, so I am just going to have to watch and see if the problem comes back.
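
One way to stop a sync loop like this from spinning forever while debugging is to bound the number of polls. This is only a sketch with hypothetical names; the register and mask to poll are not given in this thread and would have to come from the device reference manual.

```c
#include <stdint.h>

/* Polls the register at pulSyncBusy until every bit in ulMask reads as zero,
   giving up after ulMaxPolls reads. Returns 1 if the bits cleared in time
   and 0 if the poll budget ran out first. */
static int xWaitForSync( const volatile uint32_t *pulSyncBusy, uint32_t ulMask, uint32_t ulMaxPolls )
{
    while( ulMaxPolls > 0U )
    {
        if( ( *pulSyncBusy & ulMask ) == 0U )
        {
            return 1;
        }

        ulMaxPolls--;
    }

    return 0;
}
```

A zero return can then be trapped with a break point, which at least shows whether the loop is where the system is getting stuck.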

Thanks again for your help, Richard. I have certainly learned a lot about how FreeRTOS and the Cortex-M3 work during this process. If I eventually get to the bottom of the problem I will try and post as much info as I can here.