STM32WB55 FLASH_SR_CFGBSY never clears when using FreeRTOS and TIM1

EvanZ · April 13, 2021, 4:57pm

Hello,

I am using the STM32WB55 Nucleo board in combination with FreeRTOS to create a Bluetooth application. Once the project got quite large and all peripherals were configured, I noticed that the FLASH_SR_CFGBSY bit was getting set and was never clearing. This prevents CPU2 from triggering the interrupt that signals it is ready, so the Bluetooth-specific application code on CPU1 never starts.

I have been talking with STMicroelectronics on this issue and they believe they have narrowed it down to an issue in FreeRTOS when using TIM1:

I tested adding TIM1 from both BLE_HeartRateFreeRTOS and BLE_HeartRate examples.
The BLE_HeartRateFreeRTOS example that uses the FreeRTOS v10 had an issue when adding TIM1.
Whereas the BLE_HeartRate example, which uses a sequencer provided from the STM32CubeWB package, doesn’t generate the FLASH_SR_CFGBSY bit toggling issue.
It doesn’t appear that this TIM1 issue occurs if FreeRTOS is removed, so it seems that the issue has to do with FreeRTOS.
The ticket within ST has been escalated and is being looked at by the firmware developers supporting FreeRTOS as it does appear to be the underlying issue.

I am hoping that since they believe the issue is with FreeRTOS, someone here might have an idea of what is causing this problem.

Thanks!

hs2 · April 13, 2021, 7:11pm

I don’t think that this rather strange effect is caused by FreeRTOS in itself, but I found this interesting post in the STM32 MCU forum.
Seems that certain programming bugs (e.g. writing to address 0 and upwards) cause FLASH_SR_CFGBSY to be set on an STM32WB55…
It would be better if accessing/writing to a NULL pointer caused a far less misleading hard fault.
Maybe you stumbled into a similar issue accessing a not yet initialized HAL (peripheral) handle or it’s something different…
I guess you’re using TIM1 as FreeRTOS ticker and not the SysTick (probably used by HAL), right?
Also see this interesting thread here in the forum. It seems to me that there are still a few issues running Cube-generated FreeRTOS apps on the STM32WB55.

EvanZ · April 13, 2021, 8:05pm

Thanks for the response @hs2 !
I am using TIM16 as the SYS Timebase Source since it was recommended not to use SysTick as the timebase when using FreeRTOS. I think the default FreeRTOS Heart Rate demo uses TIM17. FreeRTOS #undef’s SysTick_Handler and redefines it itself in cmsis_os2.c.
I am not using the WDT and I am not seeing any issues with the systick not counting like those other posts you mentioned.
It is a very strange problem because simply moving some code around (like replacing a function call with the contents of that function, even if the code isn’t called when the issue happens) or changing optimization levels while debugging can cause the issue to temporarily “go away.” However, that’s not really a solution to the problem since I have no way of knowing when it will pop up again…
Thanks again!

hs2 · April 13, 2021, 8:56pm

Ok - the weird behavior you describe would be a strong indication for me of the most common cause of such behavior - a stack overflow. Or an inappropriate interrupt priority of an ISR using FreeRTOS API calls.
Please see e.g. these hints, which will hopefully help you nail it down.

EvanZ · April 13, 2021, 9:27pm

I have defined configCHECK_FOR_STACK_OVERFLOW as 2 in FreeRTOSConfig.h and provided an implementation for vApplicationStackOverflowHook. I have caught a few stack overflows when developing my application this way; however, it does not break in this function when the FLASH_SR_CFGBSY issue occurs and all of my other application code and threads run fine, just the CPU2 interrupt doesn’t occur. One thing to note is if a build doesn’t work, then it never works. If it works, then it always works. From the documentation that I have read, it appears that any interrupt that calls FreeRTOS functions on the STM32WB needs to be priority 5 or lower (“lower” being a larger number). The interrupt that would normally occur when CPU2 is ready is set at priority 6; however, if I put a breakpoint in that interrupt function, it never hits when the FLASH_SR_CFGBSY issue occurs, so it doesn’t even get a chance to call the FreeRTOS function in there. I tried increasing the priority (e.g. prio 4) and the FreeRTOS function configASSERT’s as expected.
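The priority rule described above can be sketched as a small host-side check. This is only an illustration under stated assumptions: the constant mirrors the value that configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY is assumed to have in this project (5, per the description above), and isr_may_call_freertos is a hypothetical helper, not a FreeRTOS API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY in
 * FreeRTOSConfig.h for this project; the value 5 comes from the post. */
#define MAX_SYSCALL_PRIORITY 5u

/* Hypothetical helper: on Cortex-M a numerically larger value means a
 * logically lower priority, so an ISR may use the FreeRTOS "FromISR"
 * API only if its priority number is >= the max-syscall priority. */
static bool isr_may_call_freertos(uint32_t nvic_priority)
{
    return nvic_priority >= MAX_SYSCALL_PRIORITY;
}
```

With these assumptions, priority 6 (the CPU2-ready interrupt) passes the check, while priority 4 fails it, which matches the configASSERT seen when the priority was raised.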

Certainly a stumper of an issue…

Thanks again for the reply!

EvanZ · April 26, 2021, 2:44pm

Adding __disable_irq(); before all of the MX_xxxx_Init(); functions in main.c and __enable_irq(); after the init functions appears to make the FLASH_SR_CFGBSY issue go away, but without a root cause I cannot have a high degree of confidence that this doesn’t just hide the issue again.

I will continue to monitor with this change in my project to see if it comes back.

EvanZ · April 29, 2021, 7:36pm

After 3 days, the issue returned. Still looking for solutions.

jefftenney · April 30, 2021, 6:18pm

Hopefully ST will get some answers for you. I would guess the issue is theirs, even though it happens only in the FreeRTOS version of their application.

One “long shot” worth trying is to set a data-write breakpoint at location zero. It sounds like maybe some code is writing with a null pointer, and maybe you could catch wherever that is happening.

EvanZ · May 5, 2021, 7:32pm

Thanks for the reply @jefftenney, I added a watchpoint for “read” and “write” on address 0. The execution breaks in prvPortStartFirstTask() both when the issue occurs and when it doesn’t. It does not break at all if I just have the watchpoint set to “write” on address 0. I am not super familiar with watchpoints, so hopefully I set it up correctly.

Side note, the issue may not have come back after disabling/enabling interrupts. I just noticed that I mistakenly added some code which was writing to address 0 (assigning a value to null address) which might be why I saw the issue come back after a few days (writing address 0 sets the busy bit). After fixing that problem in my code, I have not seen the issue occur. So maybe disabling interrupts does solve the problem for now, but it is difficult to tell. All I know right now is that if I remove the __disable_irq() and __enable_irq() then the issue returns, but I do not see a write data break on address 0.

RAc · May 5, 2021, 7:46pm

Is there an assembly window aside from the source window you would want to share with us?

EvanZ · May 5, 2021, 7:52pm

Here is the window showing the disassembly view

jefftenney · May 5, 2021, 7:52pm

I can’t tell if you set the “write” breakpoint correctly. You successfully captured a “read” at location zero (expected), but apparently you didn’t capture a “write” at location zero, even when your code explicitly (accidentally) wrote to location zero. If you can get the write-to-address-zero breakpoint working (verify with an intentional write to address zero), then it might give you some confidence that you’ll catch the error you are searching for.

EvanZ · May 5, 2021, 8:00pm

Thanks, the write of address 0 watchpoint is what led me to catch the bug in my code which made it appear as though the issue returned (I am actually quite surprised this did not trigger a hard fault in the debugger). It did break in that case. However, after fixing that problem I do not see any more breaks on the write of address 0, despite the issue returning if I comment out the enable/disable irq. With that code enabled, I am not seeing the issue return so it is hard to say what is really causing it to happen… So maybe a false alarm. I am hoping that it doesn’t return again since ST hasn’t responded to me in weeks. I also plan on changing my board to not use TIM1 on the chance that they were correct in saying that TIM1 was the issue. I will update if I ever hear back from them or if the issue returns again.

RAc · May 5, 2021, 9:54pm

One thing you could do is reduce the disable/enable pair to subsequently smaller pieces of your init sequence until you have pinpointed the exact code that must be protected for the issue to go away. That’ll probably give you a strong hint where to look closer.

EvanZ · May 10, 2021, 8:51pm

So ST has found that the TIM1_UP_TIM16_IRQHandler is firing before the TIM1 peripheral is initialized, which triggers an access through htim1.Instance while it is still null. This could explain why disabling all interrupts around the Init() functions makes the issue go away. They also say this has nothing to do with FreeRTOS as they can replicate the issue without enabling it in their project generation. I set a breakpoint on HAL_TIM_IRQHandler(&htim1); in TIM1_UP_TIM16_IRQHandler and confirmed that htim1.Instance is null when the issue occurs. They recommended adding a null check while they continue to investigate why the interrupt is firing before the peripheral is initialized. I am surprised this was not caught by the watchpoint.

EG:

void TIM1_UP_TIM16_IRQHandler(void)
{
  /* USER CODE BEGIN TIM1_UP_TIM16_IRQn 0 */

  /* USER CODE END TIM1_UP_TIM16_IRQn 0 */
  if (htim1.Instance != NULL)
  {
    HAL_TIM_IRQHandler(&htim1);
  }
  /* USER CODE BEGIN TIM1_UP_TIM16_IRQn 1 */

  /* USER CODE END TIM1_UP_TIM16_IRQn 1 */
}

Thanks everyone for all the help! I suspect there will be some change to ST code generation in the future to resolve this, but if anyone else sees this happening I hope the above snippet helps you out!
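One possible reason the address-0 watchpoint stayed silent, offered as an assumption rather than something ST confirmed in this thread: HAL_TIM_IRQHandler accesses timer registers such as SR and DIER through Instance, and those sit at an offset from the base address, so a null Instance would touch addresses slightly above 0 rather than 0 itself. The host-side sketch below only illustrates that arithmetic; MockTimRegs is a hypothetical mirror of the first few TIM registers (the real type is TIM_TypeDef) and both helpers are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first TIM registers, in the order the STM32
 * reference manuals list them. Not the real TIM_TypeDef. */
typedef struct {
    volatile uint32_t CR1;   /* offset 0x00 */
    volatile uint32_t CR2;   /* offset 0x04 */
    volatile uint32_t SMCR;  /* offset 0x08 */
    volatile uint32_t DIER;  /* offset 0x0C */
    volatile uint32_t SR;    /* offset 0x10 */
} MockTimRegs;

/* Address that is touched when a register at `reg_offset` is accessed
 * through a peripheral base pointer whose numeric value is `instance`. */
static uintptr_t accessed_address(uintptr_t instance, size_t reg_offset)
{
    return instance + reg_offset;
}

/* Address of SR as seen through a null Instance pointer. */
static uintptr_t null_instance_sr_address(void)
{
    return accessed_address(0u, offsetof(MockTimRegs, SR));
}
```

If that assumption holds, a watchpoint covering a small range starting at address 0 (for example the first 32 bytes), rather than the single word at 0, would be more likely to catch this kind of access. Whether the debugger in use supports range or masked watchpoints is something to check in its own documentation.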