FreeRTOS DualCore stm32h745

Hello fellow coders,

I have run into a problem which I can't seem to solve.

The problem is as follows:
I have an STM32H745XI Discovery board and I am trying to run FreeRTOS on both cores at the same time.
This works while I am debugging with the ST-Link.

The problem arises when I run the application as is (no debugger connected). The CM7 core seems to start correctly and FreeRTOS runs on it (I let a task blink an LED every 1000 ms), but the CM4 core never seems to start its blinking task (different LED, same delay) and seems to be stuck in osKernelStart. I turn the CM4's LED on before the kernel is started; however, the LED is never turned off and does not start blinking.

When I don't start the CM7 kernel and just let it blink in a while loop in main, the CM4 RTOS does work.

I have no idea what the root cause of this problem is and the deadline for the project is in 2 weeks.

Any advice and tips are welcome.

Hmm, this doesn’t necessarily seem to be a FreeRTOS problem as such as the kernel is running on both cores if you run those cores individually. If it is related to FreeRTOS then I would assume it was something to do with sharing hardware resources. Are the programs on the two cores accessing the same resources? For example you say they are both toggling IO pins - are they accessing the same memory addresses to toggle the IO pins, and if so, is there any locking mechanism that might not be used correctly?

Does this project help at all? https://www.freertos.org/2020/02/simple-multicore-core-to-core-communication-using-freertos-message-buffers.html

They don't use the same resources as far as I can tell, except for SRAM3, which is used for the message buffers like in the example you sent. It could be that I didn't set up the MPU for SRAM3 correctly, but could that cause the program to work while debugging but not while it's running normally?

The LEDs are connected to separate ports: one LED is on port B and the other is on port C. So I don't think that is the problem.

Which Core is the master (sets up the clocks etc.)?

Are you using the gated startup code (so that the slave waits until the master has finished the config and notified it with a hardware semaphore)?

I do have a vague recollection of something not being quite right with the startup code in the STM32H7 Cube package. If I get time later I will dig my STM32H745 board out (it's a Nucleo, not a Disco, though).

The Cortex-M7 is the master and the M4 is the slave. The M4 is notified to start with a hardware semaphore after the clock init and the HAL init of the M7.

This is the M7 startup code in main:

MPU_Config();
/* USER CODE BEGIN 1 */
/* USER CODE END 1 */
/* USER CODE BEGIN Boot_Mode_Sequence_0 */
int32_t timeout;
/* USER CODE END Boot_Mode_Sequence_0 */

/* USER CODE BEGIN Boot_Mode_Sequence_1 */
/* Wait until CPU2 boots and enters in stop mode or timeout*/
timeout = 0xFFFF;
while ((__HAL_RCC_GET_FLAG(RCC_FLAG_D2CKRDY) != RESET) && (timeout-- > 0));
if (timeout < 0)
{
	Error_Handler();
}
/* USER CODE END Boot_Mode_Sequence_1 */
/* MCU Configuration--------------------------------------------------------*/

/* Reset of all peripherals, Initializes the Flash interface and the Systick. */
HAL_Init();
/* USER CODE BEGIN Init */
MX_GPIO_Init();
/* USER CODE END Init */

/* Configure the system clock */
SystemClock_Config();
/* USER CODE BEGIN Boot_Mode_Sequence_2 */
/* When system initialization is finished, Cortex-M7 will release Cortex-M4 by means of
 HSEM notification */
/*HW semaphore Clock enable*/
__HAL_RCC_HSEM_CLK_ENABLE();
/*Take HSEM */
HAL_HSEM_FastTake(HSEM_ID_0);
/*Release HSEM in order to notify the CPU2(CM4)*/
HAL_HSEM_Release(HSEM_ID_0, 0);
/* wait until CPU2 wakes up from stop mode */
timeout = 0xFFFF;
while ((__HAL_RCC_GET_FLAG(RCC_FLAG_D2CKRDY) == RESET) && (timeout-- > 0));
if (timeout < 0)
{
	Error_Handler();
}
/* USER CODE END Boot_Mode_Sequence_2 */

/* USER CODE BEGIN SysInit */
MX_GPIO_Init();
/* USER CODE END SysInit */

/* Initialize all configured peripherals */
MX_DMA_Init();
MX_FMC_Init();
MX_DCMI_Init();
MX_JPEG_Init();

/* USER CODE END 2 */

/* Call init function for freertos objects (in freertos.c) */
MX_FREERTOS_Init();
/* Start scheduler */
HAL_GPIO_WritePin(LED_1_PORT, LED_1_PIN, GPIO_PIN_SET);
HAL_Delay(1000);
osKernelStart();

And this is the startup code in the m4:

 /*HW semaphore Clock enable*/
  __HAL_RCC_HSEM_CLK_ENABLE();
  /* Activate HSEM notification for Cortex-M4*/
  HAL_HSEM_ActivateNotification(__HAL_HSEM_SEMID_TO_MASK(HSEM_ID_0));
  /*
  Domain D2 goes to STOP mode (Cortex-M4 in deep-sleep) waiting for Cortex-M7 to
  perform system initialization (system clock config, external memory configuration.. )
  */
  HAL_PWREx_ClearPendingEvent();
  //HAL_PWREx_EnterSTOPMode(PWR_MAINREGULATOR_ON, PWR_STOPENTRY_WFE, PWR_D2_DOMAIN);
  /* Clear HSEM flag */
  __HAL_HSEM_CLEAR_FLAG(__HAL_HSEM_SEMID_TO_MASK(HSEM_ID_0));

/* USER CODE END Boot_Mode_Sequence_1 */
  /* MCU Configuration--------------------------------------------------------*/

  /* Reset of all peripherals, Initializes the Flash interface and the Systick. */
  HAL_Init();

  /* USER CODE BEGIN Init */

  /* USER CODE END Init */

  /* USER CODE BEGIN SysInit */

  /* USER CODE END SysInit */

  /* Initialize all configured peripherals */
  MX_GPIO_Init();
  MX_DMA_Init();
  //MX_USB_OTG_FS_PCD_Init();
  MX_SPI1_Init();
  MX_USART1_UART_Init();
  MX_I2C1_Init();
  MX_SDMMC1_SD_Init();
  MX_FATFS_Init();

  /* Call init function for freertos objects (in freertos.c) */
  MX_FREERTOS_Init();
  /* Start scheduler */
  HAL_GPIO_WritePin(LED_2_PORT, LED_2_PIN, GPIO_PIN_SET);
  HAL_Delay(1000);
  osKernelStart();

I notice the HAL_PWREx_EnterSTOPMode line is commented out. It is needed, otherwise the M4 will run too soon.
Also, are you sure both applications are programmed into flash?

I still have a feeling I have had this problem in the past. I know I fixed it but I can’t remember how. Unfortunately it was proprietary code so I don’t have it available to me now.

Yeah, that line is commented out when we are debugging with the debugger: we pause the M4 before that line and let the M7 run its initialisation before we continue the M4.

If something else I can try comes to mind, please let me know.
Thanks anyway.

I have noticed on my board with that line commented out (in a non-debug build) the M4 runs very slowly (maybe 10 times slower). This can lead to it looking like it is not working as the LED toggles very slowly. This is because the clock has not been initialised before the m4 has started up properly. Double check that the line is not commented out in non-debug builds.

Failing that, if you could post a complete project that shows the problem (preferably in STM32Cube IDE) then I will take a look tomorrow.

Sadly I am not allowed to share the code/project because it contains proprietary code.
The blinking tasks were just for debugging purposes and to demo the problem to my supervisor.

But I will double-check that the line isn't commented out in a non-debug environment. It also gave me the idea that maybe the timeout is too short and that the M7 releases the M4 too soon because of this. I will see if changing this value works.

Just knock together a little standalone program that demonstrates the problem. No need to post any proprietary code.

The timeouts look fine. If it times out it should end up in an error handler anyway.
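
For anyone reading along, the timeout loops in the M7 code boil down to the pattern below. This is only a minimal sketch that runs on a PC: `wait_for_flag`, `flag_fn` and the two fake flag functions are hypothetical names standing in for `__HAL_RCC_GET_FLAG(RCC_FLAG_D2CKRDY)`, they are not part of the HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for __HAL_RCC_GET_FLAG(RCC_FLAG_D2CKRDY). */
typedef bool (*flag_fn)(void);

/* Poll until read_flag() returns `wanted` or the poll budget runs out.
 * Returns true if the flag reached the wanted state, false on timeout
 * (on the target the caller would then end up in Error_Handler()). */
bool wait_for_flag(flag_fn read_flag, bool wanted, int32_t timeout)
{
    assert(read_flag != NULL);
    while ((read_flag() != wanted) && (timeout-- > 0)) {
        /* busy-wait, same shape as the loop in the startup code */
    }
    /* timeout only drops below zero when the budget was used up */
    return timeout >= 0;
}

/* Fake flags so the helper can be tried without hardware. */
int polls_left = 0;
bool flag_set_after_delay(void) { return polls_left-- <= 0; }
bool flag_never_set(void)       { return false; }
```

With a flag that never changes, the helper returns false after the budget is spent, which is the case where the original code calls Error_Handler(). So a timeout that is too short would show up as an error, not as an early release of the M4.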

I am running virtually identical code to you without any issues. I suspect the problem lies elsewhere, either with the low-level startup code or the way FreeRTOS is set up.

RTOS_TEST.zip (1.5 MB)

This is a standalone program without the proprietary code.

I have the code running here but it seems to run OK. I did have to make some minor mods as I am running on the Nucleo board.

I have noticed some odd behaviour on the M4: the loop below was exiting because the values were not NULL but pointed to some invalid memory location, which then caused a hard fault when the invalid pointer was dereferenced:

while ((M7_ctrlHandle == NULL) | (M7_Message_Handle == NULL))

Not sure if this is somehow related to timing.
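
One possible explanation (an assumption, not something confirmed in this thread) is that the shared RAM is not cleared at power-on, so the handles can hold non-NULL garbage before the M7 has written them. A NULL check cannot catch that. Below is a minimal sketch of a stricter check that runs on a PC. `shared_handles_t`, `SHARED_MAGIC` and the three functions are hypothetical names, and the struct only stands in for wherever `M7_ctrlHandle` and `M7_Message_Handle` really live.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SHARED_MAGIC 0xC0DEFEEDu   /* hypothetical "handles are valid" marker */

/* Hypothetical layout of the block both cores see in shared RAM. */
typedef struct {
    uint32_t magic;          /* written last by the M7 */
    void    *ctrl_handle;    /* stands in for M7_ctrlHandle */
    void    *message_handle; /* stands in for M7_Message_Handle */
} shared_handles_t;

/* M7, early in boot and before the M4 is released: wipe stale contents. */
void shared_invalidate(volatile shared_handles_t *s)
{
    assert(s != NULL);
    s->magic = 0u;
    s->ctrl_handle = NULL;
    s->message_handle = NULL;
}

/* M7, once the buffers exist: store the handles, then the marker. */
void shared_publish(volatile shared_handles_t *s, void *ctrl, void *message)
{
    assert(s != NULL && ctrl != NULL && message != NULL);
    s->ctrl_handle = ctrl;
    s->message_handle = message;
    /* Keep the handle stores ahead of the marker store. On the target a
     * __DMB() would play this role; a C11 fence is used here for the PC. */
    atomic_thread_fence(memory_order_release);
    s->magic = SHARED_MAGIC;
}

/* M4: true only when the marker and both handles look valid. */
bool shared_is_ready(const volatile shared_handles_t *s)
{
    assert(s != NULL);
    return (s->magic == SHARED_MAGIC) &&
           (s->ctrl_handle != NULL) &&
           (s->message_handle != NULL);
}
```

On the M4 side the wait would then become something like `while (!shared_is_ready(shared)) { }`, which keeps waiting on garbage values instead of falling through.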

I have now set up the debugger to debug both cores at the same time using ST-Link GDB server debugging. See ST AN5361 for details of how to set up the debugger. All seems to work fine.

If I reset the board after debugging, both cores seem to run OK. That said, after debugging I seem to need to reset the board several times to get it to start up properly.

I do suspect that your problem is something to do with the line above. I would attempt to simplify things further as the code is still somewhat complex.

That's weird. Those 2 handles are located in D3 RAM and assigned on the Cortex-M7 core, like in the example you shared before. Could it be that I made a mistake in the linker script so that the M4 can't read that memory?

Yes it is a bit weird. With the dual core debugging the M4 can see them fine. Also when I have it free running it must be able to see them fine as it does not crash. It’s only when I debug the M4 (with the M7 running) so I suspect that that issue may be debugger / timing.

I think the way forward would be to strip it back further so that each processor has 1 task that flashes the LED. Then start adding stuff back in to locate the issue.

Also set up dual-core debugging, as it can then debug both cores at the same time and you don’t need to comment out the power-down line in the M4.

I debug both cores with OpenOCD. The problem with that line of code is that the debugger can't reach the M4 while it's in sleep mode and then crashes.

I might have found the problem.
The example initialises the M7 handles before releasing the M4 and I don't do that; this might be why the memory is invalid. I can't test this right now because I don't have the hardware with me at the moment.

I think that if I move the init_messager call to after the HAL init of the M7, that could solve the problem.
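
To make the intended order concrete, here is a rough sketch that runs on a PC. Everything in it is a stand-in: `init_messager` here is a dummy for the project's function of that name, `release_m4` replaces the HSEM take/release pair, and the two handle variables are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Dummy storage and handles; on the target these would live in shared RAM. */
static int ctrl_storage, message_storage;
void *ctrl_handle = NULL;
void *message_handle = NULL;
bool m4_released = false;

/* Dummy for the project's init_messager(); the real one would create
 * the message buffers and store their handles. */
void init_messager(void)
{
    ctrl_handle = &ctrl_storage;
    message_handle = &message_storage;
}

/* Stand-in for the HSEM take/release pair that wakes the M4. */
void release_m4(void)
{
    /* Catches a wrong boot order during development. */
    assert(ctrl_handle != NULL && message_handle != NULL);
    m4_released = true;
}

void m7_boot(void)
{
    /* HAL_Init() and SystemClock_Config() would run here on the target. */
    init_messager();   /* publish the handles first ... */
    release_m4();      /* ... and only then let the M4 run */
}
```

The assert in `release_m4` is just a development aid: if the handles are not set when the M4 is about to be released, it fails right away instead of leaving the M4 to read invalid memory.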

It seems that if I remove the CM4 handles and only send data from the CM7 to the CM4, it works normally; as soon as the CM4 starts sending data back, it stops working.

I have a feeling that somewhere in my functions the memory gets corrupted.
We solved this by adding hardware semaphores around the sending and receiving functions.
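
The final code is not shown in this thread, so the following is only a sketch of the idea, written so it runs on a PC. The hardware semaphore is simulated with a C11 atomic, and `hsem_try_take`, `hsem_release`, `mailbox_t`, `guarded_send` and `guarded_receive` are all hypothetical names. On the target the take and release would presumably map to `HAL_HSEM_FastTake()` and `HAL_HSEM_Release()` as used in the startup code, and the mailbox stands in for the message buffers in shared RAM.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* PC stand-in for one HSEM channel: 0 = free, otherwise the owner's core ID. */
atomic_uint hsem_sim;

/* Non-blocking take: succeeds only if the semaphore is currently free. */
bool hsem_try_take(unsigned core_id)
{
    unsigned expected = 0u;
    assert(core_id != 0u);
    return atomic_compare_exchange_strong(&hsem_sim, &expected, core_id);
}

/* Release: only the core that holds the semaphore can free it. */
void hsem_release(unsigned core_id)
{
    unsigned expected = core_id;
    atomic_compare_exchange_strong(&hsem_sim, &expected, 0u);
}

/* Hypothetical one-slot mailbox standing in for a shared message buffer. */
typedef struct {
    uint8_t data[32];
    size_t  len;      /* 0 means empty */
} mailbox_t;

/* Copy a message in while holding the semaphore; false if busy or full. */
bool guarded_send(mailbox_t *box, unsigned core_id, const void *msg, size_t len)
{
    assert(box != NULL && msg != NULL);
    if (len == 0 || len > sizeof box->data) return false;
    if (!hsem_try_take(core_id)) return false;      /* other core is inside */
    bool ok = (box->len == 0);                      /* slot must be empty */
    if (ok) {
        memcpy(box->data, msg, len);
        box->len = len;
    }
    hsem_release(core_id);
    return ok;
}

/* Copy a message out while holding the semaphore; returns bytes read. */
size_t guarded_receive(mailbox_t *box, unsigned core_id, void *out, size_t cap)
{
    assert(box != NULL && out != NULL);
    size_t n = 0;
    if (!hsem_try_take(core_id)) return 0;          /* other core is inside */
    if (box->len != 0 && box->len <= cap) {
        memcpy(out, box->data, box->len);
        n = box->len;
        box->len = 0;                               /* mark slot empty */
    }
    hsem_release(core_id);
    return n;
}
```

The point of the design is that neither core touches the shared data unless it holds the semaphore, so a send from one core cannot interleave with a send or receive on the other. In a real task the caller would retry, or block with a short delay, when a call reports that the semaphore is busy.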