How to find and solve race condition problems

Manfred-O · May 26, 2020, 7:43am

Hi,
my system has 4 tasks which read out sensor data and fill a large buffer. Once a certain amount has been collected, I send it via a queue to my transmitting task, which sends the data to the remote host in DMA mode, with an interrupt to notify completion. I timed all tasks so that their buffers fill up in 1 s, so all of them finish close together. I also receive some configuration data and other things, and send back an acknowledge using the same process. My queue holds just the pointer to those buffers and the length. I am not using FreeRTOS directly but the wrappers implemented by ST in their STM32CubeIDE.

Now the problem. When the system starts, everything works fine and I can see all the data coming from the device on my remote host. But after some time, on the order of hours, one of the sensors seems to stop sending data. Digging further, I could (I hope) establish that those 4 sensor tasks are still running constantly. So my guess is that something goes wrong between pushing values to the queue and receiving them in my transmit task. But what could it be? All of my sensor tasks and the receiving task have a higher priority than my transmitting task.

Here is my transmitting task:

void StartTask04(void *argument)
{
  /* USER CODE BEGIN StartTask04 */
	  uint32_t ulNotificationValue;
	  const TickType_t xMaxBlockTime = pdMS_TO_TICKS(500);
	  struct DMAMessage xRxedMessage;

	  xTaskToNotify = xTaskGetCurrentTaskHandle();
	  /* Infinite loop */
	  for(;;)
	  {
			if(osOK == osMessageQueueGet(myQueue01Handle, &(xRxedMessage), 0, osWaitForever))
			{

#ifdef SER2COP
				COP_ACT_LED_On;
				HAL_UART_Transmit_DMA(&huart3, (uint8_t *)xRxedMessage.tx_dma,xRxedMessage.tx_dma_length);
#else
				if(HAL_BUSY == HAL_UART_Transmit_DMA(&huart2, (uint8_t *)xRxedMessage.tx_dma,xRxedMessage.tx_dma_length))
				{

				}

#endif

				ulNotificationValue = ulTaskNotifyTake(pdTRUE,xMaxBlockTime);
				if(ulNotificationValue == 1)
				{
#ifndef SER2COP
					if(xRxedMessage.tx_ID == 0x10)
						COP_ACT_LED_Toggle;
					if(xRxedMessage.tx_ID == 0x30)
						COP_MOBILE_LED_Toggle;
					if(xRxedMessage.tx_ID == 0x20)
						COP_CPU_HEARTBEAT_LED_Toggle;
					if(xRxedMessage.tx_ID == 0x40)
						COP_STATUS_LED_Toggle;
#else
						COP_ACT_LED_Off;
#endif
				}
			}
		osDelay(20);
	  }
  /* USER CODE END StartTask04 */
}

Any help on this topic would be appreciated.

rtel · May 26, 2020, 4:18pm

Before I make any suggestions I want to ensure I understand your post correctly. Is this right?

You say you have four separate tasks that send data on a queue to a single transmitting task. After some time all four tasks that send data to the queue are still running, but you only see data from three tasks being transmitted?

Manfred-O · May 27, 2020, 7:15am

Yes that is correct.

But it varies randomly which one is affected on a given system. I will check again to be sure. My queue just holds the pointer and length of the data buffers. For each task I have 2 data buffers, so while one is being filled the second is transmitted. Because of that I don't think the data gets overwritten somehow, and my host should capture it correctly.

For me it is hard to chase a problem which does not show up frequently. Those systems are in the field for long-term testing. There is a workaround, just resetting, but it is not the way I would like to handle it.

Maybe the concept is wrong. I have 4 sensor tasks and one receiver task which uses the queue too. Those are periodic tasks: one has a 2 ms period, the rest 100 ms in their fastest mode. So the transmitting task will always have time to execute. The queue has a length of 8 items. I thought of using the queue as a FIFO, so it stores the data while my transmission is ongoing. If something is in the queue, say two or three items, then my transmission task picks one item, starts sending via DMA, waits until it is notified by the DMA TX-complete interrupt, then waits a little while and repeats. The transmissions of all tasks combined do not last longer than 400 ms.

hs2 · May 27, 2020, 7:35am

Do you rely on meeting the deadlines and delays for managing the double buffers?
Although this might be doable with an RTOS, I think a synchronized system is more robust.
In other words, I'd add a confirmation or a reliable mark for transmitted buffers, so that the sender tasks can be sure to grab the next free/unused buffer, or at least can raise an assert if the expected buffer is still in use/not yet transmitted.
Just an idea…
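A minimal sketch of that buffer-marking idea, written as plain C so it can be tried on a host. All names here (`buf_state_t`, `sensor_buffers_t`, `buffers_init`, `buffers_swap`, `buffers_confirm`, `BUF_SIZE`) are hypothetical and not from the original project. It also assumes single-threaded access; on the target the state updates would need protection, for example a critical section, because they are shared between the sensor task and the transmit task.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not from the original project. */
typedef enum { BUF_FREE = 0, BUF_FILLING, BUF_QUEUED } buf_state_t;

#define BUF_SIZE 64u                /* assumed size, pick what the sensor needs */

typedef struct {
    uint8_t     data[2][BUF_SIZE];  /* double buffer owned by one sensor task */
    buf_state_t state[2];           /* who currently owns each half */
    size_t      active;             /* index of the half being filled */
} sensor_buffers_t;

void buffers_init(sensor_buffers_t *b)
{
    b->state[0] = BUF_FILLING;
    b->state[1] = BUF_FREE;
    b->active = 0;
}

/* Called by the sensor task when the active half is full.
   Returns the index to hand to the transmitter, or -1 if the other half
   has not been confirmed yet (the caller can assert or count an error). */
int buffers_swap(sensor_buffers_t *b)
{
    size_t next = b->active ^ 1u;
    if (b->state[next] != BUF_FREE) {
        return -1;                  /* transmitter still owns the other half */
    }
    size_t done = b->active;
    b->state[done] = BUF_QUEUED;    /* ownership passes to the transmitter */
    b->state[next] = BUF_FILLING;
    b->active = next;
    return (int)done;
}

/* Called by the transmit task once the DMA-complete notification arrived. */
void buffers_confirm(sensor_buffers_t *b, int index)
{
    assert(index == 0 || index == 1);
    assert(b->state[index] == BUF_QUEUED);  /* catches double confirmation */
    b->state[index] = BUF_FREE;
}
```

With this in place, a `-1` from `buffers_swap` is the point where a deadline was missed, which is easier to catch than data that silently goes missing on the host.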

Manfred-O · May 27, 2020, 8:25am

Yes, I have some constraints such as continuity. And yes, I could mark those buffers when they are filled and ready to send, and acknowledge them after successful sending; it is also a fine idea for debugging. There are 8 buffers for 4 tasks; I hope there are no collisions in that part.

richard-damon · May 27, 2020, 11:04am

I would make sure that the sending tasks don’t use a ‘wait forever’ but a finite wait time, and report an error if the queue gets full (since by your analysis that shouldn’t happen). It sounds like your issue is that the queue does get full, at which point, if one task has a lower priority, it might get starved in its use of the queue if there is always a higher priority task ready to send data or to run.
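To illustrate the "finite wait and report" idea, here is a small host-side model of a bounded queue that counts every rejected put. It is only a sketch: `model_queue_t`, `model_queue_try_put`, `model_queue_get` and the field types of `dma_message_t` are assumptions (the field names are taken from the posted task, the depth of 8 from the description). On the target, the equivalent would presumably be calling `osMessageQueuePut` with a finite timeout and counting every return value other than `osOK`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define QUEUE_DEPTH 8u   /* the thread mentions a queue of 8 items */

/* Field names follow the posted task; the types are assumptions. */
typedef struct {
    const uint8_t *tx_dma;
    uint16_t       tx_dma_length;
    uint8_t        tx_ID;
} dma_message_t;

/* Hypothetical host-side model of the queue, not a FreeRTOS object. */
typedef struct {
    dma_message_t items[QUEUE_DEPTH];
    size_t        head;         /* index of the oldest item */
    size_t        count;        /* number of items stored */
    uint32_t      full_errors;  /* how often a put was rejected */
} model_queue_t;

/* Non-blocking put: on a full queue, record the error instead of waiting. */
bool model_queue_try_put(model_queue_t *q, const dma_message_t *m)
{
    if (q->count == QUEUE_DEPTH) {
        q->full_errors++;       /* "shouldn't happen" by design, so count it */
        return false;
    }
    q->items[(q->head + q->count) % QUEUE_DEPTH] = *m;
    q->count++;
    return true;
}

/* FIFO get: returns false when the queue is empty. */
bool model_queue_get(model_queue_t *q, dma_message_t *out)
{
    if (q->count == 0u) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1u) % QUEUE_DEPTH;
    q->count--;
    return true;
}
```

If `full_errors` ever becomes non-zero in the field, the assumption that the transmit task always keeps up is wrong, and the starvation theory becomes much more likely.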

Manfred-O · May 27, 2020, 11:55am

Yes, I will check whether the queue is overflowing, and I will also check whether those tasks are really running by notifying them.

But I timed all tasks so that their buffers fill up in 1 s; I just count the entries in the buffer until they reach a certain number. So I thought that every second there would be 4 items in the queue, plus maybe one item from the receiving task, so 5 at most, and I thought there would be enough time to send them.

Another thing: I am starting those tasks right away and independently, without synchronizing them. Could it be that they drift apart over time, so that one task enters two items? Could this cause the queue to fill up?

richard-damon · May 27, 2020, 12:21pm

There are lots of possibilities, and since we are not able to run your code, it is hard for us to say what could happen. The symptoms sound like the queue is getting filled and causing starvation. Testing is probably the best answer here.
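One way such a test could be instrumented is with per-source counters at two points: when a sensor task queues a message and when the transmit task sees the transfer complete. The sketch below is an assumption-laden illustration, not the project's code: `trace_counters_t`, `source_index`, `trace_queued`, `trace_transmitted` and `trace_diagnose` are made-up names, and the mapping assumes the four `tx_ID` values 0x10, 0x20, 0x30 and 0x40 seen in the posted task.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_SOURCES 4u

typedef enum {
    STAGE_OK = 0,
    STAGE_NOT_QUEUED,       /* the source stopped putting messages in the queue */
    STAGE_NOT_TRANSMITTED   /* messages are queued but never finish sending */
} stage_t;

/* Hypothetical counters; each one is only ever incremented. */
typedef struct {
    uint32_t queued[NUM_SOURCES];
    uint32_t transmitted[NUM_SOURCES];
} trace_counters_t;

/* Maps tx_ID 0x10, 0x20, 0x30, 0x40 to 0..3; returns -1 for anything else. */
int source_index(uint8_t tx_id)
{
    if ((tx_id & 0x0Fu) != 0u) {
        return -1;
    }
    int idx = (int)(tx_id >> 4) - 1;
    return (idx >= 0 && idx < (int)NUM_SOURCES) ? idx : -1;
}

/* Call right after a message was successfully put in the queue. */
void trace_queued(trace_counters_t *t, uint8_t tx_id)
{
    int i = source_index(tx_id);
    if (i >= 0) {
        t->queued[i]++;
    }
}

/* Call in the transmit task when the DMA-complete notification arrived. */
void trace_transmitted(trace_counters_t *t, uint8_t tx_id)
{
    int i = source_index(tx_id);
    if (i >= 0) {
        t->transmitted[i]++;
    }
}

/* Compare two snapshots taken a few seconds apart for one source. */
stage_t trace_diagnose(const trace_counters_t *before,
                       const trace_counters_t *after, int idx)
{
    if (after->queued[idx] == before->queued[idx]) {
        return STAGE_NOT_QUEUED;
    }
    if (after->transmitted[idx] == before->transmitted[idx]) {
        return STAGE_NOT_TRANSMITTED;
    }
    return STAGE_OK;
}
```

Reading the counters out over the existing link, or with a debugger once the fault shows up, would at least tell whether the missing sensor stops at the queue or somewhere after it.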

Manfred-O · May 29, 2020, 1:06pm

Hi,

could you give me your opinion on this?

…

		/*check for !DRDY goes Low then new data available*/
		do{
			SPI2_CS_ADC_LOW;
			if(GPIO_PIN_RESET ==  HAL_GPIO_ReadPin(SPI_DOUT_GPIO_Port,SPI_DOUT_Pin))                                     
			{
				memset(RX,0,sizeof(RX));
				HAL_SPI_TransmitReceive_DMA(&hspi3,TX,RX,4);
				break;
			}
		}while(del--);


		ulNotificationValue = ulTaskNotifyTake(pdTRUE,xMaxBlockTime);
		if(ulNotificationValue == 1)
		{

			SPI2_CS_ADC_HIGH;

…

xMaxBlockTime is 2 ms. I am asking whether this task could block indefinitely although it has a 2 ms timeout. My notification comes from an SPI interrupt and should arrive pretty quickly after starting the transfer with the API “HAL_SPI_Trans…”. It could be that the task reaches …NotifyTake… before the interrupt occurs, or right after it. But in normal operation I get my data anyway.

Thanks for your opinions.

Kumar · June 14, 2020, 7:00am

Are you using CubeMX to configure the project? If possible, please share a zip file of the project. I am also using CubeMX for project configuration.