STM32F103 DMA problem

Hello,

I am using DMA on an STM32F103 with an RTOS. I just want to print two strings, one from each running task. The code snippet follows. The problem is that the terminal window only shows the Task 2 string when I run without breakpoints in the tasks. If I set a breakpoint in both tasks, the output is fine. Snapshots of the outputs are below.

Regards, pavel

int main(void)
{
    /* Configure the system clock to 72 MHz */
    SystemClock_Config();
    
    /* Initialize LED2 */
    LED_Init();
    
    
    
    /* Wait for the end of the transfer and check received data */
    //WaitAndCheckEndOfTransfer();
    
    /* Before a semaphore is used it must be explicitly created. In this example a
    mutex type semaphore is created. */
    xMutex = xSemaphoreCreateMutex();
    
    
    /* Check the semaphore was created successfully before creating the tasks. */
    if( xMutex != NULL )
    {
        /* Create one of the two tasks. Note that a real application should check
        the return value of the xTaskCreate() call to ensure the task was created
        successfully. */
        xTaskCreate( vTaskFunction1, /* Pointer to the function that implements the task. */
                    "Task 1",/* Text name for the task. This is to facilitate
                    debugging only. */
                    configMINIMAL_STACK_SIZE, /* Stack depth in words - the bare minimum a
                    task needs. */
                    (void*)str1, /* The string to print is passed in as the task parameter. */
                    1, /* This task will run at priority 1. */
                    NULL ); /* This example does not use the task handle. */
        /* Create the other task in the same way, but at priority 2. */
        xTaskCreate( vTaskFunction2, "Task 2", configMINIMAL_STACK_SIZE, (void*)pcTextForTask2, 2, NULL );
        
        xTaskCreate( initTask, "Task 0", configMINIMAL_STACK_SIZE, NULL, 3, &initTaskHandle ); /* xTaskCreate() expects a pointer to the handle variable. */
        /* Start the scheduler so the tasks start executing. */
        vTaskStartScheduler();
        /* If all is well then main() will never reach here as the scheduler will
        now be running the tasks. If main() does reach here then it is likely that
        there was insufficient heap memory available for the idle task to be created.
        Chapter 2 provides more information on heap memory management. */
        for( ;; );
    }
    
}

void initTask( void *pvParameters )
{
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        /* Initialize button in EXTI mode */
        UserButton_Init();
        
        /* Configure USARTx (USART IP configuration and related GPIO initialization) */
        Configure_USART();
        
        /* Configure DMA channels for USART instance */
        Configure_DMA();
        
        /* Wait for User push-button press to start transfer */
        //WaitForUserButtonPress();
        
        /* Initiate DMA transfers */
        StartTransfers();
        vTaskDelete( initTaskHandle );
    }
}

void vTaskFunction1( void *pvParameters )
{
    char *pcTaskName;
    const TickType_t xDelay250ms = pdMS_TO_TICKS( 250 );
    /* The string to print out is passed in via the parameter. Cast this to a
    character pointer. */
    pcTaskName = ( char * ) pvParameters;
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        ubNbDataToTransmit = 16;
        if (xSemaphoreTake(xMutex, 10)) 
        {
            user_uart_tx(pcTaskName, ubNbDataToTransmit);
            if (xSemaphoreGive(xMutex) != pdTRUE) 
            {
                /* Processing Error */
                LED_Blinking(LED_BLINK_ERROR);
            }
        }
        //vPrintString(pcTaskName);
        /* Delay for a period. This time a call to vTaskDelay() is used which places
           the task into the Blocked state until the delay period has expired. The
           parameter takes a time specified in ‘ticks’, and the pdMS_TO_TICKS() macro
           is used (where the xDelay250ms constant is declared) to convert 250
           milliseconds into an equivalent time in ticks. */
        
        vTaskDelay( xDelay250ms );
    }
}

void vTaskFunction2( void *pvParameters )
{
    char *pcTaskName;
    const TickType_t xDelay250ms = pdMS_TO_TICKS( 250 );
    /* The string to print out is passed in via the parameter. Cast this to a
    character pointer. */
    pcTaskName = ( char * ) pvParameters;
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        ubNbDataToTransmit = 20;
        if (xSemaphoreTake(xMutex, 10)) 
        {
            user_uart_tx(pcTaskName, ubNbDataToTransmit);
            if (xSemaphoreGive(xMutex) != pdTRUE) 
            {
                /* Processing Error */
                LED_Blinking(LED_BLINK_ERROR);
            }
        }
        //vPrintString(pcTaskName);
        /* Delay for a period. This time a call to vTaskDelay() is used which places
        the task into the Blocked state until the delay period has expired. The
        parameter takes a time specified in ‘ticks’, and the pdMS_TO_TICKS() macro
        is used (where the xDelay250ms constant is declared) to convert 250
        milliseconds into an equivalent time in ticks. */
        vTaskDelay( xDelay250ms );
    }
}

void Configure_DMA(void)
{
  /* DMA1 used for USART2 Transmission and Reception
   */
  /* (1) Enable the clock of DMA1 */
  LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_DMA1);

  /* (2) Configure NVIC for DMA transfer complete/error interrupts */
  NVIC_SetPriority(DMA1_Channel7_IRQn, 6);
  NVIC_EnableIRQ(DMA1_Channel7_IRQn);
  NVIC_SetPriority(DMA1_Channel6_IRQn, 7);
  NVIC_EnableIRQ(DMA1_Channel6_IRQn);

  /* (3) Configure the DMA functional parameters for transmission */
  LL_DMA_ConfigTransfer(DMA1, LL_DMA_CHANNEL_7, 
                        LL_DMA_DIRECTION_MEMORY_TO_PERIPH | 
                        LL_DMA_PRIORITY_HIGH              | 
                        LL_DMA_MODE_NORMAL                | 
                        LL_DMA_PERIPH_NOINCREMENT         | 
                        LL_DMA_MEMORY_INCREMENT           | 
                        LL_DMA_PDATAALIGN_BYTE            | 
                        LL_DMA_MDATAALIGN_BYTE);
  LL_DMA_ConfigAddresses(DMA1, LL_DMA_CHANNEL_7,
                         (uint32_t)aTxBuffer,
                         LL_USART_DMA_GetRegAddr(USART2),
                         LL_DMA_GetDataTransferDirection(DMA1, LL_DMA_CHANNEL_7));
  LL_DMA_SetDataLength(DMA1, LL_DMA_CHANNEL_7, ubNbDataToTransmit);

  /* (4) Configure the DMA functional parameters for reception */
  LL_DMA_ConfigTransfer(DMA1, LL_DMA_CHANNEL_6, 
                        LL_DMA_DIRECTION_PERIPH_TO_MEMORY | 
                        LL_DMA_PRIORITY_HIGH              | 
                        LL_DMA_MODE_NORMAL                | 
                        LL_DMA_PERIPH_NOINCREMENT         | 
                        LL_DMA_MEMORY_INCREMENT           | 
                        LL_DMA_PDATAALIGN_BYTE            | 
                        LL_DMA_MDATAALIGN_BYTE);
  LL_DMA_ConfigAddresses(DMA1, LL_DMA_CHANNEL_6,
                         LL_USART_DMA_GetRegAddr(USART2),
                         (uint32_t)aRxBuffer,
                         LL_DMA_GetDataTransferDirection(DMA1, LL_DMA_CHANNEL_6));
  LL_DMA_SetDataLength(DMA1, LL_DMA_CHANNEL_6, ubNbDataToReceive);

  /* (5) Enable DMA transfer complete/error interrupts  */
  LL_DMA_EnableIT_TC(DMA1, LL_DMA_CHANNEL_7);
  LL_DMA_EnableIT_TE(DMA1, LL_DMA_CHANNEL_7);
  LL_DMA_EnableIT_TC(DMA1, LL_DMA_CHANNEL_6);
  LL_DMA_EnableIT_TE(DMA1, LL_DMA_CHANNEL_6);
}

Unfortunately some information is missing (e.g. the implementation of user_uart_tx, and which function prints the task name and/or I AM IN TASK1?), so I can’t say anything more specific.
Just a few observations:
configMINIMAL_STACK_SIZE is the bare minimum of stack a FreeRTOS task needs.
This is often too little for real applications, especially when functions of the (s)printf family are used. They are stack hungry and might even use heap allocation internally.
I strongly recommend enabling runtime stack checks during development.
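
A minimal sketch of what enabling the runtime stack check could look like. `configCHECK_FOR_STACK_OVERFLOW` and `vApplicationStackOverflowHook()` are standard FreeRTOS names. What the hook does on overflow (here it disables interrupts and spins so a debugger can inspect the state) is an assumption and not taken from the original project.

```c
/* FreeRTOSConfig.h: method 2 also checks a known pattern at the end of the stack. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* In one application source file. */
#include "FreeRTOS.h"
#include "task.h"

/* Called by the kernel when it detects that a task has overflowed its stack. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;   /* Inspect this in the debugger to see which task overflowed. */

    taskDISABLE_INTERRUPTS();
    for( ;; )
    {
        /* Halt here; set a breakpoint on this loop during development. */
    }
}
```

The check runs at context switches, so it catches many overflows but not necessarily all of them. `uxTaskGetStackHighWaterMark()` can additionally be used to see how much headroom a task really has.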

It seems that ubNbDataToTransmit is a global variable that is used in Configure_DMA but is also modified in Task 1 and Task 2. Access to a global variable that is modified concurrently, like ubNbDataToTransmit, should be mutex-protected as well.

However, I guess the root cause could be that user_uart_tx is not synchronous, i.e. it does not wait until the DMA transfer has finished and the last character has been sent out on the wire. So if, e.g., Task 1 just sets up the DMA transfer and returns, the mutex is given back right away and Task 2 can take it and immediately set up its own DMA transfer, probably overwriting/corrupting the previously configured DMA transfer…
In this case you have to rework the implementation to avoid this race condition.

I don’t get the point of using a temporary, self-deleting init task. What is it good for?
Is it derived from ST example code?
BTW, vTaskDelete(NULL); also works for that purpose. There is no need to use the handle.

Beware that variables on the main stack (locals of main()) must not be passed as task arguments, because on Cortex-M3/4/7 MCUs the main stack is reset and reused as the ISR stack once the scheduler starts.
static and global variables can be used, of course.

I hope it helps and I’m not completely wrong with my guessing :slight_smile:

Thanks for all your suggestions, I will keep them in mind. Here is the missing information.
I am using IAR. When I debug the code with breakpoints, I get the output string from both tasks, but when I remove the breakpoints from inside both tasks, only the Task 2 output appears, as shown in the terminal screenshot above.

char str1[] = "I AM IN TASK1\r\n";   //Global
static const char *pcTextForTask2 = "Task 2 is running\r\n"; // Global

void user_uart_tx(char *ptr, uint8_t size)
{
    if(READ_BIT(flag_DMA, DMA_FULL_TRANSFER_UART) == DMA_FULL_TRANSFER_UART)
    {
        CLEAR_BIT(flag_DMA, DMA_FULL_TRANSFER_UART);
        /* DMA Tx transfer completed */          
        LL_DMA_ConfigAddresses(DMA1, LL_DMA_CHANNEL_7,
                               (uint32_t)ptr,
                               LL_USART_DMA_GetRegAddr(USART2),
                               LL_DMA_GetDataTransferDirection(DMA1, LL_DMA_CHANNEL_7));
        LL_DMA_SetDataLength(DMA1, LL_DMA_CHANNEL_7, size);
        /* Enable DMA Channel Tx */
        LL_DMA_EnableChannel(DMA1, LL_DMA_CHANNEL_7);
    }    
}

void DMA1_Channel7_IRQHandler(void)
{

  /* Check whether DMA half transfer caused the DMA interruption */
  if(LL_DMA_IsActiveFlag_HT7(DMA1) == 1)
  {
    /* Clear flag DMA half transfer */
    LL_DMA_ClearFlag_HT7(DMA1);
    
    /* Call interruption treatment function */
    DMA1_TransmitHalf_Callback();
  }  
  else if(LL_DMA_IsActiveFlag_TC7(DMA1))
  {
    LL_DMA_ClearFlag_GI7(DMA1);
    /* Call function Transmission complete Callback */
    DMA1_TransmitComplete_Callback();
  }
  else if(LL_DMA_IsActiveFlag_TE7(DMA1))
  {
    /* Call Error function */
    USART_TransferError_Callback();
  }
}

It seems it’s like I guessed: user_uart_tx just fires up the DMA transfer and returns immediately. So if, e.g., Task 1 invokes user_uart_tx to send its text, the function returns and Task 1 immediately gives the mutex back. Task 2, waiting for the mutex, wakes up and re-configures the still-running DMA transfer of Task 1 to send its own text before the text of Task 1 has been transferred completely.
You can test this assumption by moving vTaskDelay( xDelay250ms ); to right after the user_uart_tx call, while the mutex is still held. Although this is bad design, it should give the running DMA transfer enough time to complete.
That will probably show the same behavior as with appropriate breakpoints. Stopping at a breakpoint, e.g. at the line giving back the mutex in both tasks, just stops the code, and the DMA hardware gets enough time to complete the transfer.
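
The effect of "start and return immediately" can be reproduced with a small host-side model. Everything in it (`FakeDma`, `fake_dma_step`, `fake_uart_tx`, `run_two_senders`) is a hypothetical stand-in, not the real hardware or the original code. One assumption is worth stating: the model treats the flag_DMA check in the posted user_uart_tx as a busy guard, so a request made while a transfer is in flight is dropped rather than overwriting it. In either reading, the second text is lost unless the hardware gets time to finish first.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one DMA channel feeding a UART. */
typedef struct {
    const char *src;      /* next byte to send */
    size_t remaining;     /* bytes left in the current transfer */
    char wire[64];        /* everything that actually left the UART */
    size_t wire_len;
} FakeDma;

/* Models the hardware moving one byte; does nothing when idle. */
static void fake_dma_step(FakeDma *dma)
{
    if (dma->remaining > 0 && dma->wire_len < sizeof dma->wire - 1) {
        dma->wire[dma->wire_len++] = *dma->src++;
        dma->remaining--;
    }
}

/* Non-blocking send: arm the transfer and return at once.
   A request made while a transfer is in flight is dropped. */
static bool fake_uart_tx(FakeDma *dma, const char *text)
{
    if (dma->remaining > 0) {
        return false;                 /* channel busy: this text is lost */
    }
    dma->src = text;
    dma->remaining = strlen(text);
    return true;
}

/* Two back-to-back sends. steps_between models how much time the hardware
   gets in between (a breakpoint or a blocking wait makes it large).
   Returns the text that reached the wire. */
const char *run_two_senders(FakeDma *dma, size_t steps_between)
{
    memset(dma, 0, sizeof *dma);
    fake_uart_tx(dma, "Task 2\n");    /* the higher-priority task goes first */
    for (size_t i = 0; i < steps_between; i++) {
        fake_dma_step(dma);
    }
    fake_uart_tx(dma, "Task 1\n");
    for (size_t i = 0; i < sizeof dma->wire; i++) {
        fake_dma_step(dma);           /* let the hardware drain */
    }
    dma->wire[dma->wire_len] = '\0';
    return dma->wire;
}
```

With `steps_between` set to 0 only the first sender's text reaches the wire. With 7 or more steps (the length of the first string) both texts come out in order, which mirrors what stopping at a breakpoint does on the target.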
While we are at it: I don’t understand why you specify a timeout of 10 ticks when waiting for the mutex. Why poll for the mutex when you need to wait/block anyway?

It’s better to quote C code with a starting ~~~c line, then the code, then an ending ~~~ line (3 tildes).
That makes it much more readable.

Edit:
In other words, you should signal completion of the DMA transfer from the DMA ISR to the task that armed the transfer. The task that started the transfer should wait for its completion.
I’d recommend using a task notification for the signaling, but a (binary) semaphore or a queue is suitable as well. You could use a queue if you want to signal the transfer status (ok or error), but this might not be needed. The DMA transfer usually just works if configured correctly.
Alternatively, you could add a DMA lock that is taken when entering user_uart_tx and given back on completion in the DMA ISR, to protect access to the DMA transfer. Note that this lock has to be a binary semaphore, because FreeRTOS mutexes must not be given from an ISR.
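
A minimal sketch of the task-notification approach described above, for the target only (it needs the FreeRTOS headers and is not a drop-in replacement). `vUartSendBlocking`, `vUartTxCompleteFromISR`, `prvStartUartTxDma`, `xUartTxMutex` and `xTaskWaitingForTx` are hypothetical names introduced for illustration; the FreeRTOS calls are the standard API. It also assumes that the DMA interrupt priority is in the range that is allowed to call FromISR functions.

```c
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"

/* Hypothetical names introduced for this sketch. */
static SemaphoreHandle_t xUartTxMutex;           /* create with xSemaphoreCreateMutex() before use */
static volatile TaskHandle_t xTaskWaitingForTx;  /* task that armed the current transfer */

/* Hypothetical helper: programs address and length and enables the DMA
   channel, e.g. with the LL_DMA_* calls already used in user_uart_tx(). */
extern void prvStartUartTxDma( const char *pcData, uint16_t usLength );

/* Blocks the caller until the DMA has moved the whole buffer. */
void vUartSendBlocking( const char *pcData, uint16_t usLength )
{
    /* Only one task may own the DMA channel at a time. */
    xSemaphoreTake( xUartTxMutex, portMAX_DELAY );

    xTaskWaitingForTx = xTaskGetCurrentTaskHandle();
    prvStartUartTxDma( pcData, usLength );

    /* Sleep until the DMA ISR reports completion; other tasks run meanwhile. */
    ( void ) ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

    xSemaphoreGive( xUartTxMutex );
}

/* To be called from the DMA transfer-complete interrupt. */
void vUartTxCompleteFromISR( void )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    TaskHandle_t xTask = xTaskWaitingForTx;

    if( xTask != NULL )
    {
        xTaskWaitingForTx = NULL;
        vTaskNotifyGiveFromISR( xTask, &xHigherPriorityTaskWoken );
    }

    /* Switch straight to the woken task if it outranks the interrupted one. */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}
```

In this arrangement the mutex is taken and given by the same task, and only the notification crosses the ISR boundary, so no mutex is touched from interrupt context. Depending on the configuration, `xTaskGetCurrentTaskHandle()` may need `INCLUDE_xTaskGetCurrentTaskHandle` set to 1 in FreeRTOSConfig.h.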

I have made the following changes, but the output is still the same.

void vTaskFunction1( void *pvParameters )
{
    char *pcTaskName;
    const TickType_t xDelay250ms = pdMS_TO_TICKS( 250 );
    /* The string to print out is passed in via the parameter. Cast this to a
    character pointer. */
    pcTaskName = ( char * ) pvParameters;
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        
        if (xSemaphoreTake(xMutex, 0)) 
        {
            ubNbDataToTransmit = 16;
            user_uart_tx(pcTaskName, ubNbDataToTransmit);
            vTaskDelay( xDelay250ms );
            if (xSemaphoreGive(xMutex) != pdTRUE) 
            {
                /* Processing Error */
                LED_Blinking(LED_BLINK_ERROR);
            }
        }
        //vPrintString(pcTaskName);
        /* Delay for a period. This time a call to vTaskDelay() is used which places
           the task into the Blocked state until the delay period has expired. The
           parameter takes a time specified in ‘ticks’, and the pdMS_TO_TICKS() macro
           is used (where the xDelay250ms constant is declared) to convert 250
           milliseconds into an equivalent time in ticks. */
        
        
    }
}

void vTaskFunction2( void *pvParameters )
{
    char *pcTaskName;
    const TickType_t xDelay250ms = pdMS_TO_TICKS( 250 );
    /* The string to print out is passed in via the parameter. Cast this to a
    character pointer. */
    pcTaskName = ( char * ) pvParameters;
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        
        if (xSemaphoreTake(xMutex, 0)) 
        {
            ubNbDataToTransmit = 20;
            user_uart_tx(pcTaskName, ubNbDataToTransmit);
            vTaskDelay( xDelay250ms );
            if (xSemaphoreGive(xMutex) != pdTRUE) 
            {
                /* Processing Error */
                LED_Blinking(LED_BLINK_ERROR);
            }
        }
        //vPrintString(pcTaskName);
        /* Delay for a period. This time a call to vTaskDelay() is used which places
        the task into the Blocked state until the delay period has expired. The
        parameter takes a time specified in ‘ticks’, and the pdMS_TO_TICKS() macro
        is used (where the xDelay250ms constant is declared) to convert 250
        milliseconds into an equivalent time in ticks. */
        
    }
}

>~~~

My bad: leave/add vTaskDelay( xDelay250ms ); after giving back the mutex, too.
Otherwise Task 2 takes the mutex again immediately after giving it, because it has the higher priority.
Again, synchronizing tasks by sleeping (vTaskDelay) is very bad design. It should only be used as a quick hack/verification.
Please see the hints in the Edit part of my previous post.
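
One reason a fixed sleep is fragile is that the time a transfer needs depends on the string length and the baud rate. A back-of-the-envelope sketch follows, assuming 8N1 framing (10 bits on the wire per byte). The helper name `uart_tx_time_us` is made up for illustration, and the thread does not state the baud rate, so the rates mentioned below are only examples.

```c
#include <stdint.h>

/* Microseconds needed to shift n_bytes out of a UART, rounded up.
   Assumes 8N1 framing: 1 start + 8 data + 1 stop = 10 bits per byte. */
uint32_t uart_tx_time_us(uint32_t n_bytes, uint32_t baud)
{
    const uint64_t bits = (uint64_t)n_bytes * 10u;
    return (uint32_t)((bits * 1000000u + baud - 1u) / baud);
}
```

For the 20-byte string this gives about 1.7 ms at 115200 baud but about 20.8 ms at 9600 baud. A delay that happens to be long enough at one setting silently fails at another, whereas waiting for the completion signal adapts automatically.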

OK, now I have used a binary semaphore.
I got the expected result, but I don’t understand what is happening.

I think that when Task 2 blocks itself for 3 ms, the scheduler switches to Task 1, which prints its string.

But why did that not happen earlier, when I was using a 250 ms delay and the mutex?


void vTaskFunction1( void *pvParameters )
{
    char *pcTaskName;
    const TickType_t xDelay250ms = pdMS_TO_TICKS( 250 );
    /* The string to print out is passed in via the parameter. Cast this to a
    character pointer. */
    pcTaskName = ( char * ) pvParameters;
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        
        if (xSemaphoreTake(xBinarySemaphore, 0)) 
        {
            ubNbDataToTransmit = 16;
            user_uart_tx(pcTaskName, ubNbDataToTransmit);
            //vTaskDelay( xDelay250ms );


void vTaskFunction2( void *pvParameters )
{
    char *pcTaskName;
    const TickType_t xDelay250ms = pdMS_TO_TICKS( 3 );  // 3ms
    /* The string to print out is passed in via the parameter. Cast this to a
    character pointer. */
    pcTaskName = ( char * ) pvParameters;
    /* As per most tasks, this task is implemented in an infinite loop. */
    for( ;; )
    {
        
        if (xSemaphoreTake(xBinarySemaphore, 0)) 
        {
            ubNbDataToTransmit = 20;
            user_uart_tx(pcTaskName, ubNbDataToTransmit);
            vTaskDelay( xDelay250ms );  //3 ms


void DMA1_TransmitComplete_Callback(void)
{
    static BaseType_t xHigherPriorityTaskWoken;
    xHigherPriorityTaskWoken = pdFALSE;
    /* DMA Tx transfer completed */
    ubTransmissionComplete = 1;    
    SET_BIT(flag_DMA, DMA_FULL_TRANSFER_UART);
    /* Disable DMA1 Tx Channel */
    LL_DMA_DisableChannel(DMA1, LL_DMA_CHANNEL_7);
    if (xSemaphoreGiveFromISR(xBinarySemaphore, &xHigherPriorityTaskWoken) != pdTRUE) 
    {
        /* Processing Error */
        LED_Blinking(LED_BLINK_ERROR);
    }
    
    if( xHigherPriorityTaskWoken != pdFALSE )
    {
        // We can force a context switch here.  Context switching from an
        // ISR uses port specific syntax.  Check the demo task for your port
        // to find the syntax required.
        portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
    }
    
}

Don’t get confused; maybe reread the posts and retrace the steps of the broken previous version. The key, as explained, is that the complete DMA transfer is now protected against a concurrent call of user_uart_tx, so the currently running DMA transfer can no longer be overwritten. Instead, the second or next call now waits until the transfer is complete (signaled by the ISR). Then the DMA is free again to be used for the next transfer.
This implementation should now work without patching any vTaskDelay into the code.

Please note that a timeout of 0 for xSemaphoreTake is used to poll a semaphore. I think it’s better to block/wait for it using portMAX_DELAY (as documented) to allow the scheduler to run other tasks.

My comment was probably a bit misleading. The end marker of quoted code is just 3 tildes ‘~~~’ (without the ‘>’). It’s normal markdown syntax.

Good luck :slight_smile:

If I remove vTaskDelay, then only the Task 2 string gets printed in the terminal.

I’m sure you can find out the reason with the hints already given and your better understanding of the issue :+1: The debugger is your friend :slight_smile:

Yes, the higher priority task needs to do something to block so the lower priority task can run.
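
The point about blocking can be illustrated with a deliberately simplified host-side model of fixed-priority preemptive scheduling. `ModelTask` and `simulate` are hypothetical names, and each "run" is assumed to take exactly one tick, which is not how a real kernel behaves but is enough to show the starvation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a task under fixed-priority preemptive scheduling. */
typedef struct {
    unsigned priority;     /* larger number = higher priority, as in FreeRTOS */
    uint32_t block_ticks;  /* ticks spent blocked after each run; 0 = never blocks */
    uint32_t ready_at;     /* first tick at which the task is ready again */
    uint32_t runs;         /* number of ticks in which the task got the CPU */
} ModelTask;

/* Each tick, run the highest-priority ready task, then let it block. */
void simulate(ModelTask *tasks, size_t count, uint32_t ticks)
{
    for (uint32_t now = 0; now < ticks; now++) {
        ModelTask *chosen = NULL;
        for (size_t i = 0; i < count; i++) {
            ModelTask *t = &tasks[i];
            if (t->ready_at <= now &&
                (chosen == NULL || t->priority > chosen->priority)) {
                chosen = t;
            }
        }
        if (chosen != NULL) {
            chosen->runs++;
            chosen->ready_at = now + 1 + chosen->block_ticks;
        }
    }
}
```

With a priority-2 task that never blocks (for example one that polls a semaphore with a timeout of 0 in a tight loop), a priority-1 task gets no ticks at all over 100 ticks. If the priority-2 task blocks for 3 ticks after each run, it runs 25 times and the priority-1 task gets the remaining 75 ticks.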