Race condition between thread/interrupt signaling

wiwix wrote on Sunday, February 26, 2017:

Hi all,

I’m trying to debug a serial driver. A binary semaphore is used to signal data reception from the serial line interrupt to the reading task. In rare conditions, my reading task is not woken, even though the received byte has been stored properly in RAM.

In order to debug I have a debugger, and a “debug queue” where I put some debug events to have a RAM history that I read from the debugger.

Thanks to that, I can see that the normal sequence is:

  1. reading task waits on signal
  2. char received
  3. interrupt : char is saved in a ring buffer
  4. signal
  5. reading task awoken

When the race condition arises, the last step (reading task awoken) doesn’t happen. Note that the interrupts continue to save chars as long as the ring buffer is not full.

I tried two implementations of my driver, one with a counting semaphore and one with a binary semaphore (as a signal). The same problem happens in both cases.

Could someone help me with debug methods to identify what’s happening? For instance, I’d like to know where my thread is in the source code when it is blocked. Is there a way to get the associated “program counter”?

If it is of any interest, I’m working with Atmel Studio and the board is an Arduino Due.

Here is an extract of the code. Please note that the complete code also has a write() function and a second part of the interrupt handler, which sends the tx buffer and manages/recovers from HW errors. Let me know if someone needs the complete code.

Note that my circular buffer library is OS-less, so no critical section is hidden inside the buffer calls. This is why I protect them.

:::C++

ArdUART::ArdUART(...):
...
{
    ...
    rxSignal = xSemaphoreCreateBinary();
    txSignal = xSemaphoreCreateBinary();
}

void ArdUART::read(uint8_t * const byte)
{
    bool byteReceived = false;

    while(!byteReceived)
    {
        portENTER_CRITICAL();
        byteReceived = circular_popByte(&rxBuf, byte);
        portEXIT_CRITICAL();

        if(!byteReceived)
            xSemaphoreTake(rxSignal, portMAX_DELAY);
    }

    //TODO remove, for debug only
    dh_publish_event("read", *byte, 0);
}

void ArdUART::IrqHandler(void)
{
    UBaseType_t uxSavedInterruptStatus;
    uxSavedInterruptStatus = portSET_INTERRUPT_MASK_FROM_ISR();

    uint32_t status = baseAddr->UART_SR;
    uint8_t rxByte = 0;
    
    // Did we receive data?
    if ((status & UART_SR_RXRDY) == UART_SR_RXRDY)
    {
        //read received char and append it to the buffer
        rxByte = baseAddr->UART_RHR;
        if ( circular_appendByte(&rxBuf, rxByte) )
        {
            BaseType_t xHigherPriorityTaskWoken = pdFALSE;
            //signal the reading task that some data are available
            xSemaphoreGiveFromISR(rxSignal, &xHigherPriorityTaskWoken);
            //force context switch if a task with higher priority is awoken
            portYIELD_FROM_ISR(xHigherPriorityTaskWoken);

            //TODO debug : save event in history
            dh_publish_event_fromISR("it::RX", rxByte, 0);
        }
        //if the append failed, the buffer has overflowed, save the error
        else
        {
            nbRxBytesLost++;
            lastError = ERR_OVERSHOOT_RX;
            //TODO debug : save event in history
            dh_publish_event_fromISR("ErrOvRx", nbRxBytesLost, 0);
        }
    }

    portCLEAR_INTERRUPT_MASK_FROM_ISR( uxSavedInterruptStatus );
}

richarddamon wrote on Sunday, February 26, 2017:

My first guess at the problem is that somewhere you have an interrupt priority issue. Do you have configASSERT defined to trap this sort of error? Having an interrupt that uses FreeRTOS at too high a priority can corrupt some of the system lists needed to make it work. You can also get some of these issues if you let a task overrun its stack.
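
To make the priority rule concrete, here is a small host-side sketch (plain C++, not FreeRTOS code). It assumes a Cortex-M part with 4 implemented priority bits and a maximum syscall priority of 5. Both values are assumptions for illustration, so check `__NVIC_PRIO_BITS` in your device header and your own `FreeRTOSConfig.h`. The names `kPrioBits`, `kLibraryMaxSyscallPriority`, `mayCallFromIsrApi` and `toRawPriority` are hypothetical.

```cpp
#include <cstdint>

// Assumed values, not taken from this thread.
constexpr unsigned kPrioBits = 4;                         // implemented NVIC priority bits
constexpr std::uint8_t kLibraryMaxSyscallPriority = 5;    // most urgent level allowed to use the API
constexpr std::uint8_t kLibraryLowestPriority = (1u << kPrioBits) - 1;  // 15, least urgent

// On Cortex-M a numerically LOWER value means a MORE urgent priority.
// An ISR may call the ...FromISR() functions only if its priority value is
// numerically greater than or equal to the maximum syscall priority.
constexpr bool mayCallFromIsrApi(std::uint8_t libraryPriority)
{
    return libraryPriority >= kLibraryMaxSyscallPriority
        && libraryPriority <= kLibraryLowestPriority;
}

// The hardware keeps the implemented bits at the top of an 8-bit field,
// so the raw register value is the library value shifted left.
constexpr std::uint8_t toRawPriority(std::uint8_t libraryPriority)
{
    return static_cast<std::uint8_t>(libraryPriority << (8u - kPrioBits));
}

// Priority 0 is the reset default and the most urgent level: not allowed.
static_assert(!mayCallFromIsrApi(0), "default priority must not use the API");
static_assert(mayCallFromIsrApi(kLibraryLowestPriority), "lowest priority is always fine");
```

With these assumed values, an interrupt left at its reset priority of 0 fails the check, which is the situation configASSERT is meant to trap.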

spachner wrote on Sunday, February 26, 2017:

Hi,

I assume the interrupt disable/enable pair you are using,

portSET_INTERRUPT_MASK_FROM_ISR();

portCLEAR_INTERRUPT_MASK_FROM_ISR( 0 );

only works as expected when ArdUART::IrqHandler() runs at the lowest (i.e. least urgent) priority.

xPortSysTickHandler() does the same thing, but its comment notes that it relies on running at the lowest priority:

void xPortSysTickHandler( void )
{
	/* The SysTick runs at the lowest interrupt priority, so when this interrupt
	executes all interrupts must be unmasked.  There is therefore no need to
	save and then restore the interrupt mask value as its value is already
	known. */
	( void ) portSET_INTERRUPT_MASK_FROM_ISR();
	{
		/* Increment the RTOS tick. */
		if( xTaskIncrementTick() != pdFALSE )
		{
			/* A context switch is required.  Context switching is performed in
			the PendSV interrupt.  Pend the PendSV interrupt. */
			portNVIC_INT_CTRL_REG = portNVIC_PENDSVSET_BIT;
		}
	}
	portCLEAR_INTERRUPT_MASK_FROM_ISR( 0 );
}
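
The difference between restoring the saved mask and clearing to 0 can be sketched on a host with a plain variable standing in for the BASEPRI register. This is only a model under assumptions: `basepri`, `kMaxSyscallRaw` and the function names are hypothetical, and the real macros are port-specific. Note that the handler posted at the top of the thread already uses the save/restore form.

```cpp
#include <cassert>
#include <cstdint>

// Host-side stand-in for the Cortex-M BASEPRI register (0 means nothing is masked).
static std::uint32_t basepri = 0;

// Assumed raw value of the maximum syscall priority, for illustration only.
constexpr std::uint32_t kMaxSyscallRaw = 0x50;

// Models portSET_INTERRUPT_MASK_FROM_ISR(): raise the mask, return the old value.
std::uint32_t setMaskFromIsr()
{
    const std::uint32_t previous = basepri;
    basepri = kMaxSyscallRaw;
    return previous;
}

// Models portCLEAR_INTERRUPT_MASK_FROM_ISR(x): write x back to the mask.
void clearMaskFromIsr(std::uint32_t value)
{
    basepri = value;
}

// Save/restore form: leaves the mask exactly as it was on entry.
std::uint32_t handlerThatRestores(std::uint32_t maskOnEntry)
{
    basepri = maskOnEntry;
    const std::uint32_t saved = setMaskFromIsr();
    assert(basepri == kMaxSyscallRaw);  // mask is raised while the handler works
    clearMaskFromIsr(saved);
    return basepri;
}

// Clear-to-zero form: only leaves the mask unchanged if it was 0 on entry.
std::uint32_t handlerThatClearsToZero(std::uint32_t maskOnEntry)
{
    basepri = maskOnEntry;
    (void) setMaskFromIsr();
    clearMaskFromIsr(0);
    return basepri;
}
```

In this model the second form drops any mask that was already in place, which is why the SysTick comment insists on knowing the mask value on entry.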

regards

spachner

rtel wrote on Sunday, February 26, 2017:

Here is a link that gives a little more information about the problem
Richard D is describing: http://www.freertos.org/RTOS-Cortex-M3-M4.html

wiwix wrote on Thursday, March 02, 2017:

Hi guys, thanks a lot for your answers. Good catch: there was indeed an issue with interrupt priorities. I hadn’t configured them, so (as explained) the default, highest priority was in use, and that priority is not masked by the OS.

I made some unit tests with LEDs and timer interrupts to check the config.

I have configASSERT defined with a breakpoint set on it, but for some reason my debugger does not always trigger, so I may have missed some issues. I’m currently working on this.

Is there any way to force FreeRTOS to check stacks? As far as I know, FreeRTOS only checks stacks at a context switch, but as I’m in a unit test I only have one task.

Thanks for pointing out xPortSysTickHandler. I think I have something to learn here, but I didn’t understand what you wanted to tell me. Could you please rephrase, or point me to any relevant documentation?

rtel wrote on Friday, March 03, 2017:

The OS can only check the stack when it is executing. Most of the time
your compiled application code is executing, rather than the OS itself,
hence the OS checks the stack when it switches from one task to another.
If you have configCHECK_FOR_STACK_OVERFLOW set to 2 you could insert
the same stack checking macros into every API call, but that would be
laborious to do and slow down your system significantly - presumably if
you are doing testing you want your system to behave as normally as
possible.
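
For readers unfamiliar with what the check does, here is a host-side sketch of the idea behind the pattern-based method: the stack is filled with a known byte at creation, and the far end is inspected later. It is not the FreeRTOS macro itself. The 0xa5 fill byte matches what FreeRTOS uses, while the 16-byte guard size and the names `makeStack` and `guardRegionOverwritten` are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint8_t kFillByte = 0xA5;   // FreeRTOS fills new task stacks with 0xa5
constexpr std::size_t kGuardBytes = 16;    // assumed size of the inspected region

// Create a stack image filled with the known pattern, as done at task creation.
std::vector<std::uint8_t> makeStack(std::size_t bytes)
{
    return std::vector<std::uint8_t>(bytes, kFillByte);
}

// The stack grows downward, so index 0 is the far end of the stack.
// Returns true if any byte of the guard region at that end was overwritten.
bool guardRegionOverwritten(const std::vector<std::uint8_t>& stack)
{
    const std::size_t n = std::min(kGuardBytes, stack.size());
    const auto guardEnd = stack.begin() + static_cast<std::ptrdiff_t>(n);
    return std::any_of(stack.begin(), guardEnd,
                       [](std::uint8_t b) { return b != kFillByte; });
}
```

A check like this only detects damage after the fact, and only when something runs it, which is the limitation described above.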

An alternative would be to use an MPU port (a port that uses a memory
protection unit) as that will generate a hardware exception if you write
outside of your stack region. The programming model is very different
from using a non-MPU port though, so I suspect that will not be a good
solution for you.

heinbali01 wrote on Friday, March 03, 2017:

And maybe a third approach: give your task(s) a lot of stack, twice as much as you think it needs.

Now make a command that calls uxTaskGetSystemState(), or call it regularly. The field usStackHighWaterMark indicates the number of unused words in the task stack.

For instance, I start with:

    #define mainPRIORITY_MAIN_TASK     2
    #define mainSTACK_SIZE_MAIN_TASK    400
    xTaskCreate( vMainTask, "vMainTask", mainSTACK_SIZE_MAIN_TASK, NULL, mainPRIORITY_MAIN_TASK, NULL );

The usStackHighWaterMark for that task drops to 139, then 130, and stays there. That means 270 words (1080 bytes on a 32-bit CPU) have been used.

No guarantees can be given, unless you have checked all possible paths of execution.
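
The arithmetic above can be reproduced with a small host-side model of how a high water mark is derived: count the words at the far end of the stack that still hold the fill pattern. This is a sketch under assumptions, not the kernel's code. `highWaterMarkWords` and `simulateUsage` are hypothetical names, and the model assumes a downward-growing stack of 32-bit words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint32_t kFillWord = 0xA5A5A5A5u;  // four 0xa5 fill bytes

// Count the words, starting at the far end of a downward-growing stack,
// that still hold the fill pattern. Index 0 is the far end.
std::size_t highWaterMarkWords(const std::vector<std::uint32_t>& stack)
{
    std::size_t unused = 0;
    while (unused < stack.size() && stack[unused] == kFillWord)
    {
        ++unused;
    }
    return unused;
}

// Simulate a task whose deepest stack usage so far was usedWords words.
std::vector<std::uint32_t> simulateUsage(std::size_t totalWords, std::size_t usedWords)
{
    assert(usedWords <= totalWords);
    std::vector<std::uint32_t> stack(totalWords, kFillWord);
    for (std::size_t i = totalWords - usedWords; i < totalWords; ++i)
    {
        stack[i] = 0;  // stands in for data written by the task
    }
    return stack;
}
```

With a 400-word stack and 270 words used, the model reports 130 unused words, and 270 words times 4 bytes gives the 1080 bytes quoted above. A task that happens to store the fill pattern itself would fool this kind of count, which is one more reason no guarantee can be given.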