xTimerStop race condition

Hi,

in my application, I make extensive use of the FreeRTOS software timers. One thing I’ve been wondering how to handle is a race condition concerning xTimerStop(). Specifically, with the given API it is not possible to tell whether the timer was stopped by the call or whether it had already fired (i.e. the timer callback was called) before xTimerStop() was called.
The only thing I came up with is to first call xTimerIsTimerActive() and then xTimerStop(), all in a critical section. I wonder whether there are better alternatives.

First thing you could do in your callback is set a flag on which concurrent tasks could sync. Of course, the concurrent task could still try to stop the timer within the few cycles before the callback executes. What are the relative priorities of the involved tasks?

Thanks for the reply.

First thing you could do in your callback is set a flag on which concurrent tasks could sync

True. I forgot to mention that the whole architecture of my application is based on message queues, so tasks communicate via message passing. All my timer callbacks are actually just “enqueue a timeout message in the message queue of this task”. Furthermore, for now, the tasks can only pop messages from the message queue. They can’t lock the queue and inspect its whole content.
That’s why it is important for me to make sure that no “timeout message” ends up in the message queue if the task expects that the timer was stopped before the timeout fired.

What are the relative priorities of the involved tasks?

The pattern of stopping a timer once it no longer applies (a timer is started to signal the timeout of a potentially long-running operation, the operation finishes in time, and so the timer is stopped) is used throughout the codebase. Some of the tasks that do this have a higher priority than the FreeRTOS timer task, and some have a lower priority.

I’d say you can either follow your reasonable approach of ensuring that only valid timeout messages/events are delivered, or alternatively handle orphaned timeout events in the target state machines, i.e. ignore them (I guess this happens rarely) if the timeout event is no longer applicable because the timer was cancelled. That is, provided you have that information there, of course.
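
The “ignore orphaned timeout events” option could be sketched roughly as follows. This is a plain-C model rather than FreeRTOS code: msg_t, the small FIFO, and the names timeout_arm, timeout_cancel and timeout_is_current are all made up for illustration. The idea is to tag every armed timeout with a generation number, so that a timeout message enqueued before the cancel can be recognised as stale when it is finally popped. How the generation reaches the timer callback in a real application (for instance via the timer ID) is left out and would be an additional assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical message type; a real one would carry more than this. */
typedef enum { MSG_TIMEOUT, MSG_OP_DONE } msg_kind_t;
typedef struct {
    msg_kind_t kind;
    uint32_t   generation; /* only meaningful for MSG_TIMEOUT */
} msg_t;

/* Tiny FIFO standing in for the task's message queue. */
#define QUEUE_CAP 8u
typedef struct {
    msg_t    items[QUEUE_CAP];
    unsigned head;
    unsigned count;
} msg_queue_t;

static void queue_push(msg_queue_t *q, msg_t m) {
    assert(q->count < QUEUE_CAP);
    q->items[(q->head + q->count) % QUEUE_CAP] = m;
    q->count++;
}

static bool queue_pop(msg_queue_t *q, msg_t *out) {
    if (q->count == 0) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    return true;
}

/* State owned by the state machine task only. */
typedef struct {
    uint32_t generation; /* bumped on every arm and every cancel */
    bool     armed;
} timeout_state_t;

/* Arm a timeout; returns the generation the timeout message must carry. */
static uint32_t timeout_arm(timeout_state_t *s) {
    s->armed = true;
    return ++s->generation;
}

/* Cancel: any timeout message already in the queue becomes stale. */
static void timeout_cancel(timeout_state_t *s) {
    s->armed = false;
    s->generation++;
}

/* True only for a timeout message that belongs to the currently armed timeout. */
static bool timeout_is_current(const timeout_state_t *s, const msg_t *m) {
    return m->kind == MSG_TIMEOUT && s->armed && m->generation == s->generation;
}
```

With this scheme the state machine never needs to know whether the stop came before or after the callback; it simply drops any timeout message for which timeout_is_current() returns false.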

Thanks for the confirmation. Yeah, just like you suggested, what I do in the target state machine is:

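void action_stop_timeout(void) {
    /* timeout_timer is a placeholder name for the TimerHandle_t in question. */
    enter_critical_section();
    /* If the timer is no longer active, it has already fired. */
    state.stopped_after_fired = !xTimerIsTimerActive(timeout_timer);
    /* Block time 0 so the call cannot block inside the critical section. */
    xTimerStop(timeout_timer, 0);
    exit_critical_section();
}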

Then, when the timeout message is received from the message queue, I just check state.stopped_after_fired.

I asked here to make sure that I wasn’t missing something obvious that FreeRTOS already provided, but I guess the way I’m doing it is okay then :smile: .

There might be a 2nd level ‘race’ condition when a timeout message was already enqueued after the timeout was correctly handled when stopping a timer, right?
That means you might also need to handle (ignore?) this late timeout event.

Mmm. I’m not exactly sure what you mean.

when a timeout message was already enqueued after the timeout was correctly handled when stopping a timer

If that’s the case, then xTimerIsTimerActive would return false, and thus state.stopped_after_fired would be true. Then, when the task receives the timeout message from the message queue, the message will be ignored because state.stopped_after_fired is true.

That’s exactly what I mean :slight_smile: I didn’t know that you check the state in the target state machines. Sounds good. As a last thought: because the state struct is data shared between multiple tasks, you should make it (or its shared members) volatile.
You know, ‘… data can get changed without knowledge of the compiler…’

Oh, thanks for the additional note. That’s also taken care of :slight_smile: , because state is only accessed by the target state machine. In fact, action_stop_timeout is called by the target state machine.
So, for example, the state machine starts a long-running operation and calls action_start_timeout. When the long-running operation finishes in time, action_stop_timeout (which accesses state) is called. The state machine then pops the next message from the message queue, and so on. When it gets a timeout message, it checks state.stopped_after_fired and, if it is true, ignores the timeout message.
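
To make that flow concrete, here is a small plain-C model of it. It is only a sketch under assumptions: fake_timer_t and fake_queue_t are stand-ins invented for this example (no FreeRTOS calls are made), fake_timer_fire plays the role of the timer callback, and clearing the flag after ignoring one message is my assumption about how the flag would be consumed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the real timer and the task's message queue. */
typedef struct { bool active; } fake_timer_t;
typedef struct { int pending_timeouts; } fake_queue_t;

/* State touched only by the state machine task. */
typedef struct {
    bool stopped_after_fired; /* set by action_stop_timeout() */
    int  handled_timeouts;    /* genuine timeouts that were acted upon */
} sm_state_t;

/* Models the timer expiring: the callback enqueues a timeout message. */
static void fake_timer_fire(fake_timer_t *t, fake_queue_t *q) {
    assert(t->active);
    t->active = false;
    q->pending_timeouts++;
}

static void action_start_timeout(fake_timer_t *t) {
    t->active = true;
}

/* Mirrors action_stop_timeout() from the thread; the real version runs
 * these two steps inside a critical section. */
static void action_stop_timeout(sm_state_t *s, fake_timer_t *t) {
    s->stopped_after_fired = !t->active;
    t->active = false;
}

/* Called when a timeout message is popped. Returns true if it was handled,
 * false if it was recognised as stale and ignored. */
static bool on_timeout_message(sm_state_t *s, fake_queue_t *q) {
    assert(q->pending_timeouts > 0);
    q->pending_timeouts--;
    if (s->stopped_after_fired) {
        s->stopped_after_fired = false; /* assumption: consume the flag once */
        return false;
    }
    s->handled_timeouts++;
    return true;
}
```

One thing the model makes visible: a single boolean covers only one outstanding stale message. If a new timeout could be started and stopped before the stale message has been popped, the flag would be overwritten, so a counter or a per-timeout tag might be needed in that case.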