Rare case of timer callback with scheduler disabled

@richard-damon recently noted that the timer task might call a callback with the scheduler disabled. Looking at this code from the perspective of a tick-count rollover, it seems he is right. That behavior is probably not desirable, even though it’s a rare case where timers are being processed late and there’s a tick-count rollover and timer callbacks should be short by design and should never attempt to block anyway. If there is interest I would be happy to submit a PR to address this little issue. Here is the idea I had.

1 Like

Looks good @jefftenney :+1:t2: Thanks for your effort !
In my opinion it’s worth the PR to make the timers even more robust.

I would need to check things more closely to be sure, but this change will make the calls to prvReloadTimer called in prfProcessExpiredTimer have the wrong value for xTimeNow. This might not matter, as we will catch up to the current time after we let the lists roll over.

Thanks for looking at the idea. It’s essentially holding off the timer task’s recognition of the rollover. Turns out the idea wasn’t fully baked. This other usage of prvSampleTimeNow() isn’t compatible with holding off the rollover. It would allow the very thing forbidden by the comment above that line. A command would have a newer timestamp (post rollover) than the current acknowledged time in the timer task. No good.

I’m still happy to help, but I would need to put a little more time into it. I do still think it’s worth cleaning up so we never call a timer callback with the scheduler disabled.

My thought, looking at the code is the question, why do we need to stop the scheduler here? My guess is that it may be to try to minimize (but can’ t eliminate) the possibility of time slipping between the getting of xTimeNow and the call to vQueueWaitForMessageResticted later in the code (that is a special Queue operation that is designed to be called with the scheduler suspended, that will resume it before blocking).

Instead, if we can fix the code to not need the scheduler disabled here, and I am not sure it does, but instead check just before calling the wait for message, what the time is NOW, and perhaps loop again if needed.

Interesting idea. Suspending the scheduler is important to the design, but here is a way to delay suspending the scheduler until after the call to prvSampleTimeNow(). Just another idea for now.

I’m not so sure that suspending the scheduler is that important. It seems it is just an attempt to keep the block time correct, but it actually doesn’t do it, as the tick interrupt can still happen and increase the tick, causing the block to go too late.

What would be better would be for the “restricted” function to instead of taking a blocking time, a time to block too. That or make it a real critical section, and not just a scheduler suspension. And actually, having versions of the blocking operations that take a timestamp of the time to block to, rather than a time period could be useful for similar cases in user code.

When the scheduler is suspended, OS ticks accumulate into an xPendedTicks field that is private to tasks.c instead of into the system tick count. That way the system tick count advances only when the scheduler can properly process each tick for task readiness.

As far as FreeRTOS APIs taking a tick count to delay until, I’m not sure we should do that. It would be difficult (impossible?) for the API implementation to decide if a tick time provided by the application is in the future or in the past.

Very interesting discussion.

As @jefftenney explained, this cannot happen because the ticks are accumulated in xPendedTicks when the scheduler is suspended.

I wanted to point out this section in the documentation which states that:

Timer callback functions execute in the context of the timer service task. It is therefore essential that timer callback functions never attempt to block. For example, a timer callback function must not call vTaskDelay(), vTaskDelayUntil(), or specify a non zero block time when accessing a queue or a semaphore.

Given that we prohibit blocking in the timer callbacks, do we need this?

I would say something needs to be fixed. The problem is that statement doesn’t explain why, and if you look at the various times blocking in Timer Callbacks gets discussed, the reason given isn’t the real reason, but is assumed to be because it interfears with other Timer Callbacks getting invoked in a timely manner. And in fact, the prohibition isn’t correct, as Callbacks CAN do those actions, if they first confirm that the scheduler is running, and they accept the other side effects of delaying the timer task.

As to:

This appears to be undocumented behavior, and the Timer task using the behavior, and a restricted function to implement the desired wait on a queue until a specified tick count (instead of blocking for a specified time) which is an operation that user code might want to do also, shows a slight weakness in the current definitions.

As to:

That is true only if you just pass a single tick count. What would need to be done is something like is done in xTaskCheckForTimeout, where we store a “full” tick count (that includes the roll over counter) and the time period, or perhaps have a function that can manipulate the TimeOut_t to point to the “future” time to wait for. Another option would to pass two tick counts, one being a base, that is known not to be in the future, and the second being a tick count known not to be in the past of that base. Given those, we can know if the second tick count is in the current future or not (sort of what xTaskDelayUntil does after adding the delay time to the previous time.)

Thinking a bit of minimal changes, the source of the problem is in here:

 static TickType_t prvSampleTimeNow( BaseType_t * const pxTimerListsWereSwitched )
    {
        TickType_t xTimeNow;
        PRIVILEGED_DATA static TickType_t xLastTime = ( TickType_t ) 0U;

        xTimeNow = xTaskGetTickCount();

        if( xTimeNow < xLastTime )
        {
            prvSwitchTimerLists();
            *pxTimerListsWereSwitched = pdTRUE;
        }
        else
        {
            *pxTimerListsWereSwitched = pdFALSE;
        }

        xLastTime = xTimeNow;

        return xTimeNow;
    }

It is where the code calls prvSwitchTimerLists(), which will run the expired timer callbacks. when this is done inside the Scheduler suspension we get the problem.

This could be remove by testing the scheduler state before calling prvSwitchTimerList(), in if it is suspended, restore it. Then make the call, and if it was suspended, suspended it again, and then refetch the tick count.

No, the proposed change doesn’t seem to be a need. The alternative is for the Timer API docs to indicate that callback functions might be called with the scheduler disabled. If we give the timer implementation that freedom, and publish it in the API docs, then we don’t need to change the code.

The most compelling reason to make the code change is probably to help applications that are currently “getting away with” blocking in a timer callback. Those applications are probably passing all of their tests and seem to function properly, but really there’s a subtle bug that would only show up at tick-count rollover, because the Timer task unexpectedly changes its behavior at tick-count rollover.

Since these applications are already doing something “wrong”, maybe we don’t have any obligation to try to help them out. And since that’s the most compelling reason I could think of, maybe a small documentation update would be better.