What if a tick gets pended after xExpectedIdleTime is calculated?

caseymillsdavis wrote on Monday, September 21, 2015:

In the GCC/ARM_CM3 implementation of vPortSuppressTicksAndSleep, a SysTick interrupt could fire and pend a tick after the scheduler is suspended and before the SysTick has been stopped. ulReloadValue would then get calculated using the SysTick current-value-register that has just been refreshed and a value of xExpectedIdleTime that doesn’t account for this recent tick. I think the result would be a SysTick wake-up scheduled a tick too late.

Is this reasoning correct?

rtel wrote on Monday, September 21, 2015:

This could conceivable happen if the Idle task started to run right at the end of a time slice (immediately before a tick interrupt).

I’m not sure the scenario you highlight adds any additional uncertainty than say, calling vTaskDelay( 1 ). A time measured in ticks is only accurate to the resolution of one tick period, so vTaskDelay( 1 ) could block for 0.001 of a tick period, or 0.999 of a tick period, depending on how far through the time clice it was called. However, unlike a pure vTaskDelay( 1 ), it could result in some time slippage - but that is noted in the code comments.

Anything that stops the timer will inevitably result in some time slippage forward or backwards (depending on the compensation value). The scenario you highlight could be mitigated by updating eTaskConfirmSleepModeStatus() so it returns eAbortSleep if a tick is pending (it already aborts if the tick attempted a context switch), but then you are more likely to abort a sleep, which in turn means you will be turning the clock off more often (when re-trying an aborted sleep) so I’m not sure you gain anythng.

Other than some clock slippage (relative to calendar time) - do you see anything actually dangerious in the implementation. By that I mean, somethign that could cause a crash, or lock up, etc.?

Regards.

caseymillsdavis wrote on Monday, September 21, 2015:

caseymillsdavis wrote on Monday, September 21, 2015:

Thanks for the fast reply.

Your example from “vTaskDelay( 1)” clarifies things and I think you are right that this scenario does not add any more uncertainty than that. I didn’t see any other concern apart from clock slippage.

Hi Richard, I think the “fix” you contemplated here might be worth doing after all. Imagine a system with just this one task:

void jitterTask(void* unused)
{
    TickType_t xPrevWakeTime = xTaskGetTickCount();

    while (1)
    {
        vTaskDelayUntil( &xPrevWakeTime, 5 );
        doSomeWork();
    }
}

If doSomeWork() finishes just before a tick interrupt, it might cause the next execution of doSomeWork() to start one tick late. So if we were to list the values of xTickCount upon each return of vTaskDelayUntil(), they might look like this:

5, 10, 15, 21, 25, 31, 35, 40, 45, 50, 56, …

And the same would happen with auto-reload timers.

I must confess this really is not a big deal. There’s no real drift or slippage here, just jitter. But I for one would still vote for a fix. And I think the fix you contemplated above is smart.

Yes, changing the return value of eTaskConfirmSleepModeStatus() may negatively impact existing systems that have tuned (out) the clock drift associated with tickless idle, but I think the impact would be very small. If nothing else you could make it a configuration option.

Jeff

I think that is what this does: https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/V10.3.1-kernel-only/tasks.c#L3524

I’m thinking of the case where xYieldPending is false at the time eTaskConfirmSleepModeStatus() executes. In the example above, consider xTickCount = 16 and doSomeWork() has just finished. While the system is preparing to suppress ticks until tick 20, tick 17 comes but gets pended (xPendedTicks = 1). So xTickCount remains 16, and eTaskConfirmSleepModeStatus() sees xYieldPending is false. So we go to sleep for 4 ticks (xExpectedIdleTime = 4). When vTaskDelayUntil() returns, we’re at xTickCount = 21 instead of 20.

Don’t have code in front of me right now to check, but even so, I don’t think the hardware will enter sleep mode if there is already an interrupt pending. The wfi instruction would be waiting for something that had already happened. Don’t have the hardware manual in front of me right now to check that either…

Oh, I didn’t mean the interrupt is pending, I meant the tick itself is waiting in xPendedTicks. That happens when the tick ISR executes with the scheduler suspended. Sorry for the miscommunication.

The OP stated the problem well. I just described additional fallout from the same problem. I mean, the post is only 5 years old, it should be fresh in your mind right? :slightly_smiling_face:

To summarize:

  • Tickless sleep can oversleep by 1 tick.
  • The root cause is a race condition that cannot currently be managed in the portable layer.
  • The proposed fix is to update eTaskConfirmSleepModeStatus() to return eAbortSleep if ticks are pended (xPendedTicks != 0).

From what I can see, the proposed fix resolves the issue for all official FreeRTOS ports.

And the case for applying this fix is arguably stronger when you consider there are processor-specific tickless implementations out there that have no drift – they never stop the timer.

I got the checkin comment a bit messed up but see here https://github.com/FreeRTOS/FreeRTOS-Kernel/commit/e1b98f0b4be153cf6590c5b5b2f76b001af71c51

1 Like

Awesome! Big thanks, Richard! And thanks for being so connected to the user community.

2 Likes

There will be a few more aborted sleeps (eAbortSleep) after this change, so now might be a good time for an optimization something like this:

Don’t stop systick for eAbortSleep

It would probably be a lot of work to duplicate this change among all the official ports with tickless implementations, and I’m not sure the benefit justifies the work.

Just had a brief look but there may be the potential for a race condition in this:

Don’t stop systick for eAbortSleep

As it is checking to see if the sleep should be aborted while the clock is still running - if that were the case it would have to check again after the clock
had been stopped. Or did I miss something in my haste?

I think it manages the race(s) correctly. Yes, the timer keeps running during the check for abort. Then assuming there’s no abort, the new code checks for, and handles, a timer expiration that occurs after we entered the critical section.

It’s interesting (but good I think) that a timer expiration that occurs before the critical section, causes an abort (with your recent code change), but a timer expiration that occurs in the critical section is handled by the sleep operation (my suggested code). For perspective, the old code prevented timer expiration in the critical section by stopping the timer before the critical section.

I added a comment on the commit to explain. I think the code works, but it may not be worth the effort to integrate it into all the official tickless implementations. I wasn’t sure if you forgot about this or decided it wasn’t worth doing or too risky etc.