possible bug in software timers

gregrenda wrote on Friday, November 16, 2012:

I’m using version FreeRTOS 7.0.0 but the code in question seems to be the
same in 7.3.0.  The problem is in prvProcessReceivedCommands().
Specifically this at the beginning of the function:

    xTimeNow = prvSampleTimeNow( &xTimerListsWereSwitched );
    while( xQueueReceive( xTimerQueue, &xMessage, tmrNO_DELAY ) != pdFAIL )
    {

The issue I’m seeing is that messages are being retrieved from xTimerQueue
with commandTimes > xTimeNow.  This case is interpreted by code in
prvInsertTimerInActiveList() as indictating that the tick count has
overflowed.  In which case the timer is considered expired, the callback is
called and the timer is restarted if it’s an autoReload timer.  That’s all
well and good but, in my case, the tick count has not overflowed.  What has
happened is that xTimerQueue was being filled by another task at the same
priority as prvTimerTask().  The other task was able to add a timer start
command with commandTime > xTimeNow.  The timer was immediately considered
expired and, as this was an autoReload timer, re-queued, causing the while
loop to never exit since xTimeNow is never recalculated.  This situation
does not occur if prvTimerTask is a higher priority than the task starting
the timers.  Another solution would be to move the call to
prvSampleTimeNow() inside the while loop.  The solution I went with was to
change the while into an if.

So, have I interpreted all of this correctly?  If so, aside from changing
priority levels, which is the preferred fix?  Thanks.

edwards3 wrote on Friday, November 16, 2012:

I’m sure Richard will commont on this, but I don’t understand the scenario you are describing.

The other task was able to add a timer start
command with commandTime > xTimeNow.  The timer was immediately considered
expired and, as this was an autoReload timer, re-queued, causing the while
loop to never exit since xTimeNow is never recalculated.

If that is the case, why is the timer being immediately considered expired?

If commandTime == xTimeNow or if commandTime == (xTimeNow +1) then you can imaging the timer task would rightly attempt to continuously process the timer because you have told it it is expiring continuously, but I don’t think that is what you are reporting.

gregrenda wrote on Friday, November 16, 2012:

In the tmrCOMMAND_START case prvInsertTimerInActiveList() is called and returns true when ( xTimeNow < xCommandTime ) && ( xNextExpiryTime >= xCommandTime ).  This causes the callback to be called and the timer to be re-queued with the current xTaskGetTickCount() value which is now pretty much guaranteed to be > xTimeNow, and around you go.

The problem is that within the while loop xTimeNow is frozen at the value it got before the loop was entered.  Other tasks at the same or higher priority as the timer task can add timer starts to the queue with commandTimes == xTaskGetTickCount which is moving along while xTimeNow is frozen as long as there are messages in the queue.

rtel wrote on Friday, November 16, 2012:

I know there is a note in the description of configTIMER_TASK_PRIORITY saying you have to be careful of the priority assigned to the task relative to other tasks, but need to understand exactly what you are saying before I can comment properly.

An example with sample values might help, for example, this is how I am interpreting description at the moment:

1)
You say

“messages are being retrieved from xTimerQueue with commandTimes > xTimeNow”

and relate this to the start timer commend.  So a task calls xTimerStart() on a timer that is defined to expire at a time greater than the time at which the function is called.  So with sample numbers lets say the time when the message is posted (the CommandTime) is 10, and timer period is 5, so the expiry time is 15.

2) The command is tmrCOMMAND_START, so prvProcessReceivedCommands() calls prvInsertTimerInActiveList()

3) You say

“This case is interpreted by code in prvInsertTimerInActiveList() as indictating that the tick count has overflowed”

Which is the case in prvInsertTimerInActiveList() that uses the test…

“( xTimeNow < xCommandTime ) && ( xNextExpiryTime >= xCommandTime )”

…as per your later post.

Using sample numbers again, to show the correct case - to make this test past, lets assume that xTimeNow (the time when prvProcessReceivedCommands() was entered) is 4 so an overflow has genuinely occurred .

4) You say

“in my case, the tick count has not overflowed”

Can you provide sample numbers, in the way I have above, that shows the case where the test…

“( xTimeNow < xCommandTime ) && ( xNextExpiryTime >= xCommandTime )”

…evaluates to TRUE when the tick count has not overflowed, which I think you are saying is happening to cause the anomalous behaviour you report.

These cases were statically and dynamically tested rigorously, but it is obviously possible that with certain temporal dynamics at differing task priorities, a case slipped through untested.

I would be extremely careful about turning the while() into an if() – it might be ok but I can’t be sure.  I would at least have to run the code through the test cases to see if they picked up anything.

Regards.

gregrenda wrote on Friday, November 16, 2012:

Okay, let me try it this way.  Consider two tasks, a main task creating and
starting timers and the timer task, servicing the timer requests, both
running at the same priority level.

Main task looks like:

     t1=xTimerCreate()    period=200
     xTimerStart(t1)      commandTime=5  expiry=205

     – main task blocks, timer task runs –

     t2=xTimerCreate()    period=200
     xTimerStart(t2)      commandTime=15  expiry=215

timer task looks like:

      xTimeNow = prvSampleTimeNow()   returns xTaskGetTickCount = 10

      while (xQueueReceive(…))
      {
           First time through the loop
               t1 is retrieved
                  commandTime = 5
                  xTimeNow = 10
                  expiry = 205
               everything is fine  xTimeNow > commandTime

           – timer task blocks, main task runs –

           Second time though the loop
               t2 is retrieved
                  commandTime = 15
                  xTimeNow = 10         frozen since we’re still in the loop
                  expiry = 215

               whoops, xTimeNow < commandTime and expiry > commandTime
               timer tick considered to have overflowed.  But it hasn’t.
               Then t2 is re-queued from within this loop with a commandTime of
               commandTime + timerPeriod, guaranteeing that we’ll stay in the
               loop since we’ve added to the queue and guaranteeing that
               we’ll see xTimeNow < commandTime again since xTimeNow will still be
               frozen.
      }

Thanks for looking into this with me. 

rtel wrote on Friday, November 16, 2012:

Yes, thank you, now I understand clearly what you are reporting.

I will run through the very example you have in a simulator, where I can punch in the times manually to fit the case to see if there is an easy solution.

Assuming this is a genuine issue, which at the moment it looks like it is (but I need to exporte more with a clear head, I’ve been travelling for a few days) then you are right that raising the timer task priority is a work around – but a proper fix is needed.

Looking at the code I realise why:

xTimeNow = prvSampleTimeNow( &xTimerListsWereSwitched );

is called outside of the loop, not inside which might be more obvious.  prvSampleTimeNow() will potentially switch the timer lists from the list currently being processed to the list that contains tasks whose timeout is after the tick count has overflowed.  Calling xTimeNow = prvSampleTimeNow() on each loop, I think at least, runs the risk of the lists being switched when the list being switched away from is not empty.  If that happens a timer that has expired, and should be processed before the lists are switched, will remain unprocessed until the tick count value has looped all the way back around – which could be weeks.

There may therefore be one possible solution whereby any timers that remain in the list have assumed to expired, and are processed – but that needs more thought.

Thank you for taking the time to report, and for your patients in explaining three times before I understood.

Regards.

gregrenda wrote on Friday, November 16, 2012:

Glad I could finally make the problem clear for you.  As to possible solutions, it seems to me that turning the while into an if is functionally okay but slightly inefficient if there are multiple items in the queue.    I didn’t look too closely at moving prvSampleTimeNow() into the loop but I think that’s okay, too, since if the lists are switched, the timers on the current list are all processed by prvSwitchTimerLists() so there should be no orphaned timers.  But it’s certainly possible there are other side effects I haven’t considered.   I leave it in your capable hands.

rtel wrote on Sunday, November 18, 2012:

Moving the call to prvSampleTimeNow() inside the while() look does indeed make a test case fail, as I recalled it did.  Investigating this however shows that this is only because of an assumption the test case pre-condition set up makes.

prvSampleTimeNow() contains a static variable, and the test case calls prvProcessReceivedCommands() to set the static variable to the correct value for the test case pre-condition.  Moving prvSampleTimeNow() inside the while loop prevents it being called when there are no items in the timer queue, and so prevents the static variable being set to the pre-condition value.  Changing the test code to call prvSampleTimeNow() directly in the pre-condition set up, rather than indirectly through prvProcessReceivedCommands() allows the test case to pass.  The code I have now is therefore as below:

static void	prvProcessReceivedCommands( void )
{
xTIMER_MESSAGE xMessage;
xTIMER *pxTimer;
portBASE_TYPE xTimerListsWereSwitched, xResult;
portTickType xTimeNow;
	while( xQueueReceive( xTimerQueue, &xMessage, tmrNO_DELAY ) != pdFAIL )
	{
		pxTimer = xMessage.pxTimer;
		if( listIS_CONTAINED_WITHIN( NULL, &( pxTimer->xTimerListItem ) ) == pdFALSE )
		{
			/* The timer is in a list, remove it. */
			uxListRemove( &( pxTimer->xTimerListItem ) );
		}
		traceTIMER_COMMAND_RECEIVED( pxTimer, xMessage.xMessageID, xMessage.xMessageValue );
		/* In this case the xTimerListsWereSwitched parameter is not used, but 
		it must be present in the function call.  prvSampleTimeNow() must be 
		called after the message is received from xTimerQueue so there is no 
		possibility of a higher priority task adding a message to the message
		queue with a time that is ahead of the timer daemon task (because it
		pre-empted the timer daemon task after the xTimeNow value was set). */
		xTimeNow = prvSampleTimeNow( &xTimerListsWereSwitched );
		switch( xMessage.xMessageID )
		{

I think this is a result.

Regards.