xTaskCheckForTimeOut Faulty Behavior

The xTaskCheckForTimeOut() is conceptually wrong as is in V10.4.3.

If the elapsed time is equal to the time allowance, it signals that a timeout has occurred.
This is wrong!
In this case, a timeout has not yet occurred and if all subsequent functions complete immediately a timeout will never occur.
As a clear example: This implementation, as is, would result in functions timing out if a 0 wait time is specified, even if all necessary recourses are immediately available.

On line 3348 “else if( xElapsedTime < *pxTicksToWait )” should be modified to “else if( xElapsedTime <= *pxTicksToWait )”

Thanks for your report. I’m not quite following the logic, especially the part about a timeout never occurring, so would be grateful if you can provide a worked example that would exhibit the incorrect behaviour. For example, value for the pxTimeOut and pxTicksToWait parameters as well as the value for xTickCount (the current tick count value).

Consider this:

int main()
{
    Init();

    vTaskStartScheduler();
}

SemaphoreHandle_t bar;

bool Init()
{
    xTaskCreate(&fooBarTask, 
        "fooBar",          
        configMINIMAL_STACK_SIZE,    
        NULL, 
        1,          
        NULL); 

    bar = xSemaphoreCreateCounting(10, 3); //3 events immediately available
}

bool foo(TickType_t xTicksToWait)
{
  //record time of entry
  TimeOut_t xTimeOut;
  vTaskSetTimeOutState( &xTimeOut );
  
  //wait for 3 events and process them
  for(size_t i = 0; i < 3; i++){
    
    //wait for event
    if(pdFALSE == xSemaphoreTake(bar, xTicksToWait)){ //if the event didn't come in time
      return false; //signal failure
    }
    
    //process event
    //...
    
    //get remaining time allowance
    if( xTaskCheckForTimeOut( &xTimeOut, &xTicksToWait ) == pdTRUE ) //if we timed out
    {
      return false; //signal failure
    }
  }
  
  return true; //signal success
}

void fooBarTask(void* pvParam)
{
    bool OK = foo(0); //try to process 3 events or as many as immediately available

   //OK will be false and only one event will be processed, even though 3 was available, no wait was needed, xTaskCheckForTimeOut incorrectly indicated a timeout

    //...
}

All three necessary events for completing foo without delay is immediately available, nonetheless xTaskCheckForTimeOut will return pdTRUE signaling a non-existent timeout.

Similar faults will occur for all cases where (xTickCount - pxTimeOut->xTimeOnEntering) == *pxTicksToWait, such as the quiet typical xTickCount == pxTimeOut->xTimeOnEntering && *pxTicksToWait == 0.

Just because the allowed time elapsed, that does not mean that there was a time out. It can very well be, that we are just in time to finish everything, which we can do, if everything is immediately available.

The documentation specifically says:

“If pdTRUE is returned then no block time remains, and a timeout has occurred.”

This is not the case though at present time. Right now it should read: If pdTRUE is returned then no block time remains
However, it should behave as the documentation suggest, otherwise it is of limited use…
So it is the function that requires fixing, not the documentation!

As far as I can see you are highlighting a case that would hit an assert because you are passing in 0 as the requested timeout. It is highly recommended to always do development work with configASSERT() defined.

I’m sorry but you are wrong.

For one, those are asserts on pointers (at least in the version I have, V10.4.3) so they just check that the pointers are not NULL.
Second, that would be really really really bad if it not just handled 0 timeout wrong, but it caused a straight out assert failure.
Third, I do think I clearly described that the specified timeout doesn’t need to be 0 for this to return stupid.

This does need to work correctly with 0 time out, and it does need to work correctly when there is no block time remaining but a timeout did not occur.
Otherwise this function is useless, confusing, and dangerous. Plus the documentation is wrong.

You are right on the assert - it is a NULL check - but I still need to understand where you are asserting this is:

Below I describe the intended operation which is also demonstrated in the example code on this page - at which point are you reporting a deviation from this behaviour?

The vTaskSetTimeOutState() and xTaskCheckForTimeOut() functions are necessarily paired. vTaskSetTimeOutState() captures the time at which it is called. This is then used as a reference in future calls to xTaskCheckForTimeOut(). So lets assume the initial timeout (*pxTicksToWait) is set to 5 and you call xTaskCheckForTimeOut() 1 tick after calling vTaskSetTimeOutState() - so the timeout hasn’t expired yet. In that case xTaskCheckForTimeOut() decrements the timeout to 4, as there are 4 ticks remaining in the timeout, and returns a value to say the timeout has not occurred yet. So far so good. Now lets assume that happens another 3 times, so the tick count has been decremented to 1, still with no timeout. Now assume at least one tick passes before xTaskCheckForTimeOut() is called again - this time, as the timeout was 1 on calling the function and at least 1 tick has passed since the last call there has actually been a timeout - so the timeout (*pxTicksToWait) is set to zero and the function returns that a timeout has occurred. Now if the application writer for some reason ignores the return value and calls xTaskCheckForTimeOut() again the timeout being requested has already reached zero - so the timeout has already occurred and the function returns a value indicating as such.

When can it be possible for there to be no timeout remaining without a timeout occurring?

This reminds me of that piece from that classic “The Mythical man-month”.

I think this behavior has been like this for so long that changing it now will likely break a lot of people’s code.

Here is the bit it reminded me of for those who do not have the book.

“Using an implementation as a definition has some advantages. All questions can be settled unambiguously by experiment. Debate is never needed, so answers are quick. Answers are always as precise as one wants, and they are always correct, by definition. Opposed to these one has a formidable set of disadvantages. The implementation may over-prescribe even the externals. Invalid syntax always produces some result; in a policed system that result is an invalidity indication and nothing more. In an unpoliced system, all kinds of side effects may appear, and these may have been used by programmers. When we undertook to emulate the IBM 1401 on System/360, for example, it developed that there were 30 different “curios” side effects of supposedly invalid operations that had come into widespread use and had to be considered as part of the definition.” -
Brooks Jr., Frederick P… Mythical Man-Month, Anniversary Edition, The (pp. 74-75). Pearson Education. Kindle Edition.

1 Like

I think the issue here is that a timeout of zero has a slightly different meaning than a timeout greater than 0. With a Timeout greater than 0, if Timeout tick occur without having meet the need, a timeout HAS occurred. It doesn’t matter if after that timeout when we wake up we can now get what we were looking for, the timeout did occur.

If a timeout of 0 had that exact same meaning, then ALL calls with a zero timeout would timeout, and not be that useful, so we use a slightly different meaning to ask can the need be meet without having to block.

For a single request, just doing the request and seeing if it is meet is good enough so we don’t notice the difference, the request operation can take care of it. When needing multiple requests, I don’t think xTaskCheckForTimeOut saves enough state to know the difference, so it. may require the user code to do something to support 0 tick waits, which are a bit unusual in this sort of context.

It might be possible to special case inside xTaskCheckForTimeOut for the 0 wait case, if it can never compute an updated value of 0, which would take a bit a analysis,

A natural language description of the intended use of this function pair is “have x ticks passed since a fixed point in time, and if not, how many ticks remain until x ticks have occurred since that fixed point in time.” The functions were originally private to the kernel where they were used to answer the question “how long have I been blocked for?”. For example, if a task enters the blocked state to wait for an event, is unblocked because the event occurs, but finds the event clear again when it actually executes (maybe it was consumed by a higher priority task), then the task needs to adjust its block time for whatever time remains then re-enter the blocked state using the adjusted time so it times out at the same time it would have done had it not been unblocked once already. The two functions in question were exposed to the application (made public function) at the request of users.

Perhaps the name of the second parameter should have not been “time out” but “time remaining” - “time out” reflects its original private usage scenario. Effectively this is just doing a calculation that were it not possible for the tick count to overflow multiple times in the interim could just be “time-now minus time-at-start”. In the intended use cases, if the time remaining reaches zero and the function returns that a timeout has not occurred, then there would be no way out and the system would deadlock as repeated calls could only ever say there was no timeout.