Delaying by 0 ticks violates an assert

Actually, it sounds like he wants to delay to a given tick count, so maybe vTaskDelayTo(TickType_t timeStamp);

It could be more efficient for the current one to use that one…


I'm on the fence about doing one or not. I don't like doing things part-way, and I'd want to introduce an API that mirrors the C++ API. I think there are some major issues with the way your implementation of ticks has leaked into your API. For example, in the same function, I had asserts triggered because your documentation said that:

The task will be unblocked at time (*pxPreviousWakeTime + xTimeIncrement).

I am used to the C++ idiom of keeping track of my own wake times if I want a wake to happen at regular intervals over long time periods. So I did that by changing the pxPreviousWakeTime value and setting xTimeIncrement = 0. I had two asserts failing, not one. My MCU doesn't really have a debugger, and my configASSERT had been defined by whoever adapted FreeRTOS to SAM D21. I couldn't find the error until I looked at the source code.
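For contrast, here is a minimal sketch of the usage pattern vTaskDelayUntil() actually expects, where the kernel, not the caller, advances the wake time (PERIOD_TICKS is a placeholder I made up for the example):

#include "FreeRTOS.h"
#include "task.h"

#define PERIOD_TICKS pdMS_TO_TICKS(10)   // hypothetical period

void vPeriodicTask(void* pvParameters) {
  // Seed once with the current tick count; vTaskDelayUntil() advances
  // *pxPreviousWakeTime itself on every call, so no caller-side
  // bookkeeping (and no xTimeIncrement = 0 trick) is needed.
  TickType_t xLastWakeTime = xTaskGetTickCount();
  for (;;) {
    vTaskDelayUntil(&xLastWakeTime, PERIOD_TICKS);
    // Periodic work goes here.
  }
}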

One thing to watch out for, I believe FreeRTOS will assume that pxPreviousWakeTime is a time in the past, as it was the last time that the routine was woken. If you change it to be a time in the 'future', because time wraps in FreeRTOS, it might take that as a time in the distant past.

Note, your C++ API used a time that was kept to a big enough range that it is assumed to NEVER recycle. This puts minimum sizes onto the types used for this item. FreeRTOS runs on processors where the tick is naturally a 16-bit value; on such machines, it is expected that you will see the tick roll over, and the API is designed to handle those cases (for example, I believe vTaskDelayUntil will assume pxPreviousWakeTime is in the past even if it is larger than the current tick, assuming that things wrapped).
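A quick illustration of why comparisons still work across a wrap, using plain unsigned C arithmetic (the values are made up for the example):

#include <stdint.h>

typedef uint16_t TickType_t;   // assume a 16-bit tick for the example

void example(void) {
  TickType_t xLast = 0xFFF0;   // sampled just before the wrap
  TickType_t xNow  = 0x0005;   // sampled just after the wrap
  // Modulo-2^16 subtraction keeps "elapsed" well defined across the
  // wrap: 0x0005 - 0xFFF0 == 0x0015, i.e. 21 ticks.
  TickType_t xElapsed = (TickType_t)(xNow - xLast);
  (void) xElapsed;
}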

How do you obtain the wake time, rather than the time the task actually started running? A task may be moved out of the Blocked state some time before it is selected to run.

Being used to Ada's delay until construct (delay until an absolute time; in the language since 1983), I took some time to appreciate how vTaskDelayUntil() was supposed to work.

To get an undrifting delay, we would say e.g.

loop
   --  do stuff here
   delay until Next_Time;
   Next_Time := Next_Time + Period;
end loop;

which is, I think, what Joel is suggesting.

I think you'd be upset if the developers of Boost or the STL took that approach.

Note that vTaskDelayUntil works that way, only it implements the Next_Time := Next_Time + Period step as part of the delay operation (through the pointer-to-last-wake-time parameter), and you pass the function the Period parameter.

Note again, that sort of system assumes that Next_Time is in some absolute time frame that doesn't roll over. FreeRTOS's timer tick value is NOT promised to be that sort of value, and in fact there is a lot of code to handle the timer tick rolling over.

With 32-bit processors and 32-bit ticks, it can be easy to forget that fact, since the rollover time can be months, but for some systems it can still happen. There have even been suggestions (I don't remember if it has been implemented) to start the tick counter near the rollover point, so that testing exercises the wrap-handling code.
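For what it's worth, recent kernels appear to support exactly this through a config constant; worth verifying against your kernel version before relying on it:

/* FreeRTOSConfig.h -- start the tick counter close to the wrap point
   so rollover handling is exercised within seconds of boot.  Assumes
   a kernel version that honours configINITIAL_TICK_COUNT and a
   32-bit tick; pick a value near your tick type's maximum. */
#define configINITIAL_TICK_COUNT    ((TickType_t) 0xFFFFFF00)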

All the proposals for delaying until a specified tick count will have the issue of deciding whether the count represents a point of time in the recent past or the far future. Either they need to define a window of time that this point will represent, or use something possibly bigger than the current tick counter (especially if it is just 16 bits).
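One common way to define such a window (an assumption for illustration, not anything FreeRTOS specifies) is the half-range convention: a target counts as "future" if it lies less than half the tick range ahead of now, modulo the wrap:

#include <stdbool.h>
#include <stdint.h>

typedef uint16_t TickType_t;   // assume a 16-bit tick for the example

static bool xTickIsInFuture(TickType_t xNow, TickType_t xTarget) {
  // Wrapped distance from now to target; anything under half the
  // range (0x8000 here) is treated as "not yet reached".
  return (TickType_t)(xTarget - xNow) < (TickType_t) 0x8000u;
}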

I got rate-limited by the forum.

I'm going to avoid the nested replies.


To all:

Why? The xTickCount isn't defined and updated in port-specific code. Since it's not port-defined but FreeRTOS-defined, why not use more bits? Its value is stored in memory and it's updated in xTaskIncrementTick() in tasks.c.

Why not make TickType_t a 64-bit number? Are you worried about MCUs with limited registers? Are you worried that the update would be expensive? I can suggest solutions to deal with every issue I can think of, with minimal performance hits. Why introduce edge cases with rollovers rather than just say, "rollovers happen in half a billion years; if your device is still around, undefined behavior"? (Edit: even with ticks representing nanoseconds, that's 584 years)
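For reference, the arithmetic behind those figures:

2^64 ticks ≈ 1.84e19
at 1 kHz (1 ms per tick):  1.84e16 s ≈ 5.8e8 years  (roughly half a billion)
at 1 GHz (1 ns per tick):  1.84e10 s ≈ 584 years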

And why expose the tick count implementation to developers? What makes it worth it to deal with overflows rather than just getting rid of the problem entirely?


Delay algorithm (Simon + Richard):

That delay is exactly how I'd implement things. The systems I work with have more bits for tracking the internal system time, so they don't have to worry about overflows, which obviates Richard's concerns, à la the above. If you read the standard, the C++ STL effectively requires that.


User philosophy:

If you make bad mistakes in the STL, undefined things do happen… Ever access a std::vector without bounds checking? I have. I would be annoyed if the STL didn't offer options with undefined behavior. If every std::vector access were bounds-checked, it would be entirely unacceptable to me.

From what I remember, the type used for the tick gets its default out of the port layer and can be overridden in FreeRTOSConfig.h.

The key is that for a 16-bit processor, it may be MUCH more efficient to use a 16-bit type for the tick counter than a 32-bit value. Since much of FreeRTOS works with tick values, it is important to let it be efficient. FreeRTOS can even be used on some machines that are closer to an 8-bit processor, where 16 bits is usable but 32 bits would be very expensive.

FreeRTOS also goes back to a point in time when 16-bit processors were very common, so keeping the code efficient for them was important. Another big key is that a 16-bit tick counter on a 16-bit processor can be accessed atomically, so you don't need a critical section when accessing a tick value that is shared between tasks, or between tasks and interrupts.
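A sketch of what that atomicity buys (illustrative only; xSharedTick stands in for any 32-bit tick variable shared with an ISR, and the critical-section macros are the real FreeRTOS ones):

#include "FreeRTOS.h"
#include "task.h"

static volatile TickType_t xSharedTick;   // stand-in for a shared tick

TickType_t xReadSharedTick(void) {
  TickType_t xTicks;
  // On a 16-bit core a 32-bit load takes two instructions, so the
  // tick interrupt could fire between the halves and a torn value be
  // returned.  The critical section prevents that; a natively sized
  // 16-bit tick would not need it at all.
  taskENTER_CRITICAL();
  xTicks = xSharedTick;
  taskEXIT_CRITICAL();
  return xTicks;
}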

The key is that FreeRTOS is targeted to be useful on SMALL machines as well as bigger machines. In fact, I would say that in many cases as you get to the bigger machines and want the fancier features they provide, it might make sense to move to an OS designed for that bigger environment.

Maybe there is room for a 'FreeRTOSPlus' design that removes the limitations that are in place partially to handle the smaller processors. One constraint on that design is that it needs to stay significantly simpler than embedded Linux; that market is already served, and served fairly well, for those needing that level.

So here's what I'd do. Let the implementation stay as-is, but hide it from the user behind the sleep APIs.

You need to have more bits saved someplace to track time. So I'd set aside (64 - 16 = 48) bits of memory for that purpose:

#include <stdint.h>
#include <string.h>

static uint16_t time_bits[3];   // 48 extra bits, kept as 16-bit chunks

Those bits need to be updated when the counter overflows. So in the increment section, Iā€™d add additional logic for overflows:

void increment_time_bits(uint16_t* address) {
  *address += 1u;
  if (*address == 0) {
    // Recursion means overflows are handled in chunks of 16 bits,
    // for 16-bit machines.  Depth is bounded by the three chunks of
    // time_bits; the top chunk never wraps in any realistic uptime.
    increment_time_bits(address + 1);
  }
}

BaseType_t xTaskIncrementTick(void) {
  ...
  // The kernel has just incremented xTickCount; if it wrapped to
  // zero, carry into the extra bits.
  if (xTickCount == 0) {
    increment_time_bits(time_bits);
  }
  ...
}

Then, I'd provide an alternative to get the current time:

uint64_t now() {
  uint64_t result;
  unsigned char* result_ptr = (unsigned char*) &result;
  {
    // Disable interrupts and/or enter a critical section here, to
    // prevent an overflow carry while the two pieces are copied.
    // Note: byte-wise assembly like this assumes a little-endian MCU.
    memcpy(result_ptr, &xTickCount, 2);
    memcpy(result_ptr + 2, time_bits, 6);
  }
  return result;
}

You'd need to fix the sleep_until function:

void sleep_for(uint64_t delay);   // defined below

void sleep_until(uint64_t time_point) {
  uint64_t entry_time = now();
  // Target already reached: return rather than underflowing into a
  // huge unsigned delay.
  if (time_point <= entry_time) {
    return;
  }
  sleep_for(time_point - entry_time);
}

void sleep_for(uint64_t delay) {
  // Still a sketch: sleep in chunks that fit well inside one
  // tick-counter epoch, using xTaskDelayUntil() so the chunks don't
  // accumulate drift, then sleep the exact remainder.
  const uint64_t chunk = 0x8000u;   // half a 16-bit epoch per step
  TickType_t wake = xTaskGetTickCount();
  while (delay > chunk) {
    (void) xTaskDelayUntil(&wake, (TickType_t) chunk);
    delay -= chunk;
  }
  if (delay > 0u) {
    (void) xTaskDelayUntil(&wake, (TickType_t) delay);
  }
}
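And a quick hypothetical usage example of the three calls together, waking every 100 000 ticks without drift even across tick-counter overflows:

void vSlowPeriodicTask(void* pvParameters) {
  uint64_t next_wake = now();
  for (;;) {
    next_wake += 100000u;   // well past a 16-bit tick epoch
    sleep_until(next_wake);
    // ... long-period work here ...
  }
}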

Obviously, it's an extension of the concept to support alternative tick bit widths.

No guarantees my code compiles! For the concept only. I probably (almost certainly) made mistakes.

It occurs to me that the whole section incrementing the extra bits and the tick would still be a critical section on an overflow. Recursion just makes it so that each 16-bit chunk is accessed and updated only if necessary. That minimizes the churn needed to increment a 64-bit time stamp.

If the problem is the overhead of the critical section, only take one when the update would cause an overflow. If not, you can access the tick atomically. That way you only pay for a critical section once every 2^16 ticks, or about once a minute.
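A sketch of that idea, assuming the 16-bit xTickCount and time_bits from the earlier snippets (the function name is made up, and whether the tick handler even needs the FROM_ISR guard depends on the port):

void tick_update_from_isr(void) {
  if (xTickCount == 0xFFFFu) {
    // About to wrap: update the counter and the extra bits together.
    UBaseType_t saved = taskENTER_CRITICAL_FROM_ISR();
    xTickCount = 0u;
    increment_time_bits(time_bits);
    taskEXIT_CRITICAL_FROM_ISR(saved);
  } else {
    xTickCount++;   // a 16-bit increment is atomic on a 16-bit core
  }
}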

I mean, at this point I've basically solved my problem. I probably should just turn it into a PR so comments and criticisms can flow more easily and be tied to concrete code instead of concepts…

Well, there already is an overflow counter with the same number of bits as the tick counter, so that gives you 64 bits for a system with 32-bit ticks, and 32 bits for a system with 16-bit ticks. It probably wouldn't be that hard to add an extra 32-bit counter for the 16-bit-tick systems, incremented when the overflow counter overflows. That increment is done in the tick interrupt, so it doesn't need a critical section unless some other ISR needs to know the current absolute time to 64 bits. Reads of these parts would need critical sections so it would all come together. (I would need to check if the current API has an ISR version to access the overflow bits.)
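For reference, the public way to read the tick count and overflow count together today is vTaskSetTimeOutState(). A hedged sketch that peeks at the TimeOut_t fields declared in task.h (their layout is nominally private, so treat this as illustrative, and it assumes 32-bit ticks):

#include "FreeRTOS.h"
#include "task.h"

uint64_t wide_tick_count(void) {
  TimeOut_t state;
  vTaskSetTimeOutState(&state);   // snapshots both fields together
  // xOverflowCount and xTimeOnEntering are the member names from the
  // TimeOut_t definition in task.h.
  return ((uint64_t) state.xOverflowCount << 32)
         | (uint64_t) state.xTimeOnEntering;
}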

I don't know how much complexity it would create to allow a sleep to go past the first overflow period, as currently nothing can delay for more than that, so the system isn't set up to handle longer delays. Without a major change, it might require that the task be woken on every tick overflow and the delay function check whether we are in the right epoch, or else sleep for another cycle.

I missed this. But yep, it's there. So only allocate any extra space needed to get to 64 bits.

That's what I suggested in the pseudocode for sleep_for. If I only create 3 new APIs (now, sleep_for, and sleep_until), I could do that without changing the kernel. If the API proved useful and/or popular, somebody could update the kernel so it doesn't wake up and re-sleep the task on every tick overflow interval.

I think it would at least require the ISR to do all of its calculations for task priority on 64 bits. It could be clever about tracking which bits it needed to compare to reduce overhead, but that would be the trade-off: more calculations per tick but fewer context switches? Hard to say how much that would add to overhead.

At any rate, I wouldn't take up modifying the kernel or ISR without motivation. Optimize when necessary and all that. I imagine sleeps across an overflow would be rare. Perhaps if you were scheduling some sort of hardware task at long timescales of a few minutes?

This is a very interesting discussion. I always try to remind myself that FreeRTOS has slowly evolved to where it is today over a very long period of time, and I think, because of that, the reasons why things are done a certain way are sometimes not so obvious.

The portability of the Kernel to a large number of platforms is a major limiting factor.

Some things to consider.

  1. FreeRTOS runs on compilers that do not support 64-bit data types
  2. FreeRTOS runs on compilers that do not support recursion
  3. Many safety-critical standards (MISRA C:2012, rule 17.2) also prohibit recursion, and using it makes it very hard for many customers to use the OS
  4. Since the tick type can be user-defined in FreeRTOS, we did have problems before with things breaking when the type is 16 bits (see for example this post)

We can and should have long debates about each of the above points to keep pushing for improvement, and I can assure you that none of these decisions are cast in stone. But we need to also consider that what may be obviously the best way to do something on the CPU you are working on today may not be the best way to do it for all users on all CPUs.

For example, the argument above about 64-bit ticks makes perfect sense when your compiler and device support 64 bits. But when the tick type is 32 bits, or, as I mentioned above, even 16 bits (which makes perfect sense if you have a 16-bit hardware register that you use for your tick counter, as some ports do), then your 584 years reduce to 65 ms before you can wipe your eyes, and the entire game changes.

I am not saying that I disagree with anything you say above, I am just suggesting that it is not always as simple or obvious as it seems.

Okay, make it a struct. I'm bashing 16 bits at a time anyways, so this really isn't an issue. Throw in subtraction and addition functions. The C++ style I'm mimicking takes that approach anyways; they use a class.

Also fine: since there's a maximum of 3 levels of recursion with 16-bit chunks, I can even unroll the whole thing by hand. A sketch of both points follows below.
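A minimal sketch of that combination, assuming no native 64-bit type and no recursion (the time64_t name and chunk layout are made up for illustration):

#include <stdint.h>

// A 64-bit tick value as four 16-bit chunks, least significant first.
typedef struct {
  uint16_t w[4];
} time64_t;

// Add a tick delta, propagating the carry iteratively: no recursion,
// no 64-bit arithmetic, and each higher chunk is touched only when a
// carry actually reaches it.
void time64_add(time64_t* t, uint16_t ticks) {
  uint32_t sum = (uint32_t) t->w[0] + ticks;
  t->w[0] = (uint16_t) sum;
  for (int i = 1; i < 4; i++) {
    if ((sum >> 16) == 0u) {
      break;                    // no carry left to propagate
    }
    sum = (uint32_t) t->w[i] + 1u;
    t->w[i] = (uint16_t) sum;
  }
}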

I think you may have missed the point of the whole idea? It's not to extend the bits of the tick in the entire kernel. Maybe the tick should be 64 bits, maybe it shouldn't, but that should only change if there's a compelling need.

The point is to fix the API of sleeping a task to hide the kernel implementation, and to match idioms in other mainstream libraries.

  • That means keeping around extra bits of time data without the kernel using them anywhere else except to support the sleep API.
  • It doesn't mean changing the tick bit size. Let ports keep doing that. Since the implementation bled into both the user API and the port API, it's likely some people and ports depend on it.
  • It does mean using the preprocessor to get the number of extra bits right, so that it's a uniform 64 bits total no matter the tick size (see the sketch below).
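For instance, a one-liner that sizes the extra storage from the tick width (EXTRA_TIME_BYTES is a made-up name for illustration):

#include <stdint.h>
#include "FreeRTOS.h"   // for TickType_t

// Tick bits + extra bits always total 64, whatever the port chose.
// Assumes a 16- or 32-bit tick; with a 64-bit tick no extra storage
// would be needed at all.
#define EXTRA_TIME_BYTES (8u - sizeof(TickType_t))

static uint8_t extra_time_bits[EXTRA_TIME_BYTES];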

If I'm going to do this, I might as well use more than 32 bits total, to get past the roughly 50 days that 32 bits of millisecond ticks gives. 64 is the next logical choice, as 48 is oddly aligned for some MCUs, and 64 vs. 48 only adds one more 16-bit level to the whole exercise.