True tickless mode

mciholas · January 17, 2024, 7:00pm

My company is investigating what it would take to modify FreeRTOS to be truly tickless. What that means is we can make threads operate with much finer timing resolution than 1 ms. This is necessary for our applications. In the modern age of 100+ MHz processors, only being able to manage time on 1 ms boundaries is lame. That’s 100,000 instructions.

Our present code uses a complex interrupt driven timer layer under FreeRTOS that has task queues, timed events, and so forth that is complex to maintain and use and it only exists because FreeRTOS timing granularity is too coarse. We desire sub 10 us granularity, or even down to 1 us if we can get it. The idea is that a “real time” operating system should allow us to have things happen at “precise time”, and that means better than 1 ms granularity.

An obvious “solution” is to redefine the tick to be faster, but it is impractical to have a tick interrupt every 10 us.

Instead, we propose to change the system to truly tickless. What this means is the scheduler only runs when it needs to, not gratuitously every 1 ms. This is accomplished by using a timer which has a compare interrupt set to the next time the scheduler needs to run. For example, some future higher priority task wants to run in 4.37 ms, so set the timer compare to that time. The currently running task then runs without interruption until that time expires, the scheduler runs, and the high priority task is activated.

For low power sleep, the implementation is to notice when the idle task will run longer than some minimum time (look ahead to the next timed event), put the processor to sleep during that time, and then wakeup with sufficient time to fix up timers and state to keep running.

For time slicing, any thread can be given a fixed amount of time when it starts to run, say 1 ms. If the thread doesn’t complete in that time, the scheduler is triggered by the timer and context switch can occur. So time slicing can happen. You can weight threads, giving some threads more “run time” per slice than others. Say one thread is 1 ms and another is 5 ms, you now load balance 5 to 1.

I am aware of the opposition to true tickless mode, for example this from 7 years ago from rtel:

As you already noted, time slicing won’t work if the OS is always tickless.

Not true. Give each thread a limited time to run before scheduler runs again and you can time slice just fine. The next timer interrupt is the lesser of the time slice chunk for the current thread or the soonest next thread timer event. Like I indicated above, this also gives you ability to time slice “unfairly” by controlling the slice size for each thread if you want.

It is difficult to keep re-programming the clock without slippage between the RTOS system time and calendar time.

Not if you do it properly. The timer would be free running with a compare register, so time never slips. You simply move the compare point. Using one shot timing is lame and does suffer the issues you indicate, but that’s not how it should be implemented.

Under heavy load it is simply more efficient to use a fixed per-calculated time period than to re-calculate a time period each time the scheduler runs and then re-program the clock.

That seems backwards. Running the scheduler gratuitously when you don’t have to is wasting resources. Moving the timer compare interrupt is just adding something to a compare value which is very fast.

For example presently, the scheduler starts a thread that finishes in 0.95 ms, scheduler runs, new thread starts. 0.05 ms into the new thread, tick, scheduler runs again and maybe even slices to another thread. That hardly seems efficient or fair. Instead, each new thread slice starts with a 1 ms time it can run before being “ticked off” (ha!).

Using the tickless idle mode CPU cycles are dedicated to saving power only when it is possible to save power, that is, when it is known nothing wants to run.

True tickless mode enables sleeping even for very brief periods like 250 us. All you need is the idle thread to see the next event isn’t too close (which depends on the processor sleep/wake timings) and then you can sleep. For some applications which do a lot of little things very often, you may never sleep in the current system due to 1 ms time granularity.

It is very common for people to add their own code to the tick interrupt using a tick hook.

That’s horrible, IMO, and should not be encouraged.

But in any case, you can define a thread that runs every 1 ms if you want and get back this capability. The true tickless mode can even allow sleep to occur with such a system since it can sleep sub 1 ms times.

The implementation covers more use case scenarios (FreeRTOS is probably used in a more wide number of use cases than Contiki).

True tickless mode can do everything the tick mode can do, so I don’t understand this point. In the limit, if you force all times to be 1 ms boundaries, you have the present tick system, so what can’t tickless mode do?

Backward compatibility with old versions of FreeRTOS.

That can be maintained. The old threads simply have 1 ms boundary timings but we can still sleep and can still do finer grained timing for threads using the new capability.

Probably other reasons too, the decision was taken quite a long time ago.

Right, times have changed. We are no longer in the 8 bit 2 MHz microcontroller era any more. As processors have grown, the 1 ms tick has become some sort of codified commandment that hasn’t scaled with time. I think it might be time to reassess that decision in light of current and future embedded systems, and the heavy emphasis on low power operation. It gets to the very heart of what it means to be “real time”.

Obviously this post is meant to be a conversation starter on the merits and issues of this idea. What do you think?

Mike C.

richard-damon · January 17, 2024, 7:54pm

FreeRTOS doesn’t schedule at a tick granularity, but tasks can block, or be released at any point in time. The scheduler tick is only really used for time based delays (and if you need finer timing, you can use a hardware timer to activate at the shorted delay), or to round robbin tasks of equal priority, and for that, a lower speed is normally ok.

I find very few tasks are activated “by time”, and those that do have only coarse timing specification (read a button about every 10 milli-seconds to get human scale “immediate” responce). The other main use of time is to set timeouts to handle things going “wrong” and those do not need to be precise.

When I do have a precise timing expectation, it is invariably based on an event that is already generating an interrupt when things need to happen.

I will add, that I don’t run FreeRTOS at a 1ms tick, but tend to run it anywhere between 5ms to 25ms per tick depending on the application, and that is plenty accurate timing for things that need to use the tick. And I might have things operating in the system running activities at 60 KHz.

There also is the fact that if your time base is speed up that much, you are going to likely need to widen the tick. At a microsecond resolution, your tick will wrap about every hour, which is often too short (and at 100s of MHz, 1 us might be coarse than you want, so the wrapping is faster).

Your concept of Free Running with a compare register may work for MANY, but not all processors. For instance, the standard systick counter in ARM processors, doesn’t have that ability, so would need the reprogramming to handle the continuous tick concept, and that truly generic concept for basic support has been a boon to FreeRTOS being widely available. Needing a new port layer for every different processor gets old fast, and the assumption of interrupting at arbitrary future counter values will likely cause that.

What might make more sense is a modification that allows the tick interrupt to not reschedule every time, but only when something needs to be done (so round robin works not per tick, but some longer period) and then the tick rate could be increase to 10 to 100 KHz for applications that wanted finer timing for things.

mciholas · January 18, 2024, 5:18am

I find very few tasks are activated “by time”, and those that do have only coarse timing specification

For applications which don’t need precise time, then what FreeRTOS does is adequate.

For precise time applications, FreeRTOS is not sufficient by itself. If you build an underlying interrupt thread mechanism under it, like we have, you can compensate for that to some degree, but it makes your life miserable for debugging things and it makes implementing low power sleep rather complex as both FreeRTOS and the interrupt layer have to coordinate when to go to sleep and when to wake up.

Our insight is that we could operate almost entirely at the FreeRTOS thread level IF we could get FreeRTOS to manage time precisely, not with the 1 ms granularity. Some RTOS have this feature already, namely true tickless mode, which yields efficient and precise handling of tasks.

This all goes to the basic question of what an RTOS should be. We believe it should be able to provide precise timing controls so things can happen exactly when you want them to. Currently, FreeRTOS lacks that ability.

Maybe I should call the objective a “precise time” OS, PTOS, since it seems an RTOS is rather sloppy with time.

There also is the fact that if your time base is speed up that much, you are going to likely need to widen the tick.

A 64 bit tick timer is no big deal. FreeRTOS apparently already sort of does that with xNumOfOverflows.

Your concept of Free Running with a compare register may work for MANY, but not all processors.

Exceptionally few 32 bit processors lack a timer with a compare function, could you even name one like that? If the hardware doesn’t support it, then you don’t get that feature. We are no longer in the dark ages of microcontrollers these days.

Our basic goal is to try and unify our need for precise timed threads with FreeRTOS so we can reduce the complexity of our firmware design leading to more correct and efficient code development.

Mike C.

richard-damon · January 18, 2024, 12:36pm

Yes, most processors have a timer with a compare function, but the access to that timer is DIFFERENT for every family.

Look at the portable directory of FreeRTOS, there are a lot of ports there. What don’t you see, a lot of ports for a lot of implementations of the Arm Cortex series, because they can (especially the M series) just use a few common ports. You are increasing the porting work 10 to 100 fold.

Note too, if you are looking for precise timing of task starting, that this is easy enough to do with a little code in an ISR for one of the timers that you have. FreeRTOS is NOT limited to rescheduling tasks to the tick.

I will again state, that I find very few tasks that want a “precise” timing activation. I have operations that need tight timing, that tend to be part of an ISR, or implemented in hardware off a timer output, and I have somewhat more relaxed operations that get activated by an interrupt, and then I have low-resolution timing that is implemented off tick periods.

I suppose the issue is that most things that I see that need a “precise” time, need more precision that can be done in software, or at least software that is sharing a processor with other things in the way that happens with an RTOS. (and perhaps you compensate by putting over powered processor in to try to reduce the problems).

RAc · January 18, 2024, 12:54pm

I believe that scenario is way oversimplified. What if some other interrupt kicks in during those 4.37 and during its processing wakes up a higher priority task? Also, how does a task know for how long it is intending to claim the CPU in advance?

What you are proposing here, I believe, is a fully cooperative RTOS with explicit yields. FreeRTOS can be configured to operate that way.

mciholas · January 18, 2024, 3:21pm

Yes, most processors have a timer with a compare function, but the access to that timer is DIFFERENT for every family.

This is no different than the tickless idle sleep mode which requires resources unique to each family. For example, LPTIM timer on ST series. If you want sleep and precise time, you have to work with the hardware to get it done. Your argument would suggest we should take out the ability to have tickless idle sleep because of the portability concerns and thus FreeRTOS would become terrible for low power applications. Or to put it another way, when the capability is worthwhile, the acceptance of OS portability difficulty is made.

If your application requires you do tricky things in interrupts to get your timing, you have made your application non portable between architectures. It would be better to invest the time in making the OS more portable in that case, then applications can be more portable rather than less. That is, put the work into the OS, not the application, and you make it more generally useful.

Note too, if you are looking for precise timing of task starting, that this is easy enough to do with a little code in an ISR for one of the timers that you have. FreeRTOS is NOT limited to rescheduling tasks to the tick.

The problem is that we have a complex state machine of timed tasks that makes that “little” ISR code complex and it has queues and threads in it just like the OS. That “little” ISR code ends up being a second RTOS underneath FreeRTOS. It only exists due to the 1 ms thread timing granularity of FreeRTOS.

Why not make FreeRTOS be more precise in time? Then we eliminate that complex second OS inside the system (which is hugely non portable and hard to maintain and debug). Now we can control the relative priorities and interactions of threads in just one level, not two. We can control sleep enablement in one level, not two. And so forth.

An additional issue is FreeRTOS granularity of processor sleep is too coarse and we waste power because of it. There are plenty of times in our application where we could sleep 200 to 700 us but that’s not something FreeRTOS can do since it can only think in 1 ms time chunks.

I will again state, that I find very few tasks that want a “precise” timing activation.

Our application is different. We need precise thread timing.

Since FreeRTOS can’t do precise thread timing, then the applications FreeRTOS users know about will be those that don’t require it. Self fulfilling prophecy. That’s sort of like finding a culture that has only hammers and they argue no one should need a screw.

There are RTOS that have true tickless and precise timing of threads, so the idea does exist and is in use, so someone finds it worthwhile, as would we.

Mike C.

mciholas · January 18, 2024, 3:58pm

I believe that scenario is way oversimplified. What if some other interrupt kicks in during those 4.37 and during its processing wakes up a higher priority task?

Then the higher priority thread runs. The coder made that choice. This is exactly what you want, the RTOS honors the priority the coder specified. When that higher priority thread yields, the 4.37 ms thread will run, perhaps delayed now, but that was the coder’s choice.

If you want the 4.37 ms thread to be higher priority, then set it as such and it will run, and it will run at 4.37 ms assuming it is the highest priority thread ready to run.

Also, how does a task know for how long it is intending to claim the CPU in advance?

The scheduler looks ahead to the next (soonest to run) thread in queue blocked by time. FreeRTOS already has to do this since it has to check on every tick whether some higher priority thread’s timer has expired and it now ready to run. Instead of checking this every tick, we simply compute when that would be and set a timer compare to trigger. This then gives us the ability to make timing precise, not quantized, since we don’t have the penalty of gratuitous tick interrupts that would occur if the tick was, say, 1 us, an impossibly short period.

If a thread runs for less than the slice it has been given, no problem, the scheduler runs when it yields, the next thread is started with its own new time slice, and the timer compare is set to the next time event or expiration of the time slice, whichever is sooner. If a thread runs longer than the time slice or the next time a higher priority thread wants to run, then the timer compare interrupt stops it and switches context.

What you are proposing here, I believe, is a fully cooperative RTOS with explicit yields. FreeRTOS can be configured to operate that way.

No, not at all. What I am proposing is fully preemptive, but with precise thread timing based on a true tickless scheduler.

The scheduler knows when the next thread time event will occur. So why have it interrupt gratuitously every 1 ms before that? Just set the timer to interrupt exactly when it needs to. If that time is longer than a preemption slice, set the timer compare to that duration. Then that thread will get a full time slice regardless of the phasing of what would be the tick interrupt in the present system.

Mike C.

richard-damon · January 18, 2024, 9:34pm

The difference is that tickless idle is an optional mode, that disappers when not supported.

Changing the definition of how the tick works, is fundamental.

I regularly have my processor go to sleep for short periods of time, (That’s SLEEP, and not the lower power STOP modes, as those can actually cost you power if you enter them for too short of a period of time). Sleep mode doesn’t requiring FreeRTOS to think in terms of time chuncks at all, it just says “I don’t have anything to do, so I will sleep until something is needed”, When it is preparing to do that, it can check to see how long it is apt to be before something else is needed, and might decide to go to a STOP mode instead, but due to the power & time cost to enter/exit them, milli-second granularity is fine for that sort of decision.

I will ask, what sort of “precise” thread timing are you needing that needs sub-millisecond timing that also can tolerate the jitter caused by other tasks? When I need precise timing, it tends to be on the order of microseconds, and using task switching just creates too much jitter, so that operation gets put into an ISR.

You are, of course, free to use another OS that better meets your needs, if FreeRTOS doesn’t. In fact, it is a good thing that OSes exists with different design criteria that meet different needs. It is just a fact of nature, that a given system will tend to grow evolutionary, with a series of small changes, rather than evolutionary, with major changes all at once.

One thought I have had (but not put enough effort to make it a formal proposal) would be how hard would it be for FreeRTOS to export to the port layer an expectation of how many ticks will pass before a scheduling is needed (assuming the round robin interval could be increased to more than on tick). If that is reasonably possible, then all ports could easily be altered to be called at a higher tick rate, but since they don’t need to call the scheduler every tick, that doesn’t cause the increase system load that it would currently.

That sort of change could be an evolutionary path to what you seem to be wanting

RAc · January 19, 2024, 9:20am

After pondering your approach (there are arguments to be made in favor of house cleaning! :-)), I see the charm in it.

What we are looking at in terms of implementation is sort of an “interrupt level message pump” that (similar to an app level message pump such as a classic TCP/IP task implementation) is fed from several sources and ends up in what is now the timer tick interrupt.

In the backward compatibility case, the message pump would be exclusively fed by a hardware timer; in the fully tickless case, ir would be fed by everything that requests a (future or immediate) task switch such as a timed wait or a wakeup via semaphore, ensuring that a task switch occurrs if and only if that event handler is being executed. The only case that needs to be treated differently would be scheduling of tasks with identical priority.

This approach also leads to a natural implementation/migration strategy.

In the spirit of Open Source, I encourage you to provide an implementation in the scope of a PR to github. Since the restrictions that @richard-damon outlined still apply, a fork might be a reasonable compromise.

Thanks!

Edit: Thinking about it more, I can already spot a number of issues that make the implementation challenging. For a teaser, consider the dreadful function set vTaskSuspend()/vTaskResume() that is bad as it is already, but will require support. What this and related functions (and also things like priority inheritance) imply is that once your event handler gets around to processing a schedule event, that event may not be valid anymore, so at the very least, the ready lists have to be examined at least twice: Once at event generation time and once at event processing time. Tthe information at both times may not be consistent.

richard-damon · January 19, 2024, 8:52pm

My thoughts on this is that rather than trying to have a “message pump”, there is just a piece in the core that notifies the port level what "tick’ is the next one that will need the scheduler run again. To make this not always the next tick, it requires that round-robin be defined to run longer that just one tick, perhaps at first a fixed number from when the task gets started, and eventually it might be adjustable per task and/or priority level.

The port level could then either (for current ports) just ignore that value and call the tick process every tick, and it might ignore the early calls, or for the next generation of improvements, not call the tick process till we meet that mark. This second generation might allow raise the actual tick interrupt by 10 to 100 fold, giving higher resolution, but first generation ports would need to be kept at a lower tick rates.

The conversion macro for ms to ticks would need to be modified, as now 1 ms might be multiple ticks, and perhaps we will want a us to ticks (or maybe even a ns to ticks) function.

The next level past this, where we don’t actually get a tick interrupt every tick but just rely on a hardware counter that is programmed for the next needed schedule time, would require that xTaskGetTickCount (and related functions) include a call into the port layer, as the software tick count might not indicate the actual passage of time.

At this point, our system isn’t really “tickless” as we still have the concept of a system tick, its just that we are no longer presuming that an interrupt occurs every tick (and thus, much of the processor might not need to be awake for that time). We just have a definition of tick that is very flexible, for some systems we might define the tick timebase to be very slow (10 to 100 ms long). Other systems might let the tick be just 1 instruction long, and typical scheduling intervals being many thousands to millions of ticks apart.

hs2 · January 19, 2024, 8:59pm

Sounds similar to this embOS-Ultra — The next generation RTOS , right ?

richard-damon · January 19, 2024, 10:20pm

I haven’t used that (or even saw it before). and I don’t know which actual hardware method they are using. I do notice that it is only for “Arm” based processors, which says they may use at least as a default, some of the Arm core counters and work around the limitations on them.

Til · January 22, 2024, 8:18am

embOS-Ultra is not limited to Arm based processors only. It is already available for RISC-V as well and we can easily port it to any other processor.

richard-damon · January 22, 2024, 3:15pm

I missed the RISC-V at the bottom of the list.

JK232 · December 16, 2024, 6:15am

The RTOS does schedule as needed below the tick interval. Look at xSemaphoreGiveFromISR. The TaskWoken parameter is optional – but only if you want it to wait until the next tick. You need the parameter to determine if you need to yield before returning from the ISR to make it work right. What would really be nice, is if that mechanism was not required to make it work as an RTOS should.