Ensuring idle task runs after vTaskDelete

Well, maybe we can conclude this discussion with these two observations:

  1. FreeRTOS is open source, so you can make any changes to it that you consider useful for your use case. This happens not infrequently in the real world.
  2. If you consider your modifications also useful for other users, you are both welcome to and encouraged to publish those in a PR. That would allow a much more down-to-the-code-line level discussion between you and the reviewers.

Thanks


This is one possible solution and if it works for you, feel free to implement it in your application. You likely won’t need to suspend/resume and just queue send/receive would be enough. Let us know if you face any problem in implementing this.

The suspend was just because it’s the last thing in the task since there’s nothing left to do after the send. Sure, I could just loop instead after putting in the queue since the task is going to get deleted, but suspend is easy. There isn’t any resume.

Anyway, I finished applying the techniques that richard-damon and others recommended as standard approaches for an RTOS: eliminate any polling loops in favor of interrupt/blocking, and where that isn’t possible due to 3rd party libraries, put calls to their function(s) into task(s) at priority 0. I then tested what difference this makes, not only for functionality but also for debugging, using some problems we debugged in the past as examples. Here’s what I found.

The good news is that, as expected, any polling loops that had any kind of delay became more responsive thanks to nearly immediate interrupt handling. In my case, that meant the first serial character echoed immediately instead of after up to a second (previously, once typing was detected I suppressed the 1 second suspend for a while, so after the first character there was no delay). Also good news: the code handling the 3rd party RF tx/rx library was simpler, since I could use my normal timer-wait code for timeouts without any hacks.

The bad news is that certain debugging that used to be simpler by having serial typing force a no waiting event loop now triggers the ESP32 “waiti” or ESP32-C3 “wfi” instruction which unfortunately pauses cycle counting so the cycle count can’t be used for tight timing debugging. Sure, this can be worked around, but effectively requires putting back some of what I had before to (optionally, perhaps by CLI command) both loop without blocking while also having the idle task run (but not waiting for interrupt). And yes, while there are other timers one can look at, the cycle counter is simple, fast, and clean enough to readily examine in interrupts.

One bug I looked at that used the CPU cycle count for debugging was interrupt collision/delay, where timing intra- and inter-interrupt calls made the problem obvious. ESP-IDF increased their level 3 BLE interrupt time substantially between v4 and v5, causing our LEDC 7680 Hz PWM motor interrupt to intermittently fail due to delays pushing timing beyond the “on” part of the PWM cycle. Disabling BLE while the motor runs works around this, but for now we’re staying on v4 and will get Espressif to fix this in their v5 library. And yes, there are other ways of getting such timing, such as externally detectable GPIO signals, but I wanted something I could reproduce easily with software alone to give Espressif a reproducible case.

Another downside is that I had to examine and verify all the slower library calls to ensure that they were using interrupt/blocking or at least vTaskDelay(1) as they do in their flash erase busy loop checking a status register. Fortunately, I didn’t find anything else obviously not following RTOS best practices except for RF cert tests. At least this checking only needs to be done once unless something gets broken/changed in their library or new functionality doesn’t do the right thing.

The last downside is that some testing that does back-to-back functions needs to be careful to have some blocking, and if there is none, it needs to be added explicitly in the test loop. Previously, so long as messages were sent, I’d call the idle task even if there wasn’t any blocking (usually messages come from calls with blocking, but a few calls don’t block because their messages were for a different architecture, such as separate MCUs vs. a single combined one). That is, a general architecture that always uses messages so functionality can be distributed means that, when run on a single processor, some functions may have no blocking, which somewhat breaks the RTOS model when doing continuous loop testing. Again, this is worked around by explicitly putting a blocking call (basically a vTaskDelay(1) or equivalent) in the testing loop. It’s just less convenient and one more thing to remember that was previously handled automatically as part of message handling (note that this message passing/handling is our own, not via the RTOS, as it was inherited from our previous AVR code). Another workaround would be to have the main task at priority 0 and do a yield in the message handler (which it already does, so the change to priority 0 is all that would be needed), at least for this back-to-back situation.

Please be advised that you may be falling into another very common trap here - vTaskDelay(1) means “delay AT MOST 1 tick, possibly less.” You may want to replace this with vTaskDelay(2), otherwise in the worst case there may be no delay at all.

And again, do not EVER design any release code around debugging needs, that is a conceptual error which will sooner or later shoot you in the back.

The superior way to run-time monitor time stamped events is “silent monitoring,” ie record all the time stamps you need as they come in into an array and examine the array later when all important data is sampled and time stamped.
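A minimal sketch of that silent-monitoring idea: record timestamps into a small ring buffer from the hot path and examine them afterwards. All the names here (mon_record, MON_DEPTH, etc.) are hypothetical, not from FreeRTOS or ESP-IDF.

```c
#include <stdint.h>

/* "Silent monitoring" sketch: a fixed-size ring buffer of timestamps
 * filled from the time-critical path, analyzed later at leisure. */
#define MON_DEPTH 256u             /* power of two so the index mask works */

static uint32_t mon_buf[MON_DEPTH];
static volatile uint32_t mon_head; /* total number of samples recorded */

static inline void mon_record(uint32_t timestamp)
{
    mon_buf[mon_head & (MON_DEPTH - 1u)] = timestamp;
    mon_head++;
}

/* Later, outside the time-critical path, walk the captured samples
 * (this simple walk assumes the buffer has not wrapped yet). */
static uint32_t mon_max_delta(void)
{
    uint32_t n = mon_head < MON_DEPTH ? mon_head : MON_DEPTH;
    uint32_t max = 0;
    for (uint32_t i = 1; i < n; i++) {
        uint32_t d = mon_buf[i] - mon_buf[i - 1];
        if (d > max) max = d;
    }
    return max;
}
```

The recording path is a single store and increment, so it is cheap enough to call from an interrupt.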

Thanks for the tip. I didn’t know that, and I don’t think Espressif knows it either, so I’ll let them know. I suspect that in practice, where and how it is used, this limitation doesn’t show up: if the delay is missed the first time, it gets done on some subsequent pass of the loop where it is used. Theoretically, though, if the gap in time between invocations is a multiple of ticks, one could get stuck on a round-down boundary. I had something similar in my own once-a-second RTC tick implementation, where waiting for 1 second could be less, so 2 was safer; I also had the option of a subsecond wait using an esp_timer, so “timers_wait(999)” for 999 ms would be accurate if needed. Probably the riskiest place for this is in any hardware timing delay, so I’ll let our hardware guy know about that.

Understood, though we didn’t intentionally make a design for debugging. The design we fell into, largely because the default USB serial/JTAG driver is non-blocking, had us optimize serial in a way that also happened to allow a mode more suitable for debugging. It was accidental, and until I looked at this in more detail given this RTOS thread/discussion, I didn’t even realize the cycle count debugging would break under more usual RTOS operation, though that clearly makes sense.

As I noted, I can add a CLI-driven command/flag to enable a no-sleep/no-wait-for-interrupt debugging mode, or enable it via a compile-time option. For modes that aren’t too complicated we tend to use runtime flags, since these commands have been very useful even for remote debugging on production firmware. Of course, extensive logging at different levels is extremely helpful as well, though it is used mostly at the application level.

I couldn’t use the timestamped array of data approach you described in this situation (which I have used quite a bit in other situations, so I was aware of it) because of how long it could take for the problem to occur (it was rather infrequent). At about 8 kHz there was simply too much data before the problem happened, which could take many minutes. That’s why I used a statistics approach, saving the minimum and maximum and calculating an average (i.e. dividing the sum of time by the count as needed). This is compiled in only for local debugging (i.e. it’s not in production code).
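A sketch of that statistics approach: instead of logging every ~8 kHz event, keep only min, max, and a running sum/count so an intermittent outlier shows up even after minutes of running. The type and function names here are hypothetical.

```c
#include <stdint.h>

/* Running statistics over interrupt durations, measured in CPU cycles. */
typedef struct {
    uint32_t min, max;
    uint64_t sum;      /* 64-bit so long runs at ~8 kHz don't overflow */
    uint32_t count;
} cycle_stats_t;

static void stats_init(cycle_stats_t *s)
{
    s->min = UINT32_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Call from the interrupt with the measured duration; O(1) and branch-light. */
static inline void stats_add(cycle_stats_t *s, uint32_t cycles)
{
    if (cycles < s->min) s->min = cycles;
    if (cycles > s->max) s->max = cycles;
    s->sum += cycles;
    s->count++;
}

/* Computed lazily, outside the interrupt, to keep the hot path cheap. */
static uint32_t stats_avg(const cycle_stats_t *s)
{
    return s->count ? (uint32_t)(s->sum / s->count) : 0;
}
```

At 80 MHz, 0.6 µs corresponds to 48 cycles and 1.8 µs to 144 cycles, so the min/max/avg triple directly exposes the kind of 75 µs outlier described above.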

Eventually I figured out ways to accelerate the problem by shortening BLE advertising time or by more rapidly sending BLE data in a connection, though it still wasn’t that frequent (much more than before, however). Our interrupt code (our CPU is at 80 MHz) was usually running around 0.6 µs (1.8 µs worst-case once every 30 times) every 130.2 µs, so we were running rather efficiently, but we were occasionally getting stomped on (delayed) by the BLE interrupt by 75 µs or more, ruining our PWM motor current sampling. Why in the heck they are using 6000+ CPU cycles for a BLE interrupt (at around once a second during BLE advertising), or possibly for a critical section, is crazy, but that code appears to be opaque (no published source) in their RF controller. (The BLE host code, which has sources, uses esp_timer functions called from high-priority tasks, not interrupts; the esp_timer itself is obviously interrupt triggered but has never been a problem.) This was apparently also happening in their v4 library without our realizing it, though at a much lower <2000 CPU cycles; in v5 it got much worse. I’ll get Espressif to look at this, but I needed a clear way for them to reproduce it without our motor or specific hardware (i.e. I’ll just use LEDC on their standard vanilla development board without actually driving a real motor, as that shouldn’t be necessary).

By the way, regarding timestamps, the RTC clock resolution is only 30.5 µs (32.768 kHz), so it’s not really good enough for this situation. The high-resolution timer would be OK at 1 µs, though it still wouldn’t really tell us how much time our interrupt code was taking, just that it was <= 1 µs. So when optimizing interrupts to be as fast as possible, a cycle counter is helpful and, just as importantly, is accessible with a single instruction (single register access).

Yes, if you are trying to use the cycle count for timing, WFI is a bad instruction, as it takes arbitrarily long to execute as a single “cycle.” It also isn’t a good method if you need really accurate timing, as I believe some “cycles” can be longer than one clock cycle if the processor needs to “stall” to wait for something to be available.

My normal solution for this sort of precise timing is to use a general purpose free running counter running at a high speed (perhaps close to CPU clock rate) to provide this sort of timing that won’t be affected by CPU cycle variations.

The ESP-IDF “gettimeofday”, which is what they recommend for high-resolution time, goes through too much code, but I traced it down to an esp_timer_get_time call that looks like it uses the high-resolution system or LAC timer (depending on CPU type) counter value. That looks fast and efficient, so I plan to use it in the future, at least for 1 µs resolution. It looks like they are using the 40 MHz APB clock with 1 count per tick for PLL and 2 counts per tick for XTAL, and they fortunately have source and a private include file for this (esp_timer_impl_get_counter_reg). Dealing with all the HAL variants is a pain, and abstracting that away is one advantage of using their library. This should give me better than 0.05 µs resolution, which is plenty for interrupt timing with minimal delay.

I just have to use the declaration directly and recognize that it is subject to change, as it’s not a public header file. They use some clever code: read the lo register, then hi, then lo again, comparing the two lo values to make sure they didn’t change and retrying if they differ. This lets them avoid critical-section code and be safely callable from anywhere, including interrupts.
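That lo/hi/lo pattern can be sketched like this. The volatile variables below stand in for the hardware timer registers (hypothetical names); on real hardware they would be memory-mapped register reads.

```c
#include <stdint.h>

/* Stand-ins for the hardware timer's low/high count registers. */
static volatile uint32_t timer_lo, timer_hi;

/* Lock-free 64-bit read of a split counter: read lo, then hi, then lo
 * again.  If the two lo reads differ, the counter carried into the high
 * word mid-read, so the hi value may be stale: retry.  Safe to call
 * from tasks and interrupts alike, with no critical section. */
static uint64_t timer_read64(void)
{
    uint32_t lo, hi, lo2;
    do {
        lo  = timer_lo;
        hi  = timer_hi;
        lo2 = timer_lo;
    } while (lo != lo2);   /* low word rolled over between reads: retry */
    return ((uint64_t)hi << 32) | lo;
}
```

The loop body is just three loads and a compare, so the common (no-rollover) case costs almost nothing.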

I need to reply to this again because after talking with our hardware guy we are gobsmacked about this though we clearly understand why (i.e. that the call can be in between ticks). The online documentation just says the following:

Delay a task for a given number of ticks. The actual time that the task remains blocked depends on the tick rate. The constant portTICK_PERIOD_MS can be used to calculate real time from the tick rate - with the resolution of one tick period.

You really need to document that the actual delay may be shorter than requested by up to one tick. If one wants “at least” behavior for the delay, then one should add 1 to the calculated value. Really, this function should probably have been written as an “at least” with the +1 built in in the first place (IMHO). We’re going to write our own ourTaskDelay function (or macro) to do this (you could have a vTaskDelayAtLeast function or macro). And yes, this is probably something I could do a PR for, but in the meantime please document this clearly in the vTaskDelay documentation.
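A minimal sketch of such a wrapper. vTaskDelayAtLeast is a hypothetical name, not a FreeRTOS API, and the stub vTaskDelay below exists only so the example is self-contained; in real code you would include FreeRTOS.h/task.h instead.

```c
#include <stdint.h>

typedef uint32_t TickType_t;

/* Stand-in for the real FreeRTOS vTaskDelay; it just records the
 * requested tick count so the wrapper can be demonstrated. */
static TickType_t last_delay_request;
static void vTaskDelay(TickType_t xTicksToDelay)
{
    last_delay_request = xTicksToDelay;
}

/* Because vTaskDelay(n) can wake after as little as n-1 full tick
 * periods (the current tick period is already partially elapsed),
 * adding one tick guarantees the requested minimum is met. */
#define vTaskDelayAtLeast( xTicks )    vTaskDelay( ( (TickType_t) ( xTicks ) ) + 1 )
```

So vTaskDelayAtLeast(1) requests 2 ticks from the kernel, which is exactly the vTaskDelay(2) workaround suggested above.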

When you think about how it must work, the reason for the “up to” is obvious, as to delay for “one tick” clearly means delay until the next tick interrupt. Automatically adding +1 would break a lot of code, though we do sometimes add it into the ms2ticks function (as well as rounding up).

Note, it does NOT mean for “at most 1 tick”, as it is quite possible that the task will not resume on the next tick, as a higher priority task might occur and the delay be longer than specified. Yes, the delay will expire in at most one tick, but the task doesn’t necessarily run at that point.

It’s only obvious given how it was implemented. Clearly one can have something that delays for at least the number of ticks requested (i.e. have the “+1” built in). If one does something like vTaskDelay(5/portTICK_PERIOD_MS) and the default 100 Hz tick frequency is used, then one gets no delay at all, because 5/10 = 0 with integer truncation. Why have a function that works that way? The most useful/practical function is one that behaves like vTaskDelayAtLeast. That also encompasses the “at least” aspect of a preemptive task switch or an interrupt.

The pdMS_TO_TICKS( value_in_ms ) in projdefs.h has the same issues but can be overridden in FreeRTOSConfig.h though this macro is often referenced in calls that have a timeout:

    /* Converts a time in milliseconds to a time in ticks. This macro can be
     * overridden by a macro of the same name defined in FreeRTOSConfig.h in case the
     * definition here is not suitable for your application. */
    #ifndef pdMS_TO_TICKS
    #define pdMS_TO_TICKS( xTimeInMs ) ( ( TickType_t ) ( ( ( uint64_t ) ( xTimeInMs ) * ( uint64_t ) configTICK_RATE_HZ ) / ( uint64_t ) 1000U ) )
    #endif

While the calls with timeouts theoretically have the same issue, in practice timeouts for such calls are not usually very short and are exceptions/failures rather than intentional delays that are required for proper operation.
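For completeness, a hedged sketch of a round-up variant one could place in FreeRTOSConfig.h. The name pdMS_TO_TICKS_CEIL is hypothetical, and configTICK_RATE_HZ is hard-coded here only so the example stands alone; in a real project it comes from FreeRTOSConfig.h.

```c
#include <stdint.h>

#define configTICK_RATE_HZ    100U   /* assumed 100 Hz tick for the example */

/* Round-up millisecond-to-tick conversion: with a 10 ms tick period the
 * stock pdMS_TO_TICKS turns 5 ms into 0 ticks, while this version
 * rounds up so short timeouts never truncate to zero. */
#define pdMS_TO_TICKS_CEIL( xTimeInMs )                                      \
    ( ( uint32_t ) ( ( ( ( uint64_t ) ( xTimeInMs ) * configTICK_RATE_HZ )   \
                      + 999U ) / 1000U ) )
```

Note this only fixes the truncation in the conversion; the separate "delay may expire up to one tick early" behavior still needs the +1 discussed above.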

The idiosyncrasies of vTaskDelay(1) are part of RTOS folklore (not specific to FreeRTOS, btw) and are documented in many, many places. An attempt to include this in the official docs has been made here: vTaskDelay… documentation update? - Kernel - FreeRTOS Community Forums but unfortunately, the page linked to has disappeared. @kstribrn : What is the status there?

Thanks!

@RAc Thanks for the mention. I think this page must have gotten lost in the website revamp. I’m going to see if we can resurrect it.

Well, I’ve run into another issue after doing the “right” thing according to the RTOS principle that any tasks not following the “block and wait for interrupt” approach should be at priority 0. I found that while I was able to get the RF task to run and the idle task to run, the RF output is noisy vs. frequency in scans and I determined that this is due to time slicing interrupting the RF task at important times even though the RF task is yielding a lot between each tick.

This implies that the time slicing algorithm is simply doing a round-robin regardless of whether or not a task has yielded during the previous time slice. It should not be designed that way because the purpose of time slicing is to ensure that uncooperative tasks (i.e. those that never yield or suspend between ticks) get interrupted to allow other tasks to run. But when tasks are cooperating by yielding they should not be interrupted at a time slice time if they have yielded.

The reason this is important is that code writers that are doing yields will intentionally do so at times when they know they can safely give up substantial time (compared to other points of time). If all tasks at same priority level are cooperating this way, then no important functions will get interrupted for very long. The way it looks like time slicing is currently working, it does not credit good behavior at all.

Shouldn’t it be trivial to have a bit per task that indicates it has yielded and that these bits get cleared at the time slice after figuring out which task to schedule next? The net effect of this approach is that cooperating tasks won’t get interrupted by time slicing and won’t need to because they’ll be bouncing back and forth (really, round-robin) from their yielding. You can still do round-robin for the targets of such yielding (i.e. keep round-robin for the target of a yield); you just don’t do time-slice switching away from a task if it recently (i.e. during the previous time slice) yielded.

My workaround for this is unfortunately to set the priority of the RF task to the same as my main task and for me to use the tasknotify_run_idle_task function I wrote like I did before. That way, I only do this when the RF task yielded and I furthermore block my task for one second at a time (so yield won’t switch to it during that time) so the RF task effectively gets almost all the time and most importantly doesn’t get interrupted except for time slicing quickly seeing there’s no task switch needed.

What makes you come to that conclusion? Totally wrong. Interrupt priority ordering and yielding are independent of each other. Also, priority 0 should normally be reserved for the idle task.

You can turn off time slicing exactly for that purpose.

Again, you are free to rewrite the scheduler if there is anything in it that does not serve your needs well.

I remember early RTOSs that blatantly called systems that had tasks running at equal priorities as “poorly designed.” I would not go that far, but of course it is a crucial part of the design of real time systems to understand which tasks employ the CPU heavily (“CPU bound tasks”) and order those according to the real time requirements. All an RTOS can do here is provide support.

There is no such thing as a free lunch, so any such change will incur side effects that other developers will complain about as unfit for THEIR requirements.

Again, submit a PR or branch off a custom version of the OS to suit your requirements if you believe that there is a shortcoming in the OS design or implementation.

Actually, in AMX (which was one of the OSs that scorned equal-priority task based systems as poorly designed), this was not a “workaround,” but the documented way to address the inherently missing round robin scheduling strategy. I remember the chief developer arguing that that way, developers actually had MORE control over multitasking behavior.

As @RAc already mentioned, preemption and time slicing can be turned on/off independently. Here is a good description of different scheduling algorithms supported by FreeRTOS -

I am talking about tasks at the same priority, so interrupt priority ordering isn’t relevant if by that you mean that higher priority takes precedence. As for priority 0, I was told by richard-damon earlier in this thread that tasks that do not follow the “block until interrupt unblocks” paradigm, including tasks that only yield, should be set to priority 0, as all cooperative (i.e. yielding) tasks should be at the same priority, including the idle task. Specifically, he wrote the following in post 12 in this thread:

“As I said, the standard answers to the need to use “non-compliant” code is to use it in a single task at priority 0, and either make sure your yield often enough or allow preemption and enable round-robin scheduling. That way the polling loop won’t lock up the system.”

You can turn off time slicing exactly for that purpose.

Turning off a feature that is poorly designed is not the best solution. Time slicing should not be interrupting any task that has yielded or been resumed during the previous time slice – that is, a task that hasn’t hogged the time slice. The purpose of time slicing is to force allocation of time to other tasks when a task isn’t cooperating by yielding or suspending/blocking. If time slicing worked that way, there would be no need to turn it off, as it simply wouldn’t cause the harm that it currently causes.

Again, you are free to rewrite the scheduler if there is anything in it that does not serve your needs well.

Given the tone of this discussion, it seems likely that even if I were to write a change for this, it would be soundly rejected given the defensiveness for the current design.

it is a crucial part of the design of real time systems to understand which tasks employ the CPU heavily (“CPU bound tasks”) and order those according to the real time requirements. All an RTOS can do here is provide support.

Again, that is just an excuse. Why have a time slicing feature at all then? Just say that FreeRTOS won’t support anything but the purist designs where everything is interrupt driven and blocking. Why even have yielding? I think this is the wrong philosophy. I think it is perfectly fine to have yielding as a fallback for when one can’t block for whatever reason and I also think it reasonable to have time slicing to prevent total hogging by tasks that don’t even yield. I’m only suggesting that the time slicing be implemented to meet that explicit goal without damaging cooperating tasks.

Again, submit a PR or branch off a custom version of the OS to suit your requirements if you believe that there is a shortcoming in the OS design or implementation.

With that attitude (i.e. “if you believe there is a shortcoming” implies you don’t think there is) why should I waste time with this if the people reviewing PRs are going to have the same attitude.

Actually, in AMX (which was one of the OSs that scorned equal-priority task based systems as poorly designed), this was not a “workaround,” but the documented way to address the inherently missing round robin scheduling strategy. I remember the chief developer arguing that that way, developers actually had MORE control over multitasking behavior.

The bulk of this thread, with a few exceptions (such as from aggarg) has been bashing what I did as an initial workaround that was handling both nonblocking calls and tasks that only did yield. I appreciate the discussion as it had me look at why certain things weren’t blocking (such as ESP32-C3 USB/JTAG serial that wasn’t by default) that eventually could (with a different driver they used in a console component) but the seeming closed-minded defensiveness of every feature of the current design is not, IMHO, a good stand to take.

@RAc The page still exists though the URL was updated as part of the website migration. The tick resolution page can be found here now. I’ll also update the other post.

And you ran into another principle: tasks with critical timing should have high priority. The fact that two different principles give you very different priority requirements says that the task isn’t well defined (or is based on drivers not compatible with an RTOS).

No, you are PRESUMING on the purpose and technology of round-robin. Of course, a big part of this is that your task (or part of it) has the wrong priority.

First, I presume you have configured the idle task to yield after every cycle through its loop, and not to use a whole tick every time it runs. That would at least minimize the impact of switching to it.

No, it wouldn’t be “trivial” to do that, as the way you described it requires the tick interrupt to walk the list of every task to reset that bit. Remember, FreeRTOS is optimized for small systems and interrupts are supposed to only do minimal, and bounded work, so scanning the list of all tasks is not appropriate.

Yes, that is true, because trying to keep track of “good behavior” has a cost, and if the program was designed correctly, that cost should be unneeded.

If your task actually HAD the good behavior, it wouldn’t need to be at priority 0, or you could at least turn off the round-robin if you always yielded. The problem you are running into is that parts of your program aren’t good enough to do that, and those “bad apples” affect your whole program.

The issue is you don’t understand how the feature is defined. In fact, your definition could allow a pair of tasks to hog the system: if the priority 0 task kicked a higher priority task to suspend and resume it at least once a tick, then that task could use all the idle time.

One idea I have had, but which never became important enough to actually build, would be to add as an option the ability to set a time-slice “quantum” for a task. When the system runs the scheduler, it sees if the current task has used up its quantum; if so, the quantum is reset and the task goes to the back of the queue, otherwise it can stay at the front. A task that yields (or possibly blocks) goes to the back of the priority queue and gets its quantum reset. The time used by a task could be measured either (a bit crudely) by the tick interrupt, or more accurately using the performance timer (if enabled).
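A toy model of that quantum idea (not FreeRTOS code; all names hypothetical). Each task carries a quantum; the tick handler charges the running task one tick and only forces a round-robin when the quantum is exhausted, while a voluntary yield would simply call the refill.

```c
#include <stdbool.h>
#include <stdint.h>

/* Per-task time-slice accounting: how many ticks the task may keep the
 * CPU before being rotated to the back of its ready list. */
typedef struct {
    uint8_t quantum;   /* configured ticks per slice */
    uint8_t used;      /* ticks consumed since the last refill */
} slice_t;

/* Called when the task voluntarily yields or blocks: it gets full
 * credit again, so cooperative tasks are never forcibly rotated. */
static void slice_refill(slice_t *s)
{
    s->used = 0;
}

/* Called from the tick interrupt for the running task.  Returns true
 * if the quantum is exhausted and the task must go to the back of the
 * queue; the cost is O(1) per tick, touching only the running task. */
static bool slice_tick(slice_t *s)
{
    if (++s->used >= s->quantum) {
        slice_refill(s);
        return true;
    }
    return false;
}
```

Only the currently running task is touched at each tick, which addresses the objection above about walking the whole task list.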

No, the only “bashing” I have seen has been about your demands that FreeRTOS change its fundamental nature to meet your expectations. From what I have seen, you have come into this with a very specific idea of what FreeRTOS should be, and little understanding of what it is and has been for decades. People have pointed out shortcomings of your methods, and the errors in your assumptions; if you want to consider that “bashing,” then I pity you.

I would say that part of the problem is you are forgetting that you, as the systems programmer, are responsible for your ENTIRE program, including the external libraries you use. If the libraries supplied by the chip vendor aren’t adequate, then YOU have the responsibility to either work with them to fix it, or make your own version of those libraries that does what you need, or change to a different chip with better support, or talk with your “management” about the issue.

If FreeRTOS doesn’t meet your needs, since it is open source, you are free to make your own variation, but then YOU have taken on the responsibility for that variation. If your change is done well, is generally useful, and meets the FreeRTOS performance requirements, you could issue a PR to see if it will get included. Many features have been added that way; the key here is working with the system, not bashing the system for not being what you want it to be.


I know it was lame for Espressif to write RF certification test code that needs to run at high priority yet uses a yield instead of an interrupt/blocked approach, but the lower-level code they are calling is in their library, which is not public, and their entire RF controller is private, so I don’t know whether it had a hardware interrupt or could have used a timer interrupt. I suspect they didn’t want their controller code dependent on FreeRTOS, but they could have had a generic registered function call for when they knew they were waiting (so it could block) and another (or the same function with a different parameter value) for when they needed to run, to unblock. That way they could hook into nothing, FreeRTOS, or another OS as needed.

First, I presume you have configured the idle task to yield after every cycle through its loop, and not to use a whole tick every time it runs. That would at least minimize the impact of switching to it.

I just searched their headers and found that, for whatever reason, ESP-IDF configured configIDLE_SHOULD_YIELD to 0, not leaving it at the FreeRTOS default of 1. So I was wrong about why there was significant delay: I didn’t realize they had done this and incorrectly assumed the idle task was yielding right away. Having the idle task hog the rest of a time slice after the RF test code yielded would obviously be very bad. Yet another thing I need to bring up with Espressif.
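For reference, restoring the default behavior would be a one-line FreeRTOSConfig.h setting, assuming the port honors it (ESP-IDF may have its own reasons for overriding it, so this is a sketch, not a recommendation):

```c
/* In FreeRTOSConfig.h: with this set to 1 (the FreeRTOS default), the
 * idle task yields to other ready priority-0 tasks on each pass through
 * its loop instead of consuming the remainder of its time slice. */
#define configIDLE_SHOULD_YIELD    1
```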

No, it wouldn’t be “trivial” to do that, as the way you described it requires the tick interrupt to walk the list of every task to reset that bit. Remember, FreeRTOS is optimized for small systems and interrupts are supposed to only do minimal, and bounded work, so scanning the list of all tasks is not appropriate.

What you described with the time-slice “quantum”, at least for marking it, is closer to what I was thinking, though I wrote about resetting all tasks. All tasks don’t need to be reset on the time slice; they can be marked when they yield with, for example, the time-slice number or the time. That’s a simple way to know when they last yielded. So my bit idea was bad, but a time would work, or a time-slice number if that is what you meant by “quantum,” as that is recorded as-you-go, one task at a time.

As for working with the chip vendor to improve their libraries, I will work on that, as well as the bad configIDLE_SHOULD_YIELD=0 configuration they set. I understand that the problems I’ve been having have almost all been related to how Espressif wrote some library code (for RF cert testing, for flash erase, and possibly other code I haven’t yet run into) not being RTOS compliant (including having non-blocking serial as a default), as well as making some bad/strange FreeRTOS configuration settings.

So I’m sorry I’m lashing out at FreeRTOS when it’s mostly the Espressif side. I have worked around this so this isn’t stopping me.