Ensuring idle task runs after vTaskDelete

An obvious solution that most of us resort to sooner or later is abandoning the third-party ecosystem. Most such libraries are poorly written and aim at either vendor lock-in or the highest possible platform compatibility.

Without having followed the entire thread, I have a feeling that your argumentation centers around a) debugging issues or b) poor design issues such as frequently deleting and recreating tasks, as Richard D. pointed out correctly.

As for serial debugging: it is bad to begin with, as its mere presence already changes the runtime behavior of your system so significantly that you are likely to deal with a drastically different set of issues with it than without it. Also, it tends to use shiploads of resources if not used carefully (e.g. stack space and peripheral CPU cycles, in particular when used with polling).

I may be misinterpreting your argumentation, but I do not see a need to change anything in the core system to accommodate fringe issues.

Finally, I cannot see why you and the OP appear to be so terribly worried about “leaking” TCBs. From what I remember, TCBs do not take up much memory themselves, so this becomes a problem only if tasks are (re)created very frequently, which (again) is a bad design to begin with.

I can see how task stacks would be more of an issue, as stacks can indeed take up significant memory. If that should be a problem, there is always the possibility of maintaining a pool of task stacks and using static task allocation to arbitrate them, but again, I am with Richard here: dynamically deleting and recreating tasks can and should almost always be avoided, and in my book, there should never be a use case in which recreation is so frequent that “leaking” task memory can indeed become a bottleneck.

RAc, regarding the task memory: I think the reason that tasks were created and deleted on an as-needed basis, and thus frequently, was that we originally had the product on an AVR ATxmega128A1U with 8 KB RAM and 128 KB Flash, using external BLE and Wi-Fi chips. So being tightly resource constrained and only allocating/using what one needed temporarily was the initial mindset when porting to the ESP32 with FreeRTOS, especially for the low-level firmware and hardware developer. In fact, our judicious use of RAM, which had shared sections (i.e. one-at-a-time usage for non-overlapping usage types), was still needed on the ESP32 in order to use Deep Sleep, which has only 8 KB of RTC (slow) memory preserved during sleep, even though the chip has 520 KB of additional RAM when awake. Ours is a battery-operated IoT product, so minimizing power consumption is essential.

Also, is it just TCB memory? Isn’t it also the task’s stack memory? I think it’s the latter (as you later mentioned) that rather quickly added up when tasks were created/deleted without the idle task ever running. It didn’t take many (a few dozen?) task create/delete cycles before we ran out of memory.

Since most of our hardware is used one at a time, the hardware low-level firmware developer doesn’t pre-allocate all the tasks; with their stack memory, that would use up an awful lot of RAM given the many hardware components and functions available: multiple PWM motors, LEDs, photodiodes (ADC), accelerometer, humidity sensor, moisture sensor, water flow sensor, temperature, voltage, plus Wi-Fi tasks. Given that the device is asleep most of the time, only wakes up to do the heaviest measurements once or a few times per day, and only checks in with the cloud every 30 minutes, it doesn’t make sense to have everything pre-allocated all at once. That was especially true when we were using Deep Sleep, which did not retain most of the memory (of course one is starting from scratch in that case, so wouldn’t run out of memory) and where a wake-up would most of the time go straight back to sleep, so you didn’t want to do a lot during startup. The ESP32-C3 we moved to uses Light Sleep and so retains memory, because its integrated BLE wakes up every second (or so) for advertising and has to be kept alive (i.e. keep memory), though Light Sleep does carry more of an energy-consumption penalty (130 ”A vs. 10 ”A for Deep Sleep, not counting additional leakage currents from our components).

We don’t only do serial debugging; we also use GDB/breakpoint, GPIO-signal, and other debugging (mostly looking at external signals) in the very rare cases it is needed. Most low-level bugs are in hardware/library interaction, where examining registers/states is most helpful, even printing those out over serial. Pretty much the only timing bugs we see are certain required waits for other devices or library code (i.e. not doing certain calls back-to-back), which really are others’ flaws we have to work around (unless they are documented). Because of how we write our code, we don’t have multi-processor (the ESP32 had 2 cores) deadlocks. In fact, I found an ESP-IDF bug that did have a race-condition assertion, and I created the trivial fix (they had two source code lines out of order), but it took me a month to convince Espressif to fix it; once they put a senior developer on it, he immediately committed it and backported the fix to earlier releases as well. But mostly, serial is used extensively for all kinds of product testing (i.e. firmware application and low-level code together) and for initiating functionality locally without requiring the mobile app the user would use.

If we step up from the weeds here, the high-level point isn’t freeing memory or tickling the watchdog timer, but rather: why are these deferred to an idle task in the first place? For self-deleted tasks, couldn’t the scheduler handle freeing that memory instead of waiting for the idle task to run? As for tickling the watchdog timer, shouldn’t that be something the developer does in their event/sleep loop, since they know when they are in an effectively idle execution path, whether they end up blocking or sleeping or not? Having the watchdog tickle be an option for a priority-0 idle task is fine, but having it be unavoidable is an issue, as the configuration currently only allows turning the task watchdog fully on or off: there is no way to turn it off just for the idle task (which you don’t allow) while still using it elsewhere (which you do allow), nor to have it as an explicit tickle function call rather than just watching to see whether a task runs.

It seems like the FreeRTOS design doesn’t just encourage a fully blocked-till-interrupt design (i.e. no busy status or wait loop) but outright requires it, in spite of the real world of third-party (MCU vendor) libraries that may not play that way. As aggarg stated in his response, we have a workaround in our code, and you want to avoid promoting bad designs, but the real world sometimes doesn’t cooperate with ideals. As for avoiding third-party libraries, that is impossible with some chips like the Espressif ESP32 variants, because their integrated RF code is opaque (not open source) and they also have undocumented ROM functions; but most of their library code works well and is useful. On balance, it’s been pretty fast to develop with their development environment.

As far as I know, it’s not a FreeRTOS thing that a watchdog is kicked by the idle task. It’s up to the application how, and which, watchdog is retriggered.
The serious drawback of freeing a deleted task’s resources immediately is that it might cause a serious, often non-deterministic (depending on the heap implementation used) runtime impact on important things going on, caused by a usually rather unimportant event (the deletion of a task). That’s what should be avoided.
But if you want to handle task deletion as a high-priority event because you want to free up memory ASAP, whatever it takes, better to signal a delete-request notification as the very last task action, right before e.g. vTaskSuspend(NULL), to a high-priority task which then deletes the task at a dedicated priority (immediately).

The freeing of the memory MUST be deferred to the idle task. If the task tried to release the memory itself, it would free its own stack from within a call to the memory-freeing routine that is still using that stack, and you would have a use-after-free error: if some other task got switched in after the freeing happened, but before the dying task actually got itself removed from the active execution list, the dying task’s stack might be reused by a newly created task while it was still in use. You have a reverse chicken-and-egg problem: you need the stack in order to get rid of it. The answer is that the final step needs to be transferred to something else, and the idle task is a good choice for that.

As I mentioned before, an alternative would be to create your own “reaper” task: your task does all of the other resource cleanup that it can, then tells the reaper to kill it while it goes into an infinite blocking loop waiting to be killed.

It isn’t FreeRTOS; it is the classical “Real-Time” design criteria that require it. Tasks that “busy loop” do not cooperate with a CPU-sharing system, and thus the general rule is that such operations must be done at “idle” priority. Libraries that don’t play that way are just incompatible with “Real-Time” design.

If you can’t avoid the library, then my answer would be to avoid it anyway (and perhaps the hardware), or just admit that the advantages of a Real-Time operating system aren’t available to you.

As I have said many times, task priority level 0 can tolerate those sorts of activities, and refusing to put those things there is just creating problems for yourself.

The freeing of the memory MUST be deferred to the idle task

I don’t understand why the scheduler can’t, after switching context to the new task, just then do the equivalent of vTaskDelete(taskId) as if it came from that new task (i.e. do it after setting up the new task stack and execution environment but just before jumping to the new task code). You already safely allow one task to delete another and do appropriate cleanup at that time. What am I missing?

The implication of this statement is that FreeRTOS and the real world would be incompatible or even mutually exclusive. If that is how you feel about it, there is really not much left to discuss. Having been a developer, developer trainer and book author in industrial embedded systems for almost 30 years now (and from what I know, Richard D. and Hartmut have similar backgrounds), I claim to have some knowledge of what the “real world” comprises.

I agree with you that the idle task is not the very best place to delegate important computations to, but I can assure you that task deletion is not an “important computation” in the real world. 90+% of all systems I have come across never do that, and in a good deal of the remaining architectures, race conditions stemming from deferred task resource deallocation are the least issue to worry about.

Middleware packages hardly ever harmonize perfectly with one another. There will always be cases where the design of one does not fit well into the other. However, given that standalone (“main loop only”/no RTOS) architectures these days play a minuscule role, why should it be the responsibility of an RTOS to accommodate libraries that work better with no RTOS at all?

If I understand you correctly, this would make context switch time unpredictable and thus seriously impact real time behavior and expectation.

No, a pre-emptive context switch doesn’t delete the task, so it is still predictable. It is the “vTaskDelete(NULL)” call itself that would simply take longer, as would the implicit context switch that occurs after such a task deletion with cleanup.

And a developer that cares about that can delete the task from another task in a time and manner of their choosing.

You are right, that isn’t a FreeRTOS thing. I thought it was, but I now see that this was part of the ESP-IDF library (because it was tied to tasks, I thought it was in FreeRTOS). So their tying the task watchdog to the idle task, and letting one add other tasks to watch but not allowing the idle-task entry to be removed (except by removing everything), is their issue, not yours. Sorry I brought that up.

This thread contains a lot of good detail, which I admit I’ve not read in its entirety, but some high level comments in no particular order:

  1. It is rare for a vendor-supplied library to be applicable to all the disparate requirements of deeply embedded systems. What they are really good for is starting with something that “works”, which can then be tailored to the needs of your particular application. Some years back we provided the FreeRTOS+IO library, which is a good example of the complexity. It had various different “transaction modes”, from fully polling to using a DMA, and several in between, to account for the different ways users may want to use the library. That, however, made it much too hard to port to new architectures and peripherals. If I were to select just one transaction mode in a driver intended for distribution and use in others’ applications, I would probably go for a simple interrupt-mode driver. That would be a “mid” point that could hopefully then be adapted into either a polling driver (don’t configure the interrupts) or a DMA driver (extend the interrupt implementation) - both of which unfortunately require work on the developer’s part.

  2. Delays that are inserted into tasks purely to let other tasks run are generally a “bad smell” in a design - but then a design only has to meet its individual requirements, and if you can get the benefits of modularity, reuse, etc. by encapsulating something into a task that also includes an arbitrary delay so it doesn’t consume 100% CPU time, then that would seem to be the simplest way that works, which is generally the best way. It does impact portability and future reuse though.

  3. Multithreading is useful in itself, as per point 2 immediately above. Real-time behaviour may or may not require multithreading. Applications often have a mix of hard, soft and non-real-time requirements. If hard real-time requirements can’t be met within the constraints of a thread, then there are alternatives, like triggering them off a fast timer, or performing the necessary operation in an ISR; but use cases and hardware are both so diverse that there is no one right or wrong way of doing these things.

  4. The kernel doesn’t (or at least shouldn’t) do anything non-deterministic, either in an interrupt or in a critical section. Allocating and freeing memory are, normally, comparatively long, non-deterministic operations. If one task deletes another, then the TCB and heap allocated to the deleted task get freed immediately. If a task deletes itself, then the idle task will clean up those resources. An alternative would be for the kernel to attempt to clean up the resources when a task deletes itself too - but that would add additional code that takes time to execute, and additional code space, with little benefit for the huge majority of times it gets called - plus it could breach timing assumptions made by the majority of calls. If you need resources cleaned up immediately, then you could perhaps use a hook macro to pend a function call that cleans up the resources from a high-priority task, rather than the idle task, although I’ve not tried this so don’t know whether it would encounter races or not.
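The last suggestion could be sketched roughly as follows, using FreeRTOS’s xTimerPendFunctionCall() to run the cleanup in the timer daemon task instead of the idle task. This is a hedged sketch, not tested code: it assumes configUSE_TIMERS == 1 and INCLUDE_xTimerPendFunctionCall == 1, and configTIMER_TASK_PRIORITY set above the application tasks; prvFreeResources and pvBuffer are hypothetical application names. Note it reclaims application resources (here a heap buffer); the task’s own TCB and stack are still freed by the idle task.

```c
/* Sketch (untested): pend cleanup work to the timer daemon task so it
 * runs at configTIMER_TASK_PRIORITY rather than waiting for idle.
 * Assumes configUSE_TIMERS == 1 and INCLUDE_xTimerPendFunctionCall == 1. */
#include "FreeRTOS.h"
#include "task.h"
#include "timers.h"

/* Executes in the timer daemon task at configTIMER_TASK_PRIORITY. */
static void prvFreeResources( void *pvParameter1, uint32_t ulParameter2 )
{
    ( void ) ulParameter2;
    vPortFree( pvParameter1 );   /* e.g. a buffer owned by the dead task */
}

void vSelfDeletingTask( void *pvParameters )
{
    void *pvBuffer = pvPortMalloc( 128 );

    /* ... do the task's work ... */

    /* Queue the cleanup to the daemon task, then delete ourselves.  The
       TCB and stack are still reclaimed by the idle task as usual. */
    xTimerPendFunctionCall( prvFreeResources, pvBuffer, 0, portMAX_DELAY );
    vTaskDelete( NULL );
}
```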

I hope my notes aren’t too far off the mark.

I heard you and am making such changes, putting any non-compliant code (i.e. calls to non-RTOS-compliant code in the ESP-IDF library) into a task with priority 0 so that the “yield” in their RF FC cert code works, and I will contact Espressif to fix that the right way, using block/interrupt/resume instead.
I’m also trying to get their USB serial interrupt/blocking driver installed and working (they don’t document it explicitly and it isn’t the default on startup, but they do use it in a console package, so it should work – I’m just not done with that yet). These two things should eliminate the need for the tasknotify code hack I did (not that it was so terrible, but it was certainly a workaround, and quite frankly less work than what I’m doing now).

Shouldn’t the cleanup code be similar in the case of another task calling vTaskDelete vs. having it done in the kernel/scheduler after the switched-to task’s environment is set up? That is, it would largely be a call to common code, not duplicated code.

People expect task creation to be somewhat time-consuming, so why wouldn’t deletion be similar? Why would self-deletion be a special case? And if you did want it to be, you could always make it a configuration option, or an optional parameter (or a second version of vTaskDelete), that lets one choose between doing the cleanup immediately and deferring it.

I agree, but the reason that was brought up is that this is what is done with the Flash erase code, because there isn’t an interrupt to tell one that the erasure is done. Instead, one must poll a status register. And yes, that’s bad hardware design, but the only problem with what Espressif did here is that they should have put that polling code into a priority-0 task and yielded. Though having a 1-tick delay (at the caller’s task priority) for a much slower erasure is not a horrible thing.
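The fix described above might look something like this. It is a sketch under assumptions, not Espressif’s actual driver: flash_erase_busy() and the semaphore names are hypothetical, and the point is only that the busy-polling happens at tskIDLE_PRIORITY, where a taskYIELD() lets the idle task and any other ready work run.

```c
/* Sketch (untested): wrap status-register polling in a priority-0 task
 * so the busy-wait only consumes CPU time the system would otherwise
 * spend idle.  flash_erase_busy() is a hypothetical HAL call. */
#include <stdbool.h>
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"

extern bool flash_erase_busy( void );       /* hypothetical status poll */
static SemaphoreHandle_t xEraseStart, xEraseDone;

static void vFlashPollTask( void *pvParameters )
{
    for( ;; )
    {
        /* Block until a caller starts an erase. */
        xSemaphoreTake( xEraseStart, portMAX_DELAY );

        /* Poll at idle priority; yielding shares the CPU with the idle
           task (its cleanup/watchdog duties) and other priority-0 work. */
        while( flash_erase_busy() )
        {
            taskYIELD();
        }

        xSemaphoreGive( xEraseDone );        /* wake the waiting caller */
    }
}

void vCreateFlashPollTask( void )
{
    xEraseStart = xSemaphoreCreateBinary();
    xEraseDone  = xSemaphoreCreateBinary();
    xTaskCreate( vFlashPollTask, "flashpoll", configMINIMAL_STACK_SIZE,
                 NULL, tskIDLE_PRIORITY, NULL );
}
```

A caller would give xEraseStart, then block on xEraseDone at its own priority instead of spinning.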

I suppose it’s an implementation detail of the kernel.

If task A deletes task B, then the heap and TCB allocated to task B are freed from within the context of task A.

If task A deletes itself, then its TCB and heap can only be freed when it’s certain they won’t get used again. Typically, the code that performs a context switch starts by saving the context of the currently executing task. That pushes registers onto the stack of the currently running task, so the stack cannot be freed until after this happens (the same could happen on any interrupt that coincidentally occurs at this time). Considering where the stack and TCB could be freed:

  1. From within the vTaskDelete() function. If this were the case, then the interrupt that performs the context switch couldn’t be called if the hardware itself uses the stack. In cases where ISRs have a separate stack, each and every ISR entry would require a test to see if the task’s stack was still valid - that would add instructions to code that executes with very high frequency, wasting CPU cycles in almost every case.

[edit] An alternative would be to have a separate “half switch” function that pops registers for a newly selected task without ever saving those of the currently running task - but that is non-portable and wouldn’t work in all cases either (for example, ARMv4T architectures use a synchronous service call to perform the register saves, whereas ARMv8-M uses an interrupt).[/edit]

  2. From within the code that selects the next task. That code executes after the context of the currently executing task has already been saved (actually, that depends on the port, but let’s not make it too complex). Again, it would have to determine whether the task it was switching away from had been deleted, and free the resources if so. That would be far too long an operation to perform in the context-switch code, and most likely not thread- or interrupt-safe anyway (depending on the heap allocation algorithm).

  3. From within the context of another task. This is the option FreeRTOS takes. The idle task is used for two reasons: 1) it’s the only FreeRTOS-created task that always exists, no matter what the FreeRTOSConfig.h settings are; 2) it is the lowest-priority task. Using a higher-priority task would make the application code non-deterministic; i.e. the application writer may expect one of their application tasks to execute, but in reality the scheduler would select a high-priority system task instead of their application task.

The problem is that the scheduler just schedules tasks, so you need a task around to do the work; otherwise you need special-case code in the scheduler, which would have to spend time on EVERY schedule change checking whether the task it is switching from has killed itself, in order to run special code outside of any task to complete the operation. Also, on many processors the scheduler runs in an ISR context as a special interrupt, and vPortFree may not be safe to call in that context (note that it does not end in “FromISR”), so the kernel wouldn’t be allowed to make the call.

That isn’t a problem when some other task deletes the task: it first marks the task never to run again, and then it can “at its leisure” delete it as part of that other task, without worrying about a conflict. Note that when a task deletes itself, it would need to be careful about marking itself not to run, because if it gets interrupted after that point, it might not finish deleting itself, making for a longer critical section.

The issue isn’t that deleting a task is a lengthy operation; it is that to make a task correctly delete itself, a large part of the operation would need to be inside a critical section, to avoid the problem of needing to free a resource that you are still using, while preventing something else from grabbing that resource.

This is sort of, but not quite, what I was suggesting. First off, why does a self-delete via vTaskDelete(NULL) save the task’s context at all, since that task is to be deleted and never returned to? Why wouldn’t vTaskDelete(NULL), in calling the task-switching code, pass it a parameter/flag saying, in effect, “I’m being deleted, so don’t bother saving my context”? Granted, that’s just an optimization and not a requirement, but it would make things more efficient and is something that could be done (and marked in the TCB) regardless of when the full deletion and cleanup is done.

As for the time consumption and when/how the deletion/cleanup is done, I’ll respond to richard-damon since his explanation is more detailed on the limitations for this step.

I disagree on the point of it being more efficient; I think it is actually the opposite. The code that saves a task’s context is not in the vTaskDelete() function and executes with very high regularity. If, each time it executed, it had a separate test to see whether the current task had just deleted itself, that overhead would be on every context switch - whereas it would hardly ever be needed. Also, the task can be interrupted at any time, so on architectures that don’t have a separate ISR stack, every interrupt would need to check whether there was a stack available or not, and probably abort in the “not” case.

Thanks for the detail. I now see how my initial naive thinking that this could be done in the switched-to task wouldn’t work because it’s not at the start of a new task and can be in the middle of unusual register states. A clean task “home” is needed to do the vTaskDelete call that both marks and then cleans up.

As for a safe home for the actual task deletion, now I understand why the idle task, which always exists, was chosen. A safe, but only somewhat better, approach is to have this self-deletion handled by a separate task that can be higher priority (probably configurable) than the idle task. That task can stay suspended until a vTaskDelete(NULL) is wanted: the dying task sends a message (puts its task ID in a queue) and suspends, and the deletion task then does the vTaskDelete(task_ID_to_delete). Of course, one could write one’s own vTaskDeleteNow function and one’s own deletion task that calls vTaskDelete to do all this.
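A minimal sketch of such a scheme might look like this. It is untested and the names (xReaperQueue, vTaskDeleteNow, vReaperTask) are hypothetical, not FreeRTOS APIs; the reaper’s priority is whatever the application configures above idle.

```c
/* Sketch (untested): a "reaper" task that deletes self-terminating tasks
 * promptly, instead of leaving the TCB/stack for the idle task to free. */
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

static QueueHandle_t xReaperQueue;   /* holds TaskHandle_t values */

static void vReaperTask( void *pvParameters )
{
    TaskHandle_t xVictim;

    for( ;; )
    {
        if( xQueueReceive( xReaperQueue, &xVictim, portMAX_DELAY ) == pdPASS )
        {
            /* Deleting from another task's context frees the victim's
               TCB and stack immediately, no idle task needed. */
            vTaskDelete( xVictim );
        }
    }
}

/* Called by a task as its very last action instead of vTaskDelete(NULL). */
void vTaskDeleteNow( void )
{
    TaskHandle_t xSelf = xTaskGetCurrentTaskHandle();

    xQueueSend( xReaperQueue, &xSelf, portMAX_DELAY );

    /* Suspend until the reaper deletes us; loop in case of a spurious
       resume from some other task. */
    for( ;; )
    {
        vTaskSuspend( NULL );
    }
}
```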

I understand that if freeing memory were slow, such as having a garbage collector, then one wouldn’t want this task deletion/cleanup to be too high in priority. However, in a system without a garbage collector where the free is very, very fast (just marks a block as freed and merges any pre/post freed blocks and possibly adjusting total heap size) then having the task doing deletion/cleanup be very high priority is not a big deal. So having this be a configurable priority that is architecture dependent would make sense.

The code for vTaskDelete(NULL) can (and must) “mark” the task, as whatever is going to handle things, you still don’t want the task to get scheduled to run after this, and that is what is done.

Note, the slowness in freeing memory is that, for most memory allocators, you need to search the “free list” to see whether there is a free block just before or after the block you are freeing. There are fixed-block allocators (which might allocate multiple consecutive blocks) that use a bitmap to keep track of free blocks, which can make freeing fast, but then finding a free block can be slower. Scanning that free-block list can be “slow” on the scale of real time, even if it might not be bad for a non-real-time system.
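To illustrate why the free itself is O(n), here is a minimal plain-C model (not FreeRTOS’s actual heap implementation, though heap_4 works on the same principle) of an address-ordered free list: returning a block means walking the list to find its place, then merging with any contiguous neighbours.

```c
/* Toy model of a coalescing free list.  Blocks are (offset, size) pairs
 * kept sorted by address; the list nodes themselves use malloc for
 * simplicity, since only the scan-and-merge logic is being illustrated. */
#include <stddef.h>
#include <stdlib.h>

typedef struct Block {
    size_t offset, size;
    struct Block *next;
} Block;

static Block *free_list = NULL;

/* Return a freed (offset, size) region to the list, merging neighbours. */
void free_block(size_t offset, size_t size)
{
    Block *prev = NULL, *cur = free_list;

    /* Walk the address-ordered list to find the insertion point: this
       scan is the part that is "slow" on a real-time scale. */
    while (cur && cur->offset < offset) {
        prev = cur;
        cur = cur->next;
    }

    Block *b = malloc(sizeof *b);
    b->offset = offset;
    b->size = size;
    b->next = cur;
    if (prev) prev->next = b; else free_list = b;

    /* Merge with the following block if contiguous. */
    if (cur && b->offset + b->size == cur->offset) {
        b->size += cur->size;
        b->next = cur->next;
        free(cur);
    }
    /* Merge with the preceding block if contiguous. */
    if (prev && prev->offset + prev->size == b->offset) {
        prev->size += b->size;
        prev->next = b->next;
        free(b);
    }
}
```

Freeing regions at offsets 0, 32, then 16 (each 16 bytes) leaves a single 48-byte free block, but only because the middle free walked the list and merged in both directions.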

I was thinking that, after sending the message to the task that will do the deletion, the task trying to delete itself just has a suspend loop (i.e. it suspends, and if reawakened, it just suspends again). Wouldn’t that let the other task delete it? It doesn’t really have to be marked in this case; it’s kind of a zombie task “waiting to be deleted”. Also, other tasks shouldn’t be trying to resume it, but if they did, they wouldn’t cause any harm (they would get an error trying to resume the task after it got deleted, of course). Maybe I’m missing something.

Yes, I understand the issue regarding the free chain (I didn’t at first, but I walked through the cases and see where the walk backward or forward could go through a lot of list entries to find what to change). Freeing could be fast if there were no free chain, but that would require malloc to be slower (excessively so, which is why it isn’t done in practice), walking through all blocks to find free ones.