Better Task management system than builtin

RichPiano · February 24, 2022, 3:33pm

Hi guys. I’m in need of a more versatile and efficient task management system than what RTOS currently has to offer. To be more precise, I’m missing the following aspects:

more than 28 states in event group
task unlock on bit clearance
wait for notification AND event group bits AND mutexes at the same time

Is there maybe an external library you know that can offer these features? Also I’m using C++ mostly to make my code more robust, so a library written in C++ would also be nice.

I have about 5-6 tasks that are neatly intertwined and it’s very hard to control them precisely with the tools offered.

rtel · February 25, 2022, 12:00am

This is quite a list - I don’t think you will find an OS in the same class as FreeRTOS that does all these things - and if it does - it won’t be deterministic. You can probably have more than 28 bits in an event group by changing the type of the integer holding the event group - but that would require some updates. Other than that, are you able to design your system with the features available, rather than the other way around?

richard-damon · February 25, 2022, 1:43am

My first guess is your system isn’t properly partitioned if you have that complicated of interactions with only 5-6 tasks. I can’t say for certain, as you haven’t given any details, but it just ‘smells’ wrong.

Perhaps what it needs is a set of simpler tasks with a ‘central control module’ that takes in requests and schedules them out to tasks so the complications become ‘business logic’ of that task and not the need for a lot of primitives from the OS.

RichPiano · February 25, 2022, 5:58am

Thanks for the answers. My primary struggle is with global vs task-local states: I seem to need both and also have decision points inside my tasks that wait for either of these bits to become set.
Maybe my architecture is wrong. I designed a finite state machine and have other tasks wait for states to become set or cleared. With event groups this needs 2 bits for each state and you quickly run out of states.
Then I switched to TaskNotifications to set bits of each individual task, but now the trouble is to know whether a certain state is currently set or if the bits have already been taken. Also I suspect there is a design flaw in the RTOS system which doesn’t really allow for using TaskNotifications as lightweight event groups (as is promised in the API reference). I can go into further detail if needed.
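To illustrate why the two-bits-per-state scheme described above runs out of room so quickly, here is a minimal sketch. It assumes 24 usable bits per event group (what FreeRTOS provides with a 32-bit tick type; adjust `kUsableBits` for your configuration). The state names are hypothetical, and a plain integer stands in for the event group value; in a real system the updates would go through `xEventGroupSetBits()` and `xEventGroupClearBits()`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 24 usable bits per event group (32-bit tick type).
constexpr unsigned kUsableBits = 24;

// Hypothetical states for illustration, not the OP's actual state machine.
enum class State : unsigned { WifiUp = 0, MqttUp, OtaRunning, TokenValid, SntpSynced };

// Each state consumes a pair of bits: "state is set" and "state is cleared",
// so that other tasks can block on either transition.
constexpr std::uint32_t setBit(State s) {
    return 1u << (2u * static_cast<unsigned>(s));
}
constexpr std::uint32_t clearedBit(State s) {
    return 1u << (2u * static_cast<unsigned>(s) + 1u);
}

// With two bits per state, one group holds only half as many states as bits.
constexpr unsigned kMaxStates = kUsableBits / 2;

// Update the bit pair for one state in a word standing in for the group value.
inline std::uint32_t markState(std::uint32_t bits, State s, bool active) {
    assert(static_cast<unsigned>(s) < kMaxStates);
    bits &= ~(setBit(s) | clearedBit(s));          // drop the old pair
    return bits | (active ? setBit(s) : clearedBit(s));
}
```

Under these assumptions a single group tops out at 12 states, which is consistent with the complaint that the bits are used up quickly once every module exposes several states.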

In any case, if it is possible to inject some code into the RTOS task scheduler I will probably do that and design my own task management system. Are there any good references to start?

richard-damon · February 25, 2022, 2:29pm

One thing to remember is you can have more than one event group, so a given state machine could signal multiple groups if you really need more conditions. The only restriction is if you want to wait for one of two different states, they need to be in the same event group.

I find TaskNotifications work just fine as a lightweight event group, the one limitation is that the signaler has no way to force themselves to wait (or even test) if an event has already been taken. (Not so much an issue with event groups, which don’t have that, but makes them a bit weaker than a semaphore).

I still wonder a bit about your work division, if you have that complicated of interactions between two or more tasks, it sounds like they aren’t really independent things like tasks normally are. You may be trying to do too much (or too little) in a task.

RichPiano · February 26, 2022, 5:59am

That’s exactly the problem. With careful composition I could probably filter out all the states that are dependently waited on into one event group and put the others into another event group. But that’s also tedious because I will probably need to shift states around once I create new tasks.

Task notifications DO WORK as a lightweight event group unless you decide to use timed intervals. In this case you need to create a wrapping loop around the wait function and this for some reason doesn’t clear the bits as expected, so you have “leftover” states that always resurge.

My tasks ARE very interdependent. We have wifi, mqtt, ota and a token service for secure login to our broker. When one module fails (e.g. wifi) the others need to be informed and go back to a base state where they wait for connection. Also some modules need to start only when SNTP is synced. It’s complicated.

I’m now on the way of designing my own task management system. I just need to figure out how to inject my code into the RTOS kernel.

rtel · February 26, 2022, 6:03am

You can add code into tasks.c easily enough by using the extensions header file that is included at the bottom of the .c file - but I suspect doing what you want to do in the kernel will be harder than refactoring your application code - at least doing it in a deterministic way will.

RAc · February 26, 2022, 6:37am

Hi there RichPiano,

Without having seen any code, I would vaguely agree with Richard D. that your problem might be rather a suboptimal model of the control flow than a deficiency of the OS, although I have several times missed the equivalent of WaitForMultipleObjects in FreeRTOS myself.

One possible way to abstract out some of the more intricate dependencies might be nested FSMs - ie in every state in which a task must wait for several conditions (either AND or OR), you enter a sub FSM in which the respective wait can be implemented platform- or implementation specific (eg in mixed AND or OR scenarios, you could do subsequent timeouting waits on all conditions and then evaluate all notification states at the end of the sub fsm). That kind of thing has helped me in the past to make seemingly complex issues easier.
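The sub-FSM idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `TimedWait`, `SubResult` and `runWaitSubFsm` are hypothetical names, and each `TimedWait` stands in for a real blocking call with a timeout (for example a wrapper around `xSemaphoreTake()` or `xEventGroupWaitBits()`).

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// A wait primitive: blocks for at most timeoutTicks and reports whether its
// condition was met. Hypothetical abstraction over the platform's real waits.
using TimedWait = std::function<bool(unsigned timeoutTicks)>;

enum class SubResult { AllMet, AnyMet, NoneMet };

// Sub-FSM: do one short timed wait per condition, then evaluate everything
// that was collected. The outer FSM only sees the summarised result and can
// treat AllMet as an AND wait, AnyMet as an OR wait.
inline SubResult runWaitSubFsm(const std::vector<TimedWait>& waits,
                               unsigned sliceTicks) {
    assert(sliceTicks > 0);
    std::size_t met = 0;
    for (const auto& wait : waits) {
        if (wait(sliceTicks)) {
            ++met;
        }
    }
    if (!waits.empty() && met == waits.size()) {
        return SubResult::AllMet;
    }
    return met > 0 ? SubResult::AnyMet : SubResult::NoneMet;
}
```

Because the outer state machine depends only on `SubResult`, the body of `runWaitSubFsm` can later be swapped for something better than sequential timed waits without touching the states that use it.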

RichPiano · February 26, 2022, 7:22am

@rtel Thanks, spotted it! What do you mean by deterministic? As far as I can see, the RTOS library is programmed rather functionally and that’s also what I’m planning to do, with the exception of global mutex/state variables (I don’t want to need to initialize mutexes and event groups).

Do you by any chance have an RTOS architecture diagram that I can consider to help me in the process of designing?

@RAc I also thought about timeouting waits but that would take away from the responsiveness of the whole thing - even if it’s marginal in the big picture. It just doesn’t strike me as the optimal solution and it also adds more movable parts. You have to be very careful about how this thing is taped together.
I’m a C++ programmer so I was thinking of using RAII (like scope_guard), operator overloading and other features that C can’t offer but which make the job far easier, more intuitive and safer. Because we plan to add even more modules and the foundation needs to be as easy and robust as possible in my opinion.

RAc · February 26, 2022, 9:48am

Agreed, but as long as there is no “native” superior implementation equivalent of WFMO(…,ANY,…) that serves your needs*, I don’t see a better solution. My point is that by encapsulating the “cumulative wait FSM state” via a sub FSM, you are free to implement that sub FSM whichever way you want, for example first providing an inferior sub FSM which can later be replaced by a better implementation. Of course that could also be accomplished by functional abstraction (it would even amount to the same if there was a sufficient implementation), but a sub FSM would be more flexible in that you can traverse different control paths via different waits more naturally, but of course that is only my gut feeling.

*To be more detailed, I couldn’t think of any fully “fair” or indeterministic usable implementation of a multiple OR wait function. WFMO, for example, has a prioritization of the objects in the static list in that the first signalled object in the array will be honored before those after, so your application has to be aware of that and ensure through some mechanism that objects later in the array are not starved out. Actually, you want that kind of control; any non determinism via the OS would take away control over the real time behavior.
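The ordering effect described in the footnote can be shown with a small host-side model. This is not the Windows API and not FreeRTOS code; `firstSignalled` and `RotatingScanner` are hypothetical names, and a `std::vector<bool>` stands in for the array of waitable objects.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kNone = static_cast<std::size_t>(-1);

// Fixed-priority scan: always reports the lowest signalled index, so a
// permanently signalled object at index 0 starves everything after it.
inline std::size_t firstSignalled(const std::vector<bool>& signalled) {
    for (std::size_t i = 0; i < signalled.size(); ++i) {
        if (signalled[i]) {
            return i;
        }
    }
    return kNone;
}

// One application-level remedy: start each scan just after the index that
// was served last. The order is still fully deterministic, but no object
// can be starved indefinitely.
class RotatingScanner {
public:
    std::size_t next(const std::vector<bool>& signalled) {
        const std::size_t n = signalled.size();
        for (std::size_t k = 0; k < n; ++k) {
            const std::size_t i = (start_ + k) % n;
            if (signalled[i]) {
                start_ = (i + 1) % n;
                return i;
            }
        }
        return kNone;
    }

private:
    std::size_t start_ = 0;
};
```

With objects 0 and 2 both continuously signalled, `firstSignalled` keeps returning 0, while `RotatingScanner` alternates between 0 and 2. Either policy is predictable; the point is that the application, not the OS, chooses it.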

richard-damon · February 26, 2022, 1:05pm

Again, not having seen the code, my guess is bad factoring and interaction API. For things somewhat like what you are describing I solve the issue at the application side rather than require the OS to try to handle it.

I use a lot of application side subscribing to events, where a module keeps a list of routines that are called when something happens, and those call-backs are part of the subscribers’ code base but run in the context of the subscribed to process, which allow each subscriber to notify itself in an appropriate way. Typically, the subscriber will be waiting on the event it NORMALLY is handling, and the callback either injects a special version of that event or does an abort-wait after setting a flag to be tested on timeout. This way the error conditions don’t require complications out of the OS. It also allows each subscriber to decide what is best for IT, and not require me to begin with a decision on the best way globally.
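A minimal sketch of this subscription pattern, written against the wifi/mqtt example from earlier in the thread. All class and member names are hypothetical. A `std::deque` stands in for the subscriber's own queue (in a FreeRTOS port that would be a real queue fed with `xQueueSend()`), and the model is single-threaded, so it shows the structure only, not the locking.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

enum class Event { Data, LinkLost };

// Publisher: keeps a list of call-backs and runs down it when something
// happens. The call-backs execute in the publisher's context.
class WifiModule {
public:
    using Callback = std::function<void()>;

    void subscribeLinkLost(Callback cb) {
        assert(cb);  // an empty call-back would be a programming error
        onLinkLost_.push_back(std::move(cb));
    }
    void reportLinkLost() {
        for (auto& cb : onLinkLost_) {
            cb();
        }
    }

private:
    std::vector<Callback> onLinkLost_;
};

// Subscriber: normally waits on its own inbox. Its call-back injects a
// special event into that same inbox, so no extra OS primitive is needed.
class MqttModule {
public:
    explicit MqttModule(WifiModule& wifi) {
        wifi.subscribeLinkLost([this] { inbox_.push_back(Event::LinkLost); });
    }
    void post(Event e) { inbox_.push_back(e); }
    bool connected() const { return connected_; }

    // One iteration of the module's task loop.
    void processOne() {
        if (inbox_.empty()) {
            return;
        }
        const Event e = inbox_.front();
        inbox_.pop_front();
        if (e == Event::LinkLost) {
            connected_ = false;  // fall back to the base state
        }
    }

private:
    std::deque<Event> inbox_;
    bool connected_ = true;
};
```

The call-back itself only enqueues an event, which keeps it quick; the actual reaction happens later in the subscriber's own context when it processes its inbox.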

It sounds like you are designing as a monolith, which is normally much harder and doesn’t handle complications well. Think about modularity with simple defined interactions.

One big requirement that I find is I need to be able to trust myself (in documented ways) for things like that call-back routine. Yes, a bad call-back will break the system, but ANY bad module can break the system. This isn’t a heavy protected mode OS where everything is restricted to its own sand-box, it is a cooperative system. If that model doesn’t fit, you are likely using the wrong OS.

RichPiano · February 28, 2022, 7:42am

Your replies are definitely food for thought. I’m starting to think how I would implement the proposed architectures with my use case. Is any of your code open source by any chance? I would definitely take a look at it.

But in any case
@RAc As far as I understand you would use global and local event groups then, where you check the global groups and then drop into checking a more local event group for your sub-FSM?

@richard-damon Interesting idea with event loops and tasks! How are you interfacing the event loops? Do modules just “know” about the handle of the loop to subscribe to or is this abstracted away in a global manner? How many of those event-processing tasks do you usually set up?

richard-damon · February 28, 2022, 12:09pm

If your question is how do tasks know what notifications to subscribe to, if that is really your question, your system is too complicated and wrongly factored. When you design a task, you should be able to know as you design it what sorts of global events it needs to know about, and what service is providing that notification.

For instance, if the system may change its clock speed, and devices need to know about the current speed of the clock, they know they need to be subscribed to the clock change module, and thus are.

If this basic of an operation doesn’t have a well defined home, you aren’t factoring your logic right.

Note, I am not saying that a single task is responsible for deciding the clock frequency, or that it is even a ‘task’ that is doing it. The clock module gets inputs from requestors and sends out notifications to subscribers and manipulates the system clock. There is a base API defined for all of these, and an implementation layer around this that makes things happen depending on the needs of that system. Many systems that don’t need this capability may just stub it out making the operations in the API just no ops.
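One way to picture the split between a base API and a per-system implementation layer is sketched below. Every name here is hypothetical, and virtual dispatch is only one option; on a small target the same separation is often done by selecting one implementation at compile or link time.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Base API that every system provides, whether or not the clock can change.
class ClockService {
public:
    using Listener = std::function<void(std::uint32_t hz)>;
    virtual ~ClockService() = default;
    virtual void subscribe(Listener listener) = 0;
    virtual void requestSpeed(std::uint32_t hz) = 0;
    virtual std::uint32_t currentHz() const = 0;
};

// Implementation for a system that really can change its clock speed.
class VariableClock : public ClockService {
public:
    explicit VariableClock(std::uint32_t hz) : hz_(hz) {}
    void subscribe(Listener listener) override {
        listeners_.push_back(std::move(listener));
    }
    void requestSpeed(std::uint32_t hz) override {
        assert(hz != 0);
        if (hz == hz_) {
            return;
        }
        hz_ = hz;  // a real port would reprogram the clock hardware here
        for (auto& listener : listeners_) {
            listener(hz_);
        }
    }
    std::uint32_t currentHz() const override { return hz_; }

private:
    std::uint32_t hz_;
    std::vector<Listener> listeners_;
};

// Stub for systems with a fixed clock: same API, operations are no-ops.
class FixedClock : public ClockService {
public:
    explicit FixedClock(std::uint32_t hz) : hz_(hz) {}
    void subscribe(Listener) override {}
    void requestSpeed(std::uint32_t) override {}
    std::uint32_t currentHz() const override { return hz_; }

private:
    std::uint32_t hz_;
};
```

A driver written against `ClockService` subscribes in the same way on both kinds of system; on the fixed-clock system its listener is simply never called.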

RichPiano · February 28, 2022, 8:26pm

That’s certainly a clever approach. But how do you do state management within the modules? Do you keep a local event group around and dispatch events based on certain states if they become set?

richard-damon · February 28, 2022, 9:13pm

Modules with a task inside them to implement them just use local variables to keep track of the state of the module. If a module reaches a ‘state’ that other modules might want to know about, it just runs down its subscriber list of call-backs to let them know the event happened. Each call-back does what its module needs to be done to be notified of the event. That might be posting to a queue or signaling a semaphore, or setting a local variable and aborting a wait. The key primary rule is these call-backs need to be quick (though in some cases they might do the real work like update baud rate registers).

One key is that ‘Tasks’ are not the fundamental things I build with, but they are just implementation details within a module, and some modules (like low level I/O drivers) may not have a task connected to them at all.

RichPiano · March 3, 2022, 8:05am

@richard-damon That sounds like a valid pub-sub implementation.

I’ve dug down further into the RTOS kernel and discovered it’s not even too complicated to understand. Here’s an interesting read for anyone interested: Christopher Svec & Richard Barry on FreeRTOS

This was left out though, but I figured it out: each task control block contains an item named “xEventListItem” of type xListItem. The list item will be put into an xList created by xEventGroupCreate() when such a call happens. This simultaneously means that each task can only be contained in one event group at a time, but that’s of course not a limitation as the task can only ever wait for event bits in a certain event group and not multiple.

xTaskIncrementTick() is responsible to kick the task out of its blocked state by removing it from the event list in time (if a time interval is set). So basically I have three parts:

1) Wait for bits
2) Set bits
3) xTaskIncrementTick() which resumes the task.

I can control 1) and 2) as I intend to write new functions anyway. I can’t control 3) as the function doesn’t provide any hook points but it doesn’t matter as it is only responsible for resuming the task at the end of the wait period. The xList allows for each list item to contain an xItemValue of type TickType_t that I can use for my purposes. The problem is my hijacking overlaps the EventBits implementation partially. When I store a handle to my API inside xItemValue, chances are that foreign calls to xEventGroupWaitBits() will interpret my handle as bits and get confused. Also, the function triggers a configASSERT() if I accidentally set one of the control bits.
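The overlap described above can be made concrete with a small host-side model. The constants mirror, to the best of my knowledge, the 32-bit-tick definitions in FreeRTOS `event_groups.c` (top byte reserved for control flags, leaving 24 event bits); please verify them against your kernel version. The helper names `isValidEventBits`, `Decoded` and `decodeItemValue` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed to match event_groups.c for a 32-bit TickType_t; verify locally.
constexpr std::uint32_t kClearEventsOnExitBit = 0x01000000u;
constexpr std::uint32_t kUnblockedDueToBitSet = 0x02000000u;
constexpr std::uint32_t kWaitForAllBits       = 0x04000000u;
constexpr std::uint32_t kControlBytes         = 0xff000000u;

// Mirrors the sanity checks on the "bits to wait for" argument: non-zero
// and no bit inside the reserved control byte.
constexpr bool isValidEventBits(std::uint32_t bits) {
    return bits != 0 && (bits & kControlBytes) == 0;
}

// What event-group code would read out of an xItemValue it did not write.
struct Decoded {
    std::uint32_t eventBits;
    bool clearOnExit;
    bool waitForAll;
};

constexpr Decoded decodeItemValue(std::uint32_t value) {
    return Decoded{ value & ~kControlBytes,
                    (value & kClearEventsOnExitBit) != 0,
                    (value & kWaitForAllBits) != 0 };
}
```

For example, a made-up handle value of `0x05001234` stored in xItemValue would be read back as event bits `0x1234` with both the clear-on-exit and wait-for-all flags set, which is the kind of confusion the post describes.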

So in any case, using my implementation and EventGroups must be mutually exclusive to avoid such bugs. If any FreeRTOS maintainer/developer knows a way to integrate this smoothly, I’m thankful. But in any other case, this would be the way I’m intending to go.

StackMaster · March 20, 2023, 3:45am

I remember writing a while ago in the forum about how happy I would be if WFMO were added to FreeRTOS. As I said then, I am firmly committed to the notion that WFMO, or its equivalent, is not just a “nice to have” in OS’s. It is fundamental, as in distilled water fundamental. I said that it was fundamental in 1999, and several times over the years, and alas… here we are in 2023… and still… hard to believe how slowly the general programming community is creeping toward this realization with various ad-hoc frameworks. I see posts like the OP’s all over the Internet trying to find the right primitive for their multi-threading model, and a little ground hog on my shoulder screams… WFMO!!! WFMO!!! What you need is WFMO!!!

I did not have time to add WFMO to FreeRTOS then, and I still don’t now, but FreeRTOS is the future, and a wonderful OS already, and the benefit of doing so would be so great, I would be willing to work with the OP (or anyone else) in adding this.

If anyone is interested, please let me know.

richard-damon · March 20, 2023, 2:26pm

@StackMaster assuming you mean Wait For Multiple Objects by WFMO, FreeRTOS DOES support it in a limited manner (with EventGroups and QueueSets), and in my experience it is something rarely used. It might be “fundamental” to your programming style, that my guess is oriented to “large machines”, but in the Real Time environment, I find it unimportant. A given task should have ONE job to do, so its priority and dead-lines can be well established. That normally implies that it will be waiting for ONE thing to enable it.

Perhaps as FreeRTOS “grows up” to being capable of using more advanced features of larger processors, getting a broader WFMO capability will be useful, but WFMO comes at a significant cost to the kernel, and that cost may well be unacceptable in many cases. The current design allows for a totally static memory allocation, and that can be important. I don’t know of any way to support “General” WFMO that can be done statically except by pre-allocating a total worst case assortment of interconnections, and even then, it replaces known single operations with list walking, which violates the “Real-Time” promises of those operations.

StackMaster · March 20, 2023, 4:10pm

@richard-damon That’s my point. It is not my “programming style”. It really is fundamental, in the sense that 100 years from now, people will look back at WFMO, and say, “What were we thinking.” There are literally 1000’s of programmers all over the world at this very moment trying to make their multi-threaded model tractable. They creep toward WFMO. The more they mull, the closer they get to WFMO. If someone develops a gimpy framework that is non-portable and not very clean, but in the spirit of WFMO, still, they get very excited, and rightly so, because their brain is telling them that they are at least on the right path. I admit to asserting this without proof. I am only asserting it because… whenever I see a programmer creeping toward WFMO, of their own accord, through meandering musing, I want to assure them that yes, they found something fundamental, not merely “just another way of doing something”.

That said, I do agree that FreeRTOS is a real-time operating system. But consider: the ESP32 folks now number in the millions. How many of them are using the real-time features, directly, of FreeRTOS? How many of them would care if it were not real-time? These programmers do not have the choice, really, of an other-than-FreeRTOS OS on ESP32, so whether it was meant primarily for real-time applications, for them, it is the opposite: For them, it is their general OS.

I agree about a necessary penalty of adding WFMO. Nothing is free. I also agree that the real-time folks should not suffer because the general-OS folks want WFMO. It would be the responsibility of whoever adds WFMO to do it in such a way so as not to penalize those who do not want it. #define comes to mind.

richard-damon · March 20, 2023, 4:38pm

@StackMaster No, my point is that all those people looking to WFMO are stuck in a bad model for REAL TIME code, but are thinking in terms of BIG IRON machines.

Creeping to WFMO is a move in the WRONG direction for clear REAL-TIME code, as it is a sign that your task is trying to do too many things at once.

Yes, some people use FreeRTOS for situations where it isn’t specifically needed, and supporting them is good, but NOT at the cost of hurting the core purpose of Real Time operation.

Maybe the answer is to spin off a fork as FreeOS that removes a lot of the Real-Time focus and be a more general solution. If it matures well enough, it might be able to be folded back to the trunk with compile time options sort of like how the SMP branch is working.