Better Task management system than builtin

Hi guys. I’m in need of a more versatile and efficient task management system than what RTOS currently has to offer. To be more precise, I’m missing the following aspects:

  • more than 28 states in event group
  • task unlock on bit clearance
  • wait for notification AND event group bits AND mutexes at the same time

Is there maybe an external library you know that can offer these features? Also I’m using C++ mostly to make my code more robust, so a library written in C++ would also be nice.

I have about 5-6 tasks that are neatly intertwined and it’s very hard to control them precisely with the tools offered.
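To make the wish list above a bit more concrete, here is a minimal sketch of the kind of interface being asked for: a flag set wider than a stock event group that can also unblock a waiter when bits are cleared. It is plain desktop C++, with `std::condition_variable` standing in for the kernel's blocking, so it says nothing about determinism on a real target. The class `WideFlags` and all of its methods are hypothetical and not part of FreeRTOS.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical 64-bit flag set; not part of FreeRTOS.
class WideFlags {
public:
    void set(std::uint64_t bits) {
        { std::lock_guard<std::mutex> lock(m_); bits_ |= bits; }
        cv_.notify_all();
    }
    void clear(std::uint64_t bits) {
        { std::lock_guard<std::mutex> lock(m_); bits_ &= ~bits; }
        cv_.notify_all();  // tasks waiting for "cleared" need a wake-up too
    }
    // Block until every bit in `bits` is set; returns false on timeout.
    bool wait_all_set(std::uint64_t bits, std::chrono::milliseconds timeout) {
        assert(bits != 0);
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [&] { return (bits_ & bits) == bits; });
    }
    // Block until every bit in `bits` is clear; returns false on timeout.
    bool wait_all_clear(std::uint64_t bits, std::chrono::milliseconds timeout) {
        assert(bits != 0);
        std::unique_lock<std::mutex> lock(m_);
        return cv_.wait_for(lock, timeout, [&] { return (bits_ & bits) == 0; });
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t bits_ = 0;
};
```

Waking every waiter on every change is the simplest thing that works here; a kernel-level version would have to walk a list of blocked tasks instead, and that is where bounded execution time gets harder to guarantee.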

This is quite a list - I don’t think you will find an OS in the same class as FreeRTOS that does all these things - and if it does - it won’t be deterministic. You can probably have more than 28 bits in an event group by changing the type of the integer holding the event group - but that would require some updates. Other than that, are you able to design your system with the features available, rather than the other way around?

My first guess is your system isn’t properly partitioned if you have that complicated of interactions with only 5-6 tasks. I can’t say for certain, as you haven’t given any details, but it just ‘smells’ wrong.

Perhaps what it needs is a set of simpler tasks with a ‘central control module’ that takes in requests and schedules them out to tasks so the complications become ‘business logic’ of that task and not the need for a lot of primitives from the OS.

Thanks for the answers. My primary struggle is with global vs task-local states: I seem to need both and also have decision points inside my tasks that wait for either of these bits to become set.
Maybe my architecture is wrong. I designed a finite state machine and have other tasks wait for states to become set or cleared. With event groups this needs 2 bits for each state and you quickly run out of states.
Then I switched to TaskNotifications to set bits of each individual task, but now the trouble is to know whether a certain state is currently set or if the bits have already been taken. Also I suspect there is a design flaw in the RTOS system which doesn’t really allow for using TaskNotifications as lightweight event groups (as is promised in the API reference). I can go into further detail if needed.
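For readers wondering why the state machine above burns two bits per state: an event group wait only unblocks on bits becoming set, so "state was cleared" needs a bit of its own. A rough sketch of that encoding follows. The helper names are made up for illustration, and the number of usable bits is left as a parameter because it depends on the port configuration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical encoding: state n owns bit 2n ("is set") and bit 2n+1 ("is cleared").
constexpr std::uint32_t is_set_bit(unsigned state)     { return std::uint32_t{1} << (2 * state); }
constexpr std::uint32_t is_cleared_bit(unsigned state) { return std::uint32_t{1} << (2 * state + 1); }

// How many states fit when only `usable_bits` bits are available.
constexpr unsigned max_states(unsigned usable_bits) { return usable_bits / 2; }

// New flag word after entering a state: raise "set", drop "cleared".
inline std::uint32_t enter_state(std::uint32_t flags, unsigned state) {
    assert(2 * state + 1 < 32);
    return (flags | is_set_bit(state)) & ~is_cleared_bit(state);
}

// New flag word after leaving a state: raise "cleared", drop "set".
inline std::uint32_t leave_state(std::uint32_t flags, unsigned state) {
    assert(2 * state + 1 < 32);
    return (flags | is_cleared_bit(state)) & ~is_set_bit(state);
}
```

A waiter that wants "state 3 cleared" would then wait on `is_cleared_bit(3)`, which is an ordinary wait-for-set. The cost is that the state budget is halved, which is the limit being hit here.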

In any case, if it is possible to inject some code into the RTOS task scheduler I will probably do that and design my own task management system. Are there any good references to start?

One thing to remember is you can have more than one event group, so a given state machine could signal multiple groups if you really need more conditions. The only restriction is that if you want to wait for one of two different states, they need to be in the same event group.

I find TaskNotifications work just fine as a lightweight event group, the one limitation is that the signaler has no way to force themselves to wait (or even test) if an event has already been taken. (Not so much an issue with event groups, which don’t have that, but makes them a bit weaker than a semaphore).

I still wonder a bit about your work division, if you have that complicated of interactions between two or more tasks, it sounds like they aren’t really independent things like tasks normally are. You may be trying to do too much (or too little) in a task.

That’s exactly the problem. I could probably, with careful composition, filter out all the states that are dependently waited on into one event group and put the others into another event group. But that’s also tedious because I probably need to shift states around once I create new tasks.

Task notifications DO WORK as a lightweight event group unless you decide to use timed intervals. In this case you need to create a wrapping loop around the wait function and this doesn’t for some reason clear the bits as expected, so you have “leftover” states that always resurge.

My tasks ARE very interdependent. We have wifi, mqtt, ota and a token service for secure login to our broker. When one module fails (e.g. wifi) the others need to be informed and go back to a base state where they wait for the connection. Also, some modules need to start only when SNTP is synced. It’s complicated.
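One way to write down dependencies like these is as a table of required conditions per module, evaluated in one place. The sketch below is only an illustration: the condition names, the rule table and `desired_state()` are hypothetical, and the actual dependencies between the modules are not given in this thread.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical condition flags, named after the modules mentioned above.
enum Condition : std::uint32_t {
    kWifiUp     = 1u << 0,
    kSntpSynced = 1u << 1,
    kTokenValid = 1u << 2,
};

enum class ModuleState { Base, Running };

// A module and the conditions it needs before it may leave its base state.
struct ModuleRule {
    const char*   name;
    std::uint32_t required;
};

// Illustrative rules only; the real dependencies are an assumption.
constexpr ModuleRule kRules[] = {
    {"token", kWifiUp | kSntpSynced},
    {"mqtt",  kWifiUp | kTokenValid},
    {"ota",   kWifiUp | kTokenValid},
};

// Where a module should be, given the conditions that currently hold.
constexpr ModuleState desired_state(const ModuleRule& rule, std::uint32_t conditions) {
    return (conditions & rule.required) == rule.required ? ModuleState::Running
                                                         : ModuleState::Base;
}

// Usage: losing wifi sends every module back to its base state.
inline void example_wifi_drop() {
    std::uint32_t now = kWifiUp | kSntpSynced | kTokenValid;
    assert(desired_state(kRules[1], now) == ModuleState::Running);
    now &= ~static_cast<std::uint32_t>(kWifiUp);
    for (const ModuleRule& r : kRules) assert(desired_state(r, now) == ModuleState::Base);
}
```

With the rules in a table, adding a module means adding a row rather than reshuffling bits between event groups.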

I’m now on the way of designing my own task management system. I just need to figure out how to inject my code into the RTOS kernel.

You can add code into tasks.c easily enough by using the extensions header file that is included at the bottom of the .c file - but I suspect doing what you want to do in the kernel will be harder than refactoring your application code - at least doing it in a deterministic way will.

Hi there RichPiano,

Without having seen any code, I would vaguely agree with Richard D. that your problem might be rather a suboptimal model of the control flow than a deficiency of the OS, although I have several times missed the equivalent of WaitForMultipleObjects in FreeRTOS myself.

One possible way to abstract out some of the more intricate dependencies might be nested FSMs - ie in every state in which a task must wait for several conditions (either AND or OR), you enter a sub FSM in which the respective wait can be implemented platform- or implementation specific (eg in mixed AND or OR scenarios, you could do subsequent timeouting waits on all conditions and then evaluate all notification states at the end of the sub fsm). That kind of thing has helped me in the past to make seemingly complex issues easier.
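As a rough illustration of the sub FSM idea, the sketch below runs consecutive timed waits over all sources and then evaluates them together. It assumes each thing to wait on can be wrapped in a pair of callables. `WaitSource`, `run_wait_sub_fsm` and the state names are hypothetical, and on a real target the callables would wrap whatever OS primitives are in use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical wrapper around one thing a task can wait on.
struct WaitSource {
    std::function<void(int timeout_ms)> timed_wait;  // blocks for at most timeout_ms
    std::function<bool()> is_signalled;              // non-blocking check
};

enum class WaitMode { All, Any };
enum class SubState { Waiting, Evaluating, Done, TimedOut };

// Sub FSM for one "cumulative wait" state of the parent FSM.
// Ends in Done when the AND/OR condition holds, or TimedOut after max_rounds.
inline SubState run_wait_sub_fsm(const std::vector<WaitSource>& sources, WaitMode mode,
                                 int slice_ms, int max_rounds) {
    assert(!sources.empty() && max_rounds > 0);
    SubState state = SubState::Waiting;
    int round = 0;
    while (state != SubState::Done && state != SubState::TimedOut) {
        switch (state) {
        case SubState::Waiting:
            // Consecutive short waits, one per source.
            for (const WaitSource& s : sources) s.timed_wait(slice_ms);
            state = SubState::Evaluating;
            break;
        case SubState::Evaluating: {
            std::size_t hits = 0;
            for (const WaitSource& s : sources) hits += s.is_signalled() ? 1 : 0;
            const bool ok = (mode == WaitMode::All) ? hits == sources.size() : hits > 0;
            if (ok) state = SubState::Done;
            else state = (++round < max_rounds) ? SubState::Waiting : SubState::TimedOut;
            break;
        }
        default:
            break;
        }
    }
    return state;
}
```

The parent FSM only sees the final state, so this body could later be swapped for a better implementation without touching the caller.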

@rtel Thanks, spotted it! What do you mean by deterministic? As far as I can see, the RTOS library is programmed rather functionally and that’s also what I’m planning to do, with the exception of global mutex/state variables (I don’t want to need to initialize mutexes and event groups).

Do you by any chance have an RTOS architecture diagram that I can consider to help me in the process of designing?

@RAc I also thought about timeouting waits but that would take away from the responsiveness of the whole thing - even if it’s marginal in the big picture. It just doesn’t strike me as the optimal solution and it also adds more movable parts. You have to be very careful about how this thing is taped together.
I’m a C++ programmer so I was thinking of using RAII (like scope_guard), operator overloading and other features that C can’t offer but which make the job far easier, more intuitive and safer. Because we plan to add even more modules and the foundation needs to be as easy and robust as possible in my opinion.

Agreed, but as long as there is no “native” superior implementation equivalent of WFMO(…,ANY,…) that serves your needs*, I don’t see a better solution. My point is that by encapsulating the “cumulative wait FSM state” via a sub FSM, you are free in implementing that sub FSM whichever way you want, for example first provide an inferior sub FSM which can later be replaced by a better implementation. Of course that could also be accomplished by functional abstraction (even would amount to the same if there was a sufficient implementation), but a sub FSM would be more flexible in that you can traverse different control paths via different waits more naturally, but of course that is only my gut feeling.

*To be more detailed, I couldn’t think of any fully “fair” or indeterministic usable implementation of a multiple OR wait function. WFMO, for example, has a prioritization of the objects in the static list in that the first signalled object in the array will be honored before those after, so your application has to be aware of that and ensure through some mechanism that objects later in the array are not starved out. Actually, you want that kind of control; any non determinism via the OS would take away control over the real time behavior.
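The prioritization described in this footnote can be sketched without any OS at all. Below, `pick_fixed` models the "first signalled object in the array wins" rule, and `RotatingPicker` is one possible application-side mechanism to keep later entries from being starved. Both names are hypothetical, and the rotation is just one policy among many; note that it is still fully deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNone = -1;

// Fixed-priority scan: the lowest signalled index always wins.
inline int pick_fixed(const std::vector<bool>& signalled) {
    for (std::size_t i = 0; i < signalled.size(); ++i)
        if (signalled[i]) return static_cast<int>(i);
    return kNone;
}

// Same scan, but it starts just after the previous winner.
class RotatingPicker {
public:
    int pick(const std::vector<bool>& signalled) {
        const std::size_t n = signalled.size();
        for (std::size_t k = 0; k < n; ++k) {
            const std::size_t i = (start_ + k) % n;
            if (signalled[i]) {
                start_ = (i + 1) % n;
                return static_cast<int>(i);
            }
        }
        return kNone;
    }
private:
    std::size_t start_ = 0;
};

// Usage: with everything signalled, the fixed scan never gets past index 0.
inline void example_starvation() {
    const std::vector<bool> all = {true, true, true};
    RotatingPicker rotating;
    assert(pick_fixed(all) == 0 && pick_fixed(all) == 0);
    assert(rotating.pick(all) == 0 && rotating.pick(all) == 1);
}
```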

Again, not having seen the code, my guess is bad factoring and interaction API. For things somewhat like what you are describing I solve the issue at the application side rather than require the OS to try to handle it.

I use a lot of application side subscribing to events, where a module keeps a list of routines that are called when something happens, and those call-backs are part of the subscribers’ code base but run in the context of the subscribed to process, which allows each subscriber to notify itself in an appropriate way. Typically, the subscriber will be waiting on the event it NORMALLY is handling, and the callback either injects a special version of that event or does an abort-wait after setting a flag to be tested on timeout. This way the error conditions don’t require complications out of the OS. It also allows each subscriber to decide what is best for IT, and not require me to begin with a decision on the best way globally.
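A minimal sketch of that subscription pattern, for anyone who wants to see its shape in code. It covers only the "inject a special version of the event" variant. All names (`LinkModule`, `Worker`, `EventKind`) are made up for illustration, and the `std::deque` is a stand-in for an RTOS queue, so it is not thread-safe as written.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// All names here are hypothetical.
enum class EventKind { Data, LinkLost };
struct Event { EventKind kind; int payload; };

// The module being subscribed to. It only keeps a list of call-backs.
class LinkModule {
public:
    void subscribe(std::function<void()> on_link_lost) {
        subscribers_.push_back(std::move(on_link_lost));
    }
    // Runs in the link module's own context and calls every subscriber.
    void report_link_lost() {
        for (const auto& cb : subscribers_) cb();
    }
private:
    std::vector<std::function<void()>> subscribers_;
};

// A subscriber that normally just waits on its own event queue.
class Worker {
public:
    explicit Worker(LinkModule& link) {
        // The call-back belongs to Worker but executes in the publisher's
        // context, so it only injects a special event and returns quickly.
        link.subscribe([this] { queue_.push_back(Event{EventKind::LinkLost, 0}); });
    }
    void post_data(int value) { queue_.push_back(Event{EventKind::Data, value}); }
    bool has_event() const { return !queue_.empty(); }
    Event next_event() {
        assert(!queue_.empty());
        Event e = queue_.front();
        queue_.pop_front();
        return e;
    }
private:
    std::deque<Event> queue_;  // stand-in for an RTOS queue; not thread-safe
};
```

The worker's main loop keeps a single wait point (its queue) and treats `LinkLost` as just another event, so no multi-object wait is needed from the OS.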

It sounds like you are designing as a monolith, which is normally much harder and doesn’t handle complications well. Think about modularity with simple defined interactions.

One big requirement that I find is I need to be able to trust myself (in documented ways) for things like that call-back routine. Yes, a bad call-back will break the system, but ANY bad module can break the system. This isn’t a heavy protected mode OS where everything is restricted to its own sand-box, it is a cooperative system. If that model doesn’t fit, you are likely using the wrong OS.

Your replies are definitely food for thought. I’m starting to think how I would implement the proposed architectures with my use case. Is any of your code open source by any chance? I would definitely take a look at it.

But in any case
@RAc As far as I understand you would use global and local event groups then, where you check the global groups and then drop into checking a more local event group for your sub-FSM?

@richard-damon Interesting idea with event loops and tasks! How are you interfacing the event loops? Do modules just “know” about the loop handle of loop to subscribe to or is this abstracted away in a global manner? How many of those event-processing tasks do you usually set up?

If your question is how tasks know what notifications to subscribe to, if that is really your question, your system is too complicated and wrongly factored. When you design a task, you should be able to know as you design it what sorts of global events it needs to know about, and what service is providing that notification.

For instance, if the system may change its clock speed, and devices need to know about the current speed of the clock, they know they need to be subscribed to the clock change module, and thus are.

If this basic of an operation doesn’t have a well defined home, you aren’t factoring your logic right.

Note, I am not saying that a single task is responsible for deciding the clock frequency, or that it is even a ‘task’ that is doing it. The clock module gets inputs from requestors and sends out notifications to subscribers and manipulates the system clock. There is a base API defined for all of these, and an implementation layer around this that makes things happen depending on the needs of that system. Many systems that don’t need this capability may just stub it out making the operations in the API just no ops.
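To illustrate the split between a base API and an implementation layer, here is a hedged sketch of what such a clock module might look like. The interface, the "highest request wins" policy and every name in it are assumptions made for the example, not a description of any existing code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Base API (hypothetical): requestors ask for a speed, subscribers hear about changes.
class ClockApi {
public:
    using Callback = std::function<void(std::uint32_t new_hz)>;
    virtual ~ClockApi() = default;
    virtual void request_speed(int requestor_id, std::uint32_t min_hz) = 0;
    virtual void subscribe(Callback cb) = 0;
    virtual std::uint32_t current_hz() const = 0;
};

// Implementation for a system that really changes its clock.
class ScalingClock : public ClockApi {
public:
    explicit ScalingClock(std::uint32_t idle_hz) : idle_hz_(idle_hz), hz_(idle_hz) {}
    void request_speed(int requestor_id, std::uint32_t min_hz) override {
        requests_[requestor_id] = min_hz;
        std::uint32_t wanted = idle_hz_;
        for (const auto& r : requests_) wanted = std::max(wanted, r.second);
        if (wanted == hz_) return;  // nothing changed, nobody is notified
        hz_ = wanted;               // a real port would reprogram the clock here
        for (const auto& cb : subscribers_) cb(hz_);
    }
    void subscribe(Callback cb) override {
        assert(cb != nullptr);
        subscribers_.push_back(std::move(cb));
    }
    std::uint32_t current_hz() const override { return hz_; }
private:
    std::uint32_t idle_hz_;
    std::uint32_t hz_;
    std::map<int, std::uint32_t> requests_;
    std::vector<Callback> subscribers_;
};

// Stub for systems with a fixed clock: same API, the operations are no-ops.
class FixedClock : public ClockApi {
public:
    explicit FixedClock(std::uint32_t hz) : hz_(hz) {}
    void request_speed(int, std::uint32_t) override {}
    void subscribe(Callback) override {}
    std::uint32_t current_hz() const override { return hz_; }
private:
    std::uint32_t hz_;
};
```

A UART driver, say, would subscribe once at start-up and recompute its baud divisor in the call-back; on a system built with `FixedClock` the same driver code compiles and the call-back simply never fires.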

That’s certainly a clever approach. But how do you do state management within the modules? Do you keep a local event group around and dispatch events based on certain states if they become set?

Modules with a task inside them just use local variables to keep track of the state of the module. If a module reaches a ‘state’ that other modules might want to know about, it just runs down its subscriber list of call-backs to let them know the event happened. Each call-back does whatever its module needs done to be notified of the event. That might be posting to a queue or signaling a semaphore, or setting a local variable and aborting a wait. The key primary rule is these call-backs need to be quick (though in some cases they might do the real work like update baud rate registers).

One key is that ‘Tasks’ are not the fundamental things I build with, but they are just implementation details within a module, and some modules (like low level I/O drivers) may not have a task connected to them at all.

@richard-damon That sounds like a valid pub-sub implementation.

I’ve dug down further into the RTOS kernel and discovered it’s not even too complicated to understand. Here’s an interesting read for anyone interested: Christopher Svec & Richard Barry on FreeRTOS

This was left out though, but I figured it out: each task control block contains an item named “xEventListItem” of type xListItem. The list item is put into an xList created by xEventGroupCreate() when the task blocks in xEventGroupWaitBits(). This simultaneously means that each task can only be contained in one event group at a time, but that’s of course not a limitation, as the task can only ever wait for event bits in one event group and not in several.

xTaskIncrementTick() is responsible to kick the task out of its blocked state by removing it from the event list in time (if a time interval is set). So basically I have three parts:

  1. Wait for bits
  2. Set bits
  3. xTaskIncrementTick() which resumes the task.

I can control 1) and 2) as I intend to write new functions anyway. I can’t control 3) as the function doesn’t provide any hook points, but it doesn’t matter as it is only responsible for resuming the task at the end of the wait period. The xList allows for each list item to contain an xItemValue of type TickType_t that I can use for my purposes. The problem is my hijacking overlaps the EventBits implementation partially. When I store a handle to my API inside xItemValue, chances are that foreign calls to xEventGroupWaitBits() will interpret my handle as bits and get confused. Also, the function triggers a configASSERT() if I accidentally set one of the control bits.
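The control-bit clash can be checked on paper before touching the kernel. The sketch below assumes a 32-bit tick type, in which case (as far as I can tell from event_groups.c) the top byte of the item value is reserved for control flags; please verify the mask against your kernel version. The helper names are hypothetical, and the address 0x20001234 is just an example of a RAM pointer on a typical Cortex-M part.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: with a 32-bit tick type the event group code reserves the top
// byte of the item value for control flags. Verify against your event_groups.c.
constexpr std::uint32_t kControlByteMask = 0xFF000000u;

// True if a raw value would be mistaken for (or assert as) control bits.
constexpr bool touches_control_bits(std::uint32_t item_value) {
    return (item_value & kControlByteMask) != 0;
}

// One possible workaround (hypothetical): store a small table index, not a pointer.
constexpr std::uint32_t kMaxIndex = ~kControlByteMask;  // 0x00FFFFFF

inline std::uint32_t encode_index(std::uint32_t index) {
    assert(index <= kMaxIndex);
    return index;
}
```

Under that assumption a raw pointer such as 0x20001234 lands in the control byte, while an index into a static table of handles stays below it. This only avoids the control bits; a foreign xEventGroupWaitBits() call would still read the index as ordinary event bits.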

So in any case, using my implementation and EventGroups must be mutually exclusive to avoid such bugs. If any FreeRTOS maintainer/developer knows a way to integrate this smoothly, I’d be thankful. But in any other case, this would be the way I’m intending to go.