Blocking on Multiple RTOS Objects

richarddamon wrote on Friday, June 28, 2019:

Sorry for the delay, this thread is very off topic and I don’t think of it as urgent. When I build an application in a FreeRTOS-like environment, all of my tasks have a specific function to do, and tend to have a very specific thing to wait on to do that task. They have no need for a WaitForAny operation. Because the whole application is inside one ‘Process’, it knows about itself and knows how to talk to the right pieces.

The only cases I can see for a WaitForAny would be in a different environment where the whole system was bigger and contained multiple applications that aren’t supposed to know about how each other works and are to be isolated from each other. In this environment, a given application may need to wait for multiple different things, and thus perhaps there is a need for a WaitForAny.

As an example, in my library of widgets I put together to build systems is a task that gathers certain environmental attributes, determines certain conditions, and broadcasts this state. In a Big OS type environment, it would need some form of configuring system for the details of how it is to work, and would create some form of signalling queue that others interested in its results would listen to. Under FreeRTOS it is much simpler: I add the source file for the widget to my project, and that widget includes a header file that isn’t part of its library, but part of the application, where I put the parameters that adjust how the widget works. When it gets a result, it calls a function that it declares but doesn’t define (or only weakly defines) with the result, and the part of the system that wants the result defines that function. If I need that result in two or more places, I define that function to just call multiple other functions to let all the places that are interested know of the result. Much simpler code, but wouldn’t be allowed in a multi-process model without a lot of overhead, as processes can’t easily get into the internals of other processes.
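A minimal sketch of that arrangement, collapsed into one file so it can be compiled on its own. All names here (env_widget_process, env_widget_on_state, ENV_WIDGET_HOT_THRESHOLD_C, the two consumers) are made up for illustration and are not taken from any real library; in a real project the three sections would live in separate files.

```c
#include <assert.h>

/* ---- app_config.h (application-owned header; hypothetical name) ---- */
#define ENV_WIDGET_HOT_THRESHOLD_C 30   /* parameter that tunes the widget */

/* ---- env_widget.c (library side) ---- */
/* Declared by the widget, defined by the application. With GCC or Clang the
 * library could instead ship a default marked __attribute__((weak)). */
void env_widget_on_state(int is_hot);

/* Called by the widget's task each time it has a new reading. */
void env_widget_process(int temperature_c)
{
    env_widget_on_state(temperature_c > ENV_WIDGET_HOT_THRESHOLD_C);
}

/* ---- application side ---- */
static int fan_on;        /* state kept by the first consumer  */
static int hot_reports;   /* state kept by the second consumer */

static void fan_control_update(int is_hot) { fan_on = is_hot; }
static void logger_update(int is_hot)      { if (is_hot) hot_reports++; }

/* The application's definition of the hook simply fans the result out
 * to every part of the system that is interested. */
void env_widget_on_state(int is_hot)
{
    fan_control_update(is_hot);
    logger_update(is_hot);
}

/* Usage example: one hot reading reaches both consumers. */
void widget_demo(void)
{
    env_widget_process(35);
    assert(fan_on == 1);
    assert(hot_reports >= 1);
}
```

The widget never learns who consumes its result, and adding a third consumer is one more call in env_widget_on_state().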

There is a fundamental difference in system design. A Big OS system puts the OS at the top of the food chain, and it assumes that there are multiple users wanting to use the system who don’t necessarily totally trust each other, so it tries to provide isolation between them.

In FreeRTOS, at the top is the systems programmer who puts together the application that will run. ONE piece of that is the FreeRTOS kernel, which is one of many tools to get the job done. It is assumed that the programmer is competent, and the system allows him to establish efficient channels between the pieces of the application.

As I said in the beginning, I have NEVER seen the need for a WaitForAny type operation in my FreeRTOS designs, and I suspect that your desire for it comes from a multi-process mind set, and if you did get your WaitForAny, you would find the lack of those other things suddenly being the blockers. There is a LOT of overhead to support a multi-process execution model, and that overhead is inappropriate for most of the systems targeted by FreeRTOS. It might be possible to provide much of what you need in a layer above/beside FreeRTOS, just like TCP or FATFS are provided.

stackmaster wrote on Friday, June 28, 2019:

Sorry for the delay, this thread is very off topic and I don’t think of it as urgent.
No worries.
As I said in the beginning, I have NEVER seen the need for a WaitForAny type operation in my FreeRTOS designs, and I suspect that your desire for it comes from a multi-process mind set, and if you did get your WaitForAny, you would find the lack of those other things suddenly being the blockers.

Actually, the application that I have in mind is a single-process application that is useful in its own right. I just ran it on Windows 7. It is probably 2MB compiled, with run-time “private bytes” at 58MB, and working-set at 65MB. There are no other programs in my suite of applications that need to be run to make this single app useful. This application currently shows 16 threads running, and will burst to maybe 50 or 60 threads for a few seconds (have not tried yet). It is compiled from roughly 235 .cpp/.hpp files.

When I consider porting this app, I ask what exists on the target OS, what does not, whether any deficiencies in the target OS are actually deficiencies, or if the problem is actually my model, etc. If I conclude that the issue is with the target OS, and that I have no choice but to change the app, I ask whether the change would result in a true port, or a pseudo-port, after the application has been lobotomized so much that it becomes something distinct from its original incarnation.

Taking a step back and looking at the application, indeed all of the applications, and being honest with myself about whether there is something fundamental missing in the target OS, or whether the problem is with my software architectures, I have concluded, to the best of my objectivity, that the issue is a deficiency in the target OS. Of course, this claim can only be true if the primitives that are “missing” really are theoretically-fundamental, which is why I said earlier that I believe that WaitForMultipleObjects (that is, xWaitForAny) and friends are not “nice to haves”, but fundamental to generalized multi-threaded applications.

This is actually the crux of this thread. It’s not really my asking for these synchro functions (I am), it’s asking another question:

Given that there now exist sub-$10US CPU’s with, say, 8MB Flash/4MB RAM, that are too small for OS’s like Linux/Windows - Embedded/etc. (more or less), but big enough to run sophisticated, highly-threaded applications that normally run on Big OS’s, does there exist an OS that is small enough so as not to consume too much of the Flash/RAM of the MCU, but feature-rich enough so that generalized, multi-threaded (single-process) applications can be ported to it without overhauling the app’s (supposedly theoretically-regular) architecture?

I believe that the answer, today, is “No”.

But FreeRTOS is very close.

richarddamon wrote on Friday, June 28, 2019:

Let me ask you what set of multiple synchronization objects any of your Threads (to translate to Tasks) needs to wait on that wouldn’t be amenable to using a QueueSet. My comment is that there are usually fundamental assumptions in large applications on how the system works, and these get in the way of the move from a Big OS environment to a small efficient environment like FreeRTOS (small might still have a lot of resources; it is more that the kernel doesn’t use much of them).
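For readers who have not used queue sets, here is a portable model of the idea, not the FreeRTOS implementation. In FreeRTOS the real calls are xQueueCreateSet(), xQueueAddToSet() and xQueueSelectFromSet(); the names below (set_t, member_t, member_give, set_select, member_take) are invented for this sketch, blocking is left out, and each member is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

#define SET_CAPACITY 8   /* must cover the combined capacity of all members */

typedef struct {
    int count;                      /* items pending in this member */
} member_t;

typedef struct {
    member_t *ready[SET_CAPACITY];  /* FIFO of members that received an item */
    size_t head, len;
} set_t;

/* Post to a member and record that member in the set, in arrival order.
 * Returns 1 on success, 0 if the set has no room left. */
int member_give(set_t *set, member_t *m)
{
    if (set->len == SET_CAPACITY)
        return 0;
    m->count++;
    set->ready[(set->head + set->len) % SET_CAPACITY] = m;
    set->len++;
    return 1;
}

/* Non-blocking select: returns the next ready member, or NULL if none.
 * It does NOT consume the member's item; that is a second step. */
member_t *set_select(set_t *set)
{
    if (set->len == 0)
        return NULL;
    member_t *m = set->ready[set->head];
    set->head = (set->head + 1) % SET_CAPACITY;
    set->len--;
    return m;
}

/* Second step: take the item from the member that select returned. */
void member_take(member_t *m)
{
    assert(m->count > 0);
    m->count--;
}
```

A task using this model would loop on set_select(), compare the returned pointer against the members it knows about, and then call member_take() on that one. The two-step shape (select, then take) is the same shape as the real queue set API.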

You say you feel it is fundamental; in my mind I can’t think of a real use for it, and any ‘need’ is actually just a mispartitioning of responsibilities.

Part of the issue is that at its core, FreeRTOS, to achieve its objective of being a good Real Time system, has tight limitations on the amount of work that needs to be done to manipulate the synchronization primitives. An open-ended action like a WaitForAny tends to introduce extra work that might need to be done, and might make it hard to continue to meet the constraints for those parts of the system that still need the tight limits. Most Big OS systems are designed to not worry about these real time requirements, and maybe support a limited subset that is sort of real time. Perhaps you could make an effort to look at what it would cost to add something like a WaitForAny operation that still maintained the real-time restrictions on those primitives.

stackmaster wrote on Friday, June 28, 2019:

Waitable Timers.

A thread might do all of the following things simultaneously:

  • Wait on 4 Semaphores.
  • Wait on 1 Event.
  • Wait on 3 Timers.

That is a total of 8 things to wait on simultaneously in an event loop.

Naturally, this begs the question: “Should a programmer even be doing that?”

I claim that the answer is “yes”.

The mental relief that an engineer experiences under an event-loop is greater than that under call-backs or whatever other mechanisms are employed, IMO. Once a programmer gets used to this model, it becomes easier than trying to get the format specifiers right in printf().

I do not believe that xWaitForAny would be some whimsical nice-to-have that makes my job as a lazy programmer easier. I believe that it is fundamentally essential in the regular model for generalized multi-threading applications, in the same way that mutexes, themselves, are fundamental. I am saying that there is a reason that Dave Cutler and crew at Microsoft spent so much time crafting such a relatively small number of OS functions (with huge salaries). I’m saying that Edsger Dijkstra, original author of the mutex and pioneering thinker in the area of synchronization and distributed computing, in general, would probably bless xWaitForAny as not a nice-to-have, but “fundamentally regular”. Ironically, at the time Dijkstra proposed his mutex, he received considerable push-back from other leaders in computer science, because they felt that his concepts of multi-threading were ridiculous.

It often happens in engineering that some engineers will start insisting that certain things are fundamental, fiddling with them before they become widely accepted, while other engineers remain skeptical, because the extant model has been “good enough”. While the debate is ongoing, there will be a kind of “creeping” toward the truth. Eventually, it will be discovered that, indeed, the primitive is fundamental. And then of course, 10 or 20 years later, we will all look back and think, “So…what were we debating again?” For synchronization, this is happening, but is taking too long, IMO.

Assuming that it is true that event loops are fundamental primitives of generalized multi-threading engineering, which I admit is still under debate, one might ask, “Where is the creeping?” It is libevent. There is also libuv. One can see the stumbling that occurs while said creeping is ongoing. For example, there are people who are angry with the design of the Linux equivalent of xWaitForAny, epoll. Their entire consternation can be traced back to a single design “flaw” which, unfortunately, is present in xQueueSelectFromSet(): The “flaw” is a counter-model to the following:

If it is true that waiting for multiple objects simultaneously using a single function is a fundamental primitive in the realm of generalized synchronization, then that function must force the side-effect of the successfully-triggered object to occur before the function returns.
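To make the claim concrete, here is a small model of the “select and take in one step” behaviour, written as a non-blocking poll over counting semaphores. This is only a sketch: wait_any_take, waitable_t and WAIT_NONE are hypothetical names, nothing like this exists in FreeRTOS today, and a real kernel version would have to run the whole function inside one critical section and block when nothing is signaled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned count;   /* counting-semaphore value; 0 means not signaled */
} waitable_t;

#define WAIT_NONE ((size_t)-1)   /* returned when no object is signaled */

/* Scan the objects in order. On the first signaled one, apply its side
 * effect (decrement the count) BEFORE returning its index, so the caller
 * already owns the token when the function returns. */
size_t wait_any_take(waitable_t *objs[], size_t n)
{
    assert(objs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (objs[i]->count > 0) {
            objs[i]->count--;
            return i;
        }
    }
    return WAIT_NONE;
}
```

Because the token is consumed before the return, a second waiter calling the same function cannot be handed the same token. Note that the scan is linear in the number of objects, which is exactly the kind of cost that has to be weighed against a kernel’s bounded-time requirements.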

Of course, xQueueSelectFromSet() did not do this because it cannot, because not all elements in the queue can have their “side-effects” effected. This is true for epoll as well: Not all types of things that can be waited-on in epoll are amenable to being “side-effected”. This simple articulation of design choice has all kinds of ramifications for engineers trying to get their synchronization models right, hence the raging debate. For example, on Linux and FreeBSD, the number of waitable objects can be on the order of 1000’s. The number of threads (Tasks) waiting can also be on the order of 10’s or 100’s. Then, when a waitable object (Semaphore/etc.) triggers, you’ll get 10’s or 100’s of threads suddenly coming awake, even though only one of them would have work to do, the so-called thundering herd problem. Not good.

You are right, however: one would need to be sensitive to the effect of grafting new primitives onto a kernel that many people are already happy with. There would be a minimal tolerance to the imposition of time and space of the new primitive on the kernel, and this cannot be taken lightly. This is why any attempt to add xWaitForAny / etc. would have to be done with a lot of discipline and understanding of the context in which it would operate.

richarddamon wrote on Friday, June 28, 2019:

Ahh, that is the fundamental flaw in your design. A central core event loop is NOT a core concept in a single process real time system, but comes from the mindset of a multi-process system where something outside the process is sending the process various notifications of things to do, and the outside world wants to think of the process as a monolith.

To get into details, you have said very little about what those are. WHY are there 4 semaphores, an Event, and 3 timers all providing a single task instructions about what to do? What are those semaphores signalling? What is the Event? Why is a single Task handling things at 3 different time rates? My guess is that, if properly factored, there is likely a better partition where there is a separate task for each of these synchronization sources, each waiting for a single thing and doing what it is asking for.

My guess is you are deeply ingrained in a big machine model, and think that it is simple to just move it into a smaller machine. My experience is that generally there are fairly significant assumptions built into the design that really do depend on some of the heavy machinery of the big machine, and it really is much harder to port to the small environment.

stackmaster wrote on Friday, June 28, 2019:

But I do not regard FreeRTOS as necessarily being a “real-time” OS, despite the “R” and “T” in “FreeRTOS” staring me squarely in the face. The key word is “necessarily”. Given FreeRTOS’s architecture, it is no more “necessary” to regard it as a real-time OS than it is to regard, necessarily, a Ford F-350 Super Duty pick-up truck as a towing vehicle. Both, given their architectures, can be multi-purposed.

The question then becomes one of reasonableness:

Is it reasonable to take the wife and kids out on a date in a Ford F-350? Sure. No problem. Is that why Ford created the F-350? No. Is it reasonable to add side-steps to make it easier for the little ones to climb up? Yes. Does it hurt the overall purpose of the vehicle if Ford were to add those side-steps to all F-350’s? No.

I do not see the “much harder to port to a smaller environment”. One might write the following code, then ask if it is portable:

int x = 0;

The answer had better be “yes”.

Then one might ask again for:

struct Foo
{
    int x;
    int y;
};

Again, the answer had better be yes.

One can play this game for the entirety of a codebase, where we eyeball each line of code, and ask ourselves, objectively, “Is this code truly portable?” If the code had been written with portability in mind from the outset, the answer will continue to be “yes”.

There will eventually come a point where the answer is not only “no”, but there is nothing that the programmer can do to make it portable, because the code under consideration is inherently non-portable. We then focus on those lines of code and ask:

  • How much of this non-portable code is there in the entire application?
  • What is its nature? I/O? Synchronization? IPC? Hardware control? Machine identification?

In my case, the answers are, respectively:

  • <5%
  • synchronization

Sure, there are issues with C++ RTTI, exception handling, etc…but these are relatively trivial. xWaitForAny is not trivial, and cannot be circumvented in my model.

This gets us back to the sub-$10 MCU. The question becomes:

Is there a market for a multi-tasking OS that can run Big Apps on a sub-$10 MCU?

I claim that the answer is “yes”.

Is there a multi-tasking OS that can run these Big Apps under a fully generalized multi-threading model on such an MCU?

I claim that the answer is “no”.

In light of what OS’s are available, does it make sense to write one from scratch?

No.

Why?

Because FreeRTOS is already very close to what would be needed.

richarddamon wrote on Friday, June 28, 2019:

To work your analogy, adding steps probably doesn’t impact its usability as the pickup truck it was designed as. Removing the bed in the back and replacing it with more seats probably does, even if some people think, I don’t use the bed, I just use it to tow my boat. It might make sense for a one off customization, but not something Ford would do to the base design.

I suspect that adding a fully general WaitForAny might approach the latter. YOU don’t think of FreeRTOS as a Real Time system, and likely don’t have much in the way of real Real Time requirements. I can’t see something being added to the basic FreeRTOS design that breaks the fundamental RT design requirements. At first glance, I suspect that a fully generic WaitForAny is apt to break the Strictly Fixed Maximum Time rule that the core primitives have. QueueSets have a number of fairly specific limitations just because they were needed to maintain those requirements, and EventGroups needed a slightly inefficient design to maintain those requirements, with some requirements on the priority of the Timer/SysReq Task.

You are of course free to fork off the basic design and create a FreeOS project that meets your goals (just make sure you observe the licence of the pieces you use).