I understand binary semaphores in an RTOS as follows.
A task takes a semaphore, uses a shared resource, and then gives the semaphore back.
If another task tries to take the semaphore while it’s already taken, it gets blocked until the semaphore is released.
This makes sense to me for protecting a shared resource.
But this is where I get confused.
People often say that one task can take a semaphore and a different task can give it.
That doesn’t make sense to me.
If T1 takes the semaphore and is using the resource, how is it safe for T2 to give (release) that same semaphore when T2 never took it in the first place?
It feels like T2 is releasing something it doesn’t own, which sounds wrong.
I feel like I’m missing the correct mental model of what a semaphore really represents.
What you have in mind is a mutex. Semaphores do not track ownership.
Very common misunderstanding, among other things stemming from the historical confusion that 90% of all synchronization use cases are mutual exclusion but semaphores, being the grandfather implementation of synchronization, are generally used as the placeholder for everything sync.
Actually, your own rephrasing of priority inversion in the other thread you opened was pretty accurate, and we discussed mutexes vs. binary semaphores there extensively. What is left to discuss?
I realize now that my confusion came from trying to understand semaphores using the take, use, give pattern. That pattern is actually correct for mutexes, because mutexes imply ownership: the same task that takes the mutex must use the shared resource and then give the mutex back.
I believe this does not apply to semaphores. I followed it mainly because many places use terminology like take semaphore and give semaphore.
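To make the ownership difference concrete, here is a minimal sketch on a desktop POSIX system rather than FreeRTOS. It assumes an error-checking pthread mutex and an unnamed POSIX semaphore as stand-ins; the names other_thread and ownership_demo are made up for illustration, and a real RTOS mutex may react differently when a non-owner gives it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

static pthread_mutex_t mtx;
static sem_t sem;
static int other_unlock_rc;  /* result of unlocking a mutex we do not own */
static int other_post_rc;    /* result of giving a semaphore we never took */

/* Runs in a second thread that never locked mtx and never waited on sem. */
static void *other_thread(void *arg)
{
    (void)arg;
    other_unlock_rc = pthread_mutex_unlock(&mtx); /* refused: not the owner */
    other_post_rc = sem_post(&sem);               /* accepted: no owner exists */
    return NULL;
}

/* Returns 1 if the mutex rejected the non-owner and the semaphore did not. */
int ownership_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_t t;
    int rc;

    rc = pthread_mutexattr_init(&attr);
    assert(rc == 0);
    /* The error-checking type makes the mutex verify who unlocks it. */
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    assert(rc == 0);
    rc = pthread_mutex_init(&mtx, &attr);
    assert(rc == 0);
    rc = sem_init(&sem, 0, 0);
    assert(rc == 0);

    pthread_mutex_lock(&mtx);                     /* this thread is the owner */
    rc = pthread_create(&t, NULL, other_thread, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);
    pthread_mutex_unlock(&mtx);                   /* only the owner may unlock */

    pthread_mutex_destroy(&mtx);
    pthread_mutexattr_destroy(&attr);
    sem_destroy(&sem);
    return other_unlock_rc == EPERM && other_post_rc == 0;
}
```

Calling ownership_demo() from a small main() should return 1 on Linux: the mutex knows who locked it, while the semaphore is just a count that anyone may increment.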
That is correct. The poster use case for binary semaphores is IRQ signalling. A processor task sits in its infinite loop waiting for an IRQ to signal reception of, say, an incoming character over a communication device. Once the task receives the signal, it can do the non-time-critical post-processing of the input and return to its waiting state. That way, short time-critical processing (the IRQ handler typically copies the character from the hardware device to a buffer) and less time-critical processing are separated from each other.
Until task notifications were introduced to FreeRTOS, that signalling was typically realized with binary semaphores.
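The wait → signal split described here can be sketched with POSIX threads standing in for the interrupt and the task. This is an assumption-laden illustration, not FreeRTOS code: fake_isr, processing_task and signal_demo are hypothetical names, the "device" is just a string, and POSIX sem_t is a counting semaphore, so unlike a FreeRTOS binary semaphore it remembers every give rather than at most one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define RX_LEN 5

static sem_t rx_sem;                  /* starts at 0: nothing received yet */
static char incoming[] = "hello";     /* pretend data from the device */
static char rx_buf[RX_LEN];           /* filled by the simulated interrupt */
static char processed[RX_LEN + 1];    /* filled by the processing task */

/* Stand-in for the ISR: the short, time-critical part. */
static void *fake_isr(void *arg)
{
    (void)arg;
    for (int i = 0; i < RX_LEN; i++) {
        rx_buf[i] = incoming[i];      /* copy the character to a buffer */
        sem_post(&rx_sem);            /* give: signal that one char arrived */
    }
    return NULL;
}

/* Stand-in for the task: waits, then does the less time-critical work. */
static void *processing_task(void *arg)
{
    (void)arg;
    for (int i = 0; i < RX_LEN; i++) {
        /* take: block until signalled, retrying if interrupted */
        while (sem_wait(&rx_sem) == -1 && errno == EINTR) {
        }
        processed[i] = (char)toupper((unsigned char)rx_buf[i]);
    }
    processed[RX_LEN] = '\0';
    return NULL;
}

/* Runs both threads to completion and returns the processed text. */
const char *signal_demo(void)
{
    pthread_t isr, task;
    int rc = sem_init(&rx_sem, 0, 0);
    assert(rc == 0);
    rc = pthread_create(&task, NULL, processing_task, NULL);
    assert(rc == 0);
    rc = pthread_create(&isr, NULL, fake_isr, NULL);
    assert(rc == 0);
    pthread_join(isr, NULL);
    pthread_join(task, NULL);
    sem_destroy(&rx_sem);
    return processed;
}
```

Note that the thread that gives never takes, and the thread that takes never gives. Nothing is being "released"; the give only says "an event happened".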
Yes, that is exactly what I tried to express earlier when I wrote “Very common misunderstanding, among other things stemming from the historical confusion that 90% of all synchronization use cases are mutual exclusion but semaphores, being the grandfather implementation of synchronization, are generally used as the placeholder for everything sync.”
I have attempted a few times to lobby for renaming the generic functions for exactly that reason, but to no avail.
Actually, if you look at the history of the term, a “Semaphore” was something that signaled “something”; perhaps one of the earliest was a set of physical flags used for signaling. A specific flag was “raised” to send a specific message.
It was later adopted by the railroads as a way to signal that the track was “clear” to proceed, and that became its use for access control.
Thus, it is a good term for the generic functions for signaling. When the priority inversion problem was discovered, the concept needed an improvement, from which the Mutex, as a derivative of the Semaphore, was developed. This was not needed for the original use (railroads), as “priority” control was maintained outside the semaphore system.
I would say the problem is more that too many “basic texts” talk about semaphores in the access control application, where you really want the Mutex, thinking it is a simpler and easier to understand case.
Added: Thinking of Semaphores as a signaling device shows why it made sense to build Semaphores as special cases of queues (which just have no data). Making Queues an extension of the Semaphore, rather than Semaphores a special case of Queues, could have made the Semaphore object smaller, but might have increased the code size or complexity.
I disagree (obviously), but I don’t feel strongly enough about it to waste time on an extended discussion. Let’s leave it at that: there are plausible arguments on both sides of the debate.
I think my confusion came from how I was trying to understand semaphores.
Initially, I was following a pattern like take → use → give, which made sense to me because it looks similar to how we protect a shared resource. In my mind, a task takes a semaphore, uses the resource, and then gives it back, and other tasks block meanwhile. That felt logical.
But based on the discussion, I now feel that this pattern actually applies more to a mutex, where ownership matters and the same task is expected to release what it acquired.
For semaphores, it seems like there’s a different model, more like wait → signal. One task waits for something to happen (so it blocks), and another task detects that event and signals it by giving the semaphore. In that case, the task giving the semaphore is not “releasing” something it owns, but just notifying that an event occurred.
I think I got confused because of terminology like take semaphore and give semaphore, which sounds very similar to lock/unlock.
Part of the issue is that a “Semaphore” is a very generic concept, and there isn’t a single usage pattern that describes it. Note that a “Mutex” is really just a special case of a semaphore with a limited usage pattern; that limitation allows it to keep track of its “owner”, which in turn allows it to handle the priority inversion problem when used in a computing system.
SOMETIMES you have an A waits, B signals, A resumes pattern with a semaphore, but that isn’t the only pattern you can use. For instance, a system with 5 machines available, but only the “power” to run 3 of them at once, might use a counting semaphore with a take → use → give pattern. This might not need the features of a Mutex: there is no “inversion” problem, because usage isn’t limited by the OTHER limited resource, CPU time.
Yes, take → use → give is a valid pattern for a counting semaphore to limit how many tasks can be “active” at a time. You do still need to worry about possible priority inversion issues if CPU availability can be a concern.
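The 5-machines, 3-power-slots idea can be sketched like this, again with POSIX threads as a stand-in for RTOS tasks. The names machine, power_limit_demo, MACHINES and POWER_SLOTS are invented for the example, and the bookkeeping mutex exists only to measure how many threads hold a slot at once; it is not part of the pattern itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <semaphore.h>

#define MACHINES 5
#define POWER_SLOTS 3

static sem_t power;                    /* counting semaphore: 3 slots */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;
static int active;                     /* machines currently holding a slot */
static int max_active;                 /* highest value active ever reached */

static void *machine(void *arg)
{
    (void)arg;
    for (int round = 0; round < 100; round++) {
        sem_wait(&power);              /* take: blocks if all 3 slots are in use */

        pthread_mutex_lock(&stats_lock);
        if (++active > max_active)
            max_active = active;
        pthread_mutex_unlock(&stats_lock);

        sched_yield();                 /* use: the machine "runs" for a while */

        pthread_mutex_lock(&stats_lock);
        active--;
        pthread_mutex_unlock(&stats_lock);

        sem_post(&power);              /* give: hand the slot back */
    }
    return NULL;
}

/* Returns the largest number of machines that held a slot simultaneously. */
int power_limit_demo(void)
{
    pthread_t t[MACHINES];
    int rc = sem_init(&power, 0, POWER_SLOTS);
    assert(rc == 0);
    active = 0;
    max_active = 0;
    for (int i = 0; i < MACHINES; i++) {
        rc = pthread_create(&t[i], NULL, machine, NULL);
        assert(rc == 0);
    }
    for (int i = 0; i < MACHINES; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&power);
    return max_active;
}
```

Here “active” means holding a slot between take and give, not executing on the CPU at that instant. The returned maximum never exceeds POWER_SLOTS, and a thread that never calls sem_wait on power is not limited by it at all.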
Yes, that is a valid usage pattern (applicable, for example, when you design a web server based on the one-task-per-client control flow but need to restrict the number of concurrent clients to be served to n and do not wish to rely on the TCP listen backlog to implement this).
But note that for that usage pattern, again, the OS does NOT support enforcing the implied contractual “same task that claimed it must release the semaphore,” so if you do not code that system carefully to follow that pattern, you may run into subtle concurrency issues. In some situations that can be an advantage - in the web server, for example, if you have a “supervisor task” that detects client connections getting stuck, that supervisor then has the option to clean up the connection, which can include releasing the semaphore on behalf of the stale client service task.
But again, that is only one possible usage pattern for semaphores; the concept is very generic and can thus be employed in many different patterns.
Yes, again, this is exactly why I don’t like the names of these functions. I also disagree with @richard-damon that mutexes are “special cases of semaphores.” I think they are not, but we have discussed this in deep bloody detail in the past without a winner, so repetita non placent.
I’m confused about what “active at the same time” actually means.
Task A waits (take)
Task B signals (give)
My understanding so far is that binary semaphores are mainly used for signaling where one task waits for an event and another task gives the signal.
For example, suppose I have 5 tasks: A, B, C, D, E. I want only tasks B, C, and D to be limited by a counting semaphore, while A and E run freely. With the counting semaphore, only 3 tasks are allowed to run, but only one of them can actually run at any one time and the others have to wait, because the rule says only one task can run at a time.
In this case:
What does “active at the same time” mean?
How exactly does the semaphore restrict only B, C and D?
I’m missing the conceptual difference between using a semaphore for signaling vs using it to limit.