uxSchedulerSuspended in 10.2.1

LdB-ECM · December 20, 2019, 7:15am

I have an app that suspends and this has nothing to do with the problem but was just seeking clarity on this code which is very very messy.

It would appear uxSchedulerSuspended is a binary lock with only two values pdFalse and pdTrue. However the asserts are messy some use 0 some use pdFalse.

However the best bit of code is xTaskResumeAll which is bizarre here it is without the comments

configASSERT( uxSchedulerSuspended );
 	taskENTER_CRITICAL();
	{
		--uxSchedulerSuspended;
		if( uxSchedulerSuspended == ( UBaseType_t ) pdFALSE )
	        {

So we assert that uxSchedulerSuspended is not zero aka pdTrue
We enter critical section
We decrement uxSchedulerSuspended … okay so we take it to pdFalse
Then we test if uxSchedulerSuspended is pdFalse … ummm how can it not be false ???

The test is pointless and looks like it is a hangover when uxSchedulerSuspended may have been a count lock

I am also not a fan of the inc and decrement to set the value to me it would be more exact to set the value explicitly to pdTrue and pdFalse

richard-damon · December 20, 2019, 11:42am

Scheduler suspending is NOT a binary lock, but like a recursive mutex, you can call the suspend function multiple times and the scheduler doesn’t restart until you resume it the same number of time.

Calling xTaskResumeAll when the scheduler isn’t suspended is an error, as every resume must be paired with a previous suspend, but this resume might not be paired with the outermost suspend, so after decrementing uxSchedulerSuspended it might not be 0 yet, so it isn’t time to actually resume the scheduler.

LdB-ECM · December 20, 2019, 12:38pm

Then you have a problem which was discussed originally the increment isn’t atomic and there is no protection on it

void vTaskSuspendAll( void )
{
++uxSchedulerSuspended;
portMEMORY_BARRIER();
}

This was discussed 2 tasks comes in , task 1 picks up the value then context switches to second task which also picks the value up. They both increment the value to the same thing then write the same value.

So assume it was zero both see zero and increment to one.

NET result you lost one count.

In you argument that the comments refer to you argued it was binary

That is my point about the mess you can’t have it both ways if it’s binary it’s fine but the resume is crazy and it should not be increment. If it’s a count it needs protection. Its A or B and the code seem to show the same confusion between your two different answers

rtel · December 20, 2019, 3:05pm

Not zero and pdTRUE are not the same as pdTRUE is 1.

because calls to vTaskSuspendAll() can nest. From the reference manual: “Calls to vTaskSuspendAll() can be nested. The same number of calls must be made to xTaskResumeAll() as have previously been made to vTaskSuspendAll() before the scheduler will leave the Suspended state and re-enter the Active state.”

I think your post points this out itself - locking increments the lock value, unlocking decrements the lock value. The value itself is effectively accessed atomically, as described by my post in the forum archive referenced above.

LdB-ECM · December 20, 2019, 3:50pm

My feeling is your original comment is correct the value is binary and the reason it is binary is because the moment the value goes to 1 all task switches stop … that is you can’t increment beyond one unless you were stupid enough to call it again from the task that suspended which makes no sense.

That is why is doesn’t have to be atomic because in the situation above the second task would run do it’s thing then resume suspend setting the value back to zero. Then task 1 would set the value do it’s suspend.

I am pretty certain you can’t actually get a count beyond one even with the crazy increment and decrement. The value just needs to be set and cleared which will not be atomic but still work exactly like inc and decrement and the rather strange test can be removed.

Anyhow I will do some testing but I thought I would see what your thoughts were.

rtel · December 20, 2019, 4:48pm

Whether the scheduler is suspended or not is binary - it is either suspended or not suspended. The uxSchedulerSuspended variable is quite definitely not binary as it increments to keep a count of the nesting depth. It has to enable nesting because an application writer may suspend the scheduler then call a function that also suspends the scheduler.

My post in the forum archive you linked to a few posts above explains how, to an individual task, it must be and is atomic. It is not globally visible as being atomic because once the value is set no other tasks execute hence no other tasks can see it at any other value.

The paragraph above shows you can.

LdB-ECM · December 20, 2019, 6:13pm

A function doesn’t need to increment the count, it simply has to hold the entry state and set the flag to pdTrue regardless. On the exit it just looks at the initial state if it was pdFalse it has to kick the resume otherwise it doesn’t because it didn’t do the suspend. You can nest the functions without a problem.

You only suspend to do things such as GetStates, RunTime stats and whoever suspends is responsible for release as you said they are paired

By having a count you actually might encourage them to have the suspend in one code block and the release in another which is crazy dangerous as it is easy to forget the pairings.

You are the co-ordinator and you get the say on the public code but I have changed my code to binary and made it so any vTaskSuspendAll does the pairing and you pass the function to call while suspended

 void vTaskSuspendAll (void (*suspend_func) (void* param), void* param)
 {
        UBaseType_t origState = uxSchedulerSuspended;
        uxSchedulerSuspended = 1;
       if (suspend_func) suspend_func(param);
       if (origState == 0)  vTaskResume();
 }

rtel · December 20, 2019, 6:59pm

That completely changes the semantics of the function, but I don’t think I understand your suggestion. Can you explain?

My confusion comes from:

As the FreeRTOS code itself uses this function, and you have changed the function’s signature, how is your code compiling now? Did you go through and change every occurrence of this function? Seems doubtful as I don’t think your signature will work in all internal use cases.
You are storing the original state on the stack, so can’t leave vTaskSuspendAll() without loosing that state, so presume the function being protected by having the scheduler be suspended must also execute inside vTaskSuspendAll() directly? If that is the case, then is suspend_func() a pointer to the function that executes inside vTaskSuspendAll() is?
If suspend_func() is the function being protected and is executed inside vTaskSuspendAll(), then given point 1 above, how can it be limited to functions that take a void * parameter (void * not being a good thing in the first place). In most cases, and all cases in the kernel itself, void * won’t be the type of the parameter passed to suspend_func().
If suspend_func() is the function being protected and is executed inside vTaskSuspendFunc(), then how can you nest calls to vTaskSuspendAll() (if suspend_func() also calls vTaskSuspendAll() - which could be the case, especially if it is a third party function) without recursion - recursion being something that is banned in most applications that run on microcontrollers.
If suspend_func() is the function being protected, then how do you use vTaskSuspendAll() to protect short sequences of code within a function, rather than the whole function itself.

hs2 · December 20, 2019, 9:27pm

Sorry, but I also don‘t get your point Leon.
As Richard explained vTaskSuspend/ResumeAll must support nesting and
are normally used to realize critical sections in task (user) code. IHMO that‘s the only right use case
Which bigger problem do you want to solve ?

richard-damon · December 21, 2019, 3:59am

One issue is that a function that wants to suspend the scheduler may want to use another function that wants to suspend the scheduler, so you need to make the suspension allow for nesting.

There are several ways to implement this sort of nesting, one is a counter as the function currently does, another is to return a status code that is used to determine if it is time to resume. Using a counter is generally smaller if the number of resources being checked (1 in this case) is small compared to the number of places they are checked, as the counter uses 1 word per object being tracked verses 1 word per testing to store the old state.

In your method, not only do you cost a word to store the state, but you add the cost of a whole function call, including creating a parameter structure to pass data into the protected function. This also imposes a possibly unacceptable performance hit to the protected code, which while it doesn’t have the strongest requirements like code that uses a critical section based on disabling interrupts (which affects interrupt latency) it does impact high priority tasks (as it effective makes the current task as absolutely the highest priority).

Generally, I find verifying the proper pairing of the suspend and resume isn’t difficult (if it is, then your code is likely being structured in a way that insuring that suspends and resumes are paired properly isn’t your biggest worry. Yes, it enables you to write bad code, but that is the nature of the C language, you count on discipline, not handcuffs, to promote good code.

LdB-ECM · December 23, 2019, 1:34pm

You can ignore this, it is a very specific problem for me because of my hardware. I have fully tested it and my simple bool lock works perfectly and I got the protection I required. My underlying problem for me is I have 4 cores and it was very hard to protect when 2 cores could go for suspend at the same time from 2 different tasks not something that was an issue for you on a small micro but on a Raspberry Pi a very real problem.

Basically each core has a mini version of FreeRTOS with there own small task lists but there is a bigger pool of tasks that are waiting assignment to a core which runs as a task on each core so they top up on tasks based on free time.

The problem for me is suspend is not a single core function I am requiring to suspend all the cores it has to fire an IPC message which is why I was looking so hard at the function. The increment/decrement is not workable and the pairing is required to be on the same core something your current lose arrangement doesn’t work with.

In my situation I don’t care about all the stuff you see as issues because I have 4 cores suspending and waiting on the one core to do whatever it has to do while the system is suspended then resuming. I need the pairing to be much tighter than your lose arrangement.

Anyhow it’s all working exactly as I require so thanks for the help.

richard-damon · December 23, 2019, 2:01pm

Unless you do a LOT of work to adjust FreeRTOS, it is really a Single Core OS (I understand someone does have a multicore port for a particular processor out there). You can run FreeRTOS on multi-core, if each core has its own copy of FreeRTOS (so the cores don’t share things like this suspend count), and there own task lists.

The whole concept of suspending the scheduler to make a critical section falls apart in a mult-core situation with shared data, you can’t just keep the other cores running their current tasks, as those tasks might still try to access the shared data. Either to totally stop the other cores (which would be processor inefficient) or you make this form of critical section into something protected by a cross-core mutex, or maybe some other form of protection.

LdB-ECM · December 23, 2019, 2:27pm

No it really was no effort to port FreeRTOS to a multicore because you have already done things like being able to take the static words out.

All I did was put in a new option call config_FOR_MULTICORE which put a struct called coreblock around the normal static variables and set the condition that removed the static keyword so the variables neatly folded into the struct. Each task was given a new pointer to it’s owner coreblock.

Then I just create a coreblock for each core and change the idle task to load and unload tasks from a large task pool and that is basically it.

Really didn’t take much effort at all.

I should add there is a good thesis work on it

richard-damon · December 23, 2019, 3:20pm

I haven’t had time to read that paper, but based on what I know, I suspect that a port as simple as you describe is not complete. One big issue is that FreeRTOS is written in the C99 language standard, which does NOT account for multi-core issues like cache coherency. To write a C program that can work multi-cored with full definition by the language, you need to add in features of later C standards to implement memory barriers to make sure that everything is working well. (FreeRTOS’s use of volatile is NOT enough to handle a multi-core environment).

You likely have created a system that works 99.99% of the time, but still has a data race that can occasionally cause a problem when just the right sequence of events occur across the multiple cores. I note that the thesis turns off the caches which reduced the likelihood of the issue but doesn’t totally eliminate it.

Unless you have gone through the code and inserted a lot of memory barriers (which the thesis does describe but you did not mention) you are going to have issues.

From the way you are describing how you are doing things, I suspect you don’t fully understand how this works. If you are having issues with VTaskSuspendAll the way you are describing, I suspect you aren’t properly interlocking the cores. Only one core at a time should be able to affect the scheduler, and that includes the variable uxSchedulerSuspended. When you handle that issue, then your other problems should go away.

LdB-ECM · December 24, 2019, 2:18am

Memory barriers are trivial we have been dealing with them years on the Pi with Baremetal, people make mountains out of them.

It’s a simple rule
memory_barrier is already called before and after interrupt dispatch, before and after every chain of accesses to a peripheral.

I think there was something like 6 places I had to put them.

There are a couple of places that required cache operations they were more critical but again not difficult.

rtel · December 24, 2019, 5:06pm

It would be great to make your code available to the rest of the community - unfortunately the FreeRTOS interactive site is not functioning as intended at the moment, so we are looking for an alternative which may be another forum like this categorised as the Interactive site is today. In the mean time it would be great if you could post a zip of your code as an attachment to a post here, or otherwise post a link to a source repository, so it doesn’t get lost.

LdB-ECM · December 29, 2019, 1:58am

I will throw a copy up on my Github site and chuck a link

Topic		Replies	Views
implementation of vTaskSuspendAll() in 5.20, Kernel	6	206	June 12, 2009
Bug with suspended Tasks Kernel	19	312	May 3, 2011
Unbalanced uxSchedulerSuspended in x86 simulator , freeRTOS version 10.0.0 Kernel	15	339	July 28, 2019
Concerns about the atomicity of vTaskSuspendAll() Kernel	5	260	September 18, 2013
uxSchedulerSuspended Kernel	6	477	August 4, 2017

uxSchedulerSuspended in 10.2.1

Related topics