FreeRTOS vTaskSuspendAll() is not thread safe and uxSchedulerSuspended variable becomes zero and asserts error

I am using imxRT1050 with FreeRTOS and the SDK version is SDK 2.13.0. Have a problem with vTaskResumeAll as it is getting stuck in assert associated with variable “uxSchedulerSuspended”

vTaskSuspendAll is not thread safe. Below is the image that contains the disassembly of the code generated by vTaskSuspendAll and also the snapshot of the code.

If you look at the code generated by the assembler for “uxSchedulerSuspended++”, the variable address is loaded on to register r3, then the value at that address is loaded on to r3 and then value is incremented in r3 and then the incremented value is loaded on to memory. Imaging if this gets preempted by another thread, when the control was in “adds r3,#1”, that thread can again read a 0 from the address load it into r3. Now r3 gets modified by this thread and when the other thread resumes, it will not see the modified value, as it has to execute from next line to “adds r3,#1

As the comment suggests, did you read this response from Richard - Concerns about the atomicity of vTaskSuspendAll() - FreeRTOS

There is still a potential problem i beleive, if my understanding is correct

Imagine Task 1 executes “ldr r3, [pc,#16]” and after reading that contex switch happens. Now r3 is loaded with the address of variable uxSchedulerSuspended.

Look into the above assembler instruction snapshot that i have copied
1. Consider uxSchedulerSuspended is at location 0x4000 and the current value is 0
2. Task 1 executes “ldr r3, [pc,#16]” . Register value r3 is now 0x4000
3. Context switch happens now
4. Task 2 executes all the 5 instructions. After that r3 is set to 1, as it increments the value
0 and also stores back the value to the variable location when “Str r3,[r2.#0]” is executed
5. Task 1 is resumed from the point where the context switch happened
6. First instruction it sees is “ldr r3,[r3,#0]”. At this time r3 is set to 1 by task 1. Now it tries
to access an invalid address when this instruction starts executing.

After the variable is incremented by task 2, RTOS scheduler will be blocked and until xTaskResumeAll() is called and completed, task 1 will not be executed. But by then, i dont think the register r3 would have a valid address when task 1 is executed from the point it was preempted.

As explained by Richard

can only happen if the scheduler is NOT longer suspended and on context switch the task context including registers is restored.

You need to understand that the context switch can only happen when the scheduler is NOT suspended (i.e. uxSchedulerSuspended is 0). Therefore, in the sequence your provided, step 5 will not happen until the task 2 resumes the scheduler i.e. sets uxSchedulerSuspended to 0.

Again i am still confused by the answers, or i am not understanding what you have said.
Task1 has not yet incremented the count (uxSchedulerSuspended) yet as only the first assembly instruction “ldr r3, [pc,#16]” is executed for task 1. Task 1 is preempted during vTaskSuspendAll() and task 2 gets executed. Task 2 increments uxSchedulerSuspended to 1 by executing all the 5 instructions . Now the scheduler is blocked due to the increment until xTaskResumeAll() is called for task 2. After xTaskResumeAll() for task 2 is called, scheduler is unblocked by decrementing uxSchedulerSuspended , as this variable was incremented by task2’s vTaskSuspendAll() . Now this can return back to task 1 and that’s when the problem would arise where it resumes execution from the second instruction “ldr r3 [ r3, #0]” as we dont know what would be in r3. I have also looked at the function that saves the registers during context switch. I dont think r3 is being saved before restoring it. if you look at the image below, you can see r3 stores the current location of the tcb, before even r3 register is saved

.

That is not right - r3 does get restored correctly. You do not see register r3 in the assembly code because it is stored by the hardware automatically on the task’s stack and later restored correctly during context restore from the task’s stack. The hardware stores r0, r1, r2, r3, r12, LR, PC and xPSR registers automatically before entering the ISR.

ok. If these registers are saved into the stack by the hardware, then no issues. It should restore the values correctly. THen i need to check why the variable became 0 in my case and got stuck in assert when it called xTaskResumeAll()
THanks!!

Often corupted (FreeRTOS internal) data is caused by stack overflows.
Or application bugs, of course :wink:
Did you define configASSERT and enabled stack checking ?

Thankyou Hartmut and Gaurav. Here i have provided the snapshot of the call stack, fault registers, disassembly during exception. We are using Crank GUI and their libraries. Thread related to Crank is causing the exception. Do you have any idea what could be causing this, by looking at the snapshot. MOst of the time we get this exception when it is exiting critical section ( taskEXIT_CRITICAL() )of xTaskResumeAll() function is called

What is this function _pvHeapStart? Did you try to find the exact assembly instruction that is faulting? Have you defined configASSERT and enabled stack overflow checking as suggested by @hs2 ?

I have enabled configASSERT and stack overflow macros as suggested and _pvHeapStart is coming from the library. This is what i see in the map file and it is basically the heap start location. I do not see _pvHeapStart function anywhere in the map file. THis is part of newlib library and the file associated is _cr_sbrk.c
.heap 0x80e46e28 0x800000
0x80e46e28 _pvHeapStart = .
0x81646e28 . = (. + _HeapSize)
fill 0x80e46e28 0x800000
0x81646e28 . = ALIGN (0x4)
0x81646e28 _pvHeapLimit = .
0x0000f000 _StackSize = 0xf000

Also in the attached image you can see the exact assembly instruction that is faulting.
Sorry . That image does not have the assembly for _PvHeapStart()

The _sbrk implementation is and can’t be part of the newlib C-library. Someone has provided it in _cr_sbrk.c. But I think that’s not the problem guessing it works.
So you’re using the newlib std. heap in your code (malloc/free which are not thread safe without implementing the __malloc_lock/unlock hooks, or did you ?) and for FreeRTOS the heap_3.c (wrapping newlib malloc/free to make it thread safe) memory manager ?

Malloc_lock() and unlock are implemented. Also freeRTOS heap3.c is being used. But the Crank libraries provided by the vendor isnt using the heap3.c. It uses the malloc directly and uses malloc_lock /unlock to keep it thread safe

void __malloc_lock(struct _reent *reent) {
	vTaskSuspendAll();
}

void __malloc_unlock(struct _reent *reent) {
	(void)xTaskResumeAll();
}

void __env_lock() {
	vTaskSuspendAll();
}

void __env_unlock() {
	(void)xTaskResumeAll();
}

Also below is what i have seen in linker. -cr_sbrk.c looks to be part of newlib

_pvHeapLimit                                      c:/nxp/mcuxpressoide_11.6.1_8255/ide/plugins/com.nxp.mcuxpresso.tools.win32_11.6.1.202207200623/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+dp/hard\libcr_newlib_semihost.a(_cr_check_heap.o)
_pvHeapStart                                      c:/nxp/mcuxpressoide_11.6.1_8255/ide/plugins/com.nxp.mcuxpresso.tools.win32_11.6.1.202207200623/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+dp/hard\libcr_newlib_semihost.a(_cr_sbrk.o)

below is the image of the newlib library selected
image

Ok - memory management seems to be correct. That’s very good.
BTW since the locks are implemented you can omit using heap_3.c wrapper and
simply define these macros e.g. in your FreeRTOSConfig.h:

#define pvPortMalloc        malloc
#define vPortFree           free

I agree that the wrapper is not required. Just curious to know whether having the wrapper would causes any issues?

Shouldn’t cause any issue since vTaskSuspend/ResumeAll can nest. It’s just bit more lean.
@RAc There was typo - ‘All’ was missing: vTaskSuspend/ResumeAll are fine as global mutex for longer code sequences.

I suggest trying a recursive (!) mutex to implement malloc_lock just to see if it makes a difference. As has been mentioned many times, task suspension has its problems. I know from my own experience that the mutex solution works reliably.

One advantage of the wrappers are that the malloc fail hook gets implemented, which isn’t for a direct call to malloc.