FreeRTOS vTaskSuspendAll() is not thread safe and uxSchedulerSuspended variable becomes zero and asserts error

Eugin · March 13, 2023, 10:11am

I am using imxRT1050 with FreeRTOS and the SDK version is SDK 2.13.0. Have a problem with vTaskResumeAll as it is getting stuck in assert associated with variable “uxSchedulerSuspended”

vTaskSuspendAll is not thread safe. Below is the image that contains the disassembly of the code generated by vTaskSuspendAll and also the snapshot of the code.

If you look at the code generated by the assembler for “uxSchedulerSuspended++”, the variable address is loaded on to register r3, then the value at that address is loaded on to r3 and then value is incremented in r3 and then the incremented value is loaded on to memory. Imaging if this gets preempted by another thread, when the control was in “adds r3,#1”, that thread can again read a 0 from the address load it into r3. Now r3 gets modified by this thread and when the other thread resumes, it will not see the modified value, as it has to execute from next line to “adds r3,#1”

aggarg · March 13, 2023, 11:20am

As the comment suggests, did you read this response from Richard - Concerns about the atomicity of vTaskSuspendAll() - FreeRTOS

Eugin · March 13, 2023, 12:57pm

There is still a potential problem i beleive, if my understanding is correct

Imagine Task 1 executes “ldr r3, [pc,#16]” and after reading that contex switch happens. Now r3 is loaded with the address of variable uxSchedulerSuspended.

Look into the above assembler instruction snapshot that i have copied
1. Consider uxSchedulerSuspended is at location 0x4000 and the current value is 0
2. Task 1 executes “ldr r3, [pc,#16]” . Register value r3 is now 0x4000
3. Context switch happens now
4. Task 2 executes all the 5 instructions. After that r3 is set to 1, as it increments the value
0 and also stores back the value to the variable location when “Str r3,[r2.#0]” is executed
5. Task 1 is resumed from the point where the context switch happened
6. First instruction it sees is “ldr r3,[r3,#0]”. At this time r3 is set to 1 by task 1. Now it tries
to access an invalid address when this instruction starts executing.

After the variable is incremented by task 2, RTOS scheduler will be blocked and until xTaskResumeAll() is called and completed, task 1 will not be executed. But by then, i dont think the register r3 would have a valid address when task 1 is executed from the point it was preempted.

hs2 · March 13, 2023, 1:13pm

As explained by Richard

can only happen if the scheduler is NOT longer suspended and on context switch the task context including registers is restored.

aggarg · March 13, 2023, 3:33pm

You need to understand that the context switch can only happen when the scheduler is NOT suspended (i.e. uxSchedulerSuspended is 0). Therefore, in the sequence your provided, step 5 will not happen until the task 2 resumes the scheduler i.e. sets uxSchedulerSuspended to 0.

Eugin · March 13, 2023, 5:20pm

Again i am still confused by the answers, or i am not understanding what you have said.
Task1 has not yet incremented the count (uxSchedulerSuspended) yet as only the first assembly instruction “ldr r3, [pc,#16]” is executed for task 1. Task 1 is preempted during vTaskSuspendAll() and task 2 gets executed. Task 2 increments uxSchedulerSuspended to 1 by executing all the 5 instructions . Now the scheduler is blocked due to the increment until xTaskResumeAll() is called for task 2. After xTaskResumeAll() for task 2 is called, scheduler is unblocked by decrementing uxSchedulerSuspended , as this variable was incremented by task2’s vTaskSuspendAll() . Now this can return back to task 1 and that’s when the problem would arise where it resumes execution from the second instruction “ldr r3 [ r3, #0]” as we dont know what would be in r3. I have also looked at the function that saves the registers during context switch. I dont think r3 is being saved before restoring it. if you look at the image below, you can see r3 stores the current location of the tcb, before even r3 register is saved

.

aggarg · March 13, 2023, 5:30pm

That is not right - r3 does get restored correctly. You do not see register r3 in the assembly code because it is stored by the hardware automatically on the task’s stack and later restored correctly during context restore from the task’s stack. The hardware stores r0, r1, r2, r3, r12, LR, PC and xPSR registers automatically before entering the ISR.

Eugin · March 13, 2023, 5:35pm

ok. If these registers are saved into the stack by the hardware, then no issues. It should restore the values correctly. THen i need to check why the variable became 0 in my case and got stuck in assert when it called xTaskResumeAll()
THanks!!

hs2 · March 13, 2023, 6:29pm

Often corupted (FreeRTOS internal) data is caused by stack overflows.
Or application bugs, of course
Did you define configASSERT and enabled stack checking ?

Eugin · March 16, 2023, 8:18am

Thankyou Hartmut and Gaurav. Here i have provided the snapshot of the call stack, fault registers, disassembly during exception. We are using Crank GUI and their libraries. Thread related to Crank is causing the exception. Do you have any idea what could be causing this, by looking at the snapshot. MOst of the time we get this exception when it is exiting critical section ( taskEXIT_CRITICAL() )of xTaskResumeAll() function is called

aggarg · March 16, 2023, 9:20am

What is this function _pvHeapStart? Did you try to find the exact assembly instruction that is faulting? Have you defined configASSERT and enabled stack overflow checking as suggested by @hs2 ?

Eugin · March 16, 2023, 9:32am

I have enabled configASSERT and stack overflow macros as suggested and _pvHeapStart is coming from the library. This is what i see in the map file and it is basically the heap start location. I do not see _pvHeapStart function anywhere in the map file. THis is part of newlib library and the file associated is _cr_sbrk.c
.heap 0x80e46e28 0x800000
0x80e46e28 _pvHeapStart = .
0x81646e28 . = (. + _HeapSize)
fill 0x80e46e28 0x800000
0x81646e28 . = ALIGN (0x4)
0x81646e28 _pvHeapLimit = .
0x0000f000 _StackSize = 0xf000

Eugin · March 16, 2023, 9:35am

Also in the attached image you can see the exact assembly instruction that is faulting.
Sorry . That image does not have the assembly for _PvHeapStart()

hs2 · March 16, 2023, 9:45am

The _sbrk implementation is and can’t be part of the newlib C-library. Someone has provided it in _cr_sbrk.c. But I think that’s not the problem guessing it works.
So you’re using the newlib std. heap in your code (malloc/free which are not thread safe without implementing the __malloc_lock/unlock hooks, or did you ?) and for FreeRTOS the heap_3.c (wrapping newlib malloc/free to make it thread safe) memory manager ?

Eugin · March 16, 2023, 9:47am

Malloc_lock() and unlock are implemented. Also freeRTOS heap3.c is being used. But the Crank libraries provided by the vendor isnt using the heap3.c. It uses the malloc directly and uses malloc_lock /unlock to keep it thread safe

void __malloc_lock(struct _reent *reent) {
	vTaskSuspendAll();
}

void __malloc_unlock(struct _reent *reent) {
	(void)xTaskResumeAll();
}

void __env_lock() {
	vTaskSuspendAll();
}

void __env_unlock() {
	(void)xTaskResumeAll();
}

Also below is what i have seen in linker. -cr_sbrk.c looks to be part of newlib

_pvHeapLimit                                      c:/nxp/mcuxpressoide_11.6.1_8255/ide/plugins/com.nxp.mcuxpresso.tools.win32_11.6.1.202207200623/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+dp/hard\libcr_newlib_semihost.a(_cr_check_heap.o)
_pvHeapStart                                      c:/nxp/mcuxpressoide_11.6.1_8255/ide/plugins/com.nxp.mcuxpresso.tools.win32_11.6.1.202207200623/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+dp/hard\libcr_newlib_semihost.a(_cr_sbrk.o)

below is the image of the newlib library selected

hs2 · March 16, 2023, 10:09am

Ok - memory management seems to be correct. That’s very good.
BTW since the locks are implemented you can omit using heap_3.c wrapper and
simply define these macros e.g. in your FreeRTOSConfig.h:

#define pvPortMalloc        malloc
#define vPortFree           free

Eugin · March 16, 2023, 10:12am

I agree that the wrapper is not required. Just curious to know whether having the wrapper would causes any issues?

hs2 · March 16, 2023, 11:26am

Shouldn’t cause any issue since vTaskSuspend/ResumeAll can nest. It’s just bit more lean.
@RAc There was typo - ‘All’ was missing: vTaskSuspend/ResumeAll are fine as global mutex for longer code sequences.

RAc · March 16, 2023, 11:32am

I suggest trying a recursive (!) mutex to implement malloc_lock just to see if it makes a difference. As has been mentioned many times, task suspension has its problems. I know from my own experience that the mutex solution works reliably.

richard-damon · March 16, 2023, 3:14pm

One advantage of the wrappers are that the malloc fail hook gets implemented, which isn’t for a direct call to malloc.