Shared resource between task and ISR

Sorry Robert, but we have to be careful here not to forget the big picture and get lost in details, particularly since the OP is a newbie to concurrent programming.

In the sample code you provide, no one ever expects the two globals to be equal, because there is no requirement or provision that the sequence of the two increments is uninterruptible. The two tasks run concurrently with no dependency whatsoever, so I would expect the two variables to diverge at any time.

What you probably want to point out here is that it is unsafe to increment uintCounter concurrently without atomicity (or exclusivity) constraints, which is true (though there would be no need for two variables to prove this). But I do not see how ordering and barriers would bring us any advantage over simple exclusive-access primitives (whether hardware/compiler/platform supported or via mutexes).

Or does the declaration of your uint_least32CounterAtomic variable as “atomic” imply any side effects on the preceding increment of uintCounter? If so, it would surprise me quite a bit, as that would violate many rules.

Yes. By declaring it as an atomic data type, all access to it is atomic, as it should be and as it’s defined here[1], assuming the C compiler and the C library work as expected :wink:

[1] Atomic types - cppreference.com

But in your code sample, you expect uninterruptibility (atomicity) across the assignments to two variables, of which only one is declared atomic? Or am I missing something here?

No, I don’t. I want to show that you should not assume that 32-bit operations are atomic out of the box. The atomic one is supposed to be the “correct” value.

But in my understanding, your code does not prove any of this. What prevents the sequence

t1 increments uintCounter to 2234
t1 increments uint_least32CounterAtomic to 2234
======> t2 increments uintCounter to 2235
t1 compares the two and finds a difference

from happening, regardless of what is declared as atomic or not?

My points here are:

  1. Your code example (or your expectations about its behavior) assumes an uninterruptible sequencing (ordering) of two independent increments. I am not sure what that is supposed to prove about atomicity.
  2. Your previous explanations refer to barriers and their relationship. I can see how that makes a difference in very low-level coding (for example, pipeline ordering), but not for exclusive-access enforcement at the RTOS application level.

Or are you saying that your code sample above shows different results when both variables are declared atomic?

I didn’t copy all the code of my sample :wink:

Task1 and Task2 don’t do the same thing in my real code. :wink: You are right. I will need to come up with a better test case to show potential problems with atomicity and execution order. :wink:


Here is an example I just cooked up, and might show better what I mean.

/* FreeRTOS.org includes. */
#include "FreeRTOS.h"
#include "task.h"
#include "semphr.h"
#include <stdatomic.h>

/* Demo includes. */
#include "basic_io.h"
#include "leds.h"
#include <stdio.h>

/* USE_STDATOMIC default is 0 */
#define USE_STDATOMIC 0
/* MAKE_CNT_ATOMIC default is 0 */
#define MAKE_CNT_ATOMIC 0

/* USE_STDATOMIC 0, MAKE_CNT_ATOMIC 0
 * cnt is a non atomic counter
 * acnt is protected by taskENTER_CRITICAL/taskEXIT_CRITICAL
 *
 * Safe1 Task2 1000
 * Unsafe Task4 1466
 * Unsafe Task6 2466
 * Unsafe Task7 3466
 * Unsafe Task3 1000
 * Safe1 Task5 2000
 * Task1 cnt 1000  - wrong - order of execution problem
 * Task1 acnt 2000 - correct
 */

/* USE_STDATOMIC 0, MAKE_CNT_ATOMIC 1
 * cnt is an atomic data type
 * acnt is protected by taskENTER_CRITICAL/taskEXIT_CRITICAL
 *
 * Safe1 Task2 1000
 * Unsafe Task4 1295
 * Unsafe Task6 2295
 * Unsafe Task7 3295
 * Unsafe Task3 4000
 * Safe1 Task5 2000
 * Task1 cnt 4000  - correct
 * Task1 acnt 2000 - correct
 *
 */

/* USE_STDATOMIC 1, MAKE_CNT_ATOMIC 1
 * cnt is an atomic data type
 * acnt is an atomic data type
 *
 * Safe1 Task2 1000
 * Unsafe Task3 1000
 * Safe1 Task5 2000
 * Unsafe Task6 3000
 * Unsafe Task7 4000
 * Unsafe Task4 4000
 * Task1 cnt 4000  - correct
 * Task1 acnt 2000 - correct
 */

#if USE_STDATOMIC > 0
atomic_int acnt;
#else
/* Manual atomic implementation using critical sections */
volatile uint32_t acnt = 0;
#endif

#if MAKE_CNT_ATOMIC > 0
atomic_int cnt;
#else
/* The non-atomic counter (prone to race conditions) */
volatile uint32_t cnt = 0;
#endif

void vTaskFunctionUnsafe( void *pvParameters )
{
	char *pcTaskName = ( char * ) pvParameters;

	for( int n = 0; n < 1000; ++n )
	{
#if MAKE_CNT_ATOMIC > 0
		/* Still written as ++cnt, but because cnt is declared as an
		 * atomic data type, this compiles to an atomic
		 * read-modify-write. */
		++cnt;
		/* Equivalent explicit forms:
		 * atomic_fetch_add( &cnt, 1 );
		 * atomic_fetch_add_explicit( &cnt, 1, memory_order_relaxed ); */
#else
		/* Non-atomic increment (race condition possible). */
		++cnt;
#endif
	}

	vPrintString( "Unsafe " );
	vPrintStringAndNumber( pcTaskName, cnt );

	/* In FreeRTOS, tasks usually run indefinitely or delete themselves. */
	vTaskDelete( NULL );
}

void vTaskFunctionSafe1( void *pvParameters )
{
	char *pcTaskName = ( char * ) pvParameters;

	for( int n = 0; n < 1000; ++n )
	{
#if USE_STDATOMIC > 0
		/* acnt is an atomic data type, so ++acnt is an atomic
		 * read-modify-write. */
		++acnt;
		/* Equivalent explicit forms:
		 * atomic_fetch_add( &acnt, 1 );
		 * atomic_fetch_add_explicit( &acnt, 1, memory_order_relaxed ); */
#else
		/* Use a critical section for manual atomicity. */
		taskENTER_CRITICAL();
		++acnt;
		taskEXIT_CRITICAL();
#endif
	}

	vPrintString( "Safe1 " );
	printf( "%s %u\n", pcTaskName, ( unsigned ) acnt );

	/* In FreeRTOS, tasks usually run indefinitely or delete themselves. */
	vTaskDelete( NULL );
}

void vTaskFunctionPrinter( void *pvParameters )
{
	char *pcTaskName = ( char * ) pvParameters;

	/* I know it's ugly just to wait, but that's for simplicity. */
	vTaskDelay( pdMS_TO_TICKS( 1000 ) );

	printf( "%s cnt %u\n", pcTaskName, ( unsigned ) cnt );
	printf( "%s acnt %u\n", pcTaskName, ( unsigned ) acnt );

	vTaskDelete( NULL );
}

void vStartAtomicTest( void ) {
    xTaskCreate( vTaskFunctionPrinter,  "Task1", 240, (void*)"Task1", tskIDLE_PRIORITY + 1, NULL );
    xTaskCreate( vTaskFunctionSafe1, "Task2", 240, (void*)"Task2", tskIDLE_PRIORITY + 1, NULL );
    xTaskCreate( vTaskFunctionUnsafe,  "Task3", 240, (void*)"Task3", tskIDLE_PRIORITY + 1, NULL );
    xTaskCreate( vTaskFunctionUnsafe, "Task4", 240, (void*)"Task4", tskIDLE_PRIORITY + 1, NULL );
    xTaskCreate( vTaskFunctionSafe1,  "Task5", 240, (void*)"Task5", tskIDLE_PRIORITY + 1, NULL );
    xTaskCreate( vTaskFunctionUnsafe, "Task6", 240, (void*)"Task6", tskIDLE_PRIORITY + 1, NULL );
    xTaskCreate( vTaskFunctionUnsafe, "Task7", 240, (void*)"Task7", tskIDLE_PRIORITY + 1, NULL );
}

int main( void ) {
	/* Init the semi-hosting. */
	printf( "\n" );

	vStartAtomicTest();

	/* Start the scheduler so our tasks start executing. */
	vTaskStartScheduler();

	/* If all is well we will never reach here as the scheduler will now be
	running.  If we do reach here then it is likely that there was insufficient
	heap available for the idle task to be created. */
	for( ;; );
	return 0;
}


/*-----------------------------------------------------------*/

void vApplicationMallocFailedHook( void )
{
	/* This function will only be called if an API call to create a task, queue
	or semaphore fails because there is too little heap RAM remaining. */
	for( ;; );
}
/*-----------------------------------------------------------*/

void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
	/* This function will only be called if a task overflows its stack.  Note
	that stack overflow checking does slow down the context switch
	implementation. */
	for( ;; );
}
/*-----------------------------------------------------------*/

void vApplicationIdleHook( void )
{
#if PRINTOUTS == 1
	/* This example does not use the idle hook to perform any processing. */
	printf("Idle Hook\n");
#endif
#if LEDS == 1
	led_blue_invert();
#endif
}
/*-----------------------------------------------------------*/

void vApplicationTickHook( void )
{
	/* This example does not use the tick hook to perform any processing. */
	/* printf("Tick Hook\n"); */
}

I still do not see what that code is supposed to prove. It is wrong code in the sense of concurrent programming, so it is expected to fail.

Your vTaskFunctionUnsafe performs a non-atomic pre-increment on cnt, and all four tasks that run that task function execute concurrently. To me that is a textbook case of a race condition that is bound to lead to unexpected results. Would wrapping the increments of cnt in a critical section (or, in this case even more naturally, a mutex) not resolve the issue, as your “safe” tasks prove? In other words, making accesses to cnt atomic, either at the compiler level or at the run-time level, would be acceptable and equally workable solutions to that concurrency problem. I do not see what barriers have to do with this, nor why compiler-level atomicity should be superior to, or less error prone than, either pure exclusive access (i.e. implementing the increment via ldrex and strex without barriers) or run-time-level protection using OS primitives.

Of course there are use cases and justifications for barriers, but I have serious doubts that any application-level developer will ever stumble across one of them. More likely, a subtle programming error or oversight in the implementation of an atomic function at the compiler level (say, a barrier missing from the implementation of a complex atomic function) may lead to behavior that is inexplicable from the point of view of an application developer, and that cannot be unearthed without knowledge of the processor architecture and the compiler, plus a scrutiny of the machine code. For those cases I’d rather stick to a mutex in your example.

Also note that although there ARE cases in which concurrency issues can be addressed by atomic/exclusive accesses to individual identifiers (and sometimes more complex use cases, like ring buffers, can be reduced to such exclusive accesses to, say, buffer indices), the much more common case is exclusive access to data structures like linked lists, where the compiler cannot help. In those cases there is no alternative to OS-provided synchronization/serialization mechanisms. I honestly do not believe that the OP (or other newcomers) will benefit from a discussion at this level.


My point is that defining the global variable as atomic would probably solve the issue of the OP.

And the counterpoint is that unless the processor HAS an actual machine-level atomic increment instruction, the compiler CAN’T make it atomic without knowing the full environment it is running in, so that it can implicitly add the equivalent of a critical section.

The compiler can’t generate code to make it work unless it knows it is compiling for FreeRTOS, or unless it provides a set of hooks (with the inherent overhead) through which it can be instructed.

Sure.

It’s not a generic solution.

The more fundamental question is: “Why are there C/C++ atomic types[1]?” Does the C library need to be written specifically for every OS out there? Can’t we just use it if it’s available for a specific architecture?

Don’t get me wrong. I really try to understand why atomic data types should not be used if they are available.

[1] Atomic types - cppreference.com

Or do you suggest using something like this[2] instead?

[2] Platform: Atomics

Does the C library need to be written specifically for every OS out there.

In part, YES. Part of the purpose of the library is to hide the OS variations from standardized code.

And “atomics” are a feature that very much needs that support. Part of the issue with the language here is the step C took with C11. Before, C was defined under the assumption of purely single-threaded code, with volatile as an escape hatch to indicate that something strange was happening, and any other barriers had to be implemented in library code. With C11, it became a language modeling a possibly multiprocessor configuration that has to handle cache-coherency issues (when you use the right markings, like atomic). This model tends to require OS support if you are doing anything more than the processor innately supports, which is often limited to single, word-sized reads or writes; beyond that you need memory barriers if you have multiple processors.

Nothing wrong with using the language feature, as long as you have checked that it actually is supported for your environment, and what you need to do to actually support that environment.

For example, many embedded compilers work on the basic assumption of “bare-metal”, single-threaded code, and if that is the stated assumption of the compiler, “atomic” becomes largely a no-op that can be ignored. That also leads to heap routines (and other parts of the library) that break when used multi-threaded. The better libraries have hooks to support multi-threading, but YOU need to implement some routines to enable it, as the implementation doesn’t natively ship with them for FreeRTOS.