Semaphore Take Failure Seen Even Though There Isn't A Semaphore Give Failure

sshivsh · February 28, 2022, 11:44am

Hi !

I’m trying to make sense of the following code: uxPriority is 1 !

void vStartSemaphoreTasks( UBaseType_t uxPriority )
{
    xSemaphoreParameters * pxFirstSemaphoreParameters, * pxSecondSemaphoreParameters;
    const TickType_t xBlockTime = ( TickType_t ) 100;

    /* Create the structure used to pass parameters to the first two tasks. */
    pxFirstSemaphoreParameters = ( xSemaphoreParameters * ) pvPortMalloc( sizeof( xSemaphoreParameters ) );

    if( pxFirstSemaphoreParameters != NULL )
    {
        /* Create the semaphore used by the first two tasks. */
        pxFirstSemaphoreParameters->xSemaphore = xSemaphoreCreateBinary();

        if( pxFirstSemaphoreParameters->xSemaphore != NULL )
        {
            xSemaphoreGive( pxFirstSemaphoreParameters->xSemaphore );

            /* Create the variable which is to be shared by the first two tasks. */
            pxFirstSemaphoreParameters->pulSharedVariable = ( uint32_t * ) pvPortMalloc( sizeof( uint32_t ) );

            /* Initialise the share variable to the value the tasks expect. */
            *( pxFirstSemaphoreParameters->pulSharedVariable ) = semtstNON_BLOCKING_EXPECTED_VALUE;

            /* The first two tasks do not block on semaphore calls. */
            pxFirstSemaphoreParameters->xBlockTime = ( TickType_t ) 0;

            /* Spawn the first two tasks.  As they poll they operate at the idle priority. */
            xTaskCreate( prvSemaphoreTest, "PS1", semtstSTACK_SIZE, ( void * ) pxFirstSemaphoreParameters, tskIDLE_PRIORITY, ( TaskHandle_t * ) NULL );
            xTaskCreate( prvSemaphoreTest, "PS2", semtstSTACK_SIZE, ( void * ) pxFirstSemaphoreParameters, tskIDLE_PRIORITY, ( TaskHandle_t * ) NULL );

            /* vQueueAddToRegistry() adds the semaphore to the registry, if one
             * is in use.  The registry is provided as a means for kernel aware
             * debuggers to locate semaphores and has no purpose if a kernel aware
             * debugger is not being used.  The call to vQueueAddToRegistry() will
             * be removed by the pre-processor if configQUEUE_REGISTRY_SIZE is not
             * defined or is defined to be less than 1. */
            vQueueAddToRegistry( ( QueueHandle_t ) pxFirstSemaphoreParameters->xSemaphore, "Counting_Sem_1" );
        }
    }

    /* Do exactly the same to create the second set of tasks, only this time
     * provide a block time for the semaphore calls. */
    pxSecondSemaphoreParameters = ( xSemaphoreParameters * ) pvPortMalloc( sizeof( xSemaphoreParameters ) );

    if( pxSecondSemaphoreParameters != NULL )
    {
        pxSecondSemaphoreParameters->xSemaphore = xSemaphoreCreateBinary();

        if( pxSecondSemaphoreParameters->xSemaphore != NULL )
        {
            xSemaphoreGive( pxSecondSemaphoreParameters->xSemaphore );

            pxSecondSemaphoreParameters->pulSharedVariable = ( uint32_t * ) pvPortMalloc( sizeof( uint32_t ) );
            *( pxSecondSemaphoreParameters->pulSharedVariable ) = semtstBLOCKING_EXPECTED_VALUE;
            pxSecondSemaphoreParameters->xBlockTime = xBlockTime / portTICK_PERIOD_MS;

            xTaskCreate( prvSemaphoreTest, "BS1", semtstSTACK_SIZE, ( void * ) pxSecondSemaphoreParameters, uxPriority, ( TaskHandle_t * ) NULL );
            xTaskCreate( prvSemaphoreTest, "BS2", semtstSTACK_SIZE, ( void * ) pxSecondSemaphoreParameters, uxPriority, ( TaskHandle_t * ) NULL );

            /* vQueueAddToRegistry() adds the semaphore to the registry, if one
             * is in use.  The registry is provided as a means for kernel aware
             * debuggers to locate semaphores and has no purpose if a kernel aware
             * debugger is not being used.  The call to vQueueAddToRegistry() will
             * be removed by the pre-processor if configQUEUE_REGISTRY_SIZE is not
             * defined or is defined to be less than 1. */
            vQueueAddToRegistry( ( QueueHandle_t ) pxSecondSemaphoreParameters->xSemaphore, "Counting_Sem_2" );
        }
    }
}
/*-----------------------------------------------------------*/


#define TEMP_STR_LEN 9

#define DEBUG_PRINT(message)                                        \
        portENTER_CRITICAL();                                       \
        sprintf((char *) tempstr, "%s:%s\n", tasknamestr, message); \
        uart_printf((BYTE *) tempstr, TEMP_STR_LEN);                \
        portEXIT_CRITICAL();

#define DEBUG_PRINT_NUM(num)                                        \
        portENTER_CRITICAL();                                       \
        sprintf((char *) tempstr, "%s:%3d\n", tasknamestr, num);    \
        uart_printf((BYTE *) tempstr, TEMP_STR_LEN);                \
        portEXIT_CRITICAL();



static portTASK_FUNCTION( prvSemaphoreTest, pvParameters )
{
    xSemaphoreParameters * pxParameters;
    volatile uint32_t * pulSharedVariable, ulExpectedValue;
    uint32_t ulCounter;
    short sError = pdFALSE, sCheckVariableToUse;

    // Sid - 2022.02.23
    char *tasknamestr = pcTaskGetName( NULL );
    char tempstr[TEMP_STR_LEN];

    /* See which check variable to use.  sNextCheckVariable is not semaphore
     * protected! */
    portENTER_CRITICAL();
    sCheckVariableToUse = sNextCheckVariable;
    sNextCheckVariable++;
    portEXIT_CRITICAL();

    /* A structure is passed in as the parameter.  This contains the shared
     * variable being guarded. */
    pxParameters = ( xSemaphoreParameters * ) pvParameters;
    pulSharedVariable = pxParameters->pulSharedVariable;

    /* If we are blocking we use a much higher count to ensure loads of context
     * switches occur during the count. */
    if( pxParameters->xBlockTime > ( TickType_t ) 0 )
    {
        ulExpectedValue = semtstBLOCKING_EXPECTED_VALUE;
    }
    else
    {
        ulExpectedValue = semtstNON_BLOCKING_EXPECTED_VALUE;
    }

    for( ; ; )
    {
        
        
        DEBUG_PRINT("IFS");                 // Sid - IFS = Infinite For loop Started
      
        /* Try to obtain the semaphore. */
        if( xSemaphoreTake( pxParameters->xSemaphore, pxParameters->xBlockTime ) == pdPASS )
        {
            DEBUG_PRINT("STS");             // Sid - STS = Semaphore Take Success
          
            /* We have the semaphore and so expect any other tasks using the
             * shared variable to have left it in the state we expect to find
             * it. */
            if( *pulSharedVariable != ulExpectedValue )
            {
                sError = pdTRUE;
            
                DEBUG_PRINT("SNE");         // Sid - SNE = Shared variable Not Expected value
            }

            /* Clear the variable, then count it back up to the expected value
             * before releasing the semaphore.  Would expect a context switch or
             * two during this time. */
            for( ulCounter = ( uint32_t ) 0; ulCounter <= ulExpectedValue; ulCounter++ )
            {
                *pulSharedVariable = ulCounter;

                if( *pulSharedVariable != ulCounter )
                {
                    // Sid - 2022.02.28
                    if (!sError)
                    {
                        DEBUG_PRINT("SNC"); // Sid - SNC = Shared variable Not Counter
                    }
                    
                    sError = pdTRUE;
                }
            }

            /* Release the semaphore, and if no errors have occurred increment the check
             * variable. */
            if( xSemaphoreGive( pxParameters->xSemaphore ) == pdFALSE )
            {
                DEBUG_PRINT("SGF");         // Sid - SGF = Semaphore Give Failure
              
                sError = pdTRUE;
            }

            if( sError == pdFALSE )
            {
                DEBUG_PRINT("NEO");         // Sid - NEO = No Error Occurred
              
                if( sCheckVariableToUse < semtstNUM_TASKS )
                {                  
                    ( sCheckVariables[ sCheckVariableToUse ] )++;
                  
                    DEBUG_PRINT_NUM(sCheckVariables[ sCheckVariableToUse ]);
                }
            }

            /* If we have a block time then we are running at a priority higher
             * than the idle priority.  This task takes a long time to complete
             * a cycle	(deliberately so to test the guarding) so will be starving
             * out lower priority tasks.  Block for some time to allow give lower
             * priority tasks some processor time. */
            if( pxParameters->xBlockTime != ( TickType_t ) 0 )
            {
                DEBUG_PRINT("CVD");         // Sid - CVD = Calling Vtask Delay

                vTaskDelay( 10 );           //vTaskDelay( pxParameters->xBlockTime * semtstDELAY_FACTOR );
            }
        }
        else
        {
            DEBUG_PRINT("STF");             // Sid - STF = Semaphore Take Failure

            if( pxParameters->xBlockTime == ( TickType_t ) 0 )
            {
                DEBUG_PRINT("CTY");         // Sid - CTY = Calling Task Yield
              
                /* We have not got the semaphore yet, so no point using the
                 * processor.  We are not blocking when attempting to obtain the
                 * semaphore. */
                taskYIELD();
            }
        }
    }
}

Here is the initial part of my output:

 BS1:IFS
 BS1:STS
 xQS/GS.
 BS1:NEO
 BS2:IFS
 BS2:STS
 xQS/GS.
 BS2:NEO
 BS1:  1
 BS1:CVD
 BS2:  1
 BS2:CVD
 PS1:IFS
 PS1:STS
 xQS/GS.
 PS1:NEO
 PS2:IFS
 PS2:STS
 xQS/GS.
 PS2:NEO
 PS2:  1
 PS1:  1
 PS1:IFS
 PS1:STS
 xQS/GS.
 PS2:IFS
 PS2:STF       ->   STF without SGF
 PS2:CTY
 PS1:NEO
 PS2:IFS
 PS2:STS
 xQS/GS.
 PS2:NEO
 PS1:  2
 PS1:IFS
 PS1:STS
 xQS/GS.
 PS2:  2
 PS2:IFS
 PS2:STF       ->   STF without SGF
 PS2:CTY
 PS1:NEO

Why am I seeing no Semaphore Give Failure (SGF) messages before Semaphore Take Failure (STF) messages ?

Is it possible to treat xQueueSemaphoreTake() as a black box & yet understand FreeRTOS semaphores ? (xQueueSemaphoreTake() seems to be a large function & I’m not sure if diving into it is smart !)

Thanks in advance !

hs2 · February 28, 2022, 12:27pm

You should explain what you are trying to test or what the code is all about. IMHO it’s pretty convoluted test code for getting in touch with FreeRTOS. Easy to get confused.
If I got it right, you have 2 tasks working on the SAME semaphore with non-blocking calls. What do you expect? If one task has got the semaphore, the other one fails when trying to get it.

BTW I’d not use arbitrary TEMP_STR_LEN value, instead something like
configMAX_TASK_NAME_LEN + SOME_VALUE_COVERING_THE_ADDITIONAL_TEXT
Also better use e.g. snprintf to avoid buffer overflows.
Also it doesn’t make much sense to malloc a uint32_t variable and store its address in a pointer variable.

Edit: Some additional hints:
There is no need to protect s(n)printf to a task local buffer.
Also critical sections should only be used for short/fast pieces of code because they disable (FreeRTOS covered) interrupts. UART output is a rather lengthy operation and should be protected by a mutex.
Better add some helper functions, e.g. for debug output, instead of adding boilerplate code over and over again (leading to lengthy code with limited readability).
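
A minimal sketch of these suggestions in plain C. The helper name format_debug_line and the constant MAX_TASK_NAME_LEN (a stand-in for configMAX_TASK_NAME_LEN) are assumptions made for illustration. The buffer is sized from the task-name limit plus the fixed extra text, and snprintf truncates an oversized line instead of overflowing. The UART write is left out; in a real port it would sit between a mutex take and give rather than inside a critical section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for configMAX_TASK_NAME_LEN (assumption for this sketch). */
#define MAX_TASK_NAME_LEN    16
/* ':' + 3-character tag + '\n' + terminating NUL. */
#define DEBUG_EXTRA_LEN      6
#define DEBUG_LINE_LEN       ( MAX_TASK_NAME_LEN + DEBUG_EXTRA_LEN )

/* Hypothetical helper: formats "<task>:<tag>\n" into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the line did not fit and was truncated. */
int format_debug_line( char * buf, size_t len, const char * task, const char * tag )
{
    int n;

    assert( buf != NULL && len > 0 );

    n = snprintf( buf, len, "%s:%s\n", task, tag );

    if( ( n < 0 ) || ( ( size_t ) n >= len ) )
    {
        return -1;
    }

    return n;
}
```

Since the buffer is local to the calling task, the formatting step needs no locking at all; only the shared UART does.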

sshivsh · February 28, 2022, 12:58pm

Hi Hartmut !

Except for the prints, the code is from FreeRTOSv202112.00\FreeRTOS\Demo\Common\Minimal\semtest.c ! This file’s initial comment reads like:

/*
 * Creates two sets of two tasks.  The tasks within a set share a variable, access
 * to which is guarded by a semaphore.
 *
 * Each task starts by attempting to obtain the semaphore.  On obtaining a
 * semaphore a task checks to ensure that the guarded variable has an expected
 * value.  It then clears the variable to zero before counting it back up to the
 * expected value in increments of 1.  After each increment the variable is checked
 * to ensure it contains the value to which it was just set. When the starting
 * value is again reached the task releases the semaphore giving the other task in
 * the set a chance to do exactly the same thing.  The starting value is high
 * enough to ensure that a tick is likely to occur during the incrementing loop.
 *
 * An error is flagged if at any time during the process a shared variable is
 * found to have a value other than that expected.  Such an occurrence would
 * suggest an error in the mutual exclusion mechanism by which access to the
 * variable is restricted.
 *
 * The first set of two tasks poll their semaphore.  The second set use blocking
 * calls.
 *
 */

From my (full) output (which I haven’t pasted here), I see that the shared variables don’t take unexpected values (i.e. SNE = Shared variable Not Expected value doesn’t print). So that part’s ok !

Since I see no SGF messages in my output, I wonder why there are STF messages, that’s all !

Thanks for your other suggestions ! I’ll see what I can do !

hs2 · February 28, 2022, 1:18pm

In fact, posting (signaling) a semaphore can’t fail if the semaphore was created/obtained correctly. Waiting for a semaphore, on the other hand, can fail, for instance when the wait is non-blocking and the semaphore hasn’t been signaled.
That’s the way semaphores work and is not specific to FreeRTOS.
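
To make that asymmetry concrete, here is a toy, single-threaded model of a binary semaphore in plain C. It is not FreeRTOS code: BinSem, binsem_take_nonblocking and binsem_give are hypothetical names used only for illustration. In this model a non-blocking take fails whenever the token is currently held by someone else, and a give fails only when there was no matching take.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not FreeRTOS): a binary semaphore is a single token. */
typedef struct
{
    bool available; /* true: token present, a take will succeed */
} BinSem;

/* Non-blocking take: succeeds only if the token is present right now. */
bool binsem_take_nonblocking( BinSem * s )
{
    assert( s != NULL );

    if( !s->available )
    {
        return false; /* someone else holds the token: "STF" */
    }

    s->available = false;
    return true; /* "STS" */
}

/* Give: puts the token back. In this model it fails only if the token
 * is already present, i.e. a give without a matching take. */
bool binsem_give( BinSem * s )
{
    assert( s != NULL );

    if( s->available )
    {
        return false; /* "SGF" */
    }

    s->available = true;
    return true;
}
```

Starting from an available semaphore, the sequence "PS1 takes, PS2 takes, PS1 gives, PS2 takes" yields success, failure, success, success: a take failure with no give failure anywhere.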

aggarg · February 28, 2022, 8:01pm

In addition, I would recommend this book to learn more about FreeRTOS primitives.

sshivsh · March 1, 2022, 6:03am

Hi Hartmut ! I’ve edited my first post to make the code more readable through 2 macros named DEBUG_PRINT & DEBUG_PRINT_NUM ! Hope you can bear with the code for now !

Yes, I don’t see a Semaphore Give Failure (SGF) message in my output !

Not sure if I understand. Do you mean that a non-blocking task’s semaphore take failure is known immediately ?

Note in my output that Semaphore Take Failure (STF) occurs for (Polling Semaphore 2) PS2. This would be expected if there is a Semaphore Give Failure (SGF) from PS1 (the task which shares the semaphore with PS2). But I don’t see an SGF from PS1 !


--------------------------------------------------------------------------------

Hi Gaurav ! Thanks for the recommendation ! I’m reading that book, but I haven’t got as far as semaphores. Maybe I’ll skip some topics & read about semaphores first !

I might be missing something that’s obvious to you & Hartmut (if so, please let me know) ! I’ll take another look at the code & docs !

Thanks in advance !

aggarg · March 1, 2022, 6:17am

This is not how semaphores work, as @hs2 already explained. A take failure means that the task was waiting for a semaphore to become available and timed out while doing so. It has nothing to do with a give failure.

sshivsh · March 1, 2022, 6:37am

Really ? Ok !

Does that mean in the absence of the following messages, the code & output are ok ? Thanks in advance !

Messages:

Shared variable Not Expected value (SNE)
Shared variable Not Counter (SNC)

aggarg · March 1, 2022, 7:12am

Yes. When you call xSemaphoreTake with a finite timeout, you are essentially requesting to take the semaphore within the specified timeout. There are 2 possible scenarios:

1. The semaphore becomes available before the timeout and the operation succeeds.
2. The semaphore does not become available before the timeout (because other tasks are holding it) and the operation fails.

Scenario 2 does not have anything to do with a give failure. For example, consider the following sequence for a binary semaphore:

1. At time t = 0, task 1 takes the semaphore for 10 seconds.
2. At time t = 2, task 2 attempts to take the semaphore with a timeout of 2 seconds.
3. At time t = 4, task 2’s attempt to take the semaphore (i.e. the xSemaphoreTake call) fails as task 1 has not yet returned the semaphore.
4. At time t = 10, task 1 returns the semaphore.

Hope it clarifies.
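
The timeline above can be sketched as a small function. This is a simplified model with hypothetical names (take_succeeds, held_from, held_until), not the FreeRTOS implementation: it assumes a single holder that keeps the semaphore over one interval, and a waiter that gets the semaphore if it becomes free inside the waiter's timeout window.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model: one task holds the semaphore from held_from up to
 * (but not including) held_until. A second task calls take at attempt_at
 * and is willing to wait for timeout time units. */
bool take_succeeds( int held_from, int held_until, int attempt_at, int timeout )
{
    assert( held_from <= held_until && timeout >= 0 );

    /* The semaphore is free at the moment of the attempt. */
    if( ( attempt_at < held_from ) || ( attempt_at >= held_until ) )
    {
        return true;
    }

    /* Otherwise the waiter succeeds only if the holder returns the
     * semaphore before the waiter's timeout expires. A tie is counted
     * as a timeout in this sketch. */
    return held_until < attempt_at + timeout;
}
```

With the numbers from the sequence, take_succeeds( 0, 10, 2, 2 ) is false (the timeout at t = 4), while an attempt made after t = 10 succeeds. No give failure is involved in either case.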

If you are talking about semtest.c, as long as this function returns pdTRUE, everything is okay - https://github.com/FreeRTOS/FreeRTOS/blob/main/FreeRTOS/Demo/Common/Minimal/semtest.c#L255

Thanks.

sshivsh · March 1, 2022, 11:34am

Since the terms we’ve all used are inconsistent, a little clarity about that first !


                     Semaphore Term 1       Semaphore Term 2
Terms Used By:

1. FreeRTOS Code         Take / Obtain         Give / Release
2. Siddhartha            Take                  Give
3. Hartmut               Wait for              Signal / Post
4. Gaurav                Take                  Return

So what I call a “Give Failure”, Gaurav might call it “Return Failure” !

Continuing onwards …

5. At time t = 12, let’s assume task 2 attempts to take the semaphore with a timeout of 2 secs.

Since task 1 has already returned / given the semaphore (without a return / give failure), we expect step 5 to succeed !

If step 5 fails, we would ask why the semaphore take failed (since the semaphore return / give did not fail) ?

Now see the snippet from my output below & compare it to the flow in my posted code:

...
 PS1:IFS       ->   Infinite For loop Started for PS1
 PS1:STS       ->   Semaphore Take Success
 xQS/GS.       ->   xQueueSend/GenericSend {part of xSemaphoreGive()}
 PS2:IFS       ->   Infinite For loop Started for PS2
 PS2:STF       ->   STF without SGF (Semaphore Take Failure seen without Semaphore Give Failure) ?!?
 PS2:CTY       ->   Calling Task Yield
...

Hope I communicated clearly about the strange observation above !

xAreSemaphoreTasksStillRunning() should return pdTRUE if there is at least 1 error-free iteration of the infinite for loop (but the remaining iterations could be erroneous). sCheckVariables[ xTask ] would be 1 higher than sLastCheckVariables[ xTask ]. Am I right ?

Thanks in advance ! Hope you’re not being inconvenienced !

hs2 · March 1, 2022, 12:08pm

Don’t stick to low level details - try to get the basic idea of a semaphore and things will be more clear

RAc · March 1, 2022, 12:29pm

While I certainly agree that Siddhartha should study the semantics of semaphores before trying to wade through the code, it needs to be pointed out that the Wikipedia article quoted above has at least one severe error - there is no inherent relationship between critical sections and semaphores. A semaphore CAN be used as a mutex (but shouldn’t generally) or a resource protection mechanism, but its most frequent and useful application is as a signalling mechanism (which is unrelated to protection/serialization etc). The article sort of throws everything related to concurrency into one pot and cooks a stew of it. It should certainly be revised and NOT used as the starting point of concurrency research.

Dijkstra’s original semaphore was the “father of all synchronization objects,” and unfortunately, the term semaphore has been and still is being grossly misused as a jolly joker for everything concurrent, in particular mutexes. I wish that would stop; it is the source of many, many bugs and misunderstandings.

aggarg · March 1, 2022, 11:20pm

One problem with using printfs to understand timing is that there are windows between the actual operation happening and the printf completing, which can lead to an incorrect trace. The correct way to understand timings is to use a tool like Tracealyzer.

In your specific case, you can try to add an else clause to print completion of give operation:

if( xSemaphoreGive( pxParameters->xSemaphore ) == pdFALSE )
{
    DEBUG_PRINT("SGF");         // Sid - SGF = Semaphore Give Failure
  
    sError = pdTRUE;
}
else
{
    DEBUG_PRINT("SGS");         // Sid - SGS = Semaphore Give Success
}

There can still be a case where SGS is printed and the same task then obtains the semaphore again, but is switched out before it gets a chance to print STS. The next task then tries to obtain the semaphore, fails, and prints STF. You can call taskYIELD() to force the task to yield after giving the semaphore, which reduces the probability of this scenario, but the best approach is not to rely on prints and to use a tool like Tracealyzer.

The application is expected to call xAreSemaphoreTasksStillRunning periodically and whenever there is an error, this function will return pdFALSE.
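
As a rough sketch of that pattern (a simplified model with the hypothetical name tasks_still_running, not a copy of semtest.c): each task increments its own counter only after an error-free cycle, and the periodic check reports failure if any counter has not moved since the previous call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* counters[i]  : incremented by task i after every error-free cycle.
 * last_seen[i] : value of counters[i] at the previous check.
 * Returns false if any task made no progress since the previous call. */
bool tasks_still_running( const short * counters, short * last_seen, size_t n )
{
    bool all_running = true;

    assert( counters != NULL && last_seen != NULL );

    for( size_t i = 0; i < n; i++ )
    {
        if( last_seen[ i ] == counters[ i ] )
        {
            /* Stalled, or stopped counting after an error. */
            all_running = false;
        }

        last_seen[ i ] = counters[ i ];
    }

    return all_running;
}
```

In the test loop posted at the top of this topic, sError is never cleared once set, so a task that hits an error stops incrementing its counter and every later check of this kind reports a failure.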

sshivsh · March 2, 2022, 12:58pm

Hi ! I read through the Wikipedia ‘Semaphore (programming)’ article to get some higher level context ! It was interesting ! Thanks Hartmut !

Ok Rudiger ! I tried to avoid being confused ! Thanks !

With Semaphore Take Failure being STF & Semaphore Take Success being STS:

I added the else clause that you suggested. Then I collected 600+ prints, in which I saw STF only in PS1 or PS2.
But I never saw the scenario that you mentioned in the quoted text above. Whenever there was a PS2:STF, it was closely preceded by a PS1:STS & vice versa !
Naturally, this explains why the STF occurs (closely preceding it, the other task had an STS, which means it took the semaphore).
The error I made earlier was to assume that the xQS/GS. message meant a Semaphore Give Success (SGS). Rather, it only meant that xQueueSend/GenericSend was called (I had placed the print at the start of the xQueueGenericSend() function).

I’m not sure if you understood my point.

Anyway, your post helped me to solve my problem ! So thanks Gaurav !

aggarg · March 2, 2022, 6:27pm

I am not sure which point you mean but glad that it worked for you.

Here is the documentation about how to use these tests: https://github.com/FreeRTOS/FreeRTOS/blob/main/FreeRTOS/Demo/ThirdParty/Template/README.md
Here is an example of how you are expected to call xAreSemaphoreTasksStillRunning again and again to ensure that tests are alive: https://github.com/FreeRTOS/FreeRTOS/blob/main/FreeRTOS/Demo/ThirdParty/Template/TestRunner.c#L389

Thanks.
