Bug in posix porting layer of pthread_cond_broadcast/wait

Byron · September 25, 2020, 12:11am

If a high priority thread and a low priority thread are blocked in pthread_cond_wait(), the low priority thread will not be awakened,… instead the high priority thread checks it’s condition twice.

The following code when run on linux properly wakes up thread b, but when run with the freertos posix porting layer, the two calls to SemaphoreGive are both consumed by the high priority thread a.

thread b is starved and never re-evaluates it’s condition.

#include <stdio.h>
#include <pthread.h>
#include <sched.h>

pthread_cond_t  c;
pthread_mutex_t mtx;

volatile int expression_a = 0;
volatile int expression_b = 0;


void *medium_priority_thread_a(void*param)
{
    pthread_mutex_lock(&mtx);
    while (!expression_a)
    {
        printf("thread a waiting for expression_a\n");
        pthread_cond_wait(&c, &mtx);
    }
    pthread_mutex_unlock(&mtx);
    return NULL;
}

void *low_priority_thread_b(void*param)
{
    pthread_mutex_lock(&mtx);
    while (!expression_b)
    {
        printf("thread b waiting for expression_b\n");
        pthread_cond_wait(&c, &mtx);
    }
    pthread_mutex_unlock(&mtx);
    printf("thread b proceeding\n");
    return NULL;
}
void *high_priority_thread_c(void*param)
{
    // wait for thread a and thread b to be stuck on the condition.
    MSCC_OS_USLEEP(100000);
    printf("thread c allowing thread b to proceed\n");
    expression_b = 1;
    pthread_cond_broadcast(&c);
    return NULL;
}

void main(int argc,char**argv)
{
    pthread_cond_init(&c, NULL);
    pthread_mutex_init(&mtx, NULL);

    pthread_t thread_id_a;
    pthread_t thread_id_b;
    pthread_t thread_id_c;
    pthread_attr_t thread_attr;
    struct sched_param param;

    // start thread a (medium priority)
    pthread_attr_init (&thread_attr);
    pthread_attr_getschedparam (&thread_attr, &param);
    param.sched_priority = 4;
    pthread_attr_setschedparam (&thread_attr, &param);
    pthread_create(&thread_id_a, &thread_attr,  medium_priority_thread_a,    NULL);

    // start thread b (low priority)
    param.sched_priority = 2;
    pthread_attr_setschedparam (&thread_attr, &param);
    pthread_create(&thread_id_b, &thread_attr,  low_priority_thread_b,     NULL);

    // start thread c (high priority)
    param.sched_priority = 5;
    pthread_attr_setschedparam (&thread_attr, &param);
    pthread_create(&thread_id_c, &thread_attr,  high_priority_thread_c, NULL);

    pthread_join (thread_id_c, NULL);
    printf("thread c finished.\n");
    pthread_join (thread_id_b, NULL);
    printf("thread b finished.\n");
    return 0;
}

the output of running this in freertos is:

thread a waiting for expression_a
thread b waiting for expression_b
thread c allowing thread b to proceed
thread c finished.
thread a waiting for expression_a
thread a waiting for expression_a

whereas if the priorities are inverted and run on linux, the output is:

thread a waiting for expression_a
thread b waiting for expression_b
thread c allowing thread b to proceed
thread a waiting for expression_a
thread c finished.
thread b proceeding
thread b finished.

As a simple fix for us, we changed pthread_cond_wait/broadcast to use a list of threads that should be notified,… and instead use xTaskNotifyGive & ulTaskNotifyTake.

rtel · September 25, 2020, 1:55am

Thanks for reporting. Would be grateful if you could let us know which version you are using, and post the fix.

Byron · September 25, 2020, 8:22am

We are using V10.1.1, but the same strategy seems to exist in the current code.

Our fix was quick and dirty,… since we are using stl, and don’t mind allocating a bit of memory at startup, we simply store a vector of tasks in the pthread_cond_t, and push the task in pthread_cond_wait and iterate over those in pthread_cond_broadcast e.g.

with a bit of hand waving,… our implementation includes defining a vector of waiting tasks:

typedef struct {
	std::vector<TaskHandle_t> vWaitingTasks;
} pthread_cond_internal_t;

in pthread_cond_wait we simply add the task to the vector:

int pthread_cond_wait( pthread_cond_t * cond,
                       pthread_mutex_t * mutex )
{
    pthread_cond_internal_t * pxCond = ( pthread_cond_internal_t * ) ( *cond );

	pxCond->vWaitingTasks.push_back(xTaskGetCurrentTaskHandle());
	pthread_mutex_unlock( mutex );
	ulTaskNotifyTake( pdTRUE,         /* Clear the notification value before exiting. */
	                  portMAX_DELAY ); /* Block indefinitely. */
	pthread_mutex_lock( mutex );
}

And in pthread_cond_broadcast we notify all the waiters and clear the list of waiters. In our application, we always own the mutex when we call broadcast, so there’s no race conditions while modifying the vector.

int pthread_cond_broadcast( pthread_cond_t * cond )
{
    pthread_cond_internal_t * pxCond = ( pthread_cond_internal_t * ) ( *cond );
    for (TaskHandle_t waiting_task : pxCond->vWaitingTasks) // iterate over all waiting tasks
    {
        xTaskNotifyGive( waiting_task );
    }
    pxCond->vWaitingTasks.clear();
}

Sorry that our implementation isn’t readily usable for the generic case !

cobusve · September 25, 2020, 6:26pm

There was some confusion on our side because there are two things called “posix porting layer”. On the one side there is FreeRTOS+POSIX which is a labs project. It has a lot of unimplemented things like thread local storage etc. and as such it is still in Labs. This is listed as V10.1.1 and can be found here. https://github.com/FreeRTOS/Lab-Project-FreeRTOS-POSIX . This seems to be what you are referring to.

In FreeRTOS 10.4.0 we added a Posix porting layer. Instead of allowing you to run pthreads on top for FreeRTOS like the Labs project does, this layer allows you to use FreeRTOS Tasks on top of pthreads, this is useful to test your microcontroller code on Linux while the prior is useful to run your posix code on a microcontroller.

The posix port is here https://github.com/FreeRTOS/FreeRTOS-Kernel/tree/master/portable/ThirdParty/GCC/Posix

We will try to add a fix for what you reported to FreeRTOS+POSIX. That is a bit lower priority than the Kernel so it may not happen immediately and if anyone out there wants to take a stab at doing this one we will be very excited to see a PR with a fix into the FreeRTOS+POSIX labs project!

Cobus

Byron · September 25, 2020, 6:57pm

Thanks Cobus,

Yes the problem is with the FreeRTOS+POSIX labs project.

Is there a test suite that is run? Perhaps I could add a test that demonstrates the problem?

I don’t have a good suggestion for how to properly fix the code (our fix only works because in our application we aren’t using xTaskNotifyGive/Take for other purposes).

cobusve · September 26, 2020, 11:11pm

We are in the process of re-arranging the locations of a number of tests. The posix tests currently can be found here https://github.com/aws/amazon-freertos/tree/master/libraries/freertos_plus/standard/freertos_plus_posix/test

If you can contribute a test you can just do a PR into that location, if it disappears you should look in the labs repository as the intent is to move it over there soon.

Just a note that this particular project is still in labs because neither the implementation nor the tests are complete so any contributions to expand tests would be greatly appreciated.

In the meantime we are going to log an Issue in the FreeRTOS+POSIX repo to note that this is a problem with the current implementation.

Topic		Replies	Views
FreeRTOS + POSIX race condition in pthread_cond_timedwait() Kernel	2	587	November 15, 2019
FreeRTOS+POSIX supports pthread_cond_clockwait Kernel	2	525	May 10, 2021
Pthread_cond_timedwait problem Kernel	6	711	August 13, 2021
Task Crash While Waiting for Semaphore Kernel	5	483	July 30, 2016
Binary Semaphore problem Kernel	27	2492	August 28, 2021

Bug in posix porting layer of pthread_cond_broadcast/wait

Related topics