[bug?] prvSelectHighestPriorityTask leads to same task being scheduled due to vListInsertEnd inserting to the HEAD of the list, not the actual end

gemarcano · February 12, 2024, 11:22am

I ran into this bug while working with FreeRTOS-kernel (master branch, updated yesterday 2021-02-11) on a Raspberry Pi Pico W (SMP enabled).

Apparently new users can’t use links, so this’ll look worse than I wanted it to, but the URLs to the code are useful so they’ll stay in parentheses.

Effectively, the behavior I was seeing was that the same task was being scheduled again, and again, and again, starving all others. Debugging FreeRTOS led me to prvSelectHighestPriorityTask (https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/7284d84dc88c5aaf2dc8337044177728b8bdae2d/tasks.c#L983) where I noticed that there were a lot of recent changes due to the SMP work that was merged not too long ago.

Of note, there’s this little explanation and code earlier in the function (https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/7284d84dc88c5aaf2dc8337044177728b8bdae2d/tasks.c#L999):

        /* A new task is created and a running task with the same priority yields
         * itself to run the new task. When a running task yields itself, it is still
         * in the ready list. This running task will be selected before the new task
         * since the new task is always added to the end of the ready list.
         * The other problem is that the running task still in the same position of
         * the ready list when it yields itself. It is possible that it will be selected
         * earlier then other tasks which waits longer than this task.
         *
         * To fix these problems, the running task should be put to the end of the
         * ready list before searching for the ready task in the ready list. */
        if( listIS_CONTAINED_WITHIN( &( pxReadyTasksLists[ pxCurrentTCBs[ xCoreID ]->uxPriority ] ),
                                     &pxCurrentTCBs[ xCoreID ]->xStateListItem ) == pdTRUE )
        {
            ( void ) uxListRemove( &pxCurrentTCBs[ xCoreID ]->xStateListItem );
            vListInsertEnd( &( pxReadyTasksLists[ pxCurrentTCBs[ xCoreID ]->uxPriority ] ),
                            &pxCurrentTCBs[ xCoreID ]->xStateListItem );
        }

But, vListInsertEnd doesn’t actually insert an node at the end of the list. It inserts a node such that it is the last node returned when listGET_OWNER_OF_NEXT_ENTRY is called to iterate through the list. This node is only the last node if the list’s index points to the head node… which for some reason wasn’t the case for my program (maybe it has to do with the fact I make calls to uxTaskGetSystemState from a different task for status/debugging purposes). In my program, what I was seeing was that the Ready list’s index was actually the node right after the head node, meaning that vListInsertEnd was actually inserting the TCB for the current task at the head!

Later in the code, the ready list is iterated using (https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/7284d84dc88c5aaf2dc8337044177728b8bdae2d/tasks.c#L1041):

for( pxIterator = listGET_HEAD_ENTRY( pxReadyList ); pxIterator != pxEndMarker; pxIterator = listGET_NEXT( pxIterator ) )

And for my program, this always gets the newly re-inserted task as it was apparently placed in the head, due to the index being the next node past the head!

I hacked together a “fix” by using listGET_OWNER_OF_NEXT_ENTRY to iterate through the list. It seems to work, and I’m unable to crash my application with it.

Seeing as it is very late in my neck of the woods right now, and late night debugging is fraught with errors, I just wanted to make sure I’m interpreting the code correctly. Does this all sound like a bug? If so, I can try to polish up what I have and file a proper issue on Github, and maybe try a PR?

In summary, in some cases the new SMP scheduler routines in the kernel gets confused due to vListInsertEnd not actually inserting a node to the end of the list if the ready list’s index variable doesn’t point to the head node.

aggarg · February 12, 2024, 1:47pm

Your assessment seems right. Please raise a PR.

richard-damon · February 12, 2024, 3:17pm

My memory is that there were rule/guidelines about list walking and the expectation of when lists were expected to be pointing at their head, but I think this “documentation” was in a somewhat obscure location (perhaps just some code comments). It may be that a review of how lists are used, and the expectations of the various code using them needs to be done, and put someplace more obvious. This becomes more important in SMP code, as there is more opportunity for race conditions if things that were previously assumed to be atomic due to disabling interrupts are now not atomic.

gemarcano · February 12, 2024, 4:39pm

I’ll poke around and see if I can find any code mentioning these assumptions.

For debugging purposes I locked all of my tasks to core 2 on the rp2040, so whatever issue I somehow uncovered seems possible to manifest without race conditions across cores.

I suspect the index/head misalignment might be triggered by my use of uxTaskGetSystemStatefrom a different task, since I had to effectively spam it (human speeds) on that task to trigger the issue. Effectively, that task is a primitive CLI interface where single letter commands do things, like ‘s’ to get system state info using uxTaskGetSystemState. Any slowdown in GDB (due to breakpoint or watchpoint conditions) would prevent the bug from manifesting, so it does look to be a race condition between this CLI task and the kernel scheduler.

richard-damon · February 12, 2024, 4:47pm

If the only usage of the lists are in kernel code, then it may well be the uxTaskGetSystemState isn’t pulling a strong enough critical section when it walks the task lists in SMP mode. There is a warning (as I remember) that this code isn’t intended for “normal” usage, but does cause an extended critical section as it does more work than normally allowed in such a section.

gemarcano · February 12, 2024, 9:42pm

The pico-sdk library does a lot of stuff under the hood, so it’s unclear if it is using lists itself, although, It’d be horrifying if it’s accessing the task lists used by the scheduler and messing with them. So, as far as I know, it’s only the FreeRTOS kernel using these lists.

I wasn’t able to find any documentation/comments talking about any guarantees about the head node being equal to the pxIndex value of the list. Closest is in the List_t definition, but it just says that it holds the last item returned by listGET_OWNER_OF_NEXT_ENTRY:

/*
 * Definition of the type of queue used by the scheduler.
 */
typedef struct xLIST
{
    listFIRST_LIST_INTEGRITY_CHECK_VALUE      /**< Set to a known value if configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES is set to 1. */
    volatile UBaseType_t uxNumberOfItems;
    ListItem_t * configLIST_VOLATILE pxIndex; /**< Used to walk through the list.  Points to the last item returned by a call to listGET_OWNER_OF_NEXT_ENTRY (). */
    MiniListItem_t xListEnd;                  /**< List item that contains the maximum possible item value meaning it is always at the end of the list and is therefore used as a marker. */
    listSECOND_LIST_INTEGRITY_CHECK_VALUE     /**< Set to a known value if configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES is set to 1. */
} List_t;

In the kernel, the only macros/functions that modify pxIndex are:

listGET_OWNER_OF_NEXT_ENTRY
listREMOVE_ITEM
listINSERT_END
vListInitialise
vListInsertEnd
uxListRemove

and most of these only modify pxIndex when that node is removed. As far as I can tell, the only function that can advance pxIndex is listGET_OWNER_OF_NEXT_ENTRY.

The main functions in the kernel that call listGET_OWNER_OF_NEXT_ENTRY are (with no co-routines and SMP with multiple cores enabled and no optimized task selection):

prvListTasksWithinSingleList

This function is called from within uxTaskGetSystemState. That said, uxTaskGetSystemState does supposedly disable the scheduler with vTaskSuspendAll (I’m making an assumption that vTaskSuspendAll stops the scheduler and preemption-- that assumption might be wrong).

I’m calling uxTaskGetSystemState from a single task which is locked to core 2 on the rp2040, so I don’t think even if for some reason the scheduler is still running on core 1, core 1 shouldn’t be doing anything to mess with the Ready task list for core 2… so I’m not sure.

One thing I might try for kicks and giggles is to put assertions enforcing the assumption that the head node is the pxIndex node, and see if I can catch it blowing up.

gemarcano · February 12, 2024, 11:35pm

OK, I think I understand the bug better. It’s caused by the use of vListInsertEnd. I’m copying the code for that function here for convenience:

void vListInsertEnd( List_t * const pxList,
                     ListItem_t * const pxNewListItem )
{
    ListItem_t * const pxIndex = pxList->pxIndex;

    traceENTER_vListInsertEnd( pxList, pxNewListItem );

    /* Only effective when configASSERT() is also defined, these tests may catch
     * the list data structures being overwritten in memory.  They will not catch
     * data errors caused by incorrect configuration or use of FreeRTOS. */
    listTEST_LIST_INTEGRITY( pxList );
    listTEST_LIST_ITEM_INTEGRITY( pxNewListItem );

    /* Insert a new list item into pxList, but rather than sort the list,
     * makes the new list item the last item to be removed by a call to
     * listGET_OWNER_OF_NEXT_ENTRY(). */
    pxNewListItem->pxNext = pxIndex;
    pxNewListItem->pxPrevious = pxIndex->pxPrevious;

    /* Only used during decision coverage testing. */
    mtCOVERAGE_TEST_DELAY();

    pxIndex->pxPrevious->pxNext = pxNewListItem;
    pxIndex->pxPrevious = pxNewListItem;

    /* Remember which list the item is in. */
    pxNewListItem->pxContainer = pxList;

    ( pxList->uxNumberOfItems )++;

    traceRETURN_vListInsertEnd();
}

The gist of it is that this code inserts a node into the list’s linked list right before the current pxIndex. If the pxIndex is the head node, well, then whatever is inserted becomes the defacto head node. Graphically:

Starting with a linked list with two elements:
NODE_1 and HEAD_NODE and INDEX <--> END_NODE <-->
if we run vListIsnertEnd, the list becomes:
NODE_2 and HEAD_NODE <--> NODE_1 and INDEX <--> END_NODE <-->

From the source code, the head node is defined as the node pointed to by the list’s xListEnd.pxNext. So NODE_2 being inserted before NODE_1 effectively makes it the new head node!

Stepping through the scheduler with a debugger, I’m finding that interestingly, the first time the node is removed and added to the “end”, the pxIndex is pointing to the node right before the xListEnd node, nowhere near the head node! This might be why things generally work, until pxIndex manages to migrate to the front of the list over time as tasks are removed and added.

So unless we want to completely change how to select tasks works, probably the best approach would be to just use listGET_OWNER_OF_NEXT_ENTRY (since this is happening in the scheduler, I’m assuming we don’t have to worry about other things modifying the task lists from under the scheduler).

aggarg · February 14, 2024, 5:48am

Thank you for the detailed analysis. I see that you have raised a PR already - Fix prvSelectHighestPriorityTask ready list iteration by gemarcano · Pull Request #991 · FreeRTOS/FreeRTOS-Kernel · GitHub. We are reviewing that PR and will get back.

Fresh · February 21, 2024, 5:51am

@gemarcano
pxIndex is member of the List_t structure and is used to traverse the list items - it points to the last item returned by listGET_OWNER_OF_NEXT_ENTRY and the next call to listGET_OWNER_OF_NEXT_ENTRY returns the next item and moves pxIndex.

FreeRTOS single core scheduler makes use of listGET_OWNER_OF_NEXT_ENTRY(and thereby, pxIndex) to select the next ready task to run. FreeRTO SMP scheduler does not usepxIndexand it always points to thexListEnd. prvSelectHighestPriorityTaskwill work correctly as long aspxIndex` does not move.

The problem you are seeing is that prvListTasksWithinSingleList uses listGET_OWNER_OF_NEXT_ENTRY (and thereby, pxIndex) for walking a list. As a result, when prvListTasksWithinSingleList is called from uxTaskGetSystemState for a list, pxIndex for that list moves. This causes the scheduler to select incorrect task to run.

IMO, the correct solution to update prvListTasksWithinSingleList to not use pxIndex to traverse the list. I have created this PR. Would you please give it a try and see if it solves your problem?

gemarcano · February 23, 2024, 9:12am

I’ll try it tomorrow and see if it works. In theory it sounds like it should work-- if nothing in the SMP code changes pxIndex from its original value as you described, then this should address the issue.

However, if for any reason anyone accidentally updates pxIndex for the ready list in the future, this bug will pop up again. It’d be nice to document or if possible assert/ban updates to pxIndex from within the kernel to the ready list (and maybe other scheduler lists?). I’m not sure how something like this would be enforced, however.

Another alternative solution I thought up of (which I mentioned in my PR) was to add a new list function that appends to the true tail of the list, and use that instead of vListInsertEnd. That way it doesn’t matter what pxIndex is doing at all in the scheduler.

aggarg · February 26, 2024, 5:18am

Moving pxIndex incorrectly may have a consequence of not selecting the longest waiting task. We, therefore, would like to avoid that.

That is a good suggestion. We will look into it.

Fresh · February 27, 2024, 2:59am

Not defining listGET_OWNER_OF_NEXT_ENTRY can helps to prevent moving pxIndex when using SMP scheduler. We update in the same PR to ban updates to pxIndex within the kernel. Thank you for the suggestion.

kanherea · March 13, 2024, 8:50pm

Hello @gemarcano,

Let us know if your queries have been answered.

Thanks