Problem with array bounds for pxReadyTasksLists (FreeRTOS 8.2.x)

Hi there,

pxReadyTasksLists is an array bound by configMAX_PRIORITIES.

There are cases in which uxTopReadyPriority (which is used as an index into pxReadyTasksLists) exceeds configMAX_PRIORITIES, causing taskSELECT_HIGHEST_PRIORITY_TASK to accessing undefined memory. The control path is this (configUSE_PORT_OPTIMISED_TASK_SELECTION set)*:

#define taskSELECT_HIGHEST_PRIORITY_TASK()	\
{											\
    		UBaseType_t uxTopPriority;			\
    		/* Find the highest priority queue that contains ready tasks.*/  \
    		portGET_HIGHEST_PRIORITY( uxTopPriority, uxTopReadyPriority);	\
    		configASSERT( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ uxTopPriority ] ) ) > 0 ); \
    		listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopPriority ] ) );	\
    	} /* taskSELECT_HIGHEST_PRIORITY_TASK() */

where

#define portGET_HIGHEST_PRIORITY( uxTopPriority, uxReadyPriorities ) uxTopPriority = ( 31 - ucPortCountLeadingZeros( ( uxReadyPriorities ) ) )

(Cortex port).

It’s obvious why the macro does what it does; it tries to map the numeric priority to its MSB aligned representation conforming to the ARM interrupt architecture. Yet this value shouldn’t be used as an index into pxReadyTasksLists, right?

I’m obviously missing something because this code has been around for a very long time, so it should have been noticed before. What am I missing?

Thanks!

*Sorry, I couldn’t figure out how to format code reasonably in this forum software

At least in recent FreeRTOS versions configMAX_PRIORITIES is checked being < 32 in this case and there is also code to ensure validity of priority at runtime.
Which cases where uxTopReadyPriority might exceed configMAX_PRIORITIES did you found ?

Enclose code blocks by 3 backticks ‘`’ or tildes ‘~’, use 1 backtick or the ‘</>’ button of the edit widget for inline code formatting.

Ok, I see now where my error might be - in this case the task switcher doesn’t use
uxTopReadyPriority as the index into pxReadyTasksLists, but the reverse offset (I found an 11 in uxTopReadyPriority, deposited there via prvAddTaskToReadyList() with a pri of 3 in a setup where configMAX_PRIORITIES was set to 8). Looks like uxTopReadyPriority will either contain the linear range from 0 to configMAX_PRIORITIES (no optimized task selection) or the POD specified range (optimized task selection). Looks ok now, only caveat being that the declaration assignment

PRIVILEGED_DATA static volatile UBaseType_t uxTopReadyPriority = tskIDLE_PRIORITY;

is valid only for the no optimized task selection semantics.

Sorry for the noise!

Ok, I need to open this again because I’ve made more headway on the fault analysis (please see attachment

):

  • pxReadyTasksLists is an array of variables of type List_t.

  • List_t is defined in list.h (only relevant members listed for readability):

    typedef struct xLIST
    {
    volatile UBaseType_t uxNumberOfItems;
    ListItem_t * configLIST_VOLATILE pxIndex;
    MiniListItem_t xListEnd;
    } List_t;

and MiniListItem_t:

struct xMINI_LIST_ITEM
{
    configLIST_VOLATILE TickType_t xItemValue;
    struct xLIST_ITEM * configLIST_VOLATILE pxNext;
    struct xLIST_ITEM * configLIST_VOLATILE pxPrevious;
};
typedef struct xMINI_LIST_ITEM MiniListItem_t;

Thus, each array variable has size 4 (sizeof (UBaseType_t)) + 4 (sizeof (ListItem_t *)) + 12 (sizeof(xMINI_LIST_ITEM)) = 20 bytes.

All of that looks good. We can see, for example, that pxReadyTasksLists[0] resides at 0x405060 and pxReadyTasksLists[1] at 0x407574. The Delta is 0x14 which is 0d20. Consequently, pxReadyTasksLists[2] is at 0x407574+0x14 = 0x407588 (not displayed here, but trust me).

Now the value of pxReadyTasksLists[1].pxIndex (residing at 0x405078) is the address immediately following, namely 0x40507c. There is nothing wrong with that by itself, but the problem is that pxIndex is not of type MiniListItem_t * but of type ListItem_t * which has two additional members over MiniListItem_t, namely pvOwner and pvContainer (8 bytes together). As we can see in the sketch, if FreeRTOS ever dereferences pxIndex AND accesses one of the two members pvOwner or pvContainer, it’ll trample over the first two members of pxReadyTasksLists[2] (red box in the sketch). Or vice versa.

This appears to be exactly what I see in my sporadic faults; occassionally, the pvOwner field of a tasks on one of the ready lists is 0 although the corresponding uxNumberOfItems field is 1, and I added code that traps every explicit assignment to the pvOwner field to 0. I suspect that the chain of events causing the fault is this: pxReadyTasksLists[2] gets decremented to 0 which makes the pvOwner field of *(pxReadyTasksLists[1].pxIndex) look like 0.

The strange thing is that the exact same code works fine on at least one different platform, even though the structure field overlaps are there as well. The platform on which that works is a Cortex M4; the one which I am porting to right now a Coldfire V2. I have made many many FreeRTOS ports in the course of the last 12+ years, and this is the first time I’ve come across this. This appears to be a clear data organization error (the last parameter of List_t should not be a MiniListItem_t but a ListItem_t to prevent this from happening), yet again, it should have come up long before if it would be a conceptual problem. So it can’t be.

What is the knot in my brain this time (hopefully this time someone else will clear it up before I find it myself)?

Thanks for any input!

Edit: This applies also to FreeRTOS2020.

Hmm, the Coldfire V2 port is very old and not tested anymore, but I don’t see any reason why it would not still work - dependent on compiler changes since it was written.

The overlapping structure is not, by itself, going to cause you a problem. The code has always been like that. Nothing should ever be pointing to the last item, which is there just as a marker. If it is causing an issue then something bad has already happened.

One thing you can try doing is setting configUSE_LIST_DATA_INTEGRITY_CHECK_BYTES to 1 in FreeRTOSConfig.h. That will put the value 0x5a5a5a5a at either ends of the structure, and those values are then checked by asserts and you will be able to see if they still exist by viewing them in the debugger (assuming your version of FreeRTOS has that facility - I think it was introduced in a very early FreeRTOS version). That may help track down the issue.

Thanks for the answer, Richard!

Actually, this is only true for the pvContainer member which is indeed unused, but the pvOwner member appears to be significant and frequently used.

I’ll agree that if there was any inherent issue, it would have shown a long time ago, so whatever I see must be a follow up error of something else. Yet it’s disturbing and shouldn’t have escaped static code analysis (even if it apparently never shows in real life)…

I’ll look into the signature checks, they are present in my FreeRTOS version.

Thanks again, keep up the good work (I’m an incurable fan of FreeRTOS).

? Are you talking about the xListEnd member? It doesn’t either a pvContainer of a pvOwner member. https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/V10.4.3/include/list.h#L154

Please read my post #4 again, Richard. Of course xListEnd does not have either member, but *pxIndex has (being a pointer to a ListItem_t object), and if *pxIndex points to the xListEnd entry of the same structure, there is an implicit coercion to that type, and *pxIndex->pvOwner will point to something invalid!

Sorry for not being clear enough… :roll_eyes:

You are right that if ever de-referenced, it will read invalid data. But, as you can read from this comment, it is used just as the end marker and never referenced: FreeRTOS-Kernel/list.c at V10.4.3 · FreeRTOS/FreeRTOS-Kernel · GitHub

Here is what an empty list looks like:

                     List
             +-------------------+
             | Number of items 0 |
             |                   |
             +-------------------+
             |    Index          |+-------+
             |                   |        |
             +-------------------+        |
    +------->| +---------------+ |<-------+
    |   +--->| |Mini List Item | |
    |   |    | |               | |
    |   +------|   Prev        | |
    |        | |               | |
    +----------|   Next        | |
             | |               | |
             | +---------------+ |
             +-------------------+

Here is what a list with 2 items look like:






                    List
             +-------------------+
             | Number of items 2 |
             |                   |
             +-------------------+       +------------+        +----------+
             |    Index          +-------> List Item  |        |List Item |
             |                   |       |            |        |          |
             +-------------------+       |            |        |          |
             | +---------------+ |       |            |        |          |
             | |Mini List Item | |       |            |        |          |
             | |               | |       |            |        |          |
     +---------+   Prev        | |       |            |        |          |
     |       | |               |<--------+  Prev      |<-------+  Prev    |<-----+
     |   +---->|   Next        | |       |            |        |          |      |
     |   |   | |               |+------->|  Next      +------->|  Next    +---+  |
     |   |   | +---------------+ |       |            |        |          |   |  |
     |   |   +-------------------+       +------------+        +----------+   |  |
     |   |                                                                    |  |
     |   +--------------------------------------------------------------------+  |
     |                                                                           |
     +---------------------------------------------------------------------------+

So the thing to look for is why the code is trying to de-reference end marker which seems like a memory corruption.

Thanks.

pxIndex is only referenced from list.c and list.h. You can see here if it gets set to xListEnd it is immediately moved on again - so the only way it could end up pointing at xListEnd would be if a critical section were breached, resulting in a context switch before the pointer was moved on again. That could potentially happen if the interrupt configuration were incorrect and pxIndex were modified by an interrupt.

The only other time pxIndex points to xListEnd is when the list is empty.

You could try adding configASSERTS() into the code in other places pxIndex is set (maybe the “insert at list end” function to check if pxIndex points to the list end when the function exits and the list is not empty.

Thanks Richard,

yes, the case I analyzed was uxNumberOfItems > 0, but list empty.

As we both suspected, it was apparently #2 of the three cases that cause 90% of all FreeRTOS crashes - misconfigured interrupts. FWIW, here’s the deal in a nutshell: Interrupt priorities are rather error prone on the Coldfire. The CF has 7 interrupt levels as well as 7 subpriorities, and aside from the oddity that one is not allowed to assign two interrupts the same combination of main and subpriorities (at least not when there is a chance that both try to assert at the same time), there is also the special case of the 7 lowest IRQ vectors in each controller whose priority is nailed to midpoint subpriority 3,5 (in between 3 and 4) but it shows as 0 in the priority registers. Finally, of those 7 hardcoded vectors, the lowermost can not be used as a software interrupt.

We initially had used exactly this IRQ for the service handler which in effect made it run at higher than the sys tick handler. Looks like that solved the problem…

Thanks also to aggarg for the clarification.