Data abort in xQueueGenericReceive

I am facing a data abort in the xQueueGenericReceive() function.
Here is the call stack:
-000|xQueueGenericReceive(
| xQueue = 0x0,
| pvBuffer = 0x0,
| ?,
| xJustPeeking = 0)
| xTimeOut = (xOverflowCount = 1, xTimeOnEntering = 536870943)
| pcOriginalReadPosition = 0x80E0F460
| pxQueue = 0x0
| uxMessagesWaiting = 0
-001|xQueueTakeMutexRecursive()
| xMutex = 0x901C1E8C → ,
| xTicksToWait = 4294967295)
| xReturn = -1877205364
| pxMutex = 0x901C1E8C → (
| pcHead = 0x0,
| pcTail = 0x0,
| pcWriteTo = 0x901C1E8C,
| uxMessagesWaiting = 1,
| uxLength = 1,
| uxItemSize = 0,
| cRxLock = -1,
| cTxLock = -1,
| ucStaticallyAllocated = 0)
else
| {
| xReturn = xQueueGenericReceive( pxMutex, NULL, xTicksToWait, pdFALSE );

The failure occurs while acquiring the mutex, inside xQueueGenericReceive().
The parameters passed to xQueueGenericReceive() from xQueueTakeMutexRecursive() look fine.

pxMutex is passed from xQueueTakeMutexRecursive() as the xQueue argument of xQueueGenericReceive(), which assigns it to pxQueue.

Assembly code:

         1260|        taskENTER_CRITICAL();
    ST:0000:9012E546|F6DBFE07            bl      0x9000A158       ; vPortEnterCritical
                    |        {
                1262|            const UBaseType_t uxMessagesWaiting = pxQueue->uxMessagesWaiting;
    ST:0000:9012E54A|6BA5                ldr     r5,[r4,#0x38]
                    |
                    |            /* Is there data in the queue now?  To be running the calling task
                    |            must be the highest priority task wanting to access the queue. */
                1266|            if( uxMessagesWaiting > ( UBaseType_t ) 0 )
    ST:0000:9012E54C|B31D                cbz     r5,0x9012E596    ; uxMessagesWaiting,0x9012E596
                    |            {
                    |                /* Remember the read position in case the queue is only being
                    |                peeked. */
                1270|                pcOriginalReadPosition = pxQueue->u.pcReadFrom;
    ST:0000:9012E54E|4641                mov     r1,r8            ; r1,pvBuffer
    ST:0000:9012E550|4620                mov     r0,r4            ; r0,xQueue
    ST:0000:9012E552|68E6                ldr     r6,[r4,#0x0C]    ; xEntryTimeSet,[r4,#12]
                1272|                prvCopyDataFromQueue( pxQueue, pvBuffer );
    ST:0000:9012E554|F6DAFBBC            bl      0x90008CD0       ; prvCopyDataFromQueue
                    |
                1274|                if( xJustPeeking == pdFALSE )
    ST:0000:9012E558|F1B90F00            cmp     r9,#0x0          ; xJustPeeking,#0
    ST:0000:9012E55C|D114                bne     0x9012E588
                    |                {
                1276|                    traceQUEUE_RECEIVE( pxQueue );
                    |
                    |                    /* Actually removing data, not just peeking. */
                1279|                    pxQueue->uxMessagesWaiting = uxMessagesWaiting - 1;
    ST:0000:9012E55E|6823                ldr     r3,[r4]
    ST:0000:9012E560|3D01                subs    r5,r5,#0x1       ; uxMessagesWaiting,uxMessagesWaiting,#1
    ST:0000:9012E562|63A5                str     r5,[r4,#0x38]    ; data abort at this point

Register info:
N N  R0  901C1E8C  R8         0
Z _  R1         0  R9         0
C _  R2         0  R10        0
V _  R3         0  R11 A5A5A5A5
Q _  R4         0  R12 BE82209D
     R5  FFFFFFFF  R13 901C3EE8
0 _  R6  80E0F460  R14 9000A161
1 _  R7         0  PC  9012E562

Any help would be appreciated. Please share your thoughts.

xQueue = 0x0 is clearly a bug. The queue handle must not be NULL. Ensure that the queue is created successfully before using it.
It is also better to define configASSERT, which would catch this kind of error.

#ifndef configASSERT
#define configASSERT( x )
#define configASSERT_DEFINED 0

Does this mean configASSERT is enabled?

That doesn’t help a lot. It effectively disables it: this fallback defines configASSERT as an empty macro when you have not defined it yourself.
See the configASSERT documentation for the details.
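
As a rough illustration of what enabling it can look like, here is a minimal sketch that builds on a host PC. The macro follows the form shown in the configASSERT documentation; vAssertCalled(), iAssertCount and iUseHandle() are hypothetical names made up for this example. On target, the hook would more typically disable interrupts and stop in a loop so the debugger shows the failing file and line.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical counter so the effect can be observed on a host build. */
static int iAssertCount = 0;

/* Hypothetical assert hook. On target this would usually disable
   interrupts and halt instead of returning. */
static void vAssertCalled( const char *pcFile, int iLine )
{
    iAssertCount++;
    printf( "ASSERT failed: %s:%d\n", pcFile, iLine );
}

/* This definition would normally live in FreeRTOSConfig.h. */
#define configASSERT( x ) if( ( x ) == 0 ) vAssertCalled( __FILE__, __LINE__ )

/* Hypothetical stand-in for a kernel call that receives a handle.
   Returns 1 if the handle is usable, 0 if it is NULL. */
static int iUseHandle( const void *pvHandle )
{
    configASSERT( pvHandle );
    return ( pvHandle != NULL ) ? 1 : 0;
}
```

With a definition like this in place, a NULL handle is reported at the point of the call rather than as a data abort some instructions later.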

As a temporary fix, we have added a NULL check for the queue in xQueueGenericReceive().
I want to know why the queue handle is becoming NULL.

What is your platform? On an ARM Cortex, you can set a DWT data breakpoint to catch when the variable is written with a 0. Extremely useful.

It is an ARM Cortex-R7. The problem is that the issue is not seen very frequently.
Do you want me to add a debug watchpoint? The pointer to the queue is constant, though
(Queue_t * const pxQueue).
Will it be useful?

Well, if you correctly set up your queue handle to something other than 0 but find a 0 in it in the error case, then you certainly want to figure out who overwrites it, so a data watchpoint will be useful.

I do not know whether the R family supports data watchpoints; it is best to consult your IDE documentation on how to set one up.

Well, then it’s likely a data/memory corruption.
Did you enable stack checking (also highly recommended during development) or verify that the stacks of your tasks are sufficient?
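
For reference, FreeRTOS fills each task stack with the byte 0xA5 at creation, and the high-water-mark check counts how much of that fill is still untouched. The following is only a host-side sketch of that idea, not the kernel's code: xCountUnusedStackBytes() and xSimulateStackUse() are hypothetical names, and it assumes a stack that grows downward from the high end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same value as the fill byte FreeRTOS writes into a new task stack. */
#define STACK_FILL_BYTE 0xA5U

/* Count the bytes at the low end of the stack that still hold the fill
   pattern. Assumes the stack grows downward, so the lowest addresses
   are the last ones to be used. */
static size_t xCountUnusedStackBytes( const uint8_t *pucStackLowest,
                                      size_t xStackSizeBytes )
{
    size_t xUnused = 0;

    while( ( xUnused < xStackSizeBytes ) &&
           ( pucStackLowest[ xUnused ] == STACK_FILL_BYTE ) )
    {
        xUnused++;
    }

    return xUnused;
}

/* Hypothetical host-side helper: fill the buffer like a fresh stack,
   overwrite the top xUsedBytes (xUsedBytes <= xSize) to mimic usage,
   then report how many bytes remain untouched. */
static size_t xSimulateStackUse( uint8_t *pucStack, size_t xSize,
                                 size_t xUsedBytes )
{
    memset( pucStack, STACK_FILL_BYTE, xSize );
    memset( pucStack + ( xSize - xUsedBytes ), 0x00, xUsedBytes );
    return xCountUnusedStackBytes( pucStack, xSize );
}
```

Note that remaining fill bytes only show that this task did not grow into the bottom of its own stack; they do not rule out corruption from a stray pointer or from another context.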

We have checked from the core dump log that the task's stack still has unused space.

Ok. Invalid interrupt priorities might be another reason for data corruption, given that it’s not an application code bug, of course.
See "Using API functions within interrupts" in the FreeRTOS FAQ and maybe also the forum post "Understanding priority levels of ISR and FreeRTOS APIs" (#16, by aggarg), or search the forum when in doubt.

The FreeRTOS code we use is a stable release. Interrupts were also disabled when the data abort happened.

That does not mean it is not interrupt related. Crashes frequently happen long after the root cause.

A data corruption might have happened earlier without immediate effect.
Do you really disable all interrupts, including SysTick, in your application?
Do you use FreeRTOS API calls in your ISRs at all?
BTW, which MCU/FreeRTOS port do you use, and which FreeRTOS version?

We are using the cpsid instruction to disable interrupts. I am not sure whether this disables SysTick interrupts.
FreeRTOS version: FreeRTOS V9.0.0
No FreeRTOS API calls are made in ISRs.
I do not know much about the port.

Yes, it does disable the SysTick interrupt.

Any reason you cannot update to the latest?

Are you able to narrow down what is causing the queue handle to become NULL?

As our codebase is huge, we have stayed on FreeRTOS V9.0.0 to avoid complications.
We have not been able to identify the root cause yet. We are checking the interrupt part.

Assuming that you are not changing the queue handle after it has been set to the correct value, can you use a data breakpoint to find out what is causing the corruption?

We have been trying to reproduce the issue. We have also added data watchpoints to check where the queue handle is getting corrupted, but we have not been able to reproduce it so far.

Okay. Let us know whatever you find.