xQueueGenericReceive Queue Address corruption

system · August 15, 2017, 2:19pm

redcoatonline wrote on Tuesday, August 15, 2017:

Hi,

I’m using FreeRTOS 9.0.0 on a Cortex-M4 MPU (NXP LPC54605J512). It uses the heap_1 (no malloc) implementation.

I have an issue that’s causing a hard fault, and I have traced it to the xQueueGenericReceive() function in the queue.c module. In this case, when the function calls prvCopyDataFromQueue() (#1270), the address contained in pxQueue isn’t correct, but the address in xQueue IS correct (it points to a valid queue allocated on the heap). I can’t see how this condition arises, as I would expect xQueue and pxQueue to always point to the same place. The attempt to copy out of the queue causes the hard fault (accessing a NULL pointer in the ‘pcTail’ member).

The contents of xQueue and pxQueue from xQueueGenericReceive can be seen here: https://www.dropbox.com/s/ukd8dpaueuloc3e/ContextInxQueueGenericReceive.png?dl=0

Thanks for your time,

Henry

rtel · August 16, 2017, 5:48pm

rtel wrote on Wednesday, August 16, 2017:

Are you saying you are passing a value into a function using a parameter, and the parameter is wrong inside the function? Or that the macro that converts the opaque queue handle into a pointer is failing? In both cases, it could be a stack overflow causing a corruption. As that would appear to be just C code executing, it is hard to see what else it could be. Are you using a memory protection unit? Have you got configASSERT() and stack overflow detection in use?

system · August 18, 2017, 9:09am

redcoatonline wrote on Friday, August 18, 2017:

Hi,
The pointer passed into the function (xQueue) is correct, at address 0x20016410, but the pointer pxQueue points to 0x20016a74, which isn’t a valid queue address. I’m expecting the pointers to have the same value throughout the call to xQueueGenericReceive.

I don’t have the MPU protecting any run-time memory areas, configASSERT() resolves to a for(;;) loop, and configCHECK_FOR_STACK_OVERFLOW is set to 2. I don’t see any asserts triggered.

Can you suggest a way to detect if stack overflow is the cause? I can detect the issue before the hard fault and examine the TCB for this task. Note that the issue always happens in the context of the same task.

Many thanks,

Henry

rtel · August 18, 2017, 2:23pm

rtel wrote on Friday, August 18, 2017:

This is an interesting one. Entry into xQueueGenericReceive() uses the following code:

BaseType_t xQueueGenericReceive( QueueHandle_t xQueue, void * const pvBuffer, TickType_t xTicksToWait, const BaseType_t xJustPeeking )
{
Queue_t * const pxQueue = ( Queue_t * ) xQueue;

If I understand correctly, xQueue passed in is 0x20016410, then the line

Queue_t * const pxQueue = ( Queue_t * ) xQueue;

converts 0x20016410 to 0x20016a74. Those are very different numbers. So effectively you have:

Queue_t * const 0x20016a74 = 0x20016410;
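To make the point concrete: the cast from the opaque handle to the internal pointer only changes the type, never the address, so if the two differ, something other than the cast has modified pxQueue. The host-side sketch below assumes simplified stand-ins for QueueHandle_t and Queue_t (the real Queue_t in queue.c has many more members); the helper name handle_to_queue() is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the FreeRTOS types (assumption: the real
 * Queue_t in queue.c has many more members). Only the cast matters here. */
typedef void * QueueHandle_t;

typedef struct
{
    int8_t *pcHead;
    int8_t *pcTail;
} Queue_t;

/* Same pattern as the first use of xQueue in xQueueGenericReceive():
 * the opaque handle is cast, not translated, so both names hold one address. */
Queue_t *handle_to_queue( QueueHandle_t xQueue )
{
    Queue_t * const pxQueue = ( Queue_t * ) xQueue;

    /* If this ever failed, something other than the cast (for example a
     * corrupted stack slot or register) would have changed pxQueue. */
    assert( ( void * ) pxQueue == xQueue );
    return pxQueue;
}
```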

The only thing I can suggest is stepping through the code at assembly level to see what is happening. 0x20016410 should be in r0 when you enter the function. Is it? Or does r0 already contain 0x20016a74?

Could it be that a context switch to another task (and therefore stack frame) is occurring without you realising?

system · August 21, 2017, 8:46am

redcoatonline wrote on Monday, August 21, 2017:

The problem is trapping the condition before the corruption has occurred: I don’t know the sequence of events leading up to the issue. The caller successfully uses the queue thousands of times before the corrupted call, so stepping through the corruption doesn’t look possible. I don’t think the copy from xQueue to pxQueue is at fault here; I believe a context switch or something else touching the registers is happening.
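One way to trap the condition earlier, instead of stepping through thousands of good calls, is to check the queue’s invariants just before the copy and halt on the first violation. This sketch assumes a trimmed, flattened host-side copy of the Queue_t members the receive path uses (pcHead, pcTail, pcReadFrom, uxLength, uxItemSize) and relies on xQueueGenericReset() setting pcTail to pcHead + ( uxLength * uxItemSize ). The helper names queue_looks_sane() and queue_check_before_copy() are hypothetical; because Queue_t is private to queue.c, an equivalent check on the target would have to be added inside queue.c (for example as a configASSERT() just before prvCopyDataFromQueue()), and it only applies to queues that store data (uxItemSize > 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Trimmed host-side copy of the Queue_t members used when receiving.
 * Assumption: names follow FreeRTOS V9 queue.c, simplified and flattened. */
typedef struct
{
    int8_t *pcHead;
    int8_t *pcTail;
    int8_t *pcReadFrom;
    size_t  uxLength;
    size_t  uxItemSize;
} Queue_t;

/* Hypothetical helper: returns 1 if the invariants established when the
 * queue was reset still hold, 0 otherwise (including a NULL pcTail). */
int queue_looks_sane( const Queue_t *pxQueue )
{
    if( ( pxQueue == NULL ) || ( pxQueue->pcHead == NULL ) || ( pxQueue->pcTail == NULL ) )
    {
        return 0;
    }

    if( pxQueue->pcTail != pxQueue->pcHead + ( pxQueue->uxLength * pxQueue->uxItemSize ) )
    {
        return 0;
    }

    /* The read pointer must lie inside the storage area. */
    return ( pxQueue->pcReadFrom >= pxQueue->pcHead ) &&
           ( pxQueue->pcReadFrom < pxQueue->pcTail );
}

/* Hypothetical check point just before the data is copied out.
 * On the target, assert() would be replaced by configASSERT(). */
void queue_check_before_copy( const Queue_t *pxQueue )
{
    assert( queue_looks_sane( pxQueue ) );
}
```

With a breakpoint on the assert (or on the for(;;) loop of configASSERT()), the halt happens on the first bad call, while the call stack still shows how it was reached.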

If stack overflow is the issue, is there any way I can confirm it? Once the pointer is corrupt, can I look at the TCB to see if there’s a bad pointer there (or anything else that might indicate an overflow)? I’ll also try temporarily doubling the stack size for all tasks to see if the problem disappears (although the task stack size is already 1024 bytes).
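To confirm or rule out an overflow without waiting for the fault, FreeRTOS can report how much of a task’s stack has never been used: with INCLUDE_uxTaskGetStackHighWaterMark set to 1, uxTaskGetStackHighWaterMark( NULL ) returns the calling task’s minimum free stack, in words. It works by counting the fill bytes that are still untouched at the far end of the stack. The host-side sketch below reproduces that counting on a plain byte array, assuming a downward-growing stack (as on the Cortex-M4) and the 0xa5 fill byte; the function name is hypothetical. Note also that the stack depth passed to xTaskCreate() is specified in words, not bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FreeRTOS paints new task stacks with this byte (tskSTACK_FILL_BYTE)
 * when stack checking or high-water-mark reporting is enabled. */
#define STACK_FILL_BYTE 0xa5U

/* Host-side sketch of the high-water-mark calculation for a stack that
 * grows down: count fill bytes from the low end of the stack (the address
 * held in the TCB's pxStack) up to the first byte the task has ever
 * written, then convert that count to stack words. */
size_t stack_high_water_mark_words( const uint8_t *pucStackLow,
                                    size_t xStackBytes,
                                    size_t xWordSize )
{
    size_t xUntouched = 0;

    assert( xWordSize > 0 );

    while( ( xUntouched < xStackBytes ) &&
           ( pucStackLow[ xUntouched ] == STACK_FILL_BYTE ) )
    {
        xUntouched++;
    }

    return xUntouched / xWordSize;
}
```

A value at or near 0 for the affected task would point strongly at an overflow; a comfortable margin makes one less likely.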

Thanks

rtel · August 21, 2017, 3:25pm

rtel wrote on Monday, August 21, 2017:

You can always inspect the TCB in the debugger by examining the pxCurrentTCB variable. You might need to cast it to a tskTCB structure to get the debugger to show the structure members, for example by entering “(tskTCB*)pxCurrentTCB” in the variables/expressions window.

If you have configCHECK_FOR_STACK_OVERFLOW set to 2 then that should tell you if there is a stack overflow. You can also view the stack manually in the debugger by viewing the memory; the stack address can be obtained by viewing the pxCurrentTCB variable as described above.
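For reference, with configCHECK_FOR_STACK_OVERFLOW set to 2 the kernel fills each task stack with a known value when the task is created and, when the task is switched out, checks that the last 16 bytes of the stack still hold that value. The sketch below mimics that check on a host byte array, assuming a stack that grows down (so the guard bytes sit at the low end, pxStack); the function names are hypothetical, and the real check is the kernel’s taskCHECK_FOR_STACK_OVERFLOW() macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE   0xa5U  /* FreeRTOS fill value (tskSTACK_FILL_BYTE) */
#define STACK_GUARD_BYTES 16U    /* bytes examined by the method 2 check */

/* Mimics task creation: the whole stack is painted with the fill byte. */
void stack_fill( uint8_t *pucStackLow, size_t xStackBytes )
{
    memset( pucStackLow, STACK_FILL_BYTE, xStackBytes );
}

/* Returns 1 if any guard byte at the low end of a downward-growing stack
 * has been overwritten, i.e. the task reached the end of its stack. */
int stack_guard_overwritten( const uint8_t *pucStackLow, size_t xStackBytes )
{
    size_t x;

    assert( xStackBytes >= STACK_GUARD_BYTES );

    for( x = 0; x < STACK_GUARD_BYTES; x++ )
    {
        if( pucStackLow[ x ] != STACK_FILL_BYTE )
        {
            return 1;
        }
    }

    return 0;
}
```

Because the check only runs at a context switch and only looks at those 16 bytes, it can miss an overflow that skips over the guard region or faults before the next switch, so a clean result does not fully rule out an overflow.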

I still think stepping through the asm code will tell you a lot about how the variable changed.
