I’m using FreeRTOS V9.0.0 on a Cortex-M4 MCU (NXP LPC54605J512), with the heap_1 memory allocation scheme (no malloc).
I have an issue that causes a hard fault, and have traced it to the xQueueGenericReceive() function in queue.c. When this function calls prvCopyDataFromQueue() (line 1270), the address held in pxQueue is wrong, but the address in xQueue IS correct (it points to a valid queue allocated on the heap). I can’t see how this can happen, as I expect xQueue and pxQueue always to point to the same place. The attempt to copy out of the queue causes the hard fault (it accesses a NULL pointer in the pcTail member).
Are you saying you are passing a value into a function as a parameter, and the parameter is wrong inside the function? Or that the macro that converts the opaque queue handle into a pointer is failing? In either case, it could be a stack overflow causing corruption. As that is just C code executing, it is hard to see what else it could be. Are you using a memory protection unit? Have you got configASSERT() and stack overflow detection in use?
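For reference, both checks are normally enabled in FreeRTOSConfig.h, with the overflow hook provided by the application. A minimal sketch, assuming a standard V9 port; the hook body is only a placeholder that halts so the debugger can show which task overflowed:

```c
/* FreeRTOSConfig.h (fragment) */

/* Halt with interrupts disabled so the debugger stops at the failing assert. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* Method 2: the kernel also checks that the end of each stack still holds
   the fill pattern written when the task was created. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* Application code (e.g. main.c): called by the kernel, at a context
   switch, when it detects an overflow. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;   /* inspect in the debugger to see which task overflowed */
    taskDISABLE_INTERRUPTS();
    for( ;; );
}
```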
Hi,
The pointer passed into the function (xQueue) is correct, at address 0x20016410, but pxQueue points to 0x20016a74, which isn’t a valid queue address. I expect the two pointers to hold the same value throughout the call to xQueueGenericReceive().
I don’t have the MPU protecting any run-time memory areas, configASSERT() resolves to a for(;;) loop, and configCHECK_FOR_STACK_OVERFLOW is set to 2. No asserts are triggered.
Can you suggest a way to detect whether stack overflow is the cause? I can detect the issue before the hard fault occurs and examine the TCB for this task. Note that the issue always happens in the context of the same task.
If I understand correctly, xQueue is passed in as 0x20016410, and then the line
Queue_t * const pxQueue = ( Queue_t * ) xQueue;
converts 0x20016410 to 0x20016a74. Those are very different numbers.
So effectively you have:
Queue_t * const 0x20016a74 = 0x20016410;
The only thing I can suggest is stepping through the code at assembly
level to see what is happening. 0x20016410 should be in r0 when you
enter the function. Is it? Or does r0 already contain 0x20016a74?
Could it be that a context switch to another task (and therefore stack
frame) is occurring without you realising?
The problem is trapping the condition before the corruption has occurred: I don’t know the sequence of events leading up to it. The caller successfully uses the queue thousands of times before the corrupted call, so stepping through the corruption doesn’t look possible. I don’t think the copy from xQueue to pxQueue is at fault here; I believe a context switch or something else touching the registers is happening.
If stack overflow is the issue, is there any way I can confirm it? Once the pointer is corrupt, can I look at the TCB to see if there’s a bad pointer there (or anything else that might indicate an overflow)? I’ll also try temporarily doubling the stack size for all tasks to see if the problem disappears (although the task stack size is already 1024 bytes).
You can always inspect the TCB in the debugger by inspecting the
pxCurrentTCB variable. You might need to cast it to a tskTCB structure
to get the debugger to show the structure members - for example -
“(tskTCB*)pxCurrentTCB” in the variables/expressions window to see the
variable.
If you have configCHECK_FOR_STACK_OVERFLOW set to 2 then that should
tell you if there is a stack overflow. You can also view the stack
manually in the debugger by viewing the memory - the stack address can
be obtained by viewing the pxCurrentTCB variable as described above.
I still think stepping through the asm code will tell you a lot about
how the variable changed.