0x06060606 returned from xQueueReceive()

pbeamtn wrote on Thursday, June 17, 2010:

I have introduced some weird bug into previously working code, and I can’t for the life of me find it. 

In an ISR, I have the following code:

                       if (xQueueSendFromISR(scan_q, &scandata, &TaskWoken) != pdTRUE) {

which basically just passes a pointer to a task.  scan_q is created with a max length of 2 and an item size of 4 - this is an ARM7.
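
For readers unfamiliar with copy-by-value queues, here is a minimal host-side sketch of what "passing a pointer" through such a queue amounts to. It is not FreeRTOS code: `toy_queue_t`, `toy_send`, `toy_receive`, `toy_round_trip` and the contents of `struct scan_data` are hypothetical names used only for illustration, and the slot size is `sizeof(pointer)` on the host rather than the 4 bytes of the ARM7. The point is that the sender hands over the address of its pointer variable, and the queue copies the pointer value itself into a slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the payload; the thread only names "count". */
struct scan_data { int count; };

/* Toy copy-by-value queue: 2 slots, each the size of one pointer. */
#define TOY_QUEUE_LENGTH 2
typedef struct {
    unsigned char storage[TOY_QUEUE_LENGTH][sizeof(struct scan_data *)];
    size_t head;   /* next slot to read */
    size_t count;  /* slots currently in use */
} toy_queue_t;

/* Copies one pointer-sized item FROM *item into the next free slot. */
bool toy_send(toy_queue_t *q, const void *item)
{
    assert(q != NULL && item != NULL);
    if (q->count == TOY_QUEUE_LENGTH) {
        return false;                       /* queue full */
    }
    size_t tail = (q->head + q->count) % TOY_QUEUE_LENGTH;
    memcpy(q->storage[tail], item, sizeof(struct scan_data *));
    q->count++;
    return true;
}

/* Copies the oldest item out of the queue INTO *out. */
bool toy_receive(toy_queue_t *q, void *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0) {
        return false;                       /* queue empty */
    }
    memcpy(out, q->storage[q->head], sizeof(struct scan_data *));
    q->head = (q->head + 1) % TOY_QUEUE_LENGTH;
    q->count--;
    return true;
}

/* Round trip: the receiver ends up holding the address the sender held. */
int toy_round_trip(void)
{
    static struct scan_data buffer = { 42 };
    struct scan_data *scandata = &buffer;   /* sender's pointer */
    struct scan_data *msg = NULL;           /* receiver's pointer */
    toy_queue_t q = { .head = 0, .count = 0 };

    if (!toy_send(&q, &scandata)) return -1;
    if (!toy_receive(&q, &msg)) return -1;
    return (msg == &buffer) ? msg->count : -1;
}
```

In this model the receiver's `msg` can only ever hold a value that was copied in by a sender, which is why a value such as 0x06060606 suggests the receive side was disturbed rather than the queue contents.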

In a task, I have the following:

            if (xQueueReceive(scan_q, &msg, 1000) == pdTRUE) {                                          // get data

                // msg now points to a struct scan_data

It used to be the case that I could then use msg as a pointer into my structure, i.e., msg->count, but now I get data aborts because msg has the value 0x06060606 instead of the value put in the queue (i.e. 0x208590).

I have verified the value from the ISR, and queried the memory indicated by the Queues window in IAR, and all that seems fine.   The problem is on the receive end.

If I alter the xQueueReceive statement and give it a timeout of 0, and then vTaskDelay(1000) when it fails, then the proper data is transferred, but I take a performance hit.

For the record, this is FreeRTOS 5.4.2 using EWARM on an Atmel AT91SAM7X512.  I use this exact same logic in other places, and it appears to work, so I have apparently overflowed a stack somewhere or something else weird.  Does 0x06060606 ring a bell? 

davedoors wrote on Friday, June 18, 2010:

0x06060606 is the value that r6 is initialized with when a task is created. Have you changed the C start code (crt0) recently? It could be that the ARM7 is in the wrong mode, although that would normally cause a crash when you attempt to start the scheduler.

Have you checked that you are not getting a stack overflow? That is also a likely cause.
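
As a small aid for spotting this kind of value while debugging, the sketch below flags any word whose four bytes are identical and non-zero, which is the shape of the R6 seed mentioned above. The helper names `looks_like_register_seed` and `check_thread_values` are hypothetical and not part of FreeRTOS, and the thread only confirms the seed for R6; treating every repeated-byte word as a possible seed is an assumption that can give false positives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when all four bytes of `value` are the same non-zero byte,
 * e.g. 0x06060606. Assumption: register seeds follow this repeated-byte
 * shape; only the R6 value is confirmed in the thread.
 */
bool looks_like_register_seed(uint32_t value)
{
    uint8_t low = (uint8_t)(value & 0xFFu);
    return low != 0 && value == (uint32_t)low * 0x01010101u;
}

/* The two values quoted in the thread. */
void check_thread_values(void)
{
    assert(looks_like_register_seed(0x06060606u));   /* what the task received */
    assert(!looks_like_register_seed(0x00208590u));  /* what the ISR queued */
}
```

A hit from a check like this hints that a pointer was loaded from a register or saved context that was never written with real data, rather than from the queue storage.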


pbeamtn wrote on Friday, June 18, 2010:

No, I have not changed any startup files.  I was already using stack overflow checking, and I set it to 2 to see if it would catch anything, but it doesn’t.  I had seen in port.c that R6 is set to 0x06060606, and I thought that was a clue, but I really don’t understand it.  It would lead me to believe I am overflowing something somewhere that FreeRTOS cares about, since the odds of me creating that value are almost nil.

The really annoying thing is that this error only occurs if xQueueReceive has to wait on the queue.  I’m going to work on finding a way to set the break point in this routine so I can see why waiting on the queue makes a difference.  It’s tough with jlink and limited breakpoints.

pbeamtn wrote on Friday, June 18, 2010:

Well, I think I am stuck on this, and I’m just going to have to poll the queue unless I get some more information.  The stack for the task looks like it has plenty of space.  None of the task stack checks seems to find a problem; however, R6 gets corrupted in xQueueGenericReceive().  When the function is first called, R6 holds the destination, and it looks reasonable.  When the task wakes back up after something is put in the queue, R6 has something like 0xeff42, which is not a valid RAM address and not what R6 used to be.  Then, by the time I get back to my task, R6 is 0x06060606.  Before xQueueReceive() is called, R6 is 0x06060606, so I don’t know if this is a context save problem or something else.

edwards3 wrote on Friday, June 18, 2010:

Personally I would not say just polling the queue is a long-term solution. You have found that a problem exists, and need to find its source, otherwise it will bite you again later.

My guess is that it is another task that is corrupting the context of the task that calls xQueueReceive(). Can you put a data break point on the RAM location that is being corrupted to stop the debugger when the RAM gets written to?

Have you allocated a stack to IRQ mode in your C start code?

pbeamtn wrote on Friday, June 18, 2010:

You’re right.  I need to find the problem now or risk something terrible in the future.  Just getting tired of beating my head against the wall.

I haven’t done anything to the IAR startup files.  These were basically copied from the initial FreeRTOS demo.  My understanding is the IRQ stack is set by the linker options, which is 0x100.  I have upped that to 0x120, and it made no difference.

I’ve thought about a data break point, but I think I would have to put it on the saved context of R6.  I guess I am going to have to trace that and see where it gets put.

pbeamtn wrote on Friday, June 18, 2010:

Well, I’ve tracked it down, I think, but still not quite sure what to do about it.

I have implemented in an ISR the following code:
                    case 101:
                        xSemaphoreGiveFromISR(spi_dma_complete, &TaskWoken);

spi_dma_complete is a binary semaphore that I use to flag to a task that a DMA operation on the SPI is complete.  This is not associated with the queue I have been having trouble with, but eliminating this one line eliminates my R6 corruption problem.  Really weird.  I thought DMA might be dangerous and could corrupt things, but leaving the DMA logic intact and just not signaling that it is complete is perfectly fine.

I guess it is possible I could be overflowing the ISR stack and it affects the first task, which is the task being corrupted.  That task just happens to spend most of its time waiting for a queue.  That is, at least, the current theory, but I have already doubled the ISR stack in the IAR linker.

richard_damon wrote on Friday, June 18, 2010:

One possibility this points to: could that ISR have a priority set such that it nests with another ISR that uses FreeRTOS APIs? That can cause corruption of FreeRTOS structures.

pbeamtn wrote on Monday, June 21, 2010:

I believe I have nested interrupts.  One ISR actually enables the SPI interrupt.  I somehow missed the memo that this corrupts FreeRTOS.  So, I have put the OS calls inside a critical section, and things appear to work now.  I need to do more testing on the solution, but I am pretty sure that is the problem.

Thanks for your help!

pbeamtn wrote on Monday, June 21, 2010:

I do have nested interrupts.  I copied some logic from a dsPIC project without FreeRTOS that used different priority interrupts.  In this case, it is bad.  I have an edge-triggered interrupt that enables an SPI interrupt.  The SPI interrupt is a higher priority, so it would fire in the middle of my ISR and potentially issue OS calls.  I may still rework my logic to do things in a less complex manner, but for now, changing the priority of the SPI interrupt to be the same as the edge interrupt has stabilized things.
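
To illustrate why the nesting described above is hazardous, here is a minimal host-side toy model of a lost update. It is not what FreeRTOS does internally and it does not reproduce the R6 corruption seen in this thread; `toy_kernel_t`, `toy_give_from_isr`, `toy_spi_isr` and the two `toy_events_*` functions are hypothetical names. The sketch only shows the general failure mode: a non-reentrant update of shared state that is interrupted between its read and its write by another handler touching the same state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for kernel state that every ISR-side call updates. */
typedef struct {
    int pending_events;
} toy_kernel_t;

/* Signature of an interrupt handler that may fire in the middle of another. */
typedef void (*toy_isr_fn)(toy_kernel_t *kernel);

/*
 * Deliberately non-reentrant update: read, optionally get preempted, then
 * write back. If `nested` is non-NULL it runs between the read and the write,
 * which models a higher-priority interrupt firing inside this one.
 */
void toy_give_from_isr(toy_kernel_t *kernel, toy_isr_fn nested)
{
    assert(kernel != NULL);
    int snapshot = kernel->pending_events;   /* read shared state */
    if (nested != NULL) {
        nested(kernel);                      /* nested ISR runs here */
    }
    kernel->pending_events = snapshot + 1;   /* write back snapshot + 1 */
}

/* Models the SPI handler, which also signals the kernel. */
void toy_spi_isr(toy_kernel_t *kernel)
{
    toy_give_from_isr(kernel, NULL);
}

/* SPI interrupt nests inside the edge interrupt: one event is lost. */
int toy_events_when_nested(void)
{
    toy_kernel_t kernel = { 0 };
    toy_give_from_isr(&kernel, toy_spi_isr);
    return kernel.pending_events;
}

/* Same priority: the SPI handler only runs after the edge handler returns. */
int toy_events_when_serialised(void)
{
    toy_kernel_t kernel = { 0 };
    toy_give_from_isr(&kernel, NULL);
    toy_spi_isr(&kernel);
    return kernel.pending_events;
}
```

In this model two signals are issued in both cases, but the nested case records only one, while the serialised case records both. Giving the two interrupts the same priority, or wrapping the OS calls in a critical section, corresponds to the serialised case.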