PIC32 - crash during context switch

system · November 12, 2010, 4:47pm

piero74 wrote on Friday, November 12, 2010:

Hi all.

We found a very strange situation.

Our PIC32 crashed, and we verified the following:

- The stack pointer held a flash address (it pointed to xISRStack address)
- Using the tracer, it seemed that the micro stopped inside the portSAVE_CONTEXT code, after this instruction:
/* Swap to the system stack. */
la sp, xISRStackTop
- Changing the application code slightly (even adding a NOP) removed the crash, probably because the timing changed and the critical situation no longer occurred

We are using:
- FreeRTOS V6.0.2
- configMAX_SYSCALL_INTERRUPT_PRIORITY 0x06, and we have several ISRs that use the RTOS API and run at lower priorities

It is only a theory (we may be wrong): our doubts concern the portSAVE_CONTEXT macro, especially these lines:

/* Swap to the system stack. */
la sp, xISRStackTop
lw sp, (sp)

What happens if an interrupt occurs between these two instructions? Would SP be corrupted, as we found?
Would it be useful to wrap these instructions in a critical section, blocking interrupts?

We look forward to any feedback.

Thanks to all in advance
Piero

system · November 12, 2010, 4:54pm

edwards3 wrote on Friday, November 12, 2010:

Are you sure that interrupts are enabled when these lines are executed? The MIPS core will disable all interrupts when an interrupt is taken, and then the scheduler selectively enables and disables up to a certain priority value during the context switch code. I would be surprised if the lines you have quoted are executed while interrupts can clobber the register values.

system · November 12, 2010, 5:06pm

piero74 wrote on Friday, November 12, 2010:

Thanks for your reply.

We thought the same, but reading the code, it seems that the scheduler enables interrupts BEFORE these lines:

/* Enable interrupts above the current priority. */
srl k0, k0, 0xa
ins k1, k0, 10, 6
ins k1, zero, 1, 4

/* s5 is used as the frame pointer. */
add s5, zero, sp

/* Check the nesting count value. */
la k0, uxInterruptNesting
lw s6, (k0)

/* If the nesting count is 0 then swap to the system stack, otherwise
the system stack is already being used. */
bne s6, zero, .+20
nop

/* Swap to the system stack. */
la sp, xISRStackTop
lw sp, (sp)

At this point, Richard, can you help us? Any idea regarding the crash?

Bye
Piero

rtel · November 12, 2010, 5:28pm

rtel wrote on Friday, November 12, 2010:

I’ve just had a look at the code, and I think edwards3 could be right here.

In the code you are referring to, my quick inspection (which I’m willing to be proven wrong on) is that:

0) Interrupts are globally disabled on interrupt entry.
1) K1 is loaded with the priority of the executing interrupt.
2) The interrupt nesting count is inspected and if it is 0 a swap is made to the system stack - this is where the code you highlighted is executed.
3) The interrupt nesting count is incremented and saved.
4) Only then is the interrupt mask set to that of the currently executing interrupt - effectively enabling interrupts above the priority of the currently executing interrupt. In my 6.1.0 version this is done on line 106 by the instruction “mtc0 k1, _CP0_STATUS” (remember the interrupt priority was saved into k1).

I hope this clears this up. Please tell me if you still think there is an issue.

Regards.

rtel · November 12, 2010, 5:29pm

rtel wrote on Friday, November 12, 2010:

Just saw your next post - interrupts are enabled after the snippet you have posted (see my first post). The code you show only loads K1 with the value that the interrupt mask is later set to. The comment is perhaps a bit misleading.
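
To make the point concrete, the three instructions under the "Enable interrupts above the current priority" comment can be modelled on a host PC in plain C. This is only an illustrative sketch, not the port source: the helper names ins and compute_new_status are made up here, and it assumes the usual MIPS32 layout in which the requested priority sits in Cause bits 15:10 and the IPL field sits in Status bits 15:10. The model only computes a value; nothing takes effect until that value is written to the Status register by the later mtc0.

```c
#include <stdint.h>

/* Model of the MIPS32r2 "ins rt, rs, pos, size" instruction: copy the low
   `size` bits of rs into rt starting at bit `pos` (size < 32 assumed). */
static uint32_t ins(uint32_t rt, uint32_t rs, unsigned pos, unsigned size)
{
    uint32_t mask = ((1u << size) - 1u) << pos;
    return (rt & ~mask) | ((rs << pos) & mask);
}

/* Hypothetical helper that mirrors the three quoted instructions.
   cause  = value read from CP0 Cause on interrupt entry (k0)
   status = value read from CP0 Status on interrupt entry (k1) */
static uint32_t compute_new_status(uint32_t cause, uint32_t status)
{
    uint32_t k0 = cause >> 10;   /* srl k0, k0, 0xa   : requested priority to bit 0  */
    uint32_t k1 = status;
    k1 = ins(k1, k0, 10, 6);     /* ins k1, k0, 10, 6 : copy it into the IPL field   */
    k1 = ins(k1, 0u, 1, 4);      /* ins k1, zero, 1, 4: clear bits 4:1 (incl. EXL, ERL) */
    return k1;                   /* still only a value held in a register            */
}
```

For example, a Cause value of 0x00000C00 (requested priority 3) and a Status value of 0x00000003 (IE and EXL set) give 0x00000C01: IPL is 3, EXL is cleared and IE stays set.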

Regards.

system · November 12, 2010, 9:41pm

piero74 wrote on Friday, November 12, 2010:

Hi Richard
Thanks for your feedback… now the code is clearer.

But we are back at the beginning:
we don’t know the reason for the crash.

The current code doesn’t show this bug (or it may just be hidden with the same sequence of operations; we saw it in the previous code when always performing the same operations on our board). We saw the MCU locked up with the PC at a strange address (it should be the address of the general exception handler). The Cause register didn’t help us, EPC gave the address of a piece of code that has run many times without trouble, and everything around it (variables, pointers) seemed OK. The only relevant thing is the SP, which pointed to flash (the task and ISR stacks have free space, no overflows).

I really don’t know what I should look for, and I’m worried that the bug can happen again.
Any ideas on how to debug it if we see it again?

Thanks
Piero

system · April 14, 2014, 8:03am

jussippi wrote on Monday, April 14, 2014:

Hi,

Have you seen this problem again?

We ran into a similar problem with FreeRTOS 7.6.0 and PIC32MX795F512L.

The first indication was that the code stopped in the general exception handler; the reason was “Bus Error Exception (instruction fetch)”. The EPC register was “A5A5A5A5”, which is the value the stack is initialized to. The task SP had been updated to an invalid value, and therefore the program counter got a value where no memory exists.
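
Since EPC held the stack fill pattern, one quick check is how much of a stack still contains untouched fill bytes. The sketch below is a host-side model only, assuming the 0xA5 fill byte reported above and a stack that grows downward from the end of its buffer; count_untouched_bytes is a hypothetical name. On target, the kernel's own uxTaskGetStackHighWaterMark() serves a similar purpose for task stacks.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u   /* fill value reported in this thread */

/* Count fill bytes from the low end of the buffer, which is the last part
   a downward-growing stack would reach. A result of 0 suggests the stack
   has been used all the way to its limit (possible overflow). */
static size_t count_untouched_bytes(const uint8_t *stack, size_t size)
{
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A large result for every task stack and for the ISR stack would support the "no overflow" observation and point the search toward something else writing a bad stack pointer.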

Further study showed a task stack pointer pointing into the ISR stack area.

This problem vanishes when the code / timing is altered, which is very scary unless we understand the reason.

There seems to be something seriously wrong in FreeRTOS scheduler / interrupt handling…

BR,

–Jussi

rtel · April 14, 2014, 8:48am

rtel wrote on Monday, April 14, 2014:

The most likely cause is misuse of the API with respect to interrupts.

Are you following the instructions in the “interrupt service routines” section of the following page: http://www.freertos.org/port_PIC32_MIPS_MK4.html - in particular, are you 100% certain that you are not using an interrupt safe FreeRTOS API function from an interrupt that has a priority above whatever you have set configMAX_SYSCALL_INTERRUPT_PRIORITY to?

Are you correctly wrapping any interrupts with the provided asm code?

Have you read the following page of the FAQ and ensured you are following the information contained there (like not calling functions that don’t end in “FromISR” from an interrupt)?
http://www.freertos.org/FAQHelp.html

etc.

Regards.

system · April 15, 2014, 7:08am

jussippi wrote on Tuesday, April 15, 2014:

Hi,

Thanks for the hints - although we had gone through all of that.

Anyway, you were right. We had a callback from the old SW interrupt that was using a generic function to read data. That function now uses a mutex (non-ISR) for data protection… which caused all the bad things.
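
A cheap way to catch this class of bug early is to put a guard in front of any shared helper that can block, so it complains as soon as it is entered from interrupt context. The following is a host-side sketch only: uxInterruptNesting here is a local stand-in for the port's nesting counter mentioned earlier in the thread, and fake_isr_enter, fake_isr_exit, read_shared_data and the return codes are hypothetical names. In a real project the guard would more likely be a configASSERT() placed just before the mutex take, and it would only cover interrupts that pass through the port's context-save code, since that is where the counter is incremented.

```c
/* Stand-in for the port's interrupt nesting counter so this builds on a PC. */
static volatile unsigned uxInterruptNesting = 0;

#define READ_OK           0
#define READ_ERR_IN_ISR (-1)     /* hypothetical error code */

static int shared_value = 42;    /* data normally protected by a mutex */

/* Hypothetical shared helper. On target, the mutex take/give would sit
   where the comments are; blocking calls are only legal from a task. */
static int read_shared_data(int *out)
{
    if (uxInterruptNesting != 0) {
        return READ_ERR_IN_ISR;  /* refuse to block inside an interrupt */
    }
    /* take the mutex here (task context only) */
    *out = shared_value;
    /* give the mutex here */
    return READ_OK;
}

/* Helpers that mimic what interrupt entry/exit does to the counter. */
static void fake_isr_enter(void) { uxInterruptNesting++; }
static void fake_isr_exit(void)  { uxInterruptNesting--; }
```

Called from a task, the helper returns READ_OK; called between fake_isr_enter() and fake_isr_exit(), it returns READ_ERR_IN_ISR instead of touching the mutex.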

BR,

–Jussi
