CortexM3/PSoC-5LP : Trouble identifying the task which caused an MPU exception

mbh-ashcroft wrote on Thursday, January 09, 2014:

Hello,

I’m implementing a “blue screen of death” for a PSoC-5LP (ARM Cortex M3 core) based product, and I’ve encountered a problem identifying which task was the offender that caused the stack overflow or other memory protection exception that I’m reporting in my BSoD. I’m using FreeRTOS v7.2.0.

Right now, I’m catching stack overflows and other memory errors based on a two-tiered approach: I have a vApplicationStackOverflowHook() function implemented (with configCHECK_FOR_STACK_OVERFLOW set to 2 in FreeRTOSConfig.h) so that the RTOS can do its stack size and stack canary checks when each task is swapped out, and I have an interrupt handler set up for the MPU exception interrupt (position 3 in the interrupt vector) to catch MPU exceptions (“hard faults”).
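To make the second tier concrete, here is a minimal host-side sketch of the kind of canary check that configCHECK_FOR_STACK_OVERFLOW set to 2 performs when a task is swapped out. The function names and the CANARY_BYTES value are assumptions for illustration (the exact number of bytes checked differs between kernel versions); only the 0xa5 fill byte is taken from the kernel.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u /* FreeRTOS fills new task stacks with 0xa5 */
#define CANARY_BYTES    16u   /* assumed size of the checked region */

/* Fill a freshly allocated stack, as the kernel does at task creation.
 * (Hypothetical helper, not a FreeRTOS API.) */
void stack_fill(uint8_t *stack_low, size_t size_bytes)
{
    memset(stack_low, STACK_FILL_BYTE, size_bytes);
}

/* Return 1 if the bytes at the stack limit still hold the fill pattern,
 * 0 if something has written over them. The stack grows downwards on
 * Cortex-M, so the limit is the lowest address of the stack buffer. */
int stack_canary_intact(const uint8_t *stack_low)
{
    for (size_t i = 0; i < CANARY_BYTES; i++) {
        if (stack_low[i] != STACK_FILL_BYTE) {
            return 0;
        }
    }
    return 1;
}
```

Because a check like this only runs at a context switch, an overflow that leaves the task’s permitted memory would be expected to raise the hardware fault first, before the hook ever gets a chance to run.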

The MPU exception interrupt handler triggers reliably, but I can’t get it to identify the offending task properly. I’m using xTaskGetCurrentTaskHandle() and pcTaskGetTaskName() to spit the task name to the display, and manually triggering faults for testing purposes by reducing the task stack of a selected task to something tiny (e.g. 10 words for a task whose stack usage is around 320 words), which is sure to cause a fault. The problem is that the task name I get never matches the task that I intentionally made to overflow its stack, and sometimes it is gibberish (which I assume means either that xTaskGetCurrentTaskHandle() returned the handle for the idle task, which doesn’t have a valid task name string pointer in its TCB, or that the TCB for the task which overflowed its stack has been corrupted). N.B. the vApplicationStackOverflowHook() function never seems to be called at all, even when I purposefully trigger a much smaller stack overflow (e.g. 4 bytes) which “shouldn’t” make the MPU angry - it’s always the interrupt handler triggered by an MPU exception that gets called.

I’ve got a few speculative ideas as to what might be going on (along the lines of “the mis-sized stack for TaskA is really causing TaskB to access memory in a region it can’t”), but I haven’t dug into the guts of the RTOS to the point where I’m confident my mental model of what’s going on is right in this case, and the debugger that Cypress ships with the IDE is nearly worthless. Does anyone have an idea as to what else might be causing my problem? Thanks.

dumarjo wrote on Thursday, January 09, 2014:

Hi,

I’m working on an M4 with the MPU (the same applies to the M3) and I
wrote some functions to help me. I use GDB, so the debugging is probably
better than with the Cypress tools, I don’t know.

My functions were inspired by other people’s code and by my reading on
hard faults, to get an idea of what is going on when we hit the fault.

//// code here /////

void printErrorMsg(const char * errMsg);
void printUsageErrorMsg(uint32_t CFSRValue);
void printBusFaultErrorMsg(uint32_t CFSRValue);
void printMemoryManagementErrorMsg(uint32_t CFSRValue);
void stackDump(uint32_t stack);

void HardFault_mpu_dump(void)
{
    static char msg[200];
    int i = 0;
    volatile uint32_t *mpu_rnr  = (volatile uint32_t *) 0xE000ED98; // MPU_RNR
    volatile uint32_t *mpu_base = (volatile uint32_t *) 0xE000ED9C; // MPU_RBAR
    volatile uint32_t *mpu_attr = (volatile uint32_t *) 0xE000EDA0; // MPU_RASR

    //First disable the UART interrupt

    printErrorMsg("-------MPU------\r\n");
    printErrorMsg("Privileged default memory map is ");
    if ((MPU->CTRL & 0x04) != 0) // PRIVDEFENA is bit 2 (bit 0 is ENABLE)
        printErrorMsg("Enabled\r\n");
    else
        printErrorMsg("Disabled\r\n");

    for (i = 0; i < 8; i++)
    {
        *mpu_rnr = i;
        if (*mpu_attr & 0x1) //If region is activated
        {
            uint32_t base   = *mpu_base & 0xFFFFFFE0;
            uint32_t log2sz = ((*mpu_attr & 0x3E) >> 1) + 1;       // region size is 2**(SIZE+1)
            uint32_t size   = (log2sz < 32) ? (1UL << log2sz) : 0; // a 4GB region wraps to 0
            uint32_t ap     = *mpu_attr & 0x07000000;              // access permission field

            sprintk(msg, "b:0x%08x - 0x%08x, sz:2**%d (%d), attr",
                    base, base + size, log2sz, size);
            printErrorMsg(msg);
            if (ap == portMPU_REGION_READ_WRITE)
            {
                printErrorMsg(": P-RW, U-RW");
            }
            else if (ap == portMPU_REGION_PRIVILEGED_READ_WRITE_USER_READ_ONLY)
            {
                printErrorMsg(": P-RW, U-RO");
            }
            else if (ap == portMPU_REGION_PRIVILEGED_READ_ONLY)
            {
                printErrorMsg(": P-RO");
            }
            else if (ap == portMPU_REGION_READ_ONLY)
            {
                printErrorMsg(": P-RO, U-RO");
            }
            else if (ap == portMPU_REGION_PRIVILEGED_READ_WRITE)
            {
                printErrorMsg(": P-RW");
            }
            printErrorMsg("\r\n");
        }
    }
}

void Hard_Fault_Handler(uint32_t stack)
{
    static char msg[80];
    printErrorMsg("\x1B[2J"); // ANSI escape sequence: clear the terminal

    printErrorMsg("In Hard Fault Handler\r\n");
    sprintk(msg, "SCB->HFSR = 0x%08x\r\n", SCB->HFSR);
    printErrorMsg(msg);
    if ((SCB->HFSR & (1 << 30)) != 0) // FORCED: escalated from another fault
    {
        printErrorMsg("Forced Hard Fault\r\n");
    }
    sprintk(msg, "SCB->CFSR = 0x%08x\r\n", SCB->CFSR);
    printErrorMsg(msg);
    if ((SCB->CFSR & 0xFFFF0000) != 0) { // usage fault status, bits 31:16
        printUsageErrorMsg(SCB->CFSR);
    }
    if ((SCB->CFSR & 0xFF00) != 0) {     // bus fault status, bits 15:8
        printBusFaultErrorMsg(SCB->CFSR);
    }
    if ((SCB->CFSR & 0xFF) != 0) {       // memory management status, bits 7:0
        printMemoryManagementErrorMsg(SCB->CFSR);
    }

    stackDump(stack);
    HardFault_mpu_dump();
    __asm("BKPT #0\r\n"); // Break into the debugger

    while(1);
}

void printErrorMsg(const char * errMsg)
{
    while(*errMsg != '\0')
    {
        while (!(COM2_PERIPHERAL->SR & USART_SR_TXE));
        COM2_PERIPHERAL->DR = (*errMsg & 0x1FF);
        ++errMsg;
    }
}

void printUsageErrorMsg(uint32_t CFSRValue)
{
    printErrorMsg("Usage fault: ");
    CFSRValue >>= 16; // right shift to lsb

    if((CFSRValue & (1<<9)) != 0)
    {
        printErrorMsg("Divide by zero\r\n");
    }
}

void printBusFaultErrorMsg(uint32_t CFSRValue)
{
    static char buf[200];
    printErrorMsg("Bus fault: \r\n");
    // The bus fault status bits are CFSR[15:8]. CFSR[1:0] are the memory
    // management IACCVIOL/DACCVIOL bits, which are reported by
    // printMemoryManagementErrorMsg(), so they are not tested here.
    if((CFSRValue & (1 << 8)) == (1 << 8)) //IBUSERR
        printErrorMsg("-->Instruction bus error\r\n");
    if((CFSRValue & (1 << 9)) == (1 << 9)) //PRECISERR
        printErrorMsg("-->Precise data bus error\r\n");
    if((CFSRValue & (1 << 10)) == (1 << 10)) //IMPRECISERR
        printErrorMsg("-->Imprecise data bus error\r\n");
    if((CFSRValue & (1 << 11)) == (1 << 11)) //UNSTKERR
        printErrorMsg("-->Bus fault on unstacking for a return from exception\r\n");
    if((CFSRValue & (1 << 12)) == (1 << 12)) //STKERR
        printErrorMsg("-->Bus fault on stacking for exception entry\r\n");
    if((CFSRValue & (1 << 13)) == (1 << 13)) //LSPERR
        printErrorMsg("-->Bus fault on floating-point lazy state preservation\r\n");
    if((CFSRValue & (1 << 15)) == (1 << 15)) //BFARVALID
    {
        printErrorMsg("-->Bus fault address register valid\r\n");
        sprintk(buf, "----> 0x%08X <------ Fault address\r\n", SCB->BFAR);
        printErrorMsg(buf);
    }
    printErrorMsg("Bus faults occur when an error response is received on the AHB bus. The common causes are as follows:\r\n"
                  "Attempts to access an invalid memory region (for example, a memory location with no memory attached)\r\n"
                  "The device is not ready to accept a transfer (for example, trying to access SDRAM without initializing the SDRAM controller)\r\n"
                  "Attempts to carry out a transfer with a transfer size not supported by the target device (for example, doing a byte access to a peripheral register that must be accessed as a word)\r\n"
                  "The device does not accept the transfer for various reasons (for example, a peripheral that can only be programmed at the privileged access level)\r\n");
}

void printMemoryManagementErrorMsg(uint32_t CFSRValue)
{
    static char buf[200];
    printErrorMsg("Memory Management fault: \r\n");
    CFSRValue &= 0x000000FF; // mask just mem faults
    if((CFSRValue & 0x01) == 0x01) //IACCVIOL
        printErrorMsg("-->Instruction access violation\r\n");
    if((CFSRValue & 0x02) == 0x02) //DACCVIOL
        printErrorMsg("-->Data access violation\r\n");
    if((CFSRValue & 0x08) == 0x08) //MUNSTKERR
        printErrorMsg("-->Memory manager fault on unstacking for a return from exception\r\n");
    if((CFSRValue & 0x10) == 0x10) //MSTKERR
        printErrorMsg("-->Memory manager fault on stacking for exception entry\r\n");
    if((CFSRValue & 0x20) == 0x20) //MLSPERR
        printErrorMsg("-->Memory manager fault on floating point lazy state preservation\r\n");
    if((CFSRValue & 0x80) == 0x80) //MMARVALID
    {
        printErrorMsg("-->Memory manager fault address register valid\r\n");
        sprintk(buf, "----> 0x%08X <------ Fault address\r\n", SCB->MMFAR);
        printErrorMsg(buf);
    }
}

// Order of the registers in the exception frame pushed by the core
enum { r0, r1, r2, r3, r12, lr, pc, psr};

void stackDump(uint32_t stack)
{
    static char msg[200];
    // 'stack' carries the MSP or PSP value, i.e. the address of the frame
    const uint32_t *frame = (const uint32_t *) stack;

    sprintk(msg, "r0 = 0x%08x\r\n", frame[r0]);
    printErrorMsg(msg);
    sprintk(msg, "r1 = 0x%08x\r\n", frame[r1]);
    printErrorMsg(msg);
    sprintk(msg, "r2 = 0x%08x\r\n", frame[r2]);
    printErrorMsg(msg);
    sprintk(msg, "r3 = 0x%08x\r\n", frame[r3]);
    printErrorMsg(msg);
    sprintk(msg, "r12 = 0x%08x\r\n", frame[r12]);
    printErrorMsg(msg);
    sprintk(msg, "lr = 0x%08x <- In gdb \"list *0x%08x\" to get the source code \r\n", frame[lr], frame[lr]);
    printErrorMsg(msg);
    sprintk(msg, "pc = 0x%08x <- In gdb \"list *0x%08x\" to get the source code \r\n", frame[pc], frame[pc]);
    printErrorMsg(msg);
    sprintk(msg, "psr = 0x%08x\r\n", frame[psr]);
    printErrorMsg(msg);
}

// Use the 'naked' attribute so that C stacking is not used.
__attribute__((naked))
void HardFault_Handler(void){
    /*
     * Get the appropriate stack pointer, depending on our mode,
     * and use it as the parameter to the C handler. This function
     * will never return
     */
    __asm( "TST lr, #4 \r\n"
           "ITE EQ \r\n"
           "MRSEQ r0, MSP \r\n"
           "MRSNE r0, PSP \r\n"
           "B Hard_Fault_Handler \r\n");
}

__attribute__((naked))
void MemManage_Handler(void){
    /*
     * Get the appropriate stack pointer, depending on our mode,
     * and use it as the parameter to the C handler. This function
     * will never return
     */
    __asm( "TST lr, #4 \r\n"
           "ITE EQ \r\n"
           "MRSEQ r0, MSP \r\n"
           "MRSNE r0, PSP \r\n"
           "B Hard_Fault_Handler \r\n");
}

__attribute__((naked))
void BusFault_Handler(void)
{
    /*
     * Get the appropriate stack pointer, depending on our mode,
     * and use it as the parameter to the C handler. This function
     * will never return
     */
    __asm( "TST lr, #4 \r\n"
           "ITE EQ \r\n"
           "MRSEQ r0, MSP \r\n"
           "MRSNE r0, PSP \r\n"
           "B Hard_Fault_Handler \r\n");
}

//// end of code //////
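The two pieces of decoding the handlers above rely on can be tried out on a PC before going near the target. The sketch below is only an illustration with hypothetical names (cfsr_fields_t, cfsr_split, exc_return_used_psp): it splits a CFSR value into its three status fields and tests bit 2 of the EXC_RETURN value in LR, which is what the TST lr, #4 instruction does.

```c
#include <stdint.h>

/* CFSR (address 0xE000ED28) packs three status fields into one word. */
typedef struct {
    uint8_t  mmfsr; /* bits 7:0   memory management fault status */
    uint8_t  bfsr;  /* bits 15:8  bus fault status               */
    uint16_t ufsr;  /* bits 31:16 usage fault status             */
} cfsr_fields_t;

/* Split a raw CFSR value into its three sub-registers. */
cfsr_fields_t cfsr_split(uint32_t cfsr)
{
    cfsr_fields_t f;
    f.mmfsr = (uint8_t)(cfsr & 0xFFu);
    f.bfsr  = (uint8_t)((cfsr >> 8) & 0xFFu);
    f.ufsr  = (uint16_t)(cfsr >> 16);
    return f;
}

/* EXC_RETURN bit 2 tells which stack holds the exception frame:
 * 0 = main stack (MSP), 1 = process stack (PSP). */
int exc_return_used_psp(uint32_t exc_return)
{
    return (exc_return & 0x4u) != 0;
}
```

For example, a CFSR of 0x00000082 gives mmfsr == 0x82 (MMARVALID plus DACCVIOL), and an EXC_RETURN of 0xFFFFFFFD means the frame is on the process stack, which is where FreeRTOS tasks run on the Cortex-M ports.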

This should help you a little bit. You will need to adapt it to your
environment, though.

Hope this helps!

Jonathan

mbh-ashcroft wrote on Friday, January 10, 2014:

Thanks for the reply, Jonathan. I didn’t wind up doing as comprehensive an info dump as you did in my MPU exception handler (my output destination for the BSoD of the device in question is a rather small LCD), but seeing your implementation was helpful.

I’m still curious why the task handle that the API function returned when called from the ISR doesn’t match the task that I expect caused the fault, but I’ve kind of punted on the issue under time pressure from $BOSS, and now I’m working on adding stack high water marks to vTaskGetRunTimeStats() such that I can allocate task stacks a little more intelligently and (hopefully) avoid BSoDs “in the wild” altogether.
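For reference, a high water mark can be derived from the same 0xa5 fill pattern the kernel writes into a new stack. The following is a host-side sketch of the idea, not the kernel’s code; stack_unused_words is a hypothetical name, and in a real build the existing uxTaskGetStackHighWaterMark() API reports this figure.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u /* pattern FreeRTOS writes into a new stack */

/* Count the fill bytes that are still untouched, starting at the lowest
 * address of the stack (the end a descending stack grows towards), and
 * return the result in 32-bit words. The smaller the number, the closer
 * the task has come to overflowing its stack. */
size_t stack_unused_words(const uint8_t *stack_low, size_t size_bytes)
{
    size_t untouched = 0;
    while (untouched < size_bytes && stack_low[untouched] == STACK_FILL_BYTE) {
        untouched++;
    }
    return untouched / sizeof(uint32_t);
}
```

A task that has used the top 24 bytes of a 64-byte stack would report 10 unused words with this scheme.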

rtel wrote on Friday, January 10, 2014:

Ref why the task name can be corrupted:

Depending on the direction of stack growth, that can happen when the stack overflow runs into the task control block (in which the task name is stored). It is a good point: ideally the order in which the stack and the task control block are allocated should depend on the direction of stack growth, to ensure that never happens. You can always get the handle of the offending task by inspecting the pxCurrentTCB variable, which can be externed as a void * outside of the tasks.c file.
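Since the name stored in the task control block may itself have been overwritten, one defensive option is to sanitise it before it reaches the display. This is a sketch under assumptions: task_name_sanitize is a hypothetical helper, and max_name_len is expected to be the application’s configMAX_TASK_NAME_LEN setting.

```c
#include <stddef.h>

/* Copy a possibly corrupted task name into 'out', reading no more than
 * max_name_len characters and replacing anything that is not printable
 * ASCII with '?'. The result is always NUL terminated (when out_len > 0).
 * Returns the number of characters written, excluding the terminator. */
size_t task_name_sanitize(const char *name, size_t max_name_len,
                          char *out, size_t out_len)
{
    size_t n = 0;

    if (out_len == 0) {
        return 0;
    }
    if (name != NULL) {
        while (n < max_name_len && n + 1 < out_len && name[n] != '\0') {
            unsigned char c = (unsigned char)name[n];
            out[n] = (c >= 0x20 && c <= 0x7E) ? (char)c : '?';
            n++;
        }
    }
    out[n] = '\0';
    return n;
}
```

A name that comes out as a run of question marks is then a fairly strong hint that the overflow reached the task control block, rather than that the wrong handle was returned.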

Ref why a different task handle would be returned:

No idea I’m afraid.

Ref why the memory fault exception occurs before stack overflow detection:

The fault is a hardware trap that fires before the stack actually overflows and, because of that, can technically be recovered from. The software stack overflow check runs after the stack has already overflowed (or at least come extremely close to overflowing), so it can’t really be recovered from, as you may not know what was corrupted by the overflow.

Regards.