dsPIC33CK, Address Error with high interrupt rates

jackM · March 10, 2023, 1:08pm

Hello,
I tried to condense the issue into the title. A little background:
I have an application running on a dsPIC33CK64MP502 (dsPIC33CK core, 64k Flash, 8k RAM).
The FreeRTOS port is the v9.0.0 one from the Microchip GitHub repository MicrochipTech/freeRTOS-PIC24-dsPIC-PIC32MM (FreeRTOS demos for Microchip device families such as PIC24, dsPIC33E, dsPIC33F, dsPIC33C and PIC32MM).
I should mention that I do not use the heap: all kernel objects are allocated statically, and all interrupts that deal with kernel objects have the same priority as the kernel timer interrupt.

The application uses CAN bus for communication, and to test its robustness I began flooding the bus with frames whose ID is accepted by the application.

The sequence of operations is:
1) A CAN interrupt fires.
2) If it is an RX interrupt, check which FIFO has data available, read the data into a CAN frame structure (declared as a local variable inside the ISR), then put the frame in a queue (using the ISR version of the call).
3) The CAN interrupt ends.

There are multiple queues; the FIFO number decides which queue is filled.
Other tasks wait for data from the queues.
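
The sequence above can be modelled off-target with a minimal C sketch. Everything named here is an assumption for illustration: the CanFrame layout, NUM_RX_FIFOS, QUEUE_DEPTH and the FrameQueue array, which stands in for a kernel queue (on target the enqueue would be xQueueSendFromISR() on a QueueHandle_t). It is not the application's actual ISR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_RX_FIFOS 2u  /* hypothetical: one queue per RX FIFO */
#define QUEUE_DEPTH  8u  /* hypothetical queue length */

/* Platform-agnostic frame; field names are assumed, not from the post. */
typedef struct {
    uint32_t id;
    uint8_t  dlc;
    uint8_t  data[8];
} CanFrame;

/* Host stand-in for a kernel queue (a QueueHandle_t on target). */
typedef struct {
    CanFrame slots[QUEUE_DEPTH];
    size_t   count;
} FrameQueue;

static FrameQueue rx_queues[NUM_RX_FIFOS];

/* Non-blocking enqueue, like the FromISR calls: false means "queue full". */
static bool frame_queue_send_from_isr(FrameQueue *q, const CanFrame *frame)
{
    if (q->count == QUEUE_DEPTH) {
        return false;
    }
    q->slots[q->count] = *frame;
    q->count++;
    return true;
}

/* Body of the RX interrupt. `frame` is an auto variable, so it is placed on
   whatever stack is active at the moment the interrupt fires. */
bool can_rx_isr_body(unsigned fifo_index, uint32_t raw_id,
                     uint8_t dlc, const uint8_t *payload)
{
    CanFrame frame;
    uint8_t i;

    assert(payload != NULL);  /* host-side check only */
    if (fifo_index >= NUM_RX_FIFOS || dlc > 8u) {
        return false;
    }
    frame.id = raw_id;
    frame.dlc = dlc;
    for (i = 0u; i < 8u; i++) {
        frame.data[i] = (i < dlc) ? payload[i] : 0u;
    }
    /* The FIFO number selects the queue. */
    return frame_queue_send_from_isr(&rx_queues[fifo_index], &frame);
}
```

Under a flood the queue fills up and the enqueue starts returning false, so the return value is worth checking (or counting) rather than ignoring.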

I’ve noticed that when the bus traffic is reasonable (occasional bursts, spaced some tens of ms apart), the CAN frame variable inside the ISR is always allocated at the same address (and it seems to be inside the IDLE task stack).
When the bus is being flooded by a constant stream of frames, one every 1 ms, the CAN frame variable is placed at a higher address.
This goes on up to some point (it doesn’t happen systematically) until an address error trap is generated.
Unfortunately I can’t investigate the trap, as the stack pointer is corrupted (it shows 0x04 instead of a valid address greater than 0x1000), and I haven’t been able to find the exact moment when the stack is corrupted.

I assume the problem lies somewhere in the CAN interrupt and in the context switching.

Here are the questions:
1) Am I correct in my assumption that the interrupt uses the IDLE task stack?
2) If so, can I change the stack to the “CAN Task” one?
3) Why does the address of the CAN frame variable in the ISR change depending on the interrupt rate? I ask because I think I might be overflowing the IDLE task stack even though the kernel doesn’t detect the overflow.

My freeRTOSConfig.h file:

#define configSUPPORT_DYNAMIC_ALLOCATION          0
#define configUSE_PREEMPTION                      1
#define configUSE_IDLE_HOOK                       1
#define configUSE_TICK_HOOK                       1
#define configTICK_RATE_HZ                        ((TickType_t)1000)
#define configCPU_CLOCK_HZ                        ((unsigned long)4000000) //Fosc/2
#define configMAX_PRIORITIES                      (4)
#define configMINIMAL_STACK_SIZE                  (300)
//#define configTOTAL_HEAP_SIZE                     ((size_t)5000)
#define configMAX_TASK_NAME_LEN                   (16)
#define configQUEUE_REGISTRY_SIZE                 (16)
#define configUSE_16_BIT_TICKS                    1
#define configIDLE_SHOULD_YIELD                   1
#define configUSE_MUTEXES                         1
#define configUSE_TIMERS                          1

#define configTIMER_TASK_PRIORITY                 13
#define configTIMER_QUEUE_LENGTH                  4
#define configTIMER_TASK_STACK_DEPTH              configMINIMAL_STACK_SIZE

#define configUSE_CO_ROUTINES                     0

/* Set the following definitions to 1 to include the API function, or zero
to exclude the API function. */

#define INCLUDE_vTaskPrioritySet                  0
#define INCLUDE_uxTaskPriorityGet                 0
#define INCLUDE_vTaskDelete                       0
#define INCLUDE_vTaskCleanUpResources             0
#define INCLUDE_vTaskSuspend                      1
#define INCLUDE_vTaskDelayUntil                   1
#define INCLUDE_vTaskDelay                        1
#define INCLUDE_xEventGroupSetBitFromISR          1
#define INCLUDE_xTimerPendFunctionCall            1
#define INCLUDE_xSemaphoreGetMutexHolder          1
#define INCLUDE_uxTaskGetStackHighWaterMark       1

#define configKERNEL_INTERRUPT_PRIORITY           0x01

#define pdMS_TO_TICKS(xTimeInMs)                  ((TickType_t)(((TickType_t)(xTimeInMs))) * (((TickType_t)configTICK_RATE_HZ)/(TickType_t)1000))

#define configCHECK_FOR_STACK_OVERFLOW            2

#define configASSERT( x )                         __conditional_software_breakpoint( x )

//Priorities
#define appPRIORITY_IDLE                          1
#define appPRIORITY_LOW                           2
#define appPRIORITY_MID                           2
#define appPRIORITY_HIGH                          3

PaulB-AWS · March 10, 2023, 5:40pm

Hello Jack,

In general, ISRs run using a “system” or ISR stack which is separate from the stack used by any particular task (including the IDLE task). My knowledge of the 16-bit PIC architecture is quite limited but I did notice a few things:

Your config does not mention configSUPPORT_STATIC_ALLOCATION (which defaults to 0) and also sets configSUPPORT_DYNAMIC_ALLOCATION to 0, which should result in a compilation error. When configSUPPORT_STATIC_ALLOCATION is enabled, the kernel calls vApplicationGetIdleTaskMemory to determine where to store the stack of the IDLE task.
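
For reference, a static-allocation setup along the lines described above might look like the following sketch. The callback name and signature are the ones FreeRTOS v9/v10 expects when configSUPPORT_STATIC_ALLOCATION is 1. The typedefs and macros at the top are host-only stand-ins (assumptions) so the fragment compiles without the kernel headers; on target they would come from FreeRTOS.h and task.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-only stand-ins (assumptions). On target, remove these and include
   "FreeRTOS.h" and "task.h" instead. */
typedef uint16_t StackType_t;                         /* assumed stack word width */
typedef struct { uint8_t opaque[64]; } StaticTask_t;  /* real size differs */
#define configMINIMAL_STACK_SIZE 300                  /* value from the posted config */
#define configASSERT(x) assert(x)

/* Statically allocated TCB and stack for the idle task. */
static StaticTask_t idle_task_tcb;
static StackType_t  idle_task_stack[configMINIMAL_STACK_SIZE];

/* Called by the kernel when the scheduler starts, to learn where the idle
   task's TCB and stack live. */
void vApplicationGetIdleTaskMemory(StaticTask_t **ppxIdleTaskTCBBuffer,
                                   StackType_t **ppxIdleTaskStackBuffer,
                                   uint32_t *pulIdleTaskStackSize)
{
    configASSERT(ppxIdleTaskTCBBuffer != NULL);
    configASSERT(ppxIdleTaskStackBuffer != NULL);
    configASSERT(pulIdleTaskStackSize != NULL);

    *ppxIdleTaskTCBBuffer   = &idle_task_tcb;
    *ppxIdleTaskStackBuffer = idle_task_stack;
    *pulIdleTaskStackSize   = configMINIMAL_STACK_SIZE;  /* in words, not bytes */
}
```

With configUSE_TIMERS set to 1, the kernel also expects a matching vApplicationGetTimerTaskMemory() callback for the timer service task.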

In port.c, the xISRStack is declared to be of size configISR_STACK_SIZE. Perhaps this config item needs to be increased for your use case? This seems to also be missing from your FreeRTOSConfig.h

To address your questions directly:

Usually this is not the case. From a brief look at the port code, I assume that ISRs run using the xISRStack. It depends on the implementation of your CAN bus ISR and if it quickly hands off data to a high priority task for processing. Can you share the code used for CAN bus communications?

Can you double check that you provided the correct FreeRTOSConfig.h?

It’s hard to say without looking at the code in question. It could be due to interrupt nesting. The dsPIC33 reference manual has a helpful description on page 13.

jackM · March 10, 2023, 6:12pm

Hello Paul,
Thanks for answering.
Weird. It seems I copied the file starting from the second line.
The first line was the define of configSUPPORT_STATIC_ALLOCATION (which is set to “1”).

I have to think about the separate ISR stack. AFAIU, when an interrupt happens the return address (the instruction following the ISR call) is pushed onto the stack, followed by prologue, code, epilogue and return. This should mean that the ISR could corrupt whichever stack is in use at that moment, so there must be some way for the kernel to switch the stack pointer from the task stack to another stack when an interrupt happens. I shall study port.c more carefully then, because I was not aware of the existence of “ISRStack”, or of its obvious necessity, until I began typing this reply. It’s strange that I missed something so fundamental when reading the docs (my main sources are the free book and the online documentation).
But this sparks another question: what if my application also has parts of the code that live outside of the RTOS (a common case is protocol emulation with hard real-time requirements)? Don’t answer yet, I want to study the subject over the weekend.

Unfortunately I can’t share the CAN code at the moment, but I have already ruled out memory leaks from those functions. Two things happen there. First, the received frame is copied from the peripheral into another structure in which the data is formatted to be “platform agnostic”: the ID is reassembled and the flags are derived. There is no memcpy and no operation on pointers/arrays that could run away. Then the frame is added to a FreeRTOS queue using the appropriate ISR function.

I can also safely rule out interrupt nesting in this case, as all interrupts share the same priority, and interrupts with the same priority can’t preempt each other. Although that would explain everything.

Thanks for the help, I will study some more and report back.

richard-damon · March 11, 2023, 1:55am

From my memory, and it’s been a while since I have done much with the dsPIC family, the processor doesn’t use a separate ISR stack (as the processor didn’t support it) and the port doesn’t support interrupt nesting.

The ISR will always use the stack of whatever task was running, so ALL tasks need enough slack space on their stack to handle the worst-case ISR.

If somehow the ISR re-enables the interrupts, and thus allows the interrupts to nest, that could cause the ISR stack usage to keep on growing. I seem to remember that, given the way task switching at the end of an ISR works, doing it incorrectly (by not clearing the flag on entry to the ISR) could in extreme cases cause the stack to keep growing.
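
As a rough illustration of the two points above, the stack budget can be modelled with simple arithmetic. The function names and the numbers in the usage note are hypothetical; real figures would come from the map file or from uxTaskGetStackHighWaterMark().

```c
#include <assert.h>
#include <stdint.h>

/* Words an interrupted task's stack must hold in the worst case:
   the task's own peak usage plus one ISR frame per nesting level. */
uint32_t required_stack_words(uint32_t task_peak_words,
                              uint32_t isr_frame_words,
                              uint32_t nest_levels)
{
    return task_peak_words + isr_frame_words * nest_levels;
}

/* How many ISR frames fit in the headroom left above the task's own
   peak usage. */
uint32_t isr_frames_that_fit(uint32_t stack_words,
                             uint32_t task_peak_words,
                             uint32_t isr_frame_words)
{
    assert(isr_frame_words > 0u);
    if (task_peak_words >= stack_words) {
        return 0u;
    }
    return (stack_words - task_peak_words) / isr_frame_words;
}
```

For example, with a 300-word stack, a task that peaks at 200 words and an ISR that needs 40 words, required_stack_words(200, 40, 1) is 240 and fits, while three nested levels need 320 words and overflow; isr_frames_that_fit(300, 200, 40) is 2.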

jackM · May 9, 2024, 3:27pm

Hello richard,
I don’t know why I never followed up.
Anyway, more than a year later I am revisiting the issue, as I’m thinking about upgrading the compiler, and possibly moving from FreeRTOS 9 to whatever the current version is. (I did some tests a while ago with the latest 10.5.x and I just had to replace the port files with the ones supplied by Microchip.)

What you say is exactly my conclusion: the interrupt does not have its own stack, so each task stack has to account for the worst-case interrupt scenario. Bummer.
Can I create another stack, just for the ISR, and tell the ISR to use that stack?

Maybe switch to the IDLE Task stack?
It would also be wonderful if the switch could be FAST. There is no point in having a very responsive, low-latency architecture if I have to waste 30-60 instructions at every interrupt. (Note: the architecture implements separate register sets that can be switched in a single cycle, but I concur that this would be too architecture-specific for FreeRTOS to implement.)

My usual use of interrupts is: create some auto variables, get data from peripherals and perform some operations, then put the data in a queue.
Or what if I declared the variables inside the ISR as static?
Then they would be allocated outside of the task stack. Problem solved?

richard-damon · May 9, 2024, 5:50pm

I think the problem with giving ISRs their own stack is that, since the hardware doesn’t support it, you need to do something to the compiler to make this happen, or manually add code to all your ISRs to do it. The compiler is unlikely to add this sort of support, as it becomes tricky in the presence of nested interrupts, which the processor supports but FreeRTOS on those processors does not (it seems because the processor doesn’t support the equivalent of the PendSV operation to make a task switch occur on leaving the last nested ISR; at least that was the state when I was looking at those processors). If Microchip has developed a way to get around those limitations, it would make things better.