Issue with CAN ISR not activating

Jamie.Smith · December 1, 2021, 10:46pm

Hi Everyone!

I’m having trouble with an ISR using FreeRTOS 10.2.1 on an STM32F7 chip. The ISR simply takes an incoming CAN message and puts it onto a queue using xQueueSendFromISR(). The problem is that making any change (either adding or commenting out application code) to a number of RTOS tasks in the system, even ones unrelated to this ISR (including some tasks that are suspended at the point I expect the interrupt to fire), causes the ISR to stop running entirely. I have another ISR (EXTI interrupts, which also send data to a different queue) that continues to work correctly throughout testing. Here’s the list of things I’ve tested so far:

I have verified the bus integrity using both an external CAN-sniffer on the testing code and by reverting the changes to the last stable release.
I verified that the ISR does not run at all with both a debug breakpoint and using an onboard debug LED toggle
I ran a quick timing check with Tracealyzer to see if there were any events on the queue or something else pre-empting the ISR somehow
I checked uxTaskGetStackHighWaterMark() using method 2 on each of the relevant tasks (each task had at least 80 bytes of the default 256 free), as well as the overall available heap memory
I verified the interrupt priority was not higher than configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY (both are set to 5), and additionally I tried lowering the ISR priority (by raising the NVIC priority value to 6)
I also tried changing the NVIC priority of the other working interrupt to 6 (both together with the broken ISR and separately)
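The priority check described in this list can be sketched as a small host-side helper. This is only an illustration: PRIO_BITS is assumed to be 4 (the usual value for STM32F7 parts, not stated in the thread), the helper names are hypothetical, and the value 5 is the one reported in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4 implemented priority bits (typical for STM32F7), and
 * configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY set to 5 as in the post. */
#define PRIO_BITS                     4u
#define LIBRARY_MAX_SYSCALL_PRIORITY  5u

/* On Cortex-M a numerically LOWER value is a logically HIGHER priority.
 * An ISR may call ...FromISR() functions only if its priority value is
 * numerically greater than or equal to the max-syscall setting. */
static bool may_call_from_isr_api(uint32_t nvic_priority)
{
    return nvic_priority >= LIBRARY_MAX_SYSCALL_PRIORITY;
}

/* Raw 8-bit value as it appears in the NVIC priority register and in
 * BASEPRI: the implemented bits occupy the top of the byte. */
static uint8_t raw_priority(uint32_t nvic_priority)
{
    return (uint8_t)(nvic_priority << (8u - PRIO_BITS));
}
```

With these assumptions, priorities 5 and 6 pass the check and priority 4 does not, and priority 5 shows up as 0x50 in the debugger's register view. It may also be worth confirming that the priority grouping assigns all bits to preemption priority (on STM32 HAL projects, NVIC_PRIORITYGROUP_4), since FreeRTOS recommends that for Cortex-M ports.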

Now the one part I have suspicion about with this ISR compared to the other is the way we enable it. In the current working code, we use a call to a startCAN() function which is called at the end of FreeRTOS_Init(). This function activates the various Notifications provided in the CAN HAL, sets up the CAN filter and calls the actual HAL_CAN_Start() function. I found that moving this to before the RTOS scheduler starts (but after the peripheral details are configured), also causes the same issue as described above. This includes adding a taskENTER_CRITICAL() section around this function to avoid issues with changing the enabled ISRs. This might just be a red herring, but it stuck out to me as something unique about the specific problem ISR.

At this point I’m both out of my depth as a junior engineer and out of ideas on what to try next - does anyone have any thoughts on what else could be causing this and/or tests for me to try?

Thank you,
Jamie

aggarg · December 1, 2021, 11:27pm

It seems very strange that changing an unrelated task causes an interrupt to stop firing. This looks like a symptom of a problem somewhere else.

Have you tried general debugging techniques?

Enable configASSERT (see the FreeRTOS documentation on configuration constants and options)
Enable stack overflow checking (see the FreeRTOS documentation on stacks and stack overflow checking)
Enable malloc failed checking (see the FreeRTOS documentation on hook functions, which covers malloc failure, i.e. pvPortMalloc() returning NULL)
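For reference, the three checks above are typically switched on along these lines. This is a target-side fragment, not a complete file: the halt-in-a-loop bodies are just one common choice, and the hook signatures are the ones used by recent FreeRTOS versions, so compare them against the headers in your tree.

```c
/* FreeRTOSConfig.h (fragment) */
#define configASSERT( x )  if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
#define configCHECK_FOR_STACK_OVERFLOW   2
#define configUSE_MALLOC_FAILED_HOOK     1

/* Application source file (fragment) */
#include "FreeRTOS.h"
#include "task.h"

/* Called by the kernel when a stack overflow is detected. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;   /* inspect in the debugger to find the task */
    taskDISABLE_INTERRUPTS();
    for( ;; );
}

/* Called when pvPortMalloc() returns NULL. */
void vApplicationMallocFailedHook( void )
{
    taskDISABLE_INTERRUPTS();
    for( ;; );
}
```

Putting a breakpoint inside each loop makes it obvious in the debugger which check fired.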

Thanks.

rtel · December 1, 2021, 11:30pm

Additionally, can you view the CAN peripheral’s registers in the debugger once it has stopped executing? That might give a clue as to why it stopped (maybe a data corruption turned it off, or there is some error condition on the peripheral that needs clearing, maybe the interrupt is disabled, etc.). Not that that explains the correlation between changing something unrelated and this symptom, but it might give you a clue as to where to look.

Jamie.Smith · December 3, 2021, 12:02am

Hello again!

Thanks for the suggestions - I spent today trying them out and unfortunately didn’t get to a solution. Here’s what I found:

configASSERT was already enabled and appears set up as provided - so we aren’t having an issue with an RTOS function misfiring
I re-ran the stack checking I tried earlier, this time including every task. The configuration to check for stack overflow was already set, and according to uxTaskGetStackHighWaterMark() each task had more than sufficient stack remaining: the minimum remaining words I saw returned was 80, and the rest were between 130 and 220 (of 256 allocated)
The malloc failed config was also already enabled and set up - doesn’t look like that is the cause either (we have the system set up to use heap_3)
Checking the CAN peripheral registers during operation also didn’t yield anything directly - the registers were the same in the “working” and “non-working” versions, and double checking against the datasheet shows that the bit for the RX interrupt is active in both cases.
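The peripheral's interrupt-enable bit is only one link in the chain, since the NVIC must also have the interrupt line enabled. A rough host-side sketch for decoding register values copied from the debugger is shown below. The bit position (FMPIE0 as bit 1 of CAN_IER) and the IRQ number (20 for CAN1_RX0_IRQn) are assumptions taken from typical STM32F7 documentation and should be checked against the reference manual and device header; the type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, verify against your part's documentation. */
#define CAN_IER_FMPIE0_BIT   1u   /* FIFO 0 message pending interrupt enable */
#define CAN1_RX0_IRQ_NUMBER  20u  /* position of CAN1_RX0_IRQn in the NVIC */

/* Values copied by hand from the debugger's register view. */
typedef struct {
    uint32_t can_ier;     /* CAN1->IER */
    uint32_t nvic_iser0;  /* NVIC->ISER[0], enable bits for IRQ 0..31 */
    uint32_t nvic_ispr0;  /* NVIC->ISPR[0], pending bits for IRQ 0..31 */
} can_rx_snapshot_t;

typedef enum {
    CAN_RX_PATH_OPEN,           /* nothing obviously blocking delivery */
    CAN_RX_PERIPHERAL_DISABLED, /* FMPIE0 is clear in CAN_IER */
    CAN_RX_NVIC_DISABLED,       /* IRQ line is not enabled in the NVIC */
    CAN_RX_PENDING_NOT_TAKEN    /* enabled and pending, handler not run */
} can_rx_diag_t;

static can_rx_diag_t diagnose_can_rx(const can_rx_snapshot_t *s)
{
    const uint32_t irq_mask = 1u << CAN1_RX0_IRQ_NUMBER;

    if ((s->can_ier & (1u << CAN_IER_FMPIE0_BIT)) == 0u) {
        return CAN_RX_PERIPHERAL_DISABLED;
    }
    if ((s->nvic_iser0 & irq_mask) == 0u) {
        return CAN_RX_NVIC_DISABLED;
    }
    if ((s->nvic_ispr0 & irq_mask) != 0u) {
        return CAN_RX_PENDING_NOT_TAKEN;
    }
    return CAN_RX_PATH_OPEN;
}
```

A result of CAN_RX_PENDING_NOT_TAKEN while the core is halted in task code would suggest the interrupt is being held off by priority masking (BASEPRI or PRIMASK) rather than by the CAN peripheral itself.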

One thing I did find while checking the CAN registers: in one instance, just adding a breakpoint to the startCAN() function mentioned above caused the system to work as expected, but I wasn’t able to reproduce that more than once. That makes me think I have some sort of timing issue or race condition, but I couldn’t reproduce the timing (I also tried adding an arbitrary HAL_Delay() to see if that made a difference). I had the same thing happen when adding the stack value checking into the default task - it worked when I checked the 18 application-defined values, but went back to failing when I added an additional stack watermark check for the default task itself.

So it seems that I’ve reached a point where certain changes on the level of a single function call change the system operation. Anyone have an idea for more tests I can try?

Thanks Again,
Jamie

aggarg · December 3, 2021, 6:12am

Then the interrupt is probably masked in the non-working case. Is any interrupt working at that time? You can put a breakpoint at xPortSysTickHandler or xPortPendSVHandler in port.c and see if those are triggered. Also check the value of the BASEPRI register, which is used to mask some interrupts.
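As a rough guide to reading BASEPRI in the debugger, the masking rule can be sketched on the host as follows. It assumes 4 implemented priority bits (typical for STM32F7, not confirmed in the thread), and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u  /* assumption: STM32F7 implements 4 priority bits */

/* BASEPRI == 0 means no masking. Otherwise every interrupt whose raw
 * priority value is numerically greater than or equal to BASEPRI
 * (i.e. of equal or lower urgency) is held off. */
static bool masked_by_basepri(uint8_t basepri, uint32_t nvic_priority)
{
    const uint8_t raw = (uint8_t)(nvic_priority << (8u - PRIO_BITS));
    return (basepri != 0u) && (raw >= basepri);
}
```

With configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY at 5, a FreeRTOS critical section sets BASEPRI to 0x50, which would hold off interrupts at priority 5 or 6 until the critical section exits. Note also that FreeRTOS API calls made before the scheduler starts can deliberately leave interrupts masked until the first task runs, which may be relevant to where the CAN start-up call is placed.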

Thanks.
