If setting portTICK_TYPE_IS_ATOMIC to 0 makes things worse, then there may be problems in your design with critical sections or interrupt control/design, as all that setting does is put accesses to the tick counter into a critical section.
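For context, a simplified sketch of what that option controls (paraphrasing the logic in FreeRTOS.h; exact macro names may differ between FreeRTOS versions):

#if ( portTICK_TYPE_IS_ATOMIC == 0 )
    /* The tick count cannot be read atomically, so reads from task
     * context are wrapped in a critical section. */
    #define portTICK_TYPE_ENTER_CRITICAL()    taskENTER_CRITICAL()
    #define portTICK_TYPE_EXIT_CRITICAL()     taskEXIT_CRITICAL()
#else
    /* The tick count can be read atomically, so no locking is needed. */
    #define portTICK_TYPE_ENTER_CRITICAL()
    #define portTICK_TYPE_EXIT_CRITICAL()
#endif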
Hi @richard-damon thanks!
Can you clarify a little what you have in mind when you write "problems in your design with critical sections or interrupt control/design"?
Do you mean priorities?
Or wrong usage of xxxFromISR() functions?
Or something else?
It could be almost any of those; it somewhat depends on what you mean by "crashes". If the system stops responding because it hit an assert, that is valuable information worth getting, as it may pinpoint the problem.
Based on all the tests I've done and on my knowledge, it seems that no assert is triggered.
As @jbaum and @qtprashleigh found out in their posts above (better than I did), it seems that the FreeRTOS scheduler stops switching tasks.
I can confirm that this happens only if there is FreeRTOS + RPMsg + an external interrupt (for them it's a GPT, for me it's an ADC data-ready interrupt). So something with the interrupts goes wrong when RPMsg and FreeRTOS are used together.
I'm hoping for results from @MichalPrincNXP's investigation, but let me know if I can do anything to help.
One thing that just caught my eye:
In rpmsg_platform.c I see:
/**
* platform_in_isr
*
* Return whether CPU is processing IRQ
*
* @return True for IRQ, false otherwise.
*
*/
int32_t platform_in_isr(void)
{
return (((SCB->ICSR & SCB_ICSR_VECTACTIVE_Msk) != 0UL) ? 1 : 0);
}
and this is based on the ICSR register.
But as far as I know, to determine whether code is running in Handler mode (i.e., an interrupt handler) or Thread mode, I should read IPSR (not ICSR).
And there is the function __get_IPSR() for this purpose.
Are you sure about ICSR?
Hi @escherstair, based on my research, using either IPSR or ICSR should be OK. For FreeRTOS projects, one can directly use the xPortIsInsideInterrupt() function, which reads the IPSR register. Anyway, could you try replacing the platform_in_isr() implementation on your side to see the potential effect? I have not succeeded in reproducing the issue on my side yet; I have the working project but no crash observed so far, working on it …
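For reference, an IPSR-based replacement might look like this (a sketch assuming a CMSIS environment where __get_IPSR() is available; IPSR holds the active exception number and is non-zero only in Handler mode):

int32_t platform_in_isr(void)
{
    /* IPSR == 0 means Thread mode; any non-zero value is an active exception. */
    return ((__get_IPSR() != 0UL) ? 1 : 0);
}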
I did it yesterday. Nothing changes.
I can confirm what @qtprashleigh wrote: with configUSE_16_BIT_TICKS set to 1, the system doesn't "stop working" after 37 hours.
I'll go on with my investigation.
@escherstair thanks. To avoid any optimization issue I would also try to use xPortIsInsideInterrupt() directly in the rpmsg_env_freertos.c code, replacing env_in_isr(). May I ask you to try that on your side?
I can confirm what @qtprashleigh wrote: with configUSE_16_BIT_TICKS set to 1, the system doesn't "stop working" after 37 hours.
Do I understand correctly that when configUSE_16_BIT_TICKS is set to 1 the issue is no longer observed?
Just done it.
It doesn't fix the issue, but it takes longer for the issue to happen. So it has an impact.
Yes. Correct.
I have had my board running continuously for over 12 days now since making this change, with no issues. So I can confirm that this workaround avoids the issue.
I did other tests. I need to double check (one more time) because I want to be 100% sure about this:
- I tested with configUSE_16_BIT_TICKS set to 0 (so a 32-bit tick count)
- I've been able to build an application that doesn't crash within 37 hours (I'm not saying it won't crash ever)
- I have another application that crashes within 37 hours (or sooner if I change configINITIAL_TICK_COUNT; see the sketch after this list)
- the difference between them is small, and the functions added (to get the crash) are never called during execution
- so, it seems to me that only the map file (the placement of objects in memory) changes between the two applications
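A sketch of the configuration I'm referring to (the specific initial value is hypothetical, just to illustrate how configINITIAL_TICK_COUNT can move the tick count close to a rollover and make any tick-related bug show up sooner):

/* FreeRTOSConfig.h excerpt */
#define configUSE_16_BIT_TICKS      0    /* 32-bit TickType_t */

/* Start the tick count near the 32-bit wrap-around instead of at 0, so
 * overflow-related misbehavior appears within minutes rather than days. */
#define configINITIAL_TICK_COUNT    ( ( TickType_t ) 0xFFFF0000UL )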
But give me some more time, because I want to be really 100% sure about what I wrote above. More tests are scheduled on my side.
It would be useful if @MichalPrincNXP could share some updates from his side.
Interesting find! How large/complicated is the project? Would it be possible to share it (or a derivation of it) as a minimal example to reproduce the issue? My project requires a proprietary kernel driver, which makes things harder for Michal.
@escherstair great news, I am interested in finding out where the issue comes from.
As @jbaum indicated, I was not very successful in porting the provided minimal app to other hardware and reproducing the issue. I got it working on an RT1180 board using the IAR compiler, and I am seeing a task stuck in the Blocked state after circa 20 minutes, but I can't say whether this is a porting issue or the particular problem we are all trying to solve. Anyway, I observed that the task gets stuck only when the MU and GPT interrupts coexist and interact with the task (the GPT posts an event and the MU puts a new item into the RPMsg queue that the task is waiting for). Once I updated the app logic to avoid the GPT interrupts, no stuck task was observed. Also, no issue was observed when the rpmsg_queue_recv API is called with a 0 timeout.
@jbaum @MichalPrincNXP the project itself is not so simple, but the crash happens even if it basically does nothing (I mean, reading an external ADC and sending messages from the M7 to the A53). All the other tasks are never triggered.
The problem is that my application requires at least an external ADC that must return the ADC samples. Long story short: custom hardware is required.
I'll think about whether I can simplify the application in some way to remove the need for the ADC.
Reading what you wrote, I think you got the point, because I can confirm that the issue happens only if the MU and an external interrupt coexist (in my case the ADC).
If I remove the ADC (physically, or just its application logic), no issue happens.
In my application I call rpmsg_queue_recv_nocopy() with a timeout that is not 0. I'll change it to use 0. But I can say that when I used portMAX_DELAY the issue happened more often, so I decreased the value.
At that time I thought that if something (an interrupt?) happens while RPMsg is waiting, something else stays blocked.
This could explain why a 0 timeout doesn't show the issue (RPMsg doesn't wait, so nothing can happen while waiting).
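A sketch of the two receive patterns being compared (the instance/queue/buffer variable names are placeholders; the argument order follows the rpmsg-lite rpmsg_queue_recv_nocopy() API, but check your version's header):

/* Pattern that showed the issue more often: block "forever" on the queue. */
result = rpmsg_queue_recv_nocopy(rpmsg_inst, rpmsg_q, &remote_addr,
                                 (char **)&rx_data, &rx_len, portMAX_DELAY);

/* Workaround under test: poll with a 0 timeout, so the task never sits
 * blocked on the RPMsg queue while another interrupt fires. */
result = rpmsg_queue_recv_nocopy(rpmsg_inst, rpmsg_q, &remote_addr,
                                 (char **)&rx_data, &rx_len, 0);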
Let's stay in touch on this topic. It seems to me we're not so far from the catch.
I tested this workaround (with rpmsg_queue_recv_nocopy() in my case) and it doesn't work.
Calling with a 0 timeout doesn't fix the issue.
At the moment the only effective workaround is setting configUSE_16_BIT_TICKS to 1.
I'll let you know.
After deeper testing I can confirm that the difference between a firmware that crashes and one that doesn't crash lies only in functions that are never called.
I'm starting an investigation into object alignment in the map file and/or firmware size to see what happens.
Hello @escherstair, I am curious whether you have any new findings? From your latest posts it seems the issue comes rather from GCC object alignment, so I have given up my effort for now. Thanks.
Hello @MichalPrincNXP
my big effort on this investigation goes on.
Unfortunately, alignment alone is not the factor, since I have two apps with the same alignment (I mean, the last byte of every object, function and data, is the same in the two apps).
But one of them crashes and the other one doesn't.
In the past I had exactly the same symptoms on another platform, where the issue was a silicon errata related to the delay cycles needed for the instruction cache mechanism to work properly.
In that case, when functions changed their position in flash, sometimes a cache miss was triggered and, because of the silicon errata, the cache invalidation was not handled properly.
I would really appreciate it if you could go on with your investigation.
Hello @escherstair, since cache has been mentioned, have you tried disabling it for all used memories on both the master and remote sides, plus the shared memory, to eliminate this as a possible root cause (I guess you did, just confirming)?
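On the Cortex-M side, such an experiment might look like this (a sketch assuming the CMSIS core functions are available on the M7; the Linux side and the shared-memory region would need their own handling):

/* Early in main(), before the scheduler starts: rule out cache effects. */
SCB_DisableICache();   /* CMSIS: disables and invalidates the I-cache */
SCB_DisableDCache();   /* CMSIS: disables, cleans and invalidates the D-cache */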
As for the investigation on my side, it does not make much sense to continue with my RT1180/IAR setup; I would rather switch to an i.MX8 board with Linux, but my colleague who could help with that is out of the office these days. I am not able to promise any effort on that this week, unfortunately.
Regards
Michal
No, I didn't do this, because Linux is out of my control and I don't know if (and how) I can do it there.
I'm not saying it's the cache. I said that in another situation with similar symptoms it was a silicon errata related to cache handling.
And in that case the workaround came from the silicon vendor.