So, not a factor here, since my cache is write-back.
@escherstair, I got some suggestions from colleagues who are working with i.MX parts:
a. Disable the instruction cache and data cache (this has already been discussed…).
b. Disable the Branch Target Address Cache (BTAC).
c. Attach to the Cortex-M7 with the latest J-Link software (>= V8.92) to check the context.
Here is my feedback, based on the real scenario:
a. The whole Cortex-M firmware runs from DDR. I think I can disable the instruction and data caches for this memory region (but I've never tried). I cannot disable caching for the whole DDR (used by Linux) or for the rpmsg shared memory (mailbox).
b. What do you mean? When should I disable/enable BTAC in my firmware? Is there any SDK function to do this?
c. If your colleagues can provide a step-by-step procedure describing how to attach to a running Cortex-M7 without resetting Linux, it's really welcome! Neither I nor anyone I know has ever succeeded in doing this. Segger answered me "it works", but without giving any help on this. If the Cortex-M7 firmware doesn't need Linux (rpmsg does need it), attaching to the Cortex-M7 works properly.
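Regarding point b.: as far as I know, on the Cortex-M7 the BTAC is controlled through the Auxiliary Control Register (ACTLR, at 0xE000E008), which CMSIS exposes as SCnSCB->ACTLR. The minimal sketch below only computes the new register value; the bit positions (DISBTACREAD at bit 13, DISBTACALLOC at bit 14) are assumptions based on the CMSIS core_cm7.h names and must be checked against the Cortex-M7 TRM before being used on hardware. The target-side write is shown only as a comment.

```c
#include <stdint.h>

/* ASSUMED ACTLR bit positions, matching the CMSIS core_cm7.h names
   SCnSCB_ACTLR_DISBTACREAD_Pos / SCnSCB_ACTLR_DISBTACALLOC_Pos.
   Verify against the Cortex-M7 TRM before using them on hardware. */
#define ACTLR_DISBTACREAD_POS   13u
#define ACTLR_DISBTACALLOC_POS  14u

/* Returns the ACTLR value with BTAC reads and allocations disabled,
   leaving every other bit unchanged. */
static uint32_t actlr_with_btac_disabled(uint32_t actlr)
{
    return actlr | (1u << ACTLR_DISBTACREAD_POS) | (1u << ACTLR_DISBTACALLOC_POS);
}

/* On target (not compiled here), roughly:
 *   SCnSCB->ACTLR = actlr_with_btac_disabled(SCnSCB->ACTLR);
 *   __DSB();
 *   __ISB();
 * executed once, early in startup, before the application code runs. */
```

Doing the read-modify-write in one place at startup keeps the experiment easy to toggle: if the symptom persists with BTAC disabled, that suggestion can be ruled out.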
Sorry for pushing, but can you provide feedback on my previous post?
While investigating the issue, please keep in mind that, based on @jbaum's valuable tests and experience, the same application ported to ThreadX doesn't show any issue on the same processor.
For this reason I would not leave FreeRTOS out of the picture…
Hello @escherstair, I am also pushing for a reply from the other NXP teams; sorry, no answer so far.
I agree the issue could sit in FreeRTOS, but it could also be caused by the cache handling, from what we discussed. Without trying the system with the caches disabled we can't eliminate that. I would concentrate on that.
As for FreeRTOS, are kernel variables and shared data structures also placed in cached memory? Using non-cacheable memory or TCM is recommended to avoid coherency issues.
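For the FreeRTOS heap specifically, one way to follow this recommendation is setting configAPPLICATION_ALLOCATED_HEAP to 1, which makes the heap_x.c allocators use an ucHeap array defined by the application, so it can be placed with a section attribute. A minimal sketch, assuming a hypothetical output section named .dtcm_data that the linker script maps to DTCM (or to a region the MPU marks non-cacheable); the heap size is also just an example. Note this only moves the heap: the kernel's own globals and statically allocated objects still go wherever .data/.bss are placed.

```c
#include <stddef.h>
#include <stdint.h>

/* Normally set in FreeRTOSConfig.h; repeated here so the sketch compiles alone. */
#define configAPPLICATION_ALLOCATED_HEAP 1
#define configTOTAL_HEAP_SIZE ((size_t)(32 * 1024))   /* hypothetical size */

/* Hypothetical section name: it must exist in the linker script and be mapped
   to DTCM. The attribute is only applied on ELF toolchains such as arm-none-eabi-gcc. */
#if defined(__ELF__)
#define PLACE_IN_DTCM __attribute__((section(".dtcm_data"), aligned(8)))
#else
#define PLACE_IN_DTCM
#endif

#if configAPPLICATION_ALLOCATED_HEAP == 1
/* With this option, the FreeRTOS heap implementation expects the application to define ucHeap. */
PLACE_IN_DTCM uint8_t ucHeap[configTOTAL_HEAP_SIZE];
#endif
```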
In my case everything is in cached DDR, and I can't move it. It's a long story: it comes from a bug in the remoteproc library delivered in NXP's downstream i.MX Linux (in a private support ticket the NXP team wrote me that the bug is confirmed, but they won't fix it, since the fix is already available in Linux mainline/upstream).
But in @jbaum's case, it's in TCM, so we can exclude this from the investigation.
Given the time we have spent on this, I think teamwork is the only way to find the issue.
Otherwise, the only long-term solution is to leave FreeRTOS and use ThreadX (sorry for this, but that's the situation). And NXP should officially support ThreadX (today it doesn't), since the behavior with FreeRTOS is buggy (for whatever reason) and nobody can find out why.
Please try attaching to the Cortex-M7 with J-Link, following the Ozone getting-started instructions.
Note: when Linux is running, please ensure that these JTAG pads are not used for another function (e.g. UART).
If you want to disable the data cache and instruction cache, you can call the following APIs:
SCB_DisableICache();
SCB_DisableDCache();
It's not a matter of "Ozone getting started instructions".
I can connect to the Cortex-M (firmware running from DDR only, no ITCM/DTCM) while leaving the Cortex-A running, if the Cortex-A runs U-Boot. But if the Cortex-A runs Linux with rpmsg communication, it's not possible to attach without Linux being somehow "disturbed".
If you can provide instructions on how to do this in this scenario, they're welcome. Segger tech support says "if it works with the provided examples and hardware, it's your fault" (more or less).
Which JTAG pads were you referring to when you wrote "these pads"?
I know, but the behavior depends on how you configure the MPU for the various regions…
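For reference, on an ARMv7-M core such as the Cortex-M7 the cache policy of an MPU region lives in the TEX/C/B bits of MPU_RASR; Normal, non-cacheable memory is TEX=0b001, C=0, B=0. The sketch below packs the fields by hand only to make the layout visible (on target, the CMSIS ARM_MPU_RASR() macro from mpu_armv7.h serves the same purpose). The 1 MB size and full-access permissions are an arbitrary example, not a recommendation for the firmware's DDR window or the rpmsg shared memory.

```c
#include <stdint.h>

/* MPU_RASR field positions as defined by the ARMv7-M architecture. */
#define RASR_XN_POS    28u
#define RASR_AP_POS    24u
#define RASR_TEX_POS   19u
#define RASR_S_POS     18u
#define RASR_C_POS     17u
#define RASR_B_POS     16u
#define RASR_SRD_POS    8u
#define RASR_SIZE_POS   1u
#define RASR_ENABLE     1u

/* Region size is 2^(SIZE+1) bytes, so SIZE = log2(bytes) - 1 (minimum region: 32 bytes). */
static uint32_t rasr_size_field(uint32_t size_log2)
{
    return size_log2 - 1u;
}

/* MPU_RBAR base must be aligned to the region size. */
static int rbar_base_ok(uint32_t base, uint32_t size_log2)
{
    return (base & ((1u << size_log2) - 1u)) == 0u;
}

static uint32_t mpu_rasr(uint32_t xn, uint32_t ap, uint32_t tex, uint32_t s,
                         uint32_t c, uint32_t b, uint32_t srd, uint32_t size_log2)
{
    return ((xn & 1u) << RASR_XN_POS) |
           ((ap & 7u) << RASR_AP_POS) |
           ((tex & 7u) << RASR_TEX_POS) |
           ((s & 1u) << RASR_S_POS) |
           ((c & 1u) << RASR_C_POS) |
           ((b & 1u) << RASR_B_POS) |
           ((srd & 0xFFu) << RASR_SRD_POS) |
           ((rasr_size_field(size_log2) & 0x1Fu) << RASR_SIZE_POS) |
           RASR_ENABLE;
}

/* Example: 1 MB, full access (AP=0b011), executable, Normal non-cacheable (TEX=1, C=0, B=0). */
static uint32_t example_noncacheable_1mb(void)
{
    return mpu_rasr(0u, 3u, 1u, 0u, 0u, 0u, 0u, 20u);
}
```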
@escherstair,
You don't need to check the JTAG pads; there is no conflict on the i.MX8MP EVK.
I mean that you can use Ozone to check the context easily.
Please disable cpuidle for Linux on the U-Boot command line (e.g. with the cpuidle.off=1 kernel parameter); then you can also attach to the Cortex-M7 while Linux is up.

When the caches are disabled, you can ignore the cache policy in the MPU.
@MichalPrincNXP, any news from NXP on this?
@escherstair I had a discussion with the application team, and quite similar symptoms have been observed on another SoC with the CM7 acting as the secondary/remote core. It was related to the alignment of the CM7 gcc linker sections and to how they are copied from flash to CM7 RAM during the primary core's startup. Even when the individual CM7 gcc linker sections are correctly aligned, the startup copy mechanism may not honor that alignment, so sections can end up at incorrectly aligned addresses. This can affect some global variables in CM7 RAM. I am not able to explain the details of the issue (it occurred randomly, depending on the gcc section sizes), and I do not know how the CM7 data sections are initialized in your case (remoteproc?), but I would recommend verifying your CM7 app data initialization and data sections alignment. Also, as the data cache is enabled, I would ensure data sections are cache size aligned.
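To make the alignment concern concrete: a typical Cortex-M startup copies .data from its load address to its run address and zeroes .bss, using start/end symbols exported by the linker script (their names differ between toolchains and SDKs). The host-side sketch below takes plain pointers in place of those symbols and refuses to copy when the destination is not aligned as required, which is the kind of check that would catch sections landing at misaligned addresses. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32u  /* Cortex-M7 L1 cache line size in bytes */

/* Copies one initialized section (e.g. .data) from its load image to its run address.
   Returns 0 on success, -1 if the destination is not aligned to `align` bytes. */
static int copy_section(void *dst, const void *src, size_t len, size_t align)
{
    if (((uintptr_t)dst % align) != 0u)
        return -1;              /* section would land at a misaligned address */
    memcpy(dst, src, len);
    return 0;
}

/* Zero-fills an uninitialized section (e.g. .bss). */
static void zero_section(void *dst, size_t len)
{
    memset(dst, 0, len);
}
```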
Thanks!
In my case the CM7 firmware is loaded by U-Boot (not by Linux remoteproc), from a .bin file (not .elf).
I would recommend verifying your CM7 app data initialization and data sections alignment. Also, as the data cache is enabled, I would ensure data sections are cache size aligned.
Could you give me some more details on what I should double-check, please?
I see. Since in your case it is loaded as one binary, it should be OK: I have been told that the issue can occur when the ELF is parsed and sections are loaded individually. So you can ignore my comment.
As for aligning the CM7 data sections to the data cache line size, does that make sense?
Not sure I understand what you mean.
Can you clarify a little bit?
I suspected some misalignment when the CM7 data is initialized, and wondered whether it would be good to align .data and .bss to the cache line size (32 bytes) to avoid that. I am less sure about that now that you have shared that the whole CM7 binary is loaded as a block.
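A minimal sketch of what cache-line alignment means in code, assuming the 32-byte Cortex-M7 D-cache line mentioned above. The buffer name and its 100-byte payload are hypothetical; the point is that data shared with another master starts on a line boundary and is padded to whole lines, so clean/invalidate by address never touches neighbouring variables. For whole sections, the GNU ld counterpart is an `. = ALIGN(32);` statement at the start and end of the .data and .bss output sections.

```c
#include <stddef.h>
#include <stdint.h>

#define DCACHE_LINE_SIZE 32u  /* Cortex-M7 L1 D-cache line size in bytes */

/* Round n up to a whole number of cache lines (line size must be a power of two). */
#define ROUND_UP_TO_LINE(n) \
    (((size_t)(n) + DCACHE_LINE_SIZE - 1u) & ~((size_t)DCACHE_LINE_SIZE - 1u))

static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p & (DCACHE_LINE_SIZE - 1u)) == 0u;
}

/* Hypothetical 100-byte payload, padded to whole lines and starting on a line boundary. */
static uint8_t shared_buf[ROUND_UP_TO_LINE(100)] __attribute__((aligned(DCACHE_LINE_SIZE)));
```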
Hi,
I'm currently working on a project that uses an i.MX8MP on a Variscite SoM, and I'm experiencing the same problem that others in this thread have reported (@escherstair). In my case, the firmware architecture is very similar: I use both TCM and DDR memory, and I have a FreeRTOS application that communicates with the A53 processor through an RPMsg channel.
The application stops working after the system has been running for more than 50 hours. I have been facing this issue for about five months. At first, I thought it was a software bug, so I reviewed my code several times, but I could not find any issue that could explain this behavior.
Later, I found this thread and started following it closely. I investigated the issue in a similar way to others here, and I reached the following conclusion: if I move the code that is not related to FreeRTOS or hardware interrupts into TCM, the stability of my application improves over time. However, I do not know whether this behavior is caused by the changed memory layout (as reflected in the output.map file) or whether it is actually related to memory access.
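With GCC, individual functions can be moved into TCM without changing anything else by giving them a section attribute that the linker script places in ITCM. A minimal sketch, assuming a hypothetical output section named .itcm_text (it must exist in the linker script, with its contents copied to ITCM at startup unless the loader writes ITCM directly); checksum32 is just a placeholder function. Comparing the .map file before and after moving a single function at a time is one way to separate layout effects from memory-access effects.

```c
#include <stdint.h>

#if defined(__ELF__)
/* Hypothetical output section; must be mapped to ITCM in the linker script. */
#define RUN_FROM_ITCM __attribute__((section(".itcm_text"), noinline))
#else
#define RUN_FROM_ITCM
#endif

/* Placeholder hot-path function that would execute from ITCM on target. */
RUN_FROM_ITCM uint32_t checksum32(const uint8_t *data, uint32_t len)
{
    uint32_t sum = 0u;
    for (uint32_t i = 0u; i < len; i++)
        sum += data[i];
    return sum;
}
```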
@MichalPrincNXP, do you have any updates on this topic? I hope that the behavior I described in this message may help with identifying the root cause of the issue.
On the other hand, if there is any test I could perform that might help find the root cause, please let me know.
PS: I don't have the data cache enabled. I can't share my test linker file because I'm a new user on this forum.
Regards
You should now be able to upload.
