FreeRTOS total system failure

ps_uk · May 2, 2022, 11:12pm

Hi All

I’ve been dropped on an issue with a system running FreeRTOS on Zynq 7000 that seems really difficult to pin down. Long story short, the system dies completely after a sequence of user steps - by that I mean no output on serial, the JTAG debugger stops responding, and all LEDs and the HMI freeze. After a lot of printfs, I’ve narrowed it down to a spot just before the freeze happens - a call to take a semaphore (with infinite block time). It’s always the same place, even with a scaled-down version of the firmware (i.e. some tasks not started). I’ve set a watchpoint on the aforementioned semaphore, hoping to see memory corruption, but no issue there.
Any ideas on what could cause a catastrophic failure like this and how to debug it?
All comments appreciated.

rtel · May 2, 2022, 11:35pm

Does it happen in the same place on all hardware units - or just one?

richard-damon · May 3, 2022, 2:33am

If you are killing the JTAG, then some likely causes are a wild write that is hitting a control register and breaking the JTAG, or something that is interfering with your system clock. “Normal” software operation shouldn’t be able to disable JTAG.

ps_uk · May 3, 2022, 11:21am

Yes, exactly the same place.
One thing I forgot to mention is that the watchdog on the ARM processor stops working as well - with less severe problems the product at least reboots on its own.

rtel · May 3, 2022, 2:14pm

If the watchdog stops too then I’m suspicious of a power problem. Can you answer my question as to whether this happens on just one device or all devices?

ps_uk · May 3, 2022, 3:48pm

@rtel Yes, it happens on all devices.
I’ve now found that the watchdog doesn’t fire if it’s configured as a timer (i.e. it triggers an interrupt, which is how it’s implemented now). When I configure it as a watchdog instead, the reboot does happen. I tried setting a watchpoint on the ‘reset status’ register, but it triggers only after the system has already restarted.

tomstorey · May 3, 2022, 4:47pm

If it’s happening after a lot of printfs, could you potentially be overflowing a buffer and perhaps overwriting something to do with the semaphore, resulting in a bad pointer or some such?

Can you perhaps take a copy of the semaphore’s memory before and after the crash - if you’re able to do the latter once it has crashed - to see if something has changed drastically?

But perhaps this could be ruled out simply by looking at where the semaphore pointer/memory is allocated in relation to the buffer used by printf. If the former is directly after the latter, perhaps there is some credibility to this.
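The before/after comparison suggested above could be sketched roughly as follows. This is plain C with hypothetical names (`mem_snapshot_t`, `snapshot_take`, `snapshot_first_diff`, `SNAPSHOT_MAX`), not part of FreeRTOS. It assumes you know the address and size of the object behind the semaphore handle, and that the comparison can run just before the take call, since nothing can run once the system has frozen.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed upper bound on the size of the watched object (hypothetical). */
#define SNAPSHOT_MAX 128u

typedef struct {
    const void   *addr;                /* start of the watched memory      */
    size_t        len;                 /* number of bytes captured         */
    unsigned char copy[SNAPSHOT_MAX];  /* saved contents at snapshot time  */
} mem_snapshot_t;

/* Save len bytes starting at addr so they can be compared later. */
void snapshot_take(mem_snapshot_t *s, const void *addr, size_t len)
{
    assert(s != NULL && addr != NULL && len <= SNAPSHOT_MAX);
    s->addr = addr;
    s->len  = len;
    memcpy(s->copy, addr, len);
}

/* Return the offset of the first byte that differs from the saved copy,
 * or -1 if the memory is unchanged. */
long snapshot_first_diff(const mem_snapshot_t *s)
{
    const unsigned char *now = (const unsigned char *)s->addr;
    for (size_t i = 0; i < s->len; i++) {
        if (now[i] != s->copy[i]) {
            return (long)i;
        }
    }
    return -1;
}
```

One way to use it: take the snapshot right after the semaphore is created, then call `snapshot_first_diff()` immediately before the take that precedes the freeze and log the result. Bear in mind that a semaphore's internal state changes legitimately on every give and take, so a difference alone is not proof of corruption; the offset and the new value are what matter.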

Another option is monitoring stack usage in your tasks. Some printf implementations, I am led to believe, can be quite memory hungry, and maybe you’re blowing out the stack of a task and overwriting something critical.
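For reference, stack monitoring usually relies on "painting" the stack with a known byte and later counting how much of the paint is still intact. FreeRTOS does this itself and exposes the result through `uxTaskGetStackHighWaterMark()` (and can call `vApplicationStackOverflowHook()` when `configCHECK_FOR_STACK_OVERFLOW` is enabled). The sketch below only illustrates the idea on a plain buffer so it can run on a host; the names `stack_paint` and `stack_unused_bytes` are hypothetical, and a downward-growing stack is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill pattern; 0xA5 is the value the FreeRTOS kernel uses for the same purpose. */
#define STACK_FILL_BYTE 0xA5u

/* Paint the whole stack area before the task starts running. */
void stack_paint(unsigned char *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, STACK_FILL_BYTE, size);
}

/* Count fill bytes still intact at the low-address end of the buffer.
 * With a downward-growing stack this is the space that was never used;
 * a result of 0 means the task reached (or passed) the end of its stack. */
size_t stack_unused_bytes(const unsigned char *stack, size_t size)
{
    assert(stack != NULL);
    size_t n = 0;
    while (n < size && stack[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

On the target, logging the high-water mark of each task shortly before the failing step would show whether any task is close to exhausting its stack.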

ps_uk · May 3, 2022, 8:56pm

Thanks, but printf isn’t the issue here, the system freezes with/without it.

madyn · May 4, 2022, 5:27pm

I’ve had problems where the semaphore is still a null pointer. This freezes things on an ARM processor, but doesn’t kill the debugging. The debugging, however, is not JTAG in this case.
