Behaviour of FreeRTOS when a task crashes

EswarKeshav · October 23, 2023, 6:11am

Hello all,
I want to know how the microcontroller/FreeRTOS behaves in the following hypothetical situation.
Suppose I am running two tasks at the same or different priorities and one task crashes because of a stack overflow. What happens to the other task that is running alongside the crashed one? What is the after-effect of the stack overflow on the system: will it reboot, or will task 2 keep running?

Thanks in advance.

aggarg · October 23, 2023, 7:52am

Stack overflow will corrupt memory and what happens afterwards depends on what memory is corrupted. It can lead to an unrecoverable fault, if the overflow corrupts something significant OR the system can continue to work normally, if the corrupted memory was not used anywhere else. In short, the system becomes unreliable. What problem are you trying to solve?

RAc · October 23, 2023, 8:01am

What happens is that sooner or later (unpredictable!), the memory corruption will lead to a fault (or, as @aggarg mentioned, may go undetected, which is at least as bad). The default fault handler will enter an infinite CPU-bound loop, which will normally trigger a watchdog reset, getting the system back to a defined state (out of reset).

As the memory corruption can (and frequently does) affect system data structures such as task lists or queues, it is not guaranteed that the system can recover “gracefully” from a single task failure.

Also remember that in an embedded system, the tasks normally do not run completely independently of one another but typically interact rather closely, so failure of one task typically renders the entire system dysfunctional. Unlike on desktop OSs, the only merit in a “graceful” recovery from single task failures (which requires significant effort) would therefore be emergency rescues (such as orderly flushes of pending writes to a file system) and gathering of crash diagnostics.

EswarKeshav · October 23, 2023, 9:30am

Thanks for your time and reply, @aggarg. What if I implemented a thin layer over the RTOS, like a task monitoring layer or something more like a hypervisor, which could
1. monitor the whole system, detect the failure of tasks/the system early and prevent the crash.
2. enable the developer/user to remove the faulty task at runtime without a reflash.

It is not a fully formed idea yet, but would this hypervisor/monitor layer be helpful for RTOS and embedded developers?

EswarKeshav · October 23, 2023, 9:32am

Thanks for your time and detailed answer, @RAc.

What if I implemented a thin layer over the RTOS, like a task monitoring layer or something more like a hypervisor, which could
1. monitor the whole system, detect the failure of tasks/the system early and prevent the crash.
2. enable the developer/user to remove the faulty task at runtime without a reflash.

It is not a fully formed idea yet, but would this hypervisor/monitor layer be helpful for RTOS and embedded developers?

RAc · October 23, 2023, 9:35am

That would indeed be nice, but it is impossible to implement. FreeRTOS already supports some forms of fault detection such as stack overflow detection (this is discussed very frequently on this forum - not perfect but very helpful), but not every illegal memory access can be detected and monitored.
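To illustrate the "not perfect but very helpful" part: with `configCHECK_FOR_STACK_OVERFLOW` enabled, the kernel can check on a context switch whether the end of a task's stack still holds the known fill pattern written at task creation, and calls `vApplicationStackOverflowHook()` if it does not. The following is only a host-side simulation of that idea, not kernel code; `sim_ram_t`, the `sim_*` functions and all sizes are hypothetical names and values chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NEIGHBOUR_BYTES 32u   /* hypothetical data placed below the stack */
#define STACK_BYTES     64u   /* hypothetical task stack size */
#define GUARD_BYTES     16u   /* hypothetical size of the checked region */
#define FILL_BYTE       0xA5u /* known pattern written at task creation */

/* Simulated RAM: mem[0..31] belongs to something else, mem[32..95] is the
 * task stack. The stack grows downwards, so mem[32] is the end of the stack
 * and an overflow continues into mem[31], mem[30], ... */
typedef struct {
    uint8_t mem[NEIGHBOUR_BYTES + STACK_BYTES];
} sim_ram_t;

/* Zero the neighbouring data and fill the stack with the known pattern. */
void sim_ram_init(sim_ram_t *ram)
{
    assert(ram != NULL);
    memset(ram->mem, 0, NEIGHBOUR_BYTES);
    memset(ram->mem + NEIGHBOUR_BYTES, FILL_BYTE, STACK_BYTES);
}

/* Pattern check: true if any byte at the end of the stack lost the pattern. */
bool sim_stack_check_failed(const sim_ram_t *ram)
{
    assert(ram != NULL);
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (ram->mem[NEIGHBOUR_BYTES + i] != FILL_BYTE) {
            return true;
        }
    }
    return false;
}

/* True if the data below the stack no longer holds its initial zeros. */
bool sim_neighbour_corrupted(const sim_ram_t *ram)
{
    assert(ram != NULL);
    for (size_t i = 0; i < NEIGHBOUR_BYTES; i++) {
        if (ram->mem[i] != 0u) {
            return true;
        }
    }
    return false;
}
```

A write that lands in the checked region is caught at the next check. A write that jumps over it (for example into a large, mostly unused local buffer that extends past the end of the stack) corrupts the neighbouring data while the check still passes, and in both cases the check only runs after the damage is done.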

Also, this would not be a “thin layer”, regardless of the level at which you wanted to implement it.

I also do not see the merits of your point 2. At development time, a reflash is generally necessary to insert a modified code base, and it normally does not take so much time that avoiding it would be worth the implementation effort - on Embedded Linux it would, but in FreeRTOS it does not…

EswarKeshav · October 23, 2023, 9:41am

Thanks for the quick reply. If you have time, can you please tell me the challenges in implementing this feature? Does the architecture of FreeRTOS not allow it, or are there other challenges?
I am just a beginner with FreeRTOS, so sorry if my question sounds lame.
Point 2 was thought out with devices running in the field in mind. The idea is to containerize tasks, so that any task can be added or removed at runtime.

RAc · October 23, 2023, 9:55am

It would take too much time to explain this in detail, so here are just a few ideas:

It is neither easily possible to determine the exact location of a memory corruption nor to attribute it to a particular task. A fault typically does not occur at the time the memory got trampled but many, many cycles later, so there is no direct relationship between the code that behaves illegally and the symptomatic crash.
Again, critical system structures that may be needed to implement such a monitor system may be trampled as well, and if that happens, the monitor itself, as well as any hope of using it for recovery, is broken.

There are mechanisms on certain hardware platforms that can help make a system sturdier, such as MMUs or MPUs, but those are not available on all platforms and, if employed, significantly affect the footprint of the system. Also, with every layer of run-time debugging you add, you also add to the Heisenberg effect, meaning that the code overhead will modify the code base so significantly that either problems that show up without it may not show up with it or, conversely, it causes new problems that do not show up in the release version. Uncountable debugging hours have already been wasted, for example, in pinpointing errors caused by printf() output problems.

edit: There is nothing lame about your argumentation, all of it is very reasonable. It takes a number of years of practical coding to discover the fine line in embedded system development between helping to make it work better and making it worse by adding code, it is a constant tightrope walk.

EswarKeshav · October 23, 2023, 10:02am

Thanks for the explanation @RAc

RAc · October 23, 2023, 10:02am

And one more remark: If you have ideas to make the system better and sturdier, you are more than welcome to submit your code in PRs; the community appreciates all contributions, and the reviewers will weigh the costs and gains of those suggestions competently and gratefully.

EswarKeshav · October 23, 2023, 10:04am

May I know what PRs are and where I can find them?

RAc · October 23, 2023, 10:07am

You need to be familiar with git; then go to FreeRTOS · GitHub, clone the FreeRTOS source locally and go on from there.

EswarKeshav · October 23, 2023, 10:10am

Oh, thanks. Pull requests → PRs, got it.

aggarg · October 23, 2023, 12:39pm

Just to add, the only reliable way to detect it is when hardware supports stack overflow detection. An example is Cortex-M33, which has stack limit registers for this purpose.
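On Armv8-M Mainline cores such as the Cortex-M33, these registers are PSPLIM and MSPLIM. The difference from a software pattern check can be sketched with a small host-side simulation, in which the limit is compared before the stack pointer moves, so nothing below the limit is ever written. `sim_cpu_t` and `sim_push()` are hypothetical names used only for this illustration and do not correspond to any CMSIS or FreeRTOS API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated core state for a descending stack. */
typedef struct {
    uint32_t sp;      /* simulated stack pointer */
    uint32_t limit;   /* simulated stack limit: lowest legal SP value */
    bool     faulted; /* set when a push would have crossed the limit */
} sim_cpu_t;

/* Try to reserve `bytes` on the simulated stack. The limit is checked
 * before SP is changed; on violation SP stays where it was and the
 * fault flag is raised instead. Assumes sp >= limit on entry. */
bool sim_push(sim_cpu_t *cpu, uint32_t bytes)
{
    assert(cpu != NULL);
    if (cpu->faulted || bytes > cpu->sp - cpu->limit) {
        cpu->faulted = true;
        return false;
    }
    cpu->sp -= bytes;
    return true;
}
```

With a 256-byte stack, a 200-byte push succeeds and a following 100-byte push is rejected at the moment it is attempted, which is what makes the hardware check reliable: the offending code is identified before any neighbouring memory is touched.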

RAc · October 23, 2023, 12:43pm

Full ACK! Just to add, stack overflow is a frequent but by no means the only cause of memory corruption. Other typical cases are:

Writes to dynamically allocated memory past its allocated size
Writes to memory that has been deallocated
Broken linked lists

And so on. Needless to say, stack overflow detection even by hardware will be of no help here.
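The first case in the list above can be sketched as a host-side simulation: two "allocations" sit next to each other in one small arena, and a write that is two bytes too long for the first one silently overwrites the start of the second. No stack is involved at any point, so no stack check can notice it. The arena layout and `sim_block_write()` are hypothetical and only stand in for a real heap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HEAP_BYTES     16u /* hypothetical arena size */
#define BLOCK_BYTES     8u /* size of each simulated allocation */
#define BLOCK_A_OFFSET  0u /* first allocation: heap[0..7] */
#define BLOCK_B_OFFSET  8u /* second allocation: heap[8..15] */

/* Write `len` copies of `value` starting at a block's offset. Like memset()
 * on a real allocation, it does not know the block size, so a too-large
 * `len` spills into the next block. The assert only keeps the simulation
 * itself inside the arena. */
void sim_block_write(uint8_t *heap, size_t block_offset, uint8_t value,
                     size_t len)
{
    assert(heap != NULL);
    assert(block_offset + len <= HEAP_BYTES);
    memset(heap + block_offset, value, len);
}
```

Filling block B with 0x22 and then writing `BLOCK_BYTES + 2` bytes of 0x11 to block A leaves the first two bytes of block B holding 0x11. The owner of block B only finds out when it next uses that data, possibly much later and in a different task.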

aggarg · October 23, 2023, 12:47pm

You’re right - stack overflow detection, regardless of how accurate it is, cannot catch all other memory corruptions!
