Behaviour of FreeRTOS when a task crashes

Hello all,
I want to know how the microcontroller/FreeRTOS behaves in the following hypothetical situation.
Suppose I am running two tasks at the same or different priorities, and one task crashes because of a stack overflow. What happens to the other task that is running alongside the crashed one? What are the after-effects of the stack overflow on the system: will it reboot, or will task 2 keep running?

Thanks in advance.

A stack overflow corrupts memory, and what happens afterwards depends on which memory is corrupted. It can lead to an unrecoverable fault if the overflow corrupts something significant, or the system can continue to work normally if the corrupted memory is not used anywhere else. In short, the system becomes unreliable. What problem are you trying to solve?

What happens is that sooner or later (unpredictable!), the memory corruption will lead to a fault (or, as @aggarg mentioned, may go undetected, which is at least as bad). The default fault handler will enter an infinite CPU-bound loop, which will normally trigger a watchdog reset, getting the system back to a defined state (out of reset).
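
The watchdog recovery described above is often paired with per-task check-ins, so that a task that has stopped running also leads to a reset instead of going unnoticed. The following is a minimal host-side sketch of that idea, not FreeRTOS code: all names (`wdg_monitor_t`, `wdg_task_checkin`, `wdg_monitor_poll`) are hypothetical, and `feed_count` merely stands in for kicking a real hardware watchdog.

```c
#include <stdint.h>

#define TASK_COUNT     2u                        /* assumed number of supervised tasks */
#define ALL_TASKS_MASK ((1u << TASK_COUNT) - 1u)

typedef struct {
    uint32_t alive_bits; /* bit n set: task n checked in during this period */
    uint32_t feed_count; /* stands in for kicking a hardware watchdog */
} wdg_monitor_t;

void wdg_monitor_init(wdg_monitor_t *m)
{
    m->alive_bits = 0u;
    m->feed_count = 0u;
}

/* Each supervised task calls this once per loop iteration. */
void wdg_task_checkin(wdg_monitor_t *m, unsigned task_id)
{
    if (task_id < TASK_COUNT) {
        m->alive_bits |= (1u << task_id);
    }
}

/* Called once per supervision period. Feeds the (simulated) watchdog
 * only if every task has checked in. Returns 1 if fed, 0 otherwise. */
int wdg_monitor_poll(wdg_monitor_t *m)
{
    int fed = 0;
    if ((m->alive_bits & ALL_TASKS_MASK) == ALL_TASKS_MASK) {
        m->feed_count++;
        fed = 1;
    }
    m->alive_bits = 0u; /* start a fresh period */
    return fed;
}
```

On a real target the check-in would need to be atomic or protected by a critical section. Note also that the monitor's own state lives in RAM and can be corrupted like anything else, so this only narrows the gap; it does not make recovery from a single task failure "graceful".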

As the memory corruption can (and frequently does) affect system data structures such as task lists or queues, it is not guaranteed that the system can recover “gracefully” from a single task failure.

Also remember that in an embedded system, the tasks normally do not run completely independently of one another but typically interact rather closely, so the failure of one task typically renders the entire system dysfunctional. Thus, unlike on desktop OSs, the only merit in a “graceful” recovery from single task failures (which requires significant effort) would be emergency rescues (such as orderly flushes of pending writes to a file system) and the gathering of crash diagnostics.

Thanks for your time and reply, @aggarg. What if I implemented a thin layer over the RTOS, like a task monitoring layer or something more like a hypervisor, that could:
1. Monitor the whole system, detect the failure of tasks or of the system early, and prevent the crash.
2. Enable the developer/user to remove the faulty task at runtime without a reflash.

It is not a fully formed idea yet, but would such a hypervisor/monitor layer be helpful for RTOS and embedded developers?

Thanks for your time and detailed answer @RAc .

That would indeed be nice but impossible to implement. FreeRTOS already supports some forms of fault detection, such as stack overflow detection (this is discussed very frequently on this forum; it is not perfect but very helpful), but not every illegal memory access can be detected and monitored.
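
To make the "not perfect but very helpful" point concrete: the FreeRTOS software checks are enabled with `configCHECK_FOR_STACK_OVERFLOW` and report through `vApplicationStackOverflowHook()`. The sketch below is not the kernel's code; it is a small host-side simulation, under my own assumptions, of the general pattern-fill idea (fill the stack with a known byte, later check whether the bytes near the limit still hold it). All names and sizes here are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64u   /* size of the simulated task stack */
#define GUARD_BYTES 16u   /* bytes at the stack limit that get checked */
#define FILL_BYTE   0xA5u /* known pattern written at creation time */

/* Simulated task stack. It grows downward: index STACK_BYTES - 1 is the
 * top (first byte used), index 0 is the limit (last byte available). */
typedef struct {
    uint8_t mem[STACK_BYTES];
} sim_stack_t;

/* Fill the whole stack with the known pattern, as at task creation. */
void sim_stack_init(sim_stack_t *s)
{
    memset(s->mem, FILL_BYTE, sizeof s->mem);
}

/* Simulate the task using `depth` bytes of stack, counted from the top. */
void sim_stack_use(sim_stack_t *s, size_t depth)
{
    if (depth > STACK_BYTES) {
        depth = STACK_BYTES; /* stay inside the array in this simulation */
    }
    memset(&s->mem[STACK_BYTES - depth], 0x00, depth);
}

/* Return 1 if any guard byte near the limit no longer holds the pattern. */
int sim_stack_overflow_suspected(const sim_stack_t *s)
{
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (s->mem[i] != FILL_BYTE) {
            return 1;
        }
    }
    return 0;
}

/* Count untouched bytes from the limit upward: the unused headroom. */
size_t sim_stack_headroom(const sim_stack_t *s)
{
    size_t n = 0;
    while (n < STACK_BYTES && s->mem[n] == FILL_BYTE) {
        n++;
    }
    return n;
}
```

The weaknesses are visible in the sketch itself: the check only runs when someone calls it, a write that jumps over the guard bytes is missed, and data that happens to equal the fill byte looks untouched. By the time the check fires, memory beyond the stack may already be corrupted.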

Also, this would not be a “thin layer”, regardless of the level at which you wanted to implement it.

I also do not see the merits of your point 2. At development time, a reflash is generally necessary to load a modified code base, and it normally does not take so much time that avoiding it would be worth the effort. On Embedded Linux it would, but in FreeRTOS it does not…

Thanks for the quick reply. If you have time, can you please tell me what the challenges in implementing this feature are? Does the architecture of FreeRTOS not allow it, or are there other challenges?
I am just a beginner with FreeRTOS, so sorry if my question sounds lame.
Point 2 was thought out with devices running in the field in mind. The idea is to containerize tasks, so that any task can be added or removed at runtime.

It would take too much time to explain this in detail, so here are just a few ideas:

  • It is not easily possible to determine the exact location of a memory corruption, nor to attribute it to a particular task. A fault typically does not occur at the time the memory gets trampled but many, many cycles later, so there is no direct relationship between the code that behaves illegally and the symptomatic crash.
  • Again, critical system structures that may be needed to implement such a monitor system may be trampled as well, and if that happens, the monitor itself, along with any hope of using it for recovery, is broken.

There are mechanisms on certain hardware platforms that can help make a system sturdier, such as MMUs or MPUs, but those are not available on all platforms and, if employed, significantly affect the footprint of the system. Also, with every layer of run-time debugging you add, you also add to the Heisenberg effect, meaning that the code overhead will modify the code base so significantly that either problems that show up without it may not show up with it or, conversely, it causes new problems that do not show up in the release version. Countless debugging hours have already been wasted, for example, in pinpointing errors caused by printf() output problems.

edit: There is nothing lame about your argumentation; all of it is very reasonable. It takes a number of years of practical coding to discover the fine line in embedded system development between helping to make it work better and making it worse by adding code. It is a constant tightrope walk.

Thanks for the explanation, @RAc.

And one more remark: If you have ideas to make the system better and sturdier, you are more than welcome to submit your code in PRs; the community appreciates all contributions, and the reviewers will weigh the costs and gains of those suggestions competently and gratefully.

May I know what PRs are and where I can find them?

You need to be familiar with git; then go to FreeRTOS · GitHub, clone the FreeRTOS source locally, and go on from there.

Oh, thanks. Pull requests → PRs, got it.

Just to add, the only reliable way to detect a stack overflow is for the hardware to support stack overflow detection. An example is Cortex-M33, which has stack limit registers for this purpose.

Full ACK! Just to add, stack overflow is a frequent but by no means the only cause of memory corruption. Other textbook cases are:

  • Writes to dynamically allocated memory past its allocated size
  • Writes to memory that has been deallocated
  • Broken linked lists

And so on. Needless to say, stack overflow detection even by hardware will be of no help here.
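
For the first item in the list above, a separate check is needed. The following is a minimal sketch of one common idea, a canary placed directly after a heap buffer; it assumes a hosted C environment with `malloc`, and the names `guarded_alloc`, `guarded_is_intact`, and `guarded_free` are hypothetical helpers, not part of FreeRTOS or any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CANARY_BYTES 4u
static const uint8_t CANARY[CANARY_BYTES] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Layout of one block: [size_t size][payload ...][canary].
 * The payload is only aligned for size_t, which is enough for the
 * byte buffers used in this illustration. */
void *guarded_alloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(size_t) - CANARY_BYTES) {
        return NULL; /* total size would overflow */
    }
    uint8_t *raw = malloc(sizeof(size_t) + size + CANARY_BYTES);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof size);                           /* remember payload size */
    memcpy(raw + sizeof(size_t) + size, CANARY, CANARY_BYTES); /* place the canary */
    return raw + sizeof(size_t);
}

/* Return 1 if the canary behind the payload is still intact, else 0. */
int guarded_is_intact(const void *payload)
{
    const uint8_t *p = payload;
    size_t size;
    memcpy(&size, p - sizeof(size_t), sizeof size);
    return memcmp(p + size, CANARY, CANARY_BYTES) == 0;
}

/* Release a block obtained from guarded_alloc(). */
void guarded_free(void *payload)
{
    if (payload != NULL) {
        free((uint8_t *)payload - sizeof(size_t));
    }
}
```

This only catches contiguous overruns, and only at the moment the check is called. Writes to freed memory and broken linked lists pass straight through it, which is the point being made here: each kind of corruption needs its own detection, and none of them is complete.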

You’re right - stack overflow detection, regardless of how accurate it is, cannot catch these other kinds of memory corruption!
