Task hang recovery

wrig0421 · November 10, 2020, 2:48pm

I am developing a plan to recover from potential device hangs, specifically cases where a task becomes stuck in an infinite loop. My system receives an interrupt from another chip at fixed time intervals. My question: if a task is stuck and an interrupt from the external chip is detected, is there a way to suspend the stuck task from the ISR? I would record the failure (send a packet to another chip on the board), then reset the system. This seems odd, since the program is executing the task and I then want to suspend that same task from an ISR. Any feedback is helpful. Thanks.

richard-damon · November 10, 2020, 3:43pm

While an ISR can suspend a task, that is unlikely to solve the issue: the task will still be stuck, just no longer using any CPU time.

The key feature that you need to design into the task is some form of timeout. Unless it is legitimately expected that a device might take a long time to respond (a command terminal, for example, might wait forever for the next command), most device operations should have a definite timeout and error handling, even for errors that you think should be ‘impossible’. If you force yourself to handle the impossible errors, you will naturally handle the very unlikely errors that you didn’t think of in the first place.
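As a rough illustration of the timeout idea, here is a minimal sketch in plain C of a deadline helper that a polling loop can check. The names `deadline_t`, `deadline_start` and `deadline_expired` are hypothetical and not part of FreeRTOS; the sketch assumes a free-running 32-bit tick counter that wraps around, which is typical but should be checked against your port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: remembers when a wait began and how long
   it is allowed to last, both measured in ticks. */
typedef struct {
    uint32_t start;   /* tick count when the wait began */
    uint32_t timeout; /* maximum number of ticks to wait */
} deadline_t;

/* Begin a timed wait at tick `now`, lasting at most `timeout` ticks. */
deadline_t deadline_start(uint32_t now, uint32_t timeout)
{
    deadline_t d = { now, timeout };
    return d;
}

/* True once `timeout` ticks have elapsed since the deadline was started.
   Unsigned subtraction keeps the result correct when the tick counter
   wraps past zero. */
bool deadline_expired(const deadline_t *d, uint32_t now)
{
    assert(d != NULL);
    return (uint32_t)(now - d->start) >= d->timeout;
}

/* Intended use inside a task (device_ready() and get_ticks() are
   placeholders for whatever your system provides):

       deadline_t d = deadline_start(get_ticks(), 100u);
       while (!device_ready()) {
           if (deadline_expired(&d, get_ticks())) {
               return ERROR_TIMEOUT;   // handle the 'impossible' case
           }
       }
*/
```

The point of the pattern is that every wait loop has an exit path, so the task reports an error instead of spinning forever.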

Your idea of a total system reset is one possible solution if the error means you have no idea what state the system is in, or what would be needed to recover to a useful state. It should be a last resort, though, and only for failures of core functionality. Depending on what the system might be doing, it may also need a more controlled shutdown and restart.

Xavier · September 7, 2021, 3:15am

Hi,

Some months ago I wrote a blog entry about a software watchdog that lets the system detect and recover from hang-ups. It’s written in Spanish, but Google can translate it into your native language:

http://fjrg76.com/2021/04/27/hooks_utiles_y_un_watchdog/#watchdog

(Look for example number 4, too). Hope it helps.
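For readers who want the general shape of a software watchdog before following the link, here is a minimal sketch in plain C. It is not taken from the linked article; the names `wdg_register`, `wdg_checkin` and `wdg_scan` and the limit `WDG_MAX_TASKS` are hypothetical. Each monitored task checks in periodically, and a periodic monitor (for example the fixed-interval interrupt mentioned in the first post) scans for tasks that failed to do so.

```c
#include <assert.h>
#include <stdint.h>

#define WDG_MAX_TASKS 8u /* hypothetical limit: one bit per monitored task */

static volatile uint32_t wdg_registered; /* bit i set: task i is monitored */
static volatile uint32_t wdg_alive;      /* bit i set: task i has checked in */

/* NOTE: the read-modify-write operations below are not atomic. A real
   system would need to wrap them in a critical section or use atomic
   operations, since tasks and the monitor share these words. */

/* Start monitoring the task with the given id. */
void wdg_register(unsigned id)
{
    assert(id < WDG_MAX_TASKS);
    wdg_registered |= (uint32_t)1u << id;
}

/* Called by a task on every pass through its main loop. */
void wdg_checkin(unsigned id)
{
    assert(id < WDG_MAX_TASKS);
    wdg_alive |= (uint32_t)1u << id;
}

/* Called periodically by the monitor. Returns a bitmask of registered
   tasks that did not check in since the previous scan, then clears the
   check-in flags to begin a new monitoring period. */
uint32_t wdg_scan(void)
{
    uint32_t missing = wdg_registered & ~wdg_alive;
    wdg_alive = 0u;
    return missing;
}
```

A nonzero result from `wdg_scan()` identifies which task hung, so the system can record the failure before resetting. The scan period must be longer than the slowest task's normal loop time, or healthy tasks will be reported as stuck.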
