Timer task stalls when its queue is full

Hi,

I encountered an issue with FreeRTOS version 10.0.1. We have 9 tasks in total, including the timer task and the idle task. The idle task runs at priority 0, the timer task and 5 other tasks run at priority 1, and the remaining 2 tasks run at priority 2. The 7 application tasks are mainly timer driven (e.g. every 100 ms or 200 ms) using software timers.

The issue is that sometimes, when the timer task's queue is full, all other tasks are blocked (because the queue is full) while the timer task and idle task are on the ready list. They are ready, yet the timer task apparently never gets a chance to run and just keeps stalling there.

We tried setting the timer task priority to 2 and the issue disappeared, but I do not understand why a timer task priority of 1 stalls all tasks, including the timer task itself. Logically, when all other tasks are stalled, the timer task should get a chance to run. Does anyone know the cause behind this?

Thanks,
Adam

Do any of the timer callbacks make blocking calls, especially to set a timer? If the queue is full and a timer callback makes a timer call, you will deadlock, which is one reason to make the timer/service task the highest priority task, so that it can keep servicing the timer command queue and keep it empty.

If you can’t do that, then make the queue longer, so it is long enough to handle every request that might be posted to it.
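For reference, both remedies are FreeRTOSConfig.h settings; a minimal sketch with illustrative values (not taken from your project):

#define configUSE_TIMERS             1
#define configTIMER_TASK_PRIORITY    ( configMAX_PRIORITIES - 1 )   /* run the timer/service task at the highest priority */
#define configTIMER_QUEUE_LENGTH     20                             /* lengthen the timer command queue */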

Additionally, from the way you describe your design, it sounds like you are using software timers to trigger periodic functionality in the other tasks. If that is the case, then consider using vTaskDelayUntil() within the tasks themselves instead.
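A minimal sketch of that approach, assuming a 100 ms period and a hypothetical vDoPeriodicWork() placeholder for the task's real work:

void vPeriodicTask( void *pvParameters )
{
    TickType_t xLastWakeTime = xTaskGetTickCount();
    const TickType_t xPeriod = pdMS_TO_TICKS( 100 );

    ( void ) pvParameters;

    for( ;; )
    {
        /* Blocks until exactly one period after the previous wake time,
         * so the period does not drift with execution time. */
        vTaskDelayUntil( &xLastWakeTime, xPeriod );
        vDoPeriodicWork();   /* hypothetical placeholder for the task's work */
    }
}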

Thanks very much for the help and the suggestions. The callback just sets a notification (flag) to the corresponding task and then returns.
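For reference, a minimal sketch of that pattern, where xTaskM1 is a hypothetical handle to the task being notified:

static TaskHandle_t xTaskM1;   /* hypothetical handle to the notified task */

static void prvTimerCallback( TimerHandle_t xTimer )
{
    ( void ) xTimer;
    /* Set a notification for the worker task and return immediately. */
    xTaskNotifyGive( xTaskM1 );
}

static void prvWorkerTask( void *pvParameters )
{
    ( void ) pvParameters;
    for( ;; )
    {
        /* Block until the timer callback notifies this task. */
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );
        /* ...handle the periodic work here... */
    }
}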

When the issue happens, the timer's queue is full (15 messages) and the timer task appears to stall there for a couple of minutes (sometimes hours) before recovering. We can simply increase the timer task priority to avoid this issue, but I do not understand why the timer task stops processing the incoming messages.

One more thing I noticed is that all tasks have notification value 0 (no notification), and the notification state of the timer and idle tasks is "not waiting". Below are the details of all the tasks:

Task DD, no: 1, state: X (0), prio: 1, notfSt: NoWt, notfVal: 0x0000, stackwm: 302
Task Tmr, no: 9, state: R (1), prio: 1, notfSt: NoWt, notfVal: 0x0000, stackwm: 22
Task IDL, no: 8, state: R (1), prio: 0, notfSt: NoWt, notfVal: 0x0000, stackwm: 42
Task M1, no: 2, state: B (2), prio: 1, notfSt: Wait, notfVal: 0x0000, stackwm: 116
Task II, no: 3, state: B (2), prio: 2, notfSt: Wait, notfVal: 0x0000, stackwm: 110
Task M2, no: 5, state: B (2), prio: 2, notfSt: Wait, notfVal: 0x0000, stackwm: 64
Task WW, no: 4, state: B (2), prio: 1, notfSt: Wait, notfVal: 0x0000, stackwm: 60
Task SS, no: 6, state: B (2), prio: 1, notfSt: Wait, notfVal: 0x0000, stackwm: 128
Task FF, no: 7, state: B (2), prio: 1, notfSt: Wait, notfVal: 0x0000, stackwm: 144

Perhaps you have time slicing disabled. Please attach your FreeRTOSConfig.h file.
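For reference, round-robin scheduling between ready tasks of equal priority is controlled by these FreeRTOSConfig.h settings (1 is the usual value for both; shown here only as an illustration):

#define configUSE_PREEMPTION     1
#define configUSE_TIME_SLICING   1   /* 0 disables time slicing between equal-priority tasks */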

Do you also have stack overflow detection on, configASSERT() defined, etc. as per the information on this page: https://www.freertos.org/FAQHelp.html
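A minimal sketch of turning stack overflow checking on, assuming the application supplies the standard hook (the exact pcTaskName type varies between versions, so match your headers):

/* In FreeRTOSConfig.h - method 2 also checks a known byte pattern at the end of each stack. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* In the application - called by the kernel if an overflow is detected. */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName )
{
    ( void ) xTask;
    ( void ) pcTaskName;
    taskDISABLE_INTERRUPTS();
    for( ;; );
}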

Thanks Richard. Time slicing is enabled, stack overflow checking is disabled, and configASSERT() is not defined. As we can see from the stackwm column (usStackHighWaterMark), no overflow has occurred (otherwise the value would be 0).

This issue only happens around 1 to 5% of the time, not always, and sometimes it recovers by itself after several minutes. I mainly want to understand the possible causes that could explain this behavior. Thanks.

It looks like task 1 (DD) is also at priority 1 and is running. Could you have time slicing disabled, with task 1 just running and never blocking/yielding?

Turning stack overflow checking on and defining configASSERT() may help you understand the behaviour. Also consider using a trace tool such as FreeRTOS+Trace from Percepio, which is a sophisticated diagnostic tool for FreeRTOS.

Thanks Richard. With more debug messages, I have roughly confirmed that the timer task does not get a chance to run; it looks like a deadlock. Here is the possible cause I suspect, please let me know if it makes sense.

The timers of our tasks are configured to auto-reload, but each task's handler code also resets its timer, which sends a "reset" command to the timer task's queue.
If the system is very busy during some period and the timer task's queue is full, then when the timer task processes a reset command from the queue and finds that the timer has auto-reload enabled, it sends a "START_DONT_TRACE" command to its own queue, as shown below. That send has to wait for space to be released on the queue, but the queue is full, causing a deadlock.

if( pxTimer->uxAutoReload == ( UBaseType_t ) pdTRUE )
{
    xResult = xTimerGenericCommand( pxTimer, tmrCOMMAND_START_DONT_TRACE, xMessage.u.xTimerParameters.xMessageValue + pxTimer->xTimerPeriodInTicks, NULL, tmrNO_DELAY );
}

However, if this were the case, the timer task should be on a pending (or blocked) list instead of the ready list, right? In fact the state of the timer task is shown as 1 (Ready), which looks like a contradiction.

I may not be understanding that fully, as I'm not sure why an auto-reload timer would also send a reset command to itself, but timer callbacks execute from the timer task and so should not attempt to write to the timer command queue: if the queue is full, the write will either fail (if the block time is zero) or block the timer task, i.e. block itself (if the block time is not zero). This is demonstrated in your next paragraph.

So this is doing exactly what I said not to do - post a message to itself - but it is using a block time of zero so it won't block itself. As per my previous comments above, if you had configASSERT() defined you would have noticed this happening right away. The best thing to do is increase the length of the timer command queue.

There may also be things you could do in the code to cache missed writes to the queue in a list so they can be retried later, or to never allow the command queue to become full by reserving space specifically for these lines of code (although that just means the application will find the queue full sooner).
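A minimal sketch of the first idea, caching a failed reset so it can be retried later instead of being lost (the helper names are hypothetical, not FreeRTOS APIs):

static volatile BaseType_t xResetPending = pdFALSE;

void vRequestTimerReset( TimerHandle_t xTimer )
{
    /* Use a zero block time so the caller never blocks. */
    if( xTimerReset( xTimer, 0 ) != pdPASS )
    {
        /* Timer command queue was full - remember the request. */
        xResetPending = pdTRUE;
    }
}

void vRetryTimerResetIfNeeded( TimerHandle_t xTimer )
{
    /* Call periodically from a task to retry a cached request. */
    if( ( xResetPending == pdTRUE ) && ( xTimerReset( xTimer, 0 ) == pdPASS ) )
    {
        xResetPending = pdFALSE;
    }
}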

Thanks Richard. We have figured out the root cause: it was a task priority issue that kept the timer task busy handling incoming messages.

One question about configASSERT(): it looks like a debugger is needed to use configASSERT(). Can it be used without a debugger? Thanks.

Glad you figured it out, and thanks for reporting back.

You can define configASSERT() to print out the asserting file and line number if you run without the debugger attached.
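For example, something along these lines in FreeRTOSConfig.h works without a debugger, assuming printf() (or any logging routine you substitute) reaches a console or UART on your target:

/* FreeRTOSConfig.h must then include <stdio.h>, or declare whatever logger is used instead. */
#define configASSERT( x )                                                  \
    if( ( x ) == 0 )                                                       \
    {                                                                      \
        taskDISABLE_INTERRUPTS();                                          \
        printf( "ASSERT failed: %s line %d\r\n", __FILE__, __LINE__ );     \
        for( ;; );                                                         \
    }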