First, if you did find a stack overflow, make sure you have stack overflow checking enabled so you catch any further overflow issues.
I would also, as I suggested, put in a trap to detect the failure of the xEventGroupSetBitsFromISR, and examine what was running at the time to see if you can identify a task that is hogging things.
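A minimal sketch of such a trap, assuming a FreeRTOS ISR; `xEventGroup`, `EVENT_BIT` and the handler name are placeholders for this example:

```c
/* Sketch: trap a failed xEventGroupSetBitsFromISR() so the debugger
 * stops at the point of failure. xEventGroup, EVENT_BIT and the
 * handler name are placeholders, not names from this thread. */
void EXTI0_IRQHandler(void)
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    BaseType_t xResult;

    xResult = xEventGroupSetBitsFromISR(xEventGroup, EVENT_BIT,
                                        &xHigherPriorityTaskWoken);

    /* pdFAIL means the timer command queue was full. Halting here lets
     * you inspect which task was running when the queue filled up. */
    configASSERT(xResult != pdFAIL);

    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}
```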
Of course. I should have done that from the beginning!
I trapped the failure and saw that the tasks and queues were fine. But how can I identify what is hogging things?
One question: the call stack at the time of the failure is always the same, and I noticed that the hardware interrupt is raised while the FreeRTOS kernel is locked (by the osKernelLock() function, i.e. the vTaskSuspendAll() function). Could this have an impact?
Yes, if some task has a long running suspension of the Kernel (long being long enough to fill the queue) then that can be the cause of your problem.
By suspending the scheduler, NOTHING else will run except ISRs, and xEventGroupSetBitsFromISR needs the Timer (daemon) Task to run to process the event.
Thanks @richard-damon.
I dived into the scheduler code and your answer confirms my observations.
Not suspending the scheduler sounds like it will solve the current issue. I now need to solve some unwanted task switching (preemption implies round-robin in FreeRTOS).
Thanks again for your support.
Best regards.
That “Round-Robin” causes problems is an indication that the code isn’t properly thread-safe.
It is perhaps a known limitation that preempting a task causes a round-robin rescheduling on the return to that priority, but disabling Round Robin wasn’t intended (to my understanding) to disable that change, just to avoid the overhead of consulting the scheduler in the tick interrupt unless some higher priority task became ready at that point.
OK. I notice that the “round-robin” effect is due to the preemption by the Timer Task and to the implementation that places the preempted task at the end of the ready list. Hence, when the Timer Task finishes its job, the next ready task is activated, and the preempted task has to wait for all the other ready tasks of the same priority to run. This is pictured in this screenshot from the SEGGER SystemView tool:
back to the preempted task (yellow)
The trouble is that the interrupted operation in the yellow task is a bus write operation. Every corrupted bus operation we observe occurs with a context layout like the one in the picture.
So you want to ensure that the bus write operation is not preempted. If it is okay to take interrupts while the bus write is in progress, you can suspend the scheduler by calling vTaskSuspendAll and once the operation is done, resume it by calling xTaskResumeAll:
vTaskSuspendAll();
{
/* Bus write operation. */
}
xTaskResumeAll();
If it is not okay to take interrupts while the bus write is in progress, you can use a critical section:
taskENTER_CRITICAL();
{
/* Bus write operation. */
}
taskEXIT_CRITICAL();
Thanks @aggarg .
That’s what is done (as explained above), and that’s the cause of the original issue in this topic. Suspending the scheduler prevents the Timer Task from running, leading, in some rare occurrences, to the Timer Queue filling up completely (for your info: bus operations last about 50 µs on a CPU running at 200 MHz).
Actually, the bus operations can tolerate short interruptions (such as IRQ handlers) but cannot tolerate the overly long suspension delay of the preempted task.
In the current situation, the solution is:
- to keep the scheduler suspended during the bus operations;
- to increase the size of the Timer Queue to a large value to tolerate “long” scheduler suspensions (later on, tests shall be performed to tune this value).
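For reference, the timer command queue length is configured in FreeRTOSConfig.h; the value below is only an illustration, to be tuned by the tests mentioned above:

```c
/* FreeRTOSConfig.h - illustrative values, to be tuned by testing. */
#define configUSE_TIMERS           1
#define configTIMER_QUEUE_LENGTH   32  /* raised from a typical default of 10 */
#define configTIMER_TASK_PRIORITY  ( configMAX_PRIORITIES - 1 )
```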
It sounds like what you need is a mutex between that group of tasks so only one “runs” at a time.
If it is just those red periods that need to be protected, then have each task boost its priority above the other tasks for that period, so that the operation won’t get interrupted, and restore it when the interval is done.
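A sketch of that priority-boost approach; `BUS_TASK_PRIORITY` and the function name are placeholders for this example:

```c
/* Sketch: temporarily raise this task above its peers so same-priority
 * round-robin cannot switch it out during the bus write, while leaving
 * interrupts enabled. BUS_TASK_PRIORITY is a placeholder for the tasks'
 * normal priority level. */
void vBusWriteProtected(void)
{
    vTaskPrioritySet(NULL, BUS_TASK_PRIORITY + 1);

    /* Bus write operation (~50 us). Only same- and lower-priority
     * tasks are kept out; ISRs still run. */

    vTaskPrioritySet(NULL, BUS_TASK_PRIORITY);
}
```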
Right, but this becomes an awkward use of the OS (scheduler).
If needed, I would rather modify the implementation of the scheduler and replace the “insert at the end” of the preempted task with an “insert before the first ready task of the same priority”. In my opinion, there is no reason that the preempted task should itself be preempted by the other tasks of the same priority. That’s my point of view, of course.
The problem with changing the scheduler to put the task back to the front of the list instead of the end is that this breaks the use of Yield, and Round Robin. That code doesn’t know why the task was being switched out, just that it was.
If I remember right, the code that does this is also used for other lists, and that change would impact the use of queues and semaphores and would break their “fairness” property.