FreeRTOS hanging in idle Task

Hi

From time to time, FreeRTOS gets stuck in the idle task. The situation is as follows:

  • The firmware runs OK for some time, then suddenly only the tick IRQ handler and the idle task keep running. Other active tasks are waiting to execute, but for some unknown reason the scheduler won't switch to them. Other interrupts sometimes work, sometimes not.
  • The RTOS is configured for tickless idle.
  • The MCU is an EFM32GG380F1024 (Cortex-M3).

I suspect a configuration issue with the RTOS but haven't found it yet.
Can someone who has experience with tickless idle mode have a look at it?

Note: please review the following values in particular:

  • configUSE_TICKLESS_IDLE

  • configUSE_PREEMPTION

  • configUSE_PORT_OPTIMISED_TASK_SELECTION

  • configMAX_PRIORITIES

  • configIDLE_SHOULD_YIELD

  • configTIMER_TASK_PRIORITY

  • configPRIO_BITS

  • configLIBRARY_LOWEST_INTERRUPT_PRIORITY

  • configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY

  • configKERNEL_INTERRUPT_PRIORITY

  • configMAX_SYSCALL_INTERRUPT_PRIORITY

  • scheduler_init()

    #define configUSE_NEWLIB_REENTRANT ( 1 )
    #define configTICK_RATE_HZ ( 100 ) // 100 Hz (instead of the default 1kHz)

    #define configUSE_TICKLESS_IDLE ( 1 )
    #define configEXPECTED_IDLE_TIME_BEFORE_SLEEP ( 2 )

    #define configUSE_TICK_HOOK ( 1 )
    #define configCHECK_FOR_STACK_OVERFLOW ( 2 )
    #define configUSE_MALLOC_FAILED_HOOK ( 1 )
    #define configUSE_IDLE_HOOK ( 0 )
    #define configUSE_DAEMON_TASK_STARTUP_HOOK ( 1 )

    #define configUSE_PREEMPTION ( 1 )
    #define configUSE_PORT_OPTIMISED_TASK_SELECTION ( 1 )
    #define configSUPPORT_STATIC_ALLOCATION ( 1 )
    #define configSUPPORT_DYNAMIC_ALLOCATION ( 0 )
    #define configCPU_CLOCK_HZ ( (unsigned long)28000000 )
    #define configMAX_PRIORITIES ( 6 )
    #define configMINIMAL_STACK_SIZE ( (unsigned short)1024 )
    #define configTOTAL_HEAP_SIZE ( (size_t)(64000) )
    #define configMAX_TASK_NAME_LEN ( 32 )
    #define configUSE_TRACE_FACILITY ( 1 )
    #define configUSE_16_BIT_TICKS ( 0 )
    #define configIDLE_SHOULD_YIELD ( 0 )
    #define configUSE_MUTEXES ( 1 )
    #define configUSE_RECURSIVE_MUTEXES ( 1 )
    #define configUSE_COUNTING_SEMAPHORES ( 1 )
    #define configUSE_ALTERNATIVE_API ( 0 )
    #define configQUEUE_REGISTRY_SIZE ( 32 )
    #define configUSE_QUEUE_SETS ( 0 )

    #define configGENERATE_RUN_TIME_STATS ( 0 )

    #define configUSE_CO_ROUTINES ( 1 )
    #define configMAX_CO_ROUTINE_PRIORITIES ( 1 )

    #define configUSE_TIMERS ( 1 )
    #define configTIMER_TASK_PRIORITY ( configMAX_PRIORITIES - 1 ) // Highest priority
    #define configTIMER_QUEUE_LENGTH ( 10 )
    #define configTIMER_TASK_STACK_DEPTH ( 256 )

    #ifdef __NVIC_PRIO_BITS
    #define configPRIO_BITS __NVIC_PRIO_BITS
    #else
    #define configPRIO_BITS ( 3 ) // 3 priority bits on this Cortex-M3: 8 levels (0..7)
    #endif

    #define configLIBRARY_LOWEST_INTERRUPT_PRIORITY ( 0x07 )
    #define configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY ( 0x5 )

    #define configKERNEL_INTERRUPT_PRIORITY ( configLIBRARY_LOWEST_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )

    #define configMAX_SYSCALL_INTERRUPT_PRIORITY ( configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY << (8 - configPRIO_BITS) )

    #define INCLUDE_vTaskPrioritySet ( 1 )
    #define INCLUDE_uxTaskPriorityGet ( 1 )
    #define INCLUDE_vTaskDelete ( 1 )
    #define INCLUDE_vTaskSuspend ( 1 )
    #define INCLUDE_xResumeFromISR ( 1 )
    #define INCLUDE_vTaskDelayUntil ( 1 )
    #define INCLUDE_vTaskDelay ( 1 )
    #define INCLUDE_xTaskGetSchedulerState ( 1 )
    #define INCLUDE_xTaskGetCurrentTaskHandle ( 1 )
    #define INCLUDE_uxTaskGetStackHighWaterMark ( 0 )
    #define INCLUDE_xTaskGetIdleTaskHandle ( 1 )
    #define INCLUDE_xTimerGetTimerDaemonTaskHandle ( 1 )
    #define INCLUDE_eTaskGetState ( 1 )
    #define INCLUDE_xTimerPendFunctionCall ( 1 )
    #define INCLUDE_xSemaphoreGetMutexHolder INCLUDE_xQueueGetMutexHolder
    #define INCLUDE_xQueueGetMutexHolder ( 1 )
    #define INCLUDE_xTaskGetHandle ( 0 )
    #define INCLUDE_xTaskAbortDelay ( 1 )
    #define INCLUDE_pxTaskGetStackStart ( 0 )
    #define INCLUDE_xTaskResumeFromISR ( 1 )

    #define configASSERT(x) assert_check(__FILENAME__,__LINE__,__FUNCTION__,(x))

    #define vPortSVCHandler SVC_Handler
    #define xPortPendSVHandler PendSV_Handler
    #define xPortSysTickHandler FreeRTOS_SysTick_Handler //BURTC

    #define fabs __builtin_fabs

    // simplified function for discussion, not the real deal,…
    void scheduler_init()
    {
    // initially set all interrupt handlers to lowest prio
    // index a - 1 starts at -1 (SysTick_IRQn)
    for (int a = 0; a < IRQn_device_nbrOf + 1; a++)
    {
        NVIC_SetPriority(static_cast<IRQn_Type>(a - 1), 7);
    }

    // then set selected irq handlers higher (lower value)
    NVIC_SetPriority(USART0_RX_IRQn,  5);
    NVIC_SetPriority(USART1_RX_IRQn,  5);
    NVIC_SetPriority(USART2_RX_IRQn,  5);
    NVIC_SetPriority(UART0_RX_IRQn,   5);
    NVIC_SetPriority(UART1_RX_IRQn,   5);
    NVIC_SetPriority(LEUART0_IRQn,    5);
    NVIC_SetPriority(LEUART1_IRQn,    5);
    NVIC_SetPriority(USB_IRQn,        5);
    NVIC_SetPriority(GPIO_ODD_IRQn,   5);
    NVIC_SetPriority(GPIO_EVEN_IRQn,  5);
    NVIC_SetPriority(RTC_IRQn,        5);
    NVIC_SetPriority(BURTC_IRQn,      5);
    NVIC_SetPriority(TIMER0_IRQn,     6);
    NVIC_SetPriority(TIMER1_IRQn,     6);
    NVIC_SetPriority(TIMER2_IRQn,     6);
    NVIC_SetPriority(TIMER3_IRQn,     6);
    NVIC_SetPriority(LETIMER0_IRQn,   6);
    NVIC_SetPriority(USART0_TX_IRQn,  6);
    NVIC_SetPriority(USART1_TX_IRQn,  6);
    NVIC_SetPriority(USART2_TX_IRQn,  6);
    NVIC_SetPriority(UART0_TX_IRQn,   6);
    NVIC_SetPriority(UART1_TX_IRQn,   6);
    

    }
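For reference, the two shifted values in the configuration above can be worked out by hand. The sketch below assumes configPRIO_BITS is 3 (the fallback value in the posted config; __NVIC_PRIO_BITS from the device header may differ on other parts). The helper name `raw_priority` is hypothetical and is not part of FreeRTOS or CMSIS.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the posted config: 3 implemented priority bits. */
#define PRIO_BITS 3u

/* Hypothetical helper: convert a "library" priority (0..7 here) into the
   raw 8-bit value used in the priority registers, where the implemented
   bits occupy the top of the byte. */
static uint8_t raw_priority(uint8_t library_priority)
{
    assert(library_priority < (1u << PRIO_BITS));
    return (uint8_t)(library_priority << (8u - PRIO_BITS));
}
```

Under these assumptions `raw_priority(7)` is 0xE0 (the value of configKERNEL_INTERRUPT_PRIORITY) and `raw_priority(5)` is 0xA0 (the value of configMAX_SYSCALL_INTERRUPT_PRIORITY).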

How are you determining that the other tasks are all ready to run and not deadlocked or all blocked on something? If you step through the tick handler, what does it do? For example, is the priority of the task it thinks is running the idle priority, or something higher?

Does this happen when configUSE_TICKLESS_IDLE is set to 0?
Are you using the ‘default’ tickless idle that uses the SysTick clock, or one that is specific to your hardware (using a different clock) - maybe that is what BURTC is?

Ref interrupt priorities - the latest versions of FreeRTOS have more configASSERT() lines that check the interrupt configuration matches the hardware, as far as possible. Which version of FreeRTOS are you using?

hi richard

I use Atollic 8.1 with the latest version of FreeRTOS.
I see that several tasks (around 4 of 12) are in the Ready state and waiting for the scheduler to give them CPU time, but this does not happen.

So basically what I see is tick, back to idle, tick, back to idle… and this never ends until an RTOS software timer expires. That is the only situation in which scheduling continues.

If I step through the tick, I see that the next event is not due yet (2 seconds in the future), but even if I let it run for a while (>5 seconds) the scheduler won't run the tasks.

I did not deactivate tickless idle, but I noticed that when I set configIDLE_SHOULD_YIELD to 1 the behavior seems to be gone (at least in 2 hours of testing).

We do not use SysTick; we use BURTC for both sleep and tick, by overriding the FreeRTOS functions where SysTick would normally be initialized and handled.

No configASSERT() fires, so as far as FreeRTOS is concerned the configuration seems valid.

Please advise.

thanks

Please let me know the version number.

Scheduling presumably resumes when the software timer expires because, until that point, the timer task is blocked, but when it is unblocked the scheduler sees that it is the highest priority task and so switches to it.

Task switches are performed in the PendSV handler, which is pended from various places in the code, including the tick interrupt if a switch to a higher priority task is needed. I suspect the scheduler has unblocked the tasks as needed, but the PendSV handler has not executed for some reason. Setting configIDLE_SHOULD_YIELD to 1 will force a PendSV each time the Idle task runs if there are other tasks of equal or higher priority in the Ready state - so that part makes sense.
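One way to check this theory in a debugger is to read the Cortex-M Interrupt Control and State Register (ICSR, at address 0xE000ED04) while the system is stuck and see whether PendSV is still pending. The sketch below only decodes a value that has already been read from that register, so it can be tried on a host machine. The function names are hypothetical; the bit positions (PENDSVSET at bit 28, VECTACTIVE in bits 8:0, PendSV as exception number 14) are the standard ARMv7-M ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ICSR_PENDSVSET_BIT   28u     /* set while PendSV is pending      */
#define ICSR_VECTACTIVE_MASK 0x1FFu  /* number of the active exception   */
#define EXC_NUM_PENDSV       14u     /* exception number of PendSV       */

/* True if the given ICSR value shows a pending PendSV exception. */
static bool icsr_pendsv_pending(uint32_t icsr)
{
    return ((icsr >> ICSR_PENDSVSET_BIT) & 1u) != 0u;
}

/* Exception number currently executing (0 means thread mode). */
static uint32_t icsr_active_exception(uint32_t icsr)
{
    return icsr & ICSR_VECTACTIVE_MASK;
}
```

If PendSV shows as pending while the idle task keeps running in thread mode, something is masking it; if it is not pending at all, the yield request was probably never made or was cleared.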

I assume you have configUSE_TIMESLICING undefined as it is not in your original post, which would mean it will default to 1, which should then mean the tick interrupt would cause a context switch if any of the other ready state tasks had the same priority as the idle task - so I’m assuming the other tasks have a higher priority than the idle task.

If my theory is right we need to narrow down why the PendSV is not executing. Can you please try running with configUSE_TICKLESS_IDLE set to 0 to see if that makes a difference - if it does it gives us some clues as to where to look.

OK, I'll do that first thing tomorrow and let you know.
FreeRTOS 10.2.1.

configUSE_TIMESLICING: very interesting, this is the first time I've heard of this parameter. I will look into it as well, thanks.

If I read out the Cortex-M IRQ priorities, I notice that the PendSV and SysTick handlers are set to the LOWEST (7) priority, unlike all the other default Cortex-M handlers, which are set to the HIGHEST (0). Is this OK?

Also, by default FreeRTOS works with the Cortex-M SysTick and sets it to the lowest (7) priority. Here I use BURTC for both sleep and tick, and I manually set its priority HIGHER (5) than the lowest possible (7). Do I need to correct this?

[583456] -14 : 0
[599194] -13 : 0
[599286] -12 : 0
[615264] -11 : 0
[631063] -10 : 0
[678996]  -5 : 0 // SVCall
[679057]  -4 : 0 // debugmon
[695061]  -2 : 7 // pendSV
[710929]  -1 : 7 // SysTick
// below efm32gg priorities
[711019]   0 : 7
[726971]   1 : 5
[742888]   2 : 6
[742972]   3 : 5
[758965]   4 : 6
[759197]   5 : 5
[774858]   6 : 7
[790809]   7 : 7
[790881]   8 : 7
[806843]   9 : 7
[806991]  10 : 7
[822960]  11 : 5
[838930]  12 : 6
[839058]  13 : 6
[854735]  14 : 6
[934670]  15 : 5
[934809]  16 : 6
[950640]  17 : 7
[966692]  18 : 5
[966761]  19 : 6
[998670]  20 : 5
[998833]  21 : 6
[014821]  22 : 5
[014999]  23 : 6
[030684]  24 : 5
[030965]  25 : 5
[046539]  26 : 6
[062506]  27 : 7
[062558]  28 : 7
[078448]  29 : 7
[078509]  30 : 5
[094513]  31 : 5 // BURTC_IRQn - tick and sleep timer
[110449]  32 : 7
[110510]  33 : 7
[126607]  34 : 7
[126715]  35 : 7
[142512]  36 : 7
[142588]  37 : 7
[158550]  38 : 7

PendSV and SysTick should be set to the lowest priority - so that is correct.

Any interrupt that uses the FreeRTOS API must also be set at or below configMAX_SYSCALL_INTERRUPT_PRIORITY - but that is not as simple as it sounds on Arm architectures, as described here: https://www.freertos.org/RTOS-Cortex-M3-M4.html
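As a rough illustration of this rule: on Cortex-M a numerically lower value is a logically higher priority, so "at or below configMAX_SYSCALL_INTERRUPT_PRIORITY" means a numeric priority greater than or equal to it. The following is a minimal sketch, assuming the library-level values posted earlier in this thread (5 and 7); the helper name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions copied from the posted FreeRTOSConfig.h values. */
#define LIBRARY_MAX_SYSCALL_PRIORITY 5u
#define LIBRARY_LOWEST_PRIORITY      7u

/* Hypothetical check: may an ISR running at this (unshifted) priority
   call the interrupt-safe "FromISR" FreeRTOS API functions? */
static bool isr_may_use_freertos_api(uint8_t library_priority)
{
    assert(library_priority <= LIBRARY_LOWEST_PRIORITY);
    return library_priority >= LIBRARY_MAX_SYSCALL_PRIORITY;
}
```

With these values the BURTC interrupt at priority 5 passes the check, while anything set to 0 through 4 would not.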

You should also read this page if you are not familiar with the above: https://www.freertos.org/FAQHelp.html

Please have a look at the code; I added some comments.
I think the issue at hand is that FreeRTOS sees tasks with “prio > idle”, so “xExpectedIdleTime” is correctly 0 and the tickless idle sleep procedure is, also correctly, skipped.

A task yield is not invoked because "if( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ tskIDLE_PRIORITY ] ) ) > ( UBaseType_t ) 1 ) // 1 > 1" is not true.

so this results basically in:

	static portTASK_FUNCTION( prvIdleTask, pvParameters )
	{
		for( ;; )
		{
			/* nothing happens here: no yield, no sleep */
		}
	}

therefore, as far as the system is concerned, it does everything correctly.
please have a look at the CODE COMMENTS i added:

#if ( configUSE_TICKLESS_IDLE != 0 ) // ------------------------------------ THIS IS ACTIVE

	static TickType_t prvGetExpectedIdleTime( void )
	{
	TickType_t xReturn;
	UBaseType_t uxHigherPriorityReadyTasks = pdFALSE;

		/* uxHigherPriorityReadyTasks takes care of the case where
		configUSE_PREEMPTION is 0, so there may be tasks above the idle priority
		task that are in the Ready state, even though the idle task is
		running. */
		#if( configUSE_PORT_OPTIMISED_TASK_SELECTION == 0 )
		{
			if( uxTopReadyPriority > tskIDLE_PRIORITY )
			{
				uxHigherPriorityReadyTasks = pdTRUE;
			}
		}
		#else // ------------------------------------ THIS IS ACTIVE
		{
			const UBaseType_t uxLeastSignificantBit = ( UBaseType_t ) 0x01;

			/* When port optimised task selection is used the uxTopReadyPriority
			variable is used as a bit map.  If bits other than the least
			significant bit are set then there are tasks that have a priority
			above the idle priority that are in the Ready state.  This takes
			care of the case where the co-operative scheduler is in use. */
			if( uxTopReadyPriority > uxLeastSignificantBit ) // 41 > 1
			{
				uxHigherPriorityReadyTasks = pdTRUE; // ------------------------------------ THIS IS EXECUTED
			}
		}
		#endif

		if( pxCurrentTCB->uxPriority > tskIDLE_PRIORITY ) // 0 > 0
		{
			xReturn = 0;
		}
		else if( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ tskIDLE_PRIORITY ] ) ) > 1 ) // 1 > 1
		{
			/* There are other idle priority tasks in the ready state.  If
			time slicing is used then the very next tick interrupt must be
			processed. */
			xReturn = 0;
		}
		else if( uxHigherPriorityReadyTasks != pdFALSE ) // 1 != 0
		{
			/* There are tasks in the Ready state that have a priority above the
			idle priority.  This path can only be reached if
			configUSE_PREEMPTION is 0. */
			xReturn = 0; // ------------------------------------ THIS IS EXECUTED
		}
		else
		{
			xReturn = xNextTaskUnblockTime - xTickCount;
		}

		return xReturn; // ------------------------------------ RETURN 0
	}

#endif /* configUSE_TICKLESS_IDLE */


static portTASK_FUNCTION( prvIdleTask, pvParameters )
{
	/* Stop warnings. */
	( void ) pvParameters;

	/** THIS IS THE RTOS IDLE TASK - WHICH IS CREATED AUTOMATICALLY WHEN THE
	SCHEDULER IS STARTED. **/

	/* In case a task that has a secure context deletes itself, in which case
	the idle task is responsible for deleting the task's secure context, if
	any. */
	portALLOCATE_SECURE_CONTEXT( configMINIMAL_SECURE_STACK_SIZE );

	for( ;; )
	{
		/* See if any tasks have deleted themselves - if so then the idle task
		is responsible for freeing the deleted task's TCB and stack. */
		prvCheckTasksWaitingTermination();

		#if ( configUSE_PREEMPTION == 0 ) // ------------------------------------ THIS IS !!! NOT !!! ACTIVE
		{
			/* If we are not using preemption we keep forcing a task switch to
			see if any other task has become available.  If we are using
			preemption we don't need to do this as any task becoming available
			will automatically get the processor anyway. */
			taskYIELD();
		}
		#endif /* configUSE_PREEMPTION */

		#if ( ( configUSE_PREEMPTION == 1 ) && ( configIDLE_SHOULD_YIELD == 1 ) ) // ------------------------------------ THIS IS ACTIVE
		{
			/* When using preemption tasks of equal priority will be
			timesliced.  If a task that is sharing the idle priority is ready
			to run then the idle task should yield before the end of the
			timeslice.

			A critical region is not required here as we are just reading from
			the list, and an occasional incorrect value will not matter.  If
			the ready list at the idle priority contains more than one task
			then a task other than the idle task is ready to execute. */
			if( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ tskIDLE_PRIORITY ] ) ) > ( UBaseType_t ) 1 ) // 1 > 1
			{
				taskYIELD();
			}
			else
			{
				mtCOVERAGE_TEST_MARKER();
			}
		}
		#endif /* ( ( configUSE_PREEMPTION == 1 ) && ( configIDLE_SHOULD_YIELD == 1 ) ) */

		#if ( configUSE_IDLE_HOOK == 1 ) // ------------------------------------ THIS IS !!! NOT !!! ACTIVE
		{
			extern void vApplicationIdleHook( void );

			/* Call the user defined function from within the idle task.  This
			allows the application designer to add background functionality
			without the overhead of a separate task.
			NOTE: vApplicationIdleHook() MUST NOT, UNDER ANY CIRCUMSTANCES,
			CALL A FUNCTION THAT MIGHT BLOCK. */
			vApplicationIdleHook();
		}
		#endif /* configUSE_IDLE_HOOK */

		/* This conditional compilation should use inequality to 0, not equality
		to 1.  This is to ensure portSUPPRESS_TICKS_AND_SLEEP() is called when
		user defined low power mode	implementations require
		configUSE_TICKLESS_IDLE to be set to a value other than 1. */
		#if ( configUSE_TICKLESS_IDLE != 0 ) // ------------------------------------ THIS IS ACTIVE
		{
		TickType_t xExpectedIdleTime;

			/* It is not desirable to suspend then resume the scheduler on
			each iteration of the idle task.  Therefore, a preliminary
			test of the expected idle time is performed without the
			scheduler suspended.  The result here is not necessarily
			valid. */
			xExpectedIdleTime = prvGetExpectedIdleTime();  // ------------------------------------ returns 0

			if( xExpectedIdleTime >= configEXPECTED_IDLE_TIME_BEFORE_SLEEP ) // 0 >= 2 is false
			{
				vTaskSuspendAll();
				{
					/* Now the scheduler is suspended, the expected idle
					time can be sampled again, and this time its value can
					be used. */
					configASSERT( xNextTaskUnblockTime >= xTickCount );
					xExpectedIdleTime = prvGetExpectedIdleTime();

					/* Define the following macro to set xExpectedIdleTime to 0
					if the application does not want
					portSUPPRESS_TICKS_AND_SLEEP() to be called. */
					configPRE_SUPPRESS_TICKS_AND_SLEEP_PROCESSING( xExpectedIdleTime );

					if( xExpectedIdleTime >= configEXPECTED_IDLE_TIME_BEFORE_SLEEP )
					{
						traceLOW_POWER_IDLE_BEGIN();
						portSUPPRESS_TICKS_AND_SLEEP( xExpectedIdleTime );
						traceLOW_POWER_IDLE_END();
					}
					else
					{
						mtCOVERAGE_TEST_MARKER();
					}
				}
				( void ) xTaskResumeAll();
			}
			else
			{
				mtCOVERAGE_TEST_MARKER();
			}
		}
		#endif /* configUSE_TICKLESS_IDLE */
	}
}
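A side note on the `// 41 > 1` annotation in prvGetExpectedIdleTime() above: with configUSE_PORT_OPTIMISED_TASK_SELECTION set to 1, uxTopReadyPriority is a bitmap with one bit per priority. The sketch below decodes such a bitmap on a host machine. It assumes the 41 is a decimal value, and the helper names are hypothetical (the real port uses a count-leading-zeros instruction rather than a loop).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Index of the highest set bit, i.e. the highest priority that has at
   least one ready task. Returns -1 if no bit is set. */
static int highest_ready_priority(uint32_t bitmap)
{
    int priority = -1;
    while (bitmap != 0u)
    {
        bitmap >>= 1;
        priority++;
    }
    return priority;
}

/* True if the bitmap marks at least one ready task at this priority. */
static bool priority_is_ready(uint32_t bitmap, unsigned priority)
{
    assert(priority < 32u);
    return ((bitmap >> priority) & 1u) != 0u;
}
```

If the value really is decimal 41 (binary 101001), bits 0, 3 and 5 are set, so tasks at priorities 0, 3 and 5 are marked ready. Priority 5 is also the priority configured for the timer task in the posted config (configMAX_PRIORITIES - 1).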

Is your last post showing why a yield is not performed AFTER the issue you describe has already occurred? If so then I think that is understood - the question was why did a yield not occur to the higher priority task when it was actually supposed to - preventing this situation in the first place.

Yes, correct. The scenario described above is AFTER the issue occurs.

Situation:

  • we have tasks in the Ready state waiting for a context switch
  • the RTOS determines 0 waiting time until the next RTOS event, so no sleep happens, BUT THE CONTEXT SWITCH IS NOT HAPPENING EITHER, NEITHER FROM IDLE NOR FROM portSYSTICK.

My question:

Why not?

I take it that the line if( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ tskIDLE_PRIORITY ] ) ) > ( UBaseType_t ) 1 ) is somehow not functioning correctly at this moment, as it always evaluates to 1 > 1 --> false.

Yes - because something went wrong BEFORE this - did you try this without tickless idle as requested?

Yes, tested with tickless idle.

As this misbehavior is fairly difficult to reproduce I cannot say for certain, but without tickless idle it seems OK.

Could this behavior be caused by an incorrect vTaskStepTick() call?
I can say that we don't jump too far, as the configASSERT() would catch that, but what if I step too few ticks? Would that produce such a case? I don't think so, because in that case it would just go back to sleep and there would not be RTOS elements in the READY state…
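On the question of stepping too few or too many ticks: a tickless port typically converts the elapsed sleep-timer counts into whole ticks and passes the result to vTaskStepTick(), which asserts that the step does not go past the next unblock time. The sketch below shows only that arithmetic. The function name and the counts-per-tick parameter are hypothetical and are not taken from the BURTC implementation discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: how many whole ticks to report after waking up.
   elapsed_counts      - sleep-timer counts that passed while asleep
   counts_per_tick     - sleep-timer counts in one RTOS tick (non-zero)
   expected_idle_ticks - the idle time the kernel asked to sleep for */
static uint32_t ticks_to_step(uint32_t elapsed_counts,
                              uint32_t counts_per_tick,
                              uint32_t expected_idle_ticks)
{
    assert(counts_per_tick != 0u);

    /* Round down: a partially elapsed tick is not reported yet. */
    uint32_t ticks = elapsed_counts / counts_per_tick;

    /* Never step past what the kernel expected, otherwise the assert in
       vTaskStepTick() would fire. */
    if (ticks > expected_idle_ticks)
    {
        ticks = expected_idle_ticks;
    }
    return ticks;
}
```

A real port would also carry the remainder counts over into the next tick period; that part is left out of this sketch.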

Most likely there is some kind of race condition between leaving sleep mode and re-enabling either the tick or interrupts that makes the kernel think it has performed a switch to the latest task, whereas in actual fact, although the logic executed, the interrupt pended by that logic (the one that actually performs the switch) either didn’t execute or was otherwise prevented from completing.

Is your tickless implementation overriding the default implementation in tasks.c? So called from the same place in the code as it would if the default version were used?

Which version of FreeRTOS are you using (sorry if that information is already in the thread - can’t see the whole thread just now)?

OK, I will look into it. Thank you.
What I notice, though, is that it happens during awake time, when the chip is not in tickless idle. I can show you this with a Percepio trace, but unfortunately I cannot make it public. Can I send you an image via mail?

My tickless idle implementation overrides the default one by overriding the weak default function.

FreeRTOS 10.2.1, which should be the latest MIT-licensed version as far as I know.

If you use the Business Related Inquiries link on this page the email will come to me.

Hello Richard,

Unfortunately the email link for Business Related Inquiries is down (“javascript:void(0)”), since Friday at least.

The email is obfuscated by a simple javascript to stop spam bots, but some browser settings object. You can contact using r dot barry AT freertos dot org.

hello barry

I'm still unable to get to the root cause of this issue. As I see it, the idle task is behaving as expected, but the SysTick handler won't pend the PendSV exception for some reason.

Do you have any suggestions for how I could trace/log/analyze the difference in behavior between the default use case and my issue? Maybe if I can compare them I'd see the problem.

What is the processor’s BASEPRI register set to? Anything other than 0 and the PendSV interrupt will be masked. Does anything in your code modify the BASEPRI by any means other than the taskENTER_CRITICAL() and taskEXIT_CRITICAL() functions provided by FreeRTOS?
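To make the masking rule concrete: a non-zero BASEPRI blocks every exception whose priority value is numerically equal to or greater than BASEPRI. The sketch below models that rule on a host machine. It assumes 3 priority bits, so PendSV at the lowest priority has the raw value 0xE0 (7 << 5); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: PendSV at the lowest priority with 3 priority bits. */
#define PENDSV_RAW_PRIORITY 0xE0u

/* Model of the BASEPRI rule: 0 disables masking; otherwise exceptions
   with a raw priority value >= BASEPRI cannot be taken. */
static bool masked_by_basepri(uint8_t basepri, uint8_t exception_raw_priority)
{
    if (basepri == 0u)
    {
        return false;
    }
    return exception_raw_priority >= basepri;
}
```

Because PendSV sits at the numerically largest priority value, any non-zero BASEPRI masks it in this model, for example `masked_by_basepri(0xA0, PENDSV_RAW_PRIORITY)` is true.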

OK, BASEPRI is 0 both in the SysTick handler and during the idle task call.

We do have some atomic sections (__disable_irq() / __enable_irq()) in place to mask everything. They are implemented to be nesting-safe and they respect the PRIMASK value, and we trace their execution time. The longest period for which the system is completely masked is 10 ms.
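For readers following along, a nesting-safe atomic section usually saves the previous PRIMASK state on entry and restores it on exit, instead of unconditionally re-enabling interrupts. The sketch below is a host-side simulation of that pattern, not the poster's code: `sim_primask` is a plain variable standing in for the PRIMASK register, and the function names are hypothetical. On the target the same pattern would typically use the CMSIS intrinsics __get_PRIMASK(), __disable_irq() and __set_PRIMASK().

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the PRIMASK register: 1 = interrupts masked. */
static uint32_t sim_primask = 0u;

/* Enter an atomic section and return the previous mask state. */
static uint32_t atomic_enter(void)
{
    uint32_t previous = sim_primask;
    sim_primask = 1u;
    return previous;
}

/* Leave an atomic section by restoring the saved mask state, so an
   inner section does not unmask interrupts for an outer one. */
static void atomic_exit(uint32_t previous)
{
    sim_primask = previous;
}
```

In this pattern, interrupts only become unmasked again when the outermost section exits. A PendSV pended while masked stays pending and is taken at that point.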