Firmware execution falls into the idle task

We use FreeRTOS 10.4.3, and I found a problem in our firmware: after running for a few hours, the firmware hangs. Because the watchdog timer isn’t kicked, the firmware resets after about 30 seconds. I ran the firmware from IAR and, when it hung, paused execution. It was in prvIdleTask in “\amazon-freertos\freertos_kernel\tasks.c”. Why does the firmware end up in the idle task, and is there a way to avoid it?
Thanks!

The idle task runs if no other task is ready to run. You need to find out why none of your tasks are ready to run for so long.

Check whether any of your tasks are stuck in a deadlock: waiting for another task that is itself waiting on a task that is waiting for something.

Another thing to watch out for is that invalid ISR priorities can cause tasks to fall off the ready lists due to kernel data corruption.
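To make the ISR-priority point concrete, here is a host-side sketch of the rule, assuming a Cortex-M part such as the Renesas RA device in this thread. On Cortex-M a numerically lower NVIC priority is more urgent, so any ISR that calls a ...FromISR() function (for example a timer interrupt whose callback uses the kernel) must be numerically at or above configMAX_SYSCALL_INTERRUPT_PRIORITY. The EXAMPLE_* values are made-up placeholders, not your project's settings. With configASSERT defined, the stock FreeRTOS Cortex-M ports perform a similar check in vPortValidateInterruptPriority(); whether the Renesas port does the same is an assumption worth verifying.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED values for illustration only: 4 implemented NVIC priority bits and
 * a syscall ceiling of 5. Take the real numbers from the device header
 * (__NVIC_PRIO_BITS) and from FreeRTOSConfig.h. */
#define EXAMPLE_PRIO_BITS            4U
#define EXAMPLE_MAX_SYSCALL_PRIO     5U  /* unshifted, as passed to NVIC_SetPriority() */
#define EXAMPLE_MAX_SYSCALL_PRIO_REG (EXAMPLE_MAX_SYSCALL_PRIO << (8U - EXAMPLE_PRIO_BITS))

/* Returns true if an ISR configured with `nvic_prio` (0 = most urgent) may
 * call FreeRTOS ...FromISR() functions. A numerically LOWER value is MORE
 * urgent, so a safe ISR must sit numerically at or above the syscall
 * ceiling once both values are in the 8-bit register format. */
static bool isr_may_call_freertos_api(uint32_t nvic_prio)
{
    uint32_t reg_value = (nvic_prio << (8U - EXAMPLE_PRIO_BITS)) & 0xFFU;
    return reg_value >= EXAMPLE_MAX_SYSCALL_PRIO_REG;
}
```

In the BASEPRI-based ports, priority 0 can never be masked by the kernel, which is why the sketch rejects it for any ISR that uses the API.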

Hi Richard,
Thank you so much for the info. Are there any recommended ways to check whether any of the tasks are getting stuck? I found configCHECK_FOR_STACK_OVERFLOW in FreeRTOSConfig.h; it is currently 0.
Please let me know if you know how to check whether the tasks are stuck.

Please enable stack overflow checking and define configASSERT.
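For reference, a minimal fragment for both settings, following the pattern in the FreeRTOS documentation. Method 2 of stack overflow checking also verifies a fill pattern at the end of each task's stack, and with configCHECK_FOR_STACK_OVERFLOW above 0 the application must provide vApplicationStackOverflowHook(). Note that this configASSERT() only halts with interrupts disabled and prints nothing, so a failure looks like a hang that you can only see by pausing in the debugger (and, with a watchdog running, as a reset).

```c
/* FreeRTOSConfig.h */
#define configCHECK_FOR_STACK_OVERFLOW    2
#define configASSERT( x )    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }

/* Application source file (includes FreeRTOS.h and task.h) */
void vApplicationStackOverflowHook( TaskHandle_t xTask, char * pcTaskName )
{
    /* Inspect xTask and pcTaskName in the debugger to see which task overflowed. */
    ( void ) xTask;
    ( void ) pcTaskName;
    taskDISABLE_INTERRUPTS();
    for( ;; )
    {
    }
}
```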

The best way, if it is available, is to use a FreeRTOS aware debugger and see what each task is waiting on to figure out why the tasks are all blocked.

Otherwise, you might need to trace through memory yourself to see what each task is waiting for, or implement something using trace macros to determine what is happening.

Thanks aggarg.
I enabled configCHECK_FOR_STACK_OVERFLOW and defined configASSERT. I’ll run the build again to see what’ll happen when the tasks get stuck.

Thanks Richard.
I’ll try Tracealyzer. Meanwhile, can you provide more info about how to “trace through memory to see what each task is waiting for, or implement something using trace macros to determine what is happening”?
Thanks!

I can’t speak for the ‘trace through memory’ portion, but the second half refers to the trace hook macros. These macros are placed throughout the kernel code at important points, and you can define them to call your own functions, which can gather data on how your application is behaving.
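As an illustration of the trace-hook approach, here is a minimal sketch of a RAM ring buffer that such a hook could fill. The wiring is an assumption, not a FreeRTOS or Tracealyzer API: define traceTASK_SWITCHED_IN() in FreeRTOSConfig.h as something like trace_record_switch( ( uint32_t ) xTickCount, pxCurrentTCB ). The macro is expanded inside tasks.c, where xTickCount and pxCurrentTCB are in scope. After the hang, pause in the debugger and read trace_log to see which tasks ran last. trace_record_switch, trace_recent and TRACE_LOG_LEN are hypothetical names; on target the record function would be non-static with a prototype visible to tasks.c, and it must stay short because it runs inside the context switch.

```c
#include <stddef.h>
#include <stdint.h>

#define TRACE_LOG_LEN 64U  /* hypothetical capacity */

typedef struct {
    uint32_t    tick;  /* tick count when the task was switched in */
    const void *task;  /* TCB pointer, kept opaque */
} trace_event_t;

/* Lives in RAM so it can be read in the debugger after pausing. */
static volatile trace_event_t trace_log[TRACE_LOG_LEN];
static volatile uint32_t      trace_next;  /* total events ever written */

/* Called from the (assumed) traceTASK_SWITCHED_IN() definition. */
static void trace_record_switch(uint32_t tick, const void *task)
{
    uint32_t slot = trace_next % TRACE_LOG_LEN;
    trace_log[slot].tick = tick;
    trace_log[slot].task = task;
    trace_next++;
}

/* Returns the n-th most recent event (0 = newest), or NULL if it has not
 * been recorded or has already been overwritten. */
static const volatile trace_event_t *trace_recent(uint32_t n)
{
    if (n >= trace_next || n >= TRACE_LOG_LEN) {
        return NULL;
    }
    return &trace_log[(trace_next - 1U - n) % TRACE_LOG_LEN];
}
```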

“Trace through memory” means: pause the system with your debugger and manually look at each TCB, check its state and, if it is blocked, what it is blocked on. This is much easier with a FreeRTOS-aware debugger that does it automatically, but with a debugger and a map file you can do it manually. It is also much easier if you allocate your queues and semaphores statically, so you can just look up their addresses in the map file; if they are created dynamically, first go to each place where you store the handles and make a list of them.
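One way to make that manual walk easier is to keep every interesting handle in a single global table, so one lookup in the map file leads to each task, queue and mutex. This is a hypothetical helper (g_handle_table, handle_table_add and handle_table_find are illustrative names, not FreeRTOS API). For queues, semaphores and mutexes, the kernel's own vQueueAddToRegistry() (enabled when configQUEUE_REGISTRY_SIZE is greater than 0) serves a similar purpose and is what kernel-aware debuggers read.

```c
#include <stddef.h>
#include <string.h>

#define HANDLE_TABLE_LEN 16U  /* hypothetical capacity */

typedef struct {
    const char *name;    /* human-readable label */
    void       *handle;  /* TaskHandle_t, SemaphoreHandle_t, ... as void* */
} named_handle_t;

/* Global so it shows up under one symbol in the map file. */
named_handle_t g_handle_table[HANDLE_TABLE_LEN];
static size_t  g_handle_count;

/* Records a handle; returns 0 on success, -1 if the table is full. */
static int handle_table_add(const char *name, void *handle)
{
    if (g_handle_count >= HANDLE_TABLE_LEN) {
        return -1;
    }
    g_handle_table[g_handle_count].name = name;
    g_handle_table[g_handle_count].handle = handle;
    g_handle_count++;
    return 0;
}

/* Looks a handle up by name; returns NULL if it was never registered. */
static void *handle_table_find(const char *name)
{
    for (size_t i = 0; i < g_handle_count; i++) {
        if (strcmp(g_handle_table[i].name, name) == 0) {
            return g_handle_table[i].handle;
        }
    }
    return NULL;
}
```

Call handle_table_add() right after each xTaskCreate() or xSemaphoreCreate...() succeeds; in the debugger, open g_handle_table and follow each pointer to its TCB or queue structure.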

Thanks kstribrn and richard. I’m trying to get Tracealyzer working with my IAR. I’ll let you know if I have more questions.

I checked for potential stack overflows and assertions. When the firmware got stuck, there was no output on the UART showing any assert failure or stack overflow.
I’m trying to get Tracealyzer working with IAR, but it’s not working yet: Tracealyzer with IAR
Thanks!

I increased the stack size for a few tasks and haven’t seen the hard fault since, but I still see the system hang. I ran the firmware from IAR; when the firmware was stuck, I hit the Pause button in IAR and it was in the idle task. Here is the call stack:

rm_freertos_port_sleep_preserving_lpm
vApplicationIdleHook
prvIdleTask
prvTaskExitError

I suspect there is a deadlock in addition to the stack size issue. I have a recursive mutex. Previously, I called “xSemaphoreTakeRecursive(myMutex, 0xFFFFFFFF)”. Now I’ve changed it to something like: xSemaphoreTakeRecursive(myMutex, pdMS_TO_TICKS(20000))
I’ll try this to see if it’ll help.
Let me know if you have any suggestions.
Thanks!
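The timeout arithmetic is worth a quick check. The sketch below reproduces the default pdMS_TO_TICKS() formula from projdefs.h under assumed settings (32-bit ticks and a configTICK_RATE_HZ of 1000; substitute your own). With 32-bit ticks, 0xFFFFFFFF equals portMAX_DELAY, which blocks forever when INCLUDE_vTaskSuspend is 1, so the original call could never time out. The multiplication is done in tick width, so at 1 kHz anything above 4,294,967 ms (about 71 minutes) wraps.

```c
#include <stdint.h>

/* ASSUMED configuration: 32-bit ticks and a 1 kHz tick rate. */
typedef uint32_t example_tick_t;
#define EXAMPLE_TICK_RATE_HZ 1000U

/* Same arithmetic as the default pdMS_TO_TICKS(): the product is computed
 * in tick width before dividing, so very large inputs wrap around. */
#define EXAMPLE_MS_TO_TICKS(ms) \
    ((example_tick_t)(((example_tick_t)(ms) * (example_tick_t)EXAMPLE_TICK_RATE_HZ) / (example_tick_t)1000U))

/* With 32-bit ticks portMAX_DELAY has this value, so a timeout of
 * 0xFFFFFFFF means "wait forever" when INCLUDE_vTaskSuspend is 1. */
#define EXAMPLE_MAX_DELAY ((example_tick_t)0xFFFFFFFFUL)
```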

Is this function related to low power mode? Are you using tickless idle? Just to narrow it down, can you check whether the problem still happens when you disable tickless idle?

Just make sure to check the return value to see if the operation succeeded or timed out.
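A sketch of what checking the result can look like, under stated assumptions: the take operation is passed in as a function pointer so the logic can run off-target; on the device it would be a thin wrapper returning xSemaphoreTakeRecursive( mutex, ticks ) == pdPASS. mutex_diag_t, take_or_report and host_fake_take are hypothetical names. Recording which caller timed out, instead of carrying on as if the lock were held, gives a concrete lead on where the contention is; on timeout, xSemaphoreGetMutexHolder() (when INCLUDE_xSemaphoreGetMutexHolder is 1) should also tell you which task holds it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical abstraction over the real take call. */
typedef bool (*take_fn_t)(void *mutex, uint32_t ticks);

typedef struct {
    uint32_t    timeouts;     /* number of failed takes */
    const char *last_caller;  /* label of the most recent failing caller */
} mutex_diag_t;

/* Takes the mutex; on timeout, records who failed. Returns true only if
 * the mutex is now held and must later be given back. */
static bool take_or_report(take_fn_t take, void *mutex, uint32_t ticks,
                           const char *caller, mutex_diag_t *diag)
{
    if (take(mutex, ticks)) {
        return true;
    }
    diag->timeouts++;
    diag->last_caller = caller;
    return false;
}

/* Host stand-in for the real take: `mutex` points at a bool that says
 * whether the lock is free. Not for use on target. */
static bool host_fake_take(void *mutex, uint32_t ticks)
{
    bool *free_flag = (bool *) mutex;
    (void) ticks;
    if (*free_flag) {
        *free_flag = false;
        return true;
    }
    return false;
}
```

Every successful take of a recursive mutex still needs a matching xSemaphoreGiveRecursive(); a failed take must not be given back.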

Hi aggarg,
Thank you so much for your info!
We don’t enable tickless idle; configUSE_TICKLESS_IDLE is defined as 0 in our project.
Sure, I’ll check the return value to see if the operation succeeded or timed out.

Thank you. What does the function rm_freertos_port_sleep_preserving_lpm do then?

Thanks aggarg!
I once captured the call stack when the firmware got stuck. It looked like this:

StopDebugger
TaskExecution
TimerProcessInterrupt
r_gpt_call_callback
< Exception frame >
rm_freertos_port_sleep_preserving_lpm
vApplicationIdleHook
[Region$$Table$$Base + 0xe]

In the code:
“Renesas\aws\amazon-freertos\freertos_kernel\tasks.c”, the function “static portTASK_FUNCTION( prvIdleTask, pvParameters )” calls the function “vApplicationIdleHook();” which is in “Renesas\fsp_3.5.0\ra\fsp\src\rm_freertos_port\port.c”.
In the “vApplicationIdleHook()” function, it calls “rm_freertos_port_sleep_preserving_lpm(1);”

I’m not sure how the firmware runs into this function. I increased the stack size for a few tasks. So far, I haven’t seen any hard fault. Can we firmly say that the hard fault issue is fixed by increasing the stack size for those tasks?
Meanwhile, I’m not sure whether the hang is a kind of hard fault. This is the issue I’m trying to figure out. The problem is that I don’t know what happens when the firmware gets stuck. What I see is that the firmware hangs for about 30 to 40 seconds and then restarts itself. I’m trying to see whether it’s related to the semaphores or mutexes I use. Other than semaphores and mutexes, are there other possible causes of the hang?
I’m also looking for a way to find out exactly what happens when the firmware gets stuck. I haven’t got Tracealyzer working with IAR yet. Please let me know if you have other ways to help me find the cause of the issue.
Thanks!

Thank you for sharing the call stack. You can examine the definitions of the functions and see if something is unexpected there.

If you randomly increase stack sizes, you might just be masking the real problem. We cannot say anything for sure.

It is impossible to tell without looking at the code.

Since the FreeRTOS port seems to be from Renesas, reaching out to them may be a good idea.

The key would seem to be looking at what rm_freertos_port_sleep_preserving_lpm does, and what it expects of the rest of the program.

I increased the configTOTAL_HEAP_SIZE. I’ll give it a try to see if it’ll help.
The tricky part is finding out which task is blocked or waiting for something. Because the hang happens at random (it may happen 1 or 2 hours after the firmware starts), it’s hard to find the problem when it happens.
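For catching a random hang like this, one common pattern is a software heartbeat: each monitored task stamps its slot with the current tick on every loop, and a monitor task (or whatever code kicks the watchdog) checks the slots periodically and reports, for example over the UART, which task went silent before the 30-second watchdog reset hits. The sketch is host-runnable; hb_kick, hb_find_stalled and HB_MAX_TASKS are hypothetical names, and on target `now` would come from xTaskGetTickCount().

```c
#include <stdint.h>

#define HB_MAX_TASKS 8U  /* hypothetical number of monitored tasks */

/* Last tick at which each monitored task reported progress. */
static volatile uint32_t hb_last_seen[HB_MAX_TASKS];

/* Called by each monitored task once per loop iteration. */
static void hb_kick(uint32_t task_id, uint32_t now)
{
    if (task_id < HB_MAX_TASKS) {
        hb_last_seen[task_id] = now;
    }
}

/* Returns the id of the first task silent for more than `limit` ticks, or
 * -1 if every task is alive. Unsigned subtraction handles tick wrap-around. */
static int hb_find_stalled(uint32_t now, uint32_t limit, uint32_t n_tasks)
{
    for (uint32_t i = 0U; i < n_tasks && i < HB_MAX_TASKS; i++) {
        if ((uint32_t)(now - hb_last_seen[i]) > limit) {
            return (int) i;
        }
    }
    return -1;
}
```

Kicking the hardware watchdog only while hb_find_stalled() returns -1 keeps the reset behaviour, but lets you log the silent task's id first. A monitor task that blocks with vTaskDelay() still wakes up when every other task is blocked, because its own delay expiring makes it ready.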

Hi Richard,
Thank you for your reply. rm_freertos_port_sleep_preserving_lpm() enters sleep mode; it’s port code from Renesas. I think the real question is why it enters sleep mode. I’ll check with Renesas. Here is the code:

/***********************************************************************************************************************
 * Suspends tasks and sleeps, waking for each interrupt.
 *
 * @note This is a weak function. It can be overridden by an application specific implementation if desired.
 **********************************************************************************************************************/
__attribute__((weak)) void vApplicationIdleHook (void)
{
    /* Enter a critical section but don't use the taskENTER_CRITICAL() method as that will mask interrupts that should
     * exit sleep mode. This must be done before suspending tasks because a pending interrupt will prevent sleep from
     * WFI, but a task ready to run will not. If a task becomes ready to run before disabling interrupts, a context
     * switch will happen. */
    __disable_irq();

    /* Don't allow a context switch during sleep processing to ensure the LPM state is restored
     * before switching from idle to the next ready task. This is done in the idle task
     * before vPortSuppressTicksAndSleep when configUSE_TICKLESS_IDLE is used. */
    vTaskSuspendAll();

    /* Save current LPM state, then sleep. */
    rm_freertos_port_sleep_preserving_lpm(1);

    /* Exit with interrupts enabled. */
    __enable_irq();

    /* Allow context switches again. No need to yield here since the idle task yields when it loops around. */
    (void) xTaskResumeAll();
}

/***********************************************************************************************************************
 * Saves the LPM state, then enters sleep mode. After waking, reenables interrupts briefly to allow any pending
 * interrupts to run.
 *
 * @pre Disable interrupts and suspend all tasks before calling this function.
 *
 * @param[in] xExpectedIdleTime Expected idle time in ticks
 **********************************************************************************************************************/
void rm_freertos_port_sleep_preserving_lpm (uint32_t xExpectedIdleTime)
{
    uint32_t saved_sbycr = 0U;

    /* Sleep until something happens.  configPRE_SLEEP_PROCESSING() can
     * set its parameter to 0 to indicate that its implementation contains
     * its own wait for interrupt or wait for event instruction, and so wfi
     * should not be executed again. The original expected idle
     * time variable must remain unmodified, so this is done in a subroutine. */
    configPRE_SLEEP_PROCESSING(xExpectedIdleTime);
    if (xExpectedIdleTime > 0)
    {
        /* Save LPM Mode */
        saved_sbycr = R_SYSTEM->SBYCR;

        /** Check if the LPM peripheral is set to go to Software Standby mode with WFI instruction.
         *  If so, change the LPM peripheral to go to Sleep mode. */
        if (R_SYSTEM_SBYCR_SSBY_Msk & saved_sbycr)
        {
            /* Save register protect value */
            uint32_t saved_prcr = R_SYSTEM->PRCR;

            /* Unlock LPM peripheral registers */
            R_SYSTEM->PRCR = RM_FREERTOS_PORT_UNLOCK_LPM_REGISTER_ACCESS;

            /* Clear to set to sleep low power mode (not standby or deep standby) */
            R_SYSTEM->SBYCR = 0U;

            /* Restore register lock */
            R_SYSTEM->PRCR = (uint16_t) (RM_FREERTOS_PORT_LOCK_LPM_REGISTER_ACCESS | saved_prcr);
        }

        /**
         * DSB should be last instruction executed before WFI
         * infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dai0321a/BIHICBGB.html
         */
        __DSB();

        /* If there is a pending interrupt (wake up condition for WFI is true), the MCU does not enter low power mode:
         * http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0552a/BABHHGEB.html
         * Note that interrupt will bring the CPU out of the low power mode.  After exiting from low power mode,
         * interrupt will be re-enabled. */
        __WFI();

        /* Instruction Synchronization Barrier. */
        __ISB();

        /* Re-enable interrupts to allow the interrupt that brought the MCU
         * out of sleep mode to execute immediately. This will not cause a
         * context switch because all tasks are currently suspended. */
        __enable_irq();
        __ISB();

        /* Disable interrupts again to restore the LPM state. */
        __disable_irq();
    }

    configPOST_SLEEP_PROCESSING(xExpectedIdleTime);

    /** Check if the LPM peripheral was supposed to go to Software Standby mode with WFI instruction.
     *  If yes, restore the LPM peripheral setting. */
    if (R_SYSTEM_SBYCR_SSBY_Msk & saved_sbycr)
    {
        /* Save register protect value */
        uint32_t saved_prcr = R_SYSTEM->PRCR;

        /* Unlock LPM peripheral registers */
        R_SYSTEM->PRCR = RM_FREERTOS_PORT_UNLOCK_LPM_REGISTER_ACCESS;

        /* Restore LPM Mode */
        R_SYSTEM->SBYCR = (uint16_t) saved_sbycr;

        /* Restore register lock */
        R_SYSTEM->PRCR = (uint16_t) (RM_FREERTOS_PORT_LOCK_LPM_REGISTER_ACCESS | saved_prcr);
    }
}
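The comment at the top of rm_freertos_port_sleep_preserving_lpm() says configPRE_SLEEP_PROCESSING() can zero its argument so the port skips WFI. Below is a host-side model of that control flow under an assumed application definition; g_debug_stay_awake and port_would_execute_wfi are hypothetical names, and it is worth checking first whether your FSP-generated FreeRTOSConfig.h already defines configPRE_SLEEP_PROCESSING. On target the flag would need an extern declaration visible where the macro expands. Skipping WFI only makes the idle task spin instead of sleeping; it cannot make a blocked task ready, so if the hang still occurs with the flag set, the sleep path is not the cause.

```c
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL debug switch (e.g. set from the debugger); not part of FSP. */
static volatile bool g_debug_stay_awake = false;

/* ASSUMED application definition: zeroing the argument tells the port code
 * that no WFI should be executed. */
#define configPRE_SLEEP_PROCESSING(x) \
    do { if (g_debug_stay_awake) { (x) = 0; } } while (0)

/* Host model of the decision in rm_freertos_port_sleep_preserving_lpm():
 * returns true when the port would go on to execute WFI. */
static bool port_would_execute_wfi(uint32_t xExpectedIdleTime)
{
    configPRE_SLEEP_PROCESSING(xExpectedIdleTime);
    return xExpectedIdleTime > 0U;
}
```

Because saved_sbycr stays 0 when the WFI branch is skipped, the vendor code's SBYCR restore block is skipped as well.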