FreeRTOSv202012.00 LTS on TI DSP c66x problem

Hi,

I ported FreeRTOSv202012.00 LTS to the TI c66x DSP of a Keystone II SoC. Everything works fine, but interrupts are randomly left disabled and I don’t know why. I have a simple task looping on a printf() to the console and a vTaskDelay(); randomly the vTaskDelay() never returns because interrupts are disabled, so the tick count is no longer incremented. All the stacks are OK and the kernel is compiled with asserts enabled.

Thanks for any help.

How is your printf() implemented? Specifically, does it have any critical sections to arbitrate between different tasks calling it? If it does, your issue could be as simple as a bug in that code. More likely, though, it will be in the port you created: either something that needs to be done with interrupts disabled is being done with interrupts enabled, so you just get a corruption, or there is a path through the kernel that leaves interrupts disabled. For example, consider paths such as a task yielding and returning immediately to the same task, a task yielding and other tasks running before the same task runs again, a task being switched out from an ISR and then being switched back in from a task, and other combinations.
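
For example, a guard like the hypothetical sketch below (illustrative names, not your actual code) would leave interrupts disabled for good if any path returned before the matching taskEXIT_CRITICAL():

    #include "FreeRTOS.h"
    #include "task.h"

    void vConsolePrint( const char * pcString )
    {
        taskENTER_CRITICAL();
        {
            if( pcString == NULL )
            {
                return; /* BUG: exits without calling taskEXIT_CRITICAL(),
                         * so interrupts stay disabled from here on. */
            }

            /* ...write the string to the UART... */
        }
        taskEXIT_CRITICAL();
    }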

Hi and thanks for the quick answer.
There is only one task, so the printf() is not wrapped in a critical section. There is a loop in it that waits for the UART to be ready to accept one more character, but it has a timeout and the timeout never triggers.
I think something around the kernel leaves interrupts disabled in a particular case. The macros portDISABLE_INTERRUPTS() and portENABLE_INTERRUPTS() are each defined as a call to two functions: one to disable/enable interrupts and the other to disable/enable external exceptions (roughly as sketched at the end of this post). I use the vTaskEnterCritical()/vTaskExitCritical() pair for critical sections, and portCRITICAL_NESTING_IN_TCB is set to 1.
Interrupts are enabled at scheduler startup and should remain enabled at task level for the whole system uptime; the only code that manipulates interrupts is the kernel, so I think the problem is something in the port that is not correctly defined around critical sections or atomicity.
The other thing I can say is that the interrupt-disabled problem is very rare, and it occurs more often if I increase the timer rate.
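
The interrupt half of those macros looks roughly like the sketch below (simplified, with illustrative function names, using the TI C6000 compiler's interrupt intrinsics; the external exception handling is omitted):

    #include <c6x.h>    /* TI C6000 compiler header (control register declarations). */

    /* Illustrative names.  The real port macros also call a second function
     * that enables/disables external exceptions, which is not shown here. */
    void vPortInterruptsDisable( void )
    {
        _disable_interrupts();    /* Clears GIE, bit 0 of CSR. */
    }

    void vPortInterruptsEnable( void )
    {
        _enable_interrupts();     /* Sets GIE. */
    }

    #define portDISABLE_INTERRUPTS()    vPortInterruptsDisable()
    #define portENABLE_INTERRUPTS()     vPortInterruptsEnable()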

I believe that it is highly unlikely (but possible) that the bug is in the Kernel.

Can you remove all tasks and try something as simple as incrementing a counter with a vTaskDelay()? Something like:

void vExampleTask(void * pvParams)
{
    volatile uint32_t ulCounter = 0;
    while( 1 )
    {
        vTaskDelay( pdMS_TO_TICKS( 1 ) );
        ulCounter++;
    }
}

If there is no error, you can incrementally add the tasks back and see at which point the bug manifests itself.

Hi,

I made a simple test in tasks.c: I added an assert at the beginning of vTaskEnterCritical():

    void vTaskEnterCritical( void )
    {
        portASSERT_IF_INTERRUPTS_DISABLED(xSchedulerRunning);
        portDISABLE_INTERRUPTS();

        if( xSchedulerRunning != pdFALSE )
        {
            ( pxCurrentTCB->uxCriticalNesting )++;

            /* This is not the interrupt safe version of the enter critical
             * function so  assert() if it is being called from an interrupt
             * context.  Only API functions that end in "FromISR" can be used in an
             * interrupt.  Only assert if the critical nesting count is 1 to
             * protect against recursive calls if the assert function also uses a
             * critical section. */
            if( pxCurrentTCB->uxCriticalNesting == 1 )
            {
                portASSERT_IF_IN_ISR();
            }
        }
        else
        {
            mtCOVERAGE_TEST_MARKER();
        }
    }

And then when I run my program, the assert sometimes fires because interrupts are already disabled when vTaskEnterCritical() is called.
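
For context, the assert macro I added is roughly this (a sketch of what I have in the port layer; it only checks the GIE bit, bit 0 of the C6000 CSR register, and only while the scheduler is running):

    #include <c6x.h>    /* Declares CSR as a cregister with the TI compiler. */

    #define portASSERT_IF_INTERRUPTS_DISABLED( xRunning )                      \
        do                                                                     \
        {                                                                      \
            if( ( xRunning ) != pdFALSE )                                      \
            {                                                                  \
                configASSERT( ( CSR & 0x1U ) != 0U ); /* GIE must be set. */   \
            }                                                                  \
        } while( 0 )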

Maybe I made a mistake in the port implementation behind vTaskEnterCritical()/vTaskExitCritical()?

This is not necessarily a problem, as critical sections can nest and only the outermost exit should re-enable interrupts. What is the call stack at that time? Is it something that you expect in your application? You need to find out what code disabled the interrupts.

The assert added at the beginning of vTaskEnterCritical() shows that interrupts are sometimes already disabled when vTaskEnterCritical() is called. When this happens, pxCurrentTCB->uxCriticalNesting of the current task is 0. So it seems that a call to vTaskEnterCritical() disables interrupts and then they are never re-enabled by vTaskExitCritical(). I wonder if pxCurrentTCB->uxCriticalNesting should be a part of the task context or not; it does not seem to be the case in the ports I looked at to build my own. I mean, should pxCurrentTCB->uxCriticalNesting be saved and restored on each task switch or not?

pxCurrentTCB->uxCriticalNesting is already a part of the TCB, so you do not need to save and restore it.

What instruction do you use for yield? Is it synchronous? i.e. if you call yield from a critical section, will the context switch happen immediately, or will it happen after the critical section is exited? If the context switch happens immediately (i.e. it can happen while interrupts are disabled), you need to enable/disable interrupts based on the value of pxCurrentTCB->uxCriticalNesting while restoring the context.

The context switch in case of a yield from within an API call is immediate, so how should I take care of pxCurrentTCB->uxCriticalNesting while restoring the context?

You need to write roughly the following in your context restore code:

if( uxCriticalNesting == 0 )
{
    /* Enable interrupts. */
}
else
{
    /* Disable interrupts. */
}

You would probably be better off storing the nesting count as part of the task context rather than in the TCB. This port is an example - FreeRTOS-Kernel/portASM.S at main · FreeRTOS/FreeRTOS-Kernel · GitHub
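
On the C side, ports that take that approach usually look something like the sketch below (names are illustrative; the counter itself is saved and restored with the rest of the registers in the port's assembly context switch code, which is not shown):

    #include "FreeRTOS.h"

    /* Port-owned nesting counter, used when portCRITICAL_NESTING_IN_TCB is 0.
     * The context switch code saves/restores it as part of each task's context. */
    volatile UBaseType_t uxPortCriticalNesting = 0;

    void vPortEnterCritical( void )
    {
        portDISABLE_INTERRUPTS();
        uxPortCriticalNesting++;
    }

    void vPortExitCritical( void )
    {
        configASSERT( uxPortCriticalNesting > 0 );
        uxPortCriticalNesting--;

        if( uxPortCriticalNesting == 0 )
        {
            portENABLE_INTERRUPTS();
        }
    }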

You mean, after the call to vTaskSwitchContext():

if(pxCurrentTCB->uxCriticalNesting == 0)
{
    /* Enable interrupts. */
}
else
{
    /* Disable interrupts. */
}

Yes, that is what I meant. Additionally, you can set portCRITICAL_NESTING_IN_TCB to 0 and store the nesting count in the task context as shown in the port I shared above.

I tried what you suggested and the result is the same: interrupts are sometimes left disabled.
I’d like to make it work, if possible, with portCRITICAL_NESTING_IN_TCB set to 1.

Well, it is hard to make a guess now. You need to debug the code path that is leaving interrupts disabled. One way to debug could be to put breakpoints at every location where you disable interrupts and try to follow what happens.

Remember that with portCRITICAL_NESTING_IN_TCB set to 1, the port code that switches to a new task needs to save the outgoing task’s nesting value, read the incoming task’s value, and enable/disable interrupts based on that flag in the TCB. Also, tasks need to remember that their critical sections will have “holes” in them when they do something that might block, and interrupts might occur during that time. If you are trying to hold a critical section across task switches, that won’t work and will cause this sort of problem.

This is very technical - but then creating a port is technical by nature:

Yielding from a task occurs when the task blocks or the application calls taskYIELD(). There are two methods of yielding, which we loosely call synchronous and asynchronous.

Asynchronous
Arm Cortex-M is an asynchronous example. On that architecture the context switch occurs inside the PendSV interrupt handler. To request a context switch, the kernel sets the PendSV interrupt into the pending state. The kernel will often do that from inside a critical section, meaning the context switch will not occur until it exits the critical section. Therefore architectures that use the asynchronous method do not need to store the critical section nesting depth as part of the task context, because a task can only be switched out when it is not in a critical section (its nesting count is 0). Likewise, on those architectures tasks are always restored with interrupts enabled, because they can never be saved in any other state.
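
For example, the standard Cortex-M ports request the switch with something along these lines (condensed from the official ports; the PendSV handler then performs the actual switch):

    #include <stdint.h>

    /* Pending the PendSV exception only *requests* the context switch; the
     * switch itself runs later, in the PendSV handler, once it is no longer
     * masked by a critical section. */
    #define portNVIC_INT_CTRL_REG     ( *( ( volatile uint32_t * ) 0xe000ed04 ) )
    #define portNVIC_PENDSVSET_BIT    ( 1UL << 28UL )

    #define portYIELD()                                          \
    {                                                            \
        portNVIC_INT_CTRL_REG = portNVIC_PENDSVSET_BIT;          \
        __asm volatile ( "dsb" ::: "memory" );                   \
        __asm volatile ( "isb" );                                \
    }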

Synchronous
ARM7 is a synchronous example. On that architecture the context switch occurs inside a system call (SVC/SWI) exception handler. Those exceptions occur even when interrupts are disabled, so a task yield can occur inside a critical section, which happens inside the kernel code itself (it is not recommended for applications to do this, but the kernel is designed so that it is ok). That means the critical nesting count must be saved as part of the task’s context when it is swapped out, then restored to its original value when the task is swapped in again.

When a task is swapped in, if the restored critical nesting count is 0 the task should be restarted with interrupts enabled; if the restored critical nesting count is greater than 0, the task should be restarted with interrupts disabled. On some architectures that happens automatically, because the status register that holds the interrupt enable state is restored along with the task’s other registers. If the task yielded from a critical section, it will restart with interrupts disabled, continue executing from where it left off, and in due course exit the critical section, re-enabling interrupts as it does.

On architectures that use the synchronous method the critical nesting count can be saved on the task’s stack, as is done for ARM7, or (less efficiently) in the task’s TCB, as is being done for the port discussed in this thread. Yielding with interrupts disabled confuses many people, but because the interrupt state is part of the task’s context, interrupts are only disabled for that task and not necessarily for the task it switches to.

Added 2 diagrams to the above post to help understanding.

Thanks for the information. What should I do now? As I understand it, the synchronous approach should work for me, so what am I missing?

@Hugo, I concur with what Gaurav said above:

I think this is what you should do.

I tried re-enabling interrupts in the vPortYield() function when the critical nesting count is zero, and it seems to work; I saw this in other ports. Thank you for the help and the answers. If there are any additional questions, do not hesitate to ask them.
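
For the record, the change is conceptually the sketch below (simplified; the real vPortYield() saves and restores the full register context in assembly, and the TCB field is shown being read directly only for clarity, where the port has access to the current task's nesting count):

    void vPortYield( void )
    {
        portDISABLE_INTERRUPTS();

        /* ...save the outgoing task's registers (done in assembly)... */

        vTaskSwitchContext();

        /* Restore the incoming task's interrupt state from its critical
         * nesting count (portCRITICAL_NESTING_IN_TCB is 1, so the count
         * lives in the TCB). */
        if( pxCurrentTCB->uxCriticalNesting == 0 )
        {
            portENABLE_INTERRUPTS();
        }

        /* ...restore the incoming task's registers and return to it... */
    }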