xStreamBufferReceive(…, portMAX_DELAY) Task Moved to Suspended and Never Wakes on Data

Preliminary information:
The context-switch ISR doesn’t trigger because execution is already inside an ISR: the SysTick handler.
Is it normal for RTOS API functions to be processed inside the SysTick ISR?

Can you share the callstack? Are you calling a FreeRTOS API from an ISR? If yes, you need to call the “FromISR” version of the API.

Not yet. Eclipse with GDB doesn’t show the actual stack, maybe because of context switching. Maybe once I’m a more experienced FreeRTOS engineer I’ll be able to unwind the stack. Or maybe I’ll just walk through the code one instruction at a time (which takes a lot of time). I don’t know.

I don’t think so. All the user code that exhibits the problem is in main.c, see this post.

The same code as in the message body is attached to that post as a .zip.
There are three tasks and no interrupts involved.

I believe the problem is not in FreeRTOS itself but in a bad port for this particular CPU core and interrupt controller.

Could someone more experienced please tell me: can (or should) taskYIELD_WITHIN_API() be called from inside an interrupt?

Since the stream function you are looking at isn’t a “FromISR” function, it shouldn’t be called from within an ISR.
Do you have a vApplicationTickHook defined? If so, it runs as part of an ISR and must use only the FromISR API.

No, configUSE_TICK_HOOK is 0 and no hooks are defined.
Can someone briefly describe what goes on inside the SysTick ISR? What should happen between entering the SysTick ISR and exiting it?
I need to know more to find out why StreamBufferReceive is called inside an interrupt when the calling function itself is neither in an interrupt nor called from one.

The SysTick ISR should increase the current time, and then check to see if any timeouts are expiring, and if so move the tasks from blocked to ready, then if round robin scheduling is enabled, or a higher priority task was woken, trigger a task change.
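To make those steps concrete, here is a minimal host-side sketch of a tick handler. It is only a toy model, not the FreeRTOS implementation: `task_t`, `tick_handler()` and `g_tick_count` are names made up for the illustration, tick wraparound is ignored, and the task array is assumed not to contain the running task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified task record; this is not the FreeRTOS TCB. */
typedef enum { TASK_READY, TASK_BLOCKED } task_state_t;

typedef struct {
    task_state_t state;
    uint32_t     wake_tick;  /* tick at which a blocked task times out */
    unsigned     priority;   /* larger number = higher priority */
} task_t;

static uint32_t g_tick_count = 0;

/*
 * Model of one tick interrupt. Returns true when the handler should
 * request (pend) a context switch. The switch itself is NOT done here;
 * it happens in the separate scheduler interrupt after this ISR exits.
 */
bool tick_handler(task_t *tasks, size_t n, unsigned running_priority,
                  bool time_slicing)
{
    bool switch_needed = false;

    assert(tasks != NULL);

    g_tick_count++;                                  /* 1. advance time */

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].state == TASK_BLOCKED &&
            tasks[i].wake_tick == g_tick_count) {    /* 2. timeout expired */
            tasks[i].state = TASK_READY;             /* 3. blocked -> ready */
            if (tasks[i].priority > running_priority) {
                switch_needed = true;                /* 4a. higher priority woke */
            }
        }
        if (time_slicing && tasks[i].state == TASK_READY &&
            tasks[i].priority == running_priority) {
            switch_needed = true;                    /* 4b. round robin */
        }
    }
    return switch_needed;
}
```

Note that nothing in this handler blocks or touches a stream buffer; it only updates bookkeeping and reports whether a switch should be requested.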

It should NOT invoke any StreamBuffer functions.

YOU need to figure out how you are getting into that routine; the best approach is to follow the stack trace, which you should be able to get.

If I’m reading you correctly, the task switch can happen inside the SysTick interrupt, but task execution should resume only after the SysTick ISR exits?

Yes, that is correct.

This should probably get you to the root cause. How did you confirm that StreamBufferReceive is getting called inside an ISR?

Since your Task Switching is done via an interrupt, the SysTick will trigger the interrupt, which will actually occur when the SysTick ISR exits. The Scheduler ISR will then run changing what task it will resume.

The code you have been showing should not be executing while in an ISR, if it is, you need to find out what ISR is calling it and change it at your code level.

Also remember that only one task at a time can be sending to a StreamBuffer. I notice that the code above has no protection for that.

The CPU I’m running FreeRTOS on has some special registers (some RISC-V standard, some manufacturer-specific) that clearly indicate when execution is inside an interrupt, and that also show the preemption level (if any) and the request number. Taken together, I can say for sure that the bad execution of StreamBufferReceive(), more precisely of xTaskGenericNotifyWait(), was performed inside the SysTick ISR.
Today I ran some more tests, without much result.
Everything I saw was normal, explainable and expected. But I was unable to step through the code instruction by instruction because of how this CPU and debugger behave together:

  • When the debugger halts execution, SysTick keeps running at full speed, so at the next debug step a SysTick ISR is pending. Every time I hit a breakpoint, the SysTick ISR is pended immediately, which makes further debugging almost impossible.
  • When the debugger single-steps assembly instructions with “Step” rather than “Run”, a currently pending interrupt is not taken through its vector. The code keeps stepping as if interrupts were disabled. But when I hit “Run”, the ISR executes.

So with such bad debug behavior I got no useful results today.

Thank you for the reminder. I had overlooked Gaurav’s recommendation above.

But my original code had that protection, and it did not help.

I would suggest editing the code there to check whether it is being called inside an ISR and, if so, to execute an instruction you can set a breakpoint on, so you can see what the stack trace looks like there (or to enter an infinite loop, so you can just break there). If the debugger can’t show you the stack frames there, you may need to go into manual mode and look at the stack pointer and the memory it points to.
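A rough sketch of such a trap is below. It is written so that it runs on a host: `port_in_isr()` and `g_sim_in_isr` are hypothetical stand-ins for whatever port-specific status register tells you the core is handling an interrupt, and the other names are invented for the example as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the port-specific "am I in an interrupt?" test.
 * On real hardware this would read a status register; here it is a plain
 * variable so the sketch can run on a host. */
static volatile bool g_sim_in_isr = false;

static bool port_in_isr(void)
{
    return g_sim_in_isr;
}

/* Counts how often the trap fired; handy as a watch expression. */
static volatile unsigned g_isr_misuse_count = 0;

/* Set to true to park in the loop below; clear it from the debugger
 * to let execution continue. */
static volatile bool g_park_on_misuse = false;

/* Call at the top of a task-only function. Returns true if the function
 * was entered from interrupt context. */
bool trap_if_in_isr(void)
{
    if (!port_in_isr()) {
        return false;
    }
    g_isr_misuse_count++;        /* put a breakpoint on this line */
    while (g_park_on_misuse) {
        /* parked: halt the debugger, then inspect SP and the stack memory */
    }
    return true;
}
```

If the debugger shows no usable frames at the breakpoint, the parking loop at least keeps the stack untouched while you dump the memory around the stack pointer by hand.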

One real possibility is that there is corruption going on and something uncontrolled is happening. These ARE hard to trace.

To unwind the stack, one needs to know how the stack grows and the rules of context switching. As I understand it, there are at least two types of stack:

  • The global stack, which is the usual stack defined in the linker script and used before the RTOS scheduler starts and between one task being switched out and the next being switched in. There is only one such stack per project and its start (top) is fixed at compile time.
  • Task stacks, which are defined for each task when the task is created. A task stack may be static (with its position known at compile time) or dynamically allocated if the task was created as non-static.

As I saw in the FreeRTOS sources, the scheduler juggles task stacks at will, interrupting the task code flow at any moment except in protected sections.
Am I right?
But I don’t know where to look for the current global stack state. Maybe in the mscratch register of the RISC-V core? I need to look at the port code again.

There is no such thing as the “current global stack”, just the stack that is currently being used, and perhaps a second ISR stack that is used when an ISR interrupts the system.

The “Global Stack” that was set up by the linker is just the initial stack that is used to run the startup code of your program. Many ports then reclaim that for the ISR stack.

The current stack will be pointed to by the Stack Pointer register (ISR or Task depending on the processor mode). By saving ALL the processor state on the task stack before switching away from it, and then restoring all that state when selecting the new task’s stack, the tasks can resume right where they left off. The local context is unchanged by the interruption; tasks just need to be designed to handle that the global context may have changed somewhat because other tasks had a chance to run.
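The save/restore idea can be shown with a small host-side model. This is only a sketch under simplifying assumptions: a four-register toy CPU, a full-descending stack, and invented names (`cpu_t`, `task_t`, `context_save()`, `context_restore()`). A real port does this in assembly and also prepares an initial stack frame for tasks that have never run.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGS    4    /* toy register file; real cores have many more */
#define STACK_WORDS 32

typedef struct {
    uint32_t  regs[NUM_REGS];
    uint32_t *sp;          /* points into the running task's stack */
} cpu_t;

typedef struct {
    uint32_t  stack[STACK_WORDS];
    uint32_t *saved_sp;    /* where this task's context was pushed */
} task_t;

/* Give a task an empty, full-descending stack. */
void task_init(task_t *t)
{
    memset(t->stack, 0, sizeof t->stack);
    t->saved_sp = &t->stack[STACK_WORDS];
}

/* Push the whole register file onto the running task's own stack
 * and remember where the stack pointer ended up. */
void context_save(cpu_t *cpu, task_t *current)
{
    assert(cpu->sp - NUM_REGS >= current->stack);  /* room left on the stack */
    for (int i = 0; i < NUM_REGS; i++) {
        *--cpu->sp = cpu->regs[i];
    }
    current->saved_sp = cpu->sp;
}

/* Point SP at the next task's stack and pop its registers back,
 * in the reverse order of the push. Only valid after a matching save. */
void context_restore(cpu_t *cpu, task_t *next)
{
    cpu->sp = next->saved_sp;
    for (int i = NUM_REGS - 1; i >= 0; i--) {
        cpu->regs[i] = *cpu->sp++;
    }
}
```

The only thing kept per task outside its stack is `saved_sp`, which is why, when the debugger cannot show frames, looking at the stack pointer and the memory it points to is usually enough to see where a task was interrupted.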

I tried inserting a check inside xTaskGenericNotifyWait() that watches the interrupt status bits and triggers a trap with a breakpoint if the function is called from inside any interrupt. Nope, it doesn’t trigger. So my previous observations may be incorrect because of the debugging problems I described above.

As an experiment, I tried giving the context-switch software interrupt a higher priority with preemption enabled, so that the context switch would trigger even from inside the SysTick interrupt (I know this is not a real solution). Nope, the problem with notifications persists. The task stays in eSuspended and is no longer subscribed to buffer notifications.
Next I will add a check for whether xTaskGenericNotifyWait() keeps running past its blocking point (after the context switch call), to catch the case where the task is running while its state is eSuspended.

I feel like a blind kitten. I tried adding a trap to xTaskGenericNotifyWait(), just before taskYIELD_WITHIN_API() (line 7781 of tasks.c):

        if( ( xShouldBlock == pdTRUE ) && ( xAlreadyYielded == pdFALSE ) )
        {
            { //Debug ToDo: remove this
                TaskStatus_t CurrentTaskStatus;
                vTaskGetInfo(pxCurrentTCB, &CurrentTaskStatus, pdFALSE, eInvalid);
                if (CurrentTaskStatus.eCurrentState == eSuspended) {
                    asm volatile ("NOP"); //Breakpoint target
                }
            } //ToDo: Remove above
            taskYIELD_WITHIN_API();
        }

Checked: the task status is tested correctly. The task is the expected one and so is its TCB. But the trap never triggers on eSuspended… Every time taskYIELD_WITHIN_API() is called, the state of the receiving task is eRunning. Yet notifications are not delivered.

UPD: The code before taskYIELD_WITHIN_API() operates normally. The task becomes eSuspended AFTER the scheduler run initiated by taskYIELD_WITHIN_API().
I should place this trap AFTER taskYIELD_WITHIN_API().

Hurray! I’ve got a clue. In the tasks.c file of FreeRTOS Kernel V11.1.0, inside the blocking function xTaskGenericNotifyWait(), there is a window between the scheduler request, where the task becomes eSuspended, and the entry to the critical section, where the task is eRunning again after the block is over. See from line 7778 of tasks.c:

        xAlreadyYielded = xTaskResumeAll();

        /* Force a reschedule if xTaskResumeAll has not already done so. */
        if( ( xShouldBlock == pdTRUE ) && ( xAlreadyYielded == pdFALSE ) )
        {
            taskYIELD_WITHIN_API();
        }
        else
        {
            mtCOVERAGE_TEST_MARKER();
        }

//        { //Debug check: is the task running while nominally suspended? ToDo: remove this
//            TaskStatus_t CurrentTaskStatus;
//            vTaskGetInfo(pxCurrentTCB, &CurrentTaskStatus, pdFALSE, eInvalid);
//            if (CurrentTaskStatus.eCurrentState == eSuspended) {
//                asm volatile ("NOP");
//            }
//        } //ToDo: Remove above

        taskENTER_CRITICAL();
        {

Look at the commented-out block. With the block commented out, the code builds as in the official release and the problem shows up: the task enters the critical section at the bottom without having been suspended, which blocks the notification mechanism.
If I enable the block, it does nothing but introduce a small delay between taskYIELD_WITHIN_API() and taskENTER_CRITICAL(). That delay is enough for the code to run correctly.
The bottom line at the moment:

  • There are no stack overflows (tested with watermarks and with stack sizes increased to 8k per task and for the global stack).
  • Something is wrong with context switching. It is delayed in such a way that execution reaches the critical section before the context switch takes place.
  • If a delay is placed between pending the context-switch ISR and the next critical section, which is meant for the eRunning state, the context switch occurs in time.

The problem is between these instructions:

NVIC_SetPendingIRQ(Software_IRQn);

and

__asm volatile("csrw mstatus,%0" ::"r"(0x7800));
__asm volatile("fence.i");

Trying to determine the minimum delay required for normal operation…

WOW! Even a single “NOP” is enough to make the notifications work!

This does not work:

        xAlreadyYielded = xTaskResumeAll();

        /* Force a reschedule if xTaskResumeAll has not already done so. */
        if( ( xShouldBlock == pdTRUE ) && ( xAlreadyYielded == pdFALSE ) )
        {
            taskYIELD_WITHIN_API();
        }
        else
        {
            mtCOVERAGE_TEST_MARKER();
        }

        taskENTER_CRITICAL();

This works:

        xAlreadyYielded = xTaskResumeAll();

        /* Force a reschedule if xTaskResumeAll has not already done so. */
        if( ( xShouldBlock == pdTRUE ) && ( xAlreadyYielded == pdFALSE ) )
        {
            taskYIELD_WITHIN_API();
            asm volatile ("NOP");
        }
        else
        {
            mtCOVERAGE_TEST_MARKER();
        }

        taskENTER_CRITICAL();

By the way: the MCU manufacturer’s support gave me a solution for stopping SysTick while debugging:

Debug Control and Status Register (dcsr)
Bit 9 “stoptime” (DRW):
  • 0: System timer keeps running in Debug mode
  • 1: System timer stops in Debug mode
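As a small illustration of that bit, the helper below computes a dcsr value with “stoptime” set or cleared. The bit position comes from the register description quoted above; the macro and function names are made up for the example. As far as I know, dcsr is only accessible from Debug mode, so in practice the resulting value would be written through the debugger rather than from application code, and how to do that depends on your tools.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position taken from the register description quoted above. */
#define DCSR_STOPTIME_BIT   9u
#define DCSR_STOPTIME_MASK  (UINT32_C(1) << DCSR_STOPTIME_BIT)

/* Returns the given dcsr value with "stoptime" set (stop_in_debug = 1)
 * or cleared (stop_in_debug = 0); all other bits are left untouched. */
uint32_t dcsr_set_stoptime(uint32_t dcsr, int stop_in_debug)
{
    assert(stop_in_debug == 0 || stop_in_debug == 1);
    return stop_in_debug ? (dcsr | DCSR_STOPTIME_MASK)
                         : (dcsr & ~DCSR_STOPTIME_MASK);
}
```

So starting from a dcsr value read in the debugger, the value to write back is the same word with bit 9 (mask 0x200) set.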

So now I can debug as usual, as on ARM CPUs.

In assembly it looks like this. The scheduler ISR request:

229         NVIC->IPSR[((uint32_t)(IRQn) >> 5)] = (1 << ((uint32_t)(IRQn) & 0x1F));
00002486:   lui     a5,0xe000e
0000248a:   lui     a4,0x4
0000248c:   sw      a4,512(a5) # 0xe000e200

Critical section entrance:

00002490:   lui     a5,0x8
00002492:   addi    a5,a5,-2048 # 0x7800
00002496:   csrw    mstatus,a5
0000249a:   fence.i

If I don’t insert a NOP between 0000248c and 00002490, the scheduler ISR is blocked before it triggers (but remains pending).
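To show the kind of race I mean, here is a toy host-side model. It is NOT a description of the real core: the idea that a pend request becomes visible a fixed number of instruction slots after the store, and the value 3, are assumptions picked only so that the model mirrors the listing above (sw, lui, addi, csrw). All names are invented for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { OP_PEND_SW_IRQ, OP_OTHER, OP_DISABLE_IRQ } op_t;

/* Assumed delay, in instruction slots, between the store that pends the
 * interrupt and the core seeing the request. Illustrative, not measured. */
#define PEND_LATENCY_SLOTS 3

/* Runs the instruction sequence. Returns true if the ISR was taken before
 * interrupts were masked, false if it was left pending behind the mask. */
bool isr_taken_before_mask(const op_t *ops, size_t n)
{
    bool irq_enabled = true;
    int  slots_until_visible = -1;     /* -1: nothing has been pended */

    assert(ops != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Interrupts are sampled at each instruction boundary. */
        if (slots_until_visible == 0 && irq_enabled) {
            return true;               /* ISR entered here */
        }
        if (slots_until_visible > 0) {
            slots_until_visible--;     /* request still propagating */
        }
        switch (ops[i]) {
        case OP_PEND_SW_IRQ: slots_until_visible = PEND_LATENCY_SLOTS; break;
        case OP_DISABLE_IRQ: irq_enabled = false;                      break;
        case OP_OTHER:                                                 break;
        }
    }
    return slots_until_visible == 0 && irq_enabled;
}
```

With the sequence pend, other, other, disable (the original code) the model leaves the interrupt pending behind the mask; with one extra instruction slot before the disable (the NOP) the ISR is taken first. That matches what I see on hardware, but the real latency may differ, so treat the number as a guess.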

I see the solution like this. Instead of:

#define portYIELD()   NVIC_SetPendingIRQ(Software_IRQn)

Define:

#define portYIELD() do{NVIC_SetPendingIRQ(Software_IRQn); \
                       __asm volatile("fence.i");}while(0)

I suppose it is normal for interrupt entry to be skipped during just the three instructions after the ISR is pended. Nobody has abolished interrupt latency; this is normal CPU and interrupt controller behavior.
Just tested. This workaround works.

What do you think? Is such a solution robust enough?

P.S. 5 days ago the CPU developer updated the core manual:

Note: When using registers to mask any interrupt or using CSR registers to mask global interrupts, add a ‘fence.i’ instruction to synchronize between core control state and interrupt enable state.

So it seems to me that the interrupt pending request is also the right place for a “fence.i” instruction.