Any reason for xQueueReceive not unblocking?

AlbertoGe · February 7, 2024, 4:11pm

Hello everybody
just asking if anyone knows of a subtle reason that could prevent an xQueueReceive with an infinite block time (portMAX_DELAY) from returning when the queue is filled by an xQueueSendToBackFromISR.

Short description:

a task (with quite a high priority) blocks on an xQueueReceive;
an ISR triggered by an external signal wakes that task with an xQueueSendToBackFromISR;
there is only one ISR that fills the queue and one task that consumes it (the purpose is to defer a long computation to the task);
the system (which has many other tasks) basically works and can run for days (the queue is filled every 1 ms); then, without any apparent reason, other evidence or periodicity, the xQueueReceive in the task stops returning when the ISR fills the queue;
I’ve verified (when the problem happens):
the interrupt is still executing (every 1 ms);
inside the ISR, the call to xQueueSendToBackFromISR returns a value != pdPASS;
I assume that the queue is now full;
but the task waiting on the xQueueReceive is still present and in the BLOCKED state;
I’m convinced that there are no other points where that task could be blocked, but of course bugs could be anywhere.

Already verified:

the queue was created successfully (and I don’t check at the xQueueReceive call whether the queue pointer is NULL for any reason);
configASSERT is defined and working (when called…);
all other tasks (including lower-priority ones) are running and FreeRTOS is, in general, healthy;
rules for interrupt priorities are respected;

thank you for any suggestion
Alberto

RAc · February 7, 2024, 5:13pm

Maybe the variable that holds the queue in the receiving task is being overwritten with something that is close enough to a real queue object not to generate a fault, or the queue structure itself is being “benevolently” overtrampled.
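
One way to test the "overwritten handle" theory without a debugger is to store the handle together with a redundant copy and compare the two periodically. The following is only a host-side sketch of that idea; the type and function names (`guarded_handle_t`, `guarded_handle_set`, and so on) are hypothetical and not part of FreeRTOS. It catches a change to the stored handle word, not corruption inside the queue structure itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper: the handle plus its bitwise complement. */
typedef struct {
    uintptr_t value;    /* the handle, stored as an integer */
    uintptr_t inverse;  /* ~value, written at the same time */
} guarded_handle_t;

/* Store a handle and its complement together. */
void guarded_handle_set(guarded_handle_t *g, const void *handle)
{
    assert(g != NULL);
    g->value = (uintptr_t)handle;
    g->inverse = (uintptr_t)~g->value;
}

/* True while the two copies still agree. */
bool guarded_handle_ok(const guarded_handle_t *g)
{
    return g->value == (uintptr_t)~g->inverse;
}

/* Return the handle, or NULL if either copy was overwritten. */
void *guarded_handle_get(const guarded_handle_t *g)
{
    return guarded_handle_ok(g) ? (void *)g->value : NULL;
}
```

A monitoring task could call `guarded_handle_ok()` and print the result on the console; a stray write that hits only one of the two words is then visible even if the corrupted value still looks like a plausible pointer.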

richard-damon · February 7, 2024, 5:43pm

One thing you don’t mention is whether the ISR uses the wasWoken flag to wake the task, but failing to do that should just delay the task until the next tick (assuming you have preemption enabled, which is the default).

The other issue sounds like corruption of system memory somehow.

One question: what version of FreeRTOS are you using? Older versions, even with configASSERT defined, might let some errors sneak past. A typical cause of this sort of issue is using the FreeRTOS API inside an ISR with too high a priority, or other similar configuration errors.
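
For reference, a minimal sketch of the pattern being discussed: an ISR that posts to a queue, passes the "woken" flag to the port's yield macro, and counts failed sends so they can be read out later. It assumes a FreeRTOS project whose port provides `portYIELD_FROM_ISR()`; the names `xTriggerQueue`, `vTriggerISR`, `vProcessingTask` and `ulSendFailCount` are made up for illustration and do not come from the posts above.

```c
#include <stdint.h>
#include "FreeRTOS.h"
#include "queue.h"
#include "task.h"

/* Hypothetical names, for illustration only. */
static QueueHandle_t xTriggerQueue;        /* assumed created elsewhere with item size sizeof( uint32_t ) */
static volatile uint32_t ulSendFailCount;  /* incremented whenever the send from the ISR fails */

void vTriggerISR( void )
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    const uint32_t ulEvent = 1u;

    /* FromISR calls never block: a full queue is reported via the return value. */
    if( xQueueSendToBackFromISR( xTriggerQueue, &ulEvent,
                                 &xHigherPriorityTaskWoken ) != pdPASS )
    {
        ulSendFailCount++;
    }

    /* Request a context switch on ISR exit if a higher-priority task was unblocked. */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

void vProcessingTask( void *pvParameters )
{
    uint32_t ulEvent;

    ( void ) pvParameters;

    for( ;; )
    {
        /* Block indefinitely until the ISR posts an event. */
        if( xQueueReceive( xTriggerQueue, &ulEvent, portMAX_DELAY ) == pdPASS )
        {
            /* Deferred (long) processing goes here. */
        }
    }
}
```

If the yield macro is omitted, the task is still made ready by the send; it simply does not run until the next scheduling point, which matches the "delay until the next tick" remark above.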

AlbertoGe · February 8, 2024, 7:11am

Hello Richard

yes, I handle the yield inside the ISR.

The version should be 10.1.1. The target is Zynq7000.
Regarding the ISR priority issue, I’ve checked that deliberately setting the priority to the wrong level correctly raises the assert inside the FreeRTOS API.

Hello RAc

this is a good point. I’ve now added a task that monitors the queue when the failure occurs (printing "uxQueueMessagesWaiting" and the like). Of course there is EMI noise around (it is a motor control application), and staying connected with JTAG is not very practical.

thank you
Alberto

funky23 · February 8, 2024, 10:16am

I cannot help, but I basically have the same problem on a Renesas RX631.

A basic system is running. I can run multiple tasks, the scheduler is working, and semaphores as well.

I have one queue which is filled by an RX IRQ with xQueueSendToBackFromISR() and portYIELD_FROM_ISR(). xQueueSendToBackFromISR returns true.
My IRQ has priority 4.

#define configKERNEL_INTERRUPT_PRIORITY 1
#define configMAX_SYSCALL_INTERRUPT_PRIORITY 4

If I raise the priority of my IRQ above configMAX_SYSCALL_INTERRUPT_PRIORITY, I also get the assert that my priority is wrongly configured.

The task waits forever in xQueueReceive().

After leaving the IRQ handler, the scheduler stops working, and it seems to always hang at

if( listCURRENT_LIST_LENGTH( &( pxReadyTasksLists[ tskIDLE_PRIORITY ] ) ) > ( UBaseType_t ) 1 )

No other task is executed anymore (my blinky tasks stop working), and I don’t get into the IRQ handler again if I send other chars to the controller.

I’m a little bit lost here. I used FreeRTOS on STM32 before and never had such problems.

RAc · February 8, 2024, 11:35am

how does your port implement the critical section?

funky23 · February 8, 2024, 11:51am

on the rx631

void vTaskEnterCritical( void )
{
    portDISABLE_INTERRUPTS();

    if( xSchedulerRunning != pdFALSE )
    {
        ( pxCurrentTCB->uxCriticalNesting )++;

        /* This is not the interrupt safe version of the enter critical
         * function so assert() if it is being called from an interrupt
         * context.  Only API functions that end in "FromISR" can be used in an
         * interrupt.  Only assert if the critical nesting count is 1 to
         * protect against recursive calls if the assert function also uses a
         * critical section. */
        if( pxCurrentTCB->uxCriticalNesting == 1 )
        {
            portASSERT_IF_IN_ISR();
        }
    }
    else
    {
        mtCOVERAGE_TEST_MARKER();
    }
}

AlbertoGe · February 9, 2024, 8:57am

Hello everybody
Sorry for my late reply.
If you are referring to “taskENTER_CRITICAL()”, the Zynq7000 port (FreeRTOS Kernel V10.5.1) ends up here:

void vPortEnterCritical( void )
{
	/* Mask interrupts up to the max syscall interrupt priority. */
	ulPortSetInterruptMask();

	/* Now interrupts are disabled ulCriticalNesting can be accessed
	directly.  Increment ulCriticalNesting to keep a count of how many times
	portENTER_CRITICAL() has been called. */
	ulCriticalNesting++;

	/* This is not the interrupt safe version of the enter critical function so
	assert() if it is being called from an interrupt context.  Only API
	functions that end in "FromISR" can be used in an interrupt.  Only assert if
	the critical nesting count is 1 to protect against recursive calls if the
	assert function also uses a critical section. */
	if( ulCriticalNesting == 1 )
	{
		configASSERT( ulPortInterruptNesting == 0 );
	}
}

while “taskEXIT_CRITICAL”, after following the port’s #defines, ends up here:

void vPortExitCritical( void )
{
	if( ulCriticalNesting > portNO_CRITICAL_NESTING )
	{
		/* Decrement the nesting count as the critical section is being
		exited. */
		ulCriticalNesting--;

		/* If the nesting level has reached zero then all interrupt
		priorities must be re-enabled. */
		if( ulCriticalNesting == portNO_CRITICAL_NESTING )
		{
			/* Critical nesting has reached zero so all interrupt priorities
			should be unmasked. */
			portCLEAR_INTERRUPT_MASK();
		}
	}
}

Then “portCLEAR_INTERRUPT_MASK” is:

/* Macro to unmask all interrupt priorities. */
#define portCLEAR_INTERRUPT_MASK()									\
{																	\
	portCPU_IRQ_DISABLE();											\
	portICCPMR_PRIORITY_MASK_REGISTER = portUNMASK_VALUE;			\
	__asm volatile (	"DSB		\n"								\
						"ISB		\n" );							\
	portCPU_IRQ_ENABLE();											\
}
}

and:

	/* The critical section macros only mask interrupts up to an application
	determined priority level.  Sometimes it is necessary to turn interrupt off in
	the CPU itself before modifying certain hardware registers. */
	#define portCPU_IRQ_DISABLE()										\
		dmb();															\
		__asm__ __volatile__ ( "cpsid	i" ::: "memory" );				\
		dsb();															\
		isb();

	#define portCPU_IRQ_ENABLE()										\
		dmb();															\
		__asm__ __volatile__ ( "cpsie	i" ::: "memory" );			    \
		dsb();															\
		isb();

thanks

aggarg · February 10, 2024, 9:39am

Can you examine pxReadyTasksLists for your task priority and see if the task is on the ready list? Also, are your interrupts getting disabled somehow? Can you check the value of uxCriticalNesting in your task’s TCB?

Did you find anything from this?
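
Why the nesting count is worth checking can be seen from a small host-side model of the enter/exit logic in the port code quoted earlier in the thread. This is a sketch only: `crit_model_t`, `crit_enter` and `crit_exit` are hypothetical names, and the real port manipulates a hardware priority mask rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a nested critical section. */
typedef struct {
    uint32_t nesting;  /* how many enters are still outstanding */
    bool masked;       /* true while interrupts would be masked */
} crit_model_t;

/* Enter: mask first, then count the nesting level. */
void crit_enter(crit_model_t *c)
{
    assert(c != NULL);
    c->masked = true;
    c->nesting++;
}

/* Exit: unmask only when the outermost level is left. */
void crit_exit(crit_model_t *c)
{
    assert(c != NULL);
    if (c->nesting > 0u) {
        c->nesting--;
        if (c->nesting == 0u) {
            c->masked = false;
        }
    }
}
```

In this model a single missing exit (for example an early return between enter and exit) leaves `nesting` at 1 and `masked` set, so a non-zero nesting count read from a task that should be idle points to an unbalanced critical section.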

AlbertoGe · February 10, 2024, 6:19pm

hi @aggarg, no, for the moment the system is running and my counters don’t show anything strange. As I said before, it’s a very rare occurrence - but still to be caught.

aggarg · February 11, 2024, 5:55am

Let us know whatever you find.

funky23 · February 12, 2024, 11:42am

Unfortunately, as usual, the error sat in front of the PC.

I switched from the RX compiler to GCC, and somehow the interrupt declaration of my peripheral interrupt (where I access the queue) was wrong/missing, and I think I missed a warning about a missing declaration.

Everything compiled and the program seemed to be OK. Multiple tasks were working fine. As soon as I jumped to my IRQ handler I could write to the queue, but after I returned from it everything was garbage and I jumped elsewhere (sometimes the scheduler quit, sometimes I hung at the described line in the scheduler).

So I think the compiler emitted a RET instead of an IRET.

Sorry for the hassle. This was absolutely my fault (but the description of the error felt the same as what I observed).
Regards

aggarg · February 12, 2024, 11:49am

Glad that you figured it out. Thank you for reporting back!

AlbertoGe · March 22, 2024, 11:16am

Hi everybody,
finally I had the chance to see the problem again (the occurrence is very rare).
I have added some debug counters to get a picture. I can read the counters from a console terminal; staying connected with JTAG is nearly impossible because of EMI noise.
The situation can be described this way:

a counter in the FPGA (Zynq 7000) raises an interrupt when it expires;
that ISR wakes a task with “xQueueSendToBackFromISR”;
the queue was created with depth = 2 (really only 1 is used);
the queue is emptied by a task that does the processing; the queue is really used as a signal;
the CPU load is very low, and there is no possibility that the “eating” task is not getting to run;
the counter overflows every 1 ms, and the failure happens after days (= basically it works);
when the failure happens, “xQueueSendToBackFromISR” can no longer push to the queue (returning != pdPASS), and the waiting task then keeps waiting forever. Still, everything around it is working, and a monitoring task tells me that the queue is full with 2 elements (= the maximum), i.e. uxQueueMessagesWaiting(xQ_P50_trigger) = 2;
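
The observed numbers fit a simple host-side model of a depth-2 queue whose consumer has stopped draining it. The sketch below is not FreeRTOS code; `model_queue_t` and the `model_*` functions are hypothetical names used only to illustrate the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_QUEUE_DEPTH 2u  /* same depth as described in the post */

typedef struct {
    uint32_t items[MODEL_QUEUE_DEPTH];
    size_t head;   /* index of the oldest item */
    size_t count;  /* number of items currently stored */
} model_queue_t;

void model_queue_init(model_queue_t *q)
{
    assert(q != NULL);
    q->head = 0u;
    q->count = 0u;
}

/* Non-blocking send: returns false when the queue is full. */
bool model_queue_send(model_queue_t *q, uint32_t item)
{
    if (q->count == MODEL_QUEUE_DEPTH) {
        return false;
    }
    q->items[(q->head + q->count) % MODEL_QUEUE_DEPTH] = item;
    q->count++;
    return true;
}

/* Non-blocking receive: returns false when the queue is empty. */
bool model_queue_receive(model_queue_t *q, uint32_t *out)
{
    if (q->count == 0u) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1u) % MODEL_QUEUE_DEPTH;
    q->count--;
    return true;
}

/* Simulate `ticks` producer events; the consumer drains one item per
 * tick only while consumer_alive is true. Returns the failed sends. */
uint32_t model_run(model_queue_t *q, uint32_t ticks, bool consumer_alive)
{
    uint32_t failed = 0u;
    for (uint32_t t = 0u; t < ticks; t++) {
        if (!model_queue_send(q, t)) {
            failed++;
        }
        if (consumer_alive) {
            uint32_t item;
            (void)model_queue_receive(q, &item);
        }
    }
    return failed;
}
```

In the model, once the consumer stops, exactly two sends succeed and every later one fails with the count stuck at 2. Read that way, the failing send in the ISR would be a symptom appearing about 2 ms after the task stopped consuming, rather than the cause of the stall.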

any debug idea is appreciated.

thanks
Alberto

RAc · March 22, 2024, 8:45pm

So your system timer is still working in the erratic scenario?

Are you absolutely sure that your receiving task is suspended on the queue wait and not in some other code location? Has the task stack not overflowed?

If you have terminal output, can you guarantee that your printf/sprintf logic is not part of the problem?

AlbertoGe · March 25, 2024, 10:53am

Hi RAc,

yes, not only the scheduler’s timer, but the whole FreeRTOS system and its tasks, which is quite complex, keeps working.

To debug (no JTAG, remember) I assign different values to a debug variable, which I can then examine on the console. When the failure occurs, this variable holds exactly the value assigned just before the (blocking) call to xQueueReceive.
Regarding the stack overflow, I’ve set:

#define configCHECK_FOR_STACK_OVERFLOW 2

and I do not see anything suspicious. But I will investigate further.
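
As background, pattern-based stack checking works by filling the stack with a known byte and later looking for where that pattern has been disturbed. The following host-side sketch illustrates the principle only; it is not the kernel's implementation, the `model_*` names are hypothetical, and a descending stack is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_STACK_FILL_BYTE 0xA5u  /* fill value chosen for this sketch */

/* Paint the whole stack area with the fill byte before the task runs. */
void model_stack_paint(uint8_t *stack, size_t size)
{
    assert(stack != NULL);
    memset(stack, MODEL_STACK_FILL_BYTE, size);
}

/* Count untouched bytes from the low end of the area. With a descending
 * stack, usage starts at the highest address, so index 0 is the limit. */
size_t model_stack_unused(const uint8_t *stack, size_t size)
{
    size_t n = 0u;
    assert(stack != NULL);
    while (n < size && stack[n] == MODEL_STACK_FILL_BYTE) {
        n++;
    }
    return n;
}
```

A result of 0 means the pattern at the stack limit has been disturbed. Note the limitation: a write that lands beyond the stack without touching the painted bytes at the limit is not seen by this kind of check, so a clean result lowers the suspicion of an overflow but does not rule out other memory corruption.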

Of course it could be anything. I’ve tried to avoid problems: I execute sprintf in protected sections, then send the resulting string (using a queue) to a low-priority task that writes to the UART when there is free space.

thank you for any debug idea.
Alberto

aggarg · March 25, 2024, 4:45pm

Can you examine the task’s TCB and the queue control block? We want to see if the task’s xEventListItem is on the queue’s xTasksWaitingToReceive list. If we can get it, the call stack of the task may also provide some useful information - not sure if you can get that though.

AlbertoGe · May 20, 2024, 3:01pm

Hi @aggarg @RAc @richard-damon, I realize that my reply to your question comes very late.
Indeed, I’ve tried to reproduce the problem many times (while solving other bugs in between), but I haven’t seen it again. So my current suspicion is that it was a consequence of something else that has been fixed in the meantime.
Thank you for all the suggestions

aggarg · May 21, 2024, 4:23am

Thank you for reporting back!