Task waiting on Queue post or timeout quits running

smachell wrote on Wednesday, December 21, 2016:

The target for my project is a FRDM-KL25Z board connected to a Raspberry Pi running Domoticz home automation. The development environment is Kinetis Design Studio 3.2 with the task aware plug-in installed. This is probably the 20th project I have done with FreeRTOS and ARM (M0, M0+ & M4) processors.

I’m having an issue in a task (named Dispatch) with a Queue that is also using an associated timer. This is the first time I have used a Queue with a timer as well. What happens is my code runs fine for an hour or up to 48+ hours. But at some point the Dispatch task waiting on a Queue post or timer expiration stops running. Right now I have “fixed” the problem by having this task be the one to tickle the watchdog.

When the task does stop running I can pause the debugger and the task state is showing as READY, but it never restarts. The Queue window shows that the Queue is full (16/16). The tic timer is still firing fine. The Queue timer for this task is set to 50mS if a post is not received.

The Dispatch task is the highest priority task and there are 4 other tasks in the system. The other 4 tasks still run fine after the Dispatch task locks up. There is a Console task that uses a Queue for new activity. There are 2 other tasks that run at a periodic rate using vTaskDelay() and the Idle task is still running fine. I have hooks for the Idle and Tic timer so I can see them running OK too.

The RTOS MaxSysCallInterruptPriority is set to 1. The tic rate is 10mS. The RTOS Interrupt Priority is set to 3.

The FRDM board has an XBee radio on one UART that receives periodic status updates from 3 other remote XBee radios at 30-60 second intervals. This UART runs at IRQ level 2 and uses the xQueueSendToBackFromISR() function to post the Dispatch task queue when a full packet arrives. I was using portEND_SWITCHING_ISR() to switch tasks, but I have that code commented out for now and just allow the next tic to switch. This interrupt routine is still running fine, but of course it gets errors on Queue posts since the queue is full.

I’ve tried making different “fixes” to diagnose the issue. I’ve set the stack sizes to really large numbers.
Is there something I might be missing or do you have any other ideas for a fix to try?

rtel wrote on Wednesday, December 21, 2016:

I’m having an issue in a task (named Dispatch) with a Queue that is also using an associated timer.

So the task is using a queue and a timer?

How is the task using the queue? Is it reading from the queue with a timeout?
How is the task using the timer? You say it is waiting for the timer to expire, but how does it know when the timer has expired?

Please paste in the structure of the task.

This UART runs at IRQ level 2 and uses the xQueueSendToBackFromISR() function

That is a very high priority. What is configMAX_SYSCALL_INTERRUPT_PRIORITY set to?

Do you have configASSERT() defined?

Is this the port.c file you are using (compare the files as sometimes NXP/Freescale have their own version): https://sourceforge.net/p/freertos/code/HEAD/tree/trunk/FreeRTOS/Source/portable/GCC/ARM_CM4F/port.c

smachell wrote on Thursday, December 22, 2016:

Here is how the task waits for the Queue post:
// Assign time to a variable so that we can stop task by setting it to -1
// or 0xffffffff while debugging.
semphTime = MYS_DISPATCH_TASK_RTOS_TICS ;
// Wait for Pi or XBee input.
// Otherwise timeout and do housekeeping.
queExit = xQueueReceive( xQuePiXbeeMsg, &rxAdd, semphTime );

semphTime is set for 5 tics or 50mS, so if there is no Queue post the timeout will occur and the task will do periodic housekeeping.
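As a quick check of the numbers above, the tick count for a given timeout follows from the tick rate. The sketch below applies the same formula as FreeRTOS's `pdMS_TO_TICKS()` macro in plain C. `ms_to_ticks` and its parameters are hypothetical names used only for illustration, and the 100 Hz rate is an assumption derived from the 10 ms tick stated earlier in the thread.

```c
#include <stdint.h>

/* Hypothetical helper: ticks = ms * tick_rate_hz / 1000, truncated,
 * which is the formula pdMS_TO_TICKS() applies to configTICK_RATE_HZ. */
uint32_t ms_to_ticks(uint32_t ms, uint32_t tick_rate_hz)
{
    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)ms * tick_rate_hz) / 1000u);
}
```

With a 10 ms tick, `ms_to_ticks(50, 100)` gives 5, which matches the value assigned to semphTime. Note that a timeout of 0xffffffff equals `portMAX_DELAY` on a port with 32-bit ticks, which blocks with no timeout at all when `INCLUDE_vTaskSuspend` is 1.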

configMAX_SYSCALL_INTERRUPT_PRIORITY is set to 1.

configASSERT() is defined. I’m using Kinetis Design Studio with Processor Expert. FreeRTOS is 9.0.0

The port.c file you referenced above is quite a bit different than the one Erich Styger supplies with Processor Expert?? Not sure why?

rtel wrote on Thursday, December 22, 2016:

If configMAX_SYSCALL_INTERRUPT_PRIORITY is 1 it is almost certainly wrong as that value should be left shifted. It is complicated but an attempt to explain this can be found on the following link: http://www.freertos.org/RTOS-Cortex-M3-M4.html
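The left shift mentioned here can be illustrated with a small sketch. Cortex-M cores implement only the most significant bits of each 8-bit priority field, so a logical priority has to be moved into those bits before it matches what the hardware registers hold. `shifted_priority` is a hypothetical helper, not a FreeRTOS API, and the bit counts are assumptions: many Cortex-M4 parts implement 4 priority bits, while ARMv6-M cores such as the M0+ implement 2.

```c
#include <stdint.h>

/* Hypothetical helper: place a logical priority (0 = most urgent) in the
 * most significant bits of the 8-bit priority field.
 * implemented_bits is assumed to be in the range 1..8. */
uint8_t shifted_priority(uint8_t logical_priority, uint8_t implemented_bits)
{
    return (uint8_t)(logical_priority << (8u - implemented_bits));
}
```

Under these assumptions, logical priority 1 becomes 0x10 with 4 implemented bits and 0x40 with 2 implemented bits. ARMv6-M has no BASEPRI register, so how much this setting matters on an M0+ depends on the port layer in use.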

smachell wrote on Friday, December 23, 2016:

That article you reference is for ARM M3/M4. I’m using the Kinetis KL25 which is an M0+.

davedoors wrote on Friday, December 23, 2016:

You don’t show the timer use in the task.

smachell wrote on Friday, December 23, 2016:

Not sure I understand your question?

// Wait for Pi or XBee input.
// Otherwise timeout and do housekeeping.
queExit = xQueueReceive( xQuePiXbeeMsg, &rxAdd, semphTime );

semphTime is set for 5 tics or 50mS, so if there is no Queue post the timeout will occur and the task will do periodic housekeeping.

In my code I check the return code in queExit to see if it returned because of a Queue post or a timeout.

rtel wrote on Saturday, December 24, 2016:

You mention originally that your task is using a queue and a timer. So far we can only see how the queue is being used, and as there doesn’t seem to be an issue with that small portion of the code it would be helpful to see a greater portion of the code, including the bit that is using the timer. So far no information is given as to what the task is doing with the timer (starting, stopping, resetting, being notified by the timer’s callback function, etc.).

smachell wrote on Tuesday, December 27, 2016:

You mention originally that your task is using a queue and a timer. So far we can only see how the queue is being used, and as there doesn’t seem to be an issue with that small portion of the code it would be helpful to see a greater portion of the code, including the bit that is using the timer.

Is it possible you are confusing my issue with another person?

Here is the line of code that waits:
// Wait for Pi or XBee input.
// Otherwise timeout and do housekeeping.
queExit = xQueueReceive( xQuePiXbeeMsg, &rxAdd, semphTime );

Note that the task waits for a queue post in: xQuePiXbeeMsg

Or it waits for the timer in: semphTime

semphTime is set to 5 or 50mS in my case.

Therefore the task should resume upon a queue post in xQuePiXbeeMsg or it should resume after the 50mS timer in semphTime expires.

As you requested, here is more code showing what the task does when it gets a Queue post vs a timeout in the queue wait:

	// Assign time to a variable so that we can stop task by setting it to -1
	// or 0xffffffff while debugging.
	semphTime = MYS_DISPATCH_TASK_RTOS_TICS ;
	// Wait for Pi or XBee input.
	// Otherwise timeout and do housekeeping.
	queExit = xQueueReceive( xQuePiXbeeMsg, &rxAdd, semphTime );

	// This is pdTRUE if we get a message from Pi or XBee and a Queue insertion.
	if( queExit == pdTRUE )
	{
		// process data from Pi or XBee
		if( *rxAdd == MESSAGE_SOURCE_PI )
		{
			// See who Domoticz needs to talk with?
			ParsePiResponse( rxAdd );
		}
		else if( *rxAdd == MESSAGE_SOURCE_XBEE )
		{
			// Figure out which XBee this is.
			ParseXbeeResponse( rxAdd ) ;
		}
	}
	// Otherwise we timed out so do periodic updates for Domoticz.
	else
	{
		// Get the RTC time.
		ReadRtcTime() ;
		// Update for KL25 sensors.
		NodesKl25SensorUpdates( timeChangeFlag ) ;
		// Updates for the garage doors.
		NodesGarageDoorHousekeeping( timeChangeFlag ) ;
		// Doorbell
		NodesDoorbellHousekeeping( timeChangeFlag ) ;
		timeChangeFlag = 0 ;
	}

As I stated before this task eventually fails with the queue full. When I pause the debugger its status is showing ready, its queue is full and the timer (50mS) should have expired. All other tasks in the system are still running fine.

davedoors wrote on Tuesday, December 27, 2016:

You mention a timer several times in your posts, including the last post, but I think you mean “timeout”.

So which task is running? Maybe it is starving the task which is why the queue is getting full.

smachell wrote on Tuesday, December 27, 2016:

OK - yes, timeout it is.

There are 5 tasks in the system. The task in question is the highest priority so it should not get starved. The task aware debugger shows the task in question is in the Ready state with a full queue but it never runs again after the lockup.

The other 4 tasks (one of which is Idle task) all are running fine and they are lower priority. The tic timer and idle hooks are also running fine. The locked up task should be brought out of the Ready state either as a result of queue posts or a timeout. Neither is happening.

davedoors wrote on Tuesday, December 27, 2016:

How do you know it is waiting in the xQueueReceive() function and not locked somewhere else?

Add a new variable, set the variable to 1 before calling xQueueReceive() and 0 after calling xQueueReceive. When the task stops what is the variable set to?
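The marker-variable test suggested here can be sketched as a small wrapper. This is a minimal illustration, not the poster's code: `g_inQueueReceive`, `receive_with_marker`, `blocking_receive_fn` and `fake_receive` are all hypothetical names, and the function pointer stands in for a thin wrapper around `xQueueReceive()` so that the sketch can also be exercised off-target.

```c
#include <stdint.h>

/* 1 while the task is inside the blocking call, 0 otherwise.
 * volatile so the value is always written to memory for the debugger. */
volatile uint32_t g_inQueueReceive = 0;

/* Hypothetical signature standing in for the blocking call; on the target
 * this would be a thin wrapper around xQueueReceive(). */
typedef int (*blocking_receive_fn)(void *item, uint32_t timeout_ticks);

int receive_with_marker(blocking_receive_fn receive, void *item,
                        uint32_t timeout_ticks)
{
    g_inQueueReceive = 1;                       /* about to block */
    int result = receive(item, timeout_ticks);
    g_inQueueReceive = 0;                       /* returned: post or timeout */
    return result;
}

/* Host-side stand-in used only to exercise the wrapper off-target:
 * it records what the marker was while the "blocking" call ran. */
uint32_t g_markerSeenInsideCall = 0;

int fake_receive(void *item, uint32_t timeout_ticks)
{
    (void)item;
    (void)timeout_ticks;
    g_markerSeenInsideCall = g_inQueueReceive;
    return 0;                                   /* pretend the call timed out */
}
```

If the task stalls and the debugger shows the marker at 1, the task never returned from the blocking call. A value of 0 would point at code elsewhere in the task loop.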

smachell wrote on Tuesday, December 27, 2016:

Good point. Set it up and trying now. It may take up to 48 hours or more before it locks up again.

smachell wrote on Wednesday, December 28, 2016:

Test is complete. I set a status bit to 1 prior to calling: xQueueReceive

If it exits because of a queue post or timeout then I immediately clear the status bit. Task locked up again this morning and the status bit is set to 1.

Do either of these screen shots help:

smachell wrote on Wednesday, December 28, 2016:

Here is the other:

Also my debug task can show task stats from FreeRTOS calls. Here they are:

-------- Run Time Stats --------
Dispatch Task   19        <1%
DbgConsole      15        <1%
IDLE            5638358   97%
RTC Task        5         <1%
Sensor Task     135511    2%

-------- Task List --------
Dispatch Task   R   4   144   1
DbgConsole      R   1   136   4
IDLE            R   0   126   5
Sensor Task     B   2   158   3
RTC Task        B   3   162   2

The Dispatch task is the one in question.
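To make the run-time figures easier to read, the sketch below reproduces the whole-number percentage arithmetic that `vTaskGetRunTimeStats()` uses. `run_time_percent` is a hypothetical helper, and the total of 5773908 is an assumption obtained by summing the five counters listed above. The real total comes from the run-time counter itself and may be somewhat larger.

```c
#include <stdint.h>

/* Whole-number percentage: counter / (total / 100), truncated.
 * A result of 0 is what the stats output prints as "<1%". */
uint32_t run_time_percent(uint32_t task_counter, uint32_t total_counter)
{
    uint32_t per_percent = total_counter / 100u;
    if (per_percent == 0u) {
        return 0u;   /* total too small to divide by, e.g. just after start-up */
    }
    return task_counter / per_percent;
}
```

With the assumed total, the IDLE counter of 5638358 gives 97 and the Sensor Task counter of 135511 gives 2, in line with the listing. The Dispatch counter of 19 gives 0, consistent with a task that has barely run over the whole measurement period.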

rtel wrote on Wednesday, December 28, 2016:

A curious one.

I’m not looking at the entire thread so am going from memory - apologies if I have asked this before:

Are you using an idle hook function? If so, is it possible the idle hook function is in a loop that is either in a critical section or has the scheduler suspended?

I think you previously said you had configASSERT() defined, but were using the NXP provided RTOS port layer. Does that port layer have asserts that check the interrupt priority is at or below configMAX_SYSCALL_INTERRUPT_PRIORITY (remembering that low interrupt priority numbers mean high logical priority)?

smachell wrote on Wednesday, December 28, 2016:

I do have a hook on the idle loop. All my code does is update a min & a max counter and check stack depth (uxTaskGetStackHighWaterMark) and then it returns to kernel.

I turned off configASSERT() about a week ago and it made no difference; the task still locks up.

smachell wrote on Friday, December 30, 2016:

I need to update the prior post about the Idle hook.

Every time thru my code it just updates a single counter and then returns to the kernel.

Then once every second it updates the min/max counters as well as calling uxTaskGetStackHighWaterMark().

rtel wrote on Friday, December 30, 2016:

What happens if you remove the call to uxTaskGetStackHighWaterMark()?

smachell wrote on Saturday, December 31, 2016:

OK - I’ve stopped calling that function in the Idle Task Hook function.

All my other tasks call uxTaskGetStackHighWaterMark() as well each time they come out of suspension. Should I stop calling from those tasks too or are you just interested in the Idle Task?