Yet another task starvation question!

I have a relatively simple program with three tasks as follows:

  1. A housekeeping thread

  2. A GPIB thread

  3. A serial thread

I can run the program for a long time (days under normal conditions) and then the GPIB stops working. Having put a load of debug code in, I can see it’s not crashing the processor. I have heartbeat LEDs in the housekeeping thread and the GPIB thread. The GPIB heartbeat LED stops while the housekeeping LED continues.

I don’t have exactly the same setup as the one that runs for days in a remote location, but today I managed to reproduce the symptoms by bombarding the GPIB with comms every 100ms: the GPIB thread stopped in the same way.

I have the debugger attached and, by stopping and setting breakpoints, I can see it never gets into the GPIB thread. The housekeeping thread is running fine. When it gets to the end of its infinite loop, the housekeeping thread calls vTaskDelay() with a delay of 100ms.

The serial task is fine. It just sits there until it gets a byte in over the serial port. If I send something into it, it does receive the serial data. However, no serial data is being sent into the unit, so really we only have two threads demanding processor time.

It’s just the GPIB task that stops getting called. It does use some pin change interrupts, which send to a queue from the ISR. In every pin interrupt handler I include:


	if (pdTRUE == xHigherPriorityTaskWoken)
	{
		portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
	}

And all these pin interrupts have their priority set to 10, which on Cortex-M is a lower priority (a numerically higher value) than the 5 set for configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY in the header file.
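The rule that setting relies on can be checked on a host PC. This is a minimal sketch, assuming 4 implemented NVIC priority bits (typical on STM32) and the value 5 quoted above; the macros and helper names are hypothetical, not FreeRTOS APIs. The comparison is similar to the one FreeRTOS Cortex-M ports make in vPortValidateInterruptPriority() when configASSERT is defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions: 4 implemented priority bits (typical for STM32) and the
   configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY value from the post. */
#define PRIO_BITS                               4u
#define LIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY  5u

/* Shift a "library" priority (0..15) into the 8-bit NVIC register format,
   the same way configMAX_SYSCALL_INTERRUPT_PRIORITY is usually derived. */
static uint8_t to_register_priority(uint8_t library_priority)
{
    assert(library_priority < (1u << PRIO_BITS));
    return (uint8_t)(library_priority << (8u - PRIO_BITS));
}

/* True if an ISR at this library priority may call ...FromISR() functions.
   On Cortex-M a numerically LOWER value is a logically HIGHER priority, so
   the ISR's value must be numerically >= the syscall ceiling. */
static bool may_call_from_isr_api(uint8_t library_priority)
{
    return to_register_priority(library_priority) >=
           to_register_priority(LIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY);
}
```

Priority 10 passes and 5 is the boundary case that still passes; priority 0, which is the NVIC reset value for any interrupt whose priority is never written, fails.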

I have a massive stack size of 512 for the GPIB thread, far more than it can possibly be using, and the processor never crashes, so I do not think it’s a stack overflow. I will rerun with some high-watermark checking just to be sure. However, although I’ve never had an issue like this in fifteen years of using FreeRTOS, I have had many stack issues and they’ve all ended in hard fault interrupts. I am definitely not getting hard fault interrupts or crashes.
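A host-side sketch of how a stack high-water mark is measured: paint the stack with a known byte when the task is created, then count how much of the pattern is still intact. FreeRTOS paints stacks with the fill byte 0xa5; the helper functions below are hypothetical. Mind the units: xTaskCreate() takes its stack depth in words and uxTaskGetStackHighWaterMark() returns words, while the CMSIS-RTOS v2 osThreadNew() that Cube generates takes stack_size in bytes, so "512" may mean 512 or 2048 bytes on a 32-bit Cortex-M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_FILL_BYTE 0xA5u /* same pattern FreeRTOS uses to paint stacks */

/* Paint a simulated stack region with the fill pattern. */
static void paint_stack(uint8_t *stack, size_t size)
{
    memset(stack, STACK_FILL_BYTE, size);
}

/* Cortex-M stacks grow downward, so unused space sits at the low end.
   Count the bytes there that still hold the pattern: this is the minimum
   free stack ever observed (the high-water mark), in bytes. */
static size_t high_water_mark_bytes(const uint8_t *stack, size_t size)
{
    size_t free_bytes = 0;
    while (free_bytes < size && stack[free_bytes] == STACK_FILL_BYTE) {
        free_bytes++;
    }
    return free_bytes;
}

/* Hypothetical stand-in for a call chain that touched the top `used` bytes. */
static void simulate_usage(uint8_t *stack, size_t size, size_t used)
{
    assert(used <= size);
    memset(stack + (size - used), 0x00, used);
}
```

If real code happens to write 0xa5 at the boundary, the free space is overestimated, so leave a margin rather than trusting the last few bytes.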

So, for the moment, before I set it running again: with the debugger connected and the GPIB thread not getting any time from the scheduler, is there anything I can look at with the debugger (STM32 Cube IDE and JLink Basic) that might help me track this down?

Somewhere while ploughing through a shed load of posts in various places, I think I saw a reference to a FreeRTOS-aware plug-in for Eclipse. Is that really available and, if so, where could I find it, please?

Maybe more importantly: the GPIB task is being starved, the serial thread only runs when it receives a serial byte (which, as things stand, never happens), and the housekeeping thread, which is really the only other thread taking any processing time, relinquishes control by calling vTaskDelay. Does anyone have some tips for tracking down what is going on here, please? Is there somewhere in the scheduler I can set a breakpoint to see why it doesn’t give the GPIB thread any time?

The GPIB thread itself has two xQueueReceive calls, one used when it is transmitting on the GPIB bus and one when it is receiving. Both of them time out, so the GPIB thread should not be getting stuck anywhere. There are no while loops without escape clauses.

I should probably also confirm that the GPIB thread is running at the highest priority.

Many thanks.

One thing to try is to make sure none of your waits use the maximum timeout, but always some moderate period (somewhat long, but not forever), and if that timeout happens, do something noticeable in the code. This will trigger when you aren’t exercising the GPIB, but if you are expecting traffic and the timeout occurs, you can point to the problem being in the ISR that should be sending to the queue.
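A host-side sketch of that idea, assuming the receive call reports false on timeout (xQueueReceive() returns pdFALSE when its block time expires). The struct, the counters and the `traffic_expected` flag are hypothetical application state, not FreeRTOS APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters for a task that always blocks with a finite
   timeout instead of waiting forever. */
typedef struct {
    uint32_t timeouts_while_expected; /* traffic expected, nothing arrived */
    uint32_t idle_timeouts;           /* bus quiet, timeout is normal      */
} wait_stats_t;

/* Call after every receive attempt. got_item is the receive result;
   traffic_expected is application knowledge that the ISR should have
   posted something. Returns true when the timeout is suspicious and should
   be made noticeable (toggle an LED, log it, or hit a breakpoint). */
static bool note_receive_result(wait_stats_t *stats, bool got_item,
                                bool traffic_expected)
{
    assert(stats != NULL);
    if (got_item) {
        return false;
    }
    if (traffic_expected) {
        stats->timeouts_while_expected++;
        return true;
    }
    stats->idle_timeouts++;
    return false;
}
```

Keeping the two counters separate lets a debugger session distinguish "the bus was simply quiet" from "the ISR never delivered".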

If it doesn’t happen, then the code has to be blocking on something else.

Another thing to try is to use the debugger to stop the system when the GPIB should be active, and look at the TCB for the task to see whether it is ready, and at which task is currently running (pxCurrentTCB in tasks.c points at it).


Hi Richard, thanks for the reply. Both my timeouts are about two seconds. I am already doing exactly as you suggest in terms of doing something when it times out.

I’ve found some info on viewing stack information in Eclipse, so am re-running to see if I can get it to fall over with that as an extra debug tool. I’ve moved a debug array out of the task itself and made it a static file-scope variable, although I don’t expect that to be an issue. I’ve also enabled the stack overflow hook and set a breakpoint in there.

I didn’t word my original message very well in terms of the priorities of the threads. I should have said “I can state that the thread with the issue is already set to highest priority.”

I’ve managed to get it to fall over with FreeRTOS-aware debugging enabled in Eclipse, and it looks like the GPIB thread is getting deleted!

The only four tasks now listed are serial, housekeeping, idle and Tmr Svc. The GPIB thread is no longer there. For clarity, I have no calls to vTaskDelete in my code.

So my guess is this is something to do with overwriting some part of memory? What else could cause a task to disappear? I’m pretty sure all my arrays are bounds-checked, and any calls to anything like printf use snprintf, i.e. always the safe versions.

Your thread could just end, perhaps because of a stray break inside the infinite loop.

It’s impossible for the thread to end. The thread entry function’s infinite loop calls another function that does all the GPIB work, and that function also has an infinite loop in it.

I never, ever use break to leave while loops, for loops or anything like that. The only time I use break is in a switch statement, and that’s not going to make it bomb out of the loop. In this case, even if I broke out of the infinite loop that does the work, it would return to the thread entry function’s infinite loop and then go straight back into the main GPIB worker loop.

So I have this:

void gpib_entry(void *argument)
{
  /* USER CODE BEGIN gpib_take */
  /* Infinite loop */
  for(;;)
  {
	  gpib_task_loop();
  }
}

I could be less lazy and stop the Cube .ioc from creating the task, create it myself and not have the call from one loop into another, but I can’t see that being the issue.

I can see the stack size for the GPIB loop is easily big enough: I’m setting a breakpoint for when the watermark gets within 100 bytes of overflowing, and it is never hit, so that’s clearly not the issue either.

I’ve stopped sending out debug data over the UART in case that was the cause, and that makes no difference; I still lose this one thread.

This may also mean that the TCB list has been corrupted, so the task still exists but has fallen off the task list.

EDIT: Can we see your complete code?

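I have solved this. It was a bug in setting the interrupt priorities. The pin interrupts were not getting set to 10 as requested; they were all at 0. On Cortex-M, 0 is the highest priority, above configLIBRARY_MAX_SYSCALL_INTERRUPT_PRIORITY, so those ISRs were not masked by the kernel’s critical sections and their FromISR queue calls could corrupt the kernel’s task lists. It was re-reading, with fresh eyes, someone else’s issue with deleted threads: once I realised my task was getting deleted rather than starved, that made me go and look more closely at interrupt priorities.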

Thanks to everyone for suggestions.

Maybe it would be a good feature for the kernel to check whether any interrupt priorities are set incorrectly and flag it up?

There are checks in recent versions, at least for Cortex-M3/M4.
See vPortValidateInterruptPriority() in port.c (with configASSERT enabled).
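For reference, this is the style of configASSERT() definition suggested in the FreeRTOS documentation for FreeRTOSConfig.h (a Cube-generated config may already contain something similar). With a debugger attached, a failed check leaves the core spinning in the for(;;) loop, and the call stack shows which assertion fired, for example the priority comparison inside vPortValidateInterruptPriority().

```c
/* FreeRTOSConfig.h: halt with interrupts disabled when an assertion fails. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
```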

Thanks, good to know for future reference. This is an M33.

M33 has the same feature (if you enable configASSERT)