A single FreeRTOS task does not start - debugging suggestions?

Hi all,

Unfortunately, I cannot provide a full code example for this - but I still hope I can get some suggestions for debugging.

I am trying out some code on RP2040, using the smp branch of FreeRTOS.

My application had 4 tasks originally - all of them run as expected.

Then I decided to add a 5th task (which for now, just waits on a Queue, and does a printf) - everything compiles fine; and once I flash the RP2040 with this program, all of the previous 4 tasks run - except for the 5th one!

What I have done so far:

This is my startup code for the task:

  BaseType_t ret;
  ret = xTaskCreate(
    my_printer_task,            // The function that implements the task.
    "my printer task",          // Text name for the task, just to help debugging.
    configMINIMAL_STACK_SIZE,   // The size (in words) of the stack that should be created for the task.
    NULL,                       // A parameter that can be passed into the task. Not used here
    configMAX_PRIORITIES-4,     // The priority to assign to the task. tskIDLE_PRIORITY (which is 0) is
                                //   the lowest priority. configMAX_PRIORITIES-1 is the highest priority.
    &my_printer_taskhandle      // Used to obtain a handle to the created task.
  );
  configASSERT( my_printer_taskhandle );
  configASSERT( ret == pdPASS );
  // run my_printer_task on cpu core 0
  vTaskCoreAffinitySet( my_printer_taskhandle, 0b01 );

In my FreeRTOSConfig.h I have:

#define configMINIMAL_STACK_SIZE                ( configSTACK_DEPTH_TYPE ) 256
...
#define configTOTAL_HEAP_SIZE                   (128*1024)
...
#include <assert.h>
/* Define to trap errors during development. */
#define configASSERT(x)                         assert(x)

Here, none of the asserts seem to fail; the program proceeds through this code without stopping, and goes on to run the other four tasks.

Furthermore, in my FreeRTOSConfig.h I have:

#define configCHECK_FOR_STACK_OVERFLOW          2
#define configUSE_MALLOC_FAILED_HOOK            1

… and in my main.c I have:

#if ( configCHECK_FOR_STACK_OVERFLOW > 0 )
void vApplicationStackOverflowHook( TaskHandle_t xTask, char *pcTaskName ) {
  printf("STACK OVERFLOW: pcTaskName %s !!\n", pcTaskName);
  configASSERT(0);
}
#endif

#if ( configUSE_MALLOC_FAILED_HOOK > 0 )
void vApplicationMallocFailedHook( void ) {
  printf("MALLOC FAILED !!\n");
  configASSERT(0);
}
#endif

Neither of these hooks fires, either.

When I want to confirm the other tasks are running, I try this in gdb (working over openocd):

Reading symbols from MY_TEST_PROGRAM.elf...
Remote debugging using localhost:3333
warning: multi-threaded target stopped without sending a thread-id, using first non-exited thread
0x100026dc in xTaskGetCurrentTaskHandle () at C:/path/to/FreeRTOS-Kernel-SMP/tasks.c:4994
4994            xReturn = pxCurrentTCBs[ portGET_CORE_ID() ];

(gdb) b working_01_task
Breakpoint 1 at 0x10000bb4: file C:/path/to/MY_TEST_PROGRAM/working_01.c, line 89.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: C:\path\to\MY_TEST_PROGRAM\build\MY_TEST_PROGRAM.elf
[New Thread 2]
[Switching to Thread 2]

Thread 2 hit Breakpoint 1, working_01_task (pvParameters=0x0) at C:/path/to/MY_TEST_PROGRAM/working_01.c:89
89      void working_01_task( void* pvParameters ) {

(gdb) bt
#0  working_01_task (pvParameters=0x0)
    at C:/path/to/MY_TEST_PROGRAM/working_01.c:89
#1  0x10001278 in isr_pendsv () at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/port.c:402
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

This is basically the entry of the working_01_task - before the while (1) even starts (and before the task is ever signalled to do some work).

If I try to do the same for the abovementioned my_printer_task that does not start:

Reading symbols from rp2040_FROS_DRTM_test.elf...
Remote debugging using localhost:3333
warning: multi-threaded target stopped without sending a thread-id, using first non-exited thread
xTaskGetCurrentTaskHandle () at C:/path/to/FreeRTOS-Kernel-SMP/tasks.c:4997
4997            return xReturn;

(gdb) b my_printer_task
Breakpoint 1 at 0x10000f24: file C:/path/to/MY_TEST_PROGRAM/my_printer_task.c, line 14.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y

Starting program: C:\path\to\MY_TEST_PROGRAM\build\MY_TEST_PROGRAM.elf

… then gdb just keeps hanging there, and never hits the requested breakpoint - confirming that the task does not start at all.

What other options do I have in debugging this?

For instance, I have found this thread, “freeRTOS preemptive scheduling not able to run other tasks | Microchip”, which says:

“I think FreeRTOS maintains a list (array) of tasks ready to run per priority.”

Is there an array which contains a list of tasks, that I could inspect in gdb to further debug this problem - and if so, what is it called?

If not, what else should I try?

Are you posting something to the queue so that this task unblocks? What are the priorities of all the tasks? Also, printf may not be thread safe - can you try removing printf and check by just putting a break point?

Here it is - FreeRTOS-Kernel/tasks.c at main · FreeRTOS/FreeRTOS-Kernel · GitHub

Thanks.

Thanks for the response, @aggarg :

Are you posting something to the queue so that this task unblocks?

Yes I am; I’ve even confirmed in gdb:

Thread 2 hit Breakpoint 1, working_02_task (pvParameters=<optimized out>) at C:/path/to/MY_TEST_PROGRAM/working_02.c:370
370             if (xQueueSend(my_printer_task_queue, (void *)&my_var, (TickType_t) 0 ) != pdTRUE) {
(gdb) n
380           my_var = 0; // data sent to queue, proceed

… that xQueueSend to this queue does indeed return pdTRUE ?!

However, it seems to me that this is not really the problem, because as I’ve mentioned in the OP: when I set breakpoints on the working tasks (by function name), these breakpoints do fire at function entry - before any while(1) in them starts running, and before any signals (via queue etc.) are sent to the tasks to make them do actual work.

In my problem case, though, setting a breakpoint on the non-running my_printer_task does not ever fire.

What are the priorities of all the tasks?

  1. 1 (that is, tskIDLE_PRIORITY+1)
  2. configMAX_PRIORITIES-1
  3. configMAX_PRIORITIES-2
  4. configMAX_PRIORITIES-3
  5. configMAX_PRIORITIES-4 (this is the problematic my_printer_task)

Also, printf may not be thread safe - can you try removing printf and check by just putting a break point?

As I mentioned earlier, I already tried a breakpoint on the my_printer_task (so on function entry), and it never fires.

But, I did also remove the printf, and I placed a breakpoint on the queue waiting line (which in my case is something like while (xQueueReceive(my_printer_queue, (void *)&queuedataitem, 0) == pdTRUE) {, and the printf was inside) - that one never fires either.

Here it is - FreeRTOS-Kernel/tasks.c at main · FreeRTOS/FreeRTOS-Kernel · GitHub

Thanks for that - just tried inspecting it, and boy - it will be hard to find what goes on here:

(gdb) p pxReadyTasksLists
$1 = {{uxNumberOfItems = 2, pxIndex = 0x20001760 <pxReadyTasksLists+8>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20004fb0 <ucHeap+11604>, pxPrevious = 0x20005438 <ucHeap+12764>}}, {uxNumberOfItems = 0,
    pxIndex = 0x20001774 <pxReadyTasksLists+28>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001774 <pxReadyTasksLists+28>, pxPrevious = 0x20001774 <pxReadyTasksLists+28>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001788 <pxReadyTasksLists+48>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001788 <pxReadyTasksLists+48>, pxPrevious = 0x20001788 <pxReadyTasksLists+48>}}, {
    uxNumberOfItems = 0, pxIndex = 0x2000179c <pxReadyTasksLists+68>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x2000179c <pxReadyTasksLists+68>, pxPrevious = 0x2000179c <pxReadyTasksLists+68>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200017b0 <pxReadyTasksLists+88>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200017b0 <pxReadyTasksLists+88>, pxPrevious = 0x200017b0 <pxReadyTasksLists+88>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200017c4 <pxReadyTasksLists+108>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200017c4 <pxReadyTasksLists+108>, pxPrevious = 0x200017c4 <pxReadyTasksLists+108>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200017d8 <pxReadyTasksLists+128>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200017d8 <pxReadyTasksLists+128>, pxPrevious = 0x200017d8 <pxReadyTasksLists+128>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200017ec <pxReadyTasksLists+148>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200017ec <pxReadyTasksLists+148>, pxPrevious = 0x200017ec <pxReadyTasksLists+148>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001800 <pxReadyTasksLists+168>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001800 <pxReadyTasksLists+168>, pxPrevious = 0x20001800 <pxReadyTasksLists+168>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001814 <pxReadyTasksLists+188>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001814 <pxReadyTasksLists+188>, pxPrevious = 0x20001814 <pxReadyTasksLists+188>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001828 <pxReadyTasksLists+208>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001828 <pxReadyTasksLists+208>, pxPrevious = 0x20001828 <pxReadyTasksLists+208>}}, {
    uxNumberOfItems = 0, pxIndex = 0x2000183c <pxReadyTasksLists+228>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x2000183c <pxReadyTasksLists+228>, pxPrevious = 0x2000183c <pxReadyTasksLists+228>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001850 <pxReadyTasksLists+248>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001850 <pxReadyTasksLists+248>, pxPrevious = 0x20001850 <pxReadyTasksLists+248>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001864 <pxReadyTasksLists+268>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001864 <pxReadyTasksLists+268>, pxPrevious = 0x20001864 <pxReadyTasksLists+268>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001878 <pxReadyTasksLists+288>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001878 <pxReadyTasksLists+288>, pxPrevious = 0x20001878 <pxReadyTasksLists+288>}}, {
    uxNumberOfItems = 0, pxIndex = 0x2000188c <pxReadyTasksLists+308>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x2000188c <pxReadyTasksLists+308>, pxPrevious = 0x2000188c <pxReadyTasksLists+308>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200018a0 <pxReadyTasksLists+328>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200018a0 <pxReadyTasksLists+328>, pxPrevious = 0x200018a0 <pxReadyTasksLists+328>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200018b4 <pxReadyTasksLists+348>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200018b4 <pxReadyTasksLists+348>, pxPrevious = 0x200018b4 <pxReadyTasksLists+348>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200018c8 <pxReadyTasksLists+368>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200018c8 <pxReadyTasksLists+368>, pxPrevious = 0x200018c8 <pxReadyTasksLists+368>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200018dc <pxReadyTasksLists+388>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200018dc <pxReadyTasksLists+388>, pxPrevious = 0x200018dc <pxReadyTasksLists+388>}}, {
    uxNumberOfItems = 0, pxIndex = 0x200018f0 <pxReadyTasksLists+408>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200018f0 <pxReadyTasksLists+408>, pxPrevious = 0x200018f0 <pxReadyTasksLists+408>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001904 <pxReadyTasksLists+428>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001904 <pxReadyTasksLists+428>, pxPrevious = 0x20001904 <pxReadyTasksLists+428>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001918 <pxReadyTasksLists+448>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001918 <pxReadyTasksLists+448>, pxPrevious = 0x20001918 <pxReadyTasksLists+448>}}, {
    uxNumberOfItems = 0, pxIndex = 0x2000192c <pxReadyTasksLists+468>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x2000192c <pxReadyTasksLists+468>, pxPrevious = 0x2000192c <pxReadyTasksLists+468>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001940 <pxReadyTasksLists+488>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001940 <pxReadyTasksLists+488>, pxPrevious = 0x20001940 <pxReadyTasksLists+488>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001954 <pxReadyTasksLists+508>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001954 <pxReadyTasksLists+508>, pxPrevious = 0x20001954 <pxReadyTasksLists+508>}}, {
    uxNumberOfItems = 0, pxIndex = 0x20001968 <pxReadyTasksLists+528>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20001968 <pxReadyTasksLists+528>, pxPrevious = 0x20001968 <pxReadyTasksLists+528>}}, {
    uxNumberOfItems = 0, pxIndex = 0x2000197c <pxReadyTasksLists+548>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x2000197c <pxReadyTasksLists+548>, pxPrevious = 0x2000197c <pxReadyTasksLists+548>}}, {
    uxNumberOfItems = 1, pxIndex = 0x20001990 <pxReadyTasksLists+568>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200039a8 <ucHeap+5964>, pxPrevious = 0x200039a8 <ucHeap+5964>}}, {uxNumberOfItems = 0,
    pxIndex = 0x200019a4 <pxReadyTasksLists+588>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x200019a4 <pxReadyTasksLists+588>, pxPrevious = 0x200019a4 <pxReadyTasksLists+588>}}, {
    uxNumberOfItems = 1, pxIndex = 0x200019b8 <pxReadyTasksLists+608>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20003098 <ucHeap+3644>, pxPrevious = 0x20003098 <ucHeap+3644>}}, {uxNumberOfItems = 1,
    pxIndex = 0x200019cc <pxReadyTasksLists+628>, xListEnd = {xItemValue = 4294967295,
      pxNext = 0x20002c10 <ucHeap+2484>, pxPrevious = 0x20002c10 <ucHeap+2484>}}}

Although, I can see this array is defined as pxReadyTasksLists[ configMAX_PRIORITIES ]; - so I guess each “slot” in this array represents a priority, and then uxNumberOfItems says how many tasks are there for a given priority.

If I try to reformat this array:

{
"00" {uxNumberOfItems = 2, pxIndex = 0x20001760 <pxReadyTasksLists+8>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20004fb0 <ucHeap+11604>, pxPrevious = 0x20005438 <ucHeap+12764>}},
"01" {uxNumberOfItems = 0, pxIndex = 0x20001774 <pxReadyTasksLists+28>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001774 <pxReadyTasksLists+28>, pxPrevious = 0x20001774 <pxReadyTasksLists+28>}},
"02" { uxNumberOfItems = 0, pxIndex = 0x20001788 <pxReadyTasksLists+48>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001788 <pxReadyTasksLists+48>, pxPrevious = 0x20001788 <pxReadyTasksLists+48>}},
"03" { uxNumberOfItems = 0, pxIndex = 0x2000179c <pxReadyTasksLists+68>, xListEnd = {xItemValue = 4294967295, pxNext = 0x2000179c <pxReadyTasksLists+68>, pxPrevious = 0x2000179c <pxReadyTasksLists+68>}},
"04" { uxNumberOfItems = 0, pxIndex = 0x200017b0 <pxReadyTasksLists+88>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200017b0 <pxReadyTasksLists+88>, pxPrevious = 0x200017b0 <pxReadyTasksLists+88>}},
"05" { uxNumberOfItems = 0, pxIndex = 0x200017c4 <pxReadyTasksLists+108>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200017c4 <pxReadyTasksLists+108>, pxPrevious = 0x200017c4 <pxReadyTasksLists+108>}},
"06" { uxNumberOfItems = 0, pxIndex = 0x200017d8 <pxReadyTasksLists+128>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200017d8 <pxReadyTasksLists+128>, pxPrevious = 0x200017d8 <pxReadyTasksLists+128>}},
"07" { uxNumberOfItems = 0, pxIndex = 0x200017ec <pxReadyTasksLists+148>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200017ec <pxReadyTasksLists+148>, pxPrevious = 0x200017ec <pxReadyTasksLists+148>}},
"08" { uxNumberOfItems = 0, pxIndex = 0x20001800 <pxReadyTasksLists+168>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001800 <pxReadyTasksLists+168>, pxPrevious = 0x20001800 <pxReadyTasksLists+168>}},
"09" { uxNumberOfItems = 0, pxIndex = 0x20001814 <pxReadyTasksLists+188>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001814 <pxReadyTasksLists+188>, pxPrevious = 0x20001814 <pxReadyTasksLists+188>}},
"10" { uxNumberOfItems = 0, pxIndex = 0x20001828 <pxReadyTasksLists+208>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001828 <pxReadyTasksLists+208>, pxPrevious = 0x20001828 <pxReadyTasksLists+208>}},
"11" { uxNumberOfItems = 0, pxIndex = 0x2000183c <pxReadyTasksLists+228>, xListEnd = {xItemValue = 4294967295, pxNext = 0x2000183c <pxReadyTasksLists+228>, pxPrevious = 0x2000183c <pxReadyTasksLists+228>}},
"12" { uxNumberOfItems = 0, pxIndex = 0x20001850 <pxReadyTasksLists+248>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001850 <pxReadyTasksLists+248>, pxPrevious = 0x20001850 <pxReadyTasksLists+248>}},
"13" { uxNumberOfItems = 0, pxIndex = 0x20001864 <pxReadyTasksLists+268>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001864 <pxReadyTasksLists+268>, pxPrevious = 0x20001864 <pxReadyTasksLists+268>}},
"14" { uxNumberOfItems = 0, pxIndex = 0x20001878 <pxReadyTasksLists+288>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001878 <pxReadyTasksLists+288>, pxPrevious = 0x20001878 <pxReadyTasksLists+288>}},
"15" { uxNumberOfItems = 0, pxIndex = 0x2000188c <pxReadyTasksLists+308>, xListEnd = {xItemValue = 4294967295, pxNext = 0x2000188c <pxReadyTasksLists+308>, pxPrevious = 0x2000188c <pxReadyTasksLists+308>}},
"16" { uxNumberOfItems = 0, pxIndex = 0x200018a0 <pxReadyTasksLists+328>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200018a0 <pxReadyTasksLists+328>, pxPrevious = 0x200018a0 <pxReadyTasksLists+328>}},
"17" { uxNumberOfItems = 0, pxIndex = 0x200018b4 <pxReadyTasksLists+348>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200018b4 <pxReadyTasksLists+348>, pxPrevious = 0x200018b4 <pxReadyTasksLists+348>}},
"18" { uxNumberOfItems = 0, pxIndex = 0x200018c8 <pxReadyTasksLists+368>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200018c8 <pxReadyTasksLists+368>, pxPrevious = 0x200018c8 <pxReadyTasksLists+368>}},
"19" { uxNumberOfItems = 0, pxIndex = 0x200018dc <pxReadyTasksLists+388>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200018dc <pxReadyTasksLists+388>, pxPrevious = 0x200018dc <pxReadyTasksLists+388>}},
"20" { uxNumberOfItems = 0, pxIndex = 0x200018f0 <pxReadyTasksLists+408>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200018f0 <pxReadyTasksLists+408>, pxPrevious = 0x200018f0 <pxReadyTasksLists+408>}},
"21" { uxNumberOfItems = 0, pxIndex = 0x20001904 <pxReadyTasksLists+428>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001904 <pxReadyTasksLists+428>, pxPrevious = 0x20001904 <pxReadyTasksLists+428>}},
"22" { uxNumberOfItems = 0, pxIndex = 0x20001918 <pxReadyTasksLists+448>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001918 <pxReadyTasksLists+448>, pxPrevious = 0x20001918 <pxReadyTasksLists+448>}},
"23" { uxNumberOfItems = 0, pxIndex = 0x2000192c <pxReadyTasksLists+468>, xListEnd = {xItemValue = 4294967295, pxNext = 0x2000192c <pxReadyTasksLists+468>, pxPrevious = 0x2000192c <pxReadyTasksLists+468>}},
"24" { uxNumberOfItems = 0, pxIndex = 0x20001940 <pxReadyTasksLists+488>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001940 <pxReadyTasksLists+488>, pxPrevious = 0x20001940 <pxReadyTasksLists+488>}},
"25" { uxNumberOfItems = 0, pxIndex = 0x20001954 <pxReadyTasksLists+508>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001954 <pxReadyTasksLists+508>, pxPrevious = 0x20001954 <pxReadyTasksLists+508>}},
"26" { uxNumberOfItems = 0, pxIndex = 0x20001968 <pxReadyTasksLists+528>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20001968 <pxReadyTasksLists+528>, pxPrevious = 0x20001968 <pxReadyTasksLists+528>}},
"27" { uxNumberOfItems = 0, pxIndex = 0x2000197c <pxReadyTasksLists+548>, xListEnd = {xItemValue = 4294967295, pxNext = 0x2000197c <pxReadyTasksLists+548>, pxPrevious = 0x2000197c <pxReadyTasksLists+548>}},
"28" { uxNumberOfItems = 1, pxIndex = 0x20001990 <pxReadyTasksLists+568>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200039a8 <ucHeap+5964>, pxPrevious = 0x200039a8 <ucHeap+5964>}},
"29" {uxNumberOfItems = 0, pxIndex = 0x200019a4 <pxReadyTasksLists+588>, xListEnd = {xItemValue = 4294967295, pxNext = 0x200019a4 <pxReadyTasksLists+588>, pxPrevious = 0x200019a4 <pxReadyTasksLists+588>}},
"30" { uxNumberOfItems = 1, pxIndex = 0x200019b8 <pxReadyTasksLists+608>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20003098 <ucHeap+3644>, pxPrevious = 0x20003098 <ucHeap+3644>}},
"31" {uxNumberOfItems = 1, pxIndex = 0x200019cc <pxReadyTasksLists+628>, xListEnd = {xItemValue = 4294967295, pxNext = 0x20002c10 <ucHeap+2484>, pxPrevious = 0x20002c10 <ucHeap+2484>}}
} 

… I can see there are

  • two tasks with priority 0 (IDLE); then
  • one task with priority 31 (configMAX_PRIORITIES-1),
  • one task with priority 30 (configMAX_PRIORITIES-2),
  • NO tasks with priority 29 (configMAX_PRIORITIES-3) !!!,
  • one task with priority 28 (configMAX_PRIORITIES-4) (this should have been the problematic my_printer_task)

Strangely, the priority 29 task is actually what I’ve called “working_02_task” above - which I’ve already confirmed runs via the breakpoint (and it seemingly even pushes data to the queue without a problem). So I’m kinda puzzled by the fact that pxReadyTasksLists has no tasks with this priority (maybe because it is not “ready”?) - unless uxNumberOfItems in this array changes dynamically depending on “ready” status …

Thanks for the assistance so far - and if there are any more suggestions, I’d love to hear them!

Is it possible that other high priority tasks are never relinquishing the CPU and, as a result, this task never gets a chance to run? Can you confirm by increasing the priority of this task?

Many thanks @aggarg - this was a step in the right direction:

Is it possible that other high priority tasks are never relinquishing the CPU and, as a result, this task never gets a chance to run? Can you confirm by increasing the priority of this task?

Thanks for this - yes, it is possible that other high priority tasks never relinquish the CPU since indeed, I did not think about this at all. I’m just starting with FreeRTOS, so I’m yet to adjust my thinking to “task priorities”.

Anyways, I tried changing the priorities of the problematic my_printer_task first to configMAX_PRIORITIES-3, then to configMAX_PRIORITIES-2, and there was no change (same behavior as earlier - task does not start).

However, then I tried to experiment, and changed vTaskCoreAffinitySet( my_printer_taskhandle, 0b01 ); which would run the task on CPU core 0 - to vTaskCoreAffinitySet( my_printer_taskhandle, 0b10 );, which would run the task on CPU core 1.

And - shiver me timbers! - I can now, via breakpoint on function name (entry), confirm that this task is started up in gdb (regardless of priority)!

Reading symbols from MY_TEST_PROGRAM.elf...
Remote debugging using localhost:3333
warning: multi-threaded target stopped without sending a thread-id, using first non-exited thread
0x100031d0 in vPortRecursiveLock (uxAcquire=1, pxSpinLock=0xd000013c, ulLockNum=1)
    at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192
192                     while ( __builtin_expect( !*pxSpinLock, 0 ) );

(gdb) b my_printer_task
Breakpoint 1 at 0x10000f30: file C:/path/to/MY_TEST_PROGRAM/my_printer_task.c, line 14.
Note: automatically using hardware breakpoints for read-only addresses.
(gdb) r
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: C:\path\to\MY_TEST_PROGRAM\build\MY_TEST_PROGRAM.elf

[New Thread 2]
[Switching to Thread 2]

Thread 2 hit Breakpoint 1, my_printer_task (pvParameters=0x0) at C:/path/to/MY_TEST_PROGRAM/my_printer_task.c:14
14      void my_printer_task( void* pvParameters ) {
(gdb) c
Continuing.

However, the problem now is that (one of) my other tasks that wait on other queues end up spinning in xQueueReceive permanently and never relinquish the CPU, so the application as a whole does not run :(

At least, now I know I should re-think how the CPU core allocation is done.
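
The second argument of vTaskCoreAffinitySet() used above is a bitmask, where bit n set means the task may run on core n (0b01 is core 0, 0b10 is core 1). A minimal sketch in plain C of building and testing such masks from core numbers; the helper names core_bit and may_run_on are made up for illustration and are not part of FreeRTOS:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not FreeRTOS API: bit n of the mask stands for core n. */
uint32_t core_bit(unsigned core)
{
    assert(core < 32u);          /* a 32-bit mask can describe at most 32 cores */
    return (uint32_t)1u << core;
}

/* True if a task with this affinity mask is allowed to run on `core`. */
bool may_run_on(uint32_t affinity_mask, unsigned core)
{
    return (affinity_mask & core_bit(core)) != 0u;
}
```

Here core_bit(0) has the same value as 0b01 and core_bit(1) the same as 0b10; core_bit(0) | core_bit(1) would describe a task that is allowed on either core.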

Btw, this comment made me think: so far, I had always assumed, that as long as there are any tasks (even low priority ones), eventually the scheduler would allow them to run, even without a specific yield or task notification.

But now, I’ve realized that I have configUSE_PREEMPTION 1 in my config, and the taskYIELD documentation (Task Control Functions and Macros for the Free Open Source RTOS FreeRTOS) mentions (emphasis mine):

taskYIELD() is used to request a context switch to another task. However, if there are no other tasks at a higher or equal priority to the task that calls taskYIELD() then the RTOS scheduler will simply select the task that called taskYIELD() to run again.

If configUSE_PREEMPTION is set to 1 then the RTOS scheduler will always be running the highest priority task that is able to run, so calling taskYIELD() will never result in a switch to a higher priority task.

So, even if I thought of using taskYIELD() to “force” my_printer_task to run, it would never have worked in the OP context, as my_printer_task has lower priority than the tasks where I’d have considered calling yield from.

(then again, I also have a task on cpu core 1, with tskIDLE_PRIORITY+1 priority, and that one always ran fine, even if it had very low priority - and no specific yield/signalling either).

So to summarize: if I instantiate this problematic my_printer_task with affinity for cpu core 0, and regardless of priority, it never gets started at all; if I change the affinity to cpu core 1, then it does get started, but all other tasks get into a mess, and program as a whole stops running.

I find this slightly surprising, as I would have expected the scheduler to always start tasks regardless (that is, the function representing the task is called at least once), and then suspend the tasks (as per priority and other factors) if they are not supposed to run at a given time.

However, it seems that - either at compile time or at runtime - the scheduler does some heuristics, and if it determines “something” about a task, then it does not even start the task at all (as in, the function representing the task is never called/run) – is that correct?

FreeRTOS doesn’t analyze your tasks to change its behavior. It uses the basic rule that when it is time to run the scheduler, it chooses the highest priority ready task, and in the case of ties, the first one on the list of that priority, which will be the one that has been waiting the longest.

Yield doesn’t make a task “unready”, it just runs the scheduler, so it will NEVER switch to a lower priority task, as the yielding task is still ready.

Generally ALL tasks must “block” on something and make themselves “unready” (be it for a time or on a queue or semaphore). The ONLY exception should be tasks at priority 0 (Idle), which can run continuously and, if you have time slicing (or they occasionally yield), will switch among themselves to use up the remaining time.
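
To make the rule above concrete, here is a toy model in plain C. It is not kernel code: toy_task_t and toy_pick are made-up names, and the affinity mask is included only because this thread pins tasks with vTaskCoreAffinitySet(). It picks the highest-priority ready task that is allowed on a given core, which is a simplification of what an SMP scheduler really does:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only; all names are hypothetical and not part of FreeRTOS. */
typedef struct {
    const char *name;
    unsigned    priority;      /* larger number = higher priority */
    uint32_t    affinity_mask; /* bit n set = may run on core n */
    bool        ready;         /* false while blocked on a queue, delay, ... */
} toy_task_t;

/* Return the highest-priority ready task allowed on `core`, or NULL if none.
 * On a tie the earliest array entry wins, standing in for "first on the list". */
const toy_task_t *toy_pick(const toy_task_t *tasks, size_t n, unsigned core)
{
    const toy_task_t *best = NULL;

    assert(core < 32u);
    for (size_t i = 0; i < n; i++) {
        const toy_task_t *t = &tasks[i];

        if (!t->ready || (t->affinity_mask & ((uint32_t)1u << core)) == 0u) {
            continue; /* blocked, or not allowed on this core */
        }
        if (best == NULL || t->priority > best->priority) {
            best = t;
        }
    }
    return best;
}
```

In this model, a priority-29 task that stays ready on core 0 is picked every time, so a priority-28 task pinned to the same core is never chosen until the first one blocks (ready becomes false). A yield would not change the outcome, because the yielding task remains ready.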

As Richard explained, all the scheduler does is, it picks the highest priority ready task.

This indicates that the actual problem is elsewhere. It is hard to make a guess without looking at the code - can you remove all your proprietary code and provide a minimal project which just uses FreeRTOS constructs and demonstrates the problem?

Thanks.

Many thanks for the replies, all - much appreciated!

Thanks for these remarks, good to have this clarified.

I guess one part of my confusion is when “highest priority” gets mentioned, then I always think “but hey, I’ve got a low priority task here, how comes it runs?!” - then I have to remember that a condition is possible where all the higher-priority tasks are actually blocking on something, and the only “highest priority” task which is ready is the low priority task (or the idle one).

From my limited reading so far, I also managed to build more-less the same mental model for tasks, however I was not sure how correct it is - thanks for confirming this.

I will keep this in mind as well.

I tried to - and it turns out, I cannot really get it to run, even with what I thought was the original problem (my_printer_task) removed.

Ok, so here is the code:

Basically, what I’m trying to show is:

  • There is a LED task, which simply blinks the LED, all day long … but it also starts the ISR
  • There is some ISR running, generating data - here simulated with a timer
  • This ISR writes to queue_01, and thereby wakes working_01_task, which stores the data from the ISR in different buffers, then depending on some condition, wants to wake either working_02_task to handle one of the buffers, or working_03_task to handle the other. This signalling I tried to do via task notifications
  • To give them some busywork, working_02_task and working_03_task here should merely write out string reports based on their respective buffers when triggered
    • When done, as a final step, I wanted these tasks to signal a final task, my_printer_task - here by writing a “task id” number to another queue, queue_02 - which would then printf() the corresponding string report (depending on which task id was read from the queue). This part is, however, undef’d (compiled out) in the posted code.

The code has some debug pin instrumentation, and here is a capture:

So if one looks from top:

  • led_task starts first, and indeed manages to start timer_alarm_isr
  • timer_alarm_isr does indeed manage to (seemingly) trigger the worker_01_task
  • however, worker_02_task starts pulsing randomly waaaaaaay before led_task ever started!
  • Something also managed to trigger worker_03_task (but I’ve taken several of these captures, and it does not always happen)

However, it is clear, that after a couple of runs of the timer isr, the whole program crashes. This happens:

  • Regardless if a printf is fully undef’d (there are more pulsings and everything happens more quickly, but we still run into a crash)
  • Regardless if I use indexed or non-indexed task notifications

Somehow, it looks to me that task notification fails, since worker_02_task simply goes wild, as if it does not block waiting on anything - whereas worker_03_task might react, but usually the crash is not soon after.

(Strange, because I have more or less the same setup in my proprietary code too, which does a lot more with printfs all over the place, and it works?!)

Would anyone have any comments on these results?

I think the non-blocking xQueueReceive (timeout set to 0) in working_01_task causes some confusion, because the code following it then notifies the other tasks.
As far as I can see there is basically no synchronization, just wild triggering/notifying of the other tasks.
Just a few additional minor hints.
You should initialize BaseType_t xHigherPriorityTaskWoken = pdFALSE;
I’d recommend enabling configUSE_PORT_OPTIMISED_TASK_SELECTION if there are not more than 32 priorities. It’s much faster than the non-optimized code.

Yes, you should almost NEVER use a 0 timeout for an operation unless you really mean “I want to test if there is something, and if not I have something else to do, and will be blocking on something else later.”

Hi all,

Thanks again for all the comments - they helped me clarify things quite a bit.

I’ve also been forced to revise what it is I was expecting from the code in post /8: essentially, I was hoping to achieve a “cascade” of tasks: the ISR triggers working_01_task as soon as possible; working_01_task triggers working_02_task or working_03_task, again ASAP. Obviously, this is not what happens in post /8.

So, I’ve done some three experiments, which I’ve added as revisions to the gist in post /8. And just to make sure, the correct gist link for post /8 is now:

https://gist.github.com/sdbbs/652d4abeda027999be9245db0035c78f/d7b41f932e54fd3c5f51230a3ea243c8388b0f3e

I will try to provide a discussion of my experiments below, and while this thread is already getting kinda heavy - I really hope I can get some feedback, especially about the stuff I still might have misunderstood. On the other hand, all of my experiments still end up in some sort of deadlock after some 100 ms, and I would very much appreciate some hints related to that.

Thanks a ton; in my exercises so far, I was careful to copy-paste it in that form, so I cannot tell why I failed to do so in that example. In any case, I corrected this.

Many thanks for noticing this! I did not think much about this, possibly because I might have encountered other (unrelated) code, where timeout==0 meant “block indefinitely”, and due to this mixup, failed to pay enough attention.


So, first, I gave xQueueReceive a timeout of portMAX_DELAY … and nothing worked (I believe I’d just get a single transition for led_task, and that’s it).

Then, I gave xQueueReceive a timeout of 1 (1 FreeRTOS tick, which in my case should mean 1 ms). Now things sort of started working, as I could see pulses for the tasks before the program reached the “deadlock”.

However, what I found strange was that the pulses for working_01_task were mostly high, which I really did not expect. So I ended up a bit confused myself - until I decided to repurpose the GP5 pin to toggle after each and every command in working_01_task.

This is now in the gist revision:

https://gist.github.com/sdbbs/652d4abeda027999be9245db0035c78f/93dea52c40d141874f303d2ac0f58190ac1dd253

And this is what the pulses look like at start:

So basically, this is the trouble: I wanted working_01_task to react immediately to data from the ISR; but I also wanted to handle a priori the situation where working_01_task might have been prevented from handling the ISR data in time, so that there is more than one item in the queue - which is why I wanted to “flush” the queue in working_01_task.

Now, the first logical problem is that in this program, there is nothing else that could pre-empt working_01_task and keep it from reacting: working_01_task has the highest priority!

But leaving that logical problem aside, the immediate technical problem is that I tried to “flush the queue” as I had historically done in non-FreeRTOS code when “flushing a buffer”: read bytes in a while loop until the buffer is empty; and basically I thought this while loop would do it:

    while (xQueueReceive(queue_01, (void *)&queuedataitem, 1) == pdTRUE) {
    ...

This however, does not flush the queue per se in this case; what I think happens now is:

  • We end up at the while (xQueueReceive...) of working_01_task - it starts blocking due to timeout
  • Since one ISR has already completed before the first entry into while (xQueueReceive...), xQueueReceive returns having read the byte (note the first “thick” edge of “GP5 / (working_01_task)”)
  • Rest of the code inside the while (xQueueReceive...) runs, and the loop ends
  • We go back to the while (xQueueReceive...) blocking;
    • Now, at first I would have expected this to return pdFALSE, as no ISR hit in the meantime, and the queue should be empty
    • However, since now our xQueueReceive has a timeout, it keeps blocking
    • And since this is already the highest priority task, there is no other higher-or-equal priority task to yield to, while this xQueueReceive is blocking!
  • Eventually a new ISR hits, and the queue is populated again
  • xQueueReceive returns from blocking with pdTRUE, and the new item on the queue is handled
  • … and eventually, again we go back to the while (xQueueReceive...) blocking due to timeout!

So, looking at this interpretation, my first impression would be that this code, too, can in principle run indefinitely.

So I am kinda puzzled why at a certain point, the xQueueReceive returns pdFALSE at all?

The image suggests that working_01_task holds high for approx 1 ms (I’ve also seen longer durations, that are sort of around 2 ms) - basically a multiple of the tick.

So, even if I would at first assume, that each time we hit xQueueReceive, the 1 tick/1 ms timeout runs from start - it seems as if it starts only upon first call? And then, as long as we get pdTRUE, we don’t really re-start the timeout?!
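One possible reading of this observation, sketched as a small host-side model rather than FreeRTOS code: block times are counted in whole ticks, so a timeout of 1 would expire at the next tick interrupt, not 1 ms after the call. Under that assumption, the effective wait depends on where in the tick period xQueueReceive is entered, and a call entered shortly before a tick can time out with pdFALSE even though the ISR keeps firing. The names timeout_expiry_us, effective_wait_us and TICK_PERIOD_US are hypothetical, and the 1 kHz tick is taken from the "1 tick = 1 ms" statement above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model, NOT FreeRTOS code. Times are in microseconds. */
#define TICK_PERIOD_US 1000u /* assumption: 1 kHz tick, i.e. 1 tick = 1 ms */

/* Absolute time at which a block of timeout_ticks, entered at now_us, expires.
 * Assumption: the timeout expires on the tick interrupt that advances the
 * tick count to (current tick + timeout_ticks). */
uint32_t timeout_expiry_us(uint32_t now_us, uint32_t timeout_ticks)
{
    assert(timeout_ticks > 0u); /* a timeout of 0 never blocks at all */
    uint32_t current_tick = now_us / TICK_PERIOD_US;
    return (current_tick + timeout_ticks) * TICK_PERIOD_US;
}

/* How long the caller can actually stay blocked before the timeout expires. */
uint32_t effective_wait_us(uint32_t now_us, uint32_t timeout_ticks)
{
    return timeout_expiry_us(now_us, timeout_ticks) - now_us;
}
```

With these assumptions, a call entered 950 µs into a tick period has only 50 µs left before it times out, which may be shorter than the gap between two ISR hits. That would fit working_01_task staying high for roughly a multiple of the tick before the loop ends.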

In any case, as this does not really look like a proper “flush”, or a proper “cascade” of tasks, I decided to revise.


So, in revision https://gist.github.com/sdbbs/652d4abeda027999be9245db0035c78f/384f208147c6ecf54a232469c8bebd91cbd667a0, in working_01_task:

  • At start, the amount of items in queue is found
  • A while loop reads exactly that number of items, using xQueueReceive with timeout 0, so it doesn’t block
  • Items are handled, other tasks signaled
  • Then working_01_task should “yield” to other tasks
    • however, since working_01_task is the only task with highest priority, calling taskYIELD will be useless
    • So the next best thing is to have the task “sleep” for a tick with vTaskDelay((TickType_t) 1);
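The "count first, then read exactly that many" part of these steps can be sketched with a host-side stand-in for the queue. This is not the code from the gist: mini_queue_t, mq_send, mq_receive_nowait and drain_snapshot are hypothetical names, and in the real task the corresponding calls would presumably be uxQueueMessagesWaiting and xQueueReceive with a timeout of 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MQ_CAPACITY 8u /* hypothetical queue length */

/* Hypothetical host-side stand-in for a FreeRTOS queue of bytes. */
typedef struct {
    uint8_t items[MQ_CAPACITY];
    size_t head;  /* index of the oldest item */
    size_t count; /* number of items currently stored */
} mini_queue_t;

/* Stands in for a send from the ISR: returns false when the queue is full. */
bool mq_send(mini_queue_t *q, uint8_t item)
{
    assert(q != NULL);
    if (q->count == MQ_CAPACITY) {
        return false;
    }
    q->items[(q->head + q->count) % MQ_CAPACITY] = item;
    q->count++;
    return true;
}

/* Stands in for a receive with timeout 0: never blocks, false when empty. */
bool mq_receive_nowait(mini_queue_t *q, uint8_t *out)
{
    assert(q != NULL && out != NULL);
    if (q->count == 0u) {
        return false;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1u) % MQ_CAPACITY;
    q->count--;
    return true;
}

/* Snapshot the fill level once, then read exactly that many items.
 * Returns the number of items handled and adds their values to *sum. */
size_t drain_snapshot(mini_queue_t *q, uint32_t *sum)
{
    assert(q != NULL && sum != NULL);
    size_t pending = q->count; /* snapshot taken before the loop starts */
    size_t handled = 0u;
    uint8_t item;
    while (handled < pending && mq_receive_nowait(q, &item)) {
        *sum += item;
        handled++;
    }
    return handled;
}
```

Because the count is taken once before the loop, items that arrive while the loop runs are left for the next pass, so each pass is bounded.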

This is how the code pulses at start:

So, now indeed I have achieved a “cascade”:

  • Either ISR → working_01_task → working_02_task ASAP;
  • Or ISR → working_01_task → working_03_task ASAP

However, now working_01_task also explicitly obeys the delay, and indeed it only runs once per tick (1 ms), allowing the ISR to fill the queue with approx. 3-4 items.

This is maybe not bad in itself - but I was still wondering, whether I could achieve “cascade” of tasks one after another ASAP, while still having working_01_task react immediately after every ISR.

Before I get to that, I should mention that this code also ends up in a “deadlock” - but here, when the deadlock happens, basically the ISR stops, while working_01_task keeps running indefinitely (except since there is no ISR anymore, there is no data in queue either, and so the other tasks never get called).


So, to make working_01_task answer ASAP, I first thought about “cancelling” the vTaskDelay; there is xTaskAbortDelay – however, I’d have to call it from ISR, and there is no “*FromISR” version of this function.

So, I thought, the next best thing would be to:

  • Have working_01_task “yield” by suspending itself via vTaskSuspend
  • Have the ISR “wake up” the working_01_task via xTaskResumeFromISR

This is in the gist revision: https://gist.github.com/sdbbs/652d4abeda027999be9245db0035c78f/ee51b8b64e107542dec91b1b68de21062205c565

This is how the pulses behave at start:

Finally, I do have an immediate response to the ISR from working_01_task, and a “cascade” to the other task(s) (they run ASAP after working_01_task).

Except - and this I didn’t take into account at first - now that working_01_task can explicitly handle each ISR, the number of processed queue items in this task (reccount) is always 1, and therefore only working_03_task gets called.

And in this sense, my original expectation that I could achieve both ASAP reaction of working_01_task to the ISR, and “cascade”/ASAP reaction of both working_02_task and working_03_task alternately, was not thought through very well.

But at least, I think I’m better aware of the pitfalls in how FreeRTOS tasks are scheduled and when they run, so I can re-think this better.

However, now we come to the actual problem:


All three of these variants start up as shown in the respective screenshots - but eventually, after some time, there is a brief period where apparently all interrupts and tasks stop running; after that, there are a couple more runs of the ISR, and then the ISR stops. Depending on which variant it is, this also means that either all tasks stop running or (in one of the examples, as mentioned) working_01_task keeps running indefinitely.

Here is how this looks for the final variant:

So, code runs for about 90 ms, then for some reason, ISR and tasks stop for around 1.4 ms, then we have two more hits of the ISR, and then ISR stops - and in this case, since the ISR “wakes” working_01_task (using xTaskResumeFromISR), no other tasks are running either.

I have no idea why this happens; the backtrace I get from gdb/openocd here is:

Remote debugging using localhost:3333
warning: multi-threaded target stopped without sending a thread-id, using first non-exited thread
vPortRecursiveLock (uxAcquire=1, pxSpinLock=0xd000013c, ulLockNum=1)
    at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192
192                     while ( __builtin_expect( !*pxSpinLock, 0 ) );
(gdb) bt
#0  vPortRecursiveLock (uxAcquire=1, pxSpinLock=0xd000013c, ulLockNum=1)
    at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192
#1  vTaskSwitchContext (xCoreID=0) at C:/path/to/FreeRTOS-Kernel-SMP/tasks.c:3880
#2  0x10000816 in isr_pendsv () at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/port.c:402
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.

I’m not sure how accurate this is, but it seems FreeRTOS ends up in a state waiting for a spinlock …

So, to summarize my questions for this:

  • Can anyone see any obvious errors in my understanding of the above three examples, and if so, help me get to the correct understanding?
  • Does anyone have an explanation of why the code ends up in a “deadlock”/waiting for a spinlock after some milliseconds of running, and a suggestion on how I can prevent it?

Flushing a queue is one of the valid cases for using a 0 timeout: if something is in the queue you get it, and you can get another, until you get a 0 (the receive fails), and then the queue is empty. But if you never want more than one item in the queue, just make the queue length 1, and you can’t overfill it. (You can then use xQueueOverwrite[FromISR] to put the item on it, so that it always holds the most recent one.)
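As a rough illustration of the length-1 suggestion, here is a host-side model of the "mailbox" behaviour, not FreeRTOS code. mailbox_t, mailbox_overwrite and mailbox_receive_nowait are hypothetical names; on the target, the equivalent would presumably be a queue created with length 1, written with xQueueOverwriteFromISR and read with xQueueReceive.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-side model of a length-1 queue used as a mailbox. */
typedef struct {
    uint8_t value; /* the single stored item */
    bool full;     /* true while an unread item is present */
} mailbox_t;

/* Models an overwrite: always succeeds and replaces any unread item,
 * so the mailbox always holds the most recent value. */
void mailbox_overwrite(mailbox_t *m, uint8_t value)
{
    assert(m != NULL);
    m->value = value;
    m->full = true;
}

/* Models a receive with timeout 0: returns false when nothing is pending. */
bool mailbox_receive_nowait(mailbox_t *m, uint8_t *out)
{
    assert(m != NULL && out != NULL);
    if (!m->full) {
        return false;
    }
    *out = m->value;
    m->full = false;
    return true;
}
```

In this model the reader never sees a backlog: if the writer runs three times before the reader does, the reader gets only the last value and the next receive reports empty. That is the trade-off of this design, since older items are dropped instead of queued.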

As for the deadlock, I am having a bit of trouble following all you are doing, but using a spinlock in an ISR is a bit dangerous, as it effectively reduces the number of CPUs available to resolve the spinlock by one (that CPU will never switch to a task to resolve the issue). So if all CPUs get into such a spinlock, you are deadlocked. That means that in the SMP version of FreeRTOS, using things based on spinlocks in an ISR is not really safe unless care is taken to avoid that problem.


Hi all,

Many thanks for the response, @richard-damon - it seems I finally got further, so I’d like to document this.

Indeed - so let me try to summarize:

First post (OP) was me asking a generic question, which opened up and clarified some misunderstandings that I’ve had - but there was no definite answer on the “single task does not start” problem.

So then, I came up with a “minimal example” that tried to emulate the original problem, which is in the gist; it has a timer ISR that generates random data; a led_task which starts the timer ISR, but is otherwise independent; the timer ISR pushes data to working_01_task via a queue; and depending on the data, working_01_task wakes either working_02_task or working_03_task. To demonstrate my proprietary problem, I would have needed one more task - but the problem was that even without it, the described setup would start, and after some milliseconds, the program would stop running (as visible from the “oscilloscope” capture of the GPIO pins).

So, what is “solved” now is this gist example; as deduced in “RP2040, FreeRTOS SMP and Timer ISR”, it turns out that the problem was not in the code itself, but in the way the code is restarted via “reset” of the MCU: what works is that either you connect with gdb over openocd to the MCU, and issue gdb run to restart the code - or, if using just openocd to restart the MCU, one must make sure that core 1 is started before core 0:

/c/path/to/openocd/src/openocd.exe -s /c/path/to/openocd/tcl -f interface/picoprobe.cfg -f target/rp2040.cfg -c "init ; reset halt ; rp2040.core1 arp_reset assert 0 ; rp2040.core0 arp_reset assert 0 ; exit"

I don’t understand why core 1 needs to be started first, but it seems to work; for instance, here is the “oscilloscope” trace of the GPIO pins of the program from post #11, for the last gist revision ( ee51b8 ):

Note the timing axis - this screenshot means that the tasks started running continuously, and there is no “stop” after some 90 ms; which from my perspective, means that the code is working.

Again, there was no change in the code from gist rev. ee51b8 to obtain working behavior - I just used the one-liner openocd command for reset that starts core 1 before core 0 (instead of the classic init ; reset ; exit).

So, I will mark this post as a solution, but it is a solution to post #11; the OP cannot really be answered, as I’ve had too many misunderstandings when I was writing that question. If I ever come across a similar problem, I’ll make a new example, and post a new question.

At first, my initial response to this was “but, I am not using a spinlock in an ISR anywhere?! Backtrace shows it stems from isr_pendsv → vTaskSwitchContext → vPortRecursiveLock”; but upon later reading, I did notice “in the SMP version of FreeRTOS”…

In any case, it seems that was not the problem with the deadlock - again, it seems that one simply needs to ensure that core 1 is started before core 0 when restarting the code (by resetting the chip)… although I’m not 100% sure myself, as I haven’t had any confirmation about that from others - simply my observations of how the code behaves.

Thanks, that’s a great tip!

And thanks again, all, for the assistance!

Thank you for taking time to report your solution.
