Hi all,
Thanks again for all the comments - they helped me clarify things quite a bit.
I’ve also been forced to revise what I was expecting from the code in post /8: essentially, I was hoping to achieve a “cascade” of tasks: the ISR triggers `working_01_task` as soon as possible; `working_01_task` then triggers `working_02_task` or `working_03_task`, again ASAP; obviously, this is not what happens in post /8.
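Just to make the goal concrete, this is roughly the shape I’m after (a simplified sketch, not the actual gist code; the names and the direct-to-task notification used for the task-to-task signaling here are just for illustration):

#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

static QueueHandle_t queue_01;                  // ISR -> working_01_task
static TaskHandle_t working_02_handle;          // filled in at xTaskCreate() time
static TaskHandle_t working_03_handle;

static void my_isr(void) {                      // hypothetical ISR name
    uint8_t item = 0;                           // whatever the ISR produced
    BaseType_t woken = pdFALSE;
    xQueueSendFromISR(queue_01, &item, &woken);
    portYIELD_FROM_ISR(woken);                  // so working_01_task runs ASAP
}

static void working_01_task(void *params) {
    uint8_t item;
    for (;;) {
        if (xQueueReceive(queue_01, &item, portMAX_DELAY) == pdTRUE) {
            // handle item, then wake the next task in the cascade ASAP:
            xTaskNotifyGive(working_02_handle); // or working_03_handle
        }
    }
}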
So, I’ve done three experiments, which I’ve added as revisions to the gist in post /8. And just to make sure, the correct gist link for post /8 is now:
I will try to provide a discussion of my experiments below; and while this thread is already getting kinda heavy, I really hope I can get some feedback, especially about the stuff I might still have misunderstood. Beyond that, all of my experiments still end up in some sort of deadlock after some 100 ms, and I would very much appreciate some hints related to that.
Thanks a ton; in my exercises so far, I was careful to copy-paste in that form, so I cannot tell why I failed to do so in that example. In any case, I corrected this.
Many thanks for noticing this! I did not think much about this, possibly because I might have encountered other (unrelated) code where `timeout == 0` meant “block indefinitely”, and due to this mix-up, failed to pay enough attention.
So, first, I set the `xQueueReceive` timeout to `portMAX_DELAY` … and nothing worked (I believe I’d just get a single transition for `led_task`, and that’s it).
Then, I set the `xQueueReceive` timeout to 1 (1 FreeRTOS tick, which in my case should mean 1 ms). Now things sort of started working, as I could see pulses for the tasks before the program reaches the “deadlock”.
However, what I found strange was that the pulses for `working_01_task` were mostly high, which I really did not expect. So I ended up in a bit of a confusion myself - until I decided to repurpose the GP5 pin to toggle after each and every command in `working_01_task`.
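That is, something along these lines (a simplified sketch of the idea, not the exact gist code; `gpio_xor_mask()` is the Pico SDK call used for the toggle):

#include "hardware/gpio.h"

#define DBG_PIN_MASK (1u << 5)       // GP5, repurposed as a debug toggle

// hypothetical fragment of working_01_task's loop body:
static void one_pass(void) {
    // ... command 1 ...
    gpio_xor_mask(DBG_PIN_MASK);     // edge on GP5 after command 1
    // ... command 2 ...
    gpio_xor_mask(DBG_PIN_MASK);     // edge on GP5 after command 2
}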
This is now in the gist revision:
And this is how the pulses look at start:
So basically, this is the trouble: I wanted `working_01_task` to react immediately to data from the ISR; but I also wanted to handle, a priori, the situation where `working_01_task` might have been prevented from handling the ISR, so that there is more than one item in the queue - which is why I wanted to “flush” the queue in `working_01_task`.
Now, the first, logical, problem is that in this program there is nothing else that could pre-empt `working_01_task` in reacting: `working_01_task` has the highest priority!
But leaving that logical problem aside, the immediate technical problem is that I tried to “flush the queue” the way I had historically done in non-FreeRTOS code when “flushing a buffer”: read bytes in a `while` loop until the buffer is empty; and basically I thought this `while` loop would do it:
// timeout of 1 tick: on an empty queue this call blocks for up to 1 ms
// instead of returning pdFALSE immediately
while (xQueueReceive(queue_01, (void *)&queuedataitem, 1) == pdTRUE) {
    ...
}
This, however, does not flush the queue per se in this case; what I think happens now is:
- We end up at the `while (xQueueReceive...)` of `working_01_task` - it starts blocking due to the timeout
- Since there has already been one completed ISR before the first entry into `while (xQueueReceive...)`, `xQueueReceive` returns having read the byte (note the first “thick” edge of “GP5 / (working_01_task)”)
- The rest of the code inside the `while (xQueueReceive...)` runs, and the loop body ends
- We go back to the `while (xQueueReceive...)` blocking
- Now, at first I would have expected this to return `pdFALSE`, as no ISR hit in the meantime, and the queue should be empty
- However, since our `xQueueReceive` now has a timeout, it keeps blocking
- And since this is already the highest-priority task, there is no other higher-or-equal-priority task to yield to while this `xQueueReceive` is blocking!
- Eventually a new ISR hits, the queue is populated again, `xQueueReceive` returns from blocking with `pdTRUE`, and the new item on the queue is handled
- … and eventually, we again go back to the `while (xQueueReceive...)` blocking due to the timeout!
So, looking at this interpretation, my first impression would be that this code, too, can in principle keep looping indefinitely.
So I am kinda puzzled why, at a certain point, `xQueueReceive` returns `pdFALSE` at all?
The image suggests that `working_01_task` holds high for approx. 1 ms (I’ve also seen longer durations, roughly around 2 ms) - basically a multiple of the tick.
So, even if I would at first assume that each time we hit `xQueueReceive`, the 1-tick/1-ms timeout runs from the start - it seems as if it starts only upon the first call? And then, as long as we get `pdTRUE`, we don’t really re-start the timeout?!
In any case, as this does not really look like a proper “flush”, or a proper “cascade” of tasks, I decided to revise.
So, in revision https://gist.github.com/sdbbs/652d4abeda027999be9245db0035c78f/384f208147c6ecf54a232469c8bebd91cbd667a0, in `working_01_task`:
- At start, the amount of items in the queue is found
- A `while` loop reads exactly that amount of items, using `xQueueReceive` with timeout 0, so it doesn’t block
- Items are handled, and the other tasks are signaled
- Then `working_01_task` should “yield” to other tasks - however, since `working_01_task` is the only task with the highest priority, calling `taskYIELD` would be useless
- So the next best thing is to have the task “sleep” for a tick with `vTaskDelay((TickType_t) 1);` (roughly as sketched below)
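In code, roughly (a simplified sketch of how I understand this revision; the real code is in the gist, and `queue_01` is assumed to be created elsewhere):

#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

static QueueHandle_t queue_01;   // created in main() with xQueueCreate()

static void working_01_task(void *params) {
    uint8_t queuedataitem;
    for (;;) {
        // how many items has the ISR queued up since the last pass?
        UBaseType_t reccount = uxQueueMessagesWaiting(queue_01);

        // read exactly that many items; timeout 0, so this never blocks
        while (reccount > 0 &&
               xQueueReceive(queue_01, (void *)&queuedataitem, 0) == pdTRUE) {
            // ... handle item, signal working_02_task / working_03_task ...
            reccount--;
        }

        // "yield": sleep for one tick, since taskYIELD() would be useless
        // for the single highest-priority task
        vTaskDelay((TickType_t) 1);
    }
}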
This is how the code pulses at start:
So, now indeed I have achieved a “cascade”:
- Either ISR → `working_01_task` → `working_02_task` ASAP;
- Or ISR → `working_01_task` → `working_03_task` ASAP
However, now `working_01_task` also explicitly obeys the delay, and indeed it only runs each tick (1 ms), allowing the ISR to fill the queue with approx. 3-4 items.
This is maybe not bad in itself - but I was still wondering whether I could achieve a “cascade” of tasks one after another ASAP, while still having `working_01_task` react immediately after every ISR.
Before I get to that, I should mention that this code also ends up in a “deadlock” - but here, when the deadlock happens, basically the ISR stops, while `working_01_task` keeps running indefinitely (except, since there is no ISR anymore, there is no data in the queue either, and so the other tasks never get called).
So, to make `working_01_task` answer ASAP, I first thought about “cancelling” the `vTaskDelay`; there is `xTaskAbortDelay` - however, I’d have to call it from the ISR, and there is no “*FromISR” version of this function.
So, I thought, the next best thing would be to:
- Have `working_01_task` “yield” by suspending itself via `vTaskSuspend`
- Have the ISR “wake up” `working_01_task` via `xTaskResumeFromISR` (roughly as sketched below)
This is in the gist revision: https://gist.github.com/sdbbs/652d4abeda027999be9245db0035c78f/ee51b8b64e107542dec91b1b68de21062205c565
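The pattern, roughly (a simplified sketch; the handle and ISR names are placeholders, the real code is in the gist):

#include "FreeRTOS.h"
#include "task.h"

static TaskHandle_t working_01_handle;   // stored when the task is created

static void working_01_task(void *params) {
    for (;;) {
        // ... drain queue, signal working_02_task / working_03_task ...
        vTaskSuspend(NULL);              // suspend ourselves, "yielding"
    }
}

static void my_isr(void) {               // hypothetical ISR name
    // ... push the new item onto the queue with xQueueSendFromISR() ...
    BaseType_t xYieldRequired = xTaskResumeFromISR(working_01_handle);
    portYIELD_FROM_ISR(xYieldRequired);  // wake working_01_task ASAP
}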
This is how the pulses behave at start:
Finally, I do have an immediate response to the ISR from `working_01_task`, and a “cascade” to the other task(s) (they run ASAP after `working_01_task`).
Except - and this I didn’t take into account at first - now that `working_01_task` can explicitly handle each ISR, the number of processed queue items in this task (`reccount`) is always 1, and therefore only `working_03_task` gets called.
And in this sense, my original expectation that I could achieve both an ASAP reaction of `working_01_task` to the ISR, and a “cascade”/ASAP reaction of both `working_02_task` and `working_03_task` alternately, was not thought through very well.
But at least, I think I’m better aware of the pitfalls in how FreeRTOS tasks are scheduled and when they run, so I can re-think this better.
However, now we come to the actual problem:
All three of these variants start up as shown in the respective screenshots - but eventually, after some time, there occurs a brief period where apparently all interrupts and tasks stop running; after that, there are a couple more runs of the ISR, and then the ISR stops. Depending on which variant it is, this also means that either all tasks stop running (or, in one of the examples, as mentioned, `working_01_task` can keep running indefinitely).
Here is how this looks for the final variant:
So, the code runs for about 90 ms, then for some reason the ISR and tasks stop for around 1.4 ms, then we have two more hits of the ISR, and then the ISR stops - and in this case, since the ISR “wakes” `working_01_task` (using `xTaskResumeFromISR`), no other tasks are running either.
I have no idea why this happens; the backtrace I get from `gdb`/`openocd` here is:
Remote debugging using localhost:3333
warning: multi-threaded target stopped without sending a thread-id, using first non-exited thread
vPortRecursiveLock (uxAcquire=1, pxSpinLock=0xd000013c, ulLockNum=1)
at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192
192 while ( __builtin_expect( !*pxSpinLock, 0 ) );
(gdb) bt
#0 vPortRecursiveLock (uxAcquire=1, pxSpinLock=0xd000013c, ulLockNum=1)
at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/include/portmacro.h:192
#1 vTaskSwitchContext (xCoreID=0) at C:/path/to/FreeRTOS-Kernel-SMP/tasks.c:3880
#2 0x10000816 in isr_pendsv () at C:/path/to/FreeRTOS-Kernel-SMP/portable/ThirdParty/GCC/RP2040/port.c:402
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) c
Continuing.
I’m not sure how accurate this is, but it seems FreeRTOS ends up in a state waiting for a spinlock …
So, to summarize my questions for this:
- Can anyone see any obvious errors in my understanding of the above three examples, and in that case, help me get to the correct understanding?
- Does anyone have an explanation of why the code ends up in a “deadlock”/waiting for a spinlock after some milliseconds of running, and a suggestion on how I can prevent it?