System goes to sleep while a task is still active

grygorek wrote on Monday, October 10, 2016:

Hi,

I am using Cortex M0+ (SAMR21) with FreeRTOS v9.0.0

I can observe strange behaviour while using queues/events and sleep mode.

I have one task that sends something over the I2C bus. The task waits for an interrupt. The interrupt sends an event and the task is woken up.

In the pseudo code it looks like this:

void task()
{
	while(1)
	{
		i2c_send();
		if( error == xSemaphoreTake(handle, wait2ticks) )
			Log(timeout);
		sleep(300ms);
	}
}

void i2c_irq()
{
	xSemaphoreGiveFromISR(handle);
}

The ‘task’ runs with the same priority as the Idle task. The i2c_send sends a very short message. Very often the irq is called while the xSemaphoreTake is still being executed, not always though. So far this part runs correctly - if the task is put into a wait state, the irq brings it back to a ready state. Likewise, if the irq comes first, the xSemaphoreTake does not wait.

The strange thing is that sometimes the system goes to a very long sleep even though the ‘task’ is woken up and system should not go to sleep.

I spent some time trying to debug this and I think this is what happens.

When the xSemaphoreTake is executed the scheduler is suspended and the events queue is locked. The pxReadyTasksLists[tskIDLE_PRIORITY].uxNumberOfItems is 1 (only the idle task is on the list). The ‘task’ is put on a wait list. When the interrupt arrives, it cannot access the queue because it is locked, so it increments a counter instead.
At the end, when the interrupt returns, the xSemaphoreTake knows that the event has arrived. It moves the task from the waiting list to the pending ready list and then to a ready tasks list. So, the uxNumberOfItems is 2 now. All sounds correct so far. I can’t see any issue here.

So, what happens in the idle task? The Idle task calls the prvGetExpectedIdleTime to calculate the number of ticks to sleep over. This function checks the pxReadyTasksLists to see if there are more tasks than just the idle one. If there are, it prevents the system from going to sleep. Let’s assume there is only the Idle task there and the ‘task’ is not ready yet. The system enters the portSUPPRESS_TICKS_AND_SLEEP function. This function disables global interrupts and calls eTaskConfirmSleepModeStatus, which should abort the sleep if any of the tasks have woken up. However, this function checks only the xPendingReadyList and whether xYieldPending is set to true. It does not check the pxReadyTasksLists.

I do not know how it happens in my system but between the prvGetExpectedIdleTime
and the eTaskConfirmSleepModeStatus the ‘task’ is moved from the waiting list
to the ready list. I think the course of events is:
a) prvGetExpectedIdleTime is called; the task is waiting for an event so the sleep time is the max allowed one.
b) interrupt happens, the task is moved to the ready list
c) eTaskConfirmSleepModeStatus does not see any new tasks woken up
d) system goes to sleep
e) game over (event has already arrived, system sleeps forever)
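
The suspected sequence a) to e) can be replayed on a host with a small C model. This is only a sketch under stated assumptions: kernel_model_t and the function names are hypothetical stand-ins, not FreeRTOS code, and the second check mirrors eTaskConfirmSleepModeStatus only as it is described in this post (pending ready list and xYieldPending, nothing else).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel state named in the post.
   These are not the real FreeRTOS structures. */
typedef struct {
    unsigned ready_count;         /* pxReadyTasksLists[tskIDLE_PRIORITY].uxNumberOfItems */
    unsigned pending_ready_count; /* length of xPendingReadyList */
    bool     yield_pending;       /* xYieldPending */
} kernel_model_t;

/* First check (the ready-list test in prvGetExpectedIdleTime):
   sleep is only considered while the idle task is alone on its list. */
static bool first_check_allows_sleep(const kernel_model_t *k)
{
    return k->ready_count <= 1u;
}

/* Second check (eTaskConfirmSleepModeStatus as described above):
   looks at the pending ready list and the yield flag only. */
static bool second_check_allows_sleep(const kernel_model_t *k)
{
    return (k->pending_ready_count == 0u) && !k->yield_pending;
}

/* Replays steps a) to d). Returns true when the model decides to sleep
   although a second task is already on the ready list. */
static bool replay_lost_wakeup(bool task_readied_between_checks)
{
    kernel_model_t k = { 1u, 0u, false };           /* only the idle task is ready */

    bool first = first_check_allows_sleep(&k);      /* a) */
    if (task_readied_between_checks) {
        k.ready_count++;                            /* b) task lands on the ready list */
    }
    bool second = second_check_allows_sleep(&k);    /* c) */

    assert(k.ready_count >= 1u);                    /* the idle task never leaves the list */
    return first && second && (k.ready_count > 1u); /* d) sleeping with a ready task */
}
```

In this model replay_lost_wakeup(true) reports the lost wake-up, while replay_lost_wakeup(false) does not, because nothing became ready between the two checks.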

As an experiment I extended the interrupts-disabled section to include the prvGetExpectedIdleTime. The issue is gone. All works.

I don’t know if this helps, but I think (I am not 100% sure) the moment when the task is moved to the ready list is here:

Inside the xQueueGenericReceive there is this piece of code:

vTaskPlaceOnEventList( &( pxQueue->xTasksWaitingToReceive ), xTicksToWait );
prvUnlockQueue( pxQueue );
if( xTaskResumeAll() == pdFALSE )
{
	portYIELD_WITHIN_API();
}

The interrupt comes around the vTaskPlaceOnEventList. So, the task is moved from the pending ready to the ready state in xTaskResumeAll. This should happen only when the scheduler is not suspended. If this is called between the prvGetExpectedIdleTime and the eTaskConfirmSleepModeStatus then the scheduler is suspended there. I don’t understand when and how it happens.

Coming to the end.
Is there something wrong in my FreeRTOS configuration?
Am I using something incorrectly? Interrupt priorities? My tick timer is a regular
timer, not the SysTick one (that one does not work when the system is in deep sleep).
Any advice on what else I should check?

Should the eTaskConfirmSleepModeStatus check the pxReadyTasksLists as well? This is already being checked in the prvGetExpectedIdleTime. So maybe the two functions should be combined into one?

rtel wrote on Monday, October 10, 2016:

> I am using Cortex M0+ (SAMR21) with FreeRTOS v9.0.0

Which compiler? I’m going to assume GCC in this post.

> I can observe strange behaviour while using queues/events and sleep mode.

Which sleep mode? I don’t think the FreeRTOS Cortex-M0 port has the
same tickless sleep implementation as the M3/4/7 ports do.

> I have one task that sends something over the I2C bus. The task waits for
> an interrupt. The interrupt sends an event and the task is woken up.
>
> In the pseudo code it looks like this:
>
> void task()
> {
> 	while(1)
> 	{
> 		i2c_send();
> 		if( error == xSemaphoreTake(handle, wait2ticks) )
> 			Log(timeout);
> 		sleep(300ms);

What is the sleep function doing? Is it just the same as vTaskDelay(),
so it puts the task into the Blocked state for 300ms?

Are you using a counting semaphore? What happens if the interrupt gives
the semaphore more than once during the 300ms delay?

> 	}
> }
>
> void i2c_irq()
> {
> 	xSemaphoreGiveFromISR(handle);
> }

Not related to your question, so an aside, but I would recommend
replacing the semaphore with a direct to task notification. The
following link has some examples:

> The ‘task’ runs with the same priority as Idle task.

So if the idle task is executing when the interrupt occurs the interrupt
won’t necessarily switch immediately to your task, but instead wait
until the next tick interrupt. If your task was above the idle priority,
and the highest priority that was able to run, then it would execute
immediately.

> The i2c_send sends
> very short message. Very often the irq is called while the
> xSemaphoreTake is still being executed, not always though. So far this
> part runs correctly - if the task is put to a wait state, the irq brings
> it back to a ready state. And the same, if the irq comes first the
> xSemaphoreTake does not wait.
>
> The strange thing is that sometimes the system goes to a very long sleep
> even though the ‘task’ is woken up and system should not go to sleep.

I don’t understand this part. If you are in the sleep() function when
the interrupt occurs, assuming sleep() is the same as vTaskDelay(), then
the task will not run until the sleep() has completed.

> I spent some time trying to debug this and I think this is what happens.
>
> When the xSemaphoreTake is executed the scheduler is suspended and the
> events queue is locked. The
> pxReadyTasksLists[tskIDLE_PRIORITY].uxNumberOfItems is 1 (only the idle
> on the list). The ‘task’ is put to a wait list. When the interrupt
> arrives, it cannot access the queue because it is locked. It increments
> a counter.
>
> At the end when the interrupt returns the xSemaphoreTake knows that the
> event has arrived. It moves the task from the waiting list to a pending
> ready and then to a ready tasks list. So, the uxNumberOfItems is 2
> now. All sounds correct for now. Can’t see any issue here.
>
> So, what happens in the idle task. The Idle task calls the
> prvGetExpectedIdleTime to calculate the number of ticks to sleep over.

Does it? Are you using the official FreeRTOS code, or something else?
I think that is what would happen if you were using tickless idling on a
Cortex-M3/4/7, but not M0.

> Should the eTaskConfirmSleepModeStatus check the pxReadyTasksLists as
> well?

It does, see line 3342 (at the time of posting) in the following source
file:
https://sourceforge.net/p/freertos/code/HEAD/tree/trunk/FreeRTOS/Source/tasks.c
However, as mentioned above, that code does not get called at all in
the Cortex-M0 port so I’m confused what it is you are running.

grygorek wrote on Monday, October 10, 2016:

From your post I realise I am moving in terra incognita, as the official M0 port does not have the tickless feature implemented.

My apologies. I have worked on this project for a while and forgot that the FreeRTOS Cortex-M0 port does not implement the functionality I am talking about.
I have my own implementation of portSUPPRESS_TICKS_AND_SLEEP. In this case the eTaskConfirmSleepModeStatus and prvGetExpectedIdleTime are called in my code.

The compiler is IAR.
The sleep from my pseudocode is indeed vTaskDelay.

Line 3342, which is part of the eTaskConfirmSleepModeStatus function, checks the xPendingReadyList. My task is not there. It is already on the pxReadyTasksLists. This list is not checked in this function.

> Are you using a counting semaphore? What happens if the interrupt gives
> the semaphore more than once during the 300ms delay?

This is a very good point. Yes, I am using a counting semaphore. However, I don’t believe this is the problem I am chasing here. I see on the oscilloscope there is only one transfer per 300ms. One transfer and one interrupt. I also tested this by making the xSemaphoreTake wait for a few seconds and the effect was the same. I will look at this more closely anyway. Maybe I am missing something here.

>> The strange thing is that sometimes the system goes to a very long sleep
>> even though the ‘task’ is woken up and system should not go to sleep.

> I don’t understand this part. If you are in the sleep() function when
> the interrupt occurs, assuming sleep() is the same as vTaskDelay(), then
> the task will not run until the sleep() has completed.

I am sorry for the confusion. My description was not clear in this part. Forget about the vTaskDelay. What I mean is that when the system is waiting for the event from the interrupt (inside the xSemaphoreTake), the ‘task’ is put on a delay/wait list. I can see the interrupt arrives (inside the xSemaphoreTake) and the task is put back on the ready list (not the pending ready list!). The xPendingReadyList has count 0 while the pxReadyTasksLists has count 2 in my example. The eTaskConfirmSleepModeStatus cannot see it and does not abort the sleep. The xYieldPending flag is also false. The CPU will be put to sleep waiting for an interrupt which has already arrived. On the other hand, the prvGetExpectedIdleTime should see that the task is woken up and should return 0. So I guess the interrupt arrives between these two functions.

The scheduler is suspended when the prvGetExpectedIdleTime and eTaskConfirmSleepModeStatus are called. So, the context switch will not happen. However, when the interrupt happens during this time there must be some way of aborting the CPU sleep. I guess the xPendingReadyList and the xYieldPending flag are the tools to achieve it. This is the part I don’t understand: why is my task on the ready list, not the pending ready list?

rtel wrote on Tuesday, October 11, 2016:

I’m not sure this should be possible. Is your M0 implementation calling
portSUPPRESS_TICKS_AND_SLEEP() exactly as per the M3/4/7 ports (at the
same place in the code)? If so, then the scheduler should be suspended
for the entire execution of the macro, and a task being unblocked by an
interrupt during that period will be added to the pending ready list,
and cannot be moved from the pending ready list into a ready list until
the scheduler is unsuspended (resumed). Does your
portSUPPRESS_TICKS_AND_SLEEP() implementation leave the scheduler suspended?
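
The rule described here (a task unblocked while the scheduler is suspended goes to the pending ready list and only reaches a ready list on resume) can be written down as a tiny host-side model. This is a sketch only: sched_model_t and the model_* functions are hypothetical names and deliberately ignore priorities, nesting details and everything else the real kernel does.

```c
#include <assert.h>

/* Hypothetical model, not the real kernel structures. */
typedef struct {
    unsigned suspend_depth;  /* models uxSchedulerSuspended */
    unsigned ready;          /* number of tasks on the ready lists */
    unsigned pending_ready;  /* number of tasks on the pending ready list */
} sched_model_t;

/* Models vTaskSuspendAll: only the nesting counter changes. */
static void model_suspend_all(sched_model_t *s)
{
    s->suspend_depth++;
}

/* Models an interrupt unblocking one task. The ready lists must not be
   touched while the scheduler is suspended, so the task is parked. */
static void model_unblock_from_isr(sched_model_t *s)
{
    if (s->suspend_depth == 0u) {
        s->ready++;
    } else {
        s->pending_ready++;
    }
}

/* Models xTaskResumeAll: parked tasks move to the ready lists only when
   the outermost suspension ends. */
static void model_resume_all(sched_model_t *s)
{
    assert(s->suspend_depth > 0u);  /* resume without suspend is a bug */
    s->suspend_depth--;
    if (s->suspend_depth == 0u) {
        s->ready += s->pending_ready;
        s->pending_ready = 0u;
    }
}
```

Under this model, a ready count that grows while suspend_depth is non-zero cannot happen, which is why a task appearing directly on the ready list inside the macro points at something else having gone wrong.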

grygorek wrote on Tuesday, October 11, 2016:

I am not using any suspend/resume functions in my code. I avoid messing with the kernel code. The only thing I do is disable and enable global interrupts in the same way as the M3 code does inside the portSUPPRESS_TICKS_AND_SLEEP.

You have thrown some light on this and got me thinking.

As you said, the context switch should not happen between the two mentioned functions calls. I actually captured the trace and I can clearly see from the log that there is a context switch. Something that should/must not happen…

The ‘task’ is executing (inside prvAddCurrentTaskToDelayedList) and is obviously in a ready state. It puts itself onto the wait list. The uxSchedulerSuspended counter is 1. The interrupt arrives and signals the task to wake up. The interrupt returns and the ‘task’ continues to execute. It recognises the fact that the event has arrived and puts the task on the pending ready list. The xTaskResumeAll puts it into the ready tasks list and decrements the uxSchedulerSuspended. The counter is 0 now. The critical section enables interrupts and the context switch happens…

And now it is interesting… it jumps to the idle task, just before the portSUPPRESS_TICKS_AND_SLEEP is called… and more… uxSchedulerSuspended is 1!!!

So, impossible became possible…

I went to the vTaskSuspendAll function. I fully agree with the comment left there regarding the discussion at this link: http://goo.gl/wu4acr Although I agree, I started wondering what happens if there is a cache or there are read/write buffers.

I modified this function in this way:

 #include "intrinsics.h"
 ...
void vTaskSuspendAll( void )
{
	__DSB();
	__ISB();
	++uxSchedulerSuspended;
}

Now the problem is gone. All works as expected.

The SAMR21 from Atmel is a Cortex-M0+. It has a cache implemented between the NVM and the AHB bus. There is no cache for SRAM.

So… Do you think the memory barriers are the fix to my problem, or is it just a coincidence that it works and the bug is still out there? I do not have much experience with memory barriers and cache. I am very open to your suggestions…

rtel wrote on Tuesday, October 11, 2016:

I couldn’t say for sure, but my initial thought is that the memory
barriers are now disguising your issue, probably due to different
timing, rather than fixing it.

I need to read your post again, with the code in front of me, to know
exactly the scenario you are seeing in order to understand and comment
further.

grygorek wrote on Wednesday, October 12, 2016:

My first thought was the same. I removed the barriers from the code and went back to debugging.

This time I looked into the generated assembler. I think I found the issue.

The idle task checks the pxReadyTasksLists 3 times: once explicitly inside the loop and then inside the prvGetExpectedIdleTime which is called two times (once before suspending the scheduler and once after).

The assembler code generated by IAR 7.70.1 inlines the prvGetExpectedIdleTime twice, but for some reason it does not have the 3rd check. It has all the conditions from the function except the one checking the ready list. Now, this clearly explains why the system goes to sleep while the task is in the ready state - the ready list is not checked.

So far I have tried different optimisation settings and most of them generated the same code. The only optimisation setting under which my program works is, obviously, the one where the prvGetExpectedIdleTime is not inlined.

What is more interesting, when I added the memory barriers back to the vTaskSuspendAll function the correct code is generated all the time. The ready list is checked 3 times then. Regardless of the optimisation.

I tried to make the ready list volatile to test more, but it required modifying many functions so I did not test this.

At first I thought that the optimizer generates bad code, but in the end I think it is good. The vTaskSuspendAll is compiled to just a simple variable increment. All the functions are inlined. The variables are not volatile. So, there is really nothing in the code to tell the compiler/optimiser that the content of the list can change in between, and that it should not do what it does. Putting in the memory barriers is the first indication to the compiler/optimizer that it should be careful here. And it is. I am going to send this example to IAR support to ask what is expected, but I think they will confirm this.
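
The pattern can be reduced to a few lines of plain C. This is a hedged sketch, not the kernel code: plain_list_t, volatile_list_t, suspend_count and the function names are hypothetical. It only shows why an optimiser is entitled to drop the second ready-list test when the count is not volatile and the "suspend" in between is an inlined increment of an unrelated variable.

```c
#include <assert.h>

/* Hypothetical stand-ins for the ready list, with and without volatile. */
typedef struct { unsigned uxNumberOfItems; } plain_list_t;
typedef struct { volatile unsigned uxNumberOfItems; } volatile_list_t;

static plain_list_t    ready_plain    = { 1u };
static volatile_list_t ready_volatile = { 1u };
static unsigned        suspend_count;   /* models the suspend counter */

/* Models an inlined vTaskSuspendAll: just an increment, no barrier. */
static inline void model_suspend_all(void)
{
    ++suspend_count;
}

/* Test helper: set both models to the same number of ready tasks. */
static void set_ready_count(unsigned n)
{
    assert(n >= 1u);   /* the idle task is always ready */
    ready_plain.uxNumberOfItems = n;
    ready_volatile.uxNumberOfItems = n;
}

/* Non-volatile count: nothing visible to the compiler modifies the list
   between the two tests, so it may reuse the first load and fold the
   second test to "false". An interrupt in between would go unnoticed. */
static int other_task_ready_plain(void)
{
    if (ready_plain.uxNumberOfItems > 1u) {
        return 1;
    }
    model_suspend_all();
    return ready_plain.uxNumberOfItems > 1u;
}

/* Volatile count: every access written in the source must be performed,
   so the second test really reloads the value from memory. */
static int other_task_ready_volatile(void)
{
    if (ready_volatile.uxNumberOfItems > 1u) {
        return 1;
    }
    model_suspend_all();
    return ready_volatile.uxNumberOfItems > 1u;
}
```

On a host with no interrupt both functions return the same result; the difference is only in what the generated code is allowed to skip, which is best confirmed by comparing the assembler output of the two functions.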

I am not enough of a specialist to tell if the memory barriers should have an effect on the compiler/optimiser. They definitely do in IAR. So I am going to leave the barriers in the code until I see some strong argument that the issue is somewhere else.

What do you think? Am I oversimplifying it, or is this the solution? Or maybe volatile is the solution? Can you test this?

rtel wrote on Wednesday, October 12, 2016:

Please try adding “#define configLIST_VOLATILE volatile” to
FreeRTOSConfig.h and report back if that helps. Thanks.
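
For readers unfamiliar with the macro, the mechanism can be sketched as follows. The example is self-contained and makes assumptions: demo_list_t is a hypothetical, cut-down list type (the real one has more members), and the #ifndef default is assumed to mirror what list.h does when the application leaves the macro undefined.

```c
#include <assert.h>

/* Normally this line would live in FreeRTOSConfig.h. */
#define configLIST_VOLATILE volatile

/* Assumed default when the application does not define the macro:
   the qualifier expands to nothing. */
#ifndef configLIST_VOLATILE
#define configLIST_VOLATILE
#endif

/* Hypothetical, cut-down list type for illustration only. */
typedef struct {
    configLIST_VOLATILE unsigned uxNumberOfItems;
} demo_list_t;

static demo_list_t demo_list = { 1u };

/* Returns 1 when the member is volatile-qualified, 0 otherwise
   (uses C11 _Generic on the type of a pointer to the member). */
static int count_member_is_volatile(void)
{
    return _Generic(&demo_list.uxNumberOfItems,
                    volatile unsigned *: 1,
                    default: 0);
}

/* Reads the count; with the macro set to volatile this is a real load
   every time it is called or inlined. */
static unsigned demo_list_length(void)
{
    unsigned n = demo_list.uxNumberOfItems;
    assert(n >= 1u);   /* in this demo the list is never empty */
    return n;
}
```

With the #define removed, count_member_is_volatile() would return 0 and the compiler would again be free to reuse an earlier load of the count.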

grygorek wrote on Wednesday, October 12, 2016:

Yes! Works. This is it… Thanks a lot!

rtel wrote on Sunday, October 16, 2016:

I don’t seem to be able to replicate this, although I’m using version 7.70.2.

Question: If you take out the definition of configLIST_VOLATILE again, and instead make just the uxNumberOfItems member of the list_t structure volatile, does it still solve your problem?

So, in list.h, change “configLIST_VOLATILE UBaseType_t uxNumberOfItems;” to “volatile UBaseType_t uxNumberOfItems;”.

Please report the findings.