ulTaskNotifyTake timeout accuracy

Thatseasy · April 24, 2020, 5:41pm

Hi,

The version is 8.2.3, and the configTICK_RATE_HZ is set to 1000.

My task is blocked by ulTaskNotifyTake( pdTRUE, pdMS_TO_TICKS(5)) for up to 5ms, and the ISR should notify the task every 4ms. I understand that the actual timeout will range from 4 to 5ms depending on when the ulTaskNotifyTake is called. However, occasionally, ulTaskNotifyTake timed out prematurely, and using a JTAG debugger, and I measured that the ulTaskNotifyTake timed out in 3.1ms, although the diff from xTaskGetTickCount WAS 5.

Here is the code snippet:

  u32 tick = xTaskGetTickCount();
  if (!ulTaskNotifyTake( pdTRUE, pdMS_TO_TICKS(5)))
  {
    ReportStatus(M4_INT_TIMEOUT, xTaskGetTickCount()-tick);
  }

Have anyone seen the problem before and had a resolution? I suppose I can increase the timeout, ex. to 6ms, but still not sure whether it is a solid workaround to guarantee the timeout will not be less than 4ms.

Thanks!
Bin

rtel · April 24, 2020, 6:42pm

If you ask for a timeout of 5 ticks then you should get between 4 and 5 ticks as there is a resolution of 1 tick.

What does ReportStatus() do? For example, does it print out a message? If so, how is it printing out the message? You say you are using a debugger, so if that is using semi hosting you will get timing anomolies.

Is this the only task in the system, and what is its priority relative to other tasks, including the time task?

I would recommend doing one of the following:

Run the system without ReportStatus and without the debugger, that means you will have to have another method of measuring the timing, such as toggling a pin and watching it on a scope…or
Use something like this tool to view what the system is doing.

Thatseasy · April 24, 2020, 7:28pm

Thank you Richard! Yes, I know the timeout ranges from 4 to 5 ms, but don’t expect it can be less than 4ms.

I am using an IAR I-jet probe, and using its ITM_EVENT and trace feature to capture the timing, it is very accurate. I did not show that code to avoid confusing.

There are 5 tasks including idle (Prio=0), and this task (Prio=6) has higher priority than others (Prio=3). The Timer/Tick interrupt has the highest priority (15).

The issue was observed initially without the ReportStatus and debugger, and I used them to isolate the cause.

Thanks for the tool suggestion, I’ll look at it. In the meantime, please let me know if you have other advices.

Thanks,
Bin

jefftenney · April 24, 2020, 11:22pm

For what it’s worth, there is one explanation for your timeouts before your efforts to isolate the problem. The timeout period can be as little as 4ms (as you stated), and the desired event can take up to 4ms. No wiggle room there, and if you have two asynchronous clocks measuring the 4ms, then there’s even less wiggle room there.

rtel · April 24, 2020, 11:32pm

You are right in that it can’t be less than 4. If the observation is accurate then something is wrong elsewhere - in the configuration, API usage, or whatever. Or something is disrupting the clock, such as semi hosting. If you are printing anything out anywhere then try stopping that.

If this is a Cortex-M, then 0 is the highest priority. Some links that may help, although I expect you have seen them already:

Thatseasy · April 27, 2020, 2:35pm

After interrupt asserts, it wakes up the task to process some samples, which can take about 1ms. Therefore, the next interrupt should assert in about 3ms, which gives about 1ms wiggle room.

Thanks,
Bin

jefftenney · April 27, 2020, 10:32pm

This part is really interesting. The ITM_EVENT timestamp measures 3.1ms, but the OS measures 4-5ms. You may need a third opinion, as Richard suggested, like a scope.

Thatseasy · April 28, 2020, 12:27am

I just enabled a separate timer (STM32 TIM2) as a free run counter in 10KHz, running 3 devices in release build without the debugger, and was able to capture ulTaskNotifyTake returned from the last interrupt in 3.6-3.9ms, minute 0.7ms that the task takes to run, ulTaskNotifyTake timed out prematurely in 2.9-3.2ms, which was in line with what captured using the debugger…

Here is the code snippet:
u32 lastIntTime = 0;
…
while(1)
{
if (!ulTaskNotifyTake( pdTRUE, pdMS_TO_TICKS(5)))
{
u32 elapsed = __HAL_TIM_GET_COUNTER(&htimer2) - lastIntTime;
if (elapsed < 40)
{
ReportStatus(M4_STATUS_RTOS_TIMEOUT, elapsed );
continue;
}
else
{
ReportStatus(M4_STATUS_INT_TIMEOUT, elapsed );
}
}
}
lastIntTime = __HAL_TIM_GET_COUNTER(&htimer2);
…

Thanks,
Bin

jefftenney · April 28, 2020, 6:39am

Yes, strange. Following the evidence at this point, you could verify your tick periods, using the same technique (TIM2) but in the tick hook function vApplicationTickHook(). It’s called from the tick ISR. Each tick should be e.g. 9, 10, or 11 TIM2 counts from the last one. (Or even better, every tick should be 39, 40, or 41 TIM2 counts from the one 4 ticks ago.) Assuming you’re not using tickless idle.

Is the tick being driven by the same clock source as TIM2? (Would typically be true if systick is generating the tick.)

Double check that configSYSTICK_CLOCK_HZ is set correctly since vPortSetupTimerInterrupt() relies on it for configuration of the tick period.

Thatseasy · April 28, 2020, 4:10pm

Thanks for the suggestion. The interval for 4 ticks range from 3.42ms to 4.57ms based on the test so far. (I increased the TIM2 timer frequency to 100KHz to get more accurate number.)

See the code snippet below, the min and max intervals are reset every 30s after logging, most of the time, the minTickInterval = 399-400, and maxTickInterval=400-401, and interestingly, (minTickInterval + maxTickInterval)/2 is about 400. My guess is, since the Tick interrupt priority is the lowest, sometimes, it has to play catchup, in rare cases, if they happen to be consecutive, then it can cause the issue. Thoughts?

void vApplicationTickHook(void)
{
static u32 count = 0;
static u32 lastTickTime = 0;

if (++count == 4)
{
u32 cur = __HAL_TIM_GET_COUNTER(&htimer2);
u16 elapsed = cur - lastTickTime + 1;
if (elapsed < minTickInterval)
{
minTickInterval = elapsed;
}
else if (elapsed > maxTickInterval)
{
maxTickInterval = elapsed;
}

lastTickTime = cur;
count = 0;

}
}

Thanks,
Bin

richard-damon · April 28, 2020, 4:33pm

If you could be inside ISRs for about 1/2 ms that would explain some of the numbers. That would be a very long ISR, but if you are using HAL code, some of that is worse than that.

jefftenney · April 28, 2020, 4:36pm

Seems that you have another ISR (or back-to-back sequence of other ISRs) that delay the tick ISR by 570us. That does seem high, but right now I can’t think of a scenario where long ISR latency would cause a timeout to be too short. You can experiment though by setting the tick priority to configMAX_SYSCALL_INTERRUPT_PRIORITY instead of configKERNEL_INTERRUPT_PRIORITY. Just hack portNVIC_SYSTICK_PRI temporarily so you don’t also increase the priority of PendSV.

rtel · April 28, 2020, 4:44pm

I would try using the trace tool to see what is going on in there - I think you can get a free license at the moment if you are working from home.

Thatseasy · April 28, 2020, 7:23pm

Changing the Tick interrupt priority to the highest allowed, or changing all my device interrupts to the lowest priority, solved the issue.

@richard-damon, yes, we use some of the HAL drivers, however, I measured the time in the ISRs, the max is 20-30us.

@rtel, the tool I am currently using does have some issue with Interrupt tracing, I’ll try your suggested one next.

Thanks,
Bin

rtel · April 28, 2020, 7:39pm

Which other interrupts are you using? Historically some STM32 HAL drivers required the tick interrupt to be the highest priority because the HAL drivers implemented timeouts by polled the HAL’s tick count - and that used the same timer as that used to generate the RTOS tick count.
Now I think the generated projects default to using two separate timers.

Additionally, and specifically to the STM32, note you have to call NVIC_PriorityGroupConfig( NVIC_PriorityGroup_4 ); before starting the scheduler.

Thatseasy · April 30, 2020, 6:04pm

Noticed the following tick ‘catchup’ code in xTaskResumeAll():

/* If any ticks occurred while the scheduler was suspended then
they should be processed now. This ensures the tick count does
not slip, and that any delayed tasks are resumed at the correct
time. */

UBaseType_t uxPendedCounts = uxPendedTicks; /* Non-volatile copy. */
if( uxPendedCounts > ( UBaseType_t ) 0U )
{
do
{
if( xTaskIncrementTick() != pdFALSE )
{
xYieldPending = pdTRUE;
}
else
{
mtCOVERAGE_TEST_MARKER();
}
–uxPendedCounts;
} while( uxPendedCounts > ( UBaseType_t ) 0U );

uxPendedTicks = 0;

}

The uxPendedTicks could be 2 in my case; can this explain my issue? :
if ulTaskNotifyTake is called right before the tick ‘catchup’, the timeout(5) has instant 2 tick count, and will expire in 3 ticks, which is in line with the 3.1 ms I was seeing.

jefftenney · April 30, 2020, 6:26pm

Yes, suspending the scheduler can cause this kind of catch up. But you may be violating the API rules somewhere if it causes the delay to be too short. Does any of your own code suspend the scheduler?

Thatseasy · April 30, 2020, 6:39pm

@jefftenney, no, none of my own code suspends the scheduler, however, it does use xQueueSend(), which may call vTaskSuspendAll().

jefftenney · April 30, 2020, 6:45pm

I don’t think that’s related then. The scheduler is running when you call ulTaskNotifyTake(), which means xPendedTicks is zero at that time. Any code that suspends the scheduler after that point could only cause your timeout to end late, not early.

Thatseasy · May 1, 2020, 7:17pm

@jefftenney, you are right that the uxPendedTicks is zero before calling ulTaskNotifyTake(). However, while tracing the uxPendedTicks changes, I saw whenever ulTaskNotifyTake() returned early, the uxPendedTicks was 1; since uxPendedTicks was 0 right before calling ulTaskNotifyTake(), a systick interrupt must be deferred at the beginning of the ulTaskNotifyTake() call for long enough, which essentially made timeout from 5 to 4. I think this explains the issue, thanks all for the help!