Microsecond Timestamp - vTaskSetTimeOutState vs. vTaskSuspendAll

Flo · January 12, 2024, 3:47pm

Hi guys,

CPU: Cortex-M4, STM32F4, 1 kHz kernel tick, FreeRTOS v10.4.x

The short version:
Is there some rule that forbids calling vTaskSetTimeOutState() or reading the systick counter register when the kernel is disabled by vTaskSuspendAll()?

The lengthy version:
My code is confidential unfortunately, but I'll try to describe my problem as simply as possible.

I have a function now() which gives me the current time(stamp) with microsecond resolution and is also overflow safe (or at least that was the idea). This function now() is based on the FreeRTOS kernel tick.
For this you’d normally use xTaskGetTickCount(), but instead I use vTaskSetTimeOutState() because this function fills a TimeOut_t that contains not only the 32 bit TickType_t tick count but also the number of overflows of this 32 bit value as a BaseType_t, i.e. effectively I have a 64 bit counter which is overflow-safe until I’m dead and beyond.
Since the kernel tick is only 1 millisecond, I combine the kernel tick counter values I get from vTaskSetTimeOutState() with the Systick counter register value, which is running at CPU frequency. This way I get the microsecond resolution as well.
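
To make the arithmetic in the paragraph above concrete, here is a minimal sketch. It is not the confidential code: the struct is a stand-in that only mirrors the two fields of FreeRTOS's TimeOut_t (xOverflowCount, xTimeOnEntering) so the example compiles without the kernel headers, the helper bodies are hypothetical, and the 1 kHz tick is taken from the post while any clock frequency passed in is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for FreeRTOS's TimeOut_t, so the sketch builds without the kernel.
struct TimeOutSnapshot {
    int32_t  overflowCount;   // how often the 32 bit tick counter has wrapped
    uint32_t timeOnEntering;  // tick count when the snapshot was taken
};

// SysTick counts down from its reload value, so elapsed = reload - current.
uint32_t elapsedSysticks(uint32_t reloadValue, uint32_t currentValue) {
    assert(currentValue <= reloadValue);
    return reloadValue - currentValue;
}

// Hypothetical body for combineOverflowsAndMillisecTicks():
// overflow count in the upper 32 bits, tick count in the lower 32 bits.
uint64_t combineOverflowsAndMillisecTicks(const TimeOutSnapshot& s) {
    return (static_cast<uint64_t>(static_cast<uint32_t>(s.overflowCount)) << 32)
           | s.timeOnEntering;
}

// Hypothetical body for calcMicrosecsFromSysticks(); the clock frequency is
// passed in because the real value is not given in the post.
uint32_t calcMicrosecsFromSysticks(uint32_t systicks, uint32_t sysTickHz) {
    assert(sysTickHz != 0);
    return static_cast<uint32_t>(static_cast<uint64_t>(systicks) * 1000000u / sysTickHz);
}

// Only valid for a 1 kHz kernel tick, where one tick is one millisecond.
uint64_t toMicroseconds(const TimeOutSnapshot& s, uint32_t systicks, uint32_t sysTickHz) {
    return combineOverflowsAndMillisecTicks(s) * 1000u
           + calcMicrosecsFromSysticks(systicks, sysTickHz);
}
```

With an assumed 168 MHz SysTick clock, 84000 elapsed counts correspond to 500 us, so tick 2 plus 84000 counts gives 2500 us.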

The C++ pseudo code of now() looks like:

MyTimeType now()
{
    // vTaskSuspendAll(); // Problematic, keep in mind, explained later.
    const auto curSysticks = getSystickRegisterAndSubstractReloadReg(); // takes care of down-counting systick
    TimeOut_t millisecsAndOverflows{0, 0};
    vTaskSetTimeOutState(&millisecsAndOverflows);
    // vTaskResumeAll(); // Problematic, keep in mind, explained later.
    const auto latestSysticks = getSystickRegisterAndSubstractReloadReg();
    if (latestSysticks >= curSysticks)
    {
        // No overflow of the systick counter reg between the two reads.
        const auto microsecs = calcMicrosecsFromSysticks(curSysticks);
        const auto millisecs = combineOverflowsAndMillisecTicks(millisecsAndOverflows);
        return millisecs * 1000 + microsecs;
    }
    else
    {
        // There was an overflow of the systick counter reg, refetch the FreeRTOS kernel tick values.
        // Would usually mean +1 millisecond, but we do it clean ...
        vTaskSetTimeOutState(&millisecsAndOverflows); // ... and refetch the FreeRTOS values.
        const auto microsecs = calcMicrosecsFromSysticks(latestSysticks);
        const auto millisecs = combineOverflowsAndMillisecTicks(millisecsAndOverflows);
        return millisecs * 1000 + microsecs;
    }
}

Hope this makes sense. So far now() has worked rather well for me.

Now the current problem:
I use my function now() to measure time differences in my application in the sub-millisecond range, i.e. an accuracy of about 10 us is acceptable.
One of those time diff calculations looks roughly like this:

vTaskSuspendAll(); // MAD boy!
setDebugLED();
MyTimeType startTime{ now() }; // Can be WRONG!
MyTimeType expectedTime {somePreviousTime_MemberVar - startTime};
vTaskResumeAll(); // MAD boy!

/* ... Some other code in this task is running, the task might block and other tasks might run. ... */

MyTimeType endTime { now() }; // Kernel not suspended here!
resetDebugLED();
MyTimeType  actualTime {endTime - startTime};

printf("actual time: %ld, expected time: %ld", actualTime, expectedTime);
}

At some point I thought that I needed to suspend the scheduler there as a “soft critical section” so that expectedTime would be calculated correctly (no context switch, no delay), considering that my application has many tasks. Suspending the scheduler has since proven to be unnecessary and without any benefit in this case, anyway …
The printed actualTime showed some weird behaviour: the times were mostly correct, but occasionally they were exactly 1 ms off the real value I measured via the LED with an oscilloscope. When I changed the kernel tick frequency to 500 Hz, that offset changed to 2 ms. So the effect is dependent on the kernel tick rate.
I found that startTime was causing the wrong time diff.
When I removed the “soft critical section”, i.e. the vTaskSuspendAll() and vTaskResumeAll() pair from the above calculation of startTime, everything was fine. And I cannot explain why …
Then I remembered that if-clause in my function now() and was able to track the problem with the “soft critical section” down to the spot I marked as a reminder with the two comments in the code of now().

One big “soft critical section” around both getSystickRegisterAndSubstractReloadReg() and vTaskSetTimeOutState() → Problem, wrong values for startTime!
One separate “soft critical section” around each of getSystickRegisterAndSubstractReloadReg() and vTaskSetTimeOutState() → No problem.

And now my question from above again:
Is there some rule that forbids calling vTaskSetTimeOutState() or reading the systick counter register when the kernel is disabled by vTaskSuspendAll()?
Or is this just some task timing thing of my application I don’t understand?

EDIT:
I have seen this general warning in the API docs: “API functions that have the potential to cause a context switch (for example, vTaskDelayUntil(), xQueueSend(), etc.) must not be called while the scheduler is suspended.”
But it doesn’t seem relevant to me in case of vTaskSetTimeOutState(). Or am I wrong here?

Thanks for your patience and help!

Regards,
Flo

RAc · January 12, 2024, 4:01pm

To me all of this looks way too complicated. I would use one free-running CPU timer as a cycle-precise, lockout-safe time base and program it to generate an interrupt with a priority above MAX_SYSCALL (configMAX_SYSCALL_INTERRUPT_PRIORITY) on or near overflow, so you can extend it to 64 bits. You can then query that register plus the overflow count at any time, from anywhere, to get a precise 64 bit time base that you can convert to microseconds based on the CPU clock frequency.
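
A minimal sketch of that idea, under stated assumptions: a 32 bit free-running up-counter, an overflow interrupt that is never masked by the kernel, and a plain variable standing in for the hardware counter register so the example runs on a host. All names here are hypothetical, none of them come from the thread or from a vendor library.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Stand-in for the free-running hardware counter register (hypothetical).
volatile uint32_t g_fakeCounterReg = 0;

// Number of counter overflows, incremented only by the overflow ISR.
std::atomic<uint32_t> g_overflowCount{0};

// Would be called from the timer overflow interrupt handler.
void onTimerOverflowIsr() {
    g_overflowCount.fetch_add(1, std::memory_order_relaxed);
}

// Read overflow count, counter, overflow count again; retry if the overflow
// count moved in between, so the two halves always belong together.
uint64_t readCycles64() {
    uint32_t high = g_overflowCount.load(std::memory_order_relaxed);
    for (;;) {
        const uint32_t low = g_fakeCounterReg;
        const uint32_t highAgain = g_overflowCount.load(std::memory_order_relaxed);
        if (highAgain == high) {
            return (static_cast<uint64_t>(high) << 32) | low;
        }
        high = highAgain;
    }
}

// Convert timer cycles to microseconds. Whole seconds and the remainder are
// scaled separately so the multiplication cannot overflow 64 bits.
uint64_t cyclesToMicroseconds(uint64_t cycles, uint32_t timerHz) {
    assert(timerHz != 0);
    return (cycles / timerHz) * 1000000ull
           + (cycles % timerHz) * 1000000ull / timerHz;
}
```

The retry loop only helps if the overflow interrupt can actually preempt the reader, which is why its priority has to sit above the level the kernel masks.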

richard-damon · January 12, 2024, 7:40pm

One issue with your strategy is that if the overflow happens between the time you read the counter and the time you check the tick count, you will be off by a full tick period due to the race condition.

One way around this is to read the tick count, then the systick counter, then the tick count again. If it is the same as before, you got a good reading. If it changed, consider the second tick count as your first tick count, read the systick counter again and the tick count again to make sure it didn’t change again.
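
The read sequence described above can be sketched as follows. The two callbacks are hypothetical hooks: on target they might wrap the tick count read and the "elapsed counts within the current tick" read, while here they are injected so the loop can be exercised on a host. The sketch assumes the tick count really is updated whenever the counter wraps.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical hooks for the two time sources.
struct TimeSources {
    std::function<uint64_t()> readTickCount;     // kernel tick count
    std::function<uint32_t()> readCountsInTick;  // counts elapsed in this tick
};

// Tick, counter, tick again. If the tick moved, the counter value may belong
// to either tick, so adopt the newer tick and read the counter once more.
uint64_t nowInCounts(const TimeSources& src, uint32_t countsPerTick) {
    assert(countsPerTick != 0);
    uint64_t tick = src.readTickCount();
    for (;;) {
        const uint32_t counts = src.readCountsInTick();
        const uint64_t tickAgain = src.readTickCount();
        if (tickAgain == tick) {
            return tick * countsPerTick + counts;
        }
        tick = tickAgain;
    }
}
```

If the tick count cannot advance at the moment of the wrap, for example because the tick interrupt is held off, both tick reads agree and the stale value still gets through, so the loop is not a complete fix on its own.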

Of course, as @RAc said, if you have a Free Running counter available, you can use that.

Flo · January 15, 2024, 9:25am

Thanks to both of you for your replies!

So I get that there is no general prohibition of what I am doing, it’s probably just things getting a bit complicated.

@RAc: I agree, it was supposed to be easier when I started the implementation.
So yes, this should be the way to go for me in future.

@Richard:
I don’t quite get why my strategy wouldn’t catch the false reading due to the race condition, so, just for me to learn, let me understand where my idea is wrong.

I guess this is the case you have mentioned. Values marked (latched) are the ones my code captures. Systick counts upwards in this example, which is also how I get it in my code.

1. read systick: systick counter: 0xFFFFFF (latched) | kernel tick: 0 ms
2. Overflow of systick: systick counter: 0x000000 | kernel tick: 1 ms
3. read kernel tick: systick counter: 0x00000A | kernel tick: 1 ms (latched)
4. read systick again: systick counter: 0x00000D (latched) | kernel tick: 1 ms
5. Compare systick counter vals: if(0x00000D >= 0xFFFFFF) => false
6. else:
   - Use 2nd systick counter value: 0x00000D
   - read kernel tick again: systick counter: 0x0000F9 | kernel tick: 1 ms (latched)
7. time = 0x00000D * SystickPeriod + 1 ms

In this case reading the kernel tick a 2nd time in 6. is not necessary I think (though it is in the case of an overflow between 3. and 4.). Apart from the small inaccuracies caused by the run time of the code, I can’t figure out where I would be off by one full millisecond.

Anyway, this job is better done by a HW timer, true.
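
One candidate explanation for the full-tick error in the suspended-scheduler case from the first post, offered as an assumption to check against tasks.c of your kernel version rather than something confirmed in this thread: while the scheduler is suspended, SysTick keeps wrapping and the tick interrupt keeps firing, but the kernel holds the tick back as a pended tick (the xPendedTicks path in xTaskIncrementTick(), as far as I know) and only adds it to the tick count in xTaskResumeAll(). A refetch of the tick count inside the suspended section would then return the old value. The toy model below is not FreeRTOS code; it only imitates that behaviour with made-up numbers (1000 counts per tick) and replays the logic of now().

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the kernel tick and an up-counting sub-tick counter.
struct FakeKernel {
    uint32_t countsPerTick = 1000;  // made-up value to keep numbers readable
    uint32_t tickCount = 0;
    uint32_t pendedTicks = 0;
    uint32_t systick = 0;           // counts elapsed within the current tick
    bool schedulerSuspended = false;

    // Let time pass; every wrap of the counter raises the tick interrupt.
    void advance(uint32_t counts) {
        assert(countsPerTick > 0);
        systick += counts;
        while (systick >= countsPerTick) {
            systick -= countsPerTick;
            if (schedulerSuspended) { ++pendedTicks; } else { ++tickCount; }
        }
    }
    void suspendAll() { schedulerSuspended = true; }
    void resumeAll() {
        schedulerSuspended = false;
        tickCount += pendedTicks;   // pended ticks are applied on resume
        pendedTicks = 0;
    }
};

// Same decision logic as now() in the first post, in counter units.
uint64_t nowInCounts(FakeKernel& k, uint32_t countsBetweenReads) {
    const uint32_t cur = k.systick;
    k.advance(countsBetweenReads);  // time passes before the tick is read
    uint32_t ticks = k.tickCount;
    const uint32_t latest = k.systick;
    if (latest >= cur) {
        return static_cast<uint64_t>(ticks) * k.countsPerTick + cur;
    }
    ticks = k.tickCount;            // refetch after the detected wrap
    return static_cast<uint64_t>(ticks) * k.countsPerTick + latest;
}

// Start just before a wrap (tick 7, counter 998) and take one timestamp.
uint64_t demo(bool suspendScheduler) {
    FakeKernel k;
    k.tickCount = 7;
    k.systick = 998;
    if (suspendScheduler) { k.suspendAll(); }
    const uint64_t t = nowInCounts(k, 5);
    if (suspendScheduler) { k.resumeAll(); }
    return t;
}
```

In this model demo(false) yields 8003 counts, while demo(true) yields 7003, i.e. exactly one tick period too small, which would match an error that scales with the tick rate.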