Resolution of port GET RUN TIME COUNTER VALUE can cause overflow in run-time stats counters

kedarsc · January 7, 2021, 10:51am

We are porting FreeRTOS to our R5F based platform. We see that portGET_RUN_TIME_COUNTER_VALUE returns a 32b and that is accumulated in pxCurrentTCB->ulRunTimeCounter,

further
*pulTotalRunTime = portGET_RUN_TIME_COUNTER_VALUE();

is used to get total time and then report task load numbers as %.

However in modern systems we do tend to switch tasks pretty rapidly so we want to use a high res timer, like 1us or 10us resolution.

But at 32b variables, @ 1us , the variables used by freertos will overflow in ~ 1hr and @ 10us in about 10 hrs and so.

A better solution would have been to define a type for this counters and then let the porting layer or application decide the type as uint32_t or uint64_t

Is my understanding correct this is a limitation in FreeRTOS run-time stats counting ?
Is there is a solution that can be used without modifying the FreeRTOS kernel ?

regards
Kedar

richard-damon · January 7, 2021, 1:11pm

Yes, my understanding is that FreeRTOS is limited to just a 32bit counter for run time states. I agree that it would be nice if this could be configured.

One thing to point out, is that with a slower run time counter, as long as nothing in the system actually can get synchronized to it, by the time you fill much of the counter, the fineness of the counter isn’t as important as you make it out.

If a routine is run for say 10us at a time, a 1us counter will catch 10 counts in it, a 10us counter will count it once per execution, and a 100us second counter will likely catch it one time in 10, so over many invocation, all will indicate about the same amount of time.

Another option is to create an add-on routine with the extenstion capability of tasks.c to add a clear stats function to FreeRTOS so you can build up the stats for a short period of time (so as to not overflow the counter) at the times you want to.

rtel · January 7, 2021, 4:43pm

We have had a few requests to add a feature that allows the run time counter value to be reset - would that help in your case or is a long term count required?

Making the type user configurable would be a simple change, we will do that. For now, it may be possible for you to add your own implementation into the code using the traceTASK_SWITCHED_IN macro - see https://www.freertos.org/rtos-trace-macros.html. It may also be possible using the application task tag https://www.freertos.org/xTaskCallApplicationTaskHook.html although the hook macro would be leaner.

[edit - the code that formats the run time counter to human readable ascii may have to be considered more if the type holding the count is configurable]

kedarsc · January 8, 2021, 5:40am

Thanks for the tips.

Yes, we will look at using the traceTASK_SWITCHED_IN with application task tag.
Yes, we will need to provide our own formatting code but we can base it off vTaskGetRunTimeStats so this should be fine (it does not access kernel structures directly, so we can still be portable to new kernel versions).

The only constraint I see is that we also use FreeRTOS+POSIX in some cases, and that also uses application tags, so we may need tweak the code a bit in FreeRTOS_POSIX_pthread.c

Let me play around a bit and see how it goes.

My main doubt was to confirm if the overflow condition can indeed happen, so thanks for the quick responses and pointers to potential solutions.

regards
Kedar

basprins · May 26, 2023, 10:10am

Hi @rtel ,
I am running in the same issue, trying to implement what I think you are suggesting.

I added this in the FreeRTOSConfig.h

extern volatile unsigned long ulHighFrequencyTimerTicks;
#define traceTASK_SWITCHED_IN()                                         \
{                                                                       \
  if (pxCurrentTCB->ulRunTimeCounter > ulHighFrequencyTimerTicks)       \
  {                                                                     \
        pxCurrentTCB->ulRunTimeCounter = 0;							    \
  }												                        \
}                                                                       \

Where ulHighFrequencyTimerTicks is the counter I increment every microsecond, which I let overflow.

Is this the correct approach?

richard-damon · May 26, 2023, 4:33pm

Randomly clearing the counter in the TCB like that will really make the accumulation unreliable.

Later versions of FreeRTOS (10.4?) allow extending the type for the run time counter, and if you make it a 64-bit type (unsigned long long) then you are not apt to overflow it. The counter read routine needs to see if the hardware counter has overflowed and append to the top the number of overflows it has seen.

I would NOT have the software incrementing the counter every microsecond, that is a LOT of software overhead. Take a hardware counter, prescale if needed, and have a routine to read, and add on overflow count to extend it to 64 bits.

basprins · June 2, 2023, 9:34am

Hi @richard-damon
I realize my info was a bit confusing.

I am using a hardware timer which I configured to “interrupt” every microsecond (we are using a 500 mhz cpu, so the overhead of the timer alone seems neglectable).

The interrupt that fires every microsecond indeed increments a uint64_t, which indeed makes it kind of nonsense to check for the overflow.

It will only reset the task when the duration of the task is higher than the uint64_t microsecond counter. Although, you make a good argument, this will never happen. uint64_t max value / 1000 / 60 / 60 / 24 / 7 / 52 => forever years. I won’t live that long anyway

So I will delete the check and reset counter, as it makes no sense.

Are you referring to this? Or am I still missing an easier approach?

/* Run time statistics */
extern volatile unsigned long long ulHighFrequencyTimerTicks;

#define portCONFIGURE_TIMER_FOR_RUN_TIME_STATS() (ulHighFrequencyTimerTicks = 0UL)
#define portGET_RUN_TIME_COUNTER_VALUE()         ulHighFrequencyTimerTicks

If you have any other remarks on my reasoning I’d be more than happy to hear them.

aggarg · June 2, 2023, 9:44am

@richard-damon is referring to this - https://github.com/FreeRTOS/FreeRTOS-Kernel/blob/main/include/task.h#L162. You can define configRUN_TIME_COUNTER_TYPE to uint64_t in your FreeRTOSConfig.h.

richard-damon · June 2, 2023, 11:44am

I think you will find that the counter increment isn’t as neglectable as you might think. 500 MHz CPU means 500 cycles between increments. In each increment, the processor needs to first save the processor state, then do the increment, and then restore the state. Unless you are using a processor with an antiquiated minimal ISR entry, with an assembly lanuge routine using special instructions to not affect state, this is going to be several dozen cycles. The Cortex-M series, one of the more efficent for interrupt entry needs about 12 cycles to enter, and another 12 to exit the isr (which may save enough state for this simple of an ISR). You likely end up with at least 30 cycles, which means that “neglectable” time is 6% of your CPU processing.

basprins · June 2, 2023, 3:30pm

Oh! I am really glad you pointed that out. I will try to confirm the figures you gave here to get a better feeling of how costly interrupts really are. I hope with some help of a logic analyzer and some fiddling with gpio pins will proven your case and make me a whole lot more aware of this.

richard-damon · June 2, 2023, 5:31pm

You will have to be careful how you measure that cost, as the biggest part for you case is getting in and getting out of the ISR. To see that, you would need a task toggling the bit at high speed and notice the pauses in the toggle.

basprins · June 2, 2023, 6:41pm

Hmm, you are right. This is hard to visualize.

The green graph represents toggling a pin (gpio1 24) twice at the beginning of the interrupt, and again toggling the same pin twice at the end of the interrupt handler.

The yellow graph is constantly toggling gpio1 25.

I’ve plotted both digital and analog values. It behaves a bit “noisy” I guess, but still a rather steady view on how the toggle task toggles the pin. A reproducing 280ns low state followed by a 272ns high state.

Then when the interrupt fires, again a bit noisy signal (it should only show two spikes, since I am writing the pin high-low once at start of interrupt handler, and again high-low at the end of the interrupt handler). I don’t really understand why this pin gives such a crappy read out. I’ll try another pin in a minute. But, at least this gives a consistent result. The block signal now gets interrupted for approx 1.2us.

I’ll stop bothering you with updates, I just wanted to share this somewhat interesting picture (but not really yet ;-)). And I wanted to thank you again for all the insights! Much appreciated

richard-damon · June 2, 2023, 6:59pm

The 4 green pulse could be the ISR getting repeated if you didn’t clear the interrupt request early enough.

basprins · June 2, 2023, 10:35pm

@richard-damon I’ve spend the entire evening figuring out what the hell was causing this weird behavior. Also, my math for calculating the timer didn’t add up.

After hours, I found out exactly what you already pointed out…! I cleared the interrupt at the end of the interrupt routine, instead of at the beginning. When I finally found the root cause, and captured new data from saleae I thought lets share one more screenshot. Then I read that you already pointed out where the potential bug could be. Ouch. Wasted some time. Learned a lot though…

Things are starting to make sense now.

So one more time: thanks a lot for all the advice/input/insights! I learned a lot.

Ditched the pointless reset of the task counter
Used the configRUN_TIME_COUNTER_TYPE and defined it as unsigned long lone
Learned in general that interrupts are much more expensive then I thought they were
Fixed a horrible bug in the interrupt handler, which caused it to interrupt twice

I’m impressed

richard-damon · June 3, 2023, 1:29pm

I will point out that most of this I have learned by making similar mistakes.

The double interupt is perhaps one of the trickier ones, as it is often buried in the architecture manual that interrupts need to be cleared some “x” cycles before the end of the ISR to allow the interrupt request to actually get removed before you leave the ISR, and “x” tends to get bigger on faster processors as they have more pipelining and syncronization in these systems.

basprins · June 8, 2023, 7:11am

Ah this is something I will also dive into then. I was wondering exactly that, why on earth does it matter. That’s the reason then.