I resume a task from an ISR, where the task is waiting with
xTaskNotifyFromISR( Radioprog, 1, eSetbits, &HigherPriorityTaskWoken);
portYIELD_FROM_ISR( pdTrue, &HigherPriorityTaskWoken); // Why do I need this??
The Radioprog task wakes up with
ulTaskNotifyTake( pdTRUE, portMAX_DELAY ); // wait forever until the ISR runs
The time between the interrupt and task activation is typically 100 - 180 uS. However,
about one time in 100 the latency is roughly 3xxx uS, where the xxx part matches the usual 100 - 180 uS,
i.e. about 3 mS longer than "normal". This causes packet overruns on radio reception.
The task, Radioprog, has a priority of 7, which is the highest in my config.
I have only ONE timed wait, of 1 mS, using the standard task delay in a supervisory task,
but that is running at prio 6. I modified the delay parameter to test whether the behaviour would change, but it did not.
With these priorities, the only causes I can see are a misbehaving interrupt routine or some
strange RTOS mechanism.
Suggestions on how to find the cause of a 3 mS delay that hits 1.25 % of all task activations
are gratefully accepted.
Just for completeness: do you properly initialize HigherPriorityTaskWoken to pdFALSE?
It is only set to pdTRUE by xTaskNotifyFromISR when it has to be.
And isn't the correct API just portYIELD_FROM_ISR( HigherPriorityTaskWoken ); ? portYIELD_FROM_ISR initiates an immediate context switch, by invoking the scheduler on leaving the ISR, to the newly notified (woken) highest-prio task, if the HigherPriorityTaskWoken flag was set to pdTRUE.
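For reference, the usual pattern looks roughly like this. A minimal sketch, assuming Radioprog is your task handle; the ISR name is made up:

#include "FreeRTOS.h"
#include "task.h"

extern TaskHandle_t Radioprog;

void vRadioISR( void )  /* hypothetical ISR name */
{
    /* Must start out pdFALSE: xTaskNotifyFromISR() only ever sets it to
       pdTRUE, and only if the notified task outranks the interrupted one. */
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    xTaskNotifyFromISR( Radioprog, 1, eSetBits, &xHigherPriorityTaskWoken );

    /* Takes the flag by value; requests a context switch on ISR exit only
       if the flag was set. */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}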
A possible reason for such widely differing latencies could be a way too long vTaskSuspendAll()/xTaskResumeAll() section in your code.
BTW, which MCU do you use? E.g. a Cortex-M3/M4?
I see my description was wrong; this is the actual code:
xTaskNotifyFromISR(Radioprog,1,eSetBits,&xHigherPriorityTaskWoken );
xHigherPriorityTaskWoken = pdTRUE; // added this …
portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
I was wrongly using &xHigher… but correcting this changed nothing…
I do not use suspend/resume. There are 6 tasks, of which I created 4.
One of these uses vTaskDelay(pdMS_TO_TICKS(1));
All others are either triggered by an ISR, like Radio here, or use blocking I/O.
The port is Texas Instruments' CC1314R10 distribution of FreeRTOS, the CPU is a Cortex-M33, and the development IDE is CCS.
Gullik
Name      State  Priority  Stack  Num
Cmd       X      3         344    2
IDLE      R      0         92     5
Net       B      6         350    4
Radio     S      7         318    1
Eth       S      4         404    3
Tmr Svc   B      5         72     6
How come Radio and Eth are S? I never suspended them. I expected B (waiting?).
This is what is logged. I have a 10 uS (hardware) timer running; it interrupts,
increments Runtime, and decrements 2 timers if they are running (!= 0). To me this should take
just 1 or a few uS to execute.
On each radio interrupt I snap the Runtime value, and in the task, right after the "wait",
I calculate the offset from the current Runtime. If the offset is greater than TEST_DELAY (now 400 uS)
I log the time, and later a "drop" message is logged. The number 7 is the last segment of
a segmented Ethernet packet (segments 1 - 6, 253 bytes each).
The application runs for many hours but loses 1% of packets…
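For clarity, the instrumentation described above amounts to roughly this (a sketch; ulIsrSnapshot and the function names are made up, Runtime and TEST_DELAY are as described):

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

#define TEST_DELAY_US   400U               /* log anything slower than this */

extern volatile uint32_t Runtime;          /* 10 uS ticks from the HW timer */
static volatile uint32_t ulIsrSnapshot;    /* Runtime snapped in the radio ISR */

void vOnRadioInterrupt( void )             /* in the ISR, before the notify */
{
    ulIsrSnapshot = Runtime;
}

void vOnTaskWakeup( void )                 /* in the task, right after the wait */
{
    uint32_t ulOffsetUs = ( Runtime - ulIsrSnapshot ) * 10U;
    if( ulOffsetUs > TEST_DELAY_US )
    {
        /* log ulOffsetUs here; a "Drop 7" may be logged later */
    }
}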
Drop 7
1320 uS
Drop 7
1800 uS
Drop 7
1780 uS
Drop 7
1430 uS
Drop 7
760 uS
Drop 7
1590 uS
Drop 7
1690 uS
Drop 7
1620 uS
400 uS
Drop 7
1910 uS
Drop 7
1670 uS
Blocked forever (portMAX_DELAY) is considered suspended, as no time duration will wake the task up. (With INCLUDE_vTaskSuspend enabled, an indefinitely blocked task is placed on the suspended list, which is why the task list reports it as S.)
My first guess on your problem is that either some ISR is taking too long to run (remember, ALL ISRs effectively have priority over tasks), or you have some piece of code with a long critical section.
Yes, I have the declaration at the beginning of the ISR.
I set it to pdTRUE to make sure it was always set; otherwise, I guess, the Notify would set it
when needed…
I'd try a minimal example: eliminate all other tasks and (armed) ISRs, verify the expected behaviour regarding bounded latencies, and bring the other tasks back one after the other to see which task causes the rather extreme jitter. As Richard also mentioned, there is probably a very/too long critical section or scheduler suspension, or a badly implemented higher-prio ISR taking way too long, causing the latency problem.
I have suspected ISRs; all drivers are written/adapted by TI. There are only 3 possible offenders:
a callback from the TI radio driver
a pin interrupt that signals Ethernet receive, falling-edge triggered, which just Notifies the Eth task
a timer interrupt, every 10 uS, that just increments the Runtime var and decrements Timers 0 and 1 (sketched below).
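That timer ISR, as described, should be no more than this; a sketch with assumed names:

#include <stdint.h>

volatile uint32_t Runtime = 0;             /* free-running 10 uS tick count */
volatile uint32_t Timer0 = 0, Timer1 = 0;  /* the two software timers */

void vTenMicrosecondTimerISR( void )       /* hypothetical ISR name */
{
    Runtime++;
    if( Timer0 != 0 ) { Timer0--; }        /* decrement only while running */
    if( Timer1 != 0 ) { Timer1--; }
}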
The only FreeRTOS calls used are Notify/Wait and one instance of vTaskDelay(1). I have changed the
timeout value, but there is no difference in behaviour. So, from an OS point of view, this is really simple.
Right now I can see with a logic analyzer that the Radio callback is hit and the process activates;
typically this happens within 100 - 200 uS. Every now and then the process activation takes
1.2 mS or more… this overlaps the inter-packet gap… which could be an explanation for the packet loss.
Unscientifically, I guess the 1.2 mS is actually 1 mS PLUS the normal task-switch time…???
Thanks Hartmut,
I have a counter, Recd, that increments on every callback.
I also have a counter in the task that increments on every wakeup.
These counters slip apart and no longer show the same value after a while…
So it is either FreeRTOS or my "misuse" of it…
Could it be FreeRTOS that disables callbacks for a whole mS? Doing what?
The callback gets called and the task responds; the callback gets called again but the task does not schedule for some reason;
the callback gets called once more, so we now have TWO notifies; the task finally does schedule, but only sees the last Notify, because I set bits? Can this be counted instead? (see the counting sketch below)
The program is running at prio 7, the highest of the tasks, BUT… could something busy prevent the task
from running for more than 1 mS? How does this sound?
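Counting is indeed possible: vTaskNotifyGiveFromISR()/ulTaskNotifyTake() use the notification value as a counting semaphore, so a burst of callbacks is not coalesced the way repeated eSetBits notifications are. A minimal sketch, reusing the Radioprog handle:

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"

extern TaskHandle_t Radioprog;

void vRadioCallback( void )                /* ISR/callback side */
{
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;
    vTaskNotifyGiveFromISR( Radioprog, &xHigherPriorityTaskWoken );  /* count++ */
    portYIELD_FROM_ISR( xHigherPriorityTaskWoken );
}

void vRadioTask( void *pvParameters )      /* task side */
{
    for( ;; )
    {
        /* pdFALSE: decrement the count by one instead of clearing it,
           so one wakeup per callback is preserved. */
        uint32_t ulCount = ulTaskNotifyTake( pdFALSE, portMAX_DELAY );

        /* ulCount is the value before the decrement: if it is > 1 the
           task fell behind and there are more events to drain. */
        ( void ) ulCount;
        /* ...process one radio event... */
    }
}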
The question is what happens after (or during) notifying the highest-prio task.
Using a debugger you could step through the code (vTaskSwitchContext) to see which task is scheduled after leaving the ISR.
BTW, which FreeRTOS version do you use, and do you have configASSERT defined for development?
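For development, a typical configASSERT definition in FreeRTOSConfig.h (this is the stock example from the FreeRTOS docs) stops dead on a failed assert so the debugger shows where:

#define configASSERT( x )    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }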
You should still figure out which task/ISR is running in between the 2 subsequent ISR callbacks. As @RAc suggested, using Tracealyzer would likely reveal what is going on here. Percepio has a free version too - New free trace tool from Percepio.
I am using FreeRTOS 10.5.1, which is what TI supplies in their SDK.
I have been tracing the packet reception with a logic analyzer.
I see times from callback to task running of 160 - 200 uS. This is a 48 MHz Cortex-M33,
which I believe executes about 48 million instructions per second. That equates to around 8000 instructions to:
At the end of the interrupt routine, issue the callback
A few statements in the callback routine
Set a GPIO to 1
execute xTaskNotifyFromISR
portYIELD
return from interrupt
detect that the process should run and restore registers and SP
return from system state to task
return from int
set GPIO to 0
This does not seem reasonable, does it?
And this is when it WORKS; at fault I sometimes log 1500 uS or 2000 uS…
Indeed, Tracealyzer is a good choice for this kind of analysis. You can add your own "user events" in the ISR and in the task, and then measure the time between them. This has been designed for low overhead (microseconds). There is a relevant article here: Understanding your Application with User Events - Percepio
The free version, Percepio View, also supports user events, but does not include the Custom Interval feature, so you would need to measure the latency "manually" using the selection tool in the views (hold and drag the left mouse button), or by holding down the shift key and clicking on two event labels.
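A sketch of such user events, using the classic TraceRecorder v4 API; the exact function names differ between recorder versions, so treat these as illustrative:

#include <stdint.h>
#include "trcRecorder.h"

traceString xLatencyChannel;               /* user-event channel */

void vTraceSetup( void )
{
    xLatencyChannel = xTraceRegisterString( "RadioLatency" );
}

void vOnRadioIsr( void )                   /* call from the radio ISR */
{
    vTracePrint( xLatencyChannel, "isr" );
}

void vOnTaskWake( uint32_t ulLatencyUs )   /* call right after the wait */
{
    vTracePrintF( xLatencyChannel, "wake after %d uS", ( int ) ulLatencyUs );
}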
I have solved the problem with a workaround. The packets are retrieved from a queue, and if
I miss a callback, that segment is lost. I have rearranged the emptying of the queue so that,
when the task is woken up, it empties the whole queue (roughly as sketched below). It would have been better to find the actual cause of the problem; maybe I will set up a separate config to iron it out.
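The workaround, roughly, assuming the segments arrive through a FreeRTOS queue (xRadioQueue, RadioSegment_t and prvProcessSegment are made-up names; if it is the TI driver's own queue, the same drain-until-empty loop applies with the driver's dequeue call):

#include <stdint.h>
#include "FreeRTOS.h"
#include "task.h"
#include "queue.h"

typedef struct { uint8_t ucData[ 253 ]; } RadioSegment_t;   /* assumed layout */
extern QueueHandle_t xRadioQueue;
static void prvProcessSegment( RadioSegment_t *pxSeg );

void vRadioTask( void *pvParameters )
{
    RadioSegment_t xSeg;

    for( ;; )
    {
        ulTaskNotifyTake( pdTRUE, portMAX_DELAY );

        /* Drain everything queued, not just one entry per notification,
           so a missed callback no longer loses a segment. */
        while( xQueueReceive( xRadioQueue, &xSeg, 0 ) == pdPASS )
        {
            prvProcessSegment( &xSeg );
        }
    }
}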
Anyway, my application port runs at least as well as it did without the RTOS, and I can now concentrate on improving performance and completing the functionality. The overhead still bugs me, though.