Float support on M4F - context switch efficiency

dnadler wrote on Sunday, June 25, 2017:

Hi Guys - Sorry if this is a dumb question but I’m wondering about float support, and particularly context-switch efficiency. If I read the code correctly, FreeRTOS M4F is saving/restoring the floating point registers each context switch. I thought I understood from the ST example that float was even supported in ISRs, though I didn’t understand how this is implemented reading the FreeRTOS code.

First, my understanding (please correct if I’ve got this wrong) about M4F float support:

  • processor will interrupt on attempt to use float if magic bit is set
  • processor will interrupt on leaving interrupt context if a different magic bit is set

I naively expected no overhead if float is only used in one task. Is the time to save/restore FP registers inconsequential and I’m worrying about nothing? In an OS I wrote donkey’s years ago, the save/restore was so expensive I prohibited more than one task using FP, and trapped any offender to HCF. BTW a facility to enforce FP in only one task would be great in FreeRTOS…

Anyway, what I naively expected was:

  • OS tracks which tasks use float,
  • on context switch from task using float to task not using float, magic bit is set to disable float and cause interrupt if it’s used, and a note is made of the last task to use float
  • on said interrupt, FP context saved to noted task’s TCB, current task is marked float-user, new float context initialized and float enabled, also note more than one task is using FP so for FP-users context save/restore is required on context switch.
  • if float-used interrupt caused from ISR, save FP registers as required above, and restore on leaving ISR…
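
The scheme in the list above can be sketched as a host-side C simulation. This is only an illustration under stated assumptions, not FreeRTOS code: the `Sim` and `Task` types and all `sim_*` names are hypothetical, the FP register bank and the "magic bit" are plain struct fields, the ISR case is left out, and the sketch stays trap-based for every task rather than falling back to save/restore on each switch once a second FP user appears.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define FP_REG_COUNT 32 /* S0-S31; FPSCR is omitted to keep the sketch short */

typedef struct {
    float fp_context[FP_REG_COUNT]; /* hypothetical per-task FP save area */
    bool  uses_fp;                  /* set the first time the task traps */
} Task;

typedef struct {
    float    regs[FP_REG_COUNT]; /* stands in for the hardware FP registers */
    bool     fpu_enabled;        /* stands in for the FP enable "magic bit" */
    Task    *fp_owner;           /* task whose values are currently in regs */
    Task    *current;            /* task that is currently running */
    unsigned traps;              /* count of FP-disabled traps taken */
    unsigned saves;              /* count of full FP bank saves */
} Sim;

void sim_init(Sim *s) { *s = (Sim){0}; }

/* Context switch: never copies FP registers, only flips the enable bit. */
void sim_switch_to(Sim *s, Task *next)
{
    s->current = next;
    s->fpu_enabled = (s->fp_owner == next);
}

/* Runs when the current task touches FP while FP is disabled. */
static void fp_trap(Sim *s)
{
    s->traps++;
    if (s->fp_owner != NULL) { /* save the previous owner's registers */
        memcpy(s->fp_owner->fp_context, s->regs, sizeof s->regs);
        s->saves++;
    }
    if (s->current->uses_fp) { /* known FP user: restore its context */
        memcpy(s->regs, s->current->fp_context, sizeof s->regs);
    } else {                   /* first use: start from a clean context */
        memset(s->regs, 0, sizeof s->regs);
        s->current->uses_fp = true;
    }
    s->fp_owner = s->current;
    s->fpu_enabled = true;
}

/* An FP register write performed by the current task. */
void sim_fp_write(Sim *s, int reg, float value)
{
    assert(reg >= 0 && reg < FP_REG_COUNT);
    if (!s->fpu_enabled) fp_trap(s);
    s->regs[reg] = value;
}

/* An FP register read performed by the current task. */
float sim_fp_read(Sim *s, int reg)
{
    assert(reg >= 0 && reg < FP_REG_COUNT);
    if (!s->fpu_enabled) fp_trap(s);
    return s->regs[reg];
}
```

With a single FP-using task the simulation takes one trap on first use and never copies the register bank. Once two tasks use FP, each hand-over costs a trap plus a save and a restore.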

Sounds a bit complicated but isn’t much code, and makes average context switch less expensive.
Can you educate me? What am I missing?
Best Regards, Dave

rtel wrote on Sunday, June 25, 2017:

The overhead for saving and restoring flop registers on the Cortex-M4F
is not high. The scheme you describe sounds like a manual lazy save of
flop registers, whereas the Cortex-M4F has automatic (hardware level)
lazy saving. The problem is, though, the automatic lazy save only
really works in non-multithreaded applications because the CPU saves
stack space for flop registers, but doesn’t actually save them unless it
needs to. For example, if an interrupt uses flop registers too, then
the saved stack space actually gets used to save the flop registers
before the interrupt corrupts them. The problem is that, in a
multithreaded environment, the stack pointer is manipulated by the
scheduler, so the hardware might save space on the stack of one task,
then, if the running task changes, save the registers to the wrong task.

In FreeRTOS the flop registers are only saved for tasks that are
actually using flop instructions. Again, this happens automatically and
the hardware tracks when flop registers are in use and when not. Doing
this manually, the way you describe, takes more instructions and
requires the execution of more interrupts. For example, during a
context save the flop unit has to be turned off, then if it is used, the
flop interrupt has to execute for more code to run to then change the
flop owner and manually save then restore flop registers - much simpler
to have just a few instructions and let the intelligent hardware do the
heavy lifting without needing additional assembly instructions or
flop-specific interrupts.
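
As a rough illustration of "only saved for tasks that are actually using flop instructions": on the Cortex-M4F, bit 4 of the EXC_RETURN value that the core places in the link register on exception entry is 0 when the interrupted code had an active floating point context, and a context-switch handler can test that bit before saving s16-s31. The host-side C sketch below models only that decision and the resulting per-task context size. The function and macro names are hypothetical, and the software-saved register set (r4-r11 plus the EXC_RETURN value) is an assumption modelled on a typical M4F port rather than a quote from the FreeRTOS source.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN bit 4: 0 = extended (FP) stack frame, 1 = basic stack frame. */
#define EXC_RETURN_FTYPE_BIT    (UINT32_C(1) << 4)

#define HW_BASIC_FRAME_WORDS     8u /* r0-r3, r12, lr, pc, xPSR */
#define HW_EXTENDED_FRAME_WORDS 26u /* basic frame + s0-s15 + FPSCR + 1 reserved word */
#define SW_CORE_WORDS            9u /* r4-r11 plus EXC_RETURN (assumed layout) */
#define SW_FP_WORDS             16u /* s16-s31, saved only for FP users */

/* True if the interrupted task had an active FP context. */
bool task_has_fp_context(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_FTYPE_BIT) == 0u;
}

/* Number of 32-bit words of stack the task's saved context occupies. */
unsigned context_words(uint32_t exc_return)
{
    if (task_has_fp_context(exc_return)) {
        return HW_EXTENDED_FRAME_WORDS + SW_CORE_WORDS + SW_FP_WORDS;
    }
    return HW_BASIC_FRAME_WORDS + SW_CORE_WORDS;
}
```

Under these assumptions a task that never touches the FPU (EXC_RETURN `0xFFFFFFFD`) has a 17-word context, while an FP user (EXC_RETURN `0xFFFFFFED`) has a 51-word context, so only FP-using tasks carry the larger context.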

dnadler wrote on Sunday, June 25, 2017:

Thanks for the explanation - much appreciated!
Best Regards, Dave