Some of your questions, such as the overhead added by the FPU, can be answered from the Cortex-M hardware documentation, because that overhead is fixed by the hardware. Interrupt entry time varies only in whether tail-chaining occurs, which saves a few cycles, but that too is fixed by the hardware.
Thank you very much for the reply. It is very helpful.
One specific question about interrupts: do I understand correctly that FreeRTOS uses a zero-latency interrupt policy?
By this I mean that an interrupt handler registered in the NVIC vector table is perfectly valid when it implements the complete interrupt handling itself. In other words, there is no general recommendation to use the *FromISR functions. Those functions are intended for cases where the handler needs to hand further work to a task, such as protocol decoding of received communication characters. They are not required for simple interrupt handlers that might only change a digital output.
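To make the question concrete, here is a minimal sketch of the kind of handler meant above: one that does all of its work inside the handler and makes no FreeRTOS API calls. The names `EXTI0_IRQHandler`, `fake_gpio_odr` and `LED_PIN_MASK` are hypothetical; in particular, `fake_gpio_odr` is an ordinary variable standing in for a memory-mapped GPIO output register so that the snippet compiles on a host machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for a memory-mapped GPIO output data register.
   On real hardware this would be a peripheral register taken from the
   vendor's device header, not an ordinary variable. */
volatile uint32_t fake_gpio_odr = 0u;

/* Hypothetical output pin (bit 5 of the register). */
#define LED_PIN_MASK (1u << 5)

/* The handler whose address would be placed in the vector table.
   It only toggles a digital output and returns: no queues, no
   semaphores, no *FromISR calls. */
void EXTI0_IRQHandler(void)
{
    fake_gpio_odr ^= LED_PIN_MASK;
}
```

On a real target the handler would usually also clear the peripheral's pending flag, which is left out here because it is device specific.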