STM32F4 with FPU

anonymous wrote on Sunday, October 16, 2011:

Hi!

I just got my discovery board, and would like to try out the FPU. Did anyone write a port yet?! Or a time estimate when it will be officially supported?

I just had a quick look at the architecture manual… It seems that FreeRTOS would have to store the entire state of the FPU, adding at least 32x4 bytes (Are all 32 FPU registers in use by compilers? Seems to be an awful lot!). Perhaps I’ll give it a try myself.

rtel wrote on Sunday, October 16, 2011:

A lot of thought and work has already gone into supporting the Cortex-M4F, but support is not yet officially available.  Note that if you have the FPU turned off then the standard Cortex-M3 port will work fine, but having the FPU turned on is much more complex than you might imagine.

The easy option, if you wish to do it yourself, is to set the FPU-related registers to save and restore the FPU context automatically on each interrupt.  This is horrendously inefficient with the VFP architecture of the M4F, especially when you consider that only a few tasks will ever use the FPU.  Only half the context can be saved automatically, so the other half has to be done manually.

Another option is to allow tasks to register themselves as FPU context users, then manually save the FPU context for just those tasks.  That is a little more efficient, but will still result in FPU contexts being saved unnecessarily sometimes.

Another extreme is to attempt to use the lazy save mechanism of the FPU (note lazy save is turned on by default).  If you do that, then you have an extremely complex problem to implement, and if interrupts use the FPU too (they might if they are doing something like motor control) then there are a dozen corner cases to take care of once interrupts start nesting that are near impossible to test.
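
For reference, the two hardware modes mentioned above (automatic save on every exception, and lazy save) are selected by the ASPEN and LSPEN bits of the FPCCR register. The following is only a minimal sketch of the bit manipulation, written against a plain variable so that it compiles anywhere. The enum and function names are made up for illustration; on a real Cortex-M4F the value would be read from, and written back to, FPCCR at 0xE000EF34.

```c
#include <stdint.h>

/* Bits of FPCCR, the Floating-Point Context Control Register
 * (memory mapped at 0xE000EF34 on the Cortex-M4F). */
#define FPCCR_ASPEN ((uint32_t)1u << 31) /* automatic FP state preservation */
#define FPCCR_LSPEN ((uint32_t)1u << 30) /* lazy FP state preservation */

/* Hypothetical names for the three configurations discussed above. */
typedef enum {
    FPU_SAVE_OFF,    /* hardware stacks no FP registers; software does it all */
    FPU_SAVE_ALWAYS, /* s0-s15 and FPSCR are pushed on every exception entry
                        once the interrupted code has used the FPU */
    FPU_SAVE_LAZY    /* space is reserved, the push is deferred until the
                        handler executes an FP instruction (reset default) */
} fpu_save_mode_t;

/* Return the FPCCR value with ASPEN/LSPEN set for the requested mode.
 * All other bits of the input value are left untouched. */
uint32_t fpccr_for_mode(uint32_t fpccr, fpu_save_mode_t mode)
{
    fpccr &= ~(FPCCR_ASPEN | FPCCR_LSPEN);

    switch (mode) {
    case FPU_SAVE_ALWAYS:
        fpccr |= FPCCR_ASPEN;
        break;
    case FPU_SAVE_LAZY:
        fpccr |= FPCCR_ASPEN | FPCCR_LSPEN;
        break;
    case FPU_SAVE_OFF:
    default:
        break;
    }
    return fpccr;
}
```

Either way, the hardware only ever stacks s0-s15 and FPSCR; s16-s31 remain the job of the context-switch code.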

Yet another option is to perform a software lazy save.

Etc. Etc.

Also a word of warning - take extreme care to set up your compiler such that it does not randomly use FPU registers as temporary registers in tasks that are not themselves using the FPU.  Some do that, unless special non default command line options are used.

Have fun.

Regards.

anonymous wrote on Friday, November 25, 2011:

Hi again!

I think I got my port up and running… please find it here:

  https://github.com/thomask77/FreeRTOS_ARM_CM4F

Before I started, I did some performance measurements. As you said, the time for a full FPU state save/restore is quite long. A pair of vpush {s0-s31}/vpop {s0-s31} takes around 400ns on my STM32F407 @ 168MHz.

On the other hand, that translates to just ~68 cycles, which is not that bad at all if you consider the overall performance gain of the FPU vs. software emulation.

Still, I don’t want to have the performance hit for things like serial-port or motor-control interrupts. So I’ll leave the hardware lazy-save mode enabled.

Without an OS switching tasks, the CPU will just do the right thing anyways:

  The AAPCS says that s0-s15 are used as scratch registers, so they’re automatically (lazy)-saved on exception entry. s16-s31 are saved by the compiler. There is a performance hit of ~200ns for entry/exit if the lazy save is actually triggered. For interrupts without FPU instructions there is no additional overhead.

The only time when all registers must be saved and restored is for a task switch. This will take about 400ns longer than without FPU.

I added the extended stack frame registers to pxPortInitialiseStack, vPortSVCHandler and xPortPendSVHandler. Additionally, vPortSVCHandler marks the stack frame as an extended frame (Bit 4, LR/EXC_RETURN value).
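
To put numbers on the extra stack space, here is a rough sketch of the per-task context size with and without the FPU registers. It assumes the usual Cortex-M split between the frame stacked by hardware on exception entry and the registers pushed by the context-switch code, and it ignores the optional alignment padding word. The names are illustrative and not taken from the port.

```c
#include <stdint.h>

enum {
    HW_BASIC_FRAME_WORDS = 8,  /* r0-r3, r12, lr, pc, xPSR (stacked by hardware) */
    HW_FP_EXTRA_WORDS    = 18, /* s0-s15, FPSCR and one reserved word */
    SW_CORE_WORDS        = 8,  /* r4-r11 (pushed by the PendSV handler) */
    SW_FP_WORDS          = 16  /* s16-s31 (pushed by the PendSV handler) */
};

/* Bytes of stack needed to hold one saved task context. */
uint32_t context_bytes(int with_fpu)
{
    uint32_t words = HW_BASIC_FRAME_WORDS + SW_CORE_WORDS;

    if (with_fpu) {
        words += HW_FP_EXTRA_WORDS + SW_FP_WORDS;
    }
    return words * 4u;
}
```

With these assumptions a plain Cortex-M3 style context is 64 bytes and a full FPU context is 200 bytes, i.e. 136 bytes more (32 FPU registers, FPSCR and one reserved word).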

I must warn that the code is _not_ yet fully tested! Use at your own risk!

Have fun,
Thomas Kindler <mail_cm4@t-kindler.de>

anonymous wrote on Wednesday, November 30, 2011:

Hi!

In the meantime, I’ve improved my port. Actually, it was simpler than I thought… compared to the normal Cortex-M3 port, very few additions were required.

  https://github.com/thomask77/FreeRTOS_ARM_CM4F

Here’s the README:

This is the second version of my FreeRTOS port for ARM Cortex M4 cores with FPU support.

It does now support both FPU and non-FPU tasks, and tries to only save the necessary registers.

To achieve this, the EXC_RETURN value of a task (stored in the LR register during exceptions, esp. the PendSV handler) is saved on its stack. The FPU registers are saved and restored only if bit 4 of the EXC_RETURN value indicates an extended stack frame.

See the ARM architecture manual, B1-653 for more details.

If a task uses the FPU, it will automatically set the CONTROL.FPCA bit. No special user interaction or task registration is required.

This port is also fully compatible with the FPU lazy-save feature (which is enabled by default).
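
The bit-4 test described above can be sketched in C as follows. This is only an illustration of the check, not code from the port (which does it in assembly inside the PendSV handler), and the function name is hypothetical. Note that the bit is active low: 0 means an extended (FPU) frame, 1 means a basic frame.

```c
#include <stdint.h>

/* Bit 4 of EXC_RETURN: 1 = basic frame, 0 = extended frame with FP state. */
#define EXC_RETURN_BASIC_FRAME_BIT ((uint32_t)1u << 4)

/* Returns non-zero if the exception frame described by exc_return
 * contains FPU state, i.e. the interrupted task has used the FPU. */
int exc_return_has_fp_frame(uint32_t exc_return)
{
    return (exc_return & EXC_RETURN_BASIC_FRAME_BIT) == 0u;
}
```

For example, 0xFFFFFFFD (return to thread mode on the process stack, basic frame) gives 0, while 0xFFFFFFED (same, but extended frame) gives 1.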

Have fun,
Thomas Kindler <mail_cm4@t-kindler.de>

glen19 wrote on Thursday, December 15, 2011:

Hi and thanks for your work Thomas!
When I try to use your port.c in an Eclipse/Yagarto environment I run into problems …

'Building file: ../FreeRTOS/portable/port.c'
'Invoking: ARM Yagarto Windows GCC C Compiler'
arm-none-eabi-gcc -DUSE_STDPERIPH_DRIVER -DUSE_STM32F4_DISCOVERY -DSTM32F4XX -I"E:\INDIGO\YAG-FreeRTOS-123\FreeRTOS\include" -I"E:\INDIGO\YAG-FreeRTOS-123\Libraries\STM32F4xx_StdPeriph_Driver\src" -I"E:\INDIGO\YAG-FreeRTOS-123\FreeRTOS\portable" -I"E:\INDIGO\YAG-FreeRTOS-123\Libraries\CMSIS\Include" -I"E:\INDIGO\YAG-FreeRTOS-123\Libraries\Device\STM32F4xx\Include" -I"E:\INDIGO\YAG-FreeRTOS-123\Libraries\STM32F4xx_StdPeriph_Driver\inc" -I"E:\INDIGO\YAG-FreeRTOS-123\src" -I"E:\INDIGO\YAG-FreeRTOS-123\Utilities" -O0 -Wall -Wa,-adhlns="FreeRTOS/portable/port.o.lst" -c -fmessage-length=0 -MMD -MP -MF"FreeRTOS/portable/port.d" -MT"FreeRTOS/portable/port.d" -mcpu=cortex-m4 -mthumb -g3 -gdwarf-2 -o "FreeRTOS/portable/port.o" "../FreeRTOS/portable/port.c"
C:\Users\GL\AppData\Local\Temp\cczJmjjA.s: Assembler messages:
C:\Users\GL\AppData\Local\Temp\cczJmjjA.s:389: Error: selected processor does not support Thumb mode `vstmdbeq r0!,{s16-s31}'
C:\Users\GL\AppData\Local\Temp\cczJmjjA.s:390: Error: instruction not allowed in IT block -- `stmdb r0!,{r14}'
C:\Users\GL\AppData\Local\Temp\cczJmjjA.s:406: Error: selected processor does not support Thumb mode `vldmiaeq r0!,{s16-s31}'
C:\Users\GL\AppData\Local\Temp\cczJmjjA.s:407: Error: instruction not allowed in IT block -- `ldmia r0!,{r4-r11}'
make: *** [FreeRTOS/portable/port.o] Error 1

What compiler version are you using? Which options do you pass to avoid problems like that?

Gregor

rtel wrote on Thursday, December 15, 2011:

Note that FreeRTOS V7.1.0 has two basic Cortex-M4F ports now, one for IAR and one for Keil.  GCC is the next on the hit list.

The errors seem to be telling you that GCC is not expecting floating point instructions to be present.  I have not tried using GCC with an M4F yet, but looking at your command line, and your output I would suggest that either you need to define the CPU as Cortex-M4F rather than just Cortex-M4 (not all Cortex-M4s have a floating point unit), or that you need to manually tell GCC that a hardware floating point unit is being used via a separate command line option.

That assumes the version of GCC you are using supports an M4F, of course.

Regards.

anonymous wrote on Wednesday, December 21, 2011:

Hi!

I’m using the codesourcery toolchain with the following options:

-mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp

Keep in mind that the FPU is single precision only. So you should use sqrtf() instead of sqrt() to prevent double precision emulation calls.

You should also try

-fsingle-precision-constant

To treat float literals as single precision. Otherwise, a term like  x = x * 0.123 will call a double precision library function (or write 0.123f, which I find quite awkward).
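
A small sketch of why the literal matters: in standard C (without -fsingle-precision-constant) an unsuffixed literal such as 0.123 has type double, so the whole multiplication is promoted to double. The helper functions below are made up for illustration; they only report the size of the type the expression is evaluated in.

```c
#include <stddef.h>

/* x * 0.123 has type double: the float operand is promoted, and on a
 * single-precision FPU the multiply becomes a library call. */
size_t product_size_double_literal(float x)
{
    return sizeof(x * 0.123);
}

/* x * 0.123f stays in float and can use the FPU directly. */
size_t product_size_float_literal(float x)
{
    return sizeof(x * 0.123f);
}

/* The single-precision version of the example expression. */
float scale(float x)
{
    return x * 0.123f;
}
```

The same reasoning applies to sqrt() versus sqrtf(): the former takes and returns double.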

have fun!

cd334 wrote on Tuesday, January 03, 2012:

Hi!

I have tried out your Cortex-M4F port ver. 0.2 with the STM32F4-Discovery board.
I use Mentor CodeSourcery Lite GCC compiler (2011.09-69-arm-none-eabi).

I can compile your code, with the compiler flags:

-mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=softfp

But I have a problem with the xTaskCreate function.
My program hangs in this function.

My program slice:

portBASE_TYPE task_create_LED;
task_create_LED = xTaskCreate( prvLEDTask, ( signed char * ) "Led", configMINIMAL_STACK_SIZE, NULL, mainLED_TASK_PRIORITY, NULL );
if (task_create_LED == pdPASS) printf("    LED Task Created!\r\n");
else printf("    LED Task Create FAILED! Err. Code: %ld!\r\n", (long) task_create_LED);

configMINIMAL_STACK_SIZE is 256
mainLED_TASK_PRIORITY is ( tskIDLE_PRIORITY + 1 )

If I look deeper with an SWD debugger, the program hangs in the xTaskGenericCreate function in tasks.c, at this line:

/* Check the alignment of the initialised stack. */
		portALIGNMENT_ASSERT_pxCurrentTCB( ( ( ( unsigned long ) pxNewTCB->pxTopOfStack & ( unsigned long ) portBYTE_ALIGNMENT_MASK ) == 0UL ) );

Can you look deeper into your code? With the official Cortex-M3 port, without the FPU, it works well.

Best Regards!
cd334

cd334 wrote on Tuesday, January 03, 2012:

Hi!

I have forgotten:
If I can help (further setup, makefile, code or something else important), please write to me.
I will send you my details.

Thank you!

Best Regards!
cd334

rtel wrote on Wednesday, January 04, 2012:

I know the port you are using is not the official FreeRTOS port, but I think if you update to the FreeRTOS V7.1.0 code (and use the same contributed port layer as you are now), then you might find the problem doesn’t exist.

To know if there really is a problem, set a break point on entry to a task (before the task function prologue assembly code manipulates the stack pointer to create a stack frame for the task function), then check to see if the stack pointer is 8 byte aligned.
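
The alignment test itself is just a mask operation, the same idea as the portBYTE_ALIGNMENT_MASK check in the assert quoted earlier. Below is a minimal sketch with hypothetical names, assuming an 8-byte requirement; the second helper shows how a stack top can be rounded down to the next aligned address.

```c
#include <stdint.h>

#define ALIGNMENT_BYTES 8u
#define ALIGNMENT_MASK  (ALIGNMENT_BYTES - 1u) /* 0x7 for 8-byte alignment */

/* Non-zero if sp is a multiple of ALIGNMENT_BYTES. */
int stack_pointer_is_aligned(uintptr_t sp)
{
    return (sp & ALIGNMENT_MASK) == 0u;
}

/* Round sp down to the nearest aligned address (stacks grow downwards
 * on the Cortex-M, so rounding down stays inside the allocated stack). */
uintptr_t align_stack_down(uintptr_t sp)
{
    return sp & ~(uintptr_t)ALIGNMENT_MASK;
}
```

In the debugger, a stack pointer ending in 0x0 or 0x8 is fine; one ending in 0x4 or 0xC is only 4-byte aligned.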

Regards.

cd334 wrote on Wednesday, January 04, 2012:

Hi!

Thank you for your help!

I know that it is an unofficial Cortex-M4F port.
I am waiting for the official Cortex-M4F GCC port. When will you release it? :)
I had some free time, so I thought I would try out the FPU with FreeRTOS.
I use the latest V7.1.0 version of FreeRTOS.

I have checked what you suggested, and yes, the stack pointer is not 8-byte aligned.
When I set the following in the new portmacro.h

#define portBYTE_ALIGNMENT			4

the unofficial port works with FPU.

What is the significance of the alignment setting? What happens if I leave it at 4? Or must it be 8 on the STM32F407?

Best Regards!
cd334

rtel wrote on Wednesday, January 04, 2012:

“What is the significance of the alignment setting? What happens if I leave it at 4? Or must it be 8 on the STM32F407?”

You probably won’t notice any problems with it at four until you use 64 bit numbers, or use a library function that makes assumptions about how 64 bit numbers are stored.  The most common symptom is getting an incorrect value for a printf() with a floating point modifier.

Regards.

anonymous wrote on Thursday, January 19, 2012:

Hi!

I just uploaded a (really) minimal demo project for my port:

  https://github.com/thomask77/STM32F4_demo

have fun,
Thomas Kindler

zbrozek wrote on Wednesday, January 25, 2012:

Is there any word on when/if this unofficial port will be made official? Or if there will be an official port sometime in the nearish future?

Cheers,
Sasha

johnsotack wrote on Tuesday, January 31, 2012:

If I run with a single task, the assert in vTaskSwitchContext() (shown further down) fails.  More specifically, if I remove the comments from the task create below, everything works.

void DebugUART::Start()
{
    // Init the Debug UART then start the task.
    SerialDebugUARTInit();
    // xTaskCreate( vDebugUARTOutputTask, (signed char *) "DebugUART", configMINIMAL_STACK_SIZE,
    //              NULL, mainDEBUG_UART_TASK_PRIORITY, &hDebugOutputTask );
}

My solution was simply to run with two tasks.  I don’t know if this is a bug or something I am doing wrong.  By the way, thank you for the port.  The STM32F4 series seems very nice in many respects, and it is nice to have a FreeRTOS port for it.

void vTaskSwitchContext( void )
{
        .
        .
        .
        while( listLIST_IS_EMPTY( &( pxReadyTasksLists[ uxTopReadyPriority ] ) ) )
        {
            configASSERT( uxTopReadyPriority );
            --uxTopReadyPriority;
        }

        /* listGET_OWNER_OF_NEXT_ENTRY walks through the list, so the tasks of the
        same priority get an equal share of the processor time. */
        listGET_OWNER_OF_NEXT_ENTRY( pxCurrentTCB, &( pxReadyTasksLists[ uxTopReadyPriority ] ) );

        traceTASK_SWITCHED_IN();
    }
}

John