I understand the context-switch mechanism for float (using the single-precision hardware and its associated lazy-save mechanism). Is there any mechanism for thread safety between tasks using double (software library), i.e., is this included in newlib context switching?
Thanks!
Best Regards, Dave
Grateful if you can provide more context. How does the compiler handle doubles if not using the floating point registers? If I know how doubles are represented and stored I should be able to answer the question.
Generally software based floating point doesn’t have that much of an issue unless the library uses static locations for temporaries. Most of the libraries I have seen (at least for ARM) use register pairs to store doubles (and spill to stack for longer storage), so these get saved as part of a normal context save. If a library did use static variables, the library would need to support putting them in ‘Thread Local’ storage. Newlib tends to be good at this.
Note that the low level library tends to be naturally register based, as the basic operations take one or two parameters in register pairs, and normally return a single result (also in a register pair). It would be ‘higher’ level operations that might want more storage, but for ARM, the stack is so much easier to use than static memory locations that, unless the library needed some storage between function calls, it won’t need static memory (and such a function would be unsafe even with hardware floating point).
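To make the static-versus-stack distinction concrete, here is a minimal sketch in C. The function names are hypothetical and are not taken from newlib or any real soft-float library; the point is only where the intermediate value lives.

```c
/* Hypothetical names throughout; not code from any real soft-float library. */

/* Non-reentrant: the intermediate lives in one static location shared by
 * every task. A context switch between the store and the load lets another
 * task that calls this function overwrite it. */
static double scratch;

double scale_add_static(double x, double k, double c)
{
    scratch = x * k;        /* shared storage: not task-safe */
    return scratch + c;
}

/* Reentrant: the intermediate is a local, so it lives in registers or on the
 * calling task's own stack and is preserved by an ordinary context save. */
double scale_add_local(double x, double k, double c)
{
    double tmp = x * k;     /* per-call storage: task-safe */
    return tmp + c;
}
```

Both functions give the same answer when only one task calls them; the difference only shows up under preemption, which is why this kind of problem is easy to miss in testing.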
@Richard-Barry If I knew I wouldn’t be asking
Background: Other platforms/tool-chains emulate the FP hardware. For example, an 8086 tool-chain I used generated FP instructions and emulated the FP coprocessor. Consequently, thread-safety for FP required saving and restoring the entire emulated coprocessor context. I wrote an OS for such a beast, and it was too expensive to do the save-restore, but I could at least swap the trap vectors to fault if any thread other than the desired one used float.
I don’t know what GCC for M4F does about doubles (float, i.e. single-precision FP, uses the hardware FPU, and the context switch does save/restore and even permits FP in ISRs). Doubles might be handled entirely on the stack, in which case there’s nothing to be done to provide thread-safety for doubles.
Did I adequately explain my confusion?
Thanks Richard!
Best Regards, Dave
@richard-damon - Looks like we were typing at the same time. Do you have any idea how GCC8 for M4F supports double?
Thanks!
Best Regards, Dave
Yes, the x86 is register poor, and thus a floating point emulator library likely wants to implement a floating point stack in memory (the x86 also addresses absolute memory locations more easily than the ARM). Such an implementation needs that stack either to be saved on a context switch or, better, to be made part of the thread local storage. The ARM was designed with many more registers, so it tends to use them for the floating point. ARM toolchains also don’t try to do it via an ‘emulation’ layer, where the code acts like it has a co-processor but the co-processor is actually just emulated. Instead you need to compile for the targeted processor, so the compiler knows whether you have the floating point hardware or not. This does mean that if you move the code to a machine with floating point hardware and run it, it won’t use the hardware unless you recompile for that processor.
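As a rough illustration of why an emulated coprocessor needs its state saved per task, the following sketch models a memory-resident FP stack and the save/restore a scheduler would have to do. Every type and function name here is made up for the example; it is not how any particular emulator or FreeRTOS port is written.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical emulated coprocessor state: an operand stack plus a mode word. */
typedef struct {
    double stack[8];
    int    top;
    int    rounding_mode;
} EmuFpState;

/* The single "live" state the emulator works on, like real coprocessor registers. */
static EmuFpState live_fp;

/* Hypothetical task record: the saved FP state is part of the task context. */
typedef struct {
    EmuFpState saved_fp;
} Task;

/* FP part of a context switch: save the outgoing state, restore the incoming one. */
void switch_fp_context(Task *from, Task *to)
{
    assert(from != NULL && to != NULL);
    from->saved_fp = live_fp;
    live_fp = to->saved_fp;
}

void emu_push(double v)
{
    assert(live_fp.top < 8);
    live_fp.stack[live_fp.top++] = v;
}

double emu_pop(void)
{
    assert(live_fp.top > 0);
    return live_fp.stack[--live_fp.top];
}
```

The structure copy on every switch is the cost being described: it grows with the size of the emulated state, whereas a register-pair soft-float library has nothing extra to copy.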
I can’t say I know for certain, but I do believe it is register based. A good check would be to step through a bit of code that does double arithmetic and see what code was generated.
I would be very surprised if the compiler did something you had to worry about when using double precision. Only once, in the 40+ architectures ported to, have we ever seen global statics being used to hold temporary values during mathematical calculations, and this was on a tiny 8-bit processor that never expected to be using multithreading. If you can post the assembly code generated for a simple double floating point calculation then we could determine for sure.
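A small probe function along these lines could serve as that simple calculation. The file and function names are hypothetical. Assuming an arm-none-eabi GCC toolchain, an invocation such as `arm-none-eabi-gcc -mcpu=cortex-m4 -mthumb -mfpu=fpv4-sp-d16 -mfloat-abi=hard -O0 -S probe.c` should emit the assembly for inspection, though the exact flags depend on your project.

```c
/* probe.c: hypothetical file and function names, for inspection only. */

/* volatile keeps the compiler from folding the arithmetic away at higher
 * optimization levels, so the add, subtract and compare stay in the output. */
int double_probe(void)
{
    volatile double a = 1.5;
    volatile double b = 2.5;

    /* On a target without double-precision hardware, expect a call into a
     * runtime helper here rather than an inline FP instruction. */
    double c = a + b;

    return (c - a < 1.5) ? 1 : 2;
}
```

What to look for in the output is whether the operands travel in general purpose register pairs and on the stack, or whether any fixed memory location is read or written.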
As Richard said, the model used by e.g. GCC does not emulate dedicated co-processor hardware but does the (double) math in software. Apart from the runtime hit, this might require a bit more stack.
I also think that the soft-float algorithms are non-recursive, or at least bounded, to avoid surprises regarding stack usage, and I’m convinced that the math routines are not stateful. I can’t see any need for that, either.
In short, using software (double) FP these days is a no-brainer regarding OS / context switching, even if it’s mixed with (partial) HW-supported FP math.
All the results below are on the STM32L475 with ARM GCC compiler. If I use float, the following code is generated:
00000000 <StartDefaultTask>:
0: b580 push {r7, lr}
2: b086 sub sp, #24
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: f04f 537f mov.w r3, #1069547520 ; 0x3fc00000
c: 617b str r3, [r7, #20]
e: 4b0f ldr r3, [pc, #60] ; (4c <StartDefaultTask+0x4c>)
10: 613b str r3, [r7, #16]
12: ed97 7a05 vldr s14, [r7, #20]
16: edd7 7a04 vldr s15, [r7, #16]
1a: ee77 7a27 vadd.f32 s15, s14, s15
1e: edc7 7a03 vstr s15, [r7, #12]
22: ed97 7a03 vldr s14, [r7, #12]
26: edd7 7a05 vldr s15, [r7, #20]
2a: ee77 7a67 vsub.f32 s15, s14, s15
2e: eeb7 7a08 vmov.f32 s14, #120 ; 0x3fc00000 1.5
32: eef4 7ac7 vcmpe.f32 s15, s14
36: eef1 fa10 vmrs APSR_nzcv, fpscr
3a: d503 bpl.n 44 <StartDefaultTask+0x44>
3c: 2001 movs r0, #1
3e: f7ff fffe bl 0 <osDelay>
42: e7e6 b.n 12 <StartDefaultTask+0x12>
44: 2002 movs r0, #2
46: f7ff fffe bl 0 <osDelay>
4a: e7e2 b.n 12 <StartDefaultTask+0x12>
4c: 40200000 .word 0x40200000
The above code uses the stack, general purpose registers and floating point registers. Both sets of registers are saved and restored on a context switch, so there should be no problem.
If I use double, the following code is generated:
00000000 <StartDefaultTask>:
0: b590 push {r4, r7, lr}
2: b089 sub sp, #36 ; 0x24
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: f04f 0300 mov.w r3, #0
c: 4c15 ldr r4, [pc, #84] ; (64 <StartDefaultTask+0x64>)
e: e9c7 3406 strd r3, r4, [r7, #24]
12: f04f 0300 mov.w r3, #0
16: 4c14 ldr r4, [pc, #80] ; (68 <StartDefaultTask+0x68>)
18: e9c7 3404 strd r3, r4, [r7, #16]
1c: e9d7 2304 ldrd r2, r3, [r7, #16]
20: e9d7 0106 ldrd r0, r1, [r7, #24]
24: f7ff fffe bl 0 <__aeabi_dadd>
28: 4603 mov r3, r0
2a: 460c mov r4, r1
2c: e9c7 3402 strd r3, r4, [r7, #8]
30: e9d7 2306 ldrd r2, r3, [r7, #24]
34: e9d7 0102 ldrd r0, r1, [r7, #8]
38: f7ff fffe bl 0 <__aeabi_dsub>
3c: 4603 mov r3, r0
3e: 460c mov r4, r1
40: 4618 mov r0, r3
42: 4621 mov r1, r4
44: f04f 0200 mov.w r2, #0
48: 4b06 ldr r3, [pc, #24] ; (64 <StartDefaultTask+0x64>)
4a: f7ff fffe bl 0 <__aeabi_dcmplt>
4e: 4603 mov r3, r0
50: 2b00 cmp r3, #0
52: d003 beq.n 5c <StartDefaultTask+0x5c>
54: 2001 movs r0, #1
56: f7ff fffe bl 0 <osDelay>
5a: e7df b.n 1c <StartDefaultTask+0x1c>
5c: 2002 movs r0, #2
5e: f7ff fffe bl 0 <osDelay>
62: e7db b.n 1c <StartDefaultTask+0x1c>
64: 3ff80000 .word 0x3ff80000
68: 40040000 .word 0x40040000
The above code uses the stack, general purpose registers and functions like __aeabi_dadd and __aeabi_dsub. I traced the definition of these functions as well and they also use only the stack and general purpose registers. So the context switch code will work in this case as well.
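For anyone reading the double listing, the two .word values at its end are the high halves of the operands, and each double travels as a low/high pair of 32-bit registers. The helper names below are hypothetical, and the sketch assumes an IEEE 754 64-bit double, which is what the ARM EABI uses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: split a double into the two 32-bit words that a
 * register pair such as r0/r1 would carry, and rebuild it again. */
uint32_t double_high_word(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);   /* well-defined way to view the bits */
    return (uint32_t)(bits >> 32);
}

uint32_t double_low_word(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu);
}

double double_from_words(uint32_t lo, uint32_t hi)
{
    uint64_t bits = ((uint64_t)hi << 32) | lo;
    double d;
    assert(sizeof d == sizeof bits);
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

With a zero low word, 0x3ff80000 decodes to 1.5 and 0x40040000 to 2.5, which matches the `mov.w r3, #0` plus `ldr r4, [pc, ...]` pairs in the listing.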
As Richard mentioned, if you can share the assembly generated for your platform, we can be doubly sure.
Thanks.
Right, but the problem is that the language definition (well, in some C/C++ corners and variants) assumes a stateful floating-point processor. This is to control things like rounding modes, and also to manage floating-point exceptions. See <fenv.h>. Now, for wee Arms, it looks like GCC punts for at least the soft float implementations; punting is apparently permitted by the standard. I have no idea what happens when you’ve got hardware float and software double; I don’t know if the standard addresses this. Here’s some recent discussion on the newlib mailing list about fenv support. Perhaps it’s best to avoid dark corners.
Thanks for the explanations!
Best Regards, Dave
I think it’s rather a library/POSIX thing than the core language.
And I fully agree, just try to avoid the dark corners.
Usually you can mix float and double math with just single precision FP HW support w/o any problem.
The language also allows the implementation to ‘punt’ on most of those features, as long as it documents that (and the standard only somewhat recently even acknowledged ‘multi-threaded’ code). Yes, to support those features, the memory that stores those options needs to be ‘Thread Local’ (which FreeRTOS supports) or, for hardware, you would need to save that hardware state as part of the task context (increasing the overhead of using Hardware Floating Point in a task).