David - love your post - I guess that makes me a geek ;o) If C pedantry
is a hobby then I recommend trying LLVM, if you are not already a user.
You do however paint a picture of a perfect world. There was one
particular platform I used that had two huge compiler bugs. I forget
exactly which it was but it might have been a GCC IA32 COFF compiler
(although maybe not as alignment was one of the issues).
One of the issues was that __attribute__((packed)) didn't work. It didn't
generate any compiler errors or warnings, it just didn't work. Took a
while to track down as I was compiling code I trusted as it had run on
my architectures.
Relevant to the post though was that memcpy() was broken and caused
faults due to alignment. Also took time to track down as it was assumed
to be a good implementation.
Had a really hard time fixing that, hence Hein's references to the
compiler options that prevent GCC using its inlined version. We
provided our own implementation, but the only way we could get GCC to
use our own in all cases was to explicitly tell it not to use its own
built-in equivalent. I also learned then that it was particularly
clever in noticing the signature of the memcpy() function - even when we
called ours something else it would sometimes replace it with its own
version. Grr.
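As an illustration of the situation described above, here is a minimal sketch of a hand-written byte-wise copy. The name `app_memcpy` is hypothetical and this is not the implementation we actually shipped. The GCC options named in the comment do exist, but whether they are the ones Hein referred to is an assumption, and their effect should be confirmed by inspecting the generated assembly for your toolchain version.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical replacement copy routine. It only ever performs byte
 * accesses in the source, so the C code itself has no alignment needs.
 *
 * GCC may still recognise this loop as a copy idiom and emit a call to
 * memcpy() instead. Options that are commonly used to discourage that:
 *   -fno-builtin-memcpy                  (do not treat memcpy as a builtin)
 *   -fno-tree-loop-distribute-patterns   (do not turn loops into library calls)
 */
void *app_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* A zero-length copy is allowed; otherwise both pointers must be valid. */
    assert(len == 0u || (dst != NULL && src != NULL));

    while (len > 0u) {
        *d++ = *s++;
        len--;
    }
    return dst;
}
```

Byte-wise copying is slow, but it is a useful baseline when the library version is under suspicion.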
Sometimes I wonder if, considering the laws of diminishing returns,
extracting the last n'th of optimization by pedantic scrutiny of the C
standard [by compiler vendors] is worth the hassle it causes.
I also found incorrect inlining behaviour for both memcpy() and memset() in the GCC AVR32 cross compiler. I must admit that it was a couple of years ago (time flies). My application called the functions with a constant length, a multiple of 4. The compiler inserted 32-bit memory moves. When I saw the bogus assembly code (sh…t!), I decided to use the mentioned compiler options.
I hope that this bug has been solved in all current releases of all compilers that we use to test the /Labs demos.
There was another case where the standard GCC implementation of memcpy() caused headaches: on Cortex-A9 (Richard mentioned it briefly above). That implementation uses 64-bit floating point registers. But by default, the FPU registers are not stored on the stack as part of the task context (because storing FPU registers on the stack is very expensive). At that point we decided to supply a memcpy.c that only uses standard registers.
(Before writing memcpy.c, I studied the assembler sources of many GCC implementations.)
I was not suggesting that the generic /Labs implementation in memcpy.c is "better", or "to be preferred". I wrote that if the standard memcpy() is suspect, give it a try with this hand-made memcpy(), to see if the problem gets solved.
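To make the idea of a hand-made copy routine concrete, here is a rough sketch of one that uses 32-bit word accesses only when both pointers are 4-byte aligned, and bytes otherwise. It is not the /Labs memcpy.c; the names `plain_memcpy` and `word_t` are made up for this example. The `may_alias` attribute is a GCC/Clang extension. Nothing in the C source forces the compiler to stay away from FPU or SIMD registers, so that part still has to be checked in the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GCC/Clang extension: a 32-bit type that is allowed to alias any object. */
typedef uint32_t __attribute__((may_alias)) word_t;

/* Hypothetical alignment-aware copy routine (illustrative only). */
void *plain_memcpy(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(len == 0u || (dst != NULL && src != NULL));

    /* Use word accesses only if BOTH pointers are 4-byte aligned. */
    if ((((uintptr_t)d | (uintptr_t)s) & (sizeof(word_t) - 1u)) == 0u) {
        word_t *dw = (word_t *)d;
        const word_t *sw = (const word_t *)s;

        while (len >= sizeof(word_t)) {
            *dw++ = *sw++;
            len -= sizeof(word_t);
        }
        d = (unsigned char *)dw;
        s = (const unsigned char *)sw;
    }

    /* Copy the remaining tail, or everything if the pointers were unaligned. */
    while (len > 0u) {
        *d++ = *s++;
        len--;
    }
    return dst;
}
```

The point of the alignment test is that a constant length that is a multiple of 4 says nothing about where the buffers start, which is exactly what went wrong in the AVR32 case.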
In fact, the GCC implementation of memcpy is also port specific.
A (cross) GCC port has to make assumptions regarding the target runtime environment/EABI.
So the GCC port used e.g. for a bare metal Cortex-A9 application was obviously built with the assumption (and GCC build config) that it's allowed to use FP registers, e.g. in memcpy, which might not be compatible with a specific runtime environment (e.g. FreeRTOS) that the GCC toolchain was not really built for. GCC bug or feature?
As always, things get difficult when diggin' into details. Sometimes it's worth checking the compiler's built-in config before being trapped by those pretty nasty compatibility issues.
For sure there were and are bugs even in compilers. It's just software, but luckily very, very, very well tested. In my experience (nowadays), compiler bugs reported by people are quite often user code bugs or misunderstandings of the C/C++ standard, and sometimes target compatibility issues.
Yes, I understand what you mean about compiler bugs and perfect worlds. Compiler bugs can be a real pain! It is particularly difficult for you folks writing something like FreeRTOS where you need to have code that works on a wide range of platforms - it is a lot easier for the user who only needs it to work on their particular compiler version and target. I make a point of considering the exact toolchain version as part of the build for a project - once I have used a version of gcc + libraries (or whatever other compiler), that is archived along with the project. Clearly that is a luxury you don't have for the FreeRTOS source that works on dozens of targets and many more toolchains and versions.
Correctness always trumps speed. (Of course I am generalising here, and speed may also be part of making the device run correctly according to requirements.) It is better to have a simple but known safe solution, than a complex one that might be faster, but may have problems.
It is also not easy writing optimal code, and not easy writing portable code - and pretty much impossible to write optimal portable code! On gcc, memcpy() is usually very efficient, especially when it can use the builtin version. But messing with cast pointers to move data in larger chunks can easily fall foul of aliasing rules and end up with code that the compiler happily accepts but the results don't match expectations. On other platforms, such manual casting might be the fastest method while memcpy() gives slow library calls. There really is no easy answer.
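A small sketch of the aliasing point, with made-up helper names `load_u32` and `store_u32`: reading a 32-bit value out of a byte buffer through a cast pointer breaks the aliasing rules (and may be misaligned), while going through memcpy() is well defined. On gcc with the builtin enabled this typically compiles down to a plain load or store where the target allows it, but that is something to confirm in the assembly rather than assume.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Risky version, shown only as a comment:
 *     uint32_t v = *(const uint32_t *)buf;
 * This accesses unsigned char storage through a uint32_t lvalue and
 * assumes buf is suitably aligned.
 */

/* Well-defined read of a 32-bit value from any byte address. */
static uint32_t load_u32(const unsigned char *buf)
{
    uint32_t v;

    assert(buf != NULL);
    memcpy(&v, buf, sizeof v);
    return v;
}

/* Well-defined write of a 32-bit value to any byte address. */
static void store_u32(unsigned char *buf, uint32_t v)
{
    assert(buf != NULL);
    memcpy(buf, &v, sizeof v);
}
```

Both helpers use the native byte order of the target, so a value written with store_u32() reads back unchanged with load_u32() on the same machine.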
When considering compiler options, you might like to recommend "-fno-strict-aliasing" in gcc. That makes it safe to use pointers of different sizes and types to access other data, at the cost of some optimisations.
(Regarding __attribute__((packed)) - I am not a fan of this, and prefer to avoid it, partly because compiler bugs have been known with packed structs. I'd rather add any padding manually with "dummy" fields, check none are missing with "-Wpadded", and check the size of types using static assertions. Again, this is easier when you don't need to try to be portable on a range of targets.)
I've only used the AVR32 briefly, but I think it did not allow unaligned accesses. So if memcpy() is doing that, it's a bug. And compiler bugs are a pain.
As for using things like floating point registers, that is a more tricky area. I can well appreciate that it can be a headache - but it is not a compiler bug as such. The compiler's job is to generate code for the cpu. If that cpu has floating point registers, then it can use them. You might be able to control things to some extent, such as with compiler options, but it is very difficult to try to say "I want to use floating point registers for these things but not those things." That should usually be a choice left to the compiler. Sometimes non-floating point code can be made more efficient using floating point registers. Similar issues can apply to other bits of hardware - the msp430's hardware multiplier peripheral springs to mind.
For the ARM devices, lazy context switching of the floating point registers can be an answer. (Yes, I know that's easy for me to say, and far from easy to implement - especially in a way that avoids too much wasted ram space.)
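To sketch what manual padding with static assertions can look like: the struct `sensor_frame` and its fields are invented for this example, and the expected offsets assume a target where uint32_t needs at most 4-byte alignment. Building with -Wpadded should then stay quiet for this type, since no padding is left for the compiler to insert.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical record layout with every padding byte written out. */
struct sensor_frame {
    uint8_t  id;
    uint8_t  pad0[3];    /* explicit padding up to the 32-bit field */
    uint32_t timestamp;
    uint16_t value;
    uint8_t  pad1[2];    /* explicit tail padding to a multiple of 4 */
};

/* Fail the build if the compiler's layout differs from the intended one. */
static_assert(offsetof(struct sensor_frame, timestamp) == 4,
              "timestamp must start at byte 4");
static_assert(offsetof(struct sensor_frame, value) == 8,
              "value must start at byte 8");
static_assert(sizeof(struct sensor_frame) == 12,
              "sensor_frame must be exactly 12 bytes");
```

If a field is added or reordered later, the assertions turn a silent layout change into a compile error.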