memcpy NEON support for Cyclone V based architectures built with Altera Quartus 16.1

stefan-bat-mv wrote on Thursday, March 16, 2017:

Hi,

Are there any plans for FreeRTOS to support a NEON-optimized version of memcpy? Or does this already work and I am just not using it correctly? When building with certain compiler flags for the above architecture, memcpy uses NEON internally. Since FreeRTOS code runs in interrupts as well as during task switches, memcpy can be interrupted, and the data held in certain registers is then lost or overwritten:

-mfpu=neon
memcpy() will use 4*32-byte NEON operations
VLD and VST
VLD1.8 {d0,d1,d2,d3}, [r1]!
VLD1.8 {d4,d5,d6,d7}, [r1]!
VST1.8 {d0,d1,d2,d3}, [r0@128]!
VLD1.8 {d0,d1,d2,d3}, [r1]!
VST1.8 {d4,d5,d6,d7}, [r0@128]!
VLD1.8 {d4,d5,d6,d7}, [r1]!
VST1.8 {d0,d1,d2,d3}, [r0@128]!
VST1.8 {d4,d5,d6,d7}, [r0@128]!

However, none of the registers d0-d7 seem to be saved when a task switch occurs…
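The failure described above can be reproduced in a toy model. The following is only a sketch in portable C, not real NEON code: `neon_bank_t`, `preempt_fn` and the helper names are hypothetical, and the "registers" are an ordinary struct. It shows why a copy that stages data in d0-d7 is corrupted when the preempting code uses those registers without saving them, and why saving and restoring the bank fixes it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model of the d0-d7 part of the NEON register bank (8 x 64 bit). */
typedef struct { uint64_t d[8]; } neon_bank_t;

/* Hypothetical hook standing in for an interrupt or task switch that
 * fires between the load and store phases of the copy. */
typedef void (*preempt_fn)(neon_bank_t *regs);

/* Copies one 64-byte block through the modelled registers, the way the
 * VLD1/VST1 pairs in the listing do, with a preemption point in between. */
void copy64_via_bank(uint8_t *dst, const uint8_t *src,
                     neon_bank_t *regs, preempt_fn preempt)
{
    assert(dst && src && regs);
    memcpy(regs->d, src, 64);       /* stands in for the VLD1.8 loads  */
    if (preempt) preempt(regs);     /* interrupt or task switch here   */
    memcpy(dst, regs->d, 64);       /* stands in for the VST1.8 stores */
}

/* Preempting code that uses the registers without saving them. */
void clobbering_preempt(neon_bank_t *regs)
{
    memset(regs->d, 0xAA, sizeof regs->d);
}

/* Preempting code that saves and restores the bank around its own use. */
void saving_preempt(neon_bank_t *regs)
{
    neon_bank_t saved = *regs;              /* save    */
    memset(regs->d, 0xAA, sizeof regs->d);  /* own use */
    *regs = saved;                          /* restore */
}

/* Returns 1 if a 64-byte copy survives the given preemption intact. */
int copy_is_intact(preempt_fn preempt)
{
    uint8_t src[64], dst[64];
    neon_bank_t regs;
    for (int i = 0; i < 64; i++) src[i] = (uint8_t)i;
    memset(dst, 0, sizeof dst);
    copy64_via_bank(dst, src, &regs, preempt);
    return memcmp(dst, src, 64) == 0;
}
```

In this model `copy_is_intact(clobbering_preempt)` reports a corrupted copy, while `copy_is_intact(saving_preempt)` and an uninterrupted copy both succeed.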

I am not entirely sure if I explained this correctly, so please have mercy if I didn’t. I can provide additional information.

Regards,

Stefan

heinbali01 wrote on Thursday, March 16, 2017:

Hi Stefan, I’m not sure if this recent topic is useful?

rtel wrote on Thursday, March 16, 2017:

> are there any plans for FreeRTOS to support a NEON optimized version of
> memcpy?

It already does. I’m not sure if this was in FreeRTOS V9.0.0 but it’s
definitely in V9.0.1 (which is only tagged in SVN, rather than provided
as a .zip file). In that version you have the choice to waste CPU
cycles and RAM by giving every task a floating point context, and
likewise for every nested interrupt - an overhead that outweighs any
benefit that is obtained by using the NEON registers in calls to memcpy().
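To put a rough number on that overhead, here is a small sketch that estimates the RAM cost of one saved context. It assumes the context consists of the 32 double-word registers d0-d31 plus FPSCR; the real port may store extra words (for example a flag recording whether the task has a context at all), so treat the result as a lower bound. The function names are hypothetical. How the context is enabled should be checked in the port source: names such as vPortTaskUsesFPU() and configUSE_TASK_FPU_SUPPORT appear in later Cortex-A9 ports, but whether they apply to your version is something to verify.

```c
#include <assert.h>
#include <stddef.h>

enum {
    D_REGISTERS     = 32,  /* d0-d31, assuming a VFPv3-D32 / NEON bank */
    D_REGISTER_SIZE = 8,   /* bytes per 64-bit D register              */
    FPSCR_SIZE      = 4    /* floating point status/control register   */
};

/* Bytes needed to save one floating point / NEON context. */
size_t fpu_context_bytes(void)
{
    return (size_t)D_REGISTERS * D_REGISTER_SIZE + FPSCR_SIZE;
}

/* Extra stack RAM when each of task_count tasks carries a context. */
size_t fpu_context_total_bytes(size_t task_count)
{
    /* Guard against overflow of the multiplication below. */
    assert(task_count <= (size_t)-1 / fpu_context_bytes());
    return task_count * fpu_context_bytes();
}
```

Under these assumptions one context is 260 bytes, so ten tasks cost about 2.6 KB of additional stack, and the same amount of data has to be moved on every switch between two such tasks.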

stefan-bat-mv wrote on Friday, March 17, 2017:

Thanks!

However, our Cortex-A9 processor has an FPU as well as a NEON unit. Is FreeRTOS 9.0.1 aware of which unit is in use and which registers have to be saved, or is only one of these units supported, and if so, which one?

rtel wrote on Friday, March 17, 2017:

Which registers are used by one but not the other?

stefan-bat-mv wrote on Tuesday, March 21, 2017:

Sorry! I guess you are correct! There are no such registers used by one unit only… We will try the latest FreeRTOS version then! Thanks a lot!!!
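The reason no register belongs to one unit only is that NEON and the VFP share a single register bank that is merely viewed at different widths. The sketch below encodes the ARMv7-A aliasing rules (Qn overlaps D2n and D2n+1; Sn lives in D(n/2), with single-precision views only on d0-d15). The function names are hypothetical and chosen for illustration.

```c
#include <assert.h>

/* Index of the first D register overlapped by Qq (q = 0..15);
 * the second one is that index + 1. */
int q_to_first_d(int q)
{
    assert(q >= 0 && q < 16);
    return 2 * q;
}

/* Index of the D register that contains Ss (s = 0..31).
 * Only d0-d15 have single-precision views. */
int s_to_d(int s)
{
    assert(s >= 0 && s < 32);
    return s / 2;
}

/* Returns 1 if NEON register Qq and VFP register Ss share storage. */
int q_overlaps_s(int q, int s)
{
    int d = s_to_d(s);
    return d == q_to_first_d(q) || d == q_to_first_d(q) + 1;
}
```

For example, q1 covers d2 and d3, and s5 lives in d2, so a NEON write to q1 also changes what the VFP sees in s5. Saving d0-d31 therefore covers both units.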

thomask wrote on Tuesday, March 21, 2017:

Hi!

I’m also experimenting with FreeRTOS on Altera’s SOCFPGA platform. One problem with the toolchain is the NEON optimization in memcpy(), as mentioned. The other problem is that it doesn’t support -mfloat-abi=hard.

I therefore built my own GCC toolchain with Crosstool-NG. You might be interested to try it out:

https://github.com/thomask77/ct-ng-toolchains
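When switching toolchains it can help to confirm what a given build was actually compiled for. A minimal sketch, assuming a GCC or Clang compiler that follows the ARM C Language Extensions: `__ARM_NEON` (or the older `__ARM_NEON__`) is predefined when NEON code generation is enabled, and `__ARM_PCS_VFP` when the hard-float calling convention is in effect. The function names are hypothetical.

```c
#include <assert.h>

/* 1 if the compiler may emit NEON instructions
 * (for example because of -mfpu=neon), else 0. */
int built_with_neon(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if the hard-float calling convention is in effect
 * (-mfloat-abi=hard), else 0. */
int built_with_hard_float_abi(void)
{
#if defined(__ARM_PCS_VFP)
    return 1;
#else
    return 0;
#endif
}

/* Fails fast at start-up if the build does not use the hard-float ABI. */
void require_hard_float_abi(void)
{
    assert(built_with_hard_float_abi());
}
```

Printing or asserting these values once at start-up makes it obvious whether a rebuilt library, including its memcpy(), was produced with the flags you intended.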

Best regards,

- thomas