memcpy in ISR

system · August 22, 2017, 8:24am

michaeln32 wrote on Tuesday, August 22, 2017:

Hello

Is it ok to use memcpy() in ISR ?

Is it ok to use strcpy() in ISR ?

Thanks

hs2 · August 22, 2017, 9:47am

hs2sf wrote on Tuesday, August 22, 2017:

Why not ? Both are stateless (reentrant) functions.

system · August 22, 2017, 11:26am

michaeln32 wrote on Tuesday, August 22, 2017:

Hi HS2,

Thanks for your answer.

I think that the execution time of memcpy (or strcpy) is unknown, so it is better not to use it in an ISR.

Am I right/wrong ?

Thanks

system · August 22, 2017, 11:54am

davidbrown wrote on Tuesday, August 22, 2017:

The execution time might be unknown to you, but it is certainly clear and deterministic. A simple memcpy() implementation will copy the given number of characters, one by one. You have the call overhead, and you have the loop for each character - the loop count is known when you call memcpy(). With strcpy(), the loop count is the length of the string, which may or may not be known by you.

If you have a more sophisticated compiler and enable optimisations, then memcpy is often inlined. If the count is known at compile time, and alignments are known to the compiler, then it can use larger moves than characters, unroll the loop, and in some cases it can omit the memcpy altogether if it can see it is unnecessary.
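
To make the loop-count point concrete, here is a minimal sketch of what a simple, unoptimised implementation might look like. The names naive_memcpy and naive_strcpy are hypothetical and are not taken from any particular library; real library versions are usually more elaborate.

```c
#include <stddef.h>

/* Hypothetical naive copy: runs exactly n iterations, and n is known
   at the call site. */
static void *naive_memcpy(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;

    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dest;
}

/* Hypothetical naive string copy: the iteration count depends on the
   data, because the loop runs until the terminating '\0' is copied. */
static char *naive_strcpy(char *dest, const char *src)
{
    char *d = dest;

    while ((*d++ = *src++) != '\0') {
        /* nothing to do in the body */
    }
    return dest;
}
```

In both cases the run time is a fixed overhead plus a constant cost per byte. The only difference is whether the caller knows the byte count in advance.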

richard-damon · August 22, 2017, 12:13pm

richard_damon wrote on Tuesday, August 22, 2017:

memcpy/strcpy for small buffers shouldn’t be a time issue; if you need to move the data, it is probably the fastest option.

The one issue is that on some processors and some compilers, memcpy might be placed inline with code that uses some floating point registers. There exist a few corner cases where the FP registers in the ABI are considered ‘caller save’ (normally, called functions are allowed to trash these registers), but the standard interrupt prolog doesn’t save them, as ISRs aren’t expected to use floating point. (I believe this is the case with the Cortex M4F.) There tends to be an option to prevent this optimization.

system · August 22, 2017, 12:16pm

gezab wrote on Tuesday, August 22, 2017:

Check the implementation of your standard library, there are some variants
that are not reentrant. However, even in this case you can still use the
above functions with wrappers that run them inside critical sections.
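
A minimal sketch of such a wrapper, assuming the library memcpy() really is non-reentrant on your platform. The enter_critical() and exit_critical() functions here are hypothetical stand-ins that only count nesting so the example can run on a host. On a FreeRTOS target you would presumably use taskENTER_CRITICAL() / taskEXIT_CRITICAL() in tasks and the _FROM_ISR variants inside interrupts instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the port's interrupt masking primitives.
   On real hardware these would mask and restore interrupts. */
static int critical_nesting = 0;

static void enter_critical(void) { critical_nesting++; }
static void exit_critical(void)  { critical_nesting--; }

/* Hypothetical wrapper: the copy runs with the critical section held,
   so it cannot be interrupted by another caller of the same routine. */
static void *memcpy_guarded(void *dest, const void *src, size_t n)
{
    enter_critical();
    void *ret = memcpy(dest, src, n);
    exit_critical();
    return ret;
}
```

The cost is that interrupts stay masked for the whole copy, so this only makes sense for short buffers.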

hs2 · August 22, 2017, 12:17pm

hs2sf wrote on Tuesday, August 22, 2017:

I fully agree with David. Both functions are used very often, are implemented in an optimized way and get some special support from compilers. You won’t do better by rolling your own copy routines.
Runtime behaviour is fixed/deterministic and only depends on input parameters since there is no internal locking or unknown code paths.

system · August 22, 2017, 2:24pm

michaeln32 wrote on Tuesday, August 22, 2017:

Ok.
Thank you very much !

rtel · August 22, 2017, 2:38pm

rtel wrote on Tuesday, August 22, 2017:

This is an interesting one - and the answer is not as straightforward as some of the posts in this thread make out. It is something we have often considered ourselves and concluded that, as you don’t know in advance how many bytes are to be copied, the best and most efficient way of doing this in the general case is to call the standard C memcpy() function.

memcpy() needs to be efficient, so its implementations are normally intricate, but optimised for moving large amounts of data. That means they are not necessarily optimised for moving small amounts of data - where a byte by byte copy would be the most efficient. The need for efficiency, and the resulting intricacy, means assembly language is used (even required) with an excellent knowledge of the hardware architecture and characteristics. That means that, unless you know in advance the maximum number of bytes to be copied, it is extremely unlikely you will come up with a more efficient general memcpy() algorithm.

memcpy will typically perform byte copies to get the to/from addresses word aligned, then word copies until the to/from addresses are aligned to the requirements of any other more efficient move implementation that might be available on the architecture in question. That might be moving data using push multiple and pop multiple instructions, or, as David B noted in this thread already, using wide floating point registers.
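
As a rough illustration of the byte/word phases described above, the sketch below only computes how a copy would be split up. It does not perform any wide accesses. The names copy_plan_t and plan_copy are hypothetical, and the plan is deliberately simplified: a real library routine would also handle mismatched source/destination alignment and wider transfer units.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of how a copy of n bytes is split up. */
typedef struct {
    size_t head_bytes;  /* byte copies until the addresses are word aligned */
    size_t words;       /* 32-bit word copies */
    size_t tail_bytes;  /* leftover byte copies at the end */
} copy_plan_t;

static copy_plan_t plan_copy(uintptr_t dest_addr, uintptr_t src_addr, size_t n)
{
    copy_plan_t plan = { n, 0, 0 };  /* default: everything byte by byte */
    const uintptr_t mask = sizeof(uint32_t) - 1u;

    /* Simplification: word copies are only planned when both addresses
       share the same misalignment. */
    if ((dest_addr & mask) != (src_addr & mask)) {
        return plan;
    }

    /* Bytes needed to reach the next word boundary (0 if already aligned). */
    size_t head = (size_t)((sizeof(uint32_t) - (dest_addr & mask)) & mask);
    if (head > n) {
        head = n;
    }

    plan.head_bytes = head;
    plan.words = (n - head) / sizeof(uint32_t);
    plan.tail_bytes = (n - head) % sizeof(uint32_t);
    return plan;
}
```

For a 20-byte copy where both addresses end in ...1, this gives 3 head bytes, 4 words and 1 tail byte.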

Using floating point registers is where it gets interesting, and gets back to Michael’s original question about if it is ok to use memcpy() in an ISR.

Some FreeRTOS ports require the application writer to specify which FreeRTOS tasks use flop registers, and FreeRTOS will then only store a flop context for those tasks (because flop contexts can be very expensive in memory and time). However, if the standard C library uses flop registers for memory operations then every task will need a flop context, and if flop registers are used in an ISR, then each ISR will need to save/restore flop registers too. Luckily I have only seen this be a problem once, and never on a Cortex-M.

system · August 22, 2017, 3:21pm

michaeln32 wrote on Tuesday, August 22, 2017:

Now I am a little bit confused.

I understand that it is not 100% safe to use memcpy in ISR.

Would it be better to copy memory in an ISR in the following way (instead of using memcpy) ?

void ISR(void)
{
    int i;

    /* buff1 and buff2 are assumed to be arrays of at least 200 elements */
    for (i = 0; i < 200; i++)
        buff1[i] = buff2[i];
}

rtel · August 22, 2017, 3:37pm

rtel wrote on Tuesday, August 22, 2017:

Which architecture and compiler are you using?

system · August 22, 2017, 3:44pm

michaeln32 wrote on Tuesday, August 22, 2017:

STM micro controller - STM32L433.

Compiler - TrueSTUDIO - Atollic.

rtel · August 22, 2017, 3:48pm

rtel wrote on Tuesday, August 22, 2017:

In which case I would say I am 99.9% sure using memcpy() will be fine.

system · August 22, 2017, 4:07pm

michaeln32 wrote on Tuesday, August 22, 2017:

Thanks !

system · August 23, 2017, 6:59am

davidbrown wrote on Wednesday, August 23, 2017:

Yes, with gcc on a Cortex M, memcpy will either be done inline or use a fully re-entrant library call. It will be safe to use in an interrupt.

As always when you are looking for efficient code (and you always want efficient code in interrupts), make sure optimisation is enabled, and give the compiler as much information as you can. memcpy will be more efficient if the size of the copy is known at compile time, and if your source and destination are nicely aligned then the compiler can use 16-bit or 32-bit transfers rather than doing everything byte by byte.
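
A small sketch of what "giving the compiler information" can look like. The sample_t type and the variable names are hypothetical. The point is only that the size is a compile-time constant (sizeof) and that both objects have the natural alignment of their type, so an optimising compiler is free to turn the call into a few wide loads and stores.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical record filled in by an interrupt handler. */
typedef struct {
    uint32_t timestamp;
    uint16_t channel;
    uint16_t value;
} sample_t;

static sample_t latest_sample;  /* source: naturally aligned for its type */
static sample_t snapshot;       /* destination: same type, same alignment */

/* Size and alignment are both visible to the compiler here. */
static void copy_sample(void)
{
    memcpy(&snapshot, &latest_sample, sizeof snapshot);
}
```

A plain struct assignment (snapshot = latest_sample;) would express the same copy and gives the compiler the same information.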

htibosch · August 23, 2017, 1:13pm

heinbali01 wrote on Wednesday, August 23, 2017:

Hi Michael, lots of responses about a simple memcpy(), apparently it is an interesting subject.

Why were you asking the question: out of theoretical interest, or did you encounter a problem? Did you see instabilities or crashes?

In case you do encounter problems, you might want to try the attached module memcpy.c ( see below ). It is pretty well optimised and it is absolutely ISR-safe.
The attached memcpy.c is part of the FreeRTOS/plus release.

There is the “automatic inlining” of memcpy(), in case the actual length is small and known at compile time. Please be aware that compilers sometimes make erroneous assumptions about the alignment:

This memcpy() :

    memcpy( target, source, 4 );

May not always be replaced with :

    *( ( uint32_t * ) target ) = *( ( uint32_t * ) source );

I have seen crashes ( exceptions ) because of this.
A memcpy() function is smart about alignment. It will test the memory locations of both source and target.
GCC has the -fno-builtin-memcpy option which will avoid automatic in-lining. I tend to use it ( and also -fno-builtin-memset ) in all of my projects.
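
For reference, a minimal sketch of reading and writing a 32-bit value at an address that may not be word aligned, without the pointer cast shown above. The helper names load_u32 and store_u32 are hypothetical. The copy always goes through memcpy(), so the compiler or library is left to choose accesses that are valid for the target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value from any address, aligned or not. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: write a 32-bit value to any address, aligned or not. */
static void store_u32(void *p, uint32_t v)
{
    memcpy(p, &v, sizeof v);
}
```

Typical use would be pulling a field out of a packed message buffer, for example load_u32(&rx_buffer[1]), where rx_buffer is an assumed byte array.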

And if you ask me: I would try to avoid massive memory copies from within an ISR.
Good luck.

system · August 24, 2017, 9:12am

davidbrown wrote on Thursday, August 24, 2017:

The large memcpy implementation here is unlikely to be inlined automatically on many compilers, even when the actual length and alignments are known at compile time. More sophisticated compilers (like later versions of gcc) will do the constant propagation first, then see that the resulting function is short enough to inline - with less sophisticated automatic inlining, the compiler will see the size of the full memcpy() function and decide it is too big to inline. And inlining will not occur anyway if the compiler does not have the source of memcpy() on hand when it is used.

gcc’s builtin memcpy will inline correctly and optimally when information is known at compile time. It will do a better job than you will get with this memcpy() implementation.

Additionally, gcc’s builtin (and library) memcpy is correct. Your one here has a fundamental error. It is not defined behaviour to access data via pointers of incompatible types. If this memcpy is called with sources or destinations that are, say, 16-bit types, then you are not allowed to access the data as 32-bit types. A union like this does not give you that ability - the compiler knows that the pointers involved cannot alias, and it can assume that the 16-bit data is not affected by any 32-bit writes. Moreover, it can make the same assumption about incompatible types that are the same size - if “uint32_t” is a typedef for “unsigned long”, then it is incompatible with “unsigned int” even if that also is 32 bits.

As long as the function is compiled and called as a separate function, this type-based aliasing information will be lost and thus the compiler will do the copying. But if it is inlined (or you have link-time optimisation enabled), then the compiler can use this aliasing information to skip memory accesses that it knows cannot legally happen.

So how do you reliably and correctly copy chunks of data in C? There are three ways. One is to use character pointers and do it byte for byte - such accesses are always allowed to alias. Another is to use implementation-specific techniques, such as gcc’s “may_alias” type attribute. The standard method is to use “memcpy”.
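
As a sketch of the standard method, here is one way to move data between 16-bit and 32-bit objects without accessing either through a pointer of an incompatible type. The function names join_halves and split_halves are hypothetical. The resulting 32-bit value depends on the target's byte order, so the example makes no claim about which half ends up where.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view two 16-bit values as one 32-bit value.
   The bytes are copied, so no uint16_t object is ever read through
   a uint32_t pointer. */
static uint32_t join_halves(const uint16_t halves[2])
{
    uint32_t v;
    memcpy(&v, halves, sizeof v);
    return v;
}

/* Hypothetical helper: the reverse direction. */
static void split_halves(uint32_t v, uint16_t halves[2])
{
    memcpy(halves, &v, sizeof v);
}
```

With optimisation enabled, a copy like this typically compiles down to a plain load or register move, so nothing is lost compared to the cast.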

Remember, the memcpy that comes with the implementation is guaranteed to be correct for that implementation - it can use whatever tricks needed to avoid any aliasing issues even if it copies in larger lumps.

Note that when you use memcpy on gcc without the -fno-builtin-memcpy flag, gcc will generate inline code when appropriate. This inlining can be so efficient that it is removed entirely, or done as simple register-to-register movement. It means that memcpy() can be used to make code clear, correct and safe, without worrying about efficiency, rather than using unions or pointer casts that often are not fully safe (like in your code here).

Finally, remember that even if you implement your own memcpy(), and even if you use the -fno-builtin-memcpy flag, the compiler is free to assume that memcpy works exactly according to the standards. If you write this:

#include <stddef.h>
#include <stdint.h>

static uint32_t memcpyBytes;

void *memcpy( void *pvDest, const void *pvSource, size_t ulBytes ) {
    memcpyBytes += ulBytes;
    // rest of implementation
}

static uint8_t d[100];
static uint8_t s[100];

uint32_t test(size_t n) {
    memcpyBytes = 123;
    memcpy(d, s, n);
    return memcpyBytes;
}

then the compiler is free to return the fixed value 123 from this test() function. It knows memcpy cannot affect the value of memcpyBytes.

(As a side issue, the correct signature for memcpy() has “restrict” in the pointers.)

system · August 24, 2017, 9:14am

davidbrown wrote on Thursday, August 24, 2017:

The summary here is: use the compiler’s memcpy. It is safe and efficient.

Do not disable its useful and safe optimisations. Do not mess with defining your own versions of such standard library functions - they will not be better than the implementation’s versions, and they may have subtle risks. It may be a different matter if you are using a poor quality or limited compiler, but that is not the case here.

hs2 · August 24, 2017, 10:10am

hs2sf wrote on Thursday, August 24, 2017:

Again - I couldn’t agree more. Great posts David !!

system · August 24, 2017, 10:31am

davidbrown wrote on Thursday, August 24, 2017:

C pedantry is a hobby of mine - I am glad it can sometimes be of interest or use to others.