We have been using Message Buffers for a while now to stream data between cores and it has been working great.
Now we plan to migrate one of the Processors to AARCH64 and will run into problems with the current Message Buffer implementation (especially StreamBuffer_t struct).
I think it is becoming more prevalent in embedded systems to combine cores of different word widths on one chip (big.LITTLE-style designs). In many cases the big CPU will run Linux, but if both processors run FreeRTOS it would be very handy to be able to use Message Buffers to communicate between them.
While adapting the shared types to accommodate different architectures may not be too hard, it may introduce code complexities in some places.
Do you think this is a relevant and feasible issue to raise for the FreeRTOS-Kernel?
Some member types are not fixed size, e.g. xTail is size_t and will be 8 bytes on AARCH64 but 4 bytes on most 32-bit architectures. The same goes for the pointer members in the struct.
As I understand the code, both the sender and the receiver of a Message Buffer access the same structure in shared memory, so the memory layout of the structure must be identical for the sender and receiver architectures.
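To make the layout problem concrete, here is a minimal sketch. Both struct types are hypothetical, simplified stand-ins and not the kernel's actual definitions: the first uses size_t and a pointer like the current control block, the second uses fixed-width members only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the native control block: member widths
 * follow whatever architecture the file is compiled for. */
typedef struct NativeControlBlock
{
    volatile size_t xTail;
    volatile size_t xHead;
    size_t xLength;
    uint8_t * pucBuffer;
} NativeControlBlock_t;

/* The same members with explicit widths: same layout on every core. */
typedef struct SharedControlBlock
{
    volatile uint32_t xTail;
    volatile uint32_t xHead;
    uint32_t xLength;
    uint32_t ulBufferAddress;
} SharedControlBlock_t;

static_assert( sizeof( SharedControlBlock_t ) == 16,
               "shared layout must not depend on the target" );

/* Returns 1 if the native layout on this target happens to match the
 * fixed-width one, 0 otherwise. */
int nativeLayoutMatchesShared( void )
{
    return ( sizeof( NativeControlBlock_t ) == sizeof( SharedControlBlock_t ) ) &&
           ( offsetof( NativeControlBlock_t, pucBuffer ) ==
             offsetof( SharedControlBlock_t, ulBufferAddress ) );
}
```

On a typical 32-bit target the two layouts coincide, while on a 64-bit target the native struct is twice as large, so the two cores would disagree about where each member lives.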
In case it did not become clear in my original post: the idea is to use Message Buffers between two different architectures (e.g. ARMv7-A and AARCH64).
I have one question here - the shared memory is accessible to both cores. How does the addressing work? Is it that the addresses are 32 bits only, and therefore the upper 32 bits are ignored/zeroed out when accessing them as 64-bit addresses from the 64-bit core?
The Message Buffer must be placed within the 32-bit addressable memory region to be accessible by aarch32. This means the 4 most significant bytes (the upper 32 bits) of Message Buffer addresses in the aarch64 view are always zero.
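A minimal sketch of what that means for code on the 64-bit side, assuming both cores see the shared region at the same physical address. The function names are made up for illustration and are not part of FreeRTOS.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert( sizeof( uintptr_t ) >= sizeof( uint32_t ),
               "pointers must be at least 32 bits wide" );

/* Convert a local pointer to the 32-bit address stored in shared memory.
 * Returns 0 for NULL and (as an assumption of this sketch) also for any
 * address that does not fit into 32 bits. */
uint32_t ulPointerToShared( const void * pvAddress )
{
    uintptr_t uxAddress = ( uintptr_t ) pvAddress;

    if( uxAddress > ( uintptr_t ) UINT32_MAX )
    {
        return 0;
    }

    return ( uint32_t ) uxAddress;
}

/* Zero-extend a stored 32-bit address to a native pointer. On a 32-bit
 * core this is a plain cast, on aarch64 the upper 32 bits become zero. */
void * pvSharedToPointer( uint32_t ulAddress )
{
    return ( void * ) ( uintptr_t ) ulAddress;
}
```

If the two cores map the region at different addresses, an offset from the start of the region would have to be stored instead of an absolute address.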
There are several ways this could be implemented to be platform independent. What comes to my mind (maybe also combinations):
- store addresses as uint32_t in the struct and cast to a pointer when used
- zero-pad the structure so each member is 8-byte aligned
- serialization/deserialization functions to access memory (endianness independent)
…
It will always be a compromise between performance, memory usage, portability and clean code.
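As an illustration of the serialization option from the list above, here is a minimal sketch that always stores a 32-bit field in little-endian byte order, regardless of the endianness of the core executing it. The helper names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

static_assert( CHAR_BIT == 8, "byte-wise serialization assumes 8-bit bytes" );

/* Store ulValue at pucDest, least significant byte first. */
void vStoreLe32( volatile uint8_t * pucDest, uint32_t ulValue )
{
    pucDest[ 0 ] = ( uint8_t ) ( ulValue & 0xFFu );
    pucDest[ 1 ] = ( uint8_t ) ( ( ulValue >> 8 ) & 0xFFu );
    pucDest[ 2 ] = ( uint8_t ) ( ( ulValue >> 16 ) & 0xFFu );
    pucDest[ 3 ] = ( uint8_t ) ( ( ulValue >> 24 ) & 0xFFu );
}

/* Read back a value written by vStoreLe32(). */
uint32_t ulLoadLe32( const volatile uint8_t * pucSrc )
{
    return ( ( uint32_t ) pucSrc[ 0 ] ) |
           ( ( uint32_t ) pucSrc[ 1 ] << 8 ) |
           ( ( uint32_t ) pucSrc[ 2 ] << 16 ) |
           ( ( uint32_t ) pucSrc[ 3 ] << 24 );
}
```

This shows the trade-off mentioned above: the byte-wise accesses are portable, but they are slower than a single word access and the four bytes are not written atomically, so a reader on the other core could observe a half-updated index unless additional synchronization is used.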
My guess from looking at the design of the stream buffers is that the size_t members could be converted to a fixed size. A uint16_t would allow for stream buffers up to 64k in size, and uint32_t for bigger ones. (I can't imagine a need even on 64-bit architectures for stream buffers of more than 4GB.) The one limitation is that a fixed uint32_t would not fit systems with a 16-bit size_t, but I suspect that keeping it as size_t unless you are on an aarch64 processor would work.
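A minimal sketch of that type selection. It generalizes "unless you are on aarch64" to "unless size_t is wider than 32 bits", which is my assumption and not something the kernel does today. SbIndex_t, exampleSB_INDEX_MAX and the helper function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if ( SIZE_MAX > UINT32_MAX )
    /* 64-bit size_t (e.g. aarch64): clamp the index type to 32 bits. */
    typedef uint32_t SbIndex_t;
    #define exampleSB_INDEX_MAX    UINT32_MAX
#else
    /* 32-bit or 16-bit size_t: keep the native type. */
    typedef size_t SbIndex_t;
    #define exampleSB_INDEX_MAX    SIZE_MAX
#endif

static_assert( sizeof( SbIndex_t ) <= 4, "index type must not exceed 32 bits" );

/* Returns 1 if a buffer of xSizeBytes can be indexed with SbIndex_t,
 * 0 otherwise. A create function would reject sizes that return 0. */
int xBufferSizeIsRepresentable( size_t xSizeBytes )
{
    return ( xSizeBytes <= ( size_t ) exampleSB_INDEX_MAX ) ? 1 : 0;
}
```

With this, a 32-bit core and a 64-bit core both end up with 4-byte indices, while a target with a 16-bit size_t keeps its smaller type and could only share a buffer with a core of the same width.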
The bigger issue is that 3 actual pointers are stored. Those would need to be stored as 32-bit numbers, which restricts the buffer and the callback functions to the lower 32-bit address space and requires the aarch64 processor to zero-extend those addresses.
We will take a closer look at this in the coming weeks.
If we find a solution that we think can make it into the main branch, we will contribute it.
Otherwise, we will probably just implement our own solution outside the kernel.
We had a look into it and got it working for us.
However, we didn’t really find a sleek solution that would meet the requirements for code quality and portability of the kernel.
The cleanest solution would probably be to fix the streamBuffer member size like this:
/* Structure that holds state information on the buffer. */
typedef struct StreamBufferDef_t
{
    volatile uint32_t xTail;                 /* Index to the next item to read within the buffer. */
    volatile uint32_t xHead;                 /* Index to the next item to write within the buffer. */
    uint32_t xLength;                        /* The length of the buffer pointed to by pucBuffer. */
    uint32_t xTriggerLevelBytes;             /* The number of bytes that must be in the stream buffer before a task that is waiting for data is unblocked. */
    volatile uint64_t xTaskWaitingToReceive; /* Holds the handle of a task waiting for data, or NULL if no tasks are waiting. */
    volatile uint64_t xTaskWaitingToSend;    /* Holds the handle of a task waiting to send data to a message buffer that is full. */
    uint64_t pucBuffer;                      /* Points to the buffer itself - that is - the RAM that stores the data passed through the buffer. */
    uint8_t ucFlags;
    #if ( configUSE_TRACE_FACILITY == 1 )
        uint32_t uxStreamBufferNumber;       /* Used for tracing purposes. */
    #endif
    #if ( configUSE_SB_COMPLETED_CALLBACK == 1 )
        uint64_t pxSendCompletedCallback;    /* Optional callback called on send complete. sbSEND_COMPLETED is called if this is NULL. */
        uint64_t pxReceiveCompletedCallback; /* Optional callback called on receive complete. sbRECEIVE_COMPLETED is called if this is NULL. */
    #endif
} StreamBuffer_t;
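A layout like this can be guarded with compile-time checks that are built on both cores, so that a compiler inserting different padding is caught at build time. The sketch below uses a hypothetical copy of the struct (ExampleSharedStreamBuffer_t) and assumes both optional members are enabled on both sides. The expected offsets are what I would expect from common ABIs, not something taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical copy of the fixed-width layout with tracing and
 * completion callbacks both enabled. */
typedef struct ExampleSharedStreamBuffer
{
    volatile uint32_t xTail;
    volatile uint32_t xHead;
    uint32_t xLength;
    uint32_t xTriggerLevelBytes;
    volatile uint64_t xTaskWaitingToReceive;
    volatile uint64_t xTaskWaitingToSend;
    uint64_t pucBuffer;
    uint8_t ucFlags;               /* Followed by 3 bytes of implicit padding. */
    uint32_t uxStreamBufferNumber;
    uint64_t pxSendCompletedCallback;
    uint64_t pxReceiveCompletedCallback;
} ExampleSharedStreamBuffer_t;

/* These must hold on every core that maps the structure. */
static_assert( offsetof( ExampleSharedStreamBuffer_t, xTaskWaitingToReceive ) == 16, "layout" );
static_assert( offsetof( ExampleSharedStreamBuffer_t, pucBuffer ) == 32, "layout" );
static_assert( offsetof( ExampleSharedStreamBuffer_t, ucFlags ) == 40, "layout" );
static_assert( offsetof( ExampleSharedStreamBuffer_t, uxStreamBufferNumber ) == 44, "layout" );
static_assert( offsetof( ExampleSharedStreamBuffer_t, pxSendCompletedCallback ) == 48, "layout" );
static_assert( sizeof( ExampleSharedStreamBuffer_t ) == 64, "layout" );
```

Note that the checks only pass if both cores are built with the same values for configUSE_TRACE_FACILITY and configUSE_SB_COMPLETED_CALLBACK, since those options add or remove members.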
And adding getters/setters for the member accesses.
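A minimal sketch of what such accessors could look like for the buffer pointer. This is an illustration under my own assumptions, not the code from our implementation. ExampleSharedFields_t, pucGetBuffer and vSetBuffer are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert( sizeof( uintptr_t ) <= sizeof( uint64_t ),
               "native pointers must fit into the 64-bit field" );

/* Hypothetical, reduced version of the shared structure. */
typedef struct ExampleSharedFields
{
    volatile uint32_t xTail;
    volatile uint32_t xHead;
    uint64_t pucBuffer; /* Address of the data area, stored as an integer. */
} ExampleSharedFields_t;

/* Getter: turn the stored 64-bit value back into a native pointer.
 * On a 32-bit core the upper half is dropped by the cast, which is
 * only valid if the buffer lives in the lower 4GB. */
static inline uint8_t * pucGetBuffer( const ExampleSharedFields_t * pxFields )
{
    return ( uint8_t * ) ( uintptr_t ) pxFields->pucBuffer;
}

/* Setter: widen the native pointer to the fixed 64-bit field. */
static inline void vSetBuffer( ExampleSharedFields_t * pxFields,
                               uint8_t * pucBuffer )
{
    pxFields->pucBuffer = ( uint64_t ) ( uintptr_t ) pucBuffer;
}
```

Every place in the stream buffer code that currently dereferences a pointer member directly would have to go through such a function, which is where the extra complexity comes from.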
Drawbacks:
- footprint unnecessarily large for smaller architectures
- even more complex if endianness is considered
Considering that the inter-architecture-compatibility of message buffers is probably only relevant for a handful of FreeRTOS users, I think it is not worth bringing this into the kernel.
I think we will just use our adapted implementation alongside the kernel, unless you think it should be in the kernel.
I agree with your assessment - though, it may change in future as more such devices become popular. You may want to consider putting an example here to help others looking into the same problem.