About ipBUFFER_PADDING and ipconfigPACKET_FILLER_SIZE

daveskok wrote on Wednesday, May 18, 2016:

Greetings,
I have been looking closely at the FreeRTOS+TCP (160112) source and examples related to implementing zero copy support. I have also read the following threads related to providing network buffers and alignment.

https://sourceforge.net/p/freertos/discussion/382005/thread/5591a1a9/?limit=25#c87d/59a7/e3fd
https://sourceforge.net/p/freertos/discussion/382005/thread/0480e081/?limit=25#0ffa/067a
https://sourceforge.net/p/freertos/discussion/382005/thread/cc8075ef/?limit=25#c143

It is clear to me why the padding is used in the buffer; I often do that kind of thing in my own code. What I am having a hard time understanding is the implementation for a case where the Ethernet hardware requires buffers on 32-bit alignment. First, assume that the buffers are sized max packet + ipBUFFER_PADDING and are indeed aligned on 32-bit boundaries. In this case I reason that the value of ipconfigPACKET_FILLER_SIZE would need to be 0 or possibly 4. If so, the value of ipBUFFER_PADDING would be a multiple of 32 bits. The result is that the buffer (base addr + ipBUFFER_PADDING) that the hardware uses is on a 32-bit boundary, and there is still space in front of the hardware buffer for FreeRTOS+TCP to use. If I understand correctly, this will satisfy the hardware requirements, but the consequence is that FreeRTOS+TCP will be somewhat less efficient when it “cracks” the packets, because fields are more favorably aligned when ipconfigPACKET_FILLER_SIZE is 2. Is this right?
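The alignment arithmetic behind this question can be sketched in C. This is only an illustration, assuming the padding consists of 8 fixed hidden bytes plus the filler and that the buffer base is 32-bit aligned; the names `HIDDEN_BYTES`, `frame_offset`, `ip_header_offset` and `is_aligned32` are hypothetical and not part of FreeRTOS+TCP.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 8 fixed hidden bytes precede the filler in the padding. */
#define HIDDEN_BYTES     8u
#define ETH_HEADER_SIZE  14u

/* The 14-byte Ethernet header is 2 bytes short of a 32-bit multiple. */
static_assert( ( ETH_HEADER_SIZE % 4u ) == 2u, "Ethernet header is 4n + 2 bytes" );

/* Offset of the Ethernet frame (what the DMA sees) from the buffer base. */
uint32_t frame_offset( uint32_t ulFiller )
{
    return HIDDEN_BYTES + ulFiller;
}

/* Offset of the IP header (what the stack parses) from the buffer base. */
uint32_t ip_header_offset( uint32_t ulFiller )
{
    return frame_offset( ulFiller ) + ETH_HEADER_SIZE;
}

/* Non-zero when the offset falls on a 32-bit boundary. */
int is_aligned32( uint32_t ulOffset )
{
    return ( ulOffset % 4u ) == 0u;
}
```

With a filler of 0 or 4 the frame start is 32-bit aligned and the IP header is not; with a filler of 2 it is the other way around.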

Thanks!

heinbali01 wrote on Thursday, May 19, 2016:

If I understand correctly this will satisfy hardware requirements
but the consequence is that FreeRTOS+TCP will be somewhat less
efficient when it “cracks” the packets because fields are more
favourably aligned when ipconfigPACKET_FILLER_SIZE is 2.
Is this right?

You see this perfectly right!

There is a conflict between the requirements of the IP-stack and the DMA buffers.

But fortunately, many makers of EMAC peripherals know about this problem and created a way to get around this: a flag that says:

"ignore the first 2 bytes of my TX data"

and there may be another flag saying:

"insert 2 dummy bytes before the RX packet"

What hardware are you using? The above configuration flags may be available on it as well.

A summary of the Ethernet buffer:

Invisible 10 bytes, 32-bit aligned,
at pucEthernetBuffer - 10:

Offs  Contents

      /* Pointer to the owner of this array. */
 0    NetworkBufferDescriptor_t *pxBackPointer;
 4    uint32_t ulSpare;
      /* Filler to get a 32-bit alignment PLUS 2. */
 8    uint16_t usFiller;

Here starts the visible data,
at pucEthernetBuffer + 0:

14-byte Ethernet header:

10    uint8_t ucDestination[ 6 ];
16    uint8_t ucSource[ 6 ];
22    uint16_t usFrameType;


IP-header, 32-bit aligned:

24    uint8_t ucVersionHeaderLength;
            ...
            /* 32-bit fields: */
36    uint32_t ulSourceIPAddress;
40    uint32_t ulDestinationIPAddress;
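
The offsets in this summary can be checked with a small C sketch. `BufferLayoutSketch_t` is a hypothetical illustration type, not a FreeRTOS+TCP structure: the back pointer is replaced by a `uint32_t` so the offsets match a 32-bit target on any host, and the IP header bytes elided above are collapsed into one array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct mirroring the offsets listed in the summary. */
typedef struct
{
    uint32_t ulBackPointer;           /* stands in for the 4-byte pointer  */
    uint32_t ulSpare;
    uint16_t usFiller;                /* 2-byte filler                      */
    uint8_t ucDestination[ 6 ];       /* Ethernet header starts here (10)   */
    uint8_t ucSource[ 6 ];
    uint16_t usFrameType;
    uint8_t ucVersionHeaderLength;    /* IP header starts here (24)         */
    uint8_t ucOtherHeaderBytes[ 11 ]; /* remaining bytes before the addresses */
    uint32_t ulSourceIPAddress;
    uint32_t ulDestinationIPAddress;
} BufferLayoutSketch_t;

/* The visible data starts 10 bytes in: 32-bit aligned PLUS 2. */
static_assert( offsetof( BufferLayoutSketch_t, ucDestination ) == 10, "frame at 10" );

/* The 14-byte Ethernet header then puts the IP header on a 32-bit boundary. */
static_assert( offsetof( BufferLayoutSketch_t, ucVersionHeaderLength ) % 4 == 0,
               "IP header is 32-bit aligned" );
static_assert( offsetof( BufferLayoutSketch_t, ulSourceIPAddress ) % 4 == 0,
               "32-bit fields are 32-bit aligned" );
```

Because every member already sits on its natural alignment, the compiler should not insert any padding, so the struct offsets equal the offsets in the table.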

Some hardware is able to access 32-bit variables at 16-bit aligned locations. I saw a CPU that can do this with internal SRAM only.

In cases where the compiler knows in advance that a variable is badly aligned, such as here:

struct xUnaligned {
    uint8_t ucChar;
    uint32_t ulLong;
} __attribute__( ( packed ) );

the compiler may get around the problem and access ‘ulLong’ as an array of 4 bytes.
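
A minimal sketch of what the packed attribute does to the layout, assuming a GCC or Clang compatible compiler (`__attribute__( ( packed ) )` is a compiler extension, not standard C). The helper `read_packed_long` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct xUnaligned
{
    uint8_t ucChar;
    uint32_t ulLong;
} __attribute__( ( packed ) );

/* Packing removes the padding, so ulLong sits at the odd offset 1. */
static_assert( sizeof( struct xUnaligned ) == 5, "no padding in a packed struct" );
static_assert( offsetof( struct xUnaligned, ulLong ) == 1, "ulLong is misaligned" );

/* The compiler knows the member may be misaligned and generates a safe
 * access, for example byte by byte on targets without unaligned loads. */
uint32_t read_packed_long( const struct xUnaligned * pxItem )
{
    return pxItem->ulLong;
}
```

Accessing the member through the struct is what lets the compiler pick a safe instruction sequence. Taking the address of `ulLong` and dereferencing it as a plain `uint32_t *` would lose that information.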

For FreeRTOS+TCP it was decided to give all network packets a perfect alignment so that all 32-bit fields will be accessed with 32-bit instructions. It is the 14-byte Ethernet header that spoils the party.

PS At higher levels (such as DNS, LLMNR, DHCP and NBNS), no assumptions can be made about the alignment of 32-bit fields and memcpy() is used.
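
As an illustration of the memcpy() approach, here is a minimal sketch of reading a 32-bit field from an arbitrary byte offset. The function names `read_u32_unaligned` and `read_be32` are hypothetical helpers, not FreeRTOS+TCP functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy 4 bytes from any address into an aligned local variable.
 * The bytes keep their stored order, so the result is in the byte
 * order of the buffer (network order for protocol fields). */
uint32_t read_u32_unaligned( const uint8_t * pucSource )
{
    uint32_t ulValue;

    assert( pucSource != NULL );
    memcpy( &ulValue, pucSource, sizeof( ulValue ) );
    return ulValue;
}

/* Alternative: assemble a big-endian (network order) value byte by byte,
 * which gives the same numeric result on any host endianness. */
uint32_t read_be32( const uint8_t * pucSource )
{
    return ( ( uint32_t ) pucSource[ 0 ] << 24 ) |
           ( ( uint32_t ) pucSource[ 1 ] << 16 ) |
           ( ( uint32_t ) pucSource[ 2 ] << 8 ) |
           ( ( uint32_t ) pucSource[ 3 ] );
}
```

Neither function ever performs a 32-bit load from the packet buffer itself, so the alignment of the field inside the packet does not matter.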

Regards.

daveskok wrote on Thursday, May 19, 2016:

Hein,
Thank you very much for clarifying this! I am investigating for the purpose of porting a Microsemi A2F200 from LWIP to FreeRTOS+TCP. The LWIP implementation currently uses copy. If we spend the time to port to FreeRTOS+TCP, it appears that switching to zero copy is a simple matter, but I am not confident that the Microsemi hardware provides the trick you mention. I did see a comment in the zero copy driver example Zynq/emacpsif_dma.c(629) that indicates a hardware setting to accommodate the shift, and until now I was not certain why an unaligned transfer was strived for.

If the hardware is not capable of unaligned reception of packets, do you find that using zero copy is still a benefit? That is, does the time saved by not copying buffers, minus the time added by unaligned packet cracking, still come out ahead?

Microsemi A2F200 is Cortex-M3 married to FPGA. Feature set is exotic and alluring. Once chosen and used in project a world of pain ensues. No user forum, frustrating tools and factory support only (read no support). If you’ve never heard of Microsemi forget the name now. Reader be warned. Apologies for the off topic rant.

Thanks again!