+TCP zero copy implementation

system · May 19, 2016, 3:14pm

daveskok wrote on Thursday, May 19, 2016:

Greetings,
Referencing :
http://www.freertos.org/FreeRTOS-Plus/FreeRTOS_Plus_TCP/Embedded_Ethernet_Porting.html#vNetworkInterfaceAllocateRAMToBuffers

The example showing how to implement zero copy buffers on receive in prvEMACDeferredInterruptHandlerTask shows that it is easily accomplished by a buffer pointer swap. The example doesn’t show the extra data in front of the DMA buffer being updated to point to the NetworkBufferDescriptor_t it is now associated with. It would need to be, wouldn’t it?

Regards

htibosch · May 19, 2016, 3:40pm

heinbali01 wrote on Thursday, May 19, 2016:

Hi Gualterio,

You are right, thanks for reporting this.

The first 4 bytes at ( pucEthernetBuffer - 10 ) are in fact a pointer to the owner, and when two NetworkBufferDescriptor_t structures swap their pucEthernetBuffer pointers, these owner pointers must be set correctly after the swap.

This pointer is used when accessing UDP messages with the zero-copy flag: a pointer to a UDP payload buffer can be translated to a NetworkBufferDescriptor_t and vice versa.

After swapping the pucEthernetBuffer, please add:

        *( ( NetworkBufferDescriptor_t * ) pxDescriptor->pucEthernetBuffer ) = pxDescriptor;
        *( ( NetworkBufferDescriptor_t * ) pxDMARxDescriptor->pucEthernetBuffer ) = pxDMARxDescriptor;

Agreed, Richard?

Regards.

htibosch · May 19, 2016, 3:49pm

heinbali01 wrote on Thursday, May 19, 2016:

Sorry, I did not subtract the 10 bytes:

        *( ( NetworkBufferDescriptor_t * ) ( pxDescriptor->pucEthernetBuffer - ipBUFFER_PADDING ) ) = pxDescriptor;
        *( ( NetworkBufferDescriptor_t * ) ( pxDMARxDescriptor->pucEthernetBuffer - ipBUFFER_PADDING ) ) = pxDMARxDescriptor;

In the library, a byte address is tested for proper alignment before it is cast to a pointer location:

    /* Here a pointer was placed to the network descriptor.  As a
    pointer is dereferenced, make sure it is well aligned. */
    if( ( ( ( uint32_t ) pucBuffer ) & ( sizeof( pucBuffer ) - 1 ) ) == 0 )
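
The alignment test above can be wrapped in a small helper. This is only a sketch: `prvIsPointerAligned` is a hypothetical name, not a library function, and `uintptr_t` is swapped in for `uint32_t` so that the same code also runs on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of FreeRTOS+TCP.  Returns 1 when pucBuffer is
aligned well enough to be dereferenced as a pointer, otherwise 0. */
static int prvIsPointerAligned( const uint8_t *pucBuffer )
{
    /* The mask trick only works when the pointer size is a power of two. */
    assert( ( sizeof( void * ) & ( sizeof( void * ) - 1U ) ) == 0U );

    /* sizeof( void * ) is the same value as sizeof( pucBuffer ) in the
    library snippet: 4 on a 32-bit MCU, 8 on a 64-bit host. */
    return ( ( ( uintptr_t ) pucBuffer ) & ( sizeof( void * ) - 1U ) ) == 0U;
}
```

A caller would skip the cast, or report an error, when the helper returns 0.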

system · May 19, 2016, 4:19pm

daveskok wrote on Thursday, May 19, 2016:

Hein,
Thanks for clarifying. I read somewhere, either in code comments or elsewhere in the documentation (the precise reference eludes me now), that in future +TCP code that data in front of buffer will not be required. If this is likely, should user-supplied implementation code prepare now by putting the various “in front of buffer prep” steps in a macro so they can easily be removed (or enhanced)?

Regards
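
One way the macro idea from this post could look is sketched below. Everything except `ipBUFFER_PADDING`, `NetworkBufferDescriptor_t` and `pucEthernetBuffer` is hypothetical (the `ni` prefixed names are invented for illustration), and the structure is a simplified stand-in so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins so the sketch compiles on its own (assumptions). */
#define ipBUFFER_PADDING    10U

typedef struct xNETWORK_BUFFER
{
    uint8_t *pucEthernetBuffer;
} NetworkBufferDescriptor_t;

/* Hypothetical driver-local switch: set to 0 if a later +TCP release no
longer needs the data in front of the buffer. */
#define niUSE_BUFFER_FRONT_DATA    1

#if( niUSE_BUFFER_FRONT_DATA != 0 )
    /* Write the owner pointer into the hidden bytes in front of the frame. */
    #define niPREPARE_BUFFER_FRONT( pxDesc )                                        \
        do                                                                          \
        {                                                                           \
            uint8_t *pucFront = ( pxDesc )->pucEthernetBuffer - ipBUFFER_PADDING;   \
            assert( ( ( uintptr_t ) pucFront & ( sizeof( void * ) - 1U ) ) == 0U ); \
            *( ( NetworkBufferDescriptor_t ** ) pucFront ) = ( pxDesc );            \
        } while( 0 )
#else
    /* Nothing to prepare: the macro compiles away. */
    #define niPREPARE_BUFFER_FRONT( pxDesc )    ( ( void ) ( pxDesc ) )
#endif
```

The driver would call `niPREPARE_BUFFER_FRONT( pxDescriptor )` after every buffer swap, so a later change touches one definition only.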

rtel · May 19, 2016, 4:19pm

rtel wrote on Thursday, May 19, 2016:

Yes, looks reasonable to me. I will update the referenced page.

system · May 19, 2016, 6:18pm

daveskok wrote on Thursday, May 19, 2016:

The lvalue cast should be ( NetworkBufferDescriptor_t ** )
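
Putting the corrections from this thread together (subtract `ipBUFFER_PADDING`, cast to `NetworkBufferDescriptor_t **`), the swap could be sketched as follows. The structure is a simplified stand-in for the real one, and the three `prv` function names are hypothetical; on a 32-bit target the stored pointer occupies the first 4 hidden bytes, on a 64-bit host the first 8.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-ins for the FreeRTOS+TCP definitions (assumptions made so
the sketch compiles on its own; the real structure has more members). */
#define ipBUFFER_PADDING    10U

typedef struct xNETWORK_BUFFER
{
    uint8_t *pucEthernetBuffer; /* Points to the start of the Ethernet frame. */
} NetworkBufferDescriptor_t;

/* Store the owning descriptor in the hidden bytes in front of the frame. */
static void prvSetBufferOwner( NetworkBufferDescriptor_t *pxOwner )
{
    uint8_t *pucHidden = pxOwner->pucEthernetBuffer - ipBUFFER_PADDING;

    /* The store goes through a pointer-to-pointer, so check alignment first. */
    assert( ( ( ( uintptr_t ) pucHidden ) & ( sizeof( void * ) - 1U ) ) == 0U );
    *( ( NetworkBufferDescriptor_t ** ) pucHidden ) = pxOwner;
}

/* Translate a frame pointer back to the descriptor that owns it. */
static NetworkBufferDescriptor_t *prvGetBufferOwner( uint8_t *pucEthernetBuffer )
{
    return *( ( NetworkBufferDescriptor_t ** ) ( pucEthernetBuffer - ipBUFFER_PADDING ) );
}

/* Swap the buffers of two descriptors and repair both back-pointers. */
static void prvSwapBuffers( NetworkBufferDescriptor_t *pxDescriptor,
                            NetworkBufferDescriptor_t *pxDMARxDescriptor )
{
    uint8_t *pucTemp = pxDescriptor->pucEthernetBuffer;

    pxDescriptor->pucEthernetBuffer = pxDMARxDescriptor->pucEthernetBuffer;
    pxDMARxDescriptor->pucEthernetBuffer = pucTemp;

    prvSetBufferOwner( pxDescriptor );
    prvSetBufferOwner( pxDMARxDescriptor );
}
```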

rtel · May 19, 2016, 7:01pm

rtel wrote on Thursday, May 19, 2016:

Please check I have this right - you may need to refresh the page (F5)
to see the changes:

http://www.freertos.org/FreeRTOS-Plus/FreeRTOS_Plus_TCP/Embedded_Ethernet_Porting.html

system · May 19, 2016, 7:42pm

daveskok wrote on Thursday, May 19, 2016:

It is correct.

htibosch · May 20, 2016, 4:56am

heinbali01 wrote on Friday, May 20, 2016:

“that in future +TCP code that data in front of buffer will not be required”

You read that on the same page in a comment: “/* The following line is also required, but will not be required in future versions */”.
The 2 alignment bytes have proven to be useful for efficiency. The back-pointer is necessary for zero-copy calls to sendto() and recvfrom().
I agree that the 10 hidden bytes are a bit confusing, but I cannot think of a good alternative.

You may also wonder why there are 10 bytes and not 6: some DMA controllers require an 8-byte alignment, and so 8 bytes were skipped from the start before the packet begins.

Regards.
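
The arithmetic behind the 10 hidden bytes can be checked with a short sketch. The `example` prefixed names are hypothetical, and the 14-byte Ethernet header size is the standard value rather than something stated in the thread. The usual reason the 2 filler bytes help is that they move the IP header, which follows the Ethernet header, onto a 32-bit boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the discussion; the example* names are invented here. */
#define exampleSKIPPED_BYTES           8U   /* Preserves 8-byte DMA alignment; holds the back-pointer. */
#define exampleALIGNMENT_BYTES         2U   /* Filler in front of the Ethernet header. */
#define ipBUFFER_PADDING               ( exampleSKIPPED_BYTES + exampleALIGNMENT_BYTES )
#define exampleETHERNET_HEADER_SIZE    14U  /* Two MAC addresses plus the EtherType. */

/* Where the Ethernet frame starts inside an 8-byte aligned raw buffer. */
static uint8_t *pucExampleFrameStart( uint8_t *pucRawBuffer )
{
    assert( ( ( uintptr_t ) pucRawBuffer & 7U ) == 0U );
    return pucRawBuffer + ipBUFFER_PADDING;
}

/* Where the IP header starts: right after the Ethernet header. */
static uint8_t *pucExampleIPHeaderStart( uint8_t *pucRawBuffer )
{
    return pucExampleFrameStart( pucRawBuffer ) + exampleETHERNET_HEADER_SIZE;
}
```

With these values the frame starts at offset 10, which is 2 bytes past a 32-bit boundary, and the IP header starts at offset 24, which is 32-bit aligned.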

system · May 20, 2016, 12:55pm

daveskok wrote on Friday, May 20, 2016:

Hein,
May I suggest a possible solution that trades a higher memory requirement for a simpler implementation? I am in the process of assessing a port from lwIP to +TCP and have not read through enough code to determine whether my idea is feasible, but here is a description anyway.

The EMAC driver code provided by Microsemi for the A2F processor creates all structures up front, including DMA descriptors that point to buffers sized for the maximum packet size. These buffers are all “owned” by the driver. I would assume that most driver code for other devices works similarly, or provides an API to assign user-supplied buffers.

I modify the EMAC driver code to initialize the DMA descriptors to use buffers supplied by +TCP. That is, all DMA RX and TX descriptors, as well as every +TCP xNetworkBufferDescriptor_t, are initialized at start-up to point to a +TCP-supplied buffer. The static DMA buffer requirement is the DMA RX descriptor count + the DMA TX descriptor count + the xNetworkBufferDescriptor_t count.

When packets are transmitted or received, the buffer held by the xNetworkBufferDescriptor_t is swapped with the DMA descriptor’s buffer. There is no need to track association of buffer with xNetworkBufferDescriptor_t. The life cycle of an xNetworkBufferDescriptor_t would follow the same path as in the copy implementation, because when the transmit or receive function is called the xNetworkBufferDescriptor_t is not tied up with the EMAC driver.

Is this feasible, despite possibly requiring more static buffers?

Regards
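
The proposal above can be sketched roughly as follows. All names, counts and sizes are hypothetical and do not come from the Microsemi driver or from +TCP; the sketch shows only the buffer accounting and the pointer swap on receive, and leaves open whether the stack can work without the owner pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counts and sizes, chosen only for illustration. */
#define exampleDMA_RX_DESCRIPTORS     4U
#define exampleDMA_TX_DESCRIPTORS     2U
#define exampleNETWORK_DESCRIPTORS    8U
#define exampleBUFFER_SIZE            1536U
#define exampleTOTAL_BUFFERS \
    ( exampleDMA_RX_DESCRIPTORS + exampleDMA_TX_DESCRIPTORS + exampleNETWORK_DESCRIPTORS )

typedef struct { uint8_t *pucBuffer; } ExampleDMADescriptor_t;
typedef struct { uint8_t *pucEthernetBuffer; size_t xDataLength; } ExampleNetworkDescriptor_t;

static uint8_t ucExampleBuffers[ exampleTOTAL_BUFFERS ][ exampleBUFFER_SIZE ];
static ExampleDMADescriptor_t xRxDescriptors[ exampleDMA_RX_DESCRIPTORS ];
static ExampleDMADescriptor_t xTxDescriptors[ exampleDMA_TX_DESCRIPTORS ];
static ExampleNetworkDescriptor_t xNetworkDescriptors[ exampleNETWORK_DESCRIPTORS ];

/* Give every DMA descriptor and every network descriptor its own buffer. */
static void vExampleAssignBuffers( void )
{
    size_t x, xNext = 0;

    for( x = 0; x < exampleDMA_RX_DESCRIPTORS; x++ )
    {
        xRxDescriptors[ x ].pucBuffer = ucExampleBuffers[ xNext++ ];
    }
    for( x = 0; x < exampleDMA_TX_DESCRIPTORS; x++ )
    {
        xTxDescriptors[ x ].pucBuffer = ucExampleBuffers[ xNext++ ];
    }
    for( x = 0; x < exampleNETWORK_DESCRIPTORS; x++ )
    {
        xNetworkDescriptors[ x ].pucEthernetBuffer = ucExampleBuffers[ xNext++ ];
    }
    assert( xNext == exampleTOTAL_BUFFERS );
}

/* On receive: the stack takes the filled buffer, DMA gets the spare one. */
static void vExampleSwapOnReceive( ExampleDMADescriptor_t *pxDMA,
                                   ExampleNetworkDescriptor_t *pxNetwork,
                                   size_t xFrameLength )
{
    uint8_t *pucReceived = pxDMA->pucBuffer;

    pxDMA->pucBuffer = pxNetwork->pucEthernetBuffer;
    pxNetwork->pucEthernetBuffer = pucReceived;
    pxNetwork->xDataLength = xFrameLength;
}
```

With the counts used here, 14 buffers are reserved statically, compared with 8 if only the network descriptors owned buffers.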

htibosch · May 20, 2016, 3:33pm

heinbali01 wrote on Friday, May 20, 2016:

Hi Gualterio,

I’m reading a PDF ug0250_v5.

It looks like you can have any alignment for the transmission buffers:

 31:0 TBA1 Transmit buffer 1 address 
   Contains the address of the first data buffer. For the setup frame,
   this address must be 32-bit word aligned.
   In all other cases, there are no restrictions on buffer alignment. 
 31:0 TBA2 Transmit buffer 2 address 
   Contains the address of the second data buffer. There are
   no restrictions on buffer alignment.

For reception, a 32-bit alignment seems to be expected:

 31:0 RBA1 Receive buffer 1 address 
   Indicates the length, in bytes, of memory allocated for the
   first receive buffer. This number must be 32-bit word aligned. 
 31:0 RBA2 Receive buffer 2 address 
   Indicates the length, in bytes, of memory allocated for the second
   receive buffer. This number must be 32-bit word aligned.

I’m not sure if I understand your proposal correctly. But I’ll try to sketch some possibilities:

For the RX path, I’m afraid that your EMAC doesn’t accept the +2 byte alignment.

For the TX path, you can pass pucEthernetBuffer pointers to DMA. After every transmission, there will be a TX-ready interrupt, which will wake up the EMAC task. The task will then release the TX buffers that have been transmitted.

“There is no need to track association of buffer with NetworkBufferDescriptor_t.”

Not sure what you mean here? The pointer to the owner is sometimes needed by the IP task and the application.

Regards.
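
The TX path described in this post can be simulated on a host with a small ring of slots. This is a sketch with hypothetical types and names throughout; clearing `xInUse` stands in for handing the buffer back to the stack, and the `xTransmitted` flag, which real hardware would set, is set by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define exampleTX_RING_LENGTH    4U

typedef struct
{
    uint8_t *pucEthernetBuffer;
    int xInUse;                          /* Non-zero while the stack or DMA holds it. */
} ExampleNetworkDescriptor_t;

typedef struct
{
    const uint8_t *pucDMAAddress;        /* Address handed to the DMA engine. */
    ExampleNetworkDescriptor_t *pxOwner; /* Descriptor to release afterwards. */
    volatile int xTransmitted;           /* Set by hardware on completion. */
} ExampleTxSlot_t;

static ExampleTxSlot_t xTxRing[ exampleTX_RING_LENGTH ];

/* Zero copy send: pass the frame address to a free slot without copying.
Returns 1 on success and 0 when the ring is full. */
static int xExampleQueueForTransmit( ExampleNetworkDescriptor_t *pxDescriptor )
{
    size_t x;

    for( x = 0; x < exampleTX_RING_LENGTH; x++ )
    {
        if( xTxRing[ x ].pxOwner == NULL )
        {
            xTxRing[ x ].pucDMAAddress = pxDescriptor->pucEthernetBuffer;
            xTxRing[ x ].pxOwner = pxDescriptor;
            xTxRing[ x ].xTransmitted = 0;
            return 1;
        }
    }
    return 0;
}

/* Run by the EMAC task after a TX-ready interrupt woke it up: release every
buffer the hardware has finished with.  Returns the number released. */
static size_t xExampleReleaseTransmitted( void )
{
    size_t x, xReleased = 0;

    for( x = 0; x < exampleTX_RING_LENGTH; x++ )
    {
        if( ( xTxRing[ x ].pxOwner != NULL ) && ( xTxRing[ x ].xTransmitted != 0 ) )
        {
            assert( xTxRing[ x ].pxOwner->xInUse != 0 );
            xTxRing[ x ].pxOwner->xInUse = 0; /* Stand-in for releasing the buffer. */
            xTxRing[ x ].pxOwner = NULL;
            xReleased++;
        }
    }
    return xReleased;
}
```

The point of the design is that a descriptor stays owned by the driver from the send call until the completion pass, so its buffer must not be reused in between.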

system · May 20, 2016, 8:01pm

daveskok wrote on Friday, May 20, 2016:

Hein,
I successfully ported from lwIP (copy) to +TCP zero copy, doing what I tried to explain in my previous post. Initial testing indicates it is working quite well. I can send you all the pertinent files to look at if you like. In short, my implementation requires more static DMA buffers, but the implementation is simpler. In fact, it is nearly identical to the copy version, except that it swaps buffers instead of copying.

If you don’t think I missed anything, I am OK with including it with the +TCP examples.

lemme know how to send to you private

Regards

rtel · May 20, 2016, 9:00pm

rtel wrote on Friday, May 20, 2016:

Excellent.

“lemme know how to send to you private”

You can go to RTOS contact and support details and click on the Business
Contact link - I can then forward to Hein.

Thanks.
