TCP: BufferAllocation1 does not work with SMP kernel and Zynq port

I am transitioning a project to use SMP kernel, but I noticed that the TCP library using BufferAllocation1 does not function for some reason. BufferAllocation2 seems to work properly from what I’ve seen so far, but there is a noticeable drop in performance compared to 1. The port being used is Xilinx Zynq.

smp_buffer1.zip (14.8 KB)

I attached debug messages and a Wireshark capture of me simply trying to ping the device. When using BufferAllocation1, it does reply to ARP, but replies to ping are very delayed, if it replies at all. Attempting a TCP connection always fails.

I did try to set the affinity of the IP task and the EMAC task to only run on core 0, but it did not seem to improve anything.

So some questions this raises:
Has BufferAllocation1 been tested using SMP on any other ports? Is the problem in the Zynq port or the allocation scheme?

Maybe a dumb question but has the TCP library been tested using SMP? I only ask because I have searched and not found any mention of this. So far BufferAllocation2 (with no set affinities) seems to be working.

Any other thoughts on what the problem could be? If it was some threading problem I would have thought pinning the 2 tasks to core 0 would have worked around the issue.

Is it crashing, or are you not able to send or receive packets? If it’s crashing, are you able to get the call stack/logs?

No, neither buffer allocation scheme has been tested with SMP enabled yet.

Is the problem in the Zynq port or the allocation scheme?

Since the major difference between the two allocation schemes is that buffer allocation scheme 1 uses static memory, I would start by checking where ucNetworkPackets/pucNetworkPackets gets allocated and whether that aligns with the SMP/HW configuration.

I did not observe any crash or asserts. I did include the debug messages in the zip, but I will put them in this post.

FreeRTOS_AddEndPoint: MAC: 00-11 IPv4: c0a8150aip
prvIPTask started
XEmacPs detect_phy: PHY detected at address 0.
Start PHY autonegotiation
Waiting for PHY to complete autonegotiation.
autonegotiation complete
link speed: 1000
prvEMACHandlerTask[ 0 ] started running
Network buffers: 64 lowest 64
Socket 6000 -> [0.0.0.0]:0 State eCLOSED->eTCP_LISTEN
Network buffers: 61 lowest 61
Network buffers: 60 lowest 60
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
Network buffers: 30 lowest 30
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
emacps_handle_error: Receive buffer not available
emacps_handle_error: Receive buffer not available
pxEasyFit: ARP c0a81505ip -> c0a8150aip
emacps_handle_error: Receive buffer not available
Network buffers: 26 lowest 26
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
emacps_handle_error: Receive buffer not available
Network buffers: 24 lowest 24
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a8150aip
Network buffers: 22 lowest 22
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a8150aip end-point c0a8150aip
pxEasyFit: ARP c0a81505ip -> c0a81501ip
pxEasyFit: ARP c0a81505ip -> c0a81501ip
pxEasyFit: ARP c0a81505ip -> c0a81501ip
Network buffers: 9 lowest 9
ipARP_REQUEST from c0a81505ip to c0a81501ip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a81501ip end-point c0a8150aip
ipARP_REQUEST from c0a81505ip to c0a81501ip end-point c0a8150aip
emacps_send_message: Time-out waiting for TX buffer
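
The addresses in these messages are printed as eight hex digits followed by `ip` (for example `c0a8150aip`). As a reading aid, here is a minimal sketch of a decoder; `hex_ip_to_dotted` is a hypothetical helper written for this note, not a FreeRTOS+TCP function, and it assumes the most significant byte is printed first.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of FreeRTOS+TCP): convert the first eight
 * hex digits of a logged address such as "c0a8150aip" into dotted-decimal
 * text. Returns 0 on success, -1 if fewer than eight hex digits are found. */
int hex_ip_to_dotted( const char * hex, char * out, size_t out_len )
{
    char digits[ 9 ];
    unsigned long value;
    size_t i;

    /* Programming errors: "255.255.255.255" plus the terminator needs 16 bytes. */
    assert( ( hex != NULL ) && ( out != NULL ) && ( out_len >= 16U ) );

    /* Stops at the terminator of a short string, because '\0' is not a hex digit. */
    for( i = 0U; i < 8U; i++ )
    {
        if( isxdigit( ( unsigned char ) hex[ i ] ) == 0 )
        {
            return -1;
        }
    }

    memcpy( digits, hex, 8U );
    digits[ 8 ] = '\0';
    value = strtoul( digits, NULL, 16 );

    /* Assumption: the most significant byte is the first octet. */
    ( void ) snprintf( out, out_len, "%lu.%lu.%lu.%lu",
                       ( value >> 24 ) & 0xFFUL, ( value >> 16 ) & 0xFFUL,
                       ( value >> 8 ) & 0xFFUL, value & 0xFFUL );
    return 0;
}
```

Under that assumption, `c0a8150aip` decodes to 192.168.21.10 (the end-point) and `c0a81505ip` to 192.168.21.5 (the PC sending the ARP requests).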

OK, this comment makes me think that the problem might be that BufferAllocation1 expects to use uncached memory, and maybe that conflicts with the SMP cache-coherent memory. I’m hardly an expert on this topic, but I’ll see if I can investigate the ARM memory settings.

That’s possible.

These logs indicate that the network driver is facing issues with the RX buffer when receiving a network packet into the buffer, probably losing them and finally running out of network buffers.

Can you check by stepping through init_dma and emacps_check_rx to see if the buffers allocated by pxGetNetworkBufferWithDescriptor are properly initialized and set up with the DMA?

I’m still thinking that this is the most likely reason. BufferAllocation1 modifies the attributes of its memory buffer here. I tinkered around with the memory attribute values and task affinities and sometimes the network will init to a stable and working state but the init seems unreliable and sometimes will fail (which makes me think that the memory is misconfigured).

Meanwhile, so far I have not seen any issues with BufferAllocation2, so I am going to move forward with testing my application using this scheme. I think it’s likely that the Zynq port’s BufferAllocation1 is not compatible with SMP operation without some modifications.

The Zynq project declares 1 MB of space:

uint8_t pucUncachedMemory[ uncMEMORY_SIZE ]

which will become non-cached memory. It is shared between DMA and the CPU.

I just compiled and ran a Zynq/Zybo project using BufferAllocation_1.c, with caching enabled for pucUncachedMemory[]. It has no Ethernet connectivity.
I added a new testing macro:

#define uncZYNQ_FORCE_USE_CACHED_MEMORY    1

So before we conclude that caching is the problem, some reparation is needed.

Strange though that @mike919192 reports that his demo does work when using BufferAllocation_2.c. That module uses the default pvPortMalloc(), I assume?

EDIT: For me only BufferAllocation_1.c works, not _2.c

I just double checked that when using the regular (not SMP) kernel, both BufferAllocation 1 and 2 are working for me.

when using the regular (not SMP) kernel, both BufferAllocation 1 and 2 are working for me

Sorry, for me both methods BufferAllocation_1 and _2 work as well. Version _1 is faster, but it shows the reception of strange unknown IP-packets. It looks like there is some corruption.

Is this the debug message you are talking about?

prvProcessIPPacket: Undefined Frame Type

I am occasionally getting that message with my current testing using SMP and BufferAllocation_2. I don’t think I ever saw that message before with my previous use of no SMP and BufferAllocation_1. I could go back to an old version to see if that message shows up.

I haven’t yet investigated what is triggering that message. But it does not appear to cause any problems with my application operating.

Edit: Duh, I just looked at where that message is printed here. My project is compiled with IPv4 only, and the message is in response to receiving an IPv6 packet.

@tony-josi-aws @htibosch
So after some application testing I have also run into some issues with the socket interface while using SMP kernel + TCP library.

  • The sockets do not reliably return error code on disconnection. This problem is pretty repeatable so this would be possible to investigate.
  • After many hours I had one instance of stream corruption (and therefore losing track of our binary protocol). I’m not certain that the corruption occurred on the embedded (server) side, because it may have occurred on the PC (client) side. This corruption only occurred after many hours, so it would be difficult to capture and debug.

So because of the issues and the performance problems, it seems pretty clear that it’s premature to adopt the SMP kernel + TCP library right now.

That said I can probably spend some time continuing to test and debug. I wanted to get your feedback on what future items would make sense.

  • Does it make sense to do some testing on the TCP library with SMP POSIX port? Perhaps with thread sanitizer enabled?
  • Obviously I will continue to test my kernel SMP port.
  • I spent some time getting set up and familiar with Percepio in order to capture traces with the SMP kernel. This could be a useful tool for debugging the TCP library.

Let me know what you think would be a good path forward.

We do not have an SMP POSIX port.

Tracealyzer is a very useful tool for debugging such issues.

Your approach of debugging a bit more to pin point the problem seems reasonable to me.

Are there any major challenges in accomplishing this? I would think this would be a good tool for validating libraries and other application code.

I was off most of last week. I think the best place to start would be for me to investigate why the sockets sometimes do not return on disconnection.

Since we have not tried it yet, it is hard to speak about specific challenges. If you decide to implement this, we always welcome contributions :slight_smile:

Sounds good!

Here is more information about the problem with the disconnect. The error sometimes occurs when an RST is received but the connection state does not change. I attached debug messages, a Wireshark capture, and a Percepio capture for both a successful disconnect and the error.

disconnect.zip (365.8 KB)

Here are the important messages from the debug log. Hopefully the other captures also have useful information:

Successful disconnect:

TCP: RST received from 57212 for 6000
vTCPStateChange: Closing (Queued 0, Accept 0 Reuse 0)
vTCPStateChange: me 0x327ec8 parent 0x327ec8 peer 0x0 clear 0
vTCPStateChange: xHasCleared = 0
Socket 6000 -> [192.168.21.5]:57212 State eESTABLISHED->eCLOSED
error reading from socket
Notify tx_task to shutdown
Tx task is deleting itself
Lost: Socket 6000 now has 0 / 1 children
FreeRTOS_closesocket[0ip port 6000 to c0a81505ip port 57212]: buffers 64 socks 1
Rx task is deleting itself

Disconnect error:

TCP: RST received from 56133 for 6000

I have some more information about what is happening. I found the function that prints the RST message and added a few more debug messages in FreeRTOS_TCP_IP.c:

/* This is not a socket in listening mode. Check for the RST
 * flag. */
if( ( ucTCPFlags & tcpTCP_FLAG_RST ) != 0U )
{
    FreeRTOS_debug_printf( ( "TCP: RST received from %u for %u\n", usRemotePort, usLocalPort ) );

    /* Implement https://tools.ietf.org/html/rfc5961#section-3.2. */
    if( pxSocket->u.xTCP.eTCPState == eCONNECT_SYN )
    {
        const uint32_t ulAckNumber = FreeRTOS_ntohl( pxTCPHeader->ulAckNr );

        /* Per the above RFC, "In the SYN-SENT state ... the RST is
         * acceptable if the ACK field acknowledges the SYN." */
        if( ulAckNumber == ( pxSocket->u.xTCP.xTCPWindow.ulOurSequenceNumber + 1U ) )
        {
            vTCPStateChange( pxSocket, eCLOSED );
        }
    }
    else
    {
        const uint32_t ulSequenceNumber = FreeRTOS_ntohl( pxTCPHeader->ulSequenceNumber );

        /* Check whether the packet matches the next expected sequence number. */
        if( ulSequenceNumber == pxSocket->u.xTCP.xTCPWindow.rx.ulCurrentSequenceNumber )
        {
            vTCPStateChange( pxSocket, eCLOSED );
        }
        /* Otherwise, check whether the packet is within the receive window. */
        else if( ( xSequenceGreaterThan( ulSequenceNumber, pxSocket->u.xTCP.xTCPWindow.rx.ulCurrentSequenceNumber ) != pdFALSE ) &&
                 ( xSequenceLessThan( ulSequenceNumber, pxSocket->u.xTCP.xTCPWindow.rx.ulCurrentSequenceNumber +
                                      pxSocket->u.xTCP.xTCPWindow.xSize.ulRxWindowLength ) != pdFALSE ) )
        {
            FreeRTOS_debug_printf( ( "In challenge! Seq Num:%u Cur Seq Num:%u, RxWin Len:%u\n", ulSequenceNumber, pxSocket->u.xTCP.xTCPWindow.rx.ulCurrentSequenceNumber, pxSocket->u.xTCP.xTCPWindow.xSize.ulRxWindowLength ) );
            /* Send a challenge ACK. */
            ( void ) prvTCPSendChallengeAck( pxNetworkBuffer );
        }
        else
        {
            FreeRTOS_debug_printf( ( "In nothing! Seq Num:%u Cur Seq Num:%u, RxWin Len:%u\n", ulSequenceNumber, pxSocket->u.xTCP.xTCPWindow.rx.ulCurrentSequenceNumber, pxSocket->u.xTCP.xTCPWindow.xSize.ulRxWindowLength ) );
            /* Nothing. */
        }
    }

    /* Otherwise, do nothing. In any case, the packet cannot be handled. */
    xResult = pdFAIL;
}

The error occurs when one of the else branches with the “challenge” or “nothing” message is executed.

TCP: RST received from 62406 for 6000
In challenge! Seq Num:3242066080 Cur Seq Num:3242063949, RxWin Len:2920
TCP: RST received from 58643 for 6000
In nothing! Seq Num:1183004540 Cur Seq Num:1183001620, RxWin Len:2920

In both of these cases, the PC client has disconnected but the embedded server does not close the connection. I get that an RST is not a clean disconnect, but the embedded side gets into a state where it holds onto the broken connection forever.
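
The two logged cases can be checked by hand against the window test in the code above. The sketch below is a standalone re-implementation of that three-way decision, not the library's code: `classify_rst`, `seq_greater_than`, and `seq_less_than` are hypothetical names, and the signed-difference comparison is an assumed stand-in for `xSequenceGreaterThan` / `xSequenceLessThan`.

```c
#include <assert.h>
#include <stdint.h>

typedef enum
{
    RST_CLOSE,         /* sequence number matches exactly: close the socket */
    RST_CHALLENGE_ACK, /* inside the receive window: send a challenge ACK */
    RST_IGNORE         /* outside the window: drop silently */
} rst_action_t;

/* Modulo-2^32 comparisons. The cast to int32_t relies on two's-complement
 * conversion, which holds on the usual embedded and desktop compilers. */
static int seq_greater_than( uint32_t a, uint32_t b )
{
    return ( ( int32_t ) ( a - b ) ) > 0;
}

static int seq_less_than( uint32_t a, uint32_t b )
{
    return ( ( int32_t ) ( a - b ) ) < 0;
}

/* Hypothetical helper mirroring the RFC 5961 section 3.2 decision. */
rst_action_t classify_rst( uint32_t seq, uint32_t expected, uint32_t rx_window )
{
    /* The comparison is only meaningful for windows below 2^31. */
    assert( rx_window < 0x80000000UL );

    if( seq == expected )
    {
        return RST_CLOSE;
    }

    if( seq_greater_than( seq, expected ) &&
        seq_less_than( seq, expected + rx_window ) )
    {
        return RST_CHALLENGE_ACK;
    }

    return RST_IGNORE;
}
```

With the logged values, the first RST is 3242066080 - 3242063949 = 2131 bytes ahead of the expected sequence number, which is inside the 2920-byte window, so it gets a challenge ACK. The second is 1183004540 - 1183001620 = 2920 bytes ahead, exactly one past the end of the window, so it is ignored. In neither case does the sequence number match exactly, which is the only path that closes the socket.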

I believe that I have found the cause of the original issue with BufferAllocation1. It was indeed caused by incorrect memory attributes: the boot code of the second core was modifying memory attributes after the TCP library had already initialized and set its buffer memory to non-cacheable.

There was a section of the boot code that sets the memory to shareable between the cores. I later found out that it was unnecessary because shareable memory is already the default.

So the problem sequence was:

  • Core 0 boots and sets all memory to shareable (not necessary because it is already default but doesn’t hurt anything)
  • TCP initializes its buffer 1 and sets that region to non-cacheable
  • Core 1 boots and sets all memory to shareable (misconfigures the TCP buffer 1)
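
The ordering problem above can be illustrated with a toy model. This is only a sketch: the attribute table, flag values, and function names are all hypothetical and do not correspond to the Xilinx BSP or the ARM translation table format. It also assumes that the boot code rewrites each entry in full, which is what would restore cacheability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SECTION_COUNT     8U    /* toy address space of 8 sections */
#define ATTR_SHAREABLE    0x1U  /* illustrative flag values only */
#define ATTR_CACHEABLE    0x2U

static uint32_t section_attr[ SECTION_COUNT ];

/* Models the per-core boot step: every section is rewritten as
 * shareable and cacheable, discarding any earlier per-section change. */
void boot_set_all_shareable( void )
{
    size_t i;

    for( i = 0U; i < SECTION_COUNT; i++ )
    {
        section_attr[ i ] = ATTR_SHAREABLE | ATTR_CACHEABLE;
    }
}

/* Models the network stack marking its DMA buffer region non-cacheable. */
void net_mark_uncached( size_t section )
{
    assert( section < SECTION_COUNT );
    section_attr[ section ] &= ~ATTR_CACHEABLE;
}

/* Returns 1 if the section is currently cacheable, 0 otherwise. */
int section_is_cacheable( size_t section )
{
    assert( section < SECTION_COUNT );
    return ( section_attr[ section ] & ATTR_CACHEABLE ) != 0U;
}
```

Calling `boot_set_all_shareable()` (core 0), then `net_mark_uncached( 3 )`, then `boot_set_all_shareable()` again (core 1) leaves section 3 cacheable again, so the DMA buffers end up back in cached memory.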

So after disabling the option in the boot that modifies memory attributes, TCP buffer 1 now seems to be working without issues. Although, I should mention that it seems that the TCP library requires that configRUN_MULTIPLE_PRIORITIES is set to 0.
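
For reference, a FreeRTOSConfig.h fragment matching that observation might look like the following. Only the `configRUN_MULTIPLE_PRIORITIES` value comes from this thread; the other two lines are assumptions about a typical dual-core setup, and older SMP kernel branches spell the core count macro `configNUM_CORES`.

```c
/* FreeRTOSConfig.h fragment (sketch, values other than
 * configRUN_MULTIPLE_PRIORITIES are assumptions). */
#define configNUMBER_OF_CORES            2 /* assumed: both Zynq cores in use */
#define configRUN_MULTIPLE_PRIORITIES    0 /* as reported above for the TCP library */
#define configUSE_CORE_AFFINITY          1 /* assumed: needed to pin tasks to a core */
```

With `configRUN_MULTIPLE_PRIORITIES` set to 0, the kernel only runs tasks simultaneously on different cores when they have equal priority, which keeps a lower-priority task from running in parallel with a higher-priority one.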

I think that the other issues I ran into with TCP buffer 2 were due to it being slower than buffer 1. The problems with the RST flag do not seem to trigger with buffer 1.

@mike919192 Thank you for sharing your solution!