Handling of disconnects by the network stack

Hi,

Our device runs into problems when it has to handle network disconnects. We are using a FreeRTOS+TCP V2.3.3 port for an STM32H7.

The problem arises when the cable of a previously working Ethernet connection is unplugged. The IP task keeps sending ARP requests at a regular interval, and in the process xNetworkInterfaceOutput() takes the TX descriptor semaphore for each packet.
Normally, HAL_ETH_TxCpltCallback() causes the EMAC handler task (prvEMACHandlerTask) to release the corresponding semaphore. But since there is no connection, the TX-complete ISR is never called, so xSemaphoreGive( xTXDescriptorSemaphore ) is never executed.
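The failure mode can be reproduced without hardware. The sketch below is a simplified host-side model, not driver code: a plain counter stands in for xTXDescriptorSemaphore (assumed to be a counting semaphore with one slot per TX descriptor, 16 as in the log), `model_send()` plays the role of xNetworkInterfaceOutput() taking a slot, and `model_tx_complete()` plays the role of the TX-complete path giving it back. All of these names are hypothetical. With the cable unplugged the completion never runs, so every send after the 16th fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: 16 TX descriptors, as in the log ("TX descriptors 16/16"). */
#define TX_DESCRIPTOR_COUNT 16

/* Stand-in for the counting semaphore xTXDescriptorSemaphore. */
static int free_tx_slots = TX_DESCRIPTOR_COUNT;

/* Models xNetworkInterfaceOutput() taking a slot. Returns false where the
 * real driver would log "Time-out waiting for TX buffer". */
static bool model_send(void)
{
    if (free_tx_slots == 0) {
        return false;
    }
    free_tx_slots--;
    return true;
}

/* Models the TX-complete path (HAL_ETH_TxCpltCallback() -> EMAC task ->
 * xSemaphoreGive()). With the cable unplugged this is never reached. */
static void model_tx_complete(void)
{
    assert(free_tx_slots < TX_DESCRIPTOR_COUNT);
    free_tx_slots++;
}

/* Sends `n` packets (e.g. periodic ARP requests) while the link is down,
 * so no completion ever arrives. Returns how many sends succeeded. */
static int send_with_link_down(int n)
{
    int sent = 0;
    for (int i = 0; i < n; i++) {
        if (model_send()) {
            sent++;
        }
    }
    return sent;
}
```

Calling `send_with_link_down(20)` succeeds only 16 times; after that, every send fails until some completion gives a slot back, which in this situation never happens.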

This is what FreeRTOS prints to the UART log:


LVL_INFO xPhyCheckLinkStatus() L745: xPhyCheckLinkStatus: PHY LS now 00
LVL_INFO prvEthernetUpdateConfig() L461: prvEthernetUpdateConfig: LS mask 00 Force 0
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 1/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 2/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 3/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 4/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 5/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 6/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 7/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 8/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 9/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 10/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 11/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 12/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 13/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 14/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 15/16
LVL_INFO prvEMACHandlerTask() L873: TX descriptors 16/16
LVL_INFO xNetworkInterfaceOutput() L383: emacps_send_message: Time-out waiting for TX buffer

The TX descriptors are never freed again, and the network stack is permanently bricked.

Is there anything that can be done to avoid this situation? Am I missing a setting or something similar here?

Thanks a lot for your help

Hello @Dweb_2,

When the network is disconnected, the packets should queue up (and not get sent). This is expected behavior.

I understand that when you reconnect the Ethernet cable, the queued packets never get sent. Is that correct?
Also, are you using the zero-copy method (i.e., what is the value of ipconfigZERO_COPY_TX_DRIVER)?

I will look into this and get back to you shortly.

Hello @Dweb_2,

I checked my hardware but could not find an STM32Hxx-based board (I have now ordered one for myself :slight_smile:). I did, however, look at the code, and it seems to me that the change below, applied in xNetworkInterfaceOutput() right after the printf that reports the TX buffer time-out, should help you work around the problem.

        /* Check link status since we failed to get the network semaphore. */
        if( xGetPhyLinkStatus() == pdFAIL )
        {
            /* Link is down. */
            xLinkDown = pdTRUE;
            /* Since the link is down, clear the descriptors. */
            ETH_Clear_Tx_Descriptors( &( xEthHandle ) );
        }
        else if( xLinkDown == pdTRUE )
        {
            /* Link was down and now it is up again. Start autonegotiation and restart MAC. */
            prvEthernetUpdateConfig( pdTRUE );
            /* Reset the flag. */
            xLinkDown = pdFALSE;
        }

Note that I have not run this code myself (I do not have the required STM board), but I did do a dry run. Let me know if this works for you.

Hi @kanherea,
thanks for your quick reply :slightly_smiling_face:
Yes, that is correct: plugging the cable back in changes nothing, and the device can no longer respond (although the FreeRTOS+TCP stack itself still seems to be running).

Regarding the zero-copy macro, we have configured:
#define ipconfigZERO_COPY_TX_DRIVER ( 0 )

Thanks a lot for the supplied fix. I will try it out and will report the results.

I included the snippet in our code. At first it did not compile, because xLinkDown was not declared. I'm not 100% sure I understood the function correctly, but I added a declaration myself in order to continue:

FreeRTOS_printf( ( "emacps_send_message: Time-out waiting for TX buffer\n" ) );
static BaseType_t xLinkDown = pdFALSE; // <-was missing
/* Check link status since we failed to get the network semaphore. */
if( xGetPhyLinkStatus() == pdFAIL )

After some testing, the same problem unfortunately still persists. I did some more debugging and found that ETH_Clear_Tx_Descriptors() is now entered, but it returns before reaching the point where the semaphore is released, because of a condition inside the function. So there seems to be some kind of conflict between the application and the DMA.
It looks to me like the ETH peripheral still owns the descriptor (since the TX-complete ISR never fired).

Ah, yes! It must have gotten lost while I was copy-pasting the code from my IDE. The declaration you added is exactly right!

Yes, the DMA has not released the buffer yet. The snippet below should fix it; add it at the same location.

        static BaseType_t xLinkDown = pdFALSE;
        /* Check link status since we failed to get the network semaphore. */
        if( xGetPhyLinkStatus() == pdFAIL )
        {
            /* Link is down. */
            xLinkDown = pdTRUE;
            /* Stop the DMA transfer. */
            HAL_ETH_Stop_IT( &( xEthHandle ) );
            /* Clear the Transmit buffers. */
            memset( &( DMATxDscrTab ), '\0', sizeof( DMATxDscrTab ) );
            /* Since the link is down, clear the descriptors. */
            ETH_Clear_Tx_Descriptors( &( xEthHandle ) );
        }
        else if( xLinkDown == pdTRUE )
        {
            /* Link was down and now it is up again. Start autonegotiation and restart MAC. */
            prvEthernetUpdateConfig( pdTRUE );
            /* Reset the flag. */
            xLinkDown = pdFALSE;
        }
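Why the second snippet helps can be illustrated with a small host-side model of a TX descriptor ring. In typical DMA rings, each descriptor carries an OWN flag that is set when a frame is handed to the DMA and cleared by the DMA once the frame is done; software reclaims descriptors in order and has to stop at the first one the DMA still owns. That matches what was observed: with the cable unplugged nothing completes, so the reclaim loop leaves immediately. Zeroing the descriptor table clears every OWN flag, after which all slots can be given back. The snippet stops the DMA first, presumably so that it cannot modify descriptors while they are being zeroed; this model has no DMA to stop. The struct layout, the flag value and all function names are hypothetical; this is not the STM32H7 HAL code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TX_DESCRIPTOR_COUNT 16
#define DESC_OWN 0x80000000u   /* hypothetical OWN flag: set = owned by the DMA */

typedef struct {
    uint32_t status;           /* only the OWN flag is modelled */
} tx_descriptor_t;

typedef struct {
    tx_descriptor_t desc[TX_DESCRIPTOR_COUNT]; /* plays the role of DMATxDscrTab */
    unsigned release_index;    /* next descriptor to reclaim */
    unsigned in_use;           /* frames handed to the DMA, not yet reclaimed */
    unsigned free_slots;       /* stand-in for the TX semaphore count */
} tx_ring_t;

/* Hands one frame to the DMA: mark the next descriptor as DMA-owned. */
static void ring_submit(tx_ring_t *ring)
{
    assert(ring->free_slots > 0);
    unsigned index = (ring->release_index + ring->in_use) % TX_DESCRIPTOR_COUNT;
    ring->desc[index].status |= DESC_OWN;
    ring->in_use++;
    ring->free_slots--;
}

/* Reclaims finished descriptors in order and stops at the first one the
 * DMA still owns. Returns the number of slots given back. */
static unsigned reclaim_tx_descriptors(tx_ring_t *ring)
{
    unsigned released = 0;
    while (ring->in_use > 0) {
        if ((ring->desc[ring->release_index].status & DESC_OWN) != 0u) {
            break;             /* still owned by the DMA: leave early */
        }
        ring->release_index = (ring->release_index + 1u) % TX_DESCRIPTOR_COUNT;
        ring->in_use--;
        ring->free_slots++;    /* the driver would call xSemaphoreGive() here */
        released++;
    }
    return released;
}

/* Link-down recovery in the order used by the snippet: (stop the DMA,)
 * zero the descriptor table so no OWN flag remains, then reclaim. */
static unsigned recover_after_link_down(tx_ring_t *ring)
{
    /* ~ HAL_ETH_Stop_IT(): nothing to stop in this model. */
    memset(ring->desc, 0, sizeof(ring->desc));  /* ~ memset( &( DMATxDscrTab ), ... ) */
    return reclaim_tx_descriptors(ring);        /* ~ ETH_Clear_Tx_Descriptors() */
}
```

With all 16 descriptors submitted and never completed, `reclaim_tx_descriptors()` releases nothing, while `recover_after_link_down()` releases all 16.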

Can you test this and let me know whether it works? Apologies for not giving you a "one-shot answer"; that is the disadvantage of not having the hardware. I haven't received my board yet (it should arrive by tomorrow).

So far, the device works fine with the newly added code. Network operation resumes immediately after reconnecting, and the TX descriptor count increases and decreases as expected.
@kanherea, thanks again for the great support.
Is there a chance that the fix will be included in future FreeRTOS+TCP releases for the STM32Hxx port?

Hello @Dweb_2,

Thank you for reporting back! I am glad that the solution worked for you.

We will add this to our backlog :slight_smile:.

If you have any more issues, do ask. One request: could you mark the suggestion as the "answer" so that other users can find the solution until we add the patch to the FreeRTOS+TCP repository?

My email client filters the FreeRTOS forum and GitHub emails on keywords like TCP, ICMP, UDP and DNS, but not on "network". That is why I didn't notice this post until today. I will add that keyword.

@Dweb_2, thanks a lot for reporting, and thanks to @kanherea for giving a solution.

I just turned the solution into PR #321. I tested the changes on an STM32H747 board (on the CM7 core), and they worked as expected.

Normally network drivers have this type of code:

BaseType_t xNetworkInterfaceOutput( NetworkBufferDescriptor_t * const pxDescriptor,
                                    BaseType_t xReleaseAfterSend )
{
    BaseType_t xResult = pdFAIL;

    if( xGetPhyLinkStatus() == pdPASS )
    {
        /* The link is up: hand the packet to the DMA and set xResult. */
    }
    else
    {
        /* The Link Status is low, no use to try to send packets. */
    }
    if( xReleaseAfterSend != pdFALSE )
    {
        vReleaseNetworkBufferAndDescriptor( pxDescriptor );
    }
    return xResult;
}
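A minimal host-side sketch of that link-status gate, assuming the link state is passed in as a parameter instead of being read with xGetPhyLinkStatus(), and that a full ring simply fails instead of blocking on the semaphore. The counters and `model_output()` are hypothetical. It shows that while the link is down no TX descriptor is consumed and the buffer is still released, so periodic ARP requests sent with the cable unplugged cannot drain the descriptor pool.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_DESCRIPTOR_COUNT 16

/* Hypothetical counters standing in for driver state. */
static int free_tx_slots = TX_DESCRIPTOR_COUNT;  /* ~ TX descriptor semaphore */
static int released_buffers = 0;                 /* calls to the release function */

/* Follows the shape of xNetworkInterfaceOutput() above: a descriptor is
 * only used when the link is up; otherwise the packet is dropped. Either
 * way, the buffer is released when the caller asks for it. */
static bool model_output(bool link_up, bool release_after_send)
{
    bool sent = false;

    if (link_up && free_tx_slots > 0) {
        free_tx_slots--;       /* frame handed to the DMA */
        sent = true;
    }
    /* else: the link is down, no use trying to send the packet */

    if (release_after_send) {
        released_buffers++;    /* ~ vReleaseNetworkBufferAndDescriptor() */
    }
    assert(free_tx_slots >= 0);
    return sent;
}
```

After 100 calls with the link down, all 16 slots are still free and 100 buffers have been released; the first call with the link up then takes exactly one slot.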

At the bottom, I changed the code to:

if( xPhyCheckLinkStatus( &xPhyObject, xResult ) != 0 )
{
    /*
     * The function xPhyCheckLinkStatus() returns pdTRUE if the
     * Link Status has changed since it was last called.
     */
    if( xGetPhyLinkStatus() == pdFALSE )
    {
        /* Stop the DMA transfer. */
        HAL_ETH_Stop_IT( &( xEthHandle ) );
        /* Clear the Transmit buffers. */
        memset( &( DMATxDscrTab ), '\0', sizeof( DMATxDscrTab ) );
        /* Since the link is down, clear the descriptors. */
        ETH_Clear_Tx_Descriptors( &( xEthHandle ) );
    }
    else
    {
        /* The Link Status has changed (the link is up again), re-check it. */
        prvEthernetUpdateConfig( pdFALSE );
    }
}
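The change reacts to link transitions rather than to every pass: the descriptor recovery runs once when the link goes down, and the reconfiguration runs once when it comes back. The sketch below models that edge detection on a host. `link_status_changed()` is a hypothetical stand-in for xPhyCheckLinkStatus() that only compares the new state with the previous one, the initial state is assumed to be "link up", and the two counters record which branch ran.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters recording which branch ran. */
static int clear_tx_calls = 0;      /* ~ HAL_ETH_Stop_IT() + memset + ETH_Clear_Tx_Descriptors() */
static int update_config_calls = 0; /* ~ prvEthernetUpdateConfig( pdFALSE ) */

/* Last link state seen; assumed to start as "up". */
static bool last_link_up = true;

/* Returns true only when the link state differs from the previous call. */
static bool link_status_changed(bool link_up)
{
    bool changed = (link_up != last_link_up);
    last_link_up = link_up;
    return changed;
}

/* One pass of the periodic link check. */
static void on_periodic_link_check(bool link_up)
{
    if (link_status_changed(link_up)) {
        if (!link_up) {
            clear_tx_calls++;       /* link went down: reclaim TX descriptors */
        } else {
            update_config_calls++;  /* link came back: re-check and restart */
        }
    }
    /* Down and up edges must alternate, starting from "link up". */
    assert(clear_tx_calls - update_config_calls >= 0 &&
           clear_tx_calls - update_config_calls <= 1);
}
```

Feeding the sequence up, up, down, down, down, up runs the link-down branch once and the link-up branch once, no matter how many passes happen while the cable is unplugged.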

You can find the new NetworkInterface.c here.
After merging, it will appear here, but that will take a while.