FreeRTOS+TCP stops replying to ping after unplugging and replugging the Ethernet cable while sending data

gsing wrote on Monday, August 21, 2017:

I am testing FreeRTOS+TCP by sending data from FreeRTOS to a PC. FreeRTOS is the server side, with minor changes to the example “SimpleTCPEchoServer.c” (see attachment). The PC side receives the data well, but I have a problem when I run the following experiment:

  1. Run the PC-side program to receive data (data is received smoothly) and ping FreeRTOS continuously.
  2. Unplug the Ethernet cable.
  3. Wait until the timeout (5 sec).
  4. Plug the Ethernet cable back in. FreeRTOS never replies to ping again.

In step 3, if I do not wait for the timeout and plug the cable back in quickly, ping is answered again.
Where should I check for this problem? Thanks for any help!

Test environment:

  1. Xilinx Zynq ZYBO board
  2. 160919_FreeRTOS_Labs
  3. Some minor changes to 160919_FreeRTOS_Labs:
    a) DHCP disabled, using a static IP address
    b) “vStartSimple2TCPServerTasks( mainTCP_SERVER_STACK_SIZE, tskIDLE_PRIORITY+2 );” is called before vTaskStartScheduler in main()
    c) EchoServer port set to 1000 in SimpleTCPEchoServer.c
    d) SimpleTCPEchoServer.c does not receive data, it only sends data

heinbali01 wrote on Wednesday, August 23, 2017:

Hi Guoxing Zhang,

In step 3, if I do not wait for the timeout and plug the cable back in quickly, ping is answered again.
Where should I check for this problem?

I can reproduce this, thanks. I’m working on it and I will post a solution today.

heinbali01 wrote on Wednesday, August 23, 2017:

Hi Guoxing Zhang,

There is one thing to realise while testing: when you run ping constantly while disconnecting your device, your OS may start forwarding the ping ICMP packets through a different ( gateway ) interface. You may see this happening by running Wireshark.

And also like this:

    Reply from 192.168.2.106: bytes=32 time<1ms TTL=128
    Reply from 192.168.2.106: bytes=32 time<1ms TTL=128
    Request timed out.
    Reply from 192.168.2.5: Destination host unreachable.
    Request timed out.
    Request timed out.
    Request timed out.
    Request timed out.
    Reply from 192.168.2.1: Destination host unreachable.

Here above, there is a shift from x.x.x.5 to x.x.x.1.

If that happens, you may want to try the following:

● Try a few pings
● Stop pinging and disconnect the device
● Wait 20 seconds and connect the device
● Wait a while and try ping again

Sometimes, when packets are forwarded to the wrong interface, it may help to clear the ARP cache.

The driver for Zynq/Zybo has changed a bit through time. I will attach the latest ( not-yet well-tested ) version ( see Zynq_Zybo_driver.zip, here below ). Please have a try with it.

I tested the demo application as follows:

● Disconnect device
● Start it up
● Wait 15 secs
● Connect it to the LAN
● See it working

And also I did your test:

● Start it up while connected
● See it working
● Disconnect it from the LAN
● Wait 15 secs
● Re-connect it
● See it working

These tests worked fine here.

What is new in this version: when the device notices a rise of the Link Status, a new Auto Negotiation will be started.
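The rising-edge behaviour described above can be illustrated with a small, self-contained sketch. It is not the code from Zynq_Zybo_driver.zip: `link_monitor_t`, `restart_auto_negotiation()` and `link_monitor_update()` are hypothetical names, and the PHY access is replaced by a counter and a printout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical link monitor: remembers the last observed Link Status. */
typedef struct
{
    bool last_link_up;          /* Link Status seen at the previous poll */
    unsigned autoneg_restarts;  /* number of times negotiation was restarted */
} link_monitor_t;

/* Hypothetical stand-in for the PHY-specific "restart auto-negotiation" step. */
static void restart_auto_negotiation( link_monitor_t * mon )
{
    mon->autoneg_restarts++;
    printf( "Link Status rose: restarting auto-negotiation\n" );
}

/* Feed the current Link Status. Returns true when a rising edge
 * (low -> high) was seen, in which case negotiation is restarted. */
bool link_monitor_update( link_monitor_t * mon, bool link_up_now )
{
    bool rose;

    assert( mon != NULL );

    rose = ( link_up_now && !mon->last_link_up );

    if( rose )
    {
        restart_auto_negotiation( mon );
    }

    mon->last_link_up = link_up_now;
    return rose;
}
```

Calling `link_monitor_update()` on every poll restarts negotiation only on a low-to-high transition, not while the link stays up or when it drops.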
Regards.

gsing wrote on Friday, August 25, 2017:

Hi Hein Tibosch,
Thanks for your help!
I tried your recommendations (try a few pings, wait a long time before pinging again), but ping never recovers. I also tried your updated driver, without success.
In your steps, did you run the socket receiving program on the PC (Win7) side? There are two important conditions for this phenomenon:

  1. Unplug the cable while the PC (Win7) is receiving data through the socket. If the receiving program is not running on the PC side, unplugging and plugging the cable does no harm and ping is OK.
  2. After unplugging the cable, wait several seconds. I guess this time is related to FREERTOS_SO_SNDTIMEO: after the timeout, the socket task in FreeRTOS is deleted. If I unplug and plug the cable quickly, ping comes back.

I captured the traffic with Wireshark, see attachment. I know little about the TCP stack, so maybe you can help to check it.
I also attach the socket receiving program for the PC side, which I compiled with VC.
I am doing this test because I want data reception to recover after the cable is plugged back in, to emulate an unstable cable.

Best Regards
Guoxing

heinbali01 wrote on Friday, August 25, 2017:

Hi Guoxing, thanks for the PCAP.

Are you connecting your embedded device to a switch or router, or directly to your laptop?
If possible, use a LAN switch. When that works you can try other connection types.

I did not use the TCP connection during my tests, but today I will!

Both modules euadRecv.c and SimpleTCPEServer.c compiled well and ran without a problem ( although the free() statement is erroneous, pBuf points to a static char buffer ).

Your VC application does not notice the disconnection. I changed it to call select() before calling recv().

If select() returns negative, the connection has gone.

But you may also notice that select() returns a positive value, while recv() returns zero. That means the peer has closed the connection in an orderly way ( in Winsock event terms: FD_CLOSE ), not that data has arrived.
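As an illustration of the select()-before-recv() pattern, here is a minimal sketch. It is not the attached tcp_client.cpp: it uses POSIX sockets rather than Winsock so that it is self-contained, and `wait_and_recv()`, `RX_TIMEOUT` and `RX_ERROR` are hypothetical names. On Winsock the same structure should apply, with SOCKET handles and WSAGetLastError() for the error code.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

#define RX_TIMEOUT    ( -2 )  /* hypothetical: nothing readable in time */
#define RX_ERROR      ( -1 )  /* select() or recv() reported an error */

/* Wait up to timeout_sec for the socket to become readable, then read.
 * Returns: >0 number of bytes received,
 *           0 the peer closed the connection,
 *           RX_ERROR or RX_TIMEOUT otherwise. */
long wait_and_recv( int fd, char * buf, size_t len, int timeout_sec )
{
    fd_set readfds;
    struct timeval tv;
    int ready;

    assert( buf != NULL && len > 0 );

    FD_ZERO( &readfds );
    FD_SET( fd, &readfds );
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    ready = select( fd + 1, &readfds, NULL, NULL, &tv );

    if( ready < 0 )
    {
        return RX_ERROR;
    }

    if( ready == 0 )
    {
        return RX_TIMEOUT;
    }

    /* Readable: either data is waiting or the peer has closed (recv() == 0). */
    return ( long ) recv( fd, buf, len, 0 );
}
```

A receive loop would call `wait_and_recv()` repeatedly and leave the loop (and possibly reconnect) on any result that is not positive, except perhaps `RX_TIMEOUT`.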

I just tested with the adapted VC application ( attached as tcp_client.cpp ), and saw that the disconnection was noticed quite quickly after disconnecting:

    Received 41 MB
    Received 41 MB
    Received 42 MB // still connected
    select returns -1
    BufLen receive error
    recv failed (10022)  // WSAEINVAL
    uCntRevTemp = (-1)

Under all circumstances, the ping worked fine as long as the device was connected.

heinbali01 wrote on Friday, August 25, 2017:

PS. when testing, please use the driver in SimpleTCPEServer.c that I posted here above.
It has some minor changes in the PHY handling.

heinbali01 wrote on Friday, August 25, 2017:

Sorry, I mean the driver in Zynq_Zybo_driver.zip and not SimpleTCPEServer.c

gsing wrote on Monday, August 28, 2017:

Hi Hein,
I really appreciate your help!
With a switch, I see no problem. My previous tests did not use any switch or router: the PC was connected to the ZYBO directly, not with a crossed cable. So may I assume that this problem is related to the direct connection?
I think this is not a major problem, I am OK with using FreeRTOS+TCP with a switch.
And thanks for your modified tcp_client.cpp. I have never used select() before, I will learn it. Thanks!

BR
Guoxing

gsing wrote on Monday, August 28, 2017:

Hi Hein,
In today’s test, I disconnected the connection between the PC and the switch. I am afraid the test will fail if I disconnect FreeRTOS from the switch. I will test this tomorrow. Thanks!

gsing wrote on Monday, August 28, 2017:

Hi Hein,
Sorry for my previous statement.
I have just disconnected the connection between FreeRTOS and the switch. That causes no problem either. Thanks for your help!

BR
Guoxing

heinbali01 wrote on Monday, August 28, 2017:

the PC was connected to the ZYBO directly, not with a crossed cable

Crossed versus straight: that should not be a problem. Most PHYs can negotiate that and find a way to communicate.
But what is a problem is that when you unplug the device, the Network Interface on your laptop goes down.
When the laptop is connected to a switch or router, the Network Interface is up all the time because it is connected constantly.

So for testing, please disconnect and reconnect your device from the switch, and not your laptop.

When that all works well ( like it does here ), I don’t mind checking the laptop-to-device connection.

gsing wrote on Monday, August 28, 2017:

Hi Hein,

So for testing, please disconnect and reconnect your device from the switch, and not your laptop.

Yes, I have done this test ( disconnecting and reconnecting the device from the switch ), and it is OK. Thanks!

Hi.
I have a similar problem.
I work with an STM32F207 MCU and the latest version of FreeRTOS+TCP.
My PHY is Marvell’s switch 88E6071.
When I power up the board with eth cable connected (or if I connect it after powering up) - the ping works ok.
After disconnecting the cable, waiting for about 10-15 seconds and connecting it back, ping no longer gets a reply. Also, the vApplicationIPNetworkEventHook_Multi() callback is called with an eNetworkDown event and never receives an eNetworkUp event again.
What can I check to solve this problem?
Thanks.

Hi Michael,

Firstly: there will be a new Unified STM32 Network Interface which will work for all STM32xx parts.

xPhyCheckLinkStatus() returns true when the Link Status has changed, gone high or gone low.

xGetPhyLinkStatus() polls the current Link Status. This function should not be called too often, because it might influence ongoing transmissions.

When the LS is going down, FreeRTOS_NetworkDown() will be called by the network driver.

There is no equivalent function such as FreeRTOS_NetworkUp(). The IP-task will call the initialising function regularly, which returns pdPASS when everything is OK:

    if( pxInterface->pfInitialise( pxInterface ) == pdPASS )
    {
        pxInterface->bits.bInterfaceUp = pdTRUE_UNSIGNED;
    }
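To make the retry behaviour concrete, here is a self-contained simulation of it. This is only a sketch, not FreeRTOS+TCP code: `sim_interface_t`, `sim_initialise()` and `sim_ip_task_poll()` are hypothetical stand-ins for the interface descriptor, the driver's initialise function and one periodic pass of the IP-task, and `bool` replaces pdPASS/pdFAIL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a network interface descriptor. */
typedef struct sim_interface
{
    bool ( * initialise )( struct sim_interface * self ); /* true == success */
    bool interface_up;     /* set once initialise has succeeded */
    int polls_until_link;  /* simulation only: failures left before the link returns */
} sim_interface_t;

/* Simulated driver initialise: fails while the link is still down. */
bool sim_initialise( sim_interface_t * self )
{
    if( self->polls_until_link > 0 )
    {
        self->polls_until_link--;
        return false;
    }

    return true;
}

/* One periodic pass: while the interface is down, try to initialise it
 * again and mark it up on success. Returns the current up/down state. */
bool sim_ip_task_poll( sim_interface_t * iface )
{
    assert( iface != NULL && iface->initialise != NULL );

    if( !iface->interface_up && iface->initialise( iface ) )
    {
        iface->interface_up = true;
    }

    return iface->interface_up;
}
```

The point of the sketch is that "network up" is reached by polling: if the initialise function keeps failing after the cable is plugged back in, the interface stays down forever, which matches repeated "FreeRTOS_NetworkDown is called" lines in a log.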

Would it be possible for you to add some logging? Most importantly when the Link Status changes, when FreeRTOS_NetworkDown() is called and when xxx_NetworkInterfaceInitialise() returns a value different from the last time, like this:

    {
        static BaseType_t xLast = 1;
        if( xLast != xResult )
        {
            xLast = xResult;
            FreeRTOS_printf( ( "xxx_NetworkInterfaceInitialise returns %d\n", ( int ) xResult ) );
        }
    }
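The same "log only when the value changes" idea can be wrapped in a reusable helper. The following is a sketch with hypothetical names (`change_logger_t`, `log_if_changed()`), using plain printf() instead of FreeRTOS_printf() so that it compiles on its own. Unlike the snippet above, it also logs the very first value it sees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper state: last value seen and number of lines printed. */
typedef struct
{
    int last_value;
    int has_value;          /* 0 until the first value has been recorded */
    unsigned lines_logged;
} change_logger_t;

/* Print "<name> returns <value>" only when the value differs from the
 * previous call. Returns the total number of lines logged so far. */
unsigned log_if_changed( change_logger_t * lg, const char * name, int value )
{
    assert( lg != NULL && name != NULL );

    if( !lg->has_value || ( lg->last_value != value ) )
    {
        lg->has_value = 1;
        lg->last_value = value;
        lg->lines_logged++;
        printf( "%s returns %d\n", name, value );
    }

    return lg->lines_logged;
}
```

One `change_logger_t` per observed value (Link Status, result of the initialise function, and so on) keeps the log short even when the functions are polled many times per second.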

I would like to do this test myself, but I am lacking time.

Hi @htibosch
Please see the debug output of +TCP:

.FreeRTOS_AddEndPoint: MAC: 48-5c IPv4: ac1601e9ip
FreeRTOS_NetworkDown is called
xPhyReset: phyBMCR_RESET 0 ready
+TCP: advertise: 0101 config 3100
prvEthernetUpdateConfig: LS mask 00 Force 1
xSTM32F_NetworkInterfaceInitialise returns 0
xPhyCheckLinkStatus: PHY LS now 01
prvEthernetUpdateConfig: LS mask 01 Force 0
Network buffers: 59 lowest 59
Heap: current 6464 lowest 6464
Network buffers: 57 lowest 57
Heap: current 3360 lowest 3360
FreeRTOS_NetworkDown is called
Link Status is high
xSTM32F_NetworkInterfaceInitialise returns 1
Heap: current 1728 lowest 1728
Network buffers: 56 lowest 55
Heap: current 176 lowest 176
xPhyCheckLinkStatus: PHY LS now 00
prvEthernetUpdateConfig: LS mask 00 Force 0
FreeRTOS_NetworkDown is called
xSTM32F_NetworkInterfaceInitialise returns 0
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called

At first I power up the board with the cable connected. When I disconnect the cable “xPhyCheckLinkStatus: PHY LS now 00” appears, then “FreeRTOS_NetworkDown is called” appears repeatedly even after I plug the cable back.

I see that the EMAChandler task in the STM32H network interface handles the link-up status, see FreeRTOS-Plus-TCP/source/portable/NetworkInterface/STM32Hxx/NetworkInterface.c at commit d70107967a14094994aecef68889b26e196ee6e0 of FreeRTOS/FreeRTOS-Plus-TCP on GitHub, but the STM32F series does not, see FreeRTOS-Plus-TCP/source/portable/NetworkInterface/STM32Fxx/NetworkInterface.c at the same commit.

Can you give that a try?

(hope I understood your instruction correctly)

I added the “… LS has changed” printout to

    if( xPhyCheckLinkStatus( &xPhyObject, xResult ) != 0 )
    {
        /* Something has changed to a Link Status, need re-check. */
        FreeRTOS_printf( ( "%s LS has changed\n", __func__ ) );
        prvEthernetUpdateConfig( pdFALSE );

        #if ( ipconfigSUPPORT_NETWORK_DOWN_EVENT != 0 )
        {
            if( xGetPhyLinkStatus( pxMyInterface ) == pdFALSE )
            {
                FreeRTOS_NetworkDown( pxMyInterface );
            }
        }
        #endif /* ( ipconfigSUPPORT_NETWORK_DOWN_EVENT != 0 ) */
    }

and I get the following output:

FreeRTOS_AddEndPoint: MAC: 48-5c IPv4: ac1601e9ip
FreeRTOS_NetworkDown is called
xPhyReset: phyBMCR_RESET 0 ready
+TCP: advertise: 0101 config 3100
prvEthernetUpdateConfig: LS mask 00 Force 1
xSTM32F_NetworkInterfaceInitialise returns 0
xPhyCheckLinkStatus: PHY LS now 01
prvEMACHandlerTask LS has changed
prvEthernetUpdateConfig: LS mask 01 Force 0
Network buffers: 59 lowest 59
Heap: current 6464 lowest 6464
Network buffers: 58 lowest 58
Heap: current 4912 lowest 4912
FreeRTOS_NetworkDown is called
Link Status is high
xSTM32F_NetworkInterfaceInitialise returns 1
Network buffers: 57 lowest 57
Heap: current 3280 lowest 3280
Network buffers: 55 lowest 54
Heap: current 176 lowest 176
xPhyCheckLinkStatus: PHY LS now 00
prvEMACHandlerTask LS has changed
prvEthernetUpdateConfig: LS mask 00 Force 0
FreeRTOS_NetworkDown is called
xSTM32F_NetworkInterfaceInitialise returns 0
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called
FreeRTOS_NetworkDown is called

@michaelv Are you using a custom STM32 board?

The issue does not reproduce on an STM32F4 NUCLEO board (NUCLEO-F439ZI).
I verified this using the current main branch of +TCP. This board uses a LAN8742A as its PHY.

The debug logs look like this:


.
.
.

dns query6: 'google.nl' = 2404:6800:4007:829::2003 rc = 0
Server task now ready.
pxEasyFit: ARP 192.168.0.1 -> 192.168.0.102
ipARP_REQUEST from 192.168.0.1 to 192.168.0.102 end-point 192.168.0.102


/********** UNPLUGGED ETHERNET CABLE **********/

xPhyCheckLinkStatus: PHY LS now 00
prvEthernetUpdateConfig: LS mask 00 Force 0
vIPSetDHCP_RATimerEnableState: Off
prvCloseDHCPSocket[44-46]: closed, user count 0
vIPSetDHCP_RATimerEnableState: Off
prvCloseDHCPSocket[44-46]: closed, user count 0
vIPSetDHCP_RATimerEnableState: Off

.
.
.

/********** PLUGGED ETHERNET CABLE BACK **********/

Link Status is high
vDHCPProcessEndPoint: enter 0
DHCP-socket[44-46]: DHCP Socket Create
prvCreateDHCPSocket[44-46]: open, user count 1
prvInitialiseDHCP: start after 250 ticks
vDHCP_RATimerReload: 250
vDHCPProcessEndPoint: exit 1
RA: source fe80::7009
vRAProcess: Router Solicitation, attempt 1/3
vRAProcess( 1, 2001:470:ed44::a90f:aeca:eb09:e37f) bRouterReplied=0 bIPAddressInUse=0 state 6 -> 1
RA: Reload 10 seconds
vDHCP_RATimerReload: 10000
IP-address : fe80::7009
End-point  : up = yes method static
Prefix     : fe80::/10
GW         : ::
DNS-0      : ::
DNS-1      : ::
MAC address: 00-11-22-33-44-46

vDHCPProcessEndPoint: enter 1
vDHCPProcess: discover
vDHCPProcessEndPoint: exit 2
vDHCPProcessEndPoint: enter 2
Autonego ready: 00000004: full duplex 100 mbit high status
vDHCPProcess: offer 192.168.0.102 for MAC address 44-46
vDHCPProcess: reply 192.168.0.102
vDHCPProcessEndPoint: exit 3
pxEasyFit: ARP 192.168.0.1 -> 192.168.0.102
ipARP_REQUEST from 192.168.0.1 to 192.168.0.102 end-point 0.0.0.0
vDHCPProcessEndPoint: enter 3
vDHCPProcess: offer 192.168.0.102 for MAC address 44-46
vDHCPProcess: acked 192.168.0.102
IP-address : 192.168.0.102
Default IP : 192.168.2.114
End-point  : up = yes method DHCP
Net mask   : 255.255.255.0
GW         : 192.168.0.1
DNS-0      : 192.168.0.1
DNS-1      : 0.0.0.0
Broadcast  : 192.168.0.255
MAC address: 00-11-22-33-44-46


Yes, my board is custom. It has STM32F207 MCU and Marvell’s PHY switch 88E6071.
And I use static IP.

Any suggestions anyone?