TCP/IP: FreeRTOS_send drops packets

peku33 wrote on Wednesday, August 10, 2016:

I have observed some strange behavior with a TCP connection.

In my solution I use TCP sockets for a simple client-server command-response protocol. The messages are short (up to 256 bytes). The sockets are created when the application starts and are kept open for transmission. The client (the PC) sends a message once every second and the FreeRTOS device sends a response.

However, once every 10-70 transactions the message is not sent. What is really interesting is that FreeRTOS_send is called successfully (and returns true), but the packet is never passed to xNetworkInterfaceOutput. Even more interesting, the message is sent later if any other communication occurs on this socket. Without a poke from the other side of the socket, the packet is delayed forever.

I also did some Wireshark analysis. The transmission looks OK. Each packet from the PC gets an ACK from FreeRTOS. If FreeRTOS decides to push my packet out, the transmission to the PC is also successful and an ACK is sent back.

So the packet is sometimes delayed somewhere in the middle between FreeRTOS_send and xNetworkInterfaceOutput.

I also observed that after increasing the priority of the IP task this is much less likely to happen (about once every few thousand transactions).

This is not related to memory exhaustion (the memory high-water mark never passes 30%), but it could be caused by timing and scheduling. I also tested with no application logic (just the OS and the IP stack), and the results are the same.

Some facts:
STM32F107 + ENC28J60
100Hz Ticks
TCP Windowing off
1 * mss tx & rx buffers
4 tx, 8 rx descriptors

Best regards,
Pawel Kubrak

rtel wrote on Thursday, August 11, 2016:

This is indeed strange, and it is not something I am aware of anybody
reporting before.

Two things come to mind that could potentially be an issue here,
although to be honest it would be unlikely:

  1. The relative priorities of the tasks using the TCP stack. There is a
    known issue with the FreeRTOS_accept() function if the priority of the
    task running the TCP/IP stack is lower than the priority of the task
    using the TCP/IP stack. It is unlikely this is the case here though -
    however as you mention raising the priority of the IP task improves the
    situation maybe it could be related. Do you have the priority of the IP
    task above the priority of any other task that is using the stack?

  2. Perhaps you are running out of buffers, or buffer descriptors.
    Although internally in the stack you should be protected from that
    causing an issue, perhaps this could be an issue in your application
    code? When you say it is not a memory exhaustion problem, have you also
    checked you are not running out of buffer descriptors - the number of
    which is fixed at compile time? See
    http://www.freertos.org/FreeRTOS-Plus/FreeRTOS_Plus_TCP/TCP_IP_Configuration.html#ipconfigNUM_NETWORK_BUFFER_DESCRIPTORS

Also, are you using an infinite timeout anywhere when using the TCP
stack? It is best to use a finite time out in case an operation cannot
complete.

Perhaps Hein will be able to provide more insight.

heinbali01 wrote on Thursday, August 11, 2016:

The problem is the clock rate: only a tick rate of 1,000 Hz or higher will work properly with the current release.
This will be repaired in the next release.

For now you can replace 4 or 5 occurrences of pdMS_TO_TICKS() in FreeRTOS_TCP_IP.c with pdMS_TO_MIN_TICKS(), which is defined as:

#define pdMS_TO_MIN_TICKS( xTimeInMs ) ( pdMS_TO_TICKS( ( xTimeInMs ) ) < ( ( TickType_t ) 1 ) ? ( ( TickType_t ) 1 ) : pdMS_TO_TICKS( ( xTimeInMs ) ) )

With 100 Hz, small time-out values were rounded down to zero ticks. The above macro returns 1 or more. A time-out value of 1 is OK.
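
To make the rounding concrete, here is a minimal host-side sketch (not FreeRTOS code). It assumes the older default definition of pdMS_TO_TICKS() from projdefs.h, and it uses a stand-in TickType_t plus two hypothetical wrapper functions, plain_ticks() and min_ticks(), so that the two macros can be compared at a 100 Hz tick rate.

```c
#include <stdint.h>

/* Stand-ins for the FreeRTOS definitions, so the sketch builds on a PC. */
typedef uint32_t TickType_t;
#define configTICK_RATE_HZ 100 /* tick rate from the original post */

/* Assumed to match the default FreeRTOS macro of that era: the integer
 * division truncates, so anything below one tick period becomes 0. */
#define pdMS_TO_TICKS( xTimeInMs ) \
    ( ( TickType_t ) ( ( ( TickType_t ) ( xTimeInMs ) * ( TickType_t ) configTICK_RATE_HZ ) / ( TickType_t ) 1000 ) )

/* The workaround macro quoted above: never returns less than 1 tick. */
#define pdMS_TO_MIN_TICKS( xTimeInMs ) \
    ( pdMS_TO_TICKS( ( xTimeInMs ) ) < ( ( TickType_t ) 1 ) ? ( ( TickType_t ) 1 ) : pdMS_TO_TICKS( ( xTimeInMs ) ) )

/* Hypothetical wrappers, only here to make the macros easy to call. */
TickType_t plain_ticks( uint32_t ms )
{
    return pdMS_TO_TICKS( ms );
}

TickType_t min_ticks( uint32_t ms )
{
    return pdMS_TO_MIN_TICKS( ms );
}
```

At 100 Hz one tick is 10 ms, so plain_ticks( 5 ) gives 0 while min_ticks( 5 ) gives 1. For values of 10 ms and above, both give the same result.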

Sorry for the inconvenience :-)

kerub wrote on Monday, September 05, 2016:

It can also be an issue with the ENC28J60, if you use automatic padding in the PHY.
You have to remember to set the control bits (the first byte) correctly for each frame in the ENC28J60 transmit buffer.
I once forgot to do this, and from time to time the ENC28J60 didn’t send frames smaller than 60 bytes because of the random values of those bits in the buffer.
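
As a rough illustration of the above: the sketch below builds the byte image that would be written to the ENC28J60 transmit buffer, with the per-packet control byte set explicitly instead of left at whatever was in memory. The bit names follow the ENC28J60 datasheet as I understand it. The helper enc_build_tx_image() is hypothetical and not part of any driver, and the SPI write itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Per-packet control byte bits (assumed from the ENC28J60 datasheet). */
#define ENC_PKTCTRL_POVERRIDE 0x01u /* 1 = this byte overrides the MACON3 settings */
#define ENC_PKTCTRL_PCRCEN    0x02u /* append CRC (only used when POVERRIDE = 1) */
#define ENC_PKTCTRL_PPADEN    0x04u /* pad short frames (only used when POVERRIDE = 1) */
#define ENC_PKTCTRL_PHUGEEN   0x08u /* allow huge frames (only used when POVERRIDE = 1) */

/* Hypothetical helper: copies the control byte followed by the frame into
 * 'out'. Returns the number of bytes to write to the chip's transmit
 * buffer, or 0 if the arguments are invalid or 'out' is too small. */
size_t enc_build_tx_image( uint8_t *out, size_t out_size,
                           const uint8_t *frame, size_t frame_len )
{
    if( ( out == NULL ) || ( frame == NULL ) ||
        ( out_size == 0u ) || ( frame_len > ( out_size - 1u ) ) )
    {
        return 0u;
    }

    /* POVERRIDE = 0: use the padding and CRC settings already configured
     * in MACON3. The point is that the byte is always written, never left
     * holding random data. */
    out[ 0 ] = 0x00u;
    memcpy( &out[ 1 ], frame, frame_len );

    return frame_len + 1u;
}
```

If per-frame behaviour is wanted instead, the first byte could be set to ENC_PKTCTRL_POVERRIDE combined with the padding and CRC bits, but I have not tested that variant.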

Regards

Andrzej