TCP + IP, Lost data

daerid · April 19, 2021, 8:19am

Hi there,

I am using the TCP + IP interface in my multi-threaded application. It works well except in one case: when I am waiting for an acknowledgement from the server. The message contains only 1 byte.

I have a receive timeout set, so my network task is in the blocked state waiting for a message, but none arrives. I also have some short atomic functions which are used by other threads. Is it possible that those atomic functions cause this issue?

Looking forward to your reply,
Kind regards,
David

hs2 · April 19, 2021, 8:42am

I don’t think that some other atomic functions (I guess some short pieces of code in critical sections) cause this.
Are you sure (Wireshark ?) that the message was sent in time by the peer ? Did you set the TCP_NODELAY socket option on the peer to disable Nagle’s algorithm ?
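
For reference, a minimal sketch of what setting TCP_NODELAY could look like on the peer, assuming the server runs on a POSIX-like OS with BSD sockets (the thread does not say what the server runs on). The helper names `enable_tcp_nodelay` and `tcp_nodelay_is_set` are made up for illustration.

```c
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <netinet/tcp.h>  /* TCP_NODELAY */
#include <sys/socket.h>   /* setsockopt, getsockopt */

/* Disable Nagle's algorithm on a TCP socket so that small writes
 * (such as a 1-byte ACK message) are not held back for coalescing.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt). */
int enable_tcp_nodelay(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}

/* Read the option back: 1 if set, 0 if not set, -1 on error. */
int tcp_nodelay_is_set(int fd)
{
    int value = 0;
    socklen_t len = sizeof value;

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) {
        return -1;
    }
    return value != 0;
}
```

On the server this would be called once on the connected socket, before the first small message is written.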

daerid · April 19, 2021, 9:03am

Hi,

I have not seen the message in Wireshark, but I do not see any messages at all after the connection is established. I’m connected to the same hub, but I only see the ARP messages. I have also tried to configure the socket for this communication: to avoid losing messages, I’m using a timeout value much higher than the acknowledgement interval.

The server part was not written by me, but we have other devices which communicate with the server correctly.

Regards,

RAc · April 19, 2021, 9:10am

That means it is not a hub but a switch. You need to get a very dumb hub such as an old blue Netgear box to plug in between your target and the machine running Wireshark. Don’t throw the old stuff away!

hs2 · April 19, 2021, 9:33am

Ok - if other devices using the same protocol (with 1 byte ACK message) are fine, it could be an issue with your application and/or the FreeRTOS TCP stack configuration.
You’re using FreeRTOS+TCP stack ? What’s your MCU ?
Is the TCP connection still there after sending the ACK msg ? If not which peer closes the connection ?

daerid · April 19, 2021, 10:55am

Unfortunately, I do not have a hub. It’s true, it is a switch, my bad. Thanks for your response, I’ve learned something new today!

daerid · April 19, 2021, 11:01am

Yes, I’m using FreeRTOS+TCP stack.

My MCU is on the EK-RA6M3 kit from Renesas; it has an Arm Cortex-M4 CPU.

I can see the message was sent from the server, and the connection is still alive after the ACK msg.
The connection is closed by the server after I fail to recognise 3 ACK messages (that takes around 3 minutes).

RAc · April 19, 2021, 11:06am

ok, what you can try to do then IF you happen to have a 2nd network card on your PC is let that network card act as a switch on which only your target is plugged in (P2P) and add the necessary routing info on your machine. That way you can wireshark that interface.

Some switches also keep different logical segments for different negotiated speeds, so something that might help in seeing all traffic is nailing the ethernet interface’s speed to the one that matches your logging PC instead of allowing auto-negotiation (or vice versa, nail the PC’s speed to match your target’s).

Being able to analyze network traffic is absolutely crucial to network debugging, so I’d rather go through some trouble setting up a reliable way to sniff the traffic than guessing…

hs2 · April 19, 2021, 12:00pm

That’s strange … is FreeRTOS_recv waiting forever for the 1 byte ACK packet or do you encounter a socket error ?
Are you able to recv other/larger data packets ?
Do you have access to the server log ? Is there a socket error on the server after sending the ACK packet (like ECONNABORTED) ? If not, the packet was sent and TCP-acknowledged by the board successfully.

daerid · April 19, 2021, 12:12pm

The ACK msg should arrive every minute if there are no other requests from the server. I have not lost any of the longer messages, and sometimes I can catch the ACK msg too. I thought that a socket timeout of 70 seconds would fit well, but it doesn’t. I got a 0-length message twice, and after that I got error -128, which means the server disconnected.
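
A small sketch of how the return values described above could be told apart. It rests on my reading of the FreeRTOS+TCP documentation: FreeRTOS_recv returns a positive byte count on data, 0 when the receive timeout expires, and a negative errno value on error, with -128 corresponding to -pdFREERTOS_ERRNO_ENOTCONN. Under that reading, the two "0 length messages" would be two expired timeouts rather than empty messages. The enum, the function name and the local constant are hypothetical; on the target the constant should come from the FreeRTOS headers.

```c
/* Mirrors pdFREERTOS_ERRNO_ENOTCONN so the sketch builds on a host.
 * On the target, use the definition from the FreeRTOS headers instead. */
#define SKETCH_ERRNO_ENOTCONN 128

typedef enum {
    RECV_DATA,          /* rc > 0: that many bytes were received     */
    RECV_TIMEOUT,       /* rc == 0: block time expired, no data      */
    RECV_DISCONNECTED,  /* rc == -128: socket is no longer connected */
    RECV_OTHER_ERROR    /* any other negative value                  */
} recv_outcome_t;

/* Classify the value returned by a FreeRTOS_recv-style call. */
recv_outcome_t classify_recv_result(long rc)
{
    if (rc > 0) {
        return RECV_DATA;
    }
    if (rc == 0) {
        return RECV_TIMEOUT;
    }
    if (rc == -SKETCH_ERRNO_ENOTCONN) {
        return RECV_DISCONNECTED;
    }
    return RECV_OTHER_ERROR;
}
```

Logging the outcome on every return from the receive call makes it easier to see whether the task is timing out or has lost the connection.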

RAc · April 19, 2021, 12:14pm

What you should do is run ICMP requests forever in parallel, ie

ping -t [your device].

It’s a good diagnostic, because if you lose your ICMP responses at the same time the ACKs are dropped, you know it’s a general problem with the stack rather than with the particular connection.

daerid · April 19, 2021, 12:47pm

I’ve started to ping the device.

It must be an error in the stack, because after a few minutes I don’t get any ICMP responses either.

hs2 · April 19, 2021, 12:52pm

Could also be a config error. Did you enable ipconfigREPLY_TO_INCOMING_PINGS ?

Edit: Do you get PING REPLYs at the beginning but not later on ?
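
For reference, the setting mentioned above lives in the project's FreeRTOSIPConfig.h. A fragment, assuming the usual FreeRTOS+TCP configuration layout:

```c
/* FreeRTOSIPConfig.h (fragment) */

/* Set to 1 so the stack answers incoming ICMP echo requests (ping). */
#define ipconfigREPLY_TO_INCOMING_PINGS    1
```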

RAc · April 19, 2021, 12:56pm

ok, then it may be the case that the TCP/IP task is hung or deadlocked.

I’m not familiar with FreeRTOS’s TCP stack, but the basic strategy is the same as how I go about this with other stacks like lwIP: add counters to all functions that process incoming data. For example, every time an ethernet Rx frame is processed, increment a counter. Every time a packet is processed by the IP layer, increment another counter. Same for TCP etc. Same the other way around (that is, transmit). If your IDE allows you to watch the values of those counters in real time, you can visually follow where packets cease to be forwarded to the next layers.
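
The counter idea above could be sketched roughly like this. The layer names and function names are hypothetical; the `rx_count` calls would have to be placed by hand in the driver and stack receive paths of whatever stack is in use.

```c
#include <stdint.h>

/* Hypothetical layer names; map them to the points where a received
 * frame is handed from one layer to the next. */
typedef enum {
    LAYER_ETH_RX = 0,
    LAYER_IP_RX,
    LAYER_TCP_RX,
    LAYER_APP_RX,
    LAYER_COUNT
} rx_layer_t;

/* volatile so a debugger live-watch always shows the current values. */
static volatile uint32_t rx_counters[LAYER_COUNT];

/* Call once per frame or packet at the entry of a layer's receive path. */
void rx_count(rx_layer_t layer)
{
    rx_counters[layer]++;
}

/* Copy the counters so that two points in time can be compared. */
void rx_snapshot(uint32_t out[LAYER_COUNT])
{
    for (int i = 0; i < LAYER_COUNT; i++) {
        out[i] = rx_counters[i];
    }
}

/* Return the lowest layer whose counter did not advance between the two
 * snapshots, or LAYER_COUNT if every layer saw traffic. */
rx_layer_t rx_first_stalled(const uint32_t before[LAYER_COUNT],
                            const uint32_t after[LAYER_COUNT])
{
    for (int i = 0; i < LAYER_COUNT; i++) {
        if (after[i] == before[i]) {
            return (rx_layer_t)i;
        }
    }
    return LAYER_COUNT;
}
```

Not every ethernet frame carries TCP (ARP, for example), so a flat TCP counter only points to a stall if TCP traffic was actually sent during the interval. A second array for the transmit direction would work the same way.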

rtel · April 19, 2021, 2:51pm

This thread is becoming quite long and is mainly folks replying by asking questions. That is because your original post does not contain enough information for people to start to know what the issue might be. Grateful if in future you could start by providing more details (for example, in the above, what is the relevance of the atomic functions? What do they look like (code)? Also, what does “working well” mean - it could mean you are able to get a DHCP address or it could mean it runs for hours without any problem). Also it is best to start this kind of thread with a wireshark trace. If you do this the replies will be a lot more targeted and your issue will get solved faster with less effort.

daerid · April 19, 2021, 2:53pm

Yes, it is really strange. There is no specific command that triggers the stall. When it happens, the task stays in the FreeRTOS_recv function, but I don’t get any message from the server. I’m sure it’s something like a timing or stack issue. After I shut down the previous socket and create a new one, it works again for a short time.

I’m trying to catch a stack overflow, but I have not seen one so far, even though configCHECK_FOR_STACK_OVERFLOW is defined as 1.

hs2 · April 19, 2021, 3:19pm

Hmm … I’m not so sure b/c the stack works very well in many, many applications (if configured right). The ethernet driver might also have issues, which is the custom/user provided part of the stack.

Did you verify that you do not run out of memory / network buffers and which buffer allocation scheme do you use for TCP stack ?
I guess you did define configASSERT.
When in doubt I’d propose to set configCHECK_FOR_STACK_OVERFLOW to 2.

And … which FreeRTOS and TCP stack versions do you use ?
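
A FreeRTOSConfig.h fragment for the two settings suggested above. The configASSERT body shown is the commonly used "stop here" form and is only one possible choice. Note that a non-zero configCHECK_FOR_STACK_OVERFLOW requires the application to provide vApplicationStackOverflowHook().

```c
/* FreeRTOSConfig.h (fragment) */

/* Method 2 checks a known fill pattern at the end of each task stack on
 * context switch; it is slower than method 1 but catches more overflows. */
#define configCHECK_FOR_STACK_OVERFLOW    2

/* Halt in place when an assertion fails so a debugger shows the call site. */
#define configASSERT( x )    if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ) { } }
```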

RAc · April 19, 2021, 3:25pm

If it’s a recv with timeout, as you wrote earlier, then if the code never returns from your receive, either the freertos timer task or the tcp task is deadlocked. Do other software timers still run in the error case?

kanherea · April 20, 2021, 1:53am

Does your board happen to have debugging capabilities? If so, and if you can control the server, you can actually track the packet through the TCP stack and root-cause why and when the packet was discarded.
Since you are not doing something very out of the ordinary with packets (such as modifying flags etc), the +TCP stack should be able to handle this quite easily and gracefully if configured correctly.

daerid · April 20, 2021, 7:47am

Thank you so much for your replies. I’ve learned a lot of new things: what I can check, and how it all fits together.

It’s working as expected now. The critical section caused my problem.
I have a big stored data structure, and to write and read it I used Critical_section, which disabled my interrupts.

I guess the disabled interrupts caused the deadlock of my stack.

I appreciate your kindness and help.
Regards,
David
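
A common alternative to a long critical section is to guard the structure with a mutex, so interrupts stay enabled while the data is copied. The thread does not show what the final fix looked like, so the following is only an illustration of that idea. To keep it buildable on a host it uses a pthread mutex as a stand-in; in a FreeRTOS project the corresponding calls would be xSemaphoreCreateMutex(), xSemaphoreTake() and xSemaphoreGive(). The struct and function names are made up.

```c
#include <pthread.h>
#include <string.h>

/* Hypothetical stand-in for the "big data structure" from the thread. */
typedef struct {
    unsigned char payload[512];
    unsigned int  version;
} shared_data_t;

static shared_data_t   shared_data;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Copy new contents in while holding the mutex. Interrupts stay enabled,
 * so a long copy only delays tasks that want this same data. */
void shared_data_write(const shared_data_t *src)
{
    pthread_mutex_lock(&shared_lock);
    memcpy(&shared_data, src, sizeof shared_data);
    pthread_mutex_unlock(&shared_lock);
}

/* Copy the current contents out under the same mutex so the reader
 * never sees a half-written structure. */
void shared_data_read(shared_data_t *dst)
{
    pthread_mutex_lock(&shared_lock);
    memcpy(dst, &shared_data, sizeof *dst);
    pthread_mutex_unlock(&shared_lock);
}
```

A mutex must not be taken from an interrupt handler, so this only fits if the structure is accessed from tasks alone.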