For a long time I have been seeing a rare problem: after the connection on the external 4G channel (MCU - Ethernet - router - 4G USB modem) is lost, communication with the broker is not restored. I could not reproduce the problem on demand. Today I managed to connect remotely to the device and run the vTCPNetStat() function that I had added.
I saw this result:
TCP 46129 c0a865edip :62801 1/1 eESTABLISHED 5993 14359
TCP 25303 6dfe1f2eip : 1885 0/1 eCLOSE_WAIT 999999 0
The socket can hang in this state indefinitely, and the MQTT loop stops. Can you tell me how to deal with this problem? My code is based on the MqttAgent example code.
Along with enabling ipconfigTCP_HANG_PROTECTION, please also set the FREERTOS_SO_RCVTIMEO option on your MQTT socket.
This ensures that FreeRTOS_recv() does not block indefinitely. Without a receive timeout, the MQTT task can remain stuck waiting for data even after the stack has already closed a hung socket, which prevents the reconnection logic from executing.
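For reference, a minimal sketch of the related FreeRTOS+TCP settings in FreeRTOSIPConfig.h. The numeric values are illustrative assumptions, not recommendations. ipconfigSOCK_DEFAULT_RECEIVE_BLOCK_TIME only sets the default receive block time for newly created sockets; FREERTOS_SO_RCVTIMEO, set per socket with FreeRTOS_setsockopt(), overrides it.

```c
/* FreeRTOSIPConfig.h (fragment). All values below are assumptions
 * chosen for illustration; tune them for your link. */

/* Let the stack time out TCP connections that make no progress. */
#define ipconfigTCP_HANG_PROTECTION                ( 1 )
#define ipconfigTCP_HANG_PROTECTION_TIME           ( 30 ) /* seconds */

/* Default block time for socket reads. If this is not defined, the stack
 * defaults to portMAX_DELAY, so FreeRTOS_recv() can block forever. */
#define ipconfigSOCK_DEFAULT_RECEIVE_BLOCK_TIME    ( pdMS_TO_TICKS( 5000 ) )
```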
Thank you for the reply. I understand that the problem occurs on the carrier's 4G side. I could not reproduce it on the test bench in any way. My router can filter packets by their flags. Can you tell me how I can reproduce a similar situation by dropping packets with certain flags?
I have these options enabled, and the problem occurs with them enabled. Here is a piece of code; as I understand it, execution stops somewhere inside MQTTAgent_CommandLoop(), since the task is in the Blocked state.
Because the function does not return with an error, xNetworkResult = prvSocketDisconnect( &xNetworkContext ), which would close the connection, is never called, and the socket hangs in eCLOSE_WAIT.
The MQTT Agent task can be in the Blocked state if there is no MQTT communication. It should periodically wake up to send keep-alive messages. One possibility is that your network interface implementation is not returning an error even after the connection is broken. If that is the case, the agent will detect a broken connection only after the keep-alive timeout - coreMQTT/source/include/core_mqtt_serializer.h at main · FreeRTOS/coreMQTT · GitHub. Would you please try reducing the keep-alive value?
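To illustrate the point about the network interface hiding a broken connection, here is a minimal sketch of how a transport receive wrapper might translate a raw socket result. The coreMQTT transport interface expects the receive callback to return the number of bytes read, zero when no data is available, and a negative value on error. Everything named here (mapRecvResult, TRANSPORT_ERROR_PEER_CLOSED, the peerClosed flag) is hypothetical and not part of FreeRTOS+TCP or coreMQTT; how you find out that the peer has closed the connection depends on your port.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error code for "peer closed the connection". */
#define TRANSPORT_ERROR_PEER_CLOSED    ( -1 )

/* Hypothetical helper: convert a raw socket receive result into the value
 * a coreMQTT-style transport recv callback should return.
 *   rawResult  > 0 : bytes received, pass through
 *   rawResult  < 0 : socket error, pass through so the caller can react
 *   rawResult == 0 : timeout; report an error if the peer has closed,
 *                    otherwise report "no data yet". */
int32_t mapRecvResult( int32_t rawResult, bool peerClosed )
{
    int32_t result = rawResult;

    if( ( rawResult == 0 ) && peerClosed )
    {
        /* Do not disguise a half-closed connection as "no data". */
        result = TRANSPORT_ERROR_PEER_CLOSED;
    }

    return result;
}

/* A few self-checks that document the intended behaviour. */
void mapRecvResultSelfCheck( void )
{
    assert( mapRecvResult( 12, false ) == 12 );  /* data is passed through */
    assert( mapRecvResult( 0, false ) == 0 );    /* plain timeout */
    assert( mapRecvResult( 0, true ) < 0 );      /* timeout on a closed link */
    assert( mapRecvResult( -128, true ) == -128 ); /* errors are preserved */
}
```

With a mapping like this, a receive that times out on a connection the peer has already closed surfaces as an error, instead of looking identical to an idle link.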
Accordingly, in the logs I see the ping command executed every 30 seconds:
[13:30:41:659] < [DEBUG] [MQTT] [MQTT_Ping:3087] MQTT PINGREQ packet size is 2.
[13:30:41:659] < [DEBUG] [MQTT] [Plaintext_FreeRTOS_send:137] FTOS Sen state:2
[13:30:41:668] < [DEBUG] [MQTT] [sendBuffer:905] sendBuffer: Bytes Sent=2, Bytes Remaining=0
[13:30:41:668] < [DEBUG] [MQTT] [MQTT_Ping:3136] Sent 2 bytes of PINGREQ packet.
[13:30:41:721] < [DEBUG] [MQTT] [Agent_GetCommand:115] Agent pool get space 1
[13:30:41:721] < [DEBUG] [MQTT] [Agent_GetCommand:120] mqtt agent command 2
[13:30:41:721] < [DEBUG] [MQAG] [processCommand:603] com agent command:1
[13:30:41:721] < [DEBUG] [MQAG] [processCommand:630] call conclude
[13:30:41:721] < [DEBUG] [MQTT] [Agent_ReleaseCommand:138] Agent pool release space 0
[13:30:41:721] < [DEBUG] [MQTT] [Agent_ReleaseCommand:143] Returned Command Context 6 to pool
[13:30:41:721] < [DEBUG] [MQTT] [remainingLengthEncodedSize:479] Encoded size for length 0 is 1 bytes.
[13:30:41:721] < [DEBUG] [MQTT] [handleIncomingAck:1644] Received packet of type d0.
When the code is running abnormally, I don’t see any MQTT or MQAG entries in the logs. Unfortunately, this error occurs very rarely, and there is no way to record a log continuously. However, I can connect through the CLI and view certain statuses. For example, the call MQTT_CheckConnectStatus(&xGlobalMqttAgentContext.mqttContext); returns MQTTStatusConnected. Can you suggest a more precise way to implement a function, callable through the CLI, that determines where the MQTT loop has stopped? Also, when the connection is restored on the device, the MQTT loop does not restore the connection and command processing, and only resetting the device solves the problem.
Unfortunately, no. These devices are located at a remote site, and the error may occur a couple of times a month. But I can sometimes connect via the CLI and run certain functions. As an option, I can add a variable to the MQTT loop and update it at each step, to see at which value it stops.
Not sure how useful that would be. How about adding logs at points that can show us where the Agent task is stuck? I suspect it might be the network interface implementation.
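When continuous logging is not possible, the checkpoint-variable idea can do a similar job to log points. Below is a minimal sketch. All names (AgentStage_t, vAgentMarkStage, and so on) are hypothetical and not part of coreMQTT or the MQTT Agent. The tick value is passed in by the caller; in a FreeRTOS build it would typically come from xTaskGetTickCount(), and assert() would be replaced by configASSERT().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stages to mark in the agent task and transport wrappers. */
typedef enum
{
    AGENT_STAGE_IDLE = 0,
    AGENT_STAGE_WAIT_COMMAND,
    AGENT_STAGE_PROCESS_COMMAND,
    AGENT_STAGE_SOCKET_SEND,
    AGENT_STAGE_SOCKET_RECV,
    AGENT_STAGE_LOOP_RETURNED,
    AGENT_STAGE_COUNT
} AgentStage_t;

/* Written by the agent task, read by the CLI task. The two values are not
 * updated atomically together, which is acceptable for a diagnostic. */
static volatile AgentStage_t xLastStage = AGENT_STAGE_IDLE;
static volatile uint32_t ulLastStageTick = 0U;

/* Call this just before each step you want to be able to identify. */
void vAgentMarkStage( AgentStage_t xStage, uint32_t ulNowTicks )
{
    assert( xStage < AGENT_STAGE_COUNT );
    xLastStage = xStage;
    ulLastStageTick = ulNowTicks;
}

/* Last stage the agent task entered. */
AgentStage_t xAgentLastStage( void )
{
    return xLastStage;
}

/* Ticks since the last marker; unsigned subtraction handles wrap-around. */
uint32_t ulAgentStageAge( uint32_t ulNowTicks )
{
    return ulNowTicks - ulLastStageTick;
}

/* Human-readable name for a CLI command to print. */
const char * pcAgentStageName( AgentStage_t xStage )
{
    static const char * const pcNames[ AGENT_STAGE_COUNT ] =
    {
        "idle", "wait command", "process command",
        "socket send", "socket recv", "loop returned"
    };

    return ( xStage < AGENT_STAGE_COUNT ) ? pcNames[ xStage ] : "unknown";
}
```

A CLI command could then print pcAgentStageName( xAgentLastStage() ) together with ulAgentStageAge(). A stage whose age keeps growing across repeated queries is where the task is stuck.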