MQTT Agent fails with "no space"

We are in the process of updating our MQTT library from the AWS IoT version to coreMQTT and the MQTT Agent.
It appears to work for the most part. However, we always hit this error if we run it long enough (roughly after 4 hours of sending about 6 MQTT messages per second to the broker).
Some of the log output is specific to our application, but much of it comes from the FreeRTOS libraries.
The MQTT Agent and IP-task logs are 100% generic output from the FreeRTOS libraries.

  1. Once this error occurs, the TCP socket connection keeps going back and forth between ESTABLISHED and CLOSED, as if the IP-task and the MQTT Agent were somehow out of sync?
  2. The whole error condition appears to be triggered initially by a “no space” error while FreeRTOS_send() is executing.
  3. Then the IP-task starts spewing [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)

It would be great if I could get some help on this.

910 15571379 1000000 [MQTT Agent] FreeRTOS_send: 55983 -> 127.255.255.255:1883: no space
911 15571379 999911 [MQTT Agent] Unable to send packet: Timed out in transport send.
912 15571379 999826 [MQTT Agent] Transport send failed for PUBLISH header.
913 15571379 999751 [MQTT Agent] MQTT PUBLISH failed with status MQTTSendFailed.
914 15571380 999670 [MQTT Agent] MQTT operation failed with status MQTTSendFailed
915 15571380 999588 [MQTT Agent] MQTT agent command loop exited(status = 3)
916 15571380 999512 [MQTT Agent] Disconnect socket
917 15571380 999461 [MQTT Agent] Disconnecting TCP connection.
918 15573631 1000000 [IP-task] FreeRTOS_closesocket[55983 to 127.255.255.255:1883]: buffers 60 socks 0
919 15573632 999897 [MQTT Agent] MQTT connection disconnected due to an error
920 15573632 999819 [MQTT Agent] Wait for MQTT_CONNECTED event
921 15573889 1000000 [MQTT] Notify->DISCONNECTED_BAD_PACKET
922 15573892 1000000 [MQTT] Try MQTT socket connect
923 15573892 999948 [MQTT] FreeRTOS_connect: 18098 to 127.255.255.255:1883
924 15573893 999873 [MQTT] Socket 18098 -> 127.255.25…
928 15574721 1000000 [NETIF_TASK] NetworkInterface: frame error
929 15574721 999935 [NETIF_TASK] Error Stats:
930 15574721 999888 [NETIF_TASK]   statsRxLenGreaterErr 0
931 15574722 999829 [NETIF_TASK]   statsRxAlignErr      0
932 15574722 999770 [NETIF_TASK]   statsRxFcsErr        1
933 15574722 999711 [NETIF_TASK]   statsRxOverRunErr    1
934 15574722 999652 [NETIF_TASK]   statsRxTruncateErr   0
935 15574723 999593 [NETIF_TASK] NetworkInterface: frame error
936 15574723 999529 [NETIF_TASK] Error Stats:
937 15574723 999482 [NETIF_TASK]   statsRxLenGreaterErr 0
938 15574723 999423 [NETIF_TASK]   statsRxAlignErr      0
939 15574723 999364 [NETIF_TASK]   statsRxFcsErr        2
940 15574724 999305 [NETIF_TASK]   statsRxOverRunErr    2
941 15574724 999246 [NETIF_TASK]   statsRxTruncateErr   0
942 15574724 999187 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
943 15574725 999098 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
944 15574725 999009 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
945 15574725 998920 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
946 15574726 998831 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
947 15574726 998742 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
948 15574727 998653 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
949 15574727 998564 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
950 15574727 998475 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1066 15574900 989585 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1067 15574901 989495 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1068 15574901 989405 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1069 15574901 989315 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1070 15574902 989225 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1071 15574902 989135 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1072 15574903 989045 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1073 15574903 988955 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1074 15574903 988865 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1075 15574904 988775 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1076 15574904 988685 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1077 15574904 988595 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1078 15574905 988505 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1079 15574905 988415 [IP-task] MSS change 1160 -> 1460
1080 15574906 988360 [IP-task] TCP: active 18098 => 127.255.255.255:1883 set ESTAB (scaling 1)
1081 15574906 988265 [IP-task] Socket 18098 -> 127.255.255.255:1883 State eCONNECT_SYN->eESTABLISHED
1082 15574918 988264 [MQTT] Try MQTT broker connect
1083 15575435 993712 [IP-task] TCP: No active socket on port 55983 (127.255.255.255:1883)
1084 15575475 994122 [MQTT] Send MQTT_CONNECTED event
1085 15575475 994068 [MQTT Agent] Received MQTT_CONNECTED event
1086 15575475 994004 [MQTT Agent] Service command loop
1087 15575480 993949 [MQTT] Connected to broker
1088 15575483 994001 [MQTT] Subscribing to Cmd/+/CabController/SetTime
1089 15575946 998130 [IP-task] TCP: RST received from 127.255.255.255:1883 for 18098
1090 15575946 998045 [IP-task] Socket 18098 -> 127.255.255.255:1883 State eESTABLISHED->eCLOSED
1091 15575947 997949 [MQTT Agent] Transport send failed. Error code=-128.
1092 15575947 997875 [MQTT Agent] Transport send failed for PUBLISH header.
1093 15575948 997799 [MQTT Agent] MQTT PUBLISH failed with status MQTTSendFailed.
1094 15575948 997717 [MQTT Agent] MQTT operation failed with status MQTTSendFailed
1095 15575948 997634 [MQTT Agent] MQTT agent command loop exited(status = 3)
1096 15575948 997557 [MQTT Agent] Disconnect socket
1097 15575948 997505 [MQTT Agent] Disconnecting TCP connection.
1098 15575949 997441 [IP-task] FreeRTOS_closesocket[18098 to 127.255.255.255:1883]: buffers 60 socks 0

Hello @jamesk,

The address being connected to looks like a broadcast address to me: 127.255.255.255. Can you confirm that this is the address you want to connect to?

Additionally, regarding the spewing of TCP: No active socket on port 55983 (127.255.255.255:1883) that you ask about: it seems the server is still sending data for port 55983, which no longer exists because the socket was disconnected (see log lines 916-918).

The “no space” error occurs when the transmit buffer of the socket is full. Can you share the value of ipconfigTCP_TX_BUFFER_LENGTH, which sets the buffer length when the socket is created?
Meanwhile, you can use FreeRTOS_setsockopt() with the FREERTOS_SO_SNDBUF option to increase the size of the send stream just after socket creation. Alternatively, you can try increasing the value of ipconfigTCP_TX_BUFFER_LENGTH and see if that helps.
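A minimal sketch of the first approach, assuming a plain FreeRTOS+TCP socket (the function name and the 4096-byte size are illustrative, not values from this thread):

    #include "FreeRTOS.h"
    #include "FreeRTOS_IP.h"
    #include "FreeRTOS_Sockets.h"

    /* Sketch only: create a TCP socket and enlarge its send stream.
     * FREERTOS_SO_SNDBUF must be set before any data is sent, because
     * the stream buffer is only allocated on first use. */
    static Socket_t prvCreateSocketWithLargerTxBuffer( void )
    {
        Socket_t xSocket = FreeRTOS_socket( FREERTOS_AF_INET,
                                            FREERTOS_SOCK_STREAM,
                                            FREERTOS_IPPROTO_TCP );

        if( xSocket != FREERTOS_INVALID_SOCKET )
        {
            uint32_t ulTxBufferSize = 4096U; /* Assumed example size. */

            ( void ) FreeRTOS_setsockopt( xSocket,
                                          0,                  /* Level: unused. */
                                          FREERTOS_SO_SNDBUF,
                                          &ulTxBufferSize,
                                          sizeof( ulTxBufferSize ) );
        }

        return xSocket;
    }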

Regards,
Aniruddha

Dear Aniruddha,
Thanks for your response.

  1. The address being connected to is a 169.x.x.x address, which is within our local network.
    The 127.x.x.x printed in the log is purely what the FreeRTOS MQTT library does. I am guessing it was determined to be useful to log that?

  2. So TCP: No active socket on port 55983 (127.255.255.255:1883) is expected and normal once a disconnect occurs?

  3. Here is the #define for the TCP socket Tx buffer length:
    #define ipconfigTCP_TX_BUFFER_LENGTH ( 1000 )

I will try your recommendations to see if that helps.

However, do you have a theory as to why the “no space” error from the TCP socket occurs when we are simply sending about 6 MQTT messages every second, repeatedly?
Increasing the buffer size without understanding why the error occurred may mask the issue for longer, but it could occur again eventually if we run long enough.

Other factors are the size of the messages and the rate at which the client receives ACKs for those messages. Note that a packet must stay in the TX buffer until it is ACK’d. For example, if each publish is 1000 bytes (i.e. equal to ipconfigTCP_TX_BUFFER_LENGTH, with an MSS greater than 1000) and the time taken to ACK the first message is greater than the send timeout, then it is highly likely that the attempt to send the next message will hit a timeout error. The agent handles this by reconnecting. Therefore, another thing you can try is configuring the FREERTOS_SO_SNDTIMEO (i.e. send timeout) option with FreeRTOS_setsockopt(), based on the expected ACK latency of your network.
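As a rough sketch of that last suggestion, assuming xSocket is your connected TCP socket and that a 5-second timeout suits your network (both are illustrative assumptions):

    /* Sketch only: give FreeRTOS_send() more time to block while waiting
     * for incoming ACKs to free up TX buffer space. The 5000 ms value is
     * an example; choose it from your measured round-trip latency. */
    TickType_t xSendTimeOut = pdMS_TO_TICKS( 5000U );

    ( void ) FreeRTOS_setsockopt( xSocket,
                                  0,                    /* Level: unused. */
                                  FREERTOS_SO_SNDTIMEO,
                                  &xSendTimeOut,
                                  sizeof( xSendTimeOut ) );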