MQTT reconnection problem

sriramchowdary1333 wrote on February 05, 2020:

Hi,
I'm using AWS FreeRTOS with an STM controller and the CC3135 Wi-Fi chipset, and I'm able to communicate over MQTT (publishing and subscribing). Because the minimum keep-alive time is 30 seconds, I'm not able to detect network loss immediately.

  1. How can I detect a network disconnection immediately? Is there a callback function?
  2. Is there an MQTT reconnection method in AWS FreeRTOS?
  3. I read in the documentation that the ping request is postponed when two-way traffic occurs (i.e., publishing and receiving), but here the ping request is sent at exactly the keep-alive interval even while the device is publishing and receiving.
  4. I set the keep-alive time to 30 seconds. The controller has connected to MQTT and subscribed to one topic. Suppose it loses internet connectivity at the 10th second and regains it at the 15th second. If a message is published to that topic from the console, the next ping request/response fails; if no message is published to the topic between the 10th and 30th second, the ping request/response succeeds. How can I handle this so that the message is not lost?

thank you in advance

cobus-aws wrote on February 05, 2020:

Hi sriramchowdary1333,

From your question it seems like the real problem you are trying to solve is to rapidly detect if the connection is dead?

I do not think that MQTT ping requests or the MQTT keep-alive are going to be effective in achieving this goal, and we probably need to look more specifically at your problem to come up with a different solution.

To explain that statement: MQTT runs on top of TCP, and TCP connections have buffers and do their own keep-alives and connection management at a lower level than MQTT. On the TCP side, a connection can report that it is perfectly fine for more than 30 seconds, so if MQTT is running on top of TCP it is not really possible to detect that the connection is dead in less than 30 seconds.

There are of course other things you could do, but they all have pros and cons.

Additionally, if you set the MQTT keep-alive time to a very short period, a large amount of traffic will be generated, which will likely be associated with large costs.
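To put a rough number on the traffic, the sketch below estimates how many ping requests an otherwise idle connection would send per day. It assumes one PINGREQ per keep-alive interval, which matches the behaviour described in the original question; `pings_per_day` and `SECONDS_PER_DAY` are hypothetical names used only for this illustration, and the sketch says nothing about how any service bills for these packets.

```c
#include <stdint.h>

#define SECONDS_PER_DAY 86400u

/* Rough estimate of PINGREQ packets per day on an otherwise idle connection,
 * assuming one ping is sent per keep-alive interval. */
uint32_t pings_per_day(uint32_t keep_alive_seconds)
{
    if (keep_alive_seconds == 0u)
    {
        /* In MQTT 3.1.1 a keep-alive of 0 turns the mechanism off. */
        return 0u;
    }

    return SECONDS_PER_DAY / keep_alive_seconds;
}
```

With a 30-second keep-alive this works out to 2,880 ping requests per day per device; with a 1-second keep-alive it is 86,400, and each request also triggers a ping response from the broker.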

Another problem is that network latency can vary a lot. TCP has retries built in and packets have to be delivered in order, which can delay the arrival of your ping responses, which in turn will cause the connection to be considered dead even when it is perfectly fine. Re-establishing a TLS connection typically takes on the order of 10s, so restarting the connection can very likely be a worse option than just waiting a little bit longer.

I would suggest you take a look at this for more information on how TCP connections can cause you trouble here: https://blog.stephencleary.com/2009/05/detection-of-half-open-dropped.html

If you can provide more information about your specific use case we can try to help you brainstorm ways to achieve your goals, but I am not convinced that the MQTT keep-alive or MQTT ping requests are going to be your best solution.

sriramchowdary1333 wrote on February 06, 2020:

Hi,
Thanks for the information.
My application is simple: the controller connects and subscribes to a topic on the MQTT broker at device power-up, and afterwards the application responds to the commands published to that topic.

One thing I observed: when the device is publishing messages continuously (every 1 second) and the network is lost during that time, the AWS console immediately shows the device as disconnected with the reason "connection lost" (without waiting for the keep-alive timeout).
"clientInitiatedDisconnect": false
"disconnectReason": "CONNECTION_LOST"

As I mentioned in point 4 of my previous message, how can I handle that situation so that the message is not lost?

Kindly let me know if you need any more information.

cobus-aws wrote on February 06, 2020:

In MQTT this problem is solved by quality of service. If you subscribe to the topic with QoS 1 ("at least once delivery"), the publish will be delivered to your device when you reconnect later.

It sounds like this option in the MQTT standard best meets your requirement to not lose any messages in between connection failures.

Another aspect of MQTT that may be of use to you (depending on what you are doing) is to use the Last Will and Testament option. If you enable this you can tell the broker to publish a message to all other subscribers when a connection to your client is lost.
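One detail worth keeping in mind for the QoS 1 suggestion: under MQTT 3.1.1 a broker only holds QoS 1 messages for an offline client whose session is persistent (clean session set to false, and the same client identifier used on reconnect). In the FreeRTOS MQTT library this appears to correspond to the `cleanSession` field of the connect info structure, but check the headers for your release. The toy model below is not broker or SDK code; `toy_session_t`, `toy_broker_publish` and `toy_broker_reconnect` are hypothetical names that only illustrate which messages survive an outage.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAPACITY 8u /* arbitrary limit for the illustration */

typedef struct
{
    bool persistent_session; /* client connected with clean session = false */
    bool online;             /* client currently connected */
    size_t queued;           /* messages held while the client is offline */
    size_t delivered;        /* messages that reached the client */
} toy_session_t;

/* A publish arrives for a topic the client is subscribed to. 'qos' is the
 * effective QoS, i.e. the lower of the publish QoS and the subscription QoS. */
void toy_broker_publish(toy_session_t *s, int qos)
{
    if (s->online)
    {
        s->delivered++;
        return;
    }

    /* Offline: only QoS >= 1 on a persistent session is kept for later. */
    if ((qos >= 1) && s->persistent_session && (s->queued < TOY_QUEUE_CAPACITY))
    {
        s->queued++;
    }
    /* Otherwise the message is dropped. */
}

/* The client reconnects and the broker flushes whatever it stored. */
void toy_broker_reconnect(toy_session_t *s)
{
    s->online = true;
    s->delivered += s->queued;
    s->queued = 0;
}
```

In this model a QoS 1 message published during the outage is delivered after reconnect only when `persistent_session` is true; a QoS 0 message, or any message for a clean session, is lost. How long a real broker retains queued messages is service-specific.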

sriramchowdary1333 wrote on February 11, 2020:

We are facing an issue when we connect to the AWS endpoint. Below is the issue description:
"Failed to resolve au9b75i3zpxa8-ats.iot.us-east-1.amazonaws.com"

qiutongs wrote on February 11, 2020:

That indicates a DNS resolution failure for the endpoint. Does it happen just once, intermittently, or always?

sriramchowdary1333 wrote on February 18, 2020:

Thanks for your responses.

As we asked in our previous questions, we are currently stuck with two problems:

  1. Wi-Fi reconnection when a hotspot (AP) is disconnected at run time.
  2. AWS MQTT reconnection when the network is slow or completely disconnected.

To solve the above two problems we need some inputs:
1. How do we disconnect the MQTT connection at run time when required, and reconnect to AWS as required?
2. What APIs are available in the AWS SDK to solve the problem? Is there a specific callback the application should register that will be invoked when there is a connection loss?

3. How does the application know when the hotspot is turned off?

4. Since we integrated the TI CC3135 SDK with the AWS SDK, is there any function we need to implement so that the AWS SDK knows when the Wi-Fi is disconnected?

5. When the Wi-Fi is turned off, if I try to call "IotMqtt_Disconnect()" it gets stuck in _IotMqtt_DecrementConnectionReferences(). How can we solve this issue?

aws-archit wrote on February 18, 2020:

Here are responses to your questions:

  1. & 2. IotMqtt_Disconnect can be called to disconnect an MQTT connection at run time; similarly, IotMqtt_Connect can be called to re-establish the connection at run time.
    The MQTT stack supports a disconnect callback, which is invoked when the connection is disconnected (refer here: https://github.com/aws/amazon-freertos/blob/201912.00/libraries/c_sdk/standard/mqtt/include/types/iot_mqtt_types.h#L986). You can implement reconnection logic in the disconnect callback and provide the callback as a parameter in the call to IotMqtt_Connect. However, as @cobus-aws pointed out, MQTT has no way of knowing about a network connection that is lost and regained within a single keep-alive interval. As suggested, QoS 1 is the way to ensure that messages are received and delivered between broker and client.

  3. & 4. For detecting changes in WiFi network state, the WiFi API supports registering a callback (refer here: https://github.com/aws/amazon-freertos/blob/201912.00/libraries/abstractions/wifi/include/iot_wifi.h#L535). This callback can be used to detect WiFi network disconnects. However, this API is currently only supported on the ESP32 board. For use with the TI chipset, you would have to port the WiFi APIs, which can be implemented with calls to the TI CC3135 SDK.

  5. Can you provide more information on where you are getting stuck within _IotMqtt_DecrementConnectionReferences? When the WiFi connection has been lost or turned off, calling IotMqtt_Disconnect will make a futile attempt to send an MQTT DISCONNECT packet out on the network, and depending on the network stack being used, the "send()" (refer here: https://github.com/aws/amazon-freertos/blob/201912.00/libraries/c_sdk/standard/mqtt/src/iot_mqtt_operation.c#L879) could be a blocking call. There is a way to make IotMqtt_Disconnect disconnect without sending a packet out on the network: pass the IOT_MQTT_FLAG_CLEANUP_ONLY flag to the function (refer here: https://github.com/aws/amazon-freertos/blob/master/libraries/c_sdk/standard/mqtt/include/types/iot_mqtt_types.h#L1082-L1089).
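The pieces above could be combined roughly as follows. This is a minimal, library-independent sketch rather than SDK code: the callbacks only record state, the application task decides which flags to pass when disconnecting, and reconnect attempts are spaced out with a capped exponential backoff. All names here (`on_mqtt_disconnect`, `on_wifi_event`, `choose_disconnect_flags`, `next_backoff_ms`, and the `SKETCH_` constants) are hypothetical; `SKETCH_DISCONNECT_CLEANUP_ONLY` merely stands in for IOT_MQTT_FLAG_CLEANUP_ONLY, and in a real port the two handlers would be called from the MQTT disconnect callback and from wherever the Wi-Fi driver reports link changes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the flags argument of the disconnect call (hypothetical values). */
#define SKETCH_DISCONNECT_NORMAL       0u
#define SKETCH_DISCONNECT_CLEANUP_ONLY 1u /* stands in for IOT_MQTT_FLAG_CLEANUP_ONLY */

/* State written from callbacks and read by the application task. */
static volatile bool mqtt_connected = true;
static volatile bool wifi_up = true;

/* To be called from the MQTT disconnect callback: just record the event. */
void on_mqtt_disconnect(void)
{
    mqtt_connected = false;
}

/* To be called from the Wi-Fi driver's link-state notification. */
void on_wifi_event(bool link_is_up)
{
    wifi_up = link_is_up;
}

/* With the link down, skip the DISCONNECT packet so the call cannot block
 * in send(); with the link up, disconnect normally. */
uint32_t choose_disconnect_flags(bool link_is_up)
{
    return link_is_up ? SKETCH_DISCONNECT_NORMAL : SKETCH_DISCONNECT_CLEANUP_ONLY;
}

/* Double the delay between reconnect attempts, never exceeding max_ms. */
uint32_t next_backoff_ms(uint32_t current_ms, uint32_t max_ms)
{
    if (current_ms >= max_ms / 2u)
    {
        return max_ms; /* also avoids overflow when doubling */
    }

    return current_ms * 2u;
}
```

An application task would then poll `mqtt_connected` and `wifi_up`, call the disconnect function with `choose_disconnect_flags(wifi_up)` when a teardown is needed, wait for the link to return, and retry the connect with delays of, say, 1 s, 2 s, 4 s and so on up to the chosen maximum. Keeping the callbacks this small is a design choice in this sketch, so that no blocking work happens in the callback's context.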