CONNECTION_LOST after device receives "delta" MQTT message

Hello,
We have a product based on the STM32H563ZI that connects to AWS IoT Core, built on the project example in ST X-CUBE-AWS-H5 V1.0.
The network connection is wired Ethernet (not WiFi).
The devices are installed at several locations, and at one location we have started seeing the following behavior:

  • The device connects successfully after reset.
  • Outgoing MQTT publications (Shadow reported updates) work as expected and are received by AWS IoT Core successfully.
  • MQTT keep-alive messages seem to be received by IoT Core: when I subscribe to $aws/events/presence/+/ I do not get any disconnection reports, and there are no error messages on the device terminal.
  • Sometimes, but not always, when I change the “Desired” state of the shadow on the cloud side, which causes IoT Core to publish a “delta” MQTT message to the device, I immediately see a disconnect event with disconnectReason “CONNECTION_LOST” in the MQTT test client on the IoT Core console:

{
"clientId": "ThingName",
"timestamp": 1730357192653,
"eventType": "disconnected",
"clientInitiatedDisconnect": false,
"sessionIdentifier": "XYZ",
"principalIdentifier": "ABC",
"disconnectReason": "CONNECTION_LOST",
"versionNumber": 10
}

  • The device immediately tries to reconnect the MQTT session and succeeds, but the delta message that triggered the disconnect is never handled, so the desired operation is not executed.
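
For reference, the disconnectReason field in such a lifecycle event separates several cases that look alike from the device side. The sketch below runs on the host, not on the device, and summarizes an event like the one above. The function name and the hint texts are illustrative (the hints paraphrase the AWS IoT lifecycle-events documentation), and the sample values are the placeholders from this post.

```python
import json

# Hints paraphrased from the AWS IoT lifecycle-events documentation.
# The table is deliberately partial; other reasons fall back to a generic hint.
DISCONNECT_HINTS = {
    "CONNECTION_LOST": "client-server connection was cut off (latency or network loss)",
    "MQTT_KEEP_ALIVE_TIMEOUT": "no client traffic for 1.5x the keep-alive interval",
    "DUPLICATE_CLIENTID": "another connection used the same client ID",
    "CLIENT_INITIATED_DISCONNECT": "the client announced the disconnect itself",
}


def summarize_disconnect(payload: str) -> str:
    """Return a one-line summary of a lifecycle 'disconnected' event (hypothetical helper)."""
    event = json.loads(payload)
    if event.get("eventType") != "disconnected":
        return "not a disconnect event"
    reason = event.get("disconnectReason", "UNKNOWN")
    hint = DISCONNECT_HINTS.get(reason, "see the lifecycle-events documentation")
    return f"{event.get('clientId')}: {reason} ({hint})"


# Placeholder values copied from the event shown in this post.
SAMPLE_EVENT = """
{
  "clientId": "ThingName",
  "timestamp": 1730357192653,
  "eventType": "disconnected",
  "clientInitiatedDisconnect": false,
  "sessionIdentifier": "XYZ",
  "principalIdentifier": "ABC",
  "disconnectReason": "CONNECTION_LOST",
  "versionNumber": 10
}
"""

print(summarize_disconnect(SAMPLE_EVENT))
```

Read this way, CONNECTION_LOST is a different case from a keep-alive timeout or a duplicate client ID, each of which has its own reason code.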

The above behavior causes the system to malfunction.

Since this does not happen at every location, and not all the time, I’m puzzled and need help.

Please advise.

Thanks,
Eyal

Hello @EyalG,

Based on the AWS IoT documentation, the CONNECTION_LOST event suggests a possible network issue.

To help diagnose the problem, could you please provide some additional information? I currently suspect that the connection had already been lost before this issue appeared.

  1. How many devices are in the affected location? Could high device density be impacting network performance?
  2. Is it possible to check the local network environment, including firewall settings and overall performance?
  3. Are keep-alive messages consistently sent and received under normal conditions?

Thank you.

Hi and thank you for your reply.

  1. I have two devices in this location (the development center).
  2. The local network seems to work as usual, with no firewall issues. In any case, the device is basically working: outgoing messages are transmitted to IoT Core and update the shadow as needed.
  3. I set the KA to 30 seconds and it looks like it is being sent. How can I verify on the console side that this is indeed the case?
    Thanks!

Thank you for the information!

To better understand the issue, could you please capture network packets using Wireshark or a similar tool? This would help us verify if KA packets are being sent correctly. Would you be able to check both device logs and network captures?
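
Even though the payload is TLS-encrypted, the packet timing in a capture is still visible and is enough to check the keep-alive. AWS IoT documents that it disconnects a client after 1.5 times the keep-alive interval without client traffic, so with KA = 30 s any silence longer than 45 s from the device is suspect. A minimal sketch, assuming the timestamps (in seconds) of the packets the device sent to the broker have been exported from the capture. The function name and input format are hypothetical, not part of any tool.

```python
def keepalive_gaps(timestamps, keep_alive_s=30.0, factor=1.5):
    """Return (gap_start, gap_length) pairs where the device stayed silent
    for longer than factor * keep_alive_s seconds."""
    limit = keep_alive_s * factor
    ordered = sorted(timestamps)
    return [
        (start, end - start)
        for start, end in zip(ordered, ordered[1:])
        if end - start > limit
    ]


# Hypothetical capture: one 50 s silence between t=60 and t=110.
print(keepalive_gaps([0, 30, 60, 110, 140]))
```

An empty result means the device never stayed silent longer than the broker would tolerate, which would point away from the keep-alive as the cause.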

Thank you,

Does it mean that the device does not correctly sync its state with the cloud, as described in "Using shadows in devices - AWS IoT Core"? Are you using a persistent session?

Hello,
After I disconnected one of my two connected devices, the one that stayed connected has been working as expected for a long time, without the disconnect messages I saw before.
When both devices were connected and the disconnection problem occurred, I tried to monitor the network with Wireshark, but since the connection uses TLS I can’t see the content of the messages.
Any idea how I can monitor it?
Also, what could cause the disconnections when there are two devices on the local network?
Thanks,
Eyal

Are you using different client IDs for the two devices?

You mean ThingName? Yes, of course.

What I see is that after the connection breaks with the message:

Call to receiveSingleIteration failed. Status=MQTTRecvFailed

the device tries to reconnect and succeeds, and I get this message:

Resuming persistent MQTT Session

But if the shadow already contains a delta from the previous session, it is not handled.
As I see it, this is a secondary problem; the main problem is why I get those disconnections in the first place.

Please also note that I have other devices in other locations (that is, on different network connections) that work fine, and all those devices run the same SW and were provisioned in the same way (based on the ST STM32CubeExpansion_Cloud_AWS_H5_V1.0.0 package).

Can you check the AWS IoT CloudWatch logs for the disconnect reason? Also, can you share the network traffic that you captured?

Did you check the links that I shared before for syncing state?
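
The approach described there, as I understand it, is for the device to request the current shadow on every (re)connect instead of relying only on the delta message: it publishes an empty message to the shadow get topic and reads any pending delta from the get/accepted response. The logic is sketched below in Python for brevity; on the device the same steps would go through the MQTT agent in the ST package. The function names and the "led" property are hypothetical, and the topics are those of the classic (unnamed) shadow.

```python
import json


def shadow_topics(thing_name: str) -> dict:
    """Classic (unnamed) shadow topics used for resyncing after a reconnect."""
    base = f"$aws/things/{thing_name}/shadow"
    return {
        "get": f"{base}/get",                    # publish an empty message here
        "get_accepted": f"{base}/get/accepted",  # full shadow document arrives here
        "get_rejected": f"{base}/get/rejected",
        "update": f"{base}/update",              # report the new state here
        "update_delta": f"{base}/update/delta",  # live delta notifications
    }


def pending_delta(get_accepted_payload: str) -> dict:
    """Extract the still-unapplied desired changes from a get/accepted document.
    Returns an empty dict when desired and reported already agree."""
    doc = json.loads(get_accepted_payload)
    return doc.get("state", {}).get("delta", {})


# Hypothetical shadow document with one unapplied change.
response = (
    '{"state": {"desired": {"led": "on"}, "reported": {"led": "off"},'
    ' "delta": {"led": "on"}}, "version": 11}'
)
print(shadow_topics("ThingName")["get"])
print(pending_delta(response))
```

With this in place, a delta that was lost together with the connection would be picked up again right after "Resuming persistent MQTT Session", independently of why the disconnect happened.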