mbedTLS connection dropped on MQTT AWS OTA startup

I’m running an STM32H7 with BG96 for the cellular connection. The device connects/reconnects properly using mbedtls to the AWS MQTT broker. It publishes payloads to topics and it receives notifications of topics it subscribed to.
Only when I start the ota-for-aws-iot-embedded-sdk do I run into a problem. It subscribes to $aws/things/<thingId>/jobs/notify-next without issue, but after it publishes the clientToken to $aws/things/<thingId>/jobs/$next/get the TLS connection is dropped with the following message:

[PkcsTlsTransport] [TLS_FreeRTOS_recv:957] Failed to read data: mbedTLSError= SSL - The peer notified us that the connection is going to be closed : <No-Low-Level-Code>.

Hello @xanderhendriks,

Is that the exact topic the client subscribes or publishes to?
It seems that the thing name is missing from your topics. The topics should look similar to the following:

If so, then did you pass in the correct thing name to the call to OTA_Init?

@kanherea thanks for your response. I’m sorry, you are right that is not the exact topic it’s subscribing to. I had included the ThingId between <> like you did, but I failed to see this text got removed. It was one of those long days yesterday. I have fixed up my question.

Any other thoughts on what to try? I ran the Visual Studio demo for the OTA on my laptop and it worked without any issues, so the process of starting an OTA update and the cloud-side configuration both seem to be fine.

Can you enable mbedTLS debug logs? Another thing you can try is to enable AWS IoT logs.
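For reference, mbedTLS debug output is usually routed through a callback registered on the SSL configuration. The following is only a minimal sketch: it assumes MBEDTLS_DEBUG_C is enabled in the mbedTLS configuration and that stdout is retargeted to a UART or similar on the target. The helper name format_tls_debug and the log format are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats one debug record as
 * "[mbedTLS <level>] <file>:<line>: <message>" and drops the directory part
 * of the file name so log lines stay short. Returns the snprintf result. */
static int format_tls_debug( char * out, size_t outSize, int level,
                             const char * file, int line, const char * msg )
{
    const char * slash;

    assert( ( out != NULL ) && ( file != NULL ) && ( msg != NULL ) );

    slash = strrchr( file, '/' );

    return snprintf( out, outSize, "[mbedTLS %d] %s:%04d: %s",
                     level, ( slash != NULL ) ? ( slash + 1 ) : file, line, msg );
}

/* Callback with the signature that mbedtls_ssl_conf_dbg() expects.
 * ctx is unused in this sketch. */
static void tls_debug_cb( void * ctx, int level,
                          const char * file, int line, const char * msg )
{
    char buf[ 256 ];

    ( void ) ctx;
    ( void ) format_tls_debug( buf, sizeof( buf ), level, file, line, msg );
    fputs( buf, stdout );
}

/* Registration, done once while setting up the TLS configuration
 * (sslConfig is an assumed mbedtls_ssl_config instance):
 *
 *   mbedtls_ssl_conf_dbg( &sslConfig, tls_debug_cb, NULL );
 *   mbedtls_debug_set_threshold( 3 );   // 0 = off ... 4 = verbose
 */
```

A threshold of 3 or 4 produces a lot of output, so it may be worth raising it only around the failing publish.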

Thanks for your reply @aggarg. I had a look at the AWS IoT logs and found that the server receives the publish on $aws/things/<deviceId>/jobs/$next/get, but it then reports an error and disconnects with CLIENT_ERROR, as shown in the screenshot. Is there a way to find out why that error is happening?

Both in the case of the working MSVC demo and the STM32 code the publish is like this:
qos: 1
topic: $aws/things/<thingId>/jobs/$next/get
topicLength: 63
payload: {"clientToken":":<thingId>"}
payloadLength: 55

The calls to MQTT_ProcessLoop result in MQTTSuccess, so the client thinks everything is fine.

Are you using the same creds in both the demos? Can you try dumping topic and payload in both the demos to figure out if there is anything different?

The topic and payload are both as per my previous message and both devices indeed use the same credentials:

qos: 1
topic: $aws/things/<thingId>/jobs/$next/get
topicLength: 63
payload: {"clientToken":":<thingId>"}
payloadLength: 55

I did find it a bit odd that the client token in the payload starts with :. When I looked into this I saw there is already a fix for that, but it hasn’t made it into the FreeRTOS-LTS yet. It doesn’t seem to affect the functionality though, as the cloud is happy to accept this from the MSVC code.
I also have a branch with older code, based on some ST/AWS training code, which works as well. I’m trying to find the difference to see if I can also make this work with the FreeRTOS-LTS libraries.
Is there a way for you to see which checks the service performs when a device publishes to $aws/things/<thingId>/jobs/$next/get? Knowing those checks would be a great help, so I can make sure my code does the right thing.
Another interesting finding is that the code can publish to our application topics and also to the AWS shadow topics.

I am not aware of any such service, if one even exists.

I know that you are using the same credentials, but can you please verify that the thing name is not longer than 53 characters? It is limited by this line of code: ota-for-aws-iot-embedded-sdk/source/ota_mqtt.c.

Also, can you add logging here to verify that the topic string and the payload being sent over MQTT are exactly as they should be?
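The lengths reported earlier in the thread can be used as a quick cross-check. The topic has 12 fixed characters before the thing name and 15 after it, so a topicLength of 63 implies a 36-character thing name, and the payload {"clientToken":":<thingId>"} then comes to 16 + 1 + 36 + 2 = 55 bytes, which matches the reported payloadLength. A minimal sketch of such a check follows; the function names and the MAX_THING_NAME_LEN macro are made up for illustration, and the value 53 is simply the limit quoted above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Limit quoted in this thread; the macro name is hypothetical. */
#define MAX_THING_NAME_LEN    53U

/* Writes "$aws/things/<thingName>/jobs/$next/get" into buf.
 * Returns the topic length, or 0 if the name is empty, too long,
 * or the result does not fit in buf. */
static size_t build_get_next_topic( char * buf, size_t bufSize,
                                    const char * thingName )
{
    size_t nameLen;
    int written;

    assert( ( buf != NULL ) && ( thingName != NULL ) );

    nameLen = strlen( thingName );

    if( ( nameLen == 0U ) || ( nameLen > MAX_THING_NAME_LEN ) )
    {
        return 0U;
    }

    written = snprintf( buf, bufSize, "$aws/things/%s/jobs/$next/get", thingName );

    return ( ( written > 0 ) && ( ( size_t ) written < bufSize ) ) ? ( size_t ) written : 0U;
}

/* Writes {"clientToken":":<thingName>"} into buf, in the form logged above.
 * Returns the payload length, or 0 if it does not fit in buf. */
static size_t build_get_next_payload( char * buf, size_t bufSize,
                                      const char * thingName )
{
    int written;

    assert( ( buf != NULL ) && ( thingName != NULL ) );

    written = snprintf( buf, bufSize, "{\"clientToken\":\":%s\"}", thingName );

    return ( ( written > 0 ) && ( ( size_t ) written < bufSize ) ) ? ( size_t ) written : 0U;
}
```

Logging the two returned lengths next to the values handed to the MQTT publish call would show immediately whether the strings were truncated or padded on the way.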

I found the problem. It wasn’t the credentials, thing name, topic or payload. That leaves only two other things that are configured for a publish: the retain and dup bits. The OTA code had its own implementation of the publish function, because its API didn’t match the function we use in the rest of the code. In the OTA version the two bits were not cleared, while in the other version they were. As a result, all publishes worked as expected except the OTA ones.
Refactoring this now to make sure this never happens again.
A big thanks to everyone who helped!
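For anyone hitting the same thing, here is a rough sketch of the kind of guard that avoids it: zero the whole publish structure before filling it in, so retain and dup can never carry stale stack contents. PublishInfo_t below is a hypothetical stand-in that mirrors the fields of coreMQTT's MQTTPublishInfo_t, and init_publish and publish_header_byte are illustrative names, not part of any library. The header byte layout (packet type 3 in the high nibble, then DUP, QoS and RETAIN) is taken from MQTT 3.1.1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in mirroring the fields of coreMQTT's MQTTPublishInfo_t. */
typedef struct PublishInfo
{
    uint8_t qos;
    bool retain;
    bool dup;
    const char * pTopicName;
    uint16_t topicNameLength;
    const void * pPayload;
    size_t payloadLength;
} PublishInfo_t;

/* Fills in a publish request. The memset clears every field first, so
 * retain and dup are false unless the caller sets them afterwards. */
static void init_publish( PublishInfo_t * pInfo, const char * topic,
                          const void * payload, size_t payloadLength, uint8_t qos )
{
    assert( ( pInfo != NULL ) && ( topic != NULL ) );

    memset( pInfo, 0, sizeof( *pInfo ) );
    pInfo->qos = qos;
    pInfo->pTopicName = topic;
    pInfo->topicNameLength = ( uint16_t ) strlen( topic );
    pInfo->pPayload = payload;
    pInfo->payloadLength = payloadLength;
}

/* First byte of an MQTT 3.1.1 PUBLISH fixed header:
 * bits 7-4 = packet type (3), bit 3 = DUP, bits 2-1 = QoS, bit 0 = RETAIN. */
static uint8_t publish_header_byte( const PublishInfo_t * pInfo )
{
    uint8_t header = 0x30U;

    assert( pInfo != NULL );

    if( pInfo->dup )
    {
        header |= 0x08U;
    }

    header |= ( uint8_t ) ( ( pInfo->qos & 0x03U ) << 1 );

    if( pInfo->retain )
    {
        header |= 0x01U;
    }

    return header;
}
```

With QoS 1 and both bits clear the first header byte is 0x32; with both bits set by accident it becomes 0x3B. Logging that byte for the working and failing publish paths would have shown the difference even though topic and payload were identical.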


Thank you for reporting back!