IotMqtt_Assert( contextIndex != -1 ) inside AwsIotShadow_TimedUpdate()

Hi Support, we are observing an assert inside a call to AwsIotShadow_TimedUpdate() during some of our long-term tests.

I believe this is actually occurring during the following sequence.

Task 1)
Calls AwsIotShadow_TimedUpdate(), which blocks (with a timeout) waiting for a response from the server

Task 2)
During this time, another task calls IotMqtt_Disconnect(g_aws_iot_mqtt_connection, flags)

Back to Task 1 again)
Assert occurs inside
AwsIotShadow_TimedUpdate()
  AwsIotShadow_Wait
    _AwsIotShadow_DecrementReferences
      AwsIotShadow_Assert( IotMqtt_IsSubscribed( pOperation->mqttConnection,
                                                 pTopicBuffer,
                                                 topicFilterLength,
                                                 NULL ) == true )
        IotMqtt_Assert( contextIndex != -1 )

Does this make sense?
Thanks!

We are running shadow - FreeRTOS Shadow V2.2.3
And mqtt - FreeRTOS MQTT V2.3.1

Hi @schoeler, thank you for sharing the call stack of the assert failure.

At first glance, this looks like a race condition between task 1 and task 2 on the same MQTT connection object: task 2 clears the connection, which invalidates task 1’s reference to that object and leads to the assert failure.

Will you be able to confirm with debugging that task 2 is causing the _IotMqtt_removeContext function (that clears the MQTT connection object) to be called?

As mentioned in the API documentation of IotMqtt_Disconnect, the function does not support use of the mqttConnection parameter after it has been called. Therefore, it does not support thread-safe use of this function while another task is already accessing the mqttConnection.

Will you be able to confirm with debugging that task 2 is causing the _IotMqtt_removeContext function (that clears the MQTT connection object) to be called?

Yes, I can confirm!

As mentioned in the API documentation of IotMqtt_Disconnect, the function does not support use of the mqttConnection parameter after it has been called. Therefore, it does not support thread-safe use of this function while another task is already accessing the mqttConnection.

Interesting. If it’s not task-safe, then I think this comment should be clearer:
* Once this function is called, its parameter mqttConnection should no longer be used.
I say this mostly because we are actually using the mqttConnection from task 1 before IotMqtt_Disconnect is called in task 2.

Follow-up question from above: is it OK to allow two IotMqtt_TimedPublish() calls from two tasks at the same time? For example, one task calls it via the shadow library, and another calls IotMqtt_TimedPublish() directly.

Interesting. If it’s not task-safe, then I think this comment should be clearer:
* Once this function is called, its parameter mqttConnection should no longer be used.
I say this mostly because we are actually using the mqttConnection from task 1 before IotMqtt_Disconnect is called in task 2.

Thank you for the feedback, we will improve the documentation to make it clearer. :slightly_smiling_face:

Follow-up question from above: is it OK to allow two IotMqtt_TimedPublish() calls from two tasks at the same time? For example, one task calls it via the shadow library, and another calls IotMqtt_TimedPublish() directly.

Yes, it is safe to call the IotMqtt_TimedPublish API from multiple tasks with the same mqttConnection parameter, because the library creates an internal operation object and increments the reference count of mqttConnection for each Publish/Subscribe/Unsubscribe API call.

The only call that is not task-safe is the IotMqtt_Disconnect API on a connection object that is in use by another task: the API attempts to destroy the connection object, which causes the other task to hit the assert failure (or undefined behavior).
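
One way an application could avoid this race is to track, on its own side, which tasks are currently inside an MQTT or Shadow call and to delay the disconnect until none are. The following is only a minimal sketch of that idea and is not part of the MQTT or Shadow library: ConnectionGuard_t, Guard_Init, Guard_Enter, Guard_Exit and Guard_CloseWhenIdle are hypothetical names, and POSIX threads are used so that the sketch compiles on a host machine. On FreeRTOS the same pattern would presumably be built from a mutex plus a semaphore or event group.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical application-side guard around one MQTT connection handle.
 * Nothing here is provided by the FreeRTOS MQTT or Shadow libraries. */
typedef struct ConnectionGuard
{
    pthread_mutex_t lock;
    pthread_cond_t idle; /* Signalled when activeUsers drops to zero. */
    int activeUsers;     /* Tasks currently inside an MQTT/Shadow call. */
    bool closing;        /* Set once a disconnect has been requested. */
} ConnectionGuard_t;

void Guard_Init( ConnectionGuard_t * pGuard )
{
    pthread_mutex_init( &pGuard->lock, NULL );
    pthread_cond_init( &pGuard->idle, NULL );
    pGuard->activeUsers = 0;
    pGuard->closing = false;
}

/* Call before using the connection (e.g. before AwsIotShadow_TimedUpdate).
 * Returns false if a disconnect is pending, in which case the caller must
 * not touch the connection. */
bool Guard_Enter( ConnectionGuard_t * pGuard )
{
    bool allowed;

    pthread_mutex_lock( &pGuard->lock );
    allowed = !pGuard->closing;

    if( allowed )
    {
        pGuard->activeUsers++;
    }

    pthread_mutex_unlock( &pGuard->lock );

    return allowed;
}

/* Call after the MQTT/Shadow call has returned. */
void Guard_Exit( ConnectionGuard_t * pGuard )
{
    pthread_mutex_lock( &pGuard->lock );
    assert( pGuard->activeUsers > 0 );
    pGuard->activeUsers--;

    if( pGuard->activeUsers == 0 )
    {
        /* Wake any task waiting in Guard_CloseWhenIdle. */
        pthread_cond_broadcast( &pGuard->idle );
    }

    pthread_mutex_unlock( &pGuard->lock );
}

/* Call before IotMqtt_Disconnect: refuses new users, then blocks until
 * every in-flight call has finished. */
void Guard_CloseWhenIdle( ConnectionGuard_t * pGuard )
{
    pthread_mutex_lock( &pGuard->lock );
    pGuard->closing = true;

    while( pGuard->activeUsers > 0 )
    {
        pthread_cond_wait( &pGuard->idle, &pGuard->lock );
    }

    pthread_mutex_unlock( &pGuard->lock );
}
```

With this sketch, task 1 would wrap its AwsIotShadow_TimedUpdate() call between Guard_Enter() and Guard_Exit(), and task 2 would call Guard_CloseWhenIdle() immediately before IotMqtt_Disconnect(). The trade-off is that the disconnect can be delayed by up to the timeout of whichever call is still pending.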