We are in the process of integrating the coreMQTT library along with the MQTT Agent.
Unfortunately, we are getting an intermittent crash caused by an invalid pointer access while the subscription callback is being serviced.
In our application, we always subscribe to one topic at a time and call
MQTTAgent_Subscribe() with the following arguments:
MQTTSubscribeInfo_t xSubscribeInfo = { 0 };
MQTTAgentSubscribeArgs_t xSubscribeArgs = { 0 };
CommandInfo_t xCommandParams = { 0 };
CommandContext_t xApplicationDefinedContext = { 0 };
MQTTStatus_t xCommandAdded;
uint32_t ulNotificationValue;
xSubscribeInfo.qos = qos;
xSubscribeInfo.pTopicFilter = topic;
xSubscribeInfo.topicFilterLength = ( uint16_t ) strlen( topic );
xSubscribeArgs.pSubscribeInfo = &xSubscribeInfo;
xSubscribeArgs.numSubscriptions = 1;
/* Complete an application defined context associated with this subscribe message.
* This gets updated in the callback function so the variable must persist until
* the callback executes. */
xApplicationDefinedContext.xTaskToNotify = xTaskGetCurrentTaskHandle();
xApplicationDefinedContext.pArgs = ( void * ) &xSubscribeArgs;
xCommandParams.blockTimeMs = MAX_COMMAND_SEND_BLOCK_TIME_MS;
xCommandParams.cmdCompleteCallback = prvSubscribeCommandCallback;
xCommandParams.pCmdCompleteCallbackContext = ( void * ) &xApplicationDefinedContext;
xCommandAdded = MQTTAgent_Subscribe( &xGlobalMqttAgentContext,
                                     &xSubscribeArgs,
                                     &xCommandParams );
if( xCommandAdded != MQTTSuccess )
{
    ERR_LOG( "error(%d) while subscribing thru MQTT agent\n", xCommandAdded );
    return xCommandAdded;
}
/* Wait for acks from subscribe messages - this is optional. If the
* returned value is zero then the wait timed out. */
xTaskNotifyWait( 0, mqtt_SUB_COMMAND_ACK_BIT, &ulNotificationValue, MS_TO_WAIT_FOR_NOTIFICATION );
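One thing worth ruling out here: xTaskNotifyWait() returns pdTRUE for any notification sent to the task, not only for the one carrying the ack bit. A wake-up from another source would therefore let the caller return while the command is still pending. Below is a minimal sketch of how the result could be classified. The names classify_wait and WaitOutcome_t are hypothetical, and BaseType_t, pdTRUE and pdFALSE are redefined as stand-ins only so the snippet compiles outside FreeRTOS.

```c
#include <stdint.h>

/* Stand-ins for the FreeRTOS definitions so this sketch is self-contained.
 * In the real application these come from FreeRTOS.h. */
typedef long BaseType_t;
#define pdFALSE    ( ( BaseType_t ) 0 )
#define pdTRUE     ( ( BaseType_t ) 1 )

/* Hypothetical result type, not part of FreeRTOS or coreMQTT-Agent. */
typedef enum
{
    WAIT_ACKED,            /* notified and the ack bit is set */
    WAIT_TIMED_OUT,        /* xTaskNotifyWait() returned pdFALSE */
    WAIT_WOKEN_WITHOUT_ACK /* notified, but by something other than the ack */
} WaitOutcome_t;

/* Classify the outcome of xTaskNotifyWait() from its return value and the
 * notification value it wrote back. */
WaitOutcome_t classify_wait( BaseType_t xWaitResult,
                             uint32_t ulNotifiedValue,
                             uint32_t ulAckBit )
{
    if( xWaitResult == pdFALSE )
    {
        return WAIT_TIMED_OUT;
    }

    if( ( ulNotifiedValue & ulAckBit ) != 0U )
    {
        return WAIT_ACKED;
    }

    return WAIT_WOKEN_WITHOUT_ACK;
}
```

In the code above this would be called as classify_wait( xResult, ulNotificationValue, mqtt_SUB_COMMAND_ACK_BIT ), where xResult is the (currently ignored) return value of xTaskNotifyWait(). Logging the two non-acked outcomes separately should show whether the caller ever leaves the function before the callback has run.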
When the crash occurs from the invalid pointer access, it is
xSubscribeArgs.pSubscribeInfo that appears to have been corrupted. xSubscribeArgs.numSubscriptions also holds some huge number.
I understand that the memory for xSubscribeInfo in xSubscribeArgs.pSubscribeInfo = &xSubscribeInfo;
comes from the task stack and will therefore no longer be valid if the task (which calls MQTTAgent_Subscribe() and then blocks until the subscription callback is invoked) times out of xTaskNotifyWait() and returns.
I double-checked that the timeout is NOT happening, but sometimes the subscription callback is invoked with an invalid MQTTAgentSubscribeArgs_t *
in our callback, which is implemented as below:
static void prvSubscribeCommandCallback( void * pxCommandContext,
                                         MQTTAgentReturnInfo_t * pxReturnInfo )
{
    bool xSubscriptionAdded = false;
    CommandContext_t * pxApplicationDefinedContext = ( CommandContext_t * ) pxCommandContext;
    MQTTAgentSubscribeArgs_t * pxSubscribeArgs = ( MQTTAgentSubscribeArgs_t * ) ( pxApplicationDefinedContext->pArgs );
I was wondering whether the pPendingAcks list in MQTTAgentContext_t is somehow not being managed properly. The size of the pending ack list is set to 200 in our application, and the total number of subscription topics in our app is far fewer than that: only around 50.
Does anyone know what we might be missing, or is there perhaps a known bug in the MQTT Agent?
Another question I have is whether blocking with a timeout after calling MQTTAgent_Subscribe() is a reasonable way to synchronize the caller with the MQTT agent task.
What I am not sure about is what the timeout should be. The issue I see is that if the timeout occurs and the subscription callback is invoked afterwards, pxSubscribeArgs will no longer be valid because it points into the caller’s stack. What timeout value can guarantee that this won’t happen? I am not sure that is even possible, since it depends on the broker implementation and there can be badly behaved brokers out there too.
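To make the second question concrete, here is a minimal sketch of one alternative that would not depend on the timeout value: the request lives on the heap and is shared by the caller and the callback, and whichever of the two finishes last frees it. Everything here is hypothetical (SubscribeRequest_t, request_create, request_release, on_subscribe_complete); it is not coreMQTT-Agent API. It assumes C11 atomics are available, and calloc()/free() stand in for pvPortMalloc()/vPortFree(). In the real code the MQTTSubscribeInfo_t and MQTTAgentSubscribeArgs_t would presumably have to live inside the same heap object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical heap-owned request shared by the caller and the callback. */
typedef struct SubscribeRequest
{
    atomic_int refs;     /* one reference for the caller, one for the callback */
    atomic_bool acked;   /* set by the callback when the command completes */
    uint16_t topicLength;
    char topic[ 64 ];    /* private copy, so the caller's buffer may go away */
} SubscribeRequest_t;

/* Demo-only counter of requests that have not been freed yet. */
static atomic_int liveRequests;

SubscribeRequest_t * request_create( const char * topic )
{
    size_t len = strlen( topic );
    SubscribeRequest_t * r;

    if( len >= sizeof( r->topic ) )
    {
        return NULL; /* topic does not fit in the private copy */
    }

    r = calloc( 1, sizeof( *r ) );

    if( r != NULL )
    {
        atomic_init( &r->refs, 2 );
        atomic_init( &r->acked, false );
        memcpy( r->topic, topic, len + 1 );
        r->topicLength = ( uint16_t ) len;
        atomic_fetch_add( &liveRequests, 1 );
    }

    return r;
}

/* Drop one reference; the last owner to let go frees the memory. */
void request_release( SubscribeRequest_t * r )
{
    assert( r != NULL );

    if( atomic_fetch_sub( &r->refs, 1 ) == 1 )
    {
        atomic_fetch_sub( &liveRequests, 1 );
        free( r );
    }
}

/* Stand-in for the command-complete callback run by the agent task. */
void on_subscribe_complete( void * pxCommandContext )
{
    SubscribeRequest_t * r = pxCommandContext;

    atomic_store( &r->acked, true );
    request_release( r ); /* the callback is done with the request */
}
```

With this layout the caller would call request_release() once, either after a successful wait or after a timeout, and a late callback would still find valid memory. The cost is a heap allocation per subscribe, and the sketch assumes the callback is eventually invoked exactly once for every command that was accepted.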
Any insight on the first and the second question would be greatly appreciated.