MQTT Agent Error : No command structure available

Hi,

I’m running OTA+CoreMQTT on top of the FreeRTOS cellular interface. During testing, roughly every other day, the cellular interface fails to send a single packet. When this happens the MQTT Agent disconnects and the modem is RFoff/RFon cycled. Then my code waits for a network re-registration and then reconnects the MQTT agent again.
Sometimes however, directly after things have resumed again, CoreMQTT throws an error (see log snippet below).

First CoreMQTT reports that no command structure is available, then a call to SUBSCRIBE returns statusReturn = MQTTNoMemory;
This in turn then triggers a chain of events that eventually stops the OTA agent and then my application hangs in prvTaskExitError();

I having a hard time finding the cause of this. Apparently the MQTT Agent function createAndAddCommand() is called while there is no command for some reason.
Hope someone here knows what this can be.
What makes it difficult is that the bug is hard to reproduce, only happening roughly once every two days.

-----------------------Logging-------------------------------------------------------------------------

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1284] Creating a TLS connection to a33g7qaxhmjo21-ats.iot.eu-central-1.amazonaws.com:8883.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1506] TCP socket connected! 
[DEBUG] [OTADemo] [OtaOverMqttDemoExample.c:1386] Creating command queue.
[INFO] [MQTT] [core_mqtt.c:1838] MQTT connection established with the broker.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1474] Session present: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1521] MQTT connection successful! 
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1848] MQTT Agent connected...
[INFO] [OTA] [ota.c:2846] Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1853] Resumed OTA agent.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[ERROR] [MQTT] [freertos_command_pool.c:117] No command structure available.
[ERROR] [OTADemo] [OtaOverMqttDemoExample.c:1609] Failed to SUBSCRIBE to topic with error = 2.

After the reconnection are you trying to subscribe to the topics directly from the MQTT connection successful Callback? That would give you that problem.

Try covering the antenna with an antistatic bag to hammer down the cellular signal, you may be able to replicate the problem that way.

Not exactly sure what you mean, but here is how I implemented it:

My implementation closely follows the OTA example.

If the MQTTAgent_CommandLoop returns NOK then:
prvSuspendOTA() is executed
prvSocketDisconnect() executed

Then immediately following that an attempt to establish a new connection is performed by calling prvConnectToMQTTBroker(). This in turn calls prvSocketConnect().
I added my own code to prvSocketConnect() that waits for the cellular connection to come back up before it attempts to re-establish the socket connection.

Once the cellular connection comes back up, the socket is re-established and prvConnectToMQTTBroker() returns. After that OTA is resumed which wants to (re)subscribe, that’s when the error occurs.

Hi @BaxEDM

The MQTT Agent’s command structure pool can get exhausted when an application task is retrying a packet which requires acknowledgment (such as SUBSCRIBE or Qos1 Publish), while the Agent is not connected.

Are there any tasks other than OTA in your application which subscribes or performs Qos1 publishes with the MQTT agent?

From your comment, I think you are correctly suspending OTA agent when MQTT Agent loop is disconnected, and resuming the OTA Agent after reconnection. This should atleast stop OTA agent from sending subscribe requests when MQTT agent is not connected. However from the logs it seems OTA Agent is still in state RequestingJob when you resume OTA after reconnection.

I expect OTA agent to be in suspended state (Current State=[Suspended]) while calling Resume. So could you share the logs before reconnection to see whats happening?

Hi @ravibhagavandas,

Here’s a bigger section of the log:

{"id":"ID-00009-00037-015-P0N261Z","spd":0.0,"odo":0.0,"vb":0.000,"vg":3.777,"pwr":0.0,"cap":0.0,"cnt":168.0,"lat":51.410678863525391,"lon":5.401926517486572}
[INFO] [MQTT] [core_mqtt_agent_command_functions.c:76] Publishing message to iot/ID-00009-00037-015-P0N261Z.

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [MQTT] [core_mqtt.c:1170] Ack packet deserialized with result: MQTTSuccess.
[INFO] [MQTT] [core_mqtt.c:1183] State record updated. New state=MQTTPublishDone.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1679] Sent PUBLISH packet to topic iot/ID-00009-00037-015-P0N261Z.

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
{"id":"ID-00009-00037-015-P0N261Z","spd":0.0,"odo":0.0,"vb":0.000,"vg":3.782,"pwr":0.0,"cap":0.0,"cnt":169.0,"lat":51.410655975341797,"lon":5.401903152465820}
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[ERROR] [CellularLib] [cellular_pkthandler.c:335] pkt_recv status=1, data sending timed out
[ERROR] [CellularLib] [cellular_r4_api.c:748] Cellular_SocketSend: Data send fail, PktRet: 1
[ERROR] [TlsTransport] [using_mbedtls.c:840] Failed to send data:  mbedTLSError= <No-High-Level-Code> : ERROR - Generic error.
[ERROR] [MQTT] [core_mqtt.c:625] Transport send failed. Error code=-1.
[ERROR] [MQTT] [core_mqtt.c:2034] Transport send failed for PINGREQ packet.
[ERROR] [MQTT] [core_mqtt.c:2200] Exiting process loop due to failure: ErrorStatus=MQTTSendFailed
[ERROR] [MQTT] [core_mqtt_agent.c:1022] MQTT operation failed with status MQTTSendFailed

[WARN] [OTA] [ota_os_freertos.c:310] OTA Timer handle NULL for Timerid=0, can't stop.
[INFO] [OTA] [ota.c:1350] OTA Agent is suspended.
[INFO] [OTA] [ota.c:2846] Current State=[Suspended], Event=[Suspend], New state=[Suspended]
[ERROR] [CellularLib] [cellular_pktio.c:722] recvdMsgType is AT_UNDEFINED for Message: OK, cmd NULL
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1835] Suspended OTA agent.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1836] MQTT Agent disconnecting...
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1363] Disconnecting TLS connection.

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1842] TCP Connection closed...
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1273] Attempting to open TLS socket to broker...


Waiting on reconnect before attempting forced rescan [1/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [2/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [3/10]
[ERROR] [OTADemo] [OtaOverMqttDemoExample.c:1674] Failed to send PUBLISH packet to broker with error = 3.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [4/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [5/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [6/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [7/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [8/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [9/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Waiting on reconnect before attempting forced rescan [10/10]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...

Turning RF off...

Waiting 1s...

[INFO] [CellularLib] [cellular_r4_urc_handler.c:341] _cellular_UrcProcessUupsmr: PSM_MODE_EXIT
#----------------------------------------------------------#
Unsolicited Network Registration Event Detected... 
networkRegistrationMode==REGISTRATION_MODE_UNKNOWN
csRegistrationStatus==REGISTRATION_STATUS_NO_REGISTERED_SEARCHING
psRegistrationStatus==REGISTRATION_STATUS_ROAMING_REGISTERED
#----------------------------------------------------------#
#----------------------------------------------------------#
Unsolicited Network Registration Event Detected... 
networkRegistrationMode==REGISTRATION_MODE_UNKNOWN
csRegistrationStatus==REGISTRATION_STATUS_NO_REGISTERED_SEARCHING
psRegistrationStatus==REGISTRATION_STATUS_NO_REGISTERED_SEARCHING
#----------------------------------------------------------#
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


Turning RF on...

[INFO] [CellularLib] [cellular_r4_urc_handler.c:341] _cellular_UrcProcessUupsmr: PSM_MODE_EXIT
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


#----------------------------------------------------------#
Unsolicited Network Registration Event Detected... 
networkRegistrationMode==REGISTRATION_MODE_UNKNOWN
csRegistrationStatus==REGISTRATION_STATUS_NOT_REGISTERED_SEARCHING
psRegistrationStatus==REGISTRATION_STATUS_NO_REGISTERED_SEARCHING
#----------------------------------------------------------#
#----------------------------------------------------------#
Unsolicited Network Registration Event Detected... 
networkRegistrationMode==REGISTRATION_MODE_UNKNOWN
csRegistrationStatus==REGISTRATION_STATUS_NOT_REGISTERED_SEARCHING
psRegistrationStatus==REGISTRATION_STATUS_NOT_REGISTERED_SEARCHING
#----------------------------------------------------------#
Waiting on reconnect before attempting forced rescan [1/30]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


#----------------------------------------------------------#
Unsolicited Network Registration Event Detected... 
networkRegistrationMode==REGISTRATION_MODE_UNKNOWN
csRegistrationStatus==REGISTRATION_STATUS_ROAMING_REGISTERED
psRegistrationStatus==REGISTRATION_STATUS_NOT_REGISTERED_SEARCHING
#----------------------------------------------------------#
#----------------------------------------------------------#
Unsolicited Network Registration Event Detected... 
networkRegistrationMode==REGISTRATION_MODE_UNKNOWN
csRegistrationStatus==REGISTRATION_STATUS_ROAMING_REGISTERED
psRegistrationStatus==REGISTRATION_STATUS_ROAMING_REGISTERED
#----------------------------------------------------------#
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1281] Waiting for cellular connection before creating the TLS connection...


[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1284] Creating a TLS connection to a4gg9xlmdgdfgo34-ats.iot.eu-central-1.amazonaws.com:8883.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1506] TCP socket connected! 


[DEBUG] [OTADemo] [OtaOverMqttDemoExample.c:1386] Creating command queue.
[INFO] [MQTT] [core_mqtt.c:1838] MQTT connection established with the broker.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1474] Session present: 0

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1521] MQTT connection successful! 


[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1848] MQTT Agent connected...
[INFO] [OTA] [ota.c:2846] Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1853] Resumed OTA agent.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2012]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[ERROR] [MQTT] [freertos_command_pool.c:117] No command structure available.
[ERROR] [OTADemo] [OtaOverMqttDemoExample.c:1609] Failed to SUBSCRIBE to topic with error = 2.
[ERROR] [OTA] [ota_mqtt.c:390] Failed to subscribe to MQTT topic: subscribe returned error: OtaMqttStatus_t=OtaMqttSubscribeFailed, topic=$aws/things/ID-00009-00037-015-P0N261Z/jobs/notify-next

The line between { } is a message that I publish.

The strange thing is that after the disconnect the OTA state is:

[INFO] [OTA] [ota.c:2846] Current State=[Suspended], Event=[Suspend], New state=[Suspended]

Then after the reconnect the log gives:

[INFO] [OTA] [ota.c:2846] Current State=[RequestingJob], Event=[Resume], New state=[RequestingJob]

While in between the disconnect and reconnect there are no other state changes reported, so how did it go from [suspended] to [RequestingJob] without reporting a resume? I also question that log entry, why is the new state always the same as the current state? The code that makes that log entry in ota.c seems ok:

LogInfo( ( "Current State=[%s]"
           ", Event=[%s]"
           ", New state=[%s]",
           pOtaAgentStateStrings[ otaAgent.state ],
           pOtaEventStrings[ pEventMsg->eventId ],
           pOtaAgentStateStrings[ otaTransitionTable[ index ].nextState ] ) );

So next to OTA, I do publish in another thread with QoS 1, but I make sure there I only publish when the MQTT Agent is connected.

Hi @BaxEDM,

Can you also check if this commit has been merged into your OTA demo?

If this commit is not merged in the OTA demo, MQTTAgentCommand_t won’t be returned to the command pool after network disconnect/connect.

“No command structure available.” message will be observed from the log.

Thanks! That commit was not merged yet. I’ve merged it now and am running a endurance test now. As it is hard to reproduce, time will tell if this did the trick. I’ll come back here in a few days to report.

Thanks! Ran for a couple of days now without problems!

1 Like