Occasional cellular data sending timeout in prvDisconnectFromMQTTBroker results in configassert

Hi,

I’m running OTA over cellular MQTT. In order to save power I cycle between a low power sleep state and an active state. When going to sleep, the OTA task is suspended, the MQTT connection with the broker is disconnected and then the modem is turned off. Here’s a log snippet of this happening when there are no problems:

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
{"id":"ID-00009-00037-015-P0N261Z","spd":0.0,"odo":0.0,"vb":0.000,"vg":3.543,"pwr":0.0,"cap":0.0,"cnt":327.0,"lat":51.410743713378906,"lon":5.401909351348877}
[INFO] [MQTT] [core_mqtt_agent_command_functions.c:76] Publishing message to iot/ID-00009-00037-015-P0N261Z.

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1679] Sent PUBLISH packet to topic iot/ID-00009-00037-015-P0N261Z.


[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
Bike state transition: [LOWPOWERAWAKE] --> [LOWPOWERSLEEP]
Cellular state transition: [CONNECTED] --> [TURNINGOFF]
[WARN] [OTA] [ota_os_freertos.c:310] OTA Timer handle NULL for Timerid=0, can't stop.
[INFO] [OTA] [ota.c:1350] OTA Agent is suspended.
[INFO] [OTA] [ota.c:2846] Current State=[Suspended], Event=[Suspend], New state=[Suspended]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2149] Suspended OTA agent successfully.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1533] Disconnecting the MQTT connection with a32g9qwxlmbo20-ats.iot.eu-central-1.amazonaws.com.
[INFO] [MQTT] [core_mqtt.c:2158] Disconnected from the broker.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1363] Disconnecting TLS connection.

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1858] MQTTAgent task deleted...

CellularDevice_PowerOff + 
CellularDevice_PowerOff - 
Cellular state transition: [TURNINGOFF] --> [OFF]
Going to sleep...

Occasionally however it might happen that during an attempt to disconnect from the MQTT broker the cellular library reports that there is a data timeout. When this happens, this triggers a series of events that eventually cause a configassert to trigger in vPortFree(). Here is a log event of this happening:

[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2014]  Received: 0   Queued: 0   Processed: 0   Dropped: 0
bike timer 2 callback
Bike state transition: [LOWPOWERAWAKE] --> [LOWPOWERSLEEP]
Cellular state transition: [CONNECTED] --> [TURNINGOFF]
[WARN] [OTA] [ota_os_freertos.c:310] OTA Timer handle NULL for Timerid=0, can't stop.
[INFO] [OTA] [ota.c:1350] OTA Agent is suspended.
[INFO] [OTA] [ota.c:2846] Current State=[Suspended], Event=[Suspend], New state=[Suspended]
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2149] Suspended OTA agent successfully.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1533] Disconnecting the MQTT connection with a32g9qwxlmbo20-ats.iot.eu-central-1.amazonaws.com.
[ERROR] [CellularLib] [cellular_pkthandler.c:335] pkt_recv status=1, data sending timed out
[ERROR] [CellularLib] [cellular_r4_api.c:748] Cellular_SocketSend: Data send fail, PktRet: 1
[ERROR] [TlsTransport] [using_mbedtls.c:840] Failed to send data:  mbedTLSError= <No-High-Level-Code> : ERROR - Generic error.
[ERROR] [MQTT] [core_mqtt.c:625] Transport send failed. Error code=-1.
[ERROR] [MQTT] [core_mqtt.c:2146] Transport send failed for DISCONNECT packet.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1363] Disconnecting TLS connection.

[ERROR] [MQTT] [core_mqtt_agent.c:1022] MQTT operation failed with status MQTTSendFailed

[WARN] [OTA] [ota_os_freertos.c:310] OTA Timer handle NULL for Timerid=0, can't stop.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1836] Suspended OTA agent.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1837] MQTT Agent disconnecting...
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1363] Disconnecting TLS connection.

[ERROR] [CellularLib] [cellular_pkthandler.c:335] pkt_recv status=1, data sending timed out
[ERROR] [CellularLib] [cellular_r4_api.c:748] Cellular_SocketSend: Data send fail, PktRet: 1
[ERROR] [TlsTransport] [using_mbedtls.c:742] (Network connection 0x20033bf8) Failed to send TLS close-notify: mbedTLSError= <No-High-Level-Code> : ERROR - Generic error.
[WARN] [CellularLib] [cellular_pktio.c:367] Modem return ERROR: line AT+USOWR=0,31, cmd : +CME ERROR: Operation not allowed, respPrefix NULL, status: 0
[WARN] [CellularLib] [cellular_pkthandler.c:247] Modem returns error in sending AT command AT+USOWR=0,31, pktStatus 2.
[ERROR] [CellularLib] [cellular_r4_api.c:748] Cellular_SocketSend: Data send fail, PktRet: 2
[ERROR] [CellularLib] [cellular_common.c:487] _Cellular_TranslatePktStatus: Status 2
[ERROR] [TlsTransport] [using_mbedtls.c:742] (Network connection 0x20033bf8) Failed to send TLS close-notify: mbedTLSError= <No-High-Level-Code> : ERROR - Generic error.

This is the section in vPortFree that the programs ends up in when this happens:

/* Check the block is actually allocated. */|
configASSERT( ( pxLink->xBlockSize & xBlockAllocatedBit ) != 0 );|

It seems that when prvDisconnectFromMQTTBroker() is called and the MQTTAgent_Disconnect() call inside fails, it continues anyway and calls prvSocketDisconnect() after that. In parallel, inside the MQTTAgentTask, the MQTTAgent_CommandLoop exits due to the error and then in turn another call prvSocketDisconnect() is being made. That is how it is implemented, seems wrong to me.

I do not know what the best way would be to handle such a condition. Hoping that someone here can point me to the solution.

Hi Mike,

From your log, the cellular socket send fails when the OTA demo task tries to disconnect from MQTT broker and later disconnect the socket. The MQTT Agent task also trying to disconnect/connect to the MQTT broker when MQTTAgent_CommandLoop returns error due to cellular socket send error.

The first send error is timeout.

[ERROR] [CellularLib] [cellular_pkthandler.c:335] pkt_recv status=1, data sending timed out
[ERROR] [CellularLib] [cellular_r4_api.c:748] Cellular_SocketSend: Data send fail, PktRet: 1

The second send error is cellular modem returns error.

AT+USOWR=0,31
+CME ERROR: Operation not allowed

There are chances that the socket send returns error, for example disconnected by remote server. In that case, we need to close the socket and reconnect to the server.

However, this is not the root cause of your problem. The problem is that the socket may be closed by two threads if the network is unavailable. One is OTA demo task and the other is MQTT agent task.

We will look into this problem and reply more information.

Hi Mike,

OTA demo task handles the MQTT connect/disconnect and handovers the MQTT connection to MQTT Agent task later.
MQTT Agent task handles the MQTT command, including fail reconnect to MQTT broker.

If we want to disconnect MQTT broker, it is better to do it in OTA demo task.
MQTT Agent has terminate function, MQTTAgent_Terminate. It can be used to terminate the MQTTAgent_CommandLoop. Then, we can disconnect the MQTT broker with MQTT_Disconnect in OTA demo task.

The following code can be referenced.

static void prvDisconnectFromMQTTBroker( void )
{
    MQTTAgentCommandContext_t xCommandContext = { 0 };
    MQTTAgentCommandInfo_t xCommandParams = { 0 };
    MQTTStatus_t xCommandStatus;

    /* Disconnect from broker. */
    LogInfo( ( "Disconnecting the MQTT connection with %s.", democonfigMQTT_BROKER_ENDPOINT ) );

    xCommandParams.blockTimeMs = MQTT_AGENT_SEND_BLOCK_TIME_MS;
    xCommandParams.cmdCompleteCallback = prvCommandCallback;
    xCommandParams.pCmdCompleteCallbackContext = &xCommandContext;
    xCommandContext.xTaskToNotify = xTaskGetCurrentTaskHandle();
    xCommandContext.pArgs = NULL;
    xCommandContext.xReturnStatus = MQTTSendFailed;

    /* Terminate the MQTT Agent. */
    xCommandStatus = MQTTAgent_Terminate( &xGlobalMqttAgentContext, &xCommandParams );
    configASSERT( xCommandStatus == MQTTSuccess );

    xTaskNotifyWait( 0,
                     0,
                     NULL,
                     pdMS_TO_TICKS( MQTT_AGENT_MS_TO_WAIT_FOR_NOTIFICATION ) );

    /* Disconnect MQTT session. */
    ( void ) MQTT_Disconnect( &( xGlobalMqttAgentContext.mqttContext ) );

    /* End TLS session, then close TCP connection. */
    prvSocketDisconnect( &xNetworkContextMqtt );
}

Thank you for the explanation. If I understand correctly MQTT_Terminate will cause the commandloop to return MQTTSuccess, which then terminates the loop. This then avoids the second call to prvSocketDisconnect(&xNetworkContextMqtt);
I will give this a try and will report my findings here.

@Fresh
I have added the MQTTAgent_Terminate() line and ran into a very similar problem. Attached is the complete log of the application debug uart as well as the modem uart.
Log_files_CME_Error.zip (418.8 KB)

So when my application wants to go to sleep it calls:

void disable_OTA_MQTT(void){
   BaseType_t xResult = pdFAIL;
   xResult = prvSuspendOTA();
    configASSERT(xResult == pdPASS);
    LogInfo(("Suspended OTA agent successfully."));
    prvDisconnectFromMQTTBroker();

}

The error then occurs during the disconnect. This triggers an exit of the MQTTAgent_CommandLoop which causes a second call to prvSocketDisconnect. the first call was made by prvDisconnectFromMQTTBroker(). So apparently MQTTAgent_Terminate does not return success if there is a connection issue.

After exiting the MQTTAgent_CommandLoop a second call to prvSocketDisconnect() I believe causes a modem “+CME ERROR”, and this makes my application hang in the configassert.

Hello,

I’m running OTA on top of the cellular library. For the cellular library I’m using the latest commit as of today (24-Jun-2022).
I’m using the uBlox SARA-R410M with software version 02B-01
The application runs on a STM32L4S5 and uses STM32L4S5 comm interface.

I’m trying to run my application for an extended period without hangs. Each hang I find which is not related my my own code, but to the cellular lib or OTA I will report it here on the forum as a new topic.

My application cycles between a normal power and a low power mode. When going to the low power mode the MQTT connection and TLS socket are disconnected and the modem is turned off.
This topic is about my application hanging in a configasset in vPortFree, after the modem returns “+CME ERROR: Operation not allowed” when the application sends the command “AT+USOWR=0,31” while trying to disconnect TLS.

The complete log of my application as well as the serial log of the Rx and Tx modem UART lines is attached here:
Log_files_CME_Error.zip (418.8 KB)

I do not know how to recover correctly from such a situation. Hoping I can get help here.

I just realized this error is closely related to:

Since I cannot delete this topic, I will continue the discussion in that other topic.

@BaxEDM Posts from your other related topic have been merged into this one.

Hi Mike,

Sorry for missing this thread and thank you for your feedback with the detail logs.

We can update the prvDisconnectFromMQTTBroker to prevent the TERMINATE command get cancelled.

static void prvDisconnectFromMQTTBroker( void )
{
    MQTTAgentCommandContext_t xCommandContext = { 0 };
    MQTTAgentCommandInfo_t xCommandParams = { 0 };
    MQTTStatus_t xCommandStatus;
    BaseType_t notifyWaitStatus;

    /* Disconnect from broker. */
    LogInfo( ( "Disconnecting the MQTT connection with %s.", democonfigMQTT_BROKER_ENDPOINT ) );

    xCommandParams.blockTimeMs = MQTT_AGENT_SEND_BLOCK_TIME_MS;
    xCommandParams.cmdCompleteCallback = prvCommandCallback;
    xCommandParams.pCmdCompleteCallbackContext = &xCommandContext;
    xCommandContext.xTaskToNotify = xTaskGetCurrentTaskHandle();
    xCommandContext.pArgs = NULL;
    xCommandContext.xReturnStatus = MQTTSendFailed;

    /* Terminate the MQTT Agent. */
    do
    {
        xCommandStatus = MQTTAgent_Terminate( &xGlobalMqttAgentContext, &xCommandParams );
        configASSERT( xCommandStatus == MQTTSuccess );

        notifyWaitStatus = xTaskNotifyWait( 0,
                                            0,
                                            NULL,
                                            pdMS_TO_TICKS( MQTT_AGENT_MS_TO_WAIT_FOR_NOTIFICATION ) );
        configASSERT( notifyWaitStatus == pdPASS );
    } while( xCommandContext.xReturnStatus != MQTTSuccess );

    /* Disconnect MQTT session. */
    ( void ) MQTT_Disconnect( &( xGlobalMqttAgentContext.mqttContext ) );

    /* End TLS session, then close TCP connection. */
    prvSocketDisconnect( &xNetworkContextMqtt );
}

The reason can be referenced below.

  1. The application tries to suspend OTA in OTA demo task. OTA demo task enqueues a MQTTAgent_Terminate command. The command is successfully enqueued but doesn’t get processed. The reason is that the log “Terminating command loop.” is not observed.
Bike state transition: [LOWPOWERAWAKE] --> [LOWPOWERSLEEP]
Cellular state transition: [CONNECTED] --> [TURNINGOFF]
[WARN] [OTA] [ota_os_freertos.c:310] OTA Timer handle NULL for Timerid=0, can't stop.
[INFO] [OTA] [ota.c:1350] OTA Agent is suspended.
[INFO] [OTA] [ota.c:2846] Current State=[Suspended], Event=[Suspend], New state=[Suspended]
<2022.06.24 22:12:53.058>
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:2186] Suspended OTA agent successfully.
[INFO] [OTADemo] [OtaOverMqttDemoExample.c:1567] Disconnecting the MQTT connection with a32g9qwxlmbo20-ats.iot.eu-central-1.amazonaws.com.
  1. The MQTT Agent task is sending PINGREQ and encountered timeout error. The task cancels all the command including the TERMINATE command. Then do a reconnect to the MQTT BROKER.
[ERROR] [CellularLib] [cellular_pkthandler.c:335] pkt_recv status=1, data sending timed out
[ERROR] [CellularLib] [cellular_r4_api.c:738] Cellular_SocketSend: Data send fail, PktRet: 1
[ERROR] [TlsTransport] [using_mbedtls.c:843] Failed to send data:  mbedTLSError= <No-High-Level-Code> : ERROR - Generic error.
[ERROR] [MQTT] [core_mqtt.c:625] Transport send failed. Error code=-1.
[ERROR] [MQTT] [core_mqtt.c:2034] Transport send failed for PINGREQ packet.
[ERROR] [MQTT] [core_mqtt.c:2200] Exiting process loop due to failure: ErrorStatus=MQTTSendFailed
[ERROR] [MQTT] [core_mqtt_agent.c:1022] MQTT operation failed with status MQTTSendFailed
  1. OTA Demo task receives a xTaskNotify from the prvCommandCallback callback function. However, if the command is canceled the returnedCode should not be MQTTSuccess. The OTA demo task continues to close socket connection while the MQTT Agent is not terminated.

From the modem log, we can see that the socket is closed without reconnecting to the MQTT broker. There should not be any socket send command after the socket is closed.

AT+USOCL=0,0
<2022.06.24 22:12:56.540>

OK
AT+USOWR=0,31

+CME ERROR: Operation not allowed

@Fresh Thank you,

I have updated my prvDisconnectFromMQTTBroker() accordingly and will test. I will report back the results.