Failed to send PINGRE with Shadow Demo

Hi.

I’m working with the Shadow demo over ethernet and after starting the demo, after a few minutes, the console shows the following error messages:

I have been reading this thread where it seems to me that it is a similar or equal situation:

The solution he proposes is the following:

void mqttTask(...) {
    IotMqtt_Init();
    AwsIotShadow_Init(0);
    _establishMqttConnection(...);
    _modifySubscriptions(...);
    _setShadowCallbacks(...);
    
    while(1) {
        if(reconnect_to_mqtt) {
            AwsIotShadow_Cleanup();
            AwsIotShadow_Init(0);
            _establishMqttConnection(...);
            _modifySubscriptions(...);
            _setShadowCallbacks(...);
            reconnect_to_mqtt = false;
        }
    }
}

void someKindOfWiFiReconnectCallback(...) {
  //This cleans up a bunch of MQTT resources and triggers the disconnect callback.
  IotMqtt_Disconnect(mqttConnection, 0);
}

void mqttDisconnectCallback(...){
    //Set flag or post message to trigger re-connect in task that originally setup mqtt connection
    reconnect_to_mqtt = true;
}

I have tried to make similar lake in RunShadowDemo function as follows:

int RunShadowDemo( bool awsIotMqttMode,
                   const char * pIdentifier,
                   void * pNetworkServerInfo,
                   void * pNetworkCredentialInfo,
                   const IotNetworkInterface_t * pNetworkInterface )
{
   
   ....

    if( status == EXIT_SUCCESS )
    {
        /* Mark the libraries as initialized. */
        librariesInitialized = true;
        status = EXIT_FAILURE; //So that it enters the while loop at least once
        while (EXIT_FAILURE == status)
        {
            /* Establish a new MQTT connection. */
            status = _establishMqttConnection( pIdentifier,
                                           pNetworkServerInfo,
                                           pNetworkCredentialInfo,
                                           pNetworkInterface,
                                           &mqttConnection );
        }
    }

    if( status == EXIT_SUCCESS )
    {
      ...
    }

}

But this solution would only be if the problem occurs when starting the demo, I can’t identify where I could do something similar when the demo is already running.

Any suggestion?

Based on the above log, it looks like the connection to the broker is going down. Do you know why it is going down, is your keep alive timer set correctly ?

Regarding the solution, when the connection is terminated / closed, the next send would return an error. When you receive the error, you will have establish the connection again. The shadow API should signal error at MQTT level by returning AWS_IOT_SHADOW_MQTT_ERROR. If you should receive an error, I would first close MQTT connection and then reconnect for clean operation.

Please let us know if that works.

Hi.

Thanks for answering.

I’m taking the example that the microcontroller manufacturer provides for the Shadow demo, so I’m not sure what timer you mean.

The only file that you have modified is the aws_iot_demo_shadow.c. And I have modified it to receive 2 characters instead of one:

#define SHADOW_DESIRED_JSON     \
    "{"                         \
    "\"state\":{"               \
    "\"desired\":{"             \
    "\"powerOn\":%02d"          \ //<-- I changed from 01d to 02d to receive two characters
    "}"                         \
    "},"                        \
    "\"clientToken\":\"%06lu\"" \
    "}"
#define EXPECTED_DESIRED_JSON_SIZE    ( sizeof( SHADOW_DESIRED_JSON ) - 2)//was -3, instead of -2
#define SHADOW_REPORTED_JSON    \
    "{"                         \
    "\"state\":{"               \
    "\"reported\":{"            \
    "\"powerOn\":%02d"          \//<-- I changed from 01d to 02d to receive two characters
    "}"                         \
    "},"                        \
    "\"clientToken\":\"%06lu\"" \
    "}"
#define EXPECTED_REPORTED_JSON_SIZE    ( sizeof( SHADOW_REPORTED_JSON ) - 2)/was -3, instead of -2

I reverted this change, and it seems to work fine a little longer before the error occurs.

With a much older version of the same project, but I have not added more tasks than the current one (the new tasks don’t interact with the Shadow example code, they only manage other peripherals of the microcontroller), it seems to work correctly, that is, it doesn’t it presents the error shown in the OP, or if it happens, maybe it will be much longer.

I have looked for where AWS_IOT_SHADOW_MQTT_ERROR occurs, and I have placed breakpoints to determine which of the two occurs, but neither is reached

AwsIotShadowError_t _AwsIotShadow_ProcessOperation( IotMqttConnection_t mqttConnection,
                                                    const char * pThingName,
                                                    size_t thingNameLength,
                                                    _shadowOperation_t * pOperation,
                                                    const AwsIotShadowDocumentInfo_t * pDocumentInfo )
{
 ...
/* Publish to the Shadow topic name. */
        publishStatus = IotMqtt_TimedPublish( pOperation->mqttConnection,
                                              &publishInfo,
                                              0,
                                              _AwsIotShadowMqttTimeoutMs );

        /* Check for errors from the MQTT publish. */
        if( publishStatus != IOT_MQTT_SUCCESS )
        {
            IotLogError( "Failed to publish MQTT message to %s %.*s Shadow, error %s.",
                         _pAwsIotShadowOperationNames[ pOperation->type ],
                         thingNameLength,
                         pThingName,
                         IotMqtt_strerror( publishStatus ) );

            /* Convert the MQTT "NO MEMORY" error to a Shadow "NO MEMORY" error. */
            if( publishStatus == IOT_MQTT_NO_MEMORY )
            {
                status = AWS_IOT_SHADOW_NO_MEMORY;
            }
            else
            {
                status = AWS_IOT_SHADOW_MQTT_ERROR; //<---------------------------
            }
....
}

&

static AwsIotShadowError_t _modifyOperationSubscriptions( IotMqttConnection_t mqttConnection,
                                                          const char * pTopicFilter,
                                                          uint16_t topicFilterLength,
                                                          _mqttCallbackFunction_t callback,
                                                          _mqttOperationFunction_t mqttOperation )
{
    ....

if( mqttStatus != IOT_MQTT_SUCCESS )
    {
        IotLogError( "Failed to %s %.*s, error %s.",
                     mqttOperation == IotMqtt_TimedSubscribe ? "subscribe to" : "unsubscribe from",
                     topicFilterLength,
                     pTopicFilter,
                     IotMqtt_strerror( mqttStatus ) );

        /* Convert the MQTT "NO MEMORY" error to a Shadow "NO MEMORY" error. */
        if( mqttStatus == IOT_MQTT_NO_MEMORY )
        {
            return AWS_IOT_SHADOW_NO_MEMORY;
        }
        return AWS_IOT_SHADOW_MQTT_ERROR; //<------
    }
    else
    {
        IotLogDebug( "Successfully %s %.*s",
                     mqttOperation == IotMqtt_TimedSubscribe ? "subscribed to" : "unsubscribed from",
                     topicFilterLength,
                     pTopicFilter );
    }
   ....
}

Regarding the old project that has no problems, also modify SHADOW_DESIRED_JSON and SHADOW_REPORTED_JSON to receive and send 2 characters and it works correctly, so it seems that the modification of the JSONs is not the cause of the problem

I’ve been tracking where this occurs:

And I have reached the prvHandleEstablished function in the file FreeRTOS_TCP_IP.c

In this function, ucTCPFlags has the ipTCP_FLAG_FIN flag in a logical one, that is ((ucTCPFlags & (uint8_t) ipTCP_FLAG_FIN)! = 0u) is true.

Below is an if where it is verified that pxSocket-> u.xTCP.bits.bFinSent == pdFALSE_UNSIGNED, but the else condition occurs.

Inside the else, the value of lDistance is obtained, which is not greater than one and therefore xMayClose remains true

When xMayClose remains true, the prvTCPHandleFin function is executed, which in turn causes the TCP state change: vTCPStateChange (pxSocket, eLAST_ACK);

static BaseType_t prvHandleEstablished( FreeRTOS_Socket_t *pxSocket, NetworkBufferDescriptor_t **ppxNetworkBuffer,
	uint32_t ulReceiveLength, UBaseType_t uxOptionsLength )
{
  
   ...
   if( ( pxSocket->u.xTCP.bits.bFinAccepted != pdFALSE_UNSIGNED ) || ( ( ucTCPFlags & ( uint8_t ) ipTCP_FLAG_FIN ) != 0u ) )
	{
        /* Peer is requesting to stop, see if we're really finished. */
		xMayClose = pdTRUE;
        if( pxSocket->u.xTCP.bits.bFinSent == pdFALSE_UNSIGNED )
		{

        }
       else
       {
           lDistance = (int32_t) (ulSequenceNumber + ulReceiveLength - pxTCPWindow-> rx.ulCurrentSequenceNumber);
           if( lDistance > 1 )
		   {
			 FreeRTOS_debug_printf( ( "Refusing FIN: Rx not complete %ld (cur %lu high %lu)\n",
						lDistance, pxTCPWindow->rx.ulCurrentSequenceNumber - pxTCPWindow->rx.ulFirstSequenceNumber,
						pxTCPWindow->rx.ulHighestSequenceNumber - pxTCPWindow->rx.ulFirstSequenceNumber ) );
			 xMayClose = pdFALSE;
		   }
       }
    }
     ....
     if( xMayClose != pdFALSE )
     {
		pxSocket->u.xTCP.bits.bFinAccepted = pdTRUE_UNSIGNED;
		xSendLength = prvTCPHandleFin( pxSocket, *ppxNetworkBuffer ); <--- Here
	 }
     ......
}

What I cannot identify in which part the ipTCP_FLAG_FIN flag is activated in the ucTCPFlags variable.

There are several parts of the code where this flag is activated, but what happens first is the if of the prvHandleEstablished function and the ucTCPFlags variable already has the flag activated.

Hello @DominusDRR ,

I’m taking the example that the microcontroller manufacturer provides for the Shadow demo, so I’m not sure what timer you mean.

I meant keep alive timer. If you are noticing disconnect from broker, it may be due to inactivity on the connection. It should be part of MQTT configuration and is made available in demo. For example, it is configured here for shadow demo (this version is similar to what you are using): https://github.com/aws/amazon-freertos/blob/202007.00/demos/shadow/aws_iot_demo_shadow.c#L86
The default is 60, that means if you don’t send anything for 60 seconds , the sever will disconnect the connection after 90 seconds approx.

Regarding how to detect the connection closure, the version you are using, does not a clean way to report server side disconnect. So if you detect any failure while using any shadow functions which indicate network failure, I would recommend disconnecting the MQTT connection and establish it again.

Ok. Thanks

I’m going to modify these times to determine what happens.

What I don’t understand is why the disconnection doesn’t happen in a previous version, the only difference, as I explained before, is that the current version has more tasks to manage other peripherals that are not related to the Shadow demo.

It seems to me that the manufacturer example that I’m taking for the development of our product is designed to always work.

This is that in the RunShadowDemo function there is an infinite loop as follows:

int RunShadowDemo( bool awsIotMqttMode,
                   const char * pIdentifier,
                   void * pNetworkServerInfo,
                   void * pNetworkCredentialInfo,
                   const IotNetworkInterface_t * pNetworkInterface )
{
    /* Return value of this function and the exit status of this program. */
    int status = 0;
    ...
 if( status == EXIT_SUCCESS )
    {
        /* Clear the Shadow document so that this demo starts with no existing
         * Shadow. */
        _clearShadowDocument( mqttConnection, pIdentifier, thingNameLength );

        /* Send Shadow updates. */
        
        status = _sendShadowUpdates( &deltaSemaphore,
                                     mqttConnection,
                                     pIdentifier,
                                     thingNameLength );
            for(;;) // <------------- infinite loop
            {
                                
                GetShadow(&deltaSemaphore,
                                     mqttConnection,
                                     pIdentifier,
                                     thingNameLength );// this is just a delay --> vTaskDelay (100);
            }
        /****************  The code below the loop is never reached **************************************/
        /* Delete the Shadow document created by this demo to clean up. */
        _clearShadowDocument( mqttConnection, pIdentifier, thingNameLength );
    }
     /****************  The code below  is never reached **************************************/
    /* Disconnect the MQTT connection if it was established. */
    if( connectionEstablished == true )
    {
        IotMqtt_Disconnect( mqttConnection, 0 );
    }

    /* Clean up libraries if they were initialized. */
    if( librariesInitialized == true )
    {
        _cleanupDemo();
    }
    ...
}

Hi.

It seems I have found the cause and have fixed the problem.

I will try to explain it.

As I mentioned before, I have an older version of the project that doesn’t get the error.

I made a copy of the current project and began to remove everything that has been added.

First I removed all the new tasks until they remain the same as the functional project, then the tasks that remained, equal to the functional project, change their priority 1, change some CPU properties such as disabling the prefeth and other features of flash memory, etc.

Every time I made a change, I would test the project, but the error kept happening.

Until I came up with a clock for a number of peripherals, including some timers.

I use two timers, one to generate a PWM signal for a buzzer for an audible signal and another to create an I2C bus, of the Bit Bang type.

This bus was configured for 50MHz and in the project that has no problems, it was configured for 100MHz

I changed the clock frequency of these peripherals to 100 MHz as in the project it works (obviously I also reconfigured everything related to the peripherals that use those timers) and voila!, now the current project works great, the problem has disappeared.

Conclusions.

I thought that freeRTOS used the Core Timer, which is independent of peripheral clocks:

That implies that it uses one of the timers that use the clock that I modified, possibly timer 1 which is a true 32-bit timer. Tomorrow I may try to find in the code which timer it uses.

Thank you for taking time to report back.

1 Like

Timer 1 is used. When I changed the clock frequency of this peripheral, the mentioned problem occurred.

/*
By default port.c generates its tick interrupt from TIMER1.  The user can
override this behaviour by:
	1: Providing their own implementation of vApplicationSetupTickTimerInterrupt(),
	   which is the function that configures the timer.  The function is defined
	   as a weak symbol in this file so if the same function name is used in the
	   application code then the version in the application code will be linked
	   into the application in preference to the version defined in this file.
	2: Define configTICK_INTERRUPT_VECTOR to the vector number of the timer used
	   to generate the tick interrupt.  For example, when timer 1 is used then
	   configTICK_INTERRUPT_VECTOR is set to _TIMER_1_VECTOR.
	   configTICK_INTERRUPT_VECTOR should be defined in FreeRTOSConfig.h.
	3: Define configCLEAR_TICK_TIMER_INTERRUPT() to clear the interrupt in the
	   timer used to generate the tick interrupt.  For example, when timer 1 is
	   used configCLEAR_TICK_TIMER_INTERRUPT() is defined to
	   IFS0CLR = _IFS0_T1IF_MASK.
*/
#ifndef configTICK_INTERRUPT_VECTOR
	#define configTICK_INTERRUPT_VECTOR _TIMER_1_VECTOR
	#define configCLEAR_TICK_TIMER_INTERRUPT() IFS0CLR = _IFS0_T1IF_MASK
#else
	#ifndef configCLEAR_TICK_TIMER_INTERRUPT
		#error If configTICK_INTERRUPT_VECTOR is defined in application code then configCLEAR_TICK_TIMER_INTERRUPT must also be defined in application code.
	#endif
#endif