KeepAliveTimeout code in CoreMQTT

I spotted some code in CoreMQTT library that may contain potential bug and I wanted to double-check with the forum to see whether it’s valid or not.
The code below can be found in core_mqtt.c and it’s called whenever there is no message for MQTT Agent to service during 1 second periodic check(roughly stated).
The issue I see is that after a ping request is sent due to no MQTT activity for keepAliveMs time, MQTT Agent will wait another keepAliveMs time because MQTT_Ping() will time-stamp lastPacktTime and the elapsed time will be calculated from the time at which a ping request is sent.
So the overall time it takes to declare KeepAliveTimeout will be 2xKeepAliveMs.
Perhaps I am missing something?

static MQTTStatus_t handleKeepAlive( MQTTContext_t * pContext )
{
    MQTTStatus_t status = MQTTSuccess;
    uint32_t now = 0U, keepAliveMs = 0U;

    assert( pContext != NULL );
    assert( pContext->getTime != NULL );

    now = pContext->getTime();
    keepAliveMs = 1000U * ( uint32_t ) pContext->keepAliveIntervalSec;

    /* If keep alive interval is 0, it is disabled. */
    if( ( keepAliveMs != 0U ) &&
        ( calculateElapsedTime( now, pContext->lastPacketTime ) > keepAliveMs ) )
    {
        if( pContext->waitingForPingResp == true )
        {
            /* Has time expired? */
            if( calculateElapsedTime( now, pContext->pingReqSendTimeMs ) >
                MQTT_PINGRESP_TIMEOUT_MS )
            {
                status = MQTTKeepAliveTimeout;
            }
        }
        else
        {
            status = MQTT_Ping( pContext );
        }
    }

    return status;
}

Adding line numbers:

1  static MQTTStatus_t handleKeepAlive( MQTTContext_t * pContext )
2  {
3      MQTTStatus_t status = MQTTSuccess;
4      uint32_t now = 0U, keepAliveMs = 0U;

5      assert( pContext != NULL );
6      assert( pContext->getTime != NULL );

7      now = pContext->getTime();
8      keepAliveMs = 1000U * ( uint32_t ) pContext->keepAliveIntervalSec;

      /* If keep alive interval is 0, it is disabled. */
9      if( ( keepAliveMs != 0U ) &&
10         ( calculateElapsedTime( now, pContext->lastPacketTime ) > keepAliveMs ) )
11     {
12         if( pContext->waitingForPingResp == true )
13         {
14            /* Has time expired? */
15            if( calculateElapsedTime( now, pContext->pingReqSendTimeMs ) >
16                MQTT_PINGRESP_TIMEOUT_MS )
17            {
18                status = MQTTKeepAliveTimeout;
19            }
20        }
21        else
22        {
23            status = MQTT_Ping( pContext );
24        }
25    }

26    return status;
27  }

This is my reading of the code:

  1. handleKeepAlive() is called.
  2. Assume keepAliveMs have passed since a packet was sent. Line 10 evaluates to true.
  3. Assume not waiting to a reply from the last ping. Line 12 evaluatates to false.
  4. MQTT_Ping() is called which (presumably) sets waitingForPingResp to be set to true. From your report it sounds like calling MQTT_Ping() also sets lastPacketTime to the current time.
  5. handleKeepAlive() exits.
  6. handleKeepAlive() gets called again within keepAliveMs of MQTT_Ping() having been called.
  7. Line 10 evaluates to false and the function returns. This and the prior step may repeat several times.
  8. handleKeepAlive() gets called again, this time after keepAliveMs have elapsed since MQTT_Ping() was called.
  9. What happens now depends on whether the ping reply was received in the interim. If not waitingForPingResp will still be true, and assuming keepAlilveMs is greater than pingReqSenTimeMs, lines 12 and 15 both evaluate to true and MQTTKeepAliveTimeout is returned. If the ping reply was received in the interim, line 12 evaluates to flase, and MQTT_Ping() is called again - which is ok because keepAliveMs have passed since it was last called.

Do you agree?

Hi Richard,
Thanks for sharing your reading of the code.
I agree with your outline.

The issue I am raising is on how long it takes for MQTTKeepAliveTimeout to be reported by the function.
For example, let’s say keepaliveMs is equal to 10 seconds.
Talking in the context of CoreMQTT and MQTTAgent, based on this code,
PING request will be sent if elapsed time since lastPacketTime is greater than 10 seconds.
And in the case PING response is NOT received for another 10 seconds, the function will finally return MQTTKeepAliveTimeout because, as you also noted, lastPacketTime is reset when PING request is sent.
So there will be total of 20 seconds until MQTTKeepAliveTimeout is detected even though keepAliveMs is set to 10 seconds.
Do you agree?

Yes - I think that is why my reading of the code I posted above shows, but there may be other things going on which are outside of that code snippet. I will ask for somebody to investigate further and post back.

Hi @jamesk,

Thank you for bringing this bug to our attention. I’ve created a pull request for this here