OTA Demo hangs when initializing MQTT Connection

I am running on a custom board based on an STM32L496RGT6 microcontroller. I have integrated FreeRTOS (release 202002.00) and can successfully run the MQTT demo and the HTTPS demo.

When I update aws_demo_config.h and define CONFIG_OTA_UPDATE_DEMO_ENABLED, the demo code executes but hangs in prxCreateNetworkConnection inside the OTA demo code. The full call stack can be found below:

Does anyone have any thoughts about why the MQTT connection can be established for the MQTT demo but hangs for the OTA demo? Any suggestions on changes that could be made so the demo code progresses further?

Hello,

Thank you for reporting this issue and sharing call stack.

Since you are using a port for a custom platform based on the STM32L496RGT6 (rather than the STM32L475 port) and see this issue while the OTA demo establishes its MQTT connection, can you please check the memory configuration on this platform and confirm that enough stack and heap are configured? My first suspicion is that we are running out of heap memory while the MQTT connection is being established. The size of the FreeRTOS heap is set by the configTOTAL_HEAP_SIZE configuration constant in FreeRTOSConfig.h.

Please check if vApplicationMallocFailedHook is defined and is getting called. This function is called when pvPortMalloc fails because the FreeRTOS heap has insufficient free memory, provided configUSE_MALLOC_FAILED_HOOK is set to 1 in FreeRTOSConfig.h.
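
For reference, here is a minimal sketch of what that hook setup could look like. The counter ulMallocFailedCount is a hypothetical name used only for illustration and is not part of FreeRTOS. In a real project the #define belongs in FreeRTOSConfig.h and the hook in application code, and your project may already define the hook somewhere (for example in main.c).

```c
/* In FreeRTOSConfig.h: ask the kernel to call the hook when pvPortMalloc() fails. */
#define configUSE_MALLOC_FAILED_HOOK    1

/* Hypothetical counter (not a FreeRTOS symbol): watch it in the debugger. */
volatile unsigned long ulMallocFailedCount = 0;

/* Called by pvPortMalloc() when an allocation from the FreeRTOS heap fails.
 * Set a breakpoint here to catch the failing allocation. */
void vApplicationMallocFailedHook( void )
{
    ulMallocFailedCount++;
}
```

If a breakpoint in the hook never fires and the counter stays at 0, heap exhaustion through pvPortMalloc is probably not the cause.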

You can also share the console logs after enabling MQTT debug logs with the mqttconfigENABLE_DEBUG_LOGS setting in the config file aws_mqtt_config.h.

Please let us know if that helps to resolve it or if you have more questions.

Hello,

Thank you for the suggestions. I’ve tried them out and have some more information.

  • I’m using Heap5. I bumped configTOTAL_HEAP_SIZE from 100 kB to 200 kB with no change in behaviour. With both heap sizes, vApplicationMallocFailedHook is not getting called.
  • I’ve set mqttconfigENABLE_DEBUG_LOGS to 1 but no additional messages have been logged. This is the current output:
0 509 [Tmr Svc] WiFi module initialized.
1 616 [Tmr Svc] Write certificate...
2 6476 [Tmr Svc] WiFi connected to AP LAN.
3 6480 [Tmr Svc] IP Address acquired 192.168.0.15
4 6486 [iot_thread] [INFO ][DEMO][6486] ---------STARTING DEMO---------
5 6494 [iot_thread] [INFO ][INIT][6494] SDK successfully initialized.
6 10151 [iot_thread] Write certificate...
7 10708 [iot_thread] [INFO ][DEMO][10707] Successfully initialized the demo. Network type for the demo: 1
8 10717 [iot_thread] [INFO ][MQTT][10717] MQTT library successfully initialized.
9 10724 [iot_thread] OTA demo version 0.9.2
10 10728 [iot_thread] Creating MQTT Client...
  • I’ve also investigated the hanging behaviour further. I have found that the code gets stuck inside bignum.c in the function mbedtls_mpi_mod_mpi. The following is code that loops forever:

while( mbedtls_mpi_cmp_mpi( R, B ) >= 0 )
    MBEDTLS_MPI_CHK( mbedtls_mpi_sub_mpi( R, R, B ) );

Could there be a configuration value that needs updating so mbedtls doesn’t get stuck here? Any other thoughts?

Although it sounds like this might not be the issue in this case, vApplicationMallocFailedHook() will only get called if you are using one of the FreeRTOS memory allocation schemes, and if configUSE_MALLOC_FAILED_HOOK is set to 1 in FreeRTOSConfig.h.

Just double checked the code:

  • I am using heap5
  • configUSE_MALLOC_FAILED_HOOK is set to 1
  • If I set heap to 25 kB the code hits vApplicationMallocFailedHook
  • If I use a heap of size 200 kB, vApplicationMallocFailedHook is not hit. Instead, the previously described halting behaviour is observed

Thank you for sending in these suggestions, I appreciate the second opinion. Do you have any more thoughts?

Hello, at this point, it seems just as likely that you are overflowing your stack (or exhausting both stack and heap). I suggest modifying the library code temporarily to make calls to xPortGetFreeHeapSize and uxTaskGetStackHighWaterMark. Strategic locations for those calls could be in SOCKETS_Connect and C_Sign, for example.
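
A rough sketch of such a checkpoint is below. format_mem_checkpoint is a hypothetical helper, not part of any library, and the 4-byte word size is an assumption that holds for Cortex-M ports such as the STM32L4. The helper takes the measured values as parameters so it can be tried off-target; the comment shows how the two FreeRTOS calls would feed it.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: formats one memory checkpoint line into buf.
 *
 * On target it could be fed like this (uxTaskGetStackHighWaterMark needs
 * INCLUDE_uxTaskGetStackHighWaterMark set to 1; passing NULL queries the
 * calling task):
 *
 *   format_mem_checkpoint( line, sizeof( line ), "SOCKETS_Connect",
 *                          xPortGetFreeHeapSize(),
 *                          ( unsigned long ) uxTaskGetStackHighWaterMark( NULL ) );
 *
 * The high water mark is reported in stack words, so it is also shown in
 * bytes, assuming 4-byte words. */
int format_mem_checkpoint( char *buf, size_t len, const char *where,
                           size_t free_heap_bytes, unsigned long stack_hwm_words )
{
    return snprintf( buf, len,
                     "[%s] free heap: %lu bytes, min free stack: %lu words (%lu bytes)",
                     where,
                     ( unsigned long ) free_heap_bytes,
                     stack_hwm_words,
                     stack_hwm_words * 4ul );
}
```

Print the resulting line with whatever logging call your project already uses. A high water mark close to zero at any checkpoint means the task has nearly used up its stack at some point.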

Specifically for isolating potential stack overflow, in your debugger, capture the start address of the stack for the above thread. As you’re stepping through the code and your call stack gets deeper, periodically check your stack pointer against the starting address and your max size (keeping in mind that the latter gets defined in words, not bytes).
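
The arithmetic for that check can be sketched as follows. Both function names are hypothetical, and the sketch assumes 4-byte stack words and a stack that grows downward from the top of its buffer, as on Cortex-M parts.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-byte stack words, as on Cortex-M ports where StackType_t
 * is 32 bits wide. */
#define STACK_WORD_BYTES    4u

/* Convert a stack depth given in words (as passed to xTaskCreate) into bytes. */
size_t stack_depth_bytes( size_t depth_words )
{
    return depth_words * STACK_WORD_BYTES;
}

/* Bytes left between the current stack pointer and the lowest address of the
 * task's stack buffer, for a stack that grows downward. A negative result
 * means the stack pointer has gone past the end of the buffer. */
long long stack_bytes_remaining( uintptr_t stack_low, uintptr_t sp )
{
    if( sp >= stack_low )
    {
        return ( long long ) ( sp - stack_low );
    }

    return -( long long ) ( stack_low - sp );
}
```

For example, a stack configured as 1024 words spans 4096 bytes. If its buffer starts at 0x20001000 and the debugger shows a stack pointer of 0x20001400, 1024 bytes remain; a stack pointer of 0x20000FF0 would already be 16 bytes past the end.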

I followed your suggestions and confirmed the stack was overflowing for the demo task. By increasing the stack size I was able to get the OTA demo running to the point where it is constantly checking for updates.