OTA session fails after several successful file block transfers, only on a specific access point

My target is an STM32L475 with an Inventek WiFi module.
My code base is the 202002 release.
I have a working OTA module that works fine with the two access points at my development site.
I installed the system at a beta site, and the basic MQTT/Shadow functionality works fine there.
When I try to start an OTA job at this beta site, every time, after about 10-20 blocks of 1K, I get an error (“Error -1 while sending data.”) and the OTA process fails.
What could be the cause? The OTA process works great at my development site with the other two access points I tested it on. Could it be something to do with the block size? The time between block requests?

Hi Eyal,

Can you please share the complete logs? From the information you shared, it seems the connection is dropped before the OTA process completes. The error “Error -1 while sending data.” comes from the IotNetworkAfr_Send function.

In aws_ota_agent_config.h, you can try lowering the number of blocks requested from the service per request using otaconfigMAX_NUM_BLOCKS_REQUEST. You can also increase the timeout per request using the configuration parameter otaconfigFILE_REQUEST_WAIT_MS.
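As a rough illustration, the two settings might look like this in aws_ota_agent_config.h. The values are examples only, not recommendations; the right numbers depend on the access point and link quality.

```c
/* aws_ota_agent_config.h (fragment; values below are illustrative assumptions) */

/* Request fewer blocks per file request so each burst from the service is smaller. */
#define otaconfigMAX_NUM_BLOCKS_REQUEST    1U

/* Allow more time, in milliseconds, for a file request before the agent times out. */
#define otaconfigFILE_REQUEST_WAIT_MS      10000U
```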

Also, in this release the MQTT connection is established in the OTA demo and the connection handle is passed to the OTA library. Neither the MQTT library nor the OTA library reconnects automatically after a disconnection; this must be handled in the application using the MQTT disconnect callback.
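One common way to structure this is to have the disconnect callback only record the event and let the application task do the actual cleanup and reconnect. The sketch below is a standalone model of that pattern, not the library API: every name in it (app_link_state_t, on_mqtt_disconnect, service_link, stub_reconnect_ok) is hypothetical, and the real callback signature should be taken from the MQTT library headers.

```c
#include <stdbool.h>

/* Hypothetical application state; not part of the FreeRTOS libraries. */
typedef struct
{
    volatile bool disconnected;      /* Set by the callback, cleared by the app task. */
    unsigned int reconnect_attempts; /* Number of reconnect attempts made so far. */
} app_link_state_t;

/* Hypothetical hook that performs the full reconnect sequence
 * (network, MQTT, then shadow and OTA). Returns true on success. */
typedef bool (*reconnect_fn_t)(void);

/* Shaped like a disconnect callback: do the minimum and only record the event. */
static void on_mqtt_disconnect(void *context)
{
    app_link_state_t *state = (app_link_state_t *)context;

    state->disconnected = true;
}

/* Stand-in for the real reconnect sequence so the sketch is self-contained. */
static bool stub_reconnect_ok(void)
{
    return true;
}

/* Called periodically from the application task.
 * Returns true when the link is usable after this call. */
static bool service_link(app_link_state_t *state, reconnect_fn_t reconnect)
{
    if (!state->disconnected)
    {
        return true; /* Nothing to do. */
    }

    state->reconnect_attempts++;

    if (reconnect())
    {
        state->disconnected = false;
        return true;
    }

    return false; /* Still down; the caller can retry later, ideally with a delay. */
}
```

Keeping the callback this small avoids doing blocking network work in the context that reports the disconnect.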

Hi Pvyawaha,
Thank you for your reply.
I have attached a stripped log of the OTA session (I had to remove some of my application-specific log entries).
Regarding the MQTT connection you mentioned, my app uses the same MQTT connection for both the shadow updates and the OTA.
If this MQTT connection fails, the system closes the connection after cleaning up the shadow and OTA tasks, then tries to reconnect to the network (access point), re-establish the MQTT connection, and reconnect to the shadow and OTA.

StripOTALog.zip (4.5 KB)

Thank you for sharing the logs. They confirm that the network is disconnected while OTA is in progress.

Can you please check the reason for the network disconnection and whether it can be resolved? Apart from that, mqttconfigKEEP_ALIVE_TIMEOUT_TICKS can also be increased to allow more time for the MQTT ping response while OTA is in progress. You can also tweak the OTA configuration parameters I mentioned above.

When it disconnects, the OTA starts downloading the file from the beginning after reconnecting. We are working on adding support for resuming the OTA process after a reconnect in the OTA demo, and I will update here when it is released.

Thank you again Pvyawaha,
First, after I changed otaconfigMAX_NUM_BLOCKS_REQUEST to 1 (instead of the 4 it was before), I was able to get a full OTA update on this problematic access point. So this is great, thanks!
Now that I have it working, I want to understand better what happened, so I’ll try to dig deeper into this. My otaconfigFILE_REQUEST_WAIT_MS had already been changed to 5 sec before this problem arose. I’ll try changing mqttconfigKEEP_ALIVE_TIMEOUT_TICKS too, and I’ll try raising the number of blocks a bit higher than 1 to see how that works.
As for what you mentioned at the end of your response, this feature will be great (resuming the OTA file download from the last point instead of restarting from scratch). It can be very frustrating to have 80% of the file downloaded and flashed, then have it fail because of some communication issue and the whole process start again.


Thanks for sharing that it worked for you. One way to troubleshoot networking issues with the access point is to use tools like Wireshark to log and analyze the network activity.

Regarding the OTA library configuration, here is a diagram I created that shows the difference between block sizes (1 KB vs 4 KB) and the effect of the otaconfigFILE_REQUEST_WAIT_MS timeout. otaconfigMAX_NUM_BLOCKS_REQUEST has an upper limit: 128 blocks for a 1 KB block size and 32 blocks for a 4 KB block size. This is because the service sends at most 128 KB of data per request.
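The two limits follow from dividing the 128 KB per-request cap by the block size. Here is a minimal sketch of that arithmetic, assuming the cap is exactly 128 × 1024 bytes; the macro and function names are hypothetical and not part of the OTA library.

```c
#include <stdint.h>

/* Assumption from the post: the service sends at most 128 KB per request. */
#define MAX_BYTES_PER_REQUEST    ( 128U * 1024U )

/* Largest number of blocks that fits in one request for a given block size.
 * Returns 0 for a zero block size to avoid dividing by zero. */
static uint32_t max_blocks_per_request( uint32_t block_size_bytes )
{
    if( block_size_bytes == 0U )
    {
        return 0U;
    }

    return MAX_BYTES_PER_REQUEST / block_size_bytes;
}
```

With a 1 KB block (1024 bytes) this gives 128, and with a 4 KB block (4096 bytes) it gives 32, matching the figures above.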
