I noticed that when the network becomes available again but the modem's registration status stays stuck at anything other than home, a manual cycle of Cellular_RFoff(), a pause, and Cellular_RFon() quickly re-registers the modem to the network. Without this, it can take a very long time (>30 min) before the modem re-registers on its own.
So I implemented a mechanism that starts a timer when the callback registered with RegisterUrcNetworkRegistrationEventCallback fires and the reported registration status is not home. When the timer expires after a short while, the registration status is checked again, and if it is still not home, a cycle of Cellular_RFoff(), a 1 s pause, and Cellular_RFon() is performed.
If there is reception again, the Cellular_RFon() call then quickly results in a state change to REGISTRATION_STATUS_REGISTERED_HOME.
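A minimal sketch of the timer logic described above, written so it runs on a desktop without the modem. Everything here is an assumption for illustration: the RegStatus_t enum, the ModemOps_t hooks, the Recovery_* functions and the fake modem are hypothetical names, not part of any library. In the real firmware the hooks would wrap the RF off/on calls, the RTOS delay and the registration status query, and Recovery_OnTimerExpired() would run from the timer callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the modem library's registration status enum. */
typedef enum
{
    REG_NOT_REGISTERED,
    REG_SEARCHING,
    REG_REGISTERED_HOME
} RegStatus_t;

/* Hooks into the modem and RTOS; all four are assumptions for this sketch. */
typedef struct
{
    void ( * rfOff )( void );
    void ( * rfOn )( void );
    void ( * delayMs )( uint32_t ms );
    RegStatus_t ( * getStatus )( void );
} ModemOps_t;

typedef struct
{
    const ModemOps_t * ops;
    bool timerArmed;   /* true while the recovery timer is considered running */
    uint32_t rfCycles; /* number of forced RF off/on cycles so far */
} Recovery_t;

/* Call from the network registration URC callback. */
void Recovery_OnRegistrationEvent( Recovery_t * r, RegStatus_t status )
{
    assert( r != NULL );
    /* Arm the timer for any non-home status, cancel it once home. */
    r->timerArmed = ( status != REG_REGISTERED_HOME );
}

/* Call when the recovery timer expires. Returns true if RF was cycled. */
bool Recovery_OnTimerExpired( Recovery_t * r )
{
    assert( ( r != NULL ) && ( r->ops != NULL ) );

    if( !r->timerArmed )
    {
        return false; /* timer was cancelled because we got back to home */
    }

    r->timerArmed = false;

    if( r->ops->getStatus() == REG_REGISTERED_HOME )
    {
        return false; /* recovered on its own, nothing to do */
    }

    /* Still not home: RF off, pause 1 s, RF on. */
    r->ops->rfOff();
    r->ops->delayMs( 1000U );
    r->ops->rfOn();
    r->rfCycles++;
    return true;
}

/* Fake modem for desk testing: it registers as soon as RF is switched on
 * while coverage is present. */
bool fakeCoverage = false;
RegStatus_t fakeStatus = REG_SEARCHING;

static void fakeRfOff( void ) { fakeStatus = REG_NOT_REGISTERED; }
static void fakeRfOn( void ) { fakeStatus = fakeCoverage ? REG_REGISTERED_HOME : REG_SEARCHING; }
static void fakeDelayMs( uint32_t ms ) { ( void ) ms; }
static RegStatus_t fakeGetStatus( void ) { return fakeStatus; }

const ModemOps_t fakeModem = { fakeRfOff, fakeRfOn, fakeDelayMs, fakeGetStatus };
```

Note that this sketch does not re-arm the timer after an RF cycle. It relies on the next registration event to do that, which may or may not match how the real modem behaves when there is still no coverage.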
I also implemented a flag representing the registration status. Whenever the MQTTAgent returns an error, the flag is raised and the timer that forces the re-registration is started. The flag is only lowered when the network state returns to REGISTRATION_STATUS_REGISTERED_HOME.
This flag is in turn used to hold off any socketconnect() calls: when socketconnect() is called, the function first waits for the network state to return to REGISTRATION_STATUS_REGISTERED_HOME before attempting to connect the socket.
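The flag and the gated connect could look roughly like the sketch below. All names (networkDown, the NetGate_* functions, the fakes) are hypothetical, and the wait is a simple polling loop with a timeout so that it runs without an RTOS. On FreeRTOS an event group or task notification would probably be a better fit than polling a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Raised on an MQTT agent error, lowered only on "registered home". */
static volatile bool networkDown = false;

void NetGate_OnMqttAgentError( void )
{
    networkDown = true;
    /* The real code would also start the re-registration timer here. */
}

void NetGate_OnRegistrationEvent( bool registeredHome )
{
    if( registeredHome )
    {
        networkDown = false;
    }
}

/* Blocks until the flag is lowered or timeoutMs has elapsed.
 * Returns true if the network is usable. */
bool NetGate_WaitForNetwork( uint32_t timeoutMs,
                             uint32_t pollMs,
                             void ( * delayMs )( uint32_t ) )
{
    uint32_t waitedMs = 0U;

    assert( ( delayMs != NULL ) && ( pollMs > 0U ) );

    while( networkDown )
    {
        if( waitedMs >= timeoutMs )
        {
            return false;
        }

        delayMs( pollMs );
        waitedMs += pollMs;
    }

    return true;
}

/* Hypothetical connect wrapper: returns -1 if the network never came back,
 * otherwise whatever the underlying connect function returns. */
int NetGate_Connect( int ( * doConnect )( void ),
                     uint32_t timeoutMs,
                     void ( * delayMs )( uint32_t ) )
{
    assert( doConnect != NULL );

    if( !NetGate_WaitForNetwork( timeoutMs, 100U, delayMs ) )
    {
        return -1;
    }

    return doConnect();
}

/* Fakes for desk testing: the third delay call simulates the modem
 * reporting "registered home". */
uint32_t fakeDelayCalls = 0U;
int fakeConnectCalls = 0;

void fakeDelayThenRegister( uint32_t ms )
{
    ( void ) ms;
    fakeDelayCalls++;

    if( fakeDelayCalls == 3U )
    {
        NetGate_OnRegistrationEvent( true );
    }
}

int fakeConnect( void )
{
    fakeConnectCalls++;
    return 0;
}
```

The timeout is an addition of this sketch so that a connect attempt cannot block forever when the network never comes back.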
I have tested this mechanism by removing the antenna halfway through an OTA job to simulate bad reception, and it works. The OTADemo suspends the OTA Agent and disconnects the socket, waits for the network re-registration, creates a new TLS socket, and resumes the agent, leading to a successful OTA update.
There are still some other scenarios I need to test, for instance when the connection drops at different points during the TLS setup after the socket is connected, so I’m not yet 100% sure this is the final solution.
I do feel like what I came up with is a bodge for a fundamental problem that should already have been addressed in the libraries I’m using.
Any opinions on this?