OTA Update Fails When Updating Multiple Things

leandropg wrote on June 27, 2019:

Currently, I am using Amazon FreeRTOS on a Curiosity PIC32MZEF Development Board. I tested the OTA functionality and it worked perfectly with one “thing”. My problem appears when I need to update two or more things. I followed the process below to update two things and it failed on one of them:

  1. I created two things called Thing1 and Thing2 in the AWS IoT Console.

  2. I programmed the two PIC32 boards with clientcredentialIOT_THING_NAME set to Thing1 and Thing2 respectively, both at Version #4.

  3. I created a new OTA file with Version #5. This version was built with clientcredentialIOT_THING_NAME set to Thing2. I don’t know which “thing” to configure in clientcredentialIOT_THING_NAME (Thing1 or Thing2) because I need to update both, so I set Thing2.

  4. I created a job with the new OTA file to update Thing1 and Thing2 to Version #5.

  5. The job did start, and the two “things” began downloading the new Version #5.

  6. When the update finished on Thing2, the OTA completed successfully, but Thing1 crashed and never restarted in normal mode. I reviewed the log of Thing1 and saw that the OTA process failed:


6 5717 [OTA Task] [OTA_CheckForSelfTest] Starting OTA_SelfTest timer.
7 5717 [OTA Task] [OTA_CheckForUpdate] Request #0
8 5814 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ clientToken: 0:Thing2 ]
9 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: execution
10 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: jobId
11 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: jobDocument
12 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: afr_ota
13 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: streamname
14 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: files
15 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: filepath
16 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: filesize
17 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: fileid
18 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: certfile
19 5814 [OTA Task] [prvParseJSONbyModel] parameter not present: sig-sha256-ecdsa
20 5814 [OTA Task] [prvParseJobDoc] Ignoring job without ID.
21 5814 [OTA Task] [prvOTA_Close] Context->0x8004da2c
22 5814 [OTA Task] [prvPAL_Abort] Abort - OK
23 5815 [AWS-OTA] [OTA_AgentInit] Ready.
24 21716 [Tmr Svc] [prvSelfTestTimer_Callback] Self test failed to complete within 16000ms
25 21716 [Tmr Svc] [prvPAL_ResetDevice] Resetting the device.

It never restarted in either Version #4 or Version #5. The node no longer works!

In summary, the OTA job updated Thing2 successfully and broke Thing1. I have the following questions about this behavior:

  1. What is the correct value for clientcredentialIOT_THING_NAME in the code for multiple devices? If I set this value to, for example, Thing1, how does the AWS OTA job know which things must be updated? Should all the things have the same value in clientcredentialIOT_THING_NAME? How is one thing distinguished from another? How does an OTA job know which “things” it must update and which it must not?

  2. How can I prevent a “thing” from crashing after an OTA fails? I cannot allow a node to stop working at a customer site.

  3. Where in the AWS IoT Console can I see which version a “thing” is running? Can I see whether a thing is online?

Thank you.

Edited by: leandropg on Jun 27, 2019 2:05 PM

Alexa-AWS wrote on June 29, 2019:

Hi,

  1. Currently it takes some modification to perform OTAs on multiple things. You have a couple of options:

a. Define clientcredentialIOT_THING_NAME to be a function - for example getThingName(). Make sure that this function reads the thing name from a part of non-volatile memory that is not overwritten by the OTA, and that the thing name is provisioned initially before running the first job. If you are using the JITP flow, you might choose to read the thing name out of your device certificate’s common name.

b. Do a separate OTA build per thing name, subbing in different thing names for each build.

The OTA job is sent to the target(s) specified when creating the job.

  2. When the device performs an OTA, it uses its thing name to determine the OTA topic that it should subscribe to in order to indicate that the OTA was successful and to listen for further updates. If the OTA causes the device’s thing name to change, this would cause an issue with the reporting of the job status and the topic subscription. I take your feedback that it should be easier and more intuitive to configure devices to obtain thing names using the method above in 1a so that this issue will not occur.

  3. In the AWS IoT console, you can go to Manage>Things and then click on your Thing. Next select ‘Jobs’ from the sidebar to see the last job run on that thing and whether the jobs succeeded. If you expand the job and then click on ‘View job details’, then ‘Details’ in the sidebar, you can also expand “View stored job file” and see which file was sent to the device in that OTA to determine which version number was sent.

If your device is communicating with AWS IoT (for example, sending an MQTT message to a topic or updating its shadow), you should be able to see this communication and these updates in the console.
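As a rough illustration of option 1a and of why the thing name matters for the job topics, here is a minimal sketch. It assumes the thing name lives in a region of non-volatile memory that the OTA image does not overwrite; `nvmThingName`, `THING_NAME_MAX_LEN`, and `buildNotifyNextTopic` are hypothetical names for illustration, not part of Amazon FreeRTOS. The topic layout follows the `$aws/things/<thing name>/jobs/notify-next` pattern visible in the device logs in this thread.

```c
#include <stdio.h>

/* Hypothetical: maximum thing-name length reserved in non-volatile memory. */
#define THING_NAME_MAX_LEN 64

/* Hypothetical stand-in for a flash/EEPROM region that the OTA image never
 * overwrites. On real hardware this would be a fixed address or a driver
 * call, written once when the device is provisioned. */
static char nvmThingName[THING_NAME_MAX_LEN + 1] = "Thing1";

/* Return the thing name provisioned for this particular device. */
const char *getThingName(void)
{
    return nvmThingName;
}

/* Build the job-notification topic for this device.
 * Returns 0 on success, -1 if the buffer is too small. */
int buildNotifyNextTopic(char *buffer, size_t bufferLen)
{
    int written = snprintf(buffer, bufferLen,
                           "$aws/things/%s/jobs/notify-next",
                           getThingName());

    return (written < 0 || (size_t)written >= bufferLen) ? -1 : 0;
}
```

With something like this in place, `clientcredentialIOT_THING_NAME` could be defined as `getThingName()`, so one OTA image serves every device. Any code that relies on the macro being a string literal (for example, through `sizeof`) would need adjusting.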


To answer the question in your second post, error code 0x23000000 indicates that the firmware update was rejected due to the firmware version number being the same. OTA error masks can be found in aws_ota_agent.h and start with kOTA_Err_*. I will file a ticket to investigate the behavior of the OTA agent after receiving an image with the wrong thing name and with the same version number as was previously running.
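To make the error code easier to read in device logs, a small helper along these lines may be useful. This is only a sketch: the macros below are local, hypothetical names, and the split into a top-byte agent error and a 24-bit platform sub-code is my understanding of the layout in aws_ota_agent.h, so please verify it against your copy of the header.

```c
#include <stdint.h>

/* Local, hypothetical names. The values are assumed to mirror the layout
 * used in aws_ota_agent.h; check them against your copy of the header. */
#define OTA_MAIN_ERR_MASK        0xFF000000u /* top byte: OTA agent error     */
#define OTA_PAL_ERR_MASK         0x00FFFFFFu /* low 24 bits: platform sub-code */
#define OTA_ERR_SAME_FW_VERSION  0x23000000u /* code reported in this thread   */

/* Agent-level part of an OTA error code. */
uint32_t otaMainError(uint32_t err)
{
    return err & OTA_MAIN_ERR_MASK;
}

/* Platform-specific sub-code of an OTA error code. */
uint32_t otaPalSubError(uint32_t err)
{
    return err & OTA_PAL_ERR_MASK;
}

/* Non-zero if the update was rejected because the version did not change. */
int otaIsSameVersionReject(uint32_t err)
{
    return otaMainError(err) == OTA_ERR_SAME_FW_VERSION;
}
```

For the code in your log, `otaIsSameVersionReject(0x23000000u)` is non-zero and the platform sub-code is zero.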

Let me know if this clears up the behavior you are seeing, and I will be passing this feedback on to our team for prioritization.

Thanks,
Alexa

leandropg wrote on June 28, 2019:

The other problem I have occurs when I launch an OTA update over Wi-Fi. The firmware is downloaded successfully:


1411 73530 [OTA Task] [prvIngestDataBlock] File receive complete and signature is valid.
1412 73530 [OTA Task] [prvUpdateJobStatus] Msg: {"status":"IN_PROGRESS","statusDetails":{"self_test":"ready","updatedBy":"0x100
1413 73634 [OTA Task] [prvUpdateJobStatus] 'IN_PROGRESS' to $aws/things/123/jobs/AFR_OTA-20190628-02/update
1414 73634 [OTA Task] [prvOTA_Close] Context->0x8004da7c

2019-01-01 00:01:15 [ INFO] [AWS-OTA] State: Active Received: 679 Queued: 679 Processed: 678 Dropped: 0
2019-01-01 00:01:15 [ INFO] [AWS-OTA] Received eOTA_JobEvent_Activate callback from OTA Agent
1415 73745 [OTA Task] [prvUnSubscribeFromDataStream] OK: $aws/things/123/streams/AFR_OTA-4001a697-5235-42d2-bce
1416 73745 [OTA Task] [prvPAL_Abort] Abort - OK
1417 73753 [OTA Task] [prvPAL_ActivateNewImage] Activating the new MCU image.
1418 73753 [OTA Task] [prvPAL_ResetDevice] Resetting the device.

After the restart the microcontroller is running the new version, but when it connects with the OTA Task, the process reports FAILED with reject code 0x23000000 and restarts again:


2019-01-01 00:00:07 [ INFO] [AWS-OTA] Connected to broker
4 7171 [OTA Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/123/jobs/$next/get/accepted
5 7276 [OTA Task] [prvSubscribeToJobNotificationTopics] OK: $aws/things/123/jobs/notify-next
6 7277 [OTA Task] [OTA_CheckForSelfTest] Starting OTA_SelfTest timer.
7 7277 [OTA Task] [OTA_CheckForUpdate] Request #0
8 7415 [AWS-OTA] [OTA_AgentInit] Ready.
9 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ clientToken: 0:123]
10 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ jobId: AFR_OTA-20190628-02 ]
11 7416 [OTA Task] [prvParseJSONbyModel] Identified parameter [ self_test ]
12 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ updatedBy: 16777221 ]
13 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ streamname: AFR_OTA-4001a697-5235-42d2-bce4-0b43d99c0968 ]
14 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ filepath: envoy.bin ]
15 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ filesize: 693740 ]
16 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ fileid: 0 ]
17 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ certfile: aws_ota_codesigner_certificate.pem ]
18 7416 [OTA Task] [prvParseJSONbyModel] Extracted parameter [ sig-sha256-ecdsa: MEYCIQD553Dcm9XzictQLWklWZjfXoh2... ]
19 7416 [OTA Task] [prvParseJobDoc] In self test mode.
20 7416 [OTA Task] [prvParseJobDoc] Failing job. We rebooted and the version is still the same.
21 7416 [OTA Task] [prvPAL_SetPlatformImageState] Rejected image.
22 7416 [OTA Task] [prvUpdateJobStatus] Msg: {"status":"FAILED","statusDetails":{"reason":"rejected: 0x23000000"}}
23 7519 [OTA Task] [prvUpdateJobStatus] 'FAILED' to $aws/things/123/jobs/AFR_OTA-20190628-02/update
24 7519 [OTA Task] [prvResetDevice] Attempting forced reset of device...
25 7519 [OTA Task] [prvPAL_ResetDevice] Resetting the device.

After this, the node never comes back. The application never restarts and the node remains blocked! Only when I turn the power off and on again does the node start, in the previous version.

Where can I look up this error: {"status":"FAILED","statusDetails":{"reason":"rejected: 0x23000000"}}? What does 0x23000000 mean?

Thank you

leandropg wrote on July 22, 2019:

Thanks for your response. I applied many changes and it now works for multiple things. But now I have other problems:

  1. I created an OTA update for 21 things. Some things update successfully, but others fail, and I cannot see the reason why the update failed on a specific thing. Where can I get more information about the failure of a specific thing in the OTA job?

  2. How can I retry a failed OTA for a specific thing in an OTA job? Is it necessary to create another job for this? It is very difficult to trace these problems and to keep creating jobs every time one fails.

Thanks

DanG-AWS wrote on October 15, 2019:

1/ AWS IoT provides integration with CloudWatch to collect logs from devices. You can follow this documentation to set it up: https://docs.aws.amazon.com/iot/latest/developerguide/monitoring_overview.html. For the FreeRTOS OTA use case in particular, see https://docs.aws.amazon.com/iot/latest/developerguide/cloud-watch-logs.html#job-logs and the Describe Job Execution Logs part. Whenever FreeRTOS calls UpdateJobExecution of AWS IoT, the status and details are updated, but only the final status and details can be stored in CloudWatch.

2/ There are two cases here: (a) the device has already reported a terminal status to the cloud, or (b) the device has not yet reported a terminal status to the cloud. In (a), you’ll have to create a new job containing the failed devices and execute the OTA again. In (b), the device can still get the same job document and retry, so no new job is required.
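The distinction between (a) and (b) could be sketched as follows. This is an illustration only: `isTerminalJobStatus` and `needsNewJob` are hypothetical helper names, and the list of terminal statuses reflects my understanding of the AWS IoT Jobs execution states, so check it against the current Jobs documentation.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the job execution status is terminal (case a),
 * 0 if the execution can still make progress (case b). */
int isTerminalJobStatus(const char *status)
{
    /* Assumed list of terminal AWS IoT job execution statuses. */
    static const char *const terminal[] = {
        "SUCCEEDED", "FAILED", "TIMED_OUT",
        "REJECTED", "REMOVED", "CANCELED"
    };

    for (size_t i = 0; i < sizeof(terminal) / sizeof(terminal[0]); i++) {
        if (strcmp(status, terminal[i]) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Return 1 if this thing has to be targeted by a new job: its execution
 * ended in a terminal status other than SUCCEEDED. */
int needsNewJob(const char *status)
{
    return isTerminalJobStatus(status) && strcmp(status, "SUCCEEDED") != 0;
}
```

A thing whose execution is still `QUEUED` or `IN_PROGRESS` falls under (b) and can retry with the same job document, while one that reported `FAILED` falls under (a).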