[Microchip PIC32] AWS IoT: OTA Bootloader -> System Crash

Hello everyone!

We are having a problem implementing the bootloader that Amazon provides in the Microchip demos, and we are looking for any hints on how to debug the issue.

The problem description is:

  1. We are using a Microchip PIC32MZ2048EFM100-based solution on custom hardware that connects to AWS IoT Core and sends data via MQTT. We are now trying to add OTA update functionality, which relies on a bootloader, and we are using the bootloader Amazon provides in its demos. However, when the bootloader is included, the main application won’t start. We’re using Amazon FreeRTOS V202002.00.

  2. The code base is integrated into a Harmony 3 project. Internally it uses the same code that Amazon provides in its demos, packaged as a Harmony 3 module. (Harmony 3 is Microchip’s framework for simplifying embedded development.)

  3. I’m going to share two log files with you.

  • “Boot.log” shows the system’s behavior with the bootloader. Nothing happens after the last line of the log.
  • “OTA.log” shows the system’s behavior WITHOUT the bootloader. As expected, the OTA update fails, but the log shows the application running.
  4. The system crashes at: Unknown function ( ) at c:/projects/aws_gateway_azimut_ota/firmware/src/config/eth1/exceptions.c : 116 Runtime exception @ PC address 0x9d04d0e8. Function and line number unavailable. at : 0
  • I know the crash report is not very informative, so I’ve created a simplified version of the project that reproduces the issue.
  5. I’m working with MPLAB X v5.40 and the XC32 compiler v3.01, with the Harmony 3 libraries.

You can find the example project and the log files here: drive.google.com/drive/folders/1N3wMqZKEmlpADwqPqDkORBwyDXHghimi?usp=sharing

Thank you all for your time.

Does this page, https://devices.amazonaws.com/detail/a3G0L00000AANscUAH/Curiosity-PIC32MZ-EF-FreeRTOS-Bundle, which links to the getting started guide at https://docs.aws.amazon.com/freertos/latest/userguide/getting_started_mch.html, help you?

Hi Richard, thank you for the quick reply.

The demos and guides were useful for getting the application and the bootloader working on their own. The problem appears when you combine the two in a custom application. The guides don’t cover that case because they are based on an already fully functional project.

My current guess is that the problem is related to the linker script, but I haven’t been able to pin it down.

Hello @jgarzon,

The application’s layout in flash is described in “Demo bootloader for the Microchip Curiosity PIC32MZEF” (FreeRTOS documentation). The first 32 bytes of the image are the header plus the image descriptor.

To start troubleshooting, can you please check that the application’s reset address in its linker script is offset by 0x20 (32 bytes) from the start of the image?
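To see where the 0x20 comes from, the 32-byte prefix can be sketched as C structures. This is a minimal sketch assuming the layout described in the FreeRTOS demo-bootloader documentation (a 7-byte magic code and 1-byte image flags, followed by six 32-bit descriptor fields); the type, field, and helper names are hypothetical, and 0xBD000000 is assumed to be the base of the first image slot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type names; layout assumed from the FreeRTOS demo bootloader docs. */
typedef struct
{
    uint8_t ucMagic[ 7 ];   /* magic code, assumed to be "@AFRTOS" */
    uint8_t ucImageFlags;   /* image state flags */
} ImageHeader_t;

typedef struct
{
    uint32_t ulSequenceNumber;
    uint32_t ulStartAddress;
    uint32_t ulEndAddress;
    uint32_t ulExecutionAddress;
    uint32_t ulHardwareID;
    uint32_t ulReserved;
} ImageDescriptor_t;

typedef struct
{
    ImageHeader_t xHeader;          /* 8 bytes */
    ImageDescriptor_t xDescriptor;  /* 24 bytes, 4-byte aligned at offset 8, so no padding */
} ImagePrefix_t;

/* Compile-time check: the prefix is exactly the 0x20 bytes the application's linker script must skip. */
_Static_assert( sizeof( ImagePrefix_t ) == 0x20, "header + descriptor must be 32 bytes" );

/* Hypothetical helper: where the application's reset code must start within an image slot. */
uint32_t ulApplicationResetAddress( uint32_t ulSlotBase )
{
    return ulSlotBase + ( uint32_t ) sizeof( ImagePrefix_t );
}
```

For a slot based at 0xBD000000 this yields 0xBD000020, the same address the bootloader log reports falling back to.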

The bootloader log shows that neither image slot contains a valid image header and descriptor, so the bootloader falls back to executing the first image at 0xbd000020. This means either the application is not signed and something else is preventing it from booting, or the application was never flashed (or was erased when the bootloader was flashed, because of an incorrect erase configuration).
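The fallback described in that log can be modelled roughly as below. This is a simplified host-side sketch, not the Amazon bootloader’s actual code: the slot base addresses (0xBD000000 and 0xBD100000, one per 1 MB flash bank), the function name, and the 7-byte magic "@AFRTOS" are assumptions, and the real bootloader also weighs sequence numbers and image flags, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define IMAGE_MAGIC        "@AFRTOS"    /* assumed 7-byte magic code */
#define IMAGE_MAGIC_LEN    7u
#define IMAGE_PREFIX_SIZE  0x20u        /* header + descriptor */

/* Hypothetical slot bases, one per program-flash bank. */
#define SLOT0_BASE         0xBD000000u
#define SLOT1_BASE         0xBD100000u

/* Returns the address the bootloader would jump to. The first bytes of each slot are
 * passed in as buffers (instead of being read from flash) so the logic runs on a host. */
uint32_t ulSelectLaunchAddress( const uint8_t * pucSlot0, const uint8_t * pucSlot1 )
{
    int xSlot0Valid = memcmp( pucSlot0, IMAGE_MAGIC, IMAGE_MAGIC_LEN ) == 0;
    int xSlot1Valid = memcmp( pucSlot1, IMAGE_MAGIC, IMAGE_MAGIC_LEN ) == 0;

    if( xSlot0Valid )
    {
        return SLOT0_BASE + IMAGE_PREFIX_SIZE;
    }

    if( xSlot1Valid )
    {
        return SLOT1_BASE + IMAGE_PREFIX_SIZE;
    }

    /* No header in either slot: fall back to the first image, as the log reports.
     * Nothing here checks that runnable code actually lives at that address. */
    return SLOT0_BASE + IMAGE_PREFIX_SIZE;
}
```

Erased flash reads as 0xFF, so an unsigned or missing application takes the final fallback path, which is why a crash there cannot distinguish "not signed" from "not flashed".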

Can you please check that the application project settings reference the bootloader as a loadable component? Once that is set up, can you step through the launch by setting a breakpoint where the bootloader loads the application? You can also use the MPLAB IPE tool to verify that the bootloader and the application are flashed at the correct locations.

Please let me know once you try this and we can proceed with debugging.

Hi @pvyawaha,
A quick summary of our last review of this:

  • We are using the same linker script files as the AWS demos, so the addresses should be correct.
  • The bootloader is set as a loadable component in the project.
  • Stepping through the boot process with breakpoints was of little use, because the call stack doesn’t point to any specific function that causes the crash.
  • The MPLAB IPE tool shows that the application is flashed at the correct location.
  • There’s a problem with the code signing, and I’ll work on that. However, the application should still execute, so our problem lies elsewhere.

Hope to hear from you soon.

Thank you!

Hello @jgarzon,

Can you please make sure to do the following in BOOT_PAL_LaunchApplication before jumping to the application:

  1. Disable interrupts -
    PLIB_INT_Disable( INT_ID_0 );

  2. Clear instruction cache -
    SYS_DEVCON_InstructionCacheFlush();

  3. Clear data cache -
    SYS_DEVCON_DataCacheFlush();

and then launch the application. Let me know if this resolves the issue.
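The order of these steps matters: interrupts go off first so that no bootloader ISR can run during the handover, then the caches are flushed, and the jump comes last. The host-testable sketch below captures only that ordering. The hook table and all function names are hypothetical; on the target the hooks would wrap PLIB_INT_Disable, SYS_DEVCON_InstructionCacheFlush and SYS_DEVCON_DataCacheFlush, and here they just record their names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook table so the launch sequence can run on a host. */
typedef struct
{
    void ( *pfDisableInterrupts )( void );
    void ( *pfFlushInstructionCache )( void );
    void ( *pfFlushDataCache )( void );
} LaunchHooks_t;

/* Records the order in which hooks and the entry point were called (test aid only). */
static const char * pcTrace[ 8 ];
static size_t xTraceLen = 0;

static void prvRecord( const char * pcStep )
{
    if( xTraceLen < sizeof( pcTrace ) / sizeof( pcTrace[ 0 ] ) )
    {
        pcTrace[ xTraceLen++ ] = pcStep;
    }
}

void vLaunch( const LaunchHooks_t * pxHooks, void ( *pfEntry )( void ) )
{
    pxHooks->pfDisableInterrupts();      /* 1. no bootloader ISR may run after this */
    pxHooks->pfFlushInstructionCache();  /* 2. drop stale cached instructions */
    pxHooks->pfFlushDataCache();         /* 3. flush cached data */
    pfEntry();                           /* 4. jump; on the target this never returns */
}

/* Host stubs standing in for the hardware calls and the application entry point. */
static void prvDisable( void ) { prvRecord( "disable" ); }
static void prvICache( void )  { prvRecord( "icache" ); }
static void prvDCache( void )  { prvRecord( "dcache" ); }
static void prvEntry( void )   { prvRecord( "entry" ); }

const LaunchHooks_t xHostHooks = { prvDisable, prvICache, prvDCache };
```

Calling vLaunch( &xHostHooks, prvEntry ) leaves the trace as disable, icache, dcache, entry.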

Hi @pvyawaha

Making those changes does not resolve the issue, but at least the call stack for the crash is different:

Runtime exception @ PC address 0x9d05e880 in function: Unknown function ( ) at c:/projects/aws_gateway/firmware/src/config/eth1/amazon-freertos/libraries/freertos_plus/standard/freertos_plus_tcp/source/freertos_ip.c : 1170

The line 1170 of freertos_ip.c shows:

if( ( xIPIsNetworkTaskReady() == pdFALSE ) && ( pxEvent->eEventType != eNetworkDownEvent ) )
	{
		/* Only allow eNetworkDownEvent events if the IP task is not ready
		yet.  Not going to attempt to send the message so the send failed. */
		xReturn = pdFAIL;
	}

So the application is running, but it fails along the way.
Does this behavior tell you anything?
I’ll keep looking into it and let you know if I find anything.

Hello @jgarzon ,

This means the bootloader issue is resolved and the application is crashing after boot.

I tried to reproduce this with the demo from the V202002.00 release and could not. Can you please compare your application’s hardware initialization against the application in the V202002.00 release?

Hi @pvyawaha,

I compared the initialization process in V202002.00 with my project and I don’t see any major differences. Also, the demo runs fine without the bootloader, as shown in “OTA.log”.

I have been working on this from a different angle:

I built the simplest functional project that includes the bootloader. I removed the initialization of every component except the clock and created a simple task that blinks an LED. With this setup the system does not crash, but I think there is a problem handling interrupts when the bootloader is included: functions like vTaskDelay put the blinking task into a permanently blocked state, so I suspect the tick interrupt is not being handled properly. The only way I could get the LED to blink was with a rudimentary busy loop: for(int i= 0; i<100000000; i++);.
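Why a tick-based delay hangs while the busy loop still blinks can be illustrated with a small host-side model. This is not FreeRTOS code and all names are hypothetical; it only mimics the idea that a task blocked in vTaskDelay() is woken when the kernel tick count, advanced by the tick interrupt, reaches the task’s wake time. The busy loop, by contrast, never consults the tick count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static volatile uint32_t ulTickCount = 0;   /* advanced only by the tick ISR */

/* Stand-in for the kernel's tick interrupt handler. */
static void prvTickISR( void )
{
    ulTickCount++;
}

/* Simulates a task delaying for ulTicks. Each loop iteration represents one tick period
 * of wall-clock time; bTickReachesKernel says whether the tick interrupt is actually
 * delivered to the application's handler. Returns true if the task would be unblocked
 * within ulMaxPeriods periods. Tick-count overflow is ignored in this sketch. */
bool bDelayCompletes( uint32_t ulTicks, bool bTickReachesKernel, uint32_t ulMaxPeriods )
{
    const uint32_t ulWakeTime = ulTickCount + ulTicks;

    for( uint32_t ulPeriod = 0; ulPeriod < ulMaxPeriods; ulPeriod++ )
    {
        if( bTickReachesKernel )
        {
            prvTickISR();
        }

        if( ulTickCount >= ulWakeTime )
        {
            return true;
        }
    }

    return false;   /* still blocked: the "permanent blocked state" */
}
```

If the tick interrupt is being caught somewhere other than the application’s handler, the second case is exactly what the blinking task sees.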

So I am going back to the linker script. I think there may be a problem with the way the interrupt sections are placed.

Does this added information give you any ideas?

Thank you for your help.

Hi, does the bootloader remap the interrupts correctly? The bootloader’s handlers might still be servicing interrupts, so it could be catching the tick interrupt and returning from it without the application ever seeing it.

On the PIC32, the interrupt controller has two modes, single-vector and multi-vector. In both, the EBase register holds the base address of the exception (interrupt) vectors. The application will likely need to set EBase to the final location of its own interrupt vector table, and that location is most likely defined in the linker script.
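The effect of a stale EBase can be shown with simple address arithmetic. This sketch assumes the PIC32MZ multi-vector scheme in which a vector’s address is EBase plus that vector’s offset register (OFFx); the memory-region bounds, the EBase values, and the 0x200 offset are made-up examples for illustration, not values from either project.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory map (kseg0 addresses), for illustration only. */
#define BOOT_REGION_START  0x9FC00000u   /* boot flash */
#define BOOT_REGION_END    0x9FC14000u
#define APP_REGION_START   0x9D000000u   /* program flash */
#define APP_REGION_END     0x9D100000u

/* Assumed multi-vector rule: vector address = EBase + per-vector offset (OFFx). */
uint32_t ulVectorAddress( uint32_t ulEBase, uint32_t ulOffset )
{
    return ulEBase + ulOffset;
}

bool bInRegion( uint32_t ulAddr, uint32_t ulStart, uint32_t ulEnd )
{
    return ( ulAddr >= ulStart ) && ( ulAddr < ulEnd );
}
```

With the same offset, an EBase left at a bootloader value resolves the vector inside the bootloader’s region, so the application’s handler never runs; only after EBase is moved does the vector land in the application’s region.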

Hi @archigup. Thank you for your help.

Yes, the application redefines the _ebase_address, as well as the _RESET_ADDR.

Right now I am trying new linker script files proposed by Microchip for both the bootloader and the application. The main differences are in some kseg0 definitions, the alignment of the interrupt vector table, and something called the “TLB-Based MMU Initialization section for EBI/SQI memory regions”. I don’t fully understand these changes and their implications, but I will let you know my results.

Please let me know if you think of anything else.

Hello everyone!
I finally found the problem and solved it. One more change was needed before the bootloader launches the application.

In addition to @pvyawaha’s suggestions, I also had to clear the Priority Shadow Select register (PRISS) in BOOT_PAL_LaunchApplication so that the main application’s interrupts are handled properly. This register is set by the EVIC initialization that Harmony 3 generates for the bootloader code.
For anyone facing the same problem, here is the current version of my BOOT_PAL_LaunchApplication:

void BOOT_PAL_LaunchApplication( const void * const pvLaunchAddress )
{
    void ( * pfApplicationEntry )( void ) = ( void ( * )( void ) )pvLaunchAddress;
    
    /* Disable interrupts globally, then disable each source the bootloader enabled and clear its flag */
    SYS_INT_Disable();
    EVIC_SourceDisable(INT_SOURCE_CORE_TIMER);
    EVIC_SourceStatusClear(INT_SOURCE_CORE_TIMER);
    
    EVIC_SourceDisable(INT_SOURCE_ETHERNET);
    EVIC_SourceStatusClear(INT_SOURCE_ETHERNET);
    
    EVIC_SourceDisable(INT_SOURCE_FLASH_CONTROL);
    EVIC_SourceStatusClear(INT_SOURCE_FLASH_CONTROL);
    
    EVIC_SourceDisable(INT_SOURCE_UART4_FAULT);
    EVIC_SourceStatusClear(INT_SOURCE_UART4_FAULT);
    
    EVIC_SourceDisable(INT_SOURCE_UART4_RX);
    EVIC_SourceStatusClear(INT_SOURCE_UART4_RX);
    
    EVIC_SourceDisable(INT_SOURCE_UART4_TX);
    EVIC_SourceStatusClear(INT_SOURCE_UART4_TX);
 
    /* Clean (write back) the data cache */
    SYS_CACHE_CleanDCache();
    
    /* Reset PRISS so no interrupt priority level switches to a shadow register set */
    PRISS = 0;
    
    /* Launch...*/
    ( *pfApplicationEntry )();
}
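To see what clearing PRISS changes, the register can be decoded on a host. This sketch assumes the PIC32MZ EF field layout as I read it in the datasheet (PRI7SS in bits 31:28 down to PRI1SS in bits 7:4, with bit 0 used for single-vector mode); verify it against your device’s datasheet. The value 0x76543210 is only an example that gives each priority level its own shadow set; I have not confirmed it is what Harmony 3 generates. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shadow register set selected for interrupt priority level 1..7, assuming PRIxSS
 * occupies bits (4x+3):(4x) of PRISS. Set 0 is the normal register file. */
unsigned uxShadowSetForPriority( uint32_t ulPriss, unsigned uxPriority )
{
    assert( ( uxPriority >= 1u ) && ( uxPriority <= 7u ) );
    return ( unsigned ) ( ( ulPriss >> ( 4u * uxPriority ) ) & 0xFu );
}

/* Counts the priority levels that would switch to a non-zero shadow set on entry. */
unsigned uxLevelsUsingShadowSets( uint32_t ulPriss )
{
    unsigned uxCount = 0u;

    for( unsigned uxPri = 1u; uxPri <= 7u; uxPri++ )
    {
        if( uxShadowSetForPriority( ulPriss, uxPri ) != 0u )
        {
            uxCount++;
        }
    }

    return uxCount;
}
```

With PRISS = 0, every level stays on the normal register set, which is presumably what the application’s interrupt handlers, saving context in software, expect.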

I should also mention that, in the end, neither the linker script nor the compiler had anything to do with the issue: I was able to use the same linker script files as the AWS demos. The application works with both XC32 v2.40 and XC32 v3.01 under MPLAB X IDE v5.40.

The next step for me is to complete the code-signing process, generate the unified binary, and test the OTA job, but the system crash I opened this thread for is solved.

Thank you @archigup, @pvyawaha and @rtel for your time and ideas.