Issues with ARM9 port and Link Time Optimization

I have a really specific problem and was wondering whether some experts here who are more familiar
with the older compilers have an idea what might cause these issues.

I am using a port for the SAM9G20 core (which is already considered legacy).

There is a macro function to restore the port context which is written in inline assembly:

#define portRESTORE_CONTEXT()											\
{																		\
extern volatile void * volatile pxCurrentTCB;							\
extern volatile unsigned portLONG ulCriticalNesting;					\
																		\
	/* Set the LR to the task stack. */									\
	asm volatile (														\
	"LDR		R1, =pxCurrentTCB								\n\t"	\
	"LDR		R0, [R1]										\n\t"	\
	"LDR		LR, [R0]										\n\t"	\
																		\
	/* The critical nesting depth is the first item on the stack. */	\
	/* Load it into the ulCriticalNesting variable. */					\
	"LDR		R0, =ulCriticalNesting							\n\t"	\
	"LDMFD	LR!, {R1}											\n\t"	\
	"STR		R1, [R0]										\n\t"	\
																		\
	/* Get the SPSR from the stack. */									\
	"LDMFD	LR!, {R0}											\n\t"	\
	"MSR		SPSR, R0										\n\t"	\
																		\
	/* Restore all system mode registers for the task. */				\
	"LDMFD	LR, {R0-R14}^										\n\t"	\
	"NOP														\n\t"	\
																		\
	/* Restore the return address. */									\
	"LDR		LR, [LR, #+60]									\n\t"	\
																		\
	/* And return - correcting the offset in the LR to obtain the */	\
	/* correct address. */												\
	"SUBS	PC, LR, #4											\n\t"	\
	);																	\
	( void ) ulCriticalNesting;											\
	( void ) pxCurrentTCB;												\
}

When I flash the hardware using the J-Link, it works without issues. If I transfer the binary to the boot flash memory and restart the core (letting the bootloader do the job of loading the software), the program gets stuck in the macro when starting the task scheduler. This only happens for code optimized with Link Time Optimization (-flto flag).
The following change fixed the issue:

I removed

extern volatile void * volatile pxCurrentTCB;
extern volatile unsigned portLONG ulCriticalNesting;	     

and

( void ) ulCriticalNesting;
( void ) pxCurrentTCB;

and now the program does not hang up in the macro anymore but starts regularly (although I still need to run some advanced tests, everything seems to work up until now).

I am using the newest ARM GCC Toolchain available (9.3.1-1.1.1) so maybe it’s just the combination of the new toolchain and very old code but maybe someone also has an idea why
those 4 lines cause issues… Especially because functions like portSAVE_CONTEXT use similar code.

UPDATE: Alright, it worked for some time and now it is stuck again in the same function, so the problem is probably unrelated…

Kind Regards
Robin

The externs were originally added to enable the code to link, and the casting to void added because the compiler cannot see the variables getting used and generated warnings - I suspect the problem with LTO is similar in that if the compiler thinks they are not being used it may just remove them or at least do something weird with their linkage. If the code compiles and links with those edits you should be fine.
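As a rough illustration of the point above: GCC has a `used` attribute that asks the compiler to emit a variable even when it cannot see any reference to it, which is the situation when a symbol is only named inside an inline assembly string. The sketch below is host-buildable and uses stand-in definitions; in a real port the two variables are defined in the kernel and port sources, and the macro name `KEEP_FOR_ASM` and the helper function are made up for this example. Whether this helps with a particular LTO problem is an assumption that would need checking against the generated map file.

```c
#include <stddef.h>

/* Illustrative macro name (not part of FreeRTOS): marks a variable as
   "used" so GCC emits it even if no C code appears to reference it. */
#define KEEP_FOR_ASM __attribute__((used))

/* Stand-ins for the kernel's own definitions, shown here only so the
   example builds on its own. */
KEEP_FOR_ASM volatile void * volatile pxCurrentTCB = NULL;
KEEP_FOR_ASM volatile unsigned long ulCriticalNesting = 9999UL;

/* Hypothetical helper so the example can be exercised on a host build. */
unsigned long read_critical_nesting(void)
{
    return ulCriticalNesting;
}
```

Comparing the linker map files of the LTO and non-LTO builds for `pxCurrentTCB` and `ulCriticalNesting` is a cheap way to see whether the symbols were dropped or renamed.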

I actually had to perform three additional steps… it works for now but at this point I am not sure whether the problem is completely unrelated…

  1. Turn off optimization for function to start scheduler

    portBASE_TYPE xPortStartScheduler( void ) __attribute__((optimize("O0")));

  2. Flag function to start first task as noinline

    static void vPortISRStartFirstTask( void ) __attribute__((noinline));

  3. Add -fno-omit-frame-pointer to compilation flags for FreeRTOS source files

And none of this is necessary when flashing with the J-Link… Really weird. Also, sometimes the code did crash (or whatever is happening, sometimes the hard fault handler is triggered, sometimes not…) even when Link Time Optimization was disabled.

Kind Regards
Robin

Flashing with J-Link makes no difference to the executable, but it may change how the hardware boots after flashing.

I think I found the issue a few weeks later, and it was completely unrelated to FreeRTOS
(super evil bug). The processor architecture uses seven ARM vectors, and the sixth one is reserved and normally unused. Storing the binary size there manually is sometimes required; when it was not, I thought I could use that slot for my own purposes. That was apparently wrong… So a really stupid mistake on my part.

When not doing that, the changes above are not necessary…
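For readers unfamiliar with the layout: the classic ARM vector table has eight word-sized slots at the start of the image, and the one at offset 0x14 is the reserved slot mentioned above. The following is a minimal sketch, assuming a little-endian image held in a byte buffer; the function names are hypothetical and only illustrate how one could check what a build has placed in that slot before flashing.

```c
#include <stdint.h>
#include <string.h>

/* Byte offsets of the slots in the classic ARM exception vector table. */
enum arm_vector_offset {
    VEC_RESET          = 0x00,
    VEC_UNDEFINED      = 0x04,
    VEC_SWI            = 0x08,
    VEC_PREFETCH_ABORT = 0x0C,
    VEC_DATA_ABORT     = 0x10,
    VEC_RESERVED       = 0x14, /* the "sixth vector" discussed in the post */
    VEC_IRQ            = 0x18,
    VEC_FIQ            = 0x1C
};

/* Read the 32-bit word stored in one vector slot of a binary image.
   Assumes the host and the image share the same byte order. */
uint32_t read_vector_word(const uint8_t *image, enum arm_vector_offset off)
{
    uint32_t word;
    memcpy(&word, image + off, sizeof word);
    return word;
}

/* Hypothetical check: does the reserved slot hold the word we expect
   (for example the image size, or a known placeholder instruction)? */
int reserved_slot_matches(const uint8_t *image, uint32_t expected_word)
{
    return read_vector_word(image, VEC_RESERVED) == expected_word;
}
```

Running such a check on the final binary makes it obvious when something other than the expected value ends up at offset 0x14.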

Kind Regards
Robin

Thanks for taking the time to report back.

Unfortunately, the problem is occurring again on the AT91SAM9G20 development board (and I did not manipulate the binary in any way).

When flashing the application with a J-Link, the application works sometimes (?). It also tends to work when I set a breakpoint near the jump instruction. Currently, when the program is loaded by the bootloader instead of J-Link, it stops working at the portRESTORE_CONTEXT() routine most of the time. I traced down the problem to the same assembler routine, but this time the steps from above also do not help.

The software does not appear to be able to exit the following function:

 #define portRESTORE_CONTEXT()											\
{																		\
extern volatile void * volatile pxCurrentTCB;							\
extern volatile unsigned portLONG ulCriticalNesting;					\
																		\
	/* Set the LR to the task stack. */									\
	asm volatile (														\
	"LDR		R0, =pxCurrentTCB								\n\t"	\
	"LDR		R0, [R0]										\n\t"	\
	"LDR		LR, [R0]										\n\t"	\
																		\
	/* The critical nesting depth is the first item on the stack. */	\
	/* Load it into the ulCriticalNesting variable. */					\
	"LDR		R0, =ulCriticalNesting							\n\t"	\
	"LDMFD	    LR!, {R1}										\n\t"	\
	"STR		R1, [R0]										\n\t"	\
																		\
	/* Get the SPSR from the stack. */									\
	"LDMFD	    LR!, {R0}										\n\t"	\
	"MSR		SPSR, R0										\n\t"	\
																		\
	/* Restore all system mode registers for the task. */				\
	"LDMFD	    LR, {R0-R14}^									\n\t"	\
	"NOP														\n\t"	\
																		\
	/* Restore the return address. */									\
	"LDR		LR, [LR, #+60]									\n\t"	\
																		\
	/* And return - correcting the offset in the LR to obtain the */	\
	/* correct address. */												\
	"SUBS	    PC, LR, #4										\n\t"	\
	);																	\
	( void ) ulCriticalNesting;											\
	( void ) pxCurrentTCB;												\
}

I managed to verify that the pxCurrentTCB pointer is the same for the working and non-working variant with the following code

portBASE_TYPE xPortStartScheduler( void )
{
    extern volatile void * volatile pxCurrentTCB;
    /* Start the timer that generates the tick ISR.  Interrupts are disabled
    here already. */
    prvSetupTimerInterrupt();

    TRACE_INFO("Starting first FreeRTOS task..\n\r");
    TRACE_INFO("Current TCB Pointer: 0x%08x\n\r", (unsigned int) pxCurrentTCB);

    /* Start the first task. */
    vPortISRStartFirstTask();

    /* Should not get here! */
    return 0;
}

Does anyone have an idea how to best access the variables used in the assembler routine when not having access to a debugger? (as this problem only occurs when the program is loaded by the bootloader…). I also made sure that the processor is in supervisor mode when the function is called, and the CPSR register containing the processor mode is the same for the working and non-working version.

Kind Regards
Robin

Okay, I managed to access and print the stack of the task before the first task is started with the following piece of code:

static void debugFirstTaskStart(void) {
    extern volatile void * volatile pxCurrentTCB;
    TRACE_INFO("Starting first FreeRTOS task..\n\r");
    TRACE_INFO("Current TCB Pointer: 0x%08x\n\r", (unsigned int) pxCurrentTCB);
    uint32_t currentStackPtr = *((uint32_t*) pxCurrentTCB);
    TRACE_INFO("Current Stack Pointer: 0x%08x\n\r", (unsigned int) *((uint32_t*) pxCurrentTCB));
    TRACE_INFO("Current Critical Nesting: %u\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("Current SPSR: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("Current function parameter: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R1: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R2: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R3: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R4: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R5: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R6: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R7: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R8: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R9: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R10: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R11: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R12: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("Task stack: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("R14: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
    currentStackPtr += 4;
    TRACE_INFO("Return address: 0x%08x\n\r", (unsigned int) *((uint32_t*) currentStackPtr));
}

With this code I was able to verify that all variables are the same for the working and non-working code, yet the code does not appear to be able to jump into the first task…
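The same dump can be written more compactly with a label table and a loop. This is only a sketch: it assumes the 18-word frame layout that portRESTORE_CONTEXT() pops (critical nesting, SPSR, R0 to R14, return address), and it formats into a caller-supplied buffer with snprintf instead of calling TRACE_INFO so that it can be tried on a host. The names `FRAME_WORDS`, `frame_labels` and `format_initial_frame` are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Words in the initial task frame: nesting, SPSR, R0-R14, return address. */
#define FRAME_WORDS 18u

static const char *const frame_labels[FRAME_WORDS] = {
    "Critical nesting", "SPSR",
    "R0 (task parameter)", "R1", "R2", "R3", "R4", "R5", "R6", "R7",
    "R8", "R9", "R10", "R11", "R12", "R13 (task stack)", "R14",
    "Return address"
};

/* Format every word of the frame as "label: 0x........" into out.
   Returns the number of characters written (without the terminator),
   or -1 if the buffer is too small. */
int format_initial_frame(const uint32_t *frame, char *out, size_t out_size)
{
    size_t used = 0;

    for (size_t i = 0; i < FRAME_WORDS; i++) {
        int n = snprintf(out + used, out_size - used, "%s: 0x%08lx\n\r",
                         frame_labels[i], (unsigned long) frame[i]);
        if (n < 0 || (size_t) n >= out_size - used) {
            return -1; /* output error or truncated output */
        }
        used += (size_t) n;
    }
    return (int) used;
}
```

On the target, `frame` would be the stack pointer read from the first word of the TCB, and the resulting buffer could be passed to the trace output in one call.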

Kind Regards
Robin

This is really weird. I am able to load the bootloader into the SDRAM now, so I can also debug with a debug probe.

This is the piece of code which performs the core part of the bootloader. It is executed either bare-metal (always works) or by FreeRTOS inside a task:

int perform_bootloader_core_operation() {
    LED_Clear(0);
    LED_Clear(1);
    int result = 0;
    //result = copy_sdc_image_to_sdram();
    result = copy_nandflash_image_to_sdram(PRIMARY_IMAGE_NAND_OFFSET, PRIMARY_IMAGE_RESERVED_SIZE,
            PRIMARY_IMAGE_SDRAM_OFFSET, false);
    if(result != 0) {};

    LED_Set(0);

#if BOOTLOADER_VERBOSE_LEVEL >= 1
    TRACE_INFO("Jumping to SDRAM application address 0x%08x!\n\r",
            (unsigned int) SDRAM_DESTINATION);
#endif

#if USE_FREERTOS == 1
    vTaskEndScheduler();
#endif
    CP15_Disable_I_Cache();

    jump_to_sdram_application(0x22000000 - 1024, SDRAM_DESTINATION);

    /* Should never be reached */
    return 0;
}

Now if I do this with FreeRTOS enabled and set a breakpoint at jump_to_sdram_application,
the primary image is loaded and works without issues. If I don’t set the breakpoint, it appears to crash somewhere in the main application (the first task can’t be started).

Does anyone have an idea why setting a breakpoint allows the code to run? Is this a timing related issue? I disabled all interrupts before jumping to the bootloader and I also took care in the low_level init function to clear all interrupts, but that does not appear to help…

Good one. Thanks for sharing a nice piece of info.

I was not able to solve this problem and now it is coming back on our target hardware as well.
The only way to solve this is to not use FreeRTOS in the bootloader… I also don’t know how to debug this properly because the crash appears to happen inside an assembler routine. Does anyone have an idea, or has anyone else used FreeRTOS in a bootloader and had issues with random crashes?

Just wanted to report back that this issue was most likely related to the instruction cache being used in the bootloader but not being invalidated before jumping to the main application.
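For anyone landing here with the same symptom, this is a minimal sketch of what invalidating the instruction cache before the jump could look like on an ARM926EJ-S core such as the one in the SAM9G20. It uses the CP15 "invalidate entire ICache" operation (c7, c5, 0) and must run in a privileged mode. The function name is hypothetical, the guard lets the file build on a non-ARM host where it does nothing, and it is not the code from the project discussed above.

```c
/* Invalidate the whole instruction cache so that stale bootloader
   instructions are not executed after the application has been copied
   to SDRAM. Returns 1 if the operation was issued, 0 on builds where
   the CP15 instruction is not available (for example a host build). */
static inline int invalidate_icache_before_jump(void)
{
#if defined(__arm__) && !defined(__thumb__)
    unsigned long zero = 0;

    __asm__ volatile (
        "mcr p15, 0, %0, c7, c5, 0 \n\t" /* invalidate entire ICache */
        :
        : "r" (zero)
        : "memory"
    );
    return 1;
#else
    return 0;
#endif
}
```

In the bootloader shown earlier in the thread, a call like this would sit next to CP15_Disable_I_Cache(), right before jump_to_sdram_application(). If the data cache was enabled while the image was copied, it would presumably also have to be cleaned first, but that is an assumption beyond what was reported here.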

Thanks for taking the time to report back.