i.MX RT1024 position-independent code - fails to start first task

I am trying to get position-independent code (compiling with -fpic) working on my EVK board. I can compile and run my program, but as soon as I pull in FreeRTOS, my app crashes when starting the kernel.

I debugged through the application, both with and without compiling with -fpic. Here’s what I can see.

  • My code calls vTaskStartScheduler.
  • freertos calls xPortStartScheduler.
  • eventually it calls prvPortStartFirstTask
  • and eventually prvGetNextExpireTime gets called

In prvGetNextExpireTime, it checks if pxCurrentTimerList is empty.

In the debugger, the pointer pxCurrentTimerList points to 0x200072b8.
That address is valid, since my DTC RAM starts at 0x20000000 and is 0x10000 bytes large.

The debugger's memory view shows the list contents. The size of the list is 0x0000000 and the pointers to the next item and the end of the list are both 0x200072c0. So that all looks perfectly OK.

When I step over
*pxListWasEmpty = listLIST_IS_EMPTY( pxCurrentTimerList );
the app crashes on a mem fault.

When I run my app without compiling it with -fpic, I get the exact same memory addresses and it works fine. My one task gets started, and my LED starts to blink.

So it’s clear that compiling with -fpic breaks my freertos, but I have no idea what the problem is…

Is there anybody who has experience with position independent code who can give me a hint here?

PS:
I just found that the disassembled code shows an undefined instruction in the .got table.

200000a8: 200072b8 ; <UNDEFINED> instruction: 0x200072b8

0x200072b8 happens to be the exact same address as where the pxCurrentTimerList is located.

Can anybody explain what the UNDEFINED instruction means?

I just tried building with the latest arm-none-eabi-gcc with the following command line:

-nostartfiles -ffreestanding -mthumb -mcpu=cortex-m3 -Wall -Wextra -g3 -fpic -ffunction-sections -fdata-sections -MMD -MP -MF"$(@:%.o=%.d)" -MT $@

…and it appears to run without any problem.

Which chip, compiler and compiler version are you using?

I am using the NXP i.MX RT1024 board.

  • I created a new standard C++ project in MCUXpresso.
  • I disabled the managed linker script and modified the “mcuxpresso generated linker script”:
    .vector : ALIGN(4)
    {
        __vector_table_flash_start__ = ADDR(.vector) ;
        KEEP(*(.isr_vector))
        __vector_table_flash_end__ = ABSOLUTE(.) ;
        . = ALIGN(4) ;
    } >PROGRAM_FLASH

     .got : ALIGN(4)
    {
        __global_offset_table_flash_start__ = LOADADDR(.got) ;
        __global_offset_table_itc_start__ = ADDR(.got) ;
        *(.got* .got.*)
        __global_offset_table_flash_end__ = . ;
    } >SRAM_DTC AT>PROGRAM_FLASH

I experimented with moving the vector table to SRAM, but I don’t think it matters, so I am just placing it in flash for now. I’ll paste the entire linker script for reference.

GROUP (
  "libgcc.a"
  "libc_nano.a"
  "libstdc++_nano.a"
  "libm.a"
  "libcr_newlib_nohost.a"
  "crti.o"
  "crtn.o"
  "crtbegin.o"
  "crtend.o"
)

MEMORY
{
  /* Define each memory region */
                              /*0xE000ED08*/
  PROGRAM_FLASH (rx) : ORIGIN = 0x60000000, LENGTH = 0x400000 /* 4M bytes (alias Flash) */  
  SRAM_DTC (rwx) : ORIGIN = 0x20000000, LENGTH = 0x10000 /* 64K bytes (alias RAM) */  
  SRAM_ITC (rwx) : ORIGIN = 0x0, LENGTH = 0x10000 /* 64K bytes (alias RAM2) */  
  SRAM_OC (rwx) : ORIGIN = 0x20200000, LENGTH = 0x20000 /* 128K bytes (alias RAM3) */  
  BOARD_SDRAM (rwx) : ORIGIN = 0x80000000, LENGTH = 0x1e00000 /* 30M bytes (alias RAM4) */  
  NCACHE_REGION (rwx) : ORIGIN = 0x81e00000, LENGTH = 0x200000 /* 2M bytes (alias RAM5) */  
}
ENTRY(ResetISR)

SECTIONS
{
     /* Image Vector Table and Boot Data for booting from external flash */
    .boot_hdr : ALIGN(4)
    {
        __boot_hdr_start__ = ABSOLUTE(.) ;
        KEEP(*(.boot_hdr.conf))
        . = 0x1000 ;
        KEEP(*(.boot_hdr.ivt))
        . = 0x1020 ;
        KEEP(*(.boot_hdr.boot_data))
        . = 0x1030 ;
        KEEP(*(.boot_hdr.dcd_data))
        __boot_hdr_end__ = ABSOLUTE(.) ;
        . = 0x2000 ;
    } >PROGRAM_FLASH

    .vector : ALIGN(4)
    {
        __vector_table_flash_start__ = ADDR(.vector) ;
        KEEP(*(.isr_vector))
        __vector_table_flash_end__ = ABSOLUTE(.) ;
        . = ALIGN(4) ;
    } >PROGRAM_FLASH

     .got : ALIGN(4)
    {
        __global_offset_table_flash_start__ = LOADADDR(.got) ;
        __global_offset_table_itc_start__ = ADDR(.got) ;
        *(.got* .got.*)
        __global_offset_table_flash_end__ = . ;
    } >SRAM_DTC AT>PROGRAM_FLASH

    /* MAIN TEXT SECTION */
    .text : ALIGN(4)
    {
        /* Global Section Table */
        __section_table_start = .;
        __data_section_table = .;
        LONG(LOADADDR(.data));
        LONG(    ADDR(.data));
        LONG(  SIZEOF(.data));
        LONG(LOADADDR(.data_RAM2));
        LONG(    ADDR(.data_RAM2));
        LONG(  SIZEOF(.data_RAM2));
        LONG(LOADADDR(.data_RAM3));
        LONG(    ADDR(.data_RAM3));
        LONG(  SIZEOF(.data_RAM3));
        LONG(LOADADDR(.data_RAM4));
        LONG(    ADDR(.data_RAM4));
        LONG(  SIZEOF(.data_RAM4));
        LONG(LOADADDR(.data_RAM5));
        LONG(    ADDR(.data_RAM5));
        LONG(  SIZEOF(.data_RAM5));
        __data_section_table_end = .;
        __bss_section_table = .;
        LONG(    ADDR(.bss));
        LONG(  SIZEOF(.bss));
        LONG(    ADDR(.bss_RAM2));
        LONG(  SIZEOF(.bss_RAM2));
        LONG(    ADDR(.bss_RAM3));
        LONG(  SIZEOF(.bss_RAM3));
        LONG(    ADDR(.bss_RAM4));
        LONG(  SIZEOF(.bss_RAM4));
        LONG(    ADDR(.bss_RAM5));
        LONG(  SIZEOF(.bss_RAM5));
        __bss_section_table_end = .;
        __section_table_end = . ;
        /* End of Global Section Table */

        *(.after_vectors*)

       *(.text*)
       *(.rodata .rodata.* .constdata .constdata.*)
       . = ALIGN(4);
            /* C++ constructors etc */
            . = ALIGN(4);
            KEEP(*(.init))
            
            . = ALIGN(4);
            __preinit_array_start = .;
            KEEP (*(.preinit_array))
            __preinit_array_end = .;
            
            . = ALIGN(4);
            __init_array_start = .;
            KEEP (*(SORT(.init_array.*)))
            KEEP (*(.init_array))
            __init_array_end = .;
            
            KEEP(*(.fini));
            
            . = ALIGN(4);
            KEEP (*crtbegin.o(.ctors))
            KEEP (*(EXCLUDE_FILE (*crtend.o) .ctors))
            KEEP (*(SORT(.ctors.*)))
            KEEP (*crtend.o(.ctors))
            
            . = ALIGN(4);
            KEEP (*crtbegin.o(.dtors))
            KEEP (*(EXCLUDE_FILE (*crtend.o) .dtors))
            KEEP (*(SORT(.dtors.*)))
            KEEP (*crtend.o(.dtors))
            . = ALIGN(4);
            /* End C++ */
    } > PROGRAM_FLASH
    /*
     * for exception handling/unwind - some Newlib functions (in common
     * with C++ and STDC++) use this.
     */
    .ARM.extab : ALIGN(4)
    {
        *(.ARM.extab* .gnu.linkonce.armextab.*)
    } > PROGRAM_FLASH

    .ARM.exidx : ALIGN(4)
    {
        __exidx_start = .;
        *(.ARM.exidx* .gnu.linkonce.armexidx.*)
        __exidx_end = .;
    } > PROGRAM_FLASH
 
    _etext = .;
        
    /* DATA section for SRAM_ITC */

    .data_RAM2 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM2 = .) ;
        PROVIDE(__start_data_SRAM_ITC = .) ;
        *(.ramfunc.$RAM2)
        *(.ramfunc.$SRAM_ITC)
        *(.data.$RAM2)
        *(.data.$SRAM_ITC)
        *(.data.$RAM2.*)
        *(.data.$SRAM_ITC.*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM2 = .) ;
        PROVIDE(__end_data_SRAM_ITC = .) ;
     } > SRAM_ITC AT>PROGRAM_FLASH

    /* DATA section for SRAM_OC */

    .data_RAM3 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM3 = .) ;
        PROVIDE(__start_data_SRAM_OC = .) ;
        *(.ramfunc.$RAM3)
        *(.ramfunc.$SRAM_OC)
        *(.data.$RAM3)
        *(.data.$SRAM_OC)
        *(.data.$RAM3.*)
        *(.data.$SRAM_OC.*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM3 = .) ;
        PROVIDE(__end_data_SRAM_OC = .) ;
     } > SRAM_OC AT>PROGRAM_FLASH

    /* DATA section for BOARD_SDRAM */

    .data_RAM4 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM4 = .) ;
        PROVIDE(__start_data_BOARD_SDRAM = .) ;
        *(.ramfunc.$RAM4)
        *(.ramfunc.$BOARD_SDRAM)
        *(.data.$RAM4)
        *(.data.$BOARD_SDRAM)
        *(.data.$RAM4.*)
        *(.data.$BOARD_SDRAM.*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM4 = .) ;
        PROVIDE(__end_data_BOARD_SDRAM = .) ;
     } > BOARD_SDRAM AT>PROGRAM_FLASH

    /* DATA section for NCACHE_REGION */

    .data_RAM5 : ALIGN(4)
    {
        FILL(0xff)
        PROVIDE(__start_data_RAM5 = .) ;
        PROVIDE(__start_data_NCACHE_REGION = .) ;
        *(.ramfunc.$RAM5)
        *(.ramfunc.$NCACHE_REGION)
        *(.data.$RAM5)
        *(.data.$NCACHE_REGION)
        *(.data.$RAM5.*)
        *(.data.$NCACHE_REGION.*)
        . = ALIGN(4) ;
        PROVIDE(__end_data_RAM5 = .) ;
        PROVIDE(__end_data_NCACHE_REGION = .) ;
     } > NCACHE_REGION AT>PROGRAM_FLASH

    /* MAIN DATA SECTION */
    .uninit_RESERVED (NOLOAD) : ALIGN(4)
    {
        _start_uninit_RESERVED = .;
        KEEP(*(.bss.$RESERVED*))
       . = ALIGN(4) ;
        _end_uninit_RESERVED = .;
    } > SRAM_DTC AT> SRAM_DTC

    /* Main DATA section (SRAM_DTC) */
    .data : ALIGN(4)
    {
       FILL(0xff)
       _data = . ;
       PROVIDE(__start_data_RAM = .) ;
       PROVIDE(__start_data_SRAM_DTC = .) ;
       *(vtable)
       *(.ramfunc*)
       KEEP(*(CodeQuickAccess))
       KEEP(*(DataQuickAccess))
       *(RamFunction)
       *(.data*)
       . = ALIGN(4) ;
       _edata = . ;
       PROVIDE(__end_data_RAM = .) ;
       PROVIDE(__end_data_SRAM_DTC = .) ;
    } > SRAM_DTC AT>PROGRAM_FLASH

    /* BSS section for SRAM_ITC */
    .bss_RAM2 : ALIGN(4)
    {
       PROVIDE(__start_bss_RAM2 = .) ;
       PROVIDE(__start_bss_SRAM_ITC = .) ;
       *(.bss.$RAM2)
       *(.bss.$SRAM_ITC)
       *(.bss.$RAM2.*)
       *(.bss.$SRAM_ITC.*)
       . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
       PROVIDE(__end_bss_RAM2 = .) ;
       PROVIDE(__end_bss_SRAM_ITC = .) ;
    } > SRAM_ITC AT> SRAM_ITC

    /* BSS section for SRAM_OC */
    .bss_RAM3 : ALIGN(4)
    {
       PROVIDE(__start_bss_RAM3 = .) ;
       PROVIDE(__start_bss_SRAM_OC = .) ;
       *(.bss.$RAM3)
       *(.bss.$SRAM_OC)
       *(.bss.$RAM3.*)
       *(.bss.$SRAM_OC.*)
       . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
       PROVIDE(__end_bss_RAM3 = .) ;
       PROVIDE(__end_bss_SRAM_OC = .) ;
    } > SRAM_OC AT> SRAM_OC

    /* BSS section for BOARD_SDRAM */
    .bss_RAM4 : ALIGN(4)
    {
       PROVIDE(__start_bss_RAM4 = .) ;
       PROVIDE(__start_bss_BOARD_SDRAM = .) ;
       *(.bss.$RAM4)
       *(.bss.$BOARD_SDRAM)
       *(.bss.$RAM4.*)
       *(.bss.$BOARD_SDRAM.*)
       . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
       PROVIDE(__end_bss_RAM4 = .) ;
       PROVIDE(__end_bss_BOARD_SDRAM = .) ;
    } > BOARD_SDRAM AT> BOARD_SDRAM

    /* BSS section for NCACHE_REGION */
    .bss_RAM5 : ALIGN(4)
    {
       PROVIDE(__start_bss_RAM5 = .) ;
       PROVIDE(__start_bss_NCACHE_REGION = .) ;
       *(.bss.$RAM5)
       *(.bss.$NCACHE_REGION)
       *(.bss.$RAM5.*)
       *(.bss.$NCACHE_REGION.*)
       . = ALIGN (. != 0 ? 4 : 1) ; /* avoid empty segment */
       PROVIDE(__end_bss_RAM5 = .) ;
       PROVIDE(__end_bss_NCACHE_REGION = .) ;
    } > NCACHE_REGION AT> NCACHE_REGION

    /* MAIN BSS SECTION */
    .bss : ALIGN(4)
    {
        _bss = .;
        PROVIDE(__start_bss_RAM = .) ;
        PROVIDE(__start_bss_SRAM_DTC = .) ;
        *(.bss*)
        *(COMMON)
        . = ALIGN(4) ;
        _ebss = .;
        PROVIDE(__end_bss_RAM = .) ;
        PROVIDE(__end_bss_SRAM_DTC = .) ;
        PROVIDE(end = .);
    } > SRAM_DTC AT> SRAM_DTC

    /* NOINIT section for SRAM_ITC */
    .noinit_RAM2 (NOLOAD) : ALIGN(4)
    {
       PROVIDE(__start_noinit_RAM2 = .) ;
       PROVIDE(__start_noinit_SRAM_ITC = .) ;
       *(.noinit.$RAM2)
       *(.noinit.$SRAM_ITC)
       *(.noinit.$RAM2.*)
       *(.noinit.$SRAM_ITC.*)
       . = ALIGN(4) ;
       PROVIDE(__end_noinit_RAM2 = .) ;
       PROVIDE(__end_noinit_SRAM_ITC = .) ;
    } > SRAM_ITC AT> SRAM_ITC

    /* NOINIT section for SRAM_OC */
    .noinit_RAM3 (NOLOAD) : ALIGN(4)
    {
       PROVIDE(__start_noinit_RAM3 = .) ;
       PROVIDE(__start_noinit_SRAM_OC = .) ;
       *(.noinit.$RAM3)
       *(.noinit.$SRAM_OC)
       *(.noinit.$RAM3.*)
       *(.noinit.$SRAM_OC.*)
       . = ALIGN(4) ;
       PROVIDE(__end_noinit_RAM3 = .) ;
       PROVIDE(__end_noinit_SRAM_OC = .) ;
    } > SRAM_OC AT> SRAM_OC

    /* NOINIT section for BOARD_SDRAM */
    .noinit_RAM4 (NOLOAD) : ALIGN(4)
    {
       PROVIDE(__start_noinit_RAM4 = .) ;
       PROVIDE(__start_noinit_BOARD_SDRAM = .) ;
       *(.noinit.$RAM4)
       *(.noinit.$BOARD_SDRAM)
       *(.noinit.$RAM4.*)
       *(.noinit.$BOARD_SDRAM.*)
       . = ALIGN(4) ;
       PROVIDE(__end_noinit_RAM4 = .) ;
       PROVIDE(__end_noinit_BOARD_SDRAM = .) ;
    } > BOARD_SDRAM AT> BOARD_SDRAM

    /* NOINIT section for NCACHE_REGION */
    .noinit_RAM5 (NOLOAD) : ALIGN(4)
    {
       PROVIDE(__start_noinit_RAM5 = .) ;
       PROVIDE(__start_noinit_NCACHE_REGION = .) ;
       *(.noinit.$RAM5)
       *(.noinit.$NCACHE_REGION)
       *(.noinit.$RAM5.*)
       *(.noinit.$NCACHE_REGION.*)
       . = ALIGN(4) ;
       PROVIDE(__end_noinit_RAM5 = .) ;
       PROVIDE(__end_noinit_NCACHE_REGION = .) ;
    } > NCACHE_REGION AT> NCACHE_REGION

    /* DEFAULT NOINIT SECTION */
    .noinit (NOLOAD): ALIGN(4)
    {
        _noinit = .;
        PROVIDE(__start_noinit_RAM = .) ;
        PROVIDE(__start_noinit_SRAM_DTC = .) ;
        *(.noinit*)
         . = ALIGN(4) ;
        _end_noinit = .;
       PROVIDE(__end_noinit_RAM = .) ;
       PROVIDE(__end_noinit_SRAM_DTC = .) ;        
    } > SRAM_DTC AT> SRAM_DTC

    /* Reserve and place Heap within memory map */
    _HeapSize = 0x1000;
    .heap :  ALIGN(4)
    {
        _pvHeapStart = .;
        . += _HeapSize;
        . = ALIGN(4);
        _pvHeapLimit = .;
    } > SRAM_DTC

     _StackSize = 0x1000;
     /* Reserve space in memory for Stack */
    .heap2stackfill  :
    {
        . += _StackSize;
    } > SRAM_DTC
    /* Locate actual Stack in memory map */
    .stack ORIGIN(SRAM_DTC) + LENGTH(SRAM_DTC) - _StackSize - 0:  ALIGN(4)
    {
        _vStackBase = .;
        . = ALIGN(4);
        _vStackTop = . + _StackSize;
    } > SRAM_DTC

    /* Provide basic symbols giving location and size of main text
     * block, including initial values of RW data sections. Note that
     * these will need extending to give a complete picture with
     * complex images (e.g multiple Flash banks).
     */
    _image_start = LOADADDR(.text);
    _image_end = LOADADDR(.data) + SIZEOF(.data);
    _image_size = _image_end - _image_start;
}

Next I modified the startup code to copy the .got section to DTC SRAM. Stepping through in the debugger, I can see that the entire .got table is copied uint32 by uint32 to DTC SRAM, and my memory view confirms that the copy is correct. Lastly, I set R9 to the start of my .got section in DTC SRAM.

    //
    // Copy global offset table to ram
    //
    global_offset_table_flash = const_cast<unsigned int*>(&__global_offset_table_flash_start__);
    global_offset_table_itc = const_cast<unsigned int*>(&__global_offset_table_itc_start__);
    global_offset_table_end_itc = const_cast<unsigned int*>(&__global_offset_table_flash_end__);

    size =
        reinterpret_cast<unsigned int>(&__global_offset_table_flash_end__) -
        reinterpret_cast<unsigned int>(&__global_offset_table_itc_start__);
    global_offset_table_size = static_cast<unsigned int>(&__global_offset_table_flash_end__ - &__global_offset_table_itc_start__);

    for (index = 0u; index < size/sizeof(unsigned int); ++index)
    {
        global_offset_table_itc[index] = global_offset_table_flash[index];
    }

    __asm volatile ("LDR r9, = __global_offset_table_itc_start__");
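
The copy step above can be sketched as a small helper that is easy to check on a host. This is only a sketch: `copy_got` and its parameters are hypothetical names for illustration, not part of the MCUXpresso startup code. On the target, the three pointers would come from the linker symbols shown in the linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copy a global offset table word by word from its load address (flash)
 * to its run address (RAM).
 *   ram_start, ram_end: bounds of the table at its run address
 *   flash_start:        where the linker stored the initial values
 * Returns the number of 32-bit words copied.
 */
size_t copy_got(uint32_t *ram_start, const uint32_t *ram_end,
                const uint32_t *flash_start)
{
    assert(ram_end >= ram_start);

    size_t words = (size_t)(ram_end - ram_start);
    for (size_t i = 0; i < words; ++i) {
        ram_start[i] = flash_start[i];
    }
    return words;
}
```

Taking the size from the run-address bounds mirrors the `size` computation in the startup code above, where both symbols used in the subtraction are RAM addresses.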

I added the compiler arguments

 -fPIC -mpic-register=r9 -msingle-pic-base -mno-pic-data-is-text-relative

resulting in this complete list of compiler arguments

-c -ffunction-sections -fdata-sections -ffreestanding -fno-builtin -fno-rtti -fno-exceptions -fPIC -mpic-register=r9 -msingle-pic-base -mno-pic-data-is-text-relative 

I use these compiler arguments for both C and C++ compiler

The compiler version I use is

λ arm-none-eabi-gcc.exe --version
arm-none-eabi-gcc.exe (GNU Arm Embedded Toolchain 10.3-2021.07) 10.3.1 20210621 (release)

I can also push my small project to github if that helps.

Again, the problem is that when I compile my project with -fPIC and run it, my program crashes with a data access violation. The memory it’s accessing seems perfectly valid in the debugger. Maybe the indirection caused by the .got is causing the problem, but in the debugger I can see that the variable in question is perfectly fine. The size of the list is zero in the debugger tooltip. Still, when I execute that line of code, the program crashes.

When I compile without -fpic, FreeRTOS works just fine. So the only thing that is clear to me is that I am doing something wrong when compiling with -fpic, but I have no clue what, nor do I have any idea how to figure out what I am doing wrong.

Many many thanks for any contribution.

PS: again, if I need to share my project, say the word and I’ll push it to github

This won’t be sufficient because each task starts with its own value of R9.
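
To see why setting R9 once in the startup code does not carry over to tasks, it can help to model a new task's stack on a host. This is a sketch under assumptions: FreeRTOS is assumed to fill a new stack with the byte 0xa5 (which it does when stack checking or high-water-mark support is enabled), and the port is assumed to reserve the R4-R11 slots of the initial frame without writing them. `make_task_stack`, `initial_r9`, and `STACK_WORDS` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS     32u   /* hypothetical stack depth for this model */
#define STACK_FILL_BYTE 0xa5  /* fill pattern assumed from FreeRTOS tasks.c */

/*
 * Fill a fresh task stack with the pattern, then step over the slots of
 * the initial frame. The R4-R11 slots are reserved but never written,
 * as "pxTopOfStack -= 8" does. Returns a pointer to the R4 slot.
 */
uint32_t *make_task_stack(uint32_t stack[STACK_WORDS])
{
    memset(stack, STACK_FILL_BYTE, STACK_WORDS * sizeof(uint32_t));

    uint32_t *top = stack + STACK_WORDS;
    top -= 8;  /* stand-in for xPSR, PC, LR, R12, R3-R0 written by the port */
    top -= 1;  /* EXC_RETURN */
    top -= 8;  /* R11..R4: reserved, left untouched */
    return top;
}

/* Value the task would see in R9 when it runs for the first time. */
uint32_t initial_r9(const uint32_t *top)
{
    return top[9 - 4];
}
```

In this model the task's first R9 is the fill pattern 0xa5a5a5a5 rather than the GOT address, because the value comes from the task's own saved context and not from the register contents at startup.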

Have you tried with just -fPIC and not the other compiler switches and no GOT? Are you trying to load your image at a different address from where the linker located it? Maybe to use a bootloader or something? Or are you doing legitimate dynamic linking?

[EDIT: Just now realizing that you’re using C++ and it likely requires the global offset table when compiled for position independence.]

Hi @jefftenney
Thanks for your reply. That at least explains why this is not working.

The end goal for me is indeed to have 1 firmware image and being able to deploy it twice on flash as app1 and app2 (and being able to update either app over the air).

Can you elaborate on each task using its own r9 value? Does that mean freertos tasks need r9 for other purposes? Or should I modify freertos code so that r9 gets the global offset table address restored as soon as a task starts / continues? Is there any documentation on this topic you could refer me to?

I think Jeff has answered the question - but as a general point - you have changed a lot which is going to make it hard to know which change caused the problem. In this situation my approach would be to start with a working project created by NXP’s tools that didn’t include the kernel, then change one thing at a time - such as linker script, one command line option at a time, etc. to learn how it behaves. Then generate a working project, again from NXP’s tools, that does include the kernel - and again change one thing at a time until it breaks.

This might be the best strategy – specifically when a task starts. You don’t have to worry about when a task continues because FreeRTOS maintains context already. This line in pxPortInitialiseStack() creates storage for R4-R11 for a new task. You could modify the code to then initialize R9’s spot with the GOT address (which, ironically, would actually be in R9 while this code executes). If it actually works then it could be cleaned-up / generalized afterward as needed.


Hi @rtel
I actually did exactly that. I started with a new simple project where I configure a pin connected to an LED. Next I compiled that with -fpic, ran it again, and after a long struggle finally got that to work. Then I pulled in FreeRTOS and let the task blink the LED, first without -fpic. Next I compiled with -fpic and ran into this issue.

Hi @jefftenney
Just to give a quick update, I hardcoded the address of .got into

void vPortSVCHandler( void )
{
    __asm volatile (
        "	ldr	r3, pxCurrentTCBConst2		\n"/* Restore the context. */
        "	ldr r1, [r3]					\n"/* Use pxCurrentTCBConst to get the pxCurrentTCB address. */
        "	ldr r0, [r1]					\n"/* The first item in pxCurrentTCB is the task top of stack. */
        "	ldmia r0!, {r4-r11, r14}		\n"/* Pop the registers that are not automatically saved on exception entry and the critical nesting count. */
        "   LDR r9, =0x20000000             \n"
        "	msr psp, r0						\n"/* Restore the task stack pointer. */
        "	isb								\n"
        "	mov r0, #0 						\n"
        "	msr	basepri, r0					\n"
        "	bx r14							\n"
        "									\n"
        "	.align 4						\n"
        "pxCurrentTCBConst2: .word pxCurrentTCB				\n"
        );
}

(added “LDR r9, =0x20000000 \n”)

And it actually got past the one line that failed before!!! Next it explodes when vTaskDelay gets called from my task. I will debug further, but I had to share this! :).

Thanks a LOT for pointing this out. I don’t know if I will be able to get this to work, but at least I am 1 step further after being stuck on the same part for days!

Hi @jefftenney
It works! With a hack though. If I put

    __asm volatile ("LDR r9, = 0x20000000");

  • In my task body (to fix r9 in my own task context)
  • In the vPortSVCHandler function (to fix r9 when starting the kernel)
  • And in the portTASK_FUNCTION function (to fix r9 for the idle task)

Then my app remains running.

I hope there is one place I can put this “hack” so that all created tasks benefit from it. That way I wouldn’t have to bother with this anywhere in user code.

Would you happen to know where I should look in the FreeRTOS code? I can live with having to change a thing or two in the FreeRTOS code to get this working, but I am trying to keep it out of user code.

You can update the function that sets the initial stack of a task to set the initial value for R9 to whatever R9 is holding at that time (assuming it’s already set to 0x20000000).


The link I sent you above is the spot. It’s the same spot Richard mentioned above. It’s in pxPortInitialiseStack().


Hi @rtel, tried that (forgot to mention, sorry) but it didn’t work. I’ll look at that again, maybe a silly mistake on my side. Will report back thx!

Existing Code:

    pxTopOfStack -= 8; /* R11, R10, R9, R8, R7, R6, R5 and R4. */

Experimental Code:

    pxTopOfStack -= 8; /* R11, R10, R9, R8, R7, R6, R5 and R4. */
    pxTopOfStack[9-4] = 0x20000000;  // Set the task's initial R9 value

Or use a little asm code to read the value of R9 to avoid hard coding the 0x20000000 in case that value changes. We do this in some ports for similar reasons.
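
A minimal sketch of that idea, assuming GCC-style inline assembly and a frame pointer whose index 0 is the R4 slot (as it is right after `pxTopOfStack -= 8`). `read_pic_base`, `set_initial_r9`, and `HOST_PIC_BASE` are hypothetical names; the non-Arm branch exists only so the sketch can be built and checked on a host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical placeholder returned when not built for 32-bit Arm. */
#define HOST_PIC_BASE 0x20000000u

/* Return the current value of R9 (the PIC base register in this thread). */
uint32_t read_pic_base(void)
{
#if defined(__arm__)
    uint32_t value;
    __asm volatile ("mov %0, r9" : "=r" (value));
    return value;
#else
    return HOST_PIC_BASE;  /* host build: there is no R9 to read */
#endif
}

/*
 * Store the PIC base in the R9 slot of an initial task frame.
 * top[0] must be the R4 slot, so R9 sits at index 9 - 4.
 */
void set_initial_r9(uint32_t *top)
{
    top[9 - 4] = read_pic_base();
}
```

This only gives the intended result if R9 already holds the GOT address when the task is created, as noted earlier in the thread.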


Yes in fact it might be good to generalize the initial values of R4 through R11 to be the current values in those same registers. That would allow the developer (compiler) to dedicate any of these registers for any purpose outside the scope of FreeRTOS.


Hi both,
when I use this trick

    //pxTopOfStack -= 8; /* R11, R10, R9, R8, R7, R6, R5 and R4. */
    pxTopOfStack[9-4] = 0x20000000;  // Set the task's initial R9 value

stepping over

void vPortSVCHandler( void )
{
    __asm volatile (
        "	ldr	r3, pxCurrentTCBConst2		\n"/* Restore the context. */
        "	ldr r1, [r3]					\n"/* Use pxCurrentTCBConst to get the pxCurrentTCB address. */
        "	ldr r0, [r1]					\n"/* The first item in pxCurrentTCB is the task top of stack. */
        "	ldmia r0!, {r4-r11, r14}		\n"/* Pop the registers that are not automatically saved on exception entry and the critical nesting count. */
        "	msr psp, r0						\n"/* Restore the task stack pointer. */
        "   LDR r9, =0x20000000             \n"

        "	isb								\n"
        "	mov r0, #0 						\n"
        "	msr	basepri, r0					\n"
        "	bx r14							\n"
        "									\n"
        "	.align 4						\n"
        "pxCurrentTCBConst2: .word pxCurrentTCB				\n"
        );
}

crashes the app.

It jumps to a weird address (0x1000000) and crashes on a bus error.

PS: this line " LDR r9, =0x20000000 \n" in vPortSVCHandler is still needed, otherwise the kernel won’t start. It seems that r9 gets reset to 0xa5a5a5a5. I need to “hack” it back to 0x20000000 for the kernel to start properly.

And I am still stuck setting r9 again in every task that I start, like this

static void main_task(void *params)
{
    __asm volatile ("LDR r9, = 0x20000000");

    while (1)
    {

        GPIO1->DR ^= (1<<24);
        vTaskDelay(100);
    }
}

PPS: thus, reading r9 back as you suggested won’t work, because something resets it to 0xa5a5a5a5.

If there is a nicer more centralized way of setting r9 to 0x20000000 again I’m all ears

I am stepping through the task activation in the FreeRTOS code, but getting nowhere.

According to the stacktrace the activated task initially gets called from the “hardfault handler” which sounds a bit weird to me, but I am sure there is a good explanation.

Anyway, the code of the hard fault handler is as follows:

__attribute__((naked))
void HardFault_Handler(void){
    __asm(  ".syntax unified\n"
        // Check which stack is in use
            "MOVS   R0, #4  \n"
            "MOV    R1, LR  \n"
            "TST    R0, R1  \n"
            "BEQ    _MSP    \n"
            "MRS    R0, PSP \n"
            "B  _process      \n"
            "_MSP:  \n"
            "MRS    R0, MSP \n"
        // Load the instruction that triggered hard fault
        "_process:     \n"
            "LDR    R1,[R0,#24] \n"
            "LDRH    R2,[r1] \n"
        // Semihosting instruction is "BKPT 0xAB" (0xBEAB)
            "LDR    R3,=0xBEAB \n"
            "CMP     R2,R3 \n"
            "BEQ    _semihost_return \n"
        // Wasn't semihosting instruction so enter infinite loop
            "B . \n"
        // Was semihosting instruction, so adjust location to
        // return to by 1 instruction (2 bytes), then exit function
        "_semihost_return: \n"
            "ADDS    R1,#2 \n"
            "STR    R1,[R0,#24] \n"
    	// Set a return value from semihosting operation.
    	// 32 is slightly arbitrary, but appears to allow most
    	// C Library IO functions sitting on top of semihosting to
    	// continue to operate to some degree
    		    "MOVS   R1,#32 \n"
    		    "STR R1,[ R0,#0 ] \n" // R0 is at location 0 on stack
    	// Return from hard fault handler to application
            "BX LR \n"
        ".syntax divided\n") ;
}

When I break just before I set r9 again, I can see that r9 is reset to 0xa5a5a5a5 again. I have no idea who is responsible for this. Studying the inline assembly code in the hardfault handler doesn’t reveal anything (although I have little assembly knowledge, so I might be wrong).

When I was stepping through the freertos kernel code, starting from portTASK_FUNCTION and stepping through until I jumped to my own defined user task, r9 remains 0x20000000 all the time, just until my debugger stops at the first line of my task. At that moment r9 suddenly is reset to 0xa5a5a5a5. At that moment also my callstack changes to hardfault_handler => main_task.

This won’t work because you commented out the storage-allocation statement.
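
A host-side model can make the effect of commenting out `pxTopOfStack -= 8` visible. This is a sketch under assumptions: the frame layout is inferred from the `ldmia r0!, {r4-r11, r14}` in the handler above plus the standard eight-word Cortex-M exception frame, the initial xPSR is taken to be 0x01000000 and the initial EXC_RETURN 0xfffffffd, and the PC and LR values as well as all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FRAME_WORDS = 17 };  /* 8 hardware-stacked + EXC_RETURN + R4-R11 */

/*
 * Build an initial task frame in stack[] and return the index that would
 * be stored as the task's top of stack. When reserve_r4_r11 is 0, the
 * "pxTopOfStack -= 8" step is skipped, as in the experiment above.
 */
int build_frame(uint32_t stack[FRAME_WORDS], int reserve_r4_r11,
                uint32_t got_base)
{
    int top = FRAME_WORDS;

    for (int i = 0; i < FRAME_WORDS; ++i) {
        stack[i] = 0xa5a5a5a5u;      /* stack fill pattern */
    }
    stack[--top] = 0x01000000u;      /* xPSR */
    stack[--top] = 0x60002000u;      /* PC (hypothetical task entry) */
    stack[--top] = 0x60002100u;      /* LR (hypothetical exit handler) */
    top -= 5;                        /* R12, R3, R2, R1; top is now R0 */
    stack[top] = 0u;                 /* R0 = task parameter */
    stack[--top] = 0xfffffffdu;      /* EXC_RETURN */
    if (reserve_r4_r11) {
        top -= 8;                    /* R11..R4 */
    }
    stack[top + (9 - 4)] = got_base; /* intended R9 slot */
    return top;
}

/* Model "ldmia r0!, {r4-r11, r14}": nine words, lowest register first. */
uint32_t popped_r9(const uint32_t *stack, int top)  { return stack[top + 5]; }
uint32_t popped_r14(const uint32_t *stack, int top) { return stack[top + 8]; }
```

With the reservation in place, the model pops the GOT address into R9 and EXC_RETURN into R14. Without it, the write lands in the R12 slot of the exception frame and the word popped into R14 is the xPSR slot, 0x01000000, which is consistent with the jump to 0x1000000 reported above.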

To summarize, you’ll have this:

    __asm volatile ("LDR r9, = __global_offset_table_itc_start__");

in your C startup. And you’ll have this:

    pxTopOfStack -= 8; /* R11, R10, R9, R8, R7, R6, R5 and R4. */
    pxTopOfStack[9-4] = 0x20000000;  // Set the task's initial R9 value

in pxPortInitialiseStack(). Note that pxTopOfStack -= 8 is unchanged from original FreeRTOS code. You’re just adding the one line after it, to set the task’s initial R9 value.
