port_asm.S:161:(.exceptions.soft+0x14): relocation truncated to fit: R_NIOS2_CALL26_NOAT against `vTaskSwitchContext'
collect2: error: ld returned 1 exit status
The error is caused by the “call” instruction in the call_scheduler code. The call instruction requires its target to be in the same 256MB block of memory as the call itself. Modifying the code to use a “register call” extends the range to the entire 32-bit memory map:
    call_scheduler:
        stw ea, 72(sp)                      # EA is PC+4 so will skip over instruction causing exception
        // call vTaskSwitchContext          # Pick the next context.
        movia r15, vTaskSwitchContext       # - use long-call version
        callr r15
        br restore_sp_from_pxCurrentTCB     # Switch in the task context and restore.
movia is a macro (pseudo-instruction) that loads the 32-bit address of vTaskSwitchContext into r15 in two 16-bit chunks. callr then calls the subroutine through that register, so the target can sit anywhere in the 32-bit address space.
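To illustrate the “two 16-bit chunks”, here is a small C model of how I understand movia to expand: an orhi with %hiadj(label) followed by an addi with %lo(label). The function names are made up for illustration. Because addi sign-extends its 16-bit immediate, the high half has to be bumped by one whenever bit 15 of the address is set. Treat this as a sketch of the arithmetic, not as the assembler's actual implementation.

```c
#include <stdint.h>

/* %lo(addr): the low 16 bits of the address. */
static uint16_t lo16(uint32_t addr)
{
    return (uint16_t)(addr & 0xFFFFu);
}

/* %hiadj(addr): the high 16 bits, plus 1 if bit 15 is set, to compensate
 * for addi sign-extending the low half. */
static uint16_t hiadj16(uint32_t addr)
{
    return (uint16_t)(((addr >> 16) + ((addr >> 15) & 1u)) & 0xFFFFu);
}

/* Model of: orhi rB, r0, %hiadj(addr) ; addi rB, rB, %lo(addr) */
static uint32_t movia_result(uint32_t addr)
{
    uint32_t reg = (uint32_t)hiadj16(addr) << 16;      /* orhi */
    uint32_t lo = lo16(addr);
    uint32_t sext = (lo & 0x8000u) ? (0xFFFF0000u | lo) : lo;
    return reg + sext;                                 /* addi, wraps mod 2^32 */
}
```

For any 32-bit address, movia_result(addr) should come back equal to addr, which is why callr through that register has no range limit.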
I’m operating on a Cyclone 10GX eval board that has 1GB of SDRAM, and the exception vector is mapped into a block of onchip_memory that can’t be located within 256MB of the target subroutine.
On some older eval platforms, like the Stratix 3 or Cyclone 5, available DRAM is only 64MB or 128MB, which makes it possible to map an onchip_memory RAM block into the same 256MB address window. The newer Cyclone 10GX eval board includes 1GB of SDRAM, which spans multiple 256MB blocks. Additionally, the Quartus Platform Designer / Qsys tool won’t let you “cheat” by offsetting the SDRAM address assignment by, say, 32MB and squeezing the onchip_memory into the same 256MB block. It would be perfectly valid to do the address decoding without aligning to major block boundaries, but the Quartus tool throws an error when you try to do that.
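To make the 256MB window concrete: as I read the Nios II instruction set reference, call keeps the top 4 bits of the program counter and replaces the low 28 bits with its 26-bit immediate shifted left by 2. A rough C model of that, with made-up function names:

```c
#include <stdint.h>

/* Address reached by "call": top 4 bits come from the PC, the low 28 bits
 * from the 26-bit immediate times 4. */
static uint32_t call_target(uint32_t pc, uint32_t imm26)
{
    return (pc & 0xF0000000u) | ((imm26 & 0x03FFFFFFu) << 2);
}

/* A direct call can only reach the target if both addresses share the
 * same top 4 bits, i.e. the same 256MB block. */
static int call_can_reach(uint32_t pc, uint32_t target)
{
    return (pc & 0xF0000000u) == (target & 0xF0000000u);
}
```

With hypothetical addresses (not my actual memory map), an exception handler at 0x40000020 in on-chip RAM and a target at 0x00012340 in SDRAM give call_can_reach() == 0, which is the case where the linker reports the truncated R_NIOS2_CALL26_NOAT relocation. Put both below 0x10000000 and the check passes.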
At least on an FPGA platform, the reset vector and exception vector addresses appear to need to be in an immediately-available RAM space. Mapping them into SDRAM doesn’t work - probably because the SDRAM interface is in the middle of a calibration cycle at start-up when the CPU is trying to boot.
You could probably push both the code and exception segments into the same 256MB block with an SPL and relocatable code, but that’s not the environment I have.
Added a comment on the GitHub page. (Thanks for the change request … haven’t done one of those yet.) Yes, you’re replacing the “call” with “movia; callr”. On small systems, this isn’t required. But as more memory becomes available by default, I would expect to see more issues like this where “call” just doesn’t have the reach.
As mentioned, I have this running on the Cyclone 10GX eval board, and I ran a weekend-long test with three tasks set with prime-number vTaskDelay() values to maximize lead, concurrent, and follow task execution. It ran for 60+ hours without issue.