Stack alignment in the ARM Cortex-A9 portASM.S FreeRTOS_IRQ_Handler

The ARM EABI used by GCC requires the stack to be 8-byte aligned when a C function is called. When I check FreeRTOS_IRQ_Handler in the Cortex-A9 portASM.S, this alignment is done before the call to vApplicationIRQHandler.
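The rule can be written as two checks on the stack pointer. This is a minimal C sketch, assuming the AAPCS wording (SP must always be 4-byte aligned, and 8-byte aligned at public interfaces such as a call into C); the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* AAPCS, the procedure call standard behind the ARM EABI:
 * - SP must be word (4-byte) aligned at all times;
 * - SP must be doubleword (8-byte) aligned at every public interface,
 *   for example when assembly code calls a C function. */
static bool sp_is_word_aligned(uintptr_t sp)
{
    /* Bits [1:0] clear: a legal SP anywhere inside a function. */
    return (sp & 0x3u) == 0u;
}

static bool sp_is_call_aligned(uintptr_t sp)
{
    /* Bits [2:0] clear: required at the moment a C function is called. */
    return (sp & 0x7u) == 0u;
}
```

For example, an SP of 0x20000FFC is legal in the middle of a function, but it has to be moved down to 0x20000FF8 before a C function is called.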

Why isn’t this done before the call to vTaskSwitchContext()? Is there a reason?

In my view, it should be done like this:


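/* 8-byte stack alignment before calling a C function */
MOV r2, sp          /* r2 = current stack pointer */
AND r2, r2, #4      /* r2 = 4 if SP is only 4-byte aligned, otherwise 0 */
SUB sp, sp, r2      /* move SP down to the next 8-byte boundary */

LDR R0, vTaskSwitchContextConst
BLX R0              /* call vTaskSwitchContext() */
/* The alignment does not need to be restored afterwards, because a new task stack is loaded after this call in any case. */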
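The effect of the MOV/AND/SUB sequence can be modelled in C. This is a sketch, assuming SP is already word aligned (as AAPCS requires at all times); AlignResult, align_sp_for_call() and undo_alignment() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t sp;         /* stack pointer after alignment */
    uint32_t adjustment; /* value left in r2: 0 or 4 */
} AlignResult;

/* Models:  MOV r2, sp / AND r2, r2, #4 / SUB sp, sp, r2
 * With a word-aligned SP, bit 2 is the only bit that can break 8-byte
 * alignment, so subtracting it lands on the next lower 8-byte boundary.
 * The stack grows downwards, so this never overwrites live data. */
static AlignResult align_sp_for_call(uint32_t sp)
{
    AlignResult r;
    r.adjustment = sp & 4u;
    r.sp = sp - r.adjustment;
    return r;
}

/* Models adding r2 back after the call returns; only needed when the
 * same stack keeps being used after the call. */
static uint32_t undo_alignment(AlignResult r)
{
    return r.sp + r.adjustment;
}
```

Before vTaskSwitchContext() the undo step can be skipped, because the task's saved stack pointer is replaced when the next task's context is restored.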

I think this is because vTaskSwitchContext() is known code, and it’s known that nothing in there requires 8-byte alignment (no floating point, for example), whereas the IRQ handler is unknown code, as IRQ handlers are written by the application writer. It’s a good question though, so I will ask somebody to have a look.

But in the end, we don’t know what code the different compilers will generate, do we?

Or, if someone configures the traceTASK_SWITCHED_IN / traceTASK_SWITCHED_OUT macros to call memcpy(), floating-point registers may be used inside.
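Both trace macros are expanded inside vTaskSwitchContext(), so whatever they call runs on the same stack, with the same alignment, as vTaskSwitchContext() itself. A sketch of such a hook, assuming a FreeRTOSConfig.h-style definition; TraceRecord, trace_record(), fake_tick and fake_switch_context() are hypothetical names, the last one standing in for the kernel function:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical trace record; the double gives it 8-byte alignment on the EABI. */
typedef struct {
    uint32_t tick;
    uint32_t task_number;
    double   timestamp_us;
} TraceRecord;

#define TRACE_LOG_LEN 16u
static TraceRecord trace_log[TRACE_LOG_LEN];
static uint32_t trace_count;

static void trace_record(uint32_t tick, uint32_t task_number)
{
    /* Double arithmetic uses the VFP unit when built with -mfloat-abi=hard. */
    TraceRecord rec = { tick, task_number, tick * 1000.0 };
    /* Depending on the build, memcpy() may be inlined or may be a library
       version that uses the wide VFP/NEON registers. */
    memcpy(&trace_log[trace_count % TRACE_LOG_LEN], &rec, sizeof rec);
    trace_count++;
}

static uint32_t fake_tick;

/* As they might appear in FreeRTOSConfig.h. */
#define traceTASK_SWITCHED_OUT() trace_record(fake_tick, 0u)
#define traceTASK_SWITCHED_IN()  trace_record(fake_tick, 1u)

/* Stand-in for vTaskSwitchContext(): the kernel calls the "out" hook,
   selects the next task, then calls the "in" hook. */
static void fake_switch_context(void)
{
    traceTASK_SWITCHED_OUT();
    fake_tick++;              /* placeholder for selecting the next task */
    traceTASK_SWITCHED_IN();
}
```

Any alignment assumptions the compiler makes inside trace_record() therefore depend on the stack pointer that vTaskSwitchContext() was called with.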

I haven’t done a deep dive on this yet, but I’m looking into it. It might be that, provided the stack is aligned correctly in the first place, neither of the pre-function-call alignments is actually needed.

In the case of user-defined macros, it might be possible to do the alignment inside the macro itself, if alignment is needed.

I should also ask: are you actually having an issue with the code? If you can explain the problem you are having, we could discuss that directly.

I’m not sure I understand your issue (I probably don’t). From a task’s point of view, a context switch is not a call; it’s a state freeze followed by a later state restore. It’s perfectly transparent to the task, so if the stack was ABI compliant before the freeze, it will still be ABI compliant after the restore, and if it wasn’t, it won’t be. There is no point in the OS trying to “fix” non-ABI-compliant stack usage; that’s not what an OS is supposed to do.

Again, I probably don’t quite understand your point. Please elaborate.

This is to do with maintaining ABI compliance when mixing assembly code and C code, that is, ensuring the stack is correctly aligned after it has been manipulated manually by assembly code and before calling a C function that was compiled on the assumption of ABI compliance.

We use the ARM Cortex-A9 in the Xilinx Zynq-7000. We have a problem with incorrect context switches: sometimes the stack is shifted. We can see this when, for example, register R10 holds the value 0909090909; in other cases, the pxCurrentTCB pointer is 0x5a5a5a5a.
Sometimes the CPU gets completely lost. We see call stacks that are impossible, i.e. call sequences that could never really have happened.

I have tried many code changes. After I added the stack alignment here, the system started running for days.

Yes, the context switch is only a state freeze. But in this case, the following registers are pushed onto the task’s stack:
PC, CPSR, R0-R12, R14, the critical nesting count, the 32 FPU double-word registers, and the FPU status register. Then vTaskSwitchContext() is called.
This code is in the interrupt handler, and the interrupt can occur at any position in the task’s code, including positions where the stack is not 8-byte aligned.
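Adding up that list shows why the alignment at the moment of the interrupt carries through to the call. A rough C sketch that counts only the items listed above (the real context-save code in portASM.S may store additional words, which would change the arithmetic); saved_context, saved_context_bytes() and sp_after_save() are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const char *name;
    uint32_t    bytes;
} SavedItem;

/* The items listed above, in bytes (32-bit core registers, 64-bit D registers). */
static const SavedItem saved_context[] = {
    { "PC",               4u },
    { "CPSR",             4u },
    { "R0-R12",          13u * 4u },
    { "R14",              4u },
    { "critical nesting", 4u },
    { "D0-D31",          32u * 8u },
    { "FPSCR",            4u },
};

static uint32_t saved_context_bytes(void)
{
    uint32_t total = 0u;
    for (size_t i = 0; i < sizeof saved_context / sizeof saved_context[0]; i++) {
        total += saved_context[i].bytes;
    }
    return total;
}

/* SP after the context has been pushed onto the (descending) task stack. */
static uint32_t sp_after_save(uint32_t sp_at_interrupt)
{
    return sp_at_interrupt - saved_context_bytes();
}
```

The listed items total 328 bytes, a multiple of 8, so saving them leaves the stack pointer’s alignment unchanged: a task interrupted while its SP was 4 mod 8 (which AAPCS permits between calls) would enter vTaskSwitchContext() with an SP that is still 4 mod 8.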

Obviously, the best way to pinpoint your problem would be a deep trace. Does your tool chain support that?

I’d be grateful if you could let me know which compiler you are using (presumably Linaro GCC; which version?) and the compiler command line in use. Thanks.

Are you using floating-point registers in your interrupt service routines? That may happen unknowingly, depending on the libraries in use, as some use the wide floating-point registers to optimise things like memcpy(). We work around that by providing our own versions of those library functions.

Are your interrupt service routines nesting?

Yes, the interrupts are nested and the FPU registers are used in the ISRs.

But we use vApplicationFPUSafeIRQHandler(), so the FPU registers are saved on ISR entry, and nested ISRs are the default in the FreeRTOS Cortex-A9 port.
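For reference, a minimal sketch of such a handler. Only the name and uint32_t parameter of vApplicationFPUSafeIRQHandler() are taken from the FreeRTOS Cortex-A9 port; the handler table, NUM_IRQ_IDS, install_irq_handler() and timer_irq() are hypothetical stand-ins for what a Zynq BSP’s GIC driver provides:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*IrqHandler)(void *context);

typedef struct {
    IrqHandler handler;
    void      *context;
} IrqEntry;

#define NUM_IRQ_IDS 96u   /* hypothetical; use the BSP's own limit */
static IrqEntry irq_table[NUM_IRQ_IDS];

/* Hypothetical helper to fill the dispatch table. */
static bool install_irq_handler(uint32_t id, IrqHandler handler, void *context)
{
    if (id >= NUM_IRQ_IDS) {
        return false;
    }
    irq_table[id].handler = handler;
    irq_table[id].context = context;
    return true;
}

/* Expected to be called by the port's default (weak) vApplicationIRQHandler()
   after it has saved the FPU registers, so the handlers dispatched from here
   may use floating point. */
void vApplicationFPUSafeIRQHandler(uint32_t ulICCIAR)
{
    /* The GIC interrupt ID is in bits [9:0] of the ICCIAR value. */
    uint32_t id = ulICCIAR & 0x3FFu;

    if (id < NUM_IRQ_IDS && irq_table[id].handler != NULL) {
        irq_table[id].handler(irq_table[id].context);
    }
}

/* Example handler (hypothetical): counts how often it runs. */
static uint32_t timer_ticks;

static void timer_irq(void *context)
{
    (void)context;
    timer_ticks++;
}
```

Saving the FPU registers here protects the interrupted code’s FPU state; it does not by itself change the stack alignment the handler runs on.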

In the past I already tried overriding memcpy(), as the FreeRTOS demo implementation for the Zynq-7000 does. But this didn’t really change the result; it only took longer until the crash happened.

We use the GCC compiler (arm-none-eabi-gcc) with the following flags:
-c -fmessage-length=0 -MT"$@" -mcpu=cortex-a9 -mfpu=vfpv3 -mfloat-abi=hard -fsigned-char -Wno-char-subscripts

I don’t know which version of GCC it is; it is the version included in the Xilinx SDK 2019.1 installation.

I’m now trying to create a Vitis project to test this, as the project in the download uses the older SDK. Unfortunately, it is not playing nicely on Windows: the Vitis IDE started a few times, but now I can’t get it to run at all. Still working on it.

Finally have everything compiling in Vitis…

In my experience, to reproduce the problem faster you need a higher interrupt load, and the interrupts shouldn’t only interrupt the idle task.

I have also noticed that Xilinx uses GNU newlib, so the configUSE_NEWLIB_REENTRANT setting and the associated wrapper are needed, as in this third-party link:

Try the attached file.

I changed the order of pushes and pops very slightly so that 8-byte registers are only pushed when the stack is on an 8-byte boundary, although from what I can see in the hardware manual I don’t think that actually makes a difference.

I also added instructions to ensure 8-byte alignment in two additional places. I don’t think the pre-existing place where this was done is actually needed, as the stack naturally falls on an 8-byte boundary there anyway, but I left it in all the same.
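The reasoning behind reordering can be checked with a small model that walks a sequence of PUSH/VPUSH steps down a full-descending stack and tests whether each VPUSH stores its D registers at an 8-byte-aligned address. Both orderings below are hypothetical illustrations, not the contents of the attached file:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One PUSH or VPUSH: how many bytes it stores, and whether it stores D registers. */
typedef struct {
    uint32_t bytes;
    bool     is_vpush;
} PushStep;

/* True if every VPUSH in the sequence starts on an 8-byte boundary.
   Full-descending stack: SP is decremented, then the block is stored at the
   new SP, so the D registers sit at SP, SP + 8, ... */
static bool vpushes_aligned(uint32_t sp, const PushStep *steps, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        sp -= steps[i].bytes;
        if (steps[i].is_vpush && (sp & 7u) != 0u) {
            return false;
        }
    }
    return true;
}

/* A single word pushed before the D registers... */
static const PushStep word_then_doubles[] = {
    { 4u,       false },   /* PUSH  {LR}      */
    { 16u * 8u, true  },   /* VPUSH {D0-D15}  */
    { 16u * 8u, true  },   /* VPUSH {D16-D31} */
    { 4u,       false },   /* PUSH  {R1}      */
};

/* ...versus two words pushed first. */
static const PushStep pair_then_doubles[] = {
    { 8u,       false },   /* PUSH  {R1, LR}  */
    { 16u * 8u, true  },   /* VPUSH {D0-D15}  */
    { 16u * 8u, true  },   /* VPUSH {D16-D31} */
};
```

Starting from an 8-byte-aligned SP, the single word leaves the D registers 4 bytes off, while the pair keeps them aligned; starting from an SP that is 4 mod 8, the results flip. So reordering only helps when the entry alignment is known, which is what explicit alignment instructions guarantee (VPUSH itself may only require word alignment, as noted above).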

Please report back so I know if this is improving, making worse, or having no effect.

[edit: added the file to a zip file, as a .S file was not permitted as an attachment]

I will run a test and let you know whether this changes anything or not. But I don’t think it will, because the interrupt can still interrupt the task at a position where the stack isn’t on an 8-byte boundary, and in my test, pushing the FPU registers worked on a stack that was not on an 8-byte boundary.

I’ll let you know when I have the result.

I have tested it and it works. But since removing all the test code and running the full project again, I still have some problems. I need some time to test and analyse those other problems. But this particular problem is solved by this change.

I made some changes:

  • FreeRTOS_SWI_Handler: The alignment adjustment doesn’t need to be pushed onto the stack and restored after vTaskSwitchContext(), because a new stack is loaded in any case.

  • vApplicationIRQHandler: I changed the order of the register pushes, as you also did at this position. For me, this has no noticeable effect, but it doesn’t break anything either.

Our problem turned out to be caused by an issue in our hardware. It now runs without this change.
Thanks for the support.

Thanks for taking the time to report back.