Stack problem in THUMB mode with GCC 4.1.0

nobody wrote on Tuesday, October 24, 2006:

[moved from tracker item]

I am using CrossWorks 1.6 build 3 (i.e. CodeSourcery GCC
4.1.0) and FreeRTOS 4.1.1, and I am compiling my project
in Thumb mode. The problem occurs in small functions
without local variables. The following code is generated
(interleaved view from the CrossWorks debugger):

01: StatModeInfo *GetStatModeInfo(void)
02: {
03: B580 push {r7, lr}
04: AF02 add r7, sp, #0x008
05: return( &g_StatModeInfo );
06: 4B03 ldr r3, [pc, #0x00c]
07: }//StatModeInfo *GetStatModeInfo(void)
08: 1C18 adds r0, r3, #0x0
09: 46BD mov sp, r7
10: B082 sub sp, #0x008
11: BC80 pop {r7}
12: BC02 pop {r1}
13: 4708 bx r1

If a timer interrupt (and therefore a context switch)
occurs after line 9 has executed but before line 10, the
context of the current task is copied to the stack. But
SP is wrong at that moment! It points 8 bytes above the
saved r7 and lr, so the return address is overwritten
with other register contents.

The problem does not occur in ARM mode (at least I have
not seen it there), and it does not occur with GCC 3.4.4
(shipped with CrossWorks 1.5 build 2). Maybe this is not
a bug and should be treated as an incompatibility
between FreeRTOS 4.1.1 and CodeSourcery GCC 4.1.0?

René

rtel wrote on Tuesday, October 24, 2006:

Thanks for the information. Can you provide the compiler switches you used when this code was compiled (the optimisation level in particular)? Thanks.

Regards.

nobody wrote on Tuesday, October 24, 2006:

On line 4, why is 8 added to the stack pointer? Should it not be subtracted for a downward-growing stack?

nobody wrote on Tuesday, October 24, 2006:

Optimization is off.

The compiler switches for the file containing GetStatModeInfo() are:
-mlittle-endian -Wmissing-prototypes -Wstrict-prototypes -Wimplicit-function-declaration -Wunused-variable -gdwarf-2 -march=armv4t -mthumb -mthumb-interwork -mlittle-endian -fno-builtin -msoft-float -mfpu=vfp

I have already got a hint from the CrossWorks helpdesk: with the compiler switch -fomit-frame-pointer the code works. But I don’t fully understand what this switch does. I am not sure what consequences it has for the generated code, so I am not sure whether it solves the problem in every case.

René

rtel wrote on Tuesday, October 24, 2006:

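Because 8 bytes (r7 and lr) have already been pushed onto the stack. The instruction does not change SP; it stores SP + 8, the stack position on entry, in r7 (the frame pointer) so that it can be used when the function exits.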

Regards.

nobody wrote on Tuesday, October 24, 2006:

There has been some discussion in this forum before about using -fomit-frame-pointer with WinARM. I think this is a bug in GCC. Any interrupt code that switches to System mode, and therefore uses the interrupted task’s stack, will have this problem. It is not just FreeRTOS: all the ST standard interrupt stubs do this too.

rtel wrote on Tuesday, October 24, 2006:

Having discussed this with people who know a lot more about GCC than I do, the consensus is that this is a GCC bug that users need to be aware of.

Regards.

anonymous wrote on Tuesday, November 07, 2006:

-fomit-frame-pointer does the trick, but unfortunately it prevents the debugger (Rowley CrossWorks) from unwinding the stack properly.

GCC 4.1.1 does not fix the problem.

So I am curious to know what people are doing. The choices seem to be to use ARM mode (I haven’t yet verified that this even works) or to live with a crippled debugging environment. Are there other suggestions?

anonymous wrote on Tuesday, November 07, 2006:

Ah. What I was forgetting was that FreeRTOS could be ‘fixed’.
The patch is quite innocuous (little overhead, and it still
works in the absence of the GCC bug).

The bug is that in Thumb mode GCC can emit code that, on the
return from certain functions, briefly leaves the stack
pointer 8 bytes above data that is still in use (the saved r7
and return address). If a context switch occurs at that
point, the task registers are saved on the stack and clobber
that data.

So the fix is to always skip 8 stack bytes before pushing the
task context. I know this is not ideal, but it is the best
alternative for me right now.

In the portSAVE_CONTEXT() macro of portmacro.h, just subtract
8 from the stack pointer (actually held in R0 at that point),
just before the return address is pushed. Some of the
surrounding code has been removed in the following snippet:

/*gb - step over possible GCC stacked items (GCC bug)*/
"SUB    R0, R0, #8   …

/* Push the return address onto the stack. */
"STMDB    R0!, {LR}    …

I hope it is never more than 8 bytes! My testing hasn’t been
very thorough, but I don’t think I have broken anything.

nobody wrote on Tuesday, November 07, 2006:

Thanks for the fix, but be aware that this is a problem for any code that uses the user stack from an ISR, and any third-party software might do this. I think this is a big bug in GCC that needs fixing.

Hello everybody.
A colleague of mine came across this post (quite old, to tell the truth), and I’m wondering whether the situation described here is still a concern.
GCC is now at 8.2.0, and we are investigating subtle problems in a project using FreeRTOS on the Xilinx Zynq, which is a Cortex-A9.
The toolchain from Xilinx uses the GCC compiler.

Thank you,
Alberto

I’m not sure, as it is quite an old post.

Can you describe the problem that you are facing?

Difficult to say. The system (the firmware is quite complete) runs for days and then suddenly, at random, stops or crashes with what looks like memory corruption. Of course this can have many causes. Having noticed this old topic, we are trying to understand whether this “incompatibility” still needs to be considered today or whether it has been fixed in the meantime.

The issue in the sequence above is that SP is moved above the saved registers before they are popped off the stack. I am not sure whether this has been fixed in GCC. You can check the generated assembly to see whether that sequence appears.

We get crashes inside Thumb code, or FreeRTOS gets stuck in the endless loop in vListInsert().
The situation is that we compile the project as ARM code, but the newlib libc from Xilinx is compiled as Thumb code. So we have a mix.

Since I made a change that stores the task context at an offset from the stack pointer, these problems haven’t been seen anymore.

I haven’t found the sequence in the generated assembly, but it isn’t easy to find in 20 MB of generated assembly.

Here you can find my change:
ARM_CA9.ZIP (13.3 KB)

Maybe the issue is that the Xilinx newlib was built with an “old” compiler that has this bug. Is it possible for you to update your tools and see whether the problem goes away without modifying the port code? The problem with modifying the port code is that it fixes the issue only for the FreeRTOS interrupt and not for other interrupts that use the user stack.

I’d suggest not making a change until you have determined that this is the cause of the problem, because you may have just masked the crash while the real problem is still there.