FreeRTOS 6.1.0 fault on STM32F4 Discovery using IAR 6.7

steverino2 wrote on Monday, January 13, 2014:

Hi all, I have a problem running FreeRTOS on an STM32F4 Discovery target using the IAR EWARM toolchain.
I'd appreciate any ideas on how to fix it.

The project targets the STM32F4 Discovery and uses FreeRTOS 6.1.0 with lwIP 1.3.2.
It builds and runs OK with IAR EWARM 6.6.
But when I build the project with IAR EWARM 6.7, it hard faults in vTaskSwitchContext.

I tried increasing the heap size, but that did not help.
Could this be caused by the code that IAR EWARM 6.7 generates?

I added a hard fault handler to output a register dump:

[Hard fault handler]
R0 = a5a5a5a5
R1 = a5a5a5a5
R2 = 20014dcc
R3 = 200150e4
R12 = a5a5a5a5
LR [R14] = 080125e3 function call return address
PC [R15] = 08011974 program counter
PSR = a100000e
BFAR = a5a5a5b1
CFSR = 00008200
HFSR = 40000000
DFSR = 0000000b
AFSR = 00000000
SCB_SHCSR = 00000400
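
The status registers in this dump can be decoded by hand. The following is a minimal, host-runnable sketch, assuming the ARMv7-M bit layout for CFSR and HFSR (bus fault bits in CFSR bits 8 to 15, FORCED in HFSR bit 30). The function names are hypothetical and are not part of the original project.

```c
#include <stdint.h>
#include <stdio.h>

/* Bit positions assumed from the ARMv7-M fault status register layout. */
#define CFSR_IBUSERR     ((uint32_t)1 << 8)   /* bus error on instruction fetch   */
#define CFSR_PRECISERR   ((uint32_t)1 << 9)   /* precise data bus error           */
#define CFSR_IMPRECISERR ((uint32_t)1 << 10)  /* imprecise data bus error         */
#define CFSR_BFARVALID   ((uint32_t)1 << 15)  /* BFAR holds the faulting address  */
#define HFSR_FORCED      ((uint32_t)1 << 30)  /* configurable fault was escalated */

/* Returns 1 when the dump describes a precise data bus error with a valid
   BFAR that was escalated to a hard fault, otherwise 0. */
int is_escalated_precise_bus_fault(uint32_t cfsr, uint32_t hfsr)
{
    return (hfsr & HFSR_FORCED) != 0 &&
           (cfsr & CFSR_PRECISERR) != 0 &&
           (cfsr & CFSR_BFARVALID) != 0;
}

/* Prints a one-line summary of the dump (hypothetical helper). */
void print_fault_summary(uint32_t cfsr, uint32_t hfsr, uint32_t bfar)
{
    if (is_escalated_precise_bus_fault(cfsr, hfsr)) {
        printf("precise bus fault at address %08lx\n", (unsigned long)bfar);
    } else {
        printf("other fault: CFSR=%08lx HFSR=%08lx\n",
               (unsigned long)cfsr, (unsigned long)hfsr);
    }
}
```

With the values above (CFSR = 00008200, HFSR = 40000000), the check reports a precise data bus error, so BFAR = a5a5a5b1 is the address whose access faulted.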

In the link map, the faulting PC (08011974) falls inside vTaskSwitchContext,
and the return address in LR (080125e3) falls inside the portasm.o code section:

CODE ro code 0x080125c0 0x88 portasm.o [1]

vTaskIncrementTick 0x08011853 0xc2 Code Gb tasks.o [1]
vTaskSwitchContext 0x08011915 0xe8 Code Gb tasks.o [1]
vTaskPlaceOnEventList 0x080119ff 0x5e Code Gb tasks.o [1]
noname 0x08011a5d 0x6c Code Gb tasks.o [1]

edwards3 wrote on Monday, January 13, 2014:

But your FreeRTOS/lwIP project has not changed? Only the compiler version. I can’t see anything in the release notes that looks like it could cause any issues. Have you tried with compiler optimization turned completely off?

onramp123 wrote on Thursday, February 06, 2014:

Hi, thanks for the recommendation, but we already tried turning off the optimizations.
It's a little perplexing. If the software has not changed, only IAR EWARM 6.6 to 6.7,
could the generated code differ in some way, such as ARM vs. Thumb?

rtel wrote on Thursday, February 06, 2014:

I think I have probably used all the IAR versions produced in the last 10 years, and only once has anything needed to be changed to move from one version to another, and that was (I think) moving from 4.x to 5.x versions where the linker script format changed (and a few other things). In that case the code simply didn’t build until you made the linker script changes. Actually, I think there was another time when C99 became the default, which changed the way inline was used - try setting the compiler option to use C89 instead of C99.

Other than that I’m not sure what to suggest. You could look at the map files produced by the two versions to check the memory layout is the same. Also, look at the project options to see what is being referenced from the IAR installation directory (which has changed) rather than your project directory (which has not changed). Things like the linker script, and possibly macros that are executed by the debugger when it loads and runs your program, will not be local to your project and may be the key to the difference.

Regards.

steverino2 wrote on Tuesday, December 09, 2014:

Hi, updating this issue with new information.

Starting with only the LED task (in other words, blinky),
I found the hard fault happens as a result of an illegal pointer dereference
in vTaskSwitchContext(), in the listGET_OWNER_OF_NEXT_ENTRY() macro.

I wrote a function version of the listGET_OWNER_OF_NEXT_ENTRY() macro
so I could breakpoint it, and found that after

( pxConstList )->pxIndex = ( pxConstList )->pxIndex->pxNext;

pxNext is 0xA5A5A5A5, and the
task list code then dereferences that pointer value of 0xA5A5A5A5,
which triggers the hard fault.
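
The BFAR value in the earlier register dump (a5a5a5b1) is consistent with this finding: it is 0xA5A5A5A5 plus 12, which would be the offset of the owner field in a list item. The sketch below checks that arithmetic on a host machine. The struct is a hypothetical 32-bit mirror of the list item layout in list.h (value, next, previous, owner, container), assuming a 32-bit tick type; it is not the FreeRTOS definition itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a list item as laid out on a 32-bit target.
   Every field is a uint32_t so the offsets are the same on any host. */
typedef struct {
    uint32_t xItemValue;   /* assumed 32-bit tick type        */
    uint32_t pxNext;       /* pointer to next item            */
    uint32_t pxPrevious;   /* pointer to previous item        */
    uint32_t pvOwner;      /* pointer to the owning TCB       */
    uint32_t pvContainer;  /* pointer to the containing list  */
} ListItemLayout32;

/* Address the CPU reads when it fetches the owner of the item located
   at item_address. */
uint32_t owner_read_address(uint32_t item_address)
{
    return item_address + (uint32_t)offsetof(ListItemLayout32, pvOwner);
}
```

Calling owner_read_address(0xA5A5A5A5) gives 0xA5A5A5B1, matching the reported BFAR, which suggests the fault occurs on the owner read that follows the pxIndex update.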

I added a patch to validate the pointer and print an error if it is invalid.

Now the system avoids the hard fault and continues to run with this
workaround, but the scheduler is dysfunctional.
In a second test, I enabled the lwIP task (along with the LED task), but IP ping
doesn’t work.

Again this is only the result of a change in IAR tools from 6.6 to 7.x

I downloaded FreeRTOS 8.1.2 and it looks like the list.h macro has the same vulnerability.

rtel wrote on Tuesday, December 09, 2014:

> Starting with only the LED task (in other words, blinky),
> I found the hard fault happens as a result of an illegal pointer dereference
> in vTaskSwitchContext(), in the listGET_OWNER_OF_NEXT_ENTRY() macro.

So you have one task, and that task is doing nothing other than blinking
an LED, and you get this problem? Is that correct? If so it is highly
unlikely, but not impossible, that this is a FreeRTOS issue, especially
as there has been nearly two years since the previous conversation, and
there have been several different versions of FreeRTOS in the meantime.

Is your system using any interrupts?
Do you have configASSERT() defined to catch misconfigurations?
Is your start up code and linker script correct?
Do you have stack overflow detection turned on?

> I wrote a function version of the listGET_OWNER_OF_NEXT_ENTRY() macro
> so I could breakpoint it, and found that after

> ( pxConstList )->pxIndex = ( pxConstList )->pxIndex->pxNext;

> pxNext is 0xA5A5A5A5, and the
> task list code then dereferences that pointer value of 0xA5A5A5A5,
> which triggers the hard fault.

This would indicate a stack problem as 0xa5 is the stack fill byte - if
this value is getting into a pointer then somewhere the stack has become
corrupt.
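
As an illustration of the fill-byte idea, the sketch below scans a stack buffer for untouched 0xA5 bytes and flags a word that looks like raw fill. It is a host-runnable approximation with hypothetical function names, assuming a stack that grows downward toward the start of the buffer; it is not the FreeRTOS implementation.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_FILL_BYTE 0xA5u

/* Counts the fill bytes still intact at the low end of a stack buffer.
   With a downward-growing stack this is the space that was never used. */
size_t unused_stack_bytes(const uint8_t *stack_low, size_t size)
{
    size_t n = 0;
    while (n < size && stack_low[n] == STACK_FILL_BYTE) {
        n++;
    }
    return n;
}

/* Returns 1 when a 32-bit value consists entirely of fill bytes, which
   is a hint that it was read from untouched stack memory. */
int looks_like_stack_fill(uint32_t value)
{
    return value == 0xA5A5A5A5u;
}
```

A result of zero from unused_stack_bytes() would mean the stack has been used right down to its limit, and a pointer for which looks_like_stack_fill() returns 1 almost certainly never held a real address.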

Are you using any code that is not generated by the compiler?

> I added a patch to validate the pointer and print an error if it is invalid.

> Now the system avoids the hard fault and continues to run with this
> workaround, but the scheduler is dysfunctional.
> In a second test, I enabled the lwIP task (along with the LED task), but IP ping
> doesn’t work.

> Again this is only the result of a change in IAR tools from 6.6 to 7.x

Have you looked through the change history provided by IAR to see what
the differences are - do they provide an upgrading guide? I use IAR
regularly, as do lots of other people, and don’t see any issue.

> I downloaded FreeRTOS 8.1.2 and it looks like the list.h macro has the same
> vulnerability.

Vulnerability? I think the system state when the macro is called
will be the root cause, not the macro, unless you know of a specific
vulnerability that is not clear from your post.

Regards.

steverino2 wrote on Thursday, December 11, 2014:

I added stack overflow and assert checks, along with an assert handler:
configCHECK_FOR_STACK_OVERFLOW
configASSERT
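
For reference, these two options are normally set in FreeRTOSConfig.h. The fragment below is a sketch only, assuming a FreeRTOS version that supports both options; the exact settings used in this project were not posted.

```c
/* FreeRTOSConfig.h fragment (sketch, not the poster's actual settings). */

/* 2 selects the check that looks for the fill pattern at the end of the
   task stack. The application must also provide the
   vApplicationStackOverflowHook() function; its signature depends on the
   FreeRTOS version. */
#define configCHECK_FOR_STACK_OVERFLOW 2

/* Halt with interrupts disabled when an assertion fails, so a debugger
   can show where it happened. */
#define configASSERT( x ) if( ( x ) == 0 ) { taskDISABLE_INTERRUPTS(); for( ;; ); }
```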

But I didn’t isolate the problem until I went back to the old IAR 6.6 project backup
and reviewed the project settings again, side by side.

It turns out that IAR, for some reason, selected FPU VFPv4
when the project was opened in IAR 7.3.
The emitted code then included something not supported by the CM4.
At least this is what I can surmise.
So all bets are off at run time.

In Project > Options > General Options, Target tab, set FPU to None,
and the problem is solved.
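
One way to catch this kind of silent settings change at build time is a preprocessor guard. The sketch below assumes that the IAR compiler predefines __ARMVFP__ only when an FPU is selected in the project options; PROJECT_EXPECTS_NO_FPU and BUILD_HAS_FPU_CODEGEN are hypothetical names introduced for illustration.

```c
/* Assumption: __ARMVFP__ is predefined by the IAR compiler only when an
   FPU is selected; other compilers leave it undefined. */
#if defined(__ARMVFP__)
#define BUILD_HAS_FPU_CODEGEN 1
#else
#define BUILD_HAS_FPU_CODEGEN 0
#endif

/* Hypothetical project-level switch: this project is built without FPU. */
#define PROJECT_EXPECTS_NO_FPU 1

/* Stop the build if the toolchain settings disagree with the project. */
#if PROJECT_EXPECTS_NO_FPU && BUILD_HAS_FPU_CODEGEN
#error "FPU code generation is enabled; set General Options > Target > FPU to None"
#endif
```

Placed in a header that every source file includes, a guard like this turns an unexpected FPU selection into a compile error instead of a run-time fault.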

Thanks for the many constructive ideas and help to find it.