Finding configTOTAL_HEAP_SIZE Maximum Value

spflanze wrote on Wednesday, March 09, 2016:

Processor: STM32F373VC
IDE: Ac6 System Workbench
The job was created by STM32CubeMX with the FREERTOS option enabled.

Am I correct in my understanding that pvPortMalloc() will allocate from the FREERTOS Heap, the size of which is specified by configTOTAL_HEAP_SIZE? And that malloc(), which is not thread safe, will allocate outside the FREERTOS Heap in the C/C++ Heap?

If I set configTOTAL_HEAP_SIZE to the processor’s total amount of RAM, which is roughly 32000 bytes (the part has 32 KB), and then subtract the overflow reported in the linker’s error message to get the next value of configTOTAL_HEAP_SIZE, I will have compiled using the maximum value configTOTAL_HEAP_SIZE can be. But will the result be reliable? Will it ever be the case that, while the firmware is running, a larger C/C++ heap is needed?

What is the C/C++ Heap used for? Is any of it dynamically allocated beyond the Linker’s awareness?

The default value of configTOTAL_HEAP_SIZE was 10000. I find that if I go over this I get a HardFault() call when osKernelStart() is called.

edwards3 wrote on Thursday, March 10, 2016:

> Am I correct in my understanding that pvPortMalloc() will allocate from the FREERTOS Heap, the size of which is specified by configTOTAL_HEAP_SIZE? And that malloc(), which is not thread safe, will allocate outside the FREERTOS Heap in the C/C++ Heap?

Yes. See the FreeRTOS documentation page "Memory management options for the FreeRTOS small footprint, professional grade, real time kernel (scheduler)".

> If I set configTOTAL_HEAP_SIZE to the processor’s total amount of RAM, which is 32000, and then subtract the overflow reported in the linker’s error message to get the next value of configTOTAL_HEAP_SIZE, I will have compiled using the maximum value configTOTAL_HEAP_SIZE can be. But will the result be reliable? Will it ever be the case that, while the firmware is running, a larger C/C++ heap is needed?

Only the firmware writer knows that. In any bare-metal or RTOS system, if the amount of RAM you try to allocate with malloc() is more than the RAM available, the allocation will fail or corrupt something.

What is osKernelStart?

spflanze wrote on Thursday, March 10, 2016:

In the firmware I wrote, all RAM allocations are done with pvPortMalloc(), which is necessary for thread safety when using FREERTOS. These allocations come from the FREERTOS heap, which is sized by configTOTAL_HEAP_SIZE.

There is a lot of other software in the drivers that STM32CubeMX included in the project. A search for malloc and realloc in them did not get any hits.

osKernelStart() appears in the STM32CubeMX created file main.c where it is inside main(). It is this call that starts up and runs FREERTOS. After this call the firmware runs in threads (tasks in FREERTOS parlance). If osKernelStart() works right it never returns. It runs until processor reset, or power down.

When I set configTOTAL_HEAP_SIZE to 32000 I get this compile time error:

region ‘RAM’ overflowed by 7416 bytes TEC Driver SW4STM32 Configuration

The amount of RAM in hardware minus the overflow minus the configTOTAL_HEAP_SIZE value is 32000 - 7416 - 10000 = 14584. That is 14584 bytes of RAM I cannot account for, and need to use if it isn’t in use elsewhere. If I try to use it by increasing configTOTAL_HEAP_SIZE by any significant amount over 10000, I get a call to HardFault().
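A worked version of the heap-size arithmetic in this post, as a small host-side C sketch (the helper names are hypothetical, not part of FreeRTOS or the ST tools). Because the linker measures the overflow against the same RAM region, 32000 - 7416 = 24584 is the largest heap that links, and subtracting the current 10000 leaves 14584 bytes. Using the region's real length of 32K (32768 bytes), everything else the linker places in RAM takes 32768 - 24584 = 8184 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the largest configTOTAL_HEAP_SIZE that still links.
   The overflow is reported against the same RAM region the tried value
   overflowed, so removing exactly that many bytes fills the region with
   nothing to spare. */
static uint32_t max_linkable_heap(uint32_t tried_heap_size, uint32_t overflow_bytes)
{
    assert(overflow_bytes <= tried_heap_size);
    return tried_heap_size - overflow_bytes;
}

/* Hypothetical helper: RAM the linker still considers free at the current
   setting. RAM the linker does not know about (for example, stack space it
   does not reserve) is NOT accounted for here. */
static uint32_t spare_linker_ram(uint32_t max_heap, uint32_t current_heap)
{
    assert(current_heap <= max_heap);
    return max_heap - current_heap;
}
```

The result is only as good as the linker's view of RAM: anything it does not allocate is invisible to this calculation.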

rtel wrote on Thursday, March 10, 2016:

> There is a lot of other software in the drivers that STM32CubeMX
> included in the project. A search for malloc and realloc in them did not
> get any hits.

A simple way to see if malloc() is being called anywhere is to define
your own implementation, then set a break point in the implementation,
or otherwise do something like put an infinite loop in the
implementation to see if it ever gets called.

If it does get called then you can re-direct it to pvPortMalloc(), or
simply call pvPortMalloc() from your own implementation:

#include <stddef.h>

void* malloc( size_t size )
{
     /* If this function gets called it will get stuck here; halting
        the debugger then shows the caller in the call stack. */
     ( void ) size;
     for(;;);
}

If malloc() is never called then there is no point allocating any RAM to
the heap that is set up by your linker.

> When I set configTOTAL_HEAP_SIZE to 32000 I get this compile time error:
>
> region 'RAM' overflowed by 7416 bytes TEC Driver SW4STM32 Configuration

Which would be expected if you set the FreeRTOS heap to fill all the
available RAM - it is just a statically allocated array, and you must
allow some RAM for use by other variables in the system, the stack used
by main(), etc.


> The amount of RAM in hardware minus the overflow minus the
> configTOTAL_HEAP_SIZE value is 32000 - 7416 - 10000 = 14584. That is
> 14584 bytes of RAM I cannot account for, and need to use if it isn't in
> use elsewhere. If I try to use it by increasing the size of
> configTOTAL_HEAP_SIZE by any significant amount over 10000 I get a call
> to HardFault().

Have you tried stepping through the code to see where the hard fault is
generated, or otherwise debugging the hard fault?

spflanze wrote on Thursday, March 10, 2016:

Yes, I did. The result did not make sense to me.

The call to osKernelStart () leads to the Supervisor Call Handler listed below:

                          SVC_Handler:

08005f4c: SVC_Handler+0 ldr r3, [pc, #24] ; (0x8005f68 <SVC_Handler+28>)
08005f4e: SVC_Handler+2 ldr r1, [r3, #0]
08005f50: SVC_Handler+4 ldr r0, [r1, #0]
08005f52: SVC_Handler+6 ldmia.w r0!, {r4, r5, r6, r7, r8, r9, r10, r11, lr}
08005f56: SVC_Handler+10 msr PSP, r0
08005f5a: SVC_Handler+14 isb sy
08005f5e: SVC_Handler+18 mov.w r0, #0
08005f62: SVC_Handler+22 msr BASEPRI, r0
08005f66: SVC_Handler+26 bx lr
08005f68: SVC_Handler+28 adds r2, #64 ; 0x40
08005f6a: SVC_Handler+30 movs r0, #0
399 configASSERT( uxCriticalNesting == 1000UL );

After the instruction at SVC_Handler+4 is executed the value stored in r0 is 0x20002744.

When the instruction at SVC_Handler+6 is executed, the program counter increments to SVC_Handler+10, and then the disassembly tab changes from displaying the SVC_Handler code above to displaying the SVC_Handler code below:

                           SVC_Handler:

08005f4c: SVC_Handler+0 movs r4, r4
08005f4e: SVC_Handler+2 movs r0, r0
08005f50: SVC_Handler+4 movs r4, r4
08005f52: SVC_Handler+6 movs r0, r0
08005f54: SVC_Handler+8 movs r4, r4
08005f56: SVC_Handler+10 movs r0, r0
08005f58: SVC_Handler+12 movs r4, r4
08005f5a: SVC_Handler+14 movs r0, r0
08005f5c: SVC_Handler+16 movs r4, r4
08005f5e: SVC_Handler+18 movs r0, r0
08005f60: SVC_Handler+20 movs r4, r4
08005f62: SVC_Handler+22 movs r0, r0
08005f64: SVC_Handler+24 movs r4, r4
08005f66: SVC_Handler+26 movs r0, r0
08005f68: SVC_Handler+28 movs r4, r4
08005f6a: SVC_Handler+30 movs r0, r0

The value of r0 changes to 0x20002768. The call to HardFault happens right after this.

If the disassembly window is to be believed it appears that something is happening to change the instructions in flash. What else could it be?

rtel wrote on Friday, March 11, 2016:

Something goes wrong when the instructions you are viewing change. The addresses are the same, but the instructions decoded are different. That could be because the debugger has let go and is no longer able to query the target correctly, or because the memory map is remapped (because a system register was written to), or no doubt other reasons too.

I suspect that, if you are setting the FreeRTOS heap to the absolute maximum that the linker will allow, then the heap is overlapping the stacks (which the linker may not know about), resulting in the stack becoming corrupted.

spflanze wrote on Friday, March 11, 2016:

In the debug session described above, where the disassembly changed, configTOTAL_HEAP_SIZE was set to 10500. The maximum setting the linker will allow is 24584.

This problem does not happen when configTOTAL_HEAP_SIZE is set to 10000.

The value of r0 during the execution of the instruction at SVC_Handler+6 changes from 0x20002744 to 0x20002768. This is a RAM address range. It is the location range in RAM that contains the data loaded into the registers r4, r5, r6, r7, r8, r9, r10, r11, lr. No memory mapped system registers are being written to. I do not understand how that could change a memory map.

After stepping up to and including the instruction at SVC_Handler+6, these messages appear in the Console tab:
Info : halted: PC: 0x08005f4e
Info : halted: PC: 0x08005f50
Info : halted: PC: 0x08005f52
Info : halted: PC: 0x08005f56
Warn : WARNING! The target is already running. All changes GDB did to registers will be discarded! Waiting for target to halt.

edwards3 wrote on Saturday, March 12, 2016:

The warning is telling you the debugger has lost contact with the target, so what you see in the debugger and reported in your post is junk, probably random data.

spflanze wrote on Sunday, March 13, 2016:

The instruction at SVC_Handler+6 is a pseudo-instruction, which means there is more going on there than a single opcode.

I have noticed that if, when the breakpoint at SVC_Handler+0 is hit, I do not switch to Instruction Stepping Mode as I did above but instead do a Step Over (F5), execution advances to StartThreadUART_Rx+0, where HardFault() then gets called.

The relevant part of the C source code for StartThreadUART_Rx is pasted below:

void StartThreadUART_Rx(void const * argument)
{ uint8_t rxchar;
  BaseType_t xQueueReceiveResult;

  for(;;)
  { xQueueReceiveResult = xQueueReceive( QueueUART_RxHandle, (void *)&rxchar, portMAX_DELAY );
  .....
  } }

The statements that create the thread and the queue elsewhere in the firmware are:

osThreadDef( ThreadUART_Rx, StartThreadUART_Rx, osPriorityNormal, 0, 512);
ThreadUART_RxHandle = osThreadCreate(osThread(ThreadUART_Rx), NULL);  

/* definition and creation of QueueUART_Rx */
osMessageQDef(QueueUART_Rx, 10, uint8_t );
QueueUART_RxHandle = osMessageCreate(osMessageQ(QueueUART_Rx), NULL);

When stepping through the C source code, HardFault() is called in the statement where xQueueReceive() is called. When stepping through the assembly code pasted below, HardFault() is called when the instruction at StartThreadUART_Rx+14 is executed.

StartThreadUART_Rx:
08001ea4:  StartThreadUART_Rx+0   push {r4, lr}
08001ea6:  StartThreadUART_Rx+2   sub sp, #8
08001ea8:  StartThreadUART_Rx+4   sub.w r3, sp, #12224    ; 0x2fc0
08001eac:  StartThreadUART_Rx+8   movs r2, #0
08001eae:  ...tThreadUART_Rx+10   str.w r2, [r3, #-60]
159                               { xQueueReceiveResult = xQueueReceive( QueueUART_RxHandle, (void *)&rxchar, portMAX_DELAY );
08001eb2:  ...tThreadUART_Rx+14   ldr r4, [pc, #128]      ; (0x8001f34 <StartThreadUART_Rx+144>)
08001eb4:  ...tThreadUART_Rx+16   movs r3, #0
08001eb6:  ...tThreadUART_Rx+18   mov.w r2, #4294967295
08001eba:  ...tThreadUART_Rx+22   add.w r1, sp, #7
08001ebe:  ...tThreadUART_Rx+26   ldr r0, [r4, #0]
08001ec0:  ...tThreadUART_Rx+28   bl 0x80068d4 <xQueueGenericReceive> 

The value at StartThreadUART_Rx+144 is:

08001f34:  ...ThreadUART_Rx+144    ; <UNDEFINED> instruction: 0x47ec

When the call to HardFault() happens this message appears in the console:

Info : halted: PC: 0x08001eac
Info : halted: PC: 0x08001eae
Info : halted: PC: 0x08001eb2
Info : halted: PC: 0x08000280
Error: jtag status contains invalid mode value - communication failure
Polling target stm32f3x.cpu failed, GDB will be halted. Polling again in 100ms
Info : Previous state query failed, trying to reconnect
Polling target stm32f3x.cpu succeeded again, trying to reexamine
Info : stm32f3x.cpu: hardware has 6 breakpoints, 4 watchpoints

All this is doing is loading the value 0x47ec into r4, and that causes a call to HardFault(). The ST-Link V2 gets disconnected, but it seems to reconnect automatically. How could the instruction at StartThreadUART_Rx+14 create a Hard Fault?

spflanze wrote on Sunday, March 13, 2016:

I have looked up where the PC register points when HardFault() is called. It is the first instruction in MX_USB_DEVICE_Init(), which is called in StartDefaultTask(). I commented out the MX_USB_DEVICE_Init() call and found that HardFault() happened on the next function call after it. It is beginning to look like HardFault() is called upon the first function call in any thread.

I have set configCHECK_FOR_STACK_OVERFLOW to a value of 2 and implemented vApplicationStackOverflowHook(). It is not called.

I have also set configUSE_MALLOC_FAILED_HOOK to a value of 1 and implemented vApplicationMallocFailedHook(). It also is not called.

rtel wrote on Sunday, March 13, 2016:

If everything works fine until the array size is increased I would still
first suspect that the larger array is overlapping RAM used for
something else (like the stack used by an ISR for example), or is
pushing other variables up the memory space so they are overlapping with
RAM used by something else.

spflanze wrote on Sunday, March 13, 2016:

The FREERTOS heap is declared as a static array on line 102 of heap_4.c:

static uint8_t ucHeap[ configTOTAL_HEAP_SIZE ];

Wouldn’t the linker avoid assigning any variable storage that overlaps with storage allocated to ucHeap[]?

If there was not enough RAM to assign to all the other arrays and variables wouldn’t the linker fail?

A workspace wide search for the function malloc did not get any hits other than where the function itself is defined in stdlib.h. All dynamic allocations are being made with pvPortMalloc(). If there was not enough RAM to dynamically allocate storage wouldn’t vApplicationMallocFailedHook() be called?

If a stack overflowed wouldn’t vApplicationStackOverflowHook() be called?

spflanze wrote on Monday, March 14, 2016:

I have discovered that I had no information about the cause of the Hard Fault because the USGFAULTENA, BUSFAULTENA, and MEMFAULTENA bits in SCB->SHCSR were not set. So near the top of main() I added this statement:

SCB->SHCSR |= (1<<16) | (1<<17) | (1<<18);  /* MEMFAULTENA | BUSFAULTENA | USGFAULTENA */

The result is in SCB->SHCSR the BUSFAULTACT bit (bit 1) is set and in SCB->CFSR the IMPRECISERR bit (bit 10) is set.

I do not know what to do about this fault now.
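A host-testable sketch for decoding the two registers mentioned in this post. The bit positions come from the ARMv7-M architecture (the bus fault status byte occupies bits 8 to 15 of CFSR); the macro, enum and function names are hypothetical, not CMSIS names. On the target you would pass SCB->SHCSR and SCB->CFSR. An IMPRECISERR result means the write buffer let execution continue past the faulting store, so BFAR is not valid and the stacked PC may not identify the instruction that caused it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions from the ARMv7-M System Control Block definition. */
#define SHCSR_BUSFAULTACT  (1u << 1)   /* bus fault handler is active */
#define SHCSR_MEMFAULTENA  (1u << 16)
#define SHCSR_BUSFAULTENA  (1u << 17)
#define SHCSR_USGFAULTENA  (1u << 18)

#define CFSR_PRECISERR     (1u << 9)   /* precise data bus error */
#define CFSR_IMPRECISERR   (1u << 10)  /* imprecise data bus error */
#define CFSR_BFARVALID     (1u << 15)  /* BFAR holds the faulting address */

typedef enum {
    BF_NONE,               /* no bus fault handler active */
    BF_IMPRECISE,          /* faulting address unknown */
    BF_PRECISE_WITH_ADDR,  /* read BFAR for the address */
    BF_OTHER               /* some other bus fault status bit */
} bus_fault_kind_t;

/* True if all three configurable fault handlers are enabled; a fault whose
   handler is disabled escalates to HardFault instead. */
static bool fault_handlers_enabled(uint32_t shcsr)
{
    const uint32_t mask = SHCSR_MEMFAULTENA | SHCSR_BUSFAULTENA | SHCSR_USGFAULTENA;
    return (shcsr & mask) == mask;
}

/* Hypothetical classifier for a snapshot of SHCSR and CFSR. */
static bus_fault_kind_t classify_bus_fault(uint32_t shcsr, uint32_t cfsr)
{
    if ((shcsr & SHCSR_BUSFAULTACT) == 0u) {
        return BF_NONE;
    }
    if ((cfsr & CFSR_IMPRECISERR) != 0u) {
        return BF_IMPRECISE;
    }
    if ((cfsr & CFSR_PRECISERR) != 0u && (cfsr & CFSR_BFARVALID) != 0u) {
        return BF_PRECISE_WITH_ADDR;
    }
    return BF_OTHER;
}
```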

rtel wrote on Monday, March 14, 2016:

> Wouldn’t the linker avoid assigning any variable storage that overlaps
> with storage allocated to ucHeap?
>
> If there was not enough RAM to assign to all the other arrays and
> variables wouldn’t the linker fail?

Assuming you have the linker script configured correctly for your
target; the linker will not itself allocate two variables so that they
would overlap, but it is possible there are other memory regions in use
that are not directly allocated by the linker. For example, space left
in the memory map for use by the stack.
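The kind of check described here can be done by hand with addresses from the map file: take the address range of ucHeap (or the end of .bss) and the range a stack will grow into, and see whether they intersect. A small sketch, with hypothetical names and example numbers; the linker performs no such check for memory it does not itself allocate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, start + size). */
typedef struct {
    uint32_t start;
    uint32_t size;
} addr_range_t;

static addr_range_t make_range(uint32_t start, uint32_t size)
{
    addr_range_t r = { start, size };
    return r;
}

/* A full-descending stack of 'size' bytes whose top is 'top' (for example
   _estack) occupies [top - size, top). */
static addr_range_t stack_range(uint32_t top, uint32_t size)
{
    assert(size <= top);
    return make_range(top - size, size);
}

/* True if the ranges share at least one byte. 64-bit end addresses avoid
   overflow for ranges that end at the top of the address space. */
static bool ranges_overlap(addr_range_t a, addr_range_t b)
{
    const uint64_t a_end = (uint64_t)a.start + a.size;
    const uint64_t b_end = (uint64_t)b.start + b.size;
    if (a.size == 0u || b.size == 0u) {
        return false;
    }
    return a.start < b_end && b.start < a_end;
}
```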

> A workspace wide search for the function malloc did not get any hits
> other than where the function itself is defined in stdlib.h.

I think my first reply suggested a way of verifying that this was actually
the case. Did you try the suggestion?

> All dynamic allocations are being made with pvPortMalloc(). If there
> was not enough RAM to dynamically allocate storage wouldn’t
> vApplicationMallocFailedHook() be called?

Yes, but that won’t tell you if the memory is overlapping with something
else, such as a stack.

> If a stack overflowed wouldn’t vApplicationStackOverflowHook() be called?

Only if the stack of a task was corrupted, not if the stack used by
interrupts was corrupted - but it doesn’t sound like you are getting
that far anyway.

spflanze wrote on Tuesday, March 15, 2016:

To change the imprecise error into a precise one, at the cost of some processor performance, I added this line near the top of main():

SCnSCB->ACTLR |= SCnSCB_ACTLR_DISDEFWBUF_Msk;  /* disable write buffering so bus faults are reported precisely */

I have now identified what is causing this Bus Fault in the assembly code, but I still do not know how to fix it in the C source code. When the source code is stepped, the fault happens on the line where xQueueReceive() is called in the code snippet below:

void StartThreadUART_Rx(void const * argument)
{ uint8_t rxchar;
  static pMsg_t pmsg;
  pMsg_t pEchoMsg;
  portBASE_TYPE TxQueueResult;
  HAL_StatusTypeDef HAL_UART_DMAResumeStat;
  UBaseType_t N;
  BaseType_t xQueueReceiveResult;

  for(;;)
  { xQueueReceiveResult = xQueueReceive( QueueUART_RxHandle, (void *)&rxchar, portMAX_DELAY );
  ...
}

The corresponding compiler produced assembly code is:

 150                             { uint8_t rxchar;
                                StartThreadUART_Rx:
0800285c:  StartThreadUART_Rx+0   push {r4, r7, lr}
0800285e:  StartThreadUART_Rx+2   sub sp, #36     ; 0x24
08002860:  StartThreadUART_Rx+4   add r7, sp, #0
08002862:  StartThreadUART_Rx+6   str r0, [r7, #4]
08002864:  StartThreadUART_Rx+8   sub.w r3, sp, #12224    ; 0x2fc0
08002868:  ...tThreadUART_Rx+12   subs r3, #60    ; 0x3c
0800286a:  ...tThreadUART_Rx+14   movs r2, #0
0800286c:  ...tThreadUART_Rx+16   str r2, [r3, #0] ; Call to BusFault_Handler() on this line
159                               { xQueueReceiveResult = xQueueReceive( QueueUART_RxHandle, (void *)&rxchar, portMAX_DELAY );
0800286e:  ...tThreadUART_Rx+18   ldr r3, [pc, #172]      ; (0x800291c <StartThreadUART_Rx+192>)
08002870:  ...tThreadUART_Rx+20   ldr r0, [r3, #0]
08002872:  ...tThreadUART_Rx+22   add.w r1, r7, #15
08002876:  ...tThreadUART_Rx+26   movs r3, #0
08002878:  ...tThreadUART_Rx+28   mov.w r2, #4294967295
0800287c:  ...tThreadUART_Rx+32   bl 0x8009860 <xQueueGenericReceive>

Pasted below is what I see happening when the assembly code is instruction-stepped. The given values of sp and r3 are as they are after the instruction on the described line has executed:

Starting value sp = 0x20002b78 <StartThreadUART_Rx>
...Rx+0 Push r4, r7, and lr onto the process stack. (sp = 0x20002b6c <ucHeap+9052>)
...Rx+2 Subtract from the stack pointer a value of 36 (sp = 0x20002b48)
...Rx+8 Subtract a value of 12224 from the stack pointer and put the result in r3 (r3 = 0x1ffffb88 )
...Rx+12 Subtract a value of 60 from r3 ( r3 = 0x1ffffb4c )
...Rx+14 Set r2 to a value of zero.
...Rx+16 Store a value of zero (r2) into the location pointed to by r3.

The fault happens because at …Rx+16 the code attempts to write to address 0x1ffffb4c, which is not a valid RAM address. RAM starts at 0x20000000; everything at and below 0x1fffffff lies in the code (flash) region of the memory map.

My question now is: what in the C source code can cause the compiler to write instructions that lead to this BusFault_Handler() call?

The subtraction by 12224 is suspiciously large. What is this doing?
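Replaying the prologue arithmetic on the host confirms the numbers in this post. This is a sketch: the function names are hypothetical, and the constants come from the listing above and from the project's linker script (RAM at 0x20000000, 32K). Starting from sp = 0x20002b48, the store lands 0x4b4 (1204) bytes below the start of RAM. The total offset is 12224 + 60 = 12284 bytes, just 4 bytes short of 12 x 1024, so the store stays inside RAM only when sp is at least 12284 bytes above 0x20000000. That is one way the symptom could depend on where the task's stack ends up in RAM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* RAM region from the project's linker script: ORIGIN = 0x20000000, LENGTH = 32K. */
#define RAM_ORIGIN 0x20000000u
#define RAM_LENGTH (32u * 1024u)

/* True if addr lies inside [RAM_ORIGIN, RAM_ORIGIN + RAM_LENGTH). */
static bool in_ram(uint32_t addr)
{
    return addr >= RAM_ORIGIN && (addr - RAM_ORIGIN) < RAM_LENGTH;
}

/* Replays the two instructions from the listing that compute the store
   address: sub.w r3, sp, #12224  followed by  subs r3, #60. */
static uint32_t replay_store_address(uint32_t sp)
{
    uint32_t r3 = sp - 12224u;
    r3 -= 60u;
    return r3;
}
```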

rtel wrote on Tuesday, March 15, 2016:

This is excellent information - now we can see the root cause of the
problem, and it probably also shows why the size of the array is
affecting the symptom - if the array is smaller then maybe the memory
access would have stayed in the RAM region?

You say you are using Ac6 System Workbench - am I right in thinking that
means you are using GCC? Can you post the GCC command line that is
being generated by the tool for both the compilation and linking phase
(just post the command generated to compile one C file, not every C
file, and the final linking phase). It will be interesting to see the
optimisation level, and where the linker script is coming from.

Are you 100% sure the linker script is correctly describing the memory
layout of the device?

sub.w r3, sp, #12224

This does look like a bizarre line of assembly code to me (by the way,
this issue does not look to be related to FreeRTOS directly). Stack
pointer relative addressing outside of the current stack frame is not
normally seen. Prior to this line of code the asm seems to be setting
up the stack frame (I’m guessing that in total the variables declared on
the stack consume 36 bytes) and frame pointer.

How big is the pMsg_t type?

pmsg is static, so it is not on the stack. Can you look in the map file to see
where it is allocated (what its address is)?

spflanze wrote on Tuesday, March 15, 2016:

If the array were smaller maybe it would, but the assembly instructions that cause the Bus Fault now look suspicious to me. I will need to know more about this before I can be confident that a smaller ucHeap array would only cause memory corruption instead of a Bus Fault. I could live with a smaller heap, but it would mean eliminating a thread or two.

The processor is an STM32F373VCT6.

The IDE is Ac6 System Workbench using GCC.

The MCU GCC Compiler command line arguments:

-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -D__weak="__attribute__((weak))" -D__packed="__attribute__((__packed__))" -DUSE_HAL_DRIVER -DSTM32F373xC -I../../../Inc -I"C:\Projects\TEC Driver SW4STM32\TEC\Inc" -I"C:\Projects\TEC Driver SW4STM32\Inc" -I../../../Drivers/STM32F3xx_HAL_Driver/Inc -I../../../Drivers/STM32F3xx_HAL_Driver/Inc/Legacy -I../../../Middlewares/Third_Party/FreeRTOS/Source/portable/GCC/ARM_CM4F -I../../../Middlewares/ST/STM32_USB_Device_Library/Core/Inc -I../../../Middlewares/ST/STM32_USB_Device_Library/Class/CDC/Inc -I../../../Middlewares/Third_Party/FreeRTOS/Source/include -I../../../Middlewares/Third_Party/FreeRTOS/Source/CMSIS_RTOS -I../../../Drivers/CMSIS/Include -I../../../Drivers/CMSIS/Device/ST/STM32F3xx/Include -O0 -g3 -Wall -fmessage-length=0 -fstack-usage -fstack-check -c -fmessage-length=0

The MCU GCC Linker command line arguments:

-mcpu=cortex-m4 -mthumb -mfloat-abi=hard -mfpu=fpv4-sp-d16 -specs=nosys.specs -specs=nano.specs -T"../STM32F373VCTx_FLASH.ld" -Wl,-Map=output.map -Wl,--gc-sections -lm

pMsg_t is a pointer to a small structure:

   typedef struct
   { Msg_Itr_t itr;
     int32_t strln;
     char *pstr;
   } Msg_t;
   typedef Msg_t* pMsg_t;

In the Expressions tab, the expression &pmsg evaluates to 0x200007cc.

rtel wrote on Tuesday, March 15, 2016:

First, if you have not done so already, it's worth checking the
STM32F373VCTx_FLASH.ld file to be sure it is correct - you would imagine
it is, but something is wrong somewhere.

Even if the linker script were wrong I would still be at a loss to
understand the stack pointer relative addressing going so far outside of
the current stack frame.

The command line does not specify an optimisation level - I don’t know
what the default is but would guess it to be either -O0 or -O1 - try
compiling with both of those optimisation levels set explicitly in turn
to see how the asm code changes.

spflanze wrote on Tuesday, March 15, 2016:

Pasted below are the first lines of the STM32F373VCTx_FLASH.ld file currently in effect:

/* Entry Point */
ENTRY(Reset_Handler)

/* Highest address of the user mode stack */
_estack = 0x20008000;    /* end of RAM */
/* Generate a link error if heap and stack don't fit into RAM */
_Min_Heap_Size = 0x200;      /* required amount of heap  */
_Min_Stack_Size = 0x400; /* required amount of stack */

/* Specify the memory areas */
MEMORY
{
FLASH (rx)      : ORIGIN = 0x08000000, LENGTH = 256K
RAM (xrw)      : ORIGIN = 0x20000000, LENGTH = 32K
}

The optimization level in effect for the instruction-stepping result above was -O0. Earlier I had also tried -Og, which gave the same result regarding the Bus Fault.
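A quick consistency check of the linker script excerpt in this post, written as C constants. This is a sketch: the macro names mirror the linker symbols but are ordinary C macros copied by hand, not a way of reading the script. The end of RAM matches _estack, and the two minimums reserve only 0x200 + 0x400 = 1536 bytes; whether 1 KB is enough for the stack used by main() and by interrupts is something the script alone cannot show.

```c
#include <assert.h>
#include <stdint.h>

/* Constants copied by hand from the STM32F373VCTx_FLASH.ld excerpt. */
#define RAM_ORIGIN      0x20000000u
#define RAM_LENGTH      (32u * 1024u)   /* LENGTH = 32K */
#define ESTACK          0x20008000u     /* _estack: initial main stack pointer */
#define MIN_HEAP_SIZE   0x200u          /* _Min_Heap_Size */
#define MIN_STACK_SIZE  0x400u          /* _Min_Stack_Size */

/* The script's comment says _estack is the end of RAM; check that it is
   exactly one past the last RAM byte. */
_Static_assert(RAM_ORIGIN + RAM_LENGTH == ESTACK, "_estack is not the end of RAM");

/* Minimum bytes the script expects to remain for the C heap plus the main
   stack, on top of everything else the linker places in RAM. */
static uint32_t min_heap_plus_stack(void)
{
    return MIN_HEAP_SIZE + MIN_STACK_SIZE;
}
```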

rtel wrote on Tuesday, March 15, 2016:

It is likely, given that linker script, that the linker is not taking the space used by the stack into account, but that would not explain the strange asm instructions.

Did you try -O1 to see what asm is generated then?