configTOTAL_HEAP_SIZE increase causing instability

Hello. I’m implementing an HTTP server on an STM32F746 using FreeRTOS. The system has 3 tasks in total and serves two different .html files containing media files (images in this case). Everything worked great until I decided to open both webpages at the same time: one loaded perfectly but the other caused a malloc failure (the program reached vApplicationMallocFailedHook). Then I realized, using the xPortGetFreeHeapSize function, that I was running out of heap when requesting the second webpage. At that point I tried increasing configTOTAL_HEAP_SIZE from 16x1024 to 30x1024. That solved the lack of heap, but I got another problem: both webpages failed to load the images or kept loading indefinitely. Sometimes I got strange symbols on my CubeIDE console (I use the SWV ITM Data console for logging), and sometimes one of the tasks was not created at all. I dropped configTOTAL_HEAP_SIZE down to 25*1025 and the first page loads fully, but the second still misses content or hangs.

I found a thread describing a similar issue here and tried static task creation, but I got the same problems, and sometimes worse ones.

I increased _Min_Heap_Size and _Min_Stack_Size to 0x4000 in my .ld file but saw no improvement. I am not getting any stack overflow.

Is there a limit for configTOTAL_HEAP_SIZE ?

Which FreeRTOS heap are you using?

That (_Min_Heap_Size in the linker script) is most likely not used unless you are calling malloc directly in your application.

The only limit is the total RAM available on your hardware. So I do not think that is the issue.

I think you should debug the cause of this which may not be related to heap.


The main reply text is all jumbled; I typed this between work related tasks.

The TL;DR is that I think you may have DMA crossing memory block boundaries, and that you can fix it by carefully placing where the buffers are being allocated from to not cross those boundaries.


There are some possibilities I can think of; here’s my line of thought:

  • Assuming you’re not sizing the heap through the linker control file (which memory allocator are you using?), the heap ends up in the BSS section of the ELF image file.
  • The STM32F746 has several different, and in some places non-contiguous, blocks of embedded SRAM:
    • 0x0000_0000 - 0x0000_3FFF - ITCM-RAM (16 Kbytes)
    • 0x2000_0000 - 0x2000_FFFF - DTCM-RAM (64 Kbytes)
    • 0x2001_0000 - 0x2004_BFFF - SRAM1 (240 Kbytes)
    • 0x2004_C000 - 0x2004_FFFF - SRAM2 (16 Kbytes)
  • The tightly coupled instruction memory (ITCM) address range is not contiguous with DTCM, SRAM1 and SRAM2; however, it is the best place for code to be stored. The first part of that address range is where the interrupt vectors go, leaving less memory for the executable code of your application or the FreeRTOS kernel, so by default the linker script (probably) puts the .text segments in SRAM1
  • The linker script determines where any given ELF segment ends up
  • If the linker script is naive, it might ignore DTCM (or only put the linker-defined program stack and heap there, which FreeRTOS doesn’t use by default) and treat SRAM1 + SRAM2 as a contiguous 256K memory range. The text segments (instructions) and read-only data go first, so it’s possible that BSS, or some other data segment, is crossing the SRAM1 to SRAM2 boundary
  • I don’t think that a single DMA operation can cross boundaries between different memories
  • If a buffer used by the HTTP server is crossing the SRAM1 - SRAM2 boundary, it’s likely that the DMA operation feeding the Ethernet controller will abort; this will cause some symptoms that may match what you’re seeing
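The boundary check described in the list above can be sketched in a few lines of C. This is only an illustration, assuming the SRAM1 and SRAM2 base addresses listed above; the function name `crosses_sram_boundary` is made up for this example and is not part of FreeRTOS or the ST HAL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Base addresses taken from the memory list above (STM32F746). */
#define SRAM1_BASE 0x20010000u
#define SRAM2_BASE 0x2004C000u

/* Sanity check on the constants: SRAM1 is 240 Kbytes long. */
static_assert(SRAM2_BASE - SRAM1_BASE == 240u * 1024u, "SRAM1 is 240 Kbytes");

/* Hypothetical helper: returns true if the buffer [addr, addr + len)
 * starts below the SRAM2 base and ends at or above it, i.e. it
 * straddles the SRAM1/SRAM2 boundary. */
bool crosses_sram_boundary(uint32_t addr, size_t len)
{
    if (len == 0) {
        return false;
    }
    /* 64-bit arithmetic so addr + len cannot wrap around. */
    uint64_t last = (uint64_t)addr + len - 1u;   /* last byte of the buffer */
    return addr < SRAM2_BASE && last >= SRAM2_BASE;
}
```

On the target, something like `crosses_sram_boundary((uint32_t)(uintptr_t)buf, buf_len)` could be called on each buffer handed to the Ethernet driver while debugging, to confirm or rule out this theory.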

If my reasoning is correct, a possible solution is to create or edit the project linker script to explicitly place the allocator heap in one of the SRAM banks (probably SRAM1) starting at a specific location that ensures it does not cross RAM bank boundaries. If you’re using dynamic task creation, the 4K stack size may be wasting memory (if the code is creating large arrays or structures on the stack, this is not necessarily true).

My preference when working with FreeRTOS is to not use dynamic object allocation in the kernel, and always statically define all of the Task structures and stacks, along with all the other FreeRTOS objects you’re using. This means the heap is not used for the kernel objects or your tasks. If I can’t make the code use statically allocated buffers, I use a heap manager that automatically uses the heap section defined in the linker script rather than declaring a static array large enough for configTOTAL_HEAP_SIZE. I don’t recall which allocator that one is. It is possible to give the compiler a segment name for each statically defined object, so the linker script will group like-named segments into a contiguous area within bounds that you can define in the linker script. A well designed linker script, tailored to the application, and a bunch of #pragmas or __attribute__(()) annotations that tell the compiler which segments to put code and statically defined data in can let you optimize memory usage and improve performance by putting critical code and data in the ITCM and DTCM, respectively, leaving you more space for things in SRAM1 and SRAM2.
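As a rough sketch of the per-object segment naming mentioned above, applied to the FreeRTOS heap: when configAPPLICATION_ALLOCATED_HEAP is set to 1 in FreeRTOSConfig.h, heap_4.c expects the application to define the `ucHeap` array itself, which lets you attach a section name to it. The section name `.freertos_heap` is an assumption made up for this example; the linker script would need a matching output section placed entirely inside SRAM1. The heap size is redefined here only so the snippet stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror FreeRTOSConfig.h; repeated here so the sketch is
 * self-contained. Requires configAPPLICATION_ALLOCATED_HEAP == 1. */
#define configTOTAL_HEAP_SIZE ((size_t)(30 * 1024))

/* The heap has to fit inside SRAM1 (240 Kbytes) to stay in one bank. */
static_assert(configTOTAL_HEAP_SIZE <= 240u * 1024u, "heap must fit inside SRAM1");

/* Application-provided heap for heap_4. ".freertos_heap" is a hypothetical
 * section name; the linker script decides where that section ends up. */
uint8_t ucHeap[configTOTAL_HEAP_SIZE]
    __attribute__((section(".freertos_heap"), aligned(8)));
```

The same attribute can be put on statically allocated task stacks or driver buffers, so that the linker script, rather than the order of objects in .bss, decides which bank they live in.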


Sorry for the late response, I didn’t get any update by e-mail.

Which FreeRTOS heap are you using?

heap_4

Thank you @danielglasser for the detailed response, it added more clarity. I actually placed my DMATx and DMARx descriptors at the very beginning of SRAM2 (Memory_B1, Memory_B2). My linker script:

MEMORY
{
  RAM    (xrw)    : ORIGIN = 0x20000000,   LENGTH = 320K
  FLASH    (rx)    : ORIGIN = 0x8000000,   LENGTH = 1024K
  Memory_B1(xrw)   : ORIGIN = 0x2004C000, LENGTH = 0xA0
  Memory_B2(xrw)   : ORIGIN = 0x2004C0A0, LENGTH = 0xA0
}

Usually the stack pointer starts at the very highest memory address (0x2004_FFFF in my case) and grows downwards. The heap, in turn, grows upwards from after the .bss region, according to my .map file. So the hypothesis could be that the heap is growing up into SRAM2 and corrupting the DMATx and DMARx descriptors.
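One thing that can be checked directly from the MEMORY block above: RAM is declared as 320K starting at 0x20000000, so it ends at 0x20050000 and already includes the SRAM2 addresses given to Memory_B1 and Memory_B2. A small sketch of that arithmetic follows; the struct and function names are made up for illustration, and whether anything actually lands on the descriptors still depends on what the linker and the stack do at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A linker MEMORY region: origin address and length in bytes. */
struct region {
    uint32_t origin;
    uint32_t length;
};

/* Values copied from the MEMORY block above. */
#define RAM_ORIGIN 0x20000000u
#define RAM_LENGTH (320u * 1024u)

const struct region RAM       = { RAM_ORIGIN, RAM_LENGTH };
const struct region MEMORY_B1 = { 0x2004C000u, 0xA0u };
const struct region MEMORY_B2 = { 0x2004C0A0u, 0xA0u };

/* RAM as declared runs up to the top of SRAM2 (last byte 0x2004FFFF). */
static_assert(RAM_ORIGIN + RAM_LENGTH == 0x20050000u, "RAM ends at 0x20050000");

/* One past the last address of the region (64-bit to avoid wrap-around). */
uint64_t region_end(struct region r)
{
    return (uint64_t)r.origin + r.length;
}

/* True if the two half-open address ranges share at least one byte. */
bool regions_overlap(struct region a, struct region b)
{
    return a.origin < region_end(b) && b.origin < region_end(a);
}
```

With these values, `regions_overlap(RAM, MEMORY_B1)` and `regions_overlap(RAM, MEMORY_B2)` are both true, so the MEMORY block itself does not prevent the linker (or a stack starting at the top of RAM) from using the descriptor area. If that turns out to matter, one option to try is shortening RAM so that it ends at 0x2004C000.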

The limitation of DTCM memory is that DMA has no access to it, so it can’t be used to store the Ethernet DMA descriptors.

An extract of my .map file:

                0x00000000200173c1                __lock___sfp_recursive_mutex
 *(COMMON)
                0x00000000200173c4                . = ALIGN (0x4)
 *fill*         0x00000000200173c2        0x2 
                0x00000000200173c4                _ebss = .
                0x00000000200173c4                __bss_end__ = _ebss

._user_heap_stack
                0x00000000200173c4     0x8004 load address 0x00000000080be7dc
                0x00000000200173c8                . = ALIGN (0x8)
 *fill*         0x00000000200173c4        0x4 
                [!provide]                        PROVIDE (end = .)
                0x00000000200173c8                PROVIDE (_end = .)
                0x000000002001b3c8                . = (. + _Min_Heap_Size)
 *fill*         0x00000000200173c8     0x4000 
                0x000000002001f3c8                . = (. + _Min_Stack_Size)
 *fill*         0x000000002001b3c8     0x4000 
                0x000000002001f3c8                . = ALIGN (0x8)

/DISCARD/
 libc.a(*)
 libm.a(*)
 libgcc.a(*)

I guess the first approach would be to place Memory_B1 and Memory_B2 in SRAM1 and see what I get? I still have little idea how to debug this issue.