# Failed to allocate heap

bremenpl wrote on Friday, December 21, 2018:

Hello there,
I am using FreeRTOS V9.0.0. The code is added at project generation by ST's CubeMX software for their STM32 targets. In the project I am working on, I have noticed random, infrequent resets of the device (every couple of hours, sometimes only after days). The device resets itself on purpose, after entering a critical section. Please consider this function, which is a malloc wrapper (I am using heap_4):

/**
* @brief	FreeRTOS \ref pvPortMalloc wrapper with logging.
* @param	size: amount of bytes to allocate on heap.
* @return	non zero if allocation was successful.
*/
void* util_malloc(const size_t size)
{
assert_param(size);

// saved before malloc is called
const size_t freeHeap = xPortGetFreeHeapSize();
void* ptr = pvPortMalloc(size);
const size_t freeHeapAfter = xPortGetFreeHeapSize();

if (freeHeapAfter >= freeHeap)
{
log_PushLine(e_logLevel_Critical,
"Failed to allocate heap. pre %u, after %u, needed %u",
freeHeap, freeHeapAfter, size);
// program counter won't reach here, since a Critical log kills the thread
ptr = NULL;
}

return ptr;
}


In those rare situations, the logs reveal such traces:

[2018-11-27T13:34:14.761Z][tUartRx ][001]<Criti>:	Failed to allocate heap. pre 2776, after 4160556831, needed 1
[2018-11-28T06:05:55.698Z][tUartRx ][001]<Criti>:	Failed to allocate heap. pre 2784, after 4160556839, needed 6
[2018-11-28T09:32:50.874Z][tUartRx ][001]<Criti>:	Failed to allocate heap. pre 2776, after 4160556831, needed 1
[2018-11-29T00:37:42.917Z][tNmMain ][005]<Criti>:	Failed to allocate heap. pre 2904, after 4160556959, needed 17
[2018-11-29T00:37:44.494Z][tUartRx ][001]<Criti>:	Failed to allocate heap. pre 4160561167, after 4160561167, needed 144
etc


I cannot understand where the free heap value after allocation comes from. I would appreciate any help with debugging this issue.
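One thing the log lines above do establish is that the jump is not random: in the first four entries the "after" value exceeds the "pre" value by exactly the same amount, whatever size was requested. A minimal C sketch of that arithmetic, using only the numbers from the log (the helper names `heap_delta` and `deltas_are_identical` are made up for illustration):

```c
#include <stdint.h>

/* Hypothetical helper: change in the free-heap counter across one call,
 * computed modulo 2^32, as a 32-bit size_t behaves on the target. */
static uint32_t heap_delta(uint32_t pre, uint32_t after)
{
    return after - pre;
}

/* "pre" and "after" values copied from the first four log lines. */
static const uint32_t log_pre[4]   = { 2776u, 2784u, 2776u, 2904u };
static const uint32_t log_after[4] = { 4160556831u, 4160556839u,
                                       4160556831u, 4160556959u };

/* Returns 1 if every logged entry shows the same delta, else 0. */
static int deltas_are_identical(void)
{
    const uint32_t first = heap_delta(log_pre[0], log_after[0]);
    for (int i = 1; i < 4; i++) {
        if (heap_delta(log_pre[i], log_after[i]) != first)
            return 0;
    }
    return 1;
}
```

The common delta is 4160554055 (0xF7FD0447); read as a 32-bit subtraction, that is the counter being reduced by 0x0802FBB9. The fifth log line is consistent with the same delta having already been applied to a counter of 7112 before the wrapper was entered. Nothing in the log shows where that value comes from, but a fixed offset hints at one repeatable event rather than arbitrary garbage.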

rtel wrote on Friday, December 21, 2018:

You can see here:
that xPortGetFreeHeapSize() just returns a variable. If the allocation
failed, then the variable will not have changed, so the before and after
values should be the same - so I’m guessing the allocation didn’t fail
but you have some strange integer promotion issue going on. Have a look
at how to use the malloc failed hook here:
https://www.freertos.org/a00016.html

One comment on things that could trigger your assertion (but shouldn’t give the big (negative) number you are getting) would be if another task does a free between your two calls to xPortGetFreeHeapSize(). The pvPortMalloc() call has protection from re-entrancy, but that won’t protect your wrapper.

The big value may be a sign that something is doing a ‘wild write’ and corrupting the heap.
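The two replies above point at a simpler split: pvPortMalloc() signals failure through its return value, and because the free count is just one variable, a count larger than the whole heap can only mean corruption. The sketch below illustrates that split. To keep it compilable off-target, pvPortMalloc(), xPortGetFreeHeapSize() and configTOTAL_HEAP_SIZE are replaced by host stand-ins prefixed `fake_`/`FAKE_`; `checked_malloc` and `heap_counter_is_sane` are hypothetical names, not part of FreeRTOS.

```c
#include <stddef.h>
#include <stdlib.h>

/* --- Host stand-ins (assumptions, NOT the FreeRTOS implementation) --- */
#define FAKE_TOTAL_HEAP_SIZE ((size_t)0xC000)   /* example heap size */
static size_t fake_free_bytes = FAKE_TOTAL_HEAP_SIZE;

static void *fake_pvPortMalloc(size_t n)
{
    if (n == 0 || n > fake_free_bytes)
        return NULL;                 /* cannot satisfy the request */
    void *p = malloc(n);
    if (p != NULL)
        fake_free_bytes -= n;
    return p;
}

static size_t fake_xPortGetFreeHeapSize(void)
{
    return fake_free_bytes;
}

/* Corruption check: the free count can never legitimately exceed the heap size. */
static int heap_counter_is_sane(void)
{
    return fake_xPortGetFreeHeapSize() <= FAKE_TOTAL_HEAP_SIZE;
}

/* Allocation check: failure is signalled by the return value alone, so no
 * before/after comparison (which another task's free could disturb) is needed. */
static void *checked_malloc(size_t size, int *failed)
{
    void *ptr = fake_pvPortMalloc(size);
    *failed = (ptr == NULL);
    return ptr;
}
```

On the target, setting configUSE_MALLOC_FAILED_HOOK to 1 and defining vApplicationMallocFailedHook() gives the same failure notification without any wrapper logic; that is the hook described on the linked page.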

rtel wrote on Saturday, December 22, 2018:

Ah yes, good catch.

bremenpl wrote on Saturday, December 22, 2018:

Hello guys, thank you for the answers.
After I wrote this topic I thought of the re-entrancy of my wrapper. Thus, I added a mutex before each malloc and free:

/**
* @brief	FreeRTOS \ref pvPortMalloc wrapper with logging and mutex lock
* @param	size: amount of bytes to allocate on heap.
* @return	non zero if allocation was successful.
*/
void* util_malloc(const size_t size)
{
assert_param(size);

if (!size)
return NULL;

// saved before malloc is called
const size_t freeHeap = xPortGetFreeHeapSize();
void* ptr = pvPortMalloc(size);
const size_t freeHeapAfter = xPortGetFreeHeapSize();

if (freeHeapAfter >= freeHeap)
{
log_PushLine(e_logLevel_Critical,
"Heap alloc failure. pre %u, after %u, needed %u, ptr 0x%X",
freeHeap, freeHeapAfter, size, (uint32_t)ptr);

ptr = NULL;
}

return ptr;
}

/**
* @brief	FreeRTOS \ref vPortFree wrapper with logging and mutex lock
* @param	pv: pointer to the memory which has to be free'd.
*/
void util_free(void* pv)
{
assert_param(pv);

if (!pv)
return;

vPortFree(pv);
}

/**
* @brief	A macro wrapper for mutex taking and releasing.
*
* 			When used in a function body, the mutex is taken immediately and
* 			released only after the function returns (automatically, RAII style)
*
* @param	id: the mutex identifier variable. NOTE: this has to be the actual
* 			mutex variable, since the \ref util_mutexCreate and
* 			\ref util_mutexReleasePtr functions work on a pointer to this
* 			variable. Also the \p id mutex is only initialized once by the
* 			\ref util_mutexCreate function.
* @return	none.
*/
#define	UTIL_UNIQUE_LOCK(id)												\
util_mutexCreate((osMutexId* const)&id);								\
util_mutexWait(id, osWaitForever);										\
osMutexId thyId __attribute__((cleanup(util_mutexReleasePtr))) = id;	\

/**
* @brief	\ref util_mutexRelease wrapper for usage with RAII macro.
* @param 	id: pointer to the mutex id.
* @return	same as \ref util_mutexRelease.
*/
HAL_StatusTypeDef util_mutexReleasePtr(osMutexId* id)
{
assert_param(id);
return util_mutexRelease(*id);
}


For heap allocation and deallocation I use only my wrappers in the code, so any heap corruption could only come from them. Do you think a rare race condition, like the one you mentioned, could be the main cause here?

Richard, after what you have said, does that mean my “manual” heap check doesn't even make sense, and that I should rely only on what pvPortMalloc returns and, if needed, the malloc failed hook?

You don’t show how your mutex utility code works, but unless util_mutexCreate has the smarts to check whether the passed handle is already created, and the thyId code is setting up an object that releases the mutex at the end of scope (are you using C++ here?), then your code isn’t protecting itself, as each call would be using a new mutex. Also, you aren’t protecting against FreeRTOS itself freeing memory, as it won’t go through your wrapper.

I am not sure what your purpose is on checking that the heap free space has grown during a call, as that is really only checking that the heap function itself is working (which is well tested code), and not anything about your own code.

From the results that you showed, my guess is that something else is corrupting memory, causing the very big numbers, and that it is somehow normally synchronized with this call (something else, maybe higher priority, doing a heap call that doesn’t like being delayed a bit by the heap synchronization?).

bremenpl wrote on Saturday, December 22, 2018:

Hi, thank you for the answer. The function creating the mutex is singleton-style, so it will create it only once:

/**
* @brief	Initializes the \p id mutex, but only if uninitialized
* @param	id: Mutex to be initialized.
* @return	\ref HAL_OK on successful init or if already initialized.
*/
HAL_StatusTypeDef util_mutexCreate(osMutexId* const id)
{
assert_param(id);

if (!id)
return HAL_ERROR;

if (*id)
return HAL_OK; // already initialized

osMutexDef_t tmp;
if (!(*id = osMutexCreate(&tmp)))
{
log_PushLine(e_logLevel_Critical, "Failed to create mutex");
return HAL_ERROR;
}

return HAL_OK;
}


As for the other stuff, I am not using C++. This is a C extension: http://echorand.me/site/notes/articles/c_cleanup/cleanup_attribute_c.html
When the function that uses the UTIL_UNIQUE_LOCK macro returns, the function named in the macro is called just before the return completes. In my case that is the mutex release function. It is a simple imitation of C++'s unique_lock.
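For readers unfamiliar with the extension: GCC and Clang run the function named in `__attribute__((cleanup(...)))` when the annotated variable goes out of scope, passing a pointer to that variable. A minimal host-side sketch, with a plain counter standing in for the mutex (the names `fake_lock`, `fake_unlock_ptr` and `locked_work` are hypothetical and unrelated to the util_mutex functions):

```c
/* Requires GCC or Clang: __attribute__((cleanup)) is a compiler extension,
 * not standard C. */

static int lock_depth = 0;      /* stand-in for a mutex: >0 means "held" */
static int unlock_calls = 0;    /* counts how often the cleanup handler ran */

static void fake_lock(void)
{
    lock_depth++;
}

/* Cleanup handler: receives a POINTER to the annotated variable. */
static void fake_unlock_ptr(int *token)
{
    (void)token;
    lock_depth--;
    unlock_calls++;
}

/* The "lock" is released on every return path without explicit unlock calls. */
static int locked_work(int early_exit)
{
    fake_lock();
    int guard __attribute__((cleanup(fake_unlock_ptr))) = 0;
    (void)guard;

    if (early_exit)
        return lock_depth;      /* still held while the return value is computed */
    return lock_depth + 10;
}
```

Note that the handler runs at the end of the enclosing block, not necessarily at the end of the function, so a macro built on this attribute releases at function exit only when it is used at function scope.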

The heap check was just for logging purposes and I thought it would never trigger, but it did, hence all this concern.

I am not a FreeRTOS API expert, but from what I have noticed so far, the only moments at which the heap is altered are when one creates objects (mutexes, semaphores, queues etc.) and when one allocates or frees memory. All my objects are created at MCU startup before the scheduler starts (apart from this create-mutex implementation I just added for testing purposes). While the scheduler is running, I only allocate and free memory using my wrappers.

Given the further info I provided, would you say that these malloc and free wrappers are thread safe? It's really hard for me to figure out what else could cause the memory leak or seg faults. Like I said, this happens so rarely and infrequently. I would appreciate any further suggestions.

If you always call through your wrappers, and never delete any FreeRTOS objects then you should be ‘thread safe’. My point was that testing for the heap free space not changing as you expect isn’t really testing any of your code, and is an awkward test for the allocation failing (you should just test that the results are null).

The issue of the strange values likely isn’t a heap issue, unless some code is freeing a block with an address that didn’t come from the allocation function. More likely some operation is doing a ‘wild write’ (perhaps overrunning a nearby array; maybe look at the link map) and breaking the heap functions. The interesting thing is that in most of the cases the corruption seems to occur between the start and end of the allocation wrapper (but likely NOT directly due to that thread itself, as that code looks pretty safe). The one thing I can think of is that during this call the scheduler is disabled for a little while you are inside pvPortMalloc, and perhaps some ISR is trying to activate a task and something doesn’t like it not happening fast enough. My thought here is that 4 of the 5 failures had reasonable values at the start of the function and went bad during it, so something fairly unique must be happening to trigger the condition.

bremenpl wrote on Saturday, December 22, 2018:

Thank you for the answer. This would be a very interesting case. I must look into the map file, as well as go through all the ISRs I use and check for memory allocations there. As far as I remember though, in ISRs I only put items into queues and release semaphores, or trigger signals. Also, like you mentioned, I never remove the created objects (like threads and queues).

Don’t just look at ‘heap’ stuff. If an ISR is filling a buffer, or writing through a pointer, and expects this to be properly set up, and that set-up might not happen if the task that is activated gets delayed, you could have an issue. (If an ISR is actually allocating memory by calling the allocation functions, you have broken the FreeRTOS rules, as there are no FromISR memory allocation functions.)

bremenpl wrote on Saturday, December 22, 2018:

You are right, using the API there is no way to allocate memory from an ISR, so I am surely not doing that. But a memory block overwrite that writes over the RTOS heap array sounds promising. I will look for that, thank you.

If I remember right (don’t have the code handy) heap4 doesn’t actually parse the heap to get the remaining space, but keeps track of the amount of memory available as a single variable, so the overwrite would be of that variable, not the heap itself.

bremenpl wrote on Saturday, December 22, 2018:

Yes, that's actually what I meant, but I wrote something else… This should make it easier to find then, I hope. Thank you very much.

bremenpl wrote on Wednesday, December 26, 2018:

Hi Richard, hope you are having a good holiday.
Looking at the map file, it's not easy to pinpoint the problem here:

 *fill*         0x2000077d        0x3
.bss.FatFs     0x20000780        0x8 Middlewares\Third_Party\FatFs\src\ff.o
.bss.Fsid      0x20000788        0x2 Middlewares\Third_Party\FatFs\src\ff.o
*fill*         0x2000078a        0x2
.bss.Files     0x2000078c       0x60 Middlewares\Third_Party\FatFs\src\ff.o
.bss.disk      0x200007ec       0x10 Middlewares\Third_Party\FatFs\src\ff_gen_drv.o
0x200007ec                disk
.bss.ucMaxSysCallPriority
0x200007fc        0x1 Middlewares\Third_Party\FreeRTOS\Source\portable\GCC\ARM_CM4F\port.o
*fill*         0x200007fd        0x3
.bss.ulMaxPRIGROUPValue
0x20000800        0x4 Middlewares\Third_Party\FreeRTOS\Source\portable\GCC\ARM_CM4F\port.o
.bss.ucHeap    0x20000804     0xc000 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o
.bss.xStart    0x2000c804        0x8 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o
.bss.pxEnd     0x2000c80c        0x4 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o
.bss.xFreeBytesRemaining
0x2000c810        0x4 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o
.bss.xMinimumEverFreeBytesRemaining
0x2000c814        0x4 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o
.bss.xBlockAllocatedBit
0x2000c818        0x4 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o
.bss.pxCurrentTCB
0x2000c81c                pxCurrentTCB


The variable is mapped as follows:

.bss.xFreeBytesRemaining
0x2000c810        0x4 Middlewares\Third_Party\FreeRTOS\Source\portable\MemMang\heap_4.o


The further addresses are more FreeRTOS, the earlier addresses are FatFs, which is pretty solid code as well. What do you think?

It looks like the thing just before it is the heap, so if you write past the end of an object allocated on the heap, you can overwrite the variable.
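The addresses in the map file make the distance concrete: ucHeap starts at 0x20000804 and is 0xc000 bytes long, so it ends at 0x2000c804, and xFreeBytesRemaining sits at 0x2000c810, only 12 bytes further on (xStart and pxEnd fill the gap). The sketch below models that layout in a single byte array on a host. It illustrates the adjacency only and is not how heap_4 itself is written; all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Addresses taken from the map file in the post above. */
#define HEAP_START   0x20000804u
#define HEAP_SIZE    0x0000C000u
#define FREE_COUNTER 0x2000C810u   /* .bss.xFreeBytesRemaining */

/* Bytes between the end of ucHeap and xFreeBytesRemaining. */
static uint32_t gap_after_heap(void)
{
    return FREE_COUNTER - (HEAP_START + HEAP_SIZE);
}

/* Model: 16 "heap" bytes (the tail of ucHeap), the 12-byte gap, then the counter. */
enum { MODEL_HEAP = 16, MODEL_GAP = 12 };
static uint8_t model_ram[MODEL_HEAP + MODEL_GAP + sizeof(uint32_t)];

static void model_set_counter(uint32_t v)
{
    memcpy(&model_ram[MODEL_HEAP + MODEL_GAP], &v, sizeof v);
}

static uint32_t model_get_counter(void)
{
    uint32_t v;
    memcpy(&v, &model_ram[MODEL_HEAP + MODEL_GAP], sizeof v);
    return v;
}

/* Simulates a write of `len` bytes starting at the first modelled heap byte.
 * len <= MODEL_HEAP stays inside the heap; anything longer runs into what follows. */
static void model_write_from_heap(size_t len, uint8_t fill)
{
    if (len > sizeof model_ram)
        len = sizeof model_ram;    /* keep the model itself in bounds */
    memset(model_ram, fill, len);
}
```

In this model a write that stays within the 16 heap bytes leaves the counter untouched, while one that runs 16 bytes past the end replaces it with the fill pattern, which is the kind of implausibly large value seen in the logs.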

bremenpl wrote on Thursday, December 27, 2018:

Thanks for the answer, Richard,
So looking at the map, the first thing before the xFreeBytesRemaining that is not related to FreeRTOS and CMSIS is the disk variable, which is part of the FatFs. Looking inside that struct:

Disk_drvTypeDef disk = {{0},{0},{0},0};

...

/**
* @brief  Global Disk IO Drivers structure definition
*/
typedef struct
{
uint8_t                 is_initialized[_VOLUMES];
const Diskio_drvTypeDef *drv[_VOLUMES];
uint8_t                 lun[_VOLUMES];
volatile uint8_t        nbr;

}Disk_drvTypeDef;


There are a couple of arrays. Their size depends on the number of volumes available in the system; for me it is 2 volumes. Maybe at some point the index exceeds 1 here… I think I will start by running the system with a debugger probe attached and a breakpoint placed where the corrupted free heap value is detected. It might already be too late to find anything at that point, but I could check the values of the FatFs variables while halted. I was trying to find the error statically, but with no luck so far.

bremenpl wrote on Monday, January 07, 2019:

Hi Richard,
Just wanted to write an update on further tests. I was quite desperate, as I couldn't find the reason for this bug for a while. I consulted a friend, and he suggested that since I am running on a 32-bit architecture (Arm Cortex-M4), maybe my problem is due to alignment issues. I was sceptical about this, but had to give it a try. I modified the malloc wrapper as follows:

/**
* @brief	The \ref util_malloc function will allocate amounts of bytes
* 			that are a multiple of this value (remainder 0 when divided by it).
*/
#define UTIL_MALLOC_DIVIDER				4

/**
* @brief	FreeRTOS \ref pvPortMalloc wrapper with logging and mutex lock
* @param	size: amount of bytes to allocate on heap.
* @return	non zero if allocation was successful.
*/
void* util_malloc(const size_t size)
{
assert_param(size);

if (!size)
return NULL;

// Create actual allocated size
const size_t mod = size % UTIL_MALLOC_DIVIDER;
const size_t endSize = (!mod) ? size : size + UTIL_MALLOC_DIVIDER - mod;

// saved before malloc is called
const size_t freeHeap = xPortGetFreeHeapSize();
void* ptr = pvPortMalloc(endSize);
const size_t freeHeapAfter = xPortGetFreeHeapSize();

if ((freeHeapAfter >= freeHeap) || !ptr)
{
log_PushLine(e_logLevel_Critical,
"Heap alloc failure. pre %u, after %u, needed %u, ptr 0x%X",
freeHeap, freeHeapAfter, endSize, (uint32_t)ptr);

ptr = NULL;
}

return ptr;
}


So now I always allocate an amount of memory that is a multiple of 4 (and always at least 4). Since I did this, the problem has not occurred yet, and the program has been running for some days now. I am aware that this is no proof and the problem could just as well occur in a week or a month, but previously the problem occurred within a couple of days at most.

Also, before introducing this allocation method, I added the ptr pointer print each time I enter the faulty if clause. Each time the amount of heap was faulty, the malloc function returned 0x8 for the ptr. Since my heap is an array, this seems alright. But on the other hand, a lot of things are allocated on the heap during startup, so this 0x8 value seems a bit small?

What do you think?
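One way to see what the rounding changes is to work through the size arithmetic heap_4 applies to a request: it adds the block header size and rounds the total up to portBYTE_ALIGNMENT. The sketch below assumes an 8-byte header (the map file lists xStart, which is one block header, as 0x8 bytes) and a portBYTE_ALIGNMENT of 8, the usual value for the Cortex-M ports. Both are assumptions to verify against the actual port, and the function names are hypothetical.

```c
#include <stddef.h>

#define ASSUMED_HEADER_SIZE 8u   /* assumption: aligned size of the block header */
#define ASSUMED_ALIGNMENT   8u   /* assumption: portBYTE_ALIGNMENT for this port */

/* The wrapper's rounding: next multiple of 4 (UTIL_MALLOC_DIVIDER). */
static size_t round_up_4(size_t size)
{
    return (size + 3u) & ~(size_t)3u;
}

/* Size of the block carved out for a request, under the assumptions above. */
static size_t block_size_for(size_t wanted)
{
    size_t total = wanted + ASSUMED_HEADER_SIZE;
    size_t rem = total % ASSUMED_ALIGNMENT;
    return rem ? total + (ASSUMED_ALIGNMENT - rem) : total;
}

/* Returns 1 if rounding every request in [1, max] to a multiple of 4 leaves
 * the resulting block size unchanged, else 0. */
static int rounding_is_no_op(size_t max)
{
    for (size_t n = 1; n <= max; n++) {
        if (block_size_for(n) != block_size_for(round_up_4(n)))
            return 0;
    }
    return 1;
}
```

Under these assumptions a 1-byte and a 4-byte request both yield a 16-byte block, and a 17-byte and a 20-byte request both yield 32 bytes, so the wrapper's rounding would not change what the allocator carves out. If that holds for this build, the disappearance of the symptom may have a different cause than alignment.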

rtel wrote on Monday, January 07, 2019:

Not read all this thread, but if it helps, pvPortMalloc() will always
ensure the alignment is correct, so if there was an alignment issue it
could only have been in the wrapper code rather than in the memory
allocator itself (assuming you are using one of the allocators that