Question about using printf() / malloc() with heap_4 & newlib

RA1981 · January 28, 2024, 12:36pm

Hello,

I’m just new to FreeRTOS. I was able to get it up and running on my environment (details below) by creating two simple tasks which output something to the UART. Depending on the UART speed settings etc I can force overlapping outputs which is good to learn how to synchronize/serialize the outputs by different approaches like critical sections, semaphore/mutex, streams, etc, etc.

My goal is to port an application to FreeRTOS which uses a parser for incoming UART data. The parser is called regularly in the main loop and, depending on the outgoing UART data, also directly after the output, because the application has to wait for the parsed data. The number of places where the parser has to be called separately grows as the application grows. So, this application is an ideal candidate for an RTOS, I think.

The application relies on printf() for output and uses malloc() directly for some buffers. So I’ve read several sources about those topics. For the malloc’d buffers, I think I can replace them with queues or something similar. But for printf() and its internal usage of malloc() I’m not sure what exactly I have to do to get it up and running.

The environment is GCC 5.4 with newlib 2.4 (very old) on a Cypress/Infineon PSoC 5LP (Cortex-M3). The reason why I didn’t post in the Infineon/Cypress forum is because I don’t think my problem is specific to the manufacturer or device.

So I’ve some questions:

According to the “Mastering FreeRTOS” book, printf() might fail if anything other than heap_3 (which simply wraps the library malloc()/free()) is used. But even if those wrappers are thread-safe, the printf() internal calls to malloc() are not thread-safe. Wouldn’t I need to call printf() within a critical section?
From what I’ve read in several sources, the malloc()/free() lock/unlock functions can be used, but it seems they’re not supported in my newlib implementation. Should I consider building the newest version of newlib with lock/unlock enabled? I’ve never done that before, but if it would help to solve the problem, I’m willing to do it. I can also try to use a newer GCC version, which should be possible within the PSoC IDE.
While reading the FreeRTOS documentation etc. my first thought was to use heap_4, but it seems some effort is needed even when using configUSE_NEWLIB_REENTRANT. I’ve also read Dave Nadler’s information about FreeRTOS & newlib, but I’m not sure if it’s generic or specific to STM32, and as far as I understood it doesn’t use any of the FreeRTOS heap implementations. Should I use heap_3 instead?

Regards

hs2 · January 28, 2024, 1:10pm

I think upgrading to an up-to-date compiler toolchain including newlib is the simplest way if you’re OK with the newlib heap implementation.
Note that when you implement the newlib __malloc_lock/__malloc_unlock hooks (e.g. using vTaskSuspendAll/xTaskResumeAll) to make malloc/free thread-safe, heap_3.c could be omitted by adding simple macros to your FreeRTOSConfig.h:

#define pvPortMalloc malloc 
#define vPortFree    free

provided you don’t need configUSE_MALLOC_FAILED_HOOK and check the respective return values yourself.

richard-damon · January 28, 2024, 7:19pm

Personally, I made a copy of heap_3 that I called heap_malloc. It removes the calls to the scheduler suspend/resume around the calls to malloc and free, and includes the functions __malloc_lock() and __malloc_unlock() to make malloc and family thread-safe.

That way the FreeRTOS heap function calls still have the configurable option for the malloc failed hook and tracing that the other heap functions have. (I also remove the test for dynamic allocation allowed, as I may want to use them myself even if I don’t want FreeRTOS to use dynamic allocations.)
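
The idea could look roughly like the following. This is a minimal sketch, not Richard’s actual file: the counter malloc_failed_count and the body of the hook are assumptions for illustration, the tracing macros are left out, and thread safety is assumed to come from __malloc_lock()/__malloc_unlock() implemented elsewhere.

```c
#include <stddef.h>
#include <stdlib.h>

/* Illustrative counter so the sketch can be exercised off-target
 * (an assumption, not part of FreeRTOS). */
static unsigned malloc_failed_count;

/* FreeRTOS calls a hook of this name when configUSE_MALLOC_FAILED_HOOK
 * is 1; the application supplies the body. This body is an assumption. */
void vApplicationMallocFailedHook(void)
{
    malloc_failed_count++;
}

/* Unlike heap_3.c, the scheduler is not suspended here: the newlib
 * heap is assumed to be protected by __malloc_lock()/__malloc_unlock(). */
void *pvPortMalloc(size_t size)
{
    void *p = malloc(size);

    if (p == NULL) {
        vApplicationMallocFailedHook();
    }
    return p;
}

void vPortFree(void *p)
{
    free(p); /* free(NULL) is a no-op */
}
```

Because FreeRTOS itself allocates through pvPortMalloc(), the kernel and the application then share one heap (the newlib one) and one locking mechanism.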

RA1981 · January 29, 2024, 6:00pm

Hello Hartmut & Richard,

thank you for your answers.

@hs2 : I’m not sure if I understood it correctly - you mean I can stay with heap_4 and define pvPortMalloc and vPortFree as the newlib malloc()/free()?

Regarding the newer version of the toolchain, I’ve set up a small non-RTOS project with the latest ARM-GCC version (13.2). It builds, but throws some errors regarding missing _read(), _close(), etc. functions (interestingly, the build still completes despite those errors). The settings passed to GCC are identical as far as I can see, so either a new parameter is needed or the provided newlib is compiled with (slightly) different settings. I have to figure out how the system calls should be implemented; I’ll start with the minimal implementation suggested in the newlib documentation. And of course I have to check if malloc lock/unlock is implemented.

@richard-damon :
I’ll do it this way if the lock/unlock mechanism is implemented in the newlib I’m using now and if I decide to stay with heap_3. If the L/U mechanism is available, I want to try out heap_4 with printf().

Regards

hs2 · January 29, 2024, 6:19pm

Sorry - no. I thought you were talking about using heap_3 which is using newlib malloc/free internally.
The missing syscalls should be implemented, maybe as empty functions/stubs if you don’t need them (like open/close etc.).

RA1981 · January 29, 2024, 7:11pm

hs2 wrote: “Sorry - no. I thought you were talking about using heap_3 which is using newlib malloc/free internally.”

Ah, okay. As I mentioned, the first idea was to take heap_4, but then I realized that it might fail with printf(). The application in its current state does well with the newlib heap implementation, so I think it wouldn’t be a problem to use heap_3. I just have to figure out if the lock mechanism is available (and if it’s not, how I can build newlib myself).

hs2 wrote: “The missing syscalls should be implemented, maybe as empty functions/stubs if you don’t need them (like open/close etc.).”

I think they can’t (shouldn’t) be totally empty; I have to check what the minimum implementation is. As far as I can see, they return a fixed error condition in most cases.

Regards

RA1981 · January 29, 2024, 9:16pm

UPDATE: it seems that the new newlib version indeed uses the lock/unlock mechanism and I was able to override it, currently using

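#include <reent.h>      /* struct _reent */
#include "FreeRTOS.h"
#include "task.h"

/* Called by newlib before it touches the heap: stop task switching. */
void __malloc_lock(__attribute__((unused)) struct _reent *r) {
	vTaskSuspendAll();
}

/* Called by newlib when it is done with the heap: resume task switching. */
void __malloc_unlock(__attribute__((unused)) struct _reent *r) {
	(void)xTaskResumeAll();
}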

I verified that those functions are called. So, my next step is to implement retargeting STDIN/STDOUT to UART and check if I can do a printf() from multiple tasks without overlapping output.
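
Retargeting stdout usually comes down to providing _write(). A minimal sketch, assuming a hypothetical byte-wise driver call uart_put_char(); here it only appends to a capture buffer so the code can be tried off-target, and the prototype of _write() expected by a particular newlib build may differ slightly:

```c
#include <stddef.h>

/* Hypothetical UART driver call (an assumption). On target it would
 * push one byte into the UART; here it appends to a capture buffer. */
static char uart_log[64];
static size_t uart_log_len;

static void uart_put_char(char c)
{
    if (uart_log_len < sizeof uart_log) {
        uart_log[uart_log_len++] = c;
    }
}

/* Called by newlib for every chunk of output written to a descriptor. */
int _write(int fd, const char *buf, int len)
{
    int i;

    (void)fd; /* stdout and stderr share one UART in this sketch */
    for (i = 0; i < len; i++) {
        uart_put_char(buf[i]);
    }
    return len; /* report that everything was consumed */
}
```

With a second UART for stderr, the fd argument could be used to choose the target device.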

Regards

richard-damon · January 30, 2024, 1:13am

The __malloc_lock and __malloc_unlock functions will NOT keep the output of multiple tasks from interleaving. That will require something that handles “message” level concepts. If you want that to be at the “printf” level, then you will likely need to wrap printf, so that you can acquire and release a mutex around each call (and then use vprintf, so you can pass on the varargs you received). (You might check whether printf just calls vprintf; then you only need to wrap that.)
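
A minimal sketch of such a wrapper. The names locked_printf(), print_lock() and print_unlock() are hypothetical; on target the two lock functions would take and give a FreeRTOS mutex (for example with xSemaphoreTake() and xSemaphoreGive()), while here they only count so the code can be run anywhere:

```c
#include <stdarg.h>
#include <stdio.h>

/* Placeholders for the mutex (assumptions for illustration). */
static int print_lock_depth;

static void print_lock(void)
{
    print_lock_depth++;
}

static void print_unlock(void)
{
    print_lock_depth--;
}

/* printf() replacement: the whole formatted message is written while
 * the lock is held, so messages from different tasks do not mix. */
int locked_printf(const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    print_lock();
    n = vprintf(fmt, args); /* forwards the caller's varargs */
    print_unlock();
    va_end(args);
    return n;
}
```

Since a mutex can block, a wrapper like this is only suitable for task context, not for ISRs.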

RA1981 · January 30, 2024, 8:29pm

Hello @richard-damon

First a question about using vTaskSuspendAll()/xTaskResumeAll() for the locking mechanism: is this the right way to do it?

Regarding wrapping printf(), why do I also need to wrap vprintf()? Or do you mean that instead of having a function which wraps printf() with a mutex I can make this function call vprintf() directly with stdout as parameter? This would save the call to printf(). I checked the newlib source code, as far as I can see it indeed calls vprintf() => it makes a call to vfprintf_r() with stdout parameter.

EDIT: it seems that GCC can create wrapper function calls for library functions, so it should be possible to get calls to malloc()/free() redirected to those wrappers. With this it should be possible to use heap_4 and redirect to pvPortMalloc() and vPortFree(), right? Has anyone used this approach?

Regards

richard-damon · January 30, 2024, 9:19pm

You commented that you didn’t want multiple tasks output to interleave. Without something to block this, it could happen. Thus, if the goal is that every call to printf be made “atomic” as far as output, then you need to make sure that only one task at a time is inside printf.

As for wrapping malloc with gcc, the problem is that not all functions will internally call malloc; they may instead call an internal function that malloc also uses (newlib calls this _malloc_r, which handles some aspects of multiple tasks). Also, if something calls realloc, you have nothing available to do that.

RA1981 · January 30, 2024, 9:37pm

Oh, okay… well, then it seems there’s no other way than to stay with heap_3. And I’ll use a mutex for printf()/vprintf() if needed.

But first I’ll check the current application implementation to see what can be split into separate tasks.

Thank you.

hs2 · January 30, 2024, 9:50pm

As Richard mentioned there are a few more functions to wrap to completely replace the newlib heap with e.g. FreeRTOS heap_4.
You need to override/wrap these heap API functions: _malloc_r, _free_r, _calloc_r and _realloc_r.
But ask yourself if it’s worth the effort if you’re fine with the newlib heap. Adding just the lock/unlock hooks is much easier.
With regard to printf, see this post for an alternative implementation you can tailor/adapt by modifying the source.
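
To give an idea of the effort, here is a rough sketch of such wrappers, assuming they are linked with the GNU ld option --wrap for each of the four symbols. port_malloc() and port_free() are stand-ins for pvPortMalloc() and vPortFree() so the code can run off-target, and the size header in front of each block is an assumption added here because the heap_4 API offers no way to ask for the size of an existing block, which realloc needs. Whether every internal caller in a prebuilt newlib really ends up in these wrappers depends on how the library was built.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct _reent; /* opaque here; newlib defines it in <reent.h> */

/* Stand-ins for pvPortMalloc()/vPortFree() (assumption). */
static void *port_malloc(size_t n) { return malloc(n); }
static void port_free(void *p) { free(p); }

/* Size header stored in front of each block; the union pads it so
 * that the payload behind it stays suitably aligned. */
typedef union {
    size_t size;
    long long ll;
    long double ld;
    void *ptr;
} block_hdr_t;

void *__wrap__malloc_r(struct _reent *r, size_t n)
{
    block_hdr_t *h;

    (void)r;
    if (n > SIZE_MAX - sizeof *h) {
        return NULL; /* header would overflow the request */
    }
    h = port_malloc(sizeof *h + n);
    if (h == NULL) {
        return NULL;
    }
    h->size = n;
    return h + 1; /* payload starts right after the header */
}

void __wrap__free_r(struct _reent *r, void *p)
{
    (void)r;
    if (p != NULL) {
        port_free((block_hdr_t *)p - 1);
    }
}

void *__wrap__calloc_r(struct _reent *r, size_t count, size_t size)
{
    void *p;

    if (size != 0 && count > SIZE_MAX / size) {
        return NULL; /* count * size would overflow */
    }
    p = __wrap__malloc_r(r, count * size);
    if (p != NULL) {
        memset(p, 0, count * size);
    }
    return p;
}

void *__wrap__realloc_r(struct _reent *r, void *p, size_t n)
{
    size_t old;
    void *q;

    if (p == NULL) {
        return __wrap__malloc_r(r, n);
    }
    if (n == 0) {
        __wrap__free_r(r, p);
        return NULL;
    }
    old = ((block_hdr_t *)p - 1)->size;
    q = __wrap__malloc_r(r, n);
    if (q == NULL) {
        return NULL; /* the old block stays valid */
    }
    memcpy(q, p, old < n ? old : n);
    __wrap__free_r(r, p);
    return q;
}
```

Even this small version has to deal with overflow checks, alignment and the realloc copy, which is why just adding the lock/unlock hooks is the easier route.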

RA1981 · January 30, 2024, 10:57pm

Hello @hs2

Indeed it seems to be more complicated than I initially thought. While I’m really interested in digging deeper into newlib and how to correctly wrap it for heap_4, I’m not experienced enough with the library details and how it works internally. Since the newlib heap worked (almost) well with the application, it seems better to stay with it and focus on learning FreeRTOS first. I was just curious about heap_4.
Thank you for the link about an alternative printf(), I’ll check it.

Regards

jefftenney · January 31, 2024, 12:33am

If you’re sticking with heap_3 with newlib for now, then see this post for some helpful code (if I do say so myself).

RA1981 · February 1, 2024, 5:45pm

Hello @jefftenney

Thank you for that code, I’ll use it for the first small test application where I can combine the above with interrupts and sharing data between tasks, printf() etc.

Regards

dc42 · February 6, 2024, 8:00pm

Alternatively, if you don’t need all the features of printf and/or you want to extend it with your own format specifiers, then use your own version of printf and friends which doesn’t call malloc. The version we use is at RRFLibraries/src/General at 3.5-dev · Duet3D/RRFLibraries · GitHub, see the SafeVsnprintf .cpp and .h files. I think it came from one of the FreeRTOS add-on packages before we extended it.

RA1981 · February 6, 2024, 11:49pm

Hello @dc42 David,

well, I don’t know if I need all features. In the project to be ported I use no floats, but I do use the specifiers for single chars and strings. I have to check in detail which features exactly are used.

Having the possibility to add own specifiers is interesting. But if possible I want a “complete” solution, because in the next project I might use another member of the printf() family or a C library function in general which then should be thread-safe. So, printf() was simply the first obvious thing to ask about because it’s in the current project.

As I’m new to (Free)RTOS programming it’s hard to decide which way to go. And for newlib I need to dig deeper into it to understand how it works in detail and what needs to be done for thread-safety. However, using a non-library printf() function for the project might be a good starting point to get familiar with FreeRTOS.

Regards

danielglasser · February 7, 2024, 12:38am

I recommend the following blog post for dealing with concurrent threads/tasks using “printf()” and/or “malloc()”: Reentrancy in Newlib - Code Inside Out
Roughly half-way down the page it shows how to avoid interleaving output.

That being said, I never use “printf()” or “malloc()” from within an ISR or any function called by an ISR - anything that might use a mutex, semaphore, or other potentially blocking system interface.

When developing applications on STM32 (and other similar microcontrollers), I have a library that deals with console output from multiple threads where a thread doesn’t have to wait for its output to complete before the printf() (or write()) call completes, or at least up to a point - if the task gets too far ahead of its output then it blocks. I won’t go into details here because:

It’s a lot of code
It requires a lot of details about configuring and integrating it into your application
It requires an extra task

On the other hand, it works well and I use it in a lot of my embedded work during development, then disable it for production builds. Unfortunately I wrote it while at work so legally it belongs to my employer, and I can’t make it available.

dc42 · February 7, 2024, 11:40am

We do something similar when we need to printf from an ISR for debugging purposes. We use our own implementation of the printf family and have it write to a ring buffer. Another task picks it up and outputs it.
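
A ring buffer of this kind can be quite small. The following is a generic single-producer/single-consumer sketch, not the Duet3D code; the names and the size are assumptions. With several writers (multiple tasks, or tasks plus ISRs), log_ring_put() would additionally need protection, for example a short critical section.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_RING_SIZE 256u /* must be a power of two */

static uint8_t log_ring[LOG_RING_SIZE];
static volatile size_t log_head; /* advanced only by the producer */
static volatile size_t log_tail; /* advanced only by the consumer */

/* Producer side: store one byte, or return false if the buffer is full. */
bool log_ring_put(uint8_t byte)
{
    size_t head = log_head;

    if (head - log_tail == LOG_RING_SIZE) {
        return false; /* full: the caller may drop the byte or retry */
    }
    log_ring[head % LOG_RING_SIZE] = byte;
    log_head = head + 1; /* publish only after the data is stored */
    return true;
}

/* Consumer side (the output task): fetch one byte if available. */
bool log_ring_get(uint8_t *byte)
{
    size_t tail = log_tail;

    if (tail == log_head) {
        return false; /* empty */
    }
    *byte = log_ring[tail % LOG_RING_SIZE];
    log_tail = tail + 1;
    return true;
}
```

The indices run freely and are reduced modulo the size only when the array is accessed, which keeps the full and empty conditions distinct without wasting a slot.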

RA1981 · February 7, 2024, 6:32pm

Hello @danielglasser & @dc42
That blog post looks interesting. Two questions about it:

it needs either line or full buffering enabled for stdout, because otherwise _write() will return after one byte and therefore the mutex is useless, right? I have to check it - usually I disable buffering for std* because the UART manages its own buffer, and this way I ensure that the bytes are fed into the UART as fast as possible.
Since the mutex locks UART access, printf() family functions which are working on buffers instead of std*/UART don’t need anything special regarding thread-safety, right?
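
For the first point, line buffering can be requested with the standard setvbuf() call. A minimal sketch; the helper name and the buffer size are assumptions, and how a particular newlib version splits output into _write() calls in each buffering mode is something to verify on the target:

```c
#include <stdio.h>

/* Static buffer handed to the C library for stdout (size is arbitrary). */
static char stdout_buf[128];

/* Switch stdout to line buffering; meant to be called once, before
 * anything has been printed. Returns 0 on success. */
int stdout_use_line_buffering(void)
{
    return setvbuf(stdout, stdout_buf, _IOLBF, sizeof stdout_buf);
}
```

With line buffering, output is normally handed on when a newline is written or the buffer fills up, so each message should end with '\n' (or be followed by fflush(stdout)).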

For ISRs I always avoid long functions - that was one of the first things I was taught when I was a trainee - and it was on an 80C32 and with plain assembler, no C language.

Regarding the library you mentioned, is it like a queue with multiple writer/single reader mechanism? So writing to the queue protected by a mutex?

As David mentioned, I assume this can be used for output to stderr - my application uses an additional UART as stderr, so using two queues feeding the additional task for printing should be possible I think.

Regards