Question about using printf() / malloc() with heap_4 & newlib

danielglasser · February 13, 2024, 10:01pm

I will use “thread” to refer to either tasks or threads, since where they’re not equivalent, “thread” applies.

The library in question can support one or more “streams” per thread. The number is configured system-wide, because it sets the number of stream entries in a per-thread structure; a pointer to that structure is added to the thread-local storage or to the “reent” structure in the TCB, depending on the underlying OS. Every thread that uses the library allocates this structure, statically or dynamically; all the library has is a pointer provided at registration time.

In its full-blown configuration, every stream instance includes the following items. (Thread, event, semaphore, etc. IDs may be pointers, depending on the RTOS; they’re pointers in FreeRTOS and identifiers in INTEGRITY-178B.)

Pointer to the name of the stream (should be unique across all threads in the application)
Pointer to the stream buffer provided at stream registration time
Size of the stream buffer, in bytes, provided by the producer at registration time
Stream update semaphore used as a mutex to prevent concurrent updates that might lead to a race condition
Head offset/pointer
Tail offset/pointer
status (Full flag in its own byte to prevent race conditions with Read/Modify/Write)
configuration flags - a collection of single- and multi-bit fields packed into a single word that configure the behavior of the stream (raw vs. line-oriented, line termination, etc.)
newline counting semaphore (used in line-oriented mode)
owner thread ID
producer thread ID
consumer thread ID
producer event ID (or semaphore ID when events are not available)
producer event notification flag number (if applicable)
consumer event ID (or semaphore ID when events are not available)
consumer event notification flag number (if applicable)
producer options (notification type, what events it gets notified under, blocking behavior, etc.)
consumer options (notification type, what events it gets notified under, blocking behavior, etc.)
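
As a rough illustration, the items above might map onto a C structure like the one below. Every type and field name here is hypothetical (these are not the library's actual declarations), and the ID types are shown as opaque pointers, as they would be under FreeRTOS.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opaque ID types: pointers under FreeRTOS, integers elsewhere. */
typedef void *sio_thread_id_t;
typedef void *sio_sem_id_t;
typedef void *sio_event_id_t;

/* Hypothetical per-endpoint settings (one each for producer and consumer). */
typedef struct {
    sio_thread_id_t thread;    /* connected thread */
    sio_event_id_t  event;     /* event (or semaphore) used for notification */
    uint8_t         flag_num;  /* event flag number, if applicable */
    uint32_t        options;   /* notification type, blocking behavior, ... */
} sio_endpoint_t;

typedef struct {
    const char      *name;          /* unique stream name */
    uint8_t         *buffer;        /* buffer supplied at registration */
    size_t           size;          /* buffer size in bytes */
    sio_sem_id_t     update_lock;   /* semaphore used as a mutex for updates */
    volatile size_t  head;          /* normally changed by the producer only */
    volatile size_t  tail;          /* normally changed by the consumer only */
    volatile uint8_t full;          /* own byte, so no shared read/modify/write */
    uint32_t         config;        /* packed raw/line mode, termination, ... */
    sio_sem_id_t     newline_count; /* counting semaphore for line mode */
    sio_thread_id_t  owner;
    sio_endpoint_t   producer;
    sio_endpoint_t   consumer;
} sio_stream_t;
```

Grouping the producer and consumer fields into one small struct is just a convenience of this sketch; the list above describes them as separate items.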

Each thread that uses the library also has a “file table” with at least 3 entries (for stdin, stdout, stderr). Each entry has an access type (read = consumer, write = producer), an open status, and a pointer to the stream object. As it stands, the thread (or should I say, the programmer) is responsible for not using file descriptors (indexes) equal to or greater than the number of entries that have been allocated. The TCB or reent structure also contains a pointer to the file table.
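
A minimal sketch of what such a file table could look like, with hypothetical names throughout. The library as described leaves range checking to the programmer; the bounds check in the lookup helper is an illustrative addition, not something the library is stated to do.

```c
#include <stdbool.h>
#include <stddef.h>

struct sio_stream;  /* hypothetical stream object, defined elsewhere */

/* read = consumer side, write = producer side */
typedef enum { SIO_ACCESS_READ, SIO_ACCESS_WRITE } sio_access_t;

typedef struct {
    sio_access_t       access;
    bool               is_open;
    struct sio_stream *stream;
} sio_file_entry_t;

typedef struct {
    size_t            count;    /* entries allocated (at least 3) */
    sio_file_entry_t *entries;  /* indexed by file descriptor */
} sio_file_table_t;

/* Returns the entry for fd, or NULL if fd is out of range or not open. */
static sio_file_entry_t *sio_file_lookup(sio_file_table_t *t, int fd)
{
    if (t == NULL || fd < 0 || (size_t)fd >= t->count) {
        return NULL;
    }
    return t->entries[fd].is_open ? &t->entries[fd] : NULL;
}
```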

Before a thread can use the stream i/o library, the thread is registered by a call to the library’s thread registration interface with the thread ID and pointers to the stream and file table structures; the stream tables are added to a forward linked list maintained by the library. Before a stream can be used by any thread it must be registered by calling the library’s stream registration interface with the name, address and size of the stream buffer, the owner task ID, the owner notification event ID and flag number, and the stream options.

I have macros that will define all of the structures and buffers statically and create/populate “initializer” structures for the task and stream registrations in named ELF sections so that the linker coalesces them into contiguous lists, and a single call made by the application start-up code then does all this registration mumbo-jumbo.
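
For readers who have not seen the linker-section trick, here is a minimal sketch of the general technique, assuming GCC or Clang on an ELF target linked with GNU ld or lld. The macro, type, and section names are hypothetical. This sketch places pointers to the initializer structures in the section rather than the structures themselves, so the list stays densely packed even if the compiler over-aligns the structures; the actual macros may do this differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical initializer record for one stream registration. */
typedef struct {
    const char *name;
    uint8_t    *buffer;
    size_t      size;
} sio_stream_init_t;

/* Define a static buffer and its initializer, then place a pointer to the
 * initializer in the "sio_stream_init" section. */
#define SIO_STREAM_DEFINE(ident, bytes)                                   \
    static uint8_t ident##_buf[bytes];                                    \
    static const sio_stream_init_t ident##_init =                         \
        { #ident, ident##_buf, bytes };                                   \
    static const sio_stream_init_t *const ident##_ptr                     \
        __attribute__((used, section("sio_stream_init"))) = &ident##_init

/* GNU ld and lld synthesize these bounds for any section whose name is a
 * valid C identifier. */
extern const sio_stream_init_t *const __start_sio_stream_init[];
extern const sio_stream_init_t *const __stop_sio_stream_init[];

SIO_STREAM_DEFINE(console, 256);
SIO_STREAM_DEFINE(telemetry, 128);

/* Stand-in for the single start-up call: walk the coalesced list, hand each
 * record to reg (if given), and return how many records were found. */
static size_t sio_register_all(void (*reg)(const sio_stream_init_t *))
{
    size_t n = 0;
    for (const sio_stream_init_t *const *p = __start_sio_stream_init;
         p < __stop_sio_stream_init; ++p, ++n) {
        if (reg != NULL) {
            reg(*p);
        }
    }
    return n;
}
```

On a bare-metal target the linker script usually needs a KEEP() around the section so that section garbage collection does not discard it, and it may define its own start/end symbols instead of relying on the synthesized ones.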

Before a stream can be used by a thread, the thread connects to it as either a consumer or producer and associates it with a file descriptor via something resembling an open() call, except that there are some additional parameters. Only one thread may be the consumer on a stream, and only one thread may be the producer on a stream. Neither producer nor consumer needs to be the “owner” of the stream; the owner information is used for stream supervision and error handling, but that’s getting too deep into the weeds.

The stream I/O library has interfaces for reading and writing (and other operations) on streams. Associated with that is the libc adapter library that implements the low-level “_read()” and “_write()” functions as calls into the stream I/O library when using newlib.
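
A minimal sketch of the adapter idea, assuming the usual newlib stub signature for `_write()` (`_read()` has the same shape). `sio_write()` is a hypothetical stand-in for the stream I/O library's write interface and is stubbed out here so the example is self-contained; the errno values chosen are also just an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stream I/O entry point. A real one would look up the calling
 * thread's file table and copy the data into the stream buffer. This stub
 * accepts only fds 1 and 2 and reports every byte as written. */
static int sio_write(int fd, const void *data, size_t len)
{
    (void)data;
    return (fd == 1 || fd == 2) ? (int)len : -1;
}

/* newlib's stdio layer (printf, puts, fwrite, ...) ends up here. */
int _write(int file, char *ptr, int len)
{
    if (ptr == NULL || len < 0) {
        errno = EINVAL;
        return -1;
    }
    int n = sio_write(file, ptr, (size_t)len);
    if (n < 0) {
        errno = EBADF;  /* fd not connected to a stream as producer */
        return -1;
    }
    return n;
}
```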

The last part of this isn’t part of the stream I/O library; it’s a thread that has a large file table and a minimal stream table (it doesn’t own any streams). It depends on the application using the aforementioned stream registration macros, so that by the time it starts executing, all the streams it is going to service have been registered. There is a flag in the stream configuration that says the stream is for the multiplexed output, and the service task opens every stream that has this flag set as the consumer, providing the ID of an event or semaphore and a different flag bit number for each stream. It then waits for events, checks the event flags to see which streams have output available, pulls out the pending output, and sends it out the serial port. I actually include a little more information for each stream, allowing the multiplexor thread to label the output in some way so the observer at the receiving end of the multiplexed output can either demultiplex it or figure out where each message comes from.

In general, all multiplexed streams are line-mode, so the multiplexor thread is only notified for a stream when a newline is produced or the stream buffer is full; each time a newline is entered into the stream, the counting semaphore increments, and each time one is read from the stream, the semaphore decrements.
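
The servicing step of the multiplexor can be sketched roughly as follows. This is a host-side illustration with hypothetical names: the wait for events is left out, the event flags arrive as a plain bitmask, and the line reader and the serial output are passed in as callbacks. The small `canned_*` and `capture_*` helpers exist only so the sketch can be exercised without an RTOS.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-stream record kept by the multiplexor thread. */
typedef struct {
    const char *label;  /* tag prepended to each line */
    /* Copies one pending line (NUL-terminated, newline stripped) into dst.
     * Returns false when no complete line is waiting. */
    bool (*read_line)(void *ctx, char *dst, size_t cap);
    void *ctx;
} mux_entry_t;

/* Service every stream whose flag bit is set (bit i <-> table[i]). Each line
 * goes out through emit as "[label] text\n". Returns the lines sent. */
static size_t mux_service(uint32_t flags, const mux_entry_t *table, size_t n,
                          void (*emit)(const char *line))
{
    char text[128], line[192];
    size_t sent = 0;
    for (size_t i = 0; i < n && i < 32; ++i) {
        if ((flags & (UINT32_C(1) << i)) == 0) {
            continue;
        }
        while (table[i].read_line(table[i].ctx, text, sizeof text)) {
            snprintf(line, sizeof line, "[%s] %s\n", table[i].label, text);
            emit(line);
            ++sent;
        }
    }
    return sent;
}

/* Demo stand-in for a stream: hands out a fixed list of lines, once each. */
typedef struct { const char *const *lines; size_t count, next; } canned_t;

static bool canned_read(void *ctx, char *dst, size_t cap)
{
    canned_t *c = ctx;
    if (c->next >= c->count || cap == 0) {
        return false;
    }
    snprintf(dst, cap, "%s", c->lines[c->next++]);
    return true;
}

/* Demo stand-in for the serial port: append to a capture buffer. */
static char mux_capture[256];

static void capture_emit(const char *line)
{
    size_t used = strlen(mux_capture);
    snprintf(mux_capture + used, sizeof mux_capture - used, "%s", line);
}
```

The label prefix is one possible way to let the receiving end demultiplex the output; any tagging scheme the observer understands would do.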

There are other aspects to this, and I’m typing this all from memory, so please forgive the incomplete description of how this all comes together. It sounds very complicated, and bits of it are, but most of it is dead simple. There are conventions that, when followed, avoid overuse of the update semaphore/mutex on a stream, since most of the time only the consumer modifies the tail offset and only the producer changes the head. The full flag may be modified by either producer or consumer (or owner, but we’re not talking about that here), so the mutex is needed when dealing with that, plus “flush()” and “reset()” interfaces that are not otherwise mentioned in this post.
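
To make the head/tail convention concrete, here is a minimal single-producer/single-consumer ring buffer sketch with a separate full flag. The names are hypothetical, and the code is deliberately single-threaded: it has no locking and no memory barriers, and the comments only mark where the stream mutex would be needed in real use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t         *buf;
    size_t           size;
    volatile size_t  head;  /* next write index; advanced by the producer */
    volatile size_t  tail;  /* next read index; advanced by the consumer */
    volatile uint8_t full;  /* shared by both sides, so it needs the mutex */
} ring_t;

/* Producer side: only head (and the shared full flag) is modified. */
static bool ring_put(ring_t *r, uint8_t byte)
{
    if (r->full) {
        return false;
    }
    r->buf[r->head] = byte;
    size_t next = (r->head + 1) % r->size;
    r->head = next;
    if (next == r->tail) {
        r->full = 1;  /* in real use: done while holding the stream mutex */
    }
    return true;
}

/* Consumer side: only tail (and the shared full flag) is modified. */
static bool ring_get(ring_t *r, uint8_t *out)
{
    if (!r->full && r->head == r->tail) {
        return false;  /* empty */
    }
    *out = r->buf[r->tail];
    r->tail = (r->tail + 1) % r->size;
    r->full = 0;      /* in real use: done while holding the stream mutex */
    return true;
}
```

Because head equals tail both when the buffer is empty and when it is full, the flag is what tells the two cases apart, which is why both sides have to touch it.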

I hope this is helpful.

Update: I forgot to mention a few things:

A thread can use the same stream for both stdout and stderr
The “open” interface in the stream I/O library is a wrapper around the “connect file to stream as consumer|producer” interfaces, providing the current task ID as one of the arguments
There are macros that create lists of consumer and producer connect structures which use the same ELF section and linker magic as the thread and stream registration lists, so the threads themselves don’t have to be aware of this at all
The header file that defines the interfaces for the I/O library conditionally defines the macros and interfaces in such a way that they have no footprint in memory or code unless enabled via a preprocessor symbol - the multiplexor itself is also conditionally compiled using the same preprocessor symbol
A number of the features are optional depending on definition of preprocessor symbols, so the amount of code and data associated with this can be tailored somewhat
Some of the complexity is because this library is ostensibly for asynchronous inter-thread communication; that’s how I was able to justify the time spent putting it together; the multiplexor stuff is a bonus built on the stream I/O library

RA1981 · February 14, 2024, 11:13pm

Hello @danielglasser

Wow, that’s a very detailed description of the library, thank you. As a beginner, I doubt that I can implement this out of the box; I’m taking small steps.

Update on my progress: my current small test application for the PSoC device has a few tasks which all use printf(). The malloc/free locks are called and the printf outputs are overlapping as expected. Now I’ll check with mutex/semaphore around printf.

In parallel, I created a similar test project on an STM32 for comparison. There are no overlaps there. The lock glue file provided by the STM CubeIDE seems to cover most (all?) of the relevant locks needed, at least for the printf stuff. And if I remember correctly, it uses heap_4.

Is there a good way to test whether all the needed mechanisms work as expected? Something like test software that definitely fails if something is implemented wrong?

Regards