Non-blocking UART transfer

Please revisit the discussion in the thread that has already been mentioned twice. Repetita non placent (things repeated do not please).

Your system doesn’t support a command/reply type device (since another task can send a later command before the current task gets its reply), while RAc’s method is primarily for devices with an intricate command/reply protocol, so I don’t think his system is directly applicable to yours (but if you DO need a command/reply protocol, it could help).

The biggest problem with your system is that it doesn’t achieve the “non-blocking” aspect to the extent that is possible, since no message can be added to the queue until the previous one has been fully sent out of the UART.

I mean, you would only want the task that holds the mutex to send its message before the next task can take over the UART, wouldn’t you? That means the other task remains blocked until the previous task has finished sending its message and released UART access.

Again, take the example I provided earlier: TaskA wants to send Hello and TaskB wants to send World. If TaskA acquires the mutex first, Hello is printed, followed by World once TaskA is done sending, so the output looks like HelloWorld. Isn’t that right?

But then you aren’t as “Non-blocking” as possible.

You seem to think the next task can’t add its message to the buffer before the current message is all the way out, but WHY? The Tx line will be the same either way, a continuous sequence of characters containing the two messages, one right after the other.

Functionally, there is no reason why TaskB can’t add its “World” to the buffer before the “H” of “Hello” has finished going out on the wire, leaving a buffer with the contents “elloWorld” (or maybe “lloWorld” if the device can hold the “e” in its Tx buffer while waiting for the shift register to be free).
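To make the “elloWorld” point concrete, here is a minimal ring-buffer sketch in C++. The class name `TxRing`, its capacity, and calling `pop()` by hand to stand in for the TX interrupt are illustrative assumptions, not code from this thread; on a target, `write()` would run in the sending task and `pop()` in the UART ISR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fixed-size TX ring: tasks push at the head, the ISR pops at the tail.
// NOTE: this single-threaded model ignores task/ISR concurrency; a real one would use
// a critical section or a FreeRTOS stream buffer around the shared indices.
template <std::size_t N>
class TxRing {
public:
    bool push(char c) {
        if (count_ == N) return false;          // full: the writer would have to wait
        buf_[head_] = c;
        head_ = (head_ + 1) % N;
        ++count_;
        return true;
    }
    bool pop(char& c) {                         // what the TX ISR would call
        if (count_ == 0) return false;          // empty: the ISR would stop the transmitter
        c = buf_[tail_];
        tail_ = (tail_ + 1) % N;
        --count_;
        return true;
    }
    std::size_t write(const std::string& msg) { // copy as much of msg as fits
        std::size_t n = 0;
        while (n < msg.size() && push(msg[n])) ++n;
        return n;
    }
    std::string pending() const {               // unsent characters, oldest first
        std::string s;
        for (std::size_t i = 0; i < count_; ++i) s += buf_[(tail_ + i) % N];
        return s;
    }
private:
    std::array<char, N> buf_{};
    std::size_t head_ = 0, tail_ = 0, count_ = 0;
};
```

If TaskA writes “Hello”, the ISR takes the “H”, and TaskB then writes “World”, `pending()` returns “elloWorld”: the line output is identical to waiting, but TaskB never blocked.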

All you are doing is blocking your TaskB in a call to a “non-blocking” transmit routine.

The Mutex ensures that TaskA gets all of its message into the buffer before TaskB starts, but there is no reason B needs to wait for that message to be sent out before it can start.

I mean, HWelloorld makes no sense to the user receiving this response. If the user expects the application to output temperature readings such as temp value = 10 degrees, seeing a mix of two messages in the output would be garbage to them.

Except that my method doesn’t give that result, because a given task puts out its FULL message before the Mutex is released so another task can add to the buffer; that is the point of the Mutex.

Because of your wait for an empty buffer, TaskB needs to wait for several character times if it makes its request just after TaskA sends its message, and there is no need for that if the buffer can hold all 10 characters.
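For a sense of scale, that wait can be estimated from the frame format. The sketch assumes 8N1 framing (1 start, 8 data, 1 stop bit, so 10 bits per character); the baud rates used below are examples chosen for illustration, not figures from the thread.

```cpp
#include <cassert>
#include <cstdint>

// Time on the wire for one character, in microseconds.
// Assumes 8N1 framing by default: 10 bits per character.
constexpr double char_time_us(std::uint32_t baud, unsigned bits_per_char = 10) {
    return bits_per_char * 1000000.0 / baud;
}

// How long a task would sit blocked waiting for `chars` characters to drain.
constexpr double drain_time_us(std::uint32_t baud, unsigned chars) {
    return chars * char_time_us(baud);
}
```

At 115200 baud one character takes about 86.8 µs, so draining 10 characters costs roughly 0.87 ms, close to a whole tick at a 1 kHz tick rate; at 9600 baud the same 10 characters take about 10.4 ms.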

That’s what I do as well, though:

  1. acquire a mutex for the UART
  2. fill the TX FIFO with the FULL message
  3. let the UART ISR take care of reading each byte from the TX FIFO and outputting it to the user
  4. only signal the completion of message TX once the FULL message has been sent
  5. TaskB can now send its full message…
  6. back to step #1

Except that you don’t have a mutex; you have a semaphore (since an ISR can’t release a Mutex).

My sequence is:

  1. TaskA acquires the Mutex for the UART.
  2. TaskA fills the buffer and triggers the serial port to start transmission if needed.
  3. When TaskA has transferred all of its message to the buffer, it releases the Mutex.
  4. TaskB can now acquire the Mutex for the UART while the previous message is being sent.
  5. TaskB fills the buffer, and if it started while TaskA’s message is still being output, doesn’t need to do anything to start the transmission.
  6. When TaskB finishes copying its message, it releases the Mutex for the next task to take it.
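The sequence above can be modelled in portable C++, with `std::mutex` standing in for the FreeRTOS mutex and a plain member function standing in for the TX-complete interrupt. `SharedUartTx`, `isr_tx_done()` and the kick counter are hypothetical names; the point is that the lock is held only while copying, and the transmitter is started only when it is idle.

```cpp
#include <cassert>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical model of the shared TX path. Writers serialize on a mutex only while
// copying; the simulated ISR drains the FIFO without ever touching that mutex.
// NOTE: single-threaded model; on a target, FIFO updates shared with the ISR need
// a critical section (or a FreeRTOS stream buffer).
class SharedUartTx {
public:
    void write(const std::string& msg) {
        std::lock_guard<std::mutex> lock(writers_);        // acquire the UART mutex
        for (char c : msg) fifo_.push_back(c);             // copy the FULL message
        if (!tx_running_ && !fifo_.empty()) kick_transmitter(); // start TX only if idle
    }                                                      // mutex released on return

    // One "byte finished" interrupt: send the next byte, or go idle.
    void isr_tx_done() {
        if (fifo_.empty()) { tx_running_ = false; return; }
        send_byte();
    }

    bool tx_running() const { return tx_running_; }
    int kicks() const { return kicks_; }
    const std::string& wire() const { return wire_; }

private:
    void kick_transmitter() { tx_running_ = true; ++kicks_; send_byte(); }
    void send_byte() { wire_ += fifo_.front(); fifo_.pop_front(); }

    std::mutex writers_;
    std::deque<char> fifo_;
    bool tx_running_ = false;
    int kicks_ = 0;
    std::string wire_;   // characters that have "gone out" on the TX line
};
```

If TaskA writes “Hello”, one interrupt fires, and TaskB writes “World” while “llo” is still queued, TaskB returns immediately, the line still carries “HelloWorld”, and the transmitter is kicked only once.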

Filling the buffer might block if the buffer is FULL, until the ISR removes a character from it. You only block if the total number of unsent characters reaches the limit of what the buffer can hold. Until you get that far behind the serial port, no task will block except to wait for the previous task to copy its message to the buffer, which is quick.

The ISR doesn’t do anything about the Mutex that is protecting the input side of the buffer. It only needs to deal with releasing a task that blocks on the buffer getting full (which is automatic when using a Queue / Stream Buffer).
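The “block only when the buffer is full” behaviour can be sketched with a condition variable standing in for the task wake-up that a FreeRTOS queue or stream buffer performs internally. `BoundedTxBuffer` and its methods are hypothetical; as described above, the writer mutex protects only the input side, and the drain side (the ISR’s job) never touches it.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

class BoundedTxBuffer {
public:
    explicit BoundedTxBuffer(std::size_t capacity) : cap_(capacity) {}

    // Writer side: the writer mutex keeps each message contiguous, and the call
    // blocks only while the buffer is full.
    void write(const std::string& msg) {
        std::lock_guard<std::mutex> writer(writer_mutex_);
        for (char c : msg) {
            std::unique_lock<std::mutex> lk(buf_mutex_);
            space_.wait(lk, [&] { return q_.size() < cap_; });
            q_.push_back(c);
        }
    }

    // Drain side (the ISR's job): remove one character and wake a blocked writer.
    // It never takes writer_mutex_.
    bool drain_one(char& c) {
        std::lock_guard<std::mutex> lk(buf_mutex_);
        if (q_.empty()) return false;
        c = q_.front();
        q_.pop_front();
        space_.notify_one();
        return true;
    }

private:
    std::size_t cap_;
    std::mutex writer_mutex_;        // protects only the input side
    std::mutex buf_mutex_;           // stands in for the queue's internal critical section
    std::condition_variable space_;  // stands in for the queue unblocking a waiting writer
    std::deque<char> q_;
};
```

Because only one writer can be inside `write()` at a time, at most one task ever waits on `space_`, which is why `notify_one()` is enough here.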

It is likely that nobody will like the solution I’ve used a number of times when a single serial line is used by multiple threads/tasks, but I’ll do my best to describe it anyway. I will use “tasks” to mean both threads and tasks; I’ve used this technique in environments that have only one or both of these multiprocessing constructs, and “thread” is sometimes used interchangeably with “task” anyway.

I don’t share the serial port or its buffers (directly).

Instead, I create a separate task to service the serial output and input streams. The service task maintains a list of circular buffers for pending outbound data, and when a buffer has data to send, it sends “atomic” chunks of that data over the serial port. Each circular buffer has a priority: if a buffer with a higher priority has data when the current chunk is done, the service task switches to the buffer that has both pending data and the highest priority. Buffers with the same priority are either serviced in round-robin fashion, or the service drains all data that was pending when service on that buffer started before switching to another buffer of the same priority; in that case, data added to the buffer after service started is not sent until the other buffers of the same priority have been serviced. (I did it differently in different implementations.)
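One possible reading of that scheduling rule, as a C++ sketch: after every atomic chunk, the service picks the highest-priority buffer with pending data, rotating round-robin among equal priorities. `ClientBuffer`, `pick_next()`, `service_all()` and the use of `'\n'` as the chunk boundary are assumptions for illustration; since the post says the same-priority policy varied between implementations, this shows only the round-robin variant.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical per-client outbound buffer.
struct ClientBuffer {
    int priority;              // higher value = more urgent (assumption)
    std::deque<char> data;     // pending outbound bytes
};

ClientBuffer make_buffer(int priority, const std::string& s) {
    return ClientBuffer{priority, std::deque<char>(s.begin(), s.end())};
}

// Highest priority with pending data wins; among equal priorities, the first one
// found after `last` wins (round-robin). Returns -1 when nothing is pending.
int pick_next(const std::vector<ClientBuffer>& bufs, int last) {
    const int n = static_cast<int>(bufs.size());
    int best = -1;
    for (int step = 1; step <= n; ++step) {
        int i = (last + step) % n;          // scan starting just after `last`
        if (bufs[i].data.empty()) continue;
        if (best == -1 || bufs[i].priority > bufs[best].priority) best = i;
    }
    return best;
}

// Send one atomic chunk: bytes up to and including the next '\n' (assumed boundary).
std::string send_chunk(ClientBuffer& b) {
    std::string chunk;
    while (!b.data.empty()) {
        char c = b.data.front();
        b.data.pop_front();
        chunk += c;
        if (c == '\n') break;
    }
    return chunk;
}

// Drain everything, re-deciding after each chunk; returns what the serial line carries.
std::string service_all(std::vector<ClientBuffer>& bufs) {
    std::string line;
    int last = -1;
    for (int i = pick_next(bufs, last); i != -1; i = pick_next(bufs, last)) {
        line += send_chunk(bufs[i]);
        last = i;
    }
    return line;
}
```

Because the choice is re-made at every chunk boundary, a higher-priority buffer that receives data mid-drain is serviced at the next boundary rather than after the lower-priority buffer empties.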

Inbound characters are a different problem; either only one task gets the data that arrives on the serial port or there’s some sort of mechanism for external selection of the destination of the data. When the service task is notified of available data by the UART driver, that data is copied into the circular buffer that is currently active.

Along with this service task, there is a library that provides a serial I/O interface to the applications. For each task that will be producing and/or consuming serial data, a context structure is created (I do it statically, though it could be done using dynamic allocation) along with circular buffers for each direction. My library supports 2 output streams and 1 input stream per task, however that may be overkill for most.

When using some flavor of newlib, the library hooks in by overriding the default system-specific I/O functions used by the library’s putc() and getc() (and read() and write()) implementations; on other systems, I have my own printf() and gets() type functions that the tasks use for serial input/output. When entered, the low-level functions put characters into, or get them out of, the circular buffers for the active task; if a write cannot be satisfied because the circular buffer is full, the task blocks until the service task has serviced the circular buffer the task is blocked on.

Various heuristics are used to insert special “marks” at data boundaries: newline characters are treated as a data boundary and a mark is inserted automatically, and there is a flush() interface that a program uses to apply this mark without sending a newline. If the client sends a buffer to the output interface of the multiplexing library, the library escapes it so that it is treated as byte data. “Atomic chunks” are delimited by the marks in the buffer.
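The marking scheme could look something like the sketch below. The byte values for `MARK` and `ESC`, the escape-by-prefix encoding, and the function names are all assumptions made for illustration, since the post does not say how marks are represented. A newline or an explicit flush appends a mark, literal bytes that collide with the reserved values are escaped, and the service side splits the stream into atomic chunks at unescaped marks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical reserved byte values; any two otherwise-unused values would do.
constexpr char MARK = '\x1e';   // chunk boundary inserted by the library
constexpr char ESC  = '\x1f';   // escape prefix for literal MARK/ESC bytes in client data

// Client side: append text to a per-task buffer, marking newlines as boundaries.
void put_text(std::string& buf, const std::string& text) {
    for (char c : text) {
        if (c == MARK || c == ESC) buf += ESC;  // escape so data is never read as a mark
        buf += c;
        if (c == '\n') buf += MARK;             // newline => automatic boundary
    }
}

// Explicit boundary without a newline (plays the role of the flush() interface).
void flush_mark(std::string& buf) { buf += MARK; }

// Service side: split into complete atomic chunks at unescaped marks.
// An unterminated tail is not returned; it stays pending until it is marked.
std::vector<std::string> split_chunks(const std::string& buf) {
    std::vector<std::string> chunks;
    std::string cur;
    for (std::size_t i = 0; i < buf.size(); ++i) {
        if (buf[i] == ESC && i + 1 < buf.size()) { cur += buf[++i]; continue; }
        if (buf[i] == MARK) { chunks.push_back(cur); cur.clear(); continue; }
        cur += buf[i];
    }
    return chunks;
}
```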

Under FreeRTOS, I have had 5 different tasks using this arrangement; under INTEGRITY-178B, I have had up to 10 separate address spaces with one or more client tasks (the service task runs in its own address space).

It’s nice to see debug messages from multiple tasks without the message text being intermingled. The library I wrote allows each task to specify a “start of my output” and “end of my output” byte string that is written around its output; I used this to send ANSI Select Graphic Rendition (SGR) control sequences using the “ANSI” color codes. It makes it easy to tell which task a message comes from.
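A minimal sketch of such per-task start/end strings, assuming ANSI SGR sequences: `ESC [ 32 m` selects a green foreground and `ESC [ 0 m` resets all attributes. `TaskStyle`, `sgr()` and `decorate()` are hypothetical names, not the author’s API.

```cpp
#include <cassert>
#include <string>

// Hypothetical per-task decoration, kept in the task's context structure.
struct TaskStyle {
    std::string start;  // written before each of the task's chunks
    std::string end;    // written after each chunk
};

// Build an SGR sequence: ESC '[' <code> 'm'. Codes 30-37 set the foreground colour, 0 resets.
inline std::string sgr(int code) { return "\x1b[" + std::to_string(code) + "m"; }

// Wrap one atomic chunk in the task's start/end strings.
inline std::string decorate(const TaskStyle& s, const std::string& chunk) {
    return s.start + chunk + s.end;
}
```

Usage: `TaskStyle sensor{sgr(32), sgr(0)};` makes every chunk from a hypothetical sensor task appear in green on an ANSI-capable terminal.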

The first time I implemented such a system was in the 1990s, and I was using a serial terminal emulator that supported a multi-session protocol; I had multiple sessions, each with its own virtual screen and keyboard, and could switch between them from the keyboard or a mouse. I implemented the library for a 68000-based embedded system that used an obscure task-switching executive whose name I don’t recall. In that case, the service task used the TDSMP protocol when switching between “sessions”, one session for each task. TDSMP also switched keyboard focus, and the serial multiplexor handled that, giving each task on the embedded box its own virtual keyboard.

The second time I did this was under INTEGRITY-178B on a PowerPC platform in the 2010s; it was a completely from-scratch implementation, and I had written the printf() and gets() related functions myself (the vendor did not provide stdio). I needed it then so that I could control and log output from applications in multiple AddressSpaces over a single serial line while the UUT (Unit Under Test) was in an environmental chamber.

I ported the version I wrote for INTEGRITY-178B to run under FreeRTOS on an STM32F767 based embedded system; again, only a single serial interface was available for debugging and control; all of the other UARTs were either unavailable or used as part of the device’s functional interfaces. I have since used this same version on several ARM Cortex-M and MicroBlaze based projects.

I cannot provide any source code for what I’m describing since my employer, and perhaps the US government, would frown on my doing so, but the underlying concept is not that hard to implement.

I will point out that the “Single Driver task” doesn’t solve the interleaved message problem, just moves it. You now need to guard the input queue to the driver task so a task can place its complete message in one “chunk” and not let any other task get a chance to inject in the middle of that message.
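One simple way to keep messages whole when feeding a driver task is to make each queue item a complete message rather than individual characters, so a single enqueue is the atomic unit. The sketch uses `std::queue<std::string>` behind a `std::mutex` as a stand-in for a FreeRTOS queue of message pointers or a message buffer; `MessageQueue` is a hypothetical name. Note that FreeRTOS stream and message buffers are documented as single-writer, so several writer tasks would still need their own serialization, which is exactly the guarding described above.

```cpp
#include <cassert>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical driver-task input queue whose items are whole messages.
class MessageQueue {
public:
    void post(std::string msg) {                 // called by any client task
        std::lock_guard<std::mutex> lock(m_);    // guards the queue itself
        q_.push(std::move(msg));                 // one push = one complete message
    }
    bool take(std::string& out) {                // called by the driver task
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::queue<std::string> q_;
};
```

The driver task then has little to do beyond copying each message to the UART, which illustrates the point made in the following paragraphs.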

And once you have that problem solved, for a simple message log channel, the Driver task doesn’t actually have anything to do but copy its outgoing message queue to the serial device and, if the device is a simple command-and-response device, send the response to the originating task.

All you have done is burn cycles moving data around.

Your extension with multiple input buffers of different priorities might have some advantages if tasks can take an extended time to produce messages (so a higher-priority task can get a message into the queue without waiting for the lower-priority task to finish), but it comes at the cost of needing a LOT more buffer space.

Again, it comes down to my point that you need to look at your actual requirements and use the right solution. “Simple” sharing can use a simple solution, which can be written as a universal low-cost implementation. Devices with more complicated protocols, or situations needing something more complicated, can build on top of that.

The solution I have is not for everyone.

The buffers are per-client task, and I provided for high and low priority buffers so that “out of band” messaging is possible; I use this for interactive shells. The original implementation was written to use a serial line multiplexing protocol. Without some protocol layer, sharing a serial line is generally only for output.

The circular buffer size is determined by the programmer when setting things up; the minimum size is 4 bytes, and my current code restricts the sizes to be a multiple of 4, but the size of each buffer is stored in the per-client context structure. There is a bit of memory overhead, but not a huge amount. So long as the client tasks are writing to their own buffers and the buffers don’t fill up, the printing is not a blocking operation. The actual transmission is asynchronous from the filling of the buffer. I do have provision for a task to block until the selected buffer is empty. This also has the disadvantage that it restricts the length of any single line (no newlines) to just shy of the buffer size.

When you have a line multiplexing protocol, this approach is pretty good. When you need to print out status from multiple independent tasks and don’t want to deal with them stepping on each other, it works well enough. The overhead is not large because the serial service task is event driven, using a counting semaphore and driver callbacks. If there’s nothing to print, the task remains blocked. It works well for my purposes, and since I wrote my current version some years ago, it’s pretty much a drop-in; the details of the serial line may be different (DMA on STM32, PIO on most of the other platforms I’ve ported it to).

If you are sending something to a device and expect a response, and multiple tasks might want to do this, what you need is something that handles reservation of the serial line that is aware of the device response boundaries. There I would have a single task that manages the communication with the sensor; a client task puts its request on a queue and has a callback for when the response from the device has been written to its response buffer; the device manager task then sends the command to the device, waits for the response, then (through the callback) notifies the requesting task that the exchange has completed; meanwhile other tasks may have posted their own requests, and if so, the device manager task does the same thing for them. This is a specific, not general, case that requires tailored solutions. This is the sort of thing you need when dealing with I2C where different tasks talk to different devices on the bus and something has to arbitrate access to the bus.
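The device-manager arrangement can be sketched as follows. The `Request` struct, the `std::function` callback, and the `transact` function standing in for “send the command, wait for the response” are hypothetical; a real version would block on a FreeRTOS queue and talk to the UART driver, and the callback would typically give a semaphore or send a task notification to the requester.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request posted by a client task.
struct Request {
    std::string command;                                  // bytes to send to the device
    std::function<void(const std::string&)> on_complete;  // called with the response
};

class DeviceManager {
public:
    explicit DeviceManager(std::function<std::string(const std::string&)> transact)
        : transact_(std::move(transact)) {}

    void post(Request r) { pending_.push_back(std::move(r)); }  // from client tasks

    // One pass of the manager task: each exchange runs to completion before the next.
    void run_pending() {
        while (!pending_.empty()) {
            Request r = std::move(pending_.front());
            pending_.pop_front();
            std::string reply = transact_(r.command);  // send, then wait for the reply
            r.on_complete(reply);                      // notify the requesting task
        }
    }

private:
    std::function<std::string(const std::string&)> transact_;
    std::deque<Request> pending_;
};
```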

I am being careful in just how much detail I put in these posts since I work for a company that is a contractor to the US Government. Anything that can be seen as technical information must be evaluated as an export subject to ITAR or EAR.

This scenario is a poster child for a single message-pump UART server task. The multiple tasks send a message to the pump, which emits the outbound data and can then decide, based on the message content, whether a response is expected and set up the read accordingly. Anything else will be very, very awkward to arbitrate.

Actually, I don’t find it hard to arbitrate, as long as the protocol is: send a message, wait for a reply (or time out), and then you are done. The Mutex serializes the requests, and then you have a common core of code that does the operation in the context of whatever task wants the operation. A single driver module can run in the context of multiple tasks, which means you only have the overhead of a single Mutex, not a full task, to handle the synchronization.
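A sketch of that “shared driver module plus one mutex” alternative: any task calls `transact()`, which holds the bus mutex across both the send and the wait for the reply (or timeout), so requests are serialized without a dedicated task. `FakeLink`, `CommandChannel`, `send` and `receive_for` are hypothetical stand-ins for the UART driver, and `std::mutex` stands in for the FreeRTOS mutex.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical driver interface; here a fake device with canned replies.
struct FakeLink {
    std::deque<std::string> replies;   // what the device will answer, in order
    std::string sent;                  // everything written to the line
    void send(const std::string& cmd) { sent += cmd; }
    bool receive_for(std::string& out, std::chrono::milliseconds /*timeout*/) {
        if (replies.empty()) return false;   // models a timeout: no reply arrived
        out = replies.front();
        replies.pop_front();
        return true;
    }
};

// Shared driver code that runs in the context of whichever task calls it.
class CommandChannel {
public:
    explicit CommandChannel(FakeLink& link) : link_(link) {}

    // The mutex spans the whole exchange, so no other task can slip a command in
    // between this task's command and its reply.
    bool transact(const std::string& cmd, std::string& reply,
                  std::chrono::milliseconds timeout) {
        std::lock_guard<std::mutex> lock(bus_);
        link_.send(cmd);
        return link_.receive_for(reply, timeout);
    }

private:
    FakeLink& link_;
    std::mutex bus_;
};
```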

The primary tipping point for making it a separate class is if the device can send a message that is not a direct response to the immediately preceding message sent to it. Then you really want a task that can sit waiting for the responses, but the message sending might still be done in the context of the requesting task.

Again, it very strongly depends on the use case and the protocol requirements. Your architecture is perfectly fine for a use case like a sensor concentrator, where several sources like sensors share the UART as a common “sink.”

The picture changes dramatically when you consider a master for a multidrop bus (I have coded those for 25 years in access control applications, so I have a little experience with the details here). Typically, the bus, along with the reader, is a transparent tunnel between the controller (your device) and the access card. An access request is typically a multi-message sequence (6 to about 20 packet exchanges), so there is no “atomicity” of messages. Since many of those protocols were originally designed for peer-to-peer connections and only later extended to bus operation, some readers get seriously confused when there is traffic between access sequences. For this reason, and for a fast turnaround, you want an entire transaction to be as compressed as possible. On the other hand, you also need to ensure that all readers are serviced periodically; otherwise you may experience starvation effects, meaning a) people standing in front of doors would have to wait an unexpected amount of time for their door to open and b) readers may time out and reset if they are not polled within a defined “communication watchdog period.” As the icing on the cake, there are scenarios such as firmware downloads into readers that change the rules completely (I have seen abuses of the 7816 protocol that enforce strict serialization of all packets during a download).
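To illustrate the kind of scheduling this calls for, here is a minimal sketch: a reader’s multi-packet transaction runs to completion with no other bus traffic in between, but a reader close to its watchdog deadline is always polled first, and idle readers are polled oldest-first. The `Reader` struct, the tick-based clock, the watchdog and margin values, and this particular priority rule are assumptions for illustration, not a description of any real access-control protocol; a real scheduler would also have to bound transaction length (for example during firmware downloads).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Reader {
    int address;
    long last_polled;     // tick of the reader's last exchange
    int pending_packets;  // packets in a waiting access transaction (0 = none)
};

// Choose the next reader (readers must be non-empty):
// (1) the one closest to its watchdog deadline, if any is within `margin` ticks of it,
// (2) otherwise the first one with a pending transaction,
// (3) otherwise the one that has waited longest.
std::size_t pick_reader(const std::vector<Reader>& readers, long now, long watchdog, long margin) {
    const std::size_t none = readers.size();
    std::size_t urgent = none, busy = none, oldest = 0;
    long urgent_slack = 0;
    for (std::size_t i = 0; i < readers.size(); ++i) {
        long slack = readers[i].last_polled + watchdog - now;
        if (slack <= margin && (urgent == none || slack < urgent_slack)) {
            urgent = i;
            urgent_slack = slack;
        }
        if (readers[i].pending_packets > 0 && busy == none) busy = i;
        if (readers[i].last_polled < readers[oldest].last_polled) oldest = i;
    }
    if (urgent != none) return urgent;
    if (busy != none) return busy;
    return oldest;
}

// Service one reader. A transaction occupies the bus for one tick per packet exchange,
// uninterrupted; a plain poll takes one tick. Returns the serviced reader's address.
int service_next(std::vector<Reader>& readers, long& now, long watchdog, long margin) {
    Reader& r = readers[pick_reader(readers, now, watchdog, margin)];
    now += r.pending_packets > 0 ? r.pending_packets : 1;
    r.pending_packets = 0;
    r.last_polled = now;
    return r.address;
}
```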

In such scenarios, you need “high level scheduling” that goes beyond individual transactions. Do not even think about using your approach in these cases; you will burn many, many hours discovering that the only reasonable approach here is a single service task, which, by the way, will also work nicely in the scenarios where your approach would be sufficient.

So I believe as a conclusion to this very interesting and insightful debate, we can agree that there is no “one architecture fits all” solution for serial devices; we must always look at the use case and how to model it best and most naturally/organically before deciding on an architecture.


Very well said.

Multiplexed connections such as are common with RS-485, I2C, and SPI, especially those where multiple transactions are needed before releasing the interface, require solutions tailored to the application; in those cases one size fits one, and often no existing size fits ‘mine’. (I have a good example of just this sort of situation where the shared bus was I2C, but it is not FreeRTOS-related and off-topic anyway.)

@danielglasser, I will say that I have done all of those, and for an RS-485 master, an I2C master, or an SPI master, I haven’t normally found a need for a central task to handle the arbitration, just a central Mutex and common library code that any task can call to do its own operation.

The thing that tends to cause a task to need to be created is if something can send a message that isn’t an expected response to a message. Then you likely need a task sitting listening for a message.

The other case would be having devices that need to be periodically polled to see if they need to tell you something. That may call for a task to do the polling, or you might just have a task that periodically tries to claim the channel to interrogate a device to see if it has something, and otherwise lets other devices in to make their requests.

I see your point: so basically just hold a lock while the TX FIFO is being populated, and the rest is taken care of by the ISR (as it pops each item off the TX FIFO).

So maybe something like this:

template <typename... Args>
void UART::WriteToUART(const char* format, Args... args)  // exposed to users for sending data to UART
{
    // Take UART access
    if (xSemaphoreTake(mSemaphore, portMAX_DELAY) != pdTRUE) {
        // ERROR HANDLING
        return;
    }

    char uartData[100] = {0};
    // Size by the local buffer (not the caller's) and leave room for "\r\n"
    std::snprintf(uartData, sizeof(uartData) - 2, format, args...);
    std::strcat(uartData, "\r\n");
    size_t length = std::strlen(uartData);

    // Start UART TX
    StartTX(reinterpret_cast<uint8_t*>(uartData), length);

    // Probably ISR is called here...

    // Release UART access; now other tasks can access the UART
    if (xSemaphoreGive(mSemaphore) != pdTRUE) {
        // ERROR HANDLING
    }
}

void UART::StartTX(uint8_t* buffer, size_t length)
{
    // Populate the TX FIFO
    mFifoTx.WriteElements(buffer, length);

    // Enable the STARTTX task / UART transmitter

    // Read the first byte and write it to TXD to initiate a UART transmission
    uint8_t value = mFifoTx.Read();
    // ... write `value` to the TXD register here
}

However, when I try this out, xSemaphoreTake triggers configASSERT( !( ( xTaskGetSchedulerState() == taskSCHEDULER_SUSPENDED ) && ( xTicksToWait != 0 ) ) );

If you hit that assert, either you are doing this before the scheduler has been started, or inside a vTaskSuspendAll() critical section.

Code that is run in those conditions isn’t allowed to allow itself to block, but must use a timeout parameter of 0.

If your WriteToUART function might be used under those conditions, it needs to special case them and handle it differently.

Also, StartTX needs to check whether the UART is already running, and if so, not send that first byte.
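The special case for the scheduler state can be expressed as a small decision step in front of the blocking call. The sketch below models it without FreeRTOS headers: `SchedState` mirrors the three values `xTaskGetSchedulerState()` reports (`taskSCHEDULER_NOT_STARTED`, `taskSCHEDULER_RUNNING`, `taskSCHEDULER_SUSPENDED`), and the polled fallback path is a hypothetical design choice, not something prescribed in the thread.

```cpp
#include <cassert>
#include <cstdint>

// Mirrors the three states FreeRTOS reports through xTaskGetSchedulerState().
enum class SchedState { NotStarted, Running, Suspended };

// Blocking is only allowed while the scheduler is running; otherwise use a zero timeout,
// which is what the configASSERT in xSemaphoreTake enforces.
constexpr std::uint32_t ticks_to_wait(SchedState s, std::uint32_t requested) {
    return s == SchedState::Running ? requested : 0;
}

// Hypothetical output-path choice for WriteToUART(): the normal buffered/ISR path while
// the scheduler runs, or a polled write straight to the UART when it does not.
enum class TxPath { Buffered, Polled };

constexpr TxPath choose_path(SchedState s) {
    return s == SchedState::Running ? TxPath::Buffered : TxPath::Polled;
}
```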

But then how come the original code doesn’t hit the assert?

template <typename... Args>
void UART::WriteToUART(const char* format, Args... args)  // exposed to users for sending data to UART
{
    char uartData[100] = {0};
    std::snprintf(uartData, sizeof(uartData) - 2, format, args...);  // leave room for "\r\n"
    std::strcat(uartData, "\r\n");
    size_t length = std::strlen(uartData);
    // Start UART TX
    StartTX(reinterpret_cast<uint8_t*>(uartData), length);

    // Take UART access: blocks until the ISR gives the semaphore
    if (xSemaphoreTake(mSemaphore, portMAX_DELAY) != pdTRUE) {
        // ERROR HANDLING
    }
    // ISR will unblock
}

void UART::ISR()  // declared static in the class, so mSemaphore must be reachable from here
{
    // 1. IRQ is triggered
    // 2. if FIFO != empty
    //       2.1 Read a byte from TX FIFO and write to TXD register
    //       2.2 Back to 2
    //    ELSE:
    //       2.3 Stop the transmission and wake the waiting task:
    SetUARTReg(NRF_UART_TASK_STOPTX, 0);  // stop the transmission
    BaseType_t xHigherPriorityTaskWoken = pdFALSE;

    // unblock the task now that the UART transmission is over
    xSemaphoreGiveFromISR(mSemaphore, &xHigherPriorityTaskWoken);
    portYIELD_FROM_ISR(xHigherPriorityTaskWoken);
}

Richard, this is getting a little bit surreal.

I do not need to explain to an experienced and knowledgeable engineer like you that in a master-slave setup, all communication is by definition initiated by the master, so there is no way to receive upstream data from a slave unless the master polls the slave or piggybacks upstream receptions on downstream packets.

So unless either there is no upstream data from the slaves (what kind of devices are we talking about here?) OR there are no timing requirements for upstream data, there MUST be bus arbitration and thus, by your own explanation, one dedicated thread to arbitrate the bus.

I challenge you to sketch out pseudocode for a serial reader bus (which is a reasonable poster use case for many applications) using your schema that gets away without central bus scheduling and arbitration, in particular if (as outlined before) there are several packet exchanges to be sequenced, all under the requirement that the end user (presenting a badge to the door) receives a response in a predictable amount of time. Or alternatively, present a use case for a one-to-many master-slave bus that can be serviced by your approach.