Replacement for FreeRTOS-IO?

What I’m looking for are support routines for I/O similar to those in the FreeRTOS-IO package. After some research, I find that the code is no longer available, although some examples were updated in 2024. All links lead to dead ends or to code that expects FreeRTOS_IO.h to be included.

I seem to have found references to an IoT library that supposedly has the same, or similar, functionality.

However, I have no need or desire to use the cloud, nor do I particularly want an Amazon account. All I was looking for was a more graceful method of doing interrupt-driven I/O with queues, without having to program my own version (which is now going to be in progress).

Any suggestions?

What I am likely to do is to build a C++ class that uses optional queues to write to I2C, SPI, and USART ports on an STM32 processor.

As mentioned in this post, the CommonIO library is now hosted on GitHub at aws/common-io-basic (a basic set of I/O libraries for pins and serial communications on embedded devices).

I missed the part where it integrates queues into (at least) the I2C and SPI support. That was a desired feature.

I also replied in more depth on my other post in the libraries section. Perhaps that clarifies a bit.
Thanks

The other discussion is here:

I am very interested in this topic as it comes up quite a bit, yet we have not figured out a good way to handle it. When I speak to people, they all want an I/O subsystem, but every time they explain what they need it to be like, there are major differences based on their use case. If we end up making a generic subsystem, it is bound to be large and slow, which nobody wants.

It seems to me that if you have the drivers, which generally come from the vendors (each with its own unique interface, hardware-supported optimizations, and special features), and you just use a FreeRTOS queue to serialize access to the hardware driver, like @aggarg described in that post, then it is pretty straightforward to make it work?

There are several issues. Bear in mind that for I2C you need master/slave determination and receive/transmit capability. For SPI you need much the same, but implemented differently because of the hardware. For USART you need only receive/transmit capability unless you are going to use multiprocessor mode.

Remember that these drivers need, at a minimum, three basic routines:

  1. Create (these are C++ drivers, so with new): configure queues, modes, I/O pins if needed, etc.
  2. Send: generalized send from a buffer (ignoring queues for now).
  3. Receive: generalized receive into a buffer.
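
As a sketch of that three-routine shape, a minimal C++ interface might look like the following. The names (HalPort, LoopbackPort) are illustrative only, not the actual driver code discussed here:

```cpp
#include <cstdint>
#include <cstddef>
#include <cstring>

// Illustrative sketch of the three-routine driver shape described above.
// HalPort, Send, and Receive are hypothetical names, not a real API.
class HalPort {
public:
	virtual ~HalPort() = default;
	// 1. construction (with new) is where queues, modes, and pins get configured
	// 2. generalized send from a caller-supplied buffer
	virtual bool Send(const uint8_t *buf, size_t len) = 0;
	// 3. generalized receive into a caller-supplied buffer
	virtual bool Receive(uint8_t *buf, size_t len) = 0;
};

// A loopback stand-in for a real I2C/SPI/USART implementation.
class LoopbackPort : public HalPort {
	uint8_t stored_[64] = {};
	size_t count_ = 0;
public:
	bool Send(const uint8_t *buf, size_t len) override {
		if (len > sizeof(stored_)) return false;
		std::memcpy(stored_, buf, len);
		count_ = len;
		return true;
	}
	bool Receive(uint8_t *buf, size_t len) override {
		if (len > count_) return false;
		std::memcpy(buf, stored_, len);
		return true;
	}
};
```

A real I2C or SPI implementation would hold the HAL handle and any queues as members, configured by the constructor.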

STMicro (and likely others) provide ready-made routines for their particular processors. The routines and structures I’m working on serve as wrappers to make the entire process thread-safe and to work with DMA, programmed I/O, or interrupts.

You’d possibly need to write the routines with weak links to a user-supplied interface.
That interface would be specific to a processor (and would need to be supplied by the user). I’m not sure that AWS_IO would do the job, but it would require the user to supply some code; if nothing else, links to the manufacturer-supplied routines.

As far as queues are concerned, I write directly to the queue. A created task runs and drains the queue, writing one byte at a time. Some drivers don’t do well with this, which would require buffering the queue contents and then transmitting them as a block. I may not be able to make this work for I2C. For serial, I think it will work. For SPI? Not at all sure.

What I might need from FreeRTOS is a type of queue that can be loaded up and then paused; an external routine could then read the queue memory as a buffer, send it, clear out that part of the queue, and restart the queue. Not sure how workable that is.
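
One workable approximation with a standard FreeRTOS queue is to drain whatever has accumulated into a local buffer without blocking, then hand that buffer to the driver as a single block. A sketch, assuming a byte-wide queue (the function name is illustrative):

```cpp
#include <cstdint>
#include <cstddef>
#include "FreeRTOS.h"
#include "queue.h"

// Drain up to max_len queued bytes into buf without blocking.
// Returns the number of bytes collected; the caller then transmits
// buf as one block instead of one byte at a time.
size_t drain_queue(QueueHandle_t q, uint8_t *buf, size_t max_len)
{
	size_t n = 0;
	// zero timeout: stop as soon as the queue is momentarily empty
	while (n < max_len && xQueueReceive(q, &buf[n], 0) == pdPASS)
		++n;
	return n;
}
```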

So you are trying to come up with a generic interface which works with all types of peripherals (SPI, I2C, USART, etc.) and with all hardware?

Have you looked at the FreeRTOS stream buffer? See FreeRTOS stream & message buffers - FreeRTOS™.


I already have interfaces for SPI, I2C, and USART. For the moment, I’m not dealing with others (CAN, I2S, etc).
I wanted a method to use queues with these peripherals. Reading from a serial device that drives a queue is desirable. That’ll come with the USART implementation (when debugged) and be added to I2C and SPI.
The problem I’ve got is that the I2C driver (or the OLED display) behaves badly when transferring a block of data from a queue to the display a byte at a time. Queues are on hold for a bit.
I am not trying to make a universal routine so much as I am cleaning up existing drivers and adding to each one (mostly transmit; receive only with programmed reads) the option of blocking, IRQ, and DMA.
Because many processor support routines perform similar functions (write/read/callbacks), my existing driver structure can be adapted to other processors by substituting, at the right point, the low-level manufacturer-supplied drivers.
The overall driver structure is protected by semaphores at the OS level and is written in C++. The rest of my code is C++, which makes a lot of sense with a graphics system.
As it turns out, the calls to each driver routine are remarkably similar, but the setups are not. So it gets close to generic, but not exact. Within a driver, the blocking, IRQ, and DMA calls are identical.

I’ll have to look at a stream buffer, which may not help when transmitting, but can help with receiving. It may turn out more useful to have a standard queue.

Thanks

Queues work; the basic mechanism is to make a task which (say, for receive) receives a byte of data in blocking mode with an infinite timeout, then sends it to the queue with portMAX_DELAY. This seems to chew up a lot of time; is there a better method?

Blocking mode works (but you have to be careful with semaphores). Queues work, but there’s a question about the actual drivers. DMA and IRQ work.

However, there are situations (depending on the driver needed and use of the driver) where some of these modes do not work, or need to be modified.

I’d welcome comments and questions.

A task receives a byte and then sends this received byte to another task? You can consider buffering and sending only after x bytes are accumulated. It depends on your use case.

Unless you elaborate on these situations, it is hard to comment. Are you looking for some help here?

[quote=“aggarg, post:9, topic:21405”]
A task receives a byte and then sends this received byte to another task? You can consider buffering and sending only after x bytes are accumulated. It depends on your use case.
[/quote]

There are three use cases, one for each interface:

  1. I2C: messages may be of any size. Communication is with both smart processors and dumb chips. Blocking mode is most useful for chips, and while DMA and IRQ can be useful (treating outgoing and incoming differently), the ultimate limit is the bus speed. Queues may be used if the overall message size is known (using 32-byte packets over I2C to mimic the NRF24L01 packet mesh network). That is a different scenario, and buffering can be used profitably. This driver is complicated by the need to address both I2C master and I2C slave scenarios. The I2C slave is a processor, so packet communication methods are used.
  2. SPI: the SPI driver is somewhat more complex, allowing the clock rate to be shifted on a per-instance basis, since SPI interfaces run only so fast. In addition, the hardware wants to control a CS line on each block of data. Certain chips (the ILI9341 display driver for 320×240 TFT displays) cannot tolerate this, so the driver must treat CS differently. A write to an ILI9341 chip must do the following sequence: CS low / A0 low / send command / A0 high / send data / CS high. (A0 is not important at this point.) The driver must also handle a single sequence of CS low / send command / send data / CS high, and it does. In this use case, queues are not useful.
  3. USART: USART serial data is considered only on a byte basis for this use scenario, although all other methods (blocking, DMA, IRQ) can be used. Serial data here is mostly sent to a console, although packet

While all three drivers (I2C, SPI, USART) provide access to all four methods (blocking, queue, IRQ, and DMA), not all use scenarios are practical. Note that the receive and transmit methods are allowed to differ: the default SPI mode can be set to blocking (for display commands) while the transmit mode is overridden to allow DMA transfer of block data to the display.

[quote=“aggarg, post:9, topic:21405”]
Unless you elaborate on these situations, it is hard to comment. Are you looking for some help here?
[/quote]

I have one particularly questionable scenario, working with both receive and transmission for USART data. The code is below:

// uses RECEIVE and constantly calls it
// when character is received, the result is placed on a queue
// may need delay in here to allow task switching

void USART_RECEIVE_TASK(void const * argument)
{
	HAL_USART *me = (HAL_USART *)argument;
	uint8_t buf = 0;

	while (1)
	{
		// block until one byte arrives, then push it onto the queue
		HAL_UART_Receive(me->huart, &buf, 1, HAL_MAX_DELAY);
		xQueueSend(me->receive_queue, &buf, portMAX_DELAY);
	}
}

While a delay does not seem to be needed between the receive and the write to the queue, the runtime statistics show a lot of time spent in (say) this receive task. I don’t think that’s good for overall system performance, although the information might be misleading.

What I think I need help with: there used to be a zero-footprint method of going from an interrupt directly to a queue. It might not even work here, and the code may or may not have worked well. I can find no way of implementing it, since the source code was apparently removed sometime between 2016 and 2019.

So the question becomes: is there a better way to handle this USART receive task?

If your queue has space, xQueueSend will return immediately and the task will call HAL_UART_Receive again. It seems like HAL_UART_Receive is a blocking call - how does it implement blocking? If it busy-waits, that might be the reason for the high CPU usage by this task.

One thing to remember is that “HAL” drivers often do not use OS primitives, so HAL_UART_Receive likely uses a polling loop, using up 100% of the CPU time (unless the OS switches to another higher- or equal-priority task), and the transmit likely does the same. To be efficient, you need to change to (if available) receive and transmit routines that are interrupt based. Sometimes a HAL will provide a receive-with-interrupt that has a callback function which could do an xQueueSendFromISR operation.
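
Assuming the ST HAL and FreeRTOS, that interrupt-driven pattern would look roughly like the sketch below; the queue and variable names are illustrative, and the device header depends on your STM32 family:

```cpp
#include "FreeRTOS.h"
#include "queue.h"
#include "stm32f4xx_hal.h"   // substitute the header for your STM32 family

static QueueHandle_t uart_rx_queue;   // created at init with xQueueCreate()
static uint8_t rx_byte;

void start_uart_rx(UART_HandleTypeDef *huart)
{
	// Arm a one-byte interrupt-driven receive; this returns immediately,
	// so no task ever busy-waits on the UART.
	HAL_UART_Receive_IT(huart, &rx_byte, 1);
}

// The HAL calls this from the UART ISR once the byte has arrived.
extern "C" void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
	BaseType_t woken = pdFALSE;
	xQueueSendFromISR(uart_rx_queue, &rx_byte, &woken);
	HAL_UART_Receive_IT(huart, &rx_byte, 1);   // re-arm for the next byte
	portYIELD_FROM_ISR(woken);
}
```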

Personally, I rarely use the vendor-supplied I/O library directly in my projects. Instead, I make a real abstraction layer that is processor independent, which I then implement for a given processor, maybe using their library if it works, or maybe bypassing it and writing my own.

Rarely does that library involve creating a task.

For instance, for a UART, there will be queues or stream buffers used by the transmit and receive wrapper functions, which interface with an ISR that does most of the actual I/O. The wrappers also tend to have a mutex so that a given task can “claim” the device to do a set of I/O, after which it releases it.

For SPI and I2C, the driver accepts a “packet” to transmit and gets a response, again doing most of the work in an ISR; normally there is no queue at all. (If we are a slave device, the driver may need a queue to put the data in.)

It’s a driver provided by ST Micro. I looked at the code, and it is indeed just a busy loop.
I can’t put a vTaskDelay in there (although I could copy the code and rewrite it), and that would in any case add a delay of up to 1 msec, if not 1 to 2. For a USART at, say, 115.2 kbaud, that’s about 10,000 chars/second, and a 1 ms delay per byte slows it down to 1,000 chars/second. Since the FIFO in the chip is only so deep, that won’t work.

DMA drivers or IRQ drivers are likely needed, if I can get them to work properly.

As I said, the base drivers provided are likely not really suitable for RTOS use, as they use busy loops. If ISR-based versions with callbacks are provided, you can often adapt them to work.

What you really need is either a queue or stream buffer that you add the data to, which is emptied by the ISR (which normally means writing your own driver), or a semaphore that you wait on, which is signaled in the ISR callback when the ISR-based transmission is done.
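
The semaphore variant, assuming the ST HAL and FreeRTOS, might be sketched like this (tx_done and uart_write are illustrative names, not a real API):

```cpp
#include "FreeRTOS.h"
#include "semphr.h"
#include "stm32f4xx_hal.h"   // substitute the header for your STM32 family

static SemaphoreHandle_t tx_done;   // xSemaphoreCreateBinary() at init

void uart_write(UART_HandleTypeDef *huart, const uint8_t *buf, uint16_t len)
{
	// start the interrupt-driven transmit; this returns immediately
	HAL_UART_Transmit_IT(huart, (uint8_t *)buf, len);
	// sleep (no busy loop) until the ISR callback signals completion
	xSemaphoreTake(tx_done, portMAX_DELAY);
}

// The HAL calls this from the UART ISR when the last byte has gone out.
extern "C" void HAL_UART_TxCpltCallback(UART_HandleTypeDef *huart)
{
	BaseType_t woken = pdFALSE;
	xSemaphoreGiveFromISR(tx_done, &woken);
	portYIELD_FROM_ISR(woken);
}
```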

I have most of that thought out and coded. It works for I2C. For SPI, due to the complexity of the actual driver, some methods (queues) don’t work without modification: I’d need to expand the queue width to 16 bits and use the upper bits for state (say, A0) control.

For I2C I used the upper 8 bits of a 16-bit queue entry for the address. This scheme doesn’t seem to work for small OLED displays, and I haven’t gotten around to sorting that out.

For USART, there’s something odd going on with how the interrupts work, at least on receive, so I might work on that a bit. Interrupts seem to be ignored or, more likely, never happen. It’s an oddity.
I tried setting up the chip (their routines) with a callback that either sets a semaphore, which then allows use of the data, or puts the data on a queue. The interrupts just aren’t happening.

The issue with I2C is that different devices need different-length messages; it isn’t just an I2C address/data pair. Often you send I2C_Address, Register_Address, Register_Data (and the data, or even the register address, can be multi-byte). An I2C read is often even more complicated:
I2C_Address(W), Register_Address, Bus_Restart, I2C_Address(R), read data.
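
For what it’s worth, that write/restart/read sequence is what the ST HAL’s “memory” transfer implements in a single call; a sketch, assuming the ST HAL (read_register is an illustrative wrapper name):

```cpp
#include "stm32f4xx_hal.h"   // substitute the header for your STM32 family

// Reads len bytes from register reg of the device at dev_addr.
// HAL_I2C_Mem_Read sends dev_addr(W) + reg, issues a repeated start,
// then reads with dev_addr(R): the sequence described above.
HAL_StatusTypeDef read_register(I2C_HandleTypeDef *hi2c, uint16_t dev_addr,
                                uint8_t reg, uint8_t *data, uint16_t len)
{
	return HAL_I2C_Mem_Read(hi2c, dev_addr, reg,
	                        I2C_MEMADD_SIZE_8BIT, data, len, HAL_MAX_DELAY);
}
```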

Blocking mode, IRQ mode, and DMA mode use buffers provided by the calling software. The read and write, restart-read, and so on are handled by blocking mode (and by some legacy drivers; it needs extra work for now).

Queues tend to be a problem with variable length messages as I said. For fixed packet lengths (processor to processor), that can work, and fits within the parameters of the main software.

An I2C queue uses a 16-bit queue entry, with the upper 8 bits being the address and the lower 8 bits being the data. Queues become less useful with I2C since this requires a mode packet of some sort to control bus restarts and the like. This suggests that the best use of queues is processor-to-processor only, with DMA, IRQ, and blocking modes normally used for chips.
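
That 16-bit entry layout can be sketched as a pair of helpers (names are illustrative):

```cpp
#include <cstdint>

// One queue entry: upper 8 bits carry the I2C address, lower 8 the data byte.
static inline uint16_t i2c_pack(uint8_t addr, uint8_t data)
{
	return (uint16_t)(((uint16_t)addr << 8) | data);
}

static inline uint8_t i2c_addr(uint16_t entry) { return (uint8_t)(entry >> 8); }
static inline uint8_t i2c_data(uint16_t entry) { return (uint8_t)(entry & 0xFF); }
```

Each element sent with xQueueSend is then a single uint16_t, and the draining task unpacks the address and data before touching the bus.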

Oh, and the display driver problem is related to block mode. Sending a chunk of screen data works well with block-mode transfers (address, register, feed all the data) and does not work well with the same sequence done as individual byte transactions.

Likely because it takes a lot longer to break the one large transaction into many small transactions. That could easily be a factor of 10x or more in speed.

Queues are good for data that is just an amorphous byte string, or for short individual messages of fixed size.

Yep, agreed on the queue uses. For what I do, I have mesh networking enabled on an NRF24L01. The packet sizes are all 32 bytes, although the chip could do variable-length messages. Bigger data transfers use a slightly differently encoded packet. I duplicate the packet structure for anything between subsystems except for display driving.

Those packets are much larger.
Now, for the OLED display, all control bytes are sent to the I2C display in blocking mode. The screen data is sent as block data from a buffer (note: no queues). The fun part: when I change from block mode to individual-byte mode, the display does not clear.
Go figure.

The fun part is that at a 100 kHz clock speed, the entire process is bus limited regardless of blocking, IRQ, or DMA.