@htibosch I was referring to FatFs, not FreeRTOS+FAT.
Also, you can go further and put a buffered Standard Input/Output (stdio) wrapper around the FreeRTOS-Plus-FAT Standard API. Then, the application would use fprintf instead of ff_fprintf, or fwrite instead of ff_fwrite, for example. This can give a 4X speedup for some applications.
I’ve done some measurements focused on FreeRTOS+FAT.
All tests were made with a 12.5 MHz SPI clock and a line length of 100 bytes.
Reading 100 lines in 512-byte chunks:
- ff_fopen = 1.2 ms
- do-while ff_fread loop with my own ‘\n’ search for line separation = 49.3 ms
- ff_fclose = 0.2 ms
Reading 200 lines in 512-byte chunks:
- ff_fopen = 0.94 ms
- do-while ff_fread loop with my own ‘\n’ search for line separation = 98.2 ms
- ff_fclose = 0.2 ms
Reading 100 lines in 2048-byte chunks:
- ff_fopen = 1.2 ms
- do-while ff_fread loop with my own ‘\n’ search for line separation = 44.3 ms
- ff_fclose = 0.2 ms
Reading 200 lines in 2048-byte chunks:
- ff_fopen = 0.94 ms
- do-while ff_fread loop with my own ‘\n’ search for line separation = 86.2 ms
- ff_fclose = 0.2 ms
Compared to reading the entire file, without line separation:
- ff_fread = 12 ms for 100 lines
- ff_fread = 20 ms for 200 lines
Or using ff_fgets:
- 100 lines = 152.4 ms
- 200 lines = 304.7 ms
So reading as much as possible and processing the data in RAM would be the fastest approach.
Summary
- ff_fgets is >3x slower than own (quick&dirty) implementation. I’m surprised by this, and I think there is some space for improvements in the +FAT library! I migrated from the FatFs fgets to ff_fgets and struggled with the new performance, so I think they do something different.
- Reading chunks larger than the sector size gives some improvement (roughly 10% going from 512-byte to 2048-byte chunks in the tests above).
- ff_fread shows that I get ~1 MB/s, which is 50% of what you, @carlk3, measured, but you had a 31.25 MHz SPI clock. So I think the theory fits.
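For anyone who wants to try the chunked approach measured above, here is a minimal sketch in plain C of splitting lines out of large reads. It is not the code that was benchmarked: `line_splitter_t`, `line_splitter_feed()`, `count_line()` and `LINE_MAX_LEN` are hypothetical names, and lines longer than `LINE_MAX_LEN` are simply truncated. The splitter only sees memory buffers, so it can be tried on a PC; on the target each chunk would come from one ff_fread() call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE_MAX_LEN 128u  /* assumption: lines are about 100 bytes, as in the tests */

/* Called once per complete line; `line` is NOT NUL-terminated. */
typedef void (*line_cb_t)(const char *line, size_t len, void *ctx);

/* Carries an unfinished line across chunk boundaries. */
typedef struct {
    char   partial[LINE_MAX_LEN];
    size_t used;
} line_splitter_t;

static void line_splitter_init(line_splitter_t *ls)
{
    ls->used = 0;
}

/* Feed one chunk of file data. Invokes cb for every '\n'-terminated line,
 * without the '\n'. Bytes after the last '\n' are kept for the next call. */
static void line_splitter_feed(line_splitter_t *ls, const char *chunk, size_t len,
                               line_cb_t cb, void *ctx)
{
    assert(ls != NULL && cb != NULL);
    while (len > 0) {
        const char *nl = memchr(chunk, '\n', len);
        size_t n = (nl != NULL) ? (size_t)(nl - chunk) : len;
        size_t room = LINE_MAX_LEN - ls->used;
        size_t copy = (n < room) ? n : room;   /* truncate over-long lines */

        memcpy(ls->partial + ls->used, chunk, copy);
        ls->used += copy;
        if (nl == NULL) {
            return;                            /* line continues in next chunk */
        }
        cb(ls->partial, ls->used, ctx);
        ls->used = 0;
        chunk = nl + 1;
        len -= n + 1;
    }
}

/* Example consumer: counts lines and sums their lengths. */
typedef struct {
    size_t lines;
    size_t bytes;
} line_stats_t;

static void count_line(const char *line, size_t len, void *ctx)
{
    line_stats_t *st = ctx;
    (void)line;
    st->lines++;
    st->bytes += len;
}
```

On the target, the read loop would be something like `n = ff_fread( buf, 1, sizeof buf, pxFile );` followed by `line_splitter_feed( &ls, buf, n, count_line, &stats );`, repeated until `n` is 0.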

Thank you for these details.
@P51D wrote:
I think there is some space for improvements in the +FAT library!
Always! And we are grateful for the feedback.
Maybe we should put a #warning in the declaration of these byte-oriented read/write functions.
We could add a new read/write object that has its own i/o buffers.
ff_fgets is >3x slower than own (quick&dirty) implementation
As you may have seen, FF_GetLine() is much slower, because it is 100% flexible. You can seek to any position and call FF_GetLine() or FF_GetC() from there.
This is how it is implemented:
- FF_GetLine() calls FF_GetC() for every single byte.
- FF_GetC() calls FF_getMinorBlockEntry().
- FF_GetC() calls FF_SetCluster(): depending on FilePointer, calculate CurrentCluster and traverse the FAT to find the right ulAddrCurrentCluster.
- FF_SetCluster() may call FF_TraverseFAT().
- If it succeeds: FF_ReadPartial() is called to finally read a single byte.
The functions FF_GetC() and ff_fputc() were added for “academic completeness”, but they’re too slow for a real-life application.
It is preferred to use ff_fread() and ff_fwrite() only, preferably with a multiple of 512 bytes.
In my projects, I often have a C++ object that puts a buffer between the application and the +FAT functions ff_fread() and ff_fwrite().
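To illustrate the idea of a buffer between the application and ff_fwrite(), here is a rough sketch in C rather than C++. It is not the object from those projects: `wbuf_t`, `wbuf_write()`, `wbuf_flush()`, `stats_sink()` and the 4 KB buffer size are all assumptions made for the example. Small writes are collected and handed on only in whole multiples of 512 bytes; the sink is a function pointer so the logic can be tested on a PC, and on the target it would wrap ff_fwrite().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE  512u
#define WBUF_SECTORS 8u                         /* assumption: 4 KB buffer */
#define WBUF_SIZE    (SECTOR_SIZE * WBUF_SECTORS)

/* Receives the buffered data; returns the number of bytes accepted. */
typedef size_t (*sink_fn_t)(const void *data, size_t len, void *ctx);

typedef struct {
    uint32_t  words[WBUF_SIZE / 4u];  /* uint32_t storage keeps it 4-byte aligned */
    size_t    used;
    sink_fn_t sink;
    void     *ctx;
} wbuf_t;

static void wbuf_init(wbuf_t *wb, sink_fn_t sink, void *ctx)
{
    assert(wb != NULL && sink != NULL);
    wb->used = 0;
    wb->sink = sink;
    wb->ctx = ctx;
}

/* Append bytes; each time the buffer fills, pass the whole buffer to the sink. */
static int wbuf_write(wbuf_t *wb, const void *data, size_t len)
{
    const uint8_t *src = data;
    uint8_t *dst = (uint8_t *)wb->words;

    while (len > 0) {
        size_t room = WBUF_SIZE - wb->used;
        size_t n = (len < room) ? len : room;

        memcpy(dst + wb->used, src, n);
        wb->used += n;
        src += n;
        len -= n;
        if (wb->used == WBUF_SIZE) {
            if (wb->sink(dst, WBUF_SIZE, wb->ctx) != WBUF_SIZE) {
                return -1;
            }
            wb->used = 0;
        }
    }
    return 0;
}

/* Write out the tail before closing; this one may not be a multiple of 512. */
static int wbuf_flush(wbuf_t *wb)
{
    size_t n = wb->used;

    if (n == 0) {
        return 0;
    }
    wb->used = 0;
    return (wb->sink(wb->words, n, wb->ctx) == n) ? 0 : -1;
}

/* Example sink for host-side testing: records how the data arrived. */
typedef struct {
    size_t calls;
    size_t total;
    size_t unaligned_calls;  /* writes that were not a multiple of 512 bytes */
} sink_stats_t;

static size_t stats_sink(const void *data, size_t len, void *ctx)
{
    sink_stats_t *s = ctx;
    (void)data;
    s->calls++;
    s->total += len;
    if (len % SECTOR_SIZE != 0) {
        s->unaligned_calls++;
    }
    return len;
}
```

With 100 writes of 100 bytes each, the sink sees two 4096-byte writes plus one final partial write at flush time, instead of 100 small ones.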
As I mentioned above, if you’re using a Standard C Library implementation like glibc or newlib you can easily put a buffered Standard Input/Output (stdio) wrapper around the FreeRTOS-Plus-FAT Standard API and get a vast speedup for operations like these.
Hi Carl @carlk3 , David @dc42 and Hein
Thank you very much - particularly for Carl’s performance tuning tips (sorry I can’t include a link to the performance tuning tips - I got an error message here when I included links so had to strip all the links out).
We’re working on writing the stream of data from the h.264 video encoder on the STM32N6 to a buffer then to SD storage for a new open source wildlife camera so this has been a big help to us.
At the moment we’re using FileX in ST’s venc_sdcard_ThreadX example, but FreeRTOS+FAT would also be an option. We’re encountering long and very volatile write times from FileX (c. 100x slower on average than the STM32N6 and a V30 SD card would be expected to achieve in theory). These look characteristic of what we’d expect from incomplete clusters (or even incomplete 512-byte sectors) being written to the SD card, but we’re finding it very difficult to debug how the file system is buffering the data and interacting with the HAL to write data to the SD. We suspect that at some layer something may be buffering inappropriately, and/or trying to write data one sector at a time instead of one cluster at a time, and/or not correctly handling the DAT0 low signal from the SD card (indicating when it can’t accept data), but we’re having difficulty diagnosing the exact causes of the problem.
We suspect that in the example code some of the performance could be improved by pre-allocating space for the file, enabling HardwareFlowControl and trying to use Transceiver communication with the SD card, but we’re not sure.
This is our conservation work that the open source wildlife camera is to support (Quick Summary: New Homes for Old Friends Switzerland – New Homes For Old Friends - an “An error occurred: Sorry, new users can’t put links in posts.” error message forced me to remove the URL but a Google search should find it.)
These are the potential ideas we’re looking at at the moment to improve performance - I tried to include a link to them but was forced to strip that: SD_Card.md (Googling doesn’t find that but hopefully it should be possible to go through the directory structure on my github account)
We’re working on a fork of the ST venc_sdcard_ThreadX example.
I put together a spreadsheet with some of the very volatile write times we’re seeing, but the link to that had to be stripped as well to post this. We got write speeds ranging from 1 megabyte per second down to 20 kilobytes per second.
Any help or suggestions that anyone could give would be wonderful!
My background is as a climbing arborist and in server side software so I don’t have any experience of debugging the interaction between filesystems and the underlying HAL.
Will
@Will_Robertson You should be able to post links now.
Interesting, the article about trees and animals. In an earlier project, FreeRTOS+FAT was used on the ocean, for filming fish under the ship.
When using FreeRTOS+FAT for mass storage, the following will be nothing new for you:
- Make sure that every write contains a multiple of 512 bytes, and preferably 10 KB or more. You can do a simple experiment: fill a file in blocks of 1 KB, 10 KB or e.g. 100 KB.
- Realise that if you always write data in blocks of “N x 512” bytes, the user pointer will be passed to the DMA. In other words, the driver is “zero-copy” whenever possible.
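The block-size experiment could be sketched roughly as below. To keep it runnable on a PC, this version uses the standard fwrite() as a stand-in; on the target you would swap in ff_fopen(), ff_fwrite() and ff_fclose(). The helper name `write_in_blocks()` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write `total` bytes to `f`, `block_size` bytes per call, reusing `block`
 * as the data source. Returns the number of write calls, or -1 on a short
 * write. On the target, fwrite() would be replaced by ff_fwrite(). */
static long write_in_blocks(FILE *f, const void *block, size_t block_size,
                            size_t total)
{
    long calls = 0;

    assert(f != NULL && block != NULL && block_size > 0);
    while (total > 0) {
        size_t n = (total < block_size) ? total : block_size;

        if (fwrite(block, 1, n, f) != n) {
            return -1;
        }
        total -= n;
        calls++;
    }
    return calls;
}
```

Call it once per block size (1 KB, 10 KB, 100 KB) with the same total, and take a timestamp before and after each call, for instance with xTaskGetTickCount() on FreeRTOS. Keeping the block buffer 4-byte aligned and a multiple of 512 bytes matches the rules in the list above.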
Yes please, share your spreadsheet with measurements. I think that you can attach ZIP files now.
file system is buffering the data and interacting with the HAL to write data to the SD
When following the above rules, there won’t be much buffering in the SD-card driver.
There is an old article about formatting SD-cards. We found that in some cases, the SD-card is considerably faster if it is formatted by FreeRTOS+FAT, while using large clusters.
32 Bit (4 Byte) Buffer Alignment is probably crucial if you’re using SD mode and DMA. Depending on your hardware and driver, if a read or write buffer is not 4-byte aligned, it will have to be copied to one that is. That means writing or reading one block at a time, which is much slower than streaming multiple blocks. It’s easy to ensure your buffers are 4-byte aligned if you use buffered Standard Input/Output (stdio). (See stdio_buffering example).
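As a small illustration of the alignment point, here is one way to hand stdio a buffer that is guaranteed to be 4-byte aligned, using only standard C. This is a sketch and not the stdio_buffering example itself; `io_buf_words`, `use_aligned_buffer()`, `is_4byte_aligned()` and the 4 KB size are assumptions. Whether the stdio layer then passes this buffer straight down to the SD driver depends on your C library and port.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define IO_BUF_SIZE 4096u  /* assumption: 8 sectors of 512 bytes */

/* uint32_t storage guarantees at least 4-byte alignment.
 * Static, so it outlives the stream; use it for one stream at a time. */
static uint32_t io_buf_words[IO_BUF_SIZE / sizeof(uint32_t)];

/* Attach the aligned buffer in fully buffered mode. Call this right after
 * opening the stream, before any read or write on it. Returns 0 on success. */
static int use_aligned_buffer(FILE *f)
{
    assert(f != NULL);
    return setvbuf(f, (char *)io_buf_words, _IOFBF, IO_BUF_SIZE);
}

/* Handy check for any buffer you pass to a DMA-based driver. */
static int is_4byte_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}
```

After `use_aligned_buffer(f)`, calls such as fprintf() and fwrite() collect data in the aligned buffer and write it out in IO_BUF_SIZE pieces, or earlier on fflush() and fclose().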