FreeRTOS+FAT slow write on 64GB eMMC and concurrency

Hello,

My project is using the following setup:
Xilinx Zynq XC7Z030-xFFG676x SoC - 1Gb DDR3
FreeRTOS port from Xilinx
FreeRTOS+TCP
FreeRTOS+FAT
64GB eMMC

The project, in its essence, is a data logger. It is running an FTP server that is used to send the log and debug files to a control device. Based on a couple of other threads that I read, I decided to implement the log storage in the following way:

  • On boot, the eMMC is initialised and a log file of a fixed size (1MB) is created and filled with 0s and left open.
  • Every time an event is to be logged, it is written to a 1024-byte buffer. Once this buffer is full, it is written to the open file and flushed. Once the file is full, it is closed, a new file is created and filled with 0s, and the process repeats.
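The buffering and rotation scheme in the two bullets above can be sketched roughly as follows. This is only an illustration under assumptions: `logger_t`, `logger_append()` and the two hook types are hypothetical names, and the hooks stand in for the FreeRTOS+FAT write-and-flush and close/create-next-file steps, which are not shown in the post. The demo stubs only count calls so the sketch can run on a host PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LOG_BUF_SIZE   1024u             /* staging buffer size from the post */
#define LOG_FILE_SIZE  (1024u * 1024u)   /* fixed size of each log file */

/* Hypothetical hooks: on the target these would wrap the FreeRTOS+FAT
 * write-and-flush and the close/create-next-file steps. */
typedef int  (*log_write_fn)(const uint8_t *data, size_t len, void *ctx);
typedef void (*log_rotate_fn)(void *ctx);

typedef struct {
    uint8_t       buf[LOG_BUF_SIZE];
    size_t        used;        /* bytes staged in buf */
    size_t        file_used;   /* bytes already written to the current file */
    log_write_fn  write;
    log_rotate_fn rotate;
    void         *ctx;
} logger_t;

static void logger_init(logger_t *lg, log_write_fn w, log_rotate_fn r, void *ctx)
{
    assert(lg != NULL && w != NULL && r != NULL);
    memset(lg, 0, sizeof *lg);
    lg->write  = w;
    lg->rotate = r;
    lg->ctx    = ctx;
}

/* Stage an event; write out each full buffer and rotate when the file is full.
 * Returns 0 on success, -1 if the write hook reports an error. */
static int logger_append(logger_t *lg, const uint8_t *ev, size_t len)
{
    while (len > 0) {
        size_t room = LOG_BUF_SIZE - lg->used;
        size_t n = (len < room) ? len : room;

        memcpy(lg->buf + lg->used, ev, n);
        lg->used += n;
        ev  += n;
        len -= n;

        if (lg->used == LOG_BUF_SIZE) {
            if (lg->write(lg->buf, LOG_BUF_SIZE, lg->ctx) != 0) {
                return -1;
            }
            lg->used = 0;
            lg->file_used += LOG_BUF_SIZE;
            if (lg->file_used >= LOG_FILE_SIZE) {
                lg->rotate(lg->ctx);
                lg->file_used = 0;
            }
        }
    }
    return 0;
}

/* Demo stubs that only count calls. */
typedef struct { unsigned writes; unsigned rotations; } demo_counts_t;

static int demo_write(const uint8_t *data, size_t len, void *ctx)
{
    (void)data; (void)len;
    ((demo_counts_t *)ctx)->writes++;
    return 0;
}

static void demo_rotate(void *ctx)
{
    ((demo_counts_t *)ctx)->rotations++;
}
```

Because every write is exactly 1024 bytes, each one is a multiple of the 512-byte sector size and the 1MB file fills after 1024 writes.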

My problem is that it takes about 10s for a new 1MB file to get created and filled with 0s. Is there any way this can be sped up? The eMMC has 8 data channels but we are only using 4. I believe that even with the 4 channels, the speed should be much higher than what I am seeing. Not quite sure why it is so slow.

Regarding concurrency, will there be any concurrency problems if I have 2 threads that are writing to two separate files and have an FTP server running at the same time? Like I said before, a 1MB write takes about 10s. I tried doing an FTP RETR command while this write was going on and I noticed that the FTP server didn’t respond until the write was complete, which gives me faith that a write cannot get interrupted.

At the end of the day, all access to the eMMC is going through the FreeRTOS+FAT library. Until now, I have assumed that +FAT takes care of all concurrency but would like to get some confirmation about this.

Thank you,
Sid

Would it be possible to attach your FreeRTOSFATConfig.h? I am curious about the settings.

My problem is that it takes about 10s for a new
1MB file to get created and filled with 0s.

That is indeed extremely slow.

Did you use this driver? I recently adapted it so it can handle drives larger than 64 GB.

It uses a copy of the standard Xilinx disk driver.

The eMMC has 8 data channels but we are only using 4.

When a card uses a single data line (SPI), it is really slow. But both 4 and 8 bits should be a lot faster than what you are seeing.
Are you sure it is only using 4 data lines? Can that be configured?

Regarding concurrency, will there be any concurrency problems
if I have 2 threads that are writing to two separate files and
have an FTP server running at the same time? Like I said before

That should be all safe, +FAT is protected with FreeRTOS mutexes.

I tested the driver with an SD-card on a comparable Xilinx Zynq, and it showed very good performance.

In the past, a FreeRTOS FTP server was published on freertos.org. At this moment it is absent. I have placed a copy here.
Please only use the files in the protocols directory, the rest is outdated.

The latest FreeRTOS+FAT sources are here.
The latest FreeRTOS+TCP sources are here.

One more question: how do you create a 1MB file? The fastest way (as you most probably know) is to write large blocks which are a multiple of 512 bytes. Have you tried to write it in a single ff_fwrite() statement?

Hello Hein,

Happy New Year!

Would it be possible to attach your FreeRTOSFATConfig.h? I am curious about the settings.

Here you go: FreeRTOSFATConfig.h (13.8 KB).

Did you use this driver? I recently adapted it so it can handle drives larger than 64 GB.

I am indeed using the 2019.3 version of the driver, although it is not synced with the repo. I had made a fix that I suggested to you, and I noticed that the official repo now has that fix, so I will update my source to that.

Are you sure it is only using 4 data lines? Can that be configured?

Yes, I am sure about this - I checked the schematic of our custom board today and saw that 4 of the 8 lines are connected. Same in the FPGA config for the Zynq - running at 50MHz.

That should be all safe, +FAT is protected with FreeRTOS mutexes.

Okay that is great. I remember updating to your latest copy of the FreeRTOS FTP protocol a few months ago and ran into issues where the directory listing was not getting produced correctly - I had sent an email to you about this, not sure exactly what the details were, so I kept using an old copy of those files. I will try updating to your latest source again and see if it works.

One more question: how do you create a 1MB file? The fastest way (as you most probably know) is to write large blocks which are a multiple of 512 bytes. Have you tried to write it in a single ff_fwrite() statement?

This is how I am doing it:

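/* Allocate a zero-filled buffer the size of the whole file. */
zero_array = pvPortMalloc(_file_size_in_bytes);
if (zero_array == NULL)
{
    DEBUG_PRINT(DEBUG_ERROR, "emmc_ffs_create_file: Failed to allocate the zero buffer.\r\n");
    return pdFALSE;
}
memset(zero_array, 0, _file_size_in_bytes);

p_file_object = ff_fopen(_file_name, "w");
if (p_file_object == NULL)
{
    DEBUG_PRINT(DEBUG_ERROR, "emmc_ffs_create_file: Failed to open file %s for writing.\r\n", _file_name);
    vPortFree(zero_array); /* Do not leak the buffer on the error path. */
    return pdFALSE;
}

/* Write the zeroes to the new file in a single call. */
num_bytes_written = ff_fwrite(zero_array, sizeof(uint8_t), _file_size_in_bytes, p_file_object);
vPortFree(zero_array); /* Free the zero array. */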

where _file_size_in_bytes is 1024 * 1024.

Hope that answers your questions and helps identify this issue.

Oh one more thing, it takes about 40mins to format the 64GB eMMC card. Is that normal?

Thank you,
Sid

I forgot to mention - I was seeing this same slow speed even with an 8GB eMMC card.

I haven’t used the Xilinx, but the Cypress I’m using now makes it easy to add a free running 32-bit hardware counter clocked at 10 kHz. I first did that in order to implement Run Time Stats, but it is handy for finding performance bottlenecks, too. I go to a function that I suspect is taking a long time, and insert

uint32_t xStart = portGET_RUN_TIME_COUNTER_VALUE();

at the beginning, and

FF_PRINTF("%s:%d: Elapsed time: %lu ms\n", __FUNCTION__, __LINE__, (unsigned long) (portGET_RUN_TIME_COUNTER_VALUE() - xStart) / 10);

at the end. If it turns out I’m right, and this function is taking a long time, then I look at the functions it calls and instrument any suspects in the same way.
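One detail worth spelling out about this technique: because the counter is an unsigned 32-bit value, the subtraction still gives the right answer if the counter wraps once between the two readings. A minimal sketch, assuming the 10 kHz counter described above (`elapsed_ms` and `COUNTER_HZ` are hypothetical names, not part of FreeRTOS):

```c
#include <assert.h>
#include <stdint.h>

#define COUNTER_HZ 10000u   /* 10 kHz free-running counter, as described above */

/* Elapsed milliseconds between two raw 32-bit counter readings.
 * Unsigned subtraction is modulo 2^32, so a single wrap is handled. */
static uint32_t elapsed_ms(uint32_t start_ticks, uint32_t now_ticks)
{
    uint32_t ticks = now_ticks - start_ticks;
    return ticks / (COUNTER_HZ / 1000u);
}

/* Usage example: the counter wraps between the two readings. */
static void elapsed_ms_demo(void)
{
    /* 10 ticks before the wrap plus 10 ticks after it: 20 ticks = 2 ms. */
    assert(elapsed_ms(0xFFFFFFF6u, 10u) == 2u);
}
```

At 10 kHz a 32-bit counter wraps roughly every five days, so a single wrap is the only case a short measurement needs to tolerate.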

Hi Carl,

I have a free running timer that has been implemented in the FPGA to be used for this exact purpose, so I will add the 2 lines that you have recommended and see what exactly is taking so long. My guess is that it is indeed the ff_fwrite function because when I reduce my file size from 1MB to 1KB, it becomes much faster.

I will try this and report back.

It takes about 12 seconds to format an 8 GB SD-card, and a 64GB disk should take 8 times longer, 96 seconds. Clearing the FAT area is most of the work.

Writing 1 MB should take less than a second.
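The arithmetic behind these two estimates can be written out as a small sketch. The helper names are hypothetical, and the linear scaling of format time with capacity is the assumption made in the estimate above, not a measured property.

```c
#include <assert.h>
#include <stdint.h>

/* Linear estimate: format time grows in proportion to capacity. */
static uint32_t scaled_format_seconds(uint32_t ref_seconds,
                                      uint32_t ref_gb,
                                      uint32_t target_gb)
{
    assert(ref_gb != 0u);
    return (uint32_t)(((uint64_t)ref_seconds * target_gb) / ref_gb);
}

/* Average throughput in KiB/s for `bytes` transferred in `millis` ms. */
static uint32_t throughput_kib_per_s(uint64_t bytes, uint32_t millis)
{
    assert(millis != 0u);
    return (uint32_t)((bytes * 1000u) / ((uint64_t)millis * 1024u));
}

/* Usage example with the numbers from this thread. */
static void estimate_demo(void)
{
    /* 12 s for 8 GB scales to 96 s for 64 GB. */
    assert(scaled_format_seconds(12u, 8u, 64u) == 96u);
    /* 1 MB in 10 s is about 102 KiB/s; 1 MB in 1 s is 1024 KiB/s. */
    assert(throughput_kib_per_s(1024u * 1024u, 10000u) == 102u);
    assert(throughput_kib_per_s(1024u * 1024u, 1000u) == 1024u);
}
```

Against the 96 s estimate, the reported 40 minutes (2400 s) is 25 times slower, and the observed write rate of about 102 KiB/s is at least 10 times below the "less than a second" target.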

When formatting the disk, I recommend using these parameters:

FF_Format(
    pxDisk,
    xPartition,
    pdFALSE,  /* xPreferFAT16 */
    pdFALSE   /* xSmallClusters */
);

As for the speed: does reading from the disk also take so long?
And have you checked the clocks? The clock frequency is determined during initialisation. Does it ask for 25 MHz?
And does the function XSdPs_Change_ClkFreq() in xsdps_options.c succeed?

In this case, something strange is happening on my end. 40 mins is way longer than 96 seconds. Here is my concern: The code has very separate pathways for what happens for SD cards vs MMC/eMMC…

No, read is very fast.

I added a whole bunch of breakpoints in the XSdPs init process today and saw that XSdPs_Change_ClkFreq() gets called 3 times, once with 400kHz, then with 26MHz, and then one last time with 52MHz. And it succeeds each time. And the bus width also gets defined as 4bits as expected. I really thought there was something going on with frequencies but I’m back to square one now…

One thing I found was that my card type was getting detected as MMC and not eMMC. Not sure if this would affect the speed in any way but I will do some more digging here.

I think what I will do next is add xilffs (Xilinx’s FAT driver) to my project and try to format and write a 1MB file from there and see if it is faster. I will report back with results from that.

Any more ideas?

Update: 1MB write is fast now with +FAT! And formatting is fast too!

Here is what changed: When I started the debugging process today, my first set of breakpoints helped me find out that the bus speed for my device is being set to 52MHz by the XSdPs driver based on the values it was reading from the device. However, I opened up the FPGA platform that was created for our board and noticed that the frequency for the SDIO peripheral was set to 50MHz. I changed this to 52MHz, synthesized the FPGA code, and created a new bitstream to load into the Zynq. Because I am using Vitis to do development, I had to manually tell Vitis that the FPGA image had been updated and I had to regenerate the BSP. Once I did this, I ran the xilffs example and it worked! So then I tried to re-run my main project code too with this new FPGA image and voila, everything works as Hein said, less than 1s to write a 1MB file.

Thank you Hein for pointing me in the direction of the clocks. It would have taken me much longer to debug this without your help.

Very good Sid, thank you for explaining what you have done, it will help other people. Thanks, Hein

***** Final update *****

At the risk of revealing to the world how little I know about clocking, here goes:

I had a chat with the FPGA engineer on our team and he found it incredibly strange that changing the frequency from 50MHz to 52MHz made the device fast all of a sudden. And after doing some reading, I was in agreement with him. So I reset the FPGA HW to what it was before; 50MHz clock going to SDIO instead of 52. And as expected, the eMMC was still fast. So the clock was not the issue.

In an attempt to understand what changed to make the device fast all of a sudden, I looked at my git diff. There was only a single line of code that had changed -

-        xError = FF_Format( pxDisk, PARTITION_NUMBER, pdTRUE, pdTRUE );
+        xError = FF_Format( pxDisk, PARTITION_NUMBER, pdFALSE, pdFALSE );

When I was setting up the formatting functionality in my program, I had copied the example usage from the FF_Format() documentation page. In the example, the variables xPreferFAT16 and xSmallClusters are both set to pdTRUE. I was being ignorant and didn’t pay attention to this. I made this change following Hein’s recommendation earlier in this thread to pass pdFALSE for both parameters.

So yes, everything works as expected now and it was indeed a firmware issue and not an FPGA issue like I said before.

For anyone who cares, the f_mkfs function from the xilffs driver formats the drive to FAT32 and uses big clusters, which is why that example was working for me. And after running this example, when I switched back to running my project application, I initially didn’t do a re-format, which is why the writes I was doing there also sped up. Today I did a format using the FF_Format function with the parameters mentioned above and everything is still working fast as expected.
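A rough way to see why cluster size matters so much on a 64GB device: FAT32 keeps one 4-byte entry per cluster, so the table that has to be cleared at format time, and updated as files grow, shrinks as clusters get larger. The sketch below is only an approximation under stated assumptions: `fat32_table_bytes` is a hypothetical helper, it ignores reserved sectors and the second FAT copy, and the cluster sizes are illustrative, since the thread does not say which sizes FF_Format() actually selects.

```c
#include <assert.h>
#include <stdint.h>

#define FAT32_ENTRY_BYTES 4u   /* each FAT32 entry is 32 bits wide */

/* Approximate size of one FAT32 table for a volume.
 * Ignores reserved sectors, the two reserved entries and rounding. */
static uint64_t fat32_table_bytes(uint64_t volume_bytes, uint32_t cluster_bytes)
{
    assert(cluster_bytes != 0u);
    return (volume_bytes / cluster_bytes) * FAT32_ENTRY_BYTES;
}

/* Usage example: a 64 GiB volume with small and large clusters. */
static void fat_size_demo(void)
{
    const uint64_t volume = 64ull << 30;   /* 64 GiB */

    /* 512-byte clusters (illustrative): 512 MiB per FAT copy. */
    assert(fat32_table_bytes(volume, 512u) == (512ull << 20));
    /* 32 KiB clusters (illustrative): 8 MiB per FAT copy. */
    assert(fat32_table_bytes(volume, 32768u) == (8ull << 20));
}
```

Under these assumptions the small-cluster table is 64 times larger, which is at least consistent with both the slow format and the slow 1MB writes disappearing once large clusters were used.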

I hope that my experience with this will be of help to whoever goes through an issue like this.


Interesting to see how sometimes we can be misled by apparent results. I too often wonder whether I am really measuring what I want to measure.

FF_Format() : we wanted to make it as simple as possible, and so only two configuration parameters are asked:

  • do you prefer FAT16?
  • do you prefer to use small clusters?

When the documentation was produced, there were still lots of small drives, so it used “pdTRUE, pdTRUE”.

Thanks again for reporting, and good luck with your project.
