This is a follow-up to this old topic:
I’ve done a lot more experimenting and analysis, and I thought I would summarize my findings in case someone else finds themselves implementing what seems to be a common pattern: data logging.
My first, naive implementation used two files, each opened and closed for every update. One file was updated once per minute with a big record, the other once per second with a small record. It turned out that this update pattern leads to massive fragmentation as one file gets extended, then the other. The long cluster chains make opening a file for append horribly slow.
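For illustration, the naive loop boiled down to something like the sketch below (file names and record sizes are made up). Every append reopens the file, and because the two files grow in alternation, their clusters interleave on the card and both chains fragment:

```c
#include <stdio.h>

/* The naive pattern: open-append-close on every update, alternating
 * between two files. Each append extends one file, then the other,
 * so the clusters of the two files interleave on disk and both
 * cluster chains fragment badly. */
static int append_record(const char *path, const void *rec, size_t len)
{
    FILE *f = fopen(path, "ab");
    if (f == NULL)
        return -1;
    size_t written = fwrite(rec, 1, len, f);
    fclose(f);
    return (written == len) ? 0 : -1;
}
```

With one call per second for the small file and one per minute for the big one, the open-for-append cost grows with the cluster-chain length, which is exactly what made this approach fall over.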
I tried a bunch of different things.
Increasing the cache is not a viable option because I am tight on RAM space.
Keeping the files open indefinitely is no good, because the directory structures never get updated until you close the file. I experimented with forcing a directory update without a close, but in some cases this might lead to data integrity problems. Maybe one could borrow some tricks from GrumpyOldPizza/RFAT (Robust FAT)? More research is needed if you want to go down this road.
Pre-allocating the file did not work out well. You have to write the entire file twice, once when allocating it, and again while writing the data. This takes time, and doubles the wear of the flash memory in the SD card.
Putting the files in two separate partitions was good from a performance standpoint, but it ate up significantly more precious RAM, increasing my number of file systems from 3 to 5. Also, I could not get Windows to recognize SD cards with two partitions, and I really want to be able to analyze these cards on Windows. It also added considerably to the code complexity.
What I finally settled on was putting all of the records in a single file, keeping the file open for 5 minutes at a time, and creating a new file each day, with a directory per year, all in a single partition. With a single file per day, you get one nice, linear file that doesn’t get fragmented. Forward seek times are not bad, so skipping over records is fast. The best way to retrieve records is to read a whole day’s file from the beginning and filter in the records of interest. Backward seeks are hopelessly slow: it takes almost as long to find the mid-point of the file as to just read the first half, so you can forget about binary searching. And I can read the cards on Windows.
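The read-from-the-start-and-filter retrieval can be sketched like this, assuming a hypothetical fixed-size, time-ordered record layout (the struct, field names, and callback are mine, not from the actual logger). Note it only ever reads forward, which avoids the slow backward-seek case:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical fixed-size record layout -- adjust to your own format. */
typedef struct {
    uint32_t timestamp;   /* e.g. seconds since midnight */
    uint8_t  payload[28];
} LogRecord;

/* Read a whole day's file front to back and pass every record whose
 * timestamp falls in [t_start, t_end) to the callback (which may be
 * NULL if you only want the count). Records are assumed to be in
 * time order, so we can stop at the first record past t_end.
 * Returns the number of matching records, or -1 on open failure. */
static int filter_day(const char *path, uint32_t t_start, uint32_t t_end,
                      void (*handle)(const LogRecord *))
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    LogRecord rec;
    int count = 0;
    while (fread(&rec, sizeof rec, 1, f) == 1) {
        if (rec.timestamp >= t_end)
            break;                    /* time-ordered: nothing more to find */
        if (rec.timestamp >= t_start) {
            if (handle != NULL)
                handle(&rec);
            count++;
        }
    }
    fclose(f);
    return count;
}
```

Since the file is linear and unfragmented, this is a single sequential pass, which is about as fast as an SD card gets.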
Hope this helps someone.
Update: If a whole day’s file is too much data to go through when retrieving records by time, you can exploit the power of the hierarchical file system to efficiently decrease the granularity. I ended up making directories for year, month, and day, with a file per hour.
#define ffconfigMKDIR_RECURSIVE 1 and mktime() are my friends for this.
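A minimal sketch of the path scheme, using only standard C (the ".log" extension and the choice of UTC via gmtime_r() are my assumptions; you might use localtime_r() instead). In the actual logger the directory chain would be created with FreeRTOS+FAT's ff_mkdir(), which creates all the intermediate directories in one call when ffconfigMKDIR_RECURSIVE is set to 1; mktime() is handy for going the other way, turning a requested year/month/day/hour back into a time_t for comparisons.

```c
#include <stdio.h>
#include <time.h>

/* Build the per-hour log path for a given moment,
 * e.g. "/2024/03/17/14.log" (year dir / month dir / day dir / hour file). */
static void log_path_for(time_t now, char *buf, size_t len)
{
    struct tm tm_now;
    gmtime_r(&now, &tm_now);   /* UTC here; use localtime_r() if you log local time */
    snprintf(buf, len, "/%04d/%02d/%02d/%02d.log",
             tm_now.tm_year + 1900,   /* tm_year counts from 1900 */
             tm_now.tm_mon + 1,       /* tm_mon is 0-based */
             tm_now.tm_mday,
             tm_now.tm_hour);
}
```

The nice property is that narrowing a retrieval by time is just picking the right directory and file, so you never have to scan more than one hour's worth of records.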