Recharge Your USB Flash Drive Today or Say Goodbye to the Files Tomorrow

Posted in Hardware on 17 October 2021

I know, I know. The title will inevitably raise some eyebrows. Rest assured, though, by the end of this brief post it will make perfect sense.

flash-memory-post-low-battery

A couple of months ago I sat down to sort out my USB thumb drive collection. I have 10 of them, used to move large files between distant machines, as OS installation and recovery media, and to actually store some non-critical data. While going through the inventory I eventually plugged in the USB stick I brought back from my Summer 2018 trip to Japan. The last time I used it was around September 2018, when I transferred some video clips from the PlayStation 4 to study while building my Dynamic Sloshing Liquid Rig.

Which means the drive was left unused for 3 years.

Imagine my surprise when I found out that two out of three short in-game video clips were corrupted. The first clip wouldn't even copy over to the PC, while the other one took its sweet time and eventually, after about 30 minutes (for a 100 MB file), succeeded in getting copied to the HDD.

Overwhelmed by avid curiosity I immediately fired up the video player...

What you see here is the result of physical data corruption (a.k.a. "bit rot" or "data degradation" or "data decay"). While such graphic and "movie-like" damage is, ironically, quite fitting to the overall presentation of Horizon: Zero Dawn (a game about hunting mechanical dinosaurs in a post-apocalyptic world), this incident raises the question...

What Happened?

To answer it, let's briefly recall how a Flash Drive actually works.

For any Operating System, a USB Flash Drive is just a block storage device with its address space divided into 512-byte sectors. So when the OS needs to address a specific data block, it sends an appropriate command to the controller and either reads or writes data.
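To make the sector addressing concrete, here is a minimal Python sketch that works out which 512-byte sectors a byte range touches. The function name `sectors_for` and the sample numbers are made up for illustration; a real OS does this inside its block layer.

```python
SECTOR_SIZE = 512  # bytes per sector, as described above


def sectors_for(offset, length):
    """Return (first, last) sector numbers covering `length` bytes at `offset`."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // SECTOR_SIZE
    last = (offset + length - 1) // SECTOR_SIZE
    return first, last


# A 100-byte access starting at byte 1000 straddles two sectors.
print(sectors_for(1000, 100))  # (1, 2)
```

Even a tiny read or write therefore always turns into whole-sector requests to the drive.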

NAND-flash-memory-chip

Looks and sounds simple, right?

Well, in actuality, the NAND memory such drives are based on comes with lots of quirks and complications. Long story short: LBA (Logical Block Addressing, i.e. OS-level) access needs to be translated into a NAND-specific sequence of commands: sometimes the controller needs to erase blocks, or write data only in a specific order, or log the number of reads and writes to each memory block, etc.

All of those steps are absolutely necessary for the drive to be both practical and robust, because the number of times each NAND memory block can be overwritten is strictly limited. This command translation is performed by the drive's controller, which implements a Flash Translation Layer (FTL). It accepts LBA commands, performs the required NAND operations and returns an LBA-compliant response.
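The translation idea can be sketched in a few lines of Python. This is a toy model, not how any particular controller firmware works: the class `ToyFTL` and its fields are hypothetical, and real FTLs deal with pages, blocks, caching and power-loss safety that are left out here.

```python
class ToyFTL:
    """Toy flash translation layer: maps logical addresses to physical pages."""

    def __init__(self, num_pages):
        self.pages = [None] * num_pages     # physical page contents (None = erased)
        self.free = list(range(num_pages))  # erased pages that are ready for writing
        self.lba_to_page = {}               # logical address -> physical page
        self.stale = set()                  # pages that hold outdated data

    def write(self, lba, data):
        if not self.free:
            raise RuntimeError("no erased pages left; garbage collection needed")
        page = self.free.pop(0)
        self.pages[page] = data
        old = self.lba_to_page.get(lba)
        if old is not None:
            # NAND cannot overwrite in place, so the old copy just becomes stale.
            self.stale.add(old)
        self.lba_to_page[lba] = page

    def read(self, lba):
        page = self.lba_to_page.get(lba)
        return None if page is None else self.pages[page]


ftl = ToyFTL(num_pages=4)
ftl.write(0, b"hello")
ftl.write(0, b"world")  # same logical address, new physical page
print(ftl.read(0), ftl.lba_to_page, ftl.stale)  # b'world' {0: 1} {0}
```

Note how rewriting the same logical address leaves a stale physical page behind. Cleaning those up is exactly the house-keeping work the controller has to find time for.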

flash-ftl

What a lot of users don't know is that the controller also performs another very important set of operations: garbage collection, wear leveling and bad block management.

NAND Flash Memory Wear Leveling

Now recall once more that NAND blocks can only be overwritten so many times before they become unusable. Therefore, before writing anything to NAND the controller must decide which blocks to distribute the data over, to ensure that all blocks get worn out evenly. While it's easy to make use of "fresh" blocks, the controller also needs to closely monitor the ones which have endured lots of writes, since those are highly likely to fail either on their own, or upon another read/write operation. Therefore sometimes the controller needs to take existing data and physically move it to another block for safe storage.
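The simplest form of this decision, picking the least-worn block that is ready for writing, might look like the sketch below. The function `pick_block` and the erase counts are invented for illustration; actual controllers use more elaborate (and usually undocumented) policies.

```python
def pick_block(erase_counts, free_blocks):
    """Return the free block that has been erased the fewest times so far."""
    return min(free_blocks, key=lambda block: erase_counts[block])


# Hypothetical per-block erase counters kept by the controller.
erase_counts = {0: 120, 1: 15, 2: 98, 3: 15}
free_blocks = [0, 2, 3]  # block 1 is lightly worn but currently holds data

chosen = pick_block(erase_counts, free_blocks)
print(chosen)  # 3
```

Block 1 is just as fresh as block 3, but it is occupied. To put it back into rotation the controller would first have to move its data elsewhere, which is the relocation step described above.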

wear-leveling-visualized

This is complicated even further by the fact that any block must be erased before it can be written to. Reclaiming blocks this way is called Garbage Collection (GC). It's essential for the drive's endurance and for safe data storage, and it can only be performed while the drive has power.

wear-leveling-wear-count

No Time to Explain, Get In the Car!

When you use the drive to write data, to ensure high write performance, the controller looks for previously "erased" blocks to put data into. While this doesn't sound like a problem, consider how USB Thumb Drives are usually used: you put the drive in, quickly write the data onto it, then almost immediately take it out of the device and put the drive into your pocket.

Garbage Collection, sort of like automatic defragmentation, is performed while the drive is idle. And since your common USB Stick gets little to no time "for itself", it cannot effectively perform such "house-keeping" operations as data relocation (from "older" NAND blocks to "fresh" ones) and erasure of unused blocks for future write operations.

As a result of such use scenarios, over time the USB Stick becomes much slower, for at some point it runs out of ready-to-use blocks and needs to erase old ones during immediate writes. After all, deleting files via the OS only marks them as deleted at the file system level; the parts of the drive where those files are physically stored aren't really changed. At least, not by the OS itself.

The lack of "NAND-level" erased blocks causes write speeds to drop significantly, which can be perceived as the drive "dying".

garbage-collection

But it gets even worse. To relocate data from one block to another, the controller needs some already erased blocks to be available. And if it had no time to physically erase unused blocks beforehand, it can't perform this data-critical operation, because there are no fresh blocks to move the data into.
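Here is a rough Python sketch of that dependency: relocating live data out of a block only works if a fully erased spare block already exists. The function `collect` and the page layout (a tuple of data and a validity flag) are assumptions made for this example, not a description of real firmware.

```python
def collect(block, spare):
    """Copy still-valid pages from `block` into the erased `spare`, then erase `block`.

    Each block is a list of pages; a page is (data, is_valid), or None when erased.
    Returns the number of pages that were relocated.
    """
    if any(page is not None for page in spare):
        raise ValueError("spare block must be fully erased before relocation")
    live = [page for page in block if page is not None and page[1]]
    for i, page in enumerate(live):
        spare[i] = page
    for i in range(len(block)):
        block[i] = None  # NAND erases a whole block at once
    return len(live)


victim = [(b"A", True), (b"old", False), (b"B", True), (b"old2", False)]
spare = [None] * 4
moved = collect(victim, spare)
print(moved, spare, victim)
```

If `spare` still contained stale pages, the call would fail, which mirrors the situation of a drive that never got the idle time to erase anything in advance.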

And as a result, blocks die or lose charge over time, and data loss occurs.

Oh No! What Do?

With power provided, and given enough idle time, the Flash Drive controller can work as intended and perform Garbage Collection in the background, ensuring it's nice and ready for future uses and safe mid-term data storage when unplugged from the device.

All you need to do is plug your USB Flash Drives into your PC or other host device from time to time and let them sit idle for a while.

garbage-collection

I can't say exactly for how long, but I believe an hour or more should suffice, depending on the drive's capacity and the space used. This will give the NAND controller precious time to perform "house-keeping" and will prolong the life of both your drive and the data on it, as well as help maintain its maximum read/write speeds for much longer.

SSDs Too by the Way

This advice is valid for almost any type of flash storage device, including Solid State Drives.

hdd-vs-ssd

If for some reason you prefer SSDs over HDDs for long-term data storage (which is a terrible idea, by the way), make sure to plug those drives in at least once a year and let them sit idle for a day or two. Especially if your drive uses cheaper TLC or QLC memory cells, which are highly prone to charge loss over time.

slc-mls-tlc-qlc-nand-density-schematic

Multi-level (MLC) and single-level (SLC) cell drives are also susceptible, so don't think you're safe simply because you shelled out for a more expensive drive. Any drive will eventually lose charge or flip bits if left unpowered long enough.

Story Time

As long as we're talking NAND data loss, I'd like to share a peculiar incident that happened about a year ago.

I use a high-quality 2-bit MLC-based SATA SSD as a system drive. It's fast, has good endurance and never showed any signs of excessive wear. Then one day I powered up the PC to discover that my Windows user profile was damaged and the OS had loaded a default one instead. My user folder and the files were still there, but the OS refused to load the profile, saying the registry was corrupted.

Luckily, thanks to the 3-2-1 Backup Strategy, a recent backup is always available, so I easily recovered the system partition and decided to inspect the files. Just as I thought, it wasn't a virus or anything: after comparing the corrupted file to the original, I discovered mismatched bits in two places in the damaged registry file. How exactly this happened I'll never know, but all signs point to either some random cell charge leak or maybe even a bit flip caused by cosmic radiation.

No. I'm not kidding:

After all, we do live in a real, physical world. It's hostile to anything that wants or needs to hold a charge for extended periods of time. Therefore, even if you have a good Solid State Drive, you're never 100% safe from data loss. Make sure to back your precious data up from time to time.
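For the curious, the kind of bit-level comparison mentioned in the story can be sketched in Python like this. It is not the tool I actually used; the function `differing_bits` and the sample bytes are made up for illustration, and in practice you would read both the backup copy and the damaged file from disk first.

```python
def differing_bits(a: bytes, b: bytes):
    """Return (byte_offset, bit_index) pairs where two equal-length buffers differ."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    diffs = []
    for offset, (x, y) in enumerate(zip(a, b)):
        delta = x ^ y  # set bits mark positions where the two bytes disagree
        for bit in range(8):
            if delta & (1 << bit):
                diffs.append((offset, bit))
    return diffs


good = bytes([0b00001111, 0xFF, 0x00])  # stand-in for the backup copy
bad = bytes([0b00001011, 0xFF, 0x80])   # stand-in for the damaged file
print(differing_bits(good, bad))  # [(0, 2), (2, 7)]
```

A couple of isolated single-bit differences like these, rather than whole regions of garbage, is what points to flipped cells instead of software gone wrong.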

And in the meantime...

Did You Remember to Recharge Your Flash Drives This Year?

flash-memory-post-battery-full