Understanding SSD Controller Terminology: Overprovisioning & WAF
Posted on August 27, 2013
We've previously written in great detail about the SSD development lifecycle and SSD manufacturing processes, but we've yet to delve into what really drives solid-state drives: Controllers. An SSD is effectively its own, self-contained computer-within-a-computer. It's got a CPU-equivalent in the form of a Flash Storage Processor (or Controller), complete with on-board cache, memory management, low-level firmware, and channels to the NAND Flash modules.
Because an SSD's Flash Storage Processor is effectively a specialized, purpose-built CPU, we'll limit this article to a few key, top-level aspects of controllers; we'll break the controller's numerous complex elements into several individual articles, with this one focusing on overprovisioning, write amplification factor, and video content.
Overprovisioning and Write Amplification Factor (WAF) were selected as our key topics for a few reasons: First, WAF plays a massive role in the longevity of the NAND Flash as it ages, directly impacting the usable life of the drive; second, the overprovisioned space on a drive dictates the available user space and performance. When helping our system builders select an SSD, our two most common questions have been "What's the endurance like?" and "Why is the capacity different between competing drives? Why is one 256GB and the other 240GB?" This article answers both of those questions.
Let's hit the video content first:
To LSI: "What should our gaming audience be focused on in an SSD?"
Video Interview w/ LSI - "What is Overprovisioning & the Write Amplification Factor?"
To summarize what Sr. Director of Outbound Marketing Kent Smith noted in our above video, here's a quick definition of our two primary topics:
Overprovisioning in SSDs - What It's For
Overprovisioned space is the storage capacity of a device held in reserve for garbage collection and wear-leveling operations. Unlike an HDD, an SSD cannot simply overwrite data in place; a block must be erased completely before it can be reprogrammed. Each full erase and re-program of a block is known as a P/E cycle, or program/erase cycle, and each cycle wears down the block's endurance. A block has a limited number of P/E cycles before it is depleted -- as the drive ages (and wear-leveling should age it equally across all blocks), the NAND Flash modules lose their ability to retain a charge (data). Consumer-class SSDs tend to sit in the 1K-5K P/E cycle spec, meaning each block (and, with even wear, the drive as a whole) can be erased and reprogrammed roughly 1,000-5,000 times. A HyperX 3K SSD is conservatively rated for 3,000 P/E cycles, for instance. This isn't a hard limit and there's a lot more that goes into it, but it supplies a foundation for understanding drive endurance.
Other controller features (like wear-leveling & garbage collection) work in tandem to ensure level degradation of the Flash (ensuring one block does not fail before the others). In order to properly wear-level the drive, the controller is responsible for constantly relocating, rewriting, and erasing blocks in the background. To improve stability and performance, an overprovisioned space (storage capacity inaccessible to the user) is "reserved" for these functions, effectively acting as a swap space.
Putting this knowledge to practical use, you've likely noticed that drives ship in varying capacities in the same general range. In a drive where 240GB of storage is available to the user (240GB of "user space") out of 256GB of Flash, the remaining 16GB is overprovisioned space, which works out to an additional 6.67% on top of the user space.
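For anyone who wants to check the math on their own drive, here's a minimal sketch of that calculation. It assumes overprovisioning is expressed as spare capacity relative to user space (which is how the 6.67% figure falls out of 256GB and 240GB); the function name is ours, purely for illustration.

```python
def overprovisioning_percent(physical_gb: float, user_gb: float) -> float:
    """Spare capacity expressed as a percentage of the user-visible capacity.

    Assumption: OP% = (physical - user) / user, matching the 240GB/256GB example.
    """
    return (physical_gb - user_gb) / user_gb * 100


# 256GB of Flash with 240GB exposed to the user
print(f"{overprovisioning_percent(256, 240):.2f}%")  # 6.67%
```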
Note: That said, because of the differences in the way the OS and application layers interpret gigabytes (binary vs. decimal), the displayed storage capacity could lead you to believe a 256GB SSD has no overprovisioned space. This isn't the case (and is actually impossible); it is just a matter of how capacity is presented at the OS layer.
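A rough sketch of where that "hidden" spare area can come from is below. It assumes the Flash itself is sized in binary gigabytes (2^30 bytes each) while the advertised user capacity is counted in decimal gigabytes (10^9 bytes each); that's a common arrangement, but treat it as an assumption rather than a spec for any particular drive. The function name is hypothetical.

```python
GIB = 2**30   # binary gigabyte (gibibyte), in bytes
GB = 10**9    # decimal gigabyte, in bytes


def hidden_spare_percent(nominal_capacity: int) -> float:
    """Spare area created purely by the binary-vs-decimal gap.

    Assumption: physical Flash = nominal * 2^30 bytes,
                user space     = nominal * 10^9 bytes.
    """
    physical_bytes = nominal_capacity * GIB
    user_bytes = nominal_capacity * GB
    return (physical_bytes - user_bytes) / user_bytes * 100


print(f"{hidden_spare_percent(256):.2f}%")  # 7.37%
```

Under those assumptions, a "256GB" drive still carries a little over 7% of spare area even though none of it shows up in the advertised capacity.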
Different controllers juggle overprovisioned space in different ways, but SandForce SSDs do tend to reserve a larger portion of the total capacity for overprovisioning in an effort to improve reliability and performance-over-time, trading off user capacity.
Enterprise-class SSDs have overprovisioning reaching upwards of 28%, seriously improving overall performance (speed), stability, and longevity. Because consumers are more value-driven and don't require the level of reliability a server SSD might, manufacturers can get away with allocating a smaller amount of the space to overprovisioning. This decreases the cost-per-usable-GB by increasing the size of the user space, but at the trade-off of decreased performance and longevity (against an Enterprise SSD). Because a consumer SSD isn't being hammered as heavily as a server drive—and won't be in use for as long—this trade-off is more than worth it in the face of value.
With identical architecture and variables, it is a safe assumption that more overprovisioned space will equate to superior performance and endurance. The lines get blurred when comparing -- for instance -- a Samsung controller against a SandForce controller, which deploy vastly differing solutions for the same problems. That's where benchmarking becomes your primary metric for comparison, given the non-linear nature of cross-architecture analysis.
Write Amplification Factor in SSDs
Then there's Write Amplification. When writing data to Flash memory, the amount of data sent by the host does not equal the amount physically written to the device. In actuality, the data physically written to the NAND Flash is the host data multiplied by what is known as the device's Write Amplification Factor. The higher the Write Amplification Factor, the faster the device's NAND Flash is depleted, and endurance and performance degrade in turn. Knowing this, if the host sends a gigabyte of data to a device with a 1.05 WAF, 1.05 gigabytes are physically written to the Flash. This wears down the device faster due to what is effectively (to the user) an artificially-increased load.
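In code form, the relationship is a single multiplication. This is just a sketch of the definition given above; the function and parameter names are ours.

```python
def physical_writes_gb(host_writes_gb: float, waf: float) -> float:
    """Data physically written to the NAND for a given amount of host data."""
    return host_writes_gb * waf


# The host sends 1GB to a device with a WAF of 1.05
print(physical_writes_gb(1.0, 1.05))  # 1.05
```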
Several underlying and complex solutions are present on the firmware-level (and OS-level) to reduce the WAF to more sustainable values. Wear-leveling and garbage collection are two of these (and will be left to future articles), working to burn through P/E cycles more equally across all blocks by relocating data and wiping the pages/blocks when necessary. A typical WAF for a consumer-grade SandForce controller hovers in the 0.5-0.7 range, meaning only 0.5-0.7 gigabytes are physically written for every gigabyte the host sends, significantly slowing the aging process of the Flash modules.
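To see why WAF matters for longevity, here's a back-of-the-envelope sketch that ties it to the P/E cycle rating discussed earlier. It assumes perfectly even wear, treats the P/E rating as a hard limit (which, as noted above, it isn't), and ignores the spare area; the 240GB capacity and the function name are illustrative choices of ours, not manufacturer figures.

```python
def host_write_budget_tb(capacity_gb: float, pe_cycles: int, waf: float) -> float:
    """Rough amount of host data (in decimal TB) a drive can absorb.

    Assumptions: even wear across all blocks, P/E rating used as a hard cap,
    overprovisioned space ignored.
    """
    raw_nand_writes_gb = capacity_gb * pe_cycles  # total the Flash can take
    return raw_nand_writes_gb / waf / 1000        # divide out amplification


# A 240GB drive rated for 3,000 P/E cycles at a few different WAFs
for waf in (0.5, 0.7, 1.05):
    budget = host_write_budget_tb(240, 3000, waf)
    print(f"WAF {waf}: ~{budget:.0f} TB of host writes")
```

The exact numbers shouldn't be read as endurance ratings; the point is the shape of the relationship, where halving the WAF doubles the amount of host data the same Flash can take.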
What can I do with this information?
We've yet to go through every other major aspect of the controller -- which is one of the more complex and daunting tasks we've set out to undertake -- but you can still use this information to your benefit. By understanding that a low write amplification factor significantly extends the usable life of an SSD, and that overprovisioned space improves stability and performance, you'll be better equipped when researching purchasing options. There's a lot more to the SSD and its controller than that, of course, but this gives us a solid foundation to work from.
As gamers and hardware enthusiasts, your biggest concerns should generally be reliability and stability. Because performance on modern SATA-interface SSDs is nearly identical across the board (and starting to hit the SATA barrier), it's easy to let that fall by the wayside in favor of greater longevity and persistence of data. When we're looking at a difference of a few thousand IOPS or a couple of MB/s in performance, those differences are so insignificant in gaming and everyday tasks that they can be overlooked.
When checking benchmarks on the web -- and this is something we'll go into more in the future -- you'll want to focus on 4KB random R/W operations. Gaming hammers the drive for small, random files throughout the loading process (the pipeline is explained in our Star Citizen tech article); sequential operations are favorable for those transferring large files frequently - media production professionals would do well to favor high sequential transfer metrics over random metrics.
If you've got questions about SSDs or controllers, drop us a line below and we'll do our best to get it answered.
- Steve "Lelldorianx" Burke.