Increasing memory sizes will make caches more and more effective at satisfying read requests. As a result, disk traffic will become dominated by writes. This prediction is now poised to hit the storage industry, largely due to the advent of caches based on flash memory, which are generally ten times larger than traditional DRAM-based caches.
Cache rich
It is also due to the advent of data reduction technologies such as compression and deduplication, which further increase the effective size of the cache. I will refer to such caches as “hyper caches” to differentiate them from the much smaller, traditional caches.
A common theme among host-attached hyper caches is that they are focused on accelerating reads, not writes. In theory, a hyper cache could buffer writes as well, a mode known as write-back caching. However, most implementations do not buffer writes at all, and even those that do make it optional and attach enough warnings that the average user is likely to leave it turned off. In particular, VMware's CBRC, as its full name (Content-Based Read Cache) suggests, does not accelerate writes.
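To make the distinction concrete, here is a minimal sketch (in Python, with hypothetical names; not taken from any of the products mentioned) of a cache that can run in write-through or write-back mode. In write-back mode, a write is acknowledged from the cache and only reaches backing storage on a later flush, which is exactly what makes host failures dangerous:

```python
class ToyCache:
    """Toy cache contrasting write-through and write-back policies."""

    def __init__(self, backing, write_back=False):
        self.backing = backing      # dict standing in for the disk subsystem
        self.lines = {}             # cached blocks
        self.dirty = set()          # blocks not yet on disk (write-back only)
        self.write_back = write_back

    def read(self, key):
        if key not in self.lines:             # miss: fetch from backing store
            self.lines[key] = self.backing[key]
        return self.lines[key]

    def write(self, key, value):
        self.lines[key] = value
        if self.write_back:
            self.dirty.add(key)               # acknowledged now, persisted later
        else:
            self.backing[key] = value         # write-through: persist immediately

    def flush(self):
        for key in self.dirty:                # drain dirty blocks to disk
            self.backing[key] = self.lines[key]
        self.dirty.clear()
```

In write-back mode, a host crash before `flush()` loses the dirty set, so a storage-side snapshot taken at that moment would not match what applications believe they have written.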
The major reason for not buffering writes within the host is that it can jeopardise storage reliability and consistency. What if the host fails? Would a snapshot or backup taken on the storage be point-in-time consistent? The impact on consistency is even more grave when write buffering is optimised to re-sort the writes. This leaves the burden of optimising writes on storage. So, how does one build a write-optimised storage system?
Most disk-based storage systems employ a write buffer internally to reduce latency and absorb bursts of writes. But it does little for sustained throughput, because buffered writes must eventually be drained to the underlying disk subsystem. This is unlike reads serviced from a cache, which never go to disk at all. Sustained write throughput is therefore limited by the ingest rate of the disk subsystem.
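A back-of-the-envelope calculation makes the point. A write buffer only absorbs the difference between the ingest rate and the disks' drain rate, so once ingest exceeds drain, the buffer fills in finite time (the numbers below are illustrative, not from any particular system):

```python
def burst_seconds(buffer_mb, ingest_mb_s, drain_mb_s):
    """How long a write burst can exceed the disks' drain rate
    before the buffer fills (illustrative model)."""
    if ingest_mb_s <= drain_mb_s:
        return float("inf")                  # disks keep up indefinitely
    fill_rate = ingest_mb_s - drain_mb_s     # net MB/s accumulating in the buffer
    return buffer_mb / fill_rate

# A 4 GB buffer with 500 MB/s of ingest against 200 MB/s of disk drain
# fills in 4096 / 300 ~ 13.7 seconds: bursts are absorbed, sustained load is not.
```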
Another solution is to use flash, either as a large write buffer or as an endpoint of storage in itself. But the benefit of using flash for write optimisation is small relative to its high cost. The fundamental reason is that flash memory is nowhere near as good at accelerating writes as it is at accelerating reads. This is apparent in its lower write performance, limited write endurance, and need for overprovisioning to limit write amplification. Therefore, the burden of optimising writes goes all the way to disk.
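The interaction of endurance and write amplification is easy to quantify. A flash device has a finite lifetime write budget (capacity times rated program/erase cycles), and every unit of amplification burns through it faster. A rough sketch, with illustrative figures that are assumptions rather than vendor data:

```python
def ssd_lifetime_days(capacity_gb, pe_cycles, host_writes_gb_per_day, write_amp):
    """Rough SSD endurance estimate: the lifetime program/erase budget
    divided by the (amplified) daily write volume. Illustrative only."""
    budget_gb = capacity_gb * pe_cycles            # total GB the flash can absorb
    return budget_gb / (host_writes_gb_per_day * write_amp)

# A 400 GB drive rated at 3,000 P/E cycles, taking 500 GB/day of host writes:
# at write amplification 1.0 the budget lasts 2,400 days,
# at 4.0 it drops to 600 -- which is why overprovisioning to curb
# amplification is unavoidable when flash absorbs writes.
```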
Write Coalescing
Indeed, most flash SSDs use this same technique internally, under the name “write coalescing”, to make random writes more palatable for flash. However, write coalescing hasn’t been very successful for disk-based file systems. File systems such as NetApp’s WAFL and Sun’s ZFS do write coalescing opportunistically, which works well initially while disk space is largely free, but degrades to random writes over time. Coalescing all writes all the time, as suggested by Mendel Rosenblum’s log-structured approach, requires an efficient process to defragment the free space.
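The essence of coalescing, and of the defragmentation burden it creates, can be shown in a few lines. In this toy log-structured layout (an illustration, not WAFL's or ZFS's actual on-disk format), writes to scattered logical blocks become sequential appends at the log tail, and overwrites leave behind stale copies that a cleaner must later reclaim:

```python
def log_append(log, mapping, lba, data):
    """Append a random-LBA write at the log tail and remember where it went:
    random writes become sequential appends, at the cost of stale copies."""
    mapping[lba] = len(log)   # logical block -> physical position in the log
    log.append(data)

log, mapping = [], {}
for lba, data in [(93, "a"), (7, "b"), (93, "c")]:  # scattered, with an overwrite
    log_append(log, mapping, lba, data)

# The log grew strictly sequentially, but block 93 now lives at position 2
# and its old copy at position 0 is dead space -- the free-space
# fragmentation that must be cleaned up before coalescing can continue.
```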
The predominant themes to take away from the emerging need for write-optimised storage are that disk traffic will soon become dominated by writes, and that the write buffers storage systems employ internally reduce latency and absorb bursts of writes without raising sustained throughput.
This prediction is especially pertinent at a time when the industry is seeing exponential growth in various areas. The most important factor to consider is that many virtualised workloads, especially virtual desktops, generate significant write demand. Caching or storing these writes in flash can significantly improve performance at a time when businesses need to access information quickly, often remotely and from a plethora of devices.