Lately we have been witnessing a lot of hype around flash-based storage. SSD design companies are raising money (Anobit) or being acquired (Pliant), flash appliance companies are completing huge IPOs (Fusion-IO), and SSD-based appliance start-ups are hot merchandise (Violin, XtremIO).
Is this a real revolution in the storage industry or just a temporary trend? Will we see flash-based storage in the data center in the coming years and, if so, in what form? To answer this, we need to analyze the strengths and weaknesses of flash-based storage:
Performance
Most people outside the IT world would say that “an SSD is much faster than a regular hard disk (HDD)”. However, in the storage world there is no single metric called “fast”. Performance is measured by three distinct metrics: bandwidth, IO operations per second (IOps) and latency. Each one matters to a different type of application. For example, a video streaming application requires high bandwidth, while IOps and latency matter little. OLTP database applications require high IOps and low latency, while bandwidth is of little importance.
So how does flash-based storage perform on each of these metrics compared to an HDD? The main advantage of flash over a hard drive is the absence of moving, mechanical parts. Each command on a hard drive involves a seek (the actuator arm moves the head to the required track and the platter rotates to the required sector) followed by the data transfer. In flash-based storage, on the other hand, only the data transfer phase applies. So how large is the improvement? If the seek is dominant (small transfers with a random access pattern), the improvement is big. However, if the data transfer is dominant (large transfers with a sequential pattern), the improvement is minor or even nonexistent (for example, with fast disks or RAID).
Returning to the metrics above, the bandwidth of flash-based storage is about the same as that of an HDD, since bandwidth is measured with large sequential commands. IOps and latency, however, can improve substantially for workloads made of small commands with a random access pattern.
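To make the seek-versus-transfer argument concrete, here is a minimal Python sketch of a per-command service time model. The numbers are illustrative assumptions, not measurements: an HDD positioning time of 8 ms, a flash per-command latency of 0.1 ms, and the same 150 MB/s media transfer rate for both devices, chosen deliberately so that only the positioning phase differs. IOps is the inverse of the service time, and bandwidth is IOps multiplied by the command size.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Illustrative device model; all parameter values are assumptions."""
    name: str
    positioning_ms: float  # per-command positioning time on random access (HDD: seek + rotation)
    transfer_mb_s: float   # sustained media transfer rate

    def service_time_ms(self, io_kb: float, sequential: bool) -> float:
        # Sequential streams are assumed to need no repositioning between commands.
        positioning = 0.0 if sequential else self.positioning_ms
        transfer = io_kb / 1024 / self.transfer_mb_s * 1000
        return positioning + transfer

    def iops(self, io_kb: float, sequential: bool) -> float:
        return 1000.0 / self.service_time_ms(io_kb, sequential)

    def bandwidth_mb_s(self, io_kb: float, sequential: bool) -> float:
        return self.iops(io_kb, sequential) * io_kb / 1024


HDD = Device("HDD", positioning_ms=8.0, transfer_mb_s=150.0)
SSD = Device("SSD", positioning_ms=0.1, transfer_mb_s=150.0)


def speedup(io_kb: float, sequential: bool) -> float:
    """SSD IOps divided by HDD IOps for the same workload."""
    return SSD.iops(io_kb, sequential) / HDD.iops(io_kb, sequential)


print(f"4 KB random:     {speedup(4, sequential=False):.0f}x")
print(f"1 MB sequential: {speedup(1024, sequential=True):.1f}x")
```

Under these assumptions the small random workload gains more than 60x in IOps, while the large sequential workload gains nothing. With a real SSD whose transfer rate exceeds the disk's, the sequential gain would simply be the ratio of the two transfer rates.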
Size
Unlike a hard drive with its mechanical construction, flash-based storage is a pure silicon device. Although SSDs are sometimes packaged in hard-drive form factors for compatibility, the storage element itself occupies a small fraction of a hard drive's volume. Obviously, this is critical for portable devices, which is why we see flash-based storage in smartphones and tablets. Thin laptops also benefit from it. In the data center, the smaller footprint can also save cost in locations where floor space is expensive to rent.
Power
Having no moving parts, flash-based storage is a purely electronic device. While a hard drive requires power to spin its platters and move its heads, flash-based storage requires only the standard operating power of a memory device. In addition, the hard drive's larger size and mechanical parts require extra power for cooling. Traditionally, low power consumption mattered mostly for portable devices with a limited battery supply. However, power consumption has become a major part of data center expenses. Furthermore, as the electricity consumption of US data centers reaches the level of entire countries, regulation may force power savings and power cuts.
Endurance
Due to the physical mechanism of the flash cell, the number of program/erase (P/E) cycles is limited. In SLC (single bit per cell) the endurance is about 100K cycles, while in MLC (multiple bits per cell) it drops to several thousand cycles for x2 MLC (two bits per cell) and even to hundreds for x3 MLC (three bits per cell). Although signal processing algorithms raise the endurance of MLC (such enhanced MLC is called eMLC), limited endurance remains a barrier, even if only a psychological one, for IT managers.
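A rough way to see what these cycle counts mean in practice is to estimate how long a drive lasts under a steady write load. The sketch below assumes ideal wear leveling (every block wears evenly) and a write amplification factor of 2. The 200 GB capacity, the 500 GB/day write rate and the exact cycle counts for the two MLC types are hypothetical values chosen within the ranges quoted above.

```python
def lifetime_years(capacity_gb: float, pe_cycles: int,
                   daily_writes_gb: float, write_amplification: float = 2.0) -> float:
    """Years until the rated P/E cycles are exhausted.

    Assumes ideal wear leveling: every block is erased equally often, so the
    drive can absorb capacity * pe_cycles of physical writes in total.
    """
    total_writable_gb = capacity_gb * pe_cycles
    physical_gb_per_day = daily_writes_gb * write_amplification
    return total_writable_gb / physical_gb_per_day / 365


# Orders of magnitude follow the text; the exact MLC values are assumptions.
PE_CYCLES = {"SLC": 100_000, "x2 MLC": 3_000, "x3 MLC": 500}

for kind, cycles in PE_CYCLES.items():
    years = lifetime_years(capacity_gb=200, pe_cycles=cycles, daily_writes_gb=500)
    print(f"{kind:7s} {years:6.1f} years")
```

Under these assumptions SLC lasts for decades, x2 MLC for less than two years and x3 MLC for only a few months, which explains why write-heavy roles call for the higher-endurance flash grades.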
Cost
Last, but most important, is the cost difference, which is still the main barrier for flash-based storage. While 1TB of HDD costs less than $100, 1TB of flash-based storage ranges from $10K for SLC flash to $2K for x3 MLC, roughly 20 to 100 times more expensive! Furthermore, while the cost per capacity of HDDs keeps dropping, flash memory may be approaching its technological limit: the current 24 nm process may become a scaling barrier in the coming years.
So where and how will flash-based storage reside in the data center?
Eventually, for the IT manager, it all comes down to ROI. As long as flash costs up to 100 times more than HDD, a full deployment of flash-based storage in the data center is not realistic. Although flash-based appliances exist (Violin, Virident, XtremIO), their high cost confines them to dedicated niches such as financial trading. Furthermore, the performance benefit applies to only some of the applications in the data center.
If full deployment is not expected in the near future, where is the right place for flash-based storage? Its performance capabilities can be used for two functions: caching and tiering.
Flash Tier
A flash tier is the flash-based portion of a storage appliance's overall capacity. The external storage system dynamically places the data with the largest acceleration potential (e.g., hot zones, database hot areas) in the flash tier. This approach is used in EMC's FAST, where the flash tier is one level of a multi-tier storage construction.
Another tiering approach places dedicated critical data in flash, for example file system or NAS indexes (e.g., Alacritech) or other metadata. Metadata is by nature accessed at small granularity and in a random pattern, which makes it the best fit for flash.
In any architecture, a flash-based tier should use high-grade flash (i.e., SLC or at least eMLC). Production data resides on the flash and must survive the endurance limit over several years of retention. Since high-grade flash is the most expensive kind, this constraint limits the flash tier to a small portion of the storage (about 1%).
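Dynamic placement of this kind can be sketched as a simple heat-based policy: count IOs per extent over a sampling window, keep the hottest extents that fit on flash, and derive which extents to promote and demote. This is a hypothetical illustration of the general idea, not the algorithm EMC's FAST actually uses; the extent IDs, the access-log format and the function names are assumptions.

```python
from collections import Counter
from typing import Hashable, Iterable, Set, Tuple


def plan_flash_tier(access_log: Iterable[Hashable], flash_extents: int) -> Set[Hashable]:
    """Return the extent IDs that should live on the flash tier.

    access_log: one extent ID per IO seen in the last sampling window
    (hypothetical input format).
    flash_extents: how many extents fit in the flash tier.
    """
    heat = Counter(access_log)  # IO count per extent
    return {extent for extent, _ in heat.most_common(flash_extents)}


def migration_plan(current_flash: Set[Hashable], access_log: Iterable[Hashable],
                   flash_extents: int) -> Tuple[Set[Hashable], Set[Hashable]]:
    target = plan_flash_tier(access_log, flash_extents)
    promote = target - current_flash  # hot extents not yet on flash
    demote = current_flash - target   # cooled-down extents to move back to HDD
    return promote, demote


log = [7, 7, 7, 3, 3, 9, 1, 7, 3, 5]
print(migration_plan({3, 5}, log, flash_extents=2))
```

A production tiering engine would likely also need hysteresis and a migration budget so that extents do not bounce between tiers on every window; the sketch omits both.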
Cache
Flash caching uses flash storage to cache data between the server and the storage. There are two types of such cache: a write-through (read) cache, which is always coherent with the back-end storage, and a write-back cache, where the back end is not yet updated when a write completes. A read cache is safer to use, since it keeps the back-end storage fully coherent for any functionality (e.g., replication, backup, snapshots).
A write-back cache (write cache) must guarantee the validity of the data it holds, and thus must use high-grade flash (SLC). Furthermore, due to the strict reliability requirements of the data center, it should be built as a cluster with data synchronized between the nodes. A read cache, on the other hand, can relax the reliability requirements, since data can always be retrieved from the back-end storage. Hence, lower-grade flash (MLC) can be used.
As a result, write caches are aimed at high-end data centers and high-performance computing, while read caches are more common. Furthermore, because its flash requirements are relaxed, a read cache can use cheaper flash (MLC) and therefore reach larger capacities, up to 10% of the back-end storage.
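The difference between the two cache types comes down to when the back end is updated. The toy Python class below, with hypothetical names and a plain dict standing in for the back-end storage, sketches both policies with LRU eviction. In write-back mode the cache holds dirty blocks that exist nowhere else until they are flushed or evicted, which is exactly why such a cache needs reliable flash and synchronization across cluster nodes.

```python
from collections import OrderedDict


class FlashCache:
    """Toy block cache in front of a back-end store (a dict here).

    write_back=False: write-through (read) cache; the back end is always current.
    write_back=True:  writes land only in the cache and reach the back end later.
    """

    def __init__(self, backend: dict, capacity: int, write_back: bool = False):
        self.backend = backend
        self.capacity = capacity
        self.write_back = write_back
        self.blocks = OrderedDict()  # block id -> data, least recently used first
        self.dirty = set()           # blocks whose only current copy is in the cache

    def _insert(self, block, data):
        self.blocks[block] = data
        self.blocks.move_to_end(block)
        while len(self.blocks) > self.capacity:
            victim, victim_data = self.blocks.popitem(last=False)
            if victim in self.dirty:  # never drop the only copy of the data
                self.backend[victim] = victim_data
                self.dirty.discard(victim)

    def read(self, block):
        if block in self.blocks:          # hit: refresh LRU position
            self.blocks.move_to_end(block)
            return self.blocks[block]
        data = self.backend[block]        # miss: fetch from the back end
        self._insert(block, data)
        return data

    def write(self, block, data):
        if self.write_back:
            self.dirty.add(block)         # back end is now stale for this block
        else:
            self.backend[block] = data    # write-through keeps the back end coherent
        self._insert(block, data)

    def flush(self):
        for block in list(self.dirty):
            self.backend[block] = self.blocks[block]
        self.dirty.clear()
```

If a write-back cache is lost before `flush()`, every block in `dirty` is lost with it; a write-through cache can be discarded at any moment without data loss.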
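The ROI argument can be made concrete with the prices quoted in the Cost section. The sketch below compares the cost per usable terabyte of an all-HDD system, a 1% SLC flash tier, a 10% MLC read cache and an all-flash system. It assumes $100/TB for HDD (the text says less than $100, so this is an upper bound), $10K/TB for SLC and $2K/TB for MLC (the text quotes $2K for x3 MLC, so using it for the cache is an optimistic assumption), and it ignores controllers, power and space.

```python
HDD_PER_TB = 100.0     # text: "less than $100" per TB, used as an upper bound
SLC_PER_TB = 10_000.0  # text: about $10K per TB
MLC_PER_TB = 2_000.0   # text: about $2K per TB for x3 MLC (assumed for the cache)


def tiered_cost_per_tb(flash_fraction: float, flash_per_tb: float) -> float:
    """Flash tier: a fraction of the usable capacity moves from HDD to flash."""
    return (1 - flash_fraction) * HDD_PER_TB + flash_fraction * flash_per_tb


def cached_cost_per_tb(cache_fraction: float, flash_per_tb: float) -> float:
    """Cache: flash is added on top of a full HDD back end."""
    return HDD_PER_TB + cache_fraction * flash_per_tb


configs = {
    "all HDD": HDD_PER_TB,
    "1% SLC tier": tiered_cost_per_tb(0.01, SLC_PER_TB),
    "10% MLC read cache": cached_cost_per_tb(0.10, MLC_PER_TB),
    "all SLC flash": tiered_cost_per_tb(1.0, SLC_PER_TB),
}
for name, cost in configs.items():
    print(f"{name:20s} ${cost:8,.0f}/TB  ({cost / HDD_PER_TB:.0f}x HDD)")
```

Under these prices the 1% tier roughly doubles the cost per terabyte and the 10% read cache roughly triples it, while an all-flash SLC system costs 100 times as much.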
So where, between the servers and the storage, should the cache be located?
There are several possible places along the data path between the servers at the front end and the storage at the back end. The server itself is an intuitive and simple location for the cache. An in-server cache (such as Fusion-IO) sits close to the server's backbone and provides low-latency caching. However, this approach suits a single server, while a cluster of servers (e.g., a VMware environment) requires a cache per server, and those caches are not coherent when tasks migrate between servers (e.g., vMotion).
Caching in the storage appliance is a second option, in line with the tiering approach. This can be a simple approach for the storage design, but it may add complexity when scaling up.
A third approach places the cache halfway between the servers and the storage: in the switch. Such connectivity (e.g., Grid Iron, Dataram) usually comes with virtualization and network management capabilities. Its central location makes the cache easy to manage and to scale in both directions, toward the servers and toward the storage.
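The coherency problem with per-server caches can be shown with a minimal sketch: two servers, each with its own write-through read cache over shared storage, and no invalidation protocol between them. The class and block names are hypothetical; real server-side cache products may invalidate or flush their contents when a VM migrates, and this sketch only illustrates why they have to.

```python
class ServerReadCache:
    """Hypothetical per-server write-through read cache with no cross-server invalidation."""

    def __init__(self, backend: dict):
        self.backend = backend
        self.cache = {}

    def read(self, block):
        if block not in self.cache:             # miss: fill from shared storage
            self.cache[block] = self.backend[block]
        return self.cache[block]

    def write(self, block, data):
        self.backend[block] = data              # write-through: shared storage updated
        self.cache[block] = data                # but only this server's cache sees it


storage = {"blk": "v1"}
server_a, server_b = ServerReadCache(storage), ServerReadCache(storage)

server_a.read("blk")           # VM runs on A and warms A's cache with v1
server_b.write("blk", "v2")    # VM migrates to B (e.g., vMotion) and writes v2
stale = server_a.read("blk")   # VM migrates back to A: A's cache still returns v1
print(storage["blk"], stale)
```

Even though every write reaches the back end, the stale copy left on server A is served after the migration, which is why a shared cache location such as the storage appliance or the switch avoids the problem by design.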
So where will we see flash-based storage in the data center? Probably as a tier in storage appliances or as some form of caching, most likely a read cache. The applications it will accelerate include databases, VDI, Exchange and other read-intensive applications. The new challenge now facing storage designers is deciding what to put in the flash tier or cache to make the best use of this new storage medium.