Ultra-Low Latency Down to Microseconds: SSDs Make It Possible

DAL is a large ocean shipping company whose business covers ocean and land transportation, storage, cargo handling, and ship management. Every day, its application system processes more than 100,000 transaction orders and performs data backup and integration activities. Every night, storage and ship scheduling plans are drawn up, and they must be completed before 6:00 a.m. the next day. In early 2013, DAL planned to add shipping routes to North America, which would increase the number of transaction orders to 150,000 per day. Its legacy system could not handle such a volume, and a one-day delay would cause a loss of 100,000 euros. Analysis showed that a major performance bottleneck lay in the response latency of the databases and disk drives during peak hours: when system IOPS peaked at 200,000, I/O latency reached 8 ms. This high I/O latency left 80% of database operating time wasted in I/O waits and prolonged the batch processing window. To resolve this performance issue, a storage system with ultra-low I/O latency was installed for DAL, ensuring that all 150,000 daily transaction orders are processed within the required time.

Performance bottlenecks and countermeasures

As enterprise data centers take on more applications, they face increasingly demanding latency and service-level requirements. A mature, high-performance IT system underpins enterprise operations, covering enterprise resource planning, customer relationship management, end-to-end manufacturing, and enterprise management. The field of public affairs is witnessing the same change. As cloud computing and server virtualization technologies develop, each IT appliance must process diversified applications, while users expect shorter waiting times and higher system availability. These trends pose challenges to IT system response times, concurrent processing capability, and query latency. In the past, many methods were tried to improve IT system efficiency, such as adding more servers, upgrading server configurations, and reducing server workloads.
Yet performance bottlenecks persisted, and server computing requests remained starved of storage resources. Huawei has diagnosed hundreds of user systems suffering from this problem and found that 87% of system performance issues occur in the interaction between storage systems and application databases. In other words, the response latency and concurrency of a storage system determine those of the entire application system.

Response latency is the performance indicator that concerns users most. For mission-critical businesses in particular, response latency directly determines user experience. Stable, low latency improves user experience and also reduces the number of servers required, saving equipment room footprint and power consumption. Lower latency also raises the IOPS a customer's workload can drive, which helps system providers increase profits. For example, if the data access latency of a data-intensive application is reduced by 90%, its achievable IOPS increases tenfold. This benefit is especially significant for applications such as OLTP, OLAP, high-performance computing, and virtual desktops.

We should clarify that the latency discussed in this article is not the average system latency, but the maximum latency of at least 99% of I/Os (the 99% latency). The 99% latency plays a more important role in applications than the average latency, because most applications are online, data-intensive applications: one application request triggers multiple data access operations, and the latency of the request is determined by the slowest of those operations. This is why we focus on the 99% latency rather than the average latency.
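To illustrate this fan-out effect, here is a minimal simulation sketch. The fan-out of 20 I/Os per request and the exponential latency distribution are illustrative assumptions, not figures from the article:

```python
# Sketch: why tail (99%) latency dominates fan-out requests.
# Assumptions (illustrative): each application request fans out into
# FANOUT parallel I/Os; each I/O's latency is exponentially
# distributed with a 1 ms mean.
import random

random.seed(42)
FANOUT = 20          # parallel I/Os per application request
REQUESTS = 100_000   # simulated application requests
MEAN_IO_MS = 1.0     # mean latency of a single I/O, in ms

def io_latency_ms():
    return random.expovariate(1.0 / MEAN_IO_MS)

# A request completes only when its slowest I/O completes.
request_latencies = sorted(
    max(io_latency_ms() for _ in range(FANOUT)) for _ in range(REQUESTS)
)

avg = sum(request_latencies) / REQUESTS
p99 = request_latencies[int(0.99 * REQUESTS)]
print(f"average request latency: {avg:.2f} ms")  # ~3.6 ms, not ~1 ms
print(f"99% request latency:     {p99:.2f} ms")  # far above the mean
```

Even though each individual I/O averages 1 ms, the request as a whole is gated by its slowest I/O, so shaving the tail of the per-I/O latency distribution matters far more than improving the average.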
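The earlier tenfold-IOPS claim can also be read through Little's Law (throughput = outstanding requests / latency): at a fixed queue depth, cutting per-I/O latency by 90% multiplies achievable IOPS by ten. A back-of-envelope sketch, with an assumed queue depth:

```python
# Sketch: Little's Law links latency to achievable IOPS.
# throughput (IOPS) = outstanding I/Os / per-I/O latency.
# The queue depth below is an assumption for illustration.

def achievable_iops(outstanding_ios: int, latency_s: float) -> float:
    """IOPS sustainable at a given queue depth and per-I/O latency."""
    return outstanding_ios / latency_s

QUEUE_DEPTH = 64                                # assumed outstanding I/Os
before = achievable_iops(QUEUE_DEPTH, 8e-3)     # 8 ms latency  -> 8,000 IOPS
after = achievable_iops(QUEUE_DEPTH, 0.8e-3)    # 90% reduction -> 80,000 IOPS
print(f"{before:,.0f} IOPS -> {after:,.0f} IOPS ({after / before:.0f}x)")
```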
As shown in the following figure, as service pressure increases, the number of I/Os to be processed also rises. Maintaining stable, low latency under heavy I/O pressure (such as 1 million IOPS) ensures fast application response.

Figure: Comparative analysis of storage system performance

Relationship between system latency and disk drives

Before we can design a mechanism that provides low latency for massive concurrent access requests, we must answer one question: what latency do we actually want to achieve? We find that every order-of-magnitude decrease in latency brings a qualitatively new user experience, and that reducing latency below 1 ms yields the best experience. Such low system latency requires microsecond-level processing in every part of the system, including hardware, software, architecture, and protocols. In traditional storage systems, the latency of hard disk drives (HDDs) limits the processing speed and pushes system latency to at least 10 ms. With solid state drives (SSDs) in storage systems, the most critical data can be placed on SSDs, reducing data access latency to below 1 ms. With the most advanced DRAM SSDs, latency can fall below 100 µs, or even 10 µs. Unlike the previous approach, which used large numbers of HDDs to raise system IOPS, SSDs boost system IOPS through their low latency, improving performance while reducing infrastructure cost. Furthermore, the internal controllers of SSDs enable concurrent access to the back-end NAND flash chips; in this way, SSDs accelerate the system's processing of concurrent requests.
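A rough model shows why per-device IOPS tracks both access latency and internal parallelism. The channel counts and latencies below are illustrative assumptions, not measured values:

```python
# Sketch: per-device IOPS as a function of access latency and the
# device's internal parallelism. All numbers are illustrative.

def device_iops(access_latency_s: float, parallel_units: int = 1) -> float:
    """Peak IOPS if each unit completes one I/O per latency period."""
    return parallel_units / access_latency_s

# An HDD serializes I/Os on a single actuator; ~5 ms average access time.
hdd = device_iops(5e-3)                        # ~200 IOPS
# An SSD controller drives many NAND channels concurrently;
# assume 16 channels at ~100 us per flash read.
ssd = device_iops(100e-6, parallel_units=16)   # ~160,000 IOPS

target = 200_000  # the article's peak workload, in IOPS
print(f"HDDs needed: {target / hdd:,.0f}")     # ~1,000 drives
print(f"SSDs needed: {target / ssd:,.1f}")     # ~1.3 drives
```

Under these assumptions, a workload that once demanded roughly a thousand spindles fits on a handful of SSDs, which is the cost argument the article makes.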
Disk drives are an important factor in determining system latency, but they are not the only one. Latency is the result of a complicated process: every storage request, from entering the storage system to being returned to the user, is handled by multiple system resources (CPUs, locks, caches, disk drives, internal networks, and I/O interfaces) and queued many times. Each processing and queuing step adds latency, and resource contention and operating system scheduling prolong it further. The software processing mechanism and protocol stack overhead therefore also require attention.

Using SSDs in storage systems for microsecond-level latency

If we want SSDs to accelerate system performance, we cannot simply replace HDDs with SSDs; we need to redesign the system architecture. Traditional storage systems are built around caches, which provide read-hit and write-back mechanisms to hide HDD read and write latency, and use an index table designed to minimize memory usage. That index table is comparatively slow and inadequate for SSDs, whose latency is only tens or hundreds of microseconds; its lookup latency greatly hampers SSD performance. In an SSD storage system, we therefore need a new cache index table with a much shorter operation latency. Such a design may consume extra CPU and memory, but it trades that overhead for performance and frees more system resources to process I/O requests (a minimal sketch of this trade-off appears below). When designing SSD storage systems, we must regard latency reduction as the foremost concern: traditional storage systems optimize for storage space usage, while SSD storage systems optimize for system latency.

Another difference between HDDs and SSDs is that HDDs process random access requests roughly 100 times slower than sequential requests, while SSDs process random requests only two to four times slower. This large gap on HDDs led traditional storage systems to employ various cache algorithms that convert accesses into sequential ones; these cache algorithms are not suitable for SSDs.
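As an illustration of the index-table trade-off mentioned above, consider a flat, open-addressed hash index: it answers lookups in O(1) probes at the cost of a larger, pre-allocated table, instead of the pointer chases of a memory-frugal tree index. This is a hypothetical design for illustration, not Huawei's actual implementation:

```python
# Sketch: trading memory for lookup latency in a cache index.
# Hypothetical design for illustration; not Huawei's implementation.
# A flat, oversized hash table resolves a logical block address (LBA)
# to a cache slot in O(1) probes, instead of walking a tree index.

class FlatCacheIndex:
    """Open-addressed LBA -> cache-slot index, sized for low latency."""

    EMPTY = -1

    def __init__(self, capacity: int):
        # Deliberately over-provision (2x) to keep probe chains short:
        # the memory cost accepted in exchange for latency.
        self.size = 2 * capacity
        self.lbas = [self.EMPTY] * self.size
        self.slots = [self.EMPTY] * self.size

    def _probe(self, lba: int) -> int:
        i = hash(lba) % self.size
        while self.lbas[i] not in (self.EMPTY, lba):
            i = (i + 1) % self.size    # linear probing
        return i

    def insert(self, lba: int, slot: int) -> None:
        i = self._probe(lba)
        self.lbas[i], self.slots[i] = lba, slot

    def lookup(self, lba: int) -> int:
        """Return the cache slot for an LBA, or EMPTY on a miss."""
        i = self._probe(lba)
        return self.slots[i] if self.lbas[i] == lba else self.EMPTY

index = FlatCacheIndex(capacity=1_000_000)
index.insert(lba=0x2A5F, slot=17)
assert index.lookup(0x2A5F) == 17
```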
In SSD storage systems, we focus more on issues such as flash page usage, cache pollution, and data selection overhead, so a variety of technologies have been developed specifically for SSDs. The SSD data selection algorithm effectively separates sequential data, temporary data, and hot data, and evicts data of little value. Huawei's proprietary SSD granularity feature matches the data granularity to the SSD's flash page and ECC granularities, effectively reducing SSD write penalties and write amplification (a back-of-envelope illustration appears at the end of this article).

Conclusion

With the development of cloud computing and server virtualization, a storage system must process a diversity of applications. The storage industry is experiencing a revolution that is pushing storage systems to become converged and unified. SSDs are replacing HDDs as the mainstream storage medium, bringing stable, ultra-low latency to storage systems. Huawei has introduced SSDs into its storage systems and developed a wide range of technologies to maximize their high-IOPS, low-latency advantages. With these cutting-edge technologies, Huawei's storage systems can deliver millions of IOPS at microsecond-level latency, meeting the long-term requirements of enterprise data centers.
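Back-of-envelope illustration of the granularity point: when a write is misaligned with the flash page size, every page it touches can incur a full read-modify-write, inflating the bytes physically written. The page and write sizes below are assumptions for illustration, not Huawei's actual parameters:

```python
# Sketch: write amplification from granularity mismatch.
# Page size and write sizes are illustrative assumptions.

FLASH_PAGE = 16 * 1024  # bytes per NAND flash page (assumed)

def pages_touched(offset: int, length: int) -> int:
    """Number of flash pages a write at (offset, length) spans."""
    first = offset // FLASH_PAGE
    last = (offset + length - 1) // FLASH_PAGE
    return last - first + 1

def write_amplification(offset: int, length: int) -> float:
    """Bytes physically written / bytes logically written, assuming a
    full read-modify-write for every page the write touches."""
    return pages_touched(offset, length) * FLASH_PAGE / length

# 4 KiB write aligned to a page boundary: rewrites one 16 KiB page.
print(write_amplification(0, 4096))                  # 4.0
# The same 4 KiB write straddling a page boundary: rewrites two pages.
print(write_amplification(FLASH_PAGE - 2048, 4096))  # 8.0
# A write matched to the page granularity: no amplification.
print(write_amplification(0, FLASH_PAGE))            # 1.0
```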