Kinetic Action: Micro and Macro Benchmark-based Performance Analysis of Kinetic Drives Against LevelDB-based Servers


Abstract

There is an unprecedented growth in the amount of unstructured data. Therefore, key-value object storage is becoming extremely important for its ease of storing and retrieving information. Seagate recently launched Kinetic hard drives, which are accessed directly over Ethernet and incorporate a LevelDB key-value store inside the drive. In this work, we employ micro-benchmarks as well as a macro-benchmark (the Yahoo Cloud Serving Benchmark) to evaluate these drives and to understand the performance limits, trade-offs, and implications of replacing traditional hard drives with Kinetic Drives in data centers and high-performance systems. We perform in-depth throughput and latency benchmarking of these Kinetic Drives (each acting as a tiny independent server) from a client machine connected to them via Ethernet. We compare these results to a SATA-based and a faster SAS-based traditional server running LevelDB. Our CPU-bound functionality-sample drives average a sequential write throughput of 63 MB/sec and a sequential read throughput of 78 MB/sec for 1 MB value sizes, and we are able to demonstrate unique Kinetic features including direct disk-to-disk data transfer. Our macro-benchmark (YCSB) shows that the full-fledged LevelDB servers outperform the Kinetic Drives for several workloads; however, this is not always the case. For larger value sizes, even these sample Kinetic Drives outperform a full server for several different workloads.

I. INTRODUCTION

The amount of digital data is growing at an extremely rapid pace, and it is estimated that its volume will grow at 40%-50% per year [1]. Most of this data explosion is due to unstructured data. According to predictions from International Data Corporation (IDC), 80% of global data (approximately 33 EB in 2017) will be unstructured [2]. Managing such an enormous amount of data is a challenging task. Most of the data in existing systems is stored and accessed using traditional file-based or block-based systems [3]. Unfortunately, these traditional systems are becoming inefficient as file-based access and hardware requirements limit their scalability. Therefore, there is a need for a data access method that is highly scalable and has the flexibility of scaling out horizontally [4].

Object storage overcomes limitations of file-based systems by offering scalability and being software defined [5]. The data is communicated as objects rather than files or blocks, and additional metadata can be stored alongside the object [3]. Object storage is flat structured and prevents higher-level applications from needing to manipulate data at the lowest level. Therefore, object storage has the flexibility to scale out horizontally. Furthermore, a unique identifier called a "key" is used to access the object or "value"; this form of storage is therefore called a key-value store (KV store). Several key-value stores are deployed to support large websites, such as Dynamo at Amazon [6], Redis at GitHub [7], and Memcached at Facebook [7]. All these systems store ordered (key, value) pairs. Even though key-value stores address the problem of scaling and managing huge amounts of data for the above systems, the existing key-value stores also have several limitations. They run on top of multiple layers of legacy software and hardware, such as POSIX, RAID controllers, etc., designed for file-based systems [8].
Also, these huge systems consume significant amounts of power and rack space [9]. To help overcome these hardware and software limitations, Seagate has announced a new class of hard drives called Kinetic Drives [10]. These drives have a built-in processor that runs a LevelDB-based key-value store directly on the drive [11]. Rather than the typical SATA or SAS interface, Kinetic Drives communicate externally via TCP/IP over Ethernet. Each drive acts as a tiny server in itself. An important function of the drives is direct P2P (peer-to-peer) transfer, which allows direct data transfer from one drive to another via Ethernet without the need to copy data through a storage controller or other server [12]. By replacing hardware and software layers, Kinetic Drives can reduce the cost and complexity of a large-scale object storage system.

In this work, we attempt to better understand the key functionalities, features, and performance of the drives. We use this information to compare Kinetic Drives with other LevelDB-based servers to derive insights about the possibility of replacing traditional hard drives with Kinetic Drives. Furthermore, the specification sheets do not provide a detailed analysis of the throughput [13], especially in comparison to other LevelDB-based servers. We also study the ease of programmability of the Kinetic Drives and share our experiences. To the best of our knowledge, there are no prior works that evaluate Kinetic Drives this thoroughly.

To understand several salient features, including P2P transfer, get, put, and others, we developed several tests. With these tests, we identify bottlenecks and limitations of the drives. As an example, the internal processing capability of these functionality-sample drives limits the I/O and thus the data transfer rate. Overall, these tests give us experience using this latest class of drives from Seagate and help us determine throughput bounds and performance characteristics in different scenarios. Beyond academic interest, we believe the information in this paper may be useful to system-level engineers at data centers and for High Performance Computing or Big Data and Analytics systems. Our contributions are the following:

- Conduct tests on Kinetic Drives to understand the throughput and performance bottlenecks, limits, trade-offs, and characteristics

- Make comparisons between Kinetic Drives and servers running the LevelDB key-value store on conventional hard drives
- Share the experience of using the Kinetic platform

The rest of the paper is organized as follows. Section II presents the motivation and architecture of the Kinetic Drives as well as a description of the software platform and API. Section III provides the test system configuration, while Section IV discusses our test methodology, results, and other findings for Kinetic Drives. Section V presents a comparison of the throughput of Kinetic Drives and LevelDB servers. Section VI discusses latency results for Kinetic Drives and the other servers. Findings for "keys not found" are presented in Section VII, and macro-benchmark results based on the Yahoo Cloud Serving Benchmark are presented in Section VIII. Section IX provides an overview of related work. Finally, Section X gives a brief conclusion and directions for future research.

II. KINETIC PLATFORM - MOTIVATION AND ARCHITECTURE

Kinetic Drives are the latest class of hard drives from Seagate and have a similar form factor to a conventional 3.5" drive. However, Kinetic Drives are fundamentally different from traditional drives in two ways: Ethernet connectivity and an internal processor that converts the storage drive to a key-value store [14]. Each drive has two Ethernet ports (1 Gbps each) to transfer data to the clients or other drives using TCP/IP over Ethernet. However, these ports are intended for fault tolerance, and only one should be used at a time. Newer (second-generation) models are expected to have two 2.5 Gbps ports; however, after inquiring, we found that these drives are not available to the general public, and hence we could not test the second-generation models. Furthermore, the drives do not use the SCSI interface and instead have a key-value interface. Applications interact with the drives directly using this interface and a standard Ethernet connection, with IPs assigned to the drives by a DHCP server [14]. The use of Ethernet allows the storage system to be expanded without the need to install a separate Storage Area Network. Figure 1 is a simplified comparison of the hardware and software stack of a traditional storage system and that of a Kinetic system.

The second major benefit of using these drives is that they encapsulate the concept of in-storage processing by running a LevelDB-based key-value store in the drives. There is a single-core 1.2 GHz System-on-Chip (SoC) processor inside the drives that runs LevelDB and manages the data on the disk itself. This can reduce the software I/O complexity for applications and simplify the storage rack by eliminating the need for separate storage controllers [9].

Seagate provides a software platform to help write programs for these drives. This Kinetic API comes with libraries for several popular programming languages, a simulator, installation instructions, some test utilities, and some basic test code [15]. The simulator allows test and custom programs to run without the actual drives (but we run all our experiments on actual drives). Applications or programs are created using the functions and protocols defined in the API library. The protocols are responsible for the actual connection and communication with the Kinetic Drives via messages (Figure 2). This Kinetic API is now maintained by the Linux Foundation. The drives have a built-in capability to support multiple threads and sessions.
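To make this interaction model concrete, the sketch below shows the general shape of a client program that writes and reads one key-value pair on a drive. It is only a sketch under stated assumptions: the wrapper names (kd_connect, kd_put, kd_get, kd_close), the port number, and the demo credentials are illustrative placeholders and do not reproduce the exact kinetic-c signatures.

```c
/*
 * Minimal sketch of a client talking to one Kinetic Drive.  The kd_* names
 * are illustrative placeholders, not the real kinetic-c API; they stand in
 * for the vendor library calls that open a session and issue put/get.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct kd_session kd_session;   /* opaque handle for one drive session */

kd_session *kd_connect(const char *drive_ip, int port,
                       int identity, const char *hmac_key);
int kd_put(kd_session *s, const void *key, size_t key_len,
           const void *value, size_t value_len);      /* 0 on success */
int kd_get(kd_session *s, const void *key, size_t key_len,
           void *value, size_t *value_len);
void kd_close(kd_session *s);

int main(void)
{
    /* The drive obtains its IP via DHCP; 8123 is assumed here as the Kinetic
     * service port, and identity 1 / "asdfasdf" as the usual demo credentials. */
    kd_session *drive = kd_connect("192.168.1.101", 8123, 1, "asdfasdf");
    if (!drive) return EXIT_FAILURE;

    char key[16] = "key-0000000001";       /* 16-byte key, as in our tests     */
    static char value[1024 * 1024];        /* up to the 1 MB value size limit  */
    size_t value_len = sizeof value;

    if (kd_put(drive, key, sizeof key, value, sizeof value) != 0)
        fprintf(stderr, "put failed\n");
    if (kd_get(drive, key, sizeof key, value, &value_len) != 0)
        fprintf(stderr, "get failed\n");

    kd_close(drive);
    return EXIT_SUCCESS;
}
```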
While the Kinetic API abstracts away most of the internal details of the drives, there are set limits on certain parameters. These limits are listed in Table I. Table II lists the main API commands used to interact with the key-value store on the drive. In addition, the API allows the programmer to perform write synchronization in several different modes, as described in Table III. These modes allow the programmer to force immediate synchronization or to let the drive control the write operations.

A particularly unique feature that comes with the drive is the P2P transfer: one drive can directly transfer data to another drive via Ethernet. The process initiates when a client issues a "P2P Put" or "P2P Get" command to a particular Kinetic Drive. This drive takes control of the process and automatically (without the client's involvement) performs "put/write" or "get/read" operations to/from the target drive using Ethernet. After this process is completed, both drives have the same key-value pairs. This is a unique feature of the Kinetic Drive that is not available in traditional drives. These instructions therefore eliminate the several software layers otherwise required to transfer the data, which reduces I/O time drastically.

We employed several different commands based on the API to evaluate the performance of the drives and to better understand their limitations and bottlenecks. These findings are presented in the upcoming sections.

Fig. 1. Software Stack Comparison between a Server with Traditional Drives and Kinetic Drives

TABLE I
SPECIFICATIONS FOR THE KINETIC DRIVES

Device Feature                      Limit
Max Key Size                        4 KB
Max Value Size                      1 MB
Max Version Size                    2048
Max Tag Size                        128
Max Connections
Max Outstanding Read Requests       3
Max Outstanding Write Requests      2
Max Message Size                    1 MB
Max Key Range Count                 200
Max Identity Count
Max Pin Size                        2
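The write-synchronization modes described in Table III can be thought of as a flag carried by each put request. The enum and function names below are our own placeholders rather than the vendor's identifiers; the sketch only illustrates how a program would choose between forcing durability per request and letting the drive batch writes.

```c
/*
 * The three write-synchronization modes of Table III, expressed as a flag
 * passed with each put.  kd_put_mode() and the enum constants are
 * illustrative placeholders, not real kinetic-c identifiers.
 */
#include <stddef.h>

typedef struct kd_session kd_session;

typedef enum {
    KD_WRITETHROUGH,  /* this request is persistent before the call returns      */
    KD_WRITEBACK,     /* the drive persists it when it chooses (or on a flush)   */
    KD_FLUSH          /* this request and all pending WriteBack requests persist */
} kd_sync_mode;

int kd_put_mode(kd_session *s, const void *key, size_t key_len,
                const void *value, size_t value_len, kd_sync_mode mode);

/* Force a single key to be durable immediately; a bulk loader would instead
 * queue its puts with KD_WRITEBACK and end with one KD_FLUSH put, the
 * "WriteBack-Flush" pattern evaluated in Section IV. */
static int put_durable(kd_session *s, const char key[16],
                       const void *value, size_t value_len)
{
    return kd_put_mode(s, key, 16, value, value_len, KD_WRITETHROUGH);
}
```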

TABLE II
COMMANDS AND FUNCTIONS OF THE KINETIC DRIVE

Command        Function
put            Writes the specified key-value (KV) pair to the persistent store
get            Reads the key-value (KV) pair
delete         Deletes the KV pair
getnext        Gets the next key that is after the specified key in the sequence
getprevious    Gets the previous entry associated with a specified key in the sequence
getkeyrange    Gets a list of keys in the sequence based on the given key range
getmetadata    Gets an entry's metadata for a specified key
secureerase    Securely erases all the user data, configurations, and setup information on the drive
instanterase   Erases all data in the database for the drive. This is faster than secureerase
setsecurity    Sets the access control list of the Kinetic Drive
P2P Get        The Kinetic Drive reads from another Kinetic Drive directly
P2P Put        The Kinetic Drive writes to another Kinetic Drive directly

TABLE III
DIFFERENT WRITE MODES OF KINETIC DRIVES

Kinetic Write Mode   Description
WriteThrough         The request is made persistent before returning. This does not affect any other pending operations
WriteBack            The requests are made persistent when the drive chooses, or when a FLUSH is sent to the drive
Flush                All the write requests that are pending and are not written yet will be made persistent to the disk

Fig. 2. Message Protocol for Kinetic Drive

III. CONFIGURATION

In this section, we cover the three system configurations we employ to test and evaluate the Kinetic Drives and compare them with LevelDB-based servers. To manage and operate the Kinetic Drives, we must also understand the Kinetic API, which is explained below. To use the commands described in Table II, we have to leverage the Kinetic API. This API is installed on the client and is used to send commands to different Kinetic Drives, which act as tiny independent servers, as shown in Figure 3. The Kinetic API is designed for several programming platforms including C, C++, Java, and Python; our testing uses the C library of the API. Additionally, the write operation can be performed in several different modes (Table III). The programmer uses these modes to control when write operations are made persistent or to allow the drive to manage the write operations.

The client machine is a Dell R420 running Ubuntu Server 14.04 LTS 64-bit (kernel 3.13) on two quad-core Intel Xeon E5 CPUs with 2 GB of RAM. This machine uses a 10 Gbps Ethernet PCIe network card to connect to a Kinetic disk enclosure containing four Kinetic sample drives. Internally, this enclosure contains redundant Ethernet switches with 1 Gbps ports for the Kinetic Drives and a 10 Gbps uplink port. The Kinetic enclosure also has a separate Ethernet connection that provides access to a web interface for basic management of each drive (power cycling the drive, seeing its IP address, etc.). The drives are model ST4000NK0001, and we upgraded the firmware to version 3.0.4 using one of the included utility programs. These disks are 4 TB, 5900 RPM, and have 64 MB of disk cache plus another 512 MB of onboard RAM to run the modified version of Linux and LevelDB on the added Marvell 370 SoC [13]. Additional details are in the previous section.

To make comparisons between the Kinetic Drives and LevelDB-based servers, we have set up two different systems: Server-SATA and Server-SAS. Server-SATA has the same specifications as the Kinetic client system described above, plus LevelDB installed and two additional traditional SATA drives, separate from the OS drive, connected directly to the motherboard via SATA.
These two extra SATA drives, Seagate model ST4000VN000, were chosen because they have specifications similar to those of the Kinetic Drives, i.e., 4 TB, 5900 RPM, and 64 MB of disk cache. Server-SAS also has LevelDB installed but is a more powerful and modern system. This machine runs Ubuntu Server 14.04 LTS 64-bit (kernel 4.x) on two six-core Intel Xeon E5-2620 v3 2.4 GHz CPUs and 64 GB of RAM. A PCIe LSI MegaRAID SAS-3 3108 adapter is connected to an external Dell MD1400 disk enclosure containing Seagate model ST6000NM0034 Enterprise Capacity SAS drives. Each drive is 6 TB, 7200 RPM, and has 128 MB of disk cache.

For both Server-SATA and Server-SAS, LevelDB is installed and db_bench.cc (or a modified version) is run on the various drive configurations from the same machine, i.e., the client benchmark is run against a local server. For tests using two or more drives, Linux software RAID-0 is configured with the default mdadm settings offered by the GNOME Disk Utility. The target location is then formatted with default ext4 settings and mounted for use.

IV. METHODOLOGY AND RESULTS FOR KINETIC DRIVES

This section details a variety of our tests, focusing on the Kinetic Drives themselves. Performance results comparing Kinetic Drives with LevelDB-based servers are presented in later sections. Our aim is to first inform readers about the raw performance of the Kinetic Drives, which will help system engineers in designing their systems.

The primary measurements we made were sequential read and write, along with random read and write, for the Kinetic Drives. We also present the throughputs for peer-to-peer transfer, which is a unique feature of the Kinetic Drives.

Fig. 3. System Configuration with Kinetic Drives as Independent Servers

A. Comparison of Writethrough vs. Flush Mode

There are two different modes for forcing synchronous write operations, Writethrough and Flush, as mentioned in Table III. Of the two, Writethrough operations tend to be faster than pure Flush operations. Flush mode must make sure all previously written commands in the asynchronous WriteBack mode are made permanent, whereas requests made in Writethrough mode are made permanent without checking the status of any other command. Results of a test to illustrate this difference are shown in Table IV. In this test there are two variations, one where all writes are issued in Writethrough mode and one where all writes are issued in Flush mode. In both cases, each write (or put) is made persistent before returning a completion signal to the application. However, for Flush mode, the drive must also ensure that there are no pending write requests waiting to be made persistent. For both tested numbers of key-value pairs, Writethrough gives higher throughput than repeated Flush-mode writes. This illustrates that Writethrough should be used for synchronous writes and that it is bad programming practice to use Flush mode for repeated writes. Furthermore, it can be concluded that the throughput depends on the write mode, the value size, and the number of key-value pairs. Therefore, we obtain different throughputs in Table IV and in Figures 4 and 5.

TABLE IV
COMPARISON OF THROUGHPUTS FOR DIFFERENT WRITE MODES

Value Size   Key-Value Pairs   Flush           WriteThrough
1 MB                           5.4 (MB/sec)    8.99 (MB/sec)
1 MB                           4.8 (MB/sec)    8.9 (MB/sec)

B. Comparison of Throughputs for Write and Read Operations

1) Sequential Order Writes: This testing consists of two sets of experiments. For Sequential Write, write requests are issued by the client in increasing key ID order, i.e., from key number 1 to key number n. The write operation issues the first n-1 commands in WriteBack mode, and then the nth command is written in Flush mode to make all the commands persistent. This series of repeated asynchronous WriteBack writes followed by a single Flush command is the correct way to use Flush, and we call this WriteBack-Flush mode. After Sequential Write, the data is read back from key number 1 to key number n sequentially for Sequential Read. For Random Read, after performing Sequential Write the data is read back in random key ID order.

Figure 4 shows the throughput obtained from the Kinetic Drives for Random Read and Sequential Read for up to 1 TB of sequentially written data. Since the write process is the same for both experiments, it is shown by only one plot in Figure 4. For these experiments we vary the size of the value from 2 bytes to 1 MB but keep the key size constant at 16 bytes. We keep the number of key-value pairs constant at 1,000,000. Thus, the x-axis can also be interpreted as value size times one million. Random Order Read throughput is lower than Sequential Order Read throughput because the head has to be repositioned on the platter a lot more often for random order reads than for sequential order reads.

2) Random Order Writes: This testing also consists of two sets of experiments. This time, the client issues write requests in random key ID order for Random Write.
After the data is successfully written in the WriteBack-Flush mode (as explained in Section IV-B), the key-value pairs in one experiment are read back sequentially. In the other set of experiments, the data is randomly written and then read back randomly. The value size and other settings for the Random Order Write tests are similar to those of the Sequential Order Write tests. The throughput comparisons are shown in Figure 5. Again, the write is only plotted once because the results are the same for both sets of experiments.

Fig. 4. Sequential Order Write then Random Order Read or Sequential Order Read for 1,000,000 Key-Value Pairs with Value Sizes up to 1 MB and a 16-Byte Key Size

Fig. 5. Throughputs of Random Order Write then Random Order Read or Sequential Order Read for 1,000,000 Key-Value Pairs with Value Sizes up to 1 MB and a 16-Byte Key Size
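The write experiments above can be summarized by a single benchmark loop. The sketch below reuses the placeholder kd_put_mode wrapper and kd_sync_mode enum introduced in Section II (an assumption, not the real client API): it issues the first n-1 puts in WriteBack mode and the nth in Flush mode, reports the average throughput, and shuffling the key-ID array turns the sequential test into the random order test.

```c
/*
 * Sketch of the Section IV-B write benchmark: n puts with 16-byte keys and a
 * fixed value size, issued in WriteBack-Flush mode, in either sequential or
 * shuffled key-ID order.  kd_put_mode()/kd_sync_mode are placeholder names.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct kd_session kd_session;
typedef enum { KD_WRITETHROUGH, KD_WRITEBACK, KD_FLUSH } kd_sync_mode;
int kd_put_mode(kd_session *s, const void *key, size_t key_len,
                const void *value, size_t value_len, kd_sync_mode mode);

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Returns average throughput in MB/sec, or -1.0 on error. */
double write_benchmark(kd_session *s, const long *key_ids, long n,
                       const char *value, size_t value_len)
{
    char key[16];
    double start = now_sec();

    for (long i = 0; i < n; i++) {
        /* First n-1 puts are asynchronous; the last one flushes everything. */
        kd_sync_mode mode = (i == n - 1) ? KD_FLUSH : KD_WRITEBACK;
        snprintf(key, sizeof key, "k%014ld", key_ids[i]);
        if (kd_put_mode(s, key, sizeof key, value, value_len, mode) != 0)
            return -1.0;
    }
    double elapsed = now_sec() - start;
    return ((double)n * (double)value_len) / (1024.0 * 1024.0) / elapsed;
}

/* Fisher-Yates shuffle: converts the sequential key-ID array into the
 * random key order used for the Random Write experiments. */
void shuffle_ids(long *ids, long n)
{
    for (long i = n - 1; i > 0; i--) {
        long j = rand() % (i + 1);
        long tmp = ids[i]; ids[i] = ids[j]; ids[j] = tmp;
    }
}
```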

We observe that the throughputs of Random Order Writes and Sequential Order Writes are similar. We believe this trend happens because the drive does not consider the key IDs of the KV pairs; instead, it receives a write request and simply writes it to the disk. Moreover, LevelDB (installed in the drives) is supposed to sort the key-value pairs based on the IDs. We expected that the throughput of random writes would be lower than the sequential write throughput. We can only conclude that the LevelDB inside the drives has been modified by Seagate. We are investigating this phenomenon by conducting similar tests on a server with LevelDB installed.

C. Maximum Throughput for Different Value Sizes

In this test we try to find the maximum throughput that can be obtained for different value sizes. Each data point on the x-axis in Figure 6 is a different experiment. For each experiment, we vary the number of key-value pairs and perform the sequential order write operation, as explained in Section IV-B. These experiments give us the maximum write throughputs that can be extracted from the Kinetic Drives, while the previous results show an average. The number of key-value pairs required to achieve the maximum throughput was also different for each experiment. Maximum write throughput increases with value size, as shown in Figure 6. This trend is likely because the CPU-bound Kinetic Drive wastes fewer resources on overhead with larger value sizes.

D. Impact of Key Size on Throughput Performance

Figure 7 depicts the impact of different key sizes on throughput. The comparison is made when data is written sequentially (as explained in Section IV-B) in WriteBack-Flush mode with two different key sizes, 4 KB and 16 bytes. In this experiment, we vary the value size from 2 bytes to 1 MB (the max value size), and 1,000,000 key-value pairs are sequentially written. To write more than 1 TB of data we have to increase the number of key-value pairs to 2,000,000 or more with 1 MB as the value size. Therefore, for 3.6 TB of data written, the client sends 3,600,000 key-value pairs of value size 1 MB each. The better throughput for smaller keys is partly because the total amount of data transferred from the client to the Kinetic Drive, key size plus value size, is smaller. When we write a million key-value pairs with 4 KB keys, we transmit 1 TB of value data and 4 GB of data for their keys. With 16-byte keys, there is hardly any key data sent. This performance difference is also likely due to the already CPU-bound Kinetic Drive struggling to process the larger keys.

E. Multi-threaded Throughput

This section presents the throughput obtained when two or more threads perform the write operation. The API allows two or more threads to perform sequential writes. The API also allows for two simultaneous sessions. The write operation for each thread is similar to the write operation described in Section IV-D. In Figure 8, it can be observed that the cumulative throughputs for multiple threads are similar to that of a single thread. This is again likely because the internal Kinetic CPU is the limiting factor, and it cannot process multiple threads any faster than a single thread. The drives also have two Ethernet ports, but when we attempt to utilize both of these ports simultaneously for read or write operations, the test programs crash. We conclude that these ports are meant for fault tolerance rather than for extracting parallel performance from the drives.

F. Throughput for P2P Transfer

This section presents the results for a unique feature called P2P transfer.
The client issues a P2P Get or P2P Put command to a Kinetic Drive (KD-Init) along with the IP address of the target Kinetic Drive (KD-Target). KD-Init establishes a connection over Ethernet to KD-Target and initiates put/write or get/read operations based on the command sent from the client. After this operation is successfully completed, both drives have copies of the transferred key-value pairs.

In Table V, we show the throughput and internal Kinetic CPU load while using this feature. First, we write 512 key-value pairs to KD-Target. Thereafter, we perform a P2P Get operation in which KD-Init copies the 512 key-value pairs, each with a 1 MB value size, from KD-Target. After the operation is successfully completed, both drives have these same key-value pairs. This data copying process takes place without interference from the client: the client only issues the initial instruction to begin the P2P transfer.

Fig. 6. Maximum Throughput for Different Value Sizes

Fig. 7. Total Data Transfer with Different Key Sizes

TABLE V
THROUGHPUT FOR P2P TRANSFER

Value Size   No. of KV Pairs   Read Throughput (MB/sec)   Internal CPU
1 KB                           .43                        ~100%
1 MB                                                      ~100%
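The measurement behind Table V reduces to a single command from the client. The sketch below again uses a hypothetical wrapper (kd_p2p_get) and a simplified key-range argument; the real command carries the keys (or key range) to be copied and more options, so this only illustrates that the client's role ends once the instruction is issued.

```c
/*
 * Sketch of the drive-to-drive copy measured in Table V.  kd_p2p_get() is a
 * placeholder for the vendor's P2P command, not the real kinetic-c call.
 */
#include <stddef.h>

typedef struct kd_session kd_session;

/* Ask the connected drive (KD-Init) to copy `count` keys starting at
 * `first_id` from the drive at target_ip, directly over Ethernet and
 * without routing the data through the client. */
int kd_p2p_get(kd_session *kd_init, const char *target_ip, int target_port,
               long first_id, long count);

int copy_pairs_for_table_v(kd_session *kd_init)
{
    /* 512 key-value pairs of 1 MB each (about 0.5 GB); larger transfers
     * crashed on our functionality-sample firmware. */
    return kd_p2p_get(kd_init, "192.168.1.102", 8123, 0, 512);
}
```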

Fig. 8. Aggregate Throughputs for Multithreaded Performance

Fig. 9. Throughput Comparison between the Write Throughput of Kinetic Drives and the P2P Operation for 25, KV Pairs

Unfortunately, when we attempt to transfer more than 0.5 GB, or more than 512 key-value (KV) pairs with a value size of 1 MB, the experiments crash. We are working with Seagate to fix this problem, and it is most likely a software bug. As with many of our other experiments, we see in Table V that the internal processor's utilization is almost 100% while the P2P transfer is taking place. After observing the throughputs for different-size key-value pairs, we conclude that the throughput obtained for this process depends on the value size of each key-value (KV) pair and that the utilization of the internal processor remains high with small or large value sizes. Therefore, we believe that the Kinetic CPU acts as a bottleneck and prevents throughputs from increasing beyond a certain range. Seagate is working on upgrading this processor for future Kinetic models.

G. Get Previous Performance of Kinetic Drives

Figure 10 compares the read performance of reading key-value pairs in reverse order. Here, we test the "previous" command feature that comes with the Kinetic API. First, we perform sequential writes as described in Section IV-D and, after completion, we use the previous command of the API to read the key-value pairs in reverse order, i.e., from key number n to 1. As an illustration, if we give key n to the Kinetic Drive, then by using this feature the Kinetic Drive returns to us the key-value pair with key n-1. We compare this test with a custom-created program that issues read commands from the nth key-value pair back to the first key-value pair. It can be observed that the performance is comparable. Therefore, this feature only brings convenience to the users. It can also be concluded from the experimental data that the reverse command and the commands issued by the client are handled similarly by the Kinetic Drive.

Fig. 10. Comparison of Throughputs for the Reverse Command and a Custom Backwards Read; KD = Kinetic Drive, cmd = command/instruction, KD_reverse = program created to test and compare

V. THROUGHPUT COMPARISONS OF KINETIC DRIVES AND LEVELDB SERVERS

This section compares the throughput of Kinetic Drives against traditional servers running LevelDB (Server-SATA and Server-SAS). We believe it is important for readers to understand the possible implications of replacing traditional hard drives with Kinetic Drives for various workloads and disk configurations. As described in Section III, we have two different LevelDB-based server systems: Server-SATA has 2 GB of RAM and two extra 5900 RPM SATA disks with properties similar to the Kinetic Drives, and the faster Server-SAS has 64 GB of RAM with several 7200 RPM enterprise drives.

1) Sequential Order Write and Sequential Order Read for Single and Multiple Drives: In this set of tests, we compare the sequential key order throughput of the three systems (Kinetic Drives, Server-SATA, and Server-SAS) for reads and writes with various value sizes. As in the previous figures, each point on the x-axis represents a different experiment.
For example, 2_B in Figure 11 represents 1,000,000 KV pairs, each with a value size of 2 bytes, first written in sequential key order for one experiment and then read back in sequential key order for another experiment. The total data transfer of each run is 2 MB. Similarly, 256_B represents another 1,000,000 KV pairs written and then read back, each with a value size of 256 bytes (the total data transfer is 256 MB per run). It is also important to note that the y-axis is in log scale, because some of the server read experiments with smaller value sizes fit mostly in the server RAM and avoid many of the slower disk accesses. For the read tests, the LevelDB benchmark, db_bench.cc, first loads some data to read but does not purge any caches before measuring the reads.

The most interesting observation is that the Kinetic Drive can sustain higher throughput than a LevelDB server for large enough value sizes. For small value sizes, however, the servers with their large RAM size can easily outperform a Kinetic Drive.

Overall, even ignoring the small experiments that mostly fit in server RAM, we can also conclude that read operations achieve higher throughput than writes for all three systems. During reads, the drives simply have to deliver the data that has already been sorted by the systems. When there are many writes, LevelDB has to do compaction and level management, which introduces write amplification.

We extend these experiments by adding drives, striped in software RAID-0, to Server-SATA and Server-SAS, as shown in Figures 12, 13, and 14. Only Server-SAS has enough disks for the four- and eight-drive tests. Having conducted simultaneous parallel experiments on all four Kinetic disks to verify that there is no individual performance degradation, we extrapolate the plots for the multiple Kinetic Drive tests by assuming the load is balanced evenly across the independent disks and that the performance will scale linearly until the Kinetic enclosure's 10 Gbps network link becomes a bottleneck. While the LevelDB servers do show throughput increases as we add more disks, their performance does not scale up as much as expected. For example, sequential order writes on Server-SAS stop improving beyond four drives. Because Server-SAS has faster and far more RAM, it shows significantly higher read throughput than the Kinetic Drives or Server-SATA for small value sizes. However, Server-SATA and Server-SAS both exhibit a "memory cliff" trend in read throughputs: after the read throughputs increase continuously, they suddenly drop beyond a certain value size. As the amount of data transferred in each experiment increases with value size, the system memory can no longer hide the slower disk accesses and throughput drops. At these points, the Kinetic Drives usually take over and start giving better throughputs than either of the servers.

Fig. 11. Sequential (Seq) Read (RD) and Write (WR) Throughput Comparison of a Kinetic Drive (KD), Server-SATA, and Server-SAS Using One Drive (1D)

2) Random Order Write and Random Order Read for Single and Multiple Drives: In this section, we compare throughputs when data is written and read back in a random order. Again, we use db_bench.cc, which comes with LevelDB, to perform random reads and writes on a single drive as well as multiple drives. As in the previous subsection, the y-axis is in a log scale so the server RAM results do not overwhelm the graph, and the multiple-drive Kinetic results are a linear extrapolation of the single-drive results in Section IV.

Fig. 12. Sequential (Seq) Read (RD) and Write (WR) Throughput Comparison of Kinetic Drives (KD), Server-SATA, and Server-SAS Using Two Drives (2D)

Fig. 13. Sequential (Seq) Read (RD) and Write (WR) Throughput Comparison of Kinetic Drives (KD) and Server-SAS Using Four Drives (4D)

Fig. 14. Sequential (Seq) Read (RD) and Write (WR) Throughput Comparison of Kinetic Drives (KD) and Server-SAS Using Eight Drives (8D)
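For reference, the extrapolation rule used for the multi-drive Kinetic curves in Figures 12-14 (and 16-18 below) is simply linear scaling of the measured single-drive throughput, capped by the shared enclosure uplink; the 1250 MB/sec figure below is our rough payload estimate for a 10 Gbps link and is an assumption rather than a measured value.

```c
/* Extrapolation assumed for the multi-drive Kinetic plots: n independent
 * drives scale linearly until the enclosure's shared uplink saturates. */
double kinetic_aggregate_mbps(double per_drive_mbps, int n_drives,
                              double uplink_mbps)
{
    double linear = per_drive_mbps * (double)n_drives;
    return (linear < uplink_mbps) ? linear : uplink_mbps;
}

/* Example: 8 drives at 63 MB/sec each give 504 MB/sec, still below a
 * 10 Gbps uplink (~1250 MB/sec), so the extrapolation remains linear. */
```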
Looking at Figures 15, 16, 17, and 18, we notice that the overall throughputs for these random server tests are lower than for sequential order reads and writes. This reduced random performance is due to the underlying spinning-platter hard drive technology being unable to prefetch or batch-write nearby data as it can when it reads or writes sequentially. The servers show particularly poor random write performance. In fact, as the value size increases, the random write throughput decreases to below 1 MB/s. The larger value sizes were so slow that we were forced to abandon those tests. In addition, we realize that the system's memory size is quite critical for the server's performance. As before, however, when the requested data increases beyond what is cached in the server's RAM, the high throughputs due to memory are not maintained and the throughputs for read operations drop.

In this section, we have shown that the Kinetic Drives scale well for datasets with larger value sizes that do not fit in the server's memory. In the next section, we will discuss latency measurements from the different systems.

Fig. 15. Random (RND) Read (RD) and Write (WR) Throughput Comparison of a Kinetic Drive (KD), Server-SATA, and Server-SAS Using One Drive (1D)

Fig. 16. Random (RND) Read (RD) and Write (WR) Throughput Comparison of Kinetic Drives (KD), Server-SATA, and Server-SAS Using Two Drives (2D)

Fig. 17. Random (RND) Read (RD) and Write (WR) Throughput Comparison of Kinetic Drives (KD) and Server-SAS Using Four Drives (4D)

Fig. 18. Random (RND) Read (RD) and Write (WR) Throughput Comparison of Kinetic Drives (KD) and Server-SAS Using Eight Drives (8D)

VI. LATENCY TESTS OF KINETIC DRIVES AND LEVELDB SERVERS

Another important characteristic of the drives is latency. In this section, we ascertain the latency of Kinetic Drives in various scenarios and compare it with that of the LevelDB servers. The configuration of the Kinetic Drives and LevelDB servers remains the same as explained in the earlier sections.

A. Effect of Randomly Written Data on Random Read Latency

Here we describe the latency results obtained on the Kinetic Drives, Server-SATA, and Server-SAS. Three different sets of tests were conducted:

- Effect of 1 GB and 3.7 TB of random order and sequential order writes on random read latency for the Kinetic Drives
- Effect of 1 GB of random order and sequential order writes on random read latency for Server-SATA
- Effect of 1 GB of random order and sequential order writes on random read latency for Server-SAS

1) Read Latency of Kinetic Drives Following Random and Sequential Order Writes: The first set of experiments studies the impact of writing 1 GB or 3.7 TB in random order or sequential order on the read latency of a subsequent random set of 1,000 KV pairs, and Figure 19 presents a histogram of our findings. To obtain the experimental data, we first perform the sequential key order write operation as described in Section IV-B and then read 1 GB of that data by randomly requesting 1,000 key-value pairs (value size of 1 MB and key size of 16 bytes). We record the time it takes to successfully complete each read operation and put these latencies into different histogram bins to determine a count or frequency of accesses in that latency range. In a subsequent experiment, we write the 1 GB of data in random order (as described in Section IV-B2) and then randomly read 1,000 key-value pairs. We place these latencies in bins as well. Before comparing these frequencies with other results, we obtain relative frequencies by dividing the number in each bin by the number of key-value pairs (1,000 in this case), as shown in Figure 19.
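A minimal sketch of the binning just described, assuming 10 ms-wide, right-closed bins up to 100 ms plus an overflow bin; counts are divided by the number of requests (1,000 randomly read KV pairs here) to give the relative frequencies plotted in Figures 19 and 20.

```c
/* Bin read latencies (in ms) into [0,10], (10,20], ..., (90,100], >100 and
 * convert counts to relative frequencies by dividing by the request count. */
#include <math.h>

#define NBINS 11

void bin_latencies(const double *lat_ms, int n, double rel_freq[NBINS])
{
    int counts[NBINS] = { 0 };

    for (int i = 0; i < n; i++) {
        int b;
        if (lat_ms[i] > 100.0) {
            b = NBINS - 1;                        /* overflow bin: > 100 ms */
        } else {
            b = (int)ceil(lat_ms[i] / 10.0) - 1;  /* right-closed bins      */
            if (b < 0) b = 0;                     /* a latency of exactly 0 */
        }
        counts[b]++;
    }
    for (int b = 0; b < NBINS; b++)
        rel_freq[b] = (double)counts[b] / (double)n;
}
```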
In addition, we test the effect of the amount of data written on the latencies by repeating the same experiment with 3.7 TB of data written in random order as well as in sequential order. Referring again to Figure 19, we can observe that the read latencies are higher when the Kinetic Drive is full.

Reads following a drive filled with random order writes show yet higher latency. Another interesting observation is that around 4-5% of the reads have a latency above 100 ms. We can speculate that this is due to some poorly timed LevelDB compaction, but we are not certain why this happens. To rule out network latency, we tested the network ping time from the client to the Kinetic Drives and found that it is usually around 0.3 ms. The ping is never above 1 ms, even when the Kinetic Drive is at 100% CPU load. The latency also depends on the distribution of the searched key-value pairs, so it is possible that a few of the randomly chosen keys are requested in an order that is particularly difficult to process. Overall, we can conclude that the amount of data written and the amount of randomness in that data affect the latencies of subsequent read operations on these Kinetic Drives.

From our results, we observe that a substantial number of read latencies for Server-SAS fall in the (10,20] bin, which implies that Server-SAS tends to be faster for read operations than Server-SATA. These findings reflect the larger memory size of Server-SAS. Therefore, increasing the memory size of a server improves the read performance of the system.

Fig. 19. 1 GB Random Read after Varied Writes on Kinetic Drives

2) Comparison of Latency Tests between Kinetic Drives, Server-SATA, and Server-SAS: In our study of Kinetic Drives, we consider it important to compare the performance of Kinetic Drives with the servers with LevelDB installed. Figure 20 presents our findings for running the LevelDB benchmarking script on Server-SATA and Server-SAS. Overall, we performed four sets of operations on the two servers, and the order of the writes and reads is defined by the script itself:

- 1 GB Random Write followed by 1 GB Random Read on Server-SATA
- 1 GB Sequential Write followed by 1 GB Random Read on Server-SATA
- 1 GB Random Write followed by 1 GB Random Read on Server-SAS
- 1 GB Sequential Write followed by 1 GB Random Read on Server-SAS

For the first experiment, we wrote 1 GB of data in random order (key size of 16 bytes and 1 MB value size) on Server-SATA. This data is written on a single drive of Server-SATA. Afterwards, we read 1,000 KV pairs in random order to record the read latency of each KV pair, and we bin these latencies after computing their relative frequencies (similar to what is explained in Section VI-A). Furthermore, we perform the four above-mentioned experiments and compare them in Figure 20.

Fig. 20. 1 GB Random Read after Varied Writes on Server-SATA and Server-SAS

Moreover, we wanted to better understand the results by studying the trends in the read latencies. Therefore, we obtain the mean of the latencies and use it to find 99% confidence intervals, shown in Figures 21, 22, 23, and 24. We calculated the mean and confidence interval based on the book by Lilja [16]. For calculating the mean, we sampled groups of data points representing the latencies of reading KV pairs, and we employed this mean to calculate the 99% confidence interval. We plotted data points from 1,000 KV pairs. We obtained similar findings for Server-SAS and Server-SATA. The plots show a transition in the latencies in all the graphs; overall, this transition depends on the randomness of the data.

Fig. 21. 99% Confidence Interval for Reading 1,000 KV Pairs in Random Order After Writing 1 GB in Random Order with 1 MB Value Size and 16-Byte Key Size on Server-SATA

Based on our findings, it can be concluded that there are significant differences between the Kinetic Drive's LevelDB and the LevelDB-based servers. Overall, the latencies for Server-SATA and Server-SAS are much higher than the latencies for the Kinetic Drives.
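The 99% confidence intervals in Figures 21-24 follow the standard large-sample formula from Lilja's text, mean +/- z * s / sqrt(n) with z = 2.576 at the 99% level; a minimal sketch of that computation for one group of sampled latencies is shown below.

```c
/* Sample mean and 99% confidence interval for one group of read latencies,
 * using the large-sample normal approximation: mean +/- z * s / sqrt(n),
 * with z = 2.576 for a 99% level.  Requires n >= 2. */
#include <math.h>

void mean_ci99(const double *x, int n, double *mean, double *lo, double *hi)
{
    double sum = 0.0, sumsq = 0.0;

    for (int i = 0; i < n; i++) {
        sum   += x[i];
        sumsq += x[i] * x[i];
    }
    *mean = sum / n;

    double var  = (sumsq - n * (*mean) * (*mean)) / (n - 1);  /* sample variance */
    double half = 2.576 * sqrt(var / n);                      /* z at 0.995       */

    *lo = *mean - half;
    *hi = *mean + half;
}
```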
We plotted data points from, KV pairs. We did similar findings for Server-SAS and Server-SATA. The pattern demonstrates us a transition in the latencies of all the plotted graphs. Overall, these transition depends on the randomness of the data. Mean of Latency for Read Operation KV Pair Number * ^3 mean Confidence Interval Fig % Confidence Interval for Reading, KV Pairs in Random Order After Writing GB in Random Order with MB and Key Size 6 Bytes on Server-SATA Based on our findings it could be concluded that there are significant differences between the Kinetic Drive s LevelDB and the LevelDB based servers. Overall, the latencies for the Server-SATA and Server-SAS are much higher than the latencies for the Kinetic Drives

Fig. 22. 99% Confidence Interval for Reading 1,000 KV Pairs in Random Order After Writing 1 GB in Sequential Order with 1 MB Value Size and 16-Byte Key Size on Server-SATA

Fig. 23. 99% Confidence Interval for Reading 1,000 KV Pairs in Random Order After Writing 1 GB in Random Order with 1 MB Value Size and 16-Byte Key Size on Server-SAS

Fig. 24. 99% Confidence Interval for Reading 1,000 KV Pairs in Random Order After Writing 1 GB in Sequential Order with 1 MB Value Size and 16-Byte Key Size on Server-SAS

3) Latency Tests for Server-SAS with Multiple Drives: In the previous section, we discussed latency tests that were conducted on single drives. In this section, we discuss the latency findings for the LevelDB-based servers with multiple drives. To set up the system for multiple drives we used RAID-0. For these experiments as well, we used the LevelDB-based benchmarking script and wrote 1 GB in random order. The key size was 16 bytes and the value size was 1 MB. Afterwards, 1,000 KV pairs are read back, and the latency is plotted for all the key-value pairs. Figure 25 demonstrates the obtained latencies for 2 drives, 4 drives, and 8 drives on Server-SAS, respectively. We noticed that with multiple drives the overall latencies for read operations decrease.

Fig. 25. 1 GB Random Read after Varied Writes on Server-SAS with Multiple Drives

VII. KEYS NOT FOUND - TRENDS FOR KINETIC DRIVES AND LEVELDB-BASED SERVERS

In this section, we study the time it takes when the searched keys are "not found" on a Kinetic Drive or on a server. This aspect is important when a large number of drives are used to create a system. There is a strong possibility that users search for key-value pairs and do not find them on a particular Kinetic Drive; therefore, the users will have to search other Kinetic Drives. The time it takes to perform this search for key-value pairs on a particular server or Kinetic Drive can affect the entire system's performance.

A. Time for Search Operation When Keys Are Not Found in a Kinetic Drive

1) With 1 GB Data Written: For this test, we performed the write operation in sequential order, as explained in Section IV-B, and wrote 1 GB of data in the form of 1,000 KV pairs, each with a value size of 1 MB. Following that, we requested 1,000 key-value pairs from a Kinetic Drive in sequential order. We ensured that the requested keys do not exist on the Kinetic Drive. We record the time it takes, for each request, for the Kinetic Drive to return the status that the requested key is not found on the drive. We plot this time against the total number of requests made in Figure 26.

To find out whether the drives treat random order search differently from sequential order search, we performed the above-mentioned experiment again. However, this time we requested the 1,000 KV pairs in random order after writing the 1 GB of data in sequential order. We plot the time it takes to serve each request, when keys are not found, against the KV pairs requested, as shown in Figure 27. Based on our experience from the previous latency-based experiments, we attempted to introduce more randomness into the process.
We wrote 1 GB of data in random order and tried to search for keys in random order as well. Our observations are given in Figure 28.
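The "keys not found" measurements in Figures 26-31 come down to timing negative lookups. The sketch below reuses the placeholder kd_get prototype from Section II (an assumption, not the real client call) and simply requests key IDs that lie outside the written range, recording the wall-clock time of each miss.

```c
/*
 * Sketch of the "keys not found" timing loop: request key IDs outside the
 * written range so every get misses, and record each request's latency.
 * kd_get() is the placeholder prototype used earlier; the real library
 * signals "not found" through its own status codes.
 */
#include <stdio.h>
#include <time.h>

typedef struct kd_session kd_session;
int kd_get(kd_session *s, const void *key, size_t key_len,
           void *value, size_t *value_len);

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec / 1e9;
}

/* Keys 0..n_written-1 exist on the drive; ask for IDs at and above
 * n_written so every lookup misses, storing per-request latency (seconds). */
void time_missing_keys(kd_session *s, long n_written, long n_requests,
                       double *latency_sec)
{
    static char value[1024 * 1024];
    char key[16];

    for (long i = 0; i < n_requests; i++) {
        size_t value_len = sizeof value;
        snprintf(key, sizeof key, "k%014ld", n_written + i);

        double t0 = now_sec();
        (void)kd_get(s, key, sizeof key, value, &value_len);  /* expect a miss */
        latency_sec[i] = now_sec() - t0;
    }
}
```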

Fig. 26. Keys Not Found - 1k KV Pairs Searched in Sequential Order with 1 GB of Data Written in Sequential Order

Fig. 27. Keys Not Found - 1k KV Pairs Searched in Random Order with 1 GB of Data Written in Sequential Order

Fig. 28. Keys Not Found - 1k KV Pairs Searched in Random Order with 1 GB of Data Written in Random Order

We found a very interesting pattern in Figures 26, 27, and 28: all the figures depict a very similar and regular pattern. This data led us to conclude that the Kinetic Drives are searching an index and returning the result quickly. This entire index should fit inside the memory of the Kinetic Drives to yield such similar patterns.

2) With 3.7 TB Data Written: In this section we discuss experiments that are very similar to those in the previous subsection. However, for these experiments, we completely filled the drive with 3.7 TB of data. This was done to observe the impact of filling the drive completely. For the first experiment, we wrote 3.7 TB of data in sequential order in the form of 3,700,000 KV pairs with keys of 16 bytes each and a value size of 1 MB. All the generated keys were unique, and "kinetic-c-util" generated the 1 MB values. Figure 29 presents our findings for the time it takes to search 1,000 keys and report them as not found.

Fig. 29. Keys Not Found - 1k KV Pairs Searched in Sequential Order with 3.7 TB of Data Written in Sequential Order

Following this, we performed the same operation; however, we requested the KV pairs in random order, and the observations are presented in Figure 30. None of these KV pairs or keys are located on the Kinetic Drive. We realized that the pattern obtained in all the cases was very similar. Therefore, while it could not be confirmed that searching for unavailable keys is entirely independent of the order in which data is written, the order has at most a minimal effect on the time it takes to search for keys that are not found. From these four experiments, it can be concluded that the time spent fulfilling a request for each key that is not found is extremely small compared to the time for a successful read operation on the Kinetic Drive. Perhaps the index, which records a list of all the KV pairs on the drive, can easily fit inside the memory and thus takes minimal time to search.

Fig. 30. Keys Not Found - 1k KV Pairs Searched in Random Order with 3.7 TB of Data Written in Sequential Order

B. Effect of Storing KV Pairs Before Reading Back

When read and write operations are performed, it is likely that some of the KV pairs remain in the memory of the Kinetic Drives. Therefore, if we read back the data immediately after writing, the Kinetic Drives might give us higher throughput, or our searches might be faster.

Therefore, we wrote 3.7 TB of data to the Kinetic Drive in random order and then kept the drive idle for a week. After waiting for about a week, we attempted to search for 1,000 KV pairs that could not be found on the drive. We wanted to find out the pattern for accessing 1,000 keys that are not stored on the drive. We plot the time it takes to search for each key in Figure 31. The pattern we obtained is quite different from the previously obtained data (shown in Figure 30). This is primarily because all the data must have been flushed to the disk, and the Kinetic Drive had to fetch it back from the persistent drive.

Fig. 31. Keys Not Found - 1,000 KV Pairs Searched in Random Order After Waiting for 7 Days, with 3.7 TB of Data Written in Random Order

C. Impact of the LevelDB-Based Server-SATA's System Memory on Performance When Keys Are Not Found

In this section, we tried to ascertain the performance of a LevelDB-installed Server-SATA. We performed a test similar to the one mentioned above in this section. However, to test the effect of Server-SATA's memory on the search for unavailable keys, we attempted two different sequences of read and write operations, as given below:

1) Random Write 1 GB, Search for Missing Keys (Figure 32)
2) Random Write 1 GB, Random Read 1 GB, Sequential Read 1 GB, Search for Missing Keys (Figure 33)

For the first sequence, we simply performed the write operation by writing 1,000 key-value pairs of 1 MB value each. Thereafter, we immediately performed the search for 1,000 missing keys and obtained Figure 32. For the second sequence, we performed the write operation by writing 1,000 key-value pairs of 1 MB value each using the script that is provided by Google, Inc. for testing servers. After successful completion, we performed a random order read operation of all the stored key-value pairs. We then read all 1 GB of data in sequential order as well. After these operations, we then searched for keys that are not stored on Server-SATA. This was intentionally done to see the impact of the repeated operations on the latencies of searching for missing key-value pairs.

It is interesting to conclude from these two figures that the latencies reduce substantially when multiple operations have been performed (Figure 33). This trend is obtained primarily because, after repeating several operations, the mem_table is brought into the memory of Server-SATA, and subsequently it becomes much faster for the server to search for keys that are missing from the drive. Hence, the latencies to search for missing keys reduce substantially after repeating several operations.

Fig. 32. Time for Searching Keys Which Are Not Found on the LevelDB-Installed Server-SATA with Write and Read Operations

Fig. 33. Time for Searching Keys Which Are Not Found on the LevelDB-Installed Server-SATA

D. Keys Not Found on the LevelDB-Installed Server-SAS with Multiple Drives

In this section, we discuss the performance implications when the requested keys are not available on the LevelDB-based Server-SAS. We conducted these tests on multiple drives; as an illustration, we tested the Server-SAS system with 1 drive, 2 drives, and 4 drives. We ran the script that comes with LevelDB (db_bench). This script is helpful in benchmarking the system it runs on.
1) 1-Drive Tests: For the first experiment, we used db_bench to write 1,000 KV pairs with a 1 MB value size and a 16-byte key size. We used only 1 drive for this, and afterwards we requested keys that are not stored on the drives of the system. The findings of this experiment are presented in Figure 34. We performed a similar test as mentioned above; however, this time we wrote 1 GB of data in sequential order using db_bench. The findings are presented in Figure 35. It could


VoltDB vs. Redis Benchmark Volt vs. Redis Benchmark Motivation and Goals of this Evaluation Compare the performance of several distributed databases that can be used for state storage in some of our applications Low latency is expected

More information

GearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction

GearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction GearDB: A GC-free Key-Value Store on HM-SMR Drives with Gear Compaction Ting Yao 1,2, Jiguang Wan 1, Ping Huang 2, Yiwen Zhang 1, Zhiwen Liu 1 Changsheng Xie 1, and Xubin He 2 1 Huazhong University of

More information

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput

The Impact of SSD Selection on SQL Server Performance. Solution Brief. Understanding the differences in NVMe and SATA SSD throughput Solution Brief The Impact of SSD Selection on SQL Server Performance Understanding the differences in NVMe and SATA SSD throughput 2018, Cloud Evolutions Data gathered by Cloud Evolutions. All product

More information

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation

SAS Enterprise Miner Performance on IBM System p 570. Jan, Hsian-Fen Tsao Brian Porter Harry Seifert. IBM Corporation SAS Enterprise Miner Performance on IBM System p 570 Jan, 2008 Hsian-Fen Tsao Brian Porter Harry Seifert IBM Corporation Copyright IBM Corporation, 2008. All Rights Reserved. TABLE OF CONTENTS ABSTRACT...3

More information

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic

Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic WHITE PAPER Fusion iomemory PCIe Solutions from SanDisk and Sqrll make Accumulo Hypersonic Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents Executive

More information

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell Storage PS Series Arrays

Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell Storage PS Series Arrays Best Practices for Deploying a Mixed 1Gb/10Gb Ethernet SAN using Dell Storage PS Series Arrays Dell EMC Engineering December 2016 A Dell Best Practices Guide Revisions Date March 2011 Description Initial

More information

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO

Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Cold Storage: The Road to Enterprise Ilya Kuznetsov YADRO Agenda Technical challenge Custom product Growth of aspirations Enterprise requirements Making an enterprise cold storage product 2 Technical Challenge

More information

CSE 124: Networked Services Lecture-17

CSE 124: Networked Services Lecture-17 Fall 2010 CSE 124: Networked Services Lecture-17 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/30/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT

IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT IBM B2B INTEGRATOR BENCHMARKING IN THE SOFTLAYER ENVIRONMENT 215-4-14 Authors: Deep Chatterji (dchatter@us.ibm.com) Steve McDuff (mcduffs@ca.ibm.com) CONTENTS Disclaimer...3 Pushing the limits of B2B Integrator...4

More information

Maximizing VMware ESX Performance Through Defragmentation of Guest Systems

Maximizing VMware ESX Performance Through Defragmentation of Guest Systems Maximizing VMware ESX Performance Through Defragmentation of Guest Systems This paper details the results of testing performed to determine if there was any measurable performance benefit to be derived

More information

arxiv: v1 [cs.db] 25 Nov 2018

arxiv: v1 [cs.db] 25 Nov 2018 Enabling Efficient Updates in KV Storage via Hashing: Design and Performance Evaluation Yongkun Li, Helen H. W. Chan, Patrick P. C. Lee, and Yinlong Xu University of Science and Technology of China The

More information

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018

Cloud Computing and Hadoop Distributed File System. UCSB CS170, Spring 2018 Cloud Computing and Hadoop Distributed File System UCSB CS70, Spring 08 Cluster Computing Motivations Large-scale data processing on clusters Scan 000 TB on node @ 00 MB/s = days Scan on 000-node cluster

More information

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching

Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Cascade Mapping: Optimizing Memory Efficiency for Flash-based Key-value Caching Kefei Wang and Feng Chen Louisiana State University SoCC '18 Carlsbad, CA Key-value Systems in Internet Services Key-value

More information

OPERATING SYSTEM. PREPARED BY : DHAVAL R. PATEL Page 1. Q.1 Explain Memory

OPERATING SYSTEM. PREPARED BY : DHAVAL R. PATEL Page 1. Q.1 Explain Memory Q.1 Explain Memory Data Storage in storage device like CD, HDD, DVD, Pen drive etc, is called memory. The device which storage data is called storage device. E.g. hard disk, floppy etc. There are two types

More information

Block Storage Service: Status and Performance

Block Storage Service: Status and Performance Block Storage Service: Status and Performance Dan van der Ster, IT-DSS, 6 June 2014 Summary This memo summarizes the current status of the Ceph block storage service as it is used for OpenStack Cinder

More information

MOHA: Many-Task Computing Framework on Hadoop

MOHA: Many-Task Computing Framework on Hadoop Apache: Big Data North America 2017 @ Miami MOHA: Many-Task Computing Framework on Hadoop Soonwook Hwang Korea Institute of Science and Technology Information May 18, 2017 Table of Contents Introduction

More information

CSE 124: Networked Services Lecture-16

CSE 124: Networked Services Lecture-16 Fall 2010 CSE 124: Networked Services Lecture-16 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa10/cse124 11/23/2010 CSE 124 Networked Services Fall 2010 1 Updates PlanetLab experiments

More information

White Paper. File System Throughput Performance on RedHawk Linux

White Paper. File System Throughput Performance on RedHawk Linux White Paper File System Throughput Performance on RedHawk Linux By: Nikhil Nanal Concurrent Computer Corporation August Introduction This paper reports the throughput performance of the,, and file systems

More information

Low Latency Evaluation of Fibre Channel, iscsi and SAS Host Interfaces

Low Latency Evaluation of Fibre Channel, iscsi and SAS Host Interfaces Low Latency Evaluation of Fibre Channel, iscsi and SAS Host Interfaces Evaluation report prepared under contract with LSI Corporation Introduction IT professionals see Solid State Disk (SSD) products as

More information

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval

New Oracle NoSQL Database APIs that Speed Insertion and Retrieval New Oracle NoSQL Database APIs that Speed Insertion and Retrieval O R A C L E W H I T E P A P E R F E B R U A R Y 2 0 1 6 1 NEW ORACLE NoSQL DATABASE APIs that SPEED INSERTION AND RETRIEVAL Introduction

More information

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日

Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日 Benchmarking Cloud Serving Systems with YCSB 詹剑锋 2012 年 6 月 27 日 Motivation There are many cloud DB and nosql systems out there PNUTS BigTable HBase, Hypertable, HTable Megastore Azure Cassandra Amazon

More information

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage

TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage TPC-E testing of Microsoft SQL Server 2016 on Dell EMC PowerEdge R830 Server and Dell EMC SC9000 Storage Performance Study of Microsoft SQL Server 2016 Dell Engineering February 2017 Table of contents

More information

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades

Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation Report: Improving SQL Server Database Performance with Dot Hill AssuredSAN 4824 Flash Upgrades Evaluation report prepared under contract with Dot Hill August 2015 Executive Summary Solid state

More information

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE

RAID SEMINAR REPORT /09/2004 Asha.P.M NO: 612 S7 ECE RAID SEMINAR REPORT 2004 Submitted on: Submitted by: 24/09/2004 Asha.P.M NO: 612 S7 ECE CONTENTS 1. Introduction 1 2. The array and RAID controller concept 2 2.1. Mirroring 3 2.2. Parity 5 2.3. Error correcting

More information

Analysis of high capacity storage systems for e-vlbi

Analysis of high capacity storage systems for e-vlbi Analysis of high capacity storage systems for e-vlbi Matteo Stagni - Francesco Bedosti - Mauro Nanni May 21, 212 IRA 458/12 Abstract The objective of the analysis is to verify if the storage systems now

More information

Performance of Multicore LUP Decomposition

Performance of Multicore LUP Decomposition Performance of Multicore LUP Decomposition Nathan Beckmann Silas Boyd-Wickizer May 3, 00 ABSTRACT This paper evaluates the performance of four parallel LUP decomposition implementations. The implementations

More information

SoftNAS Cloud Performance Evaluation on AWS

SoftNAS Cloud Performance Evaluation on AWS SoftNAS Cloud Performance Evaluation on AWS October 25, 2016 Contents SoftNAS Cloud Overview... 3 Introduction... 3 Executive Summary... 4 Key Findings for AWS:... 5 Test Methodology... 6 Performance Summary

More information

SAS Technical Update Connectivity Roadmap and MultiLink SAS Initiative Jay Neer Molex Corporation Marty Czekalski Seagate Technology LLC

SAS Technical Update Connectivity Roadmap and MultiLink SAS Initiative Jay Neer Molex Corporation Marty Czekalski Seagate Technology LLC SAS Technical Update Connectivity Roadmap and MultiLink SAS Initiative Jay Neer Molex Corporation Marty Czekalski Seagate Technology LLC SAS Connectivity Roadmap Background Connectivity Objectives Converged

More information

CS Project Report

CS Project Report CS7960 - Project Report Kshitij Sudan kshitij@cs.utah.edu 1 Introduction With the growth in services provided over the Internet, the amount of data processing required has grown tremendously. To satisfy

More information

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X

W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b l e d N e x t - G e n e r a t i o n V N X Global Headquarters: 5 Speen Street Framingham, MA 01701 USA P.508.872.8200 F.508.935.4015 www.idc.com W H I T E P A P E R U n l o c k i n g t h e P o w e r o f F l a s h w i t h t h e M C x - E n a b

More information

JMR ELECTRONICS INC. WHITE PAPER

JMR ELECTRONICS INC. WHITE PAPER THE NEED FOR SPEED: USING PCI EXPRESS ATTACHED STORAGE FOREWORD The highest performance, expandable, directly attached storage can be achieved at low cost by moving the server or work station s PCI bus

More information

CSE 451: Operating Systems Spring Module 12 Secondary Storage. Steve Gribble

CSE 451: Operating Systems Spring Module 12 Secondary Storage. Steve Gribble CSE 451: Operating Systems Spring 2009 Module 12 Secondary Storage Steve Gribble Secondary storage Secondary storage typically: is anything that is outside of primary memory does not permit direct execution

More information

HTRC Data API Performance Study

HTRC Data API Performance Study HTRC Data API Performance Study Yiming Sun, Beth Plale, Jiaan Zeng Amazon Indiana University Bloomington {plale, jiaazeng}@cs.indiana.edu Abstract HathiTrust Research Center (HTRC) allows users to access

More information

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson

Strata: A Cross Media File System. Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson A Cross Media File System Youngjin Kwon, Henrique Fingler, Tyler Hunt, Simon Peter, Emmett Witchel, Thomas Anderson 1 Let s build a fast server NoSQL store, Database, File server, Mail server Requirements

More information

Clustering and Reclustering HEP Data in Object Databases

Clustering and Reclustering HEP Data in Object Databases Clustering and Reclustering HEP Data in Object Databases Koen Holtman CERN EP division CH - Geneva 3, Switzerland We formulate principles for the clustering of data, applicable to both sequential HEP applications

More information

CSE 124: Networked Services Fall 2009 Lecture-19

CSE 124: Networked Services Fall 2009 Lecture-19 CSE 124: Networked Services Fall 2009 Lecture-19 Instructor: B. S. Manoj, Ph.D http://cseweb.ucsd.edu/classes/fa09/cse124 Some of these slides are adapted from various sources/individuals including but

More information

NPTEL Course Jan K. Gopinath Indian Institute of Science

NPTEL Course Jan K. Gopinath Indian Institute of Science Storage Systems NPTEL Course Jan 2012 (Lecture 39) K. Gopinath Indian Institute of Science Google File System Non-Posix scalable distr file system for large distr dataintensive applications performance,

More information

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays

Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays Microsoft SQL Server 2012 Fast Track Reference Configuration Using PowerEdge R720 and EqualLogic PS6110XV Arrays This whitepaper describes Dell Microsoft SQL Server Fast Track reference architecture configurations

More information

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage

Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage Microsoft SQL Server in a VMware Environment on Dell PowerEdge R810 Servers and Dell EqualLogic Storage A Dell Technical White Paper Dell Database Engineering Solutions Anthony Fernandez April 2010 THIS

More information

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE

BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BIG DATA AND HADOOP ON THE ZFS STORAGE APPLIANCE BRETT WENINGER, MANAGING DIRECTOR 10/21/2014 ADURANT APPROACH TO BIG DATA Align to Un/Semi-structured Data Instead of Big Scale out will become Big Greatest

More information

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage

CSCI-GA Database Systems Lecture 8: Physical Schema: Storage CSCI-GA.2433-001 Database Systems Lecture 8: Physical Schema: Storage Mohamed Zahran (aka Z) mzahran@cs.nyu.edu http://www.mzahran.com View 1 View 2 View 3 Conceptual Schema Physical Schema 1. Create a

More information

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card

Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card Accelerate Database Performance and Reduce Response Times in MongoDB Humongous Environments with the LSI Nytro MegaRAID Flash Accelerator Card The Rise of MongoDB Summary One of today s growing database

More information

Database Solutions Engineering. Best Practices for Deploying SSDs in an Oracle OLTP Environment using Dell TM EqualLogic TM PS Series

Database Solutions Engineering. Best Practices for Deploying SSDs in an Oracle OLTP Environment using Dell TM EqualLogic TM PS Series Best Practices for Deploying SSDs in an Oracle OLTP Environment using Dell TM EqualLogic TM PS Series A Dell Technical White Paper Database Solutions Engineering Dell Product Group April 2009 THIS WHITE

More information

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University

HyperDex. A Distributed, Searchable Key-Value Store. Robert Escriva. Department of Computer Science Cornell University HyperDex A Distributed, Searchable Key-Value Store Robert Escriva Bernard Wong Emin Gün Sirer Department of Computer Science Cornell University School of Computer Science University of Waterloo ACM SIGCOMM

More information

NVMe SSDs Future-proof Apache Cassandra

NVMe SSDs Future-proof Apache Cassandra NVMe SSDs Future-proof Apache Cassandra Get More Insight from Datasets Too Large to Fit into Memory Overview When we scale a database either locally or in the cloud performance 1 is imperative. Without

More information

Correlation based File Prefetching Approach for Hadoop

Correlation based File Prefetching Approach for Hadoop IEEE 2nd International Conference on Cloud Computing Technology and Science Correlation based File Prefetching Approach for Hadoop Bo Dong 1, Xiao Zhong 2, Qinghua Zheng 1, Lirong Jian 2, Jian Liu 1, Jie

More information

Storage. Hwansoo Han

Storage. Hwansoo Han Storage Hwansoo Han I/O Devices I/O devices can be characterized by Behavior: input, out, storage Partner: human or machine Data rate: bytes/sec, transfers/sec I/O bus connections 2 I/O System Characteristics

More information

RACKSPACE ONMETAL I/O V2 OUTPERFORMS AMAZON EC2 BY UP TO 2X IN BENCHMARK TESTING

RACKSPACE ONMETAL I/O V2 OUTPERFORMS AMAZON EC2 BY UP TO 2X IN BENCHMARK TESTING RACKSPACE ONMETAL I/O V2 OUTPERFORMS AMAZON EC2 BY UP TO 2X IN BENCHMARK TESTING EXECUTIVE SUMMARY Today, businesses are increasingly turning to cloud services for rapid deployment of apps and services.

More information

Seagate Enterprise SATA SSD with DuraWrite Technology Competitive Evaluation

Seagate Enterprise SATA SSD with DuraWrite Technology Competitive Evaluation August 2018 Seagate Enterprise SATA SSD with DuraWrite Technology Competitive Seagate Enterprise SATA SSDs with DuraWrite Technology have the best performance for compressible Database, Cloud, VDI Software

More information

Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date:

Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date: Characterizing Storage Resources Performance in Accessing the SDSS Dataset Ioan Raicu Date: 8-17-5 Table of Contents Table of Contents...1 Table of Figures...1 1 Overview...4 2 Experiment Description...4

More information

Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure

Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure Upgrade to Microsoft SQL Server 2016 with Dell EMC Infrastructure Generational Comparison Study of Microsoft SQL Server Dell Engineering February 2017 Revisions Date Description February 2017 Version 1.0

More information

CMSC 424 Database design Lecture 12 Storage. Mihai Pop

CMSC 424 Database design Lecture 12 Storage. Mihai Pop CMSC 424 Database design Lecture 12 Storage Mihai Pop Administrative Office hours tomorrow @ 10 Midterms are in solutions for part C will be posted later this week Project partners I have an odd number

More information

Quiz for Chapter 6 Storage and Other I/O Topics 3.10

Quiz for Chapter 6 Storage and Other I/O Topics 3.10 Date: 3.10 Not all questions are of equal difficulty. Please review the entire quiz first and then budget your time carefully. Name: Course: 1. [6 points] Give a concise answer to each of the following

More information

Lenovo RAID Introduction Reference Information

Lenovo RAID Introduction Reference Information Lenovo RAID Introduction Reference Information Using a Redundant Array of Independent Disks (RAID) to store data remains one of the most common and cost-efficient methods to increase server's storage performance,

More information

CSE 451: Operating Systems Spring Module 12 Secondary Storage

CSE 451: Operating Systems Spring Module 12 Secondary Storage CSE 451: Operating Systems Spring 2017 Module 12 Secondary Storage John Zahorjan 1 Secondary storage Secondary storage typically: is anything that is outside of primary memory does not permit direct execution

More information

IX: A Protected Dataplane Operating System for High Throughput and Low Latency

IX: A Protected Dataplane Operating System for High Throughput and Low Latency IX: A Protected Dataplane Operating System for High Throughput and Low Latency Belay, A. et al. Proc. of the 11th USENIX Symp. on OSDI, pp. 49-65, 2014. Reviewed by Chun-Yu and Xinghao Li Summary In this

More information

Next-Generation Cloud Platform

Next-Generation Cloud Platform Next-Generation Cloud Platform Jangwoo Kim Jun 24, 2013 E-mail: jangwoo@postech.ac.kr High Performance Computing Lab Department of Computer Science & Engineering Pohang University of Science and Technology

More information

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads

Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads WHITE PAPER Accelerating Big Data: Using SanDisk SSDs for Apache HBase Workloads December 2014 Western Digital Technologies, Inc. 951 SanDisk Drive, Milpitas, CA 95035 www.sandisk.com Table of Contents

More information

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen

Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Accelerating Parallel Analysis of Scientific Simulation Data via Zazen Tiankai Tu, Charles A. Rendleman, Patrick J. Miller, Federico Sacerdoti, Ron O. Dror, and David E. Shaw D. E. Shaw Research Motivation

More information

BENEFITS AND BEST PRACTICES FOR DEPLOYING SSDS IN AN OLTP ENVIRONMENT USING DELL EQUALLOGIC PS SERIES

BENEFITS AND BEST PRACTICES FOR DEPLOYING SSDS IN AN OLTP ENVIRONMENT USING DELL EQUALLOGIC PS SERIES WHITE PAPER BENEFITS AND BEST PRACTICES FOR DEPLOYING SSDS IN AN OLTP ENVIRONMENT USING DELL EQUALLOGIC PS SERIES Using Solid State Disks (SSDs) in enterprise storage arrays is one of today s hottest storage

More information

High Performance Solid State Storage Under Linux

High Performance Solid State Storage Under Linux High Performance Solid State Storage Under Linux Eric Seppanen, Matthew T. O Keefe, David J. Lilja Electrical and Computer Engineering University of Minnesota April 20, 2010 Motivation SSDs breaking through

More information

Extremely Fast Distributed Storage for Cloud Service Providers

Extremely Fast Distributed Storage for Cloud Service Providers Solution brief Intel Storage Builders StorPool Storage Intel SSD DC S3510 Series Intel Xeon Processor E3 and E5 Families Intel Ethernet Converged Network Adapter X710 Family Extremely Fast Distributed

More information

Comparing Software versus Hardware RAID Performance

Comparing Software versus Hardware RAID Performance White Paper VERITAS Storage Foundation for Windows Comparing Software versus Hardware RAID Performance Copyright 2002 VERITAS Software Corporation. All rights reserved. VERITAS, VERITAS Software, the VERITAS

More information

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo

Google File System. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google fall DIP Heerak lim, Donghun Koo Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung Google 2017 fall DIP Heerak lim, Donghun Koo 1 Agenda Introduction Design overview Systems interactions Master operation Fault tolerance

More information

10 Steps to Virtualization

10 Steps to Virtualization AN INTEL COMPANY 10 Steps to Virtualization WHEN IT MATTERS, IT RUNS ON WIND RIVER EXECUTIVE SUMMARY Virtualization the creation of multiple virtual machines (VMs) on a single piece of hardware, where

More information

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments

Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Sun Lustre Storage System Simplifying and Accelerating Lustre Deployments Torben Kling-Petersen, PhD Presenter s Name Principle Field Title andengineer Division HPC &Cloud LoB SunComputing Microsystems

More information

S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems

S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems S4D-Cache: Smart Selective SSD Cache for Parallel I/O Systems Shuibing He, Xian-He Sun, Bo Feng Department of Computer Science Illinois Institute of Technology Speed Gap Between CPU and Hard Drive http://www.velobit.com/storage-performance-blog/bid/114532/living-with-the-2012-hdd-shortage

More information

WHITE PAPER AGILOFT SCALABILITY AND REDUNDANCY

WHITE PAPER AGILOFT SCALABILITY AND REDUNDANCY WHITE PAPER AGILOFT SCALABILITY AND REDUNDANCY Table of Contents Introduction 3 Performance on Hosted Server 3 Figure 1: Real World Performance 3 Benchmarks 3 System configuration used for benchmarks 3

More information

White paper ETERNUS Extreme Cache Performance and Use

White paper ETERNUS Extreme Cache Performance and Use White paper ETERNUS Extreme Cache Performance and Use The Extreme Cache feature provides the ETERNUS DX500 S3 and DX600 S3 Storage Arrays with an effective flash based performance accelerator for regions

More information

Computer Organization and Structure. Bing-Yu Chen National Taiwan University

Computer Organization and Structure. Bing-Yu Chen National Taiwan University Computer Organization and Structure Bing-Yu Chen National Taiwan University Storage and Other I/O Topics I/O Performance Measures Types and Characteristics of I/O Devices Buses Interfacing I/O Devices

More information

This project must be done in groups of 2 3 people. Your group must partner with one other group (or two if we have add odd number of groups).

This project must be done in groups of 2 3 people. Your group must partner with one other group (or two if we have add odd number of groups). 1/21/2015 CS 739 Distributed Systems Fall 2014 PmWIki / Project1 PmWIki / Project1 The goal for this project is to implement a small distributed system, to get experience in cooperatively developing a

More information

CA485 Ray Walshe Google File System

CA485 Ray Walshe Google File System Google File System Overview Google File System is scalable, distributed file system on inexpensive commodity hardware that provides: Fault Tolerance File system runs on hundreds or thousands of storage

More information

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide

IBM V7000 Unified R1.4.2 Asynchronous Replication Performance Reference Guide V7 Unified Asynchronous Replication Performance Reference Guide IBM V7 Unified R1.4.2 Asynchronous Replication Performance Reference Guide Document Version 1. SONAS / V7 Unified Asynchronous Replication

More information

MODERN FILESYSTEM PERFORMANCE IN LOCAL MULTI-DISK STORAGE SPACE CONFIGURATION

MODERN FILESYSTEM PERFORMANCE IN LOCAL MULTI-DISK STORAGE SPACE CONFIGURATION INFORMATION SYSTEMS IN MANAGEMENT Information Systems in Management (2014) Vol. 3 (4) 273 283 MODERN FILESYSTEM PERFORMANCE IN LOCAL MULTI-DISK STORAGE SPACE CONFIGURATION MATEUSZ SMOLIŃSKI Institute of

More information

Using Transparent Compression to Improve SSD-based I/O Caches

Using Transparent Compression to Improve SSD-based I/O Caches Using Transparent Compression to Improve SSD-based I/O Caches Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas {mcatos,klonatos,maraz,flouris,bilas}@ics.forth.gr

More information

VERITAS Foundation Suite TM 2.0 for Linux PERFORMANCE COMPARISON BRIEF - FOUNDATION SUITE, EXT3, AND REISERFS WHITE PAPER

VERITAS Foundation Suite TM 2.0 for Linux PERFORMANCE COMPARISON BRIEF - FOUNDATION SUITE, EXT3, AND REISERFS WHITE PAPER WHITE PAPER VERITAS Foundation Suite TM 2.0 for Linux PERFORMANCE COMPARISON BRIEF - FOUNDATION SUITE, EXT3, AND REISERFS Linux Kernel 2.4.9-e3 enterprise VERSION 2 1 Executive Summary The SPECsfs3.0 NFS

More information

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( )

CIS 601 Graduate Seminar. Dr. Sunnie S. Chung Dhruv Patel ( ) Kalpesh Sharma ( ) Guide: CIS 601 Graduate Seminar Presented By: Dr. Sunnie S. Chung Dhruv Patel (2652790) Kalpesh Sharma (2660576) Introduction Background Parallel Data Warehouse (PDW) Hive MongoDB Client-side Shared SQL

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

Recovering Disk Storage Metrics from low level Trace events

Recovering Disk Storage Metrics from low level Trace events Recovering Disk Storage Metrics from low level Trace events Progress Report Meeting May 05, 2016 Houssem Daoud Michel Dagenais École Polytechnique de Montréal Laboratoire DORSAL Agenda Introduction and

More information