Analysis of HBase Read/Write


Arvind Dwarakanath, School of Informatics and Computing, Indiana University
Vaibhav Nachankar, School of Informatics and Computing, Indiana University

ABSTRACT
As scientific computing problems become more data intensive, many technologies and systems have been developed to store and serve terabytes or even petabytes of data efficiently. One example is HBase, an open-source implementation of Google's BigTable. HBase supports reliable storage of, and efficient access to, billions of rows of semi-structured data. Another is Apache Cassandra, initially developed at Facebook, which can be described as a BigTable data model running on an Amazon Dynamo-like infrastructure. Cassandra is designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. The main objective of the project is to evaluate the performance of HBase and compare it with Cassandra. We first studied existing HBase benchmarking work (such as YCSB) and then drew on those strategies to design our own evaluation of HBase and Cassandra.

General Terms
Performance, Design, Timeline.

Keywords
HBase, Cassandra, HPC cluster, wordcount, Brisk.

1. INTRODUCTION
NoSQL systems differ from standard relational databases. Relational databases struggle with data-intensive applications and with indexing large numbers of files or documents, and many NoSQL systems have been developed to meet these requirements. Many of the more popular NoSQL databases are distributed, which means that data is stored redundantly on many servers. Storage is typically organized as a distributed hash table: a hash function (commonly SHA-1) maps each key to a position in a keyspace, and the data is stored on the node responsible for that part of the keyspace. A keyspace partitioning scheme splits ownership of the keyspace among the participating nodes. An overlay network then connects the nodes, allowing them to find the owner of any given key in the keyspace. Two very popular NoSQL databases built around the concept of a keyspace are HBase and Apache Cassandra, the topics of our project. In addition to using the databases directly, we also used the MapReduce framework to see its effect on performance.

2. TECHNOLOGY SURVEY
2.1 HBase
HBase is an open-source, distributed, column-oriented sorted-map data store modeled after Google's BigTable. It runs on top of HDFS, providing BigTable-like capabilities for Hadoop. It is useful when fault-tolerant, random, real-time read/write access to data stored in HDFS is required.

Figure 1. An example of the BigTable data model [].

Figure 2 shows the HBase architecture. Tables are split horizontally into regions, and the HBase master assigns regions to region servers. Regions are further divided vertically into stores by column family, and stores are saved as store files in HDFS. Data replication in HDFS ensures high availability of HBase table data. At runtime, ZooKeeper coordinates the activities of the master and region servers and stores a small amount of system metadata.

Data Model of HBase
To put it simply, HBase can be reduced to a Map<byte[], Map<byte[], Map<byte[], Map<Long, byte[]>>>>. The first Map maps row keys to their column families. The second maps column families to their column keys. The third maps column keys to their timestamps. The last maps timestamps to a single value. The keys are typically strings, the timestamp is a long, and the value is an uninterpreted array of bytes. The column key is always preceded by its family and is written family:key. Since a family maps to another map, a single column family can in theory contain an unlimited number of column keys. So, to retrieve a single value, the user does a get using three keys:

row key + column key + timestamp -> value
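The nested-map view above can be made concrete with a small sketch. This is only an in-memory illustration using Python dictionaries, not the HBase client API; the helper names put and get and the sample row, family and column names are assumptions chosen for the example.

```python
from typing import Dict, Optional

# row key -> column family -> column key -> timestamp -> value
Versions = Dict[int, bytes]
Columns = Dict[bytes, Versions]
Families = Dict[bytes, Columns]
Table = Dict[bytes, Families]


def put(table: Table, row: bytes, family: bytes, column: bytes,
        timestamp: int, value: bytes) -> None:
    """Store one versioned value, creating the intermediate maps on demand."""
    table.setdefault(row, {}).setdefault(family, {}).setdefault(column, {})[timestamp] = value


def get(table: Table, row: bytes, family: bytes, column: bytes,
        timestamp: Optional[int] = None) -> Optional[bytes]:
    """Look up a value; without a timestamp, return the latest version."""
    versions = table.get(row, {}).get(family, {}).get(column, {})
    if not versions:
        return None
    if timestamp is None:
        timestamp = max(versions)  # the largest timestamp is the newest version
    return versions.get(timestamp)


# Hypothetical sample data: two versions of the same cell.
table: Table = {}
put(table, b"row1", b"info", b"name", 1, b"old")
put(table, b"row1", b"info", b"name", 2, b"new")
latest = get(table, b"row1", b"info", b"name")
first = get(table, b"row1", b"info", b"name", timestamp=1)
```

Each level of nesting corresponds to one of the four maps in the text, and a lookup that supplies all three keys reaches exactly one value.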

Rows: The row key is treated by HBase as an array of bytes but it must have a string representation. A special property of the row key Map is that it keeps them in a lexicographical order.
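A short sketch can show what byte-wise lexicographic ordering does to numeric row keys, and how a sorted key space supports reading a contiguous range. Python's bytes comparison stands in for HBase's key comparison here, and the scan helper is a hypothetical in-memory stand-in, not the HBase scanner API.

```python
import bisect
from typing import List, Optional


def scan(sorted_keys: List[bytes], start: Optional[bytes] = None,
         stop: Optional[bytes] = None) -> List[bytes]:
    """Return keys in [start, stop): the start key is included, the stop key is not."""
    lo = 0 if start is None else bisect.bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


# Plain decimal strings sort as text, not as numbers.
plain = sorted(str(n).encode() for n in range(1, 101))

# Left-padding with zeros restores the natural numeric order.
padded = sorted(str(n).zfill(3).encode() for n in range(1, 101))

first_plain = plain[:4]
first_padded = padded[:3]
window = scan(padded, b"010", b"013")
```

Neither the start nor the stop key has to exist in the table: the lookup only needs the position where each key would fall in the sorted order.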

For example, numbers going from 1 to 100 will be ordered like this: 1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,...,9,91,92,93,94,95,96,97,98,99. To keep the integers' natural ordering, the row keys have to be left-padded with zeros. To take advantage of this ordering, the row key Map offers a scanner which takes a start row key (if not specified, the first one in the table) and a stop row key (if not specified, the last one in the table). For example, if the row keys are dates in the format YYYYMMDD, getting the month of July 2008 is a matter of opening a scanner from a start row key at the beginning of July 2008 to a stop row key for the first of August 2008. It does not matter whether the specified row keys exist; the only thing to keep in mind is that the stop row key itself is not returned, which is why the first of August is given to the scanner.

Column Families: A column family groups data of the same nature in HBase and puts no constraint on its type. The families are part of the table schema and are the same for every row; what differs from row to row is the set of column keys, which can be very sparse. For example, row " " may have in its "info:" family the following column keys:

info:aaa
info:bbb
info:ccc

while row " " only has:

info:12342

Developers have to be very careful when using column keys, since a key with a length of zero is permitted, which means that in the previous example data can be inserted under column key "info:". We strongly suggest using an empty column key only when no other keys will be specified. Also, since the data in a family has the same nature, many attributes regarding performance and timestamps can be specified per family.

Timestamps: Values in HBase may have multiple versions, kept according to the family configuration. By default, HBase sets the timestamp of each new value to the current time in milliseconds and returns the latest version when a cell is retrieved. The developer can also supply a timestamp when inserting data and request a specific timestamp when fetching it.

Family Attributes: The following attributes can be specified for each family:
- Compression:
  - Record: each value found at rowkey+columnkey+timestamp is compressed independently.
  - Block: blocks in HDFS are compressed. A block may contain multiple records if they are shorter than one HDFS block, or only part of a record if the record is longer than an HDFS block.
- Timestamps: Max number: the maximum number of different versions kept for a value.
- Time to live: versions older than the specified time are garbage collected.
- Block cache: caches blocks fetched from HDFS in an LRU-style queue. It improves random read performance and is a useful feature while waiting for full in-memory storage.

Figure 2. HBase architecture.

2.2 Cassandra
Cassandra is an open-source distributed database management system. It is an Apache Software Foundation top-level project designed to handle very large amounts of data spread across many commodity servers while providing a highly available service with no single point of failure. It is a NoSQL system that was initially developed by Facebook, where it powers the Inbox Search feature. A standalone Twitter-like application called Twissandra has also been created as a demonstration. Fundamentally, Cassandra is a column-oriented distributed database. Data is stored in the form of columns and is uniquely identified using a 'keyspace'.
It can be classified as a 'Cloud DB' similar to HBase. For instance, usrs['adwaraka'] refers to the entry with identifier adwaraka in a column family of users called usrs. Within usrs, we can further add usrs[adwaraka][fname], usrs[adwaraka][lname] and usrs[adwaraka][gender]. The data model of Cassandra is as follows.

Column and Column Family: As mentioned before, the data model is columnar in nature. The column is the base of the Cassandra data model and the smallest increment of data. It is a tuple (triplet) that contains a name, a value and a timestamp. Here are the columns for usrs[adwaraka], represented in JSON notation:

{
  fname: "Arvind",
  lname: "Dwarakanath",
  gender: "Male"
}

A column family resembles a table in an RDBMS. Column families contain rows and columns. Each row is uniquely identified by a row key. Each row has multiple columns, each of which has a name, a value and a timestamp. Unlike a table in an RDBMS, different rows in the same column family do not have to share the same set of columns, and a column may be added to one or multiple rows at any time. It can be useful to distinguish between static column families, which contain values such as user data or other object data, and dynamic column families, which contain data such as precalculated query results.

Keyspaces: Keyspaces group column families together. Typically, there is one keyspace for each application that uses a Cassandra cluster. The most important settings defined at the keyspace level are the replication factor and the replica placement strategy. Thus, if you have sets of data with different requirements for these settings (such as different levels of fault tolerance), these sets of data should reside in different keyspaces. A keyspace has to be selected before any client API such as Thrift is used. On the Cassandra CLI, use 'use <keyspace name>' to select the required keyspace. For example:

use Keyspace1;

Super Columns: Super columns are a way to group multiple columns. Every super column must have a different name, just as with regular columns, but different super columns may hold sub-columns with the same name. In effect, super columns add an extra map layer to the data model. They are frequently used to hold a single record in which each field is represented by a sub-column. For example, the name of a super column might be the ID of a transaction, and each sub-column could hold some attribute of the transaction.
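The column family and super column layouts described above can be sketched with plain dictionaries. This is an in-memory illustration only, not the Thrift or CLI interface; the helper names insert and insert_super, the second row key vnachank, and the events data are hypothetical and exist only for this example.

```python
import time
from typing import Dict, NamedTuple, Optional


class Column(NamedTuple):
    """The smallest unit of data: a (name, value, timestamp) triplet."""
    name: str
    value: str
    timestamp: int


Row = Dict[str, Column]                          # column name -> column
ColumnFamily = Dict[str, Row]                    # row key -> row
SuperColumnFamily = Dict[str, Dict[str, Row]]    # row key -> super column -> sub-columns


def _stamp(timestamp: Optional[int]) -> int:
    """Use the caller's timestamp, or the current time in milliseconds."""
    return int(time.time() * 1000) if timestamp is None else timestamp


def insert(cf: ColumnFamily, row_key: str, name: str, value: str,
           timestamp: Optional[int] = None) -> None:
    cf.setdefault(row_key, {})[name] = Column(name, value, _stamp(timestamp))


def insert_super(scf: SuperColumnFamily, row_key: str, super_name: str,
                 name: str, value: str, timestamp: Optional[int] = None) -> None:
    # One extra map layer compared with a regular column family.
    scf.setdefault(row_key, {}).setdefault(super_name, {})[name] = Column(
        name, value, _stamp(timestamp))


# Rows in the same column family need not share the same set of columns.
usrs: ColumnFamily = {}
insert(usrs, "adwaraka", "fname", "Arvind", 1)
insert(usrs, "adwaraka", "lname", "Dwarakanath", 1)
insert(usrs, "vnachank", "fname", "Vaibhav", 1)  # hypothetical row key

# Two super columns in one row may reuse the same sub-column names.
events: SuperColumnFamily = {}
insert_super(events, "row-1", "group-a", "status", "open", 1)
insert_super(events, "row-1", "group-b", "status", "closed", 1)
```

The only structural difference between the two families is the additional dictionary level keyed by the super column name.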
For example, if a transactions row like the one described had two entries, it might look like:

{
  trans-a: { date: 01/02/2010, amount: 5000, timespace: <value1> },
  trans-b: { date: 01/03/2010, amount: 4500, timespace: <value2> }
}

Major Client Libraries for Cassandra: Thrift
Thrift is a software framework for scalable cross-language service development. In this context, Thrift is the name of the RPC client used to communicate with the Cassandra server. It statically generates an interface for serialization in a variety of languages, including C++, Java, Python, PHP, Perl and C#, to name a few. It is this mechanism that allows you to interact with Cassandra from any of these client languages. Other clients in use include Hector (Java), Pycassa (Python), phpcassa (PHP) and Cassandra (Ruby). The libraries are available on GitHub.

3. ARCHITECTURE DESIGN
As mentioned in the abstract, the main objective of the project is to evaluate the performance of HBase and compare it with Cassandra. To do so, we studied a set of benchmark techniques that had already been implemented for HBase, mainly the work of Kareem Dana and that of D. Carstoiu. Dana's paper used sort and interspersed read/write operations as benchmarks. Carstoiu's paper compared Hadoop/HBase with its previous versions and with BigTable. Another useful benchmarking tool that we obtained and studied was YCSB, the Yahoo! Cloud Serving Benchmark. We describe it in some detail below, because it gave us one of the ideas used in our study: build a benchmark suite of codes and then analyze their performance. YCSB is centered on analyzing NoSQL databases, primarily HBase, Cassandra, Voldemort and so on. Its code is hosted on GitHub, is frequently updated, and is used and forked by many users to benchmark NoSQL databases.
YCSB in Detail
The goal of the YCSB project is to develop a framework and a common set of workloads for evaluating the performance of different key-value and cloud serving stores. The project comprises two things:
1) The YCSB Client, an extensible workload generator
2) The Core workloads, a set of workload scenarios to be executed by the generator

Although the core workloads provide a well-rounded picture of a system's performance, the Client is extensible so that you can define new and different workloads to examine system aspects, or application scenarios, not adequately covered by the core workloads. Similarly, the Client is extensible to support benchmarking different databases. Although YCSB includes sample code for benchmarking HBase, Cassandra, Infinispan and MongoDB, it is straightforward to write a new interface layer to benchmark a database and test it using a workload file. A common use of the tool is to benchmark multiple systems and compare them. For example, you can install multiple systems on the same hardware configuration and run the same workloads against each system. Then you can plot the performance of each system (for example, as latency versus throughput curves) to see when one system does better than another.

Example of a Workload File:

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
# Application example: Session store recording recent actions
# Read/update ratio: 50/50
# Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
# Request distribution: zipfian
recordcount=1000
operationcount=1000
workload=com.yahoo.ycsb.workloads.CoreWorkload
readallfields=true
readproportion=0.5
updateproportion=0.5
scanproportion=0
insertproportion=0

To emulate the YCSB benchmark, we needed a benchmarking program, and we selected the Word Count algorithm. Word Count is a staple of introductory Hadoop courses and plays the role that Hello World plays for a programming language. In general, Word Count parses the input files and returns two values for each word: the word itself and the number of times it occurs. We store the Word Count output in columns in the cloud DB and use it to compare performance, looking at both read and write operations. We used Hadoop and HBase to evaluate the performance of HBase. For Cassandra, we decided to use Brisk, which is available for download from DataStax.

4. IMPLEMENTATION DETAILS, VALIDATION AND PERFORMANCE
Timeline
The main challenge of the project was installing the software. Installation documentation is scarce, so we had a difficult time installing the databases. Brisk has even less documentation, and its most pressing problem is that it needs root access to get up and running. The analysis therefore focused more on HBase. The timeline was as follows:
- Get familiar with HBase and do a broad study of HBase and Cassandra benchmarking techniques (1 week).
- Complete the setup for HBase and Cassandra (2 weeks).
- Do a read/write analysis (3 weeks).

The codes we used were simple single read, write and scan operations. The more complex example was word count. The code counts all the occurrences of each word in a text and stores the word and the number of times it appears in the HBase database. The correctness of the result was checked by opening the HBase shell with the command $hbase shell and inspecting the contents of the wordcount table.
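The word count write and read path described above can be sketched as follows. This is a single-process illustration with a dictionary standing in for the wordcount table; it is not the MapReduce job or the HBase client code used in the project, and the column name count:total and the tokenizing rule are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict

# word (row key) -> column key -> value; a stand-in for the wordcount table
WordTable = Dict[str, Dict[str, int]]

COUNT_COLUMN = "count:total"  # hypothetical family:key name


def count_words(text: str) -> Counter:
    """Split the text into lowercase words and count each one."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def write_counts(table: WordTable, counts: Counter) -> None:
    """Write phase: one row per word, with the count stored in a single column."""
    for word, n in counts.items():
        table.setdefault(word, {})[COUNT_COLUMN] = n


def read_count(table: WordTable, word: str) -> int:
    """Read phase: fetch the stored count for one word (0 if the row is absent)."""
    return table.get(word, {}).get(COUNT_COLUMN, 0)


table: WordTable = {}
write_counts(table, count_words("The cat and the hat"))
the_count = read_count(table, "the")
```

Timing the write_counts and read_count phases separately is the kind of read/write comparison the benchmark is after, although the real measurements depend on the cluster and cannot be reproduced by this sketch.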
5. PERFORMANCE OF WORD COUNT EXAMPLE
Performance was tested on 2 cores / 2 nodes and then on 2 cores / 3 nodes. We also varied the number of mappers and reducers to check their effect on performance. We concluded that writes were faster than reads for HBase. Additionally, we produced performance graphs for single reads, writes and scans.

6. PERFORMANCE GRAPHS
2 nodes / 2 cores readings
[Figure: 1 mapper / 3 reducers. Time in secs versus number of readings.]
[Figure: 2 mappers / 3 reducers. Time in secs versus number of readings.]
[Figure: 2 mappers / 3 reducers. Time in secs versus number of readings.]
[Figure: Single Read/Write/Scan (3 values): get, scan, put.]

2 cores and 3 nodes
[Figure: 1 mapper / 3 reducers. Time in sec.]
[Figure: 2 mappers / 2 reducers. Time in sec versus number of readings.]
[Figure: 2 mappers / 3 reducers.]

[Figure: Single Read/Write/Scan (3 values): get, scan, put.]

8. FUTURE WORK
We would like to see the same codes run on Cassandra, especially on Brisk. Brisk seemed preferable in our study because Hadoop does not have to be installed separately. Newer benchmarks can also be added to the existing codes for further analysis. The code is available at git@github.com:adwaraka/hbase-hadoop.git.

9. ACKNOWLEDGEMENTS
We would like to thank Professor Judy Qiu and AI Stephen for their help with the project. Special thanks also go to the class for their discussions and other help in understanding the project.

REFERENCES
[1] D. Carstoiu, A. Cernian, A. Olteanu. Hadoop Hbase Performance Evaluation. University of Bucharest.
[2] Kareem Dana. Hadoop Hbase Performance Evaluation. Duke University. It shows a varied set of test cases for testing HBase.
[3] Avinash Lakshman, Prashant Malik. Cassandra Structured Storage System over a P2P Network.
[4] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears. Benchmarking Cloud Serving Systems with YCSB.
