Scalable Storage: The drive for web-scale data management

Bryan Rosander
University of Central Florida
bryanrosander@gmail.com
March 28, 2012

Abstract

Data-intensive applications have become prevalent in today's information economy. The sheer amount of data stored and used by today's web services presents unique challenges in the areas of scalability, security, and availability. At the same time, it has opened new possibilities in data mining, allowing for more tightly integrated, informative services. Traditional, monolithic relational databases are inherently limited in terms of scalability, which has caused many leading companies to abandon them in favor of horizontally scalable data stores. This paper evaluates the state of the art in data storage and retrieval, covering the history of the database and moving on to newer database technologies such as Google's Bigtable, Apache Cassandra, and Amazon's DynamoDB.

1 Introduction

Data storage and retrieval has become a central part of many popular web applications. As the amount of available data increases, database capacity must scale up to meet it. Traditional methods of scaling database capacity focus mainly on increasing the computing power of the single server on which the database resides (vertical scaling). This strategy has been sufficient for many applications, but it has become infeasible for those that need to store more data than one machine can efficiently process.

Newer database paradigms that emphasize horizontal scalability, the ability to add as many nodes as necessary and redistribute the data among all active nodes, have been growing in popularity. This scalability comes at a cost: many features that developers take for granted are not feasible on a horizontally scalable platform. For example, Google's Bigtable does not support many traditional query operations (e.g., joins), so substantial changes must be made to an application that migrates from a traditional relational database.

Another problem is standardization. Because SQL is a standard, most relational databases behave in more or less the same way. Newer technologies that eschew SQL have been categorized as NoSQL (sometimes expanded to "Not Only SQL"), and there is no standard for what they support. This puts a lot of pressure on developers to make the right decision, because switching from one NoSQL platform to another requires much more effort than switching from one SQL database to another. This paper aims to provide enough information to make an informed decision about which technology is right for a given application, and to explain the tradeoffs each technology makes between scalability and ease of use.

2 History of DBMS

The data base management system (DBMS) specifications were published in the CODASYL Data Base Task Group's 1971 report [40]. The first DBMSs relied on tree-structured files and network models of data. These systems required applications to depend on the underlying structures, resulting in fragile applications that depended on artifacts of how data was stored rather than on what that data was. This data dependence manifested itself in applications' reliance on the existence of indexes (which were specified by name in application code) and on the order in which collections were persisted to disk.

The desire for data independence gave rise to the idea of relational databases. The goal of the relational database was to increase the proportion of data representation characteristics that could be changed without logically impairing some application programs. Relational databases made data normalization feasible. Normalization is the decomposition of all nonsimple domains into multiple simple domains, which has several advantages, including deduplication of data, easier consistency checking, and aggregation [28].

Before the relational database, procedural data manipulation languages were used to retrieve data, which meant that the user had to navigate the data structures manually in order to retrieve the desired data. Relational databases opened up the possibility of declarative data manipulation languages.
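The contrast between navigating data procedurally and querying it declaratively can be sketched with Python's built-in sqlite3 module (our choice of engine for illustration; the customer and order data below is hypothetical). Customer names are stored once, in normalized form, and orders refer to customers by key. The procedural version walks the records and builds its own lookup structure by hand, while the declarative version states the desired result in SQL and leaves the access path to the DBMS.

```python
import sqlite3

# Hypothetical normalized data: each customer is stored once, and orders
# refer to customers by key instead of repeating their details.
CUSTOMERS = [(1, "Ada"), (2, "Grace")]                   # (id, name)
ORDERS = [(10, 1, 25.0), (11, 2, 30.0), (12, 1, 15.0)]   # (id, customer_id, total)


def procedural_totals():
    """Navigate the records by hand: build an index, then walk every order."""
    name_by_id = dict(CUSTOMERS)  # the program builds its own access path
    totals = {}
    for _order_id, customer_id, total in ORDERS:
        name = name_by_id[customer_id]
        totals[name] = totals.get(name, 0.0) + total
    return totals


def declarative_totals():
    """Describe the result in SQL and let the DBMS decide how to join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customers(id), total REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?)", CUSTOMERS)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(
        "SELECT c.name, SUM(o.total) FROM orders AS o "
        "JOIN customers AS c ON o.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    )
    result = dict(rows)
    conn.close()
    return result


print(procedural_totals())   # {'Ada': 40.0, 'Grace': 30.0}
print(declarative_totals())  # {'Ada': 40.0, 'Grace': 30.0}
```

Both functions compute the same answer; only the declarative one leaves the choice of join strategy and indexes to the database.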
Declarative languages allow the user to specify the results they are interested in and rely on the DBMS to translate the declarative query into a procedure for retrieving the data. The development of SQL, which is based on relational calculus, led to a de facto standardization of the database industry [37].

Modern relational databases provide many features that facilitate processing data while maintaining consistency. These guarantees are summarized as atomicity, consistency, isolation, and durability (ACID) [33]. ACID properties make it very easy to develop applications that will not leave the database in an inconsistent state. Unfortunately, enforcing these properties incurs considerable overhead, limits concurrent operations by definition, and is not conducive to scaling horizontally.

Scaling horizontally has become a necessity for handling the amounts of data that many of today's Web 2.0 companies need to process. Scaling vertically is more expensive than adding more nodes and is fundamentally limited by the current state of the art in processors, memory, storage capacity, and network capacity. This has led companies to increasingly abandon ACID and SQL in favor of more scalable technologies, collectively grouped under the NoSQL flag. These NoSQL technologies all differ, but most emphasize BASE (basically available, soft state, eventually consistent) [38, 41], which is much more conducive to performance but sacrifices much of the precision of ACID.

3 Traditional Databases

3.1 Microsoft SQL Server

Microsoft SQL Server was originally developed in coordination with Sybase, Inc., under the understanding that Microsoft would have exclusive rights to the DataServer product for OS/2 and all other Microsoft-developed operating systems [30]. Version 1.0 shipped in 1989 and version 1.1 in 1990. In 1994, after Microsoft shipped SQL Server 4.2 for Windows NT, Microsoft and Sybase ended joint development, and Microsoft SQL Server became a wholly Microsoft product [30].

There are three main editions of Microsoft SQL Server 2012. The Standard, Business Intelligence, and Enterprise editions all offer the same basic functionality, but the more advanced editions offer more database management tools. The Enterprise edition also includes features such as multi-site and geo-clustering [21].

Microsoft also provides a cloud-based version called SQL Azure, which offers traditional SQL database access as a service billed monthly. Microsoft has implemented a way to scale these databases horizontally using what it calls Federations [23]. Using federations adds to the complexity of application development, because non-federated tables cannot have foreign key relationships with a federated table, and column values cannot be guaranteed to be unique across federations [35].

3.2 Oracle 11g

Oracle is an established enterprise DBMS vendor, with product licenses ranging from $47,500 per processor down to a free entry-level edition [13]. Oracle's scalability packages revolve around clusters that are configured manually [11]. Oracle's relational database is geared toward more traditional data sets. To handle Big Data, Oracle has released its own NoSQL Database, which purports to scale horizontally while still supporting ACID transactions [12], along with its own toolchain for processing Big Data [9].
While Oracle is an established name with a solid reputation for performant, scalable products, its pricing for scalable solutions is prohibitive for non-enterprise applications [13].

3.3 PostgreSQL

PostgreSQL was originally designed as a successor to the INGRES DBMS. It was intended to support complex objects and user extensibility of types, operators, and access methods, along with many other improvements, while making minimal changes to the relational model [39]. It is a free and open source (FOSS) database that is fully ACID compliant and fully supports foreign keys, joins, views, triggers, and stored procedures (in multiple languages) [1].
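To make the value of foreign keys and atomic transactions concrete, the sketch below uses Python's built-in sqlite3 module as a stand-in, since talking to PostgreSQL would require a running server and a third-party driver. SQLite is our substitution, not a system the sources discuss, and it only enforces foreign keys after `PRAGMA foreign_keys = ON`. The accounts/transfers schema is hypothetical: a transfer debits one account, credits another, and logs the transfer, and if any step fails the whole transaction is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute(
    "CREATE TABLE transfers (id INTEGER PRIMARY KEY, "
    "account_id INTEGER NOT NULL REFERENCES accounts(id), amount INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Debit src, credit dst, and log the transfer as one atomic unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Logging against a missing account violates the foreign key.
            conn.execute("INSERT INTO transfers (account_id, amount) VALUES (?, ?)", (dst, amount))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)    # both accounts exist
bad = transfer(conn, 1, 99, 30)  # account 99 does not exist
balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(ok, bad, balances)  # True False {1: 70, 2: 80}
```

The failed transfer had already debited account 1 before the foreign key check fired; the rollback undoes that debit, so no partial update is left behind.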

3.4 MySQL

MySQL is the traditional database component of the LAMP (Linux, Apache, MySQL, and Perl, Python, or PHP) open source web application stack [31]. MySQL was acquired by Oracle as part of its acquisition of Sun Microsystems in 2010 [10]. Since the acquisition, Oracle has been expanding the commercially licensed side of MySQL that it inherited from Sun, which threatens to alienate its installed user base [32], many of whom were unhappy about the acquisition in the first place [43].

MySQL lets users choose the storage engine for each table [8], which allows them to optimize individual tables. One particular optimization supported by InnoDB, MySQL's default storage engine, is group commit: several commits are combined into a single write to the log file, increasing write throughput [7].

4 Google's Bigtable

Google developed Bigtable, a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers [25]. Google uses it internally for many of its services, such as its web index, Google Earth, and Google Analytics, and makes it available to App Engine users via the Datastore API [3]. Because Google published the design [25], there have been open source implementations, most notably Apache HBase. HBase is built on Hadoop's distributed file system (HDFS) rather than the Google File System [4]. Many large web companies have started using HBase, including Facebook, which uses it to power its Messages infrastructure [5, 36].

Bigtable does not support the traditional relational data model. It provides a simpler model and treats data as uninterpreted strings [25]. Bigtable is essentially a sparse map distributed across all nodes, with a composite key made up of a row identifier, a column identifier, and a timestamp that maps to a string value. Rows are ordered lexicographically by row key.
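The data model just described can be sketched as a sorted map from (row, column) to a set of timestamped versions. The class below is a single-process toy under our own assumptions: the class and method names are hypothetical, nothing is distributed, and values are bytes. It keeps keys sorted with Python's bisect module, which is enough to show how a row-range scan returns lexicographically adjacent rows together. The row key com.cnn.www and the family:qualifier column names follow examples in the Bigtable paper [25].

```python
import bisect
from typing import Dict, Iterator, List, Optional, Tuple

CellKey = Tuple[str, str]  # (row key, "family:qualifier")


class ToySortedTable:
    """Single-node toy of a Bigtable-style map: (row, column, timestamp) -> bytes."""

    def __init__(self) -> None:
        self._keys: List[CellKey] = []                     # kept in lexicographic order
        self._cells: Dict[CellKey, Dict[int, bytes]] = {}  # timestamp -> value

    def put(self, row: str, column: str, ts: int, value: bytes) -> None:
        key = (row, column)
        if key not in self._cells:                         # sparse: only written cells exist
            bisect.insort(self._keys, key)
            self._cells[key] = {}
        self._cells[key][ts] = value                       # older versions are retained

    def get(self, row: str, column: str, ts: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version at or before ts (the newest overall if ts is None)."""
        versions = self._cells.get((row, column), {})
        eligible = [t for t in versions if ts is None or t <= ts]
        return versions[max(eligible)] if eligible else None

    def scan(self, start_row: str, end_row: str) -> Iterator[Tuple[str, str, Optional[bytes]]]:
        """Yield the newest value of every cell whose row is in [start_row, end_row)."""
        i = bisect.bisect_left(self._keys, (start_row, ""))
        while i < len(self._keys) and self._keys[i][0] < end_row:
            row, column = self._keys[i]
            yield row, column, self.get(row, column)
            i += 1


table = ToySortedTable()
table.put("com.cnn.www", "contents:", 1, b"<html>v1")
table.put("com.cnn.www", "contents:", 2, b"<html>v2")
table.put("com.cnn.www", "anchor:cnnsi.com", 1, b"CNN")
table.put("com.example.www", "contents:", 1, b"<html>")
table.put("org.apache.hbase", "contents:", 1, b"<html>")

print(table.get("com.cnn.www", "contents:"))        # b'<html>v2'
print(table.get("com.cnn.www", "contents:", ts=1))  # b'<html>v1'
# "/" sorts immediately after ".", so this range covers every row starting with "com."
print([row for row, _, _ in table.scan("com.", "com/")])
# ['com.cnn.www', 'com.cnn.www', 'com.example.www']
```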
Because of this ordering, applications can give similar row keys to data that is likely to be accessed sequentially in order to take advantage of locality; for example, the Bigtable paper stores maps.google.com/index.html under the row key com.google.maps/index.html so that pages from the same domain occupy contiguous rows [25]. Bigtable groups columns into column families, whose data should be of the same type because each family is compressed together [25]. A table should have a small number of distinct column families, but there is no limit on the number of columns a table can contain.

The lack of joins requires significant rethinking of application design for those accustomed to relational databases. It leads to denormalization and to storing dependent objects (or the keys needed to locate them) on their parent objects. The payoff is massive scalability that is transparent to the application. Bigtable does support atomic transactions, but only within a single row key [25].

5 Amazon DynamoDB

Dynamo came about because of Amazon's need for reliability across its massive dataset [29]. Amazon's services run on top of tens of thousands of servers, and with this many machines, hardware failure is a constant reality. To deal with this, Amazon developed Dynamo, its initial proprietary NoSQL solution.
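The Dynamo paper [29] addresses this by partitioning keys with consistent hashing: nodes and key hashes share one ring, and each key is stored on the first N distinct nodes found walking clockwise from its hash, so when one machine fails its keys are still held by its neighbors. The sketch below illustrates only that placement idea, under our own assumptions (MD5 for ring positions as in the paper, eight virtual nodes per server, hypothetical host names); Dynamo's quorums, hinted handoff, and object versioning are omitted.

```python
import bisect
import hashlib
from typing import List, Tuple


def ring_position(label: str) -> int:
    """Map a string to a point on the ring (MD5, as the Dynamo paper uses for keys)."""
    return int.from_bytes(hashlib.md5(label.encode("utf-8")).digest(), "big")


class HashRing:
    """Consistent-hash ring with virtual nodes; a simplified, hypothetical sketch."""

    def __init__(self, nodes: List[str], vnodes_per_node: int = 8) -> None:
        # Each physical node owns several ring positions to spread load more evenly.
        self._ring: List[Tuple[int, str]] = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key: str, n: int) -> List[str]:
        """Walk clockwise from the key's position and collect n distinct physical nodes."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        owners: List[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


servers = ["node-a", "node-b", "node-c", "node-d", "node-e"]  # hypothetical hosts
ring = HashRing(servers)
replicas = ring.preference_list("cart:12345", n=3)
print(replicas)  # three distinct servers; which ones depends on the hashes

# Removing one replica's server only affects keys it held: the remaining
# replicas keep the key, and the next server clockwise takes the freed slot.
smaller = HashRing([s for s in servers if s != replicas[0]])
print(smaller.preference_list("cart:12345", n=3))
```

Because other servers' ring positions do not move, removing a node changes only the preference lists that contained it, which is what lets such a system absorb constant hardware failure without reshuffling all of its data.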

Amazon's business model revolves around a service-oriented architecture [22]. While Dynamo offered a fast and reliable NoSQL option, Amazon's internal teams were slow to adopt it because they were hesitant to manage their own databases [42]. As a result, Amazon now offers a managed version, DynamoDB, as a service. This combines benefits such as transparent scaling and high availability with the ease of development that comes from a managed service. The main drawbacks of DynamoDB are that it does not support complex relational queries (e.g., joins) or complex transactions [14], and that it is completely proprietary, subjecting users to vendor lock-in.

6 Apache Cassandra

Cassandra was initially developed by Facebook to power inbox search, and was later open sourced and made a top-level Apache project [6]. Cassandra aims to combine the best features of Amazon's Dynamo and Google's Bigtable [2], incorporating Dynamo's eventual consistency [24] with Bigtable's column family data model. Netflix has migrated from Oracle to Cassandra on Amazon Web Services [26], in large part because Cassandra's throughput scales linearly as nodes are added [27]. As of October 2011, the largest Cassandra production cluster ran on more than 300 servers and held more than 300 TB of data [34]. Cassandra supports several consistency levels, which clients choose per read or write, but it does not include specific support for transactions.

7 CouchDB

CouchDB is a document database, meaning that it stores objects made up of named fields. It exposes a RESTful (Representational State Transfer) JSON (JavaScript Object Notation) API, which lets users work with it from any language capable of making HTTP requests [15]. The JSON format of the API also facilitates use from JavaScript, opening up interesting use cases such as letting the user's web browser request needed information directly from the database rather than through another server.
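Because the interface is plain HTTP carrying JSON, a client needs nothing beyond a standard HTTP library. The sketch below uses Python's urllib to build, but not send, requests that create and then fetch a document (PUT and GET on /{database}/{document id}). The server address (CouchDB's default port 5984 on localhost), the database name, and the document ID are assumptions for illustration.

```python
import json
import urllib.request

COUCH_URL = "http://localhost:5984"  # assumed local server on CouchDB's default port
DATABASE = "articles"                # hypothetical database name


def put_document(doc_id: str, doc: dict) -> urllib.request.Request:
    """Build a PUT /{db}/{doc_id} request whose body is the document as JSON."""
    return urllib.request.Request(
        f"{COUCH_URL}/{DATABASE}/{doc_id}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def get_document(doc_id: str) -> urllib.request.Request:
    """Build a GET /{db}/{doc_id} request that fetches the stored document."""
    return urllib.request.Request(f"{COUCH_URL}/{DATABASE}/{doc_id}", method="GET")


create = put_document("scalable-storage", {"title": "Scalable Storage", "year": 2012})
fetch = get_document("scalable-storage")
print(create.get_method(), create.full_url)  # PUT http://localhost:5984/articles/scalable-storage
print(fetch.get_method(), fetch.full_url)    # GET http://localhost:5984/articles/scalable-storage
# Against a running server, urllib.request.urlopen(create) would send the request,
# and the JSON response body could be decoded with json.load().
```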
CouchDB supports ACID properties on single-document updates and uses Multi-Version Concurrency Control (MVCC), a concurrency model in which each client sees a consistent view of the database for the duration of a read operation. CouchDB's data on disk is always in a consistent state, so there is no shutdown procedure; the process can simply be terminated at any time [18]. CouchDB has advanced support for bi-directional replication, allowing users and servers to access and update the same shared data while disconnected and then replicate those changes in both directions later [18]. This helps with some scaling and distribution problems, but horizontal scaling is not supported [17], so scaling to the data volumes the other systems are designed for requires more effort and additional products built on top of CouchDB.
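CouchDB applies the same multi-version idea to writes: each document carries a revision identifier (_rev), and an update must name the revision it was based on, so a client working from a stale copy receives a conflict instead of silently overwriting a newer version. The toy store below sketches that optimistic check; it is our simplification, not CouchDB's implementation. Revisions are plain integer counters here (CouchDB's revision strings combine a counter with a hash), the class and exception names are hypothetical, and the revision history used for replication is omitted.

```python
import copy
from typing import Dict, Tuple


class RevisionConflict(Exception):
    """Raised when an update is based on a revision that is no longer current."""


class ToyDocumentStore:
    def __init__(self) -> None:
        self._docs: Dict[str, Tuple[int, dict]] = {}  # doc id -> (revision, body)

    def get(self, doc_id: str) -> Tuple[int, dict]:
        """Return the current revision and a private copy (a stable snapshot for the reader)."""
        rev, body = self._docs[doc_id]
        return rev, copy.deepcopy(body)

    def put(self, doc_id: str, body: dict, base_rev: int = 0) -> int:
        """Store a new version only if base_rev matches the current revision (0 = create)."""
        current_rev = self._docs.get(doc_id, (0, {}))[0]
        if base_rev != current_rev:
            raise RevisionConflict(f"{doc_id}: based on rev {base_rev}, current is {current_rev}")
        new_rev = current_rev + 1
        self._docs[doc_id] = (new_rev, copy.deepcopy(body))
        return new_rev


store = ToyDocumentStore()
store.put("user:1", {"name": "Ada", "visits": 0})

# Two clients read the same revision ...
rev_a, doc_a = store.get("user:1")
rev_b, doc_b = store.get("user:1")

# ... client A writes first and succeeds, creating revision 2.
doc_a["visits"] += 1
store.put("user:1", doc_a, base_rev=rev_a)

# Client B's update is based on the stale revision 1, so it is rejected.
doc_b["visits"] += 1
try:
    store.put("user:1", doc_b, base_rev=rev_b)
except RevisionConflict as err:
    print("conflict:", err)  # conflict: user:1: based on rev 1, current is 2
```

Client B would then re-read the document, reapply its change, and retry with the new revision.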

8 MongoDB

MongoDB is a horizontally scalable document database [17]. It supports dynamic queries across both indexed and unindexed data, as well as atomic operations on individual documents [16]. It uses BSON (Binary JSON) to support mapping to modern object-oriented languages without a complicated ORM (object-relational mapping) layer [19]. The goal of MongoDB is to bridge the gap between key-value stores (which are fast and scalable) and relational databases (which have rich functionality) [19].

MongoDB stores data in JSON-like documents with dynamic schemas, providing flexibility during development [20]. This allows users to change application functionality without explicitly modifying their database schema. They can then use performance metrics to optimize operations when needed, for example by adding indexes. While MongoDB is more scalable than CouchDB, it relies on language-specific drivers, which give a performance boost but limit flexibility [17]. It also does not provide CouchDB's more advanced bi-directional replication features.

9 Conclusion

Relational databases still provide the best solution for a number of use cases. Their integrity constraints are absent from all of the surveyed NoSQL alternatives. Their SQL interface is largely the same across implementations and is well known to developers. ACID transactions are very useful during development and greatly simplify error handling. The two main disadvantages of relational databases are that they do not scale horizontally and that they do not handle unstructured data well.

Bigtable (and its open source counterpart HBase), DynamoDB, and Cassandra all fit the key-value model. They are essentially distributed maps between a row key and its corresponding column values. This allows for great performance with relatively simple data models, but it forces developers to handle relationships between objects in application code. Depending on the number of join-like operations, this can degrade performance considerably.
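The cost of moving joins into application code can be seen in a small sketch. Below, a plain Python dict stands in for a distributed key-value store; the key naming scheme and the lookup counter are our own illustration, not any particular product's API. Rendering a feed with each post's author takes one lookup per referenced key, where a relational database would answer with a single join query.

```python
from typing import Dict, List


class CountingStore:
    """A dict standing in for a distributed key-value store; counts lookups."""

    def __init__(self, data: Dict[str, dict]) -> None:
        self._data = data
        self.lookups = 0

    def get(self, key: str) -> dict:
        self.lookups += 1  # in a real cluster, each get may be a network round trip
        return self._data[key]


store = CountingStore({
    "feed:42": {"posts": ["post:1", "post:2", "post:3"]},
    "post:1": {"author": "user:7", "text": "Hello"},
    "post:2": {"author": "user:8", "text": "Hi"},
    "post:3": {"author": "user:7", "text": "Again"},
    "user:7": {"name": "Ada"},
    "user:8": {"name": "Grace"},
})


def feed_with_authors(store: CountingStore, feed_key: str) -> List[str]:
    """Application-side 'join': follow each stored key by hand."""
    lines = []
    for post_key in store.get(feed_key)["posts"]:
        post = store.get(post_key)
        author = store.get(post["author"])  # one extra lookup per post
        lines.append(f"{author['name']}: {post['text']}")
    return lines


print(feed_with_authors(store, "feed:42"))  # ['Ada: Hello', 'Grace: Hi', 'Ada: Again']
print(store.lookups)                        # 7 (1 feed + 3 posts + 3 authors)
```

Storing a copy of the author's name inside each post (denormalization) would cut this to four lookups, at the cost of duplicated data that must be updated everywhere when a name changes.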
Document stores such as CouchDB and MongoDB allow developers to work with more complex, unstructured data. CouchDB is ideal for use cases in which horizontal scaling is not needed. Its bi-directional replication is most useful for failover, redundancy, and disconnected updates, and it also allows multiple synchronized databases to handle requests. MongoDB is more focused on massive scalability but sacrifices the ease of use of a RESTful interface as well as CouchDB's advanced replication capability. MongoDB also supports querying without pre-created views or indexes, facilitating development.

References

[1] About PostgreSQL. http://www.postgresql.org/about/, Retrieved 2012-03-25.
[2] Cassandra wiki. http://wiki.apache.org/cassandra/, Retrieved 2012-03-25.
[3] Datastore overview. http://code.google.com/appengine/docs/datastore/overview.html, Retrieved 2012-03-25.
[4] HBase: Bigtable-like structured storage for Hadoop HDFS. http://wiki.apache.org/hadoop/hbase, Retrieved 2012-03-25.
[5] HBase/PoweredBy. http://wiki.apache.org/hadoop/hbase/poweredby, Retrieved 2012-03-25.
[6] Introduction to Apache Cassandra. http://www.datastax.com/docs/0.8/introduction/index, Retrieved 2012-03-25.
[7] MySQL :: InnoDB 1.1 for MySQL 5.5 user's guide :: 7 InnoDB performance and scalability enhancements. http://dev.mysql.com/doc/innodb/1.1/en/innodb-performance.html, Retrieved 2012-03-25.
[8] MySQL :: MySQL 5.1 reference manual :: Chapter 13. Storage engines. http://dev.mysql.com/doc/refman/5.1/en/storage-engines.html, Retrieved 2012-03-25.
[9] Oracle and big data. http://www.oracle.com/us/technologies/big-data/index.html, Retrieved 2012-03-25.
[10] Oracle and Sun. http://www.oracle.com/us/sun/index.htm, Retrieved 2012-03-25.
[11] Oracle Database 11g editions. http://www.oracle.com/us/products/database/product-editions-066501.html?sssourcesiteid=otnen, Retrieved 2012-03-25.
[12] Oracle NoSQL Database. http://www.oracle.com/us/products/database/nosql/overview/index.html, Retrieved 2012-03-25.
[13] Oracle price list. http://www.oracle.com/us/corporate/pricing/price-lists/index.html, Retrieved 2012-03-25.
[14] Amazon DynamoDB (beta). http://aws.amazon.com/dynamodb/, Retrieved 2012-03-27.
[15] Apache CouchDB: Introduction. http://couchdb.apache.org/docs/intro.html, Retrieved 2012-03-28.
[16] Atomic operations - MongoDB. http://www.mongodb.org/display/docs/atomic+operations, Retrieved 2012-03-28.
[17] Comparing Mongo DB and Couch DB. http://www.mongodb.org/display/docs/comparing+mongo+db+and+couch+db, Retrieved 2012-03-28.
[18] Technical overview - CouchDB wiki. http://wiki.apache.org/couchdb/technical, Retrieved 2012-03-28.
[19] What is MongoDB? http://www.10gen.com/what-is-mongodb, Retrieved 2012-03-28.
[20] Why MongoDB? http://www.10gen.com/why-mongodb, Retrieved 2012-03-28.
[21] SQL Server 2012 editions. SQL Server homepage. http://www.microsoft.com/sqlserver/en/us/sql-2012-editions.aspx, Retrieved 2012-03-25.
[22] J. Bezos. Amazon's SOA strategy: just do it. http://www.zdnet.com/blog/service-oriented/amazons-soa-strategy-just-do-it/648, June 2006.
[23] C. Biyikoglu. Building scalable database solution with SQL Azure - introducing federation in SQL Azure. http://blogs.msdn.com/b/cbiyikoglu/archive/2010/10/30/building-scalable-database-solution-in-sql-azure-introducing-federation-in-sql-azure.aspx, Retrieved 2012-03-25.
[24] B. Black. Cassandra replication and consistency. http://www.slideshare.net/benjaminblack/introduction-to-cassandra-replication-and-consistency, April 2010.
[25] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7, OSDI '06, pages 15-15, Berkeley, CA, USA, 2006. USENIX Association.
[26] A. Cockcroft. Replacing datacenter Oracle with global Apache Cassandra on AWS. http://www.slideshare.net/adrianco/migrating-netflix-from-oracle-to-global-cassandra, July 2011.
[27] A. Cockcroft and D. Sheahan. Benchmarking Cassandra scalability on AWS - over a million writes per second. http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html, November 2011.
[28] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 26(1):64-69, Jan. 1983.
[29] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon's highly available key-value store. SIGOPS Oper. Syst. Rev., 41(6):205-220, Oct. 2007.
[30] K. Delaney. The evolution of Microsoft SQL Server: 1989 to 2000. http://www.insidesqlserver.com/history of SQL Server.pdf, Retrieved 2012-03-24.
[31] D. Dougherty. http://onlamp.com/pub/a/onlamp/2001/01/25/lamp.html, January 2001.
[32] S. Gallagher. Oracle may fork itself with recent MySQL moves. http://arstechnica.com/business/news/2011/09/oracle-may-fork-itself-with-recent-mysql-moves.ars, September 2011.
[33] T. Haerder and A. Reuter. Principles of transaction-oriented database recovery. ACM Comput. Surv., 15(4):287-317, Dec. 1983.
[34] J. Jackson. Apache Cassandra NoSQL database ready for enterprise. http://www.computerworld.com/s/article/9220978/, October 2011.
[35] N. Mackenzie. Introduction to SQL Azure Federations. http://convective.wordpress.com/2012/03/05/introduction-to-sql-azure-federations/, Retrieved 2012-03-25.
[36] K. Muthukkaruppan. The underlying technology of Messages. http://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919, November 2010.
[37] S. B. Navathe. Evolution of data modeling for databases. Commun. ACM, 35(9):112-123, Sept. 1992.
[38] D. Pritchett. BASE: An ACID alternative. Queue, 6(3):48-55, May 2008.
[39] M. Stonebraker and L. A. Rowe. The design of POSTGRES. SIGMOD Rec., 15(2):340-355, June 1986.
[40] R. W. Taylor and R. L. Frank. CODASYL data-base management systems. ACM Comput. Surv., 8(1):67-103, Mar. 1976.
[41] W. Vogels. Eventually consistent. Commun. ACM, 52(1):40-44, Jan. 2009.
[42] W. Vogels. Amazon DynamoDB - a fast and scalable NoSQL database service designed for internet scale applications - All Things Distributed. http://www.allthingsdistributed.com/2012/01/amazon-dynamodb.html, January 2012.
[43] M. Widenius. Help saving MySQL. http://monty-says.blogspot.com/2009/12/help-saving-mysql.html, December 2009.