CA485 Ray Walshe NoSQL



1 NoSQL

2 BASE vs ACID
Summary: Traditional relational database management systems (RDBMS) are difficult to scale across many machines, largely because they adhere to ACID. A strong movement within cloud computing is to use non-traditional data stores (sometimes poorly dubbed NoSQL or NewSQL) to manage large amounts of data. This article contrasts the traditional ACID approach with the newer BASE approach.

3 Scaling
"If your application relies upon persistence, then data storage will probably become your bottleneck."
Many websites are I/O-bound; that is, they are limited by how quickly they can access data from their data storage system (normally a SQL database). To scale or improve performance, you have two options:
- Vertical Scaling: get a stronger, faster, better machine.
  - Easiest, but also expensive.
  - Limited to the largest single system available.
- Horizontal Scaling: spread data across multiple machines.
  - More flexible, but also more complex.
  - Functional Scaling: group data by function and spread functional groups across databases.
  - Sharding: split data within functional areas across multiple databases.
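The two horizontal techniques can be combined, as in the minimal Python sketch below. The database names, the shard count, and the choice of a SHA-256 based hash are all assumptions made for illustration; they are not taken from the slides.

```python
import hashlib

# Hypothetical names, purely for illustration.
FUNCTIONAL_DATABASES = {"users": "users-db", "orders": "orders-db"}
SHARDS_PER_DATABASE = 4


def shard_index(key, shard_count=SHARDS_PER_DATABASE):
    """Map a record key to a shard number with a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def route(functional_area, key):
    """Return the name of the database shard that should hold this record."""
    database = FUNCTIONAL_DATABASES[functional_area]  # functional scaling
    return f"{database}-{shard_index(key)}"           # sharding within the area
```

Calling route("users", "alice") always returns the same shard name, so reads and writes for one key go to one machine.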

4 CAP
A theorem (originally Brewer's conjecture) stating that a distributed web service cannot guarantee all three of the following properties at once:
- Consistency: all operations appear to occur at once, so every client sees the same data.
- Availability: every operation must terminate in an intended response.
- Partition Tolerance: operations will complete even if individual components are unavailable or cannot communicate.

5 ACID
Traditional databases use transactions that adhere to the following guarantees:
- Atomicity: all operations in the transaction will complete or none will.
- Consistency: the database will be in a consistent state before and after a transaction.
- Isolation: the transaction will behave as if it is the only operation being performed.
- Durability: upon completion of the transaction, the operation will not be reversed.

6 To ensure these properties when using partitioned databases, traditional RDBMSs use two-phase commit:
1. First, the transaction coordinator asks each database involved to precommit and indicate whether the commit is possible.
2. If all agree, the coordinator instructs each database to commit.
This method favors consistency over availability (if any database is down, we cannot commit). Likewise, the locking and coordination serve as a bottleneck and prevent the system from scaling to large numbers of nodes.
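The two phases can be sketched in a few lines of Python. This is a toy, in-memory illustration: the Participant class and its method names are hypothetical, and a real coordinator would also need logging, timeouts, and recovery.

```python
class Participant:
    """A hypothetical database node taking part in a distributed transaction."""

    def __init__(self, name, available=True):
        self.name = name
        self.available = available
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote yes only if this node can guarantee the commit.
        if self.available:
            self.state = "prepared"
        return self.available

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Return True if the transaction committed on every participant."""
    # Phase 1: ask every participant to precommit and collect the votes.
    votes = [p.prepare() for p in participants]
    # Phase 2: commit only if all voted yes; otherwise abort everywhere.
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False
```

A single unavailable participant forces an abort, which is the consistency-over-availability trade-off described above.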

7 BASE
The current trend in cloud computing data storage is to relax the requirements of consistency in favor of more availability. This is embodied in the BASE approach:
- Basically Available: the system guarantees the availability of your data, but the response can be "failure" if the data is in the middle of changing.
- Soft State: the state of the system is constantly changing.
- Eventually Consistent: the system will eventually become consistent once it stops receiving input.

8 BASE is optimistic and accepts that database consistency will be in a state of flux. It achieves availability by supporting partial failures without total system failure (i.e. partition tolerance). To implement BASE, many systems rely on some sort of message queue to persistently store and route data to various storage services that perform the actual database operations.
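A rough Python sketch of the queue-based idea follows. The QueuedStore class is hypothetical, and an in-memory deque stands in for a persistent message queue; the point is only to show replicas disagreeing for a while and then converging once input stops.

```python
from collections import deque


class QueuedStore:
    """Toy store: writes are queued and applied to each replica later."""

    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self.queue = deque()  # stands in for a persistent message queue

    def write(self, key, value):
        # Accept the write immediately and enqueue one message per replica.
        for index in range(len(self.replicas)):
            self.queue.append((index, key, value))

    def deliver_one(self):
        """Apply the oldest queued message; return True if one was applied."""
        if not self.queue:
            return False
        index, key, value = self.queue.popleft()
        self.replicas[index][key] = value
        return True

    def drain(self):
        """Deliver every queued message (the system 'stops receiving input')."""
        while self.deliver_one():
            pass

    def read(self, replica_index, key):
        return self.replicas[replica_index].get(key)

    def is_consistent(self):
        return all(replica == self.replicas[0] for replica in self.replicas)
```

After one delivery the replicas disagree (soft state); after drain() they agree again (eventual consistency).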

9 BigTable
BigTable is a distributed storage system created by Google for managing structured data. It is structured as a large table that may be petabytes in size and distributed across tens of thousands of machines. HBase is an open source system modeled on BigTable that runs on top of Hadoop.

10 BigTable is a large, persistent, distributed, sparse, sorted, and multidimensional map.
Map
A map is an associative array, a data structure that allows one to quickly look up the value corresponding to a key (e.g. hash table, binary search tree, etc.); in other words, it is a collection of (key, value) pairs. In BigTable, the key consists of the following:
- row key: string (up to 64KB)
- column key: string
- timestamp: int64
The value is simply an array of bytes that is interpreted by the application.
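As a minimal sketch, the (row key, column key, timestamp) to bytes mapping can be modeled with a plain Python dictionary. TinyTable and its methods are hypothetical names used for illustration only; they are not BigTable's API.

```python
class TinyTable:
    """Toy map from (row key, column key, timestamp) to an array of bytes."""

    def __init__(self):
        self.cells = {}

    def put(self, row, column, timestamp, value):
        if not isinstance(value, bytes):
            raise TypeError("values are uninterpreted bytes")
        self.cells[(row, column, timestamp)] = value

    def get(self, row, column, timestamp=None):
        """Return one exact version, or the newest version if no timestamp is given."""
        if timestamp is not None:
            return self.cells.get((row, column, timestamp))
        versions = [ts for (r, c, ts) in self.cells if (r, c) == (row, column)]
        if not versions:
            return None
        return self.cells[(row, column, max(versions))]
```

Only populated cells occupy an entry in the dictionary, which is also why such a map can be sparse.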

11 Sorted
Normally, associative arrays are not sorted (keys are hashed to a position in the map). In BigTable, however, data is sorted by row key to keep related data close together. This means that we must choose row names carefully so that related data sorts near each other. For example, to store data about websites, Google's WebTable reverses the domain names of web pages:
  ie.dcu.computing
  ie.dcu.eeng
  ie.dcu.meng
This keeps DCU website rows close together.
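The reversal itself is a one-liner. The helper name row_key below is an assumption for illustration; only the reversed-domain idea comes from the slide.

```python
def row_key(hostname):
    """Reverse the dot-separated parts of a hostname, e.g. a.b.c -> c.b.a."""
    return ".".join(reversed(hostname.lower().split(".")))


hosts = ["eeng.dcu.ie", "kernel.org", "computing.dcu.ie", "meng.dcu.ie"]
sorted_keys = sorted(row_key(h) for h in hosts)
# The three dcu.ie hosts are now adjacent in the sorted order.
```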

12 Data Locality
Sorting the rows is a mechanism for improving data locality. With pure hashing, it is possible for related data to be spread across multiple machines. Sorting and then partitioning the data allows all the data for one key subset to reside on one machine. A similar technique is used to shuffle data to reducers in MapReduce.
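A small sketch of partitioning sorted keys by range is shown below. The split points and the function name are hypothetical; the sketch only illustrates that keys sharing a prefix fall into the same range and therefore onto the same machine.

```python
import bisect

# Hypothetical split points: machine 0 holds keys below "f",
# machine 1 holds keys from "f" up to (not including) "m", machine 2 the rest.
SPLIT_POINTS = ["f", "m"]


def machine_for(row_key, split_points=SPLIT_POINTS):
    """Return the index of the machine whose key range contains row_key."""
    return bisect.bisect_right(split_points, row_key)


dcu_rows = ["ie.dcu.computing", "ie.dcu.eeng", "ie.dcu.meng"]
dcu_machines = {machine_for(key) for key in dcu_rows}  # a single machine
```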

13 Multidimensional
Each table is indexed by rows. Each row contains one or more named column families, which are defined when the table is first created. Within a column family, there can be one or more named columns, which can be created on the fly. With rows, column families, and columns, we have a three-level naming hierarchy to identify data. For example:
ie.dcu.computing:            # Row
  - users:                   # Column Family
    - ray: Ray Walshe        # Column
    - cdaly: Charlie         # Column
  - system:                  # Column Family
    - : Linux 3.2            # Column (Null name)

14 To get data, we first access the row via the row name and then specify the column key, which has the form column-family:column. In the example above, we first get the row ie.dcu.computing and then get a particular user with users:ray. To get multiple users, we can use a regular expression (or glob) to fetch multiple values: users:*.
In addition to row and column, the data is also versioned by timestamps (either real time or application-defined time) and sorted such that the most recent cell is first. To help manage these multiple versions, BigTable provides a mechanism to remove entries either by date (keep versions since some time t) or by amount (keep only the latest n versions). These garbage collection settings can be specified per column family.
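Column lookups by glob and the two garbage collection policies can be sketched as follows. The sample row, the integer timestamps, and the function names are made up for illustration, and Python's fnmatch globbing stands in for whatever pattern matching BigTable actually provides.

```python
from fnmatch import fnmatchcase

# One row: column key -> list of (timestamp, value), newest first.
ROW = {
    "users:ray": [(3, b"Ray Walshe"), (1, b"R. Walshe")],
    "users:cdaly": [(2, b"Charlie")],
    "system:": [(1, b"Linux 3.2")],
}


def read(row, pattern):
    """Return the newest value of every column whose key matches the glob."""
    return {
        column: versions[0][1]
        for column, versions in row.items()
        if versions and fnmatchcase(column, pattern)
    }


def keep_latest(versions, n):
    """Garbage collect by amount: keep only the latest n versions."""
    return versions[:n]


def keep_since(versions, t):
    """Garbage collect by date: keep versions with timestamp >= t."""
    return [(ts, value) for ts, value in versions if ts >= t]
```

For example, read(ROW, "users:*") returns the newest value for both users and skips the system column family.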

16 Sparse
While the number of column families is fixed at creation, the number of columns can grow arbitrarily. This means that within a particular row, it is possible for many columns to be empty.
ie.dcu.computing:
  - language:
    - : EN
  - contents:
    - : <html>...
  - anchor:
    - dcu.ie: Dublin City University
    - microsoft.com: Microsoft
ie.dcu.computing.ftp:
  - language:
    - : EN
  - contents:
    - : <html>...
  - anchor:
    - dcu.ie: Dublin City University
    - kernel.org: Linux
    - computing.dcu.ie: Vinson
    - reddit.com: Reddit
    - freenode.net: Freenode

17 Distributed
BigTable's data is spread across many independent machines. Tables are broken up into collections of rows called tablets, such that each tablet holds a set of consecutive rows. This allows a table to be distributed onto multiple machines and load balanced (large tablets are split into smaller ones).
Persistent
BigTable uses GFS to store data and log files persistently.
Large
BigTable can handle upwards of a petabyte of data. It hooks into MapReduce (it can be used as either input or output) and is utilized by a variety of applications.
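The tablet idea described under Distributed can be sketched with plain lists. The function names and the fixed max_rows threshold are assumptions for illustration; the slides do not say how BigTable decides when to split.

```python
import bisect


def split_tablets(sorted_rows, max_rows):
    """Cut a sorted list of row keys into consecutive tablets of at most max_rows rows."""
    return [sorted_rows[i:i + max_rows] for i in range(0, len(sorted_rows), max_rows)]


def locate(tablets, row):
    """Return the index of the tablet whose row range would hold `row`."""
    end_keys = [tablet[-1] for tablet in tablets]  # last row key of each tablet
    index = bisect.bisect_left(end_keys, row)
    return min(index, len(tablets) - 1)  # keys past the end go to the last tablet
```

Because each tablet covers a contiguous key range, a lookup only needs the tablet boundaries, not the rows themselves.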

18 Implementation
Architecturally, BigTable resembles GFS: a master that coordinates activity and a large number of tablet servers that store and manage the data. These tablet servers can be added or removed dynamically.
Master
The master assigns tablets to tablet servers and balances tablet server load. It also manages garbage collection of files in GFS and handles schema changes.
Tablet Server
A tablet server manages a set of tablets (10-1,000 per server) and handles read/write requests to those tablets. Internally, this data is stored in Google's SSTable format, which is a persistent, ordered, immutable (key, value) map file.
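To make "ordered, immutable (key, value) map" concrete, here is an in-memory Python sketch. It is not the SSTable file format, which the slides do not describe; the SortedTable class and its methods are hypothetical and only show why ordering makes lookups and range scans cheap.

```python
import bisect


class SortedTable:
    """Toy ordered, immutable key/value map built once from its input."""

    def __init__(self, items):
        pairs = sorted(dict(items).items())
        # Tuples cannot be modified after construction.
        self._keys = tuple(key for key, _ in pairs)
        self._values = tuple(value for _, value in pairs)

    def get(self, key):
        """Binary search for one key; return None if it is absent."""
        index = bisect.bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            return self._values[index]
        return None

    def scan(self, start, end):
        """Return all (key, value) pairs with start <= key < end, in order."""
        low = bisect.bisect_left(self._keys, start)
        high = bisect.bisect_left(self._keys, end)
        return list(zip(self._keys[low:high], self._values[low:high]))
```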

19 Chubby
To coordinate the various servers, BigTable uses Chubby, a highly available and persistent distributed lock service. Chubby manages leases for resources and stores configuration by providing a namespace of files and directories that clients can lock atomically. It is used to:
- Ensure there is only one active master.
- Discover tablet servers.
- Store BigTable schema information.
- Store access control lists.
Example of how it is used: when a tablet server starts, it creates and acquires an exclusive lock on a uniquely named file in the servers directory. The master can monitor this directory for new servers.
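The pattern of exclusive locks on named files can be imitated with a toy, single-process lock table. This is not Chubby's API: the LockService class, its methods, and the /bigtable/... paths are all hypothetical, and a real service would add leases, sessions, and replication.

```python
class LockService:
    """Toy namespace of lock files; each path has at most one holder."""

    def __init__(self):
        self.locks = {}  # file path -> current lock holder

    def try_acquire(self, path, holder):
        """Create the file and take its exclusive lock, or fail if it is held."""
        if path in self.locks:
            return False
        self.locks[path] = holder
        return True

    def release(self, path, holder):
        # Only the current holder may release the lock.
        if self.locks.get(path) == holder:
            del self.locks[path]

    def list_dir(self, directory):
        """List locked files under a directory, as a master might when discovering servers."""
        prefix = directory.rstrip("/") + "/"
        return sorted(path for path in self.locks if path.startswith(prefix))
```

Two candidates racing for the same master file cannot both succeed, and each tablet server shows up as one file in the servers directory.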

20 Replication A BigTable can be configured for replication to multiple BigTable clusters in different data centers to ensure availability. Data is propagated asynchronously, which results in an eventually consistent model.

21 Applications
BigTable, like GFS and MapReduce, is used internally by Google for many of its operations.
- Google Analytics: a service that helps webmasters analyze traffic patterns at their websites. BigTable is used to maintain raw click information (200 TB).
- Google Earth: BigTable is used to store the raw image data.
- Personalized Search: user data for personalized search is stored in BigTable.

22 How is it NoSQL?
- A BigTable cluster may contain several large tables, but it does not support operations across multiple tables (non-relational, no joins).
- No SQL! Data is accessed by key lookups.
- Columns have no type (just a bunch of bytes) and may be quite large.
- Columns can be added dynamically.
- Columns within a row may be quite sparse; that is, the table may have a large number of columns, but each row may have only a tiny fraction of them populated.
- Availability is increased by asynchronously propagating data to multiple clusters in different data centers.
