NoSQL Data Stores. The reference Big Data stack

Size: px
Start display at page:

Download "NoSQL Data Stores. The reference Big Data stack"

Transcription

1 Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Data Stores Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference Big Data stack High-level Interfaces Data Processing Data Storage Resource Management Support / Integration Valeria Cardellini - SABD 2017/18 1

2 Traditional RDBMSs RDBMSs: the traditional technology for storing structured data in web and business applications SQL is good Rich language and toolset Easy to use and integrate Many vendors RDBMSs promise ACID guarantees Valeria Cardellini - SABD 2017/18 2 ACID properties Atomicity All included statements in a transaction are either executed or the whole transaction is aborted without affecting the database ( all or nothing principle) Consistency A database is in a consistent state before and after a transaction Isolation Transactions cannot see uncommitted changes in the database (i.e., the results of incomplete transactions are not visible to other transactions) Durability Changes are written to a disk before a database commits a transaction so that committed data cannot be lost through a power failure. Valeria Cardellini - SABD 2017/18 3

3 RDBMS constraints Domain constraints Restricts the domain of each attribute or the set of possible values for the attribute Entity integrity constraint No primary key value can be null Referential integrity constraint To maintain consistency among the tuples in two relations: every value of one attribute of a relation should exist as a value of another attribute in another relation Foreign key To cross-reference between multiple relations: it is a key in a relation that matches the primary key of another relation Valeria Cardellini - SABD 2017/18 4 Pros Well-defined consistency model ACID guarantees Relational integrity maintained through entity and referential integrity constraints Well suited for OLTP apps OLTP: OnLine Transaction Processing Pros and cons of RDBMS Sound theoretical foundation Stable and standardized DBMSs available Well understood Cons Performance as major constraint, scaling is difficult Limited support for complex data structures Complete knowledge of DB structure required to create ad hoc queries Commercial DBMSs are expensive Some DBMSs have limits on fields size Data integration from multiple RDBMSs can be cumbersome Valeria Cardellini - SABD 2017/18 5

4 RDBMS challenges Web-based applications caused spikes Internet-scale data size High read-write rates Frequent schema changes Let s scale RDBMSs RDBMS were not designed to be distributed Possible solutions: Replication Sharding Valeria Cardellini - SABD 2017/18 6 Replication Primary backup with master/worker architecture Replication improves read scalability Write operations? Valeria Cardellini - SABD 2017/18 7

5 Sharding Horizontal partitioning of data across many separate servers Scales read and write operations Cannot execute transactions across shards (partitions) Consistent hashing is one form of sharding - Hash both data and nodes using the same hash function in a same ID space Valeria Cardellini - SABD 2017/18 8 Scaling RDBMSs is expensive and inefficient Source: Couchbase technical report Valeria Cardellini - SABD 2017/18 9

6 NoSQL data stores NoSQL = Not Only SQL SQL-style querying is not the crucial objective Main features of NoSQL data stores Support flexible schema No requirement for fixed rows in a table schema Scale horizontally Partitioning of data and processing over multiple nodes Provide scalability and high availability by replicating data in multiple nodes, often across datacenters Multiprocessor support Mainly utilize shared-nothing architecture With exception of graph-based database Valeria Cardellini - SABD 2017/18 10 NoSQL data stores (2) Main features of NoSQL data stores (continued) Avoid unneeded complexity E.g., elimination of join operations Useful when working with Big data when the data s nature does not require a relational model Support weaker concurrency models than the standard ACID transaction model Rather BASE: compromising reliability for better performance Valeria Cardellini - SABD 2017/18 11

7 ACID vs BASE Two design philosophies at opposite ends of the consistency-availability spectrum - Keep in mind the CAP theorem! Pick two of Consistency, Availability and Partition tolerance ACID: the traditional approach to address the consistency issue in RDBMS A pessimistic approach: prevent conflicts from occurring Usually implemented with write locks managed by the system But ACID does not scale well when handling petabytes of data (remember of latency!) Valeria Cardellini - SABD 2017/18 12 ACID vs BASE (2) BASE stands for Basically Available, Soft state, Eventual consistency An optimistic approach Lets conflicts occur, but detects them and takes action to sort the out Approaches: conditional updates: test the value just before updating save both updates: record that they are in conflict and then merge them Basically Available: the system is available most of the time and there could exist a subsystem temporarily unavailable Soft state: data is not durable in the sense that its persistence is in the hand of the user that must take care of refresh them Eventually consistent: the system eventually converge to a consistent state Usually adopted in NoSQL databases Valeria Cardellini - SABD 2017/18 13

8 Consistency Biggest change from a centralized RDBMS to a cluster-oriented NoSQL RDBMS: strong consistency Traditional RDBMS are CA systems (or CP systems, depending on the configuration) NoSQL systems: mostly eventual consistency AP systems Valeria Cardellini - SABD 2017/18 14 Consistency: an example Ann is trying to book a room of the Ace Hotel in New York on a node located in London of a booking system Pathin is trying to do the same on a node located in Mumbai The booking system uses a replicated database with the master located in Mumbai and the slave in London There is only a room available The network link between the two servers breaks Ann London Mumbay Pathin Valeria Cardellini - SABD 2017/18 15

9 Consistency: an example CA system: neither user can book any hotel room No tolerance to network partitions CP system: Pathin can make the reservation Ann can see the inconsistent room information but cannot book the room AP: both nodes accept the hotel reservation Overbooking! Remember that the tolerance to this situation depends on the application type Blog, financial exchange, shopping chart, Valeria Cardellini - SABD 2017/18 16 Pessimistic vs. optimistic approach Concurrency involves a fundamental tradeoff between: - Safety (avoiding errors such as update conflicts) and - Liveness (responding quickly to clients) Pessimistic approaches often: - Severely degrade the responsiveness of a system - Leads to deadlocks, which are hard to prevent and debug Valeria Cardellini - SABD 2017/18 17

10 NoSQL cost and performance Source: Couchbase technical report Valeria Cardellini - SABD 2017/18 18 Valeria Cardellini - SABD 2017/18 Pros Easy to scale-out Higher performance for massive data scale Allows sharing of data across multiple servers Most solutions are either open-source or cheaper HA and fault tolerance provided by data replication Supports complex data structures and objetcs No fixed schema, supportrs unstructured data Very fast retrieval of data, suitable for real-time apps Pros and cons of NoSQL Cons Do not provide ACID guarantees, less suitable for OLTP apps No fixed schema, no common data storage model Limited support for aggregation (sum, avg, count, group by) Performance for complex join is poor No well defined approach for DB design (different solutions have different data models) Lack of consistent model can lead to solution lock-in 19

11 Barriers to NoSQL Main barriers to NoSQL adoption No full ACID transaction support Lack of standardized interfaces Huge investments already made in existing RDBMSs A commercial example AWS launched two NoSQL services (SimpleDB in 2007 and later DynamoDB in 2012) and one RDBMS service (RDS in 2009) Valeria Cardellini - SABD 2017/18 20 NoSQL data models A number of largely diverse data stores not based on the relational data model Valeria Cardellini - SABD 2017/18 21

12 NoSQL data models A data model is a set of constructs for representing the information Relational model: tables, columns and rows Storage model: how the DBMS stores and manipulates the data internally A data model is usually independent of the storage model Data models for NoSQL systems: Aggregate-oriented models: key-value, document, and column-family Graph-based models Valeria Cardellini - SABD 2017/18 22 Aggregates Data as units that have a complex structure More structure than just a set of tuples E.g.: complex record with simple fields, arrays, records nested inside Aggregate pattern in Domain-Driven Design A collection of related objects that we treat as a unit A unit for data manipulation and management of consistency Advantages of aggregates Easier for application programmers to work with Easier for database systems to handle operating on a cluster Valeria Cardellini - SABD 2017/18 See 23

13 Transactions? Relational databases do have ACID transactions! Aggregate-oriented databases: Support atomic transactions, but only within a single aggregate Don t have ACID transactions that span multiple aggregates Update over multiple aggregates: possible inconsistent reads Part of the consideration for deciding how to aggregate data Graph databases tend to support ACID transactions Valeria Cardellini - SABD 2017/18 24 Key-value data model Simple data model in which data is represented as a collection of key-value pairs Associative array (map or dictionary) as fundamental data model Strongly aggregate-oriented Lots of aggregates Each aggregate has a key Data model: A set of <key,value> pairs Value: an aggregate instance The aggregate is opaque to the database Just a big blob of mostly meaningless bit Access to an aggregate: Lookup based on its key Richer data models can be implemented on top Valeria Cardellini - SABD 2017/18 25

14 Key-value data model: example Valeria Cardellini - SABD 2017/18 26 Types of key-value stores Consistency models Range from eventual consistency to serializability AP: Dynamo, Voldemort, Riak CP: Scalaris, Redis, Berkeley DB Some data stores support ordering of keys Some maintain data in memory (RAM), while others employ HDs or SSDs Valeria Cardellini - SABD 2017/18 27

15 Query features in key-value data stores Only query by the key! There is a key and there is the rest of the data (the value) It is not possible to use some attribute of the value column The key needs to be suitably chosen E.g., session ID for storing session data What if we don t know the key? Some system allows the search inside the value using a fulltext search (e.g., using Apache Solr) Valeria Cardellini - SABD 2017/18 28 Suitable use cases for key-value data stores Storing session information in web apps Every session is unique and is assigned a unique sessionid value Store everything about the session using a single put, request or retrieved using get User profiles and preferences Almost every user has a unique userid, username,, as well as preferences such as language, which products the user has access to, Put all into an object, so getting preferences of a user takes a single get operation Shopping cart data All the shopping information can be put into the value where the key is the userid Valeria Cardellini - SABD 2017/18 29

16 Examples of key-value stores Amazon s Dynamo is the most notable example Riak: open-source implementation Other key-value stores include: Amazon DynamoDB Data model and name from Dynamo, but different implementation Berkeley DB (ordered keys) Memcached, Redis, Hazelcast (in-memory data stores) Oracle NoSQL Database Encache (Java-based cache) Scalaris Voldemort (used by LinkedIn) upscaledb LevelDB (written by Google s fellows and open source) Valeria Cardellini - SABD 2017/18 30 Document data model Strongly aggregate-oriented Lots of aggregates Each aggregate has a key Similar to a key-value store (unique key), but API or query/update language to query or update based on the internal structure in the document The document content is no longer opaque Similar to a column-family store, but values can have complex documents, instead of fixed format Document: encapsulates and encodes data in some standard formats or encodings XML, YAML, JSON, BSON, Valeria Cardellini - SABD 2017/18 31

17 Document data model Data model: A set of <key, document> pairs Document: an aggregate instance Structure of the aggregate is visible Limits on what we can place in it Access to an aggregate Queries based on the fields in the aggregate Flexible schema No strict schema to which documents must conform, which eliminates the need of schema migration efforts Valeria Cardellini - SABD 2017/18 32 Document data model The data model resembles the JSON object Valeria Cardellini - SABD 2017/18 33

18 Document data store API Usual CRUD operations (although not standardized) Creation (or insertion) Retrieval (or query, search, finds) Not only simple key-to-document lookup API or query language that allows the user to retrieve documents based on content (or metadata) Update (or edit) Replacement of the entire document or individual structural pieces of the document Deletion (or removal) Valeria Cardellini - SABD 2017/18 34 Examples of document stores MongoDB and CouchDB are the two major representatives You will use MongoDB in lab Documents grouped together to form collections Collections organized into databases Other popular document stores include: Couchbase DocumentDB: PaaS service in Microsoft s Azure cloud platform RethinkDB RavenDB (for.net) Valeria Cardellini - SABD 2017/18 35

19 Key-value vs. document stores Key-value store A key plus a big blob of mostly meaningless bits We can store whatever we like in the aggregate We can only access an aggregate by lookup based on its key Document store A key plus a structured aggregate More flexibility in access We can submit queries to the database based on the fields in the aggregate We can retrieve part of the aggregate rather than the whole thing Indexes based on the contents of the aggregate Valeria Cardellini - SABD 2017/18 36 Suitable use cases for document stores Good for storing and managing big data-size collections of semi-structured data with a varying number of fields Text and XML documents, messages Conceptual documents like de-normalized (aggregate) representations of DB entities such as product or customer Sparse data in general, i.e., irregular (semistructured) data that would require an extensive use of nulls in RDBMS Nulls being placeholders for missing or nonexistent values Valeria Cardellini - SABD 2017/18 37

20 When not to use document stores Complex transactions spanning different operations Document stores unsuited for atomic crossdocument operations Queries against varying aggregate structure Data is saved as an aggregate in the form of application entities. If the structure of the aggregate constantly changes, the aggregate is saved at the lowest level of granularity. In this scenario, document stores may not work Valeria Cardellini - SABD 2017/18 38 Column-family data model Strongly aggregate-oriented Lots of aggregates Each aggregate has a key Similar to a key-value store, but the value can have multiple attributes (columns) Data model: a two-level map structure A set of <row-key, aggregate> pairs Each aggregate is a group of pairs <column-key, value> Column: a set of data values of a particular type Aggregate structure is visible Columns can be organized in families Data usually accessed together Valeria Cardellini - SABD 2017/18 39

21 Column-family data model: example Representing customer information in a column-family structure Valeria Cardellini - SABD 2017/18 40 Column-store vs. row-store Row-store systems: store and process data by row However, DBMSs support indexes to improve the performance of set-wide operations on the whole tables Column-store systems: store and process data by column Can access faster the needed data rather than scanning and discarding unwanted data in row Examples: C-Store (pre-nosql) and its commercial fork Vertica Bigtable and Cassandra: no column stores in the original sense of the term, but column families are stored separately Valeria Cardellini - SABD 2017/18 41

22 Properties of column-family stores In many analytical databases queries, few attributes are needed Example: aggregate queries (avg, max, ) Column-store is preferable for performance Moreover, both rows and columns are split over multiple nodes using sharding to achieve scalability So column-family stores are suitable for readmostly, read-intensive, large data repositories Valeria Cardellini - SABD 2017/18 42 Properties of column-family stores Operations also allow picking out a particular column See slide 40: get('1234', 'name')! Each column: Has to be part of a single column family Acts as unit for access You can add any column to any row, and rows can have very different columns You can model a list of items by making each item a separate column Two ways to look at data: Row-oriented Each row is an aggregate Column families represent useful chunks of data within that aggregate Column-oriented Each column family defines a record type Row as the join of records in all column families Valeria Cardellini - SABD 2017/18 43

23 Suitable use cases for column-family stores Queries that involve only a few columns Aggregation queries against vast amounts of data - E.g., average age of all of your users Column-wise compression Well-suited for OLAP-like workloads (e.g., data warehouses) which typically involve highly complex queries over all data (possibly petabytes) Valeria Cardellini - SABD 2017/18 44 Examples of column-family stores Google s Bigtable is the most notable example Other column-family stores include: Apache HBase: open-source implementation providing Bigtable-like capabilities on top of Hadoop and HDFS Apache Accumulo: based on the design of BigTable and powered by Apache Hadoop, Zookeeper, and Thrift Different APIs and different nomenclature from HBase, but same in operational and architectural standpoint Better security Cassandra Hypertable Amazon Redshift Valeria Cardellini - SABD 2017/18 45

24 Graph data model Uses graph structures with nodes, edges, and properties to represent stored data Nodes are the entities and have a set of attributes Edges are the relationships between the entities E.g.: an author writes a book Edges can be directed or undirected Nodes and edges also have individual properties consisting of key-value pairs Replaces relational tables with structured relational graphs of interconnected key-value pairs Powerful data model Differently from other types of NoSQL stores, it concerns itself with relationships Focus on visual representation of information (more humanfriendly than other NoSQL stores) Other types of NoSQL stores are poor for interconnected data Valeria Cardellini - SABD 2017/18 46 Graph data model: example Small example from Neo4j Valeria Cardellini - SABD 2017/18 47

25 Graph database Explicit graph structure Each node knows its adjacent nodes As the number of nodes increases, the cost of local step (or hop) remains the same Plus an index for lookups Cons: Sharding: data partitioning is difficult Horizontal scalability When related nodes are stored on different servers, traversing multiple servers is not performance-efficient Require rewiring your brain Valeria Cardellini - SABD 2017/18 48 Examples of graph databases Neo4j OrientDB InfiniteGraph AllegroGraph HyperGraphDB Valeria Cardellini - SABD 2017/18 49

26 Suitable use cases for graph databases Good for applications where you need to model entities and relationships between them Social networking applications Pattern recognition Dependency analysis Recommendation systems Solving path finding problems raised in navigation systems Good for applications in which the focus is on querying for relationships between entities and analyzing relationships Computing relationships and querying related entities is simpler and faster than in RDBMS Valeria Cardellini - SABD 2017/18 50 Case studies Key-value data stores: Amazon s Dynamo (and Riak KV), Redis Document-oriented data stores: MongoDB Column-family data stores: Google s Bigtable (and HBase), Cassandra Graph databases: Neo4j In blue: Lab Valeria Cardellini - SABD 2017/18 51

27 Case study: Amazon s Dynamo Highly available and scalable distributed keyvalue data store built for Amazon s platform A very diverse set of Amazon applications with different storage requirements Need for storage technologies that are always available on a commodity hardware infrastructure E.g., shopping cart service: Customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados Meet stringent Service Level Agreements (SLAs) E.g., service guaranteeing that it will provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second. G. DeCandia et al., "Dynamo: Amazon's highly available key-value store", Proc. of ACM SOSP Valeria Cardellini - SABD 2017/18 52 Dynamo: Features Simple key-value API Simple operations to read (get) and write (put) objects uniquely identified by a key Each operation involves only one object at time Focus on eventually consistent store Sacrifices consistency for availability BASE rather than ACID Efficient usage of resources Simple scale-out schema to manage increasing data set or request rates Internal use of Dynamo Security is not an issue since operation environment is assumed to be non-hostile Valeria Cardellini - SABD 2017/18 53

28 Dynamo: Design principles Sacrifice consistency for availability (CAP theorem) Use optimistic replication techniques Possible conflicting changes which must be detected and resolved: when to resolve them and who resolves them? When: execute conflict resolution during reads rather than writes, i.e. always writeable data store Who: data store or application; if data store, use simple policy (e.g., last write wins ) Other key principles: Incremental scalability Scale-out with minimal impact on the system Simmetry and decentralization P2P techniques Heterogeneity Valeria Cardellini - SABD 2017/18 54 Dynamo: API Each stored object has an associated key Simple API including get() and put() operations to read and write objects get(key)! Returns single object or list of objects with conflicting versions and context Conflicts are handled on reads, never reject a write put(key, context, object)! Determines where the replicas of the object should be placed based on the associated key, and writes the replicas to disk Context encodes system metadata, e.g., version number Both key and object treated as opaque array of bytes Key: 128-bit MD5 hash applied to client supplied key Valeria Cardellini - SABD 2017/18 55

29 Dynamo: Used techniques Problem Technique Advantage Partitioning Consistent hashing Incremental scalability Vector clocks with Version size is decoupled High Availability for writes reconciliation during reads from update rates Handling temporary Sloppy Quorum and Provides high availability failures hinted handoff and durability guarantee when some of the replicas are not available Recovering from permanent failures Membership and failure detection Anti-entropy using Merkle trees Gossip-based membership protocol and failure detection Synchronizes divergent replicas in the background Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information Valeria Cardellini - SABD 2017/18 56 Dynamo: Data partitioning Consistent hashing: output range of a hash is treated as a ring (similar to Chord) MD5(key) -> node (position on the ring) Differently from Chord: zero-hop DHT Virtual nodes Each node can be responsible for more than one virtual node Work distribution proportional to the capabilities of the individual node Valeria Cardellini - SABD 2017/18 57

30 Dynamo: Replication Each object is replicated on N nodes N is a parameter configured per-instance by the application Preference list: list of nodes that is responsible for storing a particular key More than N nodes to account for node failures See figure: object identified by key K is replicated on nodes B, C and D Node D will store the keys in the ranges (A, B], (B, C], and (C, D] Valeria Cardellini - SABD 2017/18 58 Dynamo: Used techniques Problem Technique Advantage Partitioning Consistent hashing Incremental scalability Vector clocks with Version size is decoupled High availability for writes reconciliation during reads from update rates Handling temporary Sloppy Quorum and Provides high availability failures hinted handoff and durability guarantee when some of the replicas are not available Recovering from permanent failures Membership and failure detection Anti-entropy using Merkle trees Gossip-based membership protocol and failure detection Synchronizes divergent replicas in the background Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information Valeria Cardellini - SABD 2017/18 59

31 Dynamo: Data versioning A put() call may return to its caller before the update has been applied at all the replicas A get() call operation may return an object that does not have the latest updates Version branching can also happen due to node/ network failures Problem: multiple versions of an object, that the system needs to reconcile Solution: use vectorial clocks to capture the casuality among different versions of the same object If causal: older version can be forgotten If concurrent: conflict exists, requiring reconciliation Valeria Cardellini - SABD 2017/18 60 Dynamo: Used techniques Problem Technique Advantage Partitioning Consistent hashing Incremental scalability Vector clocks with Version size is decoupled High Availability for writes reconciliation during reads from update rates Handling temporary Sloppy Quorum and Provides high availability failures hinted handoff and durability guarantee when some of the replicas are not available Recovering from permanent failures Membership and failure detection Anti-entropy using Merkle trees Gossip-based membership protocol and failure detection Synchronizes divergent replicas in the background Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information Valeria Cardellini - SABD 2017/18 61

32 Dynamo: Sloppy quorum R/W: minimum numebr of nodes at must participate in a successful read/write operation Setting R + W > N yields a quorum-like system The latency of a get (or put) operation is dictated by the slowest of the R (or W) replicas R and W are usually configured to be less than N, to provide better latency Typical configuration in Dynamo: (N, R, W) = (3, 2, 2) Balances performance, durability, and availability Sloppy quorum Due to partitions, quorums might not exist Sloppy quorum: create transient replicas (called hinted replicas) N healthy nodes from the preference list (may not always be the first N nodes encountered while walking the consistent hashing ring) Valeria Cardellini - SABD 2017/18 62 Dynamo: Put and get operations put operation Coordinator generates new vector clock and writes the new version locally Send to N nodes Wait for response from W nodes get operation Coordinator requests existing versions from N Wait for response from R nodes If multiple versions, return all versions that are causally unrelated Divergent versions are then reconciled Reconciled version written back Valeria Cardellini - SABD 2017/18 63

33 Dynamo: Hinted handoff Hinted handoff for transient failures Consider N = 3; if A is temporarily down or unreachable, put will use D D knows that the replica belongs to A Later, D detects A is alive Sends the replica to A Removes the replica Again, always writeable principle Valeria Cardellini - SABD 2017/18 64 Dynamo: Used techniques Problem Technique Advantage Partitioning Consistent hashing Incremental scalability Vector clocks with Version size is decoupled High Availability for writes reconciliation during reads from update rates Handling temporary Sloppy Quorum and Provides high availability failures hinted handoff and durability guarantee when some of the replicas are not available Recovering from permanent failures Membership and failure detection Anti-entropy using Merkle trees Gossip-based membership protocol and failure detection Synchronizes divergent replicas in the background Preserves symmetry and avoids having a centralized registry for storing membership and node liveness information Valeria Cardellini - SABD 2017/18 65

34 Dynamo: Membership management Administrator explicitly adds and removes nodes Gossiping to propagate membership changes Eventually consistent view O(1) hop overlay Valeria Cardellini - SABD 2017/18 66 Dynamo: Failure detection and management Passive failure detection Use pings only for detection from failed to alive In the absence of client requests, node A doesn t need to know if node B is alive Anti-entropy mechanism to keep replica synchronized Use Merkle trees to rapidly detect inconsistency and limit the amount of data transferred Merkle tree: every leaf node is labeled with the hash of a data block and every non-leaf node is labeled with the cryptographic hash of the labels of its child nodes - Nodes maintain Merkle tree of each key range - Exchange root of Merkle tree to check if the key ranges are updated Valeria Cardellini - SABD 2017/18 67

35 Riak KV Distributed NoSQL key-value data store inspired by Dynamo Open-source version, Like Dynamo Employs consistent hashing to partition and replicate data around the ring Makes use of gossiping to propagate membership changes Uses vector clocks to resolve conflicts Nodes can be added and removed from the Riak cluster as needed Two ways of resolving update conflicts: Last write wins Both values are returned allowing the client to resolve the conflict Can run on Mesos Valeria Cardellini - SABD 2017/18 68 Case study: Google s Bigtable Built on GFS, Chubby (lock service), SSTable (logstructured storage) and a few other Google technologies Data storage organized in tables, whose rows are distributed over GFS In 2015 made available as a service on Google Cloud Platform: Cloud Bigtable Underlies Google Cloud Datastore Used by a number of Google applications, including: Web indexing, MapReduce, Google Maps, Google Earth, YouTube and Gmail LevelDB is based on concepts from Bigtable Stores entries lexicographically sorted by keys, but widely noted for being unreliable Chang et al., "Bigtable: A Distributed Storage System for Structured Data", ACM Trans. Comput. Syst., Valeria Cardellini - SABD 2017/18 69

36 Bigtable: Motivation Lots of semi-structured data at Google URLs, geographical locations,... Big data Billions of URLs, hundreds of millions of users, 100+TB of satellite image data, Valeria Cardellini - SABD 2017/18 70 Bigtable: Main features Distributed storage structured as a large table Distributed multi-dimensional sorted map Fault-tolerant Scalable and self-managing CP system: strong consistency and network partition tolerance Valeria Cardellini - SABD 2017/18 71

37 Bigtable: Data model Table Distributed, multi-dimensional, sparse and sorted map Indexed by rows Rows Sorted in lexicographical order by row key Every read or write in a row is atomic: no concurrent ops on same row Columns The basic unit of data access Sparse table: different rows may use different columns Column family: group of columns Data within a column family usually of the same type Column family allows for specific optimization for better access control, storage and data indexing Column naming: column-family:column Valeria Cardellini - SABD 2017/18 72 Bigtable: Data model Multi-dimensional: rows, column families and columns provide a three-level naming hierarchy in identifying data Valeria Cardellini - SABD 2017/18 73

38 Bigtable: Data model Time-based Each column family may keep multiple versions, each one having a timestamp Bigtable data model vs. relational data model Valeria Cardellini - SABD 2017/18 74 Bigtable: Tablet Tablet: group of consecutive rows of a table stored together The basic unit for data storing and distribution Table sorted by row keys: select row keys properly to improve data locality Auto-sharding: tablets are split by the system when they become too large Each tablet is served by exactly one tablet server Valeria Cardellini - SABD 2017/18 75

39 Bigtable: API Metadata operations Create/delete tables, column families, change metadata Write operations: single-row, atomic Write/delete cells in a row, delete all cells in a row Read operations: read arbitrary cells in a Bigtable table Each row read is atomic One row, all or specific columns, certain timestamps,... Valeria Cardellini - SABD 2017/18 76 Bigtable: Writing and reading examples Valeria Cardellini - SABD 2017/18 77

40 Main components: Master server Tablet server Client library Bigtable: Architecture Valeria Cardellini - SABD 2017/18 78 Bigtable: Master server One master server Assigns tablets to tablet servers Balances tablet server load Garbage collection of unneeded files in GFS Handles schema changes, e.g., table and column family creations Valeria Cardellini - SABD 2017/18 79

41 Bigtable: Tablet server Many tablet servers Can be added or removed dynamically Each manages a set of tablets (typically tablets/server) Handles read/write requests to tablets Splits tablets when too large Valeria Cardellini - SABD 2017/18 80 Bigtable: Client library Library that is linked into every client Client data do not move through the master Clients communicate directly with tablet servers for reads/writes Valeria Cardellini - SABD 2017/18 81

42 Bigtable: Building blocks The external building blocks of Bigtable are: Google File System (GFS): raw storage Chubby: distributed lock service Cluster scheduler: schedules jobs onto machines Valeria Cardellini - SABD 2017/18 82 Bigtable: Chubby lock service Distributed and highly available lock service used in many Google s products File system {directory/file} for locking Uses Paxos for consensus to keep replicas consistent A client leases a session with the service In Bigtable Chubby is used to: Ensure there is only one active master Store bootstrap location of Bigtable data Discover tablet servers Store Bigtable schema information Store access control lists (ACL) Valeria Cardellini - SABD 2017/18 83

43 Bigtable: Locating rows Three-level indexing hierarchy Root tablet stores location of all METADATA tablets in a special METADATA tablet Each METADATA tablet contains location of user data tablets For efficiency the client library caches tablet locations Valeria Cardellini - SABD 2017/18 84 Bigtable: Master startup The master executes the following steps at startup Grabs a unique master lock in Chubby (to prevent concurrent master instantiations) Scans the tablet servers directory in Chubby to find the live servers Communicates with each live tablet server to find what tablets are assigned to each server Scans the METADATA table to learn the full set of tablets and builds a set of unassigned tablet servers, which are eligible for tablet assignment Valeria Cardellini - SABD 2017/18 85

44 Bigtable: Tablet assignment A tablet is assigned to one tablet server at a time Master uses Chubby to keep tracks of live tablet serves and unassigned tablets When a tablet server starts, it creates and acquires an exclusive lock in Chubby Master detects the status of the lock of each tablet server by checking it periodically Master is responsible for finding when tablet server is no longer serving its tablets and reassigning those tablets as soon as possible Valeria Cardellini - SABD 2017/18 86 Bigtable: SSTable Sorted Strings Table (SSTable) file format used internally to store Bigtable data Immutable file of key/value string pairs, sorted by keys Each SSTable is stored in a GFS file Chunks of data plus a block index A block index is used to locate blocks The index is loaded into memory when the SSTable is opened Valeria Cardellini - SABD 2017/18 87

45 Bigtable: Serving tablets Updates committed to a commit log Recently committed updates are stored in memory (memtable) Older updates are stored in a sequence of SSTables Recent updates kept sorted in memory memtable and SSTables are merged to serve a read request Write operations are logged Valeria Cardellini - SABD 2017/18 88 Bigtable: Loading tablets To load a tablet, a tablet server: Finds location of tablet through its METADATA Metadata for a tablet includes list of SSTables and set of redo points Read SSTables index blocks into memory Read the commit log since the redo point and reconstructs the memtable Valeria Cardellini - SABD 2017/18 89

46 Bigtable: Consistency and availability Strong consistency CP system Only one tablet server is responsible for a given piece of data Replication is handled on the GFS layer Tradeoff with availability If a tablet server fails, its portion of data is temporarily unavailable until a new server is assigned Valeria Cardellini - SABD 2017/18 90 Comparing Dynamo and Bigtable Dynamo Bigtable Data model Key-value, row store Column store API Single value Single value and range Data partition Random Ordered Optimized for Writes Reads Consistency Eventual Atomic Multiple version Version Timestamp Replication Quorum GFS Data center aware Yes Yes Persistency Local and pluggable Replicated and distributed file system Architecture Decentralized Hierarchical (master/ worker) Client library Yes Yes Valeria Cardellini - SABD 2017/18 91

47 Cloud Bigtable Bigtable as a Cloud service Sparsely populated table that can scale to billions of rows and thousands of columns Table: sorted key/value map Table composed of rows, each of which describes a single entity, and columns, which contain individual values for each row Row indexed by a single row key Columns that are related to one another are grouped together into a column family Column identified by column family and column qualifier Valeria Cardellini - SABD 2017/18 92 Cloud Bigtable: table example Application: social network for United States presidents Table: tracks who each president is following Valeria Cardellini - SABD 2017/18 93

48 Cloud Bigtable: use cases When to use To store large amounts of single-keyed data with low latency for apps that need high throughput and scalability for non-structured key/value data Value no larger than 10 MB Some examples Marketing data (e.g., purchase histories, customer preferences) Financial data (e.g., transaction histories, stock prices) IoT data (e.g., usage reports from energy meters and home appliances) Time-series data (e.g., CPU and memory usage over time for multiple servers) Valeria Cardellini - SABD 2017/18 94 Cloud Bigtable: architecture Valeria Cardellini - SABD 2017/18 95

49 Case study: Cassandra Initially developed at Facebook A mixture of Amazon s Dynamo and Google s BigTable Dynamo P2P architecture (replication & partitioning), gossip-based discovery and error detection BigTable Sparse column-oriented data model, storage architecture (SSTable disk storage) Cassandra Some large production deployments: - Apple: over 75,000, over 10 PB of data - Netflix: 2,500 nodes, 420 TB, over 1 trillion requests per day Valeria Cardellini - SABD 2017/18 96 Cassandra: features High availability and incremental scalability Robust support for systems spanning multiple data centers Asynchronous master-less replication allowing low latency operations Data model: structured key-value store where columns are added only to specified rows Distributed multi-dimensional map indexed by a row key As in BigTable: different rows can have different number of columns and columns are grouped into column families Emphasizes denormalization instead of normalization and joins See example at Write-oriented system Bigtable designed for intensive read workloads Valeria Cardellini - SABD 2017/18 97

50 Cassandra: Consistency Decentralized No master node to coordinate reads and writes Tunable read and write consistency for each query SELECT points! FROM fantasyfootball.playerpoints USING CONSISTENCY QUORUM! WHERE playername = Tom Brady ;! Achieved through quorum-based protocol If N+W > R and W >= N/2 +1 you have strong consistency Some available consistency levels (see ONE: only a single replica must respond QUORUM: a majority of the replicas must respond ALL: all of the replicas must respond LOCAL_QUORUM: a majority of the replicas in the local datacenter must respond Tunable tradeoffs between consistency and latency Valeria Cardellini - SABD 2017/18 98 Cassandra Query Language (CQL) An SQL-like language CREATE COLUMNFAMILY Customer (! KEY varchar PRIMARY KEY,! name varchar,! city varchar,! web varchar);!! INSERT INTO Customer (KEY,name,city,web)! VALUES ('mfowler',! 'Martin Fowler',! 'Boston',! ' SELECT * FROM Customer! SELECT name,web FROM Customer! SELECT name,web FROM Customer WHERE city='boston'! Valeria Cardellini - SABD 2017/18 99

51 Case study: Neo4j Support for ACID Properties of nodes and relationships captured in the form of multiple attributes (key-value pairs) Relationships are unidirectional Nodes tagged with labels Labels used to represent different roles in the modeled domain Neo4j architecture Clustered installation Single read/write master and multiple read-only slaves Improve data throughput via a multi-level caching scheme Valeria Cardellini - SABD 2017/ Neo4j: Twitter graph example Valeria Cardellini - SABD 2017/18 101

52 Neo4j: Cypher Cypher: declarative SQL-like language for performing CRUD operations and more See ASCII-Art to represent patterns Relationships are an arrow --> between two nodes Some examples: Create a node with labels and properties!create (movie: Movie {title: Shrek })! Search for a specified pattern: MATCH corresponds to SELECT in SQL!MATCH (me {name: Rick })-[:KNOWS *2..3]-> (remote_friend) RETURN remote_friend.name! Nodes that are a variable number of relationship node hops away can be found using the following syntax:!!-[:type*minhops..maxhops]! Valeria Cardellini - SABD 2017/ Neo4j: Cypher Native support for shortest path queries Example: find the shortest path between two airports (SFO and MSO) shortestpath returns a single shortest path between two nodes In the example, the path length is between 0 and 2 allshortestpath returns all the shortest paths between two nodes Valeria Cardellini - SABD 2017/18 103

53 Neo4j: Cypher Shortest path queries (see Twitter graph on slide 98) More queries on Twitter graph at You can map your Twitter network and analyze it with Neo4j Find all the shortest paths between two Twitter users linked by Follows as long as the path is min 0 and max 3 long Find all the shortest paths between two Twitter users as long as the path is min 0 and max 5 long Valeria Cardellini - SABD 2017/ Performance comparison of NoSQL data stores Unsolved issue: no standard benchmark Yahoo Cloud Serving Benchmark (YCSB): open source workload generator used in some studies We consider two papers: 1. Solving Big Data Challenges for Enterprise Application Performance Management, VLDB Is Elasticity of Scalable Databases a Myth?, BigData 2016 Their focus is on performance evaluation Throughput and latency as metrics Overall conclusion: No single winner takes all among NoSQL data stores Results depend on use case, workload and deployment conditions Valeria Cardellini - SABD 2017/18 105

54 Performance 12 Target: Application Performance Monitoring (APM) platform for monitoring big data Workload R: 95% reads and only 5% writes Throughput Read latency Write latency Cassandra: linear scalability in terms of throughput but slow at reading and writing HBase: suffers in scalability, slow at reading but fast at writing Redis: being in-memory, the fastest at reading Valeria Cardellini - SABD 2017/ Performance 12 Workload WR: 50% reads and 50% writes Throughput Read latency Study conclusions: - Linear scalability for Cassandra, HBase, and Voldemort in most of the tests - Cassandra s throughput dominated in all the tests, however high latency - HBase achieved least throughput but exhibited a low write latency at the cost of a high read latency Valeria Cardellini - SABD 2017/18 107

55 Performance 16 Valeria Cardellini - SABD 2017/18 Couchbase achieves highest throughput and lowest latency Cassandra benefits the most from larger cluster sizes in contrast to MongoDB - Cassandra shows best results also in NoSQL Databases: MongoDB vs Cassandra 108 Which data store to use? Different data stores designed to solve different problems Using a single database engine for all of the requirements storing transactional data caching session information traversing graph of customers performing OLAP operations usually leads to non-performing solutions Different needs for availability, consistency, or backup requirements Valeria Cardellini - SABD 2017/18 109

56 How to choose about consistency? Let us consider the differences about consistency Valeria Cardellini - SABD 2017/ Poliglot persistence Better approach: rather than a single solution use multiple data storage technologies Choose them based upon the way data are used by applications or components of a single application Different kinds of data are best dealt with different data stores: pick the right tool for the right use case See Valeria Cardellini - SABD 2017/18 111

57 References Sadalage and Fowler, NoSQL distilled, Addison-Wesley, Golinger at al., Data management in cloud environments: NoSQL and NewSQL data stores, J. Cloud Comp., DeCandia et al., "Dynamo: Amazon's highly available key-value store", ACM SOSP Chang et al., Bigtable: a distributed storage system for structured data, OSDI Lakshman and Malik, Cassandra - a decentralized structured storage system, LADIS Valeria Cardellini - SABD 2017/18 112

NoSQL Databases. The reference Big Data stack

NoSQL Databases. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NoSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference

More information

CAP Theorem, BASE & DynamoDB

CAP Theorem, BASE & DynamoDB Indian Institute of Science Bangalore, India भ रत य व ज ञ न स स थ न ब गल र, भ रत DS256:Jan18 (3:1) Department of Computational and Data Sciences CAP Theorem, BASE & DynamoDB Yogesh Simmhan Yogesh Simmhan

More information

CS Amazon Dynamo

CS Amazon Dynamo CS 5450 Amazon Dynamo Amazon s Architecture Dynamo The platform for Amazon's e-commerce services: shopping chart, best seller list, produce catalog, promotional items etc. A highly available, distributed

More information

BigTable: A Distributed Storage System for Structured Data

BigTable: A Distributed Storage System for Structured Data BigTable: A Distributed Storage System for Structured Data Amir H. Payberah amir@sics.se Amirkabir University of Technology (Tehran Polytechnic) Amir H. Payberah (Tehran Polytechnic) BigTable 1393/7/26

More information

CS 655 Advanced Topics in Distributed Systems

CS 655 Advanced Topics in Distributed Systems Presented by : Walid Budgaga CS 655 Advanced Topics in Distributed Systems Computer Science Department Colorado State University 1 Outline Problem Solution Approaches Comparison Conclusion 2 Problem 3

More information

CS November 2017

CS November 2017 Bigtable Highly available distributed storage Distributed Systems 18. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL CSE 544 Principles of Database Management Systems Magdalena Balazinska Winter 2015 Lecture 14 NoSQL References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No.

More information

NewSQL Databases. The reference Big Data stack

NewSQL Databases. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica NewSQL Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18 Valeria Cardellini The reference

More information

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

Goal of the presentation is to give an introduction of NoSQL databases, why they are there. 1 Goal of the presentation is to give an introduction of NoSQL databases, why they are there. We want to present "Why?" first to explain the need of something like "NoSQL" and then in "What?" we go in

More information

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre

NoSQL systems: sharding, replication and consistency. Riccardo Torlone Università Roma Tre NoSQL systems: sharding, replication and consistency Riccardo Torlone Università Roma Tre Data distribution NoSQL systems: data distributed over large clusters Aggregate is a natural unit to use for data

More information

CS November 2018

CS November 2018 Bigtable Highly available distributed storage Distributed Systems 19. Bigtable Built with semi-structured data in mind URLs: content, metadata, links, anchors, page rank User data: preferences, account

More information

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat

Dynamo: Amazon s Highly Available Key-value Store. ID2210-VT13 Slides by Tallat M. Shafaat Dynamo: Amazon s Highly Available Key-value Store ID2210-VT13 Slides by Tallat M. Shafaat Dynamo An infrastructure to host services Reliability and fault-tolerance at massive scale Availability providing

More information

Dynamo: Amazon s Highly Available Key-value Store

Dynamo: Amazon s Highly Available Key-value Store Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and

More information

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014 NoSQL Databases Amir H. Payberah Swedish Institute of Computer Science amir@sics.se April 10, 2014 Amir H. Payberah (SICS) NoSQL Databases April 10, 2014 1 / 67 Database and Database Management System

More information

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun

DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE. Presented by Byungjin Jun DYNAMO: AMAZON S HIGHLY AVAILABLE KEY-VALUE STORE Presented by Byungjin Jun 1 What is Dynamo for? Highly available key-value storages system Simple primary-key only interface Scalable and Reliable Tradeoff:

More information

Introduction to NoSQL Databases

Introduction to NoSQL Databases Introduction to NoSQL Databases Roman Kern KTI, TU Graz 2017-10-16 Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 1 / 31 Introduction Intro Why NoSQL? Roman Kern (KTI, TU Graz) Dbase2 2017-10-16 2 / 31 Introduction

More information

CIB Session 12th NoSQL Databases Structures

CIB Session 12th NoSQL Databases Structures CIB Session 12th NoSQL Databases Structures By: Shahab Safaee & Morteza Zahedi Software Engineering PhD Email: safaee.shx@gmail.com, morteza.zahedi.a@gmail.com cibtrc.ir cibtrc cibtrc 2 Agenda What is

More information

Dynamo: Amazon s Highly Available Key-Value Store

Dynamo: Amazon s Highly Available Key-Value Store Dynamo: Amazon s Highly Available Key-Value Store DeCandia et al. Amazon.com Presented by Sushil CS 5204 1 Motivation A storage system that attains high availability, performance and durability Decentralized

More information

Extreme Computing. NoSQL.

Extreme Computing. NoSQL. Extreme Computing NoSQL PREVIOUSLY: BATCH Query most/all data Results Eventually NOW: ON DEMAND Single Data Points Latency Matters One problem, three ideas We want to keep track of mutable state in a scalable

More information

Presented By: Devarsh Patel

Presented By: Devarsh Patel : Amazon s Highly Available Key-value Store Presented By: Devarsh Patel CS5204 Operating Systems 1 Introduction Amazon s e-commerce platform Requires performance, reliability and efficiency To support

More information

CompSci 516 Database Systems

CompSci 516 Database Systems CompSci 516 Database Systems Lecture 20 NoSQL and Column Store Instructor: Sudeepa Roy Duke CS, Fall 2018 CompSci 516: Database Systems 1 Reading Material NOSQL: Scalable SQL and NoSQL Data Stores Rick

More information

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems CompSci 516 Data Intensive Computing Systems Lecture 21 (optional) NoSQL systems Instructor: Sudeepa Roy Duke CS, Spring 2016 CompSci 516: Data Intensive Computing Systems 1 Key- Value Stores Duke CS,

More information

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy * Towards NewSQL Overview * Some History * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL * NoSQL Taxonomy *TowardsNewSQL NoSQL

More information

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos Instituto Politécnico de Tomar Introduction to Big Data NoSQL Databases Ricardo Campos Mestrado EI-IC Análise e Processamento de Grandes Volumes de Dados Tomar, Portugal, 2016 Part of the slides used in

More information

CS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved.

CS 138: Dynamo. CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. CS 138: Dynamo CS 138 XXIV 1 Copyright 2017 Thomas W. Doeppner. All rights reserved. Dynamo Highly available and scalable distributed data store Manages state of services that have high reliability and

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2016) Week 10: Mutable State (1/2) March 15, 2016 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage

Horizontal or vertical scalability? Horizontal scaling is challenging. Today. Scaling Out Key-Value Storage Horizontal or vertical scalability? Scaling Out Key-Value Storage COS 418: Distributed Systems Lecture 8 Kyle Jamieson Vertical Scaling Horizontal Scaling [Selected content adapted from M. Freedman, B.

More information

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao

Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI Presented by Xiang Gao Bigtable: A Distributed Storage System for Structured Data By Fay Chang, et al. OSDI 2006 Presented by Xiang Gao 2014-11-05 Outline Motivation Data Model APIs Building Blocks Implementation Refinement

More information

CA485 Ray Walshe NoSQL

CA485 Ray Walshe NoSQL NoSQL BASE vs ACID Summary Traditional relational database management systems (RDBMS) do not scale because they adhere to ACID. A strong movement within cloud computing is to utilize non-traditional data

More information

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation

Dynamo. Smruti R. Sarangi. Department of Computer Science Indian Institute of Technology New Delhi, India. Motivation System Architecture Evaluation Dynamo Smruti R. Sarangi Department of Computer Science Indian Institute of Technology New Delhi, India Smruti R. Sarangi Leader Election 1/20 Outline Motivation 1 Motivation 2 3 Smruti R. Sarangi Leader

More information

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis

BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis BigTable: A Distributed Storage System for Structured Data (2006) Slides adapted by Tyler Davis Motivation Lots of (semi-)structured data at Google URLs: Contents, crawl metadata, links, anchors, pagerank,

More information

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores

CSE 444: Database Internals. Lectures 26 NoSQL: Extensible Record Stores CSE 444: Database Internals Lectures 26 NoSQL: Extensible Record Stores CSE 444 - Spring 2014 1 References Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol. 39, No. 4)

More information

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13

Bigtable. A Distributed Storage System for Structured Data. Presenter: Yunming Zhang Conglong Li. Saturday, September 21, 13 Bigtable A Distributed Storage System for Structured Data Presenter: Yunming Zhang Conglong Li References SOCC 2010 Key Note Slides Jeff Dean Google Introduction to Distributed Computing, Winter 2008 University

More information

Dynamo: Key-Value Cloud Storage

Dynamo: Key-Value Cloud Storage Dynamo: Key-Value Cloud Storage Brad Karp UCL Computer Science CS M038 / GZ06 22 nd February 2016 Context: P2P vs. Data Center (key, value) Storage Chord and DHash intended for wide-area peer-to-peer systems

More information

Scaling Out Key-Value Storage

Scaling Out Key-Value Storage Scaling Out Key-Value Storage COS 418: Distributed Systems Logan Stafman [Adapted from K. Jamieson, M. Freedman, B. Karp] Horizontal or vertical scalability? Vertical Scaling Horizontal Scaling 2 Horizontal

More information

Haridimos Kondylakis Computer Science Department, University of Crete

Haridimos Kondylakis Computer Science Department, University of Crete CS-562 Advanced Topics in Databases Haridimos Kondylakis Computer Science Department, University of Crete QSX (LN2) 2 NoSQL NoSQL: Not Only SQL. User case of NoSQL? Massive write performance. Fast key

More information

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre NoSQL systems: introduction and data models Riccardo Torlone Università Roma Tre Leveraging the NoSQL boom 2 Why NoSQL? In the last fourty years relational databases have been the default choice for serious

More information

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017)

Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Big Data Infrastructure CS 489/698 Big Data Infrastructure (Winter 2017) Week 10: Mutable State (1/2) March 14, 2017 Jimmy Lin David R. Cheriton School of Computer Science University of Waterloo These

More information

10. Replication. Motivation

10. Replication. Motivation 10. Replication Page 1 10. Replication Motivation Reliable and high-performance computation on a single instance of a data object is prone to failure. Replicate data to overcome single points of failure

More information

Distributed File Systems II

Distributed File Systems II Distributed File Systems II To do q Very-large scale: Google FS, Hadoop FS, BigTable q Next time: Naming things GFS A radically new environment NFS, etc. Independence Small Scale Variety of workloads Cooperation

More information

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY WHAT IS NOSQL? Stands for No-SQL or Not Only SQL. Class of non-relational data storage systems E.g.

More information

Dynamo: Amazon s Highly Available Key-value Store

Dynamo: Amazon s Highly Available Key-value Store Dynamo: Amazon s Highly Available Key-value Store Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and

More information

DIVING IN: INSIDE THE DATA CENTER

DIVING IN: INSIDE THE DATA CENTER 1 DIVING IN: INSIDE THE DATA CENTER Anwar Alhenshiri Data centers 2 Once traffic reaches a data center it tunnels in First passes through a filter that blocks attacks Next, a router that directs it to

More information

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017

Distributed Systems. 16. Distributed Lookup. Paul Krzyzanowski. Rutgers University. Fall 2017 Distributed Systems 16. Distributed Lookup Paul Krzyzanowski Rutgers University Fall 2017 1 Distributed Lookup Look up (key, value) Cooperating set of nodes Ideally: No central coordinator Some nodes can

More information

CISC 7610 Lecture 2b The beginnings of NoSQL

CISC 7610 Lecture 2b The beginnings of NoSQL CISC 7610 Lecture 2b The beginnings of NoSQL Topics: Big Data Google s infrastructure Hadoop: open google infrastructure Scaling through sharding CAP theorem Amazon s Dynamo 5 V s of big data Everyone

More information

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases Key-Value Document Column Family Graph John Edgar 2 Relational databases are the prevalent solution

More information

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores Nikhil Dasharath Karande 1 Department of CSE, Sanjay Ghodawat Institutes, Atigre nikhilkarande18@gmail.com Abstract- This paper

More information

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals

References. What is Bigtable? Bigtable Data Model. Outline. Key Features. CSE 444: Database Internals References CSE 444: Database Internals Scalable SQL and NoSQL Data Stores, Rick Cattell, SIGMOD Record, December 2010 (Vol 39, No 4) Lectures 26 NoSQL: Extensible Record Stores Bigtable: A Distributed

More information

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng

Bigtable: A Distributed Storage System for Structured Data. Andrew Hon, Phyllis Lau, Justin Ng Bigtable: A Distributed Storage System for Structured Data Andrew Hon, Phyllis Lau, Justin Ng What is Bigtable? - A storage system for managing structured data - Used in 60+ Google services - Motivation:

More information

Big Data Analytics. Rasoul Karimi

Big Data Analytics. Rasoul Karimi Big Data Analytics Rasoul Karimi Information Systems and Machine Learning Lab (ISMLL) Institute of Computer Science University of Hildesheim, Germany Big Data Analytics Big Data Analytics 1 / 1 Outline

More information

Chapter 24 NOSQL Databases and Big Data Storage Systems

Chapter 24 NOSQL Databases and Big Data Storage Systems Chapter 24 NOSQL Databases and Big Data Storage Systems - Large amounts of data such as social media, Web links, user profiles, marketing and sales, posts and tweets, road maps, spatial data, email - NOSQL

More information

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu NoSQL Databases MongoDB vs Cassandra Kenny Huynh, Andre Chik, Kevin Vu Introduction - Relational database model - Concept developed in 1970 - Inefficient - NoSQL - Concept introduced in 1980 - Related

More information

Advanced Data Management Technologies

Advanced Data Management Technologies ADMT 2017/18 Unit 15 J. Gamper 1/44 Advanced Data Management Technologies Unit 15 Introduction to NoSQL J. Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE ADMT 2017/18 Unit 15

More information

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014 Cassandra @ Spotify Scaling storage to million of users world wide! Jimmy Mårdell October 14, 2014 2 About me Jimmy Mårdell Tech Product Owner in the Cassandra team 4 years at Spotify

More information

There is a tempta7on to say it is really used, it must be good

There is a tempta7on to say it is really used, it must be good Notes from reviews Dynamo Evalua7on doesn t cover all design goals (e.g. incremental scalability, heterogeneity) Is it research? Complexity? How general? Dynamo Mo7va7on Normal database not the right fit

More information

FAQs Snapshots and locks Vector Clock

FAQs Snapshots and locks Vector Clock //08 CS5 Introduction to Big - FALL 08 W.B.0.0 CS5 Introduction to Big //08 CS5 Introduction to Big - FALL 08 W.B. FAQs Snapshots and locks Vector Clock PART. LARGE SCALE DATA STORAGE SYSTEMS NO SQL DATA

More information

Bigtable. Presenter: Yijun Hou, Yixiao Peng

Bigtable. Presenter: Yijun Hou, Yixiao Peng Bigtable Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach Mike Burrows, Tushar Chandra, Andrew Fikes, Robert E. Gruber Google, Inc. OSDI 06 Presenter: Yijun Hou, Yixiao Peng

More information

Database Architectures

Database Architectures Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL

More information

Distributed Systems 16. Distributed File Systems II

Distributed Systems 16. Distributed File Systems II Distributed Systems 16. Distributed File Systems II Paul Krzyzanowski pxk@cs.rutgers.edu 1 Review NFS RPC-based access AFS Long-term caching CODA Read/write replication & disconnected operation DFS AFS

More information

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612

Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Bigtable: A Distributed Storage System for Structured Data by Google SUNNIE CHUNG CIS 612 Google Bigtable 2 A distributed storage system for managing structured data that is designed to scale to a very

More information

Sources. P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley

Sources. P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley Big Data and NoSQL Sources P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley Very short history of DBMSs The seventies: IMS end of the sixties, built for the Apollo program (today: Version 15)

More information

BigTable. CSE-291 (Cloud Computing) Fall 2016

BigTable. CSE-291 (Cloud Computing) Fall 2016 BigTable CSE-291 (Cloud Computing) Fall 2016 Data Model Sparse, distributed persistent, multi-dimensional sorted map Indexed by a row key, column key, and timestamp Values are uninterpreted arrays of bytes

More information

Distributed Hash Tables Chord and Dynamo

Distributed Hash Tables Chord and Dynamo Distributed Hash Tables Chord and Dynamo (Lecture 19, cs262a) Ion Stoica, UC Berkeley October 31, 2016 Today s Papers Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications, Ion Stoica,

More information

Distributed Systems [Fall 2012]

Distributed Systems [Fall 2012] Distributed Systems [Fall 2012] Lec 20: Bigtable (cont ed) Slide acks: Mohsen Taheriyan (http://www-scf.usc.edu/~csci572/2011spring/presentations/taheriyan.pptx) 1 Chubby (Reminder) Lock service with a

More information

Challenges for Data Driven Systems

Challenges for Data Driven Systems Challenges for Data Driven Systems Eiko Yoneki University of Cambridge Computer Laboratory Data Centric Systems and Networking Emergence of Big Data Shift of Communication Paradigm From end-to-end to data

More information

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems Jargons, Concepts, Scope and Systems Key Value Stores, Document Stores, Extensible Record Stores Overview of different scalable relational systems Examples of different Data stores Predictions, Comparisons

More information

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL

Building High Performance Apps using NoSQL. Swami Sivasubramanian General Manager, AWS NoSQL Building High Performance Apps using NoSQL Swami Sivasubramanian General Manager, AWS NoSQL Building high performance apps There is a lot to building high performance apps Scalability Performance at high

More information

EECS 498 Introduction to Distributed Systems

EECS 498 Introduction to Distributed Systems EECS 498 Introduction to Distributed Systems Fall 2017 Harsha V. Madhyastha Dynamo Recap Consistent hashing 1-hop DHT enabled by gossip Execution of reads and writes Coordinated by first available successor

More information

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5.

CS /15/16. Paul Krzyzanowski 1. Question 1. Distributed Systems 2016 Exam 2 Review. Question 3. Question 2. Question 5. Question 1 What makes a message unstable? How does an unstable message become stable? Distributed Systems 2016 Exam 2 Review Paul Krzyzanowski Rutgers University Fall 2016 In virtual sychrony, a message

More information

Data Acquisition. The reference Big Data stack

Data Acquisition. The reference Big Data stack Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Data Acquisition Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria Cardellini The reference

More information

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University

CS555: Distributed Systems [Fall 2017] Dept. Of Computer Science, Colorado State University CS 555: DISTRIBUTED SYSTEMS [DYNAMO & GOOGLE FILE SYSTEM] Frequently asked questions from the previous class survey What s the typical size of an inconsistency window in most production settings? Dynamo?

More information

SCALABLE CONSISTENCY AND TRANSACTION MODELS

SCALABLE CONSISTENCY AND TRANSACTION MODELS Data Management in the Cloud SCALABLE CONSISTENCY AND TRANSACTION MODELS 69 Brewer s Conjecture Three properties that are desirable and expected from realworld shared-data systems C: data consistency A:

More information

CS5412: OTHER DATA CENTER SERVICES

CS5412: OTHER DATA CENTER SERVICES 1 CS5412: OTHER DATA CENTER SERVICES Lecture V Ken Birman Tier two and Inner Tiers 2 If tier one faces the user and constructs responses, what lives in tier two? Caching services are very common (many

More information

CS5412: DIVING IN: INSIDE THE DATA CENTER

CS5412: DIVING IN: INSIDE THE DATA CENTER 1 CS5412: DIVING IN: INSIDE THE DATA CENTER Lecture V Ken Birman We ve seen one cloud service 2 Inside a cloud, Dynamo is an example of a service used to make sure that cloud-hosted applications can scale

More information

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques Recap Distributed Systems Case Study: Amazon Dynamo CAP Theorem? Consistency, Availability, Partition Tolerance P then C? A? Eventual consistency? Availability and partition tolerance over consistency

More information

Distributed Non-Relational Databases. Pelle Jakovits

Distributed Non-Relational Databases. Pelle Jakovits Distributed Non-Relational Databases Pelle Jakovits Tartu, 7 December 2018 Outline Relational model NoSQL Movement Non-relational data models Key-value Document-oriented Column family Graph Non-relational

More information

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis

Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis Advances in Data Management - NoSQL, NewSQL and Big Data A.Poulovassilis 1 NoSQL So-called NoSQL systems offer reduced functionalities compared to traditional Relational DBMSs, with the aim of achieving

More information

NoSQL Databases. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 4/22/15

NoSQL Databases. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 4/22/15 NoSQL Databases CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/22/15 Agenda Check-in NoSQL Databases Aggregate databases Key-value, document, and column family Graph databases Related

More information

Relational databases

Relational databases COSC 6397 Big Data Analytics NoSQL databases Edgar Gabriel Spring 2017 Relational databases Long lasting industry standard to store data persistently Key points concurrency control, transactions, standard

More information

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques

Recap. CSE 486/586 Distributed Systems Case Study: Amazon Dynamo. Amazon Dynamo. Amazon Dynamo. Necessary Pieces? Overview of Key Design Techniques Recap CSE 486/586 Distributed Systems Case Study: Amazon Dynamo Steve Ko Computer Sciences and Engineering University at Buffalo CAP Theorem? Consistency, Availability, Partition Tolerance P then C? A?

More information

Search Engines and Time Series Databases

Search Engines and Time Series Databases Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search Engines and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2017/18

More information

Cassandra- A Distributed Database

Cassandra- A Distributed Database Cassandra- A Distributed Database Tulika Gupta Department of Information Technology Poornima Institute of Engineering and Technology Jaipur, Rajasthan, India Abstract- A relational database is a traditional

More information

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS

CMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22

More information

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini

Large-Scale Key-Value Stores Eventual Consistency Marco Serafini Large-Scale Key-Value Stores Eventual Consistency Marco Serafini COMPSCI 590S Lecture 13 Goals of Key-Value Stores Export simple API put(key, value) get(key) Simpler and faster than a DBMS Less complexity,

More information

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm Final Exam Logistics CS 133: Databases Fall 2018 Lec 25 12/06 NoSQL Final exam take-home Available: Friday December 14 th, 4:00pm in Olin Due: Monday December 17 th, 5:15pm Same resources as midterm Except

More information

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines

Programming model and implementation for processing and. Programs can be automatically parallelized and executed on a large cluster of machines A programming model in Cloud: MapReduce Programming model and implementation for processing and generating large data sets Users specify a map function to generate a set of intermediate key/value pairs

More information

Migrating to Cassandra in the Cloud, the Netflix Way

Migrating to Cassandra in the Cloud, the Netflix Way Migrating to Cassandra in the Cloud, the Netflix Way Jason Brown - @jasobrown Senior Software Engineer, Netflix Tech History, 1998-2008 In the beginning, there was the webapp and a single database in a

More information

Integrity in Distributed Databases

Integrity in Distributed Databases Integrity in Distributed Databases Andreas Farella Free University of Bozen-Bolzano Table of Contents 1 Introduction................................................... 3 2 Different aspects of integrity.....................................

More information

Background. Distributed Key/Value stores provide a simple put/get interface. Great properties: scalability, availability, reliability

Background. Distributed Key/Value stores provide a simple put/get interface. Great properties: scalability, availability, reliability Background Distributed Key/Value stores provide a simple put/get interface Great properties: scalability, availability, reliability Increasingly popular both within data centers Cassandra Dynamo Voldemort

More information

6.830 Lecture Spark 11/15/2017

6.830 Lecture Spark 11/15/2017 6.830 Lecture 19 -- Spark 11/15/2017 Recap / finish dynamo Sloppy Quorum (healthy N) Dynamo authors don't think quorums are sufficient, for 2 reasons: - Decreased durability (want to write all data at

More information

CSE-E5430 Scalable Cloud Computing Lecture 9

CSE-E5430 Scalable Cloud Computing Lecture 9 CSE-E5430 Scalable Cloud Computing Lecture 9 Keijo Heljanko Department of Computer Science School of Science Aalto University keijo.heljanko@aalto.fi 15.11-2015 1/24 BigTable Described in the paper: Fay

More information

Non-Relational Databases. Pelle Jakovits

Non-Relational Databases. Pelle Jakovits Non-Relational Databases Pelle Jakovits 25 October 2017 Outline Background Relational model Database scaling The NoSQL Movement CAP Theorem Non-relational data models Key-value Document-oriented Column

More information

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS h_da Prof. Dr. Uta Störl Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe 2017 163 Performance / Benchmarks Traditional database benchmarks

More information

Search and Time Series Databases

Search and Time Series Databases Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria

More information

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414

10/18/2017. Announcements. NoSQL Motivation. NoSQL. Serverless Architecture. What is the Problem? Database Systems CSE 414 Announcements Database Systems CSE 414 Lecture 11: NoSQL & JSON (mostly not in textbook only Ch 11.1) HW5 will be posted on Friday and due on Nov. 14, 11pm [No Web Quiz 5] Today s lecture: NoSQL & JSON

More information

Exam 2 Review. October 29, Paul Krzyzanowski 1

Exam 2 Review. October 29, Paul Krzyzanowski 1 Exam 2 Review October 29, 2015 2013 Paul Krzyzanowski 1 Question 1 Why did Dropbox add notification servers to their architecture? To avoid the overhead of clients polling the servers periodically to check

More information

Distributed PostgreSQL with YugaByte DB

Distributed PostgreSQL with YugaByte DB Distributed PostgreSQL with YugaByte DB Karthik Ranganathan PostgresConf Silicon Valley Oct 16, 2018 1 CHECKOUT THIS REPO: github.com/yugabyte/yb-sql-workshop 2 About Us Founders Kannan Muthukkaruppan,

More information

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing

ΕΠΛ 602:Foundations of Internet Technologies. Cloud Computing ΕΠΛ 602:Foundations of Internet Technologies Cloud Computing 1 Outline Bigtable(data component of cloud) Web search basedonch13of thewebdatabook 2 What is Cloud Computing? ACloudis an infrastructure, transparent

More information

7680: Distributed Systems

7680: Distributed Systems Cristina Nita-Rotaru 7680: Distributed Systems BigTable. Hbase.Spanner. 1: BigTable Acknowledgement } Slides based on material from course at UMichigan, U Washington, and the authors of BigTable and Spanner.

More information

Cassandra Design Patterns

Cassandra Design Patterns Cassandra Design Patterns Sanjay Sharma Chapter No. 1 "An Overview of Architecture and Data Modeling in Cassandra" In this package, you will find: A Biography of the author of the book A preview chapter

More information