Distributed Data Store


Junghoo "John" Cho (cho@cs.ucla.edu)

Large-Scale Distributed File System

- Q: What if we have too much data to store on a single machine?
- Q: How can we create one big filesystem over a cluster of machines, whose data is reliable despite individual node failures?
- Basic idea:
  - Split the data into small chunks
  - Distribute the chunks to nodes
  - Replicate each chunk at multiple nodes to deal with failure
  - A master server holds the filename -> chunk mapping

GFS (HDFS)

[GFS diagram: a client, a master, and several chunkservers]

- Provides a file system (global namespace) over a cluster of machines
- A file is split into (often) 64 MB chunks
- Each chunk is replicated on multiple chunkservers
- GFS master + chunk servers
  - The master provides
    - the file name -> chunk server mapping
    - chunk server monitoring and replacement
  - A chunk server provides the actual data chunk

- The best chunkserver is chosen to minimize network bandwidth consumption
- Optimized for
  - Sequential/random reads of large files
  - Appends at the end of a file
- Synchronizing random writes is much more costly in the presence of failures
- Q: Is a single master server enough to deal with the huge filesystem?
  - Ex) 1 billion files, 64-byte name per file, 4 GB per file. How much space is needed to remember the mapping?

SAN (Storage Area Network)

- Scale-up approach for disks
- Provides a SCSI interface over many hard disks (often through an optical LAN)
- All nodes in the cluster see one big shared disk
- Convenient but very expensive

NoSQL datastores

Limitations of relational databases

- Relation: very general. But is it the best data representation?
  - e.g., Java objects
- SQL: very expressive query language. But is this expressiveness necessary?
  - e.g., student records. How will they be accessed?
  - The power of SQL also leads to complexity. Do we really need it?
- It is very difficult to reimplement a full relational database on a cluster architecture
- Need for a new system that is designed from the ground up for
  - Ultimate scalability in a highly distributed environment
  - Core functionality necessary for new Web apps
- ACID: very strong data integrity guarantee. But is it necessary?
  - e.g., a user's status updates. Is ACID necessary?

Difficulty of maintaining a shared state among distributed nodes

Two Generals problem (a simpler relative of the Byzantine Generals problem)

- Two generals A, B need to attack together to win

- A messenger may be lost during delivery (say, with 90% chance)
- Q: How can they make sure they attack simultaneously on Sunday midnight?
- Q: Any problem?
  - A sends a messenger to B
  - A attacks after sending the messenger
  - B attacks once he hears from the messenger
- Q: Any better?
  - A sends a messenger to B
  - B sends an ACK messenger back to A
  - A attacks if he gets the ACK from B
  - B attacks if he hears from A's messenger
- Q: Any better?
  - A sends a messenger. B ACKs. A ACKs back to B.
- Q: How to improve the chance of success?

Remarks

- Keeping states on multiple nodes synchronized is often VERY COSTLY.
  - Synchronization overhead outweighs the benefit for more than a few nodes
- If possible, relax the state synchronization guarantee on multiple nodes
  - Otherwise, the overall performance will be too bad compared to centralized processing

CAP theorem (by Eric Brewer)

- Consistency: after an update, all readers will see the latest update
- Availability: continuous operation despite node failure
- Partition tolerance: continuous operation despite network partition
- At most two of the three characteristics can be guaranteed by any system

Two important questions:

1. Q: What data model to use?
   - (Key, value) store
   - Column store
   - Document store
2. Q: How to relax the consistency guarantee?

MongoDB

- Atomicity guarantee on individual operations

- No transaction support across multiple operations (true of the MongoDB releases of the time; MongoDB 4.0 and later add multi-document transactions)
- The application is responsible for concurrency control

ACID vs. BASE

- Basically Available: the system works basically all the time
- Soft state: does not have to be consistent all the time
- Eventual consistency: will be in a consistent state eventually

Possible consistency models

- Read Your Own Writes (RYOW) consistency
- Session consistency
- Monotonic-read consistency: readers will see only newer updates in the future
- ...

(Key, value) store

- Example: Amazon SimpleDB, Redis, Memcached, ...
- Example scenario
  - Key: student id; Value: (student name, address, email, major, ...)
  - Q: How to process "find all student info with sid 301"?
  - Q: How to process "find student email with sid 301"?
  - Q: How to process "change the major of sid 301 to CS"?
  - Q: How to process "find the student whose email is cho@cs"?

Remarks

- Reads can be inefficient for single-field retrieval
- Updates can be inefficient
- Data access using a non-primary key is not supported

  - Need to create and maintain a separate index for non-key access

Column store

- Example: Google's Bigtable, Cassandra (Facebook), ...
- Each data item is a row in a table
  - Each row has a unique row-key, but may not have all column values
- Columns are grouped as column families
  - Preserves locality of column access and storage
- Data is accessed through (row-key, column-name) -> value
- Compared to a (key, value) store, more structure of the data is exposed to the system
  - More efficient update and retrieval
- Q: What about retrieval based on column values only, without the row-key (like student by email)?
  - Data access is still limited to the row-key

Document store

- Example: MongoDB, Amazon DynamoDB, CouchDB
- A document consists of multiple fields, which are (key, value) pairs
- No preset schema for documents
- Example document:
    "Title": "CouchDB"
    "Categories": ["Database", "NoSQL", "Document Database"]
    "Date": "Jan 10, 2011"
    "Author": ...
- User specifies the fields on which to build an index
  - Allows data retrieval based on non-key fields (by using appropriate indexes)
- But this data store will be more complex to build than others and less scalable
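To make the difference between key-only access and indexed access concrete, here is a minimal in-memory sketch of a document store with user-specified indexes. The class `TinyDocumentStore` and its methods are hypothetical names made up for illustration; this is not how MongoDB or CouchDB is implemented, and it assumes that indexed field values are hashable scalars (strings, numbers).

```python
from collections import defaultdict


class TinyDocumentStore:
    """Toy document store: schemaless dicts plus optional secondary indexes."""

    def __init__(self):
        self.docs = {}      # primary key -> document (a dict)
        self.indexes = {}   # field name -> {field value -> set of primary keys}

    def create_index(self, field):
        """Build an index on `field` over all existing documents."""
        index = defaultdict(set)
        for key, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(key)
        self.indexes[field] = index

    def put(self, key, doc):
        """Insert or overwrite a document, keeping every index in sync."""
        old = self.docs.get(key)
        for field, index in self.indexes.items():
            if old is not None and field in old:
                index[old[field]].discard(key)   # drop the stale entry
            if field in doc:
                index[doc[field]].add(key)
        self.docs[key] = doc

    def get(self, key):
        """Primary-key lookup, the only access path of a plain (key, value) store."""
        return self.docs.get(key)

    def find(self, field, value):
        """Return the documents whose `field` equals `value`."""
        if field in self.indexes:
            keys = self.indexes[field].get(value, set())
            return [self.docs[k] for k in sorted(keys)]
        # No index on this field: fall back to scanning every document.
        return [d for d in self.docs.values() if d.get(field) == value]


store = TinyDocumentStore()
store.put(301, {"name": "A", "email": "cho@cs", "major": "EE"})
store.put(302, {"name": "B", "email": "b@cs"})   # no "major": no preset schema
store.create_index("email")
by_email = store.find("email", "cho@cs")         # served by the index
```

The cost the notes mention shows up in `put`: every write has to update each index, which is part of why such a store is more complex to build and harder to scale than a plain (key, value) store.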

Consistent hashing

- Q: How to distribute objects with different keys to k nodes?
  - hash(object) mod k? What about ...
- Q: How to minimize data reorganization when a new node is introduced?
- Consistent hashing
  - Hash BOTH objects and nodes to a hash ring
  - Assign objects to the node right behind them
  - Q: What to do when a new node is introduced?
- Q: Non-uniform hash range for each node. How to minimize load imbalance?
  - Hash many virtual nodes to the ring
  - Assign virtual nodes to physical nodes
- Q: How to make data available despite node failure?
  - Replication to the next k nodes

Distributed Transactions

- Q: How can we guarantee atomicity of a transaction on multiple machines?
  - Either everyone commits or no one should commit.

Two-phase commit

- Before commit, ask everyone whether they are ready
  - PREPARE -> VOTE-COMMIT/VOTE-ABORT

  - Note: Anyone who said ready cannot say otherwise later
- If everyone says yes, commit; if anyone says no, abort

    Coordinator                                      Participant
    -----------                                      -----------
    Initial
       |  ---------------- PREPARE ---------------->  Initial
    Wait                                                 |  ready?
       |  <--------------- VOTE-ABORT --------------    N: Abort
       |  <--------------- VOTE-COMMIT -------------    Y: Ready
    any "no" vote?                                       |
       Y: Abort   -------- GLOBAL-ABORT ----------->    Abort
       N: Commit  -------- GLOBAL-COMMIT ---------->    Commit

- Q: Any potential problem with two-phase commit?
- Q: Where should we add timeouts to avoid an indefinite wait?
- Remark: Do not confuse two-phase commit with two-phase locking

Two-phase commit can be used for managing distributed transactions in general

- Example: A travel site that arranges both flight and hotel
  - User wants to book flight F and hotel H using credit card C
  - The site needs to coordinate the transaction with banks, hotels, and airlines
- Q: How should the site handle the booking?
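One possible answer to the booking question is to run two-phase commit with the travel site as coordinator. The following is a minimal single-process sketch under strong assumptions: no messages are lost, nobody crashes, and there are no timeouts. `Participant`, `Vote`, and `two_phase_commit` are hypothetical names for illustration, and the `can_commit` flag simply stands in for whatever local check a bank, airline, or hotel would perform.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "VOTE-COMMIT"
    ABORT = "VOTE-ABORT"


class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # assumption: result of the local readiness check
        self.state = "INITIAL"

    def prepare(self):
        """Phase 1: answer PREPARE with a vote."""
        if self.can_commit:
            self.state = "READY"       # promise: a READY participant cannot back out
            return Vote.COMMIT
        self.state = "ABORT"           # a "no" voter may abort on its own right away
        return Vote.ABORT

    def finish(self, decision):
        """Phase 2: apply the coordinator's GLOBAL-COMMIT or GLOBAL-ABORT."""
        if self.state == "READY":
            self.state = decision


def two_phase_commit(participants):
    # Phase 1: send PREPARE to everyone and collect the votes.
    votes = [p.prepare() for p in participants]
    # Commit only if every single vote is VOTE-COMMIT.
    decision = "COMMIT" if all(v is Vote.COMMIT for v in votes) else "ABORT"
    # Phase 2: broadcast the global decision.
    for p in participants:
        p.finish(decision)
    return decision


booking = [Participant("bank"), Participant("airline"), Participant("hotel")]
outcome = two_phase_commit(booking)

sold_out = [Participant("bank"), Participant("airline"),
            Participant("hotel", can_commit=False)]
failed_outcome = two_phase_commit(sold_out)
```

In the second run the hotel votes no, so the bank and the airline, which were already READY, are told to abort and the card is never charged. What the sketch leaves out is exactly what the questions above point at: a READY participant that never hears the global decision is stuck until a timeout or recovery protocol resolves it.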

- Q: What if the credit card authorization fails? What if the flight is no longer available? What if the hotel is no longer available?

Asynchronous transaction

- Q: What if one participant is very slow?
  - Example: Starbucks. The cashier is faster; the barista is slower.
  - Q: What does two-phase commit mean in this scenario?
  - Q: How can we let each participant go ahead without waiting for the slow one?
  - Q: What does Starbucks do?
- Asynchronous transaction
  - Each participant commits whenever he is done and moves ahead
  - Transaction = sequence of smaller transactions by each participant
  - The entire transaction is done when every participant commits
  - No coordinated wait and synchronous commit
- Q: What if the coffee machine breaks down after the customer has paid?
- Compensating transaction
  - A transaction that rolls back a committed transaction
  - The coordinator should keep track of the dependency of transactions
    - Together with their compensating transactions
  - If any transaction aborts, run the compensating transactions for all committed transactions
- Q: When should we use two-phase commit vs. asynchronous transaction?
  - Importance of individual commit guarantee

  - Duration of individual transaction
  - Probability of abort
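The compensating-transaction idea can be sketched in a few lines. This is an illustration only, assuming each step is a plain function that either returns (commits) or raises (aborts), and that compensating actions themselves never fail. `run_with_compensation` and the Starbucks-style step functions are hypothetical names; this style of coordination is often called the saga pattern.

```python
def run_with_compensation(steps):
    """Run (name, action, compensate) steps in order, each committing on its own.

    If an action raises, run the compensating action of every step that has
    already committed, most recent first, and report failure.
    """
    committed = []   # (name, compensate) for every step that has committed
    log = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"{name}: aborted")
            for done_name, undo in reversed(committed):
                undo()
                log.append(f"{done_name}: compensated")
            return False, log
        committed.append((name, compensate))
        log.append(f"{name}: committed")
    return True, log


balance = {"customer": 10}


def charge():
    balance["customer"] -= 4       # the cashier commits immediately


def refund():
    balance["customer"] += 4       # compensating transaction for charge()


def brew():
    raise RuntimeError("coffee machine broke down")


ok, log = run_with_compensation([
    ("payment", charge, refund),
    ("coffee", brew, lambda: None),
])
```

Here the payment commits without waiting for the barista, and the refund runs only because the later step aborted. Between those two moments the customer's balance is visibly lower, which is the relaxed guarantee that asynchronous transactions trade for not waiting on the slowest participant.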