Database Availability and Integrity in NoSQL Fahri Firdausillah [M031010012]

What is NoSQL
- NoSQL stands for "Not Only SQL".
- It mostly addresses the following points: non-relational, distributed, horizontally scalable, schema-free, easy replication support, eventually consistent, and able to handle huge amounts of data.
- This presentation focuses on replication and horizontal scalability for database availability, then on eventual consistency and schema-free design for database integrity.

List of NoSQL Products
- Cassandra, used by: Digg, Facebook, Twitter, Reddit, Rackspace, Cloudkick, Cisco
- Hadoop, used by: Amazon Web Services, Pentaho, Yahoo!, The New York Times
- CouchDB, used by: CERN, BBC, Interactive Mediums
- MongoDB, used by: Foursquare, bit.ly, SourceForge, Fotopedia, Joomla Ads
- Riak, used by: Widescript, Western Communications, Ask Sponsored Listings

Database Availability Outline
- Database Availability Means
- CAP Theorem (BASE vs ACID)
- Partitioning and Replication
- Replication Diagram
- Ring of Consistent Hashing
- Next: Database Integrity

Database Availability Means
IBM divides database availability into three levels:
- High Availability (HA): the database and application are available during scheduled periods; the system is temporarily down during maintenance periods.
- Continuous Operation (CO): the system is available all the time, with no scheduled outages.
- Continuous Availability: a combination of HA and CO; data is always available, and maintenance is done without shutting down the system.
* Database Availability Considerations, IBM Redbook [2001]

CAP Theorem (1)
- Consistency, Availability, and Partition Tolerance.
- A shared-data system can have at most two of these three properties.
* Visual Guide to NoSQL Systems, Nathan Hurst, 2010 [8]

CAP Theorem (2)
- Consistent, Available (CA) systems have trouble with partitions and typically deal with them through replication.
- Consistent, Partition-Tolerant (CP) systems have trouble with availability while keeping data consistent across partitioned nodes.
- Available, Partition-Tolerant (AP) systems achieve eventual consistency through replication and verification.

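ACID and BASE
ACID
- Atomicity: all or nothing; a transaction either completes fully or has no effect.
- Consistency: every transaction must leave the tables in a valid state.
- Isolation: concurrent transactions are kept separate from one another.
- Durability: the database survives system failures.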

ACID and BASE cont'd
BASE
- Basically Available: the system seems to work all the time.
- Soft State: the system does not have to be consistent all the time.
- Eventually Consistent: the system becomes consistent at some later time.

Horizontal Scale
The data explosion (especially in web applications) forces database systems to scale.
- 1st solution: vertical scaling. Improve the server specification by adding more processors, RAM, and storage devices. This is limited and expensive.
- 2nd solution: horizontal scaling. Add more cheap computers as servers. This requires sharding and partitioning, which are hard and expensive to implement with relational databases (RDBMS).
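The sharding idea can be illustrated with a minimal sketch. It assumes the simplest possible scheme, hashing a key and taking the result modulo the number of servers; the names `shard_for` and `distribute` and the `user:N` keys are hypothetical and do not come from the slides.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(keys, num_shards):
    """Group keys by the shard that would store them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


keys = [f"user:{i}" for i in range(1000)]
shards = distribute(keys, 4)
for index, members in shards.items():
    print(f"shard {index}: {len(members)} keys")

# Adding a fifth server changes the modulus, so most keys map to a new shard.
before = {k: shard_for(k, 4) for k in keys}
after = {k: shard_for(k, 5) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved} of {len(keys)} keys change shard when going from 4 to 5 servers")
```

The large number of relocated keys is one reason modulo sharding is awkward to grow incrementally, and it motivates the ring-based partitioning discussed next.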

Partitioning & Replication
Partitioning
- Data is shared among different nodes (data hosts).
- Each node is placed on a ring.
- Advantage: the ability to scale incrementally.
- Issue: non-uniform data distribution.
Replication
- Multiple nodes
- Multiple datacenters
- High availability and durability

Data Replication

Ring of Consistent Hashing

When a New Node Join Network

When Existing Node Leaves Network
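The ring, replication, and node join/leave slides can be sketched in a few lines. This is an illustrative sketch, not the implementation of any particular product: the class `HashRing`, the node names, and the use of MD5 are assumptions, and the virtual nodes (`vnodes`) are an added technique commonly used to soften the non-uniform distribution issue mentioned earlier.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Place a string on the ring using a stable hash."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes      # virtual positions per physical node (assumption)
        self._positions = []      # sorted ring positions
        self._owner = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        """A node joins: it takes over several positions on the ring."""
        for i in range(self.vnodes):
            pos = ring_position(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        """A node leaves: its positions disappear and successors take its keys."""
        self._positions = [p for p in self._positions if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        if not self._positions:
            return []
        start = bisect.bisect_right(self._positions, ring_position(key))
        found = []
        for step in range(len(self._positions)):
            pos = self._positions[(start + step) % len(self._positions)]
            node = self._owner[pos]
            if node not in found:
                found.append(node)
                if len(found) == replicas:
                    break
        return found


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.preference_list(k, 1)[0] for k in keys}

ring.add_node("node-e")
after_join = {k: ring.preference_list(k, 1)[0] for k in keys}
moved_on_join = [k for k in keys if before[k] != after_join[k]]
print(f"{len(moved_on_join)} of {len(keys)} keys moved when node-e joined")

ring.remove_node("node-e")
after_leave = {k: ring.preference_list(k, 1)[0] for k in keys}
print("placement restored after node-e left:", after_leave == before)
```

When a node joins, only keys that now fall on the new node's positions move; every other key keeps its owner. The list returned by `preference_list` with `replicas=3` gives the distinct nodes that would hold copies of a key.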

Database Integrity Outline
- Database Integrity Means
- Do We Really Need Consistency?
- Eventually Consistent
- Variations of Eventual Consistency
- Problem in Strict Schema
- Schema-Free

Database Integrity Means
Ensuring that data entered into the database is accurate, valid, and consistent. There are three basic types of integrity constraints:
- Entity integrity: no two rows in a table may have the same identity.
- Domain integrity: data is restricted to predefined data types.
- Referential integrity: a related row must exist in another table, e.g. a customer for a given customer ID.
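The three constraint types can be seen in action with a small relational example. The sketch below uses Python's built-in `sqlite3` module; the `customer` and `purchase` tables and the helper `try_insert` are hypothetical names chosen for illustration. Because SQLite is loosely typed, domain integrity is approximated here with a `CHECK` constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default

conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,          -- entity integrity
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE purchase (
        purchase_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL
                    REFERENCES customer(customer_id),   -- referential integrity
        amount      REAL NOT NULL
                    CHECK (typeof(amount) IN ('integer', 'real')
                           AND amount >= 0)             -- domain integrity
    )""")


def try_insert(sql, params):
    """Run one insert in its own transaction and report whether it was accepted."""
    try:
        with conn:
            conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = {
    "valid customer": try_insert("INSERT INTO customer VALUES (?, ?)", (1, "Alice")),
    "duplicate id": try_insert("INSERT INTO customer VALUES (?, ?)", (1, "Bob")),
    "text as amount": try_insert("INSERT INTO purchase VALUES (?, ?, ?)", (1, 1, "lots")),
    "unknown customer": try_insert("INSERT INTO purchase VALUES (?, ?, ?)", (2, 99, 10.0)),
    "valid purchase": try_insert("INSERT INTO purchase VALUES (?, ?, ?)", (3, 1, 10.0)),
}
for label, outcome in results.items():
    print(f"{label}: {outcome}")
```

Each rejected insert corresponds to one of the three constraint types; the two valid inserts go through.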

Do We Really Need Consistency?
- In strict OLTP environments (e.g. banking and ERP), data consistency is the heart of the system.
- But even at Amazon (e-commerce), real-time consistency is not really needed.
- In large shared-data environments such as Facebook, Digg, Yahoo, and Google, data consistency can be relaxed.
- Systems that enforce strong ACID guarantees tend to perform poorly at this scale.

Eventually Consistent
- A specific form of weak consistency.
- If no new updates are made, eventually all accesses will return the last updated value.
- The system does not guarantee that subsequent accesses will return the updated value.
- A number of conditions must be met before the updated value is returned.
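A toy simulation makes the behaviour concrete. This is a sketch under stated assumptions: a single global counter orders the writes (last writer wins), and propagation is triggered by an explicit `propagate()` call rather than a network. The `Replica` and `Cluster` classes are hypothetical.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def apply(self, key, version, value):
        """Accept an update only if it is newer than what is stored."""
        current_version = self.data.get(key, (0, None))[0]
        if version > current_version:
            self.data[key] = (version, value)

    def read(self, key):
        return self.data.get(key, (0, None))[1]


class Cluster:
    def __init__(self, names):
        self.replicas = [Replica(n) for n in names]
        self.pending = []  # updates not yet delivered to other replicas
        self.clock = 0     # assumption: one global counter orders all writes

    def write(self, key, value, coordinator=0):
        """Apply the write on one replica now and queue it for the others."""
        self.clock += 1
        self.replicas[coordinator].apply(key, self.clock, value)
        for i, replica in enumerate(self.replicas):
            if i != coordinator:
                self.pending.append((replica, key, self.clock, value))

    def propagate(self):
        """Deliver every queued update (stands in for background replication)."""
        while self.pending:
            replica, key, version, value = self.pending.pop(0)
            replica.apply(key, version, value)


cluster = Cluster(["r1", "r2", "r3"])
cluster.write("cart", "book")

stale_reads = [r.read("cart") for r in cluster.replicas]
print("before propagation:", stale_reads)

cluster.propagate()
converged_reads = [r.read("cart") for r in cluster.replicas]
print("after propagation: ", converged_reads)
```

Between the write and the propagation step, a read may return the old value depending on which replica answers; once updates stop and propagation finishes, all replicas agree.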

Variations of Eventual Consistency
- Causal consistency
- Read-your-writes consistency
- Session consistency
- Monotonic read consistency
- Monotonic write consistency
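Two of these variations, read-your-writes and monotonic reads, can be sketched with a client-side session that remembers the newest version it has seen. This is one possible approach shown for illustration only; the `Replica` and `Session` classes and the integer version numbers are assumptions.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        """Keep the update only if it is newer than the stored one."""
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Client-side session tracking the newest version it has written or read."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0

    def write(self, replica_index, version, value):
        self.replicas[replica_index].apply(version, value)
        self.min_version = max(self.min_version, version)

    def read(self):
        # Skip replicas that are older than what this session has already seen.
        for replica in self.replicas:
            if replica.version >= self.min_version:
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no replica is fresh enough for this session")


replicas = [Replica(), Replica()]
session = Session(replicas)

# The write lands on the second replica; the first one is still stale.
session.write(replica_index=1, version=1, value="v1")

naive_read = replicas[0].value   # a plain read from the stale replica
session_read = session.read()    # the session skips the stale replica
print("naive read:", naive_read)
print("session read:", session_read)
```

Because the session never accepts a replica older than `min_version`, it always sees its own write (read-your-writes) and never goes back to an older value after reading a newer one (monotonic reads).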

Problem in Strict Schema
- Agile methodology is about adapting to change.
- Dynamic frameworks (e.g. Ruby on Rails, Django, Grails, and Symfony) are now widely used.
- In many cases it is hard to migrate across databases.
- Adding more columns leaves null values in previous records.

Schema-Free
- Columns can be added at the row level; rows are not restricted to a fixed, table-wide set of columns.
- Each row uses only the columns it needs (saving space).
- All we need to do is define a namespace for the tables.
- After that, we can simply add columns, or even nest another table inside a particular column.
- No more integration headaches.
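The row-level column idea can be sketched with plain dictionaries. This is a hypothetical in-memory model, not the API of Cassandra or any other product; the `Namespace` class and the sample rows are invented for illustration.

```python
from collections import defaultdict


class Namespace:
    """A named collection of rows; each row holds only the columns it needs."""

    def __init__(self, name):
        self.name = name
        self.rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, row_key, column, value):
        """Add or overwrite one column on one row; no schema change is needed."""
        self.rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self.rows.get(row_key, {}))


users = Namespace("users")
users.put("alice", "email", "alice@example.com")
users.put("bob", "email", "bob@example.com")
users.put("bob", "twitter", "@bob")
# A column can itself hold a nested table-like structure.
users.put("bob", "addresses", {"home": {"city": "Semarang"}, "work": {"city": "Melaka"}})

print(users.columns("alice"))
print(users.columns("bob"))
print(users.get("bob", "addresses")["home"]["city"])
```

The row for `alice` stores one column and the row for `bob` stores three, so no row carries null placeholders for columns it does not use.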

Conclusion & (not a) Summary
- NoSQL is yet another form of database.
- NoSQL does not intend to replace the RDBMS. It is a database alternative for large shared-data environments.
- Relaxing consistency boosts database availability and performance.
- There is no free lunch and no silver bullet in database technologies.

Thank You