Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. * Towards NewSQL

NoSQL DBMSes

Overview
* Some History
* What is NoSQL?
* Why NoSQL?
* RDBMS vs NoSQL
* NoSQL Taxonomy
* Towards NewSQL

Some History
* 1970s: Relational databases invented
  * Fixed schema
  * Data is normalized
  * Expensive storage
  * Data abstracted away from apps
* 1980s: RDBMS commercialized
  * Client/Server model
  * SQL becomes a standard
* 1990s: Something new
  * 3-tier architecture
  * Rise of the Internet
* 2000s
  * Web 2.0
  * Rise of social media and e-commerce
  * Huge increase in collected data
  * Constant decrease of hardware prices

What is NoSQL?
The definition of NoSQL has evolved over time.
* Initially (2009) it was intended as "Absolutely No SQL"
  * Most of the features offered by RDBMSes (global ACID, join, SQL, ...) were considered useless or at least unnecessarily heavy
* Later on it became "Not only SQL"
  * Some RDBMS features are recognized as useful, or even necessary, in many real application scenarios

What is NoSQL?
There is no full agreement, but nowadays the definition of NoSQL can be summarized as follows:
* Next-generation databases addressing some of these points:
  * non-relational
  * schema-free
  * no join
  * distributed
  * horizontally scalable with easy replication support
  * eventually consistent (clarified in the CAP Theorem slides)
  * open source

Why NoSQL?
NoSQL databases first started out as in-house solutions to real problems:
* Amazon's Dynamo
* Google's BigTable
* LinkedIn's Voldemort
* Facebook's Cassandra
* Yahoo!'s PNUTS

Why NoSQL? cont.
The listed companies didn't start off by rejecting relational technologies. They tried them and found that they didn't meet their requirements:
* Huge volume of concurrent transactions
* Expectations of low-latency access to massive datasets
* Expectations of nearly perfect service availability while operating in an unreliable environment

Why NoSQL? cont.
They tried the traditional approach:
* Adding more hardware
* Upgrading to faster hardware as it became available
...and when that didn't work they tried to scale existing relational solutions:
* Simplifying the DB schema
* De-normalization
* Introducing numerous query caching layers
* Separating read-only from write-dedicated replicas
* Data partitioning

CAP Theorem
Formulated in 2000 by Eric Brewer.
It is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
* Consistency (all nodes always see the same data at the same time)
* Availability (every request always receives a response about whether it was successful or failed)
* Partition Tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)

CAP Theorem and NoSQL
Most NoSQL database system architectures favour partition tolerance and availability over strong consistency.
Eventual Consistency: inconsistencies between data held by different nodes are transitory. Eventually all nodes in the system will receive the latest consistent updates.
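The idea of eventual consistency can be illustrated with a toy simulation. This is only a sketch under stated assumptions: the `Replica` class, the ring-shaped `sync_round` exchange, and the "highest version wins" rule are hypothetical choices made for illustration and are not taken from any particular NoSQL system.

```python
class Replica:
    """One node holding a versioned copy of each key (highest version wins)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync_round(replicas):
    """Every replica sends a snapshot of its data to its ring neighbour."""
    snapshots = [dict(r.data) for r in replicas]
    for i, snapshot in enumerate(snapshots):
        neighbour = replicas[(i + 1) % len(replicas)]
        for key, (version, value) in snapshot.items():
            neighbour.write(key, version, value)


def converged(replicas, key):
    """True once every replica returns the same value for the key."""
    return len({r.read(key) for r in replicas}) == 1


replicas = [Replica() for _ in range(5)]
replicas[0].write("cart:42", version=1, value="1 item")
stale_read = replicas[3].read("cart:42")  # this node has not seen the write yet

rounds = 0
while not converged(replicas, "cart:42"):
    sync_round(replicas)
    rounds += 1
```

Between the write and the last synchronization round, a client reading from a different node can see stale data (`stale_read` above); once the updates have propagated, all nodes agree.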

RDBMS vs NoSQL
* RDBMSes enforce global ACID properties, thus allowing multiple arbitrary operations in the context of a single transaction.
* NoSQL databases enforce only local BASE properties:
  * Basically Available (data is always perceived as available by the user)
  * Soft State (data at some node could change without any explicit user intervention; this follows from eventual consistency)
  * Eventually Consistent (NoSQL guarantees consistency only at some undefined future time)

RDBMS vs NoSQL [figure]

NoSQL Taxonomy
* Key/Value Store
  * Amazon's Dynamo, LinkedIn's Voldemort, FoundationDB, ...
* Document Store
  * MongoDB, CouchDB, ...
* Column Store
  * Google's Bigtable, Apache's HBase, Facebook's Cassandra, ...
* Graph Store
  * Neo4J, InfiniteGraph, ...

RDBMS Data

id | name | surname | office
1  | Tom  | Smith   | 41
2  | John | Doe     | 42
3  | Ann  | Smith   | 41

id | building | tel
41 | A4       | 45798
42 | B7       | 12349
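To make the relational baseline concrete, the two tables above can be loaded into an in-memory SQLite database and combined with a join. This is a minimal sketch: the table names `employee` and `office` are assumptions, since the slide does not name the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE office (id INTEGER PRIMARY KEY, building TEXT, tel TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        surname TEXT,
        office INTEGER REFERENCES office(id)
    );
    INSERT INTO office VALUES (41, 'A4', '45798'), (42, 'B7', '12349');
    INSERT INTO employee VALUES
        (1, 'Tom', 'Smith', 41),
        (2, 'John', 'Doe', 42),
        (3, 'Ann', 'Smith', 41);
""")

# Office details are stored once; the join reassembles them per employee.
rows = conn.execute("""
    SELECT e.name, e.surname, o.building, o.tel
    FROM employee AS e
    JOIN office AS o ON e.office = o.id
    ORDER BY e.id
""").fetchall()
```

Because the data is normalized, the office details live in one place and are attached to each employee only at query time. The NoSQL models in the following slides store related data together instead.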

Key/Value Store
* Global collection of key/value pairs. Every item in the database is stored as an attribute name (key) together with its associated value
* Every key is associated with exactly one value. No duplicates
* The value is simply a binary object. The DB does not associate any structure with stored values
* Designed to handle massive loads of data
* Inspired by Distributed Hash Tables

Key/Value Store

Key        | Value
employee 1 | name@tom-surn@smith-offi@41-buil@A4-tel@45798
employee 2 | name@john-surn@doe-offi@42-buil@B7-tel@12349
employee 3 | name@ann-surn@smith
office 41  | buil@A4-tel@45798
office 42  | buil@B7-tel@12349
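A key/value store can be sketched in a few lines. The `KeyValueStore` class and the `encode`/`decode` helpers below are hypothetical names used only for illustration; the `field@value-field@value` layout mimics the table above and assumes that values contain neither `@` nor `-`.

```python
class KeyValueStore:
    """Maps each key to exactly one opaque binary value."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("the store only accepts opaque bytes")
        self._items[key] = value  # a second put on the same key overwrites

    def get(self, key):
        return self._items.get(key)


def encode(fields):
    """Client-side serialization; the store never looks inside the result."""
    return "-".join(f"{name}@{value}" for name, value in fields.items()).encode("utf-8")


def decode(blob):
    """Client-side parsing of a value produced by encode()."""
    pairs = (part.split("@", 1) for part in blob.decode("utf-8").split("-"))
    return {name: value for name, value in pairs}


store = KeyValueStore()
store.put("employee 1", encode({"name": "tom", "surn": "smith", "offi": "41"}))
store.put("office 41", encode({"buil": "A4", "tel": "45798"}))

# Following the reference from employee to office takes two lookups,
# and all interpretation of the bytes happens in the application.
employee = decode(store.get("employee 1"))
office = decode(store.get("office " + employee["offi"]))
```

Lookups by key are simple and fast, but any question such as "who works in building A4?" has to be answered by the application, because the store cannot interpret the values.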

JSON
* Stands for JavaScript Object Notation
* Syntax for storing and exchanging text information
* Uses JavaScript syntax but is language and platform independent
* Much like XML, but smaller, faster and easier to parse than XML (and human readable)
* Has basic data types (Number, String, Boolean) and supports data structures such as objects and arrays

JSON

    {
      "employees": [
        { "firstname": "John", "lastname": "Doe" },
        { "firstname": "Peter", "lastname": "Jones" }
      ]
    }

The "employees" member is an array of two employee records (objects).
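As a small illustration of JSON's language independence, the same text can be parsed with Python's standard `json` module. The variable names are arbitrary; the field names follow the slide.

```python
import json

text = """
{
  "employees": [
    { "firstname": "John", "lastname": "Doe" },
    { "firstname": "Peter", "lastname": "Jones" }
  ]
}
"""

data = json.loads(text)        # JSON object -> dict, JSON array -> list
employees = data["employees"]
last_names = [record["lastname"] for record in employees]

# Serializing and parsing again yields the same structure.
round_trip = json.loads(json.dumps(data))
```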

Document Store
* Same as a Key/Value Store, but each key is paired with an arbitrarily complex data structure known as a document.
* Documents may contain many different key-value pairs, key-array pairs, or even nested documents (like a JSON object).
* Data in documents can be understood by the DB: querying data is possible by means other than just a key (selection and projection over results are possible).

Document Store

Key: employee 1

    { "id": 1, "name": "Tom", "surname": "Smith", "office": { "id": 41, "building": "A4", "telephone": 45798 } }

Key: office 1

    { "id": 41, "building": "A4", "telephone": 45798 }
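Selection and projection over documents can be sketched as follows. This is a toy model, not the query engine of any real document store: the `find` function, the dotted-path convention, and the sample records for employees 2 and 3 are assumptions made for illustration.

```python
def get_path(doc, path):
    """Follow a dotted path such as 'office.building' into nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, query, projection=None):
    """Return documents whose fields equal the query values (selection),
    optionally reduced to the listed fields (projection)."""
    results = []
    for doc in collection.values():
        if all(get_path(doc, path) == wanted for path, wanted in query.items()):
            if projection is None:
                results.append(doc)
            else:
                results.append({path: get_path(doc, path) for path in projection})
    return results


employees = {
    "employee 1": {"id": 1, "name": "Tom", "surname": "Smith",
                   "office": {"id": 41, "building": "A4", "telephone": 45798}},
    "employee 2": {"id": 2, "name": "John", "surname": "Doe",
                   "office": {"id": 42, "building": "B7", "telephone": 12349}},
    "employee 3": {"id": 3, "name": "Ann", "surname": "Smith"},  # no office stored
}

in_a4 = find(employees, {"office.building": "A4"}, projection=["name", "surname"])
smiths = find(employees, {"surname": "Smith"}, projection=["id"])
```

Unlike the key/value case, the store itself can answer questions about the contents of a value, because it understands the document structure.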

Column Store
* A sparse, distributed, multi-dimensional sorted map
* Stores rows of data in a similar fashion to typical RDBMSes
* Rows are contained within Column Families. Column Families can be considered as tables in RDBMSes
* Unlike a table in an RDBMS, a Column Family can have different columns for each row it contains
* Each row is identified by a key that is unique in the context of a single Column Family. The same key can however be re-used within other Column Families, so it is possible to store unrelated data about the same key in different Column Families
* Each column is simply a key/value pair

Column Store cont.
* It is also possible to organize data in Super Columns, that is, columns whose values are themselves columns
* Usually data from the same Column Family is stored contiguously on disk (and consequently on the same node of the network)

Column Store

ColumnFamily: Employees

Key        | id | name | surname | office id | buil. | tel.
employee 1 | 1  | Tom  | Smith   | 41        | A4    | 45798

Key        | id | name | surname
employee 3 | 3  | Anna | Smith

Key        | id | name | surname | office id | buil.
employee 2 | 2  | John | Doe     | 42        | B7
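The "sparse sorted map" view of a Column Family can be sketched with nested dictionaries. The `ColumnFamily` class and the column names `office_id`, `buil` and `tel` are hypothetical; real column stores add timestamps, persistence and distribution, which are left out here.

```python
from bisect import bisect_left


class ColumnFamily:
    """Row key -> {column name -> value}; each row may have different columns."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def put(self, row_key, column, value):
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        return sorted(self._rows.get(row_key, {}))

    def scan(self, start, stop):
        """Row keys in sorted order with start <= key < stop."""
        keys = sorted(self._rows)
        return keys[bisect_left(keys, start):bisect_left(keys, stop)]


employees = ColumnFamily("Employees")
rows = {
    "employee 1": {"id": 1, "name": "Tom", "surname": "Smith",
                   "office_id": 41, "buil": "A4", "tel": 45798},
    "employee 3": {"id": 3, "name": "Anna", "surname": "Smith"},
    "employee 2": {"id": 2, "name": "John", "surname": "Doe",
                   "office_id": 42, "buil": "B7"},
}
for row_key, cols in rows.items():
    for column, value in cols.items():
        employees.put(row_key, column, value)

first_two = employees.scan("employee 1", "employee 3")
missing_tel = employees.get("employee 2", "tel")  # column absent for this row
```

Absent columns simply take no space in a row, which is what makes the map sparse.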

Graph Store
* Uses graph structures with nodes, edges and properties to store pieces of data and the relations between them
* Every element contains direct pointers to its adjacent elements, so no index is needed to move from one element to its neighbours
* Computing answers to queries over the DB corresponds to finding suitable paths on the graph structure
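A query expressed as path finding can be sketched with adjacency lists and a breadth-first search. The node names and the `add_edge` and `shortest_path` helpers are hypothetical; the sketch only shows that each node holds direct references to its neighbours and that a query walks those references.

```python
from collections import deque

adjacency = {}  # node -> list of directly connected nodes


def add_edge(a, b):
    """Store an undirected edge as direct references on both endpoints."""
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)


def shortest_path(start, goal):
    """Breadth-first search; returns the list of nodes on a shortest path,
    or None when the two nodes are not connected."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


add_edge("Tom", "office 41")
add_edge("Ann", "office 41")
add_edge("John", "office 42")

tom_to_ann = shortest_path("Tom", "Ann")
tom_to_john = shortest_path("Tom", "John")
```

The second query illustrates the drawback noted in the summary: to conclude that no path exists, the search has to exhaust everything reachable from the start node.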

Graph Store [figure]

Summarizing
* Key/Value Store
  + Very fast lookups
  - Stored data cannot have any schema
* Document/Column Store
  + Tolerant of incomplete data
  - Query performance
* Graph Store
  + Exploits well-known graph algorithms (shortest path, connectedness, ...)
  - Has to traverse the entire graph to achieve a definitive answer
* Any NoSQL Store
  - NO JOIN! Pieces of related data have to be stored together
  - No standard query language

Dynamic schema
MongoDB has databases, collections, and indexes much like a traditional RDBMS.
In some cases (databases and collections) these objects can be implicitly created; however, once created they exist in a system catalog (db.systems.collections, db.system.indexes).

Schema free
Collections contain (BSON) documents. Within these documents are fields.
In MongoDB there is no predefinition of fields (what would be columns in an RDBMS).
There is no schema for fields within documents: the fields and their value datatypes can vary.
Thus there is no notion of an "alter table" operation which adds a "column".

Schema free
In practice, it is highly common for a collection to have a homogeneous structure across documents; however, this is not a requirement.
This flexibility means that schema migration and augmentation are very easy in practice.

findOne()
For convenience, the mongo shell (and other drivers) lets you avoid the programming overhead of dealing with the cursor, and just lets you retrieve one document via the findOne() function.
findOne() takes all the same parameters as the find() function, but instead of returning a cursor, it returns either the first document returned from the database, or null if no document matches the specified query.
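The difference between a cursor and a single-document result can be mimicked in plain Python. This is not MongoDB driver code: `find` and `find_one` below are hypothetical stand-ins operating on an in-memory list, and they support only exact equality on top-level fields.

```python
def find(collection, query):
    """Cursor-like generator: lazily yields every document matching the query."""
    for doc in collection:
        if all(doc.get(field) == wanted for field, wanted in query.items()):
            yield doc


def find_one(collection, query):
    """First matching document, or None when nothing matches."""
    return next(find(collection, query), None)


people = [
    {"name": "Tom", "surname": "Smith"},
    {"name": "John", "surname": "Doe"},
    {"name": "Ann", "surname": "Smith"},
]

cursor = find(people, {"surname": "Smith"})      # must be iterated by the caller
first_smith = find_one(people, {"surname": "Smith"})
nobody = find_one(people, {"surname": "Jones"})
```

With the cursor the caller has to iterate (and handle the empty case); the single-document variant hides that loop and signals "no match" with a null value, here `None`.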

Architecture: Replica Sets, Autosharding
MongoDB uses replica sets to provide read scalability and high availability.
Autosharding is used to scale writes (and reads).
Replica sets and autosharding go hand in hand if you need mass scale-out.

Replica sets
The major advantages of replica sets are:
* business continuity through high availability,
* data safety through data redundancy,
* read scalability through load sharing (reads).
Replica sets use a share-nothing architecture.

Replica sets
Typically you have at least three MongoDB instances in a replica set on different server machines.
You can add more replicas of the primary if you like for read scalability, but you only need three for high-availability failover.

Replica sets
By default replication is non-blocking/async.
This might be acceptable for some data (category descriptions in an online store) but not for other data (a shopping cart's credit card transaction data).
For important data, the client can block until data is replicated on all servers or written to the journal.
The client can force the master to sync to slaves before continuing. This sync blocking is slower.

Replica sets
Async/non-blocking is faster and is often described as eventual consistency. Waiting for a master to sync is a form of data safety.
Here is a list of some data safety options for MongoDB:
* Wait until write has happened on all replicas
* Wait until write is on two servers (primary and one other)
* Wait until write has occurred on majority of replicas
* Wait until write operation has been written to journal
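The four options above differ only in how many acknowledgements the client waits for. The following sketch models that rule in plain Python; the option names (`"all"`, `"two"`, `"majority"`, `"journal"`) and both functions are hypothetical and are not MongoDB driver settings.

```python
def acks_required(option, replica_count):
    """Number of servers that must confirm a write for the given option."""
    if option == "all":
        return replica_count
    if option == "two":
        return min(2, replica_count)   # primary plus one other
    if option == "majority":
        return replica_count // 2 + 1
    raise ValueError(f"unknown option: {option}")


def write_is_safe(option, replica_count, acks_received, journaled=False):
    """True once the client may stop blocking under the chosen option."""
    if option == "journal":
        return journaled               # only the journal write matters
    return acks_received >= acks_required(option, replica_count)


# With a three-member replica set and two confirmations so far:
status = {
    option: write_is_safe(option, replica_count=3, acks_received=2)
    for option in ("all", "two", "majority")
}
```

The stricter the option, the longer the client blocks, which is the speed versus safety trade-off described on the slide.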

Sharding
Sharding allows MongoDB to scale horizontally.
Sharding is also called partitioning. You assign each of your servers a portion of the data to hold, or the system does this for you.
MongoDB can automatically change partitions for optimal data distribution and load balancing, and it allows you to elastically add new nodes.
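Partitioning can be sketched by assigning each key to a server through a hash. This is a deliberately naive illustration and not MongoDB's actual balancing mechanism: `shard_for` and `partition` are hypothetical helpers, and hashing modulo the number of shards is an assumption chosen for brevity.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(keys, shard_count):
    """Group keys by the shard responsible for them."""
    shards = [[] for _ in range(shard_count)]
    for key in keys:
        shards[shard_for(key, shard_count)].append(key)
    return shards


keys = [f"employee {i}" for i in range(1000)]
three_shards = partition(keys, 3)

# Adding a fourth node changes the responsible shard for many keys,
# so their data would have to be moved to rebalance the cluster.
moved = sum(1 for key in keys if shard_for(key, 3) != shard_for(key, 4))
```

With this naive scheme a large share of the keys change shard when a node is added, which is why production systems rely on smarter partitioning schemes and move data automatically in the background.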