Non-Relational Databases. Pelle Jakovits

Similar documents
Distributed Non-Relational Databases. Pelle Jakovits

NOSQL EGCO321 DATABASE SYSTEMS KANAT POOLSAWASD DEPARTMENT OF COMPUTER ENGINEERING MAHIDOL UNIVERSITY

Overview. * Some History. * What is NoSQL? * Why NoSQL? * RDBMS vs NoSQL. * NoSQL Taxonomy. *TowardsNewSQL

Sources. P. J. Sadalage, M Fowler, NoSQL Distilled, Addison Wesley

CIB Session 12th NoSQL Databases Structures

Introduction to NoSQL Databases

Relational databases

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2015 Lecture 14 NoSQL

Chapter 24 NOSQL Databases and Big Data Storage Systems

Introduction to Big Data. NoSQL Databases. Instituto Politécnico de Tomar. Ricardo Campos

Introduction Aggregate data model Distribution Models Consistency Map-Reduce Types of NoSQL Databases

Understanding NoSQL Database Implementations

CISC 7610 Lecture 2b The beginnings of NoSQL

Database Evolution. DB NoSQL Linked Open Data. L. Vigliano

NoSQL systems. Lecture 21 (optional) Instructor: Sudeepa Roy. CompSci 516 Data Intensive Computing Systems

Introduction to Computer Science. William Hsu Department of Computer Science and Engineering National Taiwan Ocean University

Databases : Lecture 1 2: Beyond ACID/Relational databases Timothy G. Griffin Lent Term Apologies to Martin Fowler ( NoSQL Distilled )

A Survey Paper on NoSQL Databases: Key-Value Data Stores and Document Stores

NoSQL Databases MongoDB vs Cassandra. Kenny Huynh, Andre Chik, Kevin Vu

Advanced Data Management Technologies

A Study of NoSQL Database

Jargons, Concepts, Scope and Systems. Key Value Stores, Document Stores, Extensible Record Stores. Overview of different scalable relational systems

CS-580K/480K Advanced Topics in Cloud Computing. NoSQL Database

Study of NoSQL Database Along With Security Comparison

NoSQL systems: introduction and data models. Riccardo Torlone Università Roma Tre

SCALABLE DATABASES. Sergio Bossa. From Relational Databases To Polyglot Persistence.

Goal of the presentation is to give an introduction of NoSQL databases, why they are there.

A NoSQL Introduction for Relational Database Developers. Andrew Karcher Las Vegas SQL Saturday September 12th, 2015

A Review Of Non Relational Databases, Their Types, Advantages And Disadvantages

DEMYSTIFYING BIG DATA WITH RIAK USE CASES. Martin Schneider Basho Technologies!

Lecture Notes to Big Data Management and Analytics Winter Term 2017/2018 NoSQL Databases

DATABASE DESIGN II - 1DL400

COSC 416 NoSQL Databases. NoSQL Databases Overview. Dr. Ramon Lawrence University of British Columbia Okanagan

Column-Family Databases Cassandra and HBase

Challenges for Data Driven Systems

Getting to know. by Michelle Darling August 2013

CS 655 Advanced Topics in Distributed Systems

Why NoSQL? Why Riak?

Databases : Lectures 11 and 12: Beyond ACID/Relational databases Timothy G. Griffin Lent Term 2013

NOSQL DATABASE SYSTEMS: DECISION GUIDANCE AND TRENDS. Big Data Technologies: NoSQL DBMS (Decision Guidance) - SoSe

Database Solution in Cloud Computing

NoSQL Systems for Big Data Management

Cassandra, MongoDB, and HBase. Cassandra, MongoDB, and HBase. I have chosen these three due to their recent

Introduction to Graph Databases

Distributed Databases: SQL vs NoSQL

Presented by Sunnie S Chung CIS 612

CSE 530A. Non-Relational Databases. Washington University Fall 2013

Stream Processing Platforms Storm, Spark,.. Batch Processing Platforms MapReduce, SparkSQL, BigQuery, Hive, Cypher,...

Course Content MongoDB

5/2/16. Announcements. NoSQL Motivation. The New Hipster: NoSQL. Serverless. What is the Problem? Database Systems CSE 414

NoSQL Databases. Amir H. Payberah. Swedish Institute of Computer Science. April 10, 2014

Database Systems CSE 414

Eric Redmond. Seven Databases in Seven Weeks. Pragmatic Press Out by the end of Summer

BIG DATA TECHNOLOGIES: WHAT EVERY MANAGER NEEDS TO KNOW ANALYTICS AND FINANCIAL INNOVATION CONFERENCE JUNE 26-29,

Motivation Overview of NoSQL space Comparing technologies used Getting hands dirty tutorial section

SCALING COUCHDB WITH BIGCOUCH. Adam Kocoloski Cloudant

Spotify. Scaling storage to million of users world wide. Jimmy Mårdell October 14, 2014

Highly Scalable, Ultra-Fast and Lots of Choices

Hands-on immersion on Big Data tools

Oral Questions and Answers (DBMS LAB) Questions & Answers- DBMS

NoSQL data stores and SOS: Uniform Access to Non-Relational Database Systems Paolo Atzeni Francesca Bugiotti Luca Rossi

NoSQL Databases. an overview

Using space-filling curves for multidimensional

Document stores using CouchDB

Big Data Analytics. Rasoul Karimi

Beyond relational databases. Daniele Apiletti

Database Availability and Integrity in NoSQL. Fahri Firdausillah [M ]

CompSci 516 Database Systems

A Comparative Analysis of MongoDB and Cassandra

MongoDB An Overview. 21-Oct Socrates

The NoSQL Landscape. Frank Weigel VP, Field Technical Opera;ons

The NoSQL Ecosystem. Adam Marcus MIT CSAIL

NoSQL Databases. CPS352: Database Systems. Simon Miner Gordon College Last Revised: 4/22/15

Cassandra- A Distributed Database

PROFESSIONAL. NoSQL. Shashank Tiwari WILEY. John Wiley & Sons, Inc.

/ Cloud Computing. Recitation 8 October 18, 2016

Capabilities of Cloudant NoSQL Database IBM Corporation

Introduction to NoSQL

Intro To Big Data. John Urbanic Parallel Computing Scientist Pittsburgh Supercomputing Center. Copyright 2017

COSC 304 Introduction to Database Systems. NoSQL Databases. Dr. Ramon Lawrence University of British Columbia Okanagan

Relational Database Systems 1 Wolf-Tilo Balke Hermann Kroll, Janus Wawrzinek, Stephan Mennicke

Topic 5.4: NoSQL Database Implementations

Data Modeling with NoSQL: How, When and Why

A Novel Approach for Transformation of Data from MySQL to NoSQL (MongoDB)

NoSQL Databases Analysis

Final Exam Logistics. CS 133: Databases. Goals for Today. Some References Used. Final exam take-home. Same resources as midterm

Relational Database Features

Topics. History. Architecture. MongoDB, Mongoose - RDBMS - SQL. - NoSQL

Copyright 2013, Oracle and/or its affiliates. All rights reserved.

8/24/2017 Week 1-B Instructor: Sangmi Lee Pallickara

Polyglot Persistence in Today s Data World

WHITEPAPER

CSE 344 JULY 9 TH NOSQL

NOSQL Databases: The Need of Enterprises

Distributed Data Store

MONGODB INTERVIEW QUESTIONS

NOSQL DATABASE SYSTEMS: DATA MODELING. Big Data Technologies: NoSQL DBMS (Data Modeling) - SoSe

Review - Relational Model Concepts

CISC 7610 Lecture 4 Approaches to multimedia databases. Topics: Graph databases Neo4j syntax and examples Document databases

MongoDB and Mysql: Which one is a better fit for me? Room 204-2:20PM-3:10PM

Transcription:

Non-Relational Databases Pelle Jakovits 25 October 2017

Outline Background Relational model Database scaling The NoSQL Movement CAP Theorem Non-relational data models Key-value Document-oriented Column family Graph Example databases 2/36

The Relational Model Data is stored in tables Strict relationships between tables Foreign key references between columns The expected format of the data is specified with a restrictive schema Data is typically accessed using Structured Query Language (SQL) 3/36

Database Scaling Vertical scaling on one machine Horizontal scaling across a cluster of machines Relational model does not scale well horizontally Because there are too many dependencies in relational model Database sharding is one approach 4/36

Sharding 5/36

Relational database https://dev.mysql.com/doc/sakila/en/ 6/38

The NoSQL Movement Emergence of persistence solutions using non-relational data models Driven by the rise of Big Data and Cloud Computing Non-relational data models are based on Key - Value structure Simpler schema-less key-value based data models scale better than the relational model NoSQL is a broad term with no clear boundaries The term NoSQL itself is very misleading No SQL? Not only SQL? 7/36

CAP Theorem It is impossible for a distributed computer system to simultaneously provide all three of the following: Consistency - every read receives the most recent write or an error Availability - every request receives a response Partition/Fault tolerance - the system continues to operate despite arbitrary partitioning (network failures, dropped Have to choose between consistency or availability NoSQL solutions which are more focused on availability try to achieve eventual consistency When trying to aim for both consistency or availability Will have high latency 8/36

CAP Theorem 9/36

Benefits of the Key-value Model Horizontal scalability Data with the same Key stored close to each other Suitable for cloud computing Flexible schema-less models suitable for unstructured data Fetching data by key can be very fast 10/36

Indexing and Partitioning In Key-value model, Key acts as an index Secondary indexes can be created in some solutions Implementations vary from partitioning to local distributed indexes Data is partitioned between different machines in the cluster Usually data is partitioned by rows or/and column families Users can often specify partitioning parameters Gives control over how data is distributed Very important for optimizing query speed 11/36

Aggregate-oriented DB Non-relational data models aim to store aggregated data together Aggregate is a collection of data that is treated as a unit E.g. a customer and all of his orders In normalized relational databases aggregates are computed using GroupBy operations Keyed aggregates make for a natural unit of data sharding 12/36

Non-relational Data Models

The Key-value Model Data stored as key-value pairs The value is an opaque blob to the database Examples: Dynamo, Riak 14

The Document-oriented Model Data is also stored as key-value pairs Value is a document and has further structure No strict schema Examples: CouchDB, MongoDB 15/36

Example Aggregates described in JSON using map and array data structures 16

The Column Family Model Data stored in large sparse tabular structures Columns are grouped into column families Column family is a meaningful group of columns Similar concept as a table in relational database A record can be thought of as a two-level map New columns can be added at any time Examples: BigTable, Cassandra, HBase 17/36

Column Family Example a001 username jsmith Names firstname John lastname Smith phone 5 550 001 Contacts email jsmith@example.com item1 Message1 item2 Message2 Messages itemn MessageN username pauljones Names b014 Contacts item1 new Message Messages 18/36

Graph Databases Data stored as nodes and edges of a graph The nodes and edges can have fields and values Aimed at graph traversal queries in connected data Examples: Neo4J, FlockDB https://neo4j.com/developer/guide-build-a-recommendation-engine/ 19/36

Non-relational Data Models In non-relational data stores data is denormalized It is common to also store JSON documents in keyvalue stores The key-value, document-oriented and column family models are aggregate oriented models The other models are based on the key-value model In reality the classification of databases into different models is not as straight forward Multiparadigm databases also exist (e.g ArangoDB) 20/36

Non Relational Database Examples

Riak Based on Amazon's Dynamo specification Key-value model Distributed decentralized persistent hash table Keyspace is partitioned between nodes Consistent hashing Avoid hash-to-node remapping when a node is added or removed Eventual Consistency using Multiversion Concurrency Control (MVCC) 22/36

Riak RESTful query interface Basic PUT, GET, POST, and DELETE functions Links and link walking One way relationships between data objects Turns Key-Value store into a simple Graph database Higher level querying on-top of Key-Value structure: Riak search is based on Apache Solr search engine MapReduce in Erlang and JavaScript 23/36

MongoDB Document oriented (BSON) Query language based on JavaScript GridFS for BLOB file storage Master-slave architecture Linking between documents Supports MapReduce in JavaScript 24/36

Query example db.inventory.find({ status: "A", $or: [ {qty: { $lt: 30 }}, { item: /^p/ }] }) Matches the following SQL query: SELECT * FROM inventory WHERE status = "A" AND ( qty < 30 OR item LIKE "p%") 25/38

CouchDB Document oriented (JSON) RESTful query interface Built in web server Web based query front-end Futon 26/36

CouchDB MapReduce is the primary query method (JavaScript and Erlang) Materialized views as results of incremental MapReduce jobs CouchApps javascript-heavy web applications built entirely on top of CouchDB without a separate web server for the logic layer 27/36

Query example Documents: { id : 2, "position" : "programmer", "first_name": "James", "salary" : 100 } { id : 7, "position" : "support", "first_name": "John", "salary" : 23 } Lets extract average salary for each unique position Map: function(doc, meta) { if (doc.position && doc.salary) { emit(doc.position,doc.salary); } } Reduce: function(key, values, rereduce) { return sum(values)/values.length; } Some Reduce functions have been pre-defined: _sum(), _count(), _stats() 28/38

Cassandra Column family model Data model from BigTable, distribution model from Dynamo (decentralized) Uses Cassandra Query Language (CQL) SQL-like querying language 29/36

Cassandra Provides Availability & Partition-Tolerance from the CAP theorem Static and dynamic column families Dynamic column families as materialized views on data The concept of super-columns Family of column families 30/36

Neo4J Open source NoSQL Graph Database Uses the Cypher Query Language for querring graph data Graph consists of Nodes, Edges and Attributes 31/36

Neo4J Querry example MATCH (you {name:"you"}) MATCH (expert)-[:worked_with]-> (db:database {name:"neo4j"}) MATCH path = shortestpath( (you)-[:friend*..5]-(expert) ) RETURN db,expert,path https://neo4j.com/developer/cypher-query-language/ 32/38

Why use Non-Relational DB? Volume of Data Elasticity Flexible Schemas Cost 33/38

Polyglot Persistence http://martinfowler.com/bliki/polyglotpersistence.html 34/36

Conclusions In recent years there has been a rise in nonrelational (NoSQL) data stores This is related to the rise of cloud computing key-value models offer better scalability The NoSQL landscape is extremely varied The four basic non-relational data models are: The key-value model The document-oriented model The column family model Graph databases 35/36

That is All Next week's practice session Using NoSQL databases Exam times Wednesday - 01 November 2017 Thursday - 02 November 2017 NB! A set of example exam questions are available on the course website 36/36