Relational databases


COSC 6397 Big Data Analytics
NoSQL databases
Edgar Gabriel
Spring 2017

Relational databases
- Long-lasting industry standard for storing data persistently
- Key points: concurrency control, transactions, standard interfaces to access data
- Act as an integration point between different applications

Problems with relational databases
- Mismatch between the relational data structures and the in-memory data structures of applications
  - Everything is mapped to tables
  - All entries within a table must have the same schema; to work around this limitation, generic fields are often used whose meaning differs from entry to entry
  - No way to include more complex structures, e.g. nested records or lists
- Relational databases were not designed (with very few exceptions) to run efficiently on clusters

ACID
Relational databases often provide properties summarized as ACID:
- Atomicity: a transaction that has been started is either completed in full or undone (rolled back)
- Consistency: a transaction takes the database from one valid state to another and never leaves it in a half-finished state that violates its integrity constraints
- Isolation: keeps concurrent transactions separated from each other until they're finished
- Durability: once a transaction has committed, its changes survive; the database records changes in such a way that the server can recover from an abnormal termination without losing committed work
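To make atomicity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the transfer function and the balances are hypothetical; the point is that the debit and the credit either both commit or are both rolled back, so a failed transfer leaves no half-finished state behind.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` between two accounts atomically: both updates or neither."""
    with conn:  # commits if the block succeeds, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE id = ?", (src,)).fetchone()
        if balance < 0:
            # Raising here undoes the debit above: the transaction is rolled back.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

transfer(conn, 1, 2, 30)       # succeeds: both rows change
try:
    transfer(conn, 1, 2, 500)  # fails midway: neither row changes
except ValueError:
    pass

balances = dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))
print(balances)  # {1: 70, 2: 30}
```

The same guarantee spans multiple tables in an RDBMS, which is exactly what becomes harder once data is spread over a cluster.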

RDBMS data management
Typical RDBMS representation of a purchasing system. A transaction is an update to (multiple) tables as a single, atomic operation.

Customer
  Id  Name    Firstname
  1   Martin  John

Order
  Id  CustomerId  ShippingAddr  BillingAddress
  99  1           77            28

Address
  Id  City     Street        Number
  77  Houston  Calhoun Blvd  4800

Aggregate Model
Aggregate model representation:

  // in orders
  {
    "Id": 99,
    "Customer": {
      "Id": 1,
      "FirstName": "John",
      "LastName": "Martin"
    },
    "BillingAddress": {
      "Id": 77,
      "City": "Houston",
      "Street": "Calhoun Blvd",
      "Number": 4800
    }
  }
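The difference between the two representations can be sketched in a few lines of Python: with normalized tables, the application has to follow foreign keys to rebuild the order, whereas an aggregate store keeps the nested result as one unit. The dictionaries below stand in for the three tables (values copied from the slide); as a simplifying assumption, only the billing address is modeled, and it points at address 77 as the aggregate above does. The function name order_aggregate is hypothetical.

```python
# Normalized "tables" keyed by primary key (values copied from the slide).
customers = {1: {"Name": "Martin", "Firstname": "John"}}
addresses = {77: {"City": "Houston", "Street": "Calhoun Blvd", "Number": 4800}}
orders = {99: {"CustomerId": 1, "BillingAddress": 77}}


def order_aggregate(order_id: int) -> dict:
    """Assemble one order aggregate by following foreign keys (an application-side join)."""
    order = orders[order_id]
    customer = customers[order["CustomerId"]]
    address = addresses[order["BillingAddress"]]
    return {
        "Id": order_id,
        "Customer": {
            "Id": order["CustomerId"],
            "FirstName": customer["Firstname"],
            "LastName": customer["Name"],
        },
        "BillingAddress": {"Id": order["BillingAddress"], **address},
    }


agg = order_aggregate(99)
print(agg["Customer"]["LastName"], agg["BillingAddress"]["City"])  # Martin Houston
```

In an aggregate-oriented store the nested dictionary would be written and read under key 99 in a single operation, instead of being reassembled from three tables on every read.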

Aggregate Data Model
- An aggregate is a collection of data managed as a single unit
- Aggregates form the boundaries for ACID operations with the database
- Where to draw the boundary, i.e. how much information to include in a single aggregate, is domain- and problem-specific
- Aggregate-oriented databases work best when most data interactions involve a single aggregate
- Atomic updates are typically supported only within a single aggregate
- Aggregate-oriented databases make inter-aggregate relationships more difficult to handle than intra-aggregate relationships

NoSQL databases
- Loosely defined term covering various classes of non-relational data storage systems
- Typically don't rely (exclusively) on SQL
- Often open-source projects
- Usually do not require a fixed table schema
- Designed to run on clustered environments
- Relax one or more of the ACID properties
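"Atomic updates only within a single aggregate" can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how any particular database is implemented: AggregateStore is a hypothetical class, each key holds one aggregate, and an update is applied to a private copy that replaces the stored aggregate only if the whole update succeeds. Nothing in it coordinates changes across two keys.

```python
import copy
import threading


class AggregateStore:
    """Toy store: one aggregate per key; updates are all-or-nothing per aggregate only."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects creation of per-key locks

    def _lock_for(self, key):
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def put(self, key, aggregate):
        with self._lock_for(key):
            self._data[key] = copy.deepcopy(aggregate)

    def get(self, key):
        with self._lock_for(key):
            return copy.deepcopy(self._data[key])

    def update(self, key, fn):
        """Apply fn to a copy and install it only if fn returns without raising."""
        with self._lock_for(key):
            draft = copy.deepcopy(self._data[key])
            fn(draft)  # if this raises, the stored aggregate is left untouched
            self._data[key] = draft


store = AggregateStore()
store.put("order:99", {"Id": 99, "Items": [], "Total": 0})


def add_item(order):
    order["Items"].append({"Sku": "A1", "Price": 25})
    order["Total"] += 25


def bad_update(order):
    order["Items"].append({"Sku": "B2", "Price": 10})
    raise RuntimeError("validation failed")  # the half-applied change is discarded


store.update("order:99", add_item)
try:
    store.update("order:99", bad_update)
except RuntimeError:
    pass
print(store.get("order:99"))
# {'Id': 99, 'Items': [{'Sku': 'A1', 'Price': 25}], 'Total': 25}
```

Updating an order and a separate customer aggregate together would need two calls with no shared rollback, which is why the aggregate boundary matters.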

Sharding
- Distributes data across multiple servers
- Each server acts as the single source for a subset of the data
- Aggregate data models allow an entire aggregate to be stored on a single server
- Scales well for both reads and writes
- Data distribution:
  - Automatic: e.g. hash functions, lexicographic order, etc.
  - User-defined

Availability
- Traditionally thought of as the server/process being available "five 9s" (99.999%), i.e. roughly five minutes of downtime per year
- For a system with many nodes, however, at almost any point in time there's a good chance that some node is down or that there is a network disruption among the nodes. For example, if each node is independently up 99.9% of the time, a 1,000-node cluster has at least one node down with probability 1 - 0.999^1000 ≈ 63%.
- Goal: a system that is resilient in the face of network disruption
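The two automatic distribution schemes named above can be sketched as plain functions that map a key to a shard number. The function names and split points are hypothetical; the hash variant uses hashlib rather than Python's built-in hash(), because string hashing in CPython is salted per process and would move keys between shards after a restart.

```python
import bisect
import hashlib


def hash_shard(key: str, num_shards: int) -> int:
    """Hash partitioning: a stable digest of the key, reduced modulo the shard count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def range_shard(key: str, split_points: list[str]) -> int:
    """Lexicographic (range) partitioning over sorted split points.

    With split_points ["g", "p"]: shard 0 holds keys < "g", shard 1 holds
    "g" <= key < "p", and shard 2 holds keys >= "p".
    """
    return bisect.bisect_right(split_points, key)


for key in ["apple", "mango", "zebra"]:
    print(key, hash_shard(key, 4), range_shard(key, ["g", "p"]))
```

Hash partitioning spreads keys evenly but scatters neighbouring keys; range partitioning keeps them together, which helps range scans but can create hot spots. With plain modulo hashing, changing num_shards remaps most keys, which is one reason many systems use consistent hashing instead.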

Availability: master-slave replication
- All writes are written to the master
- All reads are performed against the replicated slave databases
- Critical reads may return stale results, because writes may not yet have propagated to the slaves
- Large data sets can pose problems, as the master needs to duplicate the data to every slave

Availability: peer-to-peer replication
- Holds copies of the data on multiple servers
- Nodes coordinate synchronization of the data internally
- Removes the single point of failure of master-slave replication
- Spreads the write load across nodes, improving write performance
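The stale-read problem of master-slave replication shows up even in a toy model. The sketch below assumes asynchronous replication driven by an explicit replicate() call; the class and method names are hypothetical and do not correspond to any product's protocol.

```python
class MasterSlaveCluster:
    """Toy model of asynchronous master-slave replication."""

    def __init__(self, num_slaves: int):
        self.master: dict = {}
        self.slaves: list[dict] = [{} for _ in range(num_slaves)]
        self.replication_log: list = []  # writes accepted by the master, not yet shipped

    def write(self, key, value):
        """All writes go to the master; slaves are updated later."""
        self.master[key] = value
        self.replication_log.append((key, value))

    def read(self, key, slave: int = 0):
        """Reads are served by a slave and may therefore be stale."""
        return self.slaves[slave].get(key)

    def replicate(self):
        """Ship every pending write to every slave, in order."""
        for key, value in self.replication_log:
            for replica in self.slaves:
                replica[key] = value
        self.replication_log.clear()


cluster = MasterSlaveCluster(num_slaves=2)
cluster.write("stock:42", 10)
before = cluster.read("stock:42")          # None: the write has not propagated yet
cluster.replicate()
after = cluster.read("stock:42", slave=1)  # 10
print(before, after)  # None 10
```

The gap between write() and replicate() is the window in which a critical read can be wrong. Peer-to-peer replication removes the single master but must then reconcile concurrent writes to the same key on different nodes.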

Consistency
- A consistency model determines the rules for visibility and apparent order of updates
- Write-write conflict: concurrent updates of the same entry
  - Optimistic approach: allows conflicts to occur, but detects and reports them
    - Conditional update: before updating, test whether the value to be modified has changed since it was read
    - Perform and log both updates and report a conflict (as revision control software does)
  - Pessimistic approach: prevents conflicts, e.g. by using locks to serialize access to an entry
- Read-write conflict: reading an element that was modified by another client
  - Not trivial with replication

Consistency
- Sequential consistency: the result of any execution is the same as if the operations of all clients were executed in some sequential order, and the operations of each individual client appear in this sequence in the order specified by its program
  - Often comes at significant cost with sharding and replication
- Replication consistency: ensures that data has the same value across all replicas
  - Inconsistency window: the length of time an inconsistency is present
  - Eventual consistency: all replicas will eventually be updated, but there may be an inconsistency window
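The optimistic "conditional update" can be sketched as a compare-and-set on a version number. VersionedStore and conditional_put are hypothetical names; the example plays out the classic conflict in which two clients read the same version and both try to write.

```python
class VersionedStore:
    """Toy key-value store supporting an optimistic conditional update."""

    def __init__(self):
        self._data: dict = {}  # key -> (version, value)

    def get(self, key):
        """Return (version, value); a missing key reads as version 0."""
        return self._data.get(key, (0, None))

    def conditional_put(self, key, value, expected_version: int) -> bool:
        """Write only if nobody has changed the entry since expected_version was read."""
        current_version, _ = self.get(key)
        if current_version != expected_version:
            return False  # conflict detected and reported, not prevented
        self._data[key] = (current_version + 1, value)
        return True


store = VersionedStore()
version_a, _ = store.get("phone")  # client A reads version 0
version_b, _ = store.get("phone")  # client B also reads version 0
ok_a = store.conditional_put("phone", "555-0100", version_a)  # accepted
ok_b = store.conditional_put("phone", "555-0199", version_b)  # rejected: B must re-read and retry
print(ok_a, ok_b, store.get("phone"))  # True False (1, '555-0100')
```

This single-threaded sketch performs the check and the write as two steps; a real store has to make them one atomic operation (for example under a lock or as a single server-side command), otherwise two clients could both pass the check.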

Consistency
- Session consistency: within a single user session, your own writes are immediately visible
  - Sticky session: ensure that both reads and writes are handled by a single server
  - Version stamps: ensure that every interaction with the data store includes the latest version stamp seen by that session
- Consistency within a single aggregate is typically ensured by NoSQL databases
  - Often the driving force in determining what belongs in a single aggregate

NoSQL Databases
- Key-value stores: everything is stored as a key-value pair
  - The value is considered a blob without internal structure
  - Lookup and retrieval are based purely on the key
  - Examples: Memcached, Redis, Riak, Project Voldemort
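Version stamps are the less obvious of the two session techniques. The sketch below assumes a single primary that assigns increasing version numbers and replicas that may lag behind; Node and Session are hypothetical names, and real systems differ in how stamps are carried between client and server. A session accepts a replica's answer only if it is at least as new as the latest version that session has already seen.

```python
class Node:
    """A storage node holding key -> (version, value)."""

    def __init__(self):
        self.data: dict = {}

    def apply(self, key, version: int, value):
        """Install a write unless the node already holds a newer version."""
        if version > self.data.get(key, (0, None))[0]:
            self.data[key] = (version, value)

    def get(self, key):
        return self.data.get(key, (0, None))


class Session:
    """Read-your-writes: the session remembers the latest version stamp it has seen."""

    def __init__(self, primary: Node, replicas: list):
        self.primary = primary
        self.replicas = replicas
        self.seen: dict = {}  # key -> latest version stamp seen by this session

    def write(self, key, value):
        version = self.primary.get(key)[0] + 1
        self.primary.apply(key, version, value)
        self.seen[key] = version

    def read(self, key):
        needed = self.seen.get(key, 0)
        for replica in self.replicas:
            version, value = replica.get(key)
            if version >= needed:  # replica is at least as fresh as this session
                self.seen[key] = version
                return value
        version, value = self.primary.get(key)  # no replica fresh enough: ask the primary
        self.seen[key] = max(needed, version)
        return value


primary, replica = Node(), Node()
alice = Session(primary, [replica])
bob = Session(primary, [replica])
alice.write("cart", ["book"])
print(alice.read("cart"))  # ['book']: served by the primary, replica is stale
print(bob.read("cart"))    # None: bob has no stamp, so the stale replica is acceptable
```

Bob's read illustrates eventual consistency for everyone outside the session: once the replica applies version 1, he sees the cart as well.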

NoSQL Databases
- Document databases: the data aggregate (value) has an internal structure (e.g. JSON or XML) that can be used in query operations
  - The boundary between key-value stores and document databases is not always clear-cut
  - Examples: MongoDB, CouchDB, OrientDB, Terrastore

NoSQL Databases
- Column-family stores: optimize scenarios where only a subset of the columns in a table is required for a query/analysis
  - Store data by columns, not by rows
  - Assume that data is read significantly more often than written
  - Examples: HBase, Cassandra, Hypertable, Amazon SimpleDB
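What separates a document database from a key-value store is that the database can look inside the value. The sketch below filters documents on nested fields addressed by dotted paths; the dotted-path style is loosely modeled on document-store query languages, but get_path and find are hypothetical helpers, not any database's API, and the second order document is invented for illustration.

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Follow a dotted path such as 'Customer.LastName' into nested dicts."""
    current: Any = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None  # path does not exist in this document
        current = current[part]
    return current


def find(collection: list[dict], criteria: dict[str, Any]) -> list[dict]:
    """Return the documents whose fields equal every value in criteria."""
    return [d for d in collection if all(get_path(d, p) == v for p, v in criteria.items())]


order_docs = [
    {"Id": 99, "Customer": {"Id": 1, "LastName": "Martin"}, "BillingAddress": {"City": "Houston"}},
    {"Id": 100, "Customer": {"Id": 2, "LastName": "Lee"}, "BillingAddress": {"City": "Austin"}},
]
print([d["Id"] for d in find(order_docs, {"BillingAddress.City": "Houston"})])  # [99]
```

A pure key-value store could only return the whole value for key 99 or 100; filtering by city would have to happen in the application after fetching every value.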

NoSQL Databases
- Graph databases: organize data into the nodes and edges of a graph
  - Capture complex relationships between data
  - Support queries that traverse (selected) edges of the graph
  - Not well suited for sharding, since traversals frequently follow edges that cross server boundaries
  - Examples: Neo4j, FlockDB, HyperGraphDB, InfiniteGraph

Summary
- An increasing number of highly popular NoSQL databases
- They do not necessarily replace RDBMS; each targets specific application scenarios
- Lots of literature is available on the topic
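"Querying along selected edges" amounts to a traversal restricted to certain edge labels. The sketch below runs a breadth-first search over a small labeled edge list; the node names, labels and helper functions are hypothetical, and a graph database would typically express the same query declaratively rather than as hand-written code.

```python
from collections import deque


def build_adjacency(edges):
    """Index edges by (source, label) for fast neighbour lookup."""
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault((src, label), []).append(dst)
    return adjacency


def traverse(adjacency, start, label, max_depth):
    """Breadth-first search that follows only edges carrying `label`."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get((node, label), []):
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                queue.append((neighbor, depth + 1))
    return reached


edges = [
    ("alice", "follows", "bob"),
    ("bob", "follows", "carol"),
    ("alice", "purchased", "book"),
    ("carol", "follows", "dave"),
]
adj = build_adjacency(edges)
print(traverse(adj, "alice", "follows", 2))  # ['bob', 'carol']
```

Each hop is a lookup keyed by the current node; if alice, bob and carol lived on different shards, every hop could become a network round trip, which is why graphs shard poorly.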