Introduction to NoSQL Databases


Introduction to NoSQL Databases
Roman Kern (KTI, TU Graz), Dbase2, 2017-10-16

Introduction: Why NoSQL?

The birth of NoSQL
  Term appeared in 2009
  "Not only SQL"
  Common properties (pros)
    Non-relational
    Schema-less (schema-free)
    Good scalability
  Potential downsides (cons)
    Limited query abilities
    Not standardised (evolving technology)

Motivations for starting NoSQL
  1. Growth of data
    User-generated
    Machine-generated, e.g. log files, sensors
    Higher degree of connectedness
  2. Need for flexibility instead of a rigid schema
    For semi-structured data (schema-free / schema-less)
  3. No separation of data management and data processing

Data Management vs Data Processing
  Classic CRUD operations are no longer sufficient for advanced data analytics
    Both functionalities need to be combined
  Paradigm shift: bring the code to the data
    i.e. the locality of the data is taken into consideration for the data processing
  Example applications
    Online transaction processing (OLTP): relational databases
    Online analytical processing (OLAP): data warehousing
    High performance, scalability: NoSQL

Scalability
  Scale up (scale vertically) vs scale out (scale horizontally)
    Scale up: add more hardware to a single machine
    Scale out: add more machines
  Degree of sharing
    Shared memory (single machine, single storage)
    Shared disk (multiple machines, single storage)
    Shared nothing (multiple machines, multiple storage)

Replication
  In a distributed system, data is replicated between nodes, so data is stored multiple times
  Types of replication
    1. Synchronous (eager)
      All data is replicated to all nodes before the operation ends
      Complex, even impossible in some configurations
    2. Asynchronous (lazy)
      The operation finishes before all data has been written by all nodes
      Potentially inconsistent
  Write access options
    1. A single node accepts writes (master/slave, primary copy)
    2. All nodes accept write operations (update anywhere)
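The difference between eager and lazy replication can be illustrated with a small in-memory sketch. This is not how any particular database implements replication; the class and method names (ReplicatedStore, write_eager, write_lazy, sync) are hypothetical, and a single primary node accepts all writes (primary copy).

```python
class ReplicatedStore:
    """Toy primary-copy store with a configurable number of replicas."""

    def __init__(self, n_replicas=3):
        self.primary = {}
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # writes acknowledged but not yet on the replicas

    def write_eager(self, key, value):
        # Synchronous: the write returns only after every replica has it.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def write_lazy(self, key, value):
        # Asynchronous: acknowledge after the primary write, replicate later.
        self.primary[key] = value
        self.pending.append((key, value))

    def sync(self):
        # Background propagation of the queued writes to all replicas.
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()


store = ReplicatedStore()
store.write_eager("a", 1)
store.write_lazy("b", 2)
stale = store.replicas[0].get("b")   # replica has not seen the lazy write yet
store.sync()
fresh = store.replicas[0].get("b")   # visible after propagation
```

Between write_lazy and sync, a read from a replica returns stale data, which is the "potentially inconsistent" window of asynchronous replication.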

Sharding
  In a distributed system, each node may be responsible for a different part of the full data
    Data is still replicated for redundancy
  Also known as: partitioning, fragmentation
  Advantage: improved efficiency (fewer resources)
  Types of sharding
    1. Hash-based
      The hash of the key determines the partition
      No data locality
    2. Range-based
      Assigns key ranges to partitions (binning)
      Rebalancing needed
    3. Entity-group
      All data from a single transaction is assigned to a single partition
      Partitions cannot easily change
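Hash-based and range-based sharding can be contrasted with a minimal sketch. The function names (hash_shard, range_shard) and the split points are assumptions made for illustration; real systems use their own hash functions and rebalancing logic.

```python
import bisect
import hashlib


def hash_shard(key, n_shards):
    """Hash-based: a stable hash of the key picks the partition."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_shards


def range_shard(key, split_points):
    """Range-based: shard i holds keys in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, key)


splits = ["g", "n", "t"]  # hypothetical split points, giving four shards
for name in ["alice", "bob", "mallory", "zoe"]:
    print(name, hash_shard(name, 4), range_shard(name, splits))
```

With range-based sharding, neighbouring keys such as "alice" and "bob" land on the same shard, so range scans stay local, but skewed key distributions require rebalancing. Hash-based sharding spreads keys evenly at the cost of data locality.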

ACID vs BASE
  ACID
    Atomicity
    Consistency
    Isolation
    Durability
  BASE
    Basically Available
    Soft state
    Eventually consistent
  Trade-offs for improved performance
    Some database systems prefer performance over durability
    Redundancy for improved performance (no normalisation)

CAP theorem
  It is not possible to achieve all three properties at once:
    Consistency
      Reads are guaranteed to incorporate all previous writes (all nodes see the same data at the same time)
    Availability
      Every query returns an answer instead of an error (failures do not prevent the rest of the system from being operational)
    Partition tolerance
      The system keeps running even if a part of the system is not reachable (e.g. due to network failure, message loss)
  Implications of CAP
    One needs to find a trade-off between the properties, e.g. choose availability over consistency (as consistency is a major bottleneck for scalability)

Classification scheme of NoSQL systems
  1. According to the data model
    Key-value
    Tabular (wide column)
    Document
    Graph
    Specialised, e.g. time series, triples, objects, XML, files, ...
  2. According to the CAP trade-off
    Available & partition tolerant
    Consistent & partition tolerant
    Not partition tolerant
  3. According to the replication & sharding types
    Lazy vs eager
    Hash-based vs range-based vs entity-group

Systems: What types of NoSQL systems are out there?

Distributed File System
  Data model
    Folders & files (plus metadata, e.g. time of creation, ...)
  Interface
    File system operations
  Variations
    Network File System: (often) single storage
    Cluster File Systems: (multiple) storage
    Distributed File Systems: multiple, independent storage
  Examples
    NFS, GPFS, HDFS

Key/Value Store
  Data model
    Key -> Value, where the value is a (binary) opaque blob
    Similar to hash tables
  Interface
    CRUD operations
  Properties
    Excellent scalability
    May support redundant storage
  Examples
    Amazon Dynamo (AP, lazy, hash-based), Redis (CP, lazy, hash-based), Riak (AP, lazy, hash-based), Memcached (CP), ...
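The key/value data model and its CRUD interface map almost directly onto a hash table. The following is a minimal single-process sketch; the class name KeyValueStore is hypothetical and it says nothing about distribution or persistence.

```python
class KeyValueStore:
    """Toy key/value store: keys map to opaque byte blobs."""

    def __init__(self):
        self._data = {}

    def create(self, key, value):
        if key in self._data:
            raise KeyError(f"key already exists: {key!r}")
        self._data[key] = bytes(value)

    def read(self, key):
        return self._data[key]

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(f"unknown key: {key!r}")
        self._data[key] = bytes(value)

    def delete(self, key):
        del self._data[key]


kv = KeyValueStore()
kv.create("session:42", b'{"user": "alice"}')
kv.update("session:42", b'{"user": "alice", "cart": 3}')
blob = kv.read("session:42")  # the store never looks inside the blob
kv.delete("session:42")
```

Because the value is opaque, the store cannot filter or index by its contents; any interpretation of the blob is left to the application.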

Tabular / Wide Column
  Data model
    (Rowkey, Column, Timestamp) -> Value, where the value is a (binary) opaque blob
  Interface
    CRUD operations, scan operations
  Properties
    Allows vertical and horizontal partitioning
      Adjacent rows are stored close to each other
      Certain columns are stored close to each other, e.g. via column families
    Each cell might have multiple versions (timestamps)
  Examples
    Cassandra (AP, lazy, hash-based), Google BigTable (CP, eager, range-based), HBase (CP, eager, range-based), Parquet, ...
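The (rowkey, column, timestamp) -> value model with versioned cells and row scans can be sketched as follows. The class WideColumnTable and the column names are hypothetical; the sketch keeps everything in one dictionary and ignores column families and partitioning.

```python
class WideColumnTable:
    """Toy wide-column table: each cell keeps several timestamped versions."""

    def __init__(self):
        self._cells = {}  # (row_key, column) -> {timestamp: value}

    def put(self, row_key, column, timestamp, value):
        self._cells.setdefault((row_key, column), {})[timestamp] = value

    def get(self, row_key, column, timestamp=None):
        versions = self._cells[(row_key, column)]
        if timestamp is None:
            timestamp = max(versions)  # default to the newest version
        return versions[timestamp]

    def scan(self, start_row, end_row):
        """Yield (row, column, latest value) for start_row <= row < end_row."""
        for row_key, column in sorted(self._cells):
            if start_row <= row_key < end_row:
                yield row_key, column, self.get(row_key, column)


table = WideColumnTable()
table.put("user1", "info:name", 1, b"Ann")
table.put("user1", "info:name", 2, b"Anna")
table.put("user2", "info:name", 1, b"Bob")
latest = table.get("user1", "info:name")
first = table.get("user1", "info:name", timestamp=1)
rows = list(table.scan("user1", "user2"))
```

The scan walks row keys in sorted order, which is why range-based systems benefit from storing adjacent rows close to each other.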

Example of Cassandra Query Language (example not preserved in this transcript)

Document Storage
  Data model
    (Collection, Key) -> Value, where the value is understood by the system
  Interface
    CRUD operations, specialised queries (e.g. JavaScript)
  Properties
    Documents are schema-free, i.e. no need for schema migrations
    Documents may also be versioned
    Documents are often JSON
  Examples
    CouchDB (AP, lazy), MongoDB (CP, lazy/eager, range-based), Amazon SimpleDB (AP), Cloudant, Rethink (lazy/eager, range-based), ...
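Unlike a key/value store, a document store understands the value and can therefore query parts of it. A minimal sketch, assuming JSON documents and a hypothetical DocumentStore class with a simple equality query:

```python
import json


class DocumentStore:
    """Toy document store: (collection, key) -> JSON document."""

    def __init__(self):
        self._collections = {}

    def put(self, collection, key, document):
        # Documents are stored serialised; no schema is enforced.
        self._collections.setdefault(collection, {})[key] = json.dumps(document)

    def get(self, collection, key):
        return json.loads(self._collections[collection][key])

    def find(self, collection, field, value):
        """Return the keys of all documents whose field equals value."""
        docs = self._collections.get(collection, {})
        return sorted(
            key for key, raw in docs.items()
            if json.loads(raw).get(field) == value
        )


db = DocumentStore()
db.put("users", "u1", {"name": "Alice", "city": "Graz"})
db.put("users", "u2", {"name": "Bob", "city": "Vienna", "tags": ["admin"]})
db.put("users", "u3", {"name": "Carol", "city": "Graz"})
in_graz = db.find("users", "city", "Graz")
```

The second document carries an extra field without any migration step, which is what schema-free means in practice.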

Key/Value Store vs Document Storage vs Tabular Storage
  Key/value store, if requirements are simple
  Document store, if parts of the value need to be accessed
  Document store, if documents are independent units
  Tabular store, if multiple entries (e.g. rows) are updated at the same time
  Tabular store, if only certain columns need to be retrieved
  Things to watch out for
    Maximum size of a value depends on the actual implementation
    Avoid joins for optimal performance

Consistency vs Availability vs Partitioning
  See also: http://blog.nahurst.com/visual-guide-to-nosql-systems

Graph Storage
  Data model
    G = (V, E), where each vertex or edge may have additional properties
  Interface
    Graph traversals, specialised queries & insert/update methods
  Properties
    Optimised for graph traversal, i.e. no joins needed
    Types of edges can be specified by the user
  Examples
    Neo4J (CA), OrientDB (CA), TitanDB, Giraph, InfiniteGraph (CA), ...
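A property graph with typed edges and a traversal that follows one edge type can be sketched with adjacency lists. The names PropertyGraph and traverse are hypothetical, and the breadth-first search below stands in for the specialised traversal languages of real graph databases.

```python
from collections import deque


class PropertyGraph:
    """Toy property graph: vertices and typed, directed edges with properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = {}     # source id -> list of (edge_type, target id, properties)

    def add_vertex(self, vertex_id, **props):
        self.vertices[vertex_id] = props
        self.edges.setdefault(vertex_id, [])

    def add_edge(self, source, edge_type, target, **props):
        self.edges[source].append((edge_type, target, props))

    def traverse(self, start, edge_type, max_depth):
        """Breadth-first: vertices reachable via edge_type within max_depth hops."""
        seen = {start}
        queue = deque([(start, 0)])
        reached = []
        while queue:
            vertex, depth = queue.popleft()
            if depth == max_depth:
                continue
            for etype, target, _props in self.edges[vertex]:
                if etype == edge_type and target not in seen:
                    seen.add(target)
                    reached.append(target)
                    queue.append((target, depth + 1))
        return reached


g = PropertyGraph()
for person in ["alice", "bob", "carol", "dave"]:
    g.add_vertex(person, kind="person")
g.add_edge("alice", "knows", "bob", since=2015)
g.add_edge("bob", "knows", "carol", since=2016)
g.add_edge("alice", "likes", "dave")
friends_of_friends = g.traverse("alice", "knows", max_depth=2)
```

Each hop is a lookup in the adjacency list of the current vertex, so no join over an edge table is needed.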

Search Storage
  Data model
    Documents, metadata
    Often stored as Vector Space Model
  Interface
    Specialised query languages
  Properties
    Documents may consist of multiple fields (facets)
    Fields may be structured as well, e.g. date, integer, strings
    Fine control over the indexing process, i.e. how each field is indexed
  Examples
    Solr, ElasticSearch, ...
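The Vector Space Model represents documents and queries as term vectors and ranks documents by vector similarity. A minimal sketch using raw term frequencies and cosine similarity follows; the function names and sample documents are made up, and real search engines add tokenisation, weighting (e.g. TF-IDF) and inverted indexes.

```python
import math
from collections import Counter


def vectorise(text):
    """Term-frequency vector of a whitespace-tokenised, lower-cased text."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents):
    """Return ids of matching documents, best match first."""
    q = vectorise(query)
    scored = [(cosine(q, vectorise(text)), doc_id) for doc_id, text in documents.items()]
    return [doc_id for score, doc_id in sorted(scored, reverse=True) if score > 0]


docs = {
    "d1": "graph databases store vertices and edges",
    "d2": "key value stores scale well",
    "d3": "graph traversal avoids joins",
}
hits = search("graph databases", docs)
```

Here d1 ranks above d3 because it shares both query terms, while d2 shares none and is dropped.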

Object Oriented Storage
  Data model
    Classes, objects, relations
  Interface
    CRUD, traversal methods
  Properties
    Known model from OO programming
    Often strong coupling between DB system and programming language
  Examples
    db4o (CA), Versant (CA), Objectivity (CA), ...

XML Databases
  Data model
    XML, RDF (triples)
  Interface
    CRUD, query languages (XQuery, SPARQL, ...)
  Properties
    RDF-based systems are often called TripleStore
    Often used in combination with semantic technologies
  Examples
    BaseX, MarkLogic (CA), AllegroGraph (CA), BigData, ...

Timeseries Databases
  Data model
    (timestamp) -> value
  Interface
    CRUD, specialised query languages
  Variations
    Type of value is the same for all entries, typically simple, e.g. floating point number
    Complex value type, e.g. JSON
  Properties
    Optimised for time series data, i.e. small storage requirements
    Query for time ranges
    Operations on time series
  Examples
    InfluxDB, KairosDB, ...
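Time-range queries and simple operations on a series become cheap when the points are kept sorted by timestamp. A minimal sketch, with a hypothetical TimeSeries class and integer timestamps, using binary search from the standard library:

```python
import bisect


class TimeSeries:
    """Toy time series: (timestamp) -> value, kept sorted by timestamp."""

    def __init__(self):
        self._timestamps = []
        self._values = []

    def insert(self, timestamp, value):
        i = bisect.bisect_right(self._timestamps, timestamp)
        self._timestamps.insert(i, timestamp)
        self._values.insert(i, value)

    def range(self, start, end):
        """All points with start <= timestamp < end, found by binary search."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return list(zip(self._timestamps[lo:hi], self._values[lo:hi]))

    def mean(self, start, end):
        """Example of an operation on the series: average over a time range."""
        points = self.range(start, end)
        if not points:
            return None
        return sum(value for _, value in points) / len(points)


ts = TimeSeries()
ts.insert(30, 3.0)  # out-of-order arrival is handled by the sorted insert
ts.insert(10, 1.0)
ts.insert(20, 2.0)
window = ts.range(10, 30)
average = ts.mean(10, 30)
```

Using one simple value type for all entries, as in this sketch, is what allows real systems to compress the data and keep storage requirements small.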

In-Memory Databases
  Data model
    (key) -> value, but not limited to this model
  Interface
    CRUD, specialised query languages
  Properties
    Data is stored in RAM
    Often distributed over multiple machines ("RAM is the new disk")
    In its purest form does not satisfy the durability criterion
  Examples
    Hazelcast, Redis, SAP HANA, ...

API & Data Formats
  NoSQL systems often use RESTful APIs
    Direct match with the data model and CRUD operations
  Serialisation of objects
    Many techniques are used, e.g. Apache Avro, Protocol Buffers, ...

Features
  Not all NoSQL systems support transactions
    Instead, they support atomic single operations
    Therefore not all operations are supported
  Not all NoSQL systems support security features
    e.g. access control

Cloud Database Solutions
  Storage on the internet (cloud)
    DBaaS - Database as a Service
    Not limited to NoSQL, traditional SQL databases are available as well
  Multi-tenancy as an important feature (separation of multiple clients)
    Private OS - all separate (e.g. Amazon RDS)
    Private process - same machine (e.g. Compose)
    Private schema - same database (e.g. Google DataStore)
    Shared schema - same tables (most SaaS apps)

Current State
  Current state of data storage systems
    Depending on the actual requirements, select a suitable storage solution
    Or select a separate solution for each sub-system: polyglot persistence

Future Outlook - NewSQL
  Attempt to achieve consistency and availability for distributed systems
  E.g. Google Spanner
    Relies on specialised hardware
  E.g. CockroachDB
    Builds on the Raft consensus algorithm
    https://github.com/cockroachdb/cockroach

The End
  Next: Graph Databases
  Credits
    Scalable Data Management: NoSQL Data Stores in Research and Practice
    http://icde2016.fi/tutorials.php