CSE 344 JULY 9TH NOSQL

ADMINISTRATIVE MINUTIAE
HW3 due Wednesday
  Tests released
  actual_time should have 0s not NULLs
  Upload new data file or use UPDATE to change 0 ~> NULL
Extra OHs on Mondays 5-7pm in CSE 006 (Andrew)
Recording lectures
  Email me for missed class to get access

CLASS OVERVIEW
Unit 1: Intro
Unit 2: Relational Data Models and Query Languages
Unit 3: Non-relational data
  NoSQL
  JSON
  SQL++
Unit 4: RDBMS internals and query optimization
Unit 5: Parallel query processing
Unit 6: DBMS usability, conceptual design
Unit 7: Transactions
Unit 8: Advanced topics (time permitting)

TWO CLASSES OF DATABASE APPLICATIONS
OLTP (Online Transaction Processing)
  Queries are simple lookups: 0 or 1 join
    E.g., find customer by ID and their orders
  Many updates
    E.g., insert order, update payment
  Consistency is critical: transactions (more later)
OLAP (Online Analytical Processing), aka Decision Support
  Queries have many joins and group-bys
    E.g., sum revenues by store, product, clerk, date
  No updates
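
The two workload styles can be contrasted with a small sketch. It uses Python's built-in sqlite3 module; the Customer and Orders tables and their rows are made up for illustration and are not from the course data.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer(cid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Orders(oid INTEGER PRIMARY KEY, cid INTEGER, store TEXT, revenue REAL);
    INSERT INTO Customer VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO Orders VALUES
        (10, 1, 'Seattle', 20.0), (11, 1, 'Tacoma', 5.0), (12, 2, 'Seattle', 7.5);
""")

# OLTP-style query: find one customer by ID and their orders (a single join).
oltp_rows = conn.execute(
    "SELECT c.name, o.oid FROM Customer c JOIN Orders o ON c.cid = o.cid "
    "WHERE c.cid = ? ORDER BY o.oid",
    (1,),
).fetchall()

# OLTP-style update: insert a new order.
conn.execute("INSERT INTO Orders VALUES (13, 2, 'Tacoma', 12.0)")

# OLAP-style query: scan all orders and sum revenue by store.
olap_rows = conn.execute(
    "SELECT store, SUM(revenue) FROM Orders GROUP BY store ORDER BY store"
).fetchall()
```

The OLTP statements each touch a handful of rows found by key; the OLAP query reads the whole table and changes nothing.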

NOSQL MOTIVATION
The term has two different meanings:
  1. Non-relational data (more useful)
  2. Simplified functionality (less useful)
Item 2 was originally motivated by Web 2.0 applications
  E.g., eBay, Facebook, Amazon, Instagram, etc.
  Web startups need to scale up to 100+ million users very quickly
Needed: very large scale OLTP workloads
  Give up on large-scale consistency
  Give up OLAP

WHAT IS THE PROBLEM?
Single-server DBMSs are too small for Web data
Solution: try scaling out to multiple servers
This is hard for the entire functionality of a DBMS
NoSQL: reduce functionality for easier scale-up
  Simpler data model
  Very restricted updates

RDBMS REVIEW: SERVERLESS
SQLite:
  One data file
  One user
  One DBMS application
Consistency is easy
But only a limited number of scenarios work with such a model
[Diagram: User -> DBMS application (SQLite) on a desktop -> data file on disk]
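
A minimal sketch of the serverless setup, using Python's built-in sqlite3 module. The file name flights.db and the sample row are assumptions for illustration; the Carriers(cid, name) schema is the one used later in the lecture.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; SQLite keeps the whole database in this one file.
db_path = os.path.join(tempfile.mkdtemp(), "flights.db")

# No server process: the DBMS runs inside the application as a library.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE Carriers(cid TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Carriers VALUES ('AS', 'Alaska Airlines')")
conn.commit()
conn.close()

# Reopening the same file sees the committed data.
conn = sqlite3.connect(db_path)
carrier_names = [row[0] for row in conn.execute("SELECT name FROM Carriers")]
conn.close()
```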

RDBMS REVIEW: CLIENT-SERVER
One server running the database
Many clients, connecting via the ODBC or JDBC (Java Database Connectivity) protocol
Many users and apps
Consistency is harder -> transactions
[Diagram: client applications -> connection (JDBC, ODBC) -> DB server on the server machine, storing Files 1-3]

CLIENT-SERVER
One server that runs the DBMS (or RDBMS):
  Your own desktop, or
  Some beefy system, or
  A cloud service (SQL Azure)
Many clients run apps and connect to the DBMS
  Microsoft's Management Studio (for SQL Server), or
  psql (for postgres)
  Some Java program (HW8) or some C++ program
Clients talk to the server using the JDBC/ODBC protocol

WEB APPS: 3 TIER
Web-based applications
Replicate the App server for scaleup
Why not replicate the DB server? Consistency!
[Diagram: browsers -> HTTP/SSL -> replicated App+Web servers -> connection (e.g., JDBC) -> DB server, storing Files 1-3]

REPLICATING THE DATABASE
Two basic approaches:
  Scale up through partitioning
  Scale up through replication
Consistency is much harder to enforce

SCALE THROUGH PARTITIONING
Partition the database across many machines in a cluster
  Database now fits in main memory
  Queries spread across these machines
Can increase throughput
Easy for writes but reads become expensive!
[Diagram: three partitions; the application updates one partition and may also update another]

SCALE THROUGH REPLICATION
Create multiple copies of each database partition
Spread queries across these replicas
Can increase throughput and lower latency
Can also improve fault-tolerance
Easy for reads but writes become expensive!
[Diagram: three replicas; App 1 updates one replica only, App 2 updates another replica only]
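
Why reads are cheap and writes costly under replication can be seen in a toy model. The ReplicatedStore class below is a hypothetical in-memory sketch, not how any particular system is built: a read goes to one replica, while a write is sent to all of them.

```python
class ReplicatedStore:
    """Toy model: every replica holds a full copy of the data."""

    def __init__(self, num_replicas):
        self.replicas = [dict() for _ in range(num_replicas)]
        self.next_reader = 0
        self.writes_sent = 0

    def read(self, key):
        # A read touches one replica; round-robin spreads the load.
        replica = self.replicas[self.next_reader]
        self.next_reader = (self.next_reader + 1) % len(self.replicas)
        return replica.get(key)

    def write(self, key, value):
        # A write must reach every replica to keep the copies identical.
        for replica in self.replicas:
            replica[key] = value
            self.writes_sent += 1


store = ReplicatedStore(num_replicas=3)
store.write("flight:1", "SEA->SFO")
values = [store.read("flight:1") for _ in range(3)]
```

One logical write costs three physical writes here, while the three reads were each served by a different replica.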

RELATIONAL MODEL -> NOSQL
Relational DB: difficult to replicate/partition
  Given Supplier(sno, ...), Part(pno, ...), Supply(sno, pno)
  Partition: we may be forced to join across servers
  Replication: local copy has inconsistent versions
  Consistency is hard in both cases (why?)
NoSQL: simplified data model
  Give up much functionality
  Application must now handle joins and consistency (that's a lot!)
  (Future NoSQL systems should fix this.)
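
A sketch of the cross-server join problem. The placement rule (modulo on integer keys), the choice to partition Supplier on sno and Supply on pno, and the sample rows are all assumptions for illustration.

```python
NUM_SERVERS = 3


def server_for(key):
    # Assumption: integer keys are placed by simple modulo hashing.
    return key % NUM_SERVERS


# Each server holds a fragment of Supplier (placed by sno) and Supply (placed by pno).
servers = [{"Supplier": [], "Supply": []} for _ in range(NUM_SERVERS)]

for sno, name in [(1, "Acme"), (2, "Bolt Co"), (3, "Cogs Inc")]:
    servers[server_for(sno)]["Supplier"].append((sno, name))

for sno, pno in [(1, 10), (1, 11), (2, 12), (3, 10)]:
    servers[server_for(pno)]["Supply"].append((sno, pno))


def parts_supplied_by(sno):
    """Application-level join of Supplier and Supply for one supplier."""
    contacted = set()
    home = server_for(sno)
    contacted.add(home)
    names = [n for s, n in servers[home]["Supplier"] if s == sno]
    parts = []
    # Supply is partitioned on pno, so matching rows may sit on any server.
    for i, server in enumerate(servers):
        contacted.add(i)
        parts.extend(p for s, p in server["Supply"] if s == sno)
    return names, sorted(parts), len(contacted)


names, parts, servers_contacted = parts_supplied_by(1)
```

The supplier row is found on one server, but collecting its Supply rows means asking every server, and the join logic lives in the application.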

DATA MODELS
Taxonomy based on data models:
  Key-value stores
    E.g., Project Voldemort, Memcached
  Document stores
    E.g., SimpleDB, CouchDB, MongoDB
  Extensible record stores
    E.g., HBase, Cassandra, PNUTS

KEY-VALUE STORES FEATURES
Data model: (key, value) pairs
  Key = string/integer, unique for the entire data
  Value = can be anything (very complex object)
Operations
  get(key), put(key, value)
  Operations on value not supported
Distribution / Partitioning w/ hash function
  No replication: key k is stored at server h(k)
    h(k) returns a number in [0, num_machines-1]
  3-way replication: key k is stored at h1(k), h2(k), h3(k)
How does get(k) work? How does put(k,v) work?
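
One possible answer to the two questions, as a toy in-memory sketch. The KeyValueStore class is hypothetical. It assumes h is CRC32 modulo the number of machines and that the three replica locations are h(k), h(k)+1, h(k)+2 (mod num_machines); real systems choose their hash functions differently.

```python
import zlib


class KeyValueStore:
    def __init__(self, num_machines, replication=3):
        assert replication <= num_machines
        self.machines = [dict() for _ in range(num_machines)]
        self.replication = replication

    def h(self, key):
        # Stable hash into [0, num_machines - 1].
        return zlib.crc32(str(key).encode("utf-8")) % len(self.machines)

    def replica_servers(self, key):
        # Assumption: h_i(k) = (h(k) + i) mod num_machines, so replicas are distinct.
        base = self.h(key)
        return [(base + i) % len(self.machines) for i in range(self.replication)]

    def put(self, key, value):
        # Write the value to every server responsible for the key.
        for s in self.replica_servers(key):
            self.machines[s][key] = value

    def get(self, key):
        # Any responsible server can answer; here we ask the first one.
        return self.machines[self.replica_servers(key)[0]].get(key)


store = KeyValueStore(num_machines=5)
store.put("fid:42", {"origin": "SEA", "dest": "JFK"})
copies = sum(1 for m in store.machines if "fid:42" in m)
```

The store never looks inside the value: get and put only compute where the key lives.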

EXAMPLE
Flights(fid, date, carrier, flight_num, origin, dest, ...)
Carriers(cid, name)
How would you represent the Flights data as (key, value) pairs?
  Option 1: key=fid, value=entire flight record
  Option 2: key=date, value=all flights that day
  Option 3: key=(origin,dest), value=all flights between
How does query processing work?
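
The three options can be sketched with plain Python dicts standing in for the key-value store. The sample flight rows are made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample rows: (fid, date, carrier, flight_num, origin, dest)
flights = [
    (1, "2010-07-01", "AS", 12, "SEA", "SFO"),
    (2, "2010-07-01", "UA", 75, "SEA", "JFK"),
    (3, "2010-07-02", "AS", 12, "SEA", "SFO"),
]

# Option 1: key = fid, value = entire flight record.
by_fid = {f[0]: f for f in flights}

# Option 2: key = date, value = all flights that day.
by_date = defaultdict(list)
for f in flights:
    by_date[f[1]].append(f)

# Option 3: key = (origin, dest), value = all flights between the two.
by_route = defaultdict(list)
for f in flights:
    by_route[(f[4], f[5])].append(f)

# "All flights on 2010-07-01" is a single get() under Option 2 ...
fast = by_date["2010-07-01"]
# ... but under Option 1 the application must scan every value itself.
slow = [f for f in by_fid.values() if f[1] == "2010-07-01"]
```

The choice of key decides which queries are one lookup and which become full scans done by the application.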

KEY-VALUE STORES INTERNALS
Partitioning:
  Use a hash function h, store every (key, value) pair on server h(key)
Replication:
  Store each key on (say) three servers
  On update, propagate change to the other servers; eventual consistency
  Issue: when an app reads one replica, it may be stale
Usually: combine partitioning & replication
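
The stale-read issue can be shown with a toy model of one key on three replicas. The EventuallyConsistentKey class and its explicit propagate() step are assumptions for illustration; in a real system propagation happens in the background.

```python
from collections import deque


class EventuallyConsistentKey:
    """Toy model of one key replicated on three servers."""

    def __init__(self, value):
        self.replicas = [value, value, value]
        self.pending = deque()  # undelivered updates: (replica index, value)

    def put(self, value, at=0):
        # Apply the update locally, then queue it for the other replicas.
        self.replicas[at] = value
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, value))

    def get(self, at):
        return self.replicas[at]

    def propagate(self):
        # Deliver all queued updates to the remaining replicas.
        while self.pending:
            i, value = self.pending.popleft()
            self.replicas[i] = value


seats = EventuallyConsistentKey(value=10)
seats.put(9, at=0)
stale = seats.get(at=2)   # replica 2 has not seen the update yet
seats.propagate()
fresh = seats.get(at=2)   # after propagation all replicas agree
```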

EXTENSIBLE RECORD STORES
Based on Google's BigTable
Data model is rows and columns
Scalability by splitting rows and columns over nodes
  Rows partitioned through sharding on primary key
  Columns of a table are distributed over multiple nodes by using column groups
HBase is an open source implementation of BigTable
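
A rough sketch of splitting both rows and columns over nodes. The column group names, the modulo sharding rule, and the sample row are assumptions for illustration and do not reflect how BigTable or HBase actually place data.

```python
NUM_SHARDS = 2

# Hypothetical column groups for a flights table.
COLUMN_GROUPS = {
    "schedule": ["date", "flight_num"],
    "route": ["origin", "dest"],
}

# One node per (row shard, column group): node -> {row_key: {column: value}}
nodes = {(s, g): {} for s in range(NUM_SHARDS) for g in COLUMN_GROUPS}


def shard_of(row_key):
    # Assumption: rows are sharded by primary key modulo the shard count.
    return row_key % NUM_SHARDS


def put_row(row_key, row):
    # A row is split by column group; each piece goes to its own node.
    shard = shard_of(row_key)
    for group, columns in COLUMN_GROUPS.items():
        nodes[(shard, group)][row_key] = {c: row[c] for c in columns if c in row}


def get_columns(row_key, group):
    # Reading one column group touches exactly one node.
    return nodes[(shard_of(row_key), group)].get(row_key)


put_row(7, {"date": "2010-07-01", "flight_num": 12, "origin": "SEA", "dest": "SFO"})
route = get_columns(7, "route")
```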

MOTIVATION
In key-value stores, the value is often a very complex object
  Key = 2010/7/1, Value = [all flights that date]
Better: allow the DBMS to understand the value
  Represent the value as a JSON (or XML...) document
  [all flights on that date] = a JSON file
  May search for all flights on a given date

DOCUMENT STORES FEATURES
Data model: (key, document) pairs
  Key = string/integer, unique for the entire data
  Document = JSON or XML
Operations
  Get/put document by key
  Query language over JSON
Distribution / Partitioning
  Entire documents, as for key/value pairs
We will discuss JSON
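
A sketch of what "the store understands the value" buys. A dict of JSON strings stands in for the document store; find_flights is a hypothetical stand-in for a query language, and the documents are made up for illustration.

```python
import json

# key -> JSON document (stored as text, but the store can parse it)
documents = {
    "2010-07-01": json.dumps({"flights": [
        {"fid": 1, "carrier": "AS", "origin": "SEA", "dest": "SFO"},
        {"fid": 2, "carrier": "UA", "origin": "SEA", "dest": "JFK"},
    ]}),
    "2010-07-02": json.dumps({"flights": [
        {"fid": 3, "carrier": "AS", "origin": "SEA", "dest": "SFO"},
    ]}),
}


def get(key):
    # Get a document by key, as in a key-value store.
    return json.loads(documents[key])


def find_flights(**conditions):
    """Toy query: fids of flights whose fields match all conditions."""
    matches = []
    for text in documents.values():
        for flight in json.loads(text)["flights"]:
            if all(flight.get(k) == v for k, v in conditions.items()):
                matches.append(flight["fid"])
    return sorted(matches)


alaska = find_flights(carrier="AS")
```

Because the value is a parsed document rather than an opaque blob, the store itself can filter on fields inside it.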

WHERE WE ARE
So far we have studied the relational data model
  Data is stored in tables (= relations)
  Queries are expressions in SQL, relational algebra, or Datalog
Today: semistructured data model
  Popular formats today: XML, JSON, protobuf

JSON - OVERVIEW
JavaScript Object Notation = lightweight text-based open standard designed for human-readable data interchange
Interfaces in C, C++, Java, Python, Perl, etc.
The filename extension is .json
We will emphasize JSON as semi-structured data

JSON SYNTAX
{ "book": [
    {"id": "01", "language": "Java", "author": "H. Javeson", "year": 2015},
    {"id": "07", "language": "C++", "edition": "second", "author": "E. Sepp", "price": 22.25}
] }
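
The slide's example, with its quoting restored, can be parsed using Python's built-in json module. The sketch shows that the two book objects carry different fields, which no fixed schema had to declare.

```python
import json

text = """
{ "book": [
    {"id": "01", "language": "Java", "author": "H. Javeson", "year": 2015},
    {"id": "07", "language": "C++", "edition": "second", "author": "E. Sepp", "price": 22.25}
] }
"""

data = json.loads(text)            # JSON object -> Python dict
books = data["book"]               # JSON array  -> Python list
first_fields = sorted(books[0])    # names present in the first object
second_fields = sorted(books[1])   # the second object has a different set
```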

JSON VS RELATIONAL
Relational data model
  Rigid flat structure (tables)
  Schema must be fixed in advance
  Binary representation: good for performance, bad for exchange
  Query language based on relational algebra
Semistructured data model / JSON
  Flexible, nested structure (trees)
  Does not require a predefined schema ("self-describing")
  Text representation: good for exchange, bad for performance
  Not a panacea: more rigid structures are easier for you to query too!
  Most common use: language API; query languages emerging

JSON TERMINOLOGY
Data is represented in name/value pairs
Curly braces hold objects
  Each object is a list of name/value pairs separated by , (comma)
  Each pair is a name followed by ':' (colon) followed by the value
Square brackets hold arrays, and values are separated by , (comma)