Parallel DBs. April 25, 2017


Why Scale Up? Scanning 1 PB at 300 MB/s (the SATA rev. 2 limit) on a single disk is far too slow; spreading the scan across roughly 1,000 machines (x1000) turns hours into seconds (~1 hour vs. ~3.5 seconds).
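
A quick back-of-the-envelope check of these figures (a sketch; the 300 MB/s bandwidth and the 1,000-machine count come from the slide, the rest is simple arithmetic):

    # Back-of-the-envelope scan times at the SATA rev. 2 limit of ~300 MB/s.
    BANDWIDTH = 300e6          # bytes per second per disk
    TERABYTE = 1e12
    PETABYTE = 1e15

    def scan_seconds(num_bytes, machines=1):
        """Elapsed time if the scan is split evenly across `machines` disks."""
        return num_bytes / (BANDWIDTH * machines)

    print(scan_seconds(TERABYTE))                 # ~3333 s: roughly an hour for 1 TB on one disk
    print(scan_seconds(TERABYTE, 1000))           # ~3.3 s: roughly 3.5 seconds with 1,000 disks
    print(scan_seconds(PETABYTE, 1000) / 3600)    # ~0.9 hours for 1 PB spread over 1,000 disks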

Data Parallelism. Replication: every node stores a full copy of the data (A, A, A, ...). Partitioning: the data is split into pieces (A, B, C) spread across nodes.

Operator Parallelism. Pipeline parallelism: a task breaks down into stages; each machine processes one stage. Partition parallelism: many machines doing the same thing to different pieces of data.

Types of Parallelism. Both types of parallelism are natural in a database management system. A query such as SELECT SUM(...) FROM Table WHERE ... flows through a pipeline of stages (LOAD, SELECT, AGG, then a final Combine), and each stage can also run partitioned across many machines.

DBMSes: The First Success Story. Every major DBMS vendor has a parallel version. Reasons for success: bulk processing (partition parallelism), natural pipelining in relational algebra plans, and users don't need to think in parallel.

Types of Speedup. Speed-up: more resources = proportionally less time spent on the same problem (plot response time against # of nodes). Scale-up: more resources = proportionally more data processed in the same time (plot throughput against # of nodes).
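
The two metrics expressed as a small sketch (the timing numbers below are illustrative, not from the slides):

    def speedup(time_1_node, time_n_nodes):
        """Speed-up: same problem on N times the resources -> ideally N times faster."""
        return time_1_node / time_n_nodes

    def scaleup(time_small_on_1, time_n_x_larger_on_n):
        """Scale-up: N times the data on N times the resources -> ideally the same elapsed time."""
        return time_small_on_1 / time_n_x_larger_on_n

    # Hypothetical measurements:
    print(speedup(time_1_node=120.0, time_n_nodes=16.0))                 # 7.5x on 8 nodes: sub-linear speed-up
    print(scaleup(time_small_on_1=120.0, time_n_x_larger_on_n=130.0))    # ~0.92: near-ideal scale-up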

Parallelism Models. Each node has a CPU, memory, and a disk. How do the nodes communicate?

Parallelism Models. Option 1: Shared Memory. One memory is available to all CPUs, e.g., a multi-core/multi-CPU system.

Parallelism Models. Option 2: Non-Uniform Memory Access (NUMA). Each CPU has fast access to its own local memory and slower access to memory attached to other CPUs; used by most AMD servers.

Parallelism Models. Option 3: Shared Disk. The disk is available to all CPUs; each node interacts with a disk on the network.

Parallelism Models. Option 4: Shared Nothing. Each node has its own CPU, memory, and disk, and all communication is explicit. Examples include MPP systems and Map/Reduce; often used as the basis for other abstractions.

Parallelizing. OLAP: parallel queries. OLTP: parallel updates.

Parallelism & Distribution. Distribute the data: redundancy and faster access. Parallelize the computation: scale up (compute faster) and scale out (handle bigger data).

Operator Parallelism. General concept: break the task into individual units of computation. Challenge: how much data does each unit of computation need? Challenge: how much data transfer is needed to feed each unit of computation? The same challenges arise in multicore and CUDA programming.

Parallel Data Flow. A single operator instance A: no parallelism.

Parallel Data Flow. Operator instances A1 ... AN: N-way parallelism.

Parallel Data Flow. Chaining parallel operators: instances A1 ... AN feed a second parallel operator B1 ... BN, but how does the data get from the A's to the B's?

Parallel Data Flow. One-to-one data flow ("Map"): each Ai sends its output only to the matching Bi.

Parallel Data Flow. Many-to-many data flow: every Ai may send data to every Bj, with two extremes. Extreme 1, all-to-all: all nodes send all records to all downstream nodes. Extreme 2, partition: each record goes to exactly one downstream node.
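
A minimal sketch of the two extremes, assuming a hash-partitioned exchange on some routing key (the key choice, node count, and rows are illustrative):

    from collections import defaultdict

    def exchange_by_partition(records, key, num_downstream):
        """Partition extreme: route each record to exactly one downstream node."""
        outboxes = defaultdict(list)
        for rec in records:
            outboxes[hash(key(rec)) % num_downstream].append(rec)
        return outboxes

    def exchange_all_to_all(records, num_downstream):
        """All-to-all extreme: every downstream node receives every record."""
        return {node: list(records) for node in range(num_downstream)}

    rows = [(1, "A"), (2, "B"), (2, "C"), (3, "D")]
    print(exchange_by_partition(rows, key=lambda r: r[0], num_downstream=3))
    print(exchange_all_to_all(rows, num_downstream=3))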

Parallel Data Flow. Many-to-one data flow ("Reduce"/"Fold"): all instances A1 ... AN send their output to a single B.

Parallel Operators: Select, Project, Union (bag). What is a logical unit of computation? One tuple. Is there a data dependency between units? No.

Parallel Operators: Select, Project, Union (bag). Split the input so that each instance A1 ... AN receives 1/N of the tuples and processes them independently.
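
A sketch of why these operators parallelize trivially: each node applies the same tuple-at-a-time function to its own 1/N of the data with no coordination (the predicate, projection, and partitions below are illustrative):

    def select_project_partition(partition, predicate, projection):
        """Run selection and projection over one node's share of the tuples."""
        return [projection(t) for t in partition if predicate(t)]

    partitions = [
        [(1, "A", 10), (2, "B", 20)],   # node 1's 1/N of the tuples
        [(3, "C", 30), (4, "D", 40)],   # node 2's 1/N of the tuples
    ]
    # Each node runs independently; a bag union of the per-node outputs is the final answer.
    results = [
        select_project_partition(p, predicate=lambda t: t[2] >= 20, projection=lambda t: (t[0], t[1]))
        for p in partitions
    ]
    print([row for part in results for row in part])   # [(2, 'B'), (3, 'C'), (4, 'D')]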

Parallel Joins
    FOR i IN 1 TO N:
        FOR j IN 1 TO K:
            JOIN(Block i of R, Block j of S)    -- one unit of computation

Parallel Joins. With N partitions of R and K partitions of S, the join decomposes into N x K units of computation: (Block i of R) joined with (Block j of S) for every i in 1..N and j in 1..K.
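
The grid above can be read directly as a task list. A sketch, assuming arbitrary (non-co-partitioned) partitions and a simple in-memory block join; the partitions and key functions are illustrative:

    def join_blocks(r_block, s_block, r_key, s_key):
        """One unit of computation: join one block of R with one block of S."""
        return [(r, s) for r in r_block for s in s_block if r_key(r) == s_key(s)]

    def parallel_join(r_partitions, s_partitions, r_key, s_key):
        """Without co-partitioning, every (i, j) pair is a separate join task: N x K units."""
        tasks = [(i, j) for i in range(len(r_partitions)) for j in range(len(s_partitions))]
        output = []
        for i, j in tasks:   # each task could run on a different node
            output.extend(join_blocks(r_partitions[i], s_partitions[j], r_key, s_key))
        return output

    R = [[(1, "A"), (2, "B")], [(3, "C")]]     # N = 2 partitions of R
    S = [[(2, "X")], [(3, "Y"), (6, "Z")]]     # K = 2 partitions of S
    print(parallel_join(R, S, r_key=lambda r: r[0], s_key=lambda s: s[0]))
    # [((2, 'B'), (2, 'X')), ((3, 'C'), (3, 'Y'))]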

Practical Concerns. The final result is the UNION of all the pairwise joins R1 ⋈ S1, R1 ⋈ S2, R2 ⋈ S1, ..., RN ⋈ SM over the partitions R1 ... RN and S1 ... SM. Where does the computation happen? How does the data get there?

Distributing the Work. Let's start simple: what can we do with no partitions? Consider R ⋈ S, where R and S may be any RA expressions.

Distributing the Work. Evaluate R ⋈ S entirely on Node 1: no parallelism!

Distributing the Work. With R on Node 1 and S on Node 2, shipping both inputs to Node 3 for the join means all of R and all of S get sent: lots of data transfer!

Distributing the Work. Instead, ship R to Node 2 and join there: only all of R gets sent. Better! We can guess whether R or S is smaller and ship the smaller one.

Distributing the Work. What can we do if R is partitioned? R ⋈ S = (R1 ⋈ S) ∪ (R2 ⋈ S).

Distributing the Work. There are lots of partitioning strategies, but this one is interesting: evaluate R1 ⋈ S and R2 ⋈ S on separate nodes (Node 1 and Node 2) and union the results on a third.

Distributing the Work. ...it can also be used as a model for partitioning S: each partition Sj of S is joined against every partition of R in the same way, first S1, then S2, and so on.

Distributing the Work. ...and it neatly captures the data transfer issue: every partition of S has to reach every node holding a partition of R.

Parallel Joins. For the equi-join R ⋈ S on attribute B, hash-partition both inputs on the join attribute: Hash(R.B)%4 and Hash(S.B)%4. Which partitions of S join with bucket 0 of R? Only bucket 0 of S; the other three buckets can be crossed out, because matching tuples always hash to the same bucket.
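
A sketch of why co-partitioning helps: if both inputs are hash-partitioned on the join attribute with the same function, bucket i of R can only match bucket i of S, so only N join tasks remain instead of N x K (the 4-way split and the tuples are illustrative):

    from collections import defaultdict

    def hash_partition(rows, key, buckets=4):
        """Partition on the join attribute: Hash(B) % buckets."""
        parts = defaultdict(list)
        for row in rows:
            parts[hash(key(row)) % buckets].append(row)
        return parts

    R = [(1, "A"), (2, "B"), (2, "C"), (3, "D"), (4, "E")]
    S = [(2, "X"), (3, "Y"), (6, "Z")]

    r_parts = hash_partition(R, key=lambda r: r[0])
    s_parts = hash_partition(S, key=lambda s: s[0])

    # Only bucket b of R joins bucket b of S; every other (i, j) pair is provably empty.
    output = []
    for b in range(4):
        output.extend((r, s) for r in r_parts[b] for s in s_parts[b] if r[0] == s[0])
    print(output)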

Parallel Joins. Now consider the inequality join R.B < S.B, with both inputs range-partitioned on B into B<25, 25<=B<50, 50<=B<75, and 75<=B. Which partitions of S can produce output with each partition of R? Any S partition whose range extends above the lower bound of the R partition: R's B<25 can pair with all four S partitions, while the six pairs in which R's range lies entirely at or above S's range (for example, R's 75<=B with S's B<25) can be crossed out.

Can we further reduce the amount of data sent?

Sending Hints (Rk ⋈ Si, on attribute B). The naive approach: Rk lives on Node 1 and Si lives on Node 2. Node 2 sends the request "send me Rk", and Node 1 ships all of Rk.

Sending Hints (Rk ⋈ Si). The smarter approach: Node 2 first computes πB(Si), the set of join-attribute values that actually appear in Si, and sends it to Node 1; Node 1 then ships only the rows of Rk whose B value appears in πB(Si).

Sending Hints (Rk ⋈ Si). Example: Node 1 holds Rk = {<1,A>, <2,B>, <2,C>, <3,D>, <4,E>} and Node 2 holds Si = {<2,X>, <3,Y>, <6,Y>}. Node 2 sends the request "send me rows with a B of 2, 3, or 6", and Node 1 replies with only <2,B>, <2,C>, and <3,D>. This is called a semi-join.
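
The same exchange as a sketch: Node 2 ships πB(Si) and Node 1 answers with only the matching rows. The tuples are the ones from the slides; the key functions are illustrative:

    def semi_join_request(s_rows, s_key):
        """Node 2: compute pi_B(S_i), the set of join values it actually needs."""
        return {s_key(s) for s in s_rows}

    def semi_join_reply(r_rows, r_key, wanted):
        """Node 1: ship only the rows of R_k whose join value appears in the request."""
        return [r for r in r_rows if r_key(r) in wanted]

    Rk = [(1, "A"), (2, "B"), (2, "C"), (3, "D"), (4, "E")]   # on Node 1
    Si = [(2, "X"), (3, "Y"), (6, "Y")]                        # on Node 2

    wanted = semi_join_request(Si, s_key=lambda s: s[0])       # {2, 3, 6} travels to Node 1
    print(semi_join_reply(Rk, r_key=lambda r: r[0], wanted=wanted))   # [(2, 'B'), (2, 'C'), (3, 'D')]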

Sending Hints. Now Node 1 sends as little data as possible, but Node 2 needs to send a lot of data (the full set of join values). Can we do better?

Sending Hints (Rk ⋈ Si). Strategy 1: a parity bit. Node 2 reduces each of its B values to a single parity bit and asks Node 1 to "send me data with a parity bit of 0". Node 1 tags its rows with the parity of their B values (<1,A>:1, <2,B>:0, <2,C>:0, <3,D>:1, <4,E>:0) and ships only <2,B>, <2,C>, and <4,E>. Node 1 sending too much is OK (Node 2 still needs to evaluate the join on B), so a false positive like <4,E> only costs bandwidth. Problem: one parity bit is too little. Once Si contains <3,Y> with parity 1 alongside <2,X> and <6,Y> with parity 0, both parities are needed and the hint filters out almost nothing.

Sending Hints (Rk ⋈ Si). Strategy 2: more parity bits. With two bits per value, Node 1's rows hash to 01, 10, 10, 11, 00 and Node 2's values 2, 3, 6 hash to 10, 11, 10, so Node 2 asks to "send me data with parity bits 10 or 11" and Node 1 ships <2,B>, <2,C>, and <3,D>. Problem: the list of bit patterns is almost as much data as πB(Si) itself.

Sending Hints. Can we summarize the parity bits?

Bloom Filters. Summarize a set of keys (say Alice, Bob, Carol, Dave) into a compact Bloom filter. Queries against the filter: "Is Alice part of the set?" returns Yes; "Is Eve part of the set?" returns No; "Is Fred part of the set?" may still return Yes, a false positive. Bloom filter guarantee: the test definitely returns Yes if the element is in the set, and usually returns No if the element is not in the set.

Bloom Filters. A Bloom filter is a bit vector. M = # of bits in the bit vector; K = # of hash functions. To insert ONE key (or record): for each of the K hash functions, set bitvector[hash_i(key) % M] = 1. Each single-key bit vector has ~K bits set.

Bloom Filters. Per-key filters: Key 1 = 00101010, Key 2 = 01010110, Key 3 = 10000110, Key 4 = 01001100. Filters are combined by bitwise OR, e.g. S = (Key 1 | Key 2) = 01111110. How do we test for inclusion? Check whether (Key & Filter) == Key. (Key 1 & S) = 00101010 = Key 1, so Key 1 is reported present. (Key 3 & S) = 00000110 != Key 3, so Key 3 is reported absent. (Key 4 & S) = 01001100 = Key 4, so Key 4 is reported present even though it was never inserted: a false positive.
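
A minimal Bloom filter sketch matching the construction above: an M-bit vector, K hash functions, insertion by setting bits, union by bitwise OR, and the (Key & Filter) == Key membership test. The choice of M = 32 and the salted built-in hash are illustrative assumptions:

    M = 32   # bits in the bit vector
    K = 3    # hash functions

    def key_filter(key):
        """Bit vector for ONE key: set bitvector[hash_i(key) % M] = 1 for each of K hashes."""
        bits = 0
        for i in range(K):
            bits |= 1 << (hash((i, key)) % M)   # hash_i simulated by salting with i
        return bits

    def build_filter(keys):
        """Filters are combined by bitwise OR."""
        combined = 0
        for key in keys:
            combined |= key_filter(key)
        return combined

    def might_contain(bloom, key):
        """Inclusion test: (Key & Filter) == Key. May return True for keys never inserted."""
        kf = key_filter(key)
        return (kf & bloom) == kf

    bloom = build_filter([2, 3, 6])
    print([k for k in range(10) if might_contain(bloom, k)])   # 2, 3, 6, plus any false positives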

Sending Hints (Rk ⋈ Si). Strategy 3: Bloom filters. Node 2 builds a Bloom filter summarizing its join values {2, 3, 6} and sends the request "send me rows with a B in the Bloom filter". Node 1 probes the filter with each row's B value and ships <2,B>, <2,C>, <3,D>, and, as a false positive, <4,E>. Node 2 still evaluates the actual join, so false positives only cost bandwidth. This is called a bloom join.
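
Putting the pieces together as a bloom-join sketch (self-contained, using the same illustrative 32-bit filter as above; the tuples are the ones from the slides):

    M, K = 32, 3

    def add(bloom, value):
        """Set K hashed bits for one join value."""
        for i in range(K):
            bloom |= 1 << (hash((i, value)) % M)
        return bloom

    def probe(bloom, value):
        """True if the value might be in the summarized set (false positives possible)."""
        probe_bits = add(0, value)
        return (probe_bits & bloom) == probe_bits

    Rk = [(1, "A"), (2, "B"), (2, "C"), (3, "D"), (4, "E")]   # on Node 1
    Si = [(2, "X"), (3, "Y"), (6, "Y")]                        # on Node 2

    # Node 2: summarize its join values {2, 3, 6} into one small filter and send it.
    hint = 0
    for s in Si:
        hint = add(hint, s[0])

    # Node 1: ship only the rows whose B value passes the filter (false positives allowed).
    shipped = [r for r in Rk if probe(hint, r[0])]

    # Node 2: evaluate the real join on the reduced input, discarding any false positives.
    print([(r, s) for r in shipped for s in Si if r[0] == s[0]])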

Parallel Aggregates. Algebraic: bounded-size intermediate state (Sum, Count, Avg, Min, Max). Holistic: unbounded-size intermediate state (Median, Mode/Top-K Count, Count-Distinct); not distribution-friendly.

Fan-In Aggregation. With a single SUM node collecting directly from A1 ... A8, the root receives 8 messages. Inserting a layer of partial sums SUM1 ... SUM4 means each partial sum receives 2 messages and the root receives 4; adding a second layer of partial sums brings every node, including the root, down to 2 messages each.

Fan-In Aggregation. If each node performs K units of work (handles K messages), how many rounds of computation are needed? log_K(N).

Fan-In Aggregation Components. Combine(Intermediate1, ..., IntermediateN) = Intermediate; for AVG: Combine(<SUM1, COUNT1>, ..., <SUMN, COUNTN>) = <SUM1 + ... + SUMN, COUNT1 + ... + COUNTN>. Compute(Intermediate) = Aggregate; for AVG: Compute(<SUM, COUNT>) = SUM / COUNT.
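
A sketch of these two components for AVG, plus a K-way fan-in that combines intermediates in log_K(N) rounds (the fan-in of 2 and the input values are illustrative):

    def initial(value):
        """Per-tuple intermediate for AVG: <SUM, COUNT>."""
        return (value, 1)

    def combine(*intermediates):
        """Combine(<SUM1, COUNT1>, ..., <SUMN, COUNTN>) = <SUM1+...+SUMN, COUNT1+...+COUNTN>."""
        return (sum(s for s, _ in intermediates), sum(c for _, c in intermediates))

    def compute(intermediate):
        """Compute(<SUM, COUNT>) = SUM / COUNT."""
        s, c = intermediate
        return s / c

    def fan_in(intermediates, k=2):
        """Each round, every node combines up to K intermediates; log_K(N) rounds total."""
        while len(intermediates) > 1:
            intermediates = [combine(*intermediates[i:i + k])
                             for i in range(0, len(intermediates), k)]
        return intermediates[0]

    leaves = [initial(v) for v in [10, 20, 30, 40, 50, 60, 70, 80]]   # A1 ... A8
    print(compute(fan_in(leaves, k=2)))   # 45.0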