Parallel DBs. April 25, 2017
|
|
- Silvia Carr
- 6 years ago
- Views:
Transcription
1 Parallel DBs April 25,
2 Why Scale Up? Scan of 1 PB at 300MB/s (SATA r2 Limit) (x1000) ~1 Hour ~3.5 Seconds 2
3 Data Parallelism Replication Partitioning A A A A B C 3
4 Operator Parallelism Pipeline Parallelism: A task breaks down into stages; each machine processes one stage. Sequential Operation Sequential Operation Sequential Operation Partition Parallelism: Many machines doing the same thing to different pieces of data. Sequential Operation Sequential Operation Sequential Operation 4
5 Types of Parallelism Both types of parallelism are natural in a database management system. SELECT SUM( ) FROM Table WHERE Sequential Operation Sequential Operation Sequential Operation Sequential Operation Sequential Operation Sequential Operation Sequential Operation Sequential Operation Sequential Operation Sequential Operation LOAD SELECT AGG Combine 5
6 DBMSes: The First Success Story Every major DBMS vendor has a version. Reasons for success: Bulk Processing (Partition -ism). Natural Pipelining in RA plan. Users don t need to think in. 6
7 Types of Speedup Speed-up -ism More resources = proportionally less time spent. Scale-up -ism More resources = proportionally more data processed. Response Time Throughput # of Nodes # of Nodes 7
8 Parallelism Models CPU Memory Disk 8
9 Parallelism Models CPU Memory Disk How do the nodes communicate? 9
10 Parallelism Models Option 1: Shared Memory available to all CPUs CPU Memory Disk e.g., a Multi-Core/Multi-CPU System 10
11 Parallelism Models Option 2: Non-Uniform Memory Access. CPU Memory Disk Used by most AMD servers 11
12 Parallelism Models Option 3: Shared Disk available to all CPUs CPU Memory Disk Each node interacts with a disk on the network. 12
13 Parallelism Models Option 4: Shared Nothing in which all communication is explicit. CPU Memory Disk Examples include MPP, Map/Reduce. Often used as basis for other abstractions. 13
14 Parallelizing OLAP - Parallel Queries OLTP - Parallel Updates 14
15 Parallelizing OLAP - Parallel Queries OLTP - Parallel Updates 15
16 Parallelism & Distribution Distribute the Data Redundancy Faster access Parallelize the Computation Scale up (compute faster) Scale out (bigger data) 16
17 Operator Parallelism General Concept: Break task into individual units of computation. Challenge: How much data does each unit of computation need? Challenge: How much data transfer is needed to allow the unit of computation? Same challenges arise in Multicore, CUDA programming. 17
18 Parallel Data Flow A No Parallelism 18
19 Parallel Data Flow A 1 AN N-Way Parallelism 19
20 Parallel Data Flow B 1 BN??? A 1 AN Chaining Parallel Operators 20
21 Parallel Data Flow B 1 BN A 1 AN One-to-One Data Flow ( Map ) 21
22 Parallel Data Flow B 1 BN A 1 AN One-to-One Data Flow 22
23 Parallel Data Flow Extreme 1 All-to-All All nodes send all records to all downstream nodes B A 1 BN 1 AN Many-to-Many Data Flow 23 Extreme 2 Partition Each record goes to exactly one downstream node
24 Parallel Data Flow B A 1 AN Many-to-One Data Flow ( Reduce/Fold ) 24
25 Parallel Operators Select Project Union (bag) What is a logical unit of computation? (1 tuple) Is there a data dependency between units? (no) 25
26 Parallel Operators Select Project Union (bag) A 1 AN 1/N Tuples 1/N Tuples 26
27 Parallel Joins FOR i IN 1 to N FOR j IN 1 to K Partition JOIN(Block i of R, Partition Block j of S) One Unit of Computation 27
28 Parallel Joins K Partitions of S N Partitions of R Block 1 of R Block 1 of S N Block N of R Block 1 of S K K Block 1 of R Block K of S N Block N of R Block K of S 28
29 Practical Concerns UNION R1 S1 R1 S2 R2 S1 RN SM R1 R2 RN S1 S2 SM Where does the computation happen? How does the data get there? 29
30 Distributing the Work Let s start simple what can we do with no partitions? B R S R and S may be any RA expression 30
31 Distributing the Work B R S Node 1 No Parallelism! 31
32 Distributing the Work All of R and All of S get sent! B Node 3 R Node 1 S Node 2 Lots of Data Transfer! 32
33 Distributing the Work All of R get sent B R Node 1 S Node 2 Better! We can guess whether R or S is smaller. 33
34 Distributing the Work What can we do if R is partitioned? U B B R1 S R2 34
35 Distributing the Work There are lots of partitioning strategies, but this one is interesting. U B B R1 Node 1 S R2 Node 2 Node 3 35
36 Distributing the Work it can be used as a model for partitioning S U B B R1 Node 1 S1 R2 Node 2 Node 3 36
37 Distributing the Work it can be used as a model for partitioning S U B B R1 Node 1 S2 R2 Node 2 Node 3 37
38 Distributing the Work and neatly captures the data transfer issue. U B B R1 Node 1 S R2 Node 2 Node 3 38
39 Hash(R.B)%4 0 Hash(S.B)%4 Parallel Joins X X X R B S: Which Partitions of S Join w/ Bucket 0 of R? 39
40 Parallel Joins R.B S.B B<25 B<25 25 B<50 50 B<75 75 B 25 B<50 X 50 B<75 X X 75 B X R R.B < S.B S: X Which Partitions of S Can Produce Output? 40 X
41 Can we further reduce the amount of data sent? 41
42 Sending Hints Rk B Si The naive approach Node 1 Node 2 Rk Si 42
43 Sending Hints Rk B Si The naive approach Node 1 Send me Node 2 Rk Rk Si 43
44 Sending Hints Rk B Si The naive approach Rk Node 1 Node 2 Rk Si 44
45 Sending Hints Rk B Si The smarter approach πb( ) Si Node 1 Node 2 Rk Si 45
46 Sending Hints Rk B Si The smarter approach πb( ) Si Node 1 Node 2 πb( ) Rk Rk Si Si 46
47 Sending Hints Rk B Si The smarter approach Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> 47 <2,X> <3,Y> <6,Y>
48 Sending Hints Rk B Si The smarter approach Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> 48 Send me rows with a B of 2,3, or 6 <2,X> <3,Y> <6,Y>
49 Sending Hints Rk B Si The smarter approach Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> <2,B> <2,C> <3,D> 49 Send me rows with a B of 2,3, or 6 This is called a semi-join. <2,X> <3,Y> <6,Y>
50 Sending Hints Now Node 1 sends as little data as possible but Node 2 needs to send a lot of data. Can we do better? 50
51 Sending Hints Rk B Si Strategy 1: Parity Bits Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> <2,X> 0 <6,Y>
52 Sending Hints Rk B Si Strategy 1: Parity Bits Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> Send me data with a parity bit of 0 0 <2,X> 0 <6,Y>
53 Sending Hints Rk B Si Strategy 1: Parity Bit Node 1 sending too much is ok! (Node 2 still needs to compute B) Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> <2,B> <2,C> <4,E> 53 Send me data with a parity bit of 0 Problem: One parity bit is too little 0 <2,X> 0 <6,Y>
54 Sending Hints Rk B Si Strategy 1: Parity Bit Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> Problem: One parity bit is too little 54 0 <2,X> 1 <3,Y> 0 <6,Y>
55 Sending Hints Rk B Si Strategy 2: Parity Bits Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> <2,B> <2,C> <3,D> 55 Send me data with parity bits 10 or 11 Problem: Almost as much data as πb 10<2,X> 11<3,Y> 10<6,Y>
56 Sending Hints Can we summarize the parity bits? 56
57 Bloom Filters Alice Bob Carol Dave 57
58 Bloom Filters Alice Bob Carol Dave Bloom Filter 58
59 Bloom Filters Yes Is Alice part of the set? Alice Bob Carol Bloom Filter No Is Eve part of the set? Dave Bloom Filter Guarantee Test definitely returns Yes if the element is in the set Test usually returns No if the element is not in the set Yes Is Fred part of the set? 59
60 Bloom Filters A Bloom Filter is a bit vector M - # of bits in the bit vector K - # of hash functions For ONE key (or record): For i between 0 and K: bitvector[ hashi (key) % M ] = 1 Each bit vector has ~K bits set 60
61 Bloom Filters Key 1 Key 2 Key Filters are combined by Bitwise-OR e.g. (Key 1 Key 2) = How do we test for inclusion? (Key & Filter) == Key? Key (Key 1 & S) = (Key 3 & S) = X 61 (Key 4 & S) = False Positive
62 Sending Hints Rk B Si Strategy 3: Bloom Filters Node 1 Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> 62 <2,X> <3,Y> <6,Y>
63 Sending Hints Rk B Si Strategy 3: Bloom Filters Node 1 with a B in the Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> 63 Send me rows bloom filter summarizing the set {2,3,6} <2,X> <3,Y> <6,Y>
64 Sending Hints Rk B Si Strategy 3: Bloom Filters Node 1 <2,B> with a B in the Node 2 <1,A> <2,B> <2,C> <3,D> <4,E> <2,C> <3,D> <4,E> This is called a bloom-join. 64 Send me rows bloom filter summarizing the set {2,3,6} <2,X> <3,Y> <6,Y>
65 Parallel Aggregates Algebraic: Bounded-size intermediate state (Sum, Count, Avg, Min, Max) Holistic: Unbounded-size intermediate state (Median, Mode/Top-K Count, Count-Distinct; Not Distribution-Friendly) 65
66 Fan-In Aggregation SUM B A 1 AN 66
67 Fan-In Aggregation SUM 8 Messages A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 8 67
68 Fan-In Aggregation SUM 4 Messages 2 Messages (each) SUM 1 SUM 2 SUM 3 SUM 4 A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 8 68
69 Fan-In Aggregation SUM 2 Messages 2 Messages (each) SUM 1 SUM 2 SUM 1 SUM 2 SUM 3 SUM 4 A 1 A 2 A 3 A 4 A 5 A 6 A 7 A 8 69
70 Fan-In Aggregation If Each Node Performs K Units of Work (K Messages) How Many Rounds of Computation Are Needed? LogK(N) 70
71 Fan-In Aggregation Components Combine(Intermediate1,, IntermediateN) = Intermediate <SUM1, COUNT1> <SUMN, COUNTN> = <SUM1+ +SUMN, COUNT1+ +COUNTN> Compute(Intermediate) = Aggregate Compute(<SUM, COUNT>) = SUM / COUNT 71
Parallel DBs. April 23, 2018
Parallel DBs April 23, 2018 1 Why Scale? Scan of 1 PB at 300MB/s (SATA r2 Limit) Why Scale Up? Scan of 1 PB at 300MB/s (SATA r2 Limit) ~1 Hour Why Scale Up? Scan of 1 PB at 300MB/s (SATA r2 Limit) (x1000)
More informationSemi-Joins and Bloom Join. Databases: The Complete Book Ch 20
Semi-Joins and Bloom Join Databases: The Complete Book Ch 20 1 Practical Concerns UNION R1 S1 R1 S2 R2 S1 RN SM R1 R2 RN S1 S2 SM 2 Practical Concerns UNION R1 S1 R1 S2 R2 S1 RN SM R1 R2 RN S1 S2 SM Where
More informationParallel DBs. April 25, 2017
Parallel DBs April 25, 2017 1 Sending Hints Rk B Si Strategy 3: Bloom Filters Node 1 Node 2 2 Sending Hints Rk B Si Strategy 3: Bloom Filters Node 1 with
More informationParallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism
Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large
More informationCompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy
CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some
More informationParallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Parallel DBMS Prof. Yanlei Diao University of Massachusetts Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke I. Parallel Databases 101 Rise of parallel databases: late 80 s Architecture: shared-nothing
More informationCSE 190D Spring 2017 Final Exam Answers
CSE 190D Spring 2017 Final Exam Answers Q 1. [20pts] For the following questions, clearly circle True or False. 1. The hash join algorithm always has fewer page I/Os compared to the block nested loop join
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 24: MapReduce CSE 344 - Fall 2016 1 HW8 is out Last assignment! Get Amazon credits now (see instructions) Spark with Hadoop Due next wed CSE 344 - Fall 2016
More informationPARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 INTRODUCTION In centralized database: Data is located in one place (one server) All DBMS functionalities are done by that server
More informationCSE 344 MAY 2 ND MAP/REDUCE
CSE 344 MAY 2 ND MAP/REDUCE ADMINISTRIVIA HW5 Due Tonight Practice midterm Section tomorrow Exam review PERFORMANCE METRICS FOR PARALLEL DBMSS Nodes = processors, computers Speedup: More nodes, same data
More informationParallel DBMS. Lecture 20. Reading Material. Instructor: Sudeepa Roy. Reading Material. Parallel vs. Distributed DBMS. Parallel DBMS 11/15/18
Reading aterial CompSci 516 atabase Systems Lecture 20 Parallel BS Instructor: Sudeepa Roy [RG] Parallel BS: Chapter 22.1-22.5 [GUW] Parallel BS and map-reduce: Chapter 20.1-20.2 Acknowledgement: The following
More informationExternal Sorting Implementing Relational Operators
External Sorting Implementing Relational Operators 1 Readings [RG] Ch. 13 (sorting) 2 Where we are Working our way up from hardware Disks File abstraction that supports insert/delete/scan Indexing for
More information! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More informationParallel DBMS. Chapter 22, Part A
Parallel DBMS Chapter 22, Part A Slides by Joe Hellerstein, UCB, with some material from Jim Gray, Microsoft Research. See also: http://www.research.microsoft.com/research/barc/gray/pdb95.ppt Database
More informationCSE 344 MAY 7 TH EXAM REVIEW
CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice
More informationCSE 190D Spring 2017 Final Exam
CSE 190D Spring 2017 Final Exam Full Name : Student ID : Major : INSTRUCTIONS 1. You have up to 2 hours and 59 minutes to complete this exam. 2. You can have up to one letter/a4-sized sheet of notes, formulae,
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationParallel DBMS. Lecture 20. Reading Material. Instructor: Sudeepa Roy. Reading Material. Parallel vs. Distributed DBMS. Parallel DBMS 11/7/17
Reading aterial CompSci 516 atabase Systems Lecture 20 Parallel BS Instructor: Sudeepa Roy [RG] Parallel BS: Chapter 22.1-22.5 [GUW] Parallel BS and map-reduce: Chapter 20.1-20.2 Acknowledgement: The following
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationIntroduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms
More informationCOURSE 12. Parallel DBMS
COURSE 12 Parallel DBMS 1 Parallel DBMS Most DB research focused on specialized hardware CCD Memory: Non-volatile memory like, but slower than flash memory Bubble Memory: Non-volatile memory like, but
More informationAnnouncements. Database Systems CSE 414. Why compute in parallel? Big Data 10/11/2017. Two Kinds of Parallel Data Processing
Announcements Database Systems CSE 414 HW4 is due tomorrow 11pm Lectures 18: Parallel Databases (Ch. 20.1) 1 2 Why compute in parallel? Multi-cores: Most processors have multiple cores This trend will
More informationCSE 344 Final Review. August 16 th
CSE 344 Final Review August 16 th Final In class on Friday One sheet of notes, front and back cost formulas also provided Practice exam on web site Good luck! Primary Topics Parallel DBs parallel join
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationAnnouncement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17
Announcement CompSci 516 Database Systems Lecture 10 Query Evaluation and Join Algorithms Project proposal pdf due on sakai by 5 pm, tomorrow, Thursday 09/27 One per group by any member Instructor: Sudeepa
More informationAdvanced Databases: Parallel Databases A.Poulovassilis
1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger
More informationSomething to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:
Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationAdvanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Advanced Databases Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Parallel Join The join operation requires pairs of tuples to be
More information2.3 Algorithms Using Map-Reduce
28 CHAPTER 2. MAP-REDUCE AND THE NEW SOFTWARE STACK one becomes available. The Master must also inform each Reduce task that the location of its input from that Map task has changed. Dealing with a failure
More informationThe Extended Algebra. Duplicate Elimination. Sorting. Example: Duplicate Elimination
The Extended Algebra Duplicate Elimination 2 δ = eliminate duplicates from bags. τ = sort tuples. γ = grouping and aggregation. Outerjoin : avoids dangling tuples = tuples that do not join with anything.
More informationOutline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012
Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationCAS CS 460/660 Introduction to Database Systems. Query Evaluation II 1.1
CAS CS 460/660 Introduction to Database Systems Query Evaluation II 1.1 Cost-based Query Sub-System Queries Select * From Blah B Where B.blah = blah Query Parser Query Optimizer Plan Generator Plan Cost
More informationAlgorithms for Query Processing and Optimization. 0. Introduction to Query Processing (1)
Chapter 19 Algorithms for Query Processing and Optimization 0. Introduction to Query Processing (1) Query optimization: The process of choosing a suitable execution strategy for processing a query. Two
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VIII Lecture 16, March 19, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part VII Algorithms for Relational Operations (Cont d) Today s Session:
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Yanlei Diao UMass Amherst March 13 and 15, 2006 Slides Courtesy of R. Ramakrishnan and J. Gehrke 1 Relational Operations We will consider how to implement: Selection
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationChapter 3. Algorithms for Query Processing and Optimization
Chapter 3 Algorithms for Query Processing and Optimization Chapter Outline 1. Introduction to Query Processing 2. Translating SQL Queries into Relational Algebra 3. Algorithms for External Sorting 4. Algorithms
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 26: Parallel Databases and MapReduce CSE 344 - Winter 2013 1 HW8 MapReduce (Hadoop) w/ declarative language (Pig) Cluster will run in Amazon s cloud (AWS)
More informationB.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2
Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,
More informationQuery optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.
Database Management Systems DBMS Architecture SQL INSTRUCTION OPTIMIZER MANAGEMENT OF ACCESS METHODS CONCURRENCY CONTROL BUFFER MANAGER RELIABILITY MANAGEMENT Index Files Data Files System Catalog DATABASE
More informationImplementation of Relational Operations
Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows
More informationEvaluation of relational operations
Evaluation of relational operations Iztok Savnik, FAMNIT Slides & Textbook Textbook: Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill, 3 rd ed., 2007. Slides: From Cow Book
More informationOutline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.
Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationEvaluation of Relational Operations. Relational Operations
Evaluation of Relational Operations Chapter 14, Part A (Joins) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Relational Operations v We will consider how to implement: Selection ( )
More informationArchitecture and Implementation of Database Systems (Winter 2014/15)
Jens Teubner Architecture & Implementation of DBMS Winter 2014/15 1 Architecture and Implementation of Database Systems (Winter 2014/15) Jens Teubner, DBIS Group jens.teubner@cs.tu-dortmund.de Winter 2014/15
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationImproving the Performance of OLAP Queries Using Families of Statistics Trees
Improving the Performance of OLAP Queries Using Families of Statistics Trees Joachim Hammer Dept. of Computer and Information Science University of Florida Lixin Fu Dept. of Mathematical Sciences University
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationSublinear Models for Streaming and/or Distributed Data
Sublinear Models for Streaming and/or Distributed Data Qin Zhang Guest lecture in B649 Feb. 3, 2015 1-1 Now about the Big Data Big data is everywhere : over 2.5 petabytes of sales transactions : an index
More informationLassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems
Lassonde School of Engineering Winter 2016 Term Course No: 4411 Database Management Systems Last Name: First Name: Student ID: 1. Exam is 2 hours long 2. Closed books/notes Problem 1 (6 points) Consider
More informationWhat happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques
376a. Database Design Dept. of Computer Science Vassar College http://www.cs.vassar.edu/~cs376 Class 16 Query optimization What happens Database is given a query Query is scanned - scanner creates a list
More informationCSE Midterm - Spring 2017 Solutions
CSE Midterm - Spring 2017 Solutions March 28, 2017 Question Points Possible Points Earned A.1 10 A.2 10 A.3 10 A 30 B.1 10 B.2 25 B.3 10 B.4 5 B 50 C 20 Total 100 Extended Relational Algebra Operator Reference
More informationQuery Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.
COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 13, March 10, 2014 Mohammad Hammoud Today Welcome Back from Spring Break! Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+
More informationCMP-3440 Database Systems
CMP-3440 Database Systems Relational DB Languages Relational Algebra, Calculus, SQL Lecture 05 zain 1 Introduction Relational algebra & relational calculus are formal languages associated with the relational
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationCS 245 Midterm Exam Winter 2014
CS 245 Midterm Exam Winter 2014 This exam is open book and notes. You can use a calculator and your laptop to access course notes and videos (but not to communicate with other people). You have 70 minutes
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part V Lecture 15, March 15, 2015 Mohammad Hammoud Today Last Session: DBMS Internals- Part IV Tree-based (i.e., B+ Tree) and Hash-based (i.e., Extendible
More informationTrack Join. Distributed Joins with Minimal Network Traffic. Orestis Polychroniou! Rajkumar Sen! Kenneth A. Ross
Track Join Distributed Joins with Minimal Network Traffic Orestis Polychroniou Rajkumar Sen Kenneth A. Ross Local Joins Algorithms Hash Join Sort Merge Join Index Join Nested Loop Join Spilling to disk
More informationData Partitioning and MapReduce
Data Partitioning and MapReduce Krzysztof Dembczyński Intelligent Decision Support Systems Laboratory (IDSS) Poznań University of Technology, Poland Intelligent Decision Support Systems Master studies,
More informationIntroduction to Data Management CSE 344
Introduction to Data Management CSE 344 Lecture 25: Parallel Databases CSE 344 - Winter 2013 1 Announcements Webquiz due tonight last WQ! J HW7 due on Wednesday HW8 will be posted soon Will take more hours
More informationEECS 647: Introduction to Database Systems
EECS 647: Introduction to Database Systems Instructor: Luke Huan Spring 2009 Stating Points A database A database management system A miniworld A data model Conceptual model Relational model 2/24/2009
More informationAdvanced Data Management Technologies Written Exam
Advanced Data Management Technologies Written Exam 02.02.2016 First name Student number Last name Signature Instructions for Students Write your name, student number, and signature on the exam sheet. This
More informationImplementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12
Implementation of Relational Operations CS 186, Fall 2002, Lecture 19 R&G - Chapter 12 First comes thought; then organization of that thought, into ideas and plans; then transformation of those plans into
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 17, March 24, 2015 Mohammad Hammoud Today Last Two Sessions: DBMS Internals- Part V External Sorting How to Start a Company in Five (maybe
More informationCarnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Administrivia. Administrivia. Faloutsos/Pavlo CMU /615
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB Applications C. Faloutsos A. Pavlo Lecture#14(b): Implementation of Relational Operations Administrivia HW4 is due today. HW5 is out. Faloutsos/Pavlo
More informationEvaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi
Evaluation of Relational Operations: Other Techniques Chapter 14 Sayyed Nezhadi Schema for Examples Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer,
More informationStatistics. Duplicate Elimination
Query Execution 1. Parse query into a relational algebra expression tree. 2. Optimize relational algebra expression tree and select implementations for the relational algebra operators. (This step involves
More informationIt also performs many parallelization operations like, data loading and query processing.
Introduction to Parallel Databases Companies need to handle huge amount of data with high data transfer rate. The client server and centralized system is not much efficient. The need to improve the efficiency
More informationToday's Class. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Example Database. Query Plan Example
Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications Today's Class Intro to Operator Evaluation Typical Query Optimizer Projection/Aggregation: Sort vs. Hash C. Faloutsos A. Pavlo Lecture#13:
More informationEvaluation of Relational Operations: Other Techniques
Evaluation of Relational Operations: Other Techniques Chapter 14, Part B Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke 1 Using an Index for Selections Cost depends on #qualifying
More informationMapReduce. Stony Brook University CSE545, Fall 2016
MapReduce Stony Brook University CSE545, Fall 2016 Classical Data Mining CPU Memory Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining CPU Memory (64 GB) Disk Classical Data Mining
More informationCost Models. the query database statistics description of computational resources, e.g.
Cost Models An optimizer estimates costs for plans so that it can choose the least expensive plan from a set of alternatives. Inputs to the cost model include: the query database statistics description
More informationDatabase Ph.D. Qualifying Exam Spring 2006
Database Ph.D. Qualifying Exam Spring 2006 Please answer six of the following nine questions. Question 1. Consider the following relational schema: Employee (ID, Lname, Fname, Salary, Dnumber, City) where
More informationQuery Processing. Lecture #10. Andy Pavlo Computer Science Carnegie Mellon Univ. Database Systems / Fall 2018
Query Processing Lecture #10 Database Systems 15-445/15-645 Fall 2018 AP Andy Pavlo Computer Science Carnegie Mellon Univ. 2 ADMINISTRIVIA Project #2 Checkpoint #1 is due Monday October 9 th @ 11:59pm
More informationUniversity of Waterloo Midterm Examination Sample Solution
1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,
More informationOperator Implementation Wrap-Up Query Optimization
Operator Implementation Wrap-Up Query Optimization 1 Last time: Nested loop join algorithms: TNLJ PNLJ BNLJ INLJ Sort Merge Join Hash Join 2 General Join Conditions Equalities over several attributes (e.g.,
More informationQuery Processing. Introduction to Databases CompSci 316 Fall 2017
Query Processing Introduction to Databases CompSci 316 Fall 2017 2 Announcements (Tue., Nov. 14) Homework #3 sample solution posted in Sakai Homework #4 assigned today; due on 12/05 Project milestone #2
More informationColumn Stores vs. Row Stores How Different Are They Really?
Column Stores vs. Row Stores How Different Are They Really? Daniel J. Abadi (Yale) Samuel R. Madden (MIT) Nabil Hachem (AvantGarde) Presented By : Kanika Nagpal OUTLINE Introduction Motivation Background
More informationOptimization Overview
Lecture 17 Optimization Overview Lecture 17 Lecture 17 Today s Lecture 1. Logical Optimization 2. Physical Optimization 3. Course Summary 2 Lecture 17 Logical vs. Physical Optimization Logical optimization:
More informationAnnouncements. Optional Reading. Distributed File System (DFS) MapReduce Process. MapReduce. Database Systems CSE 414. HW5 is due tomorrow 11pm
Announcements HW5 is due tomorrow 11pm Database Systems CSE 414 Lecture 19: MapReduce (Ch. 20.2) HW6 is posted and due Nov. 27 11pm Section Thursday on setting up Spark on AWS Create your AWS account before
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science
More informationCSE 344 APRIL 20 TH RDBMS INTERNALS
CSE 344 APRIL 20 TH RDBMS INTERNALS ADMINISTRIVIA OQ5 Out Datalog Due next Wednesday HW4 Due next Wednesday Written portion (.pdf) Coding portion (one.dl file) TODAY Back to RDBMS Query plans and DBMS
More informationQuery Execution [15]
CSC 661, Principles of Database Systems Query Execution [15] Dr. Kalpakis http://www.csee.umbc.edu/~kalpakis/courses/661 Query processing involves Query processing compilation parsing to construct parse
More informationGLADE: A Scalable Framework for Efficient Analytics. Florin Rusu University of California, Merced
GLADE: A Scalable Framework for Efficient Analytics Florin Rusu University of California, Merced Motivation and Objective Large scale data processing Map-Reduce is standard technique Targeted to distributed
More informationIntroduction to Wireless Sensor Network. Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University
Introduction to Wireless Sensor Network Peter Scheuermann and Goce Trajcevski Dept. of EECS Northwestern University 1 A Database Primer 2 A leap in history A brief overview/review of databases (DBMS s)
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:
More information