Advanced Databases: Parallel Databases A.Poulovassilis

1 Parallel Database Architectures

Parallel database systems use parallel processing techniques to achieve faster DBMS performance and to handle larger volumes of data than is possible with single-processor systems.

There are three major architectures for parallel database systems: shared-memory, shared-disk, and shared-nothing. In all three architectures, all processors have their own cache, and cache accesses are much faster than accesses to main memory [1]:

- Shared-memory: multiple CPUs access a common region of main memory and a common set of disks.

- Shared-disk: each node has its own CPU(s) and its own main memory, but all nodes access a shared set of disks.

- Shared-nothing: each node has its own CPU(s), its own main memory and its own set of disks.

[1] Shared-memory and shared-disk architectures are found in SMP (Symmetric Multi-Processor) machines. In shared-memory systems, each processor maintains its own cache memory and the main memory is shared between all of them; cache coherency needs to be maintained across all the processors in the face of updates to main memory. Shared-disk architectures are found in NAS (network-attached storage) architectures ("disk farms") and SANs (storage area networks); cache consistency needs to be maintained as updates are being made to data pages by different processors. Shared-nothing architectures are found in MPP (Massively Parallel Processing) machines, which may comprise hundreds or thousands of processors.

The two primary measures of DBMS performance are throughput, the number of tasks that can be performed within a given time, and response time, the time it takes to complete a single task. Processing of a large number of small tasks can be speeded up by processing many tasks in parallel. Processing of an individual large task can be speeded up by processing its sub-tasks in parallel.

Speed-up refers to performing tasks faster as more processors are added. Linear speed-up means that a system with n times more processors can perform the same task n times faster. Scale-up refers to performing larger tasks as more processors are added. Linear scale-up means that a system with n times more processors can perform a task that is n times larger in the same time. These definitions can be stated more precisely as shown below.
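In particular, writing T(s, n) for the elapsed time taken to perform a task of size s on a system with n processors (this notation is introduced here purely for illustration):

```latex
\mathrm{speed\text{-}up}(n) = \frac{T(s,\,1)}{T(s,\,n)}, \qquad
\mathrm{scale\text{-}up}(n) = \frac{T(s,\,1)}{T(n \cdot s,\,n)}
```

Linear speed-up corresponds to speed-up(n) = n, and linear scale-up corresponds to scale-up(n) = 1.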

[Aside 1: Adding more computing resources in a shared-memory or shared-disk architecture is called scaling up or vertical scaling (not to be confused with vertical data partitioning). Adding more computing resources in a shared-nothing architecture is called scaling out or horizontal scaling (again, not to be confused with horizontal data partitioning). Horizontal scaling is the basis of cloud computing, in which computing resources are provisioned as necessary in order to meet increasing demand.

Aside 2: Parallel DBs vs DDBs. A key difference between parallel database systems and distributed database (DDB) systems is that the processors of a parallel database system are linked by high-speed interconnections within a single computer or at a single geographical site, and queries/transactions are submitted to a single database server. The aim of a parallel database system is to utilise multiple processors in order to speed up query and transaction processing, whereas the aim of a DDB system is to manage multiple databases stored at different sites of a computer network. In DDB systems, a significant cost is therefore the communications cost incurred when transmitting data between the DDB servers over the network, whereas communications costs are much lower within a parallel database system.]

A number of factors can affect speed-up and scale-up in parallel database architectures:

- start-up costs for initiating multiple parallel processes;
- assembly costs associated with combining the results of parallel evaluation;
- interference between multiple processes competing for shared system resources;
- communication costs between processes.

The shared-memory architecture benefits from fast communication between processors through their shared memory and high-speed interconnection, and also from ease of load-balancing. However, if there is a memory hardware fault, all the processors will be affected. It may also suffer from the problem of interference: as more processors are added, existing processors are slowed down because of contention for memory accesses.

The shared-disk architecture does not suffer from this problem, and memory faults are isolated from other processors. Load-balancing can again be easily achieved, through the shared access to the disks. However, this architecture may suffer from contention for disk accesses with large numbers of processors (e.g. beyond 100). Also, its communication costs are higher than in the shared-memory architecture, since the processors cannot communicate through a shared memory but need to pass messages to each other along the interconnection network.

The shared-nothing architecture does not suffer from problems of contention for memory accesses or disk accesses, though it has higher communication costs than the other two architectures. Due to the lack of contention problems, it has the potential to achieve linear speed-up and scale-up. It also has higher availability, as both memory and disk faults can be isolated from other processors. However, load balancing may be hard to achieve, as the data needs to be partitioned effectively between the disks accessed by the different processors.

There are also hybrid architectures that combine these three fundamental architectures in different ways, aiming to combine the simplicity and efficiency of shared-memory architectures with the higher extensibility and availability of shared-disk and shared-nothing architectures. For example, clusters of homogeneous off-the-shelf components can support a shared-memory architecture within each cluster, and the clusters can be interconnected through a high-speed LAN into a shared-disk or shared-nothing architecture. This combines the performance benefits of shared-memory within each cluster with the extensibility benefits of shared-disk or shared-nothing across clusters.

A common way of utilising the multiple processors and disks available in a shared-nothing architecture is by horizontally partitioning relations across the disks in the system, in order to allow parallel access to, and processing of, the data. Partitioning can be applied to the intermediate relations being produced during the evaluation of a query, as well as to the original stored relations; the aim is to achieve even load-balancing between processors and linear scale-up/speed-up performance characteristics.

(Recall that horizontal partitioning means that the relation is split into groups of rows, while vertical partitioning means that the relation is split into groups of columns. Data partitioning is also known as sharding.)

A query accessing a horizontally partitioned relation can be decomposed into multiple subqueries, one for each fragment of the relation. These subqueries can be processed in parallel, in a shorter time than the original query. This is known as intra-query parallelism, i.e. parallelising the execution of one query. We will look more closely at this in these notes. (Inter-query parallelism is also present in parallel databases, i.e. executing several different queries concurrently.)

A key issue with intra-query parallelism in shared-nothing architectures is what is the best way to partition the tuples of each relation across the available disks. Three major approaches are used (an illustrative sketch of all three is given below):

(i) round robin partitioning: tuples are placed on the disks in a circular fashion;

(ii) hash partitioning: tuples are placed on the disks according to some hash function applied to one or more of their attributes;

(iii) range partitioning: tuples are placed on the disks according to the sub-range in which the value of an attribute, or set of attributes, falls.

The advantage of (i) is an even distribution of data on each disk. This is good for load-balancing a scan of the entire relation. However, it is not good for exact-match queries, since all disks will need to be accessed, whereas in fact only one, or a subset, of them is likely to contain the relevant tuples.

The advantage of (ii) is that exact-match queries on the partitioning attribute(s) can be directed to the relevant disk. However, (ii) is not good for supporting range queries.

Approach (iii) is good for range queries on the partitioning attribute(s), because only the disks that are known to overlap with the required range of values need be accessed.
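The following is a minimal sketch of the three partitioning strategies, assuming each "disk" is modelled as a Python list and each tuple as a dictionary of attribute values; the function names and example data are purely illustrative.

```python
def round_robin_partition(tuples, n_disks):
    """Round robin: place tuples on the disks in a circular fashion."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks

def hash_partition(tuples, n_disks, attr):
    """Hash partitioning: place each tuple on the disk chosen by a hash of its partitioning attribute."""
    disks = [[] for _ in range(n_disks)]
    for t in tuples:
        disks[hash(t[attr]) % n_disks].append(t)
    return disks

def range_partition(tuples, boundaries, attr):
    """Range partitioning: place each tuple on the disk whose sub-range contains its attribute value.

    boundaries = [b1, ..., b_{n-1}] defines n sub-ranges:
    (-inf, b1], (b1, b2], ..., (b_{n-1}, +inf).
    """
    disks = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        v = t[attr]
        i = 0
        while i < len(boundaries) and v > boundaries[i]:
            i += 1
        disks[i].append(t)
    return disks

# Example (illustrative data): partition employee tuples on salary across 3 disks.
emps = [{"id": i, "salary": 1000 * i} for i in range(1, 10)]
print(range_partition(emps, [3000, 6000], "salary"))
```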

However, one potential problem with (iii) is that data may not be evenly allocated across the disks; this is known as data skew. One solution to this problem is to maintain a histogram for the partitioning attribute(s), i.e. divide the domain into a number of ranges and keep a count of the number of rows that fall within each range as the relation is updated. This histogram can be used to determine the sub-range of values allocated to each disk (rather than having fixed ranges of equal length).

Shared-disk and shared-nothing architectures use similar protocols for concurrency control and atomic commitment of transactions as in distributed databases, e.g. distributed 2PL and 2PC. Replication of data across multiple processors and disks allows the system to carry on operating if a processor or disk fails.

We briefly discuss parallel query processing below.

2 Parallel implementation of the relational algebra

Note: The algorithms described below generalise to all three architectures. Just the mode of communication between the processors is different: via the interconnection network with shared-nothing, which is what is assumed in the description below; and via shared memory or shared disk with shared-memory/shared-disk.

The basic foundations of parallel query evaluation are the splitting and merging of parallel streams of data:

- Data streams arising from different disks or processors are merged as needed in order to provide the input for each relational operator in a query.

- The results being produced as the operator is evaluated in parallel at each node need to be merged at one node in order to produce the overall result of the operator.

- The output of the operator may then be split again in order to parallelise subsequent processing in the query.

2.1 sort

Given the frequency with which sorting is required in query processing, parallel sorting algorithms have been much studied. A commonly used parallel sorting algorithm is the parallel merge sort (a sketch is given below):

- This first sorts each fragment of the relation individually on its local disk, e.g. using the external merge sort algorithm we have already looked at.

- Groups of fragments are then shipped to one node per group, which merges the group of fragments into a larger sorted fragment.

- This process repeats until a single sorted fragment is produced at one of the nodes.
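As an illustration, here is a minimal sketch of parallel merge sort, assuming each fragment is a list of sort-key values; local in-memory sorts stand in for the per-node external merge sorts, and a thread pool stands in for the nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def parallel_merge_sort(fragments, group_size=2):
    """Illustrative parallel merge sort over horizontally partitioned fragments."""
    with ThreadPoolExecutor() as pool:
        # Phase 1: each node sorts its own fragment locally.
        sorted_frags = list(pool.map(sorted, fragments))
        # Phase 2: repeatedly ship groups of sorted fragments to one node per
        # group, which merges them into a larger sorted fragment, until one remains.
        while len(sorted_frags) > 1:
            groups = [sorted_frags[i:i + group_size]
                      for i in range(0, len(sorted_frags), group_size)]
            sorted_frags = list(pool.map(lambda g: list(heapq.merge(*g)), groups))
    return sorted_frags[0]

# Example with four fragments (illustrative data).
print(parallel_merge_sort([[5, 1, 9], [7, 3], [8, 2, 6], [4, 0]]))
```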

2.2 projection and selection

Selection and projection operators on a relation are usually performed together in one scan. This can be considered as a reduction operator.

Reducing a fragment at a single node can be done as in a centralised DBMS, i.e. by a sequential scan or by utilising appropriate indexes if available.

If duplicates are permitted in the overall result, or if they are not permitted but are known to be impossible (e.g. if the result contains all the key attributes of the relation), then the fragments can just be shipped to one node to be merged.

If duplicates may arise in the overall result and they need to be eliminated, then a variant of the parallel merge sort algorithm described in Section 2.1 can be used to sort the result while at the same time eliminating all duplicates encountered.

2.3 join

(i) R ⋈ S using Parallel Nested Loops or Index Nested Loops Join

First the outer relation has to be chosen. In particular, if S has an appropriate index on the join attribute(s), then R should be the outer relation. All the fragments of R are then shipped to all nodes, so each node i now has a whole copy of R as well as its own fragment of S, S_i. The local joins R ⋈ S_i are performed in parallel on all the nodes, and the results are finally shipped to a chosen node for merging.

(ii) R ⋈ S using Parallel Sort-Merge Join (for natural/equijoins only)

The first phase of this involves sorting R and S on the join attribute(s). These sorts can be performed using the parallel merge sort operation described in Section 2.1. The sorted relations are then partitioned across the nodes using range partitioning, with the same sub-ranges on the join attribute(s) for both relations. The local joins of each pair of sorted fragments, R_i ⋈ S_i, are performed in parallel, and the results are finally shipped to a chosen node for merging.

(iii) R ⋈ S using Parallel Hash Join (for natural/equijoins only)

Each bucket of R and S is logically assigned to one node. The first hashing phase, using the first hash function h_1, is undertaken in parallel on all nodes: each tuple t from R or S is shipped to node i if the bucket assigned to it by h_1 is the i-th bucket.

The next phase is also undertaken in parallel on all nodes. On each node i, a hash table is created from the local fragment of R, R_i, using another hash function h_2. The local fragment of S, S_i, is then scanned, and h_2 is used to probe the hash table for matching records of R_i for each record of S_i. The results produced at each node are shipped to a chosen node for merging. (A sketch of this algorithm is given below.)
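Here is a minimal sketch of the parallel hash join of R and S on a common attribute, assuming tuples are dictionaries. The nodes are simulated sequentially; in a real shared-nothing system each per-node loop body would run on a separate machine, and tuples would be shipped over the interconnection network. The hash functions h1 and h2 are illustrative choices.

```python
from collections import defaultdict

def parallel_hash_join(R, S, attr, n_nodes):
    """Illustrative parallel hash join of R and S on attribute attr."""
    h1 = lambda v: hash(v) % n_nodes     # hash function assigning each tuple to a node/bucket
    h2 = lambda v: hash(("local", v))    # hash function for the per-node hash table

    # Phase 1: ship each tuple of R and S to the node owning its h1 bucket.
    R_frag = [[] for _ in range(n_nodes)]
    S_frag = [[] for _ in range(n_nodes)]
    for t in R:
        R_frag[h1(t[attr])].append(t)
    for t in S:
        S_frag[h1(t[attr])].append(t)

    # Phase 2: on each node i, build a hash table on R_i using h2,
    # probe it with the tuples of S_i, then merge all local results.
    result = []
    for i in range(n_nodes):
        table = defaultdict(list)
        for r in R_frag[i]:
            table[h2(r[attr])].append(r)
        for s in S_frag[i]:
            for r in table[h2(s[attr])]:
                if r[attr] == s[attr]:   # guard against h2 collisions
                    result.append({**r, **s})
    return result

# Example (illustrative data): join on attribute "a" across 2 simulated nodes.
R = [{"a": 1, "x": "r1"}, {"a": 2, "x": "r2"}]
S = [{"a": 1, "y": "s1"}, {"a": 3, "y": "s2"}]
print(parallel_hash_join(R, S, "a", n_nodes=2))
```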

2.4 Parallel Query Optimisation

The above parallel versions of the relational algebra operators have different costs compared to their sequential counterparts, and it is these costs that need to be considered during the generation and selection of parallel query plans:

- The costs of partitioning and merging the data now need to be taken into account.

- If the data distribution is skewed, this will have an impact on the overall time taken to complete the evaluation of an operator, so that too needs to be taken into account.

- There is now the possibility of pipelining the results being produced by one operator into the evaluation of another operator that is executing at the same time on a different node. For example, consider this left-deep join tree where all the joins are nested loops joins: ((R_1 ⋈ R_2) ⋈ R_3) ⋈ R_4. The tuples being produced by R_1 ⋈ R_2 can be used to probe R_3, and the resulting tuples can be used to probe R_4, thus setting up a pipeline of three concurrently executing join operators on three different nodes. (A sketch of such a pipeline is given at the end of these notes.)

- There is now the possibility of executing different operators of the query concurrently on different nodes. For example, with this bushy join tree: ((R_1 ⋈ R_2) ⋈ (R_3 ⋈ R_4)), the join of R_1 with R_2 and the join of R_3 with R_4 can be executed concurrently. The partial results of these joins can also be pipelined into their parent join operator.

In uniprocessor systems, only left-deep join orders are usually considered (and this still gives n! possible join orders for a join of n relations). In multi-processor systems, other join orders can result in more parallelisation, e.g. bushy trees, as illustrated above. Even if such a plan is more costly in terms of the number of I/O operations performed, it may execute more quickly than a left-deep plan due to the increased parallelism it affords.

So, in general, the number of candidate query plans is much greater in parallel database systems, and more heuristics need to be employed to limit the search space of query plans. For example, one possible approach is for the query optimiser to first find the best plan for sequential evaluation of the query, and then find the fastest parallel execution of that plan.

Reading

For interest: "Parallel database systems: the future of high performance database systems" (up to the top of page 94), David DeWitt and Jim Gray, Comm. ACM, 35(6), 1992, pp 85-98. At http://dl.acm.org/citation.cfm?id=129894
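To illustrate the pipelining of nested loops joins described in Section 2.4, here is a minimal sketch using Python generators: each generator stands in for a join operator that, in a real shared-nothing system, would run on a different node and consume its input tuples as they arrive. The relations R1 to R4 and the single join attribute k are illustrative assumptions.

```python
def nested_loops_join(outer, inner, attr):
    """Yield a joined tuple as soon as an outer tuple finds a match, so a parent
    operator can start consuming results before this join has finished."""
    inner = list(inner)  # the inner relation is assumed small enough to hold in memory
    for o in outer:
        for i in inner:
            if o[attr] == i[attr]:
                yield {**o, **i}

# Illustrative relations, all joined on attribute k.
R1 = [{"k": 1, "a": "x"}, {"k": 2, "a": "y"}]
R2 = [{"k": 1, "b": "p"}, {"k": 2, "b": "q"}]
R3 = [{"k": 1, "c": "m"}]
R4 = [{"k": 1, "d": "n"}]

# The left-deep plan ((R1 join R2) join R3) join R4 as a pipeline: tuples from
# R1 join R2 flow into the join with R3, and those results flow into the join
# with R4, without materialising the intermediate results.
pipeline = nested_loops_join(
    nested_loops_join(
        nested_loops_join(R1, R2, "k"), R3, "k"), R4, "k")

for t in pipeline:
    print(t)
```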