Advanced Databases: Parallel Databases A.Poulovassilis
|
|
- Vivian Bell
- 5 years ago
- Views:
Transcription
1 1 Advanced Databases: Parallel Databases A.Poulovassilis 1 Parallel Database Architectures Parallel database systems use parallel processing techniques to achieve faster DBMS performance and handle larger volumes of data than is possible with single-processor systems. There are three major architectures for parallel database systems: shared-memory, shared-disk, and shared-nothing. In all three architectures, all processors have their own cache, and cache accesses are much faster than accesses to main memory: 1 Shared-memory: multiple CPUs access a common region of main memory and a common set of disks. Shared-disk: each node has its own CPU(s) and its own main memory, but all nodes access a shared set of disks. Shared-nothing: each node has its own CPU(s), own main memory and own set of disks. The two primary measures of DBMS performance are throughput the number of tasks that can be performed within a given time, and response time the time it takes to complete a single task. Processing of a large number of small tasks can be speeded up by processing many tasks in parallel. Processing of individual large tasks can be speeded up by processing sub-tasks in parallel. Speed-up refers to the performance of tasks faster due to more processors being added. Linear speed-up means that a system with n times more processors can perform the same task n times faster. Scale-up refers to the performance of larger tasks due to more processors being added. Linear scale-up means that a system with n times more processors can perform a task that is n times larger in the same time. [ Aside 1: Adding more computing resources in a shared-memory or shared-disk architecture is called scaling up or vertical scaling (not to be confused with vertical data partitioning). Adding more 1 Shared-memory and shared-disk architectures are found in SMP (Symmetric Multi-Processor) machines. In shared-memory systems, each processor maintains its own cache memory and the main memory is shared between all of them. Cache coherency needs to be maintained across all the processors in the face of updates in main memory. Shared-disk architectures are found in NAS (network attached storage) architectures ( disk farms ) and SANs (storage area networks). Cache consistency needs to be maintained as updates are being made to data pages by different processors. Shared-nothing architectures are found in MPP (Massively Parallel Processing) machines that may comprise hundreds or thousands of processors.
2 2 computing resources in a shared-nothing architecture is called scaling out or horizontal scaling (again, not to be confused with horizontal data partitioning). Horizontal scaling is the basis of cloud computing, in which computing resources are provisioned as necessary in order to meet increasing demand. Aside 2: Parallel DBs vs DDBs A key difference between parallel database systems and distributed database (DDB) systems is that the processors of a parallel database system are linked by high-speed interconnections within a single computer or at a single geographical site, and queries/transactions are submitted to a single database server. The aim of a parallel database system is to utilise multiple processors in order to speed up query and transaction processing; whereas the aim of a DDB system is to manage multiple databases stored at different sites of a computer network. In DDB systems, a significant cost is therefore the comunications cost incurred when transmitting data between the DDB servers over the network, whereas communications costs are much lower within a parallel database system. ] A number of factors can affect speed-up and scale-up in parallel database architectures: start-up costs for initiating multiple parallel processes; assembly costs associated with combining the results of parallel evaluation; interference between multiple processes for shared system resources; communication costs between processes. The shared-memory architecture benefits from fast communication between processors through their shared memory and high-speed interconnection, and also ease of load-balancing. However, if there is a memory hardware fault all the processors will be affected. Also, it may suffer from the problem of interference: as more processors are added, existing processors are slowed down because of contention for memory accesses. The shared-disk architecture does not suffer from this problem, and also memory faults are isolated from other processors. Load-balancing can again be easily achieved, through the shared access to the disks. However, this architecture may suffer from contention for disk accesses with large numbers of processors (e.g. beyond 100). Also, its communication costs are higher than the shared-memory architecture since the processors cannot communicate through a shared memory, but need to pass messages to each other along the interconnection network. The shared-nothing architecture does not suffer from problems of contention for memory accesses or disk accesses, though it has higher communication costs than the other two architectures. Due to the lack of contention problems, it has the potential to achieve linear speed-up and scale-up. It also has higher availability as both memory and disk faults can be isolated from other processors. However, load balancing may be hard to achieve as the data needs to be partitioned effectively between the disks accessed by the different processors.
3 3 There are also hybrid architectures that combine in different ways these three fundamental architectures, aiming to combine the simplicity and efficiency of shared-memory architectures with the higher extensibility and availability of shared-disk and shared-nothing architectures. For example: clusters of homogeneous off-the-shelf components can support a shared-memory architecture within each cluster; and the clusters can be interconnected through a high-speed LAN into a shared-disk or shared-nothing architecture. This combines the performance benefits of sharedmemory within each cluster with the extensibility benefits of shared-disk or shared-nothing across clusters. A common way of utilising the multiple processors and disks available in a shared-nothing architecture is by horizontally partitioning relations across the disks in the system in order to allow parallel access to, and processing of, the data. Partitioning can be applied to the intermediate relations being produced during the evaluation of a query, as well to as the original stored relations; the aim is to achieve even load-balancing between processors and linear scale-up/speed-up performance characteristics. ( Recall that horizontal partitioning means that the relation is split into groups of rows. Vertical partitioning means that the relation is split into groups of columns. Data partitioning is also known as sharding. ) A query accessing a horizontally partitioned relation can be composed of multiple subqueries, one for each fragment of the relation. These subqueries can be processed in parallel in a shorter time than the original query. This is known as intra-query parallelism i.e. parallelising the execution of one query. We will look more closely at this in these notes. ( Inter-query parallelism is also present in parallel databases i.e. executing several different queries concurrently. ) A key issue with intra-query parallelism in shared-nothing architectures is what is the best way to partition the tuples of each relation across the available disks. Three major approaches are used: (i) round robin partitioning: tuples are placed on the disks in a circular fashion; (ii) hash partitioning: tuples are placed on the disks according to some hash function applied to one or more of their attributes; (iii) range partitioning: tuples are placed on the disks according to the sub-range in which the value of an attribute, or set of attributes, falls. The advantage of (i) is an even distribution of data on each disk. This is good for load balancing a scan of the entire relation. However, it is not good for exact-match queries since all disks will need to be accessed whereas in fact only one a subset of them is likely to contain the relevant tuples. The advantage of (ii) is that exact-match queries on the partitioning attribute(s) can be directed to the relevant disk. However, (ii) is not good for supporting range queries. Approach (iii) is good for range queries on the partitioning attribute(s) because only the disks that are known to overlap with the required range of values need be accessed.
4 4 However, one potential problem with (iii) is that data may not be evenly allocated across the disks this is known as data skew. One solution to this problem is to maintain a histogram for the partioning attribute(s) i.e. divide the domain into a number of ranges and keep a count of the number of rows that fall within each range as the relation is updated. This histogram can be used to determine the subrange of values allocated to each disk (rather than have fixed ranges of equal length). Shared-disk and shared-nothing architectures use similar protocols for currency control and atomic commitment of transactions as in distributed databases e.g. distributed 2PL, 2PC. Replication of data across multiple processors and disks allows the system to carry on operating if a processor or disk fails. We briefly discuss parallel query processing below. 2 Parallel implementation of the relational algebra Note: The algorithms described below generalise to all three architectures. Just the mode of communication between the processors is different: via the interconnection network with sharednothing which is what is assumed in the below description; and via shared memory or shared disk with shared-memory/shared-disk. The basic foundations of parallel query evaluation are the splitting and merging of parallel streams of data: Data streams arising from different disks or processors are merged as needed in order to provide the input for each relational operator in a query. The results being produced as the operator is evaluated in parallel at each node need to be merged at one node in order to produce the overall result of the operator. The output of the operator may then be split again in order to parallelize subsequent processing in the query. 2.1 sort Given the frequency with which sorting is required in query processing, parallel sorting algorithms have been much studied. A commonly used parallel sorting algorithm is the parallel merge sort: This first sorts each fragment of the relation individually on its local disk e.g. using the external merge sort algorithm we have already looked at. Groups of fragments are then shipped to one node per group, which merges the group of fragments into a larger sorted fragment. This process repeats until a single sorted fragment is produced at one of the nodes.
5 5 2.2 projection and selection Selection and projection operators on a relation are usually performed together in one scan. This can be considered as a reduction operator. Reducing a fragment at a single node can be done as in a centralised DBMS i.e. by a sequential scan or by utilising appropriate indexes if available. If duplicates are permitted in the overall result, or if they are not permitted but are known to be impossible (e.g. if the result contains all the key attributes of the relation), then the fragments can just be shipped to one node to be merged. If duplicates may arise in the overall result and they need to be eliminated, then a variant of the parallel merge sort algorithm described in Section 2.1 can be used to sort the result while at the same time eliminating all duplicates encountered. 2.3 join (i) R S using Parallel Nested Loops or Index Nested Loops Join First the outer relation has to be chosen. In particular, if S has an appropriate index on the join attribute(s) then R should be the outer relation. All the fragments of R are then shipped to all nodes. So each node i now has a whole copy of R as well as its own fragment of S, S i. The local joins R S i are performed in parallel on all the nodes, and the results are finally shipped to a chosen node for merging. (ii) R S using Parallel Sort-Merge Join (for natural/equijoins only) The first phase of this involves sorting R and S on the join attribute(s). performed using the parallel merge sort operation described in Section 2.1. These sorts can be The sorted relations are then partitioned across the nodes using range partitioning with the same subranges on the join attribute(s) for both relations. The local joins of each pair of sorted fragments R i S i are performed in parallel, and the results are finally shipped to a chosen node for merging. (iii) R S using Parallel Hash Join (for natural/equijoins only) Each bucket of R and S is logically assigned to one node. The first hashing phase, using the first hash function h 1, is undertaken in parallel on the all nodes. Each tuple t from R or S is shipped to node i if the bucket assigned to it by h 1 is the i th bucket. The next phase is also undertaken in parallel on all nodes. On each node i, a hash table is created from the local fragment of R, R i, using another hash function h 2. The local fragment of S, S i, is then scanned and h 2 is used to probe the hash table for matching records of R i for each record of S i. The results produced at each node are shipped to a chosen node for merging.
6 6 2.4 Parallel Query Optimisation The above parallel versions of the relational algebra operators have different costs compared to their sequential counterparts, and it is these costs that need to be considered during the generation and selection of parallel query plans: The costs of partitioning and merging the data now need to be taken into account. If the data distribution is skewed, this will have an impact on the overall time taken to complete the evaluation of an operator, so that too needs to be taken into account. There is now the possibility of pipelining the results being produced by one operator into the evaluation of another operator that is executing at the same time on a different node. For example, consider this left-deep join tree where all the joins are nested loops joins: ((R 1 R 2 ) R 3 ) R 4 The tuples being produced by R 1 R 2 can be used to probe R 3 and the resulting tuples can be used to probe R 4, thus setting up a pipeline of three concurrently executing join operators on three different nodes. There is now the possibility of executing different operators of the query concurrently on different nodes. For example, with this bushy join tree: ((R 1 R 2 ) (R3 R 4 )) the join of R 1 with R 2 and the join of R 3 with R 4 can be executed concurrently. The partial results of these joins can also be pipelined into their parent join operator. In uniprocessor systems only left-deep join orders are usually considered (and this still gives n! possible join orders for a join of n relations). In multi-processor systems, other join orders can result in more parallelisation e.g. bushy trees, as illustrated above. Even if such a plan is more costly in terms of the number of I/O operations performed, it may execute more quickly than a left-deep plan due to the increased parallelism it affords. So, in general, the number of candidate query plans is much greater in parallel database systems and more heuristics need to be employed to limit the search space of query plans. For example, one possible approach is for the query optimiser to first find the best plan for sequential evaluation of the query; and then find the fastest parallel execution of that plan. Reading For interest: Parallel database systems: the future of high performance database systems (up to the top of page 94), David DeWitt and Jim Gray, Comm. ACM, 35(6), 1992, pp At
! Parallel machines are becoming quite common and affordable. ! Databases are growing increasingly large
Chapter 20: Parallel Databases Introduction! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems!
More informationChapter 20: Parallel Databases
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 20: Parallel Databases. Introduction
Chapter 20: Parallel Databases! Introduction! I/O Parallelism! Interquery Parallelism! Intraquery Parallelism! Intraoperation Parallelism! Interoperation Parallelism! Design of Parallel Systems 20.1 Introduction!
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery
More informationChapter 18: Parallel Databases. Chapter 18: Parallel Databases. Parallelism in Databases. Introduction
Chapter 18: Parallel Databases Chapter 18: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of
More informationChapter 17: Parallel Databases
Chapter 17: Parallel Databases Introduction I/O Parallelism Interquery Parallelism Intraquery Parallelism Intraoperation Parallelism Interoperation Parallelism Design of Parallel Systems Database Systems
More informationParallel DBMS. Parallel Database Systems. PDBS vs Distributed DBS. Types of Parallelism. Goals and Metrics Speedup. Types of Parallelism
Parallel DBMS Parallel Database Systems CS5225 Parallel DB 1 Uniprocessor technology has reached its limit Difficult to build machines powerful enough to meet the CPU and I/O demands of DBMS serving large
More informationParallel Databases C H A P T E R18. Practice Exercises
C H A P T E R18 Parallel Databases Practice Exercises 181 In a range selection on a range-partitioned attribute, it is possible that only one disk may need to be accessed Describe the benefits and drawbacks
More informationSystems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15
Systems Infrastructure for Data Science Web Science Group Uni Freiburg WS 2014/15 Lecture X: Parallel Databases Topics Motivation and Goals Architectures Data placement Query processing Load balancing
More informationParallel Query Optimisation
Parallel Query Optimisation Contents Objectives of parallel query optimisation Parallel query optimisation Two-Phase optimisation One-Phase optimisation Inter-operator parallelism oriented optimisation
More informationChapter 18: Parallel Databases
Chapter 18: Parallel Databases Introduction Parallel machines are becoming quite common and affordable Prices of microprocessors, memory and disks have dropped sharply Recent desktop computers feature
More informationSomething to think about. Problems. Purpose. Vocabulary. Query Evaluation Techniques for large DB. Part 1. Fact:
Query Evaluation Techniques for large DB Part 1 Fact: While data base management systems are standard tools in business data processing they are slowly being introduced to all the other emerging data base
More informationArchitecture and Implementation of Database Systems (Winter 2014/15)
Jens Teubner Architecture & Implementation of DBMS Winter 2014/15 1 Architecture and Implementation of Database Systems (Winter 2014/15) Jens Teubner, DBIS Group jens.teubner@cs.tu-dortmund.de Winter 2014/15
More informationB.H.GARDI COLLEGE OF ENGINEERING & TECHNOLOGY (MCA Dept.) Parallel Database Database Management System - 2
Introduction :- Today single CPU based architecture is not capable enough for the modern database that are required to handle more demanding and complex requirements of the users, for example, high performance,
More informationDistributed DBMS. Concepts. Concepts. Distributed DBMS. Concepts. Concepts 9/8/2014
Distributed DBMS Advantages and disadvantages of distributed databases. Functions of DDBMS. Distributed database design. Distributed Database A logically interrelated collection of shared data (and a description
More informationPARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
PARALLEL & DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 INTRODUCTION In centralized database: Data is located in one place (one server) All DBMS functionalities are done by that server
More informationCourse Content. Parallel & Distributed Databases. Objectives of Lecture 12 Parallel and Distributed Databases
Database Management Systems Winter 2003 CMPUT 391: Dr. Osmar R. Zaïane University of Alberta Chapter 22 of Textbook Course Content Introduction Database Design Theory Query Processing and Optimisation
More informationParallel DBMS. Prof. Yanlei Diao. University of Massachusetts Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke
Parallel DBMS Prof. Yanlei Diao University of Massachusetts Amherst Slides Courtesy of R. Ramakrishnan and J. Gehrke I. Parallel Databases 101 Rise of parallel databases: late 80 s Architecture: shared-nothing
More informationIt also performs many parallelization operations like, data loading and query processing.
Introduction to Parallel Databases Companies need to handle huge amount of data with high data transfer rate. The client server and centralized system is not much efficient. The need to improve the efficiency
More informationCSE 544, Winter 2009, Final Examination 11 March 2009
CSE 544, Winter 2009, Final Examination 11 March 2009 Rules: Open books and open notes. No laptops or other mobile devices. Calculators allowed. Please write clearly. Relax! You are here to learn. Question
More informationChapter 18: Parallel Databases Chapter 19: Distributed Databases ETC.
Chapter 18: Parallel Databases Chapter 19: Distributed Databases ETC. Introduction Parallel machines are becoming quite common and affordable Prices of microprocessors, memory and disks have dropped sharply
More informationAdvanced Databases. Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Advanced Databases Lecture 15- Parallel Databases (continued) Masood Niazi Torshiz Islamic Azad University- Mashhad Branch www.mniazi.ir Parallel Join The join operation requires pairs of tuples to be
More informationCompSci 516: Database Systems. Lecture 20. Parallel DBMS. Instructor: Sudeepa Roy
CompSci 516 Database Systems Lecture 20 Parallel DBMS Instructor: Sudeepa Roy Duke CS, Fall 2017 CompSci 516: Database Systems 1 Announcements HW3 due on Monday, Nov 20, 11:55 pm (in 2 weeks) See some
More informationChapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.
Chapter 18 Strategies for Query Processing We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS. 1 1. Translating SQL Queries into Relational Algebra and Other Operators - SQL is
More informationUniversity of Waterloo Midterm Examination Sample Solution
1. (4 total marks) University of Waterloo Midterm Examination Sample Solution Winter, 2012 Suppose that a relational database contains the following large relation: Track(ReleaseID, TrackNum, Title, Length,
More informationUsing A Network of workstations to enhance Database Query Processing Performance
Using A Network of workstations to enhance Database Query Processing Performance Mohammed Al Haddad, Jerome Robinson Department of Computer Science, University of Essex, Wivenhoe Park, Colchester, CO4
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 11/15/12 Agenda Check-in Centralized and Client-Server Models Parallelism Distributed Databases Homework 6 Check-in
More informationOutline. Parallel Database Systems. Information explosion. Parallelism in DBMSs. Relational DBMS parallelism. Relational DBMSs.
Parallel Database Systems STAVROS HARIZOPOULOS stavros@cs.cmu.edu Outline Background Hardware architectures and performance metrics Parallel database techniques Gamma Bonus: NCR / Teradata Conclusions
More informationChapter 12: Query Processing. Chapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 12: Query Processing Overview Measures of Query Cost Selection Operation Sorting Join
More informationChapter 13: Query Processing
Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing
More informationAdvances in Data Management Query Processing and Query Optimisation A.Poulovassilis
1 Advances in Data Management Query Processing and Query Optimisation A.Poulovassilis 1 General approach to the implementation of Query Processing and Query Optimisation functionalities in DBMSs 1. Parse
More informationHuge market -- essentially all high performance databases work this way
11/5/2017 Lecture 16 -- Parallel & Distributed Databases Parallel/distributed databases: goal provide exactly the same API (SQL) and abstractions (relational tables), but partition data across a bunch
More informationParallel DBMS. Chapter 22, Part A
Parallel DBMS Chapter 22, Part A Slides by Joe Hellerstein, UCB, with some material from Jim Gray, Microsoft Research. See also: http://www.research.microsoft.com/research/barc/gray/pdb95.ppt Database
More information! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationChapter 13: Query Processing Basic Steps in Query Processing
Chapter 13: Query Processing Basic Steps in Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 1. Parsing and
More informationCOURSE 12. Parallel DBMS
COURSE 12 Parallel DBMS 1 Parallel DBMS Most DB research focused on specialized hardware CCD Memory: Non-volatile memory like, but slower than flash memory Bubble Memory: Non-volatile memory like, but
More informationDatabase Architectures
Database Architectures CPS352: Database Systems Simon Miner Gordon College Last Revised: 4/15/15 Agenda Check-in Parallelism and Distributed Databases Technology Research Project Introduction to NoSQL
More informationData Modeling and Databases Ch 10: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 10: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationCopyright 2016 Ramez Elmasri and Shamkant B. Navathe
CHAPTER 19 Query Optimization Introduction Query optimization Conducted by a query optimizer in a DBMS Goal: select best available strategy for executing query Based on information available Most RDBMSs
More informationData Modeling and Databases Ch 9: Query Processing - Algorithms. Gustavo Alonso Systems Group Department of Computer Science ETH Zürich
Data Modeling and Databases Ch 9: Query Processing - Algorithms Gustavo Alonso Systems Group Department of Computer Science ETH Zürich Transactions (Locking, Logging) Metadata Mgmt (Schema, Stats) Application
More informationChapter 12: Query Processing
Chapter 12: Query Processing Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Overview Chapter 12: Query Processing Measures of Query Cost Selection Operation Sorting Join
More informationDatabase System Concepts
Chapter 13: Query Processing s Departamento de Engenharia Informática Instituto Superior Técnico 1 st Semester 2008/2009 Slides (fortemente) baseados nos slides oficiais do livro c Silberschatz, Korth
More informationCMU SCS CMU SCS Who: What: When: Where: Why: CMU SCS
Carnegie Mellon Univ. Dept. of Computer Science 15-415/615 - DB s C. Faloutsos A. Pavlo Lecture#23: Distributed Database Systems (R&G ch. 22) Administrivia Final Exam Who: You What: R&G Chapters 15-22
More informationAdvanced Database Systems
Lecture IV Query Processing Kyumars Sheykh Esmaili Basic Steps in Query Processing 2 Query Optimization Many equivalent execution plans Choosing the best one Based on Heuristics, Cost Will be discussed
More informationQuery Optimization. Query Optimization. Optimization considerations. Example. Interaction of algorithm choice and tree arrangement.
COS 597: Principles of Database and Information Systems Query Optimization Query Optimization Query as expression over relational algebraic operations Get evaluation (parse) tree Leaves: base relations
More informationMultiprocessors and Thread-Level Parallelism. Department of Electrical & Electronics Engineering, Amrita School of Engineering
Multiprocessors and Thread-Level Parallelism Multithreading Increasing performance by ILP has the great advantage that it is reasonable transparent to the programmer, ILP can be quite limited or hard to
More informationQuery Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016
Query Processing Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016 Slides re-used with some modification from www.db-book.com Reference: Database System Concepts, 6 th Ed. By Silberschatz,
More informationHash-Based Indexing 165
Hash-Based Indexing 165 h 1 h 0 h 1 h 0 Next = 0 000 00 64 32 8 16 000 00 64 32 8 16 A 001 01 9 25 41 73 001 01 9 25 41 73 B 010 10 10 18 34 66 010 10 10 18 34 66 C Next = 3 011 11 11 19 D 011 11 11 19
More informationTradeoffs in Processing Multi-Way Join Queries via Hashing in Multiprocessor Database Machines
Tradeoffs in Processing Multi-Way Queries via Hashing in Multiprocessor Database Machines Donovan A. Schneider David J. DeWitt Computer Sciences Department University of Wisconsin This research was partially
More informationParallel DBMS. Lecture 20. Reading Material. Instructor: Sudeepa Roy. Reading Material. Parallel vs. Distributed DBMS. Parallel DBMS 11/15/18
Reading aterial CompSci 516 atabase Systems Lecture 20 Parallel BS Instructor: Sudeepa Roy [RG] Parallel BS: Chapter 22.1-22.5 [GUW] Parallel BS and map-reduce: Chapter 20.1-20.2 Acknowledgement: The following
More informationQuery Processing & Optimization
Query Processing & Optimization 1 Roadmap of This Lecture Overview of query processing Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Introduction
More informationHigh-Performance Parallel Database Processing and Grid Databases
High-Performance Parallel Database Processing and Grid Databases David Taniar Monash University, Australia Clement H.C. Leung Hong Kong Baptist University and Victoria University, Australia Wenny Rahayu
More informationChapter 18: Database System Architectures.! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems!
Chapter 18: Database System Architectures! Centralized Systems! Client--Server Systems! Parallel Systems! Distributed Systems! Network Types 18.1 Centralized Systems! Run on a single computer system and
More informationEvaluation of Relational Operations
Evaluation of Relational Operations Chapter 14 Comp 521 Files and Databases Fall 2010 1 Relational Operations We will consider in more detail how to implement: Selection ( ) Selects a subset of rows from
More informationChapter 12: Query Processing
Chapter 12: Query Processing Overview Catalog Information for Cost Estimation $ Measures of Query Cost Selection Operation Sorting Join Operation Other Operations Evaluation of Expressions Transformation
More informationdata parallelism Chris Olston Yahoo! Research
data parallelism Chris Olston Yahoo! Research set-oriented computation data management operations tend to be set-oriented, e.g.: apply f() to each member of a set compute intersection of two sets easy
More informationDISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH
DISTRIBUTED DATABASES CS561-SPRING 2012 WPI, MOHAMED ELTABAKH 1 RECAP: PARALLEL DATABASES Three possible architectures Shared-memory Shared-disk Shared-nothing (the most common one) Parallel algorithms
More informationChapter 20: Database System Architectures
Chapter 20: Database System Architectures Chapter 20: Database System Architectures Centralized and Client-Server Systems Server System Architectures Parallel Systems Distributed Systems Network Types
More informationCSE 544 Principles of Database Management Systems. Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization
CSE 544 Principles of Database Management Systems Magdalena Balazinska Fall 2007 Lecture 9 - Query optimization References Access path selection in a relational database management system. Selinger. et.
More informationGreenplum Architecture Class Outline
Greenplum Architecture Class Outline Introduction to the Greenplum Architecture What is Parallel Processing? The Basics of a Single Computer Data in Memory is Fast as Lightning Parallel Processing Of Data
More informationCrescando: Predictable Performance for Unpredictable Workloads
Crescando: Predictable Performance for Unpredictable Workloads G. Alonso, D. Fauser, G. Giannikis, D. Kossmann, J. Meyer, P. Unterbrunner Amadeus S.A. ETH Zurich, Systems Group (Funded by Enterprise Computing
More informationModule 10: Parallel Query Processing
Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer Disk Space Buffer
More informationDatabase Applications (15-415)
Database Applications (15-415) DBMS Internals- Part VI Lecture 14, March 12, 2014 Mohammad Hammoud Today Last Session: DBMS Internals- Part V Hash-based indexes (Cont d) and External Sorting Today s Session:
More informationOutline. Database Management and Tuning. Outline. Join Strategies Running Example. Index Tuning. Johann Gamper. Unit 6 April 12, 2012
Outline Database Management and Tuning Johann Gamper Free University of Bozen-Bolzano Faculty of Computer Science IDSE Unit 6 April 12, 2012 1 Acknowledgements: The slides are provided by Nikolaus Augsten
More informationImplementation of Relational Operations
Implementation of Relational Operations Module 4, Lecture 1 Database Management Systems, R. Ramakrishnan 1 Relational Operations We will consider how to implement: Selection ( ) Selects a subset of rows
More informationNotes. Some of these slides are based on a slide set provided by Ulf Leser. CS 640 Query Processing Winter / 30. Notes
uery Processing Olaf Hartig David R. Cheriton School of Computer Science University of Waterloo CS 640 Principles of Database Management and Use Winter 2013 Some of these slides are based on a slide set
More informationParallel DBMS. Lecture 20. Reading Material. Instructor: Sudeepa Roy. Reading Material. Parallel vs. Distributed DBMS. Parallel DBMS 11/7/17
Reading aterial CompSci 516 atabase Systems Lecture 20 Parallel BS Instructor: Sudeepa Roy [RG] Parallel BS: Chapter 22.1-22.5 [GUW] Parallel BS and map-reduce: Chapter 20.1-20.2 Acknowledgement: The following
More informationUniversity of Waterloo Midterm Examination Solution
University of Waterloo Midterm Examination Solution Winter, 2011 1. (6 total marks) The diagram below shows an extensible hash table with four hash buckets. Each number x in the buckets represents an entry
More informationFinal Exam Review. Kathleen Durant PhD CS 3200 Northeastern University
Final Exam Review Kathleen Durant PhD CS 3200 Northeastern University 1 Outline for today Identify topics for the final exam Discuss format of the final exam What will be provided for you and what you
More information7. Query Processing and Optimization
7. Query Processing and Optimization Processing a Query 103 Indexing for Performance Simple (individual) index B + -tree index Matching index scan vs nonmatching index scan Unique index one entry and one
More informationTime Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix
Time Complexity and Parallel Speedup to Compute the Gamma Summarization Matrix Carlos Ordonez, Yiqun Zhang Department of Computer Science, University of Houston, USA Abstract. We study the serial and parallel
More informationCopyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1
Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 25-1 Chapter 25 Distributed Databases and Client-Server Architectures Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 25 Outline
More informationCSE 444: Database Internals. Lecture 11 Query Optimization (part 2)
CSE 444: Database Internals Lecture 11 Query Optimization (part 2) CSE 444 - Spring 2014 1 Reminders Homework 2 due tonight by 11pm in drop box Alternatively: turn it in in my office by 4pm, or in Priya
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems L10: Query Processing Other Operations, Pipelining and Materialization Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science
More informationEvaluation of Relational Operations. Relational Operations
Evaluation of Relational Operations Chapter 14, Part A (Joins) Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1 Relational Operations v We will consider how to implement: Selection ( )
More informationCMSC424: Database Design. Instructor: Amol Deshpande
CMSC424: Database Design Instructor: Amol Deshpande amol@cs.umd.edu Databases Data Models Conceptual representa1on of the data Data Retrieval How to ask ques1ons of the database How to answer those ques1ons
More informationUser Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM
Module III Overview of Storage Structures, QP, and TM Sharma Chakravarthy UT Arlington sharma@cse.uta.edu http://www2.uta.edu/sharma base Management Systems: Sharma Chakravarthy Module I Requirements analysis
More informationPart 1: Indexes for Big Data
JethroData Making Interactive BI for Big Data a Reality Technical White Paper This white paper explains how JethroData can help you achieve a truly interactive interactive response time for BI on big data,
More informationIntroduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe
Introduction to Query Processing and Query Optimization Techniques Outline Translating SQL Queries into Relational Algebra Algorithms for External Sorting Algorithms for SELECT and JOIN Operations Algorithms
More informationQuery Processing. Introduction to Databases CompSci 316 Fall 2017
Query Processing Introduction to Databases CompSci 316 Fall 2017 2 Announcements (Tue., Nov. 14) Homework #3 sample solution posted in Sakai Homework #4 assigned today; due on 12/05 Project milestone #2
More informationDistributed KIDS Labs 1
Distributed Databases @ KIDS Labs 1 Distributed Database System A distributed database system consists of loosely coupled sites that share no physical component Appears to user as a single system Database
More informationModule 4. Implementation of XQuery. Part 0: Background on relational query processing
Module 4 Implementation of XQuery Part 0: Background on relational query processing The Data Management Universe Lecture Part I Lecture Part 2 2 What does a Database System do? Input: SQL statement Output:
More informationIntroduction to Spatial Database Systems
Introduction to Spatial Database Systems by Cyrus Shahabi from Ralf Hart Hartmut Guting s VLDB Journal v3, n4, October 1994 Data Structures & Algorithms 1. Implementation of spatial algebra in an integrated
More informationHAWQ: A Massively Parallel Processing SQL Engine in Hadoop
HAWQ: A Massively Parallel Processing SQL Engine in Hadoop Lei Chang, Zhanwei Wang, Tao Ma, Lirong Jian, Lili Ma, Alon Goldshuv Luke Lonergan, Jeffrey Cohen, Caleb Welton, Gavin Sherry, Milind Bhandarkar
More informationSegregating Data Within Databases for Performance Prepared by Bill Hulsizer
Segregating Data Within Databases for Performance Prepared by Bill Hulsizer When designing databases, segregating data within tables is usually important and sometimes very important. The higher the volume
More informationTHE KEY TECHNOLOGIC ISSUES OF PARALLEL SPATIAL DATABASE MAMAGEMENT SYSTEM FOR PARALLEL GIS
THE KEY TECHNOLOGIC ISSUES OF PARALLEL SPATIAL DATABASE MAMAGEMENT SYSTEM FOR PARALLEL GIS ZHAO Chun-yu a, ZHAO Yuan-chun b, MENG Ling-kui a, DENG Shi-jun a a The School of Remote Sensing and Information
More information6.830 Lecture 8 10/2/2017. Lab 2 -- Due Weds. Project meeting sign ups. Recap column stores & paper. Join Processing:
Lab 2 -- Due Weds. Project meeting sign ups 6.830 Lecture 8 10/2/2017 Recap column stores & paper Join Processing: S = {S} tuples, S pages R = {R} tuples, R pages R < S M pages of memory Types of joins
More informationQuestion 1. Part (a) [2 marks] what are different types of data independence? explain each in one line. Part (b) CSC 443 Term Test Solutions Fall 2017
Question 1. [12 marks] Short Answer Question Part (a) [2 marks] what are different types of data independence? explain each in one line Logical data independence: Protection from changes in logical structure
More informationFaloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications Lecture #14: Implementation of Relational Operations (R&G ch. 12 and 14) 15-415 Faloutsos 1 introduction selection projection
More informationCSE 344 MAY 7 TH EXAM REVIEW
CSE 344 MAY 7 TH EXAM REVIEW EXAMINATION STATIONS Exam Wednesday 9:30-10:20 One sheet of notes, front and back Practice solutions out after class Good luck! EXAM LENGTH Production v. Verification Practice
More informationLecture 23 Database System Architectures
CMSC 461, Database Management Systems Spring 2018 Lecture 23 Database System Architectures These slides are based on Database System Concepts 6 th edition book (whereas some quotes and figures are used
More informationDatabase Management Systems
Database Management Systems Distributed Databases Doug Shook What does it mean to be distributed? Multiple nodes connected by a network Data on the nodes is logically related The nodes do not need to be
More informationJoin (SQL) - Wikipedia, the free encyclopedia
페이지 1 / 7 Sample tables All subsequent explanations on join types in this article make use of the following two tables. The rows in these tables serve to illustrate the effect of different types of joins
More informationCSIT5300: Advanced Database Systems
CSIT5300: Advanced Database Systems E10: Exercises on Query Processing Dr. Kenneth LEUNG Department of Computer Science and Engineering The Hong Kong University of Science and Technology Hong Kong SAR,
More informationRAQUEL s Relational Operators
Contents RAQUEL s Relational Operators Introduction 2 General Principles 2 Operator Parameters 3 Ordinary & High-Level Operators 3 Operator Valency 4 Default Tuples 5 The Relational Algebra Operators in
More informationPhysical Database Design and Tuning. Chapter 20
Physical Database Design and Tuning Chapter 20 Introduction We will be talking at length about database design Conceptual Schema: info to capture, tables, columns, views, etc. Physical Schema: indexes,
More informationR & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:
Relational Query Optimization R & G Chapter 13 Review Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops: simple, exploits extra memory
More informationσ (R.B = 1 v R.C > 3) (S.D = 2) Conjunctive normal form Topics for the Day Distributed Databases Query Processing Steps Decomposition
Topics for the Day Distributed Databases Query processing in distributed databases Localization Distributed query operators Cost-based optimization C37 Lecture 1 May 30, 2001 1 2 Query Processing teps
More informationCS698F Advanced Data Management. Instructor: Medha Atre. Aug 11, 2017 CS698F Adv Data Mgmt 1
CS698F Advanced Data Management Instructor: Medha Atre Aug 11, 2017 CS698F Adv Data Mgmt 1 Recap Query optimization components. Relational algebra rules. How to rewrite queries with relational algebra
More informationSQL, NoSQL, MongoDB. CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden
SQL, NoSQL, MongoDB CSE-291 (Cloud Computing) Fall 2016 Gregory Kesden SQL Databases Really better called Relational Databases Key construct is the Relation, a.k.a. the table Rows represent records Columns
More information