Database Management and Tuning. Index Tuning: Join Strategies. Johann Gamper. Unit 6, April 12, 2012

Database Management and Tuning
Johann Gamper
Free University of Bozen-Bolzano, Faculty of Computer Science, IDSE
Unit 6, April 12, 2012

Acknowledgements: The slides are provided by Nikolaus Augsten and have been adapted from "Database Tuning" by Dennis Shasha and Philippe Bonnet.

Johann Gamper (IDSE) Database Management and Tuning Unit 6 April 12, 2012 1 / 17

Join Strategies: Running Example
  Relations: R and S
    disk block size: 4 kB
    R: n_r = 5000 records, b_r = 100 disk blocks, 0.4 MB
    S: n_s = 10000 records, b_s = 400 disk blocks, 1.6 MB
  Running example: R ⋈ S
    R is called the outer relation
    S is called the inner relation
  Example from Silberschatz, Korth, Sudarshan. Database System Concepts. McGraw-Hill.

Join Strategies: Naive Nested Loop
  Naive nested loop join
    take each record of R (outer relation) and search through all records of S (inner relation) for matches
    for each record of R, S is scanned
  Example: naive nested loop join
    worst case: buffer can hold only one block of each relation
      R is scanned once, S is scanned n_r times
      in total n_r · b_s + b_r = 2,000,100 blocks must be read (≈ 8 GB)
      note: the worst case is different if S is the outer relation
    best case: both relations fit into main memory
      b_s + b_r = 500 block reads (= 2 MB)

Join Strategies: Block Nested Loop
  Block nested loop join
    compare all records of each block of R to all records in S
    for each block of R, S is scanned
  worst case: buffer can hold only one block of each relation
    R is scanned once, S is scanned b_r times
    in total b_r · b_s + b_r = 40,100 blocks must be read (≈ 160 MB)
  best case: b_s + b_r = 500 block reads

Join Strategies: Indexed Nested Loop
  Indexed nested loop join
    take each record of R and look up matches in S using an index
    runtime is O(|R| · log |S|) (vs. O(|R| · |S|) for the naive nested loop)
    efficient if the index covers the join (no data access in S)
    efficient if R has fewer records than S has pages: not all pages of S must be read (e.g., foreign key join from small to large table)
  Example: indexed nested loop join
    B+-tree index on S has 4 layers, thus max. c = 5 disk accesses per record of S
    in total b_r + n_r · c = 25,100 blocks must be read (≈ 100 MB)

Join Strategies: Merge Join
  Merge join (two clustered indexes)
    scan R and S in sorted order and merge
    each block of R and S is read once
  No index on R and/or S
    if no index: sort and store the relation with b(2⌈log_{M-1}(b/M)⌉ + 1) + b block transfers (M: free memory blocks)
    if a non-clustered index is present: index scan possible
  Example: merge join
    best case: clustered indexes on R and S (M = 2 is enough)
      b_r + b_s = 500 blocks must be read (2 MB)
    worst case: no indexes, only M = 3 memory blocks
      sort and store R (1400 blocks) and S (7200 blocks) first
      join with 9100 block transfers (≈ 36 MB) in total
    case M = 25 memory blocks: 2500 block transfers (10 MB)
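The block counts quoted for the running example can be re-derived with a few lines of arithmetic. The following is a minimal sketch, not part of the original slides: the function names are illustrative assumptions, and the number of merge passes is computed with an integer loop that is equivalent to ⌈log_{M-1}(b/M)⌉.

```python
# Running example from the slides (R is the outer, S the inner relation).
N_R, B_R = 5000, 100      # records and disk blocks of R
N_S, B_S = 10000, 400     # records and disk blocks of S


def naive_nested_loop(n_outer, b_outer, b_inner):
    """Worst case: the inner relation is scanned once per outer record."""
    return n_outer * b_inner + b_outer


def block_nested_loop(b_outer, b_inner):
    """Worst case: the inner relation is scanned once per outer block."""
    return b_outer * b_inner + b_outer


def indexed_nested_loop(n_outer, b_outer, c):
    """Scan the outer relation once, pay c block accesses per outer record."""
    return b_outer + n_outer * c


def merge_passes(b, m):
    """Number of merge passes, i.e. ceil(log_{m-1}(b / m)), in integer arithmetic."""
    runs = -(-b // m)                 # initial sorted runs: ceil(b / m)
    passes = 0
    while runs > 1:
        runs = -(-runs // (m - 1))    # each pass merges m - 1 runs into one
        passes += 1
    return passes


def sort_and_store(b, m):
    """Block transfers to sort b blocks with m memory blocks and write the result."""
    return b * (2 * merge_passes(b, m) + 1) + b


def merge_join_without_indexes(b_r, b_s, m):
    """Sort and store both relations, then read each of them once for the merge."""
    return sort_and_store(b_r, m) + sort_and_store(b_s, m) + b_r + b_s


if __name__ == "__main__":
    print("naive nested loop:  ", naive_nested_loop(N_R, B_R, B_S))
    print("block nested loop:  ", block_nested_loop(B_R, B_S))
    print("indexed nested loop:", indexed_nested_loop(N_R, B_R, c=5))
    print("merge join, M = 3:  ", merge_join_without_indexes(B_R, B_S, 3))
    print("merge join, M = 25: ", merge_join_without_indexes(B_R, B_S, 25))
```

Swapping the arguments of naive_nested_loop (S as the outer relation) shows why the choice of outer relation changes the worst case.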

Join Strategies: Hash Join
  Hash join (equality, no index)
    hash both tables into buckets using the same hash function
    join pairs of corresponding buckets in main memory
    R is called the probe input, S is called the build input
  Joining buckets in main memory
    build a hash index on one bucket from S (with a new hash function)
    probe the hash index with all tuples in the corresponding bucket of R
    the build bucket must fit in main memory, the probe bucket need not
  Example: hash join
    assume that each probe bucket fits in main memory
    R and S are scanned to compute the buckets, the buckets are written to disk, then the buckets are read pairwise
    in total 3(b_r + b_s) = 1500 blocks are read/written (6 MB)
  default in SQL Server and DB2 UDB when no index is present

Distinct Values and Join Selectivity
  Join selectivity
    number of retrieved pairs divided by the cardinality of the cross product (|R ⋈ S| / |R × S|)
    selectivity is low if the join result is small
  Distinct values refer to the join attributes of one table
  Performance depends on the number of distinct join values
    few distinct values in both tables usually means many matching records
    many matching records: join result is large, join is slow
    hash join: large buckets (build bucket does not fit in main memory)
    index join: matching records are on multiple disk pages
    merge join: matching records do not fit in memory at the same time

Foreign Keys
  Foreign key: attribute R.A stores the key of another table, S.B
  Foreign key constraints: R.A must be a subset of S.B
    an insert into R checks whether the foreign key exists in S
    a deletion from S checks whether there is a record with that key in R
  An index makes checking foreign key constraints efficient
    index on R.A speeds up deletion from S
    index on S.B speeds up insertion into R
    some systems may create an index on R.A and/or S.B by default
  Foreign key join
    each record of one table matches at most one record of the other table
    most frequent join in practice
    both hash join and indexed nested loop join work well

Indexes on Small Tables
  Read query on small records
    tables may fit on a single track on disk
    read query requires only one seek
    index not useful: seeks at least one index page and one table page
  Table with large records (≈ page size)
    each record occupies a whole page
    for example, 200 records occupy 200 pages
    index useful for point queries (read 3 pages vs. 200)
  Many inserts and deletions
    index must be reorganized (locking!)
    lock conflicts near the root since the index is small
  Update of single records
    without an index the table must be scanned
    scanned records are locked
    the scan (and thus lock contention) can be avoided with an index
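The two phases of the hash join (partition both inputs with the same hash function, then build and probe bucket by bucket) can be sketched in memory as follows. This is an illustrative sketch under stated assumptions, not the implementation of any particular DBMS: records are plain tuples, the function and parameter names are hypothetical, Python's built-in hash stands in for the partitioning hash function, and a dict plays the role of the second hash function used for the per-bucket index.

```python
from collections import defaultdict


def hash_join(r, s, key_r, key_s, n_buckets=4):
    """Equality join of r and s; s is the build input, r the probe input."""
    # Phase 1: partition both inputs with the same hash function.
    parts_r = defaultdict(list)
    parts_s = defaultdict(list)
    for rec in r:
        parts_r[hash(rec[key_r]) % n_buckets].append(rec)
    for rec in s:
        parts_s[hash(rec[key_s]) % n_buckets].append(rec)

    # Phase 2: join corresponding buckets pairwise.
    result = []
    for bucket_id, bucket_s in parts_s.items():
        # Build an in-memory hash index on the bucket of the build input.
        index = defaultdict(list)
        for rec_s in bucket_s:
            index[rec_s[key_s]].append(rec_s)
        # Probe the index with every record of the corresponding R bucket.
        for rec_r in parts_r.get(bucket_id, []):
            for rec_s in index.get(rec_r[key_r], []):
                result.append((rec_r, rec_s))
    return result


def join_selectivity(n_result, n_r, n_s):
    """Retrieved pairs divided by the cardinality of the cross product."""
    return n_result / (n_r * n_s)


def hash_join_cost(b_r, b_s):
    """Blocks read/written: scan both inputs, write the buckets, read them back."""
    return 3 * (b_r + b_s)


if __name__ == "__main__":
    r = [(1, "a"), (2, "b"), (2, "c"), (3, "d")]
    s = [(2, "x"), (3, "y"), (4, "z")]
    pairs = hash_join(r, s, key_r=0, key_s=0)
    print(sorted(pairs))
    print(join_selectivity(len(pairs), len(r), len(s)))
    print(hash_join_cost(100, 400))
```

With few distinct join values, most records of S land in the same bucket, which is the situation in which the build bucket no longer fits in main memory.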

Update Queries on a Small Table
  [Figure: bar chart of throughput (updates/sec), y-axis from 0 to 18, comparing "no index" and "index"]
  Index avoids table scan and thus lock contention.

Experiment: Join with Few Matching Records
  non-clustered index is ignored, hash join used instead
  SQL Server 7 on Windows 2000

Experiment: Join with Many Matching Records
  all joins are slow since the output size is large
  hash join (no index) is slow because buckets are very large
  SQL Server 7 on Windows 2000

Conclusion: Summary
  Index tuning
    composite indexes
    indexes and joins