CPSC 421 Database Management Systems. Lecture 19: Physical Database Design Concurrency Control and Recovery

Similar documents
Physical Database Design and Tuning

Database Design and Tuning

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart

Physical Database Design and Tuning. Chapter 20

Lecture #16 (Physical DB Design)

Introduction to Data Management. Lecture #17 (Physical DB Design!)

Administrivia. Physical Database Design. Review: Optimization Strategies. Review: Query Optimization. Review: Database Design

Friday Nights with Databases!

CSIT5300: Advanced Database Systems

Discuss physical db design and workload What choises we have for tuning a database How to tune queries and views

Database Management System

Overview. Understanding the Workload. Physical Database Design And Database Tuning. Chapter 20

Storage and Indexing

Database Management System

Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

R & G Chapter 13. Implementation of single Relational Operations Choices depend on indexes, memory, stats, Joins Blocked nested loops:

Principles of Data Management. Lecture #9 (Query Processing Overview)

Overview of Storage and Indexing

Question 1 (a) 10 marks

Goals for Today. CS 133: Databases. Final Exam: Logistics. Why Use a DBMS? Brief overview of course. Course evaluations

Review of Storage and Indexing

CSE 344 Final Review. August 16 th

Administrivia. CS186 Class Wrap-Up. News. News (cont) Top Decision Support DBs. Lessons? (from the survey and this course)

Single Record and Range Search

Database Management Systems Paper Solution

Database Management

Administration Naive DBMS CMPT 454 Topics. John Edgar 2

Database Management Systems (COP 5725) Homework 3

CompSci 516: Database Systems

CSC 261/461 Database Systems Lecture 20. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

Announcements. Reading Material. Today. Different File Organizations. Selection of Indexes 9/24/17. CompSci 516: Database Systems

Overview of Indexing. Chapter 8 Part II. A glimpse at indices and workloads

Data about data is database Select correct option: True False Partially True None of the Above

Databases. Jörg Endrullis. VU University Amsterdam

Overview of Implementing Relational Operators and Query Evaluation

Hash-Based Indexing 165

Overview of Storage and Indexing

DATABASE MANAGEMENT SYSTEMS

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Database Technology. Topic 8: Introduction to Transaction Processing

QUERY OPTIMIZATION [CH 15]

CSE 444: Database Internals. Lecture 22 Distributed Query Processing and Optimization

Database Management System

Database Architectures

Announcement. Reading Material. Overview of Query Evaluation. Overview of Query Evaluation. Overview of Query Evaluation 9/26/17

Intro to Transaction Management

Overview of Storage and Indexing

Datenbanksysteme II: Caching and File Structures. Ulf Leser

Transactions and Concurrency Control

Examples of Physical Query Plan Alternatives. Selected Material from Chapters 12, 14 and 15

University of Massachusetts Amherst Department of Computer Science Prof. Yanlei Diao

Chapter 18 Strategies for Query Processing. We focus this discussion w.r.t RDBMS, however, they are applicable to OODBS.

Introduction to Data Management. Lecture #18 (Transactions)

ROEVER ENGINEERING COLLEGE

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

CISC437/637 Database Systems Final Exam

CSE 544 Principles of Database Management Systems

CISC437/637 Database Systems Final Exam

Implementation of Relational Operations

) Intel)(TX)memory):) Transac'onal) Synchroniza'on) Extensions)(TSX))) Transac'ons)

Final Review. May 9, 2018 May 11, 2018

Introduction. Example Databases

Relational Query Optimization

Evaluation of relational operations

Final Review. May 9, 2017

Query Processing & Optimization. CS 377: Database Systems

Implementation of Relational Operations: Other Operations

Answers to the review questions can be found in the listed sections. What are the components of a workload description? (Section 20.1.

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization

Overview of Query Evaluation. Overview of Query Evaluation

Wentworth Institute of Technology COMP570 Database Applications Fall 2014 Derbinsky. Physical Tuning. Lecture 10. Physical Tuning

User Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM

CSE 530A ACID. Washington University Fall 2013

INDEXES MICHAEL LIUT DEPARTMENT OF COMPUTING AND SOFTWARE MCMASTER UNIVERSITY

Babu Banarasi Das National Institute of Technology and Management

Administrivia. CS 133: Databases. Cost-based Query Sub-System. Goals for Today. Midterm on Thursday 10/18. Assignments

ECE 650 Systems Programming & Engineering. Spring 2018

Introduction to Data Management. Lecture #2 (Big Picture, Cont.)

Database Management Systems Introduction to DBMS

Cost-based Query Sub-System. Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications. Last Class.

ECS 165B: Database System Implementa6on Lecture 7

The use of indexes. Iztok Savnik, FAMNIT. IDB, Indexes

MaanavaN.Com DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING QUESTION BANK

Q.1 Short Questions Marks 1. New fields can be added to the created table by using command. a) ALTER b) SELECT c) CREATE. D. UPDATE.

Intro to DB CHAPTER 15 TRANSACTION MNGMNT

Evaluation of Relational Operations: Other Techniques. Chapter 14 Sayyed Nezhadi

Example: Transfer Euro 50 from A to B

Database Usage (and Construction)

University of Waterloo Midterm Examination Sample Solution

Introduction to Transaction Management

What happens. 376a. Database Design. Execution strategy. Query conversion. Next. Two types of techniques

CS 443 Database Management Systems. Professor: Sina Meraji

CS122 Lecture 15 Winter Term,

BIS Database Management Systems.

MIS Database Systems.

CHAPTER 3 RECOVERY & CONCURRENCY ADVANCED DATABASE SYSTEMS. Assist. Prof. Dr. Volkan TUNALI

EECS 647: Introduction to Database Systems

CS330. Some Logistics. Three Topics. Indexing, Query Processing, and Transactions. Next two homework assignments out today Extra lab session:

Transcription:

CPSC 421 Database Management Systems Lecture 19: Physical Database Design Concurrency Control and Recovery * Some material adapted from R. Ramakrishnan, L. Delcambre, and B. Ludaescher Agenda Physical Database Design Transactions (as time allows) CPSC 421, 2009 2 1

What is Physical Design? logical! design! physical! design! After we: Collect requirements Create ER diagram (conceptual design) Create (a normalized) schema Create view definitions Define queries, etc. We (often) need to worry about performance: Choose indexes Decide on what attributes should be clustered Refine external schema the external! schema! CPSC 421, 2009 3 Determine the workload For physical design, we need the anticipated database workload The most important queries and how often they occur The most important updates and how often they occur The performance requirements for the queries and updates These all depend on the requirements of the application! CPSC 421, 2009 4 2

Determine the workload For example: The DB is only updated once a year, but with 1,000 s of queries per day (e.g., product catalog) Alternatively, equal mix of updates and queries (typical) Or, 45% of operations are Q1, 25% are Q2, 5% are Q3, 10% are U1, 15% are U2 (detailed) And, Q1 must run in under 1 second, Q2 under 5 seconds, U1 under 2 seconds (very detailed) CPSC 421, 2009 5 Decisions, decisions Which indexes should we create for our workload? Which relations should have indexes? What field(s) should be the search key? Should we build several indexes? What are the tradeoffs? For each index, what kind should it be? Clustered? Unclustered? Hash? B+ Tree? Should we change the external schema? Alternative normalized schemas? (different BCNF decompositions) Should we denormalize (i.e., add redundancy ) Horizontal partitioning, materialized views CPSC 421, 2009 6 3

Index Selection for Joins When considering a join condition: Hash index on inner relation can be good for Index Nested Loops This can be done even if there is a B+ Tree on the key Should be clustered if join column is not the key for inner relation Clustered B+ Tree on join column(s) good for Sort-Merge What is the data skew though on the inner relation? CPSC 421, 2009 7 Example 1 Hash index on D.dname Given this, index on D.dno is not needed We lose D.dno index after pushing select SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname= Toy AND E.dno = D.dno; π ename,mgr dno=dno (on the fly) (Block NL) Why make Dept the outer relation? If Dept is inner, we would have to save selection result to disk! σ dname= Toy Dept Emp (Hash Index) CPSC 421, 2009 8 4

Example 1 Hash index on D.dname Given this, index on D.dno is not needed We lose D.dno index after pushing select Hash index on E.dno allows matching (inner) Emp tuples for each selected (outer) Dept tuple SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname= Toy AND E.dno = D.dno; π ename,mgr Dept dno=dno σ dname= Toy Emp (on the fly) (Index NL) (Hash Index) CPSC 421, 2009 9 Example 1 If query included E.age = 25 Can retrieve Emp tuples using index on E.age (push select) Join with Dept tuples satisfying D.dname selection (push select) With E.age index this query provides much less motivation for adding E.dno index! SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE D.dname= Toy AND E.dno = D.dno AND E.age = 25; π ename,mgr dno=dno (on the fly) (Index NL) What does this depend upon? Selectivities! In this case, E.age is an equality σ dname= Toy Dept σ age=25 (Hash Index) Emp CPSC 421, 2009 10 5

Example 2 Emp should be the outer relation Suggests that we build a hash index on D.dno why? What index should we build on Emp? B+ Tree on E.sal could be used Or, hash index on E.hobby Only one of these is needed the other can be done on the fly As a rule of thumb, equality is more selective than ranges SELECT E.ename, D.mgr FROM Emp E, Dept D WHERE E.sal > 10000 AND E.sal < 20000 AND E.hobby= stamps AND E.dno = D.dno; CPSC 421, 2009 11 Physical Design and Query Optimization Both of these examples indicate that our choice of indexes is guided by the plan(s) we expect the optimizer to consider Physical design requires understanding query optimization! CPSC 421, 2009 12 6

Tuning the External Schema The external schema should be guided by the workload and redundancy issues Denormalization BCNF decomposition removes redundancy by separating attributes requiring joins to bring them together Joins are expensive! This is why we might want to undo decompositions CPSC 421, 2009 13 Tuning the External Schema We may also want to further decompose relations Makes relations smaller Sometimes called partitioning Horizontal partitions For example Employees partitioned into SalesEmployees and ManagerEmployees The partitions represent a unique selection over the original table Note: Decomposition applies projections CPSC 421, 2009 14 7

Tuning the External Schema We may also introduce new keys and vertically partition Reduce the number of columns in a table by creating a new key Student(sid, name, age, credits, major, {other attributes}) Becomes: Not accessed often! Student(sid, name, age, credits, major, oid), StudentMisc(oid, {other attributes}) Can be recovered through Joins CPSC 421, 2009 15 Tuning Queries If a query runs slower than expected Check if an index needs to be re-build Or if statistics are too old The DBMS may not be executing the plan you had in mind Weaknesses: Selections involving null values Selections involving arithmetic or string expressions Selections involving OR conditions Lack of evaluation features, like index-only strategies, lack of certain join methods, or poor estimation Check the plan that is being used In this case, adjust indexes, or rewrite the query/view CPSC 421, 2009 16 8

Tuning Queries Minimize the use of DISTINCT Don t need it if duplicates are acceptable Or if answer contains a key Minimize use of GROUP BY and HAVING SELECT MIN(E.age) FROM Emp E GROUP BY E.dno HAVING E.dno = 102; Consider DBMS use of index when writing arithmetic expressions E.age = 2*D.age SELECT MIN(E.age) FROM Emp E WHERE E.dno = 102; Will benefit from index on E.age, but (might) not from index on D.age CPSC 421, 2009 17 Wrap up Database design consists of several tasks: Requirements analysis Conceptual design Schema refinement Physical design and tuning Usually an iterative process Understanding the application workload and performance requirements is crucial to a good design Physical design and tuning requires understanding DBMS query optimization approaches CPSC 421, 2009 18 9

On to Concurrency Control and Recovery CPSC 421, 2009 19 Basic Database Architecture Web Forms Application Front Ends SQL Interface SQL Commands DBMS Plan Executor Operator Evaluator Parser Optimizer Query Evaluation Engine Transaction Manager Lock Manager Concurrency Control File and Access Methods Buffer Manager Disk Space Manager Recovery Manager Index Files Data Files CPSC 421, 2009 System Catalog 20 10

Concurrency Control and Recovery Transactions A way to define a single all or nothing set of SQL actions Based on a set of properties (the ACID properties) Enable concurrency The basis for crash recovery CPSC 421, 2009 21 Transactions A transaction is a set of SQL statements (that modify a database) chosen by a user Transfer $100 from one account to another (MySQL): START TRANSACTION; SELECT @A1 := balance FROM Account WHERE acctno=500; UPDATE Account SET balance := @A1+100 WHERE acctno=500; SELECT @A2 := balance FROM Account WHERE acctno=501; UPDATE Account SET balance := @A2 100 WHERE acctno=501; COMMIT; We re overlooking a lot of details here!!! CPSC 421, 2009 22 11

Transactions A transaction is a set of SQL statements (that modify a database) chosen by a user Transfer $100 from one account to another (generic): BEGIN TRANSACTION READ balance from account 500 ADD $100 to balance of account 500 WRITE new balance to account 500 READ balance in account 501 VERIFY balance to see if it contains at least $100 ABORT if balance is less than $100 SUBTRACT $100 from balance of account 501 WRITE new balance to account 501 COMMIT TRANSACTION this statement! is new! CPSC 421, 2009 23 Transactions User (application developer) must Begin transaction Read, write, and modify statements intermixed with other programming language statements (e.g., verify) Plus either Commit to indicate successful completion or Abort to indicate that the the transaction should be rolled back i.e., erase previous steps of transaction To ensure database stays in a consistent/correct state The DBMS and programmer must guarantee 4 properties of transactions These are called the ACID properties CPSC 421, 2009 24 12

ACID Properties Atomicity Consistency Isolation Durability CPSC 421, 2009 25 ACID Properties Atomicity A transaction happens in its entirety or not at all What if the OS crashed half-way through the transaction after $100 was removed from the first account? (good for the bank!) The recovery manager of the DBMS must assure that the $100 is deposited back to the first account Often called roll back CPSC 421, 2009 26 13

ACID Properties Consistency If the DB starts in a consistent state, the transaction will transform it into a consistent state The notion of consistency is specific to the application constraints (defined by the user) Thus the programmer must ensure transactions are consistent The DBMS ensures the transaction is atomic E.g., what if the transaction only deposited $100? Probably not consistent according to our example application CPSC 421, 2009 27 ACID Properties Isolation Each transaction is isolated from other transactions DB state is as if each transaction executed by itself What if an another transaction computed the total bank balance after $100 was removed from the first account? The concurrency control subsystem must ensure that all transactions run in isolation (i.e., don t mess up other transactions) unless the programmer chooses a less strict level of isolation similar to concurrency control in operating systems CPSC 421, 2009 28 14

ACID Properties Durability If a transaction commits, its changes to the DB state persist (changes are permanent) What if after the commit the OS crashed before the deposit was written to disk? The recovery manager must assure that the deposit was at least logged (e.g., to make the DB consistent) CPSC 421, 2009 29 15