Physical DB design and tuning: outline

Similar documents
Physical Design. Elena Baralis, Silvia Chiusano Politecnico di Torino. Phases of database design D B M G. Database Management Systems. Pag.

Discuss physical db design and workload What choises we have for tuning a database How to tune queries and views

Databasesystemer, forår 2005 IT Universitetet i København. Forelæsning 8: Database effektivitet. 31. marts Forelæser: Rasmus Pagh

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe Slide 16-1

CSE 344 Final Review. August 16 th

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Physical Database Design and Tuning. Review - Normal Forms. Review: Normal Forms. Introduction. Understanding the Workload. Creating an ISUD Chart

Principles of Data Management. Lecture #9 (Query Processing Overview)

Questions about the contents of the final section of the course of Advanced Databases. Version 0.3 of 28/05/2018.

Physical Database Design and Tuning. Chapter 20

Physical Database Design and Tuning

Database Systems CSE 414

Administração e Optimização de Bases de Dados 2012/2013 Index Tuning

Introduction to Database Systems CSE 344

Rajiv GandhiCollegeof Engineering& Technology, Kirumampakkam.Page 1 of 10

Database Management Systems Paper Solution

Weak Levels of Consistency

Administrivia Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Distributed KIDS Labs 1

6.830 Problem Set 3 Assigned: 10/28 Due: 11/30

7. Query Processing and Optimization

CPSC 421 Database Management Systems. Lecture 19: Physical Database Design Concurrency Control and Recovery

CMPT 354: Database System I. Lecture 11. Transaction Management

Copyright 2016 Ramez Elmasri and Shamkant B. Navathe

Introduction to Database Systems CSE 344

Database Applications (15-415)

CSE 544 Principles of Database Management Systems

Techno India Batanagar Computer Science and Engineering. Model Questions. Subject Name: Database Management System Subject Code: CS 601

Administrivia Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Database Systems CSE 414

CMSC 461 Final Exam Study Guide

Inputs. Decisions. Leads to

Index Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value

Introduction to Data Management. Lecture #17 (Physical DB Design!)

Introduction to Data Management. Lecture #26 (Transactions, cont.)

Outline. Database Tuning. Ideal Transaction. Concurrency Tuning Goals. Concurrency Tuning. Nikolaus Augsten. Lock Tuning. Unit 8 WS 2013/2014

CS317 File and Database Systems

NewSQL Database for New Real-time Applications

References. Transaction Management. Database Administration and Tuning 2012/2013. Chpt 14 Silberchatz Chpt 16 Raghu

Column Stores vs. Row Stores How Different Are They Really?

What are Transactions? Transaction Management: Introduction (Chap. 16) Major Example: the web app. Concurrent Execution. Web app in execution (CS636)

CSE 190D Spring 2017 Final Exam

CSE 190D Spring 2017 Final Exam Answers

Database Applications (15-415)

Transaction Management: Introduction (Chap. 16)

Database Management and Tuning

Introduction to Query Processing and Query Optimization Techniques. Copyright 2011 Ramez Elmasri and Shamkant Navathe

DATABASE MANAGEMENT SYSTEMS

Database Management Systems (COP 5725) Homework 3

Advanced Database Systems

Faloutsos 1. Carnegie Mellon Univ. Dept. of Computer Science Database Applications. Outline

OBJECTIVES. How to derive a set of relations from a conceptual data model. How to validate these relations using the technique of normalization.

Introduction to Data Management. Lecture #25 (Transactions II)

Administrivia. Physical Database Design. Review: Optimization Strategies. Review: Query Optimization. Review: Database Design

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Lecture 21. Lecture 21: Concurrency & Locking

Database Systems CSE 414

Database Systems Management

Department of Information Technology B.E/B.Tech : CSE/IT Regulation: 2013 Sub. Code / Sub. Name : CS6302 Database Management Systems

INSTITUTO SUPERIOR TÉCNICO Administração e optimização de Bases de Dados

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

CS317 File and Database Systems

L i (A) = transaction T i acquires lock for element A. U i (A) = transaction T i releases lock for element A

Chapter 20 Introduction to Transaction Processing Concepts and Theory

Examination paper for TDT4145 Data Modelling and Database Systems

CSE 544, Winter 2009, Final Examination 11 March 2009

Kathleen Durant PhD Northeastern University CS Indexes

Implementation of Relational Operations. Introduction. CS 186, Fall 2002, Lecture 19 R&G - Chapter 12

Implementation of Relational Operations: Other Operations

Query Processing. Lecture #10. Andy Pavlo Computer Science Carnegie Mellon Univ. Database Systems / Fall 2018

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Data about data is database Select correct option: True False Partially True None of the Above

Lecture #16 (Physical DB Design)

CSE 344 FEBRUARY 14 TH INDEXING

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

Physical Database Design

CS 564 Final Exam Fall 2015 Answers

TRANSACTION MANAGEMENT

CSIT5300: Advanced Database Systems

The DBMS accepts requests for data from the application program and instructs the operating system to transfer the appropriate data.

15-415/615 Faloutsos 1

CSC 261/461 Database Systems Lecture 21 and 22. Spring 2017 MW 3:25 pm 4:40 pm January 18 May 3 Dewey 1101

ROEVER ENGINEERING COLLEGE

CS 4604: Introduction to Database Management Systems. B. Aditya Prakash Lecture #10: Query Processing

Course Modules for MCSA: SQL Server 2016 Database Development Training & Certification Course:

E.G.S. PILLAY ENGINEERING COLLEGE (An Autonomous Institution, Affiliated to Anna University, Chennai) Nagore Post, Nagapattinam , Tamilnadu.

CSE 444 Midterm Exam

Final Review. CS634 May 11, Slides based on Database Management Systems 3 rd ed, Ramakrishnan and Gehrke

CSE 344 MARCH 25 TH ISOLATION

Query optimization. Elena Baralis, Silvia Chiusano Politecnico di Torino. DBMS Architecture D B M G. Database Management Systems. Pag.

Exam code: Exam name: Database Fundamentals. Version 16.0

Outline. Database Tuning. Join Strategies Running Example. Outline. Index Tuning. Nikolaus Augsten. Unit 6 WS 2014/2015

IT Best Practices Audit TCS offers a wide range of IT Best Practices Audit content covering 15 subjects and over 2200 topics, including:

Query Processing & Optimization

Introduction to Databases, Fall 2005 IT University of Copenhagen. Lecture 10: Transaction processing. November 14, Lecturer: Rasmus Pagh

Introduction to Database Systems CSE 344

RELATIONAL OPERATORS #1

Lecture 12. Lecture 12: The IO Model & External Sorting

C Examcollection.Premium.Exam.58q

Friday Nights with Databases!

Transcription:

Physical DB design and tuning: outline Designing the Physical Database Schema Tables, indexes, logical schema Database Tuning Index Tuning Query Tuning Transaction Tuning Logical Schema Tuning DBMS Tuning

Relational DB design Database design phases: (a) Requirement Analysis, (b) Conceptual design (c) Logical design (d) Physical design Physical Design Goal: definition of appropriate storage structures for a specific DBMS, to ensure the application performance desired

Physical design: needed information Logical relational schema and integrity constraints Statistics on data (table size and attribute values) Workload Critical queries and their frequency of use Critical updates and their frequency of use Performance expected for critical operations Knowledge about the DBMS Data organizations, Indexes and Query processing techniques supported

Workload definition Critical operations: those performed more frequently and for which a short execution time is expected.

Workload definition For each critical query Relations used Attributes of the result Attributes used in conditions (restrictions and joins) Selectivity factor of the conditions For each critical update Type (INSERT/DELETE/UPDATE) Attributes used in conditions (restrictions and joins) Selectivity factor of the conditions Attributes that are updated

The use of ISUD table Insert, Select, Update, Delete <---- Employee Attributes ------> APPLICATION FREQUENCY %DATI NAME SALARY ADDRESS Payroll monthly 100 S S S NewEmployee quartly 0.1 I I I DeleteEmployee quartly 0.1 D D D UpdateSalary monthly 10 S U

Decisions to be taken The physical organization of relations Indexes Index type... Logical schema transformation to improve performance

Decisions for relations and indexes Storage structures for relations: heap (small data set, scan operations, use of indexes) sequential (sorted static data) hash (key equality search), usually static tree (index sequential) (key equality and range search) Choice of secondary index, considering that they are extremely useful slow down the updated of the index keys require memory

How to choose indexes Use a DBMS application, such as DB2 Design Advisor, SQL Server DB Tuning Advisor, Oracle Access Advisor.

How to choose indexes Tips: don t use indexes on small relations, on frequently modified attributes, on non selective attributes (queries which returns 15% of data) on attributes with values long string Define indexes on primary and foreign keys Consider the definition of indexes on attributes used on queries which requires sorting: ORDER BY, GROUP BY, DISTINCT, Set operations

How to choose indexes (cont.) Evaluate the convenience of indexes on attributes that can be used to generate index-only plans. Evaluate the convenience of indexes on selective attributes in the WHERE: hash indexes, for equality search B + tree, for equality and range search, possibly clustered multi-attributes (composite), for conjunctive conditions to improve joins with IndexNestedLoop or MergeJoin, grouping, sorting, duplicate elimination. Attention to disjunctive conditions

How to choose indexes (cont.) Local view: indexes useful for one query (simple) Global view: indexes useful for a workload Index subsumption: some indexes may provide similar benefit An index Idx1(a, b, c) can replace indexes Idx2(a, b) and Idx3(a) Index merging: two indexes supporting two different queries can be merged into one index supporting both queries. When there are a lot of data to load, create indexes after data loading

How to choose indexes: the global view 1. Identify critical queries 2. Create possible indexes to tune single queries 3. From set of all indexes remove subsumed indexes 4. Merge indexes where possible indexes 5. Evaluate benefit/cost ratio for remaining indexes (need to consider frequency of queries/index usage) 6. Pick optimal index configuration satisfying storage constraints.

Primary organizations and secondary indexes CREATE [UNIQUE...] INDEX Nome [ USING {BTREE HASH RTREE} ] ON Table (Attributes+Ord)

Clustering indexes

Bitmap indexes

DB schema for the examples Lecturers(PkLecturer INT, Name VARCHAR(20), ResearchArea VARCHAR(1), Salary INT, Position VARCHAR(10), FkDepartment INT) Departments(PkDepartment INT, Name VARCHAR(20), City VARCHAR(10)) Departments Lecturers Nrec 300 10 000 Npag 30 1 200 Nkey(IdxSalary) 50 (min=40, max=160) Nkey(IdxCity) 15 Primary organization definitions in SQL CREATE TABLE Table (Attributes) < physical aspects >; CREATE TABLE Table (Attributes + keys) ORGANIZED INDEX; (ORACLE) CREATE TABLE Table (Attributes + keys) ORGANIZE BY DIMENSIONS (Att); (DB2)

The definition of indexes Do not index! Few records in this table Do not index! Possibly too many updates SELECT Name FROM Departments WHERE City = PI SELECT StockName, StockPrice FROM Stocks WHERE Stockprice > 50 The index is created implicitly on a PK SELECT Name FROM Lecturers WHERE PkLecturer = 70

The definition of (multi-attributes) indexes How many indexes on a Table? A secondary index on Position or on Salary? A secondary index on Position and another on Salary? A secondary index on <Position, Salary>? How many indexes use the DBMS for AND? SELECT Name FROM Lecturers WHERE Position = P AND Salary BETWEEN 50 AND 60 An index on Position, ResearchArea or ResearchArea Position to speed up GBY. Better on ResearchArea Position because it also returns result sorted SELECT ResearchArea,Position COUNT(*) FROM Lecturers GROUP BY Position, ResearchArea ORDER BY ResearchArea

The creation of clustered index With a not very selective predicate Salary>70, a clustered index may still be useful If there are a few lecturers with a Salary =70, a clustered index on Salary can be more useful than an unclustered one A clustered index on FkDepartment can be useful here SELECT Name FROM Lecturers WHERE Salary > 70 SELECT Name FROM Lecturers WHERE Salary = 70 SELECT FkDepartment, COUNT(*) FROM Lecturers WHERE Salary > 70 GROUP BY FkDepartment

The creation of indexes for index-only plans For Index-Only plans clustered indexes are not required. Index on su FkDepartment Index on Salary Index on FkDepartment for INL Index on <FkDepartment,Salary> SELECT DISTINCT FkDepartment FROM Lecturers ; SELECT Salary, COUNT(*) FROM Lecturers GROUP BY Salary; SELECT D.Name FROM Lecturers L, Departments D WHERE FkDepartment=PkDepartment; SELECT FkDepartment, MIN(Salary) FROM Lecturers GROUP BY FkDepartment;

Index-only plans Some DBMSs allow the definition of indexes on some attributes, and to include also others which are not part of the index key. DB2: CREATE UNIQUE INDEX Name ON Table (Attrs) INCLUDE (OtherAttrs); SQL Server: the clause UNIQUE is optional Index on FkDepartment INCLUDE Salary SELECT FkDepartment, MIN(Salary) FROM Lecturers GROUP BY FkDepartment;

Concluding remarks The implementation and maintenance of a database application to meet specific performance requirements is a very complex task Table organizations and index selections are fundamental to improve query performance, but... Indexes must be really useful because of their memory and update costs An index should be useful for different queries Clustered indexes are very useful, but only one for table can be defined The order of attributes in multi-attribute (composite) indexes is important

Database tuning: what is the goal? Improve performance Database tuning mostly refers to query (or DB applications) performance.

What can be tuned? Applications Queries, Transactions,... DBMS System configuration, DB schema, storage structures,... OS System parameters and configuration for IO,... HW Components for CPU, main memory, hard disks, backup solutions, network,... Here the focus is on Database Tuning

Needed knowledge Applications How is the DB used and what are the tuning goals? DBMS What possibilities of optimization does it provides? Fully efficient DB Tuning requires deep knowledge about... OS How does it manage HW resources and services by the overall system? HW What are suitable HW components to support the performace requirements? Schallehn: Database Tuning and Self-Tuning 2012

Schallehn: Database Tuning and Self-Tuning 2012 Who does the tuning? WHEN? WHAT KNOWLEDGE? DB and application designers During DB development (physical DB design) and initial testing. DB designers must have knowledge about applications, and good knowledge about the DBMS, but may be only fair to no knowledge about OS and HW. DB administrators (DBA) During ongoing DBMS maintenance. Adjustment to changing requirements, data volume, new HW. DBA have knowledge about DBMS, OS, and HW. And about applications? DBA knowledge about applications depends on the given organizational structure DB experts (consultants, in-house experts) During system re-design, troubleshooting or fire fighting (emergency actions) DB consultants usually have strong knowledge about DBMS, OS and HW, but have little knowledge about current applications.

Database tuning as a process Overall system continuously changes Data volume, # of user, # of queries, usage patterns, hardware, etc. Requirements may change Identify Existing Problem Monitor system behavior and identify cause of problem Apply changes to solve problem Problem Solved Current performance requirements not fulfilled Observe and measure relevant quantities, e.g. queries time. Adjust system parameters, storage structures etc. Schallehn: Database Tuning and Self-Tuning 2012

Basic principles: controlling trade-offs Database tuning very often is a process of decision about costs for a solution compared to its benefits. Costs: monetary costs (for HW, SW), working hours or more techinical costs (resource consumption, impact on other aspects) Benefits: improved performance (monetary effects most often not easily quantifiable) Examples Adding indexes -> benefit: better query performance costs: more disk memory, more update time Denormalization -> benefit: better query performance costs: need to control redundancy within tables Replace disk by RAID -> benefit: better I/O performance costs: HW costs Schallehn: Database Tuning and Self-Tuning 2012

Basic principles: Pareto principle 80/20 Rule: by applying 20% of the effort one can achieve 80% of the desired effects. Consequences for DB tuning 100% effect = full optimized system Full optimized system probably beyond necessary requirements. a little bit of DB tuning can help a lot Hence, one does not need to be an expert on all levels of the system to be able to implement a reasonable solution... Schallehn: Database Tuning and Self-Tuning 2012

Database tuning When a database has unexpected bad performances we must revise: Physical Design: the selection of indexes or their type, looking at the access plans generated by the optimizer Query and Transaction Definitions DB Logical Design DBMS: buffer and page size, disk use, log management. Hardware: number of CPU, disk types.

To begin Select the queries with low performance (either the critical one or those which do not satisfy the users) Analyze physical query plans looking at physical operators for tables sorting of intermediate results physical operators used in the query plans physical operators for the logical operators

Index tuning One of the most often applied tuning measures Great benefits with little effort (if applied correctly). Strong support within all available DBMS (index structures, index usage controlled by optimizer) Storage cost of additional disk memory most often acceptable. Cost for index updates. Cost for locking overhead and lock conflicts.

Query tuning SELECT with OR condition are rewritten as one predicate IN, or with UNION of SELECT SELECT with AND of predicates are rewritten as a condition with BETWEEN Rewrite SUBQUERIES as join Eliminate useless DISTINCT and ORDER BY from SELECT or SUBSELECTS Avoid the definition of temporary views with grouping and aggregations Avoid aggregation functions in SUBQUERIES

Query tuning (cont.) Avoid expressions with index attributes (e.g. Salary*2=100) Avoid useless HAVING or GROUP BY SELECT Position, MIN(Salary) FROM Lectures GROUP BY Position HAVING Position IN( P1, P2 ) SELECT Position, MIN(Salary) FROM Lectures WHERE Position IN( P1, P2 ) GROUP BY Position SELECT MIN(Salary) FROM Lectures GROUP BY Position HAVING Position = P1 SELECT FROM WHERE MIN(Salary) Lectures Position = P1

Transactions tuning Do not block data during db loading, as well as during read-only transactions Split complex transactions in smaller ones Select the right block granularity, if possible (long T, table; medium T, page; short T, record) Select the right isolation level among those provided by SQL SET TRANSACTION ISOLATION LEVEL {READ UNCOMMITTED READ COMMITTED REPEATABLE READ SERIALIZABLE }

Isolation levels in SQL READ UNCOMMITTED, record READ only without locks Problem: dirty read (read data updated by other active T) Account (No INTEGER PRIMARY KEY, Name CHAR(30), Balance FLOAT); -- T1.begin: UPDATE Account SET Balance = Balance - 200.00 WHERE No = 123; ROLLBACK; -- T2.begin: RU SELECT AVG(Balance)FROM Account; COMMIT;

Isolation levels in SQL READ COMMITTED, shared read locks are released immediately, exclusive locks until the T commit Problem: avoid dirty read, but unrepeatable reads or loss of updates -- T1.begin: UPDATE Account SET Balance = Balance - 200.00 WHERE No = 123; COMMIT; -- T2.begin: RC SELECT AVG(Balance) FROM Account; SELECT AVG(Balance) FROM Account;COMMIT;

Isolation levels in SQL REPEATABLE READ, shared and exclusive locks on records until the end of the transaction Problem: avoid the previous problems, but not the phantom records problem: -- T1.begin: INSERT INTO Account VALUES(1233, xx,200.00); COMMIT; -- T2.begin: RR SELECT AVG(Balance) FROM Account; SELECT AVG(Balance)FROM Account;COMMIT;

Isolation levels in sql (cont.) SERIALIZABLE, multi-granularity locks: tables read by a T cannot be updated. Good, but the number of transactions which can be executed concurrently is considerably reduced.

DBMS isolation levels Commercial DBMS may provide some isolation levels only, not have the same isolation level by default have other isolation levels (e.g. SNAPSHOT)

Logical schema tuning Types of logical schema restructuring: Vertical Partitioning. Horizontal Partitioning Denormalization Unlike changes to the physical schema (physical independence), changes to the logical schema (schema evolution) require views creation for the logical independence.

Logical schema tuning Partitioning: splitting a table for performance Horizontal: on a property Vertical: R1(pk, Name, Surname) R2(pk, Address, ) Normalization: divide Students from Exams to avoid anomalies Denormalization: store Students and Exams into one table: increases update time but makes join faster

Students Vertical partitioning (projections) Name StudentNo City BirthYear BDegree University Exams PkE Course StudentNo Master Date Other Critical Query: Find the number of exams passed and the number of students who have done the test by course, and by academic year. ExamForAnalysis ExamsOther PkE Course Master Date PkE StudentNo Other The decomposition must preserve data...

Horizontal partitioning (selections) Exams PkE Course StudentNo Degree Date... MasterExam FkM FkE... Master PkM Title President... Critical Query: For a study program with title X and a course with less than 5 exams passed, find the number of exams, by course, and academic year MasterXExams PkE Course StudentNo Degree Date... MasterYExams PkE Course StudentNo Degree Date......

Denormalization (attribute replication) Students Name StudentNo City BirthYear BDegree University Exams PkE Course Student Degree Date Other Critical Query: For a student number N, find the student name, the master program title and the grade of exams passed Exams PkE Course MasterExam FkM FkE... Master PkM Title President... Student Name Degree Date Master Other Finally: Schema fusion of relations (1:1) and View materialization

DBMS tuning Tune the transaction manager (log management and storage, checkpoint frequency, dump frequency) Tune the buffer size (as well as interactions with the operating system) Disk management (allocation of memory for tablespaces, filling factor for pages and files, number of preloaded pages, accesses to files) Use of distributed and parallel databases

Database self-tuning Applications Knowledge about from analyzing queries, TXNs, schema, etc. DBMS Naturally, it knows best about its functionality and tuning options DBMS itself is best Tuning Expert! OS Knowledge about encoded in platform specific code + runtime inf via OS interfaces HW Knowledge about encoded in platform specific code + runtime inf via OS interfaces Schallehn: Database Tuning and Self-Tuning 2012

Database self-tuning Databases are getting better and better at this This is clearly the way to go But there will be still space for a good DBA

Tables: Exercise Sales(Date,FKShop,FKCust,FKProd,UnitPrice,Q,TotPrice) Shops(PKShop,Name,City,Region,State) Customer(PKCust,Nome,FamName,City,Region,State,Inco me) Products(PKProd,Name,SubCategory,Category,Price)

Exercise Sales: 100.000.000, 1.000.000; Shops: 500, 2; Customers: 100.000, 1.000; Products: 10.000, 100 SELECT Sh.Region, Month(S.Date),Sum(TotPrice) FROM Sales S join Shops Sh on FKShops=PKShops GROUP BY Sh.Region, Month(S.Date) Propose a primary organization based on this query Compute the cost of an optimal access plan based on this organization Add a condition: WHERE 1/1/2017 < Date