CSE 544 Principles of Database Management Systems

Similar documents
CSE544 Database Architecture

CSE544: Principles of Database Systems. Lectures 5-6 Database Architecture Storage and Indexes

CSE 444: Database Internals. Lectures 5-6 Indexing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

CSE 544: Principles of Database Systems

Database Systems CSE 414

Introduction to Database Systems CSE 344

Introduction to Database Systems CSE 344

Important Note. Today: Starting at the Bottom. DBMS Architecture. General HeapFile Operations. HeapFile In SimpleDB. CSE 444: Database Internals

Database Systems CSE 414

CSE 344 FEBRUARY 14 TH INDEXING

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Database Systems CSE 414

Introduction to Database Systems CSE 344

Lecture 8 Index (B+-Tree and Hash)

User Perspective. Module III: System Perspective. Module III: Topics Covered. Module III Overview of Storage Structures, QP, and TM

RAID in Practice, Overview of Indexing

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

Lecture 13. Lecture 13: B+ Tree

Data Storage. Query Performance. Index. Data File Types. Introduction to Data Management CSE 414. Introduction to Database Systems CSE 414

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

Topics to Learn. Important concepts. Tree-based index. Hash-based index

Kathleen Durant PhD Northeastern University CS Indexes

CSE 444: Database Internals. Lecture 3 DBMS Architecture

Overview of Storage and Indexing

What we already know. Late Days. CSE 444: Database Internals. Lecture 3 DBMS Architecture

Tree-Structured Indexes

IMPORTANT: Circle the last two letters of your class account:

CSE 544, Winter 2009, Final Examination 11 March 2009

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

Announcements (March 1) Query Processing: A Systems View. Physical (execution) plan. Announcements (March 3) Physical plan execution

Database Applications (15-415)

Indexing. Chapter 8, 10, 11. Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke 1

Database design and implementation CMPSCI 645. Lecture 08: Storage and Indexing

CSE 444: Database Internals. Lecture 3 DBMS Architecture

Disks & Files. Yanlei Diao UMass Amherst. Slides Courtesy of R. Ramakrishnan and J. Gehrke

Physical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.

Query Processing: A Systems View. Announcements (March 1) Physical (execution) plan. CPS 216 Advanced Database Systems

Storage and Indexing

L9: Storage Manager Physical Data Organization

CS 222/122C Fall 2016, Midterm Exam

CSC 261/461 Database Systems Lecture 17. Fall 2017

Tree-Structured Indexes

Lecture 12. Lecture 12: The IO Model & External Sorting

Unit 3 Disk Scheduling, Records, Files, Metadata

Overview of Storage and Indexing

Introduction to Database Systems CSE 414. Lecture 26: More Indexes and Operator Costs

CSE 530A. B+ Trees. Washington University Fall 2013

Announcements. Reading Material. Today. Different File Organizations. Selection of Indexes 9/24/17. CompSci 516: Database Systems

CSE 190D Database System Implementation

File Structures and Indexing

Principles of Data Management. Lecture #3 (Managing Files of Records)

CSIT5300: Advanced Database Systems

Introduction. Choice orthogonal to indexing technique used to locate entries K.

Lecture Data layout on disk. How to store relations (tables) in disk blocks. By Marina Barsky Winter 2016, University of Toronto

Spring 2013 CS 122C & CS 222 Midterm Exam (and Comprehensive Exam, Part I) (Max. Points: 100)

Parser. Select R.text from Report R, Weather W where W.image.rain() and W.city = R.city and W.date = R.date and R.text.

Why Is This Important? Overview of Storage and Indexing. Components of a Disk. Data on External Storage. Accessing a Disk Page. Records on a Disk Page

Introduction to Data Management. Lecture 21 (Indexing, cont.)

Tree-Structured Indexes. Chapter 10

Systems Infrastructure for Data Science. Web Science Group Uni Freiburg WS 2014/15

CompSci 516: Database Systems

Tree-Structured Indexes

Storage hierarchy. Textbook: chapters 11, 12, and 13

Hash table example. B+ Tree Index by Example Recall binary trees from CSE 143! Clustered vs Unclustered. Example

Lecture 12. Lecture 12: Access Methods

Modern Database Systems Lecture 1

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Principles of Data Management. Lecture #5 (Tree-Based Index Structures)

Overview of Storage and Indexing

Tree-Structured Indexes

Oracle on RAID. RAID in Practice, Overview of Indexing. High-end RAID Example, continued. Disks and Files: RAID in practice. Gluing RAIDs together

Find the block in which the tuple should be! If there is free space, insert it! Otherwise, must create overflow pages!

Storing Data: Disks and Files

Database Applications (15-415)

Tree-Structured Indexes

Tree-Structured Indexes

Tree-Structured Indexes ISAM. Range Searches. Comments on ISAM. Example ISAM Tree. Introduction. As for any index, 3 alternatives for data entries k*:

Database Applications (15-415)

Principles of Data Management. Lecture #9 (Query Processing Overview)

Indexing. Announcements. Basics. CPS 116 Introduction to Database Systems

Module 4: Tree-Structured Indexing

Storing Data: Disks and Files

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Indexing: Overview & Hashing. CS 377: Database Systems

Intro to DB CHAPTER 12 INDEXING & HASHING

Introduction to Data Management. Lecture #13 (Indexing)

Database Applications (15-415)

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Administrivia. Tree-Structured Indexes. Review. Today: B-Tree Indexes. A Note of Caution. Introduction

Overview of Storage and Indexing

Storing Data: Disks and Files

Introduction to Data Management. Lecture 14 (Storage and Indexing)

Disks, Memories & Buffer Management

Review 1-- Storing Data: Disks and Files

System Structure Revisited

(Storage System) Access Methods Buffer Manager

Background: disk access vs. main memory access (1/2)

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

CPSC 421 Database Management Systems. Lecture 11: Storage and File Organization

Transcription:

CSE 544 Principles of Database Management Systems Alvin Cheung Fall 2015 Lecture 5 - DBMS Architecture and Indexing 1

Announcements HW1 is due next Thursday How is it going? Projects: Proposals are due next Wednesday (not graded) Submit on dropbox 2

Where We Are What we have already seen Overview of the relational model Motivation and where model came from Physical and logical independence How to design a database From ER diagrams to conceptual design Schema normalization How different data models work Where we go from here How can we efficiently implement this model? How can we run RA plans efficiently? 3

References Anatomy of a database system. J. Hellerstein and M. Stonebraker. In Red Book (4th ed). Chapters 8 through 11 (in the R&G book, third ed.) Disk and files: Sections 9.3 through 9.7 Index structures: Section 8.3 Hash-based indexes: Section 8.3.1 and Chapter 11 B+ trees: Section 8.3.2 and Chapter 10 4

DBMS Architecture Admission Control Connection Mgr Process Manager Parser Query Rewrite Optimizer Executor Query Processor Memory Mgr Disk Space Mgr Replication Services Admin Utilities Access Methods Buffer Manager Lock Manager Log Manager Storage Manager Shared Utilities [Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.] 5

DBMS Architecture Admission Control Connection Mgr Process Manager Parser Query Rewrite Optimizer Executor Query Processor Memory Mgr Disk Space Mgr Replication Services Admin Utilities Access Methods Buffer Manager Lock Manager Log Manager Storage Manager Shared Utilities [Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.] 6

Process Model Why not simply queue all user requests? (and serve them one at the time) Alternatives 1. Process per connection 2. Server process (thread per connection) OS threads or DBMS threads 3. Server process with I/O process Advantages and problems of each model? 7

Process Per Connection Overview DB server forks one process for each client connection Advantages Easy to implement (OS time-sharing, OS isolation, debuggers, etc.) Provides more physical memory than a single process can use Drawbacks Need OS support Since all processes access the same data on disk, need concurrency control Not scalable: memory overhead and expensive context switches Goal is efficient support for high-concurrency transaction processing 8

Server Process Overview DB assigns one thread per connection (from a thread pool) Advantages Shared structures can simply reside on the heap Threads are lighter weight than processes (memory, context switching) Drawbacks Concurrent programming is hard to get right (race conditions, deadlocks) Portability issues can arise when using OS threads Big problem: entire process blocks on synchronous I/O calls Solution 1: OS provides asynchronous I/O (true in modern OS) Solution 2: Use separate process(es) for I/O tasks 9

DBMS Threads vs OS Threads Why do some DBMSs implement their own threads? Legacy: originally, there were no OS threads Portability: OS thread packages are not completely portable Performance: fast task switching Drawbacks Replicating a good deal of OS logic Need to manage thread state, scheduling, and task switching How to map DBMS threads onto OS threads or processes? Rule of thumb: one OS-provided dispatchable unit per physical device See page 9 and 10 of Hellerstein and Stonebraker s paper 10

Commercial Systems Oracle Unix default: process-per-user mode Unix: DBMS threads multiplexed across OS processes Windows: DBMS threads multiplexed across OS threads IBM DB2 Unix: process-per-user mode Windows: OS thread-per-user SQL Server Windows default: OS thread-per-user Windows: DBMS threads multiplexed across OS threads 11

DBMS Architecture Admission Control Connection Mgr Process Manager Parser Query Rewrite Optimizer Executor Query Processor Memory Mgr Disk Space Mgr Replication Services Admin Utilities Access Methods Buffer Manager Lock Manager Log Manager Storage Manager Shared Utilities [Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.] 12

Admission Control Why does a DBMS need admission control? To avoid thrashing and provide graceful degradation under load When does DBMS perform admission control? In the dispatcher process: want to drop clients as early as possible to avoid wasting resources on incomplete requests This type of admission control can also be implemented before the request reaches the DBMS (e.g., application server or web server) Before query execution: delay queries to avoid thrashing Can make decisions based on estimated resource needs for a query 13

DBMS Architecture Admission Control Connection Mgr Process Manager Parser Query Rewrite Optimizer Executor Query Processor Memory Mgr Disk Space Mgr Replication Services Admin Utilities Access Methods Buffer Manager Lock Manager Log Manager Storage Manager Shared Utilities [Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.] 14

Storage Model Problem: DBMS needs spatial and temporal control over storage Spatial control for performance Temporal control for correctness and performance Alternatives Use raw disk device interface directly Interact directly with device drivers for the disks Use OS files 15

Spatial Control Using Raw Disk Device Interface Overview DBMS issues low-level storage requests directly to disk device Advantages DBMS can ensure that important queries access data sequentially Can provide highest performance Disadvantages Requires devoting entire disks to the DBMS Reduces portability as low-level disk interfaces are OS specific Many devices are in fact virtual disk devices 16

Spatial Control Using OS Files Overview DBMS creates one or more very large OS files Advantages Allocating large file on empty disk can yield good physical locality Disadvantages OS can limit file size to a single disk OS can limit the number of open file descriptors But these drawbacks have mostly been overcome by modern OSs 17

Commercial Systems Most commercial systems offer both alternatives Raw device interface for peak performance OS files more commonly used In both cases, we end-up with a DBMS file abstraction implemented on top of OS files or raw device interface 18

DBMS Architecture Admission Control Connection Mgr Process Manager Parser Query Rewrite Optimizer Executor Query Processor Memory Mgr Disk Space Mgr Replication Services Admin Utilities Access Methods Buffer Manager Lock Manager Log Manager Storage Manager Shared Utilities [Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.] 19

Temporal Control Buffer Manager Correctness problems DBMS needs to control when data is written to disk in order to provide transactional semantics (we will study transactions later) OS buffering can delay writes, causing problems when crashes occur Performance problems OS optimizes buffer management for general workloads DBMS understands its workload and can do better Areas of possible optimizations Page replacement policies Read-ahead algorithms (physical vs logical) Deciding when to flush tail of write-ahead log to disk 20

Buffer Manager Page requests from higher-level code Files and access methods Buffer pool Buffer pool manager Disk page Free frame Main memory Disk is a collection of blocks Disk Disk space manager 1 page corresponds to 1 disk block 21

Commercial Systems DBMSs implement their own buffer pool managers Modern filesystems provide good support for DBMSs Using large files provides good spatial control Using interfaces like the mmap suite Provides good temporal control Helps avoid double-buffering at DBMS and OS levels 22

DBMS Architecture Admission Control Parser Connection Mgr Process Manager Files and Access Methods Lock Manager Storage Manager Query Rewrite Optimizer Executor Query Processor Buffer Manager Log Manager Memory Mgr Disk Space Mgr Replication Services Admin Utilities Shared Utilities [Anatomy of a Db System. J. Hellerstein & M. Stonebraker. Red Book. 4ed.] 23

Access Methods A DBMS stores data on disk by breaking it into pages A page is the size of a disk block. A page is the unit of disk IO Buffer manager caches these pages in memory Access methods do the following: They organize pages into collections called DB files They organize data inside pages They provide an API for operators to access data in these files 24

Data Storage Basic abstraction Collection of records or file Typically, 1 relation = 1 database file A file consists of one or more pages How to organize pages into files? How to organize records inside a file? Simplest approach: heap file (unordered) 25

Heap File Operations Create or destroy a file Insert a record Delete a record with a given rid (rid) rid: unique tuple identifier used to identify disk address of page containing record Get a record with a given rid Scan all records in the file 26

Heap File Implementation 1 Linked list of pages: Data page Data page Data page Data page Header page Full pages Data page Data page Data page Data page Pages with some free space 27

Heap File Implementation 2 Better: directory of pages Header page Data page Data page Directory Directory contains free-space count for each page. Faster inserts for variable-length records Data page 28

Page Formats Issues to consider 1 page = 1 disk block = fixed size (e.g. 8KB) Records: Fixed length Variable length Record id = RID Typically RID = (PageID, SlotNumber) Why do we need RID s in a relational DBMS? See discussion about indexes later in the lecture 29

Types of Files Heap file (what we discussed so far) Unordered Sorted file (also called sequential file) Clustered file (aka indexed file) 30

Searching in a Heap File File is not sorted on any attribute Student(sid: int, age: int, ) 30 18 70 21 1 record 20 20 40 19 1 page 80 19 60 18 10 21 50 22 31

Heap File Search Example 10,000 students 10 student records per page Total number of pages: 1,000 pages Find student whose sid is 80 Must read on average 500 pages Find all students older than 20 Must read all 1,000 pages Can we do better? 32

Sequential File File sorted on an attribute, usually on primary key Student(sid: int, age: int, ) 10 21 20 20 30 18 40 19 50 22 60 18 70 21 80 19 33

Sequential File Example Total number of pages: 1,000 pages Find student whose sid is 80 Could do binary search, read log 2 (1,000) 10 pages Find all students older than 20 Must still read all 1,000 pages Can we do even better? 34

Indexes Index: data structure that organizes data records on disk to optimize selections on the search key fields for the index An index contains a collection of data entries, and supports efficient retrieval of all data entries with a given search key value k 35

Indexes Search key = can be any set of fields not the same as the primary key, nor a key Index = collection of data entries Data entry for key k can be: The actual record with key k In this case, the index is also a special file organization Called: indexed file organization (k, RID) (k, list-of-rids) 36

Primary Index Primary index: determines location of indexed records Dense index: each record in data file is pointed to by a (key,rid) pairs in index Index File Data File 1 data entry 1 page 10 20 30 40 50 60 70 80 10 20 30 40 50 60 70 80 37

Primary Index with Duplicate Keys Sparse index: pointer to lowest search key on each page: Search for 20 10 10...but need to search here too 20 is here... 10 20 30 10 10 20 20 20 30 40 38

Primary Index Example Let s assume all pages of index fit in memory Find student whose sid is 80 Index (dense or sparse) points directly to the page Only need to read 1 page from disk. Find all students older than 20 Must still read all 1,000 pages. How can we make both queries fast? 39

Secondary Indexes To index other attributes than primary key Always dense (why?) 18 18 19 19 10 21 20 20 30 18 40 19 20 21 21 22 50 22 60 18 70 21 80 19 40

Clustered vs. Unclustered Index Data entries (Index File) (Data file) Data entries Data Records Data Records CLUSTERED Clustered = records close in index are close in data UNCLUSTERED 41

Index Classification Summary Primary/secondary Primary = determines the location of indexed records Secondary = cannot reorder data, does not determine data location Dense/sparse Dense = every key in the data appears in the index Sparse = the index contains only some keys Clustered/unclustered Clustered = records close in index are close in data Unclustered = records close in index may be far in data B+ tree / Hash table / 42

Large Indexes What if index does not fit in memory? Why not index the index itself? Hash-based index Tree-based index 43

Hash-Based Index Good for point queries but not range queries age h2(age) = 00 H2 18 18 20 22 10 21 20 20 30 18 h1(sid) = 00 h2(age) = 01 19 21 21 19 40 19 50 22 60 18 70 21 H1 h1(sid) = 11 sid Another example of secondary index 80 19 Another example of primary index and indexed-file organization 44

Tree-Based Index How many index levels do we need? Can we create them automatically? Yes! Can do something even more powerful! 45

B+ Trees Search trees Idea in B Trees Make 1 node = 1 page (= 1 block) Keep tree balanced in height Idea in B+ Trees Make leaves into a linked list : facilitates range queries 46

B+ Trees Data entries (Index File) (Data file) Data entries CLUSTERED Data Records Data Records UNCLUSTERED Note: can also store data records directly as data entries 47

B+ Trees Basics Parameter d = the degree Each node has d <= m <= 2d keys (except root) 30 120 240 Each node also has m+1 pointers Keys k < 30 Keys 30<=k<120 Keys 120<=k<240 Keys 240<=k Each leaf has d <= m <= 2d keys: 40 50 60 Next leaf Data records 40 50 60 48

B+ Tree Example Degree d = 2 Find the key 40 40 80 80 20 60 100 120 140 20 < 40 60 10 15 18 20 30 40 50 60 65 80 85 90 30 < 40 40 10 15 18 20 30 40 50 60 65 80 85 90 49

Searching a B+ Tree Exact key values: Start at the root Proceed down, to the leaf Range queries: Find lowest bound as above Then sequential traversal Index on Student(age) Select name From Student Where age = 25 Select name From Student Where 20 <= age and age <= 30 50

B+ Tree Design How large should d be? Example: Key size = 4 bytes Pointer size = 8 bytes Block size = 4096 bytes 2d x 4 + (2d+1) x 8 <= 4096 d = 170 51

B+ Trees in Practice Typical order: 100. Typical fill-factor: 67%. average fanout = 133 Typical capacities Height 4: 133 4 = 312,900,700 records Height 3: 133 3 = 2,352,637 records Can often hold top levels in buffer pool Level 1 = 1 page = 8 Kbytes Level 2 = 133 pages = 1 Mbyte Level 3 = 17,689 pages = 133 Mbytes 52

Insertion in a B+ Tree Insert (K, P) Find leaf where K belongs, insert If no overflow (2d keys or less), halt If overflow (2d+1 keys), split node, insert in parent: parent parent K3 K1 K2 K3 K4 K5 K1 K2 K4 K5 P0 P1 P2 P3 P4 p5 P0 P1 P2 P3 P4 p5 If leaf, also keep K3 in right node When root splits, new root has 1 key only 53

Insertion in a B+ Tree Insert K=19 80 20 60 100 120 140 10 15 18 20 30 40 50 60 65 80 85 90 10 15 18 20 30 40 50 60 65 80 85 90 54

Insertion in a B+ Tree After insertion 80 20 60 100 120 140 10 15 18 19 20 30 40 50 60 65 80 85 90 10 15 18 19 20 30 40 50 60 65 80 85 90 55

Insertion in a B+ Tree Now insert 25 80 20 60 100 120 140 10 15 18 19 20 30 40 50 60 65 80 85 90 10 15 18 19 20 30 40 50 60 65 80 85 90 56

Insertion in a B+ Tree After insertion 80 20 60 100 120 140 10 15 18 19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90 57

Insertion in a B+ Tree But now have to split! 80 20 60 100 120 140 10 15 18 19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90 58

Insertion in a B+ Tree After the split 80 20 30 60 100 120 140 10 15 18 19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90 59

Deletion from a B+ Tree Delete 30 80 20 30 60 100 120 140 10 15 18 19 20 25 30 40 50 60 65 80 85 90 10 15 18 19 20 25 30 40 50 60 65 80 85 90 60

Deletion from a B+ Tree After deleting 30 May change to 40, or not 80 20 30 60 100 120 140 10 15 18 19 20 25 40 50 60 65 80 85 90 10 15 18 19 20 25 40 50 60 65 80 85 90 61

Deletion from a B+ Tree Now delete 25 80 20 30 60 100 120 140 10 15 18 19 20 25 40 50 60 65 80 85 90 10 15 18 19 20 25 40 50 60 65 80 85 90 62

Deletion from a B+ Tree After deleting 25 Need to rebalance Rotate 80 20 30 60 100 120 140 10 15 18 19 20 40 50 60 65 80 85 90 10 15 18 19 20 40 50 60 65 80 85 90 63

Deletion from a B+ Tree Now delete 40 80 19 30 60 100 120 140 10 15 18 19 20 40 50 60 65 80 85 90 10 15 18 19 20 40 50 60 65 80 85 90 64

Deletion from a B+ Tree After deleting 40 Rotation not possible Need to merge nodes 80 19 30 60 100 120 140 10 15 18 19 20 50 60 65 80 85 90 10 15 18 19 20 50 60 65 80 85 90 65

Deletion from a B+ Tree Final tree 80 19 60 100 120 140 10 15 18 19 20 50 60 65 80 85 90 10 15 18 19 20 50 60 65 80 85 90 66

Summary on B+ Trees Default index structure on most DBMSs Very effective at answering point queries: productname = gizmo Effective for range queries: 50 < price AND price < 100 Less effective for multirange: 50 < price < 100 AND 2 < quant < 20 67

Indexes in Postgres CREATE TABLE V(M int, N varchar(20), P int); CREATE INDEX V1_N ON V(N) CREATE INDEX V2 ON V(P, M) CREATE INDEX VVV ON V(M, N) CLUSTER V USING V2 Makes V2 clustered 68

Index Selection Problem 1 V(M, N, P) Your workload is this 100000 queries: 100 queries: SELECT * FROM V WHERE N=? SELECT * FROM V WHERE P=? Which indexes should we create? 69

Index Selection Problem 1 V(M, N, P) Your workload is this 100000 queries: 100 queries: SELECT * FROM V WHERE N=? SELECT * FROM V WHERE P=? A: V(N) and V(P) (hash tables or B-trees) 70

Index Selection Problem 2 V(M, N, P) Your workload is this 100000 queries: 100 queries: SELECT * FROM V WHERE N>? and N<? SELECT * FROM V WHERE P=? 100000 queries: INSERT INTO V VALUES (?,?,?) Which indexes CSE should 544 - Fall 2015 we create? 71

Index Selection Problem 2 V(M, N, P) Your workload is this 100000 queries: 100 queries: SELECT * FROM V WHERE N>? and N<? SELECT * FROM V WHERE P=? 100000 queries: INSERT INTO V VALUES (?,?,?) A: definitely V(N) (must B-tree); unsure about V(P) 72

Index Selection Problem 3 V(M, N, P) Your workload is this 100000 queries: 1000000 queries: SELECT * FROM V WHERE N=? SELECT * FROM V WHERE N=? and P>? 100000 queries: INSERT INTO V VALUES (?,?,?) Which indexes CSE should 544 - Fall 2015 we create? 73

Index Selection Problem 3 V(M, N, P) Your workload is this 100000 queries: 1000000 queries: SELECT * FROM V WHERE N=? SELECT * FROM V WHERE N=? and P>? 100000 queries: INSERT INTO V VALUES (?,?,?) A: V(N, P) (must be B-tree) 74

Index Selection Problem 4 V(M, N, P) Your workload is this 1000 queries: 100000 queries: SELECT * FROM V WHERE N>? and N<? SELECT * FROM V WHERE P>? and P<? Which indexes CSE should 544 - Fall 2015 we create? 75

Index Selection Problem 4 V(M, N, P) Your workload is this 1000 queries: 100000 queries: SELECT * FROM V WHERE N>? and N<? SELECT * FROM V WHERE P>? and P<? A: V(N) secondary, CSE 544 - Fall V(P) 2015 primary index 76