Chapter 13: Indexing. Chapter 13. ? value. Topics. Indexing & Hashing. value. Conventional indexes B-trees Hashing schemes (self-study) record

Similar documents
CS 245: Database System Principles

CS 525: Advanced Database Organization 04: Indexing

CSE 562 Database Systems

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

CS232A: Database System Principles INDEXING. Indexing. Indexing. Given condition on attribute find qualified records Attr = value

Topics to Learn. Important concepts. Tree-based index. Hash-based index

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

Material You Need to Know

amiri advanced databases '05

Some Practice Problems on Hardware, File Organization and Indexing

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Physical Level of Databases: B+-Trees

Find the block in which the tuple should be! If there is free space, insert it! Otherwise, must create overflow pages!

Data Organization B trees

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Intro to DB CHAPTER 12 INDEXING & HASHING

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

CSIT5300: Advanced Database Systems

Hashed-Based Indexing

Database Technology. Topic 7: Data Structures for Databases. Olaf Hartig.

Indexing. Announcements. Basics. CPS 116 Introduction to Database Systems

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing

COMP 430 Intro. to Database Systems. Indexing

Chapter 11: Indexing and Hashing

Indexing and Hashing

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

Chapter 11: Indexing and Hashing

Chapter 11: Indexing and Hashing

CSE 544 Principles of Database Management Systems. Magdalena Balazinska Winter 2009 Lecture 6 - Storage and Indexing

Storage hierarchy. Textbook: chapters 11, 12, and 13

Indexing: Overview & Hashing. CS 377: Database Systems

Kathleen Durant PhD Northeastern University CS Indexes

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

7RSLFV ,QGH[HV 7HUPVÃDQGÃ'LVWLQFWLRQV. Conventional Indexes B-Tree Indexes Hashing Indexes

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

Chapter 12: Indexing and Hashing

CMSC 424 Database design Lecture 13 Storage: Files. Mihai Pop

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

Chapter 12: Indexing and Hashing (Cnt(

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember

Chapter 11: Indexing and Hashing

Indexing Methods. Lecture 9. Storage Requirements of Databases

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

File Structures and Indexing

Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Database index structures

Chapter 18 Indexing Structures for Files. Indexes as Access Paths

Database Management and Tuning

Outline. Database Management and Tuning. What is an Index? Key of an Index. Index Tuning. Johann Gamper. Unit 4

Database System Concepts

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

Selection Queries. to answer a selection query (ssn=10) needs to traverse a full path.

Tree-Structured Indexes

Overview of Storage and Indexing

ACCESS METHODS: FILE ORGANIZATIONS, B+TREE

Advanced Database Systems

Background: disk access vs. main memory access (1/2)

Chapter 13: Query Processing

Physical Disk Structure. Physical Data Organization and Indexing. Pages and Blocks. Access Path. I/O Time to Access a Page. Disks.

Database Systems. File Organization-2. A.R. Hurson 323 CS Building

Physical Database Design: Outline

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

Review. Support for data retrieval at the physical level:

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

Tree-Structured Indexes

Database Applications (15-415)

(i) It is efficient technique for small and medium sized data file. (ii) Searching is comparatively fast and efficient.

CS 245 Midterm Exam Winter 2014

Copyright 2007 Ramez Elmasri and Shamkant B. Navathe. Slide 14-1

Chapter 17. Disk Storage, Basic File Structures, and Hashing. Records. Blocking

B-Tree. CS127 TAs. ** the best data structure ever

Chapter 17 Indexing Structures for Files and Physical Database Design

Problem. Indexing with B-trees. Indexing. Primary Key Indexing. B-trees: Example. B-trees. primary key indexing

More B-trees, Hash Tables, etc. CS157B Chris Pollett Feb 21, 2005.

System Structure Revisited

QUIZ: Buffer replacement policies

Review: Memory, Disks, & Files. File Organizations and Indexing. Today: File Storage. Alternative File Organizations. Cost Model for Analysis

Lecture 13. Lecture 13: B+ Tree

Chapter 12: Query Processing

CS 525: Advanced Database Organization 03: Disk Organization

Introduction to Indexing R-trees. Hong Kong University of Science and Technology

Chapter 5: Physical Database Design. Designing Physical Files

Chapter 12: Query Processing. Chapter 12: Query Processing

Goals for Today. CS 133: Databases. Example: Indexes. I/O Operation Cost. Reason about tradeoffs between clustered vs. unclustered tree indexes

Hash-Based Indexing 1

Index Tuning. Index. An index is a data structure that supports efficient access to data. Matching records. Condition on attribute value

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

Data and File Structures Chapter 11. Hashing

File Organization and Storage Structures

Chapter 18. Indexing Structures for Files. Chapter Outline. Indexes as Access Paths. Primary Indexes Clustering Indexes Secondary Indexes

Query Processing. Debapriyo Majumdar Indian Sta4s4cal Ins4tute Kolkata DBMS PGDBA 2016

CS-245 Database System Principles

CSE 444: Database Internals. Lectures 5-6 Indexing

Last Class Carnegie Mellon Univ. Dept. of Computer Science /615 - DB Applications

Tree-Structured Indexes

Chapter 1 Disk Storage, Basic File Structures, and Hashing.

Hashing file organization

Transcription:

Chapter 13: Indexing (Slides by Hector Garcia-Molina, http://wwwdb.stanford.edu/~hector/cs245/notes.htm) Chapter 13 1 Chapter 13 Indexing & Hashing value record? value Chapter 13 2 Topics Conventional indexes B-trees Hashing schemes (self-study) Chapter 13 3 1

Sequential File 0 Chapter 13 4 Dense Index 0 1 1 Sequential File 0 Chapter 13 5 Sparse Index 1 1 1 1 1 2 2 Sequential File 0 Chapter 13 6 2

Sparse 2nd level 1 2 3 4 4 5 1 1 1 1 1 2 2 Sequential File 0 Chapter 13 7 Question: Can we build a dense, 2nd level index for a dense index? Chapter 13 8 Notes on pointers: (1) Block pointer (sparse index) can be smaller than record pointer BP RP Chapter 13 9 3

Notes on pointers: (2) If file is contiguous, then we can omit pointers (i.e., compute them) Chapter 13 R1 K1 K2 K3 K4 R2 R3 R4 say: 24 B per block if we want K3 block: get it at offset (3-1)24 = 48 bytes Chapter 13 11 Sparse vs. Dense Tradeoff Sparse: Less index space per record can keep more of index in memory Dense: Can tell if any record exists without accessing file (Also: sparse better for insertions dense needed for secondary indexes) Chapter 13 12 4

Terms Index sequential file Search key ( primary key) Primary index (on Sequencing field) Secondary index Dense index (all Search Key values in) Sparse index Multi-level index Chapter 13 13 Next: Duplicate keys Deletion/Insertion Secondary indexes Chapter 13 14 Duplicate keys 45 Chapter 13 15 5

Duplicate keys Dense index, one way to implement? 45 Chapter 13 16 Duplicate keys Dense index, better way? 45 Chapter 13 17 Duplicate keys Sparse index, one way? careful if looking for or! 45 Chapter 13 18 6

Duplicate keys Sparse index, another way? place first new key from block should this be? 45 Chapter 13 19 Summary Duplicate values, primary index Index may point to first instance of each value only File Index a a a.. b Chapter 13 Deletion from sparse index 1 1 1 Chapter 13 21 7

Deletion from sparse index delete record 1 1 1 Chapter 13 22 Deletion from sparse index delete record 1 1 1 Chapter 13 23 Deletion from sparse index delete records & 1 1 1 Chapter 13 24 8

Deletion from dense index Chapter 13 25 Deletion from dense index delete record Chapter 13 26 Insertion, sparse index case Chapter 13 27 9

Insertion, sparse index case insert record 34 our lucky day! we have free space where we need it! 34 Chapter 13 28 Insertion, sparse index case insert record 15 Illustrated: Immediate reorganization Variation: insert new block (chained file) update index Chapter 13 29 15 Insertion, sparse index case insert record 25 25 overflow blocks (reorganize later...) Chapter 13

Insertion, dense index case Similar Often more expensive... Chapter 13 31 Secondary indexes Sequence field 0 Chapter 13 32 Secondary indexes Sparse index 0... does not make sense! 0 Sequence field Chapter 13 33 11

Secondary indexes Dense index Sequence field... sparse high level... 0 Chapter 13 34 With secondary indexes: Lowest level is dense Other levels are sparse Also: Pointers are record pointers (not block pointers; not computed) Chapter 13 35 Summary so far Conventional index Basic Ideas: sparse, dense, multi-level Duplicate Keys Deletion/Insertion Secondary indexes Chapter 13 36 12

Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance Chapter 13 37 Outline: Conventional indexes B-Trees NEXT Hashing schemes (self-study) Chapter 13 38 NEXT: Another type of index Give up on sequentiality of index Try to get balance Chapter 13 39 13

B+Tree Example n=3 Root 3 5 11 35 0 1 1 1 1 1 156 179 1 0 1 1 1 0 Chapter 13 Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81 k<95 95 Chapter 13 41 Sample leaf node: From non-leaf node 57 to next leaf in sequence To record with key 57 To record with key 81 To record with key 85 81 95 Chapter 13 42 14

Size of nodes: n+1 pointers (fixed) n keys Chapter 13 43 Don t want nodes to be too empty Use at least Non-leaf: Leaf: (n+1)/2 pointers (n+1)/2 pointers to data Chapter 13 44 n=3 Full node min. node Non-leaf 1 1 1 Leaf 3 5 11 35 counts even if null Chapter 13 45 15

B+tree rules: tree of order n (1) All leaves are at the same lowest level (balanced tree) (2) Pointers in leaves point to records, except for sequence pointer Chapter 13 46 (3) Number of pointers/keys for B+tree Max Max Min ptrs keys ptrs data Min keys Non-leaf (non-root) n+1 n (n+1)/2 (n+1)/2-1 Leaf (non-root) n+1 n (n+1)/2 (n+1)/2 Root n+1 n 1 1 Chapter 13 47 Insert into B+tree (a) simple case space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root Chapter 13 48 16

(a) Insert key = 32 n=3 3 5 11 31 32 0 Chapter 13 49 (a) Insert key = 7 n=3 3 5 3 5 11 7 31 7 0 Chapter 13 (c) Insert key = 1 n=3 1 156 179 1 179 1 0 1 1 1 1 0 1 Chapter 13 51 17

(d) New root, insert 45 n=3 new root 1 2 3 12 25 32 45 Chapter 13 52 Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf Chapter 13 53 (b) Coalesce with sibling Delete n=4 0 Chapter 13 54 18

(c) Redistribute keys Delete n=4 35 35 0 35 Chapter 13 55 (d) Non-leaf coalese Delete 37 n=4 new root 1 3 14 22 25 26 37 45 25 25 Chapter 13 56 B+tree deletions in practice Often, coalescing is not implemented Too hard and not worth it! Chapter 13 57 19

Variation on B+tree: B-tree (no +) Idea: Avoid duplicate keys Have record pointers in non-leaf nodes Chapter 13 58 K1 P1 K2 P2 K3 P3 to record to record to record with K1 with K2 with K3 to keys to keys to keys to keys < K1 K1<x<K2 K2<x<k3 >k3 Chapter 13 59 B-tree example n=2 sequence pointers not useful now! (but keep space for simplicity) 65 125 0 1 1 1 1 1 1 1 1 25 45 85 5 145 165 Chapter 13

Tradeoffs: B-trees have faster lookup than B+trees in B-tree, non-leaf & leaf different sizes in B-tree, deletion more complicated B+trees preferred! Chapter 13 61 But note: If blocks are fixed size (due to disk and buffering restrictions) Then lookup for B+tree is actually better!! Chapter 13 62 So... 8 records B+ B ooooooooooooo ooooooooo 156 records 8 records Total = 116 Conclusion: For fixed block size, B+ tree is better because it is bushier Chapter 13 63 21

Outline/summary Conventional Indexes Sparse vs. dense Primary vs. secondary B trees B+trees vs. B-trees B+trees vs. indexed sequential Hashing schemes (self-study) Chapter 13 64 22