CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE
15-415 DATABASE APPLICATIONS
C. Faloutsos
Indexing and Hashing
http://www.cs.cmu.edu/~christos/courses/dbms.s00/

general overview
- relational model
- SQL
- formal & commercial query languages
- functional dependencies
- normalization
- physical design
- indexing

overview - detailed
- ordered indices
  - primary / secondary indices
  - index-sequential
  - multilevel (ISAM)
- B-trees, B+-trees
- hashing
  - static hashing
  - dynamic hashing

motivation
- once the records are stored in a file, how do you search efficiently?
- brute force: retrieve all records, report the qualifying ones
- better: use indices (pointers) to locate the records directly

we need additional structures: an indexing structure
- what structures?
- how many indices / pointers?

record file:
123  smith    main st.
234  jones    forbes ave
300  stevens  main st.

what is a good indexing technique?
..depends on the database & the queries we want to answer
- range queries?
- retrieval time?
- insertion / deletion?
- space overhead?
- reorganization?

ordered indices
- search keys are sorted in the index file and point to the actual records
- primary vs. secondary indices

[figure: one index on the key (123, 234, 300) and one on the street (forbes ave, main st.), both pointing into the record file]
123  smith    main st.
234  jones    forbes ave
300  stevens  main st.

index-sequential files (primary indices)
- records are organized sequentially within the file (linked list), according to a chosen key
- index file on the same key

index file: forbes ave, main st.
record file:
forbes ave  jones    234
main st.    smith    123
main st.    stevens  300

dense vs. sparse index
dense: one index entry per record

index file: 123, 150, 234, 236, 300
record file:
123  smith    main st.
150  gates    walnut st.
234  jones    forbes ave
236  holmes   walnut st.
300  stevens  main st.

dense vs. sparse index
sparse: index entries for only some of the records

index file: 123, 234, 300
record file:
123  smith    main st.
150  gates    walnut st.
234  jones    forbes ave
236  holmes   walnut st.
300  stevens  main st.
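The difference between the two lookups can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record file is taken to be sorted on the key and grouped into blocks of two records, and the names `blocks`, `dense_index`, `sparse_keys`, `lookup_dense` and `lookup_sparse` are hypothetical, not from the slides.

```python
from bisect import bisect_right

# Record file sorted on the key, grouped into blocks of two records
# (the block size is an assumption made for this illustration).
blocks = [
    [(123, "smith", "main st."), (150, "gates", "walnut st.")],
    [(234, "jones", "forbes ave"), (236, "holmes", "walnut st.")],
    [(300, "stevens", "main st.")],
]

# Dense index: one entry per record, mapping key -> (block number, slot).
dense_index = {
    rec[0]: (b, s)
    for b, block in enumerate(blocks)
    for s, rec in enumerate(block)
}

# Sparse index: one entry per block, holding the first key of that block.
sparse_keys = [block[0][0] for block in blocks]


def lookup_dense(key):
    """Follow the index entry straight to the record, if there is one."""
    pos = dense_index.get(key)
    if pos is None:
        return None
    b, s = pos
    return blocks[b][s]


def lookup_sparse(key):
    """Find the last index entry <= key, then scan that one block."""
    b = bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    for rec in blocks[b]:
        if rec[0] == key:
            return rec
    return None
```

The sparse index is smaller (three entries instead of five) but only works because the record file is sorted on the same key; the dense index pays one entry per record and can answer "not present" without touching the record file.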

multilevel indices (ISAM)
- if the index is too large to fit in main memory, store it on disk and keep an index on the index (in memory)

[figure: in memory, the top-level index (123, 234); on disk, the index file blocks (123, 150, ..) and (234, 236, ..), pointing into the record file (..smith.., ..holmes..)]

multilevel indices (ISAM)
- usually two levels of indices, one first-level entry per disk block (why?)
- typically, blocks are 80% full initially (why? what are potential problems / inefficiencies?)

secondary indices
- the record file is already sorted on some other attribute

[figure: sec. index -> buckets -> record file]
sec. index: forbes ave, main st., walnut st.
record file:
123  smith    main st.
150  gates    walnut st.
234  jones    forbes ave
236  holmes   walnut st.
300  stevens  main st.

secondary indices
- only dense
- non-clustering index
- how to organize the sec. index?
- performance? search is very good, insertions / deletions are expensive

summary of ordered indices
- primary index: dense or sparse
- sec. index: dense only
- ..ordered indices suffer in the presence of frequent updates
- alternative indexing structure: B-trees

overview - detailed
- ordered indices
  - primary / secondary indices
  - index-sequential
  - multilevel (ISAM)
- B-trees, B+-trees
- hashing
  - static hashing
  - dynamic hashing

B-trees
- the most successful family of index schemes
- balanced n-way search trees
- a B-tree node: k_1  k_2  ...  k_{n-1}

B-trees, definition
each node, in a B-tree of order n:
- at most n pointers
- at least n/2 pointers (except root)
- all leaves at the same level
- if number of pointers is k, then node has exactly k-1 keys
- (leaves are empty)

B-trees, properties
- block-aware nodes
- O(log(N)) for everything!
- typically, if m = 50-100, then 2-3 levels
- utilization >= 50%, guaranteed; on average 69%
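The "2-3 levels" figure can be checked with a little arithmetic. The sketch below assumes that m is the order (maximum number of pointers per node), that only levels of key-holding nodes are counted, and that every non-root node has at least ceil(m/2) pointers while the root has at least 2. The function names `min_levels` and `max_levels` are hypothetical. A tree with h such levels holds at most m^h - 1 keys and at least 2 * ceil(m/2)^(h-1) - 1 keys.

```python
def min_levels(n_keys, m):
    """Fewest levels that can hold n_keys (n_keys >= 1) when every node is full."""
    levels = 1
    while m ** levels - 1 < n_keys:          # capacity of a full tree
        levels += 1
    return levels


def max_levels(n_keys, m):
    """Most levels possible for n_keys (n_keys >= 1) with minimally filled nodes."""
    d = (m + 1) // 2                         # ceil(m/2) pointers per non-root node
    levels = 1
    # A tree with (levels + 1) levels needs at least 2 * d**levels - 1 keys.
    while 2 * d ** levels - 1 <= n_keys:
        levels += 1
    return levels
```

Under these assumptions, 100,000 keys with m = 100 need exactly 3 levels whether the nodes are full or half empty, and 5,000 keys need between 2 and 3.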

B-trees, operations
- insertion
  - split: preserves the B-tree property
  - notice how it grows: the level increases when the root overflows
- deletion
  - may need to merge

insertion

INSERTION OF KEY K
find the correct leaf node L ;
if ( L overflows ){ /* i.e., L has no room for K */
    split L, by pushing the middle key upstairs to parent node P ;
    if ( P overflows ){
        repeat the split recursively ;
    }
} else {
    add the key K in node L ; /* maintaining the key order in L */
}
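The insertion routine above can be turned into a small runnable sketch. This is one possible reading, not the course's reference implementation: it assumes an order of 4 (at most 4 pointers, so at most 3 keys per node), stores keys in the bottom-level nodes (the "practical" variant rather than empty leaves), and adds the key first and splits afterwards. `ORDER`, `Node`, `_insert` and `insert` are hypothetical names.

```python
from bisect import bisect_left

ORDER = 4  # assumption: at most 4 pointers, hence at most 3 keys per node


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []    # empty list => bottom-level node


def _insert(node, key):
    """Insert key below node; return (middle_key, new_right_node) on a split."""
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                       # key already present, nothing to do
    if not node.children:                 # bottom level: add the key in order
        node.keys.insert(i, key)
    else:
        split = _insert(node.children[i], key)
        if split is None:
            return None
        mid, right = split                # child split: absorb the pushed-up key
        node.keys.insert(i, mid)
        node.children.insert(i + 1, right)
    if len(node.keys) < ORDER:            # no overflow
        return None
    m = len(node.keys) // 2               # overflow: push the middle key upstairs
    mid = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys = node.keys[:m]
    node.children = node.children[:m + 1]
    return mid, right


def insert(root, key):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    mid, right = split                    # root overflowed: the tree grows a level
    return Node([mid], [root, right])
```

Starting from `root = Node()` and inserting 1, 2, 3, 4 in that order, the fourth key overflows the single node; key 3 is pushed up into a new root with children [1, 2] and [4]. The tree only ever grows at the root, which is why all bottom-level nodes stay at the same depth.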

deletion (ouch!)

DELETION OF KEY K
locate key K, in node N ;
if ( N is a non-leaf node ){
    delete K from N ;
    find the immediately larger key K1 ; /* which is guaranteed to be on a leaf node L */
    copy K1 in the old position of K ;
    invoke this DELETION routine on K1 from the leaf node L ;
} else { /* N is a leaf node */
    ... (next slide..)
}

ouch! ouch!

/* N is a leaf node */
delete K from N ;
if ( N underflows ){
    let N1 be the sibling of N ;
    if ( N1 is "rich" ){ /* i.e., N1 can lend us a key */
        borrow a key from N1 THROUGH the parent node ;
    } else { /* N1 is 1 key away from underflowing */
        MERGE: pull the key from the parent P, and merge it with the keys of N and N1 into a new node ;
        if ( P underflows ){ repeat recursively }
    }
}

B-trees come in different flavors
- what about range queries, proximity searches?
- B+-trees facilitate sequential ops
- leaf nodes have all the keys; keys are replicated in non-leaf nodes

B+-trees, insertion

INSERTION OF KEY K
find the correct leaf node L ;
insert search-key value to L such that the keys are in order ;
if ( L overflows ){
    split L ;
    insert (i.e., COPY) smallest search-key value of new node to parent node P ;
    if ( P overflows ){
        repeat the B-tree split procedure recursively ;
        /* Notice: the B-TREE split; NOT the B+-tree */
    }
}
/* ATTENTION: a split at the leaf level is handled by COPYING the middle key upstairs;
              a split at a higher level is handled by PUSHING the middle key upstairs. */
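The COPY versus PUSH distinction is easiest to see on bare key lists. The sketch below is only an illustration; `split_leaf` and `split_internal` are hypothetical helpers that each return (left keys, key sent to the parent, right keys) for an overflowing node.

```python
def split_leaf(keys):
    """B+-tree leaf split: the separator is COPIED up and also stays in the leaf."""
    m = len(keys) // 2
    left, right = keys[:m], keys[m:]
    return left, right[0], right     # right[0] is the smallest key of the new node


def split_internal(keys):
    """Non-leaf split (plain B-tree split): the middle key is PUSHED up."""
    m = len(keys) // 2
    return keys[:m], keys[m], keys[m + 1:]
```

For the overflowing key list [10, 20, 30, 40], the leaf split keeps 30 in the right leaf and copies it to the parent, while the non-leaf split moves 30 to the parent and leaves only 40 on the right. Copying at the leaf level is what keeps every key present in the leaves, so sequential scans never need to visit the upper levels.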

still more flavors
- should leaves be empty? - practical B-trees
- how to increase the utilization of B-trees? ..with B*-trees!

B-trees, summary
- a great structure: block aware
- all B-trees can be used either as primary (= sparse, clustering) or secondary (= dense, non-clustering) index

overview - detailed
- ordered indices
  - primary / secondary indices
  - index-sequential
  - multilevel (ISAM)
- B-trees, B+-trees
- hashing
  - static hashing
  - dynamic hashing

hashing: the idea
- it would be nice to be able to map key values to record positions
- e.g. (123, smith) is stored in block number 123
- what is the problem with this mapping?

hash functions
- key value -> bucket (with pointer to records): k -> h(k)
- suppose we have M buckets
- this is a hash function, based on division: h(k) = k mod M

[figure: the M buckets]

hash functions
- another hash function, using multiplication: h(k) = floor( M * (k * φ mod 1) )
- good hash functions: uniformity
- good hash functions: randomness
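Both hash functions fit in a few lines of Python. In this sketch the value of φ is an assumption: the slides do not define it, and the code takes the commonly used golden-ratio constant (sqrt(5) - 1) / 2, roughly 0.618. The names `PHI`, `h_division` and `h_multiplication` are hypothetical.

```python
import math

PHI = (math.sqrt(5) - 1) / 2   # assumption: golden-ratio constant, about 0.618


def h_division(k, M):
    """Division method: bucket number is the remainder of k divided by M."""
    return k % M


def h_multiplication(k, M):
    """Multiplication method: scale the fractional part of k * PHI to 0..M-1."""
    fraction = (k * PHI) % 1.0  # "k * phi mod 1": keep only the fractional part
    return int(fraction * M)
```

For example, `h_division(123, 10)` gives 3. The division method is sensitive to the choice of M (a prime M is the usual advice), whereas the multiplication method tends to spread consecutive keys apart for any M.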

hashing: ups and downs
- speed!
- ..but at the cost of loss of key ordering
  - no range queries
  - no proximity queries
  - no sequential scan

hashing flavors
- fixed or variable number of buckets?
- how to handle overflows?
2 main hashing categories:
- static hashing
- dynamic hashing

static hashing
- the number of buckets, M, is fixed
- collision resolution?
  - open addressing
    - linear probing
    - double hashing
  - chaining
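Two of the collision-resolution strategies listed above, chaining and linear probing, can be sketched as follows. This is a minimal illustration assuming M = 5 buckets, the division hash h(k) = k mod M, integer keys, and no deletions; all function names are hypothetical.

```python
M = 5  # assumption: a small fixed number of buckets


def h(k):
    return k % M


# Chaining: each bucket is a list that grows as keys collide.
def chain_insert(table, k):
    table[h(k)].append(k)


def chain_contains(table, k):
    return k in table[h(k)]


# Open addressing with linear probing: one key per slot; on a collision,
# try the next slot (wrapping around) until a free one is found.
def probe_insert(slots, k):
    for step in range(M):
        i = (h(k) + step) % M
        if slots[i] is None or slots[i] == k:
            slots[i] = k
            return i
    raise OverflowError("hash table is full")


def probe_find(slots, k):
    for step in range(M):
        i = (h(k) + step) % M
        if slots[i] is None:      # an empty slot ends the probe sequence
            return None
        if slots[i] == k:
            return i
    return None
```

With M = 5, the keys 300 and 150 both hash to bucket 0. Chaining stores them in the same list, while linear probing places 150 in the next free slot. Because M never changes, chains keep growing (or the probing table fills up) as records are added, which is the overflow problem that motivates dynamic hashing.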

static hashing
- problem: overflow?
- problem: underflow? (underutilization)
- idea: shrink / expand hash table on demand..
- ..dynamic hashing

dynamic hashing
- many approaches; we examine extendible hashing
- hash each key to an infinite bit string, and use as many bits as necessary
- idea: directory that doubles on demand
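A compact sketch of a directory that doubles on demand is given below. It is an illustration under several assumptions, not the scheme as presented in the course: keys are non-negative integers used directly as their own bit strings, the low-order bits index the directory, each bucket holds at most 2 keys, and there are no deletions. `BUCKET_SIZE`, `Bucket` and `ExtendibleHash` are hypothetical names.

```python
BUCKET_SIZE = 2  # assumption: at most 2 keys per bucket


class Bucket:
    def __init__(self, depth):
        self.depth = depth    # local depth: how many bits this bucket distinguishes
        self.keys = []


class ExtendibleHash:
    def __init__(self):
        self.global_depth = 0             # bits used to index the directory
        self.directory = [Bucket(0)]

    def _slot(self, key):
        # Assumption: the key itself is the bit string; use its low-order bits.
        return key & ((1 << self.global_depth) - 1)

    def contains(self, key):
        return key in self.directory[self._slot(key)].keys

    def insert(self, key):
        while True:
            bucket = self.directory[self._slot(key)]
            if key in bucket.keys:
                return
            if len(bucket.keys) < BUCKET_SIZE:
                bucket.keys.append(key)
                return
            self._split(bucket)           # full: split, then retry

    def _split(self, bucket):
        if bucket.depth == self.global_depth:
            # Double the directory; both halves point to the same buckets.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.depth += 1
        sibling = Bucket(bucket.depth)
        bit = 1 << (bucket.depth - 1)     # the newly examined bit
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = sibling
        old, bucket.keys = bucket.keys, []
        for k in old:                     # redistribute using one more bit
            self.directory[self._slot(k)].keys.append(k)
```

Only the overflowing bucket is split; the directory doubles only when that bucket was already using every directory bit. Distinct integer keys always differ in some bit, so the retry loop in `insert` terminates. A real implementation would hash arbitrary keys to bit strings first and would need an overflow strategy for keys with identical hash values.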

discussion
- comparison
- multiple-key access?
- SQL statements
  - create index <index-name> on <relation-name> (<attribute-list>)
  - create unique index <index-name> on <relation-name> (<attribute-list>)
  - drop index <index-name>
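The three SQL statements can be tried out with Python's built-in `sqlite3` module. This is a sketch under assumptions: the table `customer`, its columns, and the index names `customer_street` and `customer_ssn` are made up for illustration, and SQLite is just one convenient engine (index syntax and options vary between systems).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (ssn INTEGER, name TEXT, street TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        (123, "smith", "main st."),
        (234, "jones", "forbes ave"),
        (300, "stevens", "main st."),
    ],
)

# create index <index-name> on <relation-name> (<attribute-list>)
conn.execute("CREATE INDEX customer_street ON customer (street)")

# create unique index ...: also rejects duplicate values of the indexed attribute
conn.execute("CREATE UNIQUE INDEX customer_ssn ON customer (ssn)")

# SQLite-specific: list the indices defined on the table (column 1 is the name)
index_names = sorted(row[1] for row in conn.execute("PRAGMA index_list('customer')"))

# drop index <index-name>
conn.execute("DROP INDEX customer_street")
remaining = [row[1] for row in conn.execute("PRAGMA index_list('customer')")]
```

After the unique index exists, inserting a second row with ssn 123 raises `sqlite3.IntegrityError`, so a unique index doubles as a constraint on the data.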