Material You Need to Know


Review Quiz 2

Material You Need to Know
- Normalization
- Storage and Disk
- File Layout
- Indexing
- B-Trees and B+Trees
- Extendible Hashing
- Linear Hashing

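Decomposition
- Goals: lossless joins, dependency preservation, redundancy avoidance
- BCNF guarantees lossless joins and no redundancy caused by functional dependencies (but not dependency preservation!)
- A relation schema R with FD set F is in BCNF if, for every nontrivial X -> Y in F+, X -> R holds (that is, X is a superkey of R)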
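The BCNF condition can be checked mechanically with attribute closures. The following is a minimal sketch, not part of the original slides: the helper names `closure` and `bcnf_violations` and the example schema R(A, B, C) are assumptions made for illustration. It tests only the FDs given in F, which is enough to decide whether R itself is in BCNF, but not to check relations produced by a decomposition.

```python
def closure(attrs, fds):
    """Closure of the attribute set `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we also get rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FDs X -> Y that are nontrivial and whose X is not a superkey."""
    schema = set(schema)
    bad = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        is_superkey = schema <= closure(lhs, fds)
        if nontrivial and not is_superkey:
            bad.append((lhs, rhs))
    return bad


# Hypothetical example: R(A, B, C) with A -> B and B -> C.
fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
violations = bcnf_violations("ABC", fds)   # B -> C violates BCNF: B is not a superkey
```

Here A is a superkey (its closure is ABC), but the closure of B is only BC, so B -> C is reported as a violation.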

Storage and Disk
- Know the memory hierarchy (do not worry about having to do the calculation in the warm-up problems on the homework)
- What type of questions do you think might be asked for this section?

File Layout
- Differences between fixed record size and variable record size
- Understand the tradeoffs of all the different layouts discussed in the slides

Indexes
- Sparse index: one index entry (pointer) per group of records, such as a block
- Dense index: one index entry (pointer) for every search-key value in the table
- Explain the difference between each of the following:
  1. Primary versus secondary indexes.
  2. Dense versus sparse indexes.
  3. If you were about to create an index on a relation, what considerations would guide your choice with respect to each pair of properties listed above?
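As a rough illustration of the dense/sparse distinction, the sketch below builds both kinds of index over a small sorted file. The data, the `blocks` layout, and the function name `sparse_lookup` are hypothetical; the point is only that a sparse index has one entry per block and relies on the file being sorted on the search key.

```python
import bisect

# Hypothetical sorted file: each inner list is one block of search-key values.
blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]

# Dense index: one entry per search-key value -> (block number, slot in block).
dense = {key: (b, s) for b, blk in enumerate(blocks) for s, key in enumerate(blk)}

# Sparse index: one entry per block (its first key).
sparse = [blk[0] for blk in blocks]


def sparse_lookup(key):
    """Find the last block whose first key is <= key, then scan that block."""
    b = bisect.bisect_right(sparse, key) - 1
    if b < 0 or key not in blocks[b]:
        return None
    return b, blocks[b].index(key)
```

The sparse index is three entries instead of nine, at the cost of a scan inside the chosen block; it cannot be used on a file that is not ordered by the search key, which is why secondary indexes are dense.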

B-Trees
- Balanced n-way tree (every path from the root to a leaf has the same length)
- Can be used for a primary/secondary, clustering/non-clustering index
- O(log_B N) for all operations (search, insert, delete), where B is the number of pointers per node (branching factor) and N is the number of records
- 50% utilization at minimum: with at most n pointers per node, each non-leaf (other than the root) must have between ceil(n/2) and n children, while leaves must have at least ceil((n-1)/2) values
- Some overhead on each update, but far less than the overhead of file reorganization
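To make the occupancy and height claims concrete, here is a small sketch that evaluates the bounds for a given n and estimates how many levels are needed to reach N records. The function names are assumptions for illustration, and `levels_needed` is only a rough stand-in for log_B N (it ignores the root exception and the exact leaf layout).

```python
import math


def occupancy_bounds(n):
    """(min, max) occupancy for a B-tree whose nodes hold at most n pointers."""
    return {
        "children_per_nonleaf": (math.ceil(n / 2), n),
        "values_per_leaf": (math.ceil((n - 1) / 2), n - 1),
    }


def levels_needed(num_records, fanout):
    """Smallest h with fanout**h >= num_records, i.e. roughly log_fanout(N)."""
    h, reach = 0, 1
    while reach < num_records:
        reach *= fanout
        h += 1
    return h


full = levels_needed(1_000_000, 100)       # every node full (fanout n = 100)
half = levels_needed(1_000_000, 50)        # every node half full (fanout ceil(n/2))
```

With n = 100, a million records need about 3 levels when nodes are full and about 4 when every node is at the 50% minimum, which is why the guaranteed utilization keeps the tree shallow.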

B-Trees: Insert
- Insert in leaf, if room
- If overflow:
  - Split (create a new node)
  - Redistribute keys
  - Recursively push the middle key up (height increases when the root overflows and splits)

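B-Trees: Insertion with Overflow - Insert 3 (the figures imply at most 2 keys per node)
- Before: root [4 7]; leaves [1 2] [5 6] [8 9]
- Step 1: 3 belongs in leaf [1 2], which overflows to [1 2 3]. Split it and push the middle key 2 up: root [2 4 7]; leaves [1] [3] [5 6] [8 9]
- Step 2: the root [2 4 7] now overflows. Split it and push the middle key 4 up
- After: root [4]; internal nodes [2] [7]; leaves [1] [3] [5 6] [8 9]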
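The insert-with-overflow procedure can be sketched in a few lines. This is an illustrative implementation under assumptions, not code from the recitation: the `Node` class, `insert`, and `dump` are hypothetical names, a node is allowed to overflow temporarily before it is split, and duplicate keys and deletion are not handled. It uses at most 2 keys per node to match the "Insert 3" example.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys or [])
        self.children = list(children or [])   # empty list means this is a leaf


def _insert(node, key, max_keys):
    """Insert key below node; return (middle_key, new_right_node) if node split."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                       # leaf: place the key here
        node.keys.insert(i, key)
    else:                                       # internal: descend, absorb a child split
        split = _insert(node.children[i], key, max_keys)
        if split:
            mid, right = split
            node.keys.insert(i, mid)
            node.children.insert(i + 1, right)
    if len(node.keys) <= max_keys:
        return None
    m = len(node.keys) // 2                     # overflow: push the middle key up
    mid = node.keys[m]
    right = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys = node.keys[:m]
    node.children = node.children[:m + 1]
    return mid, right


def insert(root, key, max_keys=2):
    """Insert key and return the (possibly new) root."""
    split = _insert(root, key, max_keys)
    if split:                                   # root split: the tree grows one level
        mid, right = split
        return Node([mid], [root, right])
    return root


def dump(root):
    """Keys of every node, level by level."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out


# The example above: root [4 7] over leaves [1 2] [5 6] [8 9], then insert 3.
root = Node([4, 7], [Node([1, 2]), Node([5, 6]), Node([8, 9])])
root = insert(root, 3)
```

After the call, `dump(root)` gives root [4], internal nodes [2] and [7], and leaves [1] [3] [5 6] [8 9], matching the final figure.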

B-Trees: Delete
- Delete the key and promote if needed (promote the largest key from the left subtree or the smallest from the right subtree)
- If underflow:
  - Rich sibling: borrow a key through the parent
  - Poor sibling: recursively merge, by pulling a key down from the parent

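B-Trees: Deletion with Underflow & Rich Sibling - Delete 9
- Before: root [4]; internal nodes [2] [7]; leaves [1] [3] [5 6] [9]
- Step 1: deleting 9 empties its leaf (underflow). The left sibling [5 6] is rich, so the parent key 7 moves down into the empty leaf: leaves [1] [3] [5 6] [7]
- Step 2: the sibling's largest key 6 moves up to replace 7 in the parent
- After: root [4]; internal nodes [2] [6]; leaves [1] [3] [5] [7]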

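B-Trees: Deletion with Underflow & Poor Sibling - Delete 7
- Before: root [4]; internal nodes [2] [6]; leaves [1] [3] [5] [7]
- Step 1: deleting 7 empties its leaf (underflow), and the sibling [5] is poor: root [4]; internal nodes [2] [6]; leaves [1] [3] [5] [ ]
- Step 2: merge the empty leaf with its sibling, pulling 6 down from the parent: leaves [1] [3] [5 6]; the parent is now empty and underflows in turn
- Step 3: merge recursively, pulling 4 down from the root to join [2]
- After: root [2 4]; leaves [1] [3] [5 6] (the tree is one level shorter)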
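The borrow-or-merge decision for an underflowing node can be sketched as a single repair step applied at the parent. This is a simplified illustration with hypothetical names (`Node`, `fix_underflow`): it covers only the underflow repair, not the search for the key or the promotion step for keys deleted from internal nodes, and it assumes a minimum of 1 key per node as in the figures.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = list(keys or [])
        self.children = list(children or [])    # empty list means leaf


def fix_underflow(parent, i, min_keys=1):
    """Repair parent.children[i], which holds fewer than min_keys keys."""
    child = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None

    if left is not None and len(left.keys) > min_keys:      # rich left sibling
        child.keys.insert(0, parent.keys[i - 1])             # separator comes down
        parent.keys[i - 1] = left.keys.pop()                 # sibling's largest goes up
        if left.children:
            child.children.insert(0, left.children.pop())
        return "borrowed"
    if right is not None and len(right.keys) > min_keys:    # rich right sibling
        child.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)                   # sibling's smallest goes up
        if right.children:
            child.children.append(right.children.pop(0))
        return "borrowed"

    # Poor siblings: merge two adjacent children around the separator pulled from the parent.
    j = i - 1 if left is not None else i
    a, b = parent.children[j], parent.children[j + 1]
    a.keys += [parent.keys.pop(j)] + b.keys
    a.children += b.children
    del parent.children[j + 1]
    return "merged"                                          # caller must re-check the parent


# Tree from the examples: root [4]; internal [2] [7]; leaves [1] [3] [5 6] [9].
sub = Node([7], [Node([5, 6]), Node([9])])
root = Node([4], [Node([2], [Node([1]), Node([3])]), sub])

sub.children[1].keys.remove(9)          # delete 9: the leaf underflows
step1 = fix_underflow(sub, 1)           # rich sibling [5 6]: borrow through the parent

sub.children[1].keys.remove(7)          # delete 7: the leaf underflows again
step2 = fix_underflow(sub, 1)           # poor sibling [5]: merge, pulling 6 down
step3 = fix_underflow(root, 1)          # the parent is now empty: merge one level up
if not root.keys:                       # an empty root means the tree loses a level
    root = root.children[0]
```

Running both deletions in sequence ends with root [2 4] over leaves [1] [3] [5 6], the same state as the last figure.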

B+Trees
- All keys appear at the leaf level; keys are replicated at non-leaf levels
- To look up a key, you must traverse down to the leaf level
- Each leaf has a pointer to the next leaf, which facilitates sequential operations

B+Trees: Insert
- Insert the key into a leaf
- If overflow: split the leaf and copy the smallest value of the new node to the parent (copy key up)
- If further overflow: do a recursive B-tree split (push key up)

* adapted from last year's recitation

B+Trees: Insertion with Overflow - Insert N (the figures imply at most 3 keys per node)
- Before: root [O]; internal nodes [C F J] [S]; leaves [A B] [D E] [G H] [K L M] [P Q] [T U]
- Step 1: N belongs in leaf [K L M], which overflows. Split it into [K L] and [M N]
- Step 2: copy the smallest key of the new leaf, M, up to the parent: internal nodes [C F J M] [S]
- Step 3: the internal node [C F J M] overflows. Split it into [C F] and [M], and push the middle key J up to the root
- After: root [J O]; internal nodes [C F] [M] [S]; leaves [A B] [D E] [G H] [K L] [M N] [P Q] [T U]
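The difference between "copy up" at a leaf and "push up" at an internal node shows clearly in code. The sketch below is an assumption-laden illustration rather than the recitation's implementation: `Node`, `insert`, and `dump` are hypothetical names, keys equal to a separator are routed to the right child, and the leaf-to-leaf next pointers are omitted for brevity.

```python
import bisect


class Node:
    def __init__(self, keys, children=None):
        self.keys = list(keys)
        self.children = list(children or [])    # empty list means leaf


def _insert(node, key, max_keys):
    """Insert key below node; return (separator, new_right_node) if node split."""
    if not node.children:                        # leaf
        bisect.insort(node.keys, key)
        if len(node.keys) <= max_keys:
            return None
        m = len(node.keys) // 2
        right = Node(node.keys[m:])              # the new leaf keeps its smallest key
        node.keys = node.keys[:m]
        return right.keys[0], right              # copy up: key stays in the leaf too
    i = bisect.bisect_right(node.keys, key)      # keys >= separator go right
    split = _insert(node.children[i], key, max_keys)
    if split is None:
        return None
    sep, right = split
    node.keys.insert(i, sep)
    node.children.insert(i + 1, right)
    if len(node.keys) <= max_keys:
        return None
    m = len(node.keys) // 2
    sep = node.keys[m]
    new = Node(node.keys[m + 1:], node.children[m + 1:])
    node.keys = node.keys[:m]
    node.children = node.children[:m + 1]
    return sep, new                              # push up: key leaves this level


def insert(root, key, max_keys=3):
    split = _insert(root, key, max_keys)
    if split:                                    # root split: the tree grows one level
        sep, right = split
        return Node([sep], [root, right])
    return root


def dump(root):
    """Keys of every node, level by level."""
    out, level = [], [root]
    while level:
        out.append(["".join(n.keys) for n in level])
        level = [c for n in level for c in n.children]
    return out


# The example above: insert N into the tree rooted at [O].
leaves = [Node(list(s)) for s in ("AB", "DE", "GH", "KLM", "PQ", "TU")]
root = Node(["O"], [Node(["C", "F", "J"], leaves[:4]), Node(["S"], leaves[4:])])
root = insert(root, "N")
```

After the insert, M appears both in the internal node [M] and in the leaf [M N] (copied up), while J appears only in the root (pushed up).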

Hashing
- Best for equality selections; cannot be used for range searches
- Static hashing: the number of bucket pages is fixed, and overflow pages are used when buckets fill up
- Dynamic hashing: extendible and linear

Extendible Hashing
- Allocate to buckets by the least significant bits of the hash value
- The directory grows by doubling when buckets overflow, and shrinks when enough values are removed
- Global depth: maximum number of bits needed to tell which bucket a value hashes to
- Local depth: number of bits needed to tell if the value hashes to a particular bucket

Extendible Hashing: Insert
- If the bucket overflows, increment its local depth:
  - If local depth > global depth: double the directory and redistribute the records
  - If local depth <= global depth: allocate a new page with the same local depth, redistribute the records, and add the page to the directory

Extendible Hashing: Delete
- If a deletion makes a bucket empty, merge it with its split image
- The directory can be halved when each directory element points to the same bucket as its split image

Extendible Hashing
Given bucket size = 4 and the following catalog, hash the following values in order: 41, 38, 45, 40

catalog   bucket
00        4 16 28 32
01        9 13 37
10        2 10 22 30
11        7 23 27

After inserting 41 (global depth: 2)

catalog   bucket         local depth
00        4 16 28 32     2
01        9 13 37 41     2
10        2 10 22 30     2
11        7 23 27        2

After inserting 38 (global depth: 3)

catalog    bucket         local depth
000, 100   4 16 28 32     2
001, 101   9 13 37 41     2
010        2 10           3
011, 111   7 23 27        2
110        22 30 38       3

After inserting 45 (global depth: 3)

catalog    bucket         local depth
000, 100   4 16 28 32     2
001        9 41           3
010        2 10           3
011, 111   7 23 27        2
101        13 37 45       3
110        22 30 38       3

After inserting 40 (global depth: 3)

catalog    bucket         local depth
000        16 32 40       3
001        9 41           3
010        2 10           3
011, 111   7 23 27        2
100        4 28           3
101        13 37 45       3
110        22 30 38       3
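The worked example can be replayed with a short simulation. This is a sketch under stated assumptions: the class names `Bucket` and `ExtendibleHash` are hypothetical, the hash function is the identity (as the example's bucket contents suggest), deletion and directory halving are not implemented, and a bucket full of identical hash values would split forever.

```python
class Bucket:
    def __init__(self, depth, items=()):
        self.depth = depth                # local depth
        self.items = list(items)


class ExtendibleHash:
    def __init__(self, global_depth, buckets, capacity=4):
        self.global_depth = global_depth
        self.directory = list(buckets)    # index = low `global_depth` bits of the key
        self.capacity = capacity

    def insert(self, key):
        while True:
            b = self.directory[key & ((1 << self.global_depth) - 1)]
            if len(b.items) < self.capacity:
                b.items.append(key)
                return
            self._split(b)                # make room, then retry

    def _split(self, b):
        if b.depth == self.global_depth:  # new local depth would exceed global depth
            self.directory = self.directory + self.directory   # double the directory
            self.global_depth += 1
        bit = 1 << b.depth                # the extra bit that now distinguishes keys
        b.depth += 1
        image = Bucket(b.depth, [k for k in b.items if k & bit])
        b.items = [k for k in b.items if not k & bit]
        for i, slot in enumerate(self.directory):
            if slot is b and i & bit:     # repoint half of b's directory entries
                self.directory[i] = image


# The example: four buckets of size 4 at global depth 2, then insert 41, 38, 45, 40.
table = ExtendibleHash(2, [
    Bucket(2, [4, 16, 28, 32]),
    Bucket(2, [9, 13, 37]),
    Bucket(2, [2, 10, 22, 30]),
    Bucket(2, [7, 23, 27]),
])
for key in (41, 38, 45, 40):
    table.insert(key)
```

Only the insertion of 38 doubles the directory; 45 and 40 split buckets whose local depth (2) is still below the global depth (3), so they only allocate a new page and repoint one directory entry.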

Linear Hashing
- Uses temporary overflow pages and splits buckets in a round-robin fashion
- The algorithm proceeds in rounds (levels); the next pointer determines the next bucket to be split
- A round is completed when all buckets that were there at the start of the round have been split
- Space utilization is better than extendible hashing, since splits are not concentrated on data-dense areas

Linear Hashing: Insertion
- If the bucket is full:
  - Add an overflow page
  - Split the bucket that next points to
  - If next was pointing at the last bucket that existed at the beginning of the round, reset it to zero and increment the round number; otherwise increment next

Linear Hashing
Given bucket size = 4 and the following catalog, hash the following values in order: 41, 38, 45, 40

catalog   bucket
00        4 16 28 32
01        9 13 37
10        2 10 22 30
11        7 23 27

After inserting 41

h0   bucket         overflow
00   4 16 28 32                 <- next
01   9 13 37 41
10   2 10 22 30
11   7 23 27

After inserting 38

h1    h0   bucket         overflow
000   00   16 32
001   01   9 13 37 41                <- next
010   10   2 10 22 30     38
011   11   7 23 27
100   00   4 28

After inserting 45

h1    h0   bucket         overflow
000   00   16 32
001   01   9 41
010   10   2 10 22 30     38         <- next
011   11   7 23 27
100   00   4 28
101   01   13 37 45

After inserting 40

h1    h0   bucket         overflow
000   00   16 32 40
001   01   9 41
010   10   2 10 22 30     38         <- next
011   11   7 23 27
100   00   4 28
101   01   13 37 45
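The same four insertions can be replayed for linear hashing. The sketch below makes several assumptions: the class name `LinearHash` is hypothetical, the hash functions are h_i(key) = key mod (N * 2^i) with N = 4 initial buckets, a bucket's overflow page is modeled simply as list entries beyond the capacity, and a split is triggered whenever an insert lands in a bucket that is already full.

```python
class LinearHash:
    def __init__(self, buckets, capacity=4):
        self.n0 = len(buckets)            # buckets at the start of round 0
        self.level = 0                    # current round number
        self.next = 0                     # next bucket to split
        self.buckets = [list(b) for b in buckets]
        self.capacity = capacity

    def _addr(self, key):
        n = self.n0 << self.level         # buckets at the start of this round
        a = key % n                       # h_level
        if a < self.next:                 # that bucket was already split this round
            a = key % (2 * n)             # so use h_(level+1)
        return a

    def insert(self, key):
        b = self.buckets[self._addr(key)]
        was_full = len(b) >= self.capacity
        b.append(key)                     # entries past capacity sit on an overflow page
        if was_full:
            self._split()

    def _split(self):
        n = self.n0 << self.level
        old = self.buckets[self.next]
        # The split image always lands at index n + next, the end of the bucket list.
        self.buckets.append([k for k in old if k % (2 * n) != self.next])
        self.buckets[self.next] = [k for k in old if k % (2 * n) == self.next]
        self.next += 1
        if self.next == n:                # every original bucket has been split
            self.level += 1
            self.next = 0


# The example: four buckets of size 4, then insert 41, 38, 45, 40.
table = LinearHash([[4, 16, 28, 32], [9, 13, 37], [2, 10, 22, 30], [7, 23, 27]])
for key in (41, 38, 45, 40):
    table.insert(key)
```

Note that inserting 38 overflows bucket 10 but splits bucket 00, the one next points to; bucket 10 keeps its overflow page until next reaches it. That is the round-robin behavior the slides describe, in contrast to extendible hashing, which always splits the bucket that overflowed.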