Indexing and Hashing

Similar documents
Indexing and Hashing

2, 3, 5, 7, 11, 17, 19, 23, 29, 31

Chapter 12: Indexing and Hashing. Basic Concepts

Chapter 12: Indexing and Hashing

Chapter 11: Indexing and Hashing

Chapter 12: Indexing and Hashing

Database System Concepts, 5th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

Intro to DB CHAPTER 12 INDEXING & HASHING

CSIT5300: Advanced Database Systems

Chapter 12: Indexing and Hashing (Cnt(

Hashing file organization

Find the block in which the tuple should be! If there is free space, insert it! Otherwise, must create overflow pages!

(2,4) Trees Goodrich, Tamassia. (2,4) Trees 1

Data Structures and Algorithms

Material You Need to Know

Balanced Search Trees

Database System Concepts, 6 th Ed. Silberschatz, Korth and Sudarshan See for conditions on re-use

CSE 530A. B+ Trees. Washington University Fall 2013

Background: disk access vs. main memory access (1/2)

Instructor: Amol Deshpande

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

Chapter 11: Indexing and Hashing

Data Organization B trees

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

Chapter 11: Indexing and Hashing

Introduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana

Physical Level of Databases: B+-Trees

CS 310 B-trees, Page 1. Motives. Large-scale databases are stored in disks/hard drives.

Extra: B+ Trees. Motivations. Differences between BST and B+ 10/27/2017. CS1: Java Programming Colorado State University

CS F-11 B-Trees 1

Chapter 11: Indexing and Hashing

DATABASE PERFORMANCE AND INDEXES. CS121: Relational Databases Fall 2017 Lecture 11

Databases 2 Lecture IV. Alessandro Artale

Main Memory and the CPU Cache

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Chapter 11: Indexing and Hashing" Chapter 11: Indexing and Hashing"

B-Trees. Disk Storage. What is a multiway tree? What is a B-tree? Why B-trees? Insertion in a B-tree. Deletion in a B-tree

B-Trees & its Variants

Chapter 12: Query Processing

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

Database index structures

THE B+ TREE INDEX. CS 564- Spring ACKs: Jignesh Patel, AnHai Doan

Query Types Exact Match Partial Match Range Match

amiri advanced databases '05

Indexing: B + -Tree. CS 377: Database Systems

Indexing. Jan Chomicki University at Buffalo. Jan Chomicki () Indexing 1 / 25

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

Multi-Attribute Indexing

Multiway searching. In the worst case of searching a complete binary search tree, we can make log(n) page faults Everyone knows what a page fault is?

M-ary Search Tree. B-Trees. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. Maximum branching factor of M Complete tree has height =

Spring 2017 B-TREES (LOOSELY BASED ON THE COW BOOK: CH. 10) 1/29/17 CS 564: Database Management Systems, Jignesh M. Patel 1

CS350: Data Structures B-Trees

Tree-Structured Indexes

CS127: B-Trees. B-Trees

Advances in Data Management Principles of Database Systems - 2 A.Poulovassilis

Storage hierarchy. Textbook: chapters 11, 12, and 13

Access Methods. Basic Concepts. Index Evaluation Metrics. search key pointer. record. value. Value

M-ary Search Tree. B-Trees. Solution: B-Trees. B-Tree: Example. B-Tree Properties. B-Trees (4.7 in Weiss)

CS143: Index. Book Chapters: (4 th ) , (5 th ) , , 12.10

Topics to Learn. Important concepts. Tree-based index. Hash-based index

CARNEGIE MELLON UNIVERSITY DEPT. OF COMPUTER SCIENCE DATABASE APPLICATIONS

CS 525: Advanced Database Organization 04: Indexing

Laboratory Module X B TREES

Overview of Storage and Indexing

Chapter 13: Query Processing

CMSC424: Database Design. Instructor: Amol Deshpande

CS 350 : Data Structures B-Trees

Indexing. Week 14, Spring Edited by M. Naci Akkøk, , Contains slides from 8-9. April 2002 by Hector Garcia-Molina, Vera Goebel

Multidimensional Indexes [14]

Tree-Structured Indexes. Chapter 10

Chapter 12: Query Processing. Chapter 12: Query Processing

An AVL tree with N nodes is an excellent data. The Big-Oh analysis shows that most operations finish within O(log N) time

9/29/2016. Chapter 4 Trees. Introduction. Terminology. Terminology. Terminology. Terminology

Algorithms. Deleting from Red-Black Trees B-Trees

Introduction to Data Management. Lecture 15 (More About Indexing)

CS301 - Data Structures Glossary By

B-Trees and External Memory

! A relational algebra expression may have many equivalent. ! Cost is generally measured as total elapsed time for

Chapter 13: Query Processing Basic Steps in Query Processing

B-Trees. Introduction. Definitions

B-Tree. CS127 TAs. ** the best data structure ever

Selection Queries. to answer a selection query (ssn=10) needs to traverse a full path.

CSE 562 Database Systems

else // m + 1 = d + }{{} 1 + d

Lecture 8 Index (B+-Tree and Hash)

Tree-Structured Indexes

Copyright The McGraw-Hill Companies, Inc. Permission required for reproduction or display.

B-Trees and External Memory

The B-Tree. Yufei Tao. ITEE University of Queensland. INFS4205/7205, Uni of Queensland

Database System Concepts

Tree-Structured Indexes

Tree-Structured Indexes

CMSC424: Database Design. Instructor: Amol Deshpande

Remember. 376a. Database Design. Also. B + tree reminders. Algorithms for B + trees. Remember

Announcements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)

2-3 and Trees. COL 106 Shweta Agrawal, Amit Kumar, Dr. Ilyas Cicekli

Query Processing & Optimization

CS122 Lecture 11 Winter Term,

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Chapter 12: Query Processing

Transcription:

C H A P T E R 1 Indexing and Hashing Solutions to Practice Exercises 1.1 Reasons for not keeping several search indices include: a. Every index requires additional CPU time and disk I/O overhead during inserts and deletions. b. Indices on non-primary keys might have to be changed on updates, although an index on the primary key might not (this is because updates typically do not modify the primary key attributes). c. Each extra index requires additional storage space. d. For queries which involve conditions on several search keys, efficiency might not be bad even if only some of the keys have indices on them. Therefore database performance is improved less by adding indices when many indices already exist. 1. In general, it is not possible to have two primary indices on the same relation for different keys because the tuples in a relation would have to be stored in different order to have same values stored together. We could accomplish this by storing the relation twice and duplicating all values, but for a centralized system, this is not efficient. 53

54 Chapter 1 Indexing and Hashing 1.3 The following were generated by inserting values into the B + -tree in ascending order. A node (other than the root) was never allowed to have fewer than n/ values/pointers. a. 5 9 3 5 7 17 3 9 31 b. 7 3 5 7 17 3 9 31 c. 3 5 7 17 3 9 31 1.4 With structure 1.3.a: Insert 9: 5 9 9 3 5 7 17 3 9 31 Insert 10: 5 9 9 3 5 7 10 17 3 9 31

Exercises 55 Insert 8: 5 9 9 3 5 7 10 17 3 9 31 Delete 3: 5 9 3 5 7 8 9 10 17 9 31 Delete : 5 9 9 3 5 7 8 9 10 17 9 31 With structure 1.3.b: Insert 9: 7 3 5 7 9 17 3 9 31 Insert 10: 7 3 5 7 9 10 17 3 9 31

56 Chapter 1 Indexing and Hashing Insert 8: 7 10 3 5 7 8 9 10 17 9 3 9 31 Delete 3: 7 10 3 5 7 8 9 10 17 9 31 Delete : 7 10 3 5 With structure 1.3.c: Insert 9: 7 8 9 10 17 9 31 3 5 7 9 17 3 9 31 Insert 10: 3 5 7 9 10 17 3 9 31 Insert 8: 3 5 7 8 9 10 17 3 9 31 Delete 3: 3 5 7 8 9 10 17 9 31

Exercises 57 Delete : 3 5 7 8 9 10 17 9 31 1.5 If there are K search-key values and m 1 siblings are involved in the redistribution, the expected height of the tree is: log (m 1)n/m (K) 1.6 The algorithm for insertion into a B-tree is: Locate the leaf node into which the new key-pointer pair should be inserted. If there is space remaining in that leaf node, perform the insertion at the correct location, and the task is over. Otherwise insert the key-pointer pair conceptually into the correct location in the leaf node, and then split it along the middle. The middle key-pointer pair does not go into either of the resultant nodes of the split operation. Instead it is inserted into the parent node, along with the tree pointer to the new child. If there is no space in the parent, a similar procedure is repeated. The deletion algorithm is: Locate the key value to be deleted, in the B-tree. a. If it is found in a leaf node, delete the key-pointer pair, and the record from the file. If the leaf node contains less than n/ 1 entriesasaresult of this deletion, it is either merged with its siblings, or some entries are redistributed to it. Merging would imply a deletion, whereas redistribution would imply change(s) in the parent node s entries. The deletions may ripple upto the root of the B-tree. b. If the key value is found in an internal node of the B-tree, replace it and its record pointer by the smallest key value in the subtree immediately to its right and the corresponding record pointer. Delete the actual record in the database file. Then delete that smallest key value-pointer pair from the subtree. This deletion may cause further rippling deletions till the root of the B-tree. Below are the B-trees we will get after insertion of the given key values. We assume that leaf and non-leaf nodes hold the same number of search key values.

58 Chapter 1 Indexing and Hashing a. 5 17 9 3 7 3 31 b. 7 3 3 5 17 9 31 c. 3 5 7 17 3 9 31

Exercises 59 1.7 Extendable hash structure 17 000 001 010 0 100 101 0 1 3 3 3 3 5 9 7 3 31 1.8 a. Delete : From the answer to Exercise 1.7, change the third bucket to: 3 3 At this stage, it is possible to coalesce the second and third buckets. Then it is enough if the bucket address table has just four entries instead of eight. For the purpose of this answer, we do not do the coalescing. b. Delete 31: From the answer to 1.7, change the last bucket to: 7 3 c. Insert 1: From the answer to 1.7, change the first bucket to: 1 17 d. Insert 15: From the answer to 1.7, change the last bucket to:

60 Chapter 1 Indexing and Hashing 7 15 3 1.9 Let i denote the number of bits of the hash value used in the hash table. Let bsize denote the maximum capacity of each bucket. delete(value K l ) begin j =firsti high-order bits of h(k l ); delete value K l from bucket j; coalesce(bucket j); end coalesce(bucket j) begin i j = bits used in bucket j; k =anybucketwithfirst(i j 1)bitssameasthat of bucket j while the bit i j is reversed; i k = bits used in bucket k; if(i j i k ) return; /* buckets cannot be merged */ if(entries in j +entriesink>bsize) return; /* buckets cannot be merged */ move entries of bucket k into bucket j; decrease the value of i j by 1; make all the bucket-address-table entries, which pointed to bucket k, pointtoj; coalesce(bucket j); end Note that we can only merge two buckets at a time. The common hash prefix of the resultant bucket will have length one less than the two buckets merged. Hence we look at the buddy bucket of bucket j differingfromitonlyatthelast bit. If the common hash prefix of this bucket is not i j, then this implies that the buddy bucket has been further split and merge is not possible. When merge is successful, further merging may be possible, which is handled by a recursive call to coalesce at the end of the function. 1.10 If the hash table is currently using i bits of the hash value, then maintain a count of buckets for which the length of common hash prefix is exactly i. Consider a bucket j with length of common hash prefix i j.ifthebucketis being split, and i j is equal to i, then reset the count to 1. If the bucket is being

Exercises 61 split and i j is one less that i, then increase the count by 1. It the bucket if being coalesced, and i j is equal to i then decrease the count by 1. If the count becomes 0, then the bucket address table can be reduced in size at that point. However, note that if the bucket address table is not reduced at that point, then the count has no significance afterwards. If we want to postpone the reduction, we have to keep an array of counts, i.e. a count for each value of common hash prefix. The array has to be updated in a similar fashion. The bucket address table can be reduced if the i th entryofthearrayis0,wherei is the number of bits the table is using. Since bucket table reduction is an expensive operation, it is not always advisable to reduce the table. It should be reduced only when sufficient number of entries at the end of count array become 0. 1. We reproduce the account relation of Figure 1.5 below. Bitmaps for branch name A-17 Brighton 750 A-101 Downtown 500 A-1 10 Downtown 600 A-15 Mianus 700 A-10 Perryridge 400 A-01 Perryridge 900 A-18 Perryridge 700 A- Redwood 700 A-305 Round Hill 350 Brighton 1 0 0 0 0 0 0 0 0 Downtown 0 1 1 0 0 0 0 0 0 Mianus 0 0 0 1 0 0 0 0 0 Perryridge 0 0 0 0 1 1 1 0 0 Redwood 0 0 0 0 0 0 0 1 0 Round hill 0 0 0 0 0 0 0 0 1 Bitmaps for balance L 1 0 0 0 0 0 0 0 0 0 L 0 0 0 0 1 0 0 0 1 L 3 0 1 1 1 0 0 1 1 0 L 4 1 0 0 0 0 1 0 0 0 where, level L 1 is below 50, level L is from 50 to below 500, L 3 from 500 to below 750 and level L 4 is above 750.

6 Chapter 1 Indexing and Hashing To find all accounts in Downtown with a balance of 500 or more, we find the union of bitmaps for levels L 3 and L 4 and then intersect it with the bitmap for Downtown. Downtown 0 1 1 0 0 0 0 0 0 L 3 0 1 1 1 0 0 1 1 0 L 4 1 0 0 0 0 1 0 0 0 L 3 L 4 1 1 1 1 0 1 1 1 0 Downtown 0 1 1 0 0 0 0 0 0 Downtown (L 3 L 4 ) 0 1 1 0 0 0 0 0 0 Thus, the required tuples are A-101 and A-0. 1.1 No answer