In-Memory Searching. Linear Search. Binary Search. Binary Search Tree. k-d Tree. Hashing. Hash Collisions. Collision Strategies.
- Lydia Cobb
- 6 years ago
1 In-Memory Searching (Chapter 4)
- Linear Search
- Binary Search
- Binary Search Tree
- k-d Tree
- Hashing
- Hash Collisions
- Collision Strategies
2 Searching
- Searching is a second fundamental operation in Computer Science
- We review O(n) linear search and O(lg n) binary search
- We next discuss more sophisticated approaches
- Two techniques form the basis for searching very large datasets on disk: trees and hashing
3 Linear Search
- One of the simplest search algorithms
- Take a collection of n records and scan from start to end, looking for a record with primary key k
- Best case: the target is the first record, O(1)
- Worst case: the target is the last record or is not in the collection, O(n)
- Average case: we search ~n/2 records to find the target, which is also O(n)
- Linear search serves two purposes:
  - It is simple to implement, so it is used when n is small or searches are rare
  - It represents a hard upper bound on acceptable search performance
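A minimal Python sketch of the scan described above, assuming each record is a dict with a hypothetical `"key"` field holding its primary key. It returns the record's position, or `None` when the key is absent.

```python
from typing import Any, Optional, Sequence


def linear_search(records: Sequence[dict], k: Any) -> Optional[int]:
    """Return the position of the first record whose primary key is k, else None."""
    for i, rec in enumerate(records):  # scan from start to end
        if rec["key"] == k:
            return i                   # found after i + 1 comparisons
    return None                        # all n records examined: not present


records = [{"key": 42, "name": "Doc"}, {"key": 7, "name": "Happy"}, {"key": 19, "name": "Sleepy"}]
print(linear_search(records, 19))  # 2
print(linear_search(records, 99))  # None
```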
4 Binary Search
- If a collection is sorted, we can perform binary search
- The first comparison discards n/2 records from further consideration
- The second comparison discards an additional n/4 records
- This continues until the record is found or no data is left to search
- An algorithm that repeatedly splits a collection in half runs in O(lg n)
- However, building a sorted collection requires O(n lg n), so binary search carries an O(n lg n) maintenance cost
5 Recursive Binary Search

    Algorithm binary_search(k, A, lf, rt)
    Input: k, target key; A, sorted array to search; lf, start of search range; rt, end of search range
    n = rt - lf + 1
    if n <= 0 then
        return -1                                   // Searching empty range
    end
    c = lf + floor(n / 2)                           // Center of search region
    if k == A[c] then
        return c                                    // Target record found
    else if k < A[c] then
        return binary_search( k, A, lf, c - 1 )     // Search left half of search region
    else
        return binary_search( k, A, c + 1, rt )     // Search right half of search region
    end
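The recursive pseudocode translates almost line for line into Python. This sketch assumes `A` is a Python list sorted in ascending order and keeps the convention of returning -1 for a failed search.

```python
from typing import Sequence


def binary_search(k, A: Sequence, lf: int, rt: int) -> int:
    """Recursive binary search of sorted A[lf..rt] (inclusive); index of k or -1."""
    n = rt - lf + 1
    if n <= 0:
        return -1                                # searching empty range
    c = lf + n // 2                              # center of search region
    if k == A[c]:
        return c                                 # target record found
    elif k < A[c]:
        return binary_search(k, A, lf, c - 1)    # search left half
    else:
        return binary_search(k, A, c + 1, rt)    # search right half


A = [3, 8, 15, 23, 42, 57, 91]
print(binary_search(42, A, 0, len(A) - 1))  # 4
print(binary_search(10, A, 0, len(A) - 1))  # -1
```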
6 Binary Search Tree
- We must choose a data structure to implement binary search
- Sorted array: O(lg n) search performance
  - Adding an element: O(lg n) to find its position, O(n) to make space
  - Similarly, deleting an element requires O(n) to fill the hole
- A common alternative is a binary search tree (BST)
  - A tree in which each node holds a primary key and references to (up to) two child nodes
  - All keys in a node's left subtree are smaller than the node's key
  - All keys in a node's right subtree are larger than the node's key
7 BST Search

    Algorithm bst_search(k, node)
    Input: k, target key; node, node in BST to begin search
    if node == null then
        return null                           // Searching empty tree
    end
    if k == node.key then
        return node                           // Target record found
    else if k < node.key then
        return bst_search( k, node.left )     // Search left subtree
    else
        return bst_search( k, node.right )    // Search right subtree
    end
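A Python rendering of `bst_search`, assuming a hypothetical `Node` dataclass with `key`, `record`, `left`, and `right` fields, with `None` standing in for null.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    key: int
    record: Any = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def bst_search(k, node: Optional[Node]) -> Optional[Node]:
    if node is None:
        return None                          # searching empty tree
    if k == node.key:
        return node                          # target record found
    elif k < node.key:
        return bst_search(k, node.left)      # search left subtree
    else:
        return bst_search(k, node.right)     # search right subtree


# Hand-built example tree:   20
#                           /  \
#                         10    30
root = Node(20, "rec20", Node(10, "rec10"), Node(30, "rec30"))
print(bst_search(30, root).record)  # rec30
print(bst_search(25, root))         # None
```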
8 BST Operations
- Insertion: to insert a record with key k, search for k in the BST
  - If k is found, the record is a duplicate; replace the node's record with the new record
  - If an empty subtree is found, insert a new node holding k's record there
- Deletion: to delete the record with key k, search for k in the BST
  1. If k is not found, the deletion fails
  2. If k's node has no children, remove k's node and stop
  3. If k's node has one subtree, promote that subtree's root into k's node's position
  4. If k's node has two subtrees:
     a. Find the successor to k's node (the smallest key in its right subtree) by walking right once, then walking left as far as possible
     b. Remove the successor (since its left subtree is empty, this must match case 2 or 3 above)
     c. Promote the successor into k's node's position
9 BST Deletion Cases
[Figure: deletion with no children; deletion with one child; deletion with two children]
10 BST Performance
- Search, add, and delete all require an initial search
- Add and delete perform a search followed by a constant-time operation, so they are dominated by search performance
- If the BST is roughly balanced (left subtree height ≈ right subtree height throughout the tree), each comparison discards about half of the remaining nodes, so search runs in O(lg n)
- If the BST is unbalanced (many empty subtrees at internal levels), search degenerates into linear search, O(n)
- Inserting nearly sorted data into a BST produces an unbalanced tree
- Balance can be maintained with self-balancing trees (e.g., AVL trees)
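To see the degenerate case concretely, this sketch (assuming distinct integer keys and a hypothetical minimal `Node` class) builds one BST from sorted keys and one from the same keys in shuffled order, then compares their heights. Insertion and height are written iteratively so the 1000-level degenerate tree does not hit Python's recursion limit.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], k: int) -> Node:
    """Iterative BST insert for distinct keys; returns the root."""
    new = Node(k)
    if root is None:
        return new
    cur = root
    while True:
        if k < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right


def height(root: Optional[Node]) -> int:
    """Number of levels, counted breadth-first to avoid deep recursion."""
    h = 0
    level = [] if root is None else [root]
    while level:
        h += 1
        level = [c for node in level for c in (node.left, node.right) if c is not None]
    return h


keys = list(range(1000))
sorted_root = None
for k in keys:                 # sorted input: every key goes right of the previous one
    sorted_root = insert(sorted_root, k)

random.seed(1)
random.shuffle(keys)
random_root = None
for k in keys:                 # same keys, shuffled order
    random_root = insert(random_root, k)

print(height(sorted_root))     # 1000: one node per level, like a linked list
print(height(random_root))     # far smaller than 1000
```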
11 k-d Tree
- A k-dimensional (k-d) tree is a binary tree that subdivides a collection into ranges for k attributes
- Proposed by Jon Louis Bentley in 1975
- Designed to support associative, or multiattribute, searches
  - E.g., consider a collection of weather reports with temperature (temp) and precipitation (precip)
  - A k-d tree can efficiently support queries like "Return all records with temp < 0 and precip > 4cm"
- A k-d tree's structure is similar to a BST's, but we rotate through the k dimensions at each tree level
12 k-d Tree Index
- The structure is similar to a BST, but we rotate through the k dimensions at each tree level
  - E.g., a 2-d temp + precip tree subdivides by temp on the root level, by precip on the second level, by temp again on the third level, and so on
- Each k-d node contains a key k_c and left and right subtrees
- Unlike a BST, records are not stored in internal nodes
- A target key k_t determines which subtree to enter: if k_t ≤ k_c, enter the left subtree; if k_t > k_c, enter the right subtree
- Leaf nodes contain collections of records that satisfy the conditions along the root-to-leaf path
13 k-d Tree Example
- Suppose we want to use a k-d tree to subdivide the dwarves by height (ht) and weight (wt)
- Snow White and the seven dwarves define the initial tree structure
[Table: Name, ht, wt for Sleepy, Happy, Doc, Dopey, Grumpy, Sneezy, Bashful, Ms. White]
14 k-d Tree Index Construction
- Construction is identical to a BST's, but we rotate between the k = 2 dimensions at each level of the tree
  1. Sleepy is inserted into the root of the tree, which uses ht as its subdivision attribute
  2. Happy and Doc are inserted as children of Sleepy
     a. Happy's ht = 34 ≤ 36, so he is inserted left of the root
     b. Doc's ht = 38 > 36, so he is inserted right of the root
     c. Both Happy and Doc use wt as their subdivision attribute (second level)
  3. Dopey is inserted
     a. Dopey's ht = 37 > 36 puts him right of the root
     b. Dopey's wt = 54 > 51 (Doc's wt) puts him right of Doc
     c. Dopey rotates to use ht as his subdivision attribute (third level)
  4. The remaining dwarves and Snow White are inserted using the identical approach
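A sketch of the construction steps in Python, assuming each index record is a dict keyed by attribute name and that ties (k_t = k_c) go left, as on the k-d Tree Index slide. Doc's and Dopey's values come from the steps above; only Sleepy's and Happy's heights are given there, so their weights in this sketch are placeholders.

```python
from dataclasses import dataclass
from typing import Optional

DIMS = ("ht", "wt")  # subdivision attribute cycles by level: ht, wt, ht, ...


@dataclass
class KDNode:
    name: str
    point: dict  # attribute name -> value
    dim: str     # attribute this node subdivides on
    left: Optional["KDNode"] = None
    right: Optional["KDNode"] = None


def kd_insert(node, name, point, depth=0):
    """BST-style insert that compares on the attribute assigned to each level."""
    if node is None:
        return KDNode(name, point, DIMS[depth % len(DIMS)])
    if point[node.dim] <= node.point[node.dim]:   # k_t <= k_c: go left
        node.left = kd_insert(node.left, name, point, depth + 1)
    else:                                         # k_t > k_c: go right
        node.right = kd_insert(node.right, name, point, depth + 1)
    return node


# Sleepy's and Happy's weights (45) are placeholders, not values from the slides.
root = None
for name, ht, wt in [("Sleepy", 36, 45), ("Happy", 34, 45), ("Doc", 38, 51), ("Dopey", 37, 54)]:
    root = kd_insert(root, name, {"ht": ht, "wt": wt})

print(root.name, root.left.name, root.right.name)   # Sleepy Happy Doc
print(root.right.right.name, root.right.right.dim)  # Dopey ht
```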
15 Snow White k-d Tree
[Figure: first three insertions (ht subdivides the root, wt subdivides the second level); fourth insertion (ht subdivides the third level); remaining insertions (leaves hold individual records)]
16 k-d Tree Record Management
- The k-d index is used to locate records based on ht and wt
- Buckets are placed at each (null) leaf, designed to hold records as they are inserted into the k-d tree
- It is critical to choose index records that represent the distribution of attribute values within the data collection
  - A poor choice of index records will produce a poor distribution of records across buckets
- Index records are normally the first records inserted into the k-d tree
17 Spatial Interpretation
- A k-d index subdivides the k-dimensional space of all possible records into subspaces
- It does so using (k − 1)-dimensional cutting planes, one for each entry in the index
- Consider visualizing the ht × wt index: since k = 2, we subdivide the 2D plane using 1D lines
- Each region represents a bucket in the index
18 Snow White k-d Tree Index
[Figure: Snow White k-d tree index]
19 k-d Tree Search
1. Identify all paths whose internal nodes satisfy the target attribute ranges; this may produce multiple paths
2. Perform an in-memory search of each path's bucket for records that match the target criteria
3. Return the records within those buckets that satisfy the search criteria
- We want to control the size of buckets
  - Re-indexing a collection is expensive
  - The size of the index dictates the maximum number of buckets, and therefore the average expected bucket size
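The three search steps can be sketched as a recursive range query. This is an illustrative Python version under several assumptions: internal nodes hold only a split value (index records are never stored as data), each null leaf is a Python list acting as a bucket, index records are added before any data records, and a query is a pair of inclusive `lo`/`hi` bounds per attribute. All attribute values and record names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Union

DIMS = ("ht", "wt")  # k = 2 attributes, cycled one per tree level
INF = float("inf")


@dataclass
class Index:
    dim: str      # attribute this node subdivides on
    value: float  # split key k_c
    left: Union["Index", list] = field(default_factory=list)   # a list is a leaf bucket
    right: Union["Index", list] = field(default_factory=list)


def add_index(node, rec, depth=0):
    """Add an index record as a split point (assumes it lands in an empty bucket)."""
    if isinstance(node, list):
        dim = DIMS[depth % len(DIMS)]
        return Index(dim, rec[dim])
    if rec[node.dim] <= node.value:  # k_t <= k_c: left subtree
        node.left = add_index(node.left, rec, depth + 1)
    else:                            # k_t > k_c: right subtree
        node.right = add_index(node.right, rec, depth + 1)
    return node


def add_record(node, rec):
    """Walk root-to-leaf and append the data record to the bucket reached."""
    while isinstance(node, Index):
        node = node.left if rec[node.dim] <= node.value else node.right
    node.append(rec)


def range_search(node, lo, hi):
    """Follow every path whose ranges can overlap [lo, hi], then scan those buckets."""
    if isinstance(node, list):       # bucket: in-memory scan (steps 2 and 3)
        return [r for r in node if all(lo[d] <= r[d] <= hi[d] for d in DIMS)]
    found = []
    if lo[node.dim] <= node.value:   # left subtree holds values <= k_c
        found += range_search(node.left, lo, hi)
    if hi[node.dim] > node.value:    # right subtree holds values > k_c
        found += range_search(node.right, lo, hi)
    return found


root = []  # an empty tree is a single bucket
for rec in ({"ht": 36, "wt": 50}, {"ht": 34, "wt": 49}, {"ht": 38, "wt": 51}):
    root = add_index(root, rec)
for rec in ({"name": "a", "ht": 35, "wt": 46}, {"name": "b", "ht": 33, "wt": 52},
            {"name": "c", "ht": 37, "wt": 47}, {"name": "d", "ht": 36, "wt": 48}):
    add_record(root, rec)

hits = range_search(root, {"ht": -INF, "wt": -INF}, {"ht": 36, "wt": 47})
print([r["name"] for r in hits])  # ['a']
```

In this example the query only reaches the bucket under the root's left child's left branch; the other three buckets are never scanned.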
20 k-d Tree Search Example
- Search for records with ht ≤ 36 and wt ≤ 47
  1. At the root, branch left (ht ≤ 36)
  2. At the next node, branch left again (wt ≤ 49)
  3. At the next node, branch both left and right (ht ≤ 35 and ht > 35 both fall within the target range ht ≤ 36)
  4. Along the right path, reach bucket 3
  5. Along the left path, branch left (wt ≤ 50), reaching bucket 1
- Bucket 1 holds ht ≤ 35, wt ≤ 50; bucket 3 holds 35 < ht ≤ 36, wt ≤ 52
  - Both buckets may include records with ht ≤ 36 and wt ≤ 47
  - No other bucket could contain such records
- Of the records in these buckets (Bashful and Sneezy in bucket 1, Sleepy in bucket 3), only Sneezy meets the criteria (ht = 35, wt = 46)
21 k-d Tree Performance
- A k-d tree's index has a critical impact on performance
- The index should subdivide the data stored in the tree in a balanced manner
  - All buckets at the same or nearly the same level in the tree
  - The same or nearly the same number of elements in each bucket
- If the data are known a priori, median elements can be used to construct the index
- E.g., the Snow White k-d tree is designed for individuals with ht ≤ 37 and wt ≤ 55
  - Anything outside this range will be forced into one of two buckets
- For dynamic trees, maintaining balance is complicated
  - Adaptive k-d trees exist to try to maintain balance
22 Hashing
- Hashing is a second major class of efficient search algorithms
- A hash function converts a key k_t into a numeric value h on a fixed range 0 to n − 1
  - h is used as the location (address) for k_t within a hash table A of size n
  - This is analogous to array indexing: we can store and retrieve k_t at A[h]
  - If the hash function is O(1), search, insert, and delete are also O(1)
- Unfortunately, the number of possible keys m is normally much larger than the size of A (m ≫ n)
23 Hash Function Requirements
- Because m ≫ n, h is not identical to an array index
- Three important properties distinguish h from an array index:
  - The hash value for k_t should appear random
  - Hash values should be distributed uniformly over the range 0 to n − 1
  - Two different keys k_s and k_t can hash to the same h, a collision
- Collisions are a major issue, especially if each location in A can hold only one key
- Minimizing collisions is a main area for consideration
24 Perfect Hashing
- Choose a hash function that does not produce collisions
- Suppose we store credit cards, using the card number as the key
  - For 16-digit card numbers, m = 10^16 possible numbers (keys), or 10 quadrillion
  - Clearly, it is not possible to create an in-memory A of size n = 10^16
  - Of course, not every possible number is in use, but the numbers in use still span such a wide range that an array is not feasible
- Even for a small number of keys, perfect hashing is very difficult
  - E.g., to store m = 4000 keys in A of size n = 5000, only a vanishingly small fraction of hash functions are perfect
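Assuming every function from m keys to n addresses is equally likely, the fraction of perfect (collision-free) functions is n!/((n − m)! n^m): the first key may go anywhere, the second must avoid one occupied address, and so on. This Python sketch computes the base-10 logarithm of that fraction with `math.lgamma`, since the factorials themselves are far too large to evaluate directly.

```python
import math


def log10_fraction_perfect(m: int, n: int) -> float:
    """log10 of the fraction of all n**m functions (m keys -> n addresses) with no collision.

    Collision-free functions number n * (n - 1) * ... * (n - m + 1) = n! / (n - m)!.
    """
    if m > n:
        return float("-inf")  # pigeonhole: no perfect function exists
    ln_frac = math.lgamma(n + 1) - math.lgamma(n - m + 1) - m * math.log(n)
    return ln_frac / math.log(10)


print(log10_fraction_perfect(4000, 5000))  # a negative exponent in the thousands
```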
25 Fold-and-Add Hash Function
- A common hash function is fold-and-add:
  1. Convert k_t to a numeric sequence
  2. Fold and add the numbers, correcting for overflow
  3. Divide the result by a prime number and return the remainder as h
- For k_t = Subramanian, convert to a numeric sequence by mapping characters to ASCII codes and binding pairs of codes: Su br am an ia n
- Assume the largest pair is zz (the ASCII code for z is 122), which gives the largest combined code
- To manage overflow, divide by a prime number slightly larger than that largest combined code after each add, and keep the remainder
26 Fold-and-Add (cont'd)
- Add the combined codes for the pairs of S u b r a m a n i a n one at a time, taking the sum mod the prime after each addition
- We divide the final result by the size of the hash table and keep the remainder
  - The hash table size should itself be prime
  - We choose A of size n = 101, producing a final hash value h = (final sum) mod 101 = 40
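A Python sketch of fold-and-add under explicit assumptions: each pair of characters is combined as `ord(c1) * 1000 + ord(c2)` (so zz gives 122122), a trailing odd character contributes its code alone, and the overflow prime is 131071, a known prime above 122122 chosen here purely for illustration. Because the prime and pairing scheme are assumptions, this sketch's value for Subramanian need not equal the h = 40 above.

```python
def fold_and_add(key: str, prime: int = 131071, table_size: int = 101) -> int:
    """Fold-and-add hash: pair characters, add pairs mod a prime, then mod the table size."""
    total = 0
    for i in range(0, len(key), 2):
        pair = key[i:i + 2]
        value = ord(pair[0])
        if len(pair) == 2:
            value = value * 1000 + ord(pair[1])  # e.g. "Su" -> 83 * 1000 + 117 = 83117
        total = (total + value) % prime          # keep the running sum below the prime
    return total % table_size                    # final address in 0 .. table_size - 1


print(fold_and_add("Subramanian"))  # 79 with these assumed parameters
```

Python integers never overflow, so the per-add modulus here only mirrors the technique; in a fixed-width language it is what keeps the running sum in range.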
27 Hash Value Distributions
- Given a hash table of size n holding r records, what is the likelihood that
  - no key hashes to a particular address in the table?
  - one key hashes to a particular address?
  - two keys hash to a particular address?
- Assume the hash function distributes hash values uniformly
  - For any key, the probability that it hashes to address k is b
  - For any key, the probability that it does not hash to address k is a

$$ b = \frac{1}{n}, \qquad a = 1 - \frac{1}{n} \qquad (4.1) $$
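28 Collision Probability
- Given a and b, insert two keys into the hash table and compute the individual cases
  - Probability that the first key hits an address and the second key misses it:

$$ ba = \frac{1}{n}\left(1 - \frac{1}{n}\right) = \frac{1}{n} - \frac{1}{n^{2}} \qquad (4.2) $$

  - Probability that both keys hit the same address (a collision):

$$ bb = \frac{1}{n} \cdot \frac{1}{n} = \frac{1}{n^{2}} $$

- Probability that x of r keys hash to a common address:

$$ C = \binom{r}{x} = \frac{r!}{x!\,(r - x)!} \qquad (4.3) $$

$$ \Pr = C\, b^{x} a^{r - x} = C \left(\frac{1}{n}\right)^{x} \left(1 - \frac{1}{n}\right)^{r - x} $$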
29 Estimated Collision Probability
- Since r! appears in its equation, C is expensive to compute
- The Poisson distribution Pr(x) does a good job of estimating our probability $C\,b^{x}a^{r-x}$:

$$ \Pr(x) = \frac{(r/n)^{x}\, e^{-r/n}}{x!} \qquad (4.4) $$

- Since x is normally small, the x! in the denominator is not an issue
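A quick numerical check of how closely the Poisson estimate (4.4) tracks the exact binomial expression (4.3), written in Python for r = n = 1000. It requires Python 3.8 or later for `math.comb`.

```python
import math


def binomial_pr(x: int, r: int, n: int) -> float:
    """Exact probability (4.3): C(r, x) * b**x * a**(r - x), with b = 1/n, a = 1 - 1/n."""
    b, a = 1 / n, 1 - 1 / n
    return math.comb(r, x) * b**x * a ** (r - x)


def poisson_pr(x: int, r: int, n: int) -> float:
    """Poisson estimate (4.4): (r/n)**x * e**(-r/n) / x!."""
    lam = r / n
    return lam**x * math.exp(-lam) / math.factorial(x)


for x in range(4):
    print(x, round(binomial_pr(x, 1000, 1000), 4), round(poisson_pr(x, 1000, 1000), 4))
```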
30 Estimated Collision Example
- Consider an extreme case: store r = 1000 keys in a hash table of size n = 1000
- Here r/n = 1, and we can use this ratio to calculate Pr(0), Pr(1), Pr(2):

$$ \Pr(0) = \frac{1^{0} e^{-1}}{0!} \approx 0.368, \qquad \Pr(1) = \frac{1^{1} e^{-1}}{1!} \approx 0.368, \qquad \Pr(2) = \frac{1^{2} e^{-1}}{2!} \approx 0.184 \qquad (4.5) $$

- Given hash table size n = 1000, we expect
  - $n\Pr(0) = 1000 \times 0.368 = 368$ entries that are empty
  - $n\Pr(1) = 1000 \times 0.368 = 368$ entries holding 1 key
  - $n\Pr(2) = 1000 \times 0.184 = 184$ entries that try to hold 2 keys, and so on
31 Estimating Collision Count
- Consider the previous example, r = n = 1000
  - n Pr(0) = 368 entries in the table hold no keys
  - n Pr(1) = 368 entries in the table hold 1 key, accounting for 368 keys
  - 1000 − n Pr(0) − n Pr(1) = 264 entries try to hold > 1 key
- The remaining 1000 − 368 = 632 keys are inserted into those 264 entries
  - Each of the 264 entries stores one key, so 632 − 264 = 368 keys collide
  - This is a collision rate of 36.8%
32 Larger Hash Table
- Increase the table size to n = 2000, reducing the packing rate to r/n = 1000/2000 = 0.5

$$ n\Pr(0) = 2000 \cdot \frac{0.5^{0} e^{-0.5}}{0!} \approx 1214, \qquad n\Pr(1) = 2000 \cdot \frac{0.5^{1} e^{-0.5}}{1!} \approx 608 $$

- 2000 − n Pr(0) − n Pr(1) = 178 entries try to hold > 1 key
- 608 keys are stored in the single-key entries; the remaining 1000 − 608 = 392 keys go to the 178 crowded entries
  - 178 of those keys are stored, and 392 − 178 = 214 keys collide, a collision rate of 21.4%
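The counting argument on the last two slides can be scripted. This Python sketch follows the same steps (empty entries, single-key entries, crowded entries, then the keys beyond the one each crowded entry accepts) but uses unrounded Poisson values. As a result the n = 2000 case comes out at about 21.3% rather than 21.4%; the small gap presumably reflects rounding in the slide's intermediate values.

```python
import math


def estimate_collisions(r: int, n: int):
    """Expected counts for r keys in a table of n single-record entries (Poisson model)."""
    lam = r / n
    empty = n * math.exp(-lam)              # n * Pr(0)
    single = n * lam * math.exp(-lam)       # n * Pr(1)
    crowded = n - empty - single            # entries that try to hold > 1 key
    keys_to_crowded = r - single            # keys that land in crowded entries
    collisions = keys_to_crowded - crowded  # each crowded entry accepts one key
    return empty, single, crowded, collisions, collisions / r


for n in (1000, 2000):
    e, s, c, coll, rate = estimate_collisions(1000, n)
    print(n, round(e), round(s), round(c), round(coll), f"{rate:.1%}")
```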
33 Progressive Overflow
- Since collisions cannot be avoided, they must be managed
- Progressive overflow insertion
  - Attempt to insert the key at its hash position h
  - If h is already occupied and the table is full, the insertion fails
  - Otherwise, walk forward through the hash table (wrapping around at the end) until an empty position is found
- Progressive overflow deletion
  - Start searching for the key at its hash position h
  - If n positions are examined and the key is not found, the deletion fails
  - Otherwise, mark the key's position as dirty: empty but previously occupied
34 Progressive Insertion

    Algorithm progressive_insert(rec, tbl, n)
    Input: rec, record to insert; tbl, hash table; n, table size
    num = 0                                // Number of insertion attempts
    h = hash(rec.key)
    while num < n do
        if tbl[h] is empty then
            tbl[h] = rec                   // Store record
            break
        end
        h = (h + 1) % n                    // Try next table position
        num++
    end
    return (num == n) ? false : true       // Return status of insert attempt
35 Progressive Search

    Algorithm progressive_search(key, tbl, dirty, n)
    Input: key, target key; tbl, hash table; dirty, dirty entry table; n, table size
    num = 0                                        // Number of positions examined
    h = hash(key)
    while num < n do
        if tbl[h] is not empty and key == tbl[h].key then
            return h                               // Target record found at position h
        else if tbl[h] is empty and !dirty[h] then
            return -1                              // Search failed
        end
        h = (h + 1) % n                            // Try next table position
        num++
    end
    return -1                                      // Search failed

36 Progressive Delete

    Algorithm progressive_delete(key, tbl, dirty, n)
    Input: key, target key; tbl, hash table; dirty, dirty entry table; n, table size
    h = progressive_search(key, tbl, dirty, n)
    if h != -1 then
        tbl[h] = empty                             // Set table position empty
        dirty[h] = true                            // Mark table position dirty
        return true
    else
        return false
    end
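The three algorithms fit naturally into one small class. This Python sketch is illustrative: `None` marks an empty slot, Python's built-in `hash(key) % n` stands in for the unspecified hash function, and `search` returns a slot index (or -1) so that `delete` can clear and mark that slot. Small integer keys are used in the example because their built-in hashes are deterministic.

```python
from typing import Any, List, Optional, Tuple

EMPTY = None  # marker for a slot that holds no record


class ProgressiveTable:
    """Progressive overflow (linear probing) with a dirty bit per slot."""

    def __init__(self, n: int):
        self.n = n
        self.tbl: List[Optional[Tuple[Any, Any]]] = [EMPTY] * n  # (key, record) or EMPTY
        self.dirty = [False] * n  # True = empty now but previously occupied

    def _hash(self, key) -> int:
        return hash(key) % self.n  # stand-in for the unspecified hash function

    def insert(self, key, record) -> bool:
        h = self._hash(key)
        for _ in range(self.n):          # at most n insertion attempts
            if self.tbl[h] is EMPTY:
                self.tbl[h] = (key, record)
                return True              # record stored
            h = (h + 1) % self.n         # try next position, wrapping around
        return False                     # table full: insertion fails

    def search(self, key) -> int:
        """Return the slot index holding key, or -1 if it is not in the table."""
        h = self._hash(key)
        for _ in range(self.n):
            slot = self.tbl[h]
            if slot is not EMPTY and slot[0] == key:
                return h                 # target record found
            if slot is EMPTY and not self.dirty[h]:
                return -1                # never occupied: key cannot lie beyond here
            h = (h + 1) % self.n         # occupied or dirty: keep walking
        return -1                        # all n positions examined

    def delete(self, key) -> bool:
        h = self.search(key)
        if h == -1:
            return False                 # key not found: deletion fails
        self.tbl[h] = EMPTY              # set table position empty
        self.dirty[h] = True             # mark it dirty so searches walk past it
        return True


t = ProgressiveTable(7)
for k, rec in [(3, "a"), (10, "b"), (17, "c")]:  # all three hash to slot 3
    t.insert(k, rec)
t.delete(10)
print(t.search(17))  # 5: the search walks past dirty slot 4
```

Deleting 10 leaves slot 4 empty but dirty, so the later search for 17 keeps walking instead of stopping at slot 4 and wrongly reporting failure.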
37 Progressive Overflow Search
- To search, hash the key to h and start the search at position h
  - If the record is found, return it
  - If the entire table has been examined, the search fails
- Empty positions are handled using the dirty bit
  - If a position is empty and its dirty bit is set:
    a. The position may have been occupied when the record was inserted
    b. So we may have walked forward past it looking for an empty position
    c. Therefore, we need to keep searching
  - If a position is empty and its dirty bit is not set:
    a. The position was never occupied, so we could not have walked over it
    b. Therefore, the record is not in the table and the search fails
38 Progressive Overflow Disadvantages
1. The hash table can become full
   - The hash function uses n, so every record's hash value changes if n changes
   - We must remove all records, increase the size, and re-insert all records
2. Runs form as records are inserted
   - When multiple records hash to the same h, runs of contiguous records form
   - It is expensive to find a record near the start of the run
3. If runs merge with one another, super-runs form
   - These are very long collections of contiguous records
   - Searching may walk over records that do not share the target record's h
   - If the table is more than 75% full, search deteriorates to O(n)
39 Multi-Record Buckets
- An alternative approach reduces collisions by storing more than one record in each hash table entry
  - Implement each bucket as an expandable array or linked list
  - Insertion and deletion are identical to a simple hash table
  - We do not need to worry about exceeding a bucket's capacity
  - To search for k with hash value h, load bucket A[h] and scan it for k
- If we use buckets, the packing density of A is r/(bn)
  - n is the size of A, and b is the maximum number of entries in each position A[i]
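A Python sketch of a hash table with expandable buckets, assuming each bucket is a plain list of `(key, record)` pairs and again using `hash(key) % n` as a stand-in hash function. Here a duplicate key replaces the existing record, mirroring the BST insertion rule; that choice is an assumption, not something the slide specifies.

```python
from typing import Any, List, Optional, Tuple


class BucketTable:
    """Hash table in which each entry A[h] is an expandable bucket (a Python list)."""

    def __init__(self, n: int):
        self.n = n
        self.A: List[List[Tuple[Any, Any]]] = [[] for _ in range(n)]

    def _bucket(self, key) -> List[Tuple[Any, Any]]:
        return self.A[hash(key) % self.n]  # stand-in hash function

    def insert(self, key, record) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, record)  # duplicate key: replace record (assumption)
                return
        bucket.append((key, record))       # list grows, so capacity is never exceeded

    def search(self, key) -> Optional[Any]:
        for k, rec in self._bucket(key):   # load bucket A[h] and scan it for key
            if k == key:
                return rec
        return None

    def delete(self, key) -> bool:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False

    def packing_density(self, b: int) -> float:
        """r / (b * n) for r stored records and nominal bucket capacity b."""
        r = sum(len(bucket) for bucket in self.A)
        return r / (b * self.n)


t = BucketTable(5)
for k, rec in [(1, "a"), (6, "b"), (11, "c")]:  # all land in bucket A[1]
    t.insert(k, rec)
print(t.A[1])       # [(1, 'a'), (6, 'b'), (11, 'c')]
print(t.search(6))  # b
```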
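40 Single vs. Multi-Record Buckets
- Single-record entries: r = 700, n = 1000, b = 1, so r/n = 700/1000 = 0.7

$$ P(0) = \frac{0.7^{0} e^{-0.7}}{0!}, \qquad P(1) = \frac{0.7^{1} e^{-0.7}}{1!} $$

  - ≈ 497 entries hold 0 keys
  - 347 entries hold 1 key (347 records)
  - 155 entries try to hold > 1 key (352 records)
  - 352 − 155 = 197 collisions, a collision rate of 197/700 = 28.1%
- Two-record buckets: r = 700, n = 500, b = 2, so r/n = 700/500 = 1.4

$$ P(0) = \frac{1.4^{0} e^{-1.4}}{0!}, \qquad P(1) = \frac{1.4^{1} e^{-1.4}}{1!}, \qquad P(2) = \frac{1.4^{2} e^{-1.4}}{2!} $$

  - 124 entries hold 0 keys
  - 172 entries hold 1 key (172 records)
  - 121 entries hold 2 keys (242 records)
  - 83 entries try to hold > 2 keys (286 records)
  - 286 − 2 × 83 = 120 collisions, a collision rate of 120/700 = 17.1%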
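The bucket comparison can be checked with the same Poisson model. This Python sketch computes the expected number of overflowing keys per entry, E[max(X − b, 0)] for X ~ Poisson(r/n), using the identity E[max(X − b, 0)] = r/n − b + Σ_{x<b} (b − x) Pr(x). That matches the slide's counting (keys beyond the first b in each crowded entry). It gives about 28.1% for b = 1 and 17.0% for b = 2; the slide's 17.1% presumably reflects its rounded entry counts.

```python
import math


def bucket_overflow_rate(r: int, n: int, b: int) -> float:
    """Expected fraction of r keys that overflow n entries of capacity b (Poisson model)."""
    lam = r / n
    pr = [lam**x * math.exp(-lam) / math.factorial(x) for x in range(b)]  # Pr(0..b-1)
    # E[max(X - b, 0)] = E[X] - b + sum over x < b of (b - x) * Pr(x)
    overflow_per_entry = lam - b + sum((b - x) * p for x, p in enumerate(pr))
    return n * overflow_per_entry / r


print(f"{bucket_overflow_rate(700, 1000, 1):.1%}")  # r = 700, n = 1000, b = 1
print(f"{bucket_overflow_rate(700, 500, 2):.1%}")   # r = 700, n = 500, b = 2
```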
41 Bucket Advantages and Disadvantages
- Simply rearranging the 1000 table entries into 500 two-record buckets reduced the collision rate from 28.1% to 17.1%
- Multi-record bucket tables still have disadvantages
  - If r ≫ n, buckets become long and search deteriorates to O(n)
  - Insertion must check for duplicate keys and deletion must search the bucket, so both also deteriorate to O(n)
  - The table size n still cannot be changed efficiently
Priority Queue, Heap and Heap Sort In this time, we will study Priority queue, heap and heap sort. Heap is a data structure, which permits one to insert elements into a set and also to find the largest
More informationChapter 11: Indexing and Hashing
Chapter 11: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition in SQL
More informationCS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics
CS2223: Algorithms Sorting Algorithms, Heap Sort, Linear-time sort, Median and Order Statistics 1 Sorting 1.1 Problem Statement You are given a sequence of n numbers < a 1, a 2,..., a n >. You need to
More informationQuiz 1 Solutions. Asymptotic growth [10 points] For each pair of functions f(n) and g(n) given below:
Introduction to Algorithms October 15, 2008 Massachusetts Institute of Technology 6.006 Fall 2008 Professors Ronald L. Rivest and Sivan Toledo Quiz 1 Solutions Problem 1. Asymptotic growth [10 points]
More informationLecture 3 February 9, 2010
6.851: Advanced Data Structures Spring 2010 Dr. André Schulz Lecture 3 February 9, 2010 Scribe: Jacob Steinhardt and Greg Brockman 1 Overview In the last lecture we continued to study binary search trees
More information2-3 and Trees. COL 106 Shweta Agrawal, Amit Kumar, Dr. Ilyas Cicekli
2-3 and 2-3-4 Trees COL 106 Shweta Agrawal, Amit Kumar, Dr. Ilyas Cicekli Multi-Way Trees A binary search tree: One value in each node At most 2 children An M-way search tree: Between 1 to (M-1) values
More informationIndexing and Hashing
C H A P T E R 1 Indexing and Hashing This chapter covers indexing techniques ranging from the most basic one to highly specialized ones. Due to the extensive use of indices in database systems, this chapter
More informationvoid insert( Type const & ) void push_front( Type const & )
6.1 Binary Search Trees A binary search tree is a data structure that can be used for storing sorted data. We will begin by discussing an Abstract Sorted List or Sorted List ADT and then proceed to describe
More informationData and File Structures Chapter 11. Hashing
Data and File Structures Chapter 11 Hashing 1 Motivation Sequential Searching can be done in O(N) access time, meaning that the number of seeks grows in proportion to the size of the file. B-Trees improve
More informationChapter 6 Heaps. Introduction. Heap Model. Heap Implementation
Introduction Chapter 6 Heaps some systems applications require that items be processed in specialized ways printing may not be best to place on a queue some jobs may be more small 1-page jobs should be
More informationBST Deletion. First, we need to find the value which is easy because we can just use the method we developed for BST_Search.
BST Deletion Deleting a value from a Binary Search Tree is a bit more complicated than inserting a value, but we will deal with the steps one at a time. First, we need to find the value which is easy because
More informationA6-R3: DATA STRUCTURE THROUGH C LANGUAGE
A6-R3: DATA STRUCTURE THROUGH C LANGUAGE NOTE: 1. There are TWO PARTS in this Module/Paper. PART ONE contains FOUR questions and PART TWO contains FIVE questions. 2. PART ONE is to be answered in the TEAR-OFF
More informationBalanced Search Trees
Balanced Search Trees Computer Science E-22 Harvard Extension School David G. Sullivan, Ph.D. Review: Balanced Trees A tree is balanced if, for each node, the node s subtrees have the same height or have
More information- 1 - Handout #22S May 24, 2013 Practice Second Midterm Exam Solutions. CS106B Spring 2013
CS106B Spring 2013 Handout #22S May 24, 2013 Practice Second Midterm Exam Solutions Based on handouts by Eric Roberts and Jerry Cain Problem One: Reversing a Queue One way to reverse the queue is to keep
More informationCS 525: Advanced Database Organization 04: Indexing
CS 5: Advanced Database Organization 04: Indexing Boris Glavic Part 04 Indexing & Hashing value record? value Slides: adapted from a course taught by Hector Garcia-Molina, Stanford InfoLab CS 5 Notes 4
More informationLecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs
Lecture 5 Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs Reading: Randomized Search Trees by Aragon & Seidel, Algorithmica 1996, http://sims.berkeley.edu/~aragon/pubs/rst96.pdf;
More informationBinary Search Trees. Analysis of Algorithms
Binary Search Trees Analysis of Algorithms Binary Search Trees A BST is a binary tree in symmetric order 31 Each node has a key and every node s key is: 19 23 25 35 38 40 larger than all keys in its left
More informationAnnouncements. Reading Material. Recap. Today 9/17/17. Storage (contd. from Lecture 6)
CompSci 16 Intensive Computing Systems Lecture 7 Storage and Index Instructor: Sudeepa Roy Announcements HW1 deadline this week: Due on 09/21 (Thurs), 11: pm, no late days Project proposal deadline: Preliminary
More informationHashing. Hashing Procedures
Hashing Hashing Procedures Let us denote the set of all possible key values (i.e., the universe of keys) used in a dictionary application by U. Suppose an application requires a dictionary in which elements
More informationBalanced Binary Search Trees. Victor Gao
Balanced Binary Search Trees Victor Gao OUTLINE Binary Heap Revisited BST Revisited Balanced Binary Search Trees Rotation Treap Splay Tree BINARY HEAP: REVIEW A binary heap is a complete binary tree such
More informationCS F-11 B-Trees 1
CS673-2016F-11 B-Trees 1 11-0: Binary Search Trees Binary Tree data structure All values in left subtree< value stored in root All values in the right subtree>value stored in root 11-1: Generalizing BSTs
More informationTree-Structured Indexes
Introduction Tree-Structured Indexes Chapter 10 As for any index, 3 alternatives for data entries k*: Data record with key value k
More informationHash Tables Outline. Definition Hash functions Open hashing Closed hashing. Efficiency. collision resolution techniques. EECS 268 Programming II 1
Hash Tables Outline Definition Hash functions Open hashing Closed hashing collision resolution techniques Efficiency EECS 268 Programming II 1 Overview Implementation style for the Table ADT that is good
More informationTreaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19
CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types
More informationCISC 235: Topic 4. Balanced Binary Search Trees
CISC 235: Topic 4 Balanced Binary Search Trees Outline Rationale and definitions Rotations AVL Trees, Red-Black, and AA-Trees Algorithms for searching, insertion, and deletion Analysis of complexity CISC
More informationHash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell
Hash Tables CS 311 Data Structures and Algorithms Lecture Slides Wednesday, April 22, 2009 Glenn G. Chappell Department of Computer Science University of Alaska Fairbanks CHAPPELLG@member.ams.org 2005
More informationDesign and Analysis of Algorithms Lecture- 9: B- Trees
Design and Analysis of Algorithms Lecture- 9: B- Trees Dr. Chung- Wen Albert Tsao atsao@svuca.edu www.408codingschool.com/cs502_algorithm 1/12/16 Slide Source: http://www.slideshare.net/anujmodi555/b-trees-in-data-structure
More informationChapter 12: Indexing and Hashing (Cnt(
Chapter 12: Indexing and Hashing (Cnt( Cnt.) Basic Concepts Ordered Indices B+-Tree Index Files B-Tree Index Files Static Hashing Dynamic Hashing Comparison of Ordered Indexing and Hashing Index Definition
More informationAugmenting Data Structures
Augmenting Data Structures [Not in G &T Text. In CLRS chapter 14.] An AVL tree by itself is not very useful. To support more useful queries we need more structure. General Definition: An augmented data
More informationLecture 11: Multiway and (2,4) Trees. Courtesy to Goodrich, Tamassia and Olga Veksler
Lecture 11: Multiway and (2,4) Trees 9 2 5 7 10 14 Courtesy to Goodrich, Tamassia and Olga Veksler Instructor: Yuzhen Xie Outline Multiway Seach Tree: a new type of search trees: for ordered d dictionary
More informationProperties of a heap (represented by an array A)
Chapter 6. HeapSort Sorting Problem Input: A sequence of n numbers < a1, a2,..., an > Output: A permutation (reordering) of the input sequence such that ' ' ' < a a a > 1 2... n HeapSort O(n lg n) worst
More informationCS350: Data Structures Red-Black Trees
Red-Black Trees James Moscola Department of Engineering & Computer Science York College of Pennsylvania James Moscola Red-Black Tree An alternative to AVL trees Insertion can be done in a bottom-up or
More informationCSE 530A. B+ Trees. Washington University Fall 2013
CSE 530A B+ Trees Washington University Fall 2013 B Trees A B tree is an ordered (non-binary) tree where the internal nodes can have a varying number of child nodes (within some range) B Trees When a key
More informationIntroduction to Indexing 2. Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana
Introduction to Indexing 2 Acknowledgements: Eamonn Keogh and Chotirat Ann Ratanamahatana Indexed Sequential Access Method We have seen that too small or too large an index (in other words too few or too
More informationSolution READ THIS NOW! CS 3114 Data Structures and Algorithms
READ THIS NOW! Print your name in the space provided below. There are 5 short-answer questions, priced as marked. The maximum score is 100. This examination is closed book and closed notes, aside from
More informationTree-Structured Indexes
Tree-Structured Indexes Chapter 9 Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Introduction As for any index, 3 alternatives for data entries k*: ➀ Data record with key value k ➁
More informationLecture 3: B-Trees. October Lecture 3: B-Trees
October 2017 Remarks Search trees The dynamic set operations search, minimum, maximum, successor, predecessor, insert and del can be performed efficiently (in O(log n) time) if the search tree is balanced.
More informationThus, it is reasonable to compare binary search trees and binary heaps as is shown in Table 1.
7.2 Binary Min-Heaps A heap is a tree-based structure, but it doesn t use the binary-search differentiation between the left and right sub-trees to create a linear ordering. Instead, a binary heap only
More informationV Advanced Data Structures
V Advanced Data Structures B-Trees Fibonacci Heaps 18 B-Trees B-trees are similar to RBTs, but they are better at minimizing disk I/O operations Many database systems use B-trees, or variants of them,
More informationIntroduction to Indexing R-trees. Hong Kong University of Science and Technology
Introduction to Indexing R-trees Dimitris Papadias Hong Kong University of Science and Technology 1 Introduction to Indexing 1. Assume that you work in a government office, and you maintain the records
More informationHeap Model. specialized queue required heap (priority queue) provides at least
Chapter 6 Heaps 2 Introduction some systems applications require that items be processed in specialized ways printing may not be best to place on a queue some jobs may be more small 1-page jobs should
More information