B-trees and Pólya urns

Similar documents
Distribution of tree parameters

Probabilistic analysis of algorithms: What s it good for?

March 20/2003 Jayakanth Srinivasan,

A fast-growing subset of Av(1324)

Data Structures and Algorithms

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

The number of spanning trees of random 2-trees

Trees. Q: Why study trees? A: Many advance ADTs are implemented using tree-based data structures.

CS 310 B-trees, Page 1. Motives. Large-scale databases are stored in disks/hard drives.

Self-Balancing Search Trees. Chapter 11

Trees. Eric McCreath

1 The range query problem

Recursively Defined Functions

(2,4) Trees. 2/22/2006 (2,4) Trees 1

Lower bound for comparison-based sorting

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010

Recursive Sequences. Lecture 24 Section 5.6. Robb T. Koether. Hampden-Sydney College. Wed, Feb 26, 2014

Recursive Sequences. Lecture 24 Section 5.6. Robb T. Koether. Hampden-Sydney College. Wed, Feb 27, 2013

Well labeled paths and the volume of a polytope

Level-Balanced B-Trees

Organizing Spatial Data

Multi-way Search Trees. (Multi-way Search Trees) Data Structures and Programming Spring / 25

CS F-11 B-Trees 1

EE 368. Weeks 5 (Notes)

Algorithms for Memory Hierarchies Lecture 2

Binary Trees

STRAIGHT LINE ORTHOGONAL DRAWINGS OF COMPLETE TERNERY TREES SPUR FINAL PAPER, SUMMER July 29, 2015

R10 SET - 1. Code No: R II B. Tech I Semester, Supplementary Examinations, May

Motivation for B-Trees

CSE 203A: Randomized Algorithms

B-Trees. Based on materials by D. Frey and T. Anastasio

CSE 530A. B+ Trees. Washington University Fall 2013

Data Structures and Algorithms Week 4

Recursive Definitions and Structural Induction

Tree Structures. A hierarchical data structure whose point of entry is the root node

CS521 \ Notes for the Final Exam

Midterm solutions. n f 3 (n) = 3

Introduction to Computers and Programming. Concept Question

CS 441 Discrete Mathematics for CS Lecture 26. Graphs. CS 441 Discrete mathematics for CS. Final exam

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

SFU CMPT Lecture: Week 9

Multiway Search Trees. Multiway-Search Trees (cont d)

4 Basics of Trees. Petr Hliněný, FI MU Brno 1 FI: MA010: Trees and Forests

Data Structure Lecture#10: Binary Trees (Chapter 5) U Kang Seoul National University

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

An Exact Enumeration of Unlabeled Cactus Graphs

AVL Trees. (AVL Trees) Data Structures and Programming Spring / 17

Tree traversals and binary trees

Induction and Recursion. CMPS/MATH 2170: Discrete Mathematics

Recursive Definitions Structural Induction Recursive Algorithms

Search Trees. Undirected graph Directed graph Tree Binary search tree

Trees : Part 1. Section 4.1. Theory and Terminology. A Tree? A Tree? Theory and Terminology. Theory and Terminology

CPSC 320 Midterm 2 Thursday March 13th, 2014

Data Structures and Algorithms Chapter 4

Introduction. for large input, even access time may be prohibitive we need data structures that exhibit times closer to O(log N) binary search tree

Solutions. Suppose we insert all elements of U into the table, and let n(b) be the number of elements of U that hash to bucket b. Then.

Recursive definition of sets and structural induction

Solutions. (a) Claim: A d-ary tree of height h has at most 1 + d +...

Figure 4.1: The evolution of a rooted tree.

Lecture 6: Combinatorics Steven Skiena. skiena

Lecture 5. Treaps Find, insert, delete, split, and join in treaps Randomized search trees Randomized search tree time costs

CS350: Data Structures B-Trees

DATA STRUCTURES AND ALGORITHMS. Hierarchical data structures: AVL tree, Bayer tree, Heap

Notation Index. Probability notation. (there exists) (such that) Fn-4 B n (Bell numbers) CL-27 s t (equivalence relation) GT-5.

Recursive Data Structures and Grammars

Main Memory and the CPU Cache

Exercise set 2 Solutions

Laboratory Module Trees

C SCI 335 Software Analysis & Design III Lecture Notes Prof. Stewart Weiss Chapter 4: B Trees

Multiple-choice (35 pt.)

2-3 Tree. Outline B-TREE. catch(...){ printf( "Assignment::SolveProblem() AAAA!"); } ADD SLIDES ON DISJOINT SETS

Trees 11/15/16. Chapter 11. Terminology. Terminology. Terminology. Terminology. Terminology

Balanced Search Trees

Lower Bound on Comparison-based Sorting

Recursion defining an object (or function, algorithm, etc.) in terms of itself. Recursion can be used to define sequences

B-Trees. CS321 Spring 2014 Steve Cutchin

COSC 2011 Section N. Trees: Terminology and Basic Properties

II (Sorting and) Order Statistics

Dynamic Programming Homework Problems

Lecture Notes for Advanced Algorithms

Binary search trees. Binary search trees are data structures based on binary trees that support operations on dynamic sets.

CSCE 411 Design and Analysis of Algorithms

On Covering a Graph Optimally with Induced Subgraphs

Counting Problems; and Recursion! CSCI 2824, Fall 2012!

1. [1 pt] What is the solution to the recurrence T(n) = 2T(n-1) + 1, T(1) = 1

Background: disk access vs. main memory access (1/2)

CMPS 2200 Fall 2017 Red-black trees Carola Wenk

Range searching. with Range Trees

[ DATA STRUCTURES ] Fig. (1) : A Tree

CSci 231 Homework 7. Red Black Trees. CLRS Chapter 13 and 14

1. Meshes. D7013E Lecture 14

Advanced Tree Data Structures

Red-Black Trees. Based on materials by Dennis Frey, Yun Peng, Jian Chen, and Daniel Hood

Here is a recursive algorithm that solves this problem, given a pointer to the root of T : MaxWtSubtree [r]

CSci 231 Homework 7. Red Black Trees. CLRS Chapter 13 and 14

Data Structures Question Bank Multiple Choice

Ensures that no such path is more than twice as long as any other, so that the tree is approximately balanced

15.4 Longest common subsequence

8. Binary Search Tree

ALTERNATIVE METHODS FOR CLUSTERING

Transcription:

B-trees and Pólya urns Danièle GARDY PRiSM (UVSQ) with B. Chauvin and N. Pouyanne (LMV) and D.-H. Ton-That (PRiSM) AofA, Strobl June 2015

B-trees and algorithms Some enumeration problems Pólya urns

B-trees and algorithms m integer 2 : parameter of the B-tree Database applications : m «large» (several hundreds) B-tree shape planar tree root : between 2 and 2m children other internal nodes : between m and 2m children nodes without children at same level

B-trees and algorithms B-tree shape with parameter m = 2 and with 13 nodes

B-trees and algorithms m integer 2 : parameter of the B-tree B-tree B-tree shape Research tree : nodes contain records (keys) belonging to an ordered set + at each node, the root keys determine the partition of non-root keys into subtrees Root : between 1 and 2m 1 keys Other nodes : between m 1 and 2m 1 keys All keys distinct : a tree with repeated keys in internal nodes cannot be a B-tree

B-trees and algorithms 65, 80 30, 52 75 85,90,95 22, 25,27 33, 45, 49 58, 61 68, 70, 73 76, 77 81,82,84 86 91, 93 97,99,100 A B-tree (m = 2) : B-tree shape + labelling as a research tree.

B-trees and algorithms Variations Nodes have between m and 2m keys (internal nodes : between m + 1 and 2m + 1 children) For such trees and m = 1 : each node has 1 or 2 keys (internal nodes : 2 or 3 children) 2 3 trees Internal nodes may contain just an index, and the actual records are in leaves

B-trees and algorithms Searching for a key X in a B-tree 65, 80 30, 52 75 85,90,95 22, 25,27 33, 45, 49 58, 61 68, 70, 73 76, 77 81,82,84 86 91, 93 97,99,100

B-trees and algorithms Inserting a key X into a B-tree No repeated key Insertion of a new key : in a leaf Research tree a single place in a terminal node to insert X B-tree shape terminal nodes must be at the same level B-tree terminal nodes contain between m 1 and 2m 1 keys ; what if the relevant node is already full?

B-trees and algorithms Insertion of 60 65, 80 30, 52 75 85,90,95 22, 25,27 33, 45, 49 58, 61 68, 70, 73 76, 77 81,82,84 86 91, 93 97,99,100

B-trees and algorithms Insertion of 60 65, 80 30, 52 75 85,90,95 22, 25,27 33, 45, 49 58, 60,61 68, 70, 73 76, 77 81,82,84 86, 88 91, 93 97,99,100

B-trees and algorithms Insertion of 60 65, 80 30, 52 75 85,90,95 22, 25,27 33, 45, 49 58, 60,61 68, 70, 73 76, 77 81,82,84 86, 88 91, 93 97,99,100 What if we now wish to insert 63?

B-trees and algorithms Insertion of 63? 65, 80 30, 52, 60 75 85,90,95 22, 25,27 33, 45, 49 58 61 68, 70, 73 76, 77 81,82,84 86, 88 91, 93 97,99,100 63

B-trees and algorithms An internal node was split : 30, 52 30, 52, 60 22, 25,27 33, 45, 49 58, 60, 61 22, 25,27 33, 45, 49 58 61 A terminal node with maximal number of keys disappears 2 terminal nodes with minimal number of keys appear Parent node could accomodate one more key

B-trees and algorithms Inserting a key X into a B-tree Need to keep the tree balanced intricate algorithm Splitting a node may go all the way up to the root tree grows from the root

B-trees and algorithms Inserting a key X into a B-tree Need to keep the tree balanced intricate algorithm Splitting a node may go all the way up to the root tree grows from the root Analysis much more difficult than for other research trees Pólya urn approach useful for lower level

Some enumeration problems Counting issues for B-trees (shapes) with parameter m

Some enumeration problems Counting issues for B-trees (shapes) with parameter m Relation between height h and number of keys n of a tree log 2m (n + 1) h log m n + 1 2 Number of trees with n keys Number of trees with height h + 1.

Some enumeration problems Number of trees with n keys? Proposition (Odlyzko 82) Define E(z) as the g.f. enumerating 2-3 trees w.r.t. number n of leaves = number n-1 of keys in internal nodes E(z) = z + E(z 2 + z 3 ). Radius of convergence : golden ratio 1+ 5 2 Number of 2-3 trees with n leaves : e n ω(n) n ( 1 + 5 2 ) n (1 + O(1/n)), ω(n) periodic : average 0.71208... and period 0.86792...

Some enumeration problems Number of trees with n keys? Proposition (Odlyzko 82) Define E(z) as the g.f. enumerating 2-3 trees w.r.t. number n of leaves = number n-1 of keys in internal nodes E(z) = z + E(z 2 + z 3 ). Radius of convergence : golden ratio 1+ 5 2 Number of 2-3 trees with n leaves : e n ω(n) n ( 1 + 5 2 ) n (1 + O(1/n)), ω(n) periodic : average 0.71208... and period 0.86792... Similar result for general B-trees?

Some enumeration problems Number of trees with height h? Proposition (Reingold 79) The number a h of 2-3 trees with height h satisfies the recurrence relation a h+1 = a h 2 + a h 3 with a 0 = 2. It is asymptotically equal to ( )) 1 a h = κ (1 3h + O with κ = 2.30992632... 2 3h First values (h 0) : 2, 12, 1872, 6563711232,... Known sequence?

Some enumeration problems

Some enumeration problems Hanoi tower : start from 0000 1111 0000 1111 0000 1111 0000 1111 0000 1111 00000000 11111111 00000000 11111111 000000000000 111111111111 000000000000 111111111111 000000000000 111111111111 000000000000 111111111111 000000000000 111111111111 0000000000000000 1111111111111111 0000000000000000 1111111111111111 and move disks (never more than one) ; a disk may never be atop a smaller one ; the end result should be 0000 1111 0000 1111 0000 1111 0000 1111 0000 1111 00000000 11111111 00000000 11111111 000000000000 111111111111 000000000000 111111111111 000000000000 111111111111 000000000000 111111111111 000000000000 111111111111 0000000000000000 1111111111111111 0000000000000000 1111111111111111

Some enumeration problems The number of different non-self-crossing ways of moving a tower of Hanoi from one peg onto another peg, with h + 1 disks, is given by the recurrence a h+1 = a h 2 + a h 3 (a 0 = 2)

Some enumeration problems The number of different non-self-crossing ways of moving a tower of Hanoi from one peg onto another peg, with h + 1 disks, is given by the recurrence a h+1 = a h 2 + a h 3 (a 0 = 2) This is exactly the recurrence for the number of 2-3 trees of height h! bijection between 2-3 trees of height h and sequences of non-self-crossing ways to move h + 1 disks?

Some enumeration problems Leaf with one key move a single disk from initial to final peg in one step Leaf with two keys move a single disk from initial to final peg in two steps Recursive structure of the tree recursive sequence of disk moves

Some enumeration problems

Some enumeration problems

Some enumeration problems

Some enumeration problems

Some enumeration problems

Some enumeration problems

Some enumeration problems

Some enumeration problems

Some enumeration problems Quickest way to solve the Hanoi problem thinnest 2-3 tree Slowest way to solve it without redundant moves fattest 2-3 tree Number of disk moves = number of keys in the 2-3 tree Bottom disk at height 1 root at level 0 Number of moves of bottom disk = number of keys in the root node Number of moves of disk at height i 1 = Number of keys at level i

Some enumeration problems Number of trees with height h? Proposition Asymptotic number b h of B-trees with parameter m, height h ( )) (µ+1) b h = κ h 1 m (1 + O, (m + 1) (µ+1)h with µ = 2m or 2m 1 and κ m = v 0 l 0 where c 0 = m + 1 and c h+1 = c h µ+1 ( 1 + 1 c l +... + 1 c m l ) 1 (µ+1) h+1. ( 1 + 1 +... + 1 ). c h c m h

Pólya urns Back to insertion in a B-tree Can we analyze the evolution of a B-tree (as done for binary search trees)? Balancing condition an insertion can have far-reaching consequences : modify the ancestor nodes on a path up to the root plus the sister nodes We can analyze what happens at the lower level

Pólya urns Fringe : Terminal nodes, according to the number of keys in each of them A terminal node has type k when it contains exactly m + k 2 keys (1 k m + 1) There are m + k 1 distinct ways to insert a key in such a node

Pólya urns B-tree with n keys X (k) n G (k) n : number of terminal nodes of type k : number of ways to insert a key in nodes of type k X n = X (1) ṇ. X (m) n ; G n = m G n = DX n with D = m + 1 G (1) ṇ. G (m) n... 2m 1

Pólya urns number of insertion possibilities of type k in a tree with n keys G (k) n G n = G (1) ṇ. G (m) n and replacement matrix m m + 1 R m = is a Pólya urn with m colors, balance S = 1, (m + 1) m + 2... (2m 2) 2m 1 2m (2m 1) The eigenvalues satisfy the equation (λ + m)... (λ + 2m 1) = (2m)! m!.

Pólya urns Equation for eigenvalues λ j (λ + m)... (λ + 2m 1) = (2m)! m! λ 1 = 1 λ 2, λ 2 conjugate with maximal real part < 1 ; σ 2 := R(λ 2 ) m σ 2 57 0.4775726941 58 0.4866133472 59 0.4953467200 60 0.5037882018 61 0.5119521623 62 0.5198520971

Pólya urns Variation of σ 2 according to m

Pólya urns Theorem ( Janson) G n vector for insertion possibilities Gaussian if m 59 : G n nv 1 n towards G converges in distribution non Gaussian if m 60 ) G n = nv 1 + 2R (n λ 2 W v 2 + o (n σ 2 ) with v 1, v 2 are deterministic vectors W is the limit of a complex-valued martingale o( ) is for a.s. and in all L p, p 1.