x-fast and y-fast Tries

x-fast and y-fast Tries. Problem Set 7 is due in the box up front. That's the last problem set of the quarter!

Outline for Today. Bitwise Tries: a simple ordered dictionary for integers. x-fast Tries: Tries + Hashing. y-fast Tries: Tries + Hashing + Subdivision + Balanced Trees + Amortization.

Recap from Last Time

Ordered Dictionaries. An ordered dictionary is a data structure that maintains a set S of elements drawn from an ordered universe 𝒰 and supports these operations: insert(x), which adds x to S. is-empty(), which returns whether S = Ø. lookup(x), which returns whether x ∈ S. delete(x), which removes x from S. max() / min(), which returns the maximum or minimum element of S. successor(x), which returns the smallest element of S greater than x, and predecessor(x), which returns the largest element of S smaller than x.
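
To make the interface concrete, here is a minimal Python sketch of the ordered dictionary ADT as an abstract base class. The class and method names are my own, chosen to mirror the operations above; they are illustrative, not anything defined in the lecture.

```python
from abc import ABC, abstractmethod
from typing import Optional

class OrderedDictionary(ABC):
    """Hypothetical interface for the ordered dictionary ADT described above."""

    @abstractmethod
    def insert(self, x: int) -> None: ...                # add x to S

    @abstractmethod
    def is_empty(self) -> bool: ...                      # is S = Ø?

    @abstractmethod
    def lookup(self, x: int) -> bool: ...                # is x in S?

    @abstractmethod
    def delete(self, x: int) -> None: ...                # remove x from S

    @abstractmethod
    def min(self) -> Optional[int]: ...                  # smallest element of S, if any

    @abstractmethod
    def max(self) -> Optional[int]: ...                  # largest element of S, if any

    @abstractmethod
    def successor(self, x: int) -> Optional[int]: ...    # smallest element of S greater than x

    @abstractmethod
    def predecessor(self, x: int) -> Optional[int]: ...  # largest element of S smaller than x
```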

Integer Ordered Dictionaries. Suppose that 𝒰 = [U] = {0, 1, …, U - 1}. A van Emde Boas tree is an ordered dictionary for [U] where min, max, and is-empty run in time O(1). All other operations run in time O(log log U). Space usage is Θ(U). Question: Can we achieve these same time bounds without using Θ(U) space?

The Machine Model We assume a transdichotomous machine model: Memory is composed of words of w bits each. Basic arithmetic and bitwise operations on words take time O(1) each. w = Ω(log n).

A Start: Bitwise Tries

Tries Revisited. Recall: A trie is a simple data structure for storing strings. Integers can be thought of as strings of bits. Idea: Store integers in a bitwise trie.

Finding Successors. To compute successor(x), do the following: Search for x. If x is a leaf node, its successor is the next leaf. If you don't find x, back up until you find a node with a 1 child not already followed, follow the 1, then take the cheapest path down.
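
As a baseline, here is a minimal Python sketch (my own illustration, not code from the lecture) of a plain bitwise trie with exactly this successor walk: match x as far as possible, then either drop into the 1-subtree just to the right or back up to a node with an unfollowed 1 child and take the cheapest (0-preferring) path down. Every operation touches O(w) = O(log U) nodes.

```python
class TrieNode:
    __slots__ = ("child",)
    def __init__(self):
        self.child = [None, None]        # child[0] and child[1]

class BitwiseTrie:
    """Plain bitwise trie over w-bit integers; all operations run in O(w) = O(log U)."""

    def __init__(self, w):
        self.w = w
        self.root = TrieNode()

    def _bits(self, x):
        # Bits of x from most significant to least significant.
        return [(x >> i) & 1 for i in range(self.w - 1, -1, -1)]

    def insert(self, x):
        node = self.root
        for b in self._bits(x):
            if node.child[b] is None:
                node.child[b] = TrieNode()
            node = node.child[b]

    def lookup(self, x):
        node = self.root
        for b in self._bits(x):
            node = node.child[b]
            if node is None:
                return False
        return True

    def _min_leaf(self, node, prefix, depth):
        # "Cheapest path down": prefer the 0 child to reach the smallest leaf in the subtree.
        while depth < self.w:
            b = 0 if node.child[0] is not None else 1
            node, prefix, depth = node.child[b], (prefix << 1) | b, depth + 1
        return prefix

    def successor(self, x):
        """Smallest stored key strictly greater than x, or None."""
        bits = self._bits(x)
        path, node, prefix = [], self.root, 0
        for d, b in enumerate(bits):                      # phase 1: follow x as far as possible
            if node.child[b] is None:
                if b == 0 and node.child[1] is not None:  # x falls just left of this 1-subtree
                    return self._min_leaf(node.child[1], (prefix << 1) | 1, d + 1)
                break
            path.append((node, prefix, b))
            node, prefix = node.child[b], (prefix << 1) | b
        # phase 2: back up to a node with an unfollowed 1 child, then take the cheapest path down
        for d in range(len(path) - 1, -1, -1):
            anc, apre, taken = path[d]
            if taken == 0 and anc.child[1] is not None:
                return self._min_leaf(anc.child[1], (apre << 1) | 1, d + 1)
        return None
```

For example, after inserting 2, 9, and 12 into BitwiseTrie(4), successor(5) returns 9 and successor(12) returns None.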

Bitwise Tries When storing integers in [U], each integer will have Θ(log U) bits. Time for any of the ordered dictionary operations: O(log U). In order to match the time bounds of a van Emde Boas tree, we will need to speed this up exponentially.

Speeding up Successors. There are two independent pieces that contribute to the O(log U) runtime: Need to search for the deepest node matching x that we can. From there, need to back up to a node with an unfollowed 1 child and then descend to the next leaf. To speed this up to O(log log U), we'll need to work around each of these issues.

[Trie diagram: each node labeled with the bit prefix it represents.]

Claim 1: The node found during the first phase of a successor query for x corresponds to the longest prefix of x that appears in the trie.

Claim 2: If a node v corresponds to a prefix of x, all of v's ancestors correspond to prefixes of x.

Claim 3: If a node v does not correspond to a prefix of x, none of v's descendants correspond to prefixes of x.

Claim 4: The deepest node corresponding to a prefix of x can be found by doing a binary search over the layers of the trie.

One Speedup. Goal: Encode the trie so that we can do a binary search over its layers. One Solution: Store an array of cuckoo hash tables, one per layer of the trie, each storing all the nodes in that layer. We can now query, in worst-case time O(1), whether a given prefix is present on a given layer. There are O(log U) layers in the trie, so the binary search takes worst-case time O(log log U). Nice side effect: lookup queries are now worst-case O(1), since we can just check the hash table at the bottom layer.
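
Here is a small Python sketch of that binary search, with plain sets standing in for the per-layer cuckoo hash tables; the function name and the `levels` layout are illustrative assumptions, not from the lecture.

```python
def longest_prefix_depth(levels, x, w):
    """Depth of the deepest stored prefix of the w-bit key x.

    levels[i] is a hash set containing every length-i prefix present in the
    trie (levels[0] is {0}, levels[w] holds the keys themselves). Because
    prefix presence is monotone in depth (Claims 2 and 3), binary search
    works, using O(log w) = O(log log U) worst-case O(1) membership tests.
    """
    lo, hi = 0, w
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if (x >> (w - mid)) in levels[mid]:
            lo = mid              # prefix of length mid is present: answer is at least mid
        else:
            hi = mid - 1          # absent: no deeper prefix can be present either
    return lo

# Example with w = 4 and keys {2, 9, 12}:
#   levels = [{x >> (4 - i) for x in (2, 9, 12)} for i in range(5)]
#   longest_prefix_depth(levels, 5, 4) == 1   (only the prefix "0" of 0101 is stored)
```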

The Next Issue We can now find the node where the successor search would initially arrive. However, after arriving there, we have to back up to a node with a 1 child we didn't follow on the path down. This will take time O(log U). Can we do better?

A Useful Observation Our binary search for the longest prefix of x will either stop at a leaf node (so x is present), or an internal node. If we stop at a leaf node, the successor will be the next leaf in the trie. Idea: Thread a doubly-linked list through the leaf nodes.

Successors of Internal Nodes. Claim: If the binary search terminates at an internal node, that node must only have one child. If it doesn't, it has both a 0 child and a 1 child, so there's a longer prefix that can be matched. Idea: Steal the missing pointer and use it to speed up successor and predecessor searches.

Threaded Binary Tries. A threaded binary trie is a binary trie where each missing 0 pointer points to the inorder predecessor of the node and each missing 1 pointer points to the inorder successor of the node. Related to threaded binary search trees; read up on them if you're curious!

x-fast Tries. An x-fast Trie is a threaded binary trie where leaves are stored in a doubly-linked list and where all nodes in each level are stored in a hash table. Can do lookups in time O(1).

x-fast Tries. Claim: Can determine successor(x) in time O(log log U). Start by binary searching for the longest prefix of x. If at a leaf node, follow the forward pointer to the successor. If at an internal node with a missing 1 child, follow the 1 thread. If at an internal node with a missing 0 child, follow the 0 thread and then follow the forward pointer.
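
Putting the pieces together, here is a static Python sketch of the successor query. Per-level sets and dicts stand in for the cuckoo hash tables and the thread pointers, and a sorted list stands in for the doubly-linked leaf list, so only the query logic is shown, not dynamic updates. The class name and the construction code are my own illustrative assumptions, not code from the lecture.

```python
from bisect import bisect_left, bisect_right

class XFastSuccessorSketch:
    """Static x-fast-trie-style successor structure over w-bit keys."""

    def __init__(self, keys, w):
        self.w = w
        self.leaves = sorted(set(keys))                  # stands in for the leaf linked list
        self.level = [set() for _ in range(w + 1)]       # level[i] = stored prefixes of length i
        self.thread0 = {}   # (depth, prefix) -> inorder predecessor leaf, for a missing 0 child
        self.thread1 = {}   # (depth, prefix) -> inorder successor leaf,   for a missing 1 child
        for x in self.leaves:
            for i in range(w + 1):
                self.level[i].add(x >> (w - i))
        for i in range(w):                               # set up threads at internal levels
            for p in self.level[i]:
                lo, hi = p << (w - i), ((p + 1) << (w - i)) - 1    # key range under prefix p
                if (p << 1) not in self.level[i + 1]:              # 0 child missing
                    j = bisect_left(self.leaves, lo) - 1
                    if j >= 0:
                        self.thread0[(i, p)] = self.leaves[j]
                if (p << 1) | 1 not in self.level[i + 1]:          # 1 child missing
                    j = bisect_right(self.leaves, hi)
                    if j < len(self.leaves):
                        self.thread1[(i, p)] = self.leaves[j]

    def _longest_prefix_depth(self, x):
        # Binary search over the levels (Claim 4), as in the previous sketch.
        lo, hi = 0, self.w
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if (x >> (self.w - mid)) in self.level[mid]:
                lo = mid
            else:
                hi = mid - 1
        return lo

    def successor(self, x):
        """Smallest stored key strictly greater than x, or None."""
        d = self._longest_prefix_depth(x)
        if d == self.w:                                  # landed on a leaf: x itself is stored
            j = bisect_right(self.leaves, x)             # "follow the forward pointer"
            return self.leaves[j] if j < len(self.leaves) else None
        p = x >> (self.w - d)
        b = (x >> (self.w - d - 1)) & 1                  # the child x would take next: it is missing
        if b == 1:                                       # missing 1 child: follow the 1 thread
            return self.thread1.get((d, p))              # None means x exceeds every stored key
        pred = self.thread0.get((d, p))                  # missing 0 child: 0 thread, then forward ptr
        j = bisect_right(self.leaves, pred) if pred is not None else 0
        return self.leaves[j] if j < len(self.leaves) else None
```

For example, XFastSuccessorSketch([2, 9, 12], w=4).successor(5) returns 9, found by following the 1 thread out of the node for the prefix 0.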

x-fast Trie Maintenance. Based on what we've seen: Lookups take worst-case time O(1). Successor and predecessor queries take worst-case time O(log log U). Min and max can be done in time O(log log U) by finding the predecessor of +∞ or the successor of -∞. How efficiently can we support insertions and deletions?

x-fast Tries. If we insert(x), we need to: Add some new nodes to the trie. Wire x into the doubly-linked list of leaves. Update the thread pointers to include x. Worst-case will be Ω(log U) due to the first and third steps.

x-fast Tries. Here is an (amortized, expected) O(log U) time algorithm for insert(x): Find successor(x). Add x to the trie. Using the successor from before, wire x into the linked list. Walk up from x, its successor, and its predecessor and update threads.

Deletion To delete(x), we need to Remove x from the trie. Splice x out of its linked list. Update thread pointers from x's former predecessor and successor. Runs in expected, amortized time O(log U). Full details are left as a proverbial Exercise to the Reader.

Space Usage How much space is required in an x-fast trie? Each leaf node contributes at most O(log U) nodes in the trie. Total space usage for hash tables is proportional to total number of trie nodes. Total space: O(n log U).

For Reference.
van Emde Boas tree: insert O(log log U), delete O(log log U), lookup O(log log U), max O(1), succ O(log log U), is-empty O(1), space O(U).
x-fast trie: insert O(log U)*, delete O(log U)*, lookup O(1), max O(log log U), succ O(log log U), is-empty O(1), space O(n log U).
(* expected, amortized)

What Remains. We need to speed up insert and delete to run in time O(log log U). We'd like to drop the space usage down to O(n). How can we do this?
(x-fast trie so far: insert O(log U)*, delete O(log U)*, lookup O(1), max O(log log U), succ O(log log U), is-empty O(1), space O(n log U); * expected, amortized.)

Time-Out for Announcements!

Midterm: Tonight, 7PM to 10PM. Good luck!

Your Questions!

Can you release solutions to PS7 at the end of class so that we can review them before the exam? Yep! I'll hand them out at the end of lecture.

What is your process when writing homework/practice/test/etc. problems? How do you come up with them? For the theory questions, I mostly read everything I can get my hands on. For the coding questions, I try picking ones that either solidify details or have an unexpected result.

What are the mean scores on the assignments? What's the grade curve going to be like on the class overall? Trying to figure out whether to switch to C/NC... <3 I honestly don't know. The curve will depend on how the midterm and final projects end up turning out.

Why is there a problem set due the same day as the exam? It's the best out of a lot of not particularly good options.

Back to CS166!

y-fast Tries

y-fast Tries. The y-fast Trie is a data structure that will match the vEB time bounds in an expected, amortized sense while requiring only O(n) space. It's built out of an x-fast trie and a collection of red/black trees.

The Motivating Idea Suppose we have a red/black tree with Θ(log U) nodes. Any ordered dictionary operation on the tree will then take time O(log log U). Idea: Store the elements in the ordered dictionary in a collection of red/black trees with Θ(log U) elements each.

The Idea. Each of these trees has between ½ log U and 2 log U nodes.

The Idea. To perform lookup(x), we determine which tree would contain x, then check there.

The Idea. If a tree gets too big, we can split it into two trees by cutting at the median element.

The Idea. Similarly, if trees get too small, we can concatenate the tree with a neighbor.

The Idea. That might create a tree that's too big, in which case we split it in half.

The Idea. To determine successor(x), we find the tree that would contain x, and take its successor there or the minimum value from the next tree.

The Idea. How do we efficiently determine which tree a given element belongs to?

The Idea. To do lookup(x), find successor(x) in the set of maxes, then go into the preceding tree.

The Idea. To determine successor(x), find successor(x) in the maxes, then return the successor of x in that subtree or the min of the next subtree.

The Idea. To insert(x), compute successor(x) and insert x into the tree before it. If the tree splits, insert a new max into the top list.

The Idea. To delete(x), do a lookup for x and delete it from that tree. If x was the max of a tree, don't delete it from the top list. Contract trees if necessary.

The Idea. How do we store the set of maxes so that we get efficient successor queries?

y-fast Tries A y-fast Trie is constructed as follows: Keys are stored in a collection of red/black trees, each of which has between ½ log U and 2 log U keys. From each tree (except the first), choose a representative element. Representatives demarcate the boundaries between trees. Store each representative in the x-fast trie. Intuitively: The x-fast trie helps locate which red/black trees need to be consulted for an operation. Most operations are then done on red/black trees, which then take time O(log log U) each.
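
To make the layering concrete, here is a static Python sketch in which sorted lists stand in for both the red/black trees and the x-fast trie of representatives (so the representative search here costs O(log n) rather than O(log log U)). The class name, the bucket layout, and the choice of each bucket's maximum as its representative are illustrative assumptions of mine, not details fixed by the lecture.

```python
from bisect import bisect_left, bisect_right

class YFastSketch:
    """Static sketch of the y-fast layering over w-bit keys."""

    def __init__(self, keys, w):
        keys = sorted(set(keys))
        b = max(1, w)                                    # target bucket size Θ(log U) = Θ(w)
        # Each bucket stands in for one red/black tree holding Θ(log U) keys.
        self.buckets = [keys[i:i + b] for i in range(0, len(keys), b)] or [[]]
        # The representatives (here: each bucket's max) stand in for the x-fast trie contents.
        self.reps = [bucket[-1] for bucket in self.buckets if bucket]

    def _bucket_for(self, x):
        # Successor search among the representatives picks the one tree that could contain x.
        i = bisect_left(self.reps, x)
        return min(i, len(self.buckets) - 1)

    def lookup(self, x):
        b = self.buckets[self._bucket_for(x)]
        j = bisect_left(b, x)
        return j < len(b) and b[j] == x

    def successor(self, x):
        """Smallest stored key strictly greater than x, or None."""
        i = self._bucket_for(x)
        j = bisect_right(self.buckets[i], x)
        if j < len(self.buckets[i]):
            return self.buckets[i][j]                    # successor inside that tree
        if i + 1 < len(self.buckets):
            return self.buckets[i + 1][0]                # otherwise the min of the next tree
        return None
```

For example, YFastSketch(range(0, 100, 7), w=8).successor(20) returns 21, found by locating the bucket whose representative is the first one at least 20 and then searching inside that bucket.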

Analyzing y-fast Tries. The operations lookup, successor, min, and max can all be implemented by doing O(1) BST operations and one call to successor in the x-fast trie. Total runtime: O(log log U). insert and delete do O(1) BST operations, but also have to do O(1) insertions or deletions into the x-fast trie. Total runtime: O(log U)… or is it?

Analyzing y-fast Tries. Each insertion does O(log log U) work inserting into and (potentially) splitting a red/black tree. The insertion into the x-fast trie takes time O(log U). However, we only split a red/black tree if its size has doubled from log U to 2 log U, so we must have done at least Ω(log U) insertions before we needed to split. The extra cost amortizes across those operations to O(1), so the amortized cost of an insertion is O(log log U).
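
As a quick sanity check on that charging argument (my own arithmetic, not a slide from the lecture), the amortized insertion cost works out to:

\[
\underbrace{O(\log\log U)}_{\text{red/black insert and split}} \;+\; \underbrace{\frac{O(\log U)}{\Omega(\log U)}}_{\text{x-fast trie insert, charged over the insertions between splits}} \;=\; O(\log\log U) + O(1) \;=\; O(\log\log U).
\]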

Analyzing y-fast Tries. Each deletion does O(log log U) work deleting from a red/black tree, (potentially) joining it with a neighbor, and (potentially) splitting the resulting red/black tree. The insertions and deletions in the x-fast trie take time at most O(log U). However, we only join a tree with its neighbor if its size has dropped from log U to ½ log U, which means there were Ω(log U) intervening deletions. The extra cost amortizes across those operations to O(1), so the amortized cost of a deletion is O(log log U).

Space Usage So what about space usage? Total space used across all the red/black trees is O(n). The x-fast trie stores Θ(n / log U) total elements. Space usage: Θ((n / log U) log U) = Θ(n). We're back down to linear space!

For Reference.
van Emde Boas tree: insert O(log log U), delete O(log log U), lookup O(log log U), max O(1), succ O(log log U), is-empty O(1), space O(U).
y-fast trie: insert O(log log U)*, delete O(log log U)*, lookup O(log log U), max O(log log U), succ O(log log U), is-empty O(1), space O(n).
(* expected, amortized)

What We Needed An x-fast trie requires tries and cuckoo hashing. The y-fast trie requires amortized analysis and split/join on balanced, augmented BSTs. y-fast tries also use the blocking technique from RMQ we used to shave off log factors.

Next Time. Disjoint-Set Forests: a data structure for incremental connectivity in general graphs. The Inverse Ackermann Function: one of the slowest-growing functions you'll ever encounter in practice.

Why All This Matters

Best of luck on the exam!