CSE 332: Data Structures & Parallelism Lecture 7: Dictionaries; Binary Search Trees. Ruth Anderson Winter 2019

Similar documents
CSE373: Data Structures & Algorithms Lecture 4: Dictionaries; Binary Search Trees. Kevin Quinn Fall 2015

Where we are. CSE373: Data Structures & Algorithms Lecture 4: Dictionaries; Binary Search Trees. The Dictionary (a.k.a. Map) ADT

CSE373: Data Structures & Algorithms Lecture 5: Dictionaries; Binary Search Trees. Aaron Bauer Winter 2014

CSE373: Data Structures & Algorithms Lecture 6: Binary Search Trees. Linda Shapiro Spring 2016

CSE373: Data Structures & Algorithms Lecture 5: Dictionary ADTs; Binary Trees. Linda Shapiro Spring 2016

Binomial Queue deletemin ADTs Seen So Far

CSE 326: Data Structures Binary Search Trees

Priority Queues Heaps Heapsort

CSE332: Data Abstractions Lecture 7: B Trees. James Fogarty Winter 2012

Lecture 23: Binary Search Trees

CSE 326: Data Structures Lecture #8 Binary Search Trees

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues and Binary Heaps. Linda Shapiro Spring 2016

CSE 373 OCTOBER 11 TH TRAVERSALS AND AVL

CSE373: Data Structures & Algorithms Lecture 28: Final review and class wrap-up. Nicki Dell Spring 2014

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues. Aaron Bauer Winter 2014

CSE 332: Data Structures & Parallelism Lecture 3: Priority Queues. Ruth Anderson Winter 2019

Trees. (Trees) Data Structures and Programming Spring / 28

Binary Trees. CSE 326: Data Structures Lecture #7 Binary Search Trees. Naïve Implementations. Dictionary & Search ADTs

Recall: Properties of B-Trees

CSE373: Data Structures & Algorithms Lecture 9: Priority Queues and Binary Heaps. Nicki Dell Spring 2014

Readings. Priority Queue ADT. FindMin Problem. Priority Queues & Binary Heaps. List implementation of a Priority Queue

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

Self-Balancing Search Trees. Chapter 11

CSE 373 APRIL 17 TH TREE BALANCE AND AVL

CMSC 341 Lecture 14: Priority Queues, Heaps

UNIT III BALANCED SEARCH TREES AND INDEXING

CS24 Week 8 Lecture 1

Balanced Binary Search Trees. Victor Gao

Trees. CSE 373 Data Structures

3. Priority Queues. ADT Stack : LIFO. ADT Queue : FIFO. ADT Priority Queue : pick the element with the lowest (or highest) priority.

Announcements. Lilian s office hours rescheduled: Fri 2-4pm HW2 out tomorrow, due Thursday, 7/7. CSE373: Data Structures & Algorithms

Binary Heaps. COL 106 Shweta Agrawal and Amit Kumar

CPSC 221: Data Structures Lecture #5. CPSC 221: Data Structures Lecture #5. Learning Goals. Today s Outline. Tree Terminology.

3137 Data Structures and Algorithms in C++

Priority Queues Heaps Heapsort

Binary Search Trees. Analysis of Algorithms

CSE 373 Autumn 2010: Midterm #1 (closed book, closed notes, NO calculators allowed)

Sorted Arrays. Operation Access Search Selection Predecessor Successor Output (print) Insert Delete Extract-Min

Course Review. Cpt S 223 Fall 2009

Hash Tables. CS 311 Data Structures and Algorithms Lecture Slides. Wednesday, April 22, Glenn G. Chappell

CS301 - Data Structures Glossary By

Chapter 5 Data Structures Algorithm Theory WS 2016/17 Fabian Kuhn

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Computer Science 210 Data Structures Siena College Fall Topic Notes: Trees

B-Trees. Version of October 2, B-Trees Version of October 2, / 22

Binary Search Trees Treesort

COSC 2007 Data Structures II Final Exam. Part 1: multiple choice (1 mark each, total 30 marks, circle the correct answer)

Announcements. Midterm exam 2, Thursday, May 18. Today s topic: Binary trees (Ch. 8) Next topic: Priority queues and heaps. Break around 11:45am

CSE 373 Winter 2009: Midterm #1 (closed book, closed notes, NO calculators allowed)

Uses for Trees About Trees Binary Trees. Trees. Seth Long. January 31, 2010

CSE 373: Data Structures and Algorithms

Computational Optimization ISE 407. Lecture 16. Dr. Ted Ralphs

Section 4 SOLUTION: AVL Trees & B-Trees

CS350: Data Structures Heaps and Priority Queues

CSE 326: Data Structures Splay Trees. James Fogarty Autumn 2007 Lecture 10

CSE373 Fall 2013, Midterm Examination October 18, 2013

Algorithms. AVL Tree

Priority Queues and Heaps

/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Priority Queues / Heaps Date: 9/27/17

Programming Abstractions

Tree traversals and binary trees

Data Structure. A way to store and organize data in order to support efficient insertions, queries, searches, updates, and deletions.

TREES. Trees - Introduction

! Tree: set of nodes and directed edges. ! Parent: source node of directed edge. ! Child: terminal node of directed edge

CMSC351 - Fall 2014, Homework #2

(2,4) Trees. 2/22/2006 (2,4) Trees 1

Data Structures and Algorithms

Design and Analysis of Algorithms

CMSC 341 Priority Queues & Heaps. Based on slides from previous iterations of this course

CS350: Data Structures Red-Black Trees

CPSC 221: Algorithms and Data Structures ADTs, Stacks, and Queues

Learning Goals. CS221: Algorithms and Data Structures Lecture #3 Mind Your Priority Queues. Today s Outline. Back to Queues. Priority Queue ADT

CPSC 221: Algorithms and Data Structures Lecture #1: Stacks and Queues

Priority Queues. Lecture15: Heaps. Priority Queue ADT. Sequence based Priority Queue

Binary Trees, Binary Search Trees

Announcements. Problem Set 2 is out today! Due Tuesday (Oct 13) More challenging so start early!

INF2220: algorithms and data structures Series 1

Spring 2018 Mentoring 8: March 14, Binary Trees

Programming Abstractions

Course goals. exposure to another language. knowledge of specific data structures. impact of DS design & implementation on program performance

CSE373: Data Structures & Algorithms Priority Queues

Cpt S 223 Fall Cpt S 223. School of EECS, WSU

Priority Queues. Binary Heaps

Course Review. Cpt S 223 Fall 2010

COMP Analysis of Algorithms & Data Structures

Advanced Set Representation Methods

Taking Stock. IE170: Algorithms in Systems Engineering: Lecture 7. (A subset of) the Collections Interface. The Java Collections Interfaces

B+ Tree Review. CSE332: Data Abstractions Lecture 10: More B Trees; Hashing. Can do a little better with insert. Adoption for insert

(2,4) Trees Goodrich, Tamassia (2,4) Trees 1

Lecture 13: AVL Trees and Binary Heaps

Binary Heaps. CSE 373 Data Structures Lecture 11

CSE 373 Spring 2010: Midterm #1 (closed book, closed notes, NO calculators allowed)

3137 Data Structures and Algorithms in C++

Data Structure Advanced

COMP Analysis of Algorithms & Data Structures

CE 221 Data Structures and Algorithms

Algorithms in Systems Engineering ISE 172. Lecture 16. Dr. Ted Ralphs

Priority Queues (Heaps)

Multi-way Search Trees! M-Way Search! M-Way Search Trees Representation!

Queues. ADT description Implementations. October 03, 2017 Cinda Heeren / Geoffrey Tien 1

Transcription:

CSE 332: Data Structures & Parallelism Lecture 7: Dictionaries; Binary Search Trees Ruth Anderson Winter 2019

Today Dictionaries Trees 1/23/2019 2

Where we are Studying the absolutely essential ADTs of computer science and classic data structures for implementing them ADTs so far: 1. Stack: push, pop, isempty, 2. Queue: enqueue, dequeue, isempty, 3. Priority queue: insert, deletemin, Next: 4. Dictionary (a.k.a. Map): associate keys with values probably the most common, way more than priority queue 1/23/2019 3

The Dictionary (a.k.a. Map) ADT Data: set of (key, value) pairs keys must be comparable insert ( rea, Ruth Anderson) Operations: insert(key,val): - places (key,val) in map find (jhsia) (If key already used, overwrites existing entry) Justin Hsia, find(key): - returns val associated with key delete(key) rea Ruth Anderson jhsia Justin Hsia We will tend to emphasize the keys, but 1/23/2019 don t forget about the stored values! 4

Comparison: Set ADT vs. Dictionary ADT The Set ADT is like a Dictionary without any values A key is present or not (no repeats) For find, insert, delete, there is little difference In dictionary, values are just along for the ride So same data-structure ideas work for dictionaries and sets Java HashSet implemented using a HashMap, for instance Set ADT may have other important operations union, intersection, is_subset, etc. Notice these are binary operators on sets We will want different data structures to implement these operators 1/23/2019 5

A Modest Few Uses for Dictionaries Any time you want to store information according to some key and be able to retrieve it efficiently a dictionary is the ADT to use! Lots of programs do that! Networks: router tables Operating systems: page tables Compilers: symbol tables Databases: dictionaries with other nice properties Search: inverted indexes, phone directories, Biology: genome maps 1/23/2019 6

Simple implementations For dictionary with n key/value pairs Unsorted linked-list insert find delete Unsorted array Sorted linked list Sorted array We ll see a Binary Search Tree (BST) probably does better, but not in the worst case unless we keep it balanced 1/23/2019 7

Lazy Deletion (e.g. in a sorted array) A general technique for making delete as fast as find: Instead of actually removing the item just mark it deleted No need to shift values, etc. Plusses: Simpler Can do removals later in batches If re-added soon thereafter, just unmark the deletion Minuses: 10 12 24 30 41 42 44 45 50 Extra space for the is-it-deleted flag Data structure full of deleted nodes wastes space find O(log m) time where m is data-structure size (m >= n ) May complicate other operations 1/23/2019 9

Better Dictionary data structures Will spend the next several lectures looking at dictionaries with three different data structures 1. AVL trees Binary search trees with guaranteed balancing 2. B-Trees Also always balanced, but different and shallower B!=Binary; B-Trees generally have large branching factor 3. Hashtables Not tree-like at all Skipping: Other balanced trees (red-black, splay) 1/23/2019 10

Why Trees? Trees offer speed ups because of their branching factors Binary Search Trees are structured forms of binary search 1/23/2019 11

Binary Search find(4) 1 3 4 5 7 8 9 10 1/23/2019 12

Binary Search Tree Our goal is the performance of binary search in a tree representation 1 3 4 5 7 8 9 10 1/23/2019 13

Why Trees? Trees offer speed ups because of their branching factors Binary Search Trees are structured forms of binary search Even a basic BST is fairly good Insert Find Delete Worse-Case O(n) O(n) O(n) Average-Case O(log n) O(log n) O(log n) 1/23/2019 14

Binary Trees Binary tree is empty or a root (with data) a left subtree (maybe empty) a right subtree (maybe empty) A Representation: B C Data D E F left pointer right pointer G H For a dictionary, data will include a key and a value I J 1/23/2019 15

Binary Tree: Some Numbers Recall: height of a tree = longest path from root to leaf (count # of edges) For binary tree of height h: max # of leaves: max # of nodes: min # of leaves: min # of nodes: 1/23/2019 16

Calculating height What is the height of a tree with root root? int treeheight(node root) {??? } 1/23/2019 18

Tree Traversals A traversal is an order for visiting all the nodes of a tree Pre-order: root, left subtree, right subtree + In-order: left subtree, root, right subtree * 5 Post-order: left subtree, right subtree, root 2 4 (an expression tree) 1/23/2019 20

More on traversals void inordertraversal(node t){ if(t!= null) { traverse(t.left); process(t.element); traverse(t.right); } } D B E A F C G Sometimes order doesn t matter Example: sum all elements Sometimes order matters Example: print tree with parent above indented children (pre-order) Example: evaluate an expression tree (post-order) A B C D E F G 1/23/2019 22

Binary Search Tree Structural property ( binary ) each node has 2 children result: keeps operations simple 8 Order property all keys in left subtree smaller than node s key all keys in right subtree larger than node s key 2 5 6 10 11 12 result: easy to find any given key 4 7 9 14 13 1/23/2019 23

Are these BSTs? 5 8 4 8 5 11 1 7 11 2 7 6 10 18 3 4 15 20 21 1/23/2019 24

Find in BST, Recursive 2 5 12 9 15 7 17 10 20 30 Data find(key key, Node root){ if(root == null) return null; if(key < root.key) return find(key,root.left); if(key > root.key) return find(key,root.right); return root.data; } 1/23/2019 26

Find in BST, Iterative 2 5 12 9 15 7 17 10 20 30 Data find(key key, Node root){ while(root!= null && root.key!= key) { if(key < root.key) root = root.left; else(key > root.key) root = root.right; } if(root == null) return null; return root.data; } 1/23/2019 27

Other finding operations Find minimum node 12 Find maximum node 5 15 2 9 20 7 17 10 30 1/23/2019 28

Insert in BST 12 5 15 insert(13) insert(8) insert(31) 2 9 20 7 10 17 30 (New) insertions happen only at leaves easy! 1. Find 2. Create a new node 1/23/2019 29

Deletion in BST 12 5 15 2 9 20 7 10 17 30 Why might deletion be harder than insertion? 1/23/2019 30

Deletion Removing an item disrupts the tree structure Basic idea: find the node to be removed, Remove it fix the tree so that it is still a binary search tree Three cases: node has no children (leaf) node has one child node has two children 1/23/2019 31

Deletion The Leaf Case delete(17) 12 5 15 2 9 20 7 10 17 30 1/23/2019 32

Deletion The One Child Case delete(15) 12 5 15 2 9 20 7 10 30 1/23/2019 33

Deletion The Two Child Case delete(5) 5 12 20 2 9 30 7 10 What can we replace 5 with? 1/23/2019 34

Deletion The Two Child Case Idea: Replace the deleted node with a value guaranteed to be between the two child subtrees Options: successor from right subtree: findmin(node.right) predecessor from left subtree: findmax(node.left) These are the easy cases of predecessor/successor Now delete the original node containing successor or predecessor Leaf or one child case easy cases of delete! 1/23/2019 35

Delete Using Successor 12 12 5 20 7 20 2 9 30 2 9 30 7 10 10 findmin(right sub tree) 7 delete(5) 1/23/2019 36

Delete Using Predecessor 12 12 5 20 2 20 2 9 30 9 30 7 10 7 10 findmax(left sub tree) 2 delete(5) 1/23/2019 37

BuildTree for BST We had buildheap, so let s consider buildtree Insert keys 1, 2, 3, 4, 5, 6, 7, 8, 9 into an empty BST If inserted in given order, what is the tree? 1 What big-o runtime for this kind of sorted input? Is inserting in the reverse order any better? 2 3 1/23/2019 38

Balanced BST Observation BST: the shallower the better! For a BST with n nodes inserted in arbitrary order Average height is O(log n) see text for proof Worst case height is O(n) Simple cases such as inserting in key order lead to the worst-case scenario Solution: Require a Balance Condition that 1. ensures depth is always O(log n) strong enough! 2. is easy to maintain not too strong! 1/23/2019 40

Potential Balance Conditions 1. Left and right subtrees of the root have equal number of nodes 2. Left and right subtrees of the root have equal height 1/23/2019 41

Potential Balance Conditions 3. Left and right subtrees of every node have equal number of nodes 4. Left and right subtrees of every node have equal height 1/23/2019 43

The AVL Balance Condition Left and right subtrees of every node have heights differing by at most 1 Definition: balance(node) = height(node.left) height(node.right) AVL property: for every node x, 1 balance(x) 1 Ensures small depth Will prove this by showing that an AVL tree of height h must have a number of nodes exponential in h Easy (well, efficient) to maintain Using single and double rotations 1/23/2019 45