Trees Prof. Dr. Debora Weber-Wulff Flickr, _marmota, 2007
Major Sources: Mitchell Waite & Robert Lafore, Data Structures & Algorithms in Java; Michael T. Goodrich and Roberto Tamassia, Data Structures and Algorithms in Java
Why use trees? You can search quickly in ordered arrays, but inserting is a pain. You can easily insert into a linked list, but searching is a pain. Is there a structure that affords quick searching and easy insertion? Yes, that's what a tree offers!
Trees Computer science trees grow the "wrong way round"!
Trees Root 1. One node is the root. The circles are nodes, the arrows edges.
Trees Parent Child 1. One node is the root. 2. Every node c except the root has exactly one parent p. c is called p's child.
Trees Path 1. One node is the root. 2. Every node c except the root has exactly one parent p. c is called p's child. 3. There is a unique path from the root to each node.
Trees Leaf 1. One node is the root. 2. Every node c except the root has exactly one parent p. c is called p's child. 3. There is a unique path from the root to each node. 4. Childless nodes are called leaves.
Trees 1. One node is the root. 2. Every node c except the root has exactly one parent p. c is called p's child. 3. There is a unique path from the root to each node. 4. Childless nodes are called leaves.
Tree-based structure File System My Computer A: C: E: dww Programs Tmp Windows CS101 CS102
Tree-based structure My Computer A: C: dww CS101 CS102 Programs Line-at-a-time Tree
Tree-based structure AnimationFigure Head Body Torso Chest Clothing Legs IDGD, HTW Berlin, 2012
Tree-based structure Book Chapter 1 Chapter 2 Chapter 2.2 Chapter 2.2.1 Chapter 2.2.2 Chapter 2.3 CC-BY-SA, Debora Weber-Wulff
Binary Search Tree Each node has at most two children. D B G A C E F Useful for searching Comparable objects. The objects are stored in an ordered fashion.
Binary Search Tree D B G A C E F Simple implementation → average case O(log N), worst case O(N) (degenerate tree). Careful implementation → worst case O(log N)
The Node Class for Trees
public class Node {
    Comparable data;
    Node leftchild;
    Node rightchild;
    public String displaynode() { /* ... */ }
}
The Tree Class
public class Tree {
    private Node root;
    public Node find(Comparable obj) { }
    public void insert(Comparable obj) { }
    public void delete(Comparable obj) { }
    public boolean isempty() { return root == null; }
}
Finding a Node
public Node find(Comparable obj) {
    Node curr = root;
    while (curr != null) {
        int cmp = curr.data.compareTo(obj);
        if (cmp > 0) curr = curr.leftchild;   // obj is smaller: go left
        else if (cmp == 0) return curr;       // found it
        else curr = curr.rightchild;          // obj is larger: go right
    }
    return null; // didn't find it
}
(Note: compareTo only guarantees the sign of its result, so a switch on the exact values -1/0/1 is not safe.)
Finding a Node recursively (Step 1)
public Node find(Comparable obj) {
    return find(obj, root);
}
Finding a Node recursively (Step 2)
public Node find(Comparable obj, Node n) {
    if (n == null) return null;
    int cmp = n.data.compareTo(obj);
    if (cmp > 0) return find(obj, n.leftchild);
    if (cmp == 0) return n;
    return find(obj, n.rightchild);
}
Complexity of finding a node? At each step of the loop or the recursion we look at only half of the remaining nodes (assuming the tree is balanced). This makes the complexity O(log N).
Inserting a new Node 1. Create the node, call it newnode. 2. If root == null, then root = newnode. 3. Set current = root. 4. while(true) do ...
Inserting a new Node 4. while(true) do: 4.1. Remember current as parent. 4.2. If current > obj, go left; if current is now null, parent.leftchild = newnode; return. 4.3. Else go right; if current is now null, parent.rightchild = newnode; return. Worst case? O(N)
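The steps above can be sketched in Java, reusing the slides' Node fields (the Node constructor and the Tree wrapper shown here are assumptions, not part of the slides):

```java
// Sketch of BST insertion following steps 1-4 above.
// Uses the slides' field names (data, leftchild, rightchild);
// the constructor is an added convenience.
class Node {
    Comparable data;
    Node leftchild, rightchild;
    Node(Comparable d) { data = d; }
}

class Tree {
    Node root;

    void insert(Comparable obj) {
        Node newnode = new Node(obj);                 // 1. create the node
        if (root == null) { root = newnode; return; } // 2. empty tree: newnode is the root
        Node current = root;                          // 3. start at the root
        while (true) {                                // 4. walk down to the insertion point
            Node parent = current;                    // 4.1 remember the parent
            if (current.data.compareTo(obj) > 0) {    // 4.2 obj is smaller: go left
                current = current.leftchild;
                if (current == null) { parent.leftchild = newnode; return; }
            } else {                                  // 4.3 otherwise: go right
                current = current.rightchild;
                if (current == null) { parent.rightchild = newnode; return; }
            }
        }
    }
}
```

Inserting D, B, A, F, G in that order produces D at the root with B and F as its children.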
Result of Insertion Order: D, B, A, F, G, C, E, H D B F A C E G H
The order is important! Order: E, B, A, D, G, C, F, H E B G A D F H C
An Animation
A Degenerate Tree Order: A, B, C, D, E, F, G, H A B C D E F G H We'll see how to avoid this problem next lecture.
Tree Traversal If you want to do something for every element in a tree, you need tree traversal, or tree-walking. This means that you visit all of the nodes according to a pre-specified rule. There are four different kinds: in-order, pre-order, post-order, level-order.
Tree Traversal (example: tree with root b, left child a, right child c) In-order: a b c. Pre-order: b a c. Post-order: a c b. Level-order: b a c. (A second diagram shows a larger example tree with nodes a through f.)
Traversing the Tree
private void inorder(Node local) {
    if (local != null) {
        inorder(local.leftchild);
        local.displaynode();
        inorder(local.rightchild);
    }
}
Application: mytree.inorder(root)
Traversing the Tree
In-order: 1. left subtree, 2. visit, 3. right subtree
Pre-order: 1. visit, 2. left subtree, 3. right subtree
Post-order: 1. left subtree, 2. right subtree, 3. visit
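Following the same pattern as the in-order method, the other two depth-first orders differ only in where the visit happens. A minimal sketch (the TNode class and the StringBuilder used to record visits are illustration-only assumptions):

```java
// Sketch of pre-order and post-order traversal; visits are
// recorded in a StringBuilder so the order can be inspected.
class TNode {
    String data;
    TNode left, right;
    TNode(String d, TNode l, TNode r) { data = d; left = l; right = r; }
}

class Traversals {
    // visit the node before both subtrees
    static void preorder(TNode n, StringBuilder out) {
        if (n == null) return;
        out.append(n.data);
        preorder(n.left, out);
        preorder(n.right, out);
    }

    // visit the node after both subtrees
    static void postorder(TNode n, StringBuilder out) {
        if (n == null) return;
        postorder(n.left, out);
        postorder(n.right, out);
        out.append(n.data);
    }
}
```

For the three-node tree with root b and children a and c, pre-order yields b a c and post-order yields a c b.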
Keyed Trees We often find trees that are not just comparable, but which have a key value in an extra field in order to make some operations faster. key 42 data leftchild rightchild
Searching Trees Two variants (diagrams): data stored in every node (keys 1, 3, 14, 15, 27, 39), versus data stored only in the leaves, with inner nodes holding only keys for routing.
Got to here 2018-06-05
Deleting a Node Most complicated binary tree operation. But studying the details builds character! 1. Find the node. 2. Which case? 2.1 Node is a leaf; 2.2 Node has one child; 2.3 Node has two children. 2.1 Easy! Set the child field in the parent to null; garbage collection will do the work.
2.2 Not too difficult: "Snip" the node out by having the parent's child pointer point directly to the node's child (diagram: before and after deleting a node with one child from the tree 80, 52, 48, 63, 71, 67). Watch out: 4 variations! The node can be the left or right child of its parent, and its child can be a left or right child.
2.3 Fun Replace the node with its in-order successor. The in-order successor is the smallest of the nodes that are larger than the original node: go right once, then left, left, left... until there is no leftchild. Cases: 2.3.1 The successor is the rightchild of the node to be deleted. 2.3.2 The successor is a left descendant of the rightchild. 2.3.3 The successor has a rightchild: fix it, insert and rearrange! (Diagram: deleting 25 from the tree 15, 5, 20, 25, 50, 35, 30, 47; its successor 30 takes its place.)
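The "go right once, then left until there is no leftchild" rule can be sketched as follows (the DNode class and the method name are assumptions for illustration):

```java
// Sketch: finding the in-order successor of a node that has two
// children, following the rule on the slide.
class DNode {
    int data;
    DNode leftchild, rightchild;
    DNode(int d) { data = d; }
}

class Deleter {
    static DNode successor(DNode delNode) {
        DNode succ = delNode.rightchild;   // go right once...
        while (succ.leftchild != null)     // ...then left until no leftchild
            succ = succ.leftchild;
        return succ;
    }
}
```

For a node 25 whose right subtree is 50 with left child 35 and grandchildren 30 and 47, the successor is 30.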
Aaaaaargh! Some programmers are afraid of implementing this. They just add a field to the node, isdeleted, and skip over such nodes when searching the tree.
How can we avoid degenerate trees? This is a complex algorithm! We have to have balanced trees, that is, we must not let an insertion unbalance the tree. When the insertion point is found and the node has been inserted, the tree is tested for imbalance, i.e. the maximum path length is computed. If this is longer than k/2, where k is the number of nodes in the tree, then a rebalancing algorithm "shakes up" the tree and selects a new root node.
AVL Tree Self-balancing binary search tree, published by its Russian inventors G. M. Adelson-Velsky and E. M. Landis in 1962. The heights of the two child subtrees of any node differ by at most one. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases. AVL tutorial: Data Structures and Algorithms with Object-Oriented Design Patterns in Java, http://www.brpreiss.com/books/opus5/html
Balanced Trees We could say that a binary tree is balanced if the left and right subtrees of every node have the same height. But then the only balanced trees are the perfect binary trees. A perfect binary tree of height h has exactly 2^(h+1) - 1 internal nodes. Therefore, it is only possible to create perfect trees with n nodes for n = 1, 3, 7, 15, 31, ... This is not what we want.
A Better Balance Condition What are the characteristics of a good balance condition? 1. A good balance condition ensures that the height of a tree with n nodes is O(log n). 2. A good balance condition can be maintained efficiently. That is, the additional work necessary to balance the tree when an item is inserted or deleted is O(1).
AVL Balance Condition If we write a tree as (root, T_left, T_right), then we say that it is AVL balanced if |height(T_left) - height(T_right)| <= 1 at every node. It can be shown that all trees that satisfy the AVL balance condition have height logarithmic in the number of internal nodes.
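The condition can be checked directly by computing heights, as in this sketch (an O(n) check for illustration only; real AVL implementations store heights or balance factors in the nodes, and the class and method names here are assumptions):

```java
// Sketch: verifying the AVL balance condition on a whole tree.
class ANode {
    int key;
    ANode left, right;
    ANode(int k) { key = k; }
}

class AvlCheck {
    // height of the empty tree is -1, of a single node 0
    static int height(ANode n) {
        if (n == null) return -1;
        return 1 + Math.max(height(n.left), height(n.right));
    }

    // every node's subtree heights may differ by at most 1
    static boolean isAvl(ANode n) {
        if (n == null) return true;
        int diff = Math.abs(height(n.left) - height(n.right));
        return diff <= 1 && isAvl(n.left) && isAvl(n.right);
    }
}
```

A node with two leaf children passes the check; a chain of three nodes fails it.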
Inserting in AVL Trees Inserting an item into an AVL tree is a two-part process. First, the item is inserted using the usual binary search tree insertion method. After the item has been inserted, it is necessary to check that the resulting tree is still AVL balanced, and to rebalance it when it is not. The heights of all the nodes along the access path must be recomputed and the AVL balance condition must be checked; if it is violated, the structure of the tree must be changed.
Problem: Inserting 5 unbalances the tree (diagram: root 7 with left child 4; after inserting 5 as 4's right child, the tree is no longer AVL balanced).
Rotation When an AVL tree becomes unbalanced, it is possible to bring it back into balance by performing an operation called a rotation. It turns out that there are only four cases to consider, and each case has its own rotation. In a rotation, a node trades places with one of its in-order neighbors while the sorted order is preserved. Each rotation has an inverse: rotating back yields the original tree.
rotate left (diagram: the node's right child becomes the new root of the subtree, the node moves down to the left, and the subtrees 1, 2, 3 are reattached in sorted order)
rotate right (diagram: the node's left child becomes the new root of the subtree, the node moves down to the right, and the subtrees 1, 2, 3 are reattached in sorted order)
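The two single rotations in the diagrams come down to a few pointer moves. A sketch with assumed class and method names (each method returns the new root of the rotated subtree):

```java
// Sketch of the two single rotations on a BST subtree.
class RNode {
    String key;
    RNode left, right;
    RNode(String k, RNode l, RNode r) { key = k; left = l; right = r; }
}

class Rotations {
    // the right child becomes the new root; its left subtree
    // is reattached as the old root's right subtree
    static RNode rotateLeft(RNode n) {
        RNode r = n.right;
        n.right = r.left;
        r.left = n;
        return r;
    }

    // the left child becomes the new root; its right subtree
    // is reattached as the old root's left subtree
    static RNode rotateRight(RNode n) {
        RNode l = n.left;
        n.left = l.right;
        l.right = n;
        return l;
    }
}
```

Note that rotateRight undoes rotateLeft and vice versa, and neither changes the sorted order of the keys.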
All four: Single rotations: rotate left child left (LL), rotate right child right (RR). Double rotations: rotate left child right (LR), rotate right child left (RL).
Properties of Rotation Rotation leaves a sorted tree sorted. No nodes are added or deleted; only the structure is changed, and an AVL tree that was unbalanced by an insertion is again an AVL tree after the proper rotation. After the rotation, the subtree has the same height as before the insertion!
When to rotate It is not necessary to know the exact height! It is sufficient to store the balance factor: -1, 0, or 1.
Let's play! http://www.qmatica.com/datastructures/trees/AVL/AVLTree.html
Your Turn! Balanced or not?
Balanced or not?
Construct an AVL Tree 10, 40, 35, 25, 60, 30, 80, 50, 27, 28, 38
The Result
Binary Tree Application Expression Tree (5+((3-1)*2)) (diagram: + at the root with children 5 and *; * has children - and 2; - has children 3 and 1). Easy to evaluate! eval(node) = node.contents(eval(node.left), eval(node.right))
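The recursive evaluation rule can be sketched directly; numbers sit in the leaves and operators in the inner nodes (the ENode class and its string-based contents field are assumptions for illustration):

```java
// Sketch: evaluating an expression tree bottom-up.
class ENode {
    String contents;            // an operator or a number
    ENode left, right;
    ENode(String c, ENode l, ENode r) { contents = c; left = l; right = r; }
}

class Eval {
    static int eval(ENode n) {
        if (n.left == null && n.right == null)    // leaf: a number
            return Integer.parseInt(n.contents);
        int a = eval(n.left);                     // evaluate both subtrees first
        int b = eval(n.right);
        switch (n.contents) {                     // then apply the operator
            case "+": return a + b;
            case "-": return a - b;
            case "*": return a * b;
            default:  return a / b;
        }
    }
}
```

Evaluating the tree for (5+((3-1)*2)) yields 9.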
Binary Tree Application Huffman Coding Tree Each symbol is stored at a leaf; the path from the root, with each edge marked 1 or 0, is its code (diagram: leaves R, S, A, B, Z, Q). The most frequent symbols have the shortest codes.
Other Tree Applications - Decision Trees - Syntax Trees - Derivation Trees - Code Trees - Spanning Trees - Tree-structured search spaces - Searching Trees
Trie Trie (pronounced "try") Structure suggested by E. Fredkin in CACM 3 (1960), 490-500. A trie is an M-ary tree whose nodes are M-place vectors with components corresponding to digits or characters. Each node on level l represents the set of all keys that begin with a certain sequence of l characters. Algorithms can be found in Donald E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching.
Trie All the descendants of any one node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that happen to correspond to keys of interest.
A trie for keys 'to', 'tea', 'ten', 'inn'.
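A minimal trie over lowercase letters can be sketched as follows (the 26-way child array and the isKey flag are assumptions; many representations exist, including the linked lists mentioned on the next slide):

```java
// Sketch of a trie: each node is a 26-place vector of children,
// one per lowercase letter; a flag marks where a key ends.
class TrieNode {
    TrieNode[] child = new TrieNode[26];
    boolean isKey;               // true if a key ends at this node
}

class Trie {
    TrieNode root = new TrieNode();

    void insert(String key) {
        TrieNode n = root;
        for (char c : key.toCharArray()) {
            int i = c - 'a';
            if (n.child[i] == null) n.child[i] = new TrieNode();
            n = n.child[i];
        }
        n.isKey = true;          // mark the end of the key
    }

    boolean contains(String key) {
        TrieNode n = root;
        for (char c : key.toCharArray()) {
            int i = c - 'a';
            if (n.child[i] == null) return false;
            n = n.child[i];
        }
        return n.isKey;          // the path must end at a marked node
    }
}
```

After inserting 'to', 'tea', 'ten', and 'inn', lookups for those keys succeed, while 'te' fails because no key ends there.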
Complexity A trie needs 12 iterations to distinguish, for example, COMPUTATION and COMPUTATIONS. A linked-list representation is much better space-wise (many empty cells otherwise).
Complexity Search time for large N (M is the size of the alphabet): log_M N = lg N / lg M iterations. Needs N / log M nodes to distinguish between N random inputs. Space proportional to M*N / ln M (lg is base 2, ln base e).
Long words? A trie only pays off for many short words. Otherwise, once the number of words in a subtree drops below 6, just use a list of words instead of continuing the trie structure. This decreases the space needed by a factor of 6 with the same running time.
Patricia Trie Practical Algorithm To Retrieve Information Coded in Alphanumeric N-node search trees based on the binary representation of the keys Good for extremely long, variable-length keys such as titles or phrases stored within a large bulk file
Storage? Tables Linearized lists: (T(H (E * (I (R * (S *)) (S (M *)) (N * (C (E *))) (M * (E *))))) The, Their, Theirs, Theism, Then, Thence, Them, Theme
Advantages of a Trie vs. BST Looking up keys is fast: looking up a key of length m takes only O(m) time, while in the worst case in a BST, O(m²) time is required, because initial characters are examined repeatedly during multiple comparisons. Tries require less space: because the keys are not stored explicitly, only an amortized constant amount of space is needed per key. Tries make longest-prefix matching efficient, where we wish to find the key sharing the longest possible prefix with a given key. They also allow one to associate a value with an entire group of keys that have a common prefix.