CMSC 27100 26th Lecture: Graph Theory: Trees. Lecturer: Janos Simon. December 2, 2018

1 Trees

Definition 1. A tree is an acyclic connected graph.

Trees have many nice properties.

Theorem 2. The following are equivalent for an n-vertex graph G = (V, E):

(i) G is a tree.
(ii) G is acyclic and has n − 1 edges.
(iii) G is connected and has n − 1 edges.
(iv) G is connected, but for every e ∈ E, G − {e} is not connected.
(v) ∀x, y ∈ V ∃! path from x to y (∃! means "there is a unique").

Proof. These are relatively easy proofs, and there is a strong intuition behind them. For example, (v) implies that deleting any edge will separate the graph into 2 connected components: if we delete an edge on the path from x to y, it disconnects x from y, since that path is unique. The trick is to make all these implications formal. They were assigned as homework... (Hint: arrange the implications in a directed cycle (5 proofs) as opposed to a complete graph (at least 10 proofs)!)
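The equivalence of (i) and (iii) gives a quick machine test for treeness: count the edges and check connectivity. A minimal sketch in Python (the function name and edge-list input format are mine, not from the lecture; vertices are assumed to be numbered 0 to n − 1):

```python
from collections import deque

def is_tree(n, edges):
    """Check whether an undirected graph on vertices 0..n-1 is a tree,
    using characterization (iii): connected with exactly n - 1 edges."""
    if len(edges) != n - 1:
        return False
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # BFS from vertex 0; the graph is connected iff we reach all n vertices.
    seen = {0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) == n
```

Checking the edge count first is what makes one BFS enough: with n − 1 edges, "connected" already rules out cycles.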
Trees are important because they capture connectivity in the simplest possible way: they have the smallest number of edges that still keep the graph connected. A spanning tree of a graph has the same set of vertices as the graph (it is a spanning subgraph) and the same connectivity. (It is an easy observation that a graph is connected iff it has a spanning tree. A (bad) proof of this is the following inefficient algorithm: while the graph remains connected, keep throwing out edges whose removal does not disconnect it. When we cannot do this any more, the remaining graph is a spanning tree.) If we want to connect a bunch of computers, and have costs for buying pairwise connections, the cheapest solution will be a spanning tree. We will study efficient algorithms for such problems in Algorithms.

2 Rooted Trees

These are the trees you may be familiar with from Computer Science... Rosen uses the following definition: consider a tree T, and choose a vertex a of T. It will be the root of the rooted tree. Now orient all edges of T so that all paths lead away from the root. This is a rooted tree with root a.

The inductive definition below is more elegant, and much more useful for proving properties of rooted trees.

Definition 3 (Rooted Tree). A single vertex r is a rooted tree with root r; it has depth 0. Let T_1, T_2, ..., T_k be rooted trees with roots r_1, r_2, ..., r_k and depths d_1, d_2, ..., d_k respectively, and let d = max_i d_i. Let r be a new vertex, and add directed edges (r, r_1), (r, r_2), ..., (r, r_k). The resulting directed tree is a rooted tree T with root r and depth d + 1.

There are natural definitions of child, parent, descendant and ancestor. A leaf is a vertex with outdegree 0. Non-leaf vertices are called internal vertices. The subtree rooted at an internal vertex a is the rooted tree consisting of the vertices reachable by directed paths from a to a leaf in T.

Rooted trees are a very useful representation of hierarchies in which each element has a single boss. Examples:
Hierarchies in an organization
Linnæus' classification of organisms
File systems
The Library of Congress book classification system
Algebraic formulas
US Code: Title 26 (IRS rules)

Definition 4. An m-ary tree is one where every vertex has outdegree at most m. It is full if every internal (non-leaf) vertex has exactly m children, and complete if all leaves are at the same distance (number of edges traversed) from the root. It is called a binary tree if m = 2. Unless specifically defined otherwise, the children are ordered: in binary trees there is a LEFT child and a RIGHT child.

An easy induction shows that the number of leaves of a complete m-ary tree of depth d is m^d; in particular, for binary trees this is 2^d. This implies that the number of internal vertices of a complete binary tree is 2^d − 1, by noting that there are 2^i vertices at depth i (by the argument that yielded 2^d leaves at depth d), and remembering that 2^0 + 2^1 + · · · + 2^(d−1) = 2^d − 1. For trees that are not complete, m^d is only an upper bound.

Exercise: Draw a binary tree of depth d with O(d) leaves.

3 Traversing Binary Trees

In many algorithms one needs to visit all vertices of a binary tree. By "visit" we mean doing something at each node. For concrete examples: in a file system, checking that all file permissions include a given group; in a library catalog, collecting statistics about the fraction of books printed before the year 1700 that are on the stacks in the library (one would need to follow pointers from each record to get this information).

We can find a methodical way to traverse the tree by noting that a natural way to go through it uses recursion: when we arrive at a vertex v we explore the left subtree of v, upon returning we explore the right subtree, then return to the call that generated exploring
the subtree rooted at v (if a subtree is empty, the recursive call is not made, and when we return from the root of the tree we are done). So we arrive at each vertex 3 times, and we have a choice of which of these times we visit the vertex. If we choose to visit before exploring the left subtree, the algorithm is a preorder traversal; if we visit after returning from the left subtree, it is an inorder traversal; and if we visit after returning from the right subtree, it is a postorder traversal. The respective algorithms, PRE(), IN() and POST(), are given below (a recursive call on an empty child is simply skipped).

PRE(r)
  Visit(r)
  PRE(LEFTCHILD(r))
  PRE(RIGHTCHILD(r))
  return

IN(r)
  IN(LEFTCHILD(r))
  Visit(r)
  IN(RIGHTCHILD(r))
  return

POST(r)
  POST(LEFTCHILD(r))
  POST(RIGHTCHILD(r))
  Visit(r)
  return

Exercise: Draw a binary tree. Assign distinct letters to all vertices. In the programs above, take Visit(r) to mean "print the letter at node r". List the orderings of letters you get from each traversal.

3.1 Applications of Postorder traversal

Consider the formula (((x + y) · (z − u)) + 7), where we used parentheses to enclose every binary operation. There is a natural binary tree associated with the formula, in which the internal nodes are the binary operations and the leaves are variables or constants. Compilers transform expressions into such trees, which can be evaluated by the simple recursive program

get the value of the left subtree
get the value of the right subtree
perform the operation

Clearly, this corresponds to a postorder traversal of the tree. If we perform the Exercise proposed above on this tree, we get the postorder listing

x y + z u − · 7 +

It should be clear that this notation (expressions of the form "operand1 operand2 operation") is unambiguous and needs no parentheses. It is named after the Polish logician Łukasiewicz and is known as Polish notation (cf. Chinese Remainder Theorem); the postfix variant used here is often called reverse Polish notation. It is an easy programming exercise to evaluate an expression given in Polish notation using a stack (Do it!).

Nerdy comment: if we were taught Polish notation from elementary school on, we would not need parentheses...

3.2 More about binary tree traversal

When traversing (rooted) binary trees, we may need three pieces of information at each node: the LEFTCHILD pointer, the RIGHTCHILD pointer, and the PARENT pointer. Many algorithms do not need all three. For example, if all we need is to go to the root from a given leaf, we only need the PARENT pointer. In the traversal algorithms above, we avoided storing the PARENT pointer explicitly by using recursion (usually implemented with an additional stack). Can we avoid using a stack? (Of course we need to store this information somewhere, but we may be able to reuse the LEFTCHILD and RIGHTCHILD fields to hide the stack there...)

Challenging Hacking Question: Using an extra bit per vertex, and having the ability to change the LEFTCHILD and RIGHTCHILD fields, write an algorithm that implements the postorder traversal algorithm POST() using only a constant number of registers.

Even more challenging: Implement an algorithm that will visit every vertex and return to the root. It may visit a vertex more than once, but it doesn't
use any extra bits at the vertices: only the LEFTCHILD and RIGHTCHILD fields, and a constant number of external registers.

4 Representing Graphs in a Computer

Adjacency Matrix. Let G = (V, E), with V = {v_1, v_2, ..., v_n}. The adjacency matrix of G is an n × n Boolean matrix A with A[i, j] = 1 iff (v_i, v_j) ∈ E (this is for directed graphs; for undirected graphs, A[i, j] = 1 iff {v_i, v_j} ∈ E).

Incidence Matrix. Let G = (V, E), with V = {v_1, v_2, ..., v_n} and E = {e_1, e_2, ..., e_m}. The incidence matrix I of G is an n × m Boolean matrix with I[i, j] = 1 iff v_i is incident with e_j.

For sparse graphs it is convenient to use an adjacency list: for every vertex v_i we keep a list of the vertices v_k that v_i is connected to. Adjacency lists are convenient because many graphs have o(n^2) edges, while an algorithm that uses an adjacency matrix must spend Ω(n^2) time just to set the matrix up (A has n^2 entries). Trees, planar graphs (graphs that one can draw in the plane without edges crossing), and many other interesting classes of graphs have a number of edges that is linear in the number of vertices, so their size |V| + |E| is Θ(n). There are many interesting linear (in n) time algorithms for such graphs.
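The two representations above can be sketched in a few lines of Python (an illustrative sketch for undirected graphs; the function names and the 0-indexed edge-list input are mine, not from the lecture):

```python
def adjacency_matrix(n, edges):
    """n x n Boolean matrix: A[i][j] = 1 iff {v_i, v_j} is an edge.
    The undirected case, so the matrix is symmetric; just allocating
    it costs Theta(n^2), regardless of how few edges there are."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    return A

def adjacency_list(n, edges):
    """For every vertex, the list of its neighbors.
    Total size Theta(|V| + |E|), which is what makes it the
    representation of choice for sparse graphs."""
    adj = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    return adj
```

For a path on 4 vertices (a tree, so 3 edges), the adjacency list stores 2 · 3 = 6 entries while the matrix has 16; the gap widens to n^2 versus Θ(n) as n grows.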