CS 206 Introduction to Computer Science II


CS 206 Introduction to Computer Science II 04/25/2018 Instructor: Michael Eckmann

Today's Topics Questions? Comments? Balanced binary search trees (AVL trees) / Compression (uses binary trees)

Balanced BSTs An AVL tree is a BST with the added restriction that, for every node, the height of its left subtree differs from the height of its right subtree by at most 1. The height of an empty subtree is -1. A node in an AVL tree contains: the data item, a reference to the left child node, a reference to the right child node, and the height of the node.
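As a rough Java sketch, such a node might look like this (the class name, generic bound and helper method are illustrative, not from the handout):

// A minimal AVL node; field layout follows the description above.
class AvlNode<T extends Comparable<T>> {
    T data;            // the data item
    AvlNode<T> left;   // reference to left child node
    AvlNode<T> right;  // reference to right child node
    int height;        // height of the node (a new leaf has height 0)

    AvlNode(T data) { this.data = data; this.height = 0; }

    // Height of a possibly empty subtree; an empty subtree has height -1.
    static int height(AvlNode<?> n) { return (n == null) ? -1 : n.height; }
}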

Balanced BSTs When we insert a node into an AVL tree, we first insert as we would in a BST and update the height information in all the nodes on the path back to the root. There are several situations we can encounter:
a) the tree retains the properties of an AVL tree (that is, no node has left and right subtrees that differ in height by more than 1)
b) or some node, alpha, along the path back to the root has subtrees that differ in height by 2 (stop there with the updating of height information) --- there are 4 cases:
1) node inserted into LST of the LC of alpha
2) node inserted into RST of the LC of alpha
3) node inserted into LST of the RC of alpha
4) node inserted into RST of the RC of alpha
Note: only nodes on the path back to the root could possibly be affected by the insertion. (LST = left subtree, RST = right subtree, LC = left child, RC = right child)

Balanced BSTs Let's look at b) case 1): some node, alpha, along the path back to the root has subtrees that differ in height by 2, and the node was inserted into the LST (left subtree) of the LC (left child) of alpha. To rebalance the AVL tree in this situation, we do what is called a single rotation. See the example general situation for case 1 in the handout. Note: the triangles used to denote subtrees are either all empty, or they have the relative heights as shown.

Balanced BSTs Let's look at b) case 4): the node was inserted into the RST (right subtree) of the RC (right child) of alpha. To rebalance the AVL tree in this situation we again perform a single rotation. See the example general situation for case 4 in the handout. Case 1 and case 4 are mirror images of each other and, as you are about to see, case 2 and case 3 are mirror images of each other.
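Here is a rough Java sketch of the case 1 single rotation, assuming the AvlNode sketch from earlier (the method name is illustrative); case 4 uses the mirror-image method shown with the double rotation below:

// Case 1: alpha's left subtree is too tall, so rotate right at alpha.
// The left child k1 becomes the root of this subtree; alpha becomes its right child.
static <T extends Comparable<T>> AvlNode<T> rotateWithLeftChild(AvlNode<T> alpha) {
    AvlNode<T> k1 = alpha.left;
    alpha.left = k1.right;   // alpha adopts k1's former right subtree
    k1.right = alpha;
    alpha.height = Math.max(AvlNode.height(alpha.left), AvlNode.height(alpha.right)) + 1;
    k1.height = Math.max(AvlNode.height(k1.left), alpha.height) + 1;
    return k1;               // the caller links k1 in where alpha used to be
}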

Balanced BSTs Let's look at b) case 2): the node was inserted into the RST (right subtree) of the LC (left child) of alpha. To rebalance the AVL tree in this situation, we perform what is called a double rotation. See the example general situation for case 2 in the handout. Note: either A, B, C and D are all empty, OR one of B or C is 2 levels deeper than D.

Balanced BSTs Let's look at b) case 3): the node was inserted into the LST (left subtree) of the RC (right child) of alpha. To rebalance the AVL tree in this situation, we again perform a double rotation.
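A rough Java sketch of the case 2 double rotation, again with illustrative names: it is simply two single rotations, one at alpha's left child and then one at alpha. Case 3 is the mirror image (swap left and right throughout).

// Mirror image of rotateWithLeftChild; this is also the case 4 single rotation.
static <T extends Comparable<T>> AvlNode<T> rotateWithRightChild(AvlNode<T> alpha) {
    AvlNode<T> k2 = alpha.right;
    alpha.right = k2.left;   // alpha adopts k2's former left subtree
    k2.left = alpha;
    alpha.height = Math.max(AvlNode.height(alpha.left), AvlNode.height(alpha.right)) + 1;
    k2.height = Math.max(alpha.height, AvlNode.height(k2.right)) + 1;
    return k2;
}

// Case 2: rotate alpha's left child to the left, then rotate alpha to the right.
static <T extends Comparable<T>> AvlNode<T> doubleWithLeftChild(AvlNode<T> alpha) {
    alpha.left = rotateWithRightChild(alpha.left);
    return rotateWithLeftChild(alpha);
}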

Balanced BSTs Now, let's go through an example. Insert integer items into an initially empty AVL tree one at a time: 1 through 7 in order, then 16 down through 10, then 8, then 9. Recall that we insert as in a BST, but after that we need to check whether the tree is still AVL by travelling up towards the root and checking the heights of the LST and RST along the way. If we reach a node whose LST and RST differ in height by more than 1, that node is alpha. After each insertion we'll mark down the following (and perform rotations if necessary): when no rotation is necessary (still AVL), and when a rotation is necessary, which node alpha is and which case it is (1, 2, 3 or 4).

Compression Compression is the idea of taking some input of some size and converting it to a smaller output that represents the input. If the input can be recovered perfectly from the output by some reverse process, it is lossless compression. If the input cannot be recovered perfectly from the output, it is lossy compression.

Huffman coding is a lossless compression technique. Let's consider how text is stored and how Huffman coding might work to compress text. ASCII uses 8 bits per character and UNICODE uses 16 bits per character --- every character is encoded with the same number of bits.

Huffman coding does this: given a file of characters, we examine the characters and sort them by frequency. The ones that occur most frequently are represented with fewer bits and the ones that occur least frequently are represented with more bits. Hence this is a variable-length coding algorithm (characters are encoded with different numbers of bits). How does that result in compressed data?

Huffman codes for the characters can be generated by creating a tree where the characters are in the leaf nodes. Each link is labeled 0 or 1. The path from the root to a leaf spells out the Huffman code for the character at that leaf. Also, each non-leaf node contains the sum of the frequencies of its children.

Start out with a forest of trees, each being one node. Each node represents a character and the frequency of that character. The weight of a tree is the sum of its nodes' frequencies, and this weight is stored in the root.
While there is more than one tree in the forest: the two trees (tree1 and tree2) with the smallest weights are joined together into one tree, with the root's weight set to the sum of the two subtree weights; tree1 is made the left subtree of the root and tree2 is made the right subtree of the root.
Assign a 0 to the left link and a 1 to the right link of each node. The tree generated will be the optimal Huffman encoding tree.
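As a rough Java sketch of this loop (the HuffNode class and method names are illustrative; java.util.PriorityQueue plays the role of the forest):

import java.util.PriorityQueue;

// One tree in the forest: leaves hold a character, internal nodes just a weight.
class HuffNode implements Comparable<HuffNode> {
    int weight;            // sum of the frequencies in this tree
    char ch;               // meaningful only for leaf nodes
    HuffNode left, right;  // left link will be labeled 0, right link 1

    HuffNode(char ch, int weight) { this.ch = ch; this.weight = weight; }

    HuffNode(HuffNode tree1, HuffNode tree2) {      // join two trees
        this.weight = tree1.weight + tree2.weight;  // root weight = sum of subtree weights
        this.left = tree1;
        this.right = tree2;
    }

    public int compareTo(HuffNode other) { return Integer.compare(this.weight, other.weight); }
}

class HuffmanBuilder {
    // While there is more than one tree in the forest, join the two smallest-weight trees.
    static HuffNode buildTree(PriorityQueue<HuffNode> forest) {
        while (forest.size() > 1) {
            HuffNode tree1 = forest.poll();   // smallest weight
            HuffNode tree2 = forest.poll();   // next smallest weight
            forest.add(new HuffNode(tree1, tree2));
        }
        return forest.poll();                 // the optimal Huffman encoding tree
    }
}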

Let's see an example of generating the Huffman codes by creating a Huffman tree for the following text: Four score and seven years ago our fathers brought forth, upon this continent, a new nation. Things to note: the length of the string is 91 characters (including letters, spaces and commas), and there are 22 unique characters.

We are supposed to start out with a forest of trees, where each tree is one node only and each node represents a character and its frequency. What should we do? Count up the frequency of each character in the text we are to compress.
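A rough Java sketch of that count (class and method names are illustrative):

import java.util.HashMap;
import java.util.Map;

class FrequencyCounter {
    // Tally how many times each character occurs in the text to compress.
    static Map<Character, Integer> countFrequencies(String text) {
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);  // add 1, starting from 0 if absent
        }
        return freq;
    }
}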

Frequency counts (frequency, character): 6 a, 1 b, 2 c, 1 d, 7 e, 2 f, 2 g, 4 h, 3 i, 9 n, 9 o, 1 p, 7 r, 5 s, 7 t, 4 u, 1 v, 1 w, 1 y, 15 space, 2 comma, 1 F

Sorted by frequency: 1 b, 1 d, 1 F, 1 p, 1 v, 1 w, 1 y, 2 c, 2 comma, 2 f, 2 g, 3 i, 4 h, 4 u, 5 s, 6 a, 7 e, 7 r, 7 t, 9 n, 9 o, 15 space

Let's build the tree on the board. Start with a forest of trees, where each tree is one node only and each node represents a character and its frequency --- we'll need to refer to the last slide to see the initial forest of trees. Let's look back at the instructions on how to join trees.

Now that the tree is built, we assign a code to each character by following the path from the root to each leaf, collecting the 0 and 1 labels along the way. The string of bits collected represents the character in the leaf.
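A rough Java sketch of that root-to-leaf walk, assuming the HuffNode sketch from earlier (names are illustrative):

import java.util.Map;

// Walk every root-to-leaf path, appending 0 for a left link and 1 for a right link;
// the bits collected along the path become the code for the character at the leaf.
// In a Huffman tree every internal node has two children, so checking one side suffices.
static void assignCodes(HuffNode node, String path, Map<Character, String> codes) {
    if (node.left == null && node.right == null) {  // a leaf: its path is its code
        codes.put(node.ch, path);
        return;
    }
    assignCodes(node.left, path + "0", codes);
    assignCodes(node.right, path + "1", codes);
}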

Codes (frequency, character, code): 6 a 1001, 1 b 000010, 2 c 100001, 1 d 000011, 7 e 1010, 2 f 01000, 2 g 01001, 4 h 11000, 3 i 10001, 9 n 001, 9 o 011, 1 p 000100, 7 r 1011, 5 s 0101, 7 t 1101, 4 u 11001, 1 v 000101, 1 w 000110, 1 y 000111, 15 space 111, 2 comma 00000, 1 F 100000

Encoded phrase (Four score and seven years ago --- I stopped at the word ago for purposes of space): 1000000111100110111110101100001011101110101111001001000011 1110101101000010110100011110001111010100110110101111100101 001011

Now how do we decode the phrase (get the original phrase back)? Note that no character's encoding is a prefix of any other character's encoding. What does that mean? Why is it necessary? Let's decode this: 100000011110011011111010110000101110111010111100100100001111101 01101000010110100011110001111010100110110101111100101001011 Follow the bits down the tree from the root, and once you hit a leaf you... have the character: 100000 = F, then start at the root again; 011 = o.
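A rough Java sketch of that decoding loop, again assuming the HuffNode sketch from earlier:

// Follow each bit down from the root; on reaching a leaf, emit its character
// and start at the root again. This works because no code is a prefix of another.
static String decode(HuffNode root, String bits) {
    StringBuilder out = new StringBuilder();
    HuffNode node = root;
    for (int i = 0; i < bits.length(); i++) {
        node = (bits.charAt(i) == '0') ? node.left : node.right;
        if (node.left == null && node.right == null) {  // hit a leaf
            out.append(node.ch);
            node = root;
        }
    }
    return out.toString();
}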

How many bits to encode the entire phrase? 6*4 + 6 + 2*6 + 6 + 7*4 + 2*5 + 2*5 + 4*5 + 3*5 + 9*3 + 9*3 + 6 + 7*4 + 5*4 + 7*4 + 4*5 + 6 + 6 + 6 + 15*3 + 2*5 + 6 = 366. How many bits to encode the 91 characters using UNICODE? Using ASCII?

Using UNICODE: 91*16 = 1456. Using ASCII: 91*8 = 728. We can do better than those --- how many bits would be necessary (with the same length encoding for each character) if we wanted to represent 26 lowercase letters, 26 uppercase letters, space and comma?

That's 54 characters, which would require 6 bits each (5 wouldn't be enough: only 32 different values can be represented in 5 bits, while 6 bits can represent 64 different values, more than enough for the 54 characters). 91*6 = 546.

You might think that is overkill (or still an unfair comparison) because we only have 22 unique characters. If we were to use the same length encoding for just those 22 characters, how many bits would be necessary for each character? 5 bits each; notice 4 is too few (only 16 different values can be represented in 4 bits). 91*5 = 455.

How many bits to encode the entire phrase using Huffman coding? 366, as computed above. UNICODE, 16-bit same-length encoding: 91*16 = 1456. ASCII, 8-bit same-length encoding: 91*8 = 728. 54 characters with a same-length encoding: 91*6 = 546. Only the 22 unique characters with a same-length encoding: 91*5 = 455. Huffman coding beats all these choices: 366 (variable-length Huffman) < 455 (best same-length encoding).

Huffman coding is an optimal coding when each symbol (character) is coded independently. Other coding schemes could assign codes to groups of characters and possibly get better compression, e.g. common digraphs like th and er, each as a group with one code, or common trigraphs like the, ion and ing, each with their own code.