Priority Queues and Huffman Encoding


Priority Queues and Huffman Encoding
Introduction to Homework 7
Hunter Schafer
Paul G. Allen School of Computer Science - CSE 143

I Think You Have Some Priority Issues (slide 1)

ER Scheduling: How do we efficiently choose the most urgent case to treat next? Patients with more serious ailments should go first.

OS Context Switching: How does your operating system decide which process to give resources to? Some applications are more important than others.

How can we solve these problems with the data structures we know?

Possible Solutions (slide 2)

Store elements in an unsorted list
    add: add at the end
    remove: search for the highest-priority element
Store elements in a sorted LinkedList
    add: search for the position to insert, place there
    remove: remove from the front
Store elements in a TreeSet (hope they are unique!)
    add: traverse the tree for the position to insert, place there
    remove: traverse the tree for the smallest element, remove it

Priority Queue (slide 3)

Priority queue: a collection of ordered elements that provides fast access to the minimum (or maximum) element.

public class PriorityQueue<E> implements Queue<E>

    PriorityQueue<E>()    constructs an empty queue
    add(E value)          adds value in sorted order to the queue
    peek()                returns minimum element in queue
    remove()              removes/returns minimum element in queue
    size()                returns the number of elements in queue

    Queue<String> tas = new PriorityQueue<String>();
    tas.add("Jin");
    tas.add("Aaron");
    tas.remove();   // "Aaron"

Priority Queue Example (slide 4)

What does this code print?

    Queue<TA> tas = new PriorityQueue<TA>();
    tas.add(new TA("Kyle", 7));
    tas.add(new TA("Ayaz", 3));
    tas.add(new TA("Zach", 6));
    System.out.println(tas);

Prints: [Ayaz: 3, Kyle: 7, Zach: 6]

Common Gotchas
    Elements must be Comparable.
    toString doesn't do what you expect: it does not list the elements in priority order. Use remove instead.
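To make both gotchas concrete, here is a minimal sketch. The slide does not show the TA class, so the version below (its field names and the choice to compare by the number) is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class TaQueueDemo {
    // Removes every element, so the result reflects priority order.
    static List<String> drain(Queue<TA> queue) {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.remove().toString());
        }
        return order;
    }

    public static void main(String[] args) {
        Queue<TA> tas = new PriorityQueue<>();
        tas.add(new TA("Kyle", 7));
        tas.add(new TA("Ayaz", 3));
        tas.add(new TA("Zach", 6));
        System.out.println(drain(tas)); // [Ayaz: 3, Zach: 6, Kyle: 7]
    }
}

// Hypothetical TA class; the real one is not shown on the slides.
class TA implements Comparable<TA> {
    final String name;
    final int number; // assumed to be the value the queue orders by

    TA(String name, int number) {
        this.name = name;
        this.number = number;
    }

    // Smaller numbers are removed from the queue first.
    @Override
    public int compareTo(TA other) {
        return Integer.compare(number, other.number);
    }

    @Override
    public String toString() {
        return name + ": " + number;
    }
}
```

Without the Comparable implementation, adding a TA to the PriorityQueue would fail at runtime with a ClassCastException.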

Inside the Priority Queue (slide 5)

Usually implemented with a heap, which guarantees that children have a lower priority than their parent, so the highest-priority element is at the root (fast access).

Take CSE 332 or CSE 373 to learn how to implement more complicated data structures like heaps!

[Figure: example heap containing the values 1, 2, 3, 17, 19, 36, 7, 25, 99]
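As a rough illustration of the heap guarantee, the sketch below checks it on an array. It assumes the numbers from the slide's figure are listed level by level and that smaller numbers mean higher priority; neither is stated on the slide. The index arithmetic is the usual array layout for a binary heap, not necessarily what Java's PriorityQueue does internally.

```java
class MinHeapCheck {
    // Returns true if every parent is <= each of its children, where the
    // parent of index i is stored at index (i - 1) / 2.
    static boolean isMinHeap(int[] a) {
        for (int child = 1; child < a.length; child++) {
            int parent = (child - 1) / 2;
            if (a[parent] > a[child]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] heap = {1, 2, 3, 17, 19, 36, 7, 25, 99};
        System.out.println(isMinHeap(heap)); // true
        System.out.println(heap[0]);         // 1, the minimum sits at the root
    }
}
```

Note that the array is not fully sorted (36 comes before 7), which is why printing a priority queue does not show its elements in priority order.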

Homework 7: Huffman Coding

File Compression (slide 6)

Compression: the process of encoding information so that it takes up less space.

Compression applies to many things!
    Store photos without taking up the whole hard drive
    Reduce the size of an email attachment
    Make web pages smaller so they load faster
    Make voice calls over a low-bandwidth connection (cell, Skype)

Common compression programs: WinZip and WinRAR for Windows, zip

ASCII (slide 7)

ASCII (American Standard Code for Information Interchange): a standardized code for mapping characters to integers.

We need to represent characters in binary so computers can read them. Most text files on your computer are in ASCII. Every character is represented by a byte (8 bits).

    Character   ASCII value   Binary representation
    ' '         32            00100000
    'a'         97            01100001
    'b'         98            01100010
    'c'         99            01100011
    'e'         101           01100101
    'z'         122           01111010

ASCII Example (slide 8)

Using the ASCII table from slide 7, what is the binary representation of the following String?

    cab z

Answer: 01100011 01100001 01100010 00100000 01111010

Another ASCII Example (slide 9)

Using the ASCII table from slide 7, how do we read the following binary as ASCII?

    01100001 01100011 01100101

Answer: ace
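Both ASCII examples can be checked with a short sketch. The class and method names below are made up for illustration, and the code assumes every character fits in 8 bits.

```java
class AsciiBits {
    // Encodes each character as an 8-bit group; groups are separated by spaces.
    static String toBinary(String text) {
        StringBuilder sb = new StringBuilder();
        for (char ch : text.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            String bits = Integer.toBinaryString(ch);
            // Left-pad with zeros to a full byte.
            sb.append(String.format("%8s", bits).replace(' ', '0'));
        }
        return sb.toString();
    }

    // Decodes whitespace-separated 8-bit groups back into characters.
    static String fromBinary(String bits) {
        StringBuilder sb = new StringBuilder();
        for (String group : bits.trim().split("\\s+")) {
            sb.append((char) Integer.parseInt(group, 2));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("cab z"));
        // 01100011 01100001 01100010 00100000 01111010
        System.out.println(fromBinary("01100001 01100011 01100101")); // ace
    }
}
```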

Huffman Idea (slide 10)

Huffman's insight: use variable-length encodings for different characters to take advantage of the frequencies with which characters appear.
    Make more frequent characters take up less space.
    Don't have codes for unused characters.
    Some characters may end up with longer encodings, but this should happen infrequently.

Huffman Encoding (slide 11)

Create a Huffman tree that gives a good binary representation for each character. The path from the root to a character's leaf is the encoding for that character; left means 0, right means 1.

[Figure: the ASCII table from slide 7 (character and binary representation columns) shown next to a Huffman tree with leaves b, a, c]

Homework 7: Huffman Coding (slide 12)

Homework 7 asks you to write a class that manages creating and using this Huffman code.
    (A) Create a Huffman code from a file and compress it.
    (B) Decompress the file to get the original contents.

Part A: Making a HuffmanCode Overview (slide 13)

Input file contents: bad cab

Step 1: Count the occurrences of each character in the file
    {' '=1, 'a'=2, 'b'=2, 'c'=1, 'd'=1}
Step 2: Make leaf nodes for all the characters and put them in a PriorityQueue
    pq: ' ', c, d, a, b
Step 3: Use the Huffman tree building algorithm (described in a couple of slides)
Step 4: Save the encoding to a .code file to encode/decode later
    {'d'=00, 'a'=01, 'b'=10, ' '=110, 'c'=111}
Step 5: Compress the input file using the encodings
    Compressed output: 1001001101110110

Step 1: Count Character Occurrences (slide 14)

We do this step for you.

Input file: bad cab

Generate the counts array, indexed by character value:

    index   ...  32  ...  97  98  99  100  101  ...
    value    0    1   0    2   2   1    1    0    0

This is super similar to LetterInventory but works for all characters!
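Although this step is provided in the assignment, the idea fits in a few lines. The sketch below is not the course's code; the class name and the array size of 256 (one slot per 8-bit character value) are assumptions.

```java
class CharCounts {
    // counts[i] is the number of times the character with value i appears.
    // Assumes every character value is below 256.
    static int[] count(String text) {
        int[] counts = new int[256];
        for (char ch : text.toCharArray()) {
            counts[ch]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = count("bad cab");
        System.out.println(counts[' ']); // 1
        System.out.println(counts['a']); // 2
        System.out.println(counts['b']); // 2
    }
}
```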

Step 2: Create PriorityQueue (slide 15)

Store each character and its frequency in a HuffmanNode object. Place all the HuffmanNodes in a PriorityQueue so that they are in ascending order with respect to frequency.

    pq: ' ', c, d, a, b
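One possible shape for this step is sketched below. The slides name HuffmanNode but do not show its contents, so the fields and the compareTo method here are assumptions, and the node is nested inside a demo class purely to keep the example self-contained.

```java
import java.util.PriorityQueue;
import java.util.Queue;

class LeafQueueDemo {
    // Hypothetical node layout; the assignment's HuffmanNode may differ.
    static class HuffmanNode implements Comparable<HuffmanNode> {
        final int character;
        final int frequency;

        HuffmanNode(int character, int frequency) {
            this.character = character;
            this.frequency = frequency;
        }

        // Lower frequency means the node is removed earlier.
        @Override
        public int compareTo(HuffmanNode other) {
            return Integer.compare(frequency, other.frequency);
        }
    }

    // Makes one leaf per character that actually occurs.
    static Queue<HuffmanNode> makeQueue(int[] counts) {
        Queue<HuffmanNode> pq = new PriorityQueue<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                pq.add(new HuffmanNode(i, counts[i]));
            }
        }
        return pq;
    }

    public static void main(String[] args) {
        int[] counts = new int[256];
        for (char ch : "bad cab".toCharArray()) {
            counts[ch]++;
        }
        Queue<HuffmanNode> pq = makeQueue(counts);
        while (!pq.isEmpty()) {
            HuffmanNode node = pq.remove();
            System.out.println("'" + (char) node.character + "' " + node.frequency);
        }
    }
}
```

Nodes with equal frequency can come out in any order, because PriorityQueue does not promise a particular tie-breaking rule. Only the frequencies are guaranteed to be ascending.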

Step 3: Remove and Merge (slide 16)

[Figure: animation of the merge process on the queue from Step 2. Each frame removes the two lowest-frequency nodes and puts their combined node back in the queue. The frames show a subtree with freq: 3 (leaves d, a), a subtree with freq: 4 (leaf b plus the merged ' ' and c nodes), and finally a single tree with freq: 7 whose children are the freq: 3 and freq: 4 subtrees.]

What is the relationship between frequency in file and binary representation length?

Step 3: Remove and Merge Algorithm (slide 17)

Algorithm pseudocode:

    while P.Q. size > 1:
        remove two nodes with lowest frequency
        combine into a single node
        put that node back in the P.Q.
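The pseudocode translates fairly directly into Java. The sketch below uses a hypothetical Node class (an internal node stores -1 as its character) and assumes at least one character occurs in the input. It is one way to write the loop, not the assignment's reference solution.

```java
import java.util.PriorityQueue;
import java.util.Queue;

class HuffmanBuildDemo {
    // Hypothetical node; leaves have no children, internal nodes have two.
    static class Node implements Comparable<Node> {
        final int character; // -1 for internal nodes
        final int frequency;
        final Node left;
        final Node right;

        Node(int character, int frequency, Node left, Node right) {
            this.character = character;
            this.frequency = frequency;
            this.left = left;
            this.right = right;
        }

        @Override
        public int compareTo(Node other) {
            return Integer.compare(frequency, other.frequency);
        }
    }

    // Builds the tree with the remove-and-merge loop.
    static Node build(int[] counts) {
        Queue<Node> pq = new PriorityQueue<>();
        for (int i = 0; i < counts.length; i++) {
            if (counts[i] > 0) {
                pq.add(new Node(i, counts[i], null, null));
            }
        }
        while (pq.size() > 1) {
            Node first = pq.remove();   // lowest frequency
            Node second = pq.remove();  // second lowest
            pq.add(new Node(-1, first.frequency + second.frequency, first, second));
        }
        return pq.remove(); // the single remaining node is the root
    }

    // Total number of bits needed to encode the input with this tree.
    static int encodedBits(Node node, int depth) {
        if (node.left == null) {
            return node.frequency * depth; // leaf: code length equals depth
        }
        return encodedBits(node.left, depth + 1) + encodedBits(node.right, depth + 1);
    }

    public static void main(String[] args) {
        int[] counts = new int[256];
        for (char ch : "bad cab".toCharArray()) {
            counts[ch]++;
        }
        Node root = build(counts);
        System.out.println(root.frequency);       // 7
        System.out.println(encodedBits(root, 0)); // 16
    }
}
```

Ties between equal frequencies may be broken differently than on the slides, so the individual codes can differ, but the total encoded length stays at 16 bits for "bad cab".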

Step 4: Print Encodings (slide 18)

Save the tree to a file to save the encodings for the characters we made.

[Figure: the Huffman tree with branches labeled 0 (left) and 1 (right) and leaves d, a, b, ' ', c]

Output of save:

    100
    00
    97
    01
    98
    10
    32
    110
    99
    111
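The output order on the slide matches a left-to-right traversal of the leaves that builds up the path as it descends. The sketch below shows that idea on a hand-built copy of the example tree. The Node class is hypothetical, and the lines are collected into a list rather than written to a PrintStream to keep the example easy to check.

```java
import java.util.ArrayList;
import java.util.List;

class SaveCodesDemo {
    // Hypothetical node; internal nodes always have two children.
    static class Node {
        final int character;
        final Node left;
        final Node right;

        Node(int character) {            // leaf
            this(character, null, null);
        }

        Node(Node left, Node right) {    // internal node
            this(-1, left, right);
        }

        private Node(int character, Node left, Node right) {
            this.character = character;
            this.left = left;
            this.right = right;
        }
    }

    // Visits leaves left to right; path holds the 0/1 choices made so far.
    static void collect(Node node, String path, List<String> lines) {
        if (node.left == null && node.right == null) {
            lines.add(String.valueOf(node.character));
            lines.add(path);
        } else {
            collect(node.left, path + "0", lines);
            collect(node.right, path + "1", lines);
        }
    }

    // The tree for "bad cab" as drawn on the slides.
    static Node exampleTree() {
        Node left = new Node(new Node('d'), new Node('a'));
        Node right = new Node(new Node('b'), new Node(new Node(' '), new Node('c')));
        return new Node(left, right);
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        collect(exampleTree(), "", lines);
        for (String line : lines) {
            System.out.println(line);
        }
    }
}
```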

Step 5: Compress the File (slide 19)

We do this step for you.

Take the original file and the .code file produced in the last step to translate into the new binary encoding.

Input file: bad cab

Huffman encoding:

    ASCII value   Character   Code
    100           'd'         00
    97            'a'         01
    98            'b'         10
    32            ' '         110
    99            'c'         111

Compressed output:    10 01 00 110 111 01 10
Uncompressed output:  01100010 01100001 01100100 00100000 01100011 01100001 01100010
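This step is also provided, but the translation itself is just a table lookup per character. The sketch below represents bits as the characters '0' and '1' in a String, which is a simplification for readability; real compression would pack them into bytes. The class and method names are made up.

```java
import java.util.HashMap;
import java.util.Map;

class CompressDemo {
    // Replaces each character by its code and concatenates the results.
    static String encode(String text, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char ch : text.toCharArray()) {
            String code = codes.get(ch);
            if (code == null) {
                throw new IllegalArgumentException("no code for character " + (int) ch);
            }
            bits.append(code);
        }
        return bits.toString();
    }

    // The encodings produced for "bad cab" in the slides.
    static Map<Character, String> exampleCodes() {
        Map<Character, String> codes = new HashMap<>();
        codes.put('d', "00");
        codes.put('a', "01");
        codes.put('b', "10");
        codes.put(' ', "110");
        codes.put('c', "111");
        return codes;
    }

    public static void main(String[] args) {
        String bits = encode("bad cab", exampleCodes());
        System.out.println(bits);          // 1001001101110110
        System.out.println(bits.length()); // 16 bits instead of 7 * 8 = 56
    }
}
```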

Part B: Decompressing the File (slide 20)

Step 1: Reconstruct the Huffman tree from the code file.
Step 2: Translate the compressed bits back to their character values.

Step 1: Reconstruct the Huffman Tree (slide 21)

Now we are just given the code file produced by our program, and we need to reconstruct the tree.

Input code file:

    97
    0
    101
    100
    32
    101
    112
    11

Initially the tree is empty. Each pair (character value, then its code) adds one leaf, creating any missing nodes along the path:

    after the first pair:   a (97) at path 0
    after the second pair:  e (101) at path 100
    after the third pair:   ' ' (32) at path 101
    after the last pair:    p (112) at path 11

[Figure: the tree after each step, ending with leaves a, p, e, and ' ']
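A minimal sketch of the reconstruction follows. It reads whitespace-separated tokens, so it works whether the pairs are on one line or spread over several. The Node class and method names are hypothetical, not the assignment's.

```java
import java.util.Scanner;

class RebuildTreeDemo {
    // Hypothetical mutable node; character stays -1 on internal nodes.
    static class Node {
        int character = -1;
        Node left;
        Node right;
    }

    // Reads (character value, code) pairs and grows the tree one leaf at a time.
    static Node read(Scanner input) {
        Node root = new Node();
        while (input.hasNextInt()) {
            int character = input.nextInt();
            String code = input.next();
            Node current = root;
            for (char bit : code.toCharArray()) {
                if (bit == '0') {
                    if (current.left == null) {
                        current.left = new Node(); // create missing node on the path
                    }
                    current = current.left;
                } else {
                    if (current.right == null) {
                        current.right = new Node();
                    }
                    current = current.right;
                }
            }
            current.character = character; // end of the path is the leaf
        }
        return root;
    }

    public static void main(String[] args) {
        Node root = read(new Scanner("97 0 101 100 32 101 112 11"));
        System.out.println((char) root.left.character);            // a
        System.out.println((char) root.right.right.character);     // p
        System.out.println((char) root.right.left.left.character); // e
    }
}
```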

Step 2 Example (slide 22)

After building up the tree, we read the compressed file bit by bit.

Input:  0101110110101011100
Output: a papa ape

[Figure: the reconstructed tree with branches labeled 0 and 1 and leaves a, p, e, and ' ']
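The bit-by-bit walk can be sketched as follows: start at the root, go left on 0 and right on 1, and whenever a leaf is reached, output its character and jump back to the root. For simplicity the bits are passed as a String of '0' and '1' characters instead of a BitInputStream, and the Node class is hypothetical.

```java
class DecodeDemo {
    // Hypothetical node; leaves have no children.
    static class Node {
        final int character;
        final Node left;
        final Node right;

        Node(int character) {          // leaf
            this.character = character;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {  // internal node
            this.character = -1;
            this.left = left;
            this.right = right;
        }
    }

    // Assumes the root is an internal node and the bits form complete codes.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (char bit : bits.toCharArray()) {
            current = (bit == '0') ? current.left : current.right;
            if (current.left == null && current.right == null) {
                out.append((char) current.character);
                current = root; // start over for the next character
            }
        }
        return out.toString();
    }

    // The tree from the code file on slide 21: a=0, e=100, ' '=101, p=11.
    static Node exampleTree() {
        Node eAndSpace = new Node(new Node('e'), new Node(' '));
        return new Node(new Node('a'), new Node(eAndSpace, new Node('p')));
    }

    public static void main(String[] args) {
        System.out.println(decode(exampleTree(), "0101110110101011100")); // a papa ape
    }
}
```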

Working with Bits? That Sounds a Little Bit Hard (slide 23)

Reading bits in Java is kind of tricky, so we are providing a class to help!

public class BitInputStream

    BitInputStream(String file)   creates a stream of bits from file
    hasNextBit()                  returns true if bits remain in the stream
    nextBit()                     reads and returns the next bit in the stream

Review - Homework 7 (slide 24)

Part A: Compression
    public HuffmanCode(int[] counts)                            Slides 15-17
    public void save(PrintStream out)                           Slide 18
Part B: Decompression
    public HuffmanCode(Scanner input)                           Slide 21
    public void translate(BitInputStream in, PrintStream out)   Slide 22