CS 200 Algorithms and Data Structures, Fall 2012 Programming Assignment #3


Compressing Data using Huffman Coding
Due Oct. 24, noon

Objectives
In this assignment, you will implement classes for data compression. You will write:
(1) An implementation of Huffman coding using a tree data structure
(2) A class to encode a string
Your system should provide a complete implementation of the interfaces and skeleton files that are provided.

Background: Huffman Coding
Huffman coding was developed by David Huffman in a term paper he wrote in 1951 while he was a graduate student at MIT. Huffman coding is a fundamental algorithm in data compression, the subject devoted to reducing the number of bits required to represent information. It is used extensively to compress bit strings representing text, and it also plays an important role in compressing audio and image files.

Based on the symbols and their frequencies, the goal is to construct a rooted binary tree where the symbols are the labels of the leaves. The algorithm begins with a forest of trees, each consisting of a single vertex labeled with a symbol and weighted by its frequency. At each step, we combine the two trees having the least total weight (here, frequency) into a single tree by introducing a new root and placing the tree with the larger weight as its left subtree and the tree with the smaller weight as its right subtree. The algorithm completes when it has constructed a single tree.

Table 1 is an example of a document containing only 6 distinct characters. Frequency shows the number of appearances of the character in the document. Figure 1 depicts the process of building a Huffman tree from the information in Table 1. At the end of the process, each character has a Huffman code associated with it.

The decoding procedure starts by reading the first bit in the stream. Each bit determines whether to go left or right in the Huffman tree. When you reach a leaf node, the character stored in that leaf node should be written to the output stream. For the next bit in the bit stream, your algorithm restarts from the root node of the Huffman tree.
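The decoding walk described above can be sketched as follows. This is a minimal illustration, not the assignment's skeleton code: the Node class, the method names, and the hand-built three-leaf tree are hypothetical, and the convention that 0 means "go left" and 1 means "go right" is an assumption.

```java
public class DecodeSketch {
    // Hypothetical node type for illustration; the provided HuffmanTreeNode may differ.
    static final class Node {
        final char symbol;   // meaningful only for leaves
        final Node left;
        final Node right;

        Node(char symbol) {
            this(symbol, null, null);
        }

        Node(char symbol, Node left, Node right) {
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }

        boolean isLeaf() {
            return left == null && right == null;
        }
    }

    // Hand-built example tree (an assumption): j = 0, e = 10, y = 11.
    static Node sampleTree() {
        Node inner = new Node('*', new Node('e'), new Node('y'));
        return new Node('*', new Node('j'), inner);
    }

    // Follows one bit at a time; assumes '0' selects the left child and '1' the right child.
    static String decode(Node root, String bits) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        for (int i = 0; i < bits.length(); i++) {
            current = (bits.charAt(i) == '0') ? current.left : current.right;
            if (current.isLeaf()) {
                out.append(current.symbol); // emit the character stored in the leaf
                current = root;             // restart from the root for the next bit
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode(sampleTree(), "1010110000")); // prints eeyjjjj
    }
}
```

Because no code word is a prefix of another, the walk never needs to look ahead: reaching a leaf always marks the end of exactly one character.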
Figure 2 depicts an example of decoding. Starting from the root of this tree, the bit stream selects the right child three times consecutively and reaches the leaf node associated with the character A. The same steps are repeated for the successive bits in the stream until the end of the stream is reached.

Task Description

Part 1. Build a Frequency Table
The first task is to build a frequency table. Please refer to the example in Table 1. At this point, you should fill the first two columns of this table: Character and Frequency. The Huffman code is not known yet. In this table,
1) Items include only characters that appear in the input string.
2) Characters are case sensitive.
3) Characters are listed in the order of appearance in the input string.
4) The minimum number of characters is 2.
5) The frequency refers to the number of appearances of the character. You do not need to normalize this number.

Character   Frequency   Huffman Code
A           8           111
B           10          110
E           12          011
G           15          010
K           20          10
M           35          00
Table 1. Frequency Table

Figure 1. Huffman Coding of Symbols in Table 1 (Initial Forest, Steps 1 through 5).
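The frequency-table rules of Part 1 can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and do not correspond to the provided HuffmanFrequencyTable or TableItem skeletons. A LinkedHashMap is used because it keeps keys in insertion order, which matches the "order of appearance" rule.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FrequencyTableSketch {
    // Counts raw (non-normalized) occurrences; 'a' and 'A' are distinct keys.
    static Map<Character, Integer> countFrequencies(String input) {
        Map<Character, Integer> table = new LinkedHashMap<>();
        for (char c : input.toCharArray()) {
            table.merge(c, 1, Integer::sum); // first sighting inserts 1, later ones add 1
        }
        return table;
    }

    public static void main(String[] args) {
        System.out.println(countFrequencies("eeyjjjj")); // prints {e=2, y=1, j=4}
    }
}
```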

Encoded bit stream: 11100011
Decoded characters: AME

Figure 2. Decoding Example

Part 2. Build a Huffman Coding Tree
Follow the algorithm and build a Huffman coding tree based on the information stored in the frequency table. (Example: Rosen 10.2, Example 5)

Part 3. List the Huffman Codes
Based on the Huffman coding tree built in Part 2, you should create the Huffman code for each of the characters that appear in the input string. Please note that you DO NOT need to perform bit operations in this assignment. Please store the encoded information as a String object. For example, to represent the bit stream 0000, you can create a String object holding "0000":
String huffmancode = "0000";
The Huffman code should be stored in the corresponding column of the Frequency Table. (Please refer to the Huffman Code column in Table 1.)

Part 4. Encoding a String
Based on the Huffman codes generated in Part 3, your software should convert the input string into encoded bits. For each character in the input string, look up its bit stream in the Frequency Table and replace the character with the encoded bit stream.

Part 5. Decoding Bits
Decode the bit stream using the Huffman coding tree generated in Part 2.
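Parts 2 through 4 can be sketched together as follows. This is a hedged illustration rather than the expected solution: the class, the Node type, and all method names are hypothetical, a java.util.PriorityQueue stands in for the forest, and both the tie-breaking rule (earlier-created tree first) and the 0 = left, 1 = right labeling are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanSketch {
    // Hypothetical node type; ordered by weight, then by creation order (assumed tie-break).
    static final class Node implements Comparable<Node> {
        final char symbol;
        final int weight;
        final int order;
        final Node left;
        final Node right;

        Node(char symbol, int weight, int order, Node left, Node right) {
            this.symbol = symbol;
            this.weight = weight;
            this.order = order;
            this.left = left;
            this.right = right;
        }

        @Override
        public int compareTo(Node other) {
            int byWeight = Integer.compare(weight, other.weight);
            return byWeight != 0 ? byWeight : Integer.compare(order, other.order);
        }
    }

    static Map<Character, Integer> countFrequencies(String input) {
        Map<Character, Integer> table = new LinkedHashMap<>();
        for (char c : input.toCharArray()) {
            table.merge(c, 1, Integer::sum);
        }
        return table;
    }

    // Part 2: repeatedly merge the two lightest trees; the heavier one becomes the left subtree.
    static Node buildTree(Map<Character, Integer> frequencies) {
        PriorityQueue<Node> forest = new PriorityQueue<>();
        int order = 0;
        for (Map.Entry<Character, Integer> e : frequencies.entrySet()) {
            forest.add(new Node(e.getKey(), e.getValue(), order++, null, null));
        }
        while (forest.size() > 1) {
            Node smaller = forest.poll();
            Node larger = forest.poll();
            forest.add(new Node('*', smaller.weight + larger.weight, order++, larger, smaller));
        }
        return forest.poll(); // null if the table was empty
    }

    // Part 3: a left edge appends '0', a right edge appends '1' (assumed convention).
    static Map<Character, String> listCodes(Node root) {
        Map<Character, String> codes = new LinkedHashMap<>();
        collect(root, "", codes);
        return codes;
    }

    private static void collect(Node node, String prefix, Map<Character, String> codes) {
        if (node.left == null && node.right == null) {
            codes.put(node.symbol, prefix); // assumes at least two distinct characters
            return;
        }
        collect(node.left, prefix + "0", codes);
        collect(node.right, prefix + "1", codes);
    }

    // Part 4: replace every character with its code, kept as a String of '0' and '1'.
    static String encode(String input, Map<Character, String> codes) {
        StringBuilder bits = new StringBuilder();
        for (char c : input.toCharArray()) {
            bits.append(codes.get(c));
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        String input = "eeyjjjj";
        Map<Character, String> codes = listCodes(buildTree(countFrequencies(input)));
        System.out.println(codes);                // prints {j=0, e=10, y=11}
        System.out.println(encode(input, codes)); // prints 1010110000
    }
}
```

When two trees have equal weight, a different tie-breaking rule can yield a different but equally short set of codes, so output from this sketch need not match another implementation code for code.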

Part 6. Testing Your Software
Test your software with the included testing program: PA3_Test.java

(1) PA3_Test
This program tests:
1) Frequency Table
2) Encoding
3) Decoding
It takes as its argument the input string to be encoded. If you want to include a space in your input string, please put quotation marks around the string. (This might work only on the command line, not in the Eclipse IDE.)

% java PA3_Test eeyjjjj
char  frequency  code
--------------------------------------
e     2          10
y     1          11
j     4          0
Encoded bit stream 1010110000
Total number of bits without Huffman coding: 112
Total number of bits with Huffman coding: 10
Decoded String: eeyjjjj

When several characters occur the same number of times, there may be more than one possible set of codes. Similarly, a non-leaf node and a leaf node can have the same frequency, which can also lead to more than one possible set of codes.

% java PA3_Test "My Test works totally fine"
char  frequency  code
--------------------------------------
M     1          0
y     2          000
      4          000
T     1          00
e     2          00
s     2          000
t     3          0
w     1          00
o     2          00
r     1          000
k     1
a     1          0
l     2          000
f     1          0
i     1          00
n     1          00
Encoded bit stream 00000000000000000000000000000000000 00000000000000000000
Total number of bits without Huffman coding: 416
Total number of bits with Huffman coding: 101
Decoded String: My Test works totally fine

Deliverables
Submit a tarball of your Java source code including:
Decoder.java
Encoder.java
HuffmanFrequencyTable.java
HuffmanTree.java
HuffmanTreeNode.java
TableItem.java
Keep all of your source code in a single flat directory. The skeleton files are provided. Please do not modify PA3_Test.java or the interface files.

Note: You are required to work as a team on this assignment. You and your teammate should submit only ONE copy of the assignment. Please write the implementers' name(s) at the top of each source file.

Grading
This assignment will account for 5% of your final grade. The grading itself will be done on a 50 point scale.

Late Policy
Please check the late policy available from the course web page.