Data compression.

Similar documents
15 July, Huffman Trees. Heaps

4.8 Huffman Codes. These lecture slides are supplied by Mathijs de Weerd

Fundamentals of Multimedia. Lecture 5 Lossless Data Compression Variable Length Coding

Lossless Compression Algorithms

CMPSCI 240 Reasoning Under Uncertainty Homework 4

CSE100. Advanced Data Structures. Lecture 12. (Based on Paul Kube course materials)

CSE 421 Greedy: Huffman Codes

An undirected graph is a tree if and only of there is a unique simple path between any 2 of its vertices.

Out: April 19, 2017 Due: April 26, 2017 (Wednesday, Reading/Study Day, no late work accepted after Friday)

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

15-122: Principles of Imperative Computation, Spring 2013

David Rappaport School of Computing Queen s University CANADA. Copyright, 1996 Dale Carnegie & Associates, Inc.

Algorithms and Data Structures CS-CO-412

MCS-375: Algorithms: Analysis and Design Handout #G2 San Skulrattanakulchai Gustavus Adolphus College Oct 21, Huffman Codes

Entropy Coding. - to shorten the average code length by assigning shorter codes to more probable symbols => Morse-, Huffman-, Arithmetic Code

Huffman Coding. (EE 575: Source Coding Project) Project Report. Submitted By: Raza Umar. ID: g

Multimedia Networking ECE 599

6. Finding Efficient Compressions; Huffman and Hu-Tucker

Homework 3 Huffman Coding. Due Thursday October 11

CSE 143 Lecture 22. Huffman Tree

14.4 Description of Huffman Coding

CS 206 Introduction to Computer Science II

Huffman Coding Assignment For CS211, Bellevue College (rev. 2016)

CS/COE 1501

Text Compression through Huffman Coding. Terminology

Programming Abstractions

EE67I Multimedia Communication Systems Lecture 4

Huffman Coding. Version of October 13, Version of October 13, 2014 Huffman Coding 1 / 27

CSE143X: Computer Programming I & II Programming Assignment #10 due: Friday, 12/8/17, 11:00 pm

Search Trees. Data and File Structures Laboratory. DFS Lab (ISI) Search Trees 1 / 17

CMPSC112 Lecture 37: Data Compression. Prof. John Wenskovitch 04/28/2017

ASCII American Standard Code for Information Interchange. Text file is a sequence of binary digits which represent the codes for each character.

Huffman, YEAH! Sasha Harrison Spring 2018

CSE 143, Winter 2013 Programming Assignment #8: Huffman Coding (40 points) Due Thursday, March 14, 2013, 11:30 PM

Digital Image Processing

CS/COE 1501

Greedy Algorithms. Alexandra Stefan

Compression. storage medium/ communications network. For the purpose of this lecture, we observe the following constraints:

Data Compression. An overview of Compression. Multimedia Systems and Applications. Binary Image Compression. Binary Image Compression

EE 368. Weeks 5 (Notes)

Greedy Algorithms and Huffman Coding

! Tree: set of nodes and directed edges. ! Parent: source node of directed edge. ! Child: terminal node of directed edge

14 Data Compression by Huffman Encoding

Welcome Back to Fundamentals of Multimedia (MR412) Fall, 2012 Lecture 10 (Chapter 7) ZHU Yongxin, Winson

Tree: non-recursive definition. Trees, Binary Search Trees, and Heaps. Tree: recursive definition. Tree: example.

Binary Trees. Directed, Rooted Tree. Terminology. Trees. Binary Trees. Possible Implementation 4/18/2013

Postfix (and prefix) notation

CS 241 Data Organization Binary Trees

CS106B Handout 34 Autumn 2012 November 12 th, 2012 Data Compression and Huffman Encoding

CS301 - Data Structures Glossary By

Chapter 20: Binary Trees

CSE100. Advanced Data Structures. Lecture 13. (Based on Paul Kube course materials)

! Tree: set of nodes and directed edges. ! Parent: source node of directed edge. ! Child: terminal node of directed edge

8. Binary Search Tree

Data Structures and Algorithms

ITI Introduction to Computing II

Binary Trees and Huffman Encoding Binary Search Trees

ITI Introduction to Computing II

Greedy Approach: Intro

IMAGE COMPRESSION- I. Week VIII Feb /25/2003 Image Compression-I 1

Greedy Algorithms CHAPTER 16

Huffman Codes (data compression)

CS02b Project 2 String compression with Huffman trees

FINALTERM EXAMINATION Fall 2009 CS301- Data Structures Question No: 1 ( Marks: 1 ) - Please choose one The data of the problem is of 2GB and the hard

Building Java Programs. Priority Queues, Huffman Encoding

Algorithms Dr. Haim Levkowitz

COP 3502 Section 2 Exam #2 Version A Spring /23/2017

Copyright 1998 by Addison-Wesley Publishing Company 147. Chapter 15. Stacks and Queues

6. Finding Efficient Compressions; Huffman and Hu-Tucker Algorithms

Encoding. A thesis submitted to the Graduate School of University of Cincinnati in

Module 4: Index Structures Lecture 13: Index structure. The Lecture Contains: Index structure. Binary search tree (BST) B-tree. B+-tree.

CSCI 136 Data Structures & Advanced Programming. Lecture 22 Fall 2018 Instructor: Bills

Chapter 9. Greedy Technique. Copyright 2007 Pearson Addison-Wesley. All rights reserved.

Lec 17 April 8. Topics: binary Trees expression trees. (Chapter 5 of text)

Red-Black, Splay and Huffman Trees

4/16/2012. Data Compression. Exhaustive search, backtracking, object-oriented Queens. Check out from SVN: Queens Huffman-Bailey.

Analysis of Algorithms - Greedy algorithms -

IX. Binary Trees (Chapter 10)

5.5 Data Compression


Trees! Ellen Walker! CPSC 201 Data Structures! Hiram College!

cs2010: algorithms and data structures

Priority Queues and Huffman Encoding

CS 2301 Exam 3 B-Term 2011

Heaps / Priority Queues

Intro. To Multimedia Engineering Lossless Compression

Priority Queues and Huffman Encoding

Queues. A queue is a special type of structure that can be used to maintain data in an organized way.

Ch. 2: Compression Basics Multimedia Systems

Data Structures. Outline. Introduction Linked Lists Stacks Queues Trees Deitel & Associates, Inc. All rights reserved.

Dictionaries. Priority Queues

MULTIMEDIA COLLEGE JALAN GURNEY KIRI KUALA LUMPUR

Heaps and Priority Queues

Binary Heaps in Dynamic Arrays

Programming Abstractions

Algorithms. Algorithms 5.5 DATA COMPRESSION. introduction run-length coding Huffman compression LZW compression ROBERT SEDGEWICK KEVIN WAYNE

Priority Queues and Huffman Trees

TREES cs2420 Introduction to Algorithms and Data Structures Spring 2015

More Bits and Bytes Huffman Coding

GZIP is a software application used for file compression. It is widely used by many UNIX

Transcription:

Data compression anhtt-fit@mail.hut.edu.vn dungct@it-hut.edu.vn

Data Compression Data in memory have used fixed length for representation For data transfer (in particular), this method is inefficient. For speed and storage efficiencies, data symbols should use the minimum number of bits possible for representation. Methods Used For Compression: Encode high probability symbols with fewer bits Shannon-Fano, Huffman, UNIX compact Encode sequences of symbols with location of sequence in a dictionary PKZIP, ARC, GIF, UNIX compress, V.bis Lossy compression JPEG and MPEG

Variable Length Bit Codings Suppose A appears 50 times in text, but B appears only 0 times ASCII coding assigns 8 bits per character, so total bits for A and B is 60 * 8 = 80 If A gets a -bit code and B gets a -bit code, total is 50 * + 0 * = 30 Compression rules: Use minimum number of bits No code is the prefix of another code Enables left-to-right, unambiguous decoding

Variable Length Bit Codings No code is a prefix of another For example, can t have A map to 0 and B map to 00, because 0 is a prefix (the start of) 00. Enables left-to-right, unambiguous decoding That is, if you see 0, you know it s A, not the start of another character.

Variable-length encoding Use different number of bits to encode different characters. Ex. Morse code. Issue: ambiguity. - - - SOS? IAMIE? EEWNI? V7O?

Huffman code Constructed by using a code tree, but starting at the leaves A compact code constructed using the binary Huffman code construction method Huffman code Algorithm 3 Make a leaf node for each code symbol Add the generation probability of each symbol to the leaf node Take the two leaf nodes with the smallest probability and connect them into a new node Add or 0 to each of the two branches The probability of the new node is the sum of the probabilities of the two connecting nodes If there is only one node left, the code construction is completed. If not, go back to ()

Demo 65demo-huffman.ppt

Compress a text Consider the following short text: Eerie eyes seen near lake. Count up the occurrences of all characters in the text Char Freq. Char Freq. Char Freq. E y k e 8 s. r n i a space l

Building a Tree The queue after inserting all nodes E i y l k. r s n a s p e 8 Null Pointers are not shown

Building a Tree While priority queue contains two or more nodes Create new node Dequeue node and make it left subtree Dequeue next node and make it right subtree Frequency of new node equals sum of frequency of left and right children Enqueue new node back into queue

Building a Tree E i y l k. r s n a s p e 8

Building a Tree y l k. r s n a sp e 8 E i

Building a Tree y l k. r s n a sp e 8 E i

Building a Tree k. r s n a sp e 8 E i y l

Building a Tree k. r s n a sp e E i y l 8

Building a Tree r s n a sp e E i y l 8 k.

Building a Tree r s n a sp e 8 E i y l k. To continue

At the end E i y 0 l k. 6 6 sp e 8 r 6 s 8 n a After enqueueing this node there is only one node left in priority queue.

How to implement? Reuse JRB to represent the tree Each new node is created as a JRB node The edges are directional from the parents to the children. Two edges are created and marked using label 0 or when a parent node is created. Reuse Dllist or JRB to represent the priority queue A queue node contains a key as the frequency of the related node in the tree The queue node s value is a pointer referencing to the node in the tree

Quiz Reuse the graph API defined in previous class to write a function that builds a Huffman tree from a string as the following typedef struct { Graph graph; JRB root; } HuffmanTree; HuffmanTree makehuffman (char * buffer, int size);

Huffman code table In order to compress the data string, we have to build a code table from the Huffman tree. The following data structure is used to represent the code table typedef struct { int size; char bits[]; } Coding; Coding huffmantable[56]; huffmantable['a'] give the coding of 'A'. If the coding s size = 0, the character 'A' is not present in the text. bits contains the huffman code (sequence of bits) of the given character.

Quiz Write a function to create the Huffman code table from a Huffman tree void createhuffmantable(huffmantree htree, Coding* htable); Write a function to compress a text buffer to a Huffman sequence. void compress(char * buffer, int size, char* huffman, int* nbit); The buffer contains size characters. After compressing, the huffman buffer contains nbit bits for output. In order to write this function, you should create a function to add a new character into the huffman buffer as the following void addhuffmanchar(char * ch, Coding* htable, char* huffman, int* nbit);