Data Compression Fundamentals

1 Data Compression Fundamentals Touradj Ebrahimi Touradj.Ebrahimi@epfl.ch

2 Several classifications of compression methods are possible.
Based on data type:
» Generic data compression
» Audio compression
» Image compression
» Video compression
» Virtual reality compression

3 Based on compression type:
» Lossless: the decoded (uncompressed) data is exactly equal to the original (modest compression ratios).
» Lossy: the decoded (uncompressed) data is an approximation of the original, but not necessarily identical to it (higher compression ratios).

4 Information theory was developed to provide a mathematical tool to better design data compression algorithms.
Entropy of the source generating the data: it is impossible to compress data in a lossless way with an average bitrate lower than the entropy of the source that generated it.
The entropy H of the source generating the data is in general impossible to measure in practice, due to the large number of interdependencies (of infinite order) and the non-stationarities.
Usually, a zero-order entropy measure is used to estimate the entropy of the source:
$$H_0 = -\sum_{i \in S} p_i \log_2(p_i)$$
where p_i is the probability of symbol i in the alphabet S.
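
The zero-order estimate can be computed directly from symbol counts. The following is a minimal sketch, assuming the probabilities p_i are estimated from relative frequencies in the data itself; the function name `zero_order_entropy` is illustrative and not part of the original slides.

```python
import math
from collections import Counter

def zero_order_entropy(data):
    """Estimate H_0 in bits per symbol from relative symbol frequencies."""
    counts = Counter(data)
    total = sum(counts.values())
    # H_0 = -sum(p_i * log2(p_i)) over the symbols that actually occur
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Two equally likely symbols need one bit per symbol.
h = zero_order_entropy("AABB")  # 1.0
```

Because the estimate ignores dependencies between symbols, it can overstate the true entropy of a source with strong correlations.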

5 Lossless data compression is widely used in computers. Lossless methods are based on the following approaches:
» Huffman coding
» Arithmetic coding
» Substitutional (dictionary-based) coding

6 Huffman codes represent every symbol with a number of bits that decreases as its frequency (probability of appearance) increases, close to -log2 of its probability.
Mechanism: a binary tree is built bottom-up by repeatedly grouping the two nodes with the smallest probabilities and assigning the sum of the children's probabilities to the parent node. Each symbol is then represented by one leaf of the tree and coded by its address, that is, the sequence of branch labels (0 or 1) on the path from the root to that leaf.

7 Huffman coding (example)
symbol  frequency
A       15
B       7
C       6
D       6
E       5

8 Huffman coding (example) Initial leaves: A (15) B (7) C (6) D (6) E (5)

9 Huffman coding (example) D (6) and E (5) are merged into DE (11); the two branches are labeled 0 and 1.

10 Huffman coding (example) B (7) and C (6) are merged into BC (13), with branches labeled 0 and 1.

11 Huffman coding (example) BC (13) and DE (11) are merged into BCDE (24), with branches labeled 0 and 1.

12 Huffman coding (example) A (15) and BCDE (24) are merged into the root ABCDE (39), with branches labeled 0 and 1.

13 Huffman coding (example) Resulting codes, read from the root to each leaf:
symbol  code
A       0
B       100
C       101
D       110
E       111
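
The bottom-up construction in this example can be sketched with a priority queue. This is an illustrative implementation, not the one behind the slides: the function name `huffman_codes` and the tie-breaking rule are assumptions. Since C and D have equal frequencies, the exact bit patterns may differ from the ones shown, but the code lengths are the same.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Return {symbol: bitstring} for a dict of symbol frequencies."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(w, next(tiebreak), {sym: ""}) for sym, w in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # a single symbol still needs one bit
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least probable nodes; the parent gets their summed weight.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next(tiebreak), merged))
    return heap[0][2]

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)  # 87 bits for 39 symbols
```

For these frequencies A receives a 1-bit code and the other four symbols receive 3-bit codes, so the 39 symbols take 87 bits, about 2.23 bits per symbol on average.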

14 Arithmetic codes represent a sequence of symbols by assigning to it the binary representation of a subinterval of the unit interval (an interval of length smaller than one).
» Generally more efficient than Huffman codes, because a symbol does not have to be coded with a whole number of bits
» Separate the model from the bit assignment and therefore allow a simpler adaptive scheme
» Computationally efficient

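15 Arithmetic coding (example) P(A) = 1/3, P(B) = 2/3. The unit interval is split at 2/3: B covers [0, 2/3) and A covers [2/3, 1).

16 Arithmetic coding (example) P(A) = 1/3, P(B) = 2/3. Each segment is split again in the same proportions: BB covers [0, 4/9), BA covers [4/9, 2/3), AB covers [2/3, 8/9) and AA covers [8/9, 1).

17 Arithmetic coding (example) P(A) = 1/3, P(B) = 2/3. A third split gives eight segments, from top to bottom: AAA, AAB, ABA, ABB, BAA, BAB, BBA, BBB. The boundary between BAA and BAB is 16/27, and the boundary between BBA and BBB is 8/27.

18 Arithmetic coding (example) P(A) = 1/3, P(B) = 2/3. Each segment is coded by a binary fraction that lies inside it:
segment  value  code
AAA      31/32  .11111
AAB      15/16  .1111
ABA      14/16  .1110
ABB      6/8    .110
BAA      10/16  .1010
BAB      4/8    .100
BBA      3/8    .011
BBB      1/4    .01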
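
The interval subdivision in this example can be checked with exact fractions. This is a sketch only: it assumes, as in the example, that B takes the lower 2/3 of each interval and A the upper 1/3, and it verifies that each listed code value falls inside its segment. It does not reproduce how the code values were chosen, and the names `interval` and `slide_codes` are illustrative.

```python
from fractions import Fraction

P_B = Fraction(2, 3)  # P(B) = 2/3, so P(A) = 1/3

def interval(sequence):
    """Return (low, high) of the subinterval of [0, 1) assigned to a sequence."""
    low, width = Fraction(0), Fraction(1)
    for sym in sequence:
        if sym == "B":      # B keeps the lower 2/3 of the current interval
            width *= P_B
        elif sym == "A":    # A keeps the upper 1/3 of the current interval
            low += width * P_B
            width *= 1 - P_B
        else:
            raise ValueError(f"unknown symbol: {sym!r}")
    return low, low + width

# Code values from the example, written as the bits after the binary point.
slide_codes = {
    "AAA": "11111", "AAB": "1111", "ABA": "1110", "ABB": "110",
    "BAA": "1010", "BAB": "100", "BBA": "011", "BBB": "01",
}

def code_value(bits):
    """Interpret a bit string as a binary fraction, e.g. '011' -> 3/8."""
    return Fraction(int(bits, 2), 2 ** len(bits))

all_inside = all(
    interval(seq)[0] <= code_value(bits) < interval(seq)[1]
    for seq, bits in slide_codes.items()
)
```

More probable sequences get wider intervals, so they can be identified with fewer bits: BBB has the widest segment and the shortest code.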

19 Adaptive arithmetic coding [Diagram: an encoder and a decoder, each with its own model.]

20 Substitutional codes use a dictionary of words while coding a sequence of symbols, and use the position of a word in the dictionary to encode whole words rather than single symbols.
» Efficient compression
» Computationally efficient

21 Substitutional coding (example)
input  output  dictionary
/W     /       256=/W
E      W       257=WE
D      E       258=ED
/      D       259=D/
WE     256     260=/WE
/      E       261=...
...
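
The dictionary growth in this example matches the behavior of an LZW-style encoder, although the slides do not name the algorithm. The following is a minimal sketch under that assumption: the dictionary starts with the 256 single-character codes 0 to 255, new entries start at 256, and the input is assumed to contain only characters in that range. The function name `lzw_encode` is illustrative.

```python
def lzw_encode(text):
    """Encode text as a list of dictionary codes (LZW-style, assumed)."""
    dictionary = {chr(i): i for i in range(256)}  # single characters
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate              # keep extending the match
        else:
            output.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new word
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])   # flush the last match
    return output, dictionary

codes, dictionary = lzw_encode("/WED/WE")
# codes holds the character codes of '/', 'W', 'E', 'D', then 256, then 'E'
```

The second occurrence of "/W" is emitted as the single code 256, which is where the compression comes from.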

22 For lossy coding, the rate-distortion theory was developed.
Rate-distortion optimization: find the lowest possible bitrate for a given distortion, or the lowest distortion for a given bitrate.
The most popular distortion measure is the mean square error (MSE):
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left[ x(i) - \hat{x}(i) \right]^2$$
where x(i) is the original sample, \hat{x}(i) is the decoded sample and N is the number of samples.
The MSE does not always reflect the real distortion perceived by the human visual system.
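
The MSE can be computed directly from this definition. This is a minimal sketch for two equal-length sequences of samples; the function name `mse` and the sample values are illustrative.

```python
def mse(original, decoded):
    """Mean square error between two equal-length sequences of samples."""
    if len(original) != len(decoded):
        raise ValueError("sequences must have the same length")
    if not original:
        raise ValueError("sequences must not be empty")
    return sum((x - y) ** 2 for x, y in zip(original, decoded)) / len(original)

# One sample differs by 2, so the squared errors are 0, 0 and 4.
error = mse([1, 2, 3], [1, 2, 5])  # 4/3
```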

23 [Figure: plot with axes Rate and Distortion, showing the optimal R-D curve, a target rate, and the best solution.]