Data Compression. An Overview of Compression. Multimedia Systems and Applications. Binary Image Compression

An Overview of Compression (Multimedia Systems and Applications)

Data Compression
Compression becomes necessary in multimedia because multimedia data requires large amounts of storage space and bandwidth.
Types of compression:
Lossless compression: the data is not altered or lost in the process.
Lossy compression: some information is lost, but the original can be reasonably reproduced from the compressed data.

Binary Image Compression
RLE (Run Length Encoding), also called Packed Bits encoding.
E.g. the string "aaaaaaaaaaaaaaaaaaaa111110000000" (twenty a's, five 1's, seven 0's) can be coded as:
Byte1  Byte2  Byte3  Byte4  Byte5  Byte6
20     a      05     1      07     0
This is a one-dimensional scheme. Some schemes will also use a flag to separate the data bytes. (A small run-length sketch is given below.)

Binary Image Compression
Disadvantage of the RLE scheme: when groups of adjacent pixels change rapidly, the runs become shorter, and it can take more bits to represent the run lengths than the uncompressed data would need (negative compression). RLE is a generalization of zero suppression, which assumes that just one symbol appears particularly often in sequences.
http://astronomy.swin.edu.au/~pbourke/dataformats/rle/
http://datacompression.info/rle.shtml
http://www.data-compression.info/algorithms/rle/

Lossless Compression Algorithms (Entropy Encoding)
Adapted from: http://www.cs.cf.ac.uk/dave/multimedia/node207.html

Basics of Information Theory
According to Shannon, the entropy of an information source S is defined as
H(S) = Σ_i p_i log2(1/p_i)
where p_i is the probability that symbol S_i in S will occur, and log2(1/p_i) indicates the amount of information contained in S_i, i.e., the number of bits needed to code S_i.
For example, in an image with a uniform distribution of grey-level intensities, p_i = 1/256, so the number of bits needed to code each grey level is 8 and the entropy of the image is 8. (A short numerical check is given below.)
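To make the run/length pairing concrete, here is a minimal Python sketch (illustrative helper names, not from the original slides); it emits (count, symbol) pairs rather than the exact packed-bits byte layout shown above:

    # Minimal run-length encoding sketch: emit (count, symbol) pairs.
    # A byte layout such as "count byte followed by data byte" is one way to
    # serialize these pairs; a flag byte can be added to separate literal runs.
    def rle_encode(data):
        runs = []
        i = 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1                      # extend the current run
            runs.append((j - i, data[i]))
            i = j
        return runs

    def rle_decode(runs):
        return "".join(symbol * count for count, symbol in runs)

    s = "aaaaaaaaaaaaaaaaaaaa111110000000"   # 20 a's, five 1's, seven 0's
    encoded = rle_encode(s)
    print(encoded)                            # [(20, 'a'), (5, '1'), (7, '0')]
    assert rle_decode(encoded) == s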
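As a quick numerical check of the entropy formula (illustrative code, not part of the original slides), the sketch below computes H(S) for the uniform 256-level image and for the five-symbol source used in the Shannon-Fano example that follows:

    # Shannon entropy H(S) = sum_i p_i * log2(1/p_i).
    from math import log2

    def entropy(probs):
        return sum(p * log2(1 / p) for p in probs if p > 0)

    # Uniform 256-level image: every grey level has p_i = 1/256.
    print(entropy([1 / 256] * 256))            # 8.0 bits per pixel

    # Five-symbol source A..E with counts 15, 7, 6, 6, 5 (total 39),
    # used in the Shannon-Fano and Huffman examples below.
    counts = [15, 7, 6, 6, 5]
    total = sum(counts)
    print(round(entropy([c / total for c in counts]), 2))   # about 2.19 bits/symbol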

The Shannon-Fano Algorithm
A simple example will be used to illustrate the algorithm:
Symbol  A   B  C  D  E
Count   15  7  6  6  5

Encoding for the Shannon-Fano algorithm: a top-down approach.
1. Sort symbols according to their frequencies/probabilities, e.g., ABCDE.
2. Recursively divide the symbols into two parts, each with approximately the same total count.
More frequent symbols get fewer bits; less frequent symbols get more bits.

Shannon-Fano result (a sketch of the recursive split is given below):
Symbol  Count  log2(1/p_i)  Code  Subtotal (# of bits = Count x Code Size)
A       15     1.38         00    30
B       7      2.48         01    14
C       6      2.70         10    12
D       6      2.70         110   18
E       5      2.96         111   15
TOTAL (# of bits): 89

Huffman Coding
Encoding for the Huffman algorithm: a bottom-up approach.
1. Initialization: put all nodes in an OPEN list and keep it sorted at all times (e.g., ABCDE).
2. Repeat until the OPEN list has only one node left:
(a) From OPEN pick the two nodes having the lowest frequencies/probabilities and create a parent node for them.
(b) Assign the sum of the children's frequencies/probabilities to the parent node and insert it into OPEN.
(c) Assign codes 0 and 1 to the two branches of the tree, and delete the children from OPEN.

Huffman result (a heap-based construction sketch is also given below):
Symbol  Count  log2(1/p_i)  Code  Subtotal (# of bits = Count x Code Size)
A       15     1.38         0     15
B       7      2.48         100   21
C       6      2.70         101   18
D       6      2.70         110   18
E       5      2.96         111   15
TOTAL (# of bits): 87
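A possible sketch of the top-down Shannon-Fano split (illustrative code, not from the slides). Here the split point is chosen to make the two halves' totals as equal as possible; a different tie-breaking rule can produce a different but equally valid prefix code:

    def shannon_fano(items, prefix="", codes=None):
        # items: list of (symbol, count), already sorted by decreasing count
        if codes is None:
            codes = {}
        if len(items) == 1:
            codes[items[0][0]] = prefix or "0"
            return codes
        # choose the split that makes the two halves' totals as equal as possible
        total = sum(c for _, c in items)
        best_split, best_diff, running = 1, float("inf"), 0
        for i in range(1, len(items)):
            running += items[i - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_split, best_diff = i, diff
        shannon_fano(items[:best_split], prefix + "0", codes)   # top half gets 0
        shannon_fano(items[best_split:], prefix + "1", codes)   # bottom half gets 1
        return codes

    symbols = [("A", 15), ("B", 7), ("C", 6), ("D", 6), ("E", 5)]
    codes = shannon_fano(symbols)
    print(codes)   # e.g. {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
    print(sum(count * len(codes[sym]) for sym, count in symbols), "bits")   # 89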
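The bottom-up Huffman construction can be sketched with a min-heap standing in for the sorted OPEN list. With the counts above it reproduces the 87-bit total, although the particular 0/1 labels may differ from the table (this is a sketch, not the slides' own code):

    import heapq
    from itertools import count

    def huffman(freqs):
        # OPEN list as a min-heap of (frequency, tiebreak, node); a node is either
        # a symbol or a (left, right) pair created when two nodes are merged.
        tie = count()
        open_list = [(f, next(tie), sym) for sym, f in freqs.items()]
        heapq.heapify(open_list)
        while len(open_list) > 1:
            f1, _, a = heapq.heappop(open_list)      # two lowest-frequency nodes
            f2, _, b = heapq.heappop(open_list)
            heapq.heappush(open_list, (f1 + f2, next(tie), (a, b)))
        codes = {}
        def assign(node, prefix):
            if isinstance(node, tuple):
                assign(node[0], prefix + "0")        # label the two branches 0 and 1
                assign(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        assign(open_list[0][2], "")
        return codes

    freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
    codes = huffman(freqs)
    print(codes)
    print(sum(freqs[s] * len(codes[s]) for s in freqs), "bits")   # 87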

Huffman Coding Discussion
Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. (There is a small overhead for sending this, negligible if the data file is big.)
Unique Prefix Property: no code is a prefix of any other code (all symbols are at the leaf nodes), which makes decoding unambiguous.
If prior statistics are available and accurate, then Huffman coding is very good. In the above example:
entropy = (15 x 1.38 + 7 x 2.48 + 6 x 2.70 + 6 x 2.70 + 5 x 2.96) / 39 = 85.26 / 39 = 2.19
Number of bits needed for Huffman coding is: 87 / 39 = 2.23

Motivations for adaptive coding:
(a) The previous algorithms require statistical knowledge, which is often not available (e.g., live audio or video).
(b) Even when it is available, it could be a heavy overhead, especially when many tables have to be sent, as when a non-order-0 model is used, i.e. one that takes into account the impact of the previous symbol on the probability of the current symbol (e.g., "q" and "u" often come together, ...).
The solution is to use adaptive algorithms. As an example, adaptive Huffman coding is examined below; the idea is, however, applicable to other adaptive compression algorithms.

ENCODER:
    Initialize_model();
    while ((c = getc(input)) != eof)
    {
        encode(c, output);
        update_model(c);
    }

DECODER:
    Initialize_model();
    while ((c = decode(input)) != eof)
    {
        putc(c, output);
        update_model(c);
    }

The key is to have both encoder and decoder use exactly the same Initialize_model and update_model routines. update_model does two things: (a) increment the count, (b) update the Huffman tree.
During the updates, the Huffman tree maintains its sibling property, i.e. the nodes (internal and leaf) are arranged in order of increasing weight (see figure).
[Figure: adaptive Huffman tree, nodes arranged in order of increasing weight.]
When swapping is necessary, the farthest node with weight W is swapped with the node whose weight has just been increased to W+1. Note: if the node with weight W has a subtree beneath it, the subtree goes with it.
The Huffman tree can look very different after node swapping; e.g., in the third tree, node A is swapped again and becomes the #5 node. It is now encoded using only 2 bits.
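The following sketch illustrates only the synchronization idea, not the sibling-property tree update: as a simplification, both encoder and decoder rebuild a Huffman code from the current counts before every symbol (real adaptive Huffman updates the tree incrementally), so identical Initialize_model/update_model steps keep the two sides in step. All names here are illustrative:

    import heapq
    from itertools import count

    def build_code(freqs):
        # deterministic Huffman code table from the current counts
        tie = count()
        heap = [(f, next(tie), s) for s, f in sorted(freqs.items())]
        heapq.heapify(heap)
        while len(heap) > 1:
            f1, _, a = heapq.heappop(heap)
            f2, _, b = heapq.heappop(heap)
            heapq.heappush(heap, (f1 + f2, next(tie), (a, b)))
        codes = {}
        def walk(node, prefix):
            if isinstance(node, tuple):
                walk(node[0], prefix + "0"); walk(node[1], prefix + "1")
            else:
                codes[node] = prefix or "0"
        walk(heap[0][2], "")
        return codes

    def adaptive_encode(msg, alphabet):
        freqs = {s: 1 for s in alphabet}            # Initialize_model()
        bits = ""
        for c in msg:
            bits += build_code(freqs)[c]            # encode(c, output)
            freqs[c] += 1                           # update_model(c)
        return bits

    def adaptive_decode(bits, n, alphabet):
        freqs = {s: 1 for s in alphabet}            # identical Initialize_model()
        out, i = [], 0
        for _ in range(n):
            inv = {v: k for k, v in build_code(freqs).items()}
            j = i + 1
            while bits[i:j] not in inv:             # prefix code: extend until a match
                j += 1
            out.append(inv[bits[i:j]])
            freqs[out[-1]] += 1                     # identical update_model(c)
            i = j
        return "".join(out)

    msg = "ABRACADABRA"
    bits = adaptive_encode(msg, set(msg))
    assert adaptive_decode(bits, len(msg), set(msg)) == msg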

[Figure: adaptive Huffman trees before and after swapping; the node whose weight has increased to W+1 (here A) is swapped with the farthest node of weight W (here D), keeping the nodes in order of increasing weight.]
Note: the code for a particular symbol changes during the adaptive coding process.

Lempel-Ziv-Welch Algorithm
Motivation: suppose we want to encode Webster's English dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?
Problems: (a) too many bits, (b) everyone needs a dictionary, (c) it only works for English text.
Solution: find a way to build the dictionary adaptively.
The original methods are due to Ziv and Lempel in 1977 and 1978. Terry Welch improved the scheme in 1984 (called LZW compression). It is used in, e.g., UNIX compress, GIF, and V.42bis.
Reference: Terry A. Welch, "A Technique for High-Performance Data Compression", IEEE Computer, Vol. 17, No. 6, 1984, pp. 8-19.

LZW Compression Algorithm:
    w = NIL;
    while ( read a character k )
    {
        if wk exists in the dictionary
            w = wk;
        else
        {
            output the code for w;
            add wk to the dictionary (so wk is stored);
            w = k;
        }
    }

Original LZW used a dictionary with 4K entries; the first 256 (0-255) are the ASCII codes.
Example: the input string is "^WED^WE^WEE^WEB^WET".
Step  w    k    Output  Index  Symbol
1     NIL  ^
2     ^    W    ^       256    ^W
3     W    E    W       257    WE
4     E    D    E       258    ED
5     D    ^    D       259    D^
6     ^    W
7     ^W   E    256     260    ^WE
8     E    ^    E       261    E^
9     ^    W
10    ^W   E
11    ^WE  E    260     262    ^WEE
12    E    ^
13    E^   W    261     263    E^W
14    W    E
15    WE   B    257     264    WEB
16    B    ^    B       265    B^
17    ^    W
18    ^W   E
19    ^WE  T    260     266    ^WET
20    T    EOF  T

A 19-symbol input has been reduced to a 7-symbol plus 5-code output. Each code/symbol will need more than 8 bits, say 9 bits. Usually, compression doesn't start until a large number of bytes (e.g., > 100) have been read in.
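A runnable sketch of the compression loop above (a sketch, not the slides' own code), with the dictionary seeded with the 256 single-character codes; it reproduces the output of the ^WED^WE^WEE^WEB^WET example:

    def lzw_compress(text):
        # dictionary initialised with single characters (codes 0-255), as in the
        # original LZW; new strings get codes 256, 257, ...
        dictionary = {chr(i): i for i in range(256)}
        next_code = 256
        w = ""                                  # w = NIL
        out = []
        for k in text:                          # while (read a character k)
            if w + k in dictionary:
                w = w + k                       # wk exists: extend the current string
            else:
                out.append(dictionary[w])       # output the code for w
                dictionary[w + k] = next_code   # add wk to the dictionary
                next_code += 1
                w = k
        if w:
            out.append(dictionary[w])           # flush the last string
        return out

    codes = lzw_compress("^WED^WE^WEE^WEB^WET")
    print([chr(c) if c < 256 else c for c in codes])
    # ['^', 'W', 'E', 'D', 256, 'E', 260, 261, 257, 'B', 260, 'T']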

LZW Decompression Algorithm:
    read a character k;
    output k;
    w = k;
    while ( read a character k )    /* k could be a character or a code */
    {
        entry = dictionary entry for k;
        output entry;
        add w + entry[0] to the dictionary;
        w = entry;
    }

Example: the input string is "^WED<256>E<260><261><257>B<260>T".
Step  w      k      Output  Index  Symbol
1            ^      ^
2     ^      W      W       256    ^W
3     W      E      E       257    WE
4     E      D      D       258    ED
5     D      <256>  ^W      259    D^
6     <256>  E      E       260    ^WE
7     E      <260>  ^WE     261    E^
8     <260>  <261>  E^      262    ^WEE
9     <261>  <257>  WE      263    E^W
10    <257>  B      B       264    WEB
11    B      <260>  ^WE     265    B^
12    <260>  T      T       266    ^WET
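A matching decoder sketch (again illustrative, not the slides' own code). The w + w[0] fallback handles the corner case where a received code is not yet in the dictionary, which does not arise in this example but is needed in general:

    def lzw_decompress(codes):
        # dictionary initialised exactly like the compressor's
        dictionary = {i: chr(i) for i in range(256)}
        next_code = 256
        w = dictionary[codes[0]]
        out = [w]
        for k in codes[1:]:
            if k in dictionary:
                entry = dictionary[k]
            else:
                entry = w + w[0]                    # code not yet in the dictionary:
                                                    # the corner case the slide omits
            out.append(entry)
            dictionary[next_code] = w + entry[0]    # add w + entry[0]
            next_code += 1
            w = entry
        return "".join(out)

    # codes as produced by the compressor for "^WED^WE^WEE^WEB^WET"
    codes = [94, 87, 69, 68, 256, 69, 260, 261, 257, 66, 260, 84]
    print(lzw_decompress(codes))        # ^WED^WE^WEE^WEB^WET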

Problem: what if we run out of dictionary space?
Solution 1: keep track of unused entries and use LRU (Least Recently Used) replacement.
Solution 2: monitor compression performance and flush the dictionary when performance is poor.
Implementation note: LZW can be made really fast; it grabs a fixed number of bits from the input stream, so bit parsing is very easy, and table lookup is automatic.

Huffman vs. Arithmetic Code
The lowest achievable L_avg for Huffman codes is 1 bit per symbol. What if the entropy H << 1?
One option: use one code symbol for several source symbols.
Another option: arithmetic coding. The idea behind arithmetic coding is to represent the probability of a sequence by a binary number.

Arithmetic Encoding
Assume the source alphabet has values 0 and 1, with p_0 = p and p_1 = 1 - p. A sequence of symbols s_1, s_2, ..., s_m is represented by a probability interval found as follows:
    Initialize: lo = 0; hi = 1
    For i = 1 to m
        if s_i = 0
            hi = lo + (hi - lo) * p_0
        else
            lo = lo + (hi - lo) * p_0
Send a binary fraction x such that lo <= x < hi; this requires about x bits, where x = -sum_i log2 P(s_i).

Equivalently, with a low/range formulation:
    Initialize: lo = 0; range = 1
    For i = 1 to m
        if s_i = 0
            range = range * p
        else    // s_i = 1
            lo = lo + range * p
            range = range * (1 - p)
Send a binary fraction x such that lo <= x < lo + range; this requires ceiling(-log2(range)) bits.

Arithmetic coding example: p_0 = 0.2, source sequence is 1101.
bit  low     high
     0       1
1    0.2     1        (lo = 0 + (1 - 0) * 0.2 = 0.2)
1    0.36    1        (lo = 0.2 + (1 - 0.2) * 0.2 = 0.36)
0    0.36    0.488    (hi = 0.36 + (1 - 0.36) * 0.2 = 0.36 + 0.128 = 0.488)
1    0.3856  0.488

In the low/range form:
symbol  low     range
1       0.2000  0.8000
1       0.3600  0.6400
0       0.3600  0.1280
1       0.3856  0.1024

Number of bits = ceiling(-log2(0.1024)) = 4. low_2 = .01100010..., (low + range)_2 = .01111100..., so the bits sent are 0111 (i.e., x = 0.4375).
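A small sketch of the low/range encoder (not from the slides); choosing x as the smallest n-bit fraction that is >= lo is one simple way to pick a codeword inside the final interval:

    from math import ceil, log2

    def arith_encode(bits_in, p0):
        # low/range formulation: '0' keeps the bottom p0 fraction of the interval,
        # '1' keeps the top (1 - p0) fraction.
        lo, rng = 0.0, 1.0
        for s in bits_in:
            if s == "0":
                rng = rng * p0
            else:
                lo = lo + rng * p0
                rng = rng * (1 - p0)
        n_bits = ceil(-log2(rng))               # number of bits needed
        x = ceil(lo * 2 ** n_bits)              # smallest n-bit fraction >= lo
        code = format(x, "0{}b".format(n_bits))
        return code, lo, rng

    code, lo, rng = arith_encode("1101", p0=0.2)
    print(lo, rng, code)    # about 0.3856, 0.1024, '0111' (0.0111 binary = 0.4375)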

Arithmetic Decoding
We receive x, a binary fraction. Using the low/high formulation:
    lo = 0; hi = 1
    for i = 1 to m
        if (x - lo) < p * (hi - lo)
            s_i = 0
            hi = lo + (hi - lo) * p
        else
            s_i = 1
            lo = lo + (hi - lo) * p
m and p are sent to the decoder in the header.

Using the low/range formulation:
    lo = 0; range = 1
    for i = 1 to m
        if (x - lo) < p * range
            s_i = 0
            range = p * range
        else
            s_i = 1
            lo = lo + range * p
            range = range * (1 - p)

Or, rescaling x instead of tracking the interval:
    for i = 1 to m
        if x < p
            s_i = 0
            x = x / p
        else    // x >= p
            s_i = 1
            x = (x - p) / (1 - p)

Arithmetic decoding example: receive x = 0111 = 0.4375, with p = 0.2 (and m = 4) from the header.
symbol  x       range
        0.4375  1
1       0.2969  0.8000
1       0.1211  0.6400
0       0.6055  0.1280
1       0.5068  0.1024
The decoded sequence is 1101, matching the encoder's input. Decoding 0111 (0.4375) with the interval formulations instead simply retraces the encoder's low/high and low/range tables shown above.

Magic Features of Arithmetic Coding
Remember that information I = -log2 p:
p = 0.5, I = 1; p = 0.125, I = 3; p = 0.99, I = 0.0145 (wow!)
A high-probability symbol costs less than one code bit per symbol. In the encoder, hi - lo is the product of the symbol probabilities, so the code length is the sum of the symbols' information.

Discussion on Arithmetic Coding
When the length of the original cannot be predetermined, how does the decoder know where to stop? Use a special ending symbol, similar to EOF, or send fixed-length chunks.
How do we determine the value for p? By estimating the probabilities with which symbols appear in the sequence.
Can we encode data with symbols other than 0 and 1?
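And the rescaling form of the decoder applied to the example above (a sketch, with m and p0 assumed to come from the header):

    def arith_decode(x, p0, m):
        # compare x with p0, emit the symbol, then renormalise x
        # to the chosen sub-interval
        out = []
        for _ in range(m):
            if x < p0:
                out.append("0")
                x = x / p0
            else:
                out.append("1")
                x = (x - p0) / (1 - p0)
        return "".join(out)

    # received code word 0111 -> x = 0.4375; m = 4 and p0 = 0.2 from the header
    print(arith_decode(0.4375, 0.2, 4))    # 1101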

Conclusions
Huffman coding maps fixed-length symbols to variable-length codes. It is optimal only when the symbol probabilities are powers of 1/2.
Lempel-Ziv-Welch (LZW) is a dictionary-based compression method. It maps a variable number of symbols to a fixed-length code.
Adaptive algorithms do not need a priori estimation of probabilities, so they are more useful in real applications.
Arithmetic coding:
- Complexity: it requires arithmetic (multiplications, divisions) rather than just table lookups.
- The algorithms are complex, and accuracy (significant bits) is tricky.
- It can be made to operate incrementally: both encoder and decoder can output symbols with limited internal memory.
- It provides important compression savings in certain settings.

References
Mark Nelson, The Data Compression Book, M&T Books, 1995.
Khalid Sayood, Introduction to Data Compression, Morgan Kaufmann, 1996.