Ch. 2: Compression Basics
Multimedia Systems
Prof. Ben Lee
School of Electrical Engineering and Computer Science, Oregon State University

Outline
Why compression?
Classification
Entropy and Information Theory
Lossless compression techniques: Run Length Encoding, Variable Length Coding, Dictionary-Based Coding
Lossy encoding

Why Compression?
(Chart comparing uncompressed sizes, compressed sizes, and compression ratios for various media: text page, fax page, computer image, 35mm slide, SVGA-quality image slide, stereo audio CD, ISDN videoconference, TV-quality video, and high-quality HDTV; the specific figures did not survive transcription.)

Complexity Example
HD-quality video: 1920 x 1080 pixels per frame, 3 bytes per pixel (Red, Green, Blue), 60 fps.
Bandwidth requirement: 373 MB/second => 2.99 Gbps.
Storage requirement: a 4.9 GB DVD would hold 13.1 seconds' worth; 120 minutes would require 2.69 TB!
Some form of compression is required!
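The arithmetic behind these figures is easy to check. A quick sanity check in Python (the numbers are the slide's; the variable names are mine):

# Back-of-the-envelope check of the HD video numbers above.
width, height = 1920, 1080      # pixels per frame
bytes_per_pixel = 3             # R, G, B
fps = 60

bytes_per_sec = width * height * bytes_per_pixel * fps
print(f"{bytes_per_sec / 1e6:.0f} MB/s")            # ~373 MB/s
print(f"{bytes_per_sec * 8 / 1e9:.2f} Gbps")        # ~2.99 Gbps

dvd = 4.9e9                                         # 4.9 GB DVD
print(f"{dvd / bytes_per_sec:.1f} s per DVD")       # ~13.1 s
print(f"{120 * 60 * bytes_per_sec / 1e12:.2f} TB")  # ~2.69 TB for 120 minutes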

Outline
Why compression? | Classification | Entropy and Information Theory | Lossless compression techniques (Run Length Encoding, Variable Length Coding, Dictionary-Based Coding) | Lossy encoding

Classification (1)
Compression is used to save storage space and/or to reduce communications capacity requirements.
Lossless: No information is lost; the decompressed data are identical to the original uncompressed data. Essential for text files, databases, binary object files, etc.
Lossy: An approximation of the original data. Better compression ratio than lossless compression, with a tradeoff between compression ratio and fidelity. E.g., audio, image, and video.

Classification (2)
Entropy Coding: Lossless encoding, used regardless of the media's specific characteristics. The data are taken as a simple digital sequence, and the decompression process regenerates the data completely. E.g., run-length coding, Huffman coding, arithmetic coding.
Source Coding: Lossy encoding that takes into account the semantics of the data; the degree of compression depends on the data content. E.g., content prediction techniques such as DPCM and delta modulation.
Hybrid Coding (used by most multimedia systems): Combines entropy coding with source coding. E.g., JPEG, MPEG-1, MPEG-2, MPEG-4, H.264, H.265.

Outline
Why compression? | Classification | Entropy and Information Theory | Lossless compression techniques (Run Length Encoding, Variable Length Coding, Dictionary-Based Coding) | Lossy encoding

Entropy and Information Theory
Compression exploits statistical redundancies in the data to be compressed. The average information per symbol of a source X, called its entropy H, is defined as

H(X) = \sum_i p_i \log_2 (1/p_i)

where p_i is the probability of the i-th distinct symbol.
Entropy represents the lower bound for lossless compression: when the probability of each symbol is fixed, each symbol needs to be represented with at least H bits on average.

Entropy
The closer the compressed bits per symbol is to H, the better the compression. E.g., for an image with a uniform distribution of gray-level intensities, p_i = 1/256, so the number of bits needed to code each gray level is 8 bits; the entropy of this image is 8.
In general, the length L_i of code i satisfies \log_2(1/p_i) <= L_i <= \log_2(1/p_i) + 1. Thus, the average length of a code word, E[L], satisfies H(X) <= E[L] <= H(X) + 1.
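The definition translates directly into code. A minimal Python helper (the function name is mine) that reproduces the uniform gray-level example:

from math import log2

def entropy(probs):
    """H(X) = sum_i p_i * log2(1/p_i): the lossless lower bound in bits/symbol."""
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Uniform gray-level image: 256 levels, p_i = 1/256  ->  H = 8 bits.
print(entropy([1 / 256] * 256))  # 8.0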

Entropy and Compression
In compression, we are interested in constructing an optimum code, i.e., one that gives the minimum average length for the encoded message.
This is easy if the probability of occurrence is the same for all symbols, as in the last example. If the symbols have different probabilities of occurrence, e.g., the letters of the alphabet, see Huffman and arithmetic coding.

Outline
Why compression? | Classification | Entropy and Information Theory | Lossless compression techniques (Run Length Encoding, Variable Length Coding, Dictionary-Based Coding) | Lossy encoding

Lossless Compression Techniques
Run Length Encoding (RLE)
Variable Length Coding (VLC): Huffman encoding, arithmetic encoding
Dictionary-Based Coding: Lempel-Ziv-Welch (LZW) encoding

Run-Length Encoding
Redundancy is exploited by not transmitting consecutive pixel or character values that are equal. The value of a long run can be coded once, along with the number of times the value is repeated (the length of the run). Particularly useful for encoding black-and-white images where the data units are single-bit pixels, e.g., fax.
Methods: Binary RLE and PackBits.
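Before the two variants, the general idea can be sketched in a few lines of Python (function names and sample data are mine): each maximal run of equal values becomes a (value, run-length) pair.

def rle_encode(values):
    """General run-length encoding: each maximal run of equal values
    becomes a (value, run_length) pair."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([v, 1])       # start a new run
    return [(v, n) for v, n in runs]

def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]

row = [255, 255, 255, 255, 0, 0, 255, 255]
print(rle_encode(row))                # [(255, 4), (0, 2), (255, 2)]
assert rle_decode(rle_encode(row)) == row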

Binary RLE
Transmit the length of each run of 0s; runs of 1s are not transmitted. Two consecutive 1s are implicitly separated by a zero-length run.
E.g., suppose 4 bits are used to represent the run length: each run of 0s, terminated by a 1, is replaced by a 4-bit count (the bit patterns of the slide's worked example did not survive transcription). A length field of all 1s indicates that the next group is part of the same run. What if the stream started with a 1? The encoding would start with a zero-length run!

Binary RLE Performance
Worst case behavior: a transition occurs on each bit. Since we use k bits to encode each transition, k-1 of those bits are wasted.
Best case behavior: no transitions; k bits represent a run of 2^k - 1 consecutive bits. E.g., if k = 8, then 2^k - 1 = 255, and the compression ratio is (2^k - 1):k => 255:8 => about 32:1.
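A sketch of this encoder in Python, assuming 4-bit length fields and the all-ones continuation convention; the function name and the sample bit stream are mine, since the slide's example stream was lost in transcription.

def binary_rle_encode(bits, k=4):
    """Encode a 0/1 string as k-bit lengths of successive runs of 0s.
    A run reaching 2**k - 1 is emitted as the all-ones field, meaning
    "the run continues in the next field". A stream starting with 1
    is handled naturally by an initial zero-length run."""
    out, run, maximum = [], 0, 2 ** k - 1
    for b in bits:
        if b == "0":
            run += 1
            if run == maximum:              # emit all-ones: run continues
                out.append(format(maximum, f"0{k}b"))
                run = 0
        else:                               # a 1 ends the current run of 0s
            out.append(format(run, f"0{k}b"))
            run = 0
    if run:                                 # trailing run of 0s
        out.append(format(run, f"0{k}b"))
    return "".join(out)

print(binary_rle_encode("0000000000000011000000001"))  # runs: 14, 0, 8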

RLE - PackBits
Introduced by Apple. One of the standard forms of compression available in the TIFF format (TIFF compression type 32773). TIFF = Tag Image File Format.
Each run is encoded as a header byte (the count) followed by trailer bytes (Value 0 ... Value n-1). The header byte's MSB distinguishes the run type: 0 = literal run, 1 = fill run.

PackBits - Literal Run
Header byte: MSB = 0, and the remaining 7 bits describe the length of the run of literal bytes.
Trailer bytes: the characters, stored verbatim.
Example: Mary had a little lamb => <22>, M, a, r, y, , h, a, d, , a, , l, i, t, t, l, e, , l, a, m, b

PackBits - Fill Run
Header byte: MSB = 1, and the remaining 7 bits describe the length of the run.
Trailer byte: the byte that should be copied.
Example: AAAABBBBB => <128+4>A <128+5>B
What about ABCCCCCCCCCDEFFFFGGG? => <129>A <129>B <137>C <129>D <129>E <132>F <131>G

PackBits Performance
Worst case behavior: if the entire image is transmitted without redundancy, the overhead is one byte (the header byte) for each 128 bytes. Alternating runs add more overhead, up to 25%, and a poorly designed encoder could do worse.
Best case behavior: if the entire image is redundant, then 2 bytes are used for each 128 consecutive bytes, giving a compression ratio of 64:1.
PackBits achieves a typical compression ratio of 2:1 to 5:1.
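A sketch of a PackBits-style encoder in Python, following the slide's header convention (names are mine). One design choice to note: this version emits isolated bytes as literal runs, whereas the slide's second example encodes them as length-1 fill runs; both are legal encodings.

def packbits_encode(data: bytes) -> bytes:
    """PackBits-style encoder per the slide's convention:
    header MSB = 1 -> fill run (128 + count, then the repeated byte);
    header MSB = 0 -> literal run (count, then the bytes verbatim).
    (Real TIFF PackBits uses a slightly different header encoding.)"""
    out, i = bytearray(), 0
    while i < len(data):
        j = i
        while j < len(data) and j - i < 127 and data[j] == data[i]:
            j += 1
        if j - i > 1:                         # repeated byte -> fill run
            out += bytes([128 + (j - i), data[i]])
            i = j
        else:                                 # gather a literal run
            j = i + 1
            while (j < len(data) and j - i < 127
                   and (j + 1 >= len(data) or data[j] != data[j + 1])):
                j += 1
            out += bytes([j - i]) + data[i:j]
            i = j
    return bytes(out)

print(packbits_encode(b"AAAABBBBB").hex())  # 84418542 = <128+4>A <128+5>B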

Huffman Coding
Input values are mapped to output values of varying length, called a variable-length code (VLC):
The most probable inputs are coded with fewer bits.
No code word can be a prefix of another code word.

Huffman Code Construction
Initialization: put all nodes (values, characters, or symbols) in a sorted list.
Repeat until the list has only one node:
Combine the two nodes having the lowest frequency (probability); create a parent node assigned the sum of the children's probabilities and insert it in the list.
Assign codes 0 and 1 to the two branches, and delete the children from the list.
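The construction maps naturally onto a priority queue. A Python sketch (the symbol names x1..x7 are mine), run on the probability distribution of the example that follows; it reproduces the code lengths of that tree and the efficiency figures computed two slides below.

import heapq
from itertools import count
from math import log2

def huffman_code(probs):
    """Build a Huffman code from {symbol: probability} by repeatedly
    merging the two lowest-probability nodes. The counter breaks ties
    so heap entries stay comparable."""
    tiebreak = count()
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, a = heapq.heappop(heap)        # two smallest probabilities
        p1, _, b = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in a.items()}
        merged.update({s: "1" + c for s, c in b.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

probs = {"x1": .5, "x2": .2, "x3": .1, "x4": .08, "x5": .05, "x6": .04, "x7": .03}
code = huffman_code(probs)
H  = sum(p * log2(1 / p) for p in probs.values())
EL = sum(probs[s] * len(c) for s, c in code.items())
print(code)                                   # code lengths 1, 2, 4, 4, 4, 5, 5
print(f"H = {H:.3f}, E[L] = {EL:.2f}, efficiency = {H/EL:.3f}")
# H = 2.142, E[L] = 2.17, efficiency = 0.987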

Example of Huffman Coding
Seven symbols (call them x1, ..., x7) with probabilities P(x1) = 0.5, P(x2) = 0.2, P(x3) = 0.1, P(x4) = 0.08, P(x5) = 0.05, P(x6) = 0.04, P(x7) = 0.03.
Successive merges of the two smallest probabilities: 0.04 + 0.03 = 0.07; 0.05 + 0.07 = 0.12; 0.1 + 0.08 = 0.18; 0.18 + 0.12 = 0.3; 0.3 + 0.2 = 0.5; 0.5 + 0.5 = 1.0. The resulting code lengths are 1 bit for x1, 2 bits for x2, 4 bits for x3, x4, and x5, and 5 bits for x6 and x7.

Huffman Decoding
Decoding walks the same tree: starting at the root, follow the branch matching each received bit until a leaf (symbol) is reached, output the symbol, and restart at the root. (The slide traces the decoding on the tree built above.)

Efficiency of Huffman Coding
Entropy H of the previous example:
0.5(1) + 0.2(2.32) + 0.1(3.32) + 0.08(3.64) + 0.05(4.32) + 0.04(4.64) + 0.03(5.06) = 2.142
Average number of bits E[L] used to encode a symbol:
1(0.5) + 2(0.2) + 4(0.1) + 4(0.08) + 4(0.05) + 5(0.04) + 5(0.03) = 2.17
The efficiency of this code is 2.142/2.17 = 0.987. Very close to optimum!
If we had simply used 3 bits per symbol: 3 x (0.5 + 0.2 + 0.1 + 0.08 + 0.05 + 0.04 + 0.03) = 3. So the Huffman code is better!

Adaptive Huffman Coding
The previous algorithm requires statistical knowledge, which is often not available (e.g., live audio or video). Even if it is available, sending the coding table before the data can be heavy overhead (negligible only if the data file is large).
Solution: adjust the Huffman tree on the fly, based on the data previously seen.

Encoder:
initial_code();
while ((c = getc(input)) != EOF) {
    encode(c, output);
    update_tree(c);
}

Decoder:
initial_code();
while ((c = decode(input)) != EOF) {
    putc(c, output);
    update_tree(c);
}

Adaptive Huffman Coding
initial_code() assigns symbols some fixed code, e.g., ASCII.
update_tree() constructs an adaptive Huffman tree, which must maintain its sibling property: all nodes (internal and leaf) are arranged in order of increasing counts, from left to right and bottom to top. When a swap is necessary, the farthest node with count N is swapped with the node whose count has just been increased to N+1. This may need to be performed multiple times.

Adaptive Huffman Coding (1)
Example tree: leaves A (node 1, w=1), B (node 2, w=2), C (node 3, w=2), D (node 4, w=2), and E (node 8, w=10); internal nodes 5 (w=3, parent of A and B), 6 (w=4, parent of C and D), and 7 (w=7, parent of 5 and 6); root 9 (w=17, parent of 7 and E). The node numbering reflects the sibling property: increasing weights from left to right, bottom to top.
Hit on A: A's weight becomes 2, and the weights along its path update to node 5 (w=4), node 7 (w=8), root 9 (w=18). The sibling property still holds, so no swap is needed.

Adaptive Huffman Coding (2)
2nd hit on A: A's weight would increase to 2+1 = 3, violating the sibling property, so A is swapped with the farthest node having count N = 2, namely D (node 4). After the swap the leaf order is D, B, C, A, with node 5 (w=4), node 6 (w=2+3 = 5), node 7 (w=9), and root 9 (w=19).
Encoded symbols (assigning 0 to the left branch): D = 000, B = 001, C = 010, A = 011, E = 1.

Adaptive Huffman Coding (3)
2 more hits on A: its weight grows to 3+2 = 5, triggering two further swaps that move A up the tree. In the final tree, D (w=2) and B (w=2) sit under an internal node 4 (w=4); node 4 and C (w=2) sit under node 6 (w=6); node 6 and A (now node 5, w=5) sit under node 7 (w=11); and the root 9 (w=21) joins node 7 with E (w=10). A's code word has become shorter, reflecting its higher frequency.

Adaptive Huffman Coding - Initialization
Initially there is no tree. Symbols sent for the first time use a fixed code, e.g., ASCII. Any new node is spawned from the special node New:
Except for the first symbol, any symbol sent for the first time is preceded by the special symbol New.
The count for New is always 0.
The code for New depends on the location of the New node in the tree; the code for a symbol sent for the first time is the code for New followed by the symbol's fixed code. All other codes are read from the Huffman tree.

Initialization - Example
Suppose we are sending AABCDAD, with fixed codes agreed for A, B, C, D, and New (the slide's bit patterns did not survive transcription). The slides trace the first six symbols, AABCDA.
Send A: there is no tree yet, so output the fixed code for A; the decoder decodes A. Both sides update the tree for A: a root with children New (w=0) and A (w=1).
Send A: output A's code from the tree; the decoder decodes A. A's weight, and the root's, increases to 2 on both sides.

Initialization - Example (continued)
Send B: B is new, so output code(New) followed by B's fixed code; the decoder sees New and reads B's fixed code. The New node spawns a subtree: New (w=0) and B (w=1) under a new internal node. The root weight becomes 3 (A remains w=2).
Send C: output code(New) followed by C's fixed code; New spawns C (w=1). The root weight becomes 4.

Initialization - Example Send AABCDA Update tree for A Output = w=6 w=3 A w=3 w=2 B C New D w= Decode = A Update tree for A w=6 w=3 A w=3 w=2 B C New D w= Final transmitted code A A B C D A Chapter 2: Compression Basics 35 Arithmetic Coding Huffman coding is optimal only when the symbol probabilities are integral powers of /2, which rarely is the case. Each symbol is coded by considering prior data: Coding efficiency can be improved when symbols are combined. Yields a single code word for each string of symbols. Each symbol (or string of symbols) is a fractional number. Arithmetic encoding provides slightly better coding efficiency than Huffman. Chapter 2: Compression Basics 36 8

Arithmetic Coding Algorithm BEGIN low =.; high =.; range =.; while (symbol!= terminator) { low = low + range * Range_low(symbol); high = low + range * Range_high(symbol); range = high - low; } output a code so that low <= code < high; END Chapter 2: Compression Basics 37 Arithmetic Coding Example. Example: P(a) =.5, P(b) =.25, P(c) =.25 Coding the sequence b,a,a,c b a a c... a.5.25 b c.5.375.25.5.4375.375.25.5.4375 [.4375,.45325).25 Binary rep. of final interval [.,.).25 P(a)=.25.25 P(a)=.625.625 P(c)=.563 Chapter 2: Compression Basics 38 9

Arithmetic Coding
The size of the final subinterval, range s = 0.453125 - 0.4375 = 0.015625, is determined by the product of the probabilities of the source message: P(b) P(a) P(a) P(c) = 0.25 x 0.5 x 0.5 x 0.25.
The number of bits needed to specify the final subinterval is at most ceil(log2(1/s)) = 6 bits.
Choose the smallest binary fraction that lies within [0.4375, 0.453125): the code is 0111 (from 0.0111)!
The entropy H is 0.5(1) + 0.25(2) + 0.25(2) = 1.5 bits/symbol; 4 symbols encoded with 6 bits gives 1.5 bits/symbol! A fixed-length code would have required 2 bits/symbol.

Code Generator for Encoder
BEGIN
  code = 0; k = 1;
  while (value(code) < low) {
    assign 1 to the kth binary fraction bit;
    if (value(code) >= high)
      replace the kth bit by 0;
    k = k + 1;
  }
END

Code Generator Example
For the previous example, low = 0.4375 and high = 0.453125.
k = 1: 0 < low; setting bit 1 gives 0.1 (0.5) >= high, so reset it to 0.0 (0) => continue
k = 2: 0.0 (0) < low; 0.01 (0.25) < high => keep, continue
k = 3: 0.01 (0.25) < low; 0.011 (0.375) < high => keep, continue
k = 4: 0.011 (0.375) < low; 0.0111 (0.4375) < high => keep, continue
k = 5: 0.0111 (0.4375) = low => terminate
Output code is 0111 (from 0.0111).

Decoding
BEGIN
  get binary code and convert to decimal: value = value(code);
  do {
    find a symbol so that Range_low(symbol) <= value < Range_high(symbol);
    output symbol;
    low = Range_low(symbol);
    high = Range_high(symbol);
    range = high - low;
    value = (value - low) / range;
  } until symbol is a terminator
END

Decoding Example
value = 0.4375 (from the code 0111):
0.25 <= 0.4375 < 0.5 => b; value = (0.4375 - 0.25)/P(b) = 0.75
0.5 <= 0.75 < 1.0 => a; value = (0.75 - 0.5)/P(a) = 0.5
0.5 <= 0.5 < 1.0 => a; value = (0.5 - 0.5)/P(a) = 0
0 <= 0 < 0.25 => c
Decoded message: b, a, a, c.

Arithmetic Coding Issues
The decoder needs to know when to stop: in the previous example, the value 0 could decode as c, cc, ccc, etc. A special character is included to signal the end of the message.
Precision can grow without bound as the length of the message grows: in practice, fixed-precision registers are used, with detection and management of overflow and underflow.
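Both directions of this example fit in a short Python sketch (names are mine; the subinterval table is the one used above, and for simplicity the decoder is told the message length rather than watching for a terminator symbol):

RANGES = {"a": (0.50, 1.00),   # P(a) = 0.5
          "b": (0.25, 0.50),   # P(b) = 0.25
          "c": (0.00, 0.25)}   # P(c) = 0.25

def encode(msg):
    low, high = 0.0, 1.0
    for sym in msg:
        rng = high - low
        r_lo, r_hi = RANGES[sym]
        low, high = low + rng * r_lo, low + rng * r_hi
    # emit the shortest binary fraction lying in [low, high)
    code, value, k = "", 0.0, 1
    while value < low:
        code += "1"; value += 2.0 ** -k
        if value >= high:                    # overshot: reset this bit to 0
            code = code[:-1] + "0"; value -= 2.0 ** -k
        k += 1
    return code

def decode(code, n):
    value = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(code))
    out = []
    for _ in range(n):
        sym = next(s for s, (lo, hi) in RANGES.items() if lo <= value < hi)
        out.append(sym)
        lo, hi = RANGES[sym]
        value = (value - lo) / (hi - lo)
    return "".join(out)

print(encode("baac"))      # 0111  (0.0111 binary = 0.4375)
print(decode("0111", 4))   # baac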

Lempel-Ziv-Welch Coding
Patterns that occur frequently can be substituted by a single code: e.g., begin, end, if.
LZW dynamically builds a dictionary of phrases from the input stream (in addition to the initial dictionary of 256 ASCII characters). It checks whether a phrase is already recorded in the dictionary: if not, the phrase is added; otherwise, the code for that phrase is output.
Can be applied to still images, audio, and video, and is a good choice for compressing text files: used in GIF, .Z, and .gzip compression.

LZW Compression
Example: the input string is "^WED^WE^WEE^WEB^WET".

s = first input character
while (not EOF) {
  c = next input character;
  if (s + c exists in the dictionary)
    s = s + c;
  else {
    output the code for s;
    add s + c to the dictionary;
    s = c;
  }
}
output the code for s;

s    c    Output  Index  Symbol
^    W    ^       256    ^W
W    E    W       257    WE
E    D    E       258    ED
D    ^    D       259    D^
^    W
^W   E    256     260    ^WE
E    ^    E       261    E^
^    W
^W   E
^WE  E    260     262    ^WEE
E    ^
E^   W    261     263    E^W
W    E
WE   B    257     264    WEB
B    ^    B       265    B^
^    W
^W   E
^WE  T    260     266    ^WET
T    EOF  T

The 19-symbol input has been reduced to a 7-symbol plus 5-code output!

LZW Decompression
Example: the compressed string is "^WED<256>E<260><261><257>B<260>T".

s = NIL
while (not EOF) {
  k = next input code;
  entry = dictionary entry for k;
  output entry;
  if (s != NIL)
    add s + entry[0] to the dictionary;
  s = entry;
}

s      k      Output  Index  Symbol
NIL    ^      ^
^      W      W       256    ^W
W      E      E       257    WE
E      D      D       258    ED
D      <256>  ^W      259    D^
<256>  E      E       260    ^WE
E      <260>  ^WE     261    E^
<260>  <261>  E^      262    ^WEE
<261>  <257>  WE      263    E^W
<257>  B      B       264    WEB
B      <260>  ^WE     265    B^
<260>  T      T       266    ^WET

The decompressor builds its own dictionary, so only the codes need to be sent!

LZW Algorithm Details (1)
Example: the input string is ABABBABCABBABBAX; the compressed string is AB<256><257>BC<258><262>X.

Encoder:
s     c    Output  Index  Symbol
A     B    A       256    AB
B     A    B       257    BA
A     B
AB    B    256     258    ABB
B     A
BA    B    257     259    BAB
B     C    B       260    BC
C     A    C       261    CA
A     B
AB    B
ABB   A    258     262    ABBA
A     B
AB    B
ABB   A
ABBA  X    262     263    ABBAX
X          X

Decoder:
s      k      Output  Index  Symbol
NIL    A      A
A      B      B       256    AB
B      <256>  AB      257    BA
<256>  <257>  BA      258    ABB
<257>  B      B       259    BAB
B      C      C       260    BC
C      <258>  ABB     261    CA
<258>  <262>  ???     262    ???

<262> should be ABBA, i.e., s (ABB) plus its own first character (A), but it is not yet in the decoder's dictionary! Whenever the encoder encounters a Character + String + Character pattern, it creates a new code and uses it right away, before the decoder has a chance to create it.

LZW Algorithm Details (2)
What if the input string is ABABBABCABBXBBAX? The compressed string is AB<256><257>BC<258>XB<257>X.

Encoder:
s    c    Output  Index  Symbol
A    B    A       256    AB
B    A    B       257    BA
A    B
AB   B    256     258    ABB
B    A
BA   B    257     259    BAB
B    C    B       260    BC
C    A    C       261    CA
A    B
AB   B
ABB  X    258     262    ABBX
X    B    X       263    XB
B    B    B       264    BB
B    A
BA   X    257     265    BAX
X         X

Decoder:
s      k      Output  Index  Symbol
NIL    A      A
A      B      B       256    AB
B      <256>  AB      257    BA
<256>  <257>  BA      258    ABB
<257>  B      B       259    BAB
B      C      C       260    BC
C      <258>  ABB     261    CA
<258>  X      X       262    ABBX
X      B      B       263    XB
B      <257>  BA      264    BB
<257>  X      X       265    BAX

No problem! Here every code is already in the decoder's dictionary when it arrives.

Modified LZW Decompression
s = NIL
while (not EOF) {
  k = next input code;
  entry = dictionary entry for k;
  /* Exception handler */
  if (entry == NULL)
    entry = s + s[0];    /* s[0] is the first character of s */
  output entry;
  if (s != NIL)
    add s + entry[0] to the dictionary;
  s = entry;
}

Applied to the compressed string AB<256><257>BC<258><262>X:
s      k      Output  Index  Symbol
NIL    A      A
A      B      B       256    AB
B      <256>  AB      257    BA
<256>  <257>  BA      258    ABB
<257>  B      B       259    BAB
B      C      C       260    BC
C      <258>  ABB     261    CA
<258>  <262>  ABBA    262    ABBA
<262>  X      X       263    ABBAX

When <262> arrives before it exists in the dictionary, the handler forms entry = s + s[0] = ABB + A = ABBA.
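A runnable Python version of the modified decoder (names are mine); fed the codes from Details (1), it exercises the exception handler on <262> and recovers the original string:

def lzw_decode(codes):
    """LZW decoder; the 'code not yet in dictionary' case is the
    Character + String + Character exception, handled as s + s[0]."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    s = dictionary[codes[0]]
    out = [s]
    for k in codes[1:]:
        if k in dictionary:
            entry = dictionary[k]
        else:                        # exception: the code must be s + s[0]
            entry = s + s[0]
        out.append(entry)
        dictionary[next_code] = s + entry[0]
        next_code += 1
        s = entry
    return "".join(out)

codes = [ord(c) if isinstance(c, str) else c
         for c in ["A", "B", 256, 257, "B", "C", 258, 262, "X"]]
print(lzw_decode(codes))   # ABABBABCABBABBAX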

Outline
Why compression? | Classification | Entropy and Information Theory | Lossless compression techniques (Run Length Encoding, Variable Length Coding, Dictionary-Based Coding) | Lossy encoding

Lossy Encoding
Coding is lossy. Consider sequences of symbols S1, S2, S3, etc., whose values are not zeros but do not vary very much: calculate the difference from the previous value => S1, S2-S1, S3-S2, etc.
E.g., for a still image, calculate the differences between nearby pixels or pixel groups. Edges are characterized by large values, while areas of similar luminance and chrominance are characterized by small values. The zeros can be compressed by run-length encoding, and the nearby pixels with large values can be encoded as differences.
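A minimal difference-coding sketch in Python (names and sample values are mine). Differencing by itself is reversible; lossy schemes additionally quantize the differences, which is where the loss comes in:

def delta_encode(samples):
    """Replace each value with its difference from the previous one:
    S1, S2-S1, S3-S2, ... Slowly varying data becomes mostly small
    values (and zeros), which run-length/entropy coders handle well."""
    prev, out = 0, []
    for s in samples:
        out.append(s - prev)
        prev = s
    return out

def delta_decode(deltas):
    prev, out = 0, []
    for d in deltas:
        prev += d
        out.append(prev)
    return out

pixels = [100, 101, 101, 102, 102, 102, 180, 181]
print(delta_encode(pixels))   # [100, 1, 0, 1, 0, 0, 78, 1]: small except at the edge
assert delta_decode(delta_encode(pixels)) == pixels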

Questions?