010.141 Engineering Mathematics II Lecture 16 Compression Bob McKay School of Computer Science and Engineering College of Engineering Seoul National University 1
Lossless Compression Outline Huffman & Shannon-Fano Arithmetic Compression The LZ Family of Algorithms Lossy Compression Fourier compression Wavelet Compression Fractal Compression 2
Lossless Compression Lossless encoding methods guarantee to reproduce exactly the same data as was input to them 3
Run Length Encoding
Original Data String    Encoded Data String
$******55.72            $*<6>55.72
---------               -<9>
Guns          Butter    Guns <10>Butter
(the third original contains a run of ten spaces) 4
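The scheme above can be sketched in Python. The minimum run length worth encoding (here 4) is an assumption, since replacing a short run with char<count> costs more than it saves:

```python
def rle_encode(s: str, min_run: int = 4) -> str:
    """Run-length encode: replace runs of min_run or more repeats with char<count>.

    min_run = 4 is an assumption; shorter runs are cheaper left alone.
    """
    out = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1                      # extend the current run
        run = j - i
        out.append(f"{s[i]}<{run}>" if run >= min_run else s[i] * run)
        i = j
    return "".join(out)
```

On the slide's examples this reproduces the encoded strings exactly, e.g. rle_encode("$******55.72") gives "$*<6>55.72".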
Relative Encoding Useful when there are sequences of runs of data that vary only slightly from one run to the next: eg the lines of a fax The position of each change is denoted relative to the start of the line Position indicator can be followed by a numeric count indicating the number of successive changes For further compression, the position of the next change can be denoted relative to the previous 5
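A minimal sketch of the idea, recording each change position relative to the previous change (the first offset is relative to the start of the line). The list-of-offsets output format is an assumption for illustration, not an actual fax standard:

```python
def relative_encode(line: str, prev_line: str) -> list:
    """Return the positions where `line` differs from `prev_line`,
    each expressed relative to the previous change; the first offset
    is relative to the start of the line."""
    offsets = []
    last = 0
    for pos, (old, new) in enumerate(zip(prev_line, line)):
        if old != new:
            offsets.append(pos - last)  # offset from the previous change
            last = pos
    return offsets
```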
Statistical Compression
For the examples below, we will use a simple alphabet with the following frequencies of occurrence (after Held)
Character   Probability
X1          0.10
X2          0.05
X3          0.20
X4          0.15
X5          0.15
X6          0.25
X7          0.10 6
Huffman Encoding Arrange the character set in order of decreasing probability While there is more than one probability class: Merge the two lowest probability classes and add their probabilities to obtain a composite probability At each branch of the binary tree, allocate a '0' to one branch and a '1' to the other The code for each character is found by traversing the tree from the root node to that character 7
Huffman Encoding
Merging the two lowest probability classes at each step produces composite probabilities 0.15, 0.25, 0.25, 0.35, 0.4, 0.6 and finally 1.0; reading the 0/1 branch labels from the root gives the codes:
Character   Probability   Code
X6          0.25          00
X3          0.2           010
X4          0.15          011
X5          0.15          100
X1          0.1           101
X7          0.1           110
X2          0.05          111 8
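The merge loop can be sketched with a priority queue. Tie-breaking between equal probabilities means the exact codes can differ from the figure, but every Huffman tree for a given distribution has the same expected code length:

```python
import heapq

def huffman_codes(probs: dict) -> dict:
    """Build a Huffman code for {symbol: probability} by repeatedly
    merging the two lowest-probability classes."""
    # Heap entries: (probability, tiebreak id, {symbol: code suffix so far}).
    heap = [(p, i, {s: ""}) for i, (s, p) in enumerate(sorted(probs.items()))]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)       # two lowest-probability classes
        p1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}   # '0' to one branch
        merged.update({s: "1" + c for s, c in c1.items()})  # '1' to the other
        heapq.heappush(heap, (p0 + p1, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes({"X1": 0.10, "X2": 0.05, "X3": 0.20, "X4": 0.15,
                       "X5": 0.15, "X6": 0.25, "X7": 0.10})
```

The resulting code is prefix-free, so a decoder can walk the tree symbol by symbol without separators.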
Shannon-Fano Algorithm Arrange the character set in order of decreasing probability While a probability class contains more than one symbol: Divide the probability class in two so that the probabilities in the two halves are as nearly as possible equal Assign a '1' to the first probability class, and a '0' to the second 9
Shannon-Fano Encoding
Character   Probability   Code
X6          0.25          11
X3          0.2           10
X4          0.15          011
X5          0.15          010
X1          0.1           001
X7          0.1           0001
X2          0.05          0000 10
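The recursive split from the previous slide can be sketched directly; on the example alphabet it reproduces the code table above:

```python
def shannon_fano(symbols: list) -> dict:
    """Shannon-Fano coding of [(symbol, probability), ...] already sorted
    by decreasing probability."""
    if len(symbols) == 1:
        return {symbols[0][0]: ""}
    total = sum(p for _, p in symbols)
    # Find the split making the two halves' probabilities as equal as possible.
    best_split, best_diff = 1, float("inf")
    running = 0.0
    for i in range(1, len(symbols)):
        running += symbols[i - 1][1]
        diff = abs(running - (total - running))
        if diff < best_diff:
            best_diff, best_split = diff, i
    codes = {}
    for s, c in shannon_fano(symbols[:best_split]).items():
        codes[s] = "1" + c          # first class gets '1', as on the slide
    for s, c in shannon_fano(symbols[best_split:]).items():
        codes[s] = "0" + c          # second class gets '0'
    return codes

sf_codes = shannon_fano([("X6", 0.25), ("X3", 0.2), ("X4", 0.15), ("X5", 0.15),
                         ("X1", 0.1), ("X7", 0.1), ("X2", 0.05)])
```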
Arithmetic Coding Arithmetic coding assumes there is a model for statistically predicting the next character of the string to be encoded An order-0 model predicts the next symbol based on its probability, independent of previous characters For example, an order-0 model of English predicts the highest probability for e An order-1 model predicts the next symbol based on the preceding character For example, if the preceding character is q, then u is a likely next character And so on for higher-order models 11
Arithmetic Coding Arithmetic coding assumes the coder and decoder share the probability table The main data structure of arithmetic coding is an interval, representing the string constructed so far Its initial value is [0,1] At each stage, the current interval [min,max] is subdivided into sub-intervals corresponding to the probability model for the next character The interval chosen will be the one representing the actual next character The more probable the character, the larger the interval The coder output is a number in the final interval 12
Arithmetic Coding
Character   Probability
X1          0.10
X2          0.05
X3          0.20
X4          0.15
X5          0.15
X6          0.25
X7          0.10 13
Arithmetic Coding Suppose we want to encode the string X1X3X7 After X1, our interval is [0,0.1] After X3, it is [0.015,0.035] After X7, it is [0.033,0.035] The natural output to choose is the shortest binary fraction in [0.033,0.035] Obviously, the algorithm as stated requires infinite precision Slight variants re-normalise at each stage to remain within computer precision 14
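The interval narrowing for X1X3X7 can be reproduced with a few lines of Python (using the slide's order-0 table; precision issues are ignored, as the slide notes real coders must re-normalise):

```python
def narrow(interval: tuple, symbol: str, model: list) -> tuple:
    """One arithmetic-coding step: shrink [lo, hi) to the sub-interval
    belonging to `symbol`.  `model` is an ordered list of
    (symbol, probability) pairs."""
    lo, hi = interval
    width = hi - lo
    cum = 0.0
    for s, p in model:
        if s == symbol:
            return (lo + cum * width, lo + (cum + p) * width)
        cum += p
    raise KeyError(symbol)

model = [("X1", 0.10), ("X2", 0.05), ("X3", 0.20), ("X4", 0.15),
         ("X5", 0.15), ("X6", 0.25), ("X7", 0.10)]
iv = (0.0, 1.0)
for sym in ["X1", "X3", "X7"]:
    iv = narrow(iv, sym, model)
# iv is now (approximately) the slide's final interval [0.033, 0.035]
```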
Substitutional Compression The basic idea behind a substitutional compressor is to replace an occurrence of a particular phrase with a reference to a previous occurrence There are two main classes of scheme, LZ77 and LZ78 Named after Jacob Ziv and Abraham Lempel, who first proposed them in 1977 and 1978 15
LZW LZW is an LZ78-based scheme designed by T Welch in 1984 LZ78 schemes work by putting phrases into a dictionary; when a repeat occurrence of a particular phrase is found, they output the dictionary index instead of the phrase LZW starts with a 4K dictionary entries 0-255 refer to individual bytes entries 256-4095 refer to substrings Each time a new code is generated it means a new string has been parsed New strings are generated by adding the current character K to the end of an existing string w (until the dictionary is full) 16
LZW Algorithm
set w = NIL
loop
    read a character K
    if wK exists in the dictionary
        w = wK
    else
        output the code for w
        add wK to the dictionary
        w = K
endloop 17
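The pseudocode above, sketched in Python over bytes, with the 4K dictionary limit from the previous slide:

```python
def lzw_encode(data: bytes) -> list:
    """LZW over bytes: codes 0-255 are single bytes, 256-4095 are learned phrases."""
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for byte in data:
        k = bytes([byte])
        if w + k in table:
            w = w + k                     # keep extending the current phrase
        else:
            out.append(table[w])          # output the code for w
            if next_code < 4096:          # 4K dictionary, as on the slide
                table[w + k] = next_code  # add wK to the dictionary
                next_code += 1
            w = k
    if w:
        out.append(table[w])              # flush the final phrase
    return out
```

For example, lzw_encode(b"ABABAB") emits the single-byte codes for A and B, then reuses the learned code 256 for "AB" twice.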
LZW The most remarkable feature of this type of compression is that the entire dictionary reaches the decoder without ever being transmitted explicitly At the end of the run, the decoder will have a dictionary identical to the encoder's, built up entirely as part of the decoding process Codings in this family are behind such representations as .gif They were previously under patent, but many of the relevant patents are now expiring 18
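A matching decoder sketch shows how the dictionary is rebuilt on the fly; the only subtlety is a code that refers to the phrase being defined at that very moment:

```python
def lzw_decode(codes: list) -> bytes:
    """Rebuild the LZW dictionary while decoding, mirroring the encoder."""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    stream = iter(codes)
    prev = table[next(stream)]
    out = bytearray(prev)
    for code in stream:
        if code in table:
            entry = table[code]
        else:
            # The code refers to the phrase currently being defined:
            # it must be prev followed by its own first byte.
            entry = prev + prev[:1]
        out.extend(entry)
        if next_code < 4096:
            table[next_code] = prev + entry[:1]   # same rule the encoder used
            next_code += 1
        prev = entry
    return bytes(out)
```

Decoding [65, 66, 256, 256] recovers b"ABABAB": the dictionary entry 256 = "AB" is reconstructed at the decoder without ever being sent.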
Lossy Compression Lossy compression algorithms do not guarantee to reproduce the original input They achieve much higher compression by reproducing something near enough to the original that the difference is acceptable Usually, this means not readily detectable by a human sense - sight (jpeg), hearing (mp3), motion perception (mp4) This requires a model of what is acceptable The model may only be accurate in some circumstances Which is why compressing text or a line drawing with jpeg is a bad idea 19
Fourier Compression (jpeg) The Fourier transform of a dataset is a frequency representation of that dataset You have probably already seen graphs of Fourier transforms the frequency spectrum of a sound sample is a graph of the Fourier transform of the original data, which you normally see graphed as a time/amplitude diagram 20
Fourier Compression From our point of view, the important features of the Fourier transform are: it is invertible: the original dataset can be rebuilt from the Fourier transform graphic images of the world usually contain spatially repetitive patterns of information Human senses are (usually) poor at detecting low-amplitude visual frequencies The Fourier transform usually has information concentrated at particular frequencies and depleted at others The depleted frequencies can be transmitted at low precision without serious loss of overall information 21
Discrete Cosine Transform A discretised version of the Fourier transform Suited to representing spatially quantised (ie raster) images in a frequency quantised (ie tabular) format Mathematically, the DCT of a function f ranging over a discrete variable x (omitting various important constants) is given by F(n) = Σx f(x) cos(nπx) Of course, we're usually interested in two-dimensional images, and hence need the two-dimensional DCT, given (omitting even more important constants) by F(m,n) = Σx Σy f(x,y) cos(mπx) cos(nπy) 22
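A naive sketch of the 1-D formula, with the (x + 1/2) sample offset restored (one of the constants the slide omits); the 2-D version simply applies the same sum over both axes:

```python
import math

def dct(f: list) -> list:
    """Naive DCT-II: F(n) = sum_x f(x) * cos(pi * n * (x + 0.5) / N).

    The normalising constants mentioned on the slide are still omitted.
    """
    N = len(f)
    return [sum(f[x] * math.cos(math.pi * n * (x + 0.5) / N) for x in range(N))
            for n in range(N)]
```

For a constant signal all the energy ends up in F(0) and every other coefficient is zero: exactly the concentration/depletion that makes frequency-domain compression work.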
Fourier Compression Revisited Fourier-related transforms are based on sine (or cosine) functions of various frequencies The transform is a record of how to add together the periodic functions to obtain the original function Really, all we need is a basis set of functions A set of functions that can generate all others 23
The Haar Transform Instead of periodic functions, we could instead add together discrete step functions
[figure: the Haar basis - square pulses of varying width and position, at successively finer scales]
This would give us the Haar transform It can also be used to compress image data, though not as efficiently as the DCT images compressed at the same rate as the DCT tend to look blocky, so higher compression is required to give the same impression 24
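One level of the Haar transform over a discrete signal: pairwise averages (the coarse version) plus pairwise differences (the detail). Discarding the detail halves are what produces the blocky look mentioned above:

```python
def haar_step(f: list) -> list:
    """One level of the (unnormalised) Haar transform: the first half of the
    result is pairwise averages, the second half pairwise differences."""
    avgs = [(f[i] + f[i + 1]) / 2 for i in range(0, len(f), 2)]
    diffs = [(f[i] - f[i + 1]) / 2 for i in range(0, len(f), 2)]
    return avgs + diffs
```

The step is invertible: each pair is recovered as average + difference and average - difference, so the full transform just repeats this on the averages.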
Wavelet Compression Wavelet compression uses a basis set intermediate between the Fourier and Haar transforms The functions are smoothed versions of the Haar functions They have a sinusoidal rather than square shape They don't die out abruptly at the edges, but decay into lower amplitude Wavelet compression can give very high ratios attributed to similarities between wavelet functions and the edge detection present in the human retina wavelet functions encode just the detail that we see best 25
Vector Quantisation Relies on building a codebook of similar image portions Only one copy of the similar portions is transmitted just as LZ compression relies on building a dictionary of strings seen so far, transmitting only references into the dictionary 26
Fractal Compression Relies on self-similarity of (parts of) the image to reduce transmission It stands in a similar relation to vector quantisation as LZW does to LZ LZW can be thought of as LZ in which the dictionary is derived from the part of the text seen so far fractal compression can be viewed as deriving its dictionary from the portion of the image seen so far 27
Compression Times For transform encodings such as the DCT or wavelets, compression and decompression times are roughly comparable For fractal compression, compression takes orders of magnitude longer than decompression It is difficult to find the right codebook Fractal compression is therefore well suited where pre-canned images will be accessed many times over 28
Lossless Compression Summary Huffman & Shannon-Fano Arithmetic Compression The LZ Family of Algorithms Lossy Compression Fourier compression Wavelet Compression Fractal Compression 29
Thank you 30