Lecture 12: Compression
The Digital World of Multimedia
Prof. Mari Ostendorf
Announcements
- Lab 3: Finish this week
- Lab 4: Finish *at least* parts 1-2 this week. Read the lab *before* lab. You will probably need to spend some time outside of lab to finish by next week.
- Reminder: Books are on reserve in the engineering library.
- HW 4: Involves reading online articles; don't wait until the last minute
Lab 4: Image Processing & Photo Mosaic
- Mosaic: Replace each KxK block in the original image with a KxK block from a picture set, choosing the one with the most similar color properties (a 3-dimensional color distance for each region of the block) -- see the sketch below.
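A minimal sketch of the matching step, assuming images are numpy RGB arrays. For simplicity it compares one mean color per block rather than one distance per region as the lab specifies, and the function names are illustrative, not the lab's API:

```python
import numpy as np

def mean_color(block):
    # Average (R, G, B) over the block: one 3-dimensional point.
    return block.reshape(-1, 3).mean(axis=0)

def best_tile(block, tiles):
    # Pick the tile whose mean color is closest (Euclidean distance
    # in 3-D RGB space) to the block's mean color.
    target = mean_color(block)
    dists = [np.linalg.norm(mean_color(t) - target) for t in tiles]
    return tiles[int(np.argmin(dists))]

def mosaic(image, tiles, K):
    # Replace each full KxK block of the image with its best tile.
    out = image.copy()
    H, W, _ = image.shape
    for r in range(0, H - H % K, K):
        for c in range(0, W - W % K, K):
            out[r:r+K, c:c+K] = best_tile(image[r:r+K, c:c+K], tiles)
    return out
```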
Goals for Today
- Basics of compression
- Lossless compression (Orsak et al. 6.1-6.4; Cyganski & Orr Chapter 7)
- Image compression (Cyganski & Orr Chapter 8)
COMPRESSION
What is compression?
- Representing a signal (sound, image, or video) with fewer total bits
- Compression ratio = (# bits in original)/(# bits in compressed signal)
Why bother?
- Storage: you can fit more songs & videos on your iPod
- Communication: faster downloads; real-time video conferencing
Compression involves:
- Encoding: typically transform to a form that differs from the normal audio/image representation
- Decoding: undo the transformations done in encoding
[Diagram: signal -> Compress/Encode -> Storage or Communication -> Decompress/Decode -> signal]
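A quick illustration of the encode -> store/transmit -> decode round trip and the compression ratio, using Python's built-in zlib as a stand-in lossless codec (not one of the schemes covered in this lecture):

```python
import zlib

original = b"aaaaaaaaaabbbbbccccc" * 100   # highly redundant data
encoded = zlib.compress(original)          # encode: transform to compressed form
decoded = zlib.decompress(encoded)         # decode: undo the transformation

assert decoded == original                 # lossless: the signal is unchanged
ratio = (8 * len(original)) / (8 * len(encoded))
print(f"compression ratio = {ratio:.1f}")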
Example: JPEG
Encoding/Compression:
- Transform:
  - Divide the image into blocks of 8x8 pixels
  - Perform the discrete cosine transform (DCT) on each block
- Quantize the coefficients in each block (the lossy step)
- Lossless compression:
  - Reorder according to increasing spatial frequency
  - Use entropy (or arithmetic) coding on the resulting values
Decoding/Decompression:
- Undo the lossless coding & reordering
- Reconstruct the signal:
  - Perform the inverse DCT on the quantized coefficients for each block
  - Put the blocks back together
(A sketch of the transform & quantize steps follows.)
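A minimal numpy sketch of the transform & quantize steps on one 8x8 block. It uses a single uniform quantization step for simplicity; real JPEG uses a per-coefficient quantization table (and also subsamples the color channels):

```python
import numpy as np

N = 8

def dct_matrix(n=N):
    # Orthonormal DCT-II basis matrix: row k, column m.
    C = np.array([[np.cos(np.pi * (2*m + 1) * k / (2*n)) for m in range(n)]
                  for k in range(n)]) * np.sqrt(2.0 / n)
    C[0, :] /= np.sqrt(2.0)
    return C

C = dct_matrix()

def encode_block(block, step=16.0):
    # 2-D DCT of an 8x8 block, then quantize (the lossy step).
    coeffs = C @ block @ C.T
    return np.round(coeffs / step)          # small coefficients become 0

def decode_block(q, step=16.0):
    # Dequantize, then inverse 2-D DCT to reconstruct the block.
    return C.T @ (q * step) @ C

block = np.random.randint(0, 256, (N, N)).astype(float)
q = encode_block(block)
rec = decode_block(q)
print("nonzero coefficients:", np.count_nonzero(q), "of", N * N)
print("max reconstruction error:", np.abs(rec - block).max())
```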
Two Types of Compression
- Lossless compression (fewer bits, same sound/image). Takes advantage of:
  - redundancy (the next word in a sequence is more predictable given the previous words)
  - relative frequency (some words occur more than others)
- Lossy compression (fewer bits, some differences from the original)
  - Takes advantage of human perception
  - Usually gives a better compression ratio
Lossy vs. Lossless Compression?
- Lossy: good for signals that humans perceive (sound, images, video)
- Lossless: good for written documents, financial data
- Of course, you often use both! (as in MPEG & JPEG)
- What about medical signals, biometric signals, or other signals to be analyzed by a doctor or a computer for decision making?
  - Lossy compression is practically possible, but what about legal issues (malpractice suits)?
  - Changes in technology could change the choice of what to keep
Many Meanings of Frequency
- Audio signal processing:
  - Frequency content
  - Time resolution: sampling frequency (sampling rate = 1/sampling interval, in samples/sec)
- Image processing:
  - Frequency content (spatial frequency)
  - Spatial resolution: pixels per inch (the image equivalent of sampling frequency)
  - Color: different colors correspond to different frequencies of light (frequency is inversely proportional to wavelength)
- Lossless coding: how frequently each codeword/symbol is used
Lossless Compression
- Fewer bits + no change to the final signal
- Basic idea: allow a variable-length code
- Leverage information about frequency, i.e. use
  - short codes for more frequent things
  - long codes for infrequent things
- Make sure that you can tell symbols apart (see the prefix-code sketch below)
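A minimal sketch of a variable-length prefix code; the codewords and message are made up for illustration. Because no codeword is a prefix of another, the decoder can always tell symbols apart without separators:

```python
CODE = {"e": "0", "t": "10", "a": "110", "q": "111"}   # frequent -> short
DECODE = {bits: sym for sym, bits in CODE.items()}

def encode(symbols):
    return "".join(CODE[s] for s in symbols)

def decode(bits):
    out, buf = [], ""
    for b in bits:                 # read bit by bit;
        buf += b
        if buf in DECODE:          # a full codeword always ends exactly here
            out.append(DECODE[buf])
            buf = ""
    return "".join(out)

msg = "teetaeqe"
bits = encode(msg)
assert decode(bits) == msg
print(len(bits), "bits vs.", 2 * len(msg), "bits for a fixed 2-bit code")
```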
Example: Run-length Coding
- Code the number of repetitions of a symbol (see the sketch below)
- Example 1: compression factor 36/25 = 1.44
  - Original: 1 000000000 111 0000 11111111 00000000000 (36 bits)
  - Runs: 1(1), 0(9), 1(3), 0(4), 1(8), 0(11)
  - Coded: 1 0001 1001 0011 0100 1000 1011 (first bit's value + six 4-bit run lengths = 1 + 6x4 = 25 bits)
- Example 2: 26/13 = 2 (max run length known), 26/20 = 1.3 (no max run length)
  - Original: 1111111111 00000 11111111111 (26 bits)
  - Runs: 1(10), 0(5), 1(11); with 4-bit run lengths: 1 + 3x4 = 13 bits
  - With no known max, a 3-bit length field (max run 7) forces long runs to be split: 1(7) 1(3) 0(5) 1(7) 1(4); consecutive entries can now repeat a value, so each entry needs its own value bit: 5 x (1 + 3) = 20 bits
- Most effective if it can take advantage of long repeating strings
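A sketch of the Example 1 scheme: send the first bit's value, then each run length in a 4-bit field (runs alternate, so later values are implied). It assumes every run fits in 4 bits, as in the example:

```python
def rle_encode(bits):
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    # First bit's value, then each run length as 4 bits.
    return bits[0] + "".join(format(n, "04b") for n in runs)

def rle_decode(code):
    value, out = code[0], []
    for i in range(1, len(code), 4):
        n = int(code[i:i+4], 2)                # 4-bit run length
        out.append(value * n)
        value = "1" if value == "0" else "0"   # runs alternate
    return "".join(out)

src = "100000000011100001111111100000000000"   # 36 bits
enc = rle_encode(src)                          # 25 bits
assert rle_decode(enc) == src
print(f"{len(src)}/{len(enc)} = {len(src)/len(enc):.2f} compression factor")
```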
Example: Morse Code

Letter  Code      Rel. freq.
E       .         .103
T       _         .080
A       ._        .064
N       _.        .057
O       _ _ _     .063
S       ...       .051
I       ..        .058
C       _._.      .022
Q       _ _._     .001
Ä       ._._      ??
É       .._..     ??
,       _ _.._ _  ??

- Frequent letters (a, e, i, n, t) have 1-2 taps per letter
- Less frequent letters (c, q, z, ..., foreign symbols) have 3-4 taps each
- Numbers & foreign symbols have 4-5 taps; punctuation: 6 taps
- Identifiability: is _._. the letter C, or N followed by N? Morse needs pauses between letters to tell symbols apart.
Example: Entropy Coding
- Takes advantage of an unbalanced distribution
- Like the game of 20 questions: answers to questions are 0/1 bits
  - Stop asking questions when you have the answer
  - The first questions you ask are the ones that give the most information (more frequent cases get fewer bits)
- [Figure: three distributions] High entropy (uniform): no benefit from entropy coding. Medium entropy: entropy coding helps. Zero entropy: no need to send anything!
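The slide doesn't name a specific algorithm; Huffman coding is one standard form of entropy coding, sketched below. Rare symbols are merged first, so frequent symbols end up near the top of the tree with short codewords:

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    # Heap entries: (frequency, tiebreak id, {symbol: codeword-so-far}).
    heap = [(f, i, {sym: ""}) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in c1.items()}
        merged.update({s: "1" + w for s, w in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

text = "aaaaaaabbbccd"              # unbalanced distribution
code = huffman_code(Counter(text))
for sym in sorted(code, key=lambda s: len(code[s])):
    print(sym, code[sym])           # frequent symbols get short codewords
```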
What if Your Frequencies are Wrong? Codeword Case I Case II Case III Case IV 1 0.25 0.60 0.97 0.1 01 0.25 0.25 0.01 0.2 001 0.25 0.10 0.01 0.3 0001 0.25 0.05 0.01 0.4 Avg bits/word 2.5 1.6 0.99 3.0 Avg bits/word = Σ i freq(i)*length(i) Note: if code and frequencies are mismatched, bit rate is worse than using fixed-length code (00,01,10,11).
How do you know the frequencies?
Option 1: Design different codes for different sources
- Learn frequencies from other examples of that source type
- Attach a code identifier to the coded document
Option 2: Design the code for the document itself (sketched below)
- Count occurrences in the document
- Attach the whole code to the coded document
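A sketch of Option 2: estimate symbol frequencies by counting occurrences in the document itself (the example string is made up):

```python
from collections import Counter

document = "this is an example document for counting symbol frequencies"
counts = Counter(document)
total = sum(counts.values())
freqs = {sym: n / total for sym, n in counts.most_common()}
for sym, f in list(freqs.items())[:5]:
    print(repr(sym), f"{f:.3f}")    # most frequent symbols -> shortest codes
# These frequencies would drive the code design (e.g. the Huffman sketch
# above), and the resulting code table travels with the document.
```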