Shafq ur Réhman Image and Video Compression

Shafq ur Réhman Shafiq.urrehman@umu.se Image and Video Compression

outline mage/video compression: what and why source coding basics basic idea symbol codes stream codes compression systems and standards system standards and quality measures image coding JPEG video coding and MPEG Summary 2

need for compression Image: 6.0 million pixel camera, 3000x2000 18 MB per image -> 56 pictures / 1GB Video: DVD Disc 4.7 GB video 720x480, RGB, 30 f/s -> 31.1MB/sec audio 16bits x 44.1KHz stereo -> 176.4KB/s ->1.5 min per DVD disc Send video from cellphone: 352*240, RGB, 15 frames / second 3.8 MB/sec ->$38.00/sec levied by AT&T 3

Data Compression Wikipedia: data compression, or source coding, is the process of encoding information using fewer bits (or other information-bearing units) than an unencoded representation would use through use of specific encoding schemes. Applications General data compression:.zip,.gz Image over network: telephone/internet/wireless/etc Slow device: 1xCD-ROM 150KB/s, bluetooth v1.2 up to ~0.25MB/s Large multimedia databases Understanding compression: what are behind jpeg/mpeg/mp4 formats? what are the good/fine/super fine quality modifiers in my Canon 400D? why/when do I want to use raw/jpeg format in my digital camera? why doesn t zipping jpeg files help? what are the best ways to do compression? are we doing our best? (yes/no/maybe) 4

what can we compress? Goals of compression Remove redundancy Reduce irrelevance irrelevance or perceptual redundancy not all visual information is perceived by eye/brain, so throw away those that are not. redundant : exceeding what is necessary or normal symbol redundancy: the common and uncommon values cost the same to store. spatial and temporal redundancy: Temperatures: tend to be similar in adjacent geographical areas, also tend to be similar in the same month over different years 5

symbol/inter-symbol redundancy Letters and words in English e, a, i, s, t, a, the, me, I good, magnificent, fyi, btw, ttyl In the evolution of language we naturally chose to represent frequent meanings with shorter representations. 6

Data and information Data is not the same thing as information. Data is the means with which information is expressed. The amount of data can be much larger than the amount of information. Redundant data doesn't provide additional information. Image coding or compression aims at reducing the amount of data while keeping the information by reducing the amount of redundancy. 7

Image Compression Image compression addresses the problem of reducing the amount of data required to represent a digital image. The underlying basis of reduction process is the removal of redundant data Transforming a 2-D pixel array into a statistically uncorrelated data set

Different Types of Redundancy Coding Redundancy: Some gray levels are more common than others. Inter-pixel Redundancy: The same gray level may cover a large area. Psycho-Visual Redundancy: The eye can only resolve about 32 gray levels locally M. C. Escher, Drawing Hands, 1948 9

Redundancy Spatial redundancy Similarities between adjacent pixels 250 252 249 250 2-3 Temporal redundancy Similarities between pixels in adjacent frames 250 252 249 250 2-1 10

Modes of compression Lossless preserve all information, perfectly recoverable examples: morse code, zip/gz Lossy throw away perceptually insignificant information cannot recover all bits 11

Image Compression Image compression can be: Reversible (lossless), with no loss of information. The image after compression and decompression is identical to the original image. Often necessary in image analysis applications. The compression ratio is typically 2 to 10 times. Non reversible (lossy), with loss of some information. Lossy compression is often used in image communication, compact cameras, video, www, etc. The compression ratio is typically 10 to 30 times. 12

Image Coding and Compression Image coding How the image data can be represented. Image compression Reducing the amount of data required to represent an image. Enabling efficient image storage and transmission. f ^f 13

Lossless compression 14

Objective Measures of Image Quality: Rate 15

Objective Measures of Image Quality: Compression Ratio Compression ratio serves as the primary measure of a compression technique s effectiveness. It is a measure of the number of bits that can be eliminated from an uncompressed representation of a source image. Let N1 be the total number of bits required to store an uncompressed (raw) source image and let N2 be the total number of bits required to store the compressed data. The compression ratio Cr is then defined as the ratio of N1 to N2 Larger compression ratios indicate more effective compression Smaller compression ratios indicate less effective compression Compression ratios less than one indicate that the compressed representation is actually larger than the uncompressed representation. 16

Objective Measures of Image Quality 17

Objective Measures of Image Quality Objective fidelity criteria 18

MSE vs PSNR 19

Subjective Measures of Image Quality The problem The objective image quality measures previously shown does not always fit with our perception of image quality. One solution Let a number of test persons rate the image quality of the images on a scale. This will result in a subjective measure of image quality, or rather fidelity, but it will be based on how we perceive the quality of the images. Subjective fidelity criteria Excellent Fine Passable Marginal Inferior unusable 20

Information Measure Using Elements of Information theory 21

Example Assume that the grading levels are A, B, C, D, E, F These are equally distributed How much information do you have if you know that you don t have grade F? How much information is needed to know your exact grade? (5/6)*-log 2 (5/6) = 0,22 bits log 2 (6) - ((1/6)*-log 2 (1/6)) = 2,15 bits

Information Measure: Entropy Shannon Entropy = average information per source output H(z)= L 1 k=0 p r k log 2 p(rk) where rk is gray level number k, and L is the number of gray levels.

Measure the Amount of Data The average length of the code words assigned to various gray-level values is found by summing the product of the number of bits used to represent each gray level and the probability that the gray level occurs 24

Example 3-bit image 25

The noiseless coding theorem It is possible to make L avg/n arbitrary close to H(z) by coding infinitely long extensions of the source lim n L avg / n n H ( z)

Redundancy Coding Redundancy: Some gray levels are more common than others. Inter-pixel Redundancy: The same gray level may cover a large area. Psycho-Visual Redundancy: The eye can only resolve about 32 gray levels locally M. C. Escher, Drawing Hands, 1948 27

Definitions (revisited) Larger compression ratios indicate more effective compression Smaller compression ratios indicate less effective compression Compression ratios less than one indicate that the compressed representation is actually larger than the uncompressed representation. 28

Dealing with coding redundancy Basic idea: Different gray levels occur with different probability (non uniform histogram). Use shorter code words for the more common gray levels and longer code words for less common gray levels. This is called Variable Code Length. Code 1 Lavg = 3 Code 2 Lavg = 2,7 29

Revisit: Desired properties of symbol codes good codes are not only short but also easy to encode/decode Non-singular: every symbol in X maps to a different code word. Uniquely decodable: every sequence {x1, xn} maps to different codeword sequence. Instantaneous: no codeword is a prefix of any other codeword a.k.a. prefix code, self-punctuating code, prefix-free code. 30

Huffman Coding First 1. Sort the gray levels by decreasing probability 2. Sum the two smallest probabilities. 3. Sort the new value into the list. 4. Repeat 1 to 3 until only two probabilities remains. David Albert Huffman Second 1. Give the code 0 to the highest probability, and the code 1 to the lowest probability in the summed pair. 2. Go backwards through the tree one node and repeat from 1 until all gray levels have a unique code. 31

Example of Huffman coding 32

Example of Huffman coding Assigning codes 34

Example of Huffman coding 35

Huffman coding. The Huffman code is completely reversible, i.e., lossless. The table for the translation has to be stored together with the coded image. The resulting code is unambiguous. That is, for the previous example, the encoded string 011011101011 can only be parsed into the code words 0, 110, 1110, 1011 and decoded as 7, 4, 5, 0. The Huffman code does not take correlation between adjacent pixels into consideration. 36

Interpixel Redundancy Also called spatial or geometric redundancy Adjacent pixels are often correlated, i.e., the value of neighboring pixels of an observed pixel can often be predicted from the value of the observed pixel. Coding methods: Run-length coding Difference coding 37

Run-length coding Every code word is made up of a pair (g,l) where g is the gray level, and l is the number of pixels with that gray level (length or run ). E.g., results in the run-length code (1,3)(3,3)(4,1)(2,4)(1,5) The code is calculated row by row in this scan pattern: (Newer methods can take advantage of runs of repetitive patterns like: 8 5 5 8 5 5 8 5 5.) 38

Difference coding Definition: E.g., The code is calculated row by row in the following scan pattern: Both run-length and difference coding are reversible and can be combined with, e.g., Huffman coding. 39

Combining Difference and Huffman Coding Original image Difference coding 40

Combining Difference and Huffman Coding 41

Combining Difference and Huffman Coding 42

LZW Coding What if the symbol probabilities are unknown? LZW, Lempel-Ziv-Welch In contrast to Huffman with variable code length, LZW uses fixed lengths of code words which are assigned to variable length sequences of source symbols. The coding is done from left-to-right and row-by-row. Requires no a priori knowledge of the probability of occurrence of the symbols to be encoded. Removes some of the inter-pixel redundancy. During encoding a dictionary or code-book with symbol sequences is created which is recreated when decoding. (Many modern lossless compression methods uses Huffman coding in combination with methods like LZW.) 43

LZW-Coding Lempel-Ziv-Welch Integrated to mainstream imaging file formats Graphic interchange format GIF Tagged image file format TIFF Portable document format - PDF Widely used: GIF, TIFF, PDF Its royalty-free variant (DEFLATE) used in PNG, ZIP, Unisys U.S. LZW Patent No. 4,558,302 expired on June 20,2003 http://www.unisys.com/about unisys/lzw

39 39 126 126 39 39 126 126 39 39 126 126 39 39 126 126 39 39 126 126 256 258 260 259 257 126

LZW Coding dictionary (code book) is created while data are being encoded LZW decoder builds an identical decompression dictionary as it decodes the data stream Flush the code book When the codebook is full When coding is inefficient

Bit Plane coding Divide the gray level/color image into series of binary images (with one image per bit). Code each image separately using the above described methods. An 8-bit image will be represented by 8 coded binary images. It is based on the concept of decomposing a multilevel image into a series of binary images and compressing each binary image via one of several well-known binary compression methods Alternative decomposition approach start with representation of Gray code successive gray level differ with one bit

Constant area coding (CAC) Image is divided into areas of size p x q Classify all white, all black, mixed Example white=0, black=10, mixed = 11+pixel values If dominantly white Example white=0, black or mixed = 1 + pixel values

Lossless Predictive Coding Does not require decomposition of an image into a collection of bitplanes Based on eliminating the interpixel interference of closely spaced pixels by extracting and coding only the new information in each pixel Contains encoder, decoder and predictor The output of the predictor is rounded to the nearest integer

Lossless Predictive Coding Prediction error Is coded using an variable length code Decoder reconstructs The predictor uses m previous pixels n n n f f e ˆ n n n f e f ˆ )], ( [ ), ( ˆ 1 m i i n i y x f round y x f

Lossy Predictive Compression A Typical Predictive Coder

Lossy Predictive Compression Reduce the accuracy of the saved image for increased compression One of the simplest lossy predictive coding schemes is known as Delta Modulation. Record the approximate error (difference) between the predicted and actual samples values. For a sequence of samples S that are indexed by k to generate an approximate sequence of samples S by using a fixed delta value, the approximate error is given by 52

Lossy Predictive Compression: Effect of delta value 53

Delta Modulation Example 54

Delta Modulation Example 55

Psycho-Visual Redundancy If the only intended use of an image is visual observation, much of the information can be psycho-visual redundant, i.e., it can be removed without changing the visual appearance or perceived quality of the image. Loss of information inflict a lossy method. 1 721 kb (uncompressed) 78 kb (low quality JPEG) 56

Psycho-Visual redundancy Psycho-Visual redundancy is often reduced by quantification. E.g., Uniform quantification of gray levels Remove the least significant bits of the data. Causes edge effects. The edge effects can be reduced by Improved Gray Scale (IGS). Remove the least significant bit, and add a random number based on the sum of the least significant bits of the present, and the previous pixel. IGS reduces edge effects, but will at the same time unsharpen true edges. 57

Improved Gray Scale (IGS) (a) Original image. (b) Uniform quantization to 16 levels. (c) IGS quantization to 16 levels. 58

Transform Coding A compression technique that is based on modifying the transform of an image For most natural images a significant number of the coefficients have small magnitudes and can be coarsely quantized or discarded with little image distortion

Transform Coding Sub image decomposition Transformation Quantization Coding Adaptive transform coding or nonadaptive transform coding

Transform Coding 1) Divide the image into n n sub-images. 2) Transform each sub-image using a reversible transform (e.g., the Hotelling transform, the discrete Fourier transform (DFT) or the discrete cosine transform (DCT)). 3) Quantify, i.e., truncate the transformed image (e.g., by using DFT, and DCT frequencies with small amplitude can be removed without much information loss). The quantification can be either image dependent (IDP) or image independent (IIP). 4) Code the resulting data, normally using some kind of variable length coding, e.g., Huffman code. The coding is not reversible (unless step 3 is skipped).

g(x,y,u,v) = Forward transformation kernel h(x,y,u,v) = Inverse transformation kernel 1 0 1 0 ),,, ( ), ( ), ( N x N y v u y x g y x f v u T 1 0 1 0 ),,, ( ), ( ), ( N u N v v u y x h v u T y x f

Fourier Transform N vy ux j N vy ux j e N v u y x h e N v u y x g )/ ( 2 2 )/ ( 2 2 1 ),,, ( 1 ),,, (

Walsh-Hadamard Transform - WHT 1 0 )] ( ) ( ) ( ) ( [ 1) ( 1 ),,, ( ),,, ( m i i i i i v p y b u p x b N v u y x h v u y x g The Fourier Transform consists of a projection onto a set of orthogonal sinusoidal waveforms. The FT coefficients are called frequency components and the waveforms are ordered by frequency. The Hadamard Transform consists of a projection onto a set of square waves called Walsh functions. The HT coefficients are called sequence components and the Walsh functions are ordered by the number of their zero-crossings.

Discrete Cosine Transform - DCT The discrete cosine transform (DCT) is used to transform a signal from the spatial domain into the frequency domain. The reverse process, that of transforming a signal from the frequency domain into the spatial domain, is called the inverse discrete cosine transform (IDCT). A signal in the frequency domain contains the same information as that in the spatial domain. The order of values obtained by applying the DCT is coincidentally from lowest to highest frequency This feature and the psychological observation that the human eye and ear are less sensitive to recognizing the higher-order frequencies leads to the possibility of compressing a spatial signal by transforming it to the frequency domain and dropping high-order values and keeping low-order ones. When reconstructing the signal, and transforming it back to the spatial domain, the results are remarkably similar to the original signal.

Discrete Cosine Transform - DCT 1 2... 1, 2 / 0 1/ ) ( ) 2 1) (2 )cos( 2 1) (2 )cos( ( ) ( ),,, ( N u N u N u N v y N u x v u v u y x h

JPEG - Sequential baseline system Limited to 8-bit words DCT-values restricted to 11 bit DCT computation, quantization, variable length coding Subimages 8 x 8 left to right, top to bottom

Image size selection

JPEG: example of transform coding File size in bytes JPEG quality 1711 60 % 1287 40 % 822 20 % 533 10 % 380 5 % 9486 100 % 3839 90 % 2086 80 % 70

Wavelet Coding The principal difference between wavelet coding and transform coding is the omisson of the subimage processing stage JPEG2000

File Formats with Lossy Compression JPEG, Joint Photographic Experts Group, based on a cosine transform on 8x8 pixel blocks and Run- Length coding. Give arise to ringing and block artifacts. (.jpg.jpe.jpeg) JPEG2000, created by the Joint Photographic Experts Group in 2000. Based on wavelet transform and is superior to JPEG. Give arise only to ringing artifacts and allows flexible decompression (progressive transmission, region of interest,...) and reading. (.jp2.jpx) 72

JPEG vs JPEG-2000 73

Typical steps in losy image compression 74

How to represent a face image 75

DCT Approach (block-based) =a +b +f +g +x

PCA Approach (full-frame) 77

JPEG/JFIF overview 78

Using the 2D FFT for image compression Image = 200x320 matrix of values Compress by keeping largest 2.5% of FFT components Similar idea used by jpeg 79

Trade-off 80

File Formats with Lossless Compression TIFF, Tagged Image File Format, flexible format often supporting up to 16 bits/pixel in 4 channels. Can use several different compression methods, e.g., Huffman, LZW. GIF, Graphics Interchange Format. Supports 8 bits/pixel in one channel, that is only 256 colors. Uses LZW compression. Supports animations. PNG, Portable Network Graphics, supports up to 16 bits/pixel in 4 channels (RGB + transparency). Uses Deflate compression (~LZW and Huffman). Good when interpixel redundancy is present. 81

Vector based file formats 82

Vector based file formats PS, PostScript, is a page description language developed in 1982 for sending text documents to printers. EPS, Encapsulated PostScript, like PS but can embed raster images internally using the TIFF format. PDF, Portable Document Format, widely used for documents and are supported by a wide range of platforms. Supports embedding of fonts and raster/bitmap images. Beware of the choice of coding. Both lossy and lossless compressions are supported. SVG, Scalable Vector Graphics, based on XML supports both static and dynamic content. All major web browsers supports it (Internet Explorer from version 9). 83

Choosing image file format Image analysis Lossless formats are vital. TIFF supports a wide range of different bit depths and lossless compression methods. Images for use on the web JPEG for photos (JPEG2000), PNG for illustrations. GIF for small animations. Vector format: SVG, nowadays supported by web browsers. Line art, illustrations, logotypes, etc. Lossless formats such as PNG etc. (or a vector format) 84

Video Compression 85

Video Compression Standards Once video is in digital format, it makes sense to compress it Similarly to image compression, we want to store video data as efficiently as possible Again, we want to both maximize quality and minimize storage space and processing resources This time, we can exploit correlation in both space and time domains

Video Compression Standards Unlike image encoding, video encoding is rarely done in lossless form No storage medium has enough capacity to store a practical sized lossless video file Lossless DVD video - 221 Mbps Compressed DVD video - 4 Mbps 50:1 compression ratio! Teleconference H.261, H.262, H.263, H.230 Multimedia video MPEG-1 MPEG-2 MPEG-4

Two organizations dominate video compression standardization: ITU-T Video Coding Experts Group (VCEG) International Telecommunications Union Telecommunications Standardization Sector (ITU-T, a United Nations Organization, formerly CCITT), ISO/IEC Moving Picture Experts Group (MPEG) International Standardization Organization and International Electrotechnical Commission, Joint Technical Committee Number 1, Subcommittee 29, Working Group 11 88

Definitions Bitrate Information stored/transmitted per unit time Usually measured in Mbps (Megabits per second) Ranges from < 1 Mbps to > 40 Mbps Resolution Number of pixels per frame Ranges from 160x120 to 1920x1080 FPS (frames per second) Usually 24, 25, 30, or 60 Don t need more because of limitations of the human eye 89

Scan types Interlaced scan Odd and even lines displayed on alternate frames Initially used to save bandwidth on TV transmission When displaying interlaced video on a progressive scan display, can see comb effect Progressive scan Display all lines on each frame New fixed-resolution displays (such as LCD, Plasma) all use progressive scan Deinterlacing is not a trivial task 90

MPEG (Moving Pictures Expert Group) Committee of experts that develops video encoding standards Until recently, was the only game in town (still the most popular, by far) Suitable for wide range of videos Low resolution to high resolution Slow movement to fast action Can be implemented either in software or hardware 91

MPEG (Moving Pictures Expert Group) MPEG:s main components are: Block (8 8 pixels) Macro block (2 2 block) Slice (One row of macro blocks) Picture (An entire video frame) Group of pictures (GOP) Video Sequence (One or more GOP:s) 92

MPEG 8 8 Block 16 16 Macro block Slice 93

Evolution of MPEG MPEG-1 Initial audio/video compression standard Used by VCD s MP3 = MPEG-1 audio layer 3 Target of 1.5 Mb/s bitrate at 352x240 resolution Only supports progressive pictures MPEG-2 Current de facto standard, widely used in DVD and Digital TV Ubiquity in hardware implies that it will be here for a long time Transition to HDTV has taken over 10 years and is not finished yet Different profiles and levels allow for quality control

Evolution of MPEG MPEG-2 Current de facto standard, widely used in DVD and Digital TV Ubiquity in hardware implies that it will be here for a long time Transition to HDTV has taken over 10 years and is not finished yet Different profiles and levels allow for quality control

Evolution of MPEG MPEG-3 Originally developed for HDTV, but abandoned when MPEG-2 was determined to be sufficient MPEG-4 Includes support for AV objects, 3D content, low bitrate encoding, and DRM In practice, provides equal quality to MPEG-2 at a lower bitrate, but often fails to deliver outright better quality MPEG-4 Part 10 is H.264, which is used in HD-DVD and Blu-Ray MPEG-7, 2001 : metadata for audio-video streams, Multimedia Content Description Interface MPEG-21, 2002 : distribution, exchange, user access of multimedia data and intellectual property management

MPEG Block Diagram

MPEG technical specification Part 1 - Systems - describes synchronization and multiplexing of video and audio. Part 2 - Video - compression codec for interlaced and noninterlaced video signals. Part 3 - Audio - compression codec for perceptual coding of audio signals. A multichannel-enabled extension of MPEG-1 audio. Part 4 - Describes procedures for testing compliance. Part 5 - Describes systems for Software simulation. Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control.) Part 7 - Advanced Audio Coding (AAC) Part 8 - Deleted Part 9 - Extension for real time interfaces. Part 10 - Conformance extensions for DSM-CC.

MPEG video spatial domain processing Spatial domain handled very similarly to JPEG Convert RGB values to YUV colorspace Split frame into 8x8 blocks 2-D DCT on each block Quantization of DCT coefficients Run length and entropy coding

Intra-Frame Encoded Quantization major reduction controls quality Zig-Zag Scan, Run-length coding 100

MPEG YUV compression Y Y Y Y U V Y Y Y Y U U V V 4:2:0 4:2:2 Y Y Y Y U V Y Y U U V V Y Y Y Y U V Y Y U U V V 4:1:1 4:4:4 4:2:0 or 4:1:1 common in consumer products (DV) 4:2:2 common on professional products (DVCPro) 4:4:4 is rarely used gives no visible improvement compared with 4:2:2 101

MPEG Block compression The same as JPEG 8 High frequencies 99 50 35-11 12 0 0 0-74 28 21 24 0 0 0 0 87-49 54 16 0 0 0 0 8 DCT Quantization 55 95 68 40 35 22-17 8 4 2 8 0 0 0 0 0 44 57 25 12 0 3 0 0-25 32 60 18 33 24 14 5 0 0 0-1 0 0 0 0 Low Frequencies Spatial domain Frequency domain 0 0 0 0 0-1 0 0 0 0 0 0 3 0 5 14 24 0 0 0 0 0 0 2 8 12 RLE Huffman 102

MPEG video time domain processing (Temporal compression) Adjacent frames share large similarities Temporal compression can be achieved in two ways: Discarding images (reduce the frame rate) Through motion estimation and motion vectors 103

MPEG video time domain processing Totally new ballgame (this concept doesn t exist in JPEG) General idea Use motion vectors to specify how a 16x16 macroblock translates between reference frames and current frame, then code difference between reference and actual block

Types of frames I frame (intra-coded) Coded without reference to other frames P frame (predictive-coded) Coded with reference to a previous reference frame (either I or P) Size is usually about 1/3 rd of an I frame B frame (bi-directional predictive-coded) Coded with reference to both previous and future reference frames (either I or P) Size is usually about 1/6 th of an I frame

GOP (Group of Pictures) GOP is a set of consecutive frames that can be decoded without any other reference frames Usually 12 or 15 frames Transmitted sequence is not the same as displayed sequence Random access to middle of stream Start with I frame

Things about prediction Only use motion vector if a close match can be found Evaluate closeness with MSE or other metric Can t search all possible blocks, so need a smart algorithm If no suitable match found, just code the macroblock as an I-block If a scene change is detected, start fresh Don t want too many P or B frames in a row Predictive error will keep propagating until next I frame Delay in decoding

MPEG Group Of Pictures (GOP) MPEG uses three types of frames Grouped in a Group Of Pictures (GOP) I-pictures (Intracoded) P-pictures (Predictive Coded) B-pictures (Bidirectionally interpolated) I B B P B B P Forward Prediction I B B P B B P Bidirectional Prediction 108

Compressed video stream 109

Temporal Redundancy Reduction I frames are independently encoded P frames are based on previous I, P frames B frames are based on previous and following I and P frames In case something is uncovered

MPEG Motion Estimation Macro block Calculate the position for the macro block in the new image Store the motion vector and difference in appearance 111

MPEG Motion Estimation Help understanding the content of image sequence Help reduce temporal redundancy of video For compression Stabilizing video by detecting and removing small, noisy global motions For building stabilizer in camcorder A hard problem in general! 112

Bitrate allocation CBR Constant BitRate Streaming media uses this Easier to implement VBR Variable BitRate DVD s use this Usually requires 2-pass coding Allocate more bits for complex scenes This is worth it, because you assume that you encode once, decode many times

MPEG Data stream Display order I B B P B B P 1 2 3 4 5 6 7 Order in data stream I P B B P B B 1 4 2 3 7 5 6 114

MPEG - Audio MPEG-1 3 layers of increasing quality, layer 3 being the most common (MP3) 16 bits Samping rate - 32, 44.1, or 48 khz Bitrate 32 to 320 kbps De facto - 44.1 khz sample rate, 192 kbps bitrate MPEG-2 Supports > 2 channels, lower sampling frequencies, low bitrate improvement AAC (Advanced Audio Coding) More sample frequencies (8 khz to 96 khz) Higher coding efficiency and simpler filterbank 96 kbps AAC sounds better than 128 kbps MP3 Usually CBR, but can do VBR

MPEG Container Format Container format is a file format that can contain data compressed by standard codecs 2 types for MPEG Program Stream (PS) Designed for reasonably reliable media, such as disks Transport Stream (TS) Designed for lossy links, such as networks or broadcast antennas

AV Synchronization Want audio and video streams to be played back in sync with each other Video stream contains presentation timestamps MPEG-2 clock runs at 90 khz Good for both 25 and 30 fps PCR (Program Clock Reference) timestamps are sent with data by sender Receiver uses PLL (Phase Lock Loop) to synchronize clocks

Real time video encoding Motion estimation will be worse, so need higher bitrate to compensate Very hard to do in software, need dedicated hardware or hardware assistance Tivo, ReplayTV do this

Streaming media Common types include Flash, RealVideo, Quicktime Usually have low bandwidth available, need to optimize as such Want dedicated network protocols for this purpose TCP will wait indefinitely for retransmission, so is often not suitable

MPEG data stream

Analysis Pros Overall sharp picture Audio and video stay in sync with each other What if we were transmitting this over a network? Cons Picture flashes, blurs when there is too much movement on screen Higher bitrate often does not solve this problem

Image/Video Compression Standards Bitstream useful only if the recipient knows the code! Standardization efforts are important Technology and algorithm benchmark System definition and development Patent pool management Defines the bitstream (decoder), not how you generate them (encoder)! 122

current industry focus: H.264 encoding/decoding on mobile devices, low-latency video transmission over various networks, low-power video codec 123

audio coding versus image coding 124

VC demo 125