CSCD 443/533 Advanced Networks, Fall 2017. Lecture 18: Compression of Video and Audio

Topics: compression technology; motivation; the human attributes that make it possible; audio compression; video compression; performance.

Motivation, Why Compress? Why do we need to compress streaming media? Consider one example: 640 x 480 pixel frames, 24 bits of color per pixel, 30 frames per second. With no compression, the video alone takes over 200 Mbps to transmit (640 x 480 x 24 x 30 is about 221 Mbps). Do you have a 200 Mbps link? We need massive compression to be able to view streaming video and audio over our current networks.

Motivation, Why Compress? What does compression buy us? Uncompressed DVD-quality video: about 221 Mbps. Compressed DVD video: about 4 Mbps. That is a compression ratio of roughly 50:1!
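The arithmetic behind these figures can be checked directly. The sketch below is only a back-of-the-envelope calculation; it assumes decimal megabits (1 Mbps = 1,000,000 bits per second) and uses the frame size, color depth, and frame rate from the motivation slide.

```python
# Raw (uncompressed) video bit rate for the 640 x 480 example.
WIDTH, HEIGHT = 640, 480        # pixels per frame
BITS_PER_PIXEL = 24             # 8 bits each for red, green, blue
FRAMES_PER_SECOND = 30

raw_bps = WIDTH * HEIGHT * BITS_PER_PIXEL * FRAMES_PER_SECOND
raw_mbps = raw_bps / 1_000_000  # assumption: decimal megabits

compressed_mbps = 4             # compressed DVD figure quoted on the slide
ratio = raw_mbps / compressed_mbps

print(f"raw: {raw_mbps:.1f} Mbps, ratio: {ratio:.0f}:1")
# prints "raw: 221.2 Mbps, ratio: 55:1"
```

The exact ratio comes out near 55:1, the same order of magnitude as the 50:1 quoted on the slide.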

Why Compress? In a Nutshell: to reduce the file size, to deliver the stream to the user, and to conserve storage space. Choosing a compression rate is a balance between the quality of the media and the available bandwidth.

So, Why Compress? Delivering video over the Web means compromises, mostly trading image quality for lower bit rates. In general, video and audio are compressed, stuffed into a container, and delivered to you via the Web. If done well, you won't notice the missing bits or the delivery of the media. We will discuss individual formats, codecs, and tradeoffs.

Definitions. File format: the particular way information is stored in a file; for streaming media, file formats are known as containers. Codec: short for compressor/decompressor (also read as coder/decoder); any technology for compressing and decompressing data. Compression: reduces file size by removing audio or video information, taking advantage of the limits of human perception.

Format vs. Codec, Example: Flash Video (FLV) is a file format. H.264, On2 VP6, and Sorenson Spark are codecs used in Flash Video files.

Container File Formats. Purpose of container formats: they function as "black boxes" for holding a variety of media formats. Good container formats can handle files compressed with a variety of different codecs. In a perfect world, you could put any codec in any container format; unfortunately, there are some incompatibilities. Examples: MPEG-2, Advanced Systems Format (ASF) from Microsoft, AVI, QuickTime (MOV), MP4, Flash (FLV), RealMedia.

Multimedia Container Files. Multimedia file extensions: .mov, .ogg, .wmv, .flv, .mp4, .mpeg. Essentially, videos are packaged into encapsulation containers, or wrapper formats, that contain all the information needed to present the video. You can think of file formats as containers that hold all this information, very similar to a .zip, .sit, or .rar file.

Differences in Containers. Why are certain formats popular? Popular support: how widely supported is the format? File size: larger is not better for streaming files. Support for advanced codec functionality: older formats such as AVI do not support new codec features like B-frames or VBR audio. Support for advanced content: such as chapters, subtitles, meta-tags, and user data.

Compression

Compression. Two types: lossless (keeps all bits) and lossy (removes bits).

Lossy Compression. Lossy compression schemes reduce file size by discarding some amount of data during encoding, before the media is sent over the Internet. Once the data is received by the client, the codec attempts to reconstruct an approximation of the information that was lost or discarded.
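The difference between the two types can be seen on a few bytes of data. The following is a toy sketch, not a real media codec: zlib stands in for a lossless scheme, and dropping the low 4 bits of each sample (the step size `STEP` is an arbitrary choice for illustration) stands in for a lossy one.

```python
import zlib

samples = bytes([12, 13, 13, 14, 200, 201, 199, 198] * 32)

# Lossless: decompression restores every byte exactly.
packed = zlib.compress(samples)
restored = zlib.decompress(packed)

# Lossy (toy): keep only a coarse level per sample, then scale back up.
STEP = 16
quantized = bytes(s // STEP for s in samples)
approx = bytes(q * STEP + STEP // 2 for q in quantized)

# The reconstruction is close to, but not equal to, the original.
max_error = max(abs(a - b) for a, b in zip(samples, approx))
```

Here `restored` equals `samples` byte for byte, while `approx` differs from `samples` by at most half a step per sample; the discarded detail cannot be recovered.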

Video Lossy Compression: Image Compression. A lossy image format samples an image and discards color and contrast information that the viewer is unlikely to miss.

Can you really see the difference?

Video Lossy Compression. Why can you do lossy compression? Spatial and temporal redundancy: pixel values are not independent; they are correlated with their neighbors both within the same frame and across frames, so the value of a pixel is predictable given the values of neighboring pixels. Psychovisual redundancy: the human eye has a limited response to fine spatial detail and is less sensitive to detail near object edges or around shot changes. Impairments introduced by bit rate reduction should not be visible to a human viewer.

Audio Lossy Compression. Lossy audio compression discards frequencies at the high and low ends of the spectrum and attempts to locate and remove unnecessary audio data (more on this later). A nice description with example programs: http://www.videograbber.net/compress-audio-file.html

Audio Streaming Formats. There are many formats and standards for streaming audio: RealNetworks' RealAudio, streaming MP3, Macromedia's Flash and Director Shockwave, Microsoft's Windows Media, and Apple's QuickTime. There are also recognized standard formats, including Liquid Audio, MP3, MIDI, WAV, and AU.

Audio Lossy Compression. First, the player decompresses the audio file as it downloads to your computer. It then fills in missing information according to the instructions set by the codec. The compressed file is unintelligible to a listener; the decompressed file is intelligible but of lower quality than the original.

MP3 Audio Lossy Compression. Example: MP3. The MP3 lossy audio compression algorithm takes advantage of a perceptual limitation of human hearing: auditory masking. It was discovered in the late 1800s that a tone could be rendered inaudible by another tone of lower frequency. Masking is a matter of how your brain perceives similar sounds.

MP3 Audio Lossy Compression. Uncompressed audio, like that on CDs, stores more data than your brain can actually process. For example, if two notes are very similar and very close together, your brain may perceive only one of them. If two sounds are different but one is much louder than the other, your brain may never perceive the quieter signal.

MP3 Audio Lossy Compression. The study of these auditory phenomena is called psychoacoustics. The phenomena can be accurately described in tables and charts, as mathematical models representing human hearing patterns. These models can be stored in the codec as reference tables. Article on psychoacoustics: http://www.uaudio.com/blog/how-the-ear-works/

MP3 Audio Lossy Compression. MP3 encoding tools analyze the incoming source signal, break it down into mathematical patterns, and compare these patterns to psychoacoustic models stored in the encoder itself. The encoder can then discard most of the data that doesn't match the stored models, keeping that which does. This shrinks the file by discarding a great deal of extra data.

MP3 Audio Lossy Compression. The MP3 encoding process is a two-pass system.
Step 1: Run all the psychoacoustic models, discarding data. Then compress what's left to shrink storage space.
Step 2: Huffman coding, which does not discard any data. It lets you store what's left in a smaller amount of space by using fewer bits to store the most common symbols.
Step 2a: Break the resulting audio stream into frames, assembled into a bitstream with header information preceding each data frame. Headers contain metadata specific to that frame, such as an ID, bitrate, audio frequency, padding, and type of frame (MPEG-1 or MPEG-2).
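The Huffman step can be illustrated with a generic textbook construction. This is a minimal sketch and not the MP3 format's actual code tables (MP3 uses predefined tables); the function name `huffman_code` and the sample values are made up for illustration.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a dict mapping each symbol to its Huffman bit string."""
    counts = Counter(symbols)
    if len(counts) == 1:                       # degenerate one-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)      # two least frequent subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical quantized values: zero is by far the most common symbol.
quantized = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = huffman_code(quantized)
encoded = "".join(code[s] for s in quantized)
```

With four distinct symbols, a fixed-length code would need 2 bits each (24 bits total); here the most common symbol gets a 1-bit code and the whole sequence takes 21 bits, with no information lost.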

Basic Structure of an Audio Encoder. Limit values to audible tones. Note: a decoder works in just the opposite manner.

Processes of an Audio Encoder. The Mapping Block divides the audio input into 32 equal-width frequency subbands (samples). The Psychoacoustic Block calculates the masking threshold for each subband.

Processes of an Audio Encoder. The Bit-Allocation Block (in the Quantizer block) allocates bits using the outputs of the Mapping and Psychoacoustic blocks. The Quantizer & Coding Block scales and quantizes (reduces) the samples. The Frame Packing Block formats the samples, with headers, into an encoded stream.
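One way to picture the bit-allocation step is as a greedy loop that keeps giving a bit to whichever subband has the most audible quantization noise. The sketch below is a simplified illustration, not the standard's exact procedure: the function name `allocate_bits`, the input values, and the bit budget are hypothetical, and it relies on the rule of thumb that each extra bit lowers quantization noise by about 6.02 dB.

```python
def allocate_bits(smr_db, bit_budget, db_per_bit=6.02):
    """Greedy sketch: hand out bits one at a time to the noisiest subband.

    smr_db     -- signal-to-mask ratio per subband in dB
                  (signal level from the mapping block, mask from the
                  psychoacoustic block)
    bit_budget -- total number of bits available for this frame
    """
    bits = [0] * len(smr_db)
    for _ in range(bit_budget):
        # Noise-to-mask ratio: how far quantization noise sits above the mask.
        nmr = [smr - db_per_bit * b for smr, b in zip(smr_db, bits)]
        worst = max(range(len(nmr)), key=nmr.__getitem__)
        if nmr[worst] <= 0:     # all noise is already below the mask
            break
        bits[worst] += 1
    return bits

# Four hypothetical subbands; the third is fully masked (negative SMR).
bits = allocate_bits([30.0, 12.0, -5.0, 18.0], bit_budget=10)
# bits == [5, 2, 0, 3]
```

The masked subband receives no bits at all, which is where much of the saving comes from.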

Video Encoding, Standards

MPEG Organization. Moving Picture Experts Group, established in 1988. Its standards fall under the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Official name: ISO/IEC JTC1 SC29 WG11. Responsible for the MPEG standards.

Evolution of MPEG. MPEG-1: the initial audio/video compression standard, used by VCDs in the 1990s. MP3 = MPEG-1 Audio Layer 3. Target of 1.5 Mb/s bitrate at 352x240 resolution. Only supports progressive pictures, not interlaced pictures.

Evolution of MPEG. MPEG-2: a standard still widely used in DVD and digital TV. Support in current hardware implies that it will be here for a long time. The transition to HDTV has taken over 10 years and is not finished yet. Different profiles and levels allow for quality control.

Evolution of MPEG. MPEG-3: originally developed for HDTV, but abandoned when MPEG-2 was determined to be sufficient. MPEG-4: includes support for AV objects, 3D content, low bitrate encoding, and DRM. In practice, it provides equal quality to MPEG-2 at a lower bitrate. MPEG-4 Part 10 is H.264, which is used in HD DVD and Blu-ray. H.264 is the encoding used in video

MPEG technical specification (the parts of MPEG-2)
Part 1 - Systems: describes synchronization and multiplexing of video and audio.
Part 2 - Video: compression codec for interlaced and non-interlaced video signals.
Part 3 - Audio: compression codec for perceptual coding of audio signals; a multichannel-enabled extension of MPEG-1 audio.
Part 4 - Describes procedures for testing compliance.
Part 5 - Describes systems for software simulation.
Part 6 - Describes extensions for DSM-CC (Digital Storage Media Command and Control).
Part 7 - Advanced Audio Coding (AAC).
Part 8 - Deleted.
Part 9 - Extension for real-time interfaces.
Part 10 - Conformance extensions for DSM-CC.

MPEG Video spatial domain processing. The spatial domain is handled similarly to JPEG. Convert RGB values to the YUV colorspace: one brightness value and two color values. RGB comes from graphics processing, YUV from television. Y represents luminosity; U and V represent color. The U and V values can be represented with fewer bits because the human eye is much less sensitive to missing color detail than to missing brightness detail; we care more about brightness. Split the frame into 8x8 blocks.
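The color conversion and the "fewer bits for color" idea can be sketched in a few lines. This is an illustrative sketch under stated assumptions: it uses the BT.601 luma weights and the classic analog U/V scale factors, and the helper `subsample_2x2` (a hypothetical name) averages each 2x2 group of a color plane, in the spirit of 4:2:0 subsampling. Real codecs use scaled and offset integer variants of these formulas.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel (0-255 per channel) to Y, U, V."""
    y = 0.299 * r + 0.587 * g + 0.114 * b   # luminance (brightness)
    u = 0.492 * (b - y)                      # blue-difference color signal
    v = 0.877 * (r - y)                      # red-difference color signal
    return y, u, v

def subsample_2x2(plane):
    """Average each 2x2 group of a color plane (even width and height)."""
    return [
        [(plane[r][c] + plane[r][c + 1]
          + plane[r + 1][c] + plane[r + 1][c + 1]) / 4
         for c in range(0, len(plane[0]), 2)]
        for r in range(0, len(plane), 2)
    ]

y, u, v = rgb_to_yuv(255, 255, 255)          # white: bright, no color
small = subsample_2x2([[1, 3], [5, 7]])      # one value per 2x2 group
```

After subsampling both color planes this way, each 2x2 group of pixels carries 4 Y values plus 1 U and 1 V value (6 samples instead of 12), halving the data before any transform is applied.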

8 x 8 Blocks

MPEG Video spatial domain processing. Apply a 2-D Discrete Cosine Transform (DCT) to each block. The DCT is similar to a Fourier transform in signal processing: it transforms each block into higher-frequency and lower-frequency values, and pushes the more important, lower-frequency values to the upper-left corner of the 8 x 8 block. For a typical image, most of the visually significant information is concentrated in just a few DCT coefficients. Quantization of DCT coefficients: values that are near zero are converted to zero, the remaining values are shrunk, and all are represented by integers.
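To make the energy-compaction claim concrete, here is a direct, unoptimized sketch of the orthonormal 2-D DCT-II on an 8x8 block. It is written for readability only; real encoders use fast factorizations, and the function name `dct_2d` is chosen just for this example.

```python
import math

N = 8

def dct_2d(block):
    """Forward 2-D DCT-II of an N x N block (list of N lists of N numbers)."""
    def alpha(k):
        # Normalization that makes the transform orthonormal.
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):            # vertical frequency index
        for v in range(N):        # horizontal frequency index
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out

# A perfectly flat block: all of its information lands in one coefficient.
flat = [[100] * N for _ in range(N)]
coeffs = dct_2d(flat)
```

For the flat block, only the top-left (DC) coefficient is nonzero, with value 800; the other 63 coefficients are zero up to rounding error. Smooth real-world blocks behave similarly, with a handful of significant low-frequency values.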

Quantization matrix. The quantization matrix divides each coefficient by a number. The matrix is pre-calculated and defined by the JPEG standard, and it favors the items in the top-left corner of the matrix, the lower-frequency and more visually significant terms. Each coefficient has a different weighting.
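The division step can be sketched as follows. The table used here is the example luminance table from Annex K of the JPEG specification; the coefficient values are made up for illustration, and the function names `quantize` and `dequantize` are chosen for this example only.

```python
# Example luminance quantization table (JPEG specification, Annex K).
JPEG_LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, q=JPEG_LUMA_Q):
    """Divide each DCT coefficient by its table entry; round to an integer."""
    return [[round(coeffs[i][j] / q[i][j]) for j in range(8)] for i in range(8)]

def dequantize(levels, q=JPEG_LUMA_Q):
    """Decoder side: multiply back. The rounding error is not recoverable."""
    return [[levels[i][j] * q[i][j] for j in range(8)] for i in range(8)]

# Hypothetical coefficients: a large DC term and two small AC terms.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[7][7] = 800.0, -30.0, 20.0
levels = quantize(coeffs)
restored = dequantize(levels)
```

The small divisors in the top-left corner preserve low-frequency terms (800 becomes 50 and is restored exactly), while the large divisor in the bottom-right corner turns the high-frequency value 20 into 0. The -30 comes back as -33, which is the lossy part.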

Run-length Encoding. After quantization, the long runs of zero coefficients are run-length encoded. The regular JPEG standard then applies an advanced version of Huffman coding.
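A simplified sketch of this step: scan the quantized block in zigzag order so that the zeros cluster at the end, then store (zero_run, value) pairs. The function names and the "EOB" marker are illustrative; real JPEG codes the DC coefficient separately and feeds the pairs into Huffman tables, which is omitted here.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zigzag scan order for an n x n block."""
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        # Walk anti-diagonals; alternate direction on odd/even diagonals.
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )

def run_length_encode(values):
    """Encode as (zero_run, value) pairs, ending with an end-of-block marker."""
    pairs, run = [], 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")          # trailing zeros are implied, not stored
    return pairs

# Hypothetical quantized block with only three nonzero levels.
levels = [[0] * 8 for _ in range(8)]
levels[0][0], levels[0][1], levels[1][1] = 50, -3, 2
scanned = [levels[r][c] for r, c in zigzag_order()]
pairs = run_length_encode(scanned)
# pairs == [(0, 50), (0, -3), (2, 2), "EOB"]
```

Sixty-four coefficients shrink to three pairs and a marker, which is why quantizing small values to zero pays off so well.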

DCT Transform on Blocks. The final result is a reduction in the number of bits. Decompression is the reverse process. However, because part of the process is lossy, you can't quite get back to the original image; there is a loss of information. Nice examples using the Discrete Cosine Transform: http://www.dspguide.com/ch27/6.htm http://datagenetics.com/blog/november32012/index.html

MPEG video time domain processing. A totally new ballgame (this concept doesn't exist in JPEG). General idea: use motion vectors to specify how a 16x16 macroblock translates between the reference frames and the current frame, then code the difference between the reference block and the actual block.
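A brute-force version of the motion search can be sketched as below. This is a minimal illustration, assuming frames are 2-D lists of gray levels and using the sum of absolute differences (SAD) as the matching cost; the function names, the small block size, and the search range are choices made for the example, and production encoders use much smarter search strategies.

```python
import random

def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
        for j in range(size) for i in range(size)
    )

def find_motion_vector(reference, current, x, y, size=16, search=4):
    """Exhaustive search for the offset (dx, dy) into `reference` that best
    matches the block of `current` whose top-left corner is (x, y)."""
    height, width = len(reference), len(reference[0])
    best = (0, 0)
    best_cost = sad(current, reference, x, y, x, y, size)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, reference, x, y, rx, ry, size)
                if cost < best_cost:
                    best, best_cost = (dx, dy), cost
    return best, best_cost

# Synthetic test: the current frame is the reference shifted 2 pixels right.
rng = random.Random(0)
reference = [[rng.randrange(256) for _ in range(16)] for _ in range(16)]
current = [[row[max(c - 2, 0)] for c in range(16)] for row in reference]
vector, cost = find_motion_vector(reference, current, x=6, y=6, size=4, search=3)
```

The search finds the vector (-2, 0) with zero residual: the block's content sits two pixels to the left in the reference frame, so only the vector needs to be sent for this block.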

MPEG video time domain processing. GOP (Group of Pictures): a GOP is a set of consecutive frames that can be decoded without any other reference frames. It is usually 12 or 15 frames long and starts with an I-frame.

MPEG video time domain processing. Group of Pictures (GOP). I-frames: can be reconstructed without any reference to other frames, like still pictures. P-frames: forward predicted from the last I-frame or P-frame; they code differences such as movement, two to four frames into the future. B-frames: both forward and backward predicted.
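Because a B-frame depends on a later frame as well as an earlier one, frames cannot be coded in the order they are displayed. The sketch below builds a common 12-frame pattern and reorders it for coding. The function names are hypothetical, the parameters `n` (GOP length) and `m` (spacing between I/P frames) are assumptions for illustration, and trailing B-frames are simply emitted last here, whereas in a real stream they would follow the next GOP's I-frame.

```python
def gop_display_order(n=12, m=3):
    """Frame types in display order: n frames, an I or P frame every m frames."""
    return ["I" if i == 0 else "P" if i % m == 0 else "B" for i in range(n)]

def coding_order(types):
    """Indices reordered so each B-frame follows both frames it predicts from."""
    order, pending_b = [], []
    for i, t in enumerate(types):
        if t == "B":
            pending_b.append(i)        # wait for the next I or P frame
        else:
            order.append(i)            # send the anchor frame first...
            order.extend(pending_b)    # ...then the B-frames that needed it
            pending_b = []
    return order + pending_b

types = gop_display_order()
pattern = "".join(types)               # "IBBPBBPBBPBB"
order = coding_order(types)            # [0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 10, 11]
```

The reordering is one source of the decoding delay associated with B-frames: frame 3 must arrive and be decoded before frames 1 and 2 can be shown.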

MPEG Processing GOP

MPEG GOP

Final Comments on Prediction. Only use a motion vector if a close match can be found. Evaluate closeness with mean squared error or another metric. You can't search all possible blocks, so a smart search algorithm is needed. If no suitable match is found, just code the macroblock as an I-block. If a scene change is detected, start fresh. You don't want too many P or B frames in a row: predictive error will keep propagating until the next I-frame, and there is a delay in decoding.
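The "only if a close match can be found" rule amounts to a threshold test on the prediction error. A minimal sketch, assuming mean squared error as the metric; the function names and the threshold value are hypothetical, and a real encoder would pick the mode by comparing actual coding costs.

```python
def mse(block_a, block_b):
    """Mean squared error between two equally sized 2-D blocks."""
    n = len(block_a) * len(block_a[0])
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    ) / n

def choose_mode(current_block, predicted_block, threshold=100.0):
    """Return 'inter' when the motion-compensated prediction is close enough,
    otherwise 'intra' (code the macroblock on its own, like an I-block)."""
    if mse(current_block, predicted_block) <= threshold:
        return "inter"
    return "intra"

good = choose_mode([[10, 12], [11, 13]], [[10, 11], [12, 13]])   # MSE 0.5
bad = choose_mode([[0, 0], [0, 0]], [[20, 20], [20, 20]])        # MSE 400.0
```

Here `good` is "inter" and `bad` is "intra": a poor prediction would cost more to correct than simply coding the block from scratch.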

MPEG Usefulness: multimedia communications, webcasting, broadcasting, video on demand, interactive digital media, telecommunications, mobile communications.

Summary. Video and audio have become a huge part of our daily interaction with the Internet. New codecs and file formats are being proposed all the time. The number of devices with different needs is driving the push for more efficient ways to compress and deliver streaming media.

End. New program is up: last assignment.

Notes on the audio encoder blocks. Mapping: divides the input into 32 subbands, or frequency samples. Psychoacoustic: computes the masking threshold, below which noise is imperceptible to the human ear (the mapping and psychoacoustic steps can be done independently). Bit allocation: assigns bits so that the total noise-to-mask ratio is minimized over all the channels and subbands. Quantizer & coding: samples are scaled and quantized according to the bit allocation. Frame packing: the header includes bit allocation and scaling information (scale factors).

The MPEG file consists of compressed video data, called the video stream. The basic unit of the video stream is a "Group of Pictures" (GOP), made up of three picture types, also called frames: I, P, and B. I-frames can be reconstructed without any reference to other frames; this type of frame contains information only about itself. On average, an I-frame occurs once in every ten to fifteen frames of a motion picture. P-frames can only be recreated by reference to a previous I-frame or P-frame; it is impossible to construct them without data from another frame. B-frames are referred to as bi-directional frames, because they are recreated from forward and backward predictions based on the nearest preceding and following I-frame or P-frame.