


Video Compression Technology

Video Compression Tutorial

At its most basic level, compression is performed when an input video stream is analyzed and information that is indiscernible to the viewer is discarded. Each event is then assigned a code: commonly occurring events are assigned few bits, and rare events are assigned codes with more bits. These steps are commonly called signal analysis, quantization and variable length encoding, respectively. There are four methods for compression: discrete cosine transform (DCT), vector quantization (VQ), fractal compression, and discrete wavelet transform (DWT).

Discrete cosine transform is a lossy compression algorithm that samples an image at regular intervals, analyzes the frequency components present in the sample, and discards those frequencies which do not affect the image as the human eye perceives it. DCT is the basis of standards such as JPEG, MPEG, H.261, and H.263. We covered the definition of both DCT and wavelets in our tutorial on Wavelets Theory.

Vector quantization is a lossy compression that looks at an array of data, instead of individual values. It can then generalize what it sees, compressing redundant data, while at the same time retaining the desired object or data stream's original intent.

Fractal compression is a form of VQ and is also a lossy compression. Compression is performed by locating self-similar sections of an image, then using a fractal algorithm to generate the sections.

Like DCT, discrete wavelet transform mathematically transforms an image into frequency components. Unlike methods such as DCT, which work on smaller pieces of the data, the process is performed on the entire image. The result is a hierarchical representation of an image, where each layer represents a frequency band.
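The variable length encoding step described above (few bits for common events, more bits for rare ones) can be illustrated with a small Huffman coder. This is only a sketch: the tutorial does not say which variable length code is used, so Huffman coding is an assumption here, and the function name `huffman_code` and the sample event string are made up for the example.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Return {symbol: bitstring} for a dict of symbol -> count."""
    if len(freqs) == 1:
        # Degenerate case: a single symbol still needs one bit.
        return {sym: "0" for sym in freqs}
    # Heap entries are (total count, tie-breaker, {symbol: code so far}).
    heap = [(n, i, {sym: ""}) for i, (sym, n) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n0, _, c0 = heapq.heappop(heap)
        n1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (n0 + n1, next_id, merged))
        next_id += 1
    return heap[0][2]

events = "aaaaaaaabbbbccd"          # a: 8, b: 4, c: 2, d: 1 occurrences
codes = huffman_code(Counter(events))
total_bits = sum(len(codes[e]) for e in events)
print(codes, total_bits)
```

With these counts the most common event gets a 1-bit code and the rarest events get 3-bit codes, so the 15 events take 25 bits instead of the 30 bits a fixed 2-bit code would need.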
Compression Standards

MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats. There are five MPEG standards being used or in development. Each compression standard was designed with a specific application and bit rate in mind, although MPEG compression scales well with increased bit rates. They include:

MPEG-1 - Designed for up to 1.5 Mbit/sec. Standard for the compression of moving pictures and audio. This was based on CD-ROM video applications, and is a popular standard for video on the Internet, transmitted as .mpg files. In addition, Layer 3 of MPEG-1 audio is the most popular standard for digital compression of audio, known as MP3. MPEG-1 is the standard of compression for VideoCD, the most popular video distribution format throughout much of Asia.

MPEG-2 - Designed for between 1.5 and 15 Mbit/sec. Standard on which Digital Television set top boxes and DVD compression is based. It is based on MPEG-1, but designed for the compression and transmission of digital broadcast television. The most significant enhancement from MPEG-1 is its ability to efficiently compress interlaced video. MPEG-2 scales well to HDTV resolution and bit rates, obviating the need for an MPEG-3.

MPEG-4 - Standard for multimedia and Web compression. MPEG-4 is based on object-based compression, similar in nature to the Virtual Reality Modeling Language. Individual objects within a scene are tracked separately and compressed together to create an MPEG-4 file. This results in very efficient compression that is very scalable, from low bit rates to very high. It also allows developers to control objects independently in a scene, and therefore introduce interactivity.

MPEG-7 - This standard, currently under development, is also called the Multimedia Content Description Interface. When released, the group hopes the standard will provide a framework for multimedia content that will include information on content manipulation, filtering and personalization, as well as the integrity and security of the content. Contrary to the previous MPEG standards, which described actual content, MPEG-7 will represent information about the content.

MPEG-21 - Work on this standard, also called the Multimedia Framework, has just begun. MPEG-21 will attempt to describe the elements needed to build an infrastructure for the delivery and consumption of multimedia content, and how they will relate to each other.
JPEG stands for Joint Photographic Experts Group. It is also an ISO/IEC working group, but works to build standards for continuous tone image coding. JPEG is a lossy compression technique used for full-color or gray-scale images, exploiting the fact that the human eye will not notice small color changes. JPEG 2000 is an initiative that will provide an image coding system using compression techniques based on the use of wavelet technology.

DV is a high-resolution digital video format used with video cameras and camcorders. The standard uses DCT to compress the pixel data and is a form of lossy compression. The resulting video stream is transferred from the recording device over FireWire (IEEE 1394), a high-speed serial bus capable of transferring data at up to ... MB/sec.

H.261 is an ITU standard designed for two-way communication over ISDN lines (video conferencing) and supports data rates which are multiples of 64 Kbit/s. The algorithm is based on DCT, can be implemented in hardware or software, and uses intraframe and interframe compression. H.261 supports CIF and QCIF resolutions.

H.263 is based on H.261 with enhancements that improve video quality over modems. It supports CIF, QCIF, SQCIF, 4CIF and 16CIF resolutions.

DivX Compression

DivX is a software application that uses the MPEG-4 standard to compress digital video, so it can be downloaded over a DSL/cable modem connection in a relatively short time with little visible loss of quality. The latest version of the codec, DivX 4.0, is being developed jointly by DivXNetworks and the open source community. DivX works on Windows 98, ME, 2000, CE, Mac and Linux.

Terms

Lossy compression - reduces a file by permanently eliminating certain redundant information, so that even when the file is uncompressed, only a part of the original information is still there.

ISO/IEC - ISO is the International Organization for Standardization, a non-governmental organization that works to promote the development of standardization to facilitate the international exchange of goods and services and spur worldwide intellectual, scientific, technological and economic activity. IEC is the International Electrotechnical Commission, an international standards and assessment body for the fields of electrotechnology.

Codec - A video codec is software that can compress a video source (encoding) as well as play compressed video (decompress).

CIF - Common Intermediate Format - a set of standard video formats used in videoconferencing, defined by their resolution. The original CIF is also known as Full CIF (FCIF).

QCIF - Quarter CIF (resolution 176x144)

SQCIF - Sub quarter CIF (resolution 128x96)

4CIF - 4 x CIF (resolution 704x576)

16CIF - 16 x CIF (resolution 1408x1152)

Additional sources of information*

TECH Online Review - Video Compression Overview
DataCompression.info
IGM - Desktop Video - Compression Standards

Page updated 5/25/02. Copyright 4th Wave Inc, 2003.
*The WAVE Report is not responsible for content on additional sites.

4.2. Video Compression

   JPEG
   H.261
   MPEG

Reference: Chapter 6 of Steinmetz and Nahrstedt

Motivations:

1. Uncompressed video and audio data are huge. In HDTV, the bit rate easily exceeds 1 Gbps. --> big problems for storage and network communications.
2. The compression ratio of lossless methods (e.g., Huffman, Arithmetic, LZW) is not high enough for image and video compression, especially when the distribution of pixel values is relatively flat.

The following will be discussed:

   Spatial Redundancy Removal -- Intraframe coding (JPEG)
   Spatial and Temporal Redundancy Removal -- Intraframe and Interframe coding (H.261, MPEG)

4.2.1. JPEG

1. What is JPEG?

   "Joint Photographic Experts Group". Voted as international standard in 1992.
   Works with color and grayscale images, e.g., satellite, medical, ...

2. JPEG overview
   Encoding: [Figure: JPEG encoder block diagram]
   Decoding -- Reverse the order

3. Major Steps
   DCT (Discrete Cosine Transformation)
   Quantization
   Zigzag Scan
   DPCM on DC component
   RLE on AC Components
   Entropy Coding

3a. Discrete Cosine Transform (DCT)
   Overview: [Figure: DCT overview]

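   Definition (8 point DCT):

   $$F[u,v] = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f[x,y] \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C(0) = \frac{1}{\sqrt{2}},\; C(k) = 1 \text{ for } k > 0$$

   Question: What is F[0,0]? -- define DC and AC components. (F[0,0] is the DC component, proportional to the average of the block; the other 63 coefficients are the AC components.)

   [Figure: The 64 (8 x 8) DCT basis functions]

   Why DCT not FFT? DCT is like FFT, but can approximate lines well with few coefficients.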
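As a rough illustration of the 8 point DCT and of the F[0,0] question, here is a direct, unoptimized evaluation of the transform. It assumes the usual JPEG normalization, F[u,v] = 1/4 C(u) C(v) sum over x, y of f[x,y] cos((2x+1)u pi/16) cos((2y+1)v pi/16) with C(0) = 1/sqrt(2); the function name `dct_2d` and the flat test block are made up for the example.

```python
import math

N = 8

def dct_2d(block):
    """Naive 8x8 forward DCT: evaluates the double sum directly (O(N^4))."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = 0.0
            for x in range(N):
                for y in range(N):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# A perfectly flat block: every pixel has the value 100.
flat = [[100] * N for _ in range(N)]
F = dct_2d(flat)
print(round(F[0][0], 6))   # DC term: 8 times the block average
```

For the flat block all the energy lands in F[0,0] (800, i.e. 8 times the average of 100) and every AC coefficient is zero up to rounding error, which is why F[0,0] is called the DC component.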

   Computing the DCT
   o Factoring reduces problem to a series of 1D DCTs.
   o Most software implementations use fixed point arithmetic. Some fast implementations approximate coefficients so all multiplies are shifts and adds.
   o World record is 11 multiplies and 29 adds. (C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications", Proc. Int'l. Conf. on Acoustics, Speech, and Signal Processing 1989 (ICASSP `89), pp. 988-991)

3b. Quantization
   Why? -- To throw out bits
   Example: 101101 = 45 (6 bits).
      Truncate to 4 bits: 1011 = 11.
      Truncate to 3 bits: 101 = 5.
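The truncation example can be checked with a couple of shifts. This is a minimal sketch; the helper name `truncate_bits` is made up for the illustration.

```python
def truncate_bits(value, total_bits, keep_bits):
    """Keep only the top `keep_bits` of a `total_bits`-wide unsigned value."""
    return value >> (total_bits - keep_bits)

x = 0b101101                      # 45, a 6-bit value
four = truncate_bits(x, 6, 4)     # 0b1011
three = truncate_bits(x, 6, 3)    # 0b101

# Scaling back up shows what was lost: the dropped low bits become zeros.
print(four, four << 2)            # 11 44
print(three, three << 3)          # 5 40
```

Dropping 2 bits is the same as an integer divide by 4, and dropping 3 bits a divide by 8; the reconstructed values 44 and 40 differ from the original 45 by the quantization error.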

   Quantization error is the main source of loss in lossy compression.

   Uniform quantization
      Divide by constant N and round result (N = 4 or 8 in examples above).
      Non powers-of-two gives fine control (e.g., N = 6 loses about 2.6 bits)

   Quantization Tables
      In JPEG, each F[u,v] is divided by a constant q(u,v).
      Table of q(u,v) is called quantization table.

      ----------------------------------
      16  11  10  16  24  40  51  61
      12  12  14  19  26  58  60  55
      14  13  16  24  40  57  69  56
      14  17  22  29  51  87  80  62
      18  22  37  56  68 109 103  77
      24  35  55  64  81 104 113  92
      49  64  78  87 103 121 120 101
      72  92  95  98 112 100 103  99
      ----------------------------------

      Eye is most sensitive to low frequencies (upper left corner), less sensitive to high frequencies (lower right corner)
      Standard defines 2 default quantization tables, one for luminance (above), one for chrominance.
      Q: How would changing the numbers affect the picture (e.g., if I doubled them all)?
      Quality factor in most implementations is the scaling factor for default quantization tables.
      Custom quantization tables can be put in image/scan header.

3c. Zig-zag Scan
   Why? -- to group low frequency coefficients in top of vector.
   Maps 8 x 8 to a 1 x 64 vector
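The table-based quantization and the zig-zag scan can be sketched together as follows. The luminance table is the one listed above; the function names, the `scale` parameter (standing in for a quality factor) and the sample coefficients are assumptions made for the illustration, and real encoders map a quality setting to a scale factor in their own ways.

```python
LUMA_Q = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def quantize(coeffs, table=LUMA_Q, scale=1):
    """Divide each F[u,v] by scale * q(u,v) and round to an integer."""
    return [[round(coeffs[u][v] / (table[u][v] * scale)) for v in range(8)]
            for u in range(8)]

def zigzag_order(n=8):
    """Return (row, col) pairs of an n x n block in zig-zag order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col, one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals are walked bottom-left to top-right, odd ones the reverse.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def zigzag(block):
    """Flatten an 8 x 8 block into a 64-element vector."""
    return [block[r][c] for r, c in zigzag_order(len(block))]

# Hypothetical DCT output: a large DC term and two small low-frequency AC terms.
F = [[0] * 8 for _ in range(8)]
F[0][0], F[0][1], F[1][0] = 800, -35, 25

print(zigzag(quantize(F))[:4])            # [50, -3, 2, 0]
print(zigzag(quantize(F, scale=2))[:4])   # [25, -2, 1, 0]
```

Doubling every table entry (`scale=2`) halves the quantized values, so fewer distinct levels survive and the reconstruction is coarser. After the scan the non-zero low-frequency terms sit at the front of the vector, followed by a long run of zeros.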

3d. Differential Pulse Code Modulation (DPCM) on DC component
   DC component is large and varied, but often close to previous value (like lossless JPEG).
   Encode the difference from previous 8x8 blocks -- DPCM

3e. Run Length Encode (RLE) on AC components
   1x64 vector has lots of zeros in it
   Encode as (skip, value) pairs, where skip is the number of zeros and value is the next non-zero component.
   Send (0,0) as end-of-block sentinel value.

3f. Entropy Coding
   Categorize DC values into SSS (number of bits needed to represent) and actual bits.

   --------------------
   Value            SSS
   0                 0
   -1, 1             1
   -3, -2, 2, 3      2
   -7..-4, 4..7      3
   --------------------

   Example: if DC value is 4, 3 bits are needed. Send off SSS as Huffman symbol, followed by actual 3 bits.
   For AC components (skip, value), encode the composite symbol (skip, SSS) using the Huffman coding.
   Huffman Tables can be custom (sent in header) or default.

4. Overview of the JPEG bitstream
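Before turning to the bitstream layout, steps 3d to 3f can be made concrete with a small sketch. The function names and sample values are made up for the illustration. The run-length coder is simplified: it does not cap runs at 15 zeros as baseline JPEG does, and the final Huffman stage is left out.

```python
def dpcm_dc(dc_values):
    """Code each DC term as the difference from the previous block's DC term."""
    prev, diffs = 0, []            # assumption: the first block is coded against 0
    for dc in dc_values:
        diffs.append(dc - prev)
        prev = dc
    return diffs

def rle_ac(ac):
    """Encode AC terms as (skip, value) pairs, terminated by (0, 0)."""
    # Index of the last non-zero term; trailing zeros are covered by (0, 0).
    last = max((i for i, v in enumerate(ac) if v != 0), default=-1)
    pairs, run = [], 0
    for v in ac[:last + 1]:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append((0, 0))           # end-of-block sentinel
    return pairs

def size_category(value):
    """SSS: the number of bits needed to represent |value|."""
    return abs(value).bit_length()

print(dpcm_dc([50, 52, 49]))                       # [50, 2, -3]
print(rle_ac([-3, 2, 0, 0, 1] + [0] * 58))         # [(0, -3), (0, 2), (2, 1), (0, 0)]
print([size_category(v) for v in (0, 1, -3, 4, -7)])
```

The size categories printed on the last line match the SSS table: 0 for the value 0, 1 for +/-1, 2 for +/-2 and +/-3, and 3 for magnitudes 4 through 7.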

   A "Frame" is a picture, a "scan" is a pass through the pixels (e.g., the red component), a "segment" is a group of blocks, a "block" is an 8x8 group of pixels.

   Frame header:
      sample precision
      (width, height) of image
      number of components
      unique ID (for each component)
      horizontal/vertical sampling factors (for each component)
      quantization table to use (for each component)

   Scan header:
      number of components in scan
      component ID (for each component)
      Huffman table (for each component)

   Misc. (can occur between headers):
      Quantization tables
      Huffman Tables
      Arithmetic Coding Tables
      Comments
      Application Data

5. Various JPEG Modes
   Baseline/Sequential -- the one that we described in detail
   Lossless
   Progressive
   Hierarchical
   "Motion JPEG" -- Baseline JPEG applied to each image in a video.

1. Lossless Mode
   o A special case of the JPEG where indeed there is no loss

   o Take difference from previous pixels (not blocks as in the Baseline mode) as a "predictor". The predictor uses a linear combination of previously encoded neighbors. It can be one of seven different predictors based on the pixel's neighbors.
   o Since it uses only previously encoded neighbors, first row always uses P2, first column always uses P1.
   o Effect of Predictor (test with 20 images)

   Note: "2D" predictors (4-7) always do better than "1D" predictors.

   Comparison with Other Lossless Compression Programs (compression ratio):

   -----------------------------------------------------------------
   Compression Program                  Compression Ratio
                                  Lena   football   F-18   flowers
   -----------------------------------------------------------------
   lossless JPEG                  1.45     1.54     2.29    1.26
   optimal lossless JPEG          1.49     1.67     2.71    1.33
   compress (LZW)                 0.86     1.24     2.21    0.87
   gzip (Lempel-Ziv)              1.08     1.36     3.10    1.05
   gzip -9 (optimal Lempel-Ziv)   1.08     1.36     3.13    1.05
   pack (Huffman coding)          1.02     1.12     1.19    1.00
   -----------------------------------------------------------------

2. Progressive Mode
   o Goal: display low quality image and successively improve.
   o Two ways to successively improve image:
      1. Spectral selection: Send DC component, then first few AC, some more AC, etc.
      2. Successive approximation: send DCT coefficients MSB (most significant bit) to LSB (least significant bit).

3. Hierarchical Mode
   [Figure: A Three-level Hierarchical JPEG Encoder (From V. Bhaskaran and K. Konstantinides, "Image and Video Compression Standards: Algorithms and Architectures", Kluwer Academic Publishers, 1995.)]
   o Down-sample by factors of 2 in each direction. Example: map 640x480 to 320x240
   o Code smaller image using another method (Progressive, Baseline, or Lossless).
   o Decode and up-sample encoded image
   o Encode difference between the up-sampled and the original using Progressive, Baseline, or Lossless.
   o Can be repeated multiple times.
   o Good for viewing high resolution image on low resolution display.

4. JPEG-2
   o Big change was to use adaptive quantization

4.2.2. H.261

   Developed by CCITT in 1988-1990
   Meant for videoconferencing, videotelephone applications over ISDN telephone lines.
   Baseline ISDN is 64 kbits/sec, and integral multiples (p x 64)

1. Overview of H.261
   [Figure: decoded sequence]
   Frame formats are CCIR 601 CIF (352x288) and QCIF (176x144) images with 4:2:0 subsampling.
   Two frame types: Intraframes (I-frames) and Interframes (P-frames)
   I-frames use basically JPEG
   P-frames use "pseudo-differences" from previous frame ("predicted"), so frames depend on each other.
   I-frames provide us with access points.

2. Intra Frame Coding

   Macroblocks are 16x16 pixel areas on Y plane of original image.
   A macroblock usually consists of 4 Y blocks, 1 Cr block, and 1 Cb block.
   Quantization is by constant value for all DCT coefficients (i.e., no quantization table as in JPEG).

3. Inter-frame (P-frame) Coding
   [Figure: a coding example (P-frame)]

   Previous image is called reference image.
   Image to code is called target image.
   Actually, the difference is encoded.

   Subtle points:
   1. Need to use decoded image as reference image, not original. Why?
   2. We're using "Mean Absolute Difference" (MAD) to decide best block. Can also use "Mean Squared Error" (MSE) = mean(e*e)

4. Details -- How the Macroblock is Coded
   Many macroblocks will be exact matches (or close enough). So send address of each block in image --> Addr
   Sometimes no good match can be found, so send INTRA block --> Type
   Will want to vary the quantization to fine tune compression, so send quantization value --> Quant
   Motion vector --> vector
   Some blocks in macroblock will match well, others match poorly. So send bitmask indicating which blocks are present (Coded Block Pattern, or CBP).
   Send the blocks (4 Y, 1 Cr, 1 Cb) as in JPEG.

5. H.261 Bitstream Structure

   Need to delineate boundaries between pictures, so send Picture Start Code --> PSC
   Need timestamp for picture (used later for audio synchronization), so send Temporal Reference --> TR
   Is this a P-frame or an I-frame? Send Picture Type --> PType
   Picture is divided into regions of 11x3 macroblocks called Groups of Blocks --> GOB
   Might want to skip whole groups, so send Group Number (Grp #)
   Might want to use one quantization value for whole group, so send Group Quantization Value --> GQuant
   Overall, bitstream is designed so we can skip data whenever possible while still unambiguous.

6. H.261 Codec
   [Figure: H.261 codec]
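As a quick check on the picture layout described in the bitstream structure above, the number of macroblocks and Groups of Blocks per picture follows directly from the frame size, the 16x16 macroblock and the 11x3 GOB. The helper name `picture_layout` is made up for this sketch.

```python
MB_SIZE = 16            # macroblocks are 16x16 pixels on the Y plane
GOB_W, GOB_H = 11, 3    # a Group of Blocks is 11x3 macroblocks

def picture_layout(width, height):
    """Count macroblocks and GOBs for a picture of the given luminance size."""
    mb_cols, mb_rows = width // MB_SIZE, height // MB_SIZE
    return {
        "macroblocks": mb_cols * mb_rows,
        "gobs": (mb_cols // GOB_W) * (mb_rows // GOB_H),
        "blocks": mb_cols * mb_rows * 6,   # 4 Y + 1 Cr + 1 Cb per macroblock
    }

print(picture_layout(352, 288))   # CIF
print(picture_layout(176, 144))   # QCIF
```

A CIF picture works out to 396 macroblocks in 12 GOBs, and a QCIF picture to 99 macroblocks in 3 GOBs.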

7. Hard Problems in H.261
   Motion vector search
   Propagation of Errors
   Bit-rate Control

7a. Motion Vector Search

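   C(x+k, y+l) -- pixels in the macroblock with upper left corner (x, y) in the Target.
   R(x+i+k, y+j+l) -- pixels in the macroblock with upper left corner (x+i, y+j) in the Reference.

   Cost function is:

   $$\mathrm{MAE}(i, j) = \frac{1}{N^2} \sum_{k=0}^{N-1} \sum_{l=0}^{N-1} \left| C(x+k,\, y+l) - R(x+i+k,\, y+j+l) \right|$$

   where MAE stands for Mean Absolute Error and the macroblock is N x N pixels.
   Goal is to find a vector (u, v) such that MAE(u, v) is minimum.

   Full Search Method:
   1. Search the whole [-p, p] searching region.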

   2. Cost is $(2p+1)^2 \cdot N^2 \cdot 3$ operations per N x N macroblock, assuming that each pixel comparison needs 3 operations (Subtraction, Absolute value, Addition).

   Two-Dimensional Logarithmic Search:
   Similar to binary search. MAE function is initially computed within a window of [-p/2, p/2] at nine locations as shown in the figure.
   Repeat until the size of the search region is one pixel wide:
   1. Find one of the nine locations that yields the minimum MAE
   2. Form a new searching region with half of the previous size and centered at the location found in step 1.

   Hierarchical Motion Estimation:

   1. Form several low resolution versions of the target and reference pictures
   2. Find the best match motion vector in the lowest resolution version.
   3. Modify the motion vector level by level when going up

   Performance comparison:

   -----------------------------------------------------------------
   Search Method      Operations for 720x480 at 30 fps
                      p = 15           p = 7
   -----------------------------------------------------------------
   Full Search        29.89 GOPS       6.99 GOPS
   Logarithmic        1.02 GOPS        777.60 MOPS
   Hierarchical       507.38 MOPS      398.52 MOPS
   -----------------------------------------------------------------

7b. Propagation of Errors
   Send an I-frame every once in a while
   Make sure you use decoded frame for comparison

7c. Bit-rate Control
   Simple feedback loop based on "buffer fullness"

   In this feedback loop, if the buffer is too full, increase the quantization scale factor to reduce the data.

4.2.3. MPEG

1. What is MPEG?
   "Moving Picture Experts Group", established circa 1990 to create standard for delivery of audio and video
   MPEG-1 Target: VHS quality on a CD-ROM (320 x 240 + CD audio @ 1.5 Mbits/sec)
   Standard had three parts:
   1. Video: based on H.261 and JPEG
   2. Audio: based on MUSICAM technology
   3. System: control interleaving of streams

2. MPEG Video
   Recall H.261 dependencies: [figure]
   Problem: many macroblocks need information not in the reference frame. Example: [figure]
   MPEG solution: add third frame type: bidirectional frame, or B-frame
   B-frames search for macroblock in past and future frames.
   Typical pattern is IBBPBBPBB IBBPBBPBB IBBPBBPBB

   Actual pattern is up to encoder, and need not be regular.

3. Differences from H.261
   Larger gaps between I and P frames, so expand motion vector search range.
   To get better encoding, allow motion vectors to be specified to fraction of a pixel (1/2 pixels).
   Bitstream syntax must allow random access, forward/backward play, etc.
   Added notion of slice for synchronization after loss/corrupt data. Example: picture with 7 slices: [figure]
   B frame macroblocks can specify two motion vectors (one to past and one to future), indicating result is to be averaged.
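The averaging of the two B-frame predictions can be sketched in a few lines. The function names and the tiny 2x2 "blocks" are made up for the illustration, and the round-half-up rule `(a + b + 1) // 2` is an assumption about how the average is rounded.

```python
def predict_bidirectional(past_block, future_block):
    """Average the past and future motion-compensated predictions per pixel."""
    return [[(a + b + 1) // 2 for a, b in zip(row_p, row_f)]
            for row_p, row_f in zip(past_block, future_block)]

def residual(target_block, prediction):
    """Difference that would then be transformed and coded."""
    return [[t - q for t, q in zip(row_t, row_q)]
            for row_t, row_q in zip(target_block, prediction)]

past = [[100, 102], [104, 106]]      # block fetched with the backward-looking vector
future = [[110, 112], [100, 101]]    # block fetched with the forward-looking vector
target = [[105, 108], [102, 103]]    # block actually being coded

prediction = predict_bidirectional(past, future)
print(prediction)                    # [[105, 107], [102, 104]]
print(residual(target, prediction))  # [[0, 1], [0, -1]]
```

When the true block lies between its past and future appearances, the averaged prediction leaves a very small residual, which is consistent with B-frames being the smallest frame type.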

   Compression performance of MPEG 1

   ------------------------------
   Type     Size      Compression
   ------------------------------
   I        18 KB       7:1
   P         6 KB      20:1
   B       2.5 KB      50:1
   Avg     4.8 KB      27:1
   ------------------------------

4. MPEG Video Bitstream
   Public domain tools mpeg_stat and mpeg_bits will analyze a bitstream.

   Sequence Information
   1. Video Params include width, height, aspect ratio of pixels, picture rate.
   2. Bitstream Params are bit rate, buffer size, and constrained parameters flag (means bitstream can be decoded by most hardware)
   3. Two types of quantization tables (QTs): one for intra-coded blocks (I-frames) and one for inter-coded blocks (P-frames).

   Group of Pictures (GOP) information
   1. Time code: bit field with SMPTE time code (hours, minutes, seconds, frame).
   2. GOP Params are bits describing structure of GOP. Is the GOP closed? Does it have a broken link (a dangling reference)?

   Picture Information
   1. Type: I, P, or B-frame?
   2. Buffer Params indicate how full decoder's buffer should be before starting decode.
   3. Encode Params indicate whether half pixel motion vectors are used.

   Slice information
   1. Vert Pos: what line does this slice start on?
   2. QScale: How is the quantization table scaled in this slice?

   Macroblock information
   1. Addr Incr: number of MBs to skip.
   2. Type: Does this MB use a motion vector? What type?
   3. QScale: How is the quantization table scaled in this MB?
   4. Coded Block Pattern (CBP): bitmap indicating which blocks are coded.

5. Decoding MPEG Video in Software
   Software Decoder goals: portable, multiple display types

   Breakdown of time

   -------------------------------
   Function              % Time
   -------------------------------
   Parsing Bitstream     17.4%
   IDCT                  14.2%
   Reconstruction        31.5%
   Dithering             24.5%
   Misc. Arith.           9.9%
   Other                  2.7%
   -------------------------------

6. MPEG-2, MPEG-3, and MPEG-4

   MPEG-2 target applications

   ---------------------------------------------------------------------------
   Level       Size           Pixels/sec   Bit-rate (Mbits/sec)   Application
   ---------------------------------------------------------------------------
   Low         352 x 240        3 M          4                    consumer tape equiv.
   Main        720 x 480       10 M         15                    studio TV
   High 1440   1440 x 1152     47 M         60                    consumer HDTV
   High        1920 x 1080     63 M         80                    film production
   ---------------------------------------------------------------------------

   Differences from MPEG-1
   1. Search on fields, not just frames.
   2. 4:2:2 and 4:4:4 macroblocks
   3. Frame sizes as large as 16383 x 16383
   4. Scalable modes: Temporal, Progressive,...
   5. Non-linear macroblock quantization factor
   6. A bunch of minor fixes (see MPEG FAQ for more details)

   MPEG-3: Originally for HDTV (1920 x 1080), got folded into MPEG-2
   MPEG-4: Very little published information. Originally targeted at very low bitrate communication (4.8 to 64 kb/sec). Now addressing video processing...

Last Updated: 6/26/96

Compression rates

To match the compression settings to different needs, a selection of compression sets is provided. A compression set always contains 10 presets that can be selected by the end user in the compression mo...

The six compression sets provided are each tuned to a specific operating situation. They can easily be adapted to new operating situations. Three additional user-definable compression sets are available if required.

GENERAL TV - 25 (or 30) frames / second

Quality level                        0     1     2     3     4     5     6     7     8     9
Transmission speed (Kbit/s)        300   600   900  1200  1500  1800  2100  2400  2700  3000
Screen size                        CIF   CIF   CIF   CIF   CIF   CIF    FS    FS    FS    FS
Compression format                 MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4
X times the duration (64 Kbit/s)     5    10    15    20    25    30    35    40    45    50

EVENT - 25 (or 30) frames / second

Quality level                        0     1     2     3     4     5     6     7     8     9
Transmission speed (Kbit/s)        300   600   300   600   900  1200  1500  1800  2100  2400
Screen size                       QCIF  QCIF   CIF   CIF   CIF   CIF   CIF   CIF    FS    FS

WEB - 15 frames / second

Quality level                        0     1     2     3     4     5     6     7     8     9
Transmission speed (Kbit/s)         60   180   300   600   300   600   900  1200  1500  1800

MPEG4 TV - 25 (or 30) frames / second

Quality level                        0     1     2     3     4     5     6     7     8     9
Screen size                        CIF   CIF   CIF    FS    FS    FS    FS    FS    FS    FS
Compression format                 MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4
X times the duration (64 Kbit/s)     5    10    30    40    50    60    70    80    90   100

REPORT TV - 25 (or 30) frames / second

Quality level                        0     1     2     3     4     5     6     7     8     9
Transmission speed (Kbit/s)        300   900  1200  1500  1800  2100  2100  2400  2700  3000
Screen size                        CIF   CIF   CIF   CIF   CIF   CIF    FS    FS    FS    FS
Compression format                 MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4
X times the duration (64 Kbit/s)     5    15    20    25    30    35    35    40    45    50

HD TV MP4 - 25 (or 30) frames / second

Quality level                        0     1     2     3     4     5     6     7     8     9
Transmission speed (Kbit/s)       3000  3600  4200  4800  5400  6000  7200  7800  8400  9000
Screen size                        CIF   CIF   CIF   CIF   CIF   CIF    FS    FS    FS    FS
Compression format                 MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4   MP4
X times the duration (64 Kbit/s)    50    60    70    80    90   100   120   130   140   150

For information

           PAL        NTSC
QCIF     180*144    160*120
CIF      360*288    320*240
FS       720*576    640*480

DVonSAT is a registered trademark of Nocturnes S.A. - 142 rue de Tocqueville, 75017 Paris - March 2003.