Compressed-Domain Video Processing and Transcoding Susie Wee, John Apostolopoulos Mobile & Media Systems Lab HP Labs Stanford EE392J Lecture 2006 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice
Compressed Domain Processing and Transcoding Transcoder Processor High-bandwidth LAN Full resolution Decode and Display Capture and Encode Smart Media Router Low-bandwidth wireless link Low resolution Decode and Display Transcoding: Media streaming over packet networks Processing: Media processing in the compressed domain 2 28 February 2007
Problem Statement Goal: Efficiently process MPEG video streams MPEG stream 100110101011100010 Process MPEG stream 010001101001101101 High-quality video processing with low computational complexity and low memory requirements. 3 28 February 2007
Outline Background Manipulating Temporal Dependencies Compressed-Domain Splicing Compressed-Domain Reverse Play MPEG2-to-H.263 Transcoding Overview and Demo 4 28 February 2007
MPEG Structures P B B I B B P B B P B B I GOP Layer Picture Layer I B B P B B P 4 8x8 DCT 1 MV 8x8 DCT Slice Layer Macroblock Layer Block Layer 5 28 February 2007
MPEG Structures P B B I B B P B B P B B I GOPs* Pictures*: Intraframe (I frame), Forward predicted (P frame), Bidirectionally predicted (B frame) Slices*: (16x16n pixel blocks) Macroblocks (16x16 pixel blocks): 1 MV and 4 DCT blocks per MB (plus chrominance) 6 28 February 2007 I frame: Intra MBs P frame: Intra or Forward MBs B frame: Intra, Forward, Backward, or Bidirectional MBs Blocks (8x8 pixel blocks): 8x8 DCT (* start codes)
MPEG Coding Order Display order I 0 B 1 B 2 P 3 B 4 B 5 P 6 B 7 B 8 I 9 Coding order I 0 P 3 B 1 B 2 P 6 B 4 B 5 I 9 B 7 B 8 Preceding and following I or P frames (anchors) are used to predict each B frame. Anchor frames must be decoded before B data. (Note: In coding order, all arrows point forward.) 7 28 February 2007
MPEG Bitstream I B B P B B P B B I GOP header Picture header Picture data 10110010 Pictures in coding order Start codes (seq, GOP, pic, and slice start codes; seq end code) Unique byte-aligned 32-bit patterns 0x000001nm 23 zeros, 1 one, 1-byte identifier Enables random access 8 28 February 2007
Video Compression Encode 100110101011100010 Raw video 720 x 1280 pixels, 60 fps ~ 1 Gbps 480 x 720 pixels, 30 fps ~ 250 Mbps HDTV quality Broadcast quality MPEG stream 20 Gbytes / 2 hour movie ~ 20 Mbps 5 Gbytes / 2 hour movie ~ 5 Mbps 9 28 February 2007
MPEG Video Compression Encode 100110101011100010 Raw video 1 Gbps 20,000 MOPS MPEG stream 20 Mbps 100110101011100010 MPEG stream 20 Mbps Decode 2,000 MOPS Raw video 1 Gbps 10 28 February 2007
Problem statement How do we process compressed video streams? 2,000 MOPS 20,000 MOPS Decode Process Encode 20 Mbps 1 Gbps 1 Gbps 20 Mbps Difficulties: Computational requirements: 22,000 MOP overhead from decode and encode operations (plus additional processing) Bandwidth requirements: Need to process uncompressed data at 1 Gbps Quality issues: Even without processing, the decode/encode cycle causes quality degradation. 11 28 February 2007
MPEG CDP How do we process compressed video streams? Use efficient compressed-domain processing (CDP) algorithms. Naive Solution: 2,000 MOPS 20,000 MOPS Decode Process Encode 20 Mbps 1 Gbps 1 Gbps 20 Mbps Ideal Solution: MPEG stream 100110101011100010 Transcode MPEG stream 010001101001101101 12 28 February 2007
SPATIAL-DOMAIN Decode Encode PROCESSING 2,000 MOPS > 20,000 MOPS Conventional Approach Transcoder Processor Smart Media Server Transcoder Processor Compressed-Domain Processing Huffman Decode MV+DCT-DOMAIN PROCESSING Huffman Encode 200 MOPS 200 MOPS BITSTREAM PROCESSING OUR APPROACH Develop algorithms to perform equivalent spatial-domain video processing tasks using fast algorithms that operate directly on the compressed-domain data. 13 28 February 2007
Frame-Level Video Processing Splicing Reverse Play 14 28 February 2007
Difficulties and Solutions High compression causes temporal dependencies Frame conversion Coding order vs. display order Data reordering Rate control / Buffer limitations Requantization, Frame conversion MPEG header information (e.g. time stamps, vbv_delay) Rewrite header bits 15 28 February 2007
Outline Background Manipulating Temporal Dependencies Compressed-Domain Splicing Compressed-Domain Reverse Play MPEG2-to-H.263 Transcoding Overview and Demo 16 28 February 2007
Manipulating Temporal Dependencies in Compressed Video Video compression uses temporal prediction Prediction dependencies in coded data Many applications require changes in the dependencies Manipulating (modifying) temporal dependencies in the compressed video [Wee] Adding Removing Changing 17 28 February 2007
Temporal Prediction Original Video Frames: Coded Video Frames: F = ˆ = F { F,F,..., } 1 2 F n 1 2 Fˆ n { Fˆ,Fˆ,..., } Each frame F i is coded with two components 1. Prediction Anchor frames Side information 2. Residual Coded Residual Reconstructed Frame P R Rˆ ( Aˆ, ) ˆ { Fˆ,Fˆ,..., } i i S i A i S i i i = F i i - P i 1 2 Fˆ i-1 ( Aˆ, S ) i = Pi i i ( Aˆ,S ) i Rˆ i Fˆ + 18 28 February 2007
Prediction Dependencies Each coded frame is dependent upon its anchor frames. Fˆ ( ) i = Pi Aˆ i,si + Rˆ i Prediction Depth Level 0 F 1 Level 1 Level 2 F 2 F 3 F 4 F5 F6 F 7 Level 3 F10 Level 4 F 8 F 9 19 28 February 2007
Manipulating Dependencies Fewer levels of dependence Level 0 F 1 F2 F3 F 4 F5 F6 F 7 F 10 Level 1 F 8 F 9 Remove dependence between frames Level 0 F 1 F 7 Level 1 F 4 F 6 Level 2 F 2 F 3 F 5 F 8 F 9 F10 20 28 February 2007
Problem Formulation Compression algorithm has a set of prediction rules. Choose prediction modes during encoding. ˆ i,s ' A i Aˆ ' i,s i Fˆ P Aˆ ' ' ',S + Rˆ ˆ ' Fˆ = P,S + Rˆ i = i i ( ) i i i i ( ) ' A i i i Each choice yields: different compressed representation of frame F i, different distribution between P & R components, different set of prediction dependencies. 21 28 February 2007
Manipulating Dependencies Given a compressed representation with dependencies Aˆ i,s i how do we create a new compressed representation with dependencies ˆ ' '? A i Reconstruct frames,s i = Pi i ( Aˆ,S ) i Rˆ i Fˆ + Compressed-domain approximation ' ' ˆ ' Fˆ ' i F i Ri = Fi Pi A i, Si Calculate new prediction and residual ' ˆ ' ˆ ' R Rˆ + P A,S P, S i i i i ( ) ( ) ( ) i i i A i ' i 22 28 February 2007
Frame Conversions: Remove Dependencies Original P B B I B B P P to I P B B I B B I B to B for B to B back P B f B f I B B P P B b B b I B B P 23 28 February 2007
Frame Conversion: Add Dependencies Original P B B I B B P I to P P B B P B B P 1. Find and code new MVs. MV Resampling 2. Calculate new prediction. 3. Calculate and code new residual. 24 28 February 2007
Compressed-Domain Processing MPEG standard addresses prediction rules, buffer requirements, coding order, and bitstream syntax. 001011011010011001 CDP 110001011010110100 MPEG CDP steps Determine and perform appropriate frame conversions (i.e. manipulate dependencies). Reorder data. Perform rate matching. Update header information. 25 28 February 2007
Outline Background Manipulating Temporal Dependencies Compressed-Domain Splicing Compressed-Domain Reverse Play MPEG2-to-H.263 Transcoding Overview and Demo 26 28 February 2007
Compressed-Domain Splicing Cut & Paste Decode Decode Encode Transcode 100110101011100010 001011011010011001 110001011010110100 Application: Ad Insertion for DTV 27 28 February 2007
Splicing: SMPTE standard SMPTE formed a committee on splicing. Disadvantages: User must predefined splice points during encoding Complicated encoders. Splice points can only occur on I frames Not frame accurate. Advantages: Simple cut-and-paste operation. 28 28 February 2007
The TV Newsroom IEEE Spectrum 29 28 February 2007
Compressed-Domain Splicing Cut & Paste 2,000 MOPS 2,000 MOPS 20,000 MOPS Decode Decode Encode Transcode 100110101011100010 001011011010011001 110001011010110100 < 2,000 MOPS for one splice point every sec 30 28 February 2007
Splicing: Conventional approach 00101101101001 00101101101001 20 Mbps 20 Mbps Decode 2,000 MOPS Decode 1 Gbps 1 Gbps Cut & Paste 1 Gbps Encode 20,000 MOPS 20 Mbps 11000101101011 2,000 MOPS > 24,000 MOPS for one splice point every sec 00101101101001 10010110011001 Splice 11000101101011 31 28 February 2007
Compressed-Domain Processing Can we do better by exploiting properties of 1) the MPEG compression algorithm and 2) the splicing operation? 32 28 February 2007
Splicing: Our approach 00101101101001 10010110011001 Process Head Process Tail Match & Merge 11000101101011 < 2,000 MOPS for one splice point every sec 00101101101001 10010110011001 CD Splice 11000101101011 33 28 February 2007
Splicing Algorithm (simplified overview) Only process the GOPS affected by the splice. Step 1: Process the head stream. Remove (backward) dependencies on dropped frames. Step 2: Process the tail stream. Remove (forward) dependencies on dropped frames. Step 3: Match & Merge the head and tail data. Perform rate matching. Reorder data appropriately. Update header information. 34 28 February 2007
Frame Conversion: Remove Dependencies Original P B B I B B P P to I P B B I B B I B to B for P B f B f I B f B f P B to B back P B b B b I B b B b P 1. Eliminate temporal dependencies. 2. Calculate new prediction (DCT-domain). 3. Calculate and code new residual. 35 28 February 2007
Splicing Head Input Stream Hn-2 Hn-1 Hn GOP with splice-out frame X X Tail Input Stream X GOP with splice-in frame Spliced Output Stream T0 T1 T2 T3 Hn-2 Hn-1 F(Hn,T0) T1 T2 36 28 February 2007
Results Splicing between two sequences with one splice every 20 frames 37 28 February 2007
Splicing: Remarks Proposed Algorithm Frame-accurate splice points Only uses MPEG stream (Encoder is not affected.) Frame conversions in MV+sparse DCT domain. Rate control by requantization and frame conversion. (Do not insert synthetic frames.) Quality only affected near splice points. Computational scalability: Video quality can be improved if extra computing power is available. 38 28 February 2007
Outline Background Manipulating Temporal Dependencies Compressed-Domain Splicing Compressed-Domain Reverse Play MPEG2-to-H.263 Transcoding Overview and Demo 39 28 February 2007
Reverse-Play Transcoding Store & Reorder Decode Encode Transcode 001011011010011001 110001011010110100 Develop computation- and memory-efficient transcoding algorithms for reverse play of a given forward-play MPEG video stream. 40 28 February 2007
Reverse-Play Architecture #1 #1 I, P, B GOP 15 Decode (15) Reorder (15) Frame Buffer Encode (15) Motion Estimation I, P, B High memory requirements Frame buffer must store 15 frames High computational requirements Motion estimation dominates computational needs 41 28 February 2007
Compressed-Domain Processing Can we do better by exploiting properties of 1) the MPEG compression algorithm and 2) the reverse-play operation? 42 28 February 2007
B-Frame Symmetry: Swap MVs Forward Play 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 MV1 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 MV2 3 1 2 6 5 10 9 14 13 7 11 15 4 8 12 16 Anchor Frame 1 B Frame Anchor Frame 2 Reverse Play 4 3 2 1 2 1 7 8 MV2 6 6 5 5 11 12 10 10 9 16 9 15 14 14 13 13 3 7 11 15 4 8 12 16 MV1 1 5 9 13 2 6 10 14 3 7 11 15 4 8 12 16 43 28 February 2007
Reverse-Play Architecture #2 #2 I, P Decode (5) Reorder (5) Frame Buffer Encode (5) Motion Estimation I, P B Exploit symmetry of B frames. Huffman Decode Swap MVs Reduced memory requirements Frame buffer must store 5 frames Reduced computational requirements ME still dominates computational needs Huffman Encode B 44 28 February 2007
Reverse-Play Architecture #3 #3 Reorder (5) I, P Decode (5) Frame Buffer Encode (5) I, P Reverse MVs B Exploit forward MVF in coded data. Huffman Decode Swap MVs Huffman Encode B Reduced memory requirements Reduced computational requirements 45 28 February 2007
Search Area: No correspondence Forward search area Time T 0 Time T 1 Reverse search area Time T 0 Time T 1 Interpret MVs as specifying a match between blocks. Develop motion vector resampling methods. 46 28 February 2007
In-place Reversal Method Forward MV Reverse MV In-place reversal 47 28 February 2007
Maximum Overlap Method MV(-1,-1) W(-1,-1) MV(0,-1) W(0,-1) MV(1,-1) W(1,-1) W(-1,-1) MV(-1,0) W(-1,0) MV(0,0) W(0,0) MV(1,0) W(1,0) W(0,0) W(1,0) MV(-1,1) W(-1,1) MV(0,1) W(0,1) MV(1,1) W(1,1) Local neighborhood Overlap weights Other MV resampling methods exist. 48 28 February 2007
Computational Requirements #1 I, P, B GOP 15 Decode (15) Reorder (15) Frame Buffer Encode (15) Motion Estimation I, P, B #1: 4200 Mcycles Log Search ME #2 I, P Decode (5) Reorder (5) Frame Buffer Encode (5) Motion Estimation I, P #2: 1030 Mcycles Log Search ME B Huffman Decode Swap MVs Huffman Encode B #3 I, P B Decode (5) Huffman Decode Reorder (5) Frame Buffer Reverse MVs Swap MVs Encode (5) Huffman Encode I, P B #3a: 420 Mcycles Maximum Overlap #3b: 330 Mcycles In-place Reverse 49 28 February 2007
Experimental Results 0.2 db 0.65 db Bus Carousel Football Girl 50 28 February 2007
Observations Girl sequence showed largest PSNR loss (.65 db) due to high detail and texture in source. MV accuracy is important! Bus sequence benefits from maximum overlap because of large motions in source. Carousel has little performance loss because block MC does not match motion in source. Football has little performance loss due to blurred source. When MV accuracy is not important, fast approximate methods do not siginificantly sacrifice quality. 51 28 February 2007
Reverse Play Summary Developed several compressed-domain reverse-play transcoding algorithms. CDP: Exploit properties of compression algorithm and reverse-play operation Exploit symmetry of B frames. Exploit MVF information given in forward stream. Order of magnitude reduction in computational requirements. Worst case 0.65 db loss in PSNR quality over baseline transcoding. 52 28 February 2007
Outline Background Manipulating Temporal Dependencies Compressed-Domain Splicing Compressed-Domain Reverse Play MPEG2-to-H.263 Transcoding Overview and Demo 53 28 February 2007
Motivation Video communication requires the seamless delivery of video content to users with a broad range of bandwidth and resource constraints. However, the source signal, communication channel, and client device may be incompatible. Therefore, efficient transcoding algorithms must be designed 001011011010011001 Transcoder 110001011010110100 Standard-compliant input bitstream Standard-compliant output bitstream 54 28 February 2007
MPEG2-to-H.263 Transcoding Media Server DTV or DVD content Media Server Transcoder Low-bandwidth wireless link MPEG-to-H.263 Transcoder Transcode MPEG video streams to lower-rate H.263 video streams. Interlace-to-progressive conversion. Order of magnitude reduction in computational requirements. Stream DVD movies to portable multimedia devices. Stream DTV program material over the internet. 55 28 February 2007
Problem Statement Develop a fast transcoding algorithm that adapts the bitrate, frame rate, spatial resolution, scanning format, and/or coding standard while preserving video quality MPEG-2 input High bitrates DVD, Digital TV >1.5 Mbps, 30 fps Interlaced and progressive Important features: Interlaced-to-progressive transcoder. H.263 output Low bitrates Wireless, internet <500 kbps, 10fps Progressive Inter-standard transcoder, e.g. MPEG-2 to H.263 (or MPEG-4) Contributors: Wee, Apostolopoulos, Feamster (MIT) 56 28 February 2007
Difficulties High cost of conventional transcoding MPEG-2 bitstream H.263 bitstream 001011011010011001 110001011010110100 MPEG-2 Decode Process Frames H.263 Encode Differences in video compression standards Interlaced vs. progressive formats 57 28 February 2007
Issue: Standard Differences Video Formats MPEG-2 H.263 Progressive and Progressive Only Interlaced I frames More (enable random access) Fewer (compression) Frame Coding Types Prediction Modes I, P, B frames I, P, Optional PB frames Field, Frame, 16x8 Frame Only Motion Vectors Inside Picture Only Can point outside picture 58 28 February 2007
Development of Proposed Approach Conventional MPEG Decode Temporal Processing Temporal Processing MPEG Decode Spatial Processing Improved Approach: Exploit bitstream syntax! Spatial Processing Proposed Approach: Also exploit input coded data! H.263 Encode Motion Estimation H.263 Encode Motion Estimation Temporal Processing MPEG Decode Spatial Processing H.263 Encode 59 28 February 2007 Calculate MVs
Block Diagram Drop B frames MPEG IP Decoder Spatial Downsample H.263 IP Encoder MPEG MVs & Coding Modes Estimate MV Refine Search H.263 MVs & Coding Modes 60 28 February 2007
Issue: Match Prediction Modes Match MPEG-2 Fields to H.263 Frames MPEG-2 Fields I(0,t) B(1,t) B(2,t) P(3,t) B(4,t) B(5,t) I(6,t) B(7,t) P(0,b) B(1,b) B(2,b) P(3,b) B(4,b) B(5,b) P(6,b) B(7,b) i(0) p(1) p(2) H.263 Frames Vertical downsampling => discard bottom field Horizontal downsampling => downsample top field Temporal downsampling => drop B frames Match MPEG-2 IPPPPIPPPP to H.263 IPPPPPPPPP Problems Convert MPEG-2 P fields to H.263 P frames Convert MPEG-2 I fields to H.263 P frames 61 28 February 2007
Outline Background Manipulating Temporal Dependencies Compressed-Domain Splicing Compressed-Domain Reverse Play MPEG2-to-H.263 Transcoding Overview and Demo 62 28 February 2007
Compressed-Domain Splicing Cut&Paste 2,000 2,000 20,000 Decode Decode Encode MOPS MOPS MOPS Transcode 1001101010111 0010110110100 1100010110101 < 2,000 MOPS for one splice point every sec Enables Ad Insertion for DTV SMPTE splicing solution User must define splice points during encoding Specialized complex encoders. Restricted splice points HP splicing solution Works with any MPEG stream. Frame-accurate splice points 63 28 February 2007
Reverse-Play Transcoding Decode 2,000 MOPS Store & Reorder Transcode Encode 00101101101001 11000101101011 < 1,000 MOPS 20,000 MOPS VCR functionality for DTV and Set Tops HP reverse-play solution Frame-by-frame reverse play. Works with any MPEG stream. Order of magnitude reduction in computational requirements. 64 28 February 2007
MPEG2-to-H.263 Transcoding Media Server DTV or DVD content Media Server Transcoder Low-bandwidth wireless link MPEG-to-H.263 Transcoder Transcode MPEG video streams to lower-rate H.263 video streams. Interlace-to-progressive conversion. Order of magnitude reduction in computational requirements. Stream DVD movies to portable multimedia devices. Stream DTV program material over the internet. 65 28 February 2007
References References: Manipulating Temporal Dependencies in Compressed Video Data with Applications to Compressed-Domain Processing of MPEG Video, S. Wee, ICASSP 1999. Splicing MPEG Video Streams in the Compressed Domain, S. Wee, B. Vasudev, IEEE Workshop on Multimedia Signal Processing, 1997. Compressed-Domain Reverse Play of MPEG Video Streams, S. Wee, B. Vasudev, SPIE Inter. Sym. On Voice, Video, and Data Communications, 1998. Reversing Motion Vector Fields, S.J. Wee, IEEE International Conference on Image Processing, October 1998. Field-to-frame Transcoding with Spatial and Temporal Downsampling, S. Wee, J. Apostolopoulos, N. Feamster, ICIP 1999. Compressed-Domain Video Processing, S. Wee, B. Shen, J. Apostolopoulos, HP Labs Technical Report, 2002, CRC Handbook of Video Databases, 2004. http://www.hpl.hp.com/techreports/2002/hpl-2002-282.html 66 28 February 2007