Emerging Architectures for HD Video Transcoding
Leon Adams, Worldwide Manager, Catalog DSP Marketing, Texas Instruments

Overview
- The Need for Transcoding
- System Challenges
- Transcoding Approaches and Issues
- Optimization Approaches
- Conclusions

Connected Home Vision
- BROADBAND: Entertainment, E-Business, Services
- MOBILE MULTIMEDIA: Entertainment, Personal Pictures and Video, Services
- MEDIA: Pre-Recorded Content, Personal Media
- BROADCAST: Services, Entertainment
Consumers want their devices to work together and share content.

Codec Trends By Application

| Application | Current Algorithms | Future Codec Considerations |
|---|---|---|
| Security/Surveillance | Motion JPEG, H.263, MPEG-4 simple profile | JPEG2000, H.264 baseline, WMV9 |
| Videophone/Videoconferencing | H.263 and H.261 | H.264 baseline |
| Internet Streaming | Windows Media, Real Video, DivX, MPEG-4 | Frequent updates; PC platform has allowed support for proprietary codecs |
| DVD | MPEG-2 MP@ML | H.264, VC-1 required for HD-DVD and Blu-Ray DVD |
| Digital Terrestrial TV | MPEG-2 MP@ML, MP@HL | Opportunity for advanced codecs in regions without installed base |
| Satellite | MPEG-2 | Moving to H.264 high profile to boost HD channel capacity |
| DSL-Based Video on Demand | MPEG-1, low-res MPEG-2 (bandwidth limitations) | WMV9, H.264 main profile, On2 VP6 |
| Digital Still Cameras | Motion JPEG and MPEG-4 simple profile | H.264 baseline |
| Digital Video Camcorders | DV-25, MPEG-2 | MPEG-4, H.264 |
| Cellular Media | MPEG-4 simple profile | Real Video, H.264 baseline, AVS-M |

Transcoding: conversions between codec formats, bit rates and resolutions.

Comparison of Codecs

| Features | H.261 | MPEG-1 | MPEG-2 | H.263 | MPEG-4 | H.264 | WMV/VC-1 | AVS |
|---|---|---|---|---|---|---|---|---|
| Picture coding type | I, P | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B | I, P, B |
| Entropy coding | VLC | VLC | VLC | VLC, SAC | VLC | UVLC, CAVLC, CABAC | Multiple table VLC | Adaptive VLC |
| MV resolution | Integer pel | ½ pel | ½ pel | ½ pel | ¼ pel | ¼ pel | ¼ pel | ¼ pel |
| Transform | 8x8 DCT | 8x8 DCT | 8x8 DCT | 8x8 DCT | 8x8 DCT | 4x4 & 8x8 integer | 8x8, 8x4, 4x8, 4x4 integer DCT | 8x8 integer |
| Vector block size | 16x16 | 16x16 | 16x16, 16x8 | 16x16, 8x8 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 | 16x16, 8x8 | 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4 |
| Spatial intra prediction | No | No | No | No | No | Yes | No | Yes |
| Formats supported | Prog | Prog | Prog/Intr | Prog | Prog/Intr | Prog/Intr | Prog/Intr | Prog/Intr |
| Prediction modes | Frame | Frame | Field & Frame | Frame | Field & Frame | Field & Frame | Field & Frame | Field & Frame |
| De-blocking filter | In-loop | None | Post | Annex J in-loop | Post | In-loop | In-loop | In-loop |

TMS320DM6446 Processor
Video Encode and Decode Application Processing

Features
- Core: ARM926EJ-S (MPU); TMS320C64x+ DSP core
- Memory: on-chip L1/SRAM 112 KB DSP, 40 KB ARM; on-chip L2/SRAM 64 KB DSP
- Video encode/decode: H.264 BP D1 encoding; simultaneous H.264 BP CIF coding; H.264 MP@L3 30-fps SD decoding; VC1/WMV9 full D1 SD decoding; MPEG-2 MP@ML SD decoding; MPEG-4 ASP full D1 SD decoding
- Video Processing Subsystem
  - Front end: resizer, image processing engine, 16-bit digital input
  - Back end: integrated OSD, four video DACs, 24-bit digital RGB output
- Peripherals: the right peripherals for your video, audio, storage and connectivity needs
- Package: 361-pin BGA

Benefits
- The highly integrated DM6446 digital video processor enables OEMs and ODMs to quickly bring new products to market at low consumer price points

Applications
- Video conferencing, video phones, video surveillance, digital media adaptors and IP set-top boxes

[Block diagram]
- ARM Subsystem: ARM926EJ-S CPU, 300 MHz
- DSP Subsystem: C64x+ DSP core, 600 MHz; Video-Imaging Coprocessor
- Video Processing Subsystem
  - Front end: CCD controller / video interface, preview, histogram/3A, resizer
  - Back end: on-screen display (OSD), video encoder (VENC), four 10-bit DACs
- Switched Central Resource (SCR)
- Peripherals
  - EDMA
  - Serial interfaces: audio serial port, I2C, SPI, UART x3
  - Connectivity: USB 2.0 PHY, VLYNQ, EMAC with MDIO
  - Program/data storage: DDR2 controller (16b/32b), async EMIF/NAND/SmartMedia, ATA/CompactFlash, MMC/SD
  - System: timer x2, WD timer, PWM x3

DM6446 DSP MHz Consumption

| Video Codec | Encoder | Decoder |
|---|---|---|
| H.263 / MPEG-4 SP | 250 MHz | 100 MHz |
| H.264 Baseline Profile | 410 MHz | 300 MHz |
| H.264 Main Profile | 590 MHz | 450 MHz |
| WMV9/VC-1 Main Profile | 350 MHz | 260 MHz |

- TI DM6446 platform, D1 (720x480) at 30 fps, YUV 4:2:0
- Decoder performance numbers are for typical bitstreams
- Encoder performance can vary as a result of the feature set used
- Video camcorder quality assumed in the examples above

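As a rough budgeting sketch, the MHz figures on this slide can be added to estimate whether a brute-force decode-plus-encode pair fits on a single DSP. The 600 MHz budget is the C64x+ clock quoted on the DM6446 slide; treating decode and encode loads as simply additive, and the function names `transcode_load_mhz` and `fits_on_dsp`, are assumptions made for illustration only.

```python
# MHz figures copied from the slide (D1, 30 fps, YUV 4:2:0).
ENCODE_MHZ = {"MPEG-4 SP": 250, "H.264 BP": 410, "H.264 MP": 590, "VC-1 MP": 350}
DECODE_MHZ = {"MPEG-4 SP": 100, "H.264 BP": 300, "H.264 MP": 450, "VC-1 MP": 260}

# Assumption: the whole 600 MHz C64x+ clock is available to the codecs.
DSP_CLOCK_MHZ = 600


def transcode_load_mhz(src_codec, dst_codec):
    """Estimate brute-force transcode load as decode MHz plus encode MHz."""
    return DECODE_MHZ[src_codec] + ENCODE_MHZ[dst_codec]


def fits_on_dsp(src_codec, dst_codec, budget_mhz=DSP_CLOCK_MHZ):
    """Return True when the additive estimate stays within the DSP budget."""
    return transcode_load_mhz(src_codec, dst_codec) <= budget_mhz


if __name__ == "__main__":
    for src in DECODE_MHZ:
        for dst in ENCODE_MHZ:
            load = transcode_load_mhz(src, dst)
            verdict = "fits" if fits_on_dsp(src, dst) else "exceeds budget"
            print(f"{src} -> {dst}: {load} MHz ({verdict})")
```

Under this simple model only a few D1 pairs stay under 600 MHz, which is one way to read the later argument for reusing decoder information rather than running a full encode.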
HD Versus SD Decode: Video Reference Memory Requirements Comparison
Minimum reference frame buffer requirements:
- SD MPEG-4: single reference frame. One 720x480 4:2:0 frame = ~0.5 MByte
- HD H.264: reference index selects from 3 reference frames. Three 1920x1080 4:2:0 frames = ~9 MBytes
- 18x increase in memory requirement
NOTE: Neither figure includes additional display buffering or other decoder buffers such as the stream buffer, tables, etc.

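The figures on this slide can be reproduced with a short calculation. This is a minimal sketch assuming 8-bit samples, so that a 4:2:0 frame takes 1.5 bytes per pixel; the helper name `frame_bytes_420` is hypothetical.

```python
def frame_bytes_420(width, height):
    """Bytes in one 8-bit 4:2:0 frame: a full-resolution luma plane plus
    two chroma planes at quarter resolution (1.5 bytes per pixel)."""
    return width * height * 3 // 2


# SD MPEG-4 case from the slide: one 720x480 reference frame.
sd_bytes = 1 * frame_bytes_420(720, 480)

# HD H.264 case from the slide: three 1920x1080 reference frames.
hd_bytes = 3 * frame_bytes_420(1920, 1080)

if __name__ == "__main__":
    print(f"SD: {sd_bytes} bytes ({sd_bytes / 1e6:.2f} MB)")
    print(f"HD: {hd_bytes} bytes ({hd_bytes / 1e6:.2f} MB)")
    print(f"Ratio: {hd_bytes / sd_bytes:.1f}x")
```

The results are 518,400 bytes for SD and 9,331,200 bytes for HD, a ratio of exactly 18, which matches the ~0.5 MByte, ~9 MBytes and 18x figures quoted above.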
Real-Time Transcoding: Typical STB Application
[Figure: real-time HD transcoding in an STB application]
Source:
- HD storage, MPEG-2 HD, 2 hours = 16 GB (V: MPEG2, A: MPEG2 AAC-LC 5.1)
Transcoded outputs:
- HD storage, 2 hours = 8 GB (V: H.264 HP, A: AC3 5.1)
- V: WMV9 MP D1, A: WMA
- V: H.264 BP VGA, A: MPEG4 AAC-HE
- V: H.264 MP QVGA, A: MPEG4 AAC-LC

System Challenges
- Requires multi-format HD decode and encode capabilities
- Achieving a high-quality re-encode on a low-cost device
- Huge I/O bandwidth requirements
  - e.g., an H.264 HD decoder by itself requires ~1.4 GBytes/s of I/O
  - A broadcast encoder uses 10s of GBytes/s of I/O for high-quality motion estimation
- Artifacts in the original bitstream can get compounded

HD Encode System Tradeoffs

| HD Encode Application | Key Priorities | 2006 Processor Solution | Memory Requirements | Video Bitrate | Typical Resolution | Typical Codec |
|---|---|---|---|---|---|---|
| Broadcast | High quality for high-action sports | 10s of 1 GHz DSPs & FPGAs | Multiple GBytes | 10-20 Mbps | 1080i | MPEG-2, H.264 High Profile |
| Videoconferencing | Low latency, best resolution for available bandwidth | Multiple 720 MHz DSPs | 100s of MBytes | >1 Mbps | 720p30 | H.264 Baseline Profile |
| Digital Video Camcorder | Low-complexity encoder | Single-chip 450 MHz low-power SoC | 32-64 MBytes | 4-8 Mbps | 720p30 | H.264 Baseline Profile |

Brute-Force Transcoding
Encoded Bitstream -> Video Decoder -> Video Encoder -> Transcoded Bitstream

Pros
- Straightforward to implement

Cons
- Loses key information needed to maintain best quality
  - Frame type and mode information
  - High-quality motion vectors created by the head-end professional encoder
- High computational demands
  - Doesn't leverage available complexity shortcuts
- I/O bandwidth requirements can be too high for embedded systems

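Optimized Transcoding
[Block diagram: Encoded Bitstream -> Video Decoder -> Resize -> Video Encoder -> Transcoded Bitstream]
- Video encoder blocks: Σ (residual), forward transform, forward quantization, entropy encoder, inverse quantization, inverse transform, Σ (reconstruction), frame prediction, frame buffer, motion estimation, rate control, coding control
- Information reused from the decoder: rate allocation and quant levels (rate control); motion vectors, resized (motion estimation); frame type and MB modes (coding control)
- Legend: full encode processing; optimized transcode function; memory I/O
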
Mid-Filter Functions
[Block diagram: Bitstream -> MPEG-2 Decoder -> reconstructed macroblock -> De-ringing Filter -> De-blocking Filter -> H.264 Encoder; decoder info (motion type, motion vectors, DCT type, Q scale) is passed alongside the reconstructed macroblock]
- Best transcoding quality and bitrate require filtering between decode and re-encode
- De-ringing reduces mosquito noise in the source
- De-blocking reduces block edge artifacts

Potential Transcode Solutions
- Combine decoder and encoder devices at system level
  - Consumer-class encoders typically don't support broadcast quality
  - Throws lots of key information away
- Fixed-combination transcoder ASIC (HD MPEG-2 -> HD H.264)
  - Doesn't support a universal multi-format decoder
  - Only supports 1 of the critical emerging transcode requirements
- Integrate multi-format decoder + encoder hardware blocks
  - Fixed rate control, mode decisions, vector scaling, etc.
  - Very difficult due to the number of transcode scenarios and the maturity of R&D on transcode algorithms
  - Multi-format encoders not common on the market
- High-performance media DSP + accelerator combination
  - Rate control, motion estimation control (including vector re-use algorithms) and mode decisions in a programmable DSP
  - High-performance accelerators for multi-format decode and encode

Transcode Task Partitioning
DSP
- Decode control: picture layer processing, error concealment
- Encode control: ME decisions, mode decisions, rate control
- Exchanges MB data and MB info (mode, MVs, etc.) with the accelerators
HD Decode Acceleration
- Entropy decoding, IDCT/inverse quant, motion compensation, loop de-blocking
HD Encode Acceleration
- Motion estimation, intra prediction, DCT/quant, IDCT/IQuant, loop de-blocking, entropy encoding

STB DVR/DVD Recorder Transcode System Diagram Concept
[System diagram blocks: digital tuner/demodulator/CAS/demux; main CPU (STB/DVD SoC) with video decoder; transcoder with DDR2-533 memory; storage and media (HDD, DVD, BD, MS/SD interface); Ethernet; HD/SD display via video out, composite and S-Video. Interconnect labels: PCI, 32-bit stream I/O, BT656 in, BT656 out.]
- MPEG-2 at 18 Mbps requires ~8 GBytes/hour to store; a 200 GByte HDD allows 25 hours of recording
- H.264 at 9 Mbps increases recording time to 50 hours for the same size HDD

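The storage figures on this slide follow from the bit rate. A minimal sketch, assuming decimal units (1 Mbps = 10^6 bits/s, 1 GByte = 10^9 bytes) and ignoring audio and container overhead; the function names are hypothetical.

```python
def gbytes_per_hour(video_mbps):
    """Storage consumed by one hour of video at the given bit rate."""
    bytes_per_second = video_mbps * 1e6 / 8
    return bytes_per_second * 3600 / 1e9


def recording_hours(hdd_gbytes, video_mbps):
    """Hours of video that fit on a disk of the given capacity."""
    return hdd_gbytes / gbytes_per_hour(video_mbps)


if __name__ == "__main__":
    for codec, mbps in [("MPEG-2", 18), ("H.264", 9)]:
        print(f"{codec} at {mbps} Mbps: "
              f"{gbytes_per_hour(mbps):.2f} GB/hour, "
              f"{recording_hours(200, mbps):.1f} hours on 200 GB")
```

Without rounding, 18 Mbps works out to 8.1 GB/hour and about 24.7 hours on a 200 GByte disk, and 9 Mbps to about 49.4 hours. The slide's 25 and 50 hours come from rounding to 8 and 4 GBytes/hour.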
Conclusions
- HD transcoding presents major challenges for emerging video processing architectures
- Intelligent transcoding enables the best quality within embedded I/O budgets
- Combining a programmable DSP with HD video acceleration provides an optimized architecture for transcoding

Thanks!
Backup
Transcoding MPEG-2 to MPEG-4 for Wireless Video
- Wireless device has limited resources
  - Processing power
  - Memory
  - Display capability
- Change the GOP structure in MPEG-2 to an IPPP structure
  - Save memory
  - Reduce decoding complexity
  - Smooth the bit rate
- Frame size down-sampling (SD 720x480 -> QVGA 320x240)
  - Large bit-rate reduction
  - Fits the display size of most mobile devices
[Figure: an MPEG-2 GOP containing I, B and P frames converted to an I P P P ... sequence]

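The two conversions on this slide can be sketched in a few lines. This is an illustrative sketch only: `to_ippp` and `scale_mv` are hypothetical helpers, frame types are handled in display order, and scaling a single vector ignores the fact that several source macroblocks collapse into one after down-sampling, so a real transcoder would also have to merge the vectors of neighbouring blocks and re-derive prediction for former B frames.

```python
from fractions import Fraction


def to_ippp(frame_types):
    """Map a frame-type pattern onto I P P P ...: I frames stay I,
    and every B or P frame is re-encoded as a P frame."""
    return ["I" if t == "I" else "P" for t in frame_types]


def scale_mv(mv, src_size=(720, 480), dst_size=(320, 240)):
    """Scale a motion vector (dx, dy) by the resolution ratio.

    The horizontal and vertical ratios differ for SD -> QVGA
    (4/9 and 1/2), so each component is scaled separately and
    rounded to the nearest integer unit."""
    sx = Fraction(dst_size[0], src_size[0])
    sy = Fraction(dst_size[1], src_size[1])
    return (round(mv[0] * sx), round(mv[1] * sy))


if __name__ == "__main__":
    print("".join(to_ippp("IBBPBBPBBP")))
    print(scale_mv((18, -8)))
```

Exact fractions are used so that the 4/9 horizontal ratio does not pick up floating-point error before rounding.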
Baseball 1: Broadcast Encoder Source
- Source encoded with 10s of GHz of DSPs and FPGAs
- Search ranges: +/-500 horizontal, +/-250 vertical
- Motion vectors are stable, so motion vector refinement is adequate
- High transcode quality obtained with simple motion vector refinement

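Simple motion vector refinement can be sketched as a small search around the vector inherited from the source bitstream instead of a full-range search. The sketch below is an assumption about what such a refinement might look like, not the method used in the experiment: it works on integer-pel vectors, uses sum of absolute differences (SAD) as the cost, and the names `sad` and `refine_mv` are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by)
    in the current frame and the block displaced by (dx, dy) in the
    reference frame. Frames are lists of rows of integer samples."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def refine_mv(cur, ref, bx, by, inherited_mv, n=4, radius=1):
    """Search +/-radius pels around the inherited vector and return
    (best_mv, best_cost). Candidates that point outside the reference
    frame are skipped; returns (None, None) if none are valid."""
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(inherited_mv[1] - radius, inherited_mv[1] + radius + 1):
        for dx in range(inherited_mv[0] - radius, inherited_mv[0] + radius + 1):
            inside = (0 <= bx + dx and bx + dx + n <= width and
                      0 <= by + dy and by + dy + n <= height)
            if not inside:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost


if __name__ == "__main__":
    # Synthetic 12x12 reference and a current frame shifted by (2, 1).
    ref = [[x * 7 + y * 13 for x in range(12)] for y in range(12)]
    cur = [[ref[min(y + 1, 11)][min(x + 2, 11)] for x in range(12)]
           for y in range(12)]
    # The inherited vector (1, 1) is off by one pel horizontally.
    print(refine_mv(cur, ref, 4, 4, (1, 1)))
```

With radius 1 the search evaluates at most 9 candidates per block, compared with the very large window implied by the +/-500 by +/-250 search ranges of the broadcast encoder.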
Baseball 2: Software Encoder Source
- Simulation: MV recovery reduces the bit rate by about 0.5 Mbps
  - 9.19 Mbps @ 36.4 dB -> 8.66 Mbps @ 36.88 dB
  - 5.80 Mbps @ 34.8 dB -> 5.34 Mbps @ 35.17 dB
- Motion vectors are random in still areas
  - The source encoder does not consider a motion vector penalty
- Even a simple MV recovery algorithm yields some benefit
- Bit rate improvements are possible with additional motion vector recovery beyond simple refinement

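The size of the saving in the two simulation runs can be checked directly from the quoted bit rates. A minimal sketch; the helper name `savings` is hypothetical.

```python
def savings(before_mbps, after_mbps):
    """Return (absolute saving in Mbps, saving as a percentage)."""
    saved = before_mbps - after_mbps
    return saved, 100.0 * saved / before_mbps


if __name__ == "__main__":
    # Bit rates before and after MV recovery, as quoted on the slide.
    for before, after in [(9.19, 8.66), (5.80, 5.34)]:
        saved, pct = savings(before, after)
        print(f"{before:.2f} -> {after:.2f} Mbps: "
              f"saves {saved:.2f} Mbps ({pct:.1f}%)")
```

This gives roughly 0.53 Mbps (about 5.8%) and 0.46 Mbps (about 7.9%). In both runs the quoted quality figure also rises slightly, so the savings are not measured at exactly equal quality.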
De-ringing Example
- MPEG-2 decoded image (magnified): 30.86 dB @ 10.34 Mbps
- Filtered image after de-ringing (magnified): 31.85 dB @ 10.34 Mbps
- 1 dB gain from using the mid-filter in transcoding
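The dB figures on this slide read as PSNR values, although the slide does not say so explicitly. Under that assumption, and assuming 8-bit samples with a peak value of 255, a minimal sketch of the measurement and of what a 1 dB gain means in terms of mean squared error is:

```python
import math


def psnr(original, processed, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized
    images, each given as a list of rows of integer samples."""
    squared_error = 0
    count = 0
    for row_a, row_b in zip(original, processed):
        for a, b in zip(row_a, row_b):
            squared_error += (a - b) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def mse_ratio(psnr_before_db, psnr_after_db):
    """Factor by which the MSE shrinks for a given PSNR improvement."""
    return 10 ** ((psnr_after_db - psnr_before_db) / 10)


if __name__ == "__main__":
    print(f"{psnr([[100, 100]], [[110, 90]]):.2f} dB")
    print(f"MSE reduced by a factor of {mse_ratio(30.86, 31.85):.3f}")
```

On this reading, moving from 30.86 dB to 31.85 dB at the same 10.34 Mbps corresponds to the mean squared error falling by a factor of about 1.26, or roughly 20%.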