Video Encoding with Multicore Processors: Real-Time HD
March 29, 2007

Video Is Ubiquitous
Demand for any content, any time, anywhere.
- Resolution ranges from 128x96 pixels for mobile to 1920x1080 pixels for full HD
- Frame rates range from 10 to 60 fps
- The only constant is that raw digital data always outstrips available capacity!

Raw (uncompressed) data rates:

Format    Lines/frame  Pixels/line  Bits/pixel  Mbit/frame  fps  Mbit/s  2-hr video
Sub-QCIF           96          128           8      0.0983   10   0.983     885 MB
NTSC              480          720          16        5.53   30     166     149 GB
HDTV             1080         1920          24       49.77   60   2,986    2.69 TB

Capacity shown beside each format:
- Sub-QCIF: channels GSM 0.014, POTS 0.056, broadband 0.384 Mbit/s
- NTSC: channels satellite 6, cable 20 Mbit/s; storage DVD 17 GB, iPod 80 GB
- HDTV: channels T3 44.7, OC-3 155.5 Mbit/s; storage HD-DVD 30 GB, Blu-ray 50 GB
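The raw-rate columns follow from multiplying lines per frame, pixels per line, bits per pixel and frame rate. A small sketch reproducing them, assuming decimal units (1 Mbit = 10^6 bits, 1 GB = 10^9 bytes), which match the slide's figures; the function name `raw_rates` is illustrative:

```python
# Reproduce the raw (uncompressed) video data rates from the slide.
# Assumption: decimal units, 1 Mbit = 10**6 bits and 1 GB = 10**9 bytes.

FORMATS = {
    # name: (lines per frame, pixels per line, bits per pixel, frames per second)
    "Sub-QCIF": (96, 128, 8, 10),
    "NTSC": (480, 720, 16, 30),
    "HDTV": (1080, 1920, 24, 60),
}

TWO_HOURS_S = 2 * 60 * 60


def raw_rates(lpf, ppl, bpp, fps):
    """Return (bits per frame, bits per second, bytes for a 2-hour video)."""
    bits_per_frame = lpf * ppl * bpp
    bits_per_second = bits_per_frame * fps
    bytes_two_hours = bits_per_second * TWO_HOURS_S // 8
    return bits_per_frame, bits_per_second, bytes_two_hours


if __name__ == "__main__":
    for name, params in FORMATS.items():
        bpf, bps, total = raw_rates(*params)
        print(f"{name:9s} {bpf / 1e6:8.4f} Mbit/frame  {bps / 1e6:9.3f} Mbit/s  "
              f"{total / 1e9:9.3f} GB per 2 hours")
```

For HDTV this gives 49.77 Mbit per frame and about 2,986 Mbit/s, roughly 19 times an OC-3 link.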

High Definition = High Quality
- 1080 lines per frame, 1920 pixels per line
- 60 frames per second
- 16:9 aspect ratio
- 24 bits per pixel (RGB/YUV)

For comparison: Georges Seurat, La Grande Jatte (aka "Sunday in the Park"), 1886
- 120 x 81 inches
- About 3.5 million dots = 19 dpi, viewed from 18-25 feet
- vs. about 2.1 million pixels on an HDTV screen
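The 19 dpi figure is the square root of the dot density over the canvas area. A quick check, assuming the dots are spread evenly (the helper name `linear_dpi` is illustrative):

```python
import math


def linear_dpi(dots, width_in, height_in):
    """Dots per linear inch, assuming dots are spread evenly over the canvas."""
    return math.sqrt(dots / (width_in * height_in))


painting_dpi = linear_dpi(3.5e6, 120, 81)  # slide's figures for La Grande Jatte
hdtv_pixels = 1920 * 1080                  # one full-HD frame
```

About 360 dots per square inch gives roughly 19 dots per linear inch, while a 1920x1080 frame holds about 2.1 million pixels.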

Video Encoding Standards
- Defined by ISO/IEC MPEG jointly with ITU-T: MPEG-2 (H.262) and AVC/H.264 (MPEG-4 Part 10)
- Can achieve up to 100:1 reduction in HD video data

Chart: relative file sizes for 1 hour of video (24 fps): D1 uncompressed 88 GB; DV 13 GB (removing intra-picture redundancy); MPEG-2 (DVD) 3.5 GB (removing inter-picture redundancy); H.264/AVC 0.86 GB (advanced compression technology)
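The "up to 100:1" claim can be read straight off the chart by dividing the uncompressed size by each compressed size. A sketch using the chart's values (the dictionary keys and function name are illustrative):

```python
# Relative file sizes (GB) for one hour of video, as read from the chart.
SIZES_GB = {
    "D1 uncompressed": 88.0,
    "DV": 13.0,
    "MPEG-2 (DVD)": 3.5,
    "H.264/AVC": 0.86,
}


def compression_ratios(sizes, baseline="D1 uncompressed"):
    """Ratio of the baseline size to each format's size."""
    base = sizes[baseline]
    return {name: base / size for name, size in sizes.items()}


ratios = compression_ratios(SIZES_GB)
```

This gives about 6.8:1 for DV, 25:1 for MPEG-2 and 102:1 for H.264/AVC, so H.264 files come out roughly 4 times smaller than MPEG-2 at the chart's sizes.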

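AVC/H.264 Encoder Overview
Block diagram, in signal-flow order:
- Inputs: current frame Fn and reference frames Fn-1 and F'n-1 (reconstructed), held in a buffer of depth n
- Inter path (P and B): motion estimation, then motion compensation
- Intra path (I): intra prediction, then intra compensation
- An inter/intra switch selects the prediction, which is subtracted from Fn to form the residual Dn
- Transform and quantize, then reorder and entropy-encode, then transmit (NAL)
- Reconstruction loop: inverse quantize and transform gives D'n; D'n is added to the prediction and passed through the deblocking filter to give the reconstructed frame F'n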
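The inverse quantize/transform path exists so that the encoder predicts from the same reconstructed frames F'n the decoder will have, not from the originals. A toy sketch of that closed loop. It assumes zero-motion prediction from the previous reconstruction, no transform, no deblocking, and a uniform scalar quantizer. All names are hypothetical.

```python
# Toy closed-loop predictive coder (hypothetical simplification of the diagram).


def quantize(residual, step):
    """Uniform scalar quantizer: residual -> integer levels."""
    return [round(r / step) for r in residual]


def dequantize(levels, step):
    """Inverse quantizer: levels -> reconstructed residual (D'n)."""
    return [lv * step for lv in levels]


def encode_frame(current, reference, step):
    """Return (levels to entropy-code, reconstructed frame F'n)."""
    prediction = reference                                     # stand-in for motion compensation
    residual = [c - p for c, p in zip(current, prediction)]   # Dn
    levels = quantize(residual, step)
    recon_residual = dequantize(levels, step)                  # D'n
    reconstructed = [p + r for p, r in zip(prediction, recon_residual)]
    return levels, reconstructed


def decode_frame(levels, reference, step):
    """Decoder side: rebuild F'n from the levels and its own reference."""
    return [p + r for p, r in zip(reference, dequantize(levels, step))]
```

Because both sides feed their own reconstruction forward as the next reference, encoder and decoder stay in lockstep and quantization error does not accumulate from frame to frame.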

AVC/H.264 Algorithm Sub-Blocks
(sub-block: features; processing characteristics)
- Motion estimation: adaptive block sizes 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4; greatest processing requirement (~50%), highly parallelizable
- Transform: 4x4, plus 8x8 in High Profile; simple integer transform; highly parallelizable
- Motion compensation: to 1/4 pel; highly parallelizable
- Deblocking filter: in-loop; partly parallelizable
- Intra prediction: 13 modes (9 for 4x4 blocks, 4 for 16x16 blocks); partly parallelizable
- Inter prediction modes: numerous choices of block size and number/type of reference frames; partly parallelizable
- Entropy encoding: Context-Adaptive VLC (CAVLC) and Context-based Adaptive Binary Arithmetic Coding (CABAC); bit-wise operations, partly parallelizable
- Quantization: finer range of parameters; highly parallelizable
- Next generation, Fidelity Range Extensions (FRExt): 4:2:0 (High), 4:2:2 at 8-10 bit, 4:4:4 at 10-12 bit; still higher processing requirements
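The "simple integer" transform is the H.264 4x4 core transform Y = Cf X Cf^T. It needs only additions, subtractions and shifts, because the normalizing scale factors are folded into quantization. A sketch comparing a plain matrix product with an add-and-shift butterfly (helper names are illustrative):

```python
# H.264/AVC 4x4 forward core transform, before scaling/quantization.
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def core_transform_matrix(x):
    """Y = Cf * X * Cf^T using plain matrix products."""
    return matmul(matmul(CF, x), transpose(CF))


def butterfly_1d(v):
    """Same 1-D transform using only adds, subtracts and shifts."""
    x0, x1, x2, x3 = v
    e0, e1 = x0 + x3, x1 + x2
    e2, e3 = x1 - x2, x0 - x3
    return [e0 + e1, (e3 << 1) + e2, e0 - e1, e3 - (e2 << 1)]


def core_transform_butterfly(x):
    # Transform every column (Cf * X), then every row ((Cf * X) * Cf^T).
    cols = transpose([butterfly_1d(col) for col in transpose(x)])
    return [butterfly_1d(row) for row in cols]
```

The two functions return identical integer coefficients. The butterfly form is the one that suits a 16-bit vector pipe, since it avoids multiplies entirely.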

H.264 Improves Video Quality and Bit Rate
Chart: bit rate over time (1990-2010) for MPEG-2 and H.264, with levels marked at 20, 10 and 2 Mbps.
- Ideal application for a multiprocessor, multicore solution
  - Keeps up with performance requirements (5-6x those of MPEG-2)
  - Requires programmability to keep pace with algorithm improvements

Telairity-1 Video Architecture
- Architecture designed for high-definition video
- 5 identical, loosely coupled vector/scalar processor cores on a single chip
- Integrated DRAM controller and integrated video controller
- Fully programmable
- 90 nm process technology: up to 750 MHz operation (594 MHz today)
- Sustained chip performance of 49.5 GOPS (billions of operations per second, "BOPS")

Block diagram: processors P0-P4 (TVP400 cores), bit packing unit, video controller with 20-bit parallel video I/O and 5 SPI channels, DMA & SDRAM controller at 4.8 GB/s

TVP400 Core Block Diagram
- Vector: 4 pipes, 44 16-bit functional units; 2K vector registers (512 per pipe), 16-read/8-write; 8-load/4-store, 16-bit
- Scalar: 1 pipe, 6 32-bit functional units; 256 B local registers, 3-read/1-write; 8 KB scratch memory
- Single-issue instruction unit with 32 KB I-cache
- 128 KB vector memory (shared with the other cores) and DMA
- 4 KB data memory; 512 MB of SDRAM via the SDRAM controller (64-bit)

Multi-Pipe Vector Instructions
- 4 independent 16-bit vector pipes
- 5 instructions per pipe: a total of 20 in parallel
- Vector length of 32: from 1 to 32 vector elements processed sequentially
- Extremely efficient for several video operations: motion estimation/compensation, 8x8 and 4x4 transforms

H.264/AVC 4x4 block intra-prediction algorithm
- Uses the multi-pipe vector core
- Calculates eight modes simultaneously
- Eightfold speed-up in intra-prediction
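The intra-prediction speed-up comes from evaluating candidate modes side by side and keeping the cheapest one. Below is a sequential sketch of that mode decision. It covers only three of the nine 4x4 luma modes (vertical, horizontal and DC, numbered as in H.264) and uses a sum-of-absolute-differences (SAD) cost. The source does not show how the TVP400 maps modes onto its vector pipes, so this is not its implementation.

```python
# 4x4 luma intra prediction: three of the nine H.264 modes, scored by SAD.
# `top` holds the 4 reconstructed pixels above the block, `left` the 4 to its left.


def predict_vertical(top, left):      # mode 0: copy the row above downwards
    return [list(top) for _ in range(4)]


def predict_horizontal(top, left):    # mode 1: copy the left column across
    return [[left[y]] * 4 for y in range(4)]


def predict_dc(top, left):            # mode 2: rounded mean of all 8 neighbours
    dc = (sum(top) + sum(left) + 4) >> 3
    return [[dc] * 4 for _ in range(4)]


MODES = {0: predict_vertical, 1: predict_horizontal, 2: predict_dc}


def sad(a, b):
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_mode(block, top, left):
    """Return (mode number, SAD) of the cheapest candidate prediction."""
    costs = {mode: sad(block, fn(top, left)) for mode, fn in MODES.items()}
    mode = min(costs, key=costs.get)
    return mode, costs[mode]
```

Each mode's prediction and SAD are independent of the others. That independence is what lets a multi-pipe core score several modes at once.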

Multiprocessor Encoding Engine
- Partition the video data by splitting each frame into multiple slices, AND
- Partition the algorithm between the top and bottom rows of processors

Figure: a frame divided into Slices 0-3; top-row processors P0-P3 (49.5 BOPS each) hand off to bottom-row processors P4-P7 (49.5 BOPS each), which output the encoded data.
- High-bandwidth chip-to-chip communication eliminates slice artifacts
- 396 BOPS sustained
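One reading of the figure is that each frame is cut into four horizontal slices, and slice i flows from top-row processor Pi to bottom-row processor P(i+4). A sketch of that mapping for 720p (45 macroblock rows). It assumes near-equal contiguous slices, and the function names are hypothetical.

```python
# Hypothetical slice-to-processor mapping for the two-row engine.


def slice_rows(mb_rows, num_slices):
    """Split macroblock rows into contiguous, near-equal slices [start, end)."""
    base, extra = divmod(mb_rows, num_slices)
    bounds, start = [], 0
    for i in range(num_slices):
        end = start + base + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def assign_processors(num_slices, top_row=4):
    """Slice i -> (top-row processor Pi, bottom-row processor P(i + top_row))."""
    return {i: (f"P{i}", f"P{i + top_row}") for i in range(num_slices)}


MB_ROWS_720P = 720 // 16  # 45 macroblock rows
```

With eight processors at 49.5 BOPS each, the engine totals 8 x 49.5 = 396 BOPS sustained.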

Multicore Processing
- Partition the video data further by splitting each slice into multiple macroblocks
- Macroblocks are processed in parallel on the 5 TVP400 cores
- Tasks are divided sequentially across two T1 processors

Really, very doable: 720p is 80 x 45 = 3,600 macroblocks per frame x 60 fps = 216,000 MB/s. Divided by 4 slices = 54,000 MB/s per pair of 50 BOPS T1 processors.

Figure: two T1 processors (49.5 BOPS each), each with cores P0-P4, DMA & SDRAM controller, bit packing unit and video controller.
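The macroblock arithmetic can be extended to a per-macroblock cycle budget. This sketch assumes perfect load balancing across the 5 cores of each processor at 594 MHz and ignores DMA and synchronization overhead. It therefore gives the most cycles that could be available, not a measured figure. Names are illustrative.

```python
# Cycle budget per macroblock, under idealized load balancing (assumption).
CLOCK_HZ = 594_000_000       # current clock rate from the slides
CORES_PER_PROCESSOR = 5


def mb_per_frame(width, height, mb=16):
    """Macroblocks per frame, rounding partial macroblocks up."""
    return (-(-width // mb)) * (-(-height // mb))


def mb_rate(width, height, fps):
    return mb_per_frame(width, height) * fps


def cycles_per_mb(mb_per_second, processors=1,
                  cores=CORES_PER_PROCESSOR, clock_hz=CLOCK_HZ):
    """Core-cycles available per macroblock across `processors` chips."""
    return processors * cores * clock_hz // mb_per_second


per_slice = mb_rate(1280, 720, 60) // 4  # 54,000 MB/s per slice
```

With one processor per slice stage, each 720p macroblock gets 55,000 core-cycles. With both processors of a pair, it gets 110,000.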

BE8000 Peak Performance
TVP400 processor core: 5 pipelines
- VP0-VP3 (vector pipes): each 2 16-bit ops, 2 loads and 1 store per clock
- SP (scalar pipe): 1 32-bit op per clock
- Per core: 5+5+5+5+1 = 21 ops/clk x 594 MHz = 12.474 BOPS

TVP2000 processor: 5 cores (TVP400 Core0-Core4, 12.474 BOPS each), plus I/O controller, bit packing unit and memory controller
- Per processor: 5 x 12.474 = 62.37 BOPS

BE8000 encoder board: 8 processors (TVP2000 P0-P7, 62.37 BOPS each)
- Per board: 8 x 62.37 = 498.96 BOPS
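The peak-rate multiplication chain, done in integers so no rounding creeps in (constant names are illustrative):

```python
# Peak operation rates for the BE8000 board, from the per-pipe counts on the slide.
CLOCK_HZ = 594_000_000
VECTOR_PIPES = 4
OPS_PER_VECTOR_PIPE = 2 + 2 + 1  # 2 16-bit ops, 2 loads, 1 store per clock
SCALAR_OPS = 1                   # 1 32-bit op per clock
CORES_PER_CHIP = 5
CHIPS_PER_BOARD = 8

ops_per_clock = VECTOR_PIPES * OPS_PER_VECTOR_PIPE + SCALAR_OPS  # per core
core_ops = ops_per_clock * CLOCK_HZ      # ops/s per TVP400 core
chip_ops = CORES_PER_CHIP * core_ops     # ops/s per TVP2000 processor
board_ops = CHIPS_PER_BOARD * chip_ops   # ops/s per BE8000 board
```

The 49.5 BOPS sustained figure quoted for one chip is roughly 79% of this 62.37 BOPS peak.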

AVClairity Compression Software
- Dedicated AVC software, all written in-house (C and intrinsics)
- Runs directly on BE8000 hardware (no OS)
- Flexible, scalable for different resolutions and standards
- Main Profile and High Profile of the AVC standard, Level 4.0
- 4:2:0, 8-bit compression
- Video formats: 720p 50/59.94/60 and 1080i 25/29.97/30
- Constant Bit Rate (CBR) or Variable Bit Rate (VBR)
- Bit rate: 2-20 Mbps
- Scene change detection
- Context-adaptive entropy coding: CABAC or CAVLC
- Spatial and motion-compensated temporal filtering
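The listed formats and the 20 Mbps ceiling sit just inside the H.264 Level 4.0 limits. Level 4.0 allows 245,760 macroblocks/s and 8,192 macroblocks per frame, plus a Main-profile VCL bit rate of 20 Mbit/s (High profile allows 1.25x that). A sketch checking formats against those limits. It assumes 1080i is counted as 30 coded frames/s of 1920x1088, and the names are illustrative.

```python
# Check video formats against H.264/AVC Level 4.0 limits (Main profile bit rate).
MAX_MBPS = 245_760           # macroblocks per second
MAX_FS = 8_192               # macroblocks per frame
MAX_BR_MAIN = 20_000_000     # bits per second


def mbs(width, height):
    """Macroblocks per frame; 1080 rounds up to 68 macroblock rows."""
    return (-(-width // 16)) * (-(-height // 16))


def fits_level_4(width, height, frame_rate, bit_rate):
    frame_mbs = mbs(width, height)
    return (frame_mbs <= MAX_FS
            and frame_mbs * frame_rate <= MAX_MBPS
            and bit_rate <= MAX_BR_MAIN)
```

720p60 needs 216,000 macroblocks/s and 1080i30 needs 244,800, both under the limit, while 1080p60 (489,600) would not fit at Level 4.0.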

Software Issues and Challenges
- Load balancing between processors: one slice or frame may contain extremely dense data relative to other slices or frames
- Managing idle time spent waiting for processors to finish
- Synchronizing data exchange between processors
- Minimizing latency
- Efficient, high-bandwidth communication between processors and cores: a >300 MB/s path between processors eliminates slice-boundary artifacts
- The 5 cores of each processor share vector memory and DDR2 DRAM, minimizing inter-core communication requirements
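The slides name load balancing as a challenge but do not say how the software handles it. One hypothetical approach is to estimate a cost per macroblock row, for example from the previous frame (an assumption), and then place slice boundaries so the most expensive slice is as cheap as possible. Below is a sketch using binary search over the cost limit plus a greedy split. The function names are illustrative.

```python
# Hypothetical load balancing: contiguous slices that minimize the maximum slice cost.


def greedy_split(costs, limit):
    """Cut a new slice whenever adding the next row would exceed `limit`."""
    slices, start, acc = [], 0, 0
    for i, c in enumerate(costs):
        if acc + c > limit:
            slices.append((start, i))
            start, acc = i, 0
        acc += c
    slices.append((start, len(costs)))
    return slices


def balance_slices(costs, num_slices):
    """Return (smallest achievable max slice cost, slices as [start, end) pairs).

    May return fewer than num_slices slices if that already reaches the optimum.
    """
    lo, hi = max(costs), sum(costs)
    while lo < hi:
        mid = (lo + hi) // 2
        if len(greedy_split(costs, mid)) <= num_slices:
            hi = mid
        else:
            lo = mid + 1
    return lo, greedy_split(costs, lo)
```

For row costs [5, 1, 1, 1, 1, 5] split three ways, equal-row slices cost at most 6, while the balanced split [5], [1, 1, 1, 1], [5] caps each slice at 5.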

Figure: Telairity-1 chip layout (processors P0-P4, each a TVP400 core; DMA & SDRAM controller; bit packing unit; video controller).

BE8000 Video Encoding Platform
- HD AVC video encoding and AAC audio encoding
- Highest-performance multiprocessor, multicore solution
- Runs 40 concurrent threads (8 processors x 5 cores), each with 5 execution units available to it
- Achieves low latency and high video quality
- Flexible and software-upgradeable for video and bit-rate improvements