The Essential Guide to Video Processing

Second Edition

EDITOR
Al Bovik
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, Texas

AMSTERDAM, BOSTON, HEIDELBERG, LONDON, NEW YORK, OXFORD, PARIS, SAN DIEGO, SAN FRANCISCO, SINGAPORE, SYDNEY, TOKYO

Academic Press is an imprint of Elsevier

Preface
About the Author

CHAPTER 1 Introduction to Digital Video Processing
  1.1 Sampled Video
  1.2 Video Transmission
  1.3 Objectives of this Guide
  1.4 Organization of the Guide

CHAPTER 2 Video Sampling and Interpolation
  2.1 Introduction
  2.2 Spatiotemporal Sampling Structures
  2.3 Sampling and Reconstruction of Continuous Time-Varying Imagery
  2.4 Sampling Structure Conversion
    2.4.1 Frame-rate Conversion
    2.4.2 Spatiotemporal Sampling Structure Conversion
  2.5 Conclusion
  References
  Further Information

CHAPTER 3 Motion Detection and Estimation
  3.1 Introduction
  3.2 Notation and Preliminaries
    3.2.1 Binary Hypothesis Testing
    3.2.2 Markov Random Fields
    3.2.3 MAP Estimation
    3.2.4 Variational Formulations
  3.3 Motion Detection
    3.3.1 Hypothesis Testing with Fixed Threshold
    3.3.2 Hypothesis Testing with Adaptive Threshold
    3.3.3 MAP MRF Formulation
    3.3.4 MAP Variational Formulation
    3.3.5 Experimental Comparison of Motion Detection Methods
  3.4 Motion Estimation
    3.4.1 Motion Models
    3.4.2 Estimation Criteria
    3.4.3 Search Strategies
  3.5 Practical Motion Estimation Algorithms
    3.5.1 Global Motion Estimation
    3.5.2 Block Matching
    3.5.3 Phase Correlation
    3.5.4 Optical Flow via Regularization
    3.5.5 MAP Estimation of Dense Motion
    3.5.6 Experimental Comparison of Motion Estimation Methods
  3.6 Perspectives
  3.7 Acknowledgments
  References

CHAPTER 4 Video Enhancement and Restoration
  4.1 Introduction
  4.2 Spatiotemporal Noise Filtering
    4.2.1 Linear Filters
    4.2.2 Order-Statistic Filters
    4.2.3 Multiresolution Filters
  4.3 Coding Artifact Reduction
    4.3.1 Artifact Reduction in the Spatial Domain
    4.3.2 Artifact Reduction in the Frequency Domain
  4.4 Blotch Detection and Removal
    4.4.1 Blotch Detection
    4.4.2 Motion Vector Repair and Interpolating Corrupted Intensities
    4.4.3 Video Inpainting
    4.4.4 Restoration in Conditions of Difficult Object Motion
  4.5 Vinegar Syndrome Removal
  4.6 Intensity Flicker Correction
    4.6.1 Flicker Parameter Estimation
    4.6.2 Estimation on Sequences with Motion
  4.7 Kinescope Moire Removal
  4.8 Scratch Removal
  4.9 Conclusions
  Acknowledgements
  References

CHAPTER 5 Video Stabilization and Mosaicing
  5.1 Introduction
    5.1.1 Video Stabilization
    5.1.2 Outline
  5.2 Biological Motivation: Insect Navigation
    5.2.1 Centering Behavior and Collision Avoidance
    5.2.2 Control of Flight Speed and Stabilization
    5.2.3 Measuring Distance by Integrating Optical Flow
  5.3 Camera Model and Image Motion Model
    5.3.1 Camera Model
    5.3.2 Effect of Camera Motion
    5.3.3 Image Features
    5.3.4 Structure from Motion
    5.3.5 Feature based Algorithms
  5.4 Flow-Based Approaches
    5.4.1 Global Flow Models
    5.4.2 Flow-Based Algorithm
  5.5 Stabilization and Mosaicing
    5.5.1 Video Mosaicing
  5.6 Stabilization and Mosaicing with Additional Information
    5.6.1 VIVID Metadata
    5.6.2 Stabilization with Metadata
    5.6.3 Inertial Measurements
    5.6.4 Stabilization with Inertial Measurements
    5.6.5 Motion Segmentation
  5.7 Motion Super-resolution
  5.8 Three-dimensional Stabilization
  5.9 Summary
  Acknowledgements
  References

CHAPTER 6 Video Segmentation
  6.1 Introduction
  6.2 Scene Change Detection
  6.3 Spatiotemporal Change Detection
    6.3.1 Spatial Change Detection Using Two Frames
    6.3.2 Temporal Integration
    6.3.3 Combination with Spatial Segmentation
  6.4 Motion Segmentation
    6.4.1 Dominant Motion Segmentation
    6.4.2 Multiple Motion Segmentation
  6.5 Simultaneous Motion Estimation and Segmentation
    6.5.1 Motion-Field Model and MAP Framework
    6.5.2 Two-Step Iteration Algorithm
  6.6 Semantic Video Object Segmentation
    6.6.1 Chroma-Keying
    6.6.2 Semiautomatic Segmentation
  6.7 Examples
  6.8 Performance Evaluation of Video Segmentation
  Acknowledgements
  References

CHAPTER 7 Motion Tracking in Video
  7.1 Introduction
  7.2 Rigid Object Tracking
    7.2.1 2D Rigid Object Tracking
    7.2.2 3D Rigid Object Tracking
  7.3 Articulated Object Tracking
    7.3.1 3D Articulated Object Tracking
    7.3.2 2D Articulated Object Tracking
  References

CHAPTER 8 Basic Transform Video Coding
  8.1 Introduction to Video Compression
  8.2 Video Compression Application Requirements
  8.3 Digital Video Signals and Formats
    8.3.1 Sampling of Analog Video Signals
    8.3.2 Digital Video Formats
  8.4 Video Compression Techniques
    8.4.1 Entropy and Predictive Coding
    8.4.2 Block Transform Coding: The DCT
    8.4.3 Quantization
    8.4.4 MC and Estimation
  8.5 Transform Coding: Introduction to the Video Encoding Standards
    8.5.1 Transform Coding Standard Example: The H.261 Video Encoder
  8.6 Closing Remarks
  References

CHAPTER 9 MPEG-1 and MPEG-2 Video Standards
  9.1 MPEG-1 Video Coding Standard
    9.1.1 Introduction
    9.1.2 MPEG-1 Video Coding versus H.261
    9.1.3 MPEG-1 Video Structure
    9.1.4 Summary of the Major Differences between MPEG-1 Video and H.261
    9.1.5 Simulation Model
    9.1.6 MPEG-1 Video Bit-Stream Structures
    9.1.7 Summary
  9.2 MPEG-2 Video Coding Standard
    9.2.1 Introduction
    9.2.2 MPEG-2 Profiles and Levels
    9.2.3 MPEG-2 Video Input Resolutions and Formats
    9.2.4 MPEG-2 Video Coding Standard Compared to MPEG-1
    9.2.5 Scalable Coding
    9.2.6 Data Partitioning
    9.2.7 Other Tools for Error Resilience
    9.2.8 Test Model
    9.2.9 MPEG-2 Video and System Bit-Stream Structures
    9.2.10 Summary
  References

CHAPTER 10 MPEG-4 Visual and H.264/AVC: Standards for Modern Digital Video
  10.1 Introduction
  10.2 Terminology
  10.3 MPEG-4 Part 2
    10.3.1 Object-based Representation
    10.3.2 Video Object Coding
    10.3.3 Mesh Object Coding
    10.3.4 Model-based Coding
    10.3.5 Still Texture Coding
    10.3.6 Scalability
    10.3.7 Error Resilience
    10.3.8 MPEG-4 Part 2 Profiles
  10.4 MPEG-4 Part 10: H.264/AVC
    10.4.1 H.264/AVC Video Coding Layer: Technical Overview
    10.4.2 Profiles
  10.5 MPEG-4 Compression Performance
    10.5.1 MPEG-4 Part 2
    10.5.2 MPEG-4 Part 10: H.264/AVC
  10.6 MPEG-4 Video Applications
  10.7 Conclusions and Outlook
  Acknowledgements
  References

CHAPTER 11 Interframe Subband/Wavelet Scalable Video Coding
  11.1 Introduction
  11.2 Motion Estimation and Compensation for MCTF
    11.2.1 Connected and Unconnected Blocks
    11.2.2 Using Chroma for Motion Estimation
    11.2.3 Improving the Haar MCTF
  11.3 New Haar MCTF
    11.3.1 Overlapped Block Motion Compensation
    11.3.2 Scalable Motion Vector Coding
  11.4 EZBC Coder
    11.4.1 Coding Process
    11.4.2 Context Modeling
    11.4.3 Scalability
    11.4.4 Packetization
    11.4.5 Frequency Roll-Off
  11.5 Extension to LeGall and Tabatabai 5/3 Filtering
  11.6 Objective and Visual Comparisons
    11.6.1 Some Visual Results
  11.7 Multiple Adaptations
  11.8 Related Coders
  11.9 Conclusions
  References

CHAPTER 12 Digital Video Transcoding
  12.1 Introduction
  12.2 Video Transcoding for Bit Rate Reduction
    12.2.1 Transcoding of Intracoded Frame
    12.2.2 Transcoding of Intercoded Frame
    12.2.3 Fast Video-Transcoding Architectures
    12.2.4 DCT Domain IMC
  12.3 Heterogeneous Video Transcoding
    12.3.1 MV Estimation for Spatial Resolution Reduction
    12.3.2 MV Estimation for Temporal Resolution Reduction
    12.3.3 Spatial Resolution Reduction
    12.3.4 Macro-block-Coding Type Decision
  12.4 Bit Rate Control in Video Transcoding
  12.5 Error-Resilient Video Transcoding
  12.6 Concluding Remarks
  References

CHAPTER 13 Embedded Video Codecs
  13.1 Introduction
  13.2 Block-Based Video Coding
  13.3 Embedded Video Codec Requirements and Constraints
  13.4 Embedded Video Codec Design Flow
    13.4.1 Understanding the Chip Architecture
    13.4.2 Understanding the Codec Algorithms
    13.4.3 Modularity and APIs Definitions
    13.4.4 Reference Codec Software Development in Golden C
    13.4.5 Platform-Specific Development and Porting
    13.4.6 Kernel Optimization and Integration
    13.4.7 Concurrent Processing
    13.4.8 Overall Optimization
    13.4.9 Stress and Conformance Testing
  13.5 New Trends
  13.6 Summary
  References

CHAPTER 14 Video Quality Assessment
  14.1 Introduction
  14.2 HVS Modeling Based Methods
  14.3 Feature Based Methods
    14.3.1 VQM
  14.4 Motion Modeling Based Methods
  14.5 Performance
  14.6 Conclusions
  References

CHAPTER 15 A Unified Framework for Video Indexing, Summarization, Browsing, and Retrieval
  15.1 Introduction
    15.1.1 Content Categories
    15.1.2 Storage and Compression
    15.1.3 Terminology
  15.2 Image and Video Features
    15.2.1 Statistical Features
    15.2.2 Compressed-Domain Features
    15.2.3 Content-Based Features
  15.3 Video Analysis
    15.3.1 Shot Boundary Detection
    15.3.2 Key-Frame Extraction
    15.3.3 Play/Break Segmentation
    15.3.4 Audio Marker Detection
    15.3.5 Video Marker Detection
  15.4 Video Representation
    15.4.1 Video Representation for Scripted Content
    15.4.2 Video Representation for Unscripted Content
  15.5 Video Browsing
    15.5.1 Video Browsing Using ToC-Based Summary
    15.5.2 Video Browsing Using Highlights-Based Summary
  15.6 Video Retrieval
    15.6.1 Feature-Based Retrieval (Statistical and Compressed)
    15.6.2 Content-Based Retrieval
    15.6.3 Relevance Feedback
    15.6.4 Query-Concept Learner
    15.6.5 Efficient Annotation through Active Learning
    15.6.6 Considerations in Multimedia Databases
  15.7 A Unified Framework for Indexing, Summarization, Browsing, and Retrieval
  15.8 Conclusions and Promising Research Directions
  Acknowledgements
  References

CHAPTER 16 Video Communication Networks
  16.1 Introduction
  16.2 Video Compression Standards
    16.2.1 Introduction
    16.2.2 Overview
    16.2.3 MPEG-2 Video Compression Standard
    16.2.4 MPEG-2 Systems Standard
  16.3 Video Communication Networks
    16.3.1 Introduction
    16.3.2 Hybrid Fiber-Coax Networks
    16.3.3 Digital Subscriber Loop
    16.3.4 Wireless Networks
    16.3.5 Fiber Optics
    16.3.6 Integrated Services Digital Network
    16.3.7 ATM Networks
  16.4 Internet Protocol Networks
    16.4.1 Introduction
    16.4.2 Multicast Backbone
    16.4.3 Real-Time Transport Protocol
    16.4.4 Real-Time Transport Control Protocol
    16.4.5 Real-Time Transport Streaming Protocol
    16.4.6 H.323
    16.4.7 Session Initiation Protocol
    16.4.8 Integrated Services Resource Reservation Protocol
    16.4.9 Differentiated Services DiffServ
  16.5 Summary
  References

CHAPTER 17 Video Security and Protection
  17.1 Introduction
  17.2 Video Encryption
    17.2.1 Candidate Domains for Encrypting Multimedia
    17.2.2 Building Blocks for Media Encryption
    17.2.3 Security Evaluation of Media Encryption
    17.2.4 Video Encryption System Design
  17.3 Video Authentication
    17.3.1 Background
    17.3.2 Content Level Authentication
    17.3.3 Stream Level Authentication
  17.4 Video Fingerprinting for Traitor Tracing
    17.4.1 The Background
    17.4.2 Coded Fingerprinting
    17.4.3 Experimental Results of Video Fingerprinting
    17.4.4 Intravideo Collusion
  References

CHAPTER 18 Wireless Video Streaming
  18.1 Introduction
  18.2 On Joint Source-Channel Coding
    18.2.1 Rate-Distortion Theory
    18.2.2 Operational Rate-Distortion Theory
    18.2.3 Practical Constraints in Video Communications
    18.2.4 Illustration
  18.3 Video Compression and Transmission
    18.3.1 Video Transmission System
    18.3.2 Video Compression Basics
    18.3.3 Channel Models
    18.3.4 End-to-End Distortion
    18.3.5 Error Resilient Source Coding
  18.4 Channel Coding
    18.4.1 Forward Error Correction
    18.4.2 Retransmission
  18.5 Joint Source-Channel Coding
    18.5.1 Problem Formulation
    18.5.2 Internet Video Transmission
    18.5.3 Wireless Video Transmission
  18.6 Distributed Multimedia Communications
    18.6.1 Video Streaming over Multiuser Networks
    18.6.2 Mobile TV Standards
    18.6.3 Peer-to-Peer Internet Video Broadcasting
    18.6.4 Video Streaming over Multihop Wireless Networks
  18.7 Discussion
  References

CHAPTER 19 Video Surveillance
  19.1 Introduction
  19.2 Categorizing Applications, Target Scenes, and Video Analytics
    19.2.1 Video Surveillance Applications
    19.2.2 Video Surveillance Target Scenes
    19.2.3 Video Analytics for Video Surveillance
  19.3 Review of Video Analytic Algorithms
    19.3.1 Motion and Change Detection
    19.3.2 Object Detection
    19.3.3 Object Tracking
    19.3.4 Behavioral Analysis Tools
    19.3.5 Gait Recognition
    19.3.6 Face Recognition
  19.4 Conclusion
  References

CHAPTER 20 Face Recognition From Video
  20.1 Introduction
  20.2 Properties and Literature Review
    20.2.1 Set of Observations
    20.2.2 Temporal Continuity/Dynamics
    20.2.3 3D Model
  20.3 A General Framework of Probabilistic Identity Characterization
    20.3.1 Recognition Setting and Issues
  20.4 Instances of Probabilistic Identity Characterization
    20.4.1 FR From a Group of Still Images
    20.4.2 FR From a Video Sequence
  20.5 A System Identification Approach
    20.5.1 The ARMA Model
    20.5.2 Framework for Recognition
    20.5.3 Experiments, Results, and Discussion
  20.6 Conclusions
  References

CHAPTER 21 Audiovisual Speech Processing
  21.1 Introduction
  21.2 Analysis of Visual Signals
    21.2.1 Face Detection, Mouth, and Lip Tracking
    21.2.2 Visual Features
    21.2.3 Two Visual Feature Extraction Systems
  21.3 Audiovisual Information Fusion
    21.3.1 Speech Classes in Audiovisual Integration
    21.3.2 Classifiers in Speech Applications
    21.3.3 Feature and Classifier Fusion
  21.4 Audiovisual Automatic Speech Recognition
    21.4.1 Bimodal Corpora for ASR
    21.4.2 Experimental Results
  21.5 Audiovisual Speech Synthesis
    21.5.1 Coarticulation Modeling
    21.5.2 Facial Animation
    21.5.3 Visual Text-to-Speech
    21.5.4 Speech-to-Video Synthesis
    21.5.5 Visual Speech Synthesis Evaluation
  21.6 Audiovisual Speaker Recognition
  21.7 Summary and Discussion
  References

Index