EE799 -- Multimedia Signal Processing
Multimedia Signal Compression VI (MPEG-4, 7)

References:
1. http://www.mpeg.org
2. http://drogo.cselt.stet.it/mpeg/
3. T. Ebrahimi and M. Kunt, "Visual data compression for multimedia applications," Proc. IEEE, June 1998.
4. IEEE Spectrum, Feb. 1999.

CRL Multimedia -- Dr. X.-P. Zhang

Scope & Features
- Provide a set of technologies to satisfy the needs of authors, service providers and end users
- A new kind of interactivity, with dynamic objects rather than just static ones
- The integration of natural and synthetic audio and visual material
- The possibility to influence the way audiovisual material is presented (composited)
- Reusability of both tools and data

Scope & Features (Cont.)
- A coded representation that can take into account lower layers, while the application developer need not worry about those layers
- The simultaneous use of material coming from different sources, and support of material going to different destinations
- The integration of real-time and non-real-time (stored) information in a single presentation
Basic Elements
- A set of coding tools for audio-visual objects, capable of providing support to different functionalities, such as:
  - object-based interactivity and scalability
  - error robustness
  - efficient compression
  Users can assemble the standard MPEG-4 tools to satisfy specific user requirements.
- A syntactic description of coded audio-visual objects
  - provides a formal method for describing the coded representation of these objects and the methods used to code them
  - conveys to a decoder the choice of tools made by the encoder
What May Be Done in MPEG-4
MPEG-4 provides a standardized way to describe a scene (e.g. VRML), which makes it possible to:
- place media objects anywhere in a given coordinate system
- apply transforms to change the geometrical or acoustical appearance of a media object
- group primitive media objects in order to form compound media objects
- apply streamed data to media objects in order to modify their attributes (e.g. a moving texture belonging to an object; animation parameters animating a moving head)
- change, interactively, the user's viewing and listening points anywhere in the scene

Concepts
- Audio Visual Objects (AV Objects)
  - a representation of a real or virtual object that can be manifested aurally and/or visually
  - generally hierarchical
- Scalability
  - at least one subset of the bitstream is sufficient for generating a useful presentation of the object
- Tool
  - a technique that enables one or more MPEG-4 functionalities. Tools may themselves consist of tools
  - Examples: motion compensation, sub-band filter, audiovisual synchronization

Concepts (Cont.)
- Algorithm
  - an organized collection of tools that fulfills one or more requirements
  - Examples: Code Excited Linear Prediction, DCT image coding, Reed-Solomon coding, speech-driven image coding
- Profile
  - defines the set of tools of a certain type that can be used in a certain MPEG-4 terminal
  - There are Audio, Visual, Graphics, Scene Description and Object Descriptor profiles
- Level
  - a specification of the constraints and performance criteria on an Audio, Visual, Graphics, Scene Description or Object Descriptor Profile, and thus on the corresponding tools
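The scene-description capabilities listed above (placing objects in a coordinate system, applying transforms, grouping primitive objects into compound ones) can be illustrated with a small sketch. This is not the MPEG-4 scene description syntax: the `MediaObject` class, its fields, and the `flatten` helper are hypothetical names chosen for illustration, and only 2D translation and uniform scaling are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class MediaObject:
    """Hypothetical scene node: a media object placed relative to its parent."""
    name: str
    offset: Tuple[float, float] = (0.0, 0.0)  # translation in the parent's frame
    scale: float = 1.0                        # uniform scale applied to children
    children: List["MediaObject"] = field(default_factory=list)


def flatten(node: MediaObject,
            origin: Tuple[float, float] = (0.0, 0.0),
            scale: float = 1.0) -> Dict[str, Tuple[float, float, float]]:
    """Return {name: (x, y, accumulated_scale)} in scene coordinates."""
    x = origin[0] + scale * node.offset[0]
    y = origin[1] + scale * node.offset[1]
    total_scale = scale * node.scale
    placed = {node.name: (x, y, total_scale)}
    for child in node.children:
        # Children inherit the position and scale of the compound object.
        placed.update(flatten(child, (x, y), total_scale))
    return placed


# A compound object ("person") groups two primitive objects.
person = MediaObject("person", offset=(10.0, 5.0), scale=2.0, children=[
    MediaObject("voice"),
    MediaObject("sprite", offset=(1.0, 1.0)),
])
scene = MediaObject("scene", children=[person, MediaObject("background")])
print(flatten(scene))
```

Moving or rescaling `person` moves both of its children, which is the practical benefit of grouping primitive media objects into a compound object.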
Major Requirements for Systems
- Multiplexing of Audio, Visual and Other Information
- Composition of Audio and Visual Objects
- Downloading
  - provide the means to download and store AV objects
- User Interaction
  - provide the means for the user (at the decoder), or for the decoder itself, to define the compositing script as well as coding, decoding, and other parameters
- Compatibility
  - allow backward compatibility with some audio, video, imaging and audio-visual standards (MPEG-1, MPEG-2 and H.263 video streams, and MPEG-1 and MPEG-2 audio streams)

Major Requirements for Systems (Cont.)
- Robustness to Information Errors and Loss
  - provide the tools to achieve error-resilient object-based streams, in terms of either bit errors or cell loss, in relevant environments such as mobile networks with severe error conditions, ATM networks or storage media
  - provide different error protection for individual objects
  - switch off error protection if there is no need for it
- Object-based Bitstream Manipulation and Editing
  - provide the means for editing (e.g. cutting and pasting) or manipulating (e.g. translating, rotating, scaling) objects in a sequence without the need for transcoding (either all objects or just those which are chosen)

Major Requirements for Systems (Cont.)
- Content Management & Protection and Identification
  - identification of intellectual property: ISBN, watermark, etc.
- Multipoint Operation
  - support sending audio-visual objects to multiple destinations and decoding objects from multiple sources with possibly different time bases
- Object Content Information (OCI)
  - provide the possibility to associate content description information with the various audiovisual objects in the scene
- Priority of AV Objects
  - provide means to identify the relative importance of parts of the coded AV information
Natural Video Objects
- Object-based Representation
  - binary shape (i.e. without associated texture)
  - binary shape and associated texture
  - gray-level (alpha) shape, including exact representation of the original shape, and associated texture
- Video Content
  - all types of pixel-based video content
- Object-based Bitstream Manipulation and Editing
  - decode the shape without decoding the associated texture
  - access the object at different levels of spatial and temporal resolution

Natural Video Objects (Cont.)
- Object-based Random Access
- Object Quality and Fidelity
  - e.g. good-quality intra frames can be used to transmit a background object that subsequently needs no further updating
- Coding of Multiple Concurrent Data Streams
  - support joint coding of at least 4 views of a video scene
  - for any stereoscopic video, perform at least as well as the MPEG-2 multiview profile
- Robustness to Information Errors and Loss
- Object-based Scalability
  - spatial/temporal texture scalability, by allowing objects in a scene to be coded with a base layer and up to 4 enhancement layers (spatial, temporal, and/or SNR)

Natural Video Objects (Cont.)
Formats
- Luminance Spatial Resolutions: SQSIF/SQCIF, QSIF/QCIF, SIF/CIF, 4*SIF/CIF, ITU-R BT.601 and ITU-R BT.709, as well as arbitrary sizes from 8x8 to 2048x2048
- Color Spaces: Monochrome, Y/Cr/Cb, R/G/B, combined with up to 3 auxiliary components (the auxiliary components having the same size as the Y data)
- Chrominance Sampling Ratios: 4:0:0, 4:2:0, 4:2:2, and 4:4:4
- Temporal Resolutions: various. Applications with frame rates substantially higher than 60 frames per second are expected
- Pixel Depths: up to 12 bits per component
- Scanning Methods: Progressive and Interlaced
- Variable aspect ratio and colorimetry parameters
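To make the chrominance sampling ratios in the format list concrete, the sketch below computes the size of the luminance and chrominance planes and the raw (uncompressed) bits per frame. It uses the conventional meaning of the ratios (4:2:0 halves chroma horizontally and vertically, 4:2:2 halves it horizontally only); the function names and the round-up rule for odd dimensions are assumptions made for illustration, and auxiliary components are not counted.

```python
# Chroma subsampling divisors (horizontal, vertical) under the usual convention.
CHROMA_DIVISORS = {
    "4:0:0": None,    # monochrome: no chroma planes
    "4:2:0": (2, 2),  # half resolution in both directions
    "4:2:2": (2, 1),  # half resolution horizontally only
    "4:4:4": (1, 1),  # full resolution
}


def plane_sizes(width, height, sampling):
    """Return ((luma_w, luma_h), (chroma_w, chroma_h) or None)."""
    divisors = CHROMA_DIVISORS[sampling]
    luma = (width, height)
    if divisors is None:
        return luma, None
    dx, dy = divisors
    # Round up so odd picture sizes still get a chroma sample at the edge.
    chroma = ((width + dx - 1) // dx, (height + dy - 1) // dy)
    return luma, chroma


def raw_frame_bits(width, height, sampling, bit_depth=8):
    """Raw bits per frame for one luma plane plus two chroma planes."""
    luma, chroma = plane_sizes(width, height, sampling)
    samples = luma[0] * luma[1]
    if chroma is not None:
        samples += 2 * chroma[0] * chroma[1]
    return samples * bit_depth


# CIF (352x288) at 4:2:0 and 8 bits per component.
print(plane_sizes(352, 288, "4:2:0"))
print(raw_frame_bits(352, 288, "4:2:0"))
```

For CIF at 4:2:0 each chroma plane is 176x144, so a raw 8-bit frame carries 1.5 samples per pixel.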
Synthetic Video Objects
- 2D/3D Mesh Compression
  - e.g. face and body objects in the form of 3D polygon meshes
- Definition & Animation Parameter Compression
  - compression for Face Animation Parameters (FAP) and Face Definition Parameters (FDP), as well as Body Animation Parameters (BAP) and Body Definition Parameters (BDP)
- Texture Mapping
- Text Overlay
- Image and Graphics Overlay
- View-Dependent Texture Scalability
- Geometrical Transformations
- Video Object Tracking
  - efficient coding of mesh-based video object tracking information

[Figure: 2D Mesh Modeling]

[Figure: Video Coder]
[Figure: MPEG-4 Video Coding Scheme]

MPEG-4 Video Coding Scheme
- The basic coding structure
  - shape coding (for arbitrarily shaped VOs)
  - motion compensation
  - DCT-based texture coding (using standard 8x8 DCT or shape-adaptive DCT)
- Motion prediction
  - standard 8x8 or 16x16 pixel block-based motion estimation and compensation
  - global motion compensation based on the transmission of a static "sprite"

[Figure: Sprite Coding of Video Sequence]
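The block-based motion estimation mentioned in the coding scheme can be sketched as an exhaustive search that minimizes the sum of absolute differences (SAD) between a block of the current picture and candidate blocks of the reference picture. The standard specifies the decoder, not how an encoder searches, so the full-search strategy, the SAD criterion, the search range, and the function names below are illustrative assumptions. Frames are plain lists of rows of integer samples.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=8, search=4):
    """Full search over displacements in [-search, search] in both axes.
    Returns ((dx, dy), cost); the zero vector wins ties."""
    height, width = len(ref), len(ref[0])
    best, best_cost = (0, 0), sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference picture.
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + n > width or by + dy + n > height:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost


def pattern(x, y):
    """Synthetic test texture (an arbitrary choice for this example)."""
    return (7 * x + 13 * y) % 256


# The current picture is the reference content displaced by (3, -2).
ref = [[pattern(x, y) for x in range(32)] for y in range(32)]
cur = [[pattern(x + 3, y - 2) for x in range(32)] for y in range(32)]
print(best_motion_vector(cur, ref, bx=12, by=12))
```

The returned vector points from the block in the current picture to its best match in the reference picture; a real encoder would typically use faster search patterns and sub-pixel refinement.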
Coding of Textures and Still Images
The visual texture mode of MPEG-4:
- is based on a zerotree wavelet algorithm that provides very high coding efficiency over a very wide range of bitrates
- provides spatial and quality scalability (up to 11 levels of spatial scalability and continuous quality scalability) as well as arbitrary-shaped object coding
- provides scalable bitstream coding in the form of an image resolution pyramid for progressive transmission and temporal enhancement of still images
- provides the resolution scalability needed to deal with the wide range of viewing conditions typical of interactive applications and of the mapping of imagery into 2D and 3D virtual worlds

Scalable Coding of Video Objects
- coding of images and video objects with spatial, temporal and SNR scalability, both with conventional rectangular shape and with arbitrary shape
- desired for progressive coding of images and video over heterogeneous networks, as well as for applications where the receiver is not willing or able to display the full-resolution or full-quality images or video sequences

Robustness in Error Prone Environments
- Resynchronization
  - localizes the amount of data discarded by the decoder
  - VOP start code
  - a GOB is defined as one or more rows of macroblocks (MBs)
  - all predictively encoded information must be confined within a video packet so as to prevent the propagation of errors
- Data Recovery
  - attempts to recover data that in general would be lost
  - e.g. reversible variable-length codes (RVLC): designed so that they can be read in both the forward and the reverse direction
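The idea behind reversible variable-length codes can be shown with a toy codebook. A code is decodable in both directions when no codeword is a prefix or a suffix of another; palindromic codewords that are prefix-free satisfy this automatically. The four-symbol codebook below is an assumption made for illustration and is not the RVLC table used by MPEG-4.

```python
# Toy reversible code: every codeword is a palindrome and the set is
# prefix-free, hence also suffix-free. NOT the MPEG-4 RVLC table.
CODEBOOK = {"a": "0", "b": "11", "c": "101", "d": "1001"}


def encode(symbols):
    """Concatenate the codewords of the given symbols into a bit string."""
    return "".join(CODEBOOK[s] for s in symbols)


def _decode(bits, table):
    """Greedy decoding; valid because no codeword is a prefix of another."""
    symbols, word = [], ""
    for bit in bits:
        word += bit
        if word in table:
            symbols.append(table[word])
            word = ""
    if word:
        raise ValueError("trailing bits do not form a codeword")
    return symbols


def decode_forward(bits):
    """Decode from the first bit to the last."""
    return _decode(bits, {code: sym for sym, code in CODEBOOK.items()})


def decode_backward(bits):
    """Decode from the last bit to the first, then restore symbol order."""
    reversed_table = {code[::-1]: sym for sym, code in CODEBOOK.items()}
    return _decode(bits[::-1], reversed_table)[::-1]


bits = encode("abcda")
print(bits)
print(decode_forward(bits))
print(decode_backward(bits))
```

Because decoding works from either end, a decoder that detects an error part-way through a packet can, in principle, jump to the next resynchronization point and decode backward to salvage symbols that would otherwise be discarded.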
Robustness in Error Prone Environments (Cont.)
- Error Concealment
  - utilizes data partitioning, which separates the motion and the texture information
  - requires that a second resynchronization marker be inserted between the motion and the texture information
  - if errors cause the texture information to be discarded, the motion information is used to motion compensate the previously decoded VOP

[Figure: Scene Description]

[Figure: Intellectual Property Management and Protection]
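The concealment strategy described above (texture lost, motion kept) amounts to rebuilding the damaged VOP purely by motion compensation from the previously decoded one. The sketch below assumes rectangular pictures stored as lists of rows, one motion vector per block, and clamping at the picture border; the function name and data layout are hypothetical.

```python
def conceal_vop(previous, motion_vectors, block=8):
    """Rebuild a picture from `previous` using only motion vectors.

    `motion_vectors` maps the top-left corner (bx, by) of each block to a
    displacement (dx, dy) pointing into the previous picture. The texture
    (prediction residual) is assumed lost, so none is added.
    """
    height, width = len(previous), len(previous[0])
    concealed = [[0] * width for _ in range(height)]
    for (bx, by), (dx, dy) in motion_vectors.items():
        for y in range(block):
            for x in range(block):
                # Clamp references that point outside the picture.
                src_y = min(max(by + y + dy, 0), height - 1)
                src_x = min(max(bx + x + dx, 0), width - 1)
                concealed[by + y][bx + x] = previous[src_y][src_x]
    return concealed


# 16x16 previous picture with a unique value per sample.
previous = [[y * 16 + x for x in range(16)] for y in range(16)]
vectors = {(0, 0): (0, 0), (8, 0): (-8, 0), (0, 8): (0, -8), (8, 8): (1, 1)}
concealed = conceal_vop(previous, vectors)
print(concealed[8][8])
```

The result lacks the residual detail the texture data would have added, but it is usually far closer to the intended picture than a blank or frozen block.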
Natural Audio Objects
- Object-based Representation
- Object-based Bitstream Editing and Manipulation
- Object-based Scalability
- Object-based Random Access and User Controls
- Robustness to Information Errors and Loss
- Audio Formats
  - Sampling frequencies (in kHz): 8, 11.025, 12, 16, 22.05, 24, 32, 44.1, 48, 96
  - Amplitude resolution: up to 24 bits/sample
  - Number of channels: up to 8 audio channels per audio object, including support for monaural, stereo, 3/0 and 5.1 channel configurations

Synthetic Audio Objects
- Low Bit Rate Speech
  - speech coding: compression that supports intelligible speech at 2 kbit/s
- Synthetic Speech Data
  - Text to Speech
- Sound Synthesis
  - e.g. music synthesis; networked and broadcast distribution of new musical compositions; sound effects for virtual reality applications and other virtual environments; Internet-based karaoke; interactive music applications; sound effects and interactive music for video games
Delivery of Streaming Data
- Delivery Multimedia Integration Framework (DMIF)
  - a session protocol for the management of multimedia streaming over generic delivery technologies
  - in principle it is similar to FTP. The only (but essential) difference is that FTP returns data, whereas DMIF returns pointers to where to get the (streamed) data
- The MPEG-defined FlexMux tool
  - allows grouping of Elementary Streams (ESs) with a low multiplexing overhead
- The TransMux (Transport Multiplexing) layer
  - offers transport services matching the requested QoS
  - the choice is left to the end user/service provider, and allows MPEG-4 to be used in a wide variety of operation environments

[Figure: The MPEG-4 System Layer Model]
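The idea of grouping several Elementary Streams into one stream with little overhead can be sketched with a minimal interleaving scheme: each packet carries a one-byte channel index and a one-byte payload length, followed by the payload. This framing is an assumption chosen for illustration; the normative FlexMux syntax is defined in the MPEG-4 Systems specification and is not reproduced here. The `mux` and `demux` names are hypothetical.

```python
def mux(packets):
    """Interleave (channel, payload) pairs into one byte stream.
    Overhead is two bytes per packet: channel index and payload length."""
    out = bytearray()
    for channel, payload in packets:
        if not 0 <= channel <= 255:
            raise ValueError("channel index must fit in one byte")
        if len(payload) > 255:
            raise ValueError("payload must be at most 255 bytes")
        out += bytes([channel, len(payload)]) + payload
    return bytes(out)


def demux(stream):
    """Split a multiplexed stream back into per-channel byte strings."""
    streams = {}
    pos = 0
    while pos < len(stream):
        if pos + 2 > len(stream):
            raise ValueError("truncated packet header")
        channel, length = stream[pos], stream[pos + 1]
        payload = stream[pos + 2:pos + 2 + length]
        if len(payload) != length:
            raise ValueError("truncated packet payload")
        streams.setdefault(channel, bytearray()).extend(payload)
        pos += 2 + length
    return {channel: bytes(data) for channel, data in streams.items()}


# Two elementary streams (channel 1 and channel 2) share one multiplex.
muxed = mux([(1, b"vid0"), (2, b"aud0"), (1, b"vid1")])
print(len(muxed))
print(demux(muxed))
```

With short headers like these, many low-bitrate streams can share one transport channel without each paying the cost of its own connection, which is the motivation the slide gives for a low-overhead multiplex.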
[Figure: Buffer Architecture of the System Decoder Model]

MPEG-J
Framework for MPEG Java APIs
- a programmatic system (as opposed to the parametric system offered by MPEG-4 Version 1) that specifies APIs for interoperation of MPEG-4 media players with Java code
- the MPEG-J subsystem, which controls the Presentation Engine, is also referred to as the Application Engine
- the Java application is delivered as a separate elementary stream to the MPEG-4 terminal
[Figure: Architecture of an MPEG-J Enabled MPEG-4 System]

MPEG-7 -- Objectives
Multimedia Content Description Interface
- specify a standard set of descriptors that can be used to describe various types of multimedia information
- standardise ways to define other descriptors, as well as structures (Description Schemes) for the descriptors and their relationships
- standardise a language to specify description schemes, i.e. a Description Definition Language (DDL)
- material covered: still pictures, graphics, 3D models, audio, speech, video, and information about how these elements are combined in a multimedia presentation (scenarios, composition information)
  - e.g. may include facial expressions and personal characteristics

Example
- Semantic Information
  - The highest level would give: "This is a scene with a barking brown dog on the left and a blue ball that falls down on the right, with the sound of passing cars in the background."
- All these descriptions are of course coded in an efficient way, that is, efficient for search.
[Figure: Scope of MPEG-7]

Applications of MPEG-7
- Digital libraries (image catalogue, musical dictionary, ...)
- Multimedia directory services (e.g. yellow pages)
- Broadcast media selection (radio channel, TV channel, ...)
- Multimedia editing (personalised electronic news service, media authoring, ...)

Work Plan of MPEG-7
  Call for Proposals             October 1998
  Working Draft                  December 1999
  Committee Draft                October 2000
  Final Committee Draft          February 2001
  Draft International Standard   July 2001
  International Standard         September 2001
[Figure: Video Representation]