MPEG-4 AUTHORING TOOL FOR THE COMPOSITION OF 3D AUDIOVISUAL SCENES

P. Daras, I. Kompatsiaris, T. Raptis and M. G. Strintzis
Informatics and Telematics Institute, 1 Kyvernidou Str., 546 39 Thessaloniki, Greece
E-mail: daras@iti.gr

Abstract

Bringing much new functionality, MPEG-4 offers numerous capabilities and is expected to be the future standard for multimedia applications. In this paper a novel authoring tool that fully exploits the 3D functionalities of the MPEG-4 standard is described. It is based upon an open and modular architecture able to progress with MPEG-4 versions, and it is easily adaptable to newly emerging, higher-level authoring features.

I. INTRODUCTION

MPEG-4 is the next-generation compression standard after MPEG-1 and MPEG-2. Whereas the previous two MPEG standards dealt with the coding of audio and video, MPEG-4 specifies a standard mechanism for the coding of audio-visual objects. Apart from natural objects, MPEG-4 also allows the coding of two-dimensional and three-dimensional, synthetic and hybrid, audio and visual objects. Coding of objects enables content-based interactivity and scalability; it also improves coding and the reusability of content (Figure 1).

MPEG-4 Systems facilitates the organization of the audio-visual objects decoded from elementary streams into a presentation [1]. The coded stream that describes the spatio-temporal relationships between the coded audio-visual objects is called the Scene Description, or BIFS (Binary Format for Scenes), stream. Scene description in MPEG-4 extends VRML (Virtual Reality Modeling Language) to include coding and streaming, timing, and the integration of 2D and 3D objects [2].

Authoring MPEG-4 content is quite a challenge. Far from the past simplicity of MPEG-2, with its model of one video stream plus two audio streams, MPEG-4 allows the content creator to compose together, spatially and temporally, large numbers of objects of many different types: rectangular video, arbitrarily shaped video, still images, speech synthesis, voice, music, text, 2D graphics, 3D objects, and more. In [3], the most widely known MPEG-4 authoring tool for the
composition of 2D scenes only is presented. This tool can read and write BIFS in text or binary form, read and write the MP4 file format, import JPEG, AAC or MPEG-4 video into an MP4 file, create self-contained MP4 files as well as multi-file scenes, use BIFS and OD streams as media, etc. In [4], an MPEG-4 authoring tool compatible with the 2D player is presented; with that tool, however, the user cannot preview the objects that have been inserted in the scene until the scene is viewed on the MPEG-4 player.

Figure 1: Overview of MPEG-4 Systems.

In this paper we present a 3D MPEG-4 authoring tool: our solution for helping authors create MPEG-4 content with 3D functionalities, from the end-user interface specification phase to the cross-platform MP4 file. We adopted an open and modular architecture for the MPEG-4 authoring system, able to integrate new modules. In the following section, MPEG-4 BIFS is presented. In Section III an overview of the authoring tool architecture and the graphical user interface is given. Implementation issues, and more specifically how OpenGL was used to enable a 3D preview of the scene, are discussed in Section IV. Experimental results in Section V demonstrate a 3D scene composed with the authoring tool. Finally, conclusions are drawn in Section VI.

II. BINARY FORMAT FOR SCENES (BIFS)

The BIFS description language [5], designed as an extension of the VRML 2.0 specification [2], is a compact binary format representing a pre-defined set of scene objects and behaviors along with their spatio-temporal relationships. In particular, BIFS contains the following four types of information:

- the attributes of media objects, which define their audio-visual properties;
- the structure of the scene graph, which contains these objects;
- the pre-defined spatio-temporal changes of these objects, independent of user input;
- the spatio-temporal changes triggered by user interaction.

Audiovisual objects have both a spatial and a temporal extent. Temporally, all objects have a single dimension, time. Spatially, objects may be located in two-dimensional or three-dimensional space. Each object has a local coordinate system, in which the object has a fixed spatio-temporal location and scale (size and orientation). Objects are positioned in the scene by specifying a coordinate transformation from the object's local coordinate system into another coordinate system defined by a parent node in the tree. The coordinate transformation that locates an object in a scene is not part of the object, but rather part of the scene; this is why the scene description has to be sent as a separate elementary stream. This is an important feature for bitstream editing, one of the content-based functionalities in MPEG-4.

The scene description follows a hierarchical structure that can be represented as a tree. Each node of the tree is an audiovisual object, and complex objects are constructed by using appropriate scene description nodes. The tree structure is not necessarily static: the relationships can evolve in time, and nodes may be deleted, added or modified. Individual scene description nodes expose a set of parameters through which several aspects of their behavior can be controlled. Examples include the pitch of a sound, the color of a synthetic visual object, or the speed at which a video sequence is to be played.

There is a clear distinction between the audiovisual object itself, the attributes that enable the control of its position and behavior, and any elementary streams that contain coded information representing some attributes of the object. The scene description does not directly refer to elementary streams when specifying a media object, but uses the concept of object descriptors.
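The hierarchical scene structure and the parent-to-child coordinate transformations described above can be illustrated with a small sketch. The class and method names below are invented for illustration; this is not the tool's internal structure, only a minimal model of the idea, with uniform scaling for simplicity:

```python
# Minimal model of a BIFS/VRML-style transform hierarchy (illustrative only):
# each node maps its local coordinate system into its parent's, and a point is
# located in the scene by composing transformations up to the root.

class TransformNode:
    def __init__(self, translation=(0.0, 0.0, 0.0), scale=1.0):
        self.translation = translation   # offset into the parent's system
        self.scale = scale               # uniform scale, for simplicity
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child

    def to_world(self, point):
        """Map a point in local coordinates to scene (world) coordinates."""
        local = tuple(self.scale * c + t
                      for c, t in zip(point, self.translation))
        return self.parent.to_world(local) if self.parent else local

# A desk positioned in the scene, with a half-scale lamp on top of it.
root = TransformNode()
desk = root.add(TransformNode(translation=(2.0, 0.0, 0.0)))
lamp = desk.add(TransformNode(translation=(0.0, 1.0, 0.0), scale=0.5))

print(lamp.to_world((1.0, 0.0, 0.0)))  # -> (2.5, 1.0, 0.0)
```

Because the transformation that places an object lives in the scene tree rather than in the object, repositioning `desk` moves everything beneath it without touching the objects themselves, which is the editing property that motivates sending the scene description as its own stream.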
The purpose of the object descriptor framework is to identify and properly associate elementary streams with the media objects used in the scene description. Media objects that require elementary stream data point to an object descriptor by means of a numeric identifier, an ObjectDescriptorID. Each object descriptor is itself a collection of descriptors that describe the elementary streams comprising a single media object. An ES_Descriptor identifies a single stream with a numeric identifier, the ES_ID, and contains the information necessary to initiate and configure the decoding process for the stream. A set of descriptors determines the required decoder resources and the precision of encoded timing information.

III. MPEG-4 AUTHORING TOOL

III-A. System Architecture

Figure 2: System Architecture.

The process of creating MPEG-4 content can be characterized as a development cycle with four stages: Open, Format, Play and Save (Figure 2). In this somewhat simplified model, content creators can:

- Edit/format their own scenes, inserting 3D objects such as spheres, cones, cylinders, text, boxes and backgrounds. They can also group objects, modify the attributes (3D position, color, texture, etc.) of the edited objects or delete objects from the created content; insert sound and video streams; add interactivity to the scene using sensors and interpolators; and control the scene dynamically using an implementation of the BIFS-Command protocol. Generic 3D models can be created or inserted and modified using the IndexedFaceSet node, and a synthetic animated face can be inserted using the implemented Face node. During these procedures the attributes of the objects and the commands, as defined in the MPEG-4 standard and more specifically in BIFS, are stored in an internal program structure, which is continuously updated depending on the actions of the user. At the same time, the creator sees in real time a 3D preview of the scene in an integrated window rendered with OpenGL.
- Present the created content, by interpreting the commands issued during the editing phase, allowing the author to check the correctness of the current description.
- Open an existing file.
- Save the file, either in a custom format or, after encoding/multiplexing and packaging, in an MP4 file [6], which is expected to be the standard MPEG-4 file format. The MP4 file format is designed to contain the media information of an MPEG-4 presentation in a flexible, extensible format that facilitates interchange, management, editing and presentation of the media.
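The dynamic scene control mentioned above, via the BIFS-Command protocol, boils down to a small set of updates (insert, delete, replace) applied to the scene graph. The following rough sketch uses a Python dictionary standing in for the scene tree and invented command and field names; the real protocol encodes such updates in a binary stream:

```python
# Illustrative sketch of BIFS-Command-style scene updates (names invented):
# a command stream mutates the scene structure while the presentation runs.

scene = {"logo": {"rotation": 0.0}, "headline": {"text": ""}}

def apply_command(scene, command):
    """Apply one update command to the scene structure."""
    op, node = command["op"], command["node"]
    if op == "insert":                    # add a new node to the scene
        scene[node] = command["fields"]
    elif op == "delete":                  # remove a node, if present
        scene.pop(node, None)
    elif op == "replace":                 # change fields of an existing node
        scene[node].update(command["fields"])
    return scene

apply_command(scene, {"op": "replace", "node": "headline",
                      "fields": {"text": "Breaking news"}})
apply_command(scene, {"op": "insert", "node": "video_box",
                      "fields": {"texture": "news.h263"}})
apply_command(scene, {"op": "delete", "node": "logo"})

print(sorted(scene))  # -> ['headline', 'video_box']
```

The authoring tool's internal program structure plays the role of `scene` here: every user action is recorded as such a command, so the stored description can later be encoded and replayed.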

Figure 3: Main window indicating the different components of the user interface.

III-B. User Interface

To improve the authoring process, powerful graphical tools must be provided to the author [7]. The temporal dependence and variability of multimedia applications hinder the author from obtaining a real perception of what he is editing. To overcome this difficulty, an environment with multiple synchronized views was created, using OpenGL. The interface is composed of three main views, as shown in Figure 3.

Edit/Preview: By integrating the presentation and editing phases in the same view, we enable the author to see a partial result of the created object in an OpenGL window. When an object is inserted in the scene, it can immediately be seen in the presentation (OpenGL) window, located exactly at the given 3D position. If a particular behavior, for example a video texture, is assigned to an object, it can be seen only during scene play. If an object already has a video texture (respectively, an image texture) and the user tries to map an image texture (respectively, a video texture) onto it, a message appears and warns the user. If a sound is inserted, a saxophone icon is displayed in the upper left corner of the presentation window. The integration of the two views is very useful for the initial scene composition.

Scene Tree: This view provides a structural view of the scene as a tree (a BIFS scene is a graph, but for ease of presentation the graph is reduced to a tree for display). Since the edit view cannot be used to display the behavior of the objects, the scene tree provides more detailed information concerning them. Drag-and-drop and copy-paste can also be used in this view.

Object Details: This window offers object properties that the author can use to assign values other than the defaults. These properties are: 3D
position, 3D rotation, 3D scale, color (diffuse, specular, emission), shininess, texture, video stream, audio stream, cylinder and cone radius and height, text style (plain, bold, italic, bold-italic) and fonts (serif, sans, typewriter), sky and ground background, background texture, interpolators (color, position, orientation) and sensors (sphere, cylinder, plane, touch, time) for adding interactivity and animation to the scene. Furthermore, the author can insert, create and manipulate generic 3D models using the IndexedFaceSet node; simple VRML files can be inserted straightforwardly. Synthetic animated 3D faces can be inserted via the Face node: the author provides a FAP file [8] and the corresponding EPF file (Encoder Parameter File, designed to give the FAP encoder all the information related to the corresponding FAP file, such as I and P frames, masks, frame rate, quantization scaling factor and so on). A bifa file (binary format for animation) is then created automatically and used in the Scene Description and Object Descriptor files.

IV. IMPLEMENTATION SPECIFICS

The 3D MPEG-4 authoring tool was developed in C/C++ for Windows, specifically with C++ Builder 5.0 and OpenGL, interfaced with the MPEG-4 implementation group (IM1) decoders. The IM1 3D player is a software implementation of an MPEG-4 Systems player [9]. The player is built on top of the Core framework, which also includes tools to encode and multiplex test scenes, and it aims to be compliant with the Complete 3D profile. OpenGL [10] is a software interface to graphics hardware whose main purpose is to render two- and three-dimensional objects into a framebuffer. These objects are described as sequences of vertices (which define geometric objects) or pixels (which define images); OpenGL performs several processes on this data to convert it into the pixels forming the final desired image in the buffer.

V. EXPERIMENTAL RESULTS

In this section we present a scene that can easily be constructed with the authoring tool. The scene represents a virtual studio (Figure 5) and contains several groups of synthetic objects, including a synthetic face, boxes with textures, text objects and IndexedFaceSets (Figure 4). The logo group, located in the upper left corner of the studio, combines a rotating box and a text object carrying the name of the channel. The background consists of four boxes (left and right sides, floor and back side) with image textures, and the desk is created with another two boxes. In the upper right corner of the scene a box with a video texture is presented; on this video box, videos relevant to the news are loaded. The body of the newscaster is an IndexedFaceSet imported from a VRML 3D model, and the 3D face was inserted using the corresponding button. Finally, a rolling text with the headlines is inserted in the scene. After the selection of a
FAP (Face Animation Parameters) file and an audio stream (a saxophone icon appears in the upper left corner), the face is configured to animate according to the selected FAP file. The video stream (H.263) and the audio stream (G.723) are transmitted as two separate elementary streams, according to the object descriptor mechanism. All the animation, except the face animation, is implemented using interpolator nodes.

Figure 4: The virtual studio scene in the authoring tool.

Figure 5: The virtual studio scene in the IM1 3D player.

VI. CONCLUSIONS

In this paper an authoring tool with 3D functionalities for the MPEG-4 multimedia standard was presented. After a short introduction to MPEG-4 BIFS, the proposed editing environment and the underlying architecture were described. The 3D authoring tool was used for the creation of complex 3D scenes and has proven to be user friendly and fully compatible with the MPEG-4 standard.

ACKNOWLEDGMENTS

This work was supported by the PENED99 project of the Greek Secretariat of Research and Technology.

REFERENCES

[1] MPEG-4 Systems, "ISO/IEC 14496-1: Coding of Audio-Visual Objects: Systems, Final Draft International Standard," ISO/IEC JTC1/SC29/WG11 N2501, October 1998.
[2] ISO/IEC 14772-1, "The Virtual Reality Modeling Language," http://www.vrml.org/specifications/vrml97, 1997.
[3] S. Boughoufalah, J. C. Dufourd, and F. Bouilhaguet, "MPEG-Pro, an Authoring System for MPEG-4," in ISCAS 2000, IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, May 2000.
[4] V. K. Papastathis, I. Kompatsiaris, and M. G. Strintzis, "Authoring Tool for the Composition of MPEG-4 Audiovisual Scenes," in International Workshop on Synthetic Natural Hybrid Coding and 3D Imaging, Santorini, Greece, September 1999.
[5] J. Signes, Y. Fisher, and A. Eleftheriadis, "MPEG-4's Binary Format for Scene Description," Signal Processing: Image Communication, special issue on MPEG-4, vol. 15, no. 4-5, pp. 321-345, 2000.
[6] R. Koenen, "MPEG-4 Overview (V.16 La Baule Version)," ISO/IEC JTC1/SC29/WG11 N3747, October 2000.
[7] B. MacIntyre and S. Feiner, "Future Multimedia User Interfaces," Multimedia Systems, vol. 4, no. 5, pp. 250-268, 1996.
[8] University of Genova, Digital Signal Processing Laboratory, http://wwwdsp.com.dist.unige.it/snhc/fba_ce/facefrmt.htm, 2000.
[9] Z. Lifshitz, "Status of the Systems Version 1, 2, 3 Software Implementation," tech. rep., ISO/IEC JTC1/SC29/WG11 N3564, July 2000.
[10] OpenGL, "The Industry's Foundation for High Performance Graphics," http://www.opengl.org, 2000.