An Intelligent System for Archiving and Retrieval of Audiovisual Material Based on the MPEG-7 Description Schemes

GIORGOS AKRIVAS, SPIROS IOANNOU, ELIAS KARAKOULAKIS, KOSTAS KARPOUZIS, YANNIS AVRITHIS, ANASTASIOS DELOPOULOS, STEFANOS KOLLIAS, MICHALIS VAZIRGIANNIS, IRAKLIS VARLAMIS
Department of Electrical and Computer Engineering
National Technical University of Athens
9, Iroon Polytechniou Str., 157 73 Zographou, Athens
GREECE
gakrivas@image.ece.ntua.gr

Abstract: - A system for digitization, storage and retrieval of audiovisual information and its associated data (meta-information) is presented. The principles of the evolving MPEG-7 standard have been adopted for the creation of the data model used by the system, permitting efficient separation of database design, content description, business logic and presentation of query results. XML Schema is used to define the data model, and XML to describe audiovisual content. Problems that emerged during system design, and their solutions, are discussed, such as customization, deviations from the standard MPEG-7 DSs and even the design of entirely custom DSs. Although the system includes modules for digitization, annotation, archiving and intelligent data mining, the paper focuses mainly on the use of MPEG-7 as the information model.

Key-Words: - Audiovisual archives, multimedia databases, multimedia description schemes, MPEG-7, retrieval and mining of audiovisual data.

1 Introduction

Current multimedia databases contain a wealth of information in the form of audiovisual and text data. Even though efficient search algorithms have been developed for both media types, there is still a need for abstract data presentation and summarization [1]. Moreover, retrieval systems should be able to provide the user with additional information related to the specific subject of the query, as well as to suggest other, possibly interesting, topics.
The MPEG-7 standard [2] aims to satisfy the above operational requirements by defining a multimedia content description interface and providing a rich set of standardized tools to describe multimedia content. Unlike previous MPEG standards (MPEG-1/2/4), MPEG-7 descriptors do not depend on how the described content is coded or stored; it is even possible to create an MPEG-7 description of an analogue movie or of a picture printed on paper [3]. Feature extraction algorithms, whether automatic or semi-automatic, will remain outside the scope of the standard, as with previous MPEG standards. For some features, such as textual description, human intervention seems unavoidable for the foreseeable future. MPEG-7 will specify a standard set of Descriptors (Ds) that can be used to describe various types of multimedia information. It will also specify a rich set of predefined structures of Descriptors and their relationships, as well as ways to define one's own structures; these structures are called Description Schemes (DSs). New Description Schemes are defined using a special language, the Description Definition Language (DDL), which is also part of the standard. At the 51st MPEG meeting in Noordwijkerhout, it was decided to adopt the XML Schema Language [4], with a minimal set of MPEG-7-specific extensions, as the MPEG-7 DDL [5]. The standard also defines a set of DSs and Ds that every MPEG-7 parser should be able to read.

In this paper, we present a system for efficient digitization, annotation, storage, search and mining of audiovisual data from large distributed multimedia databases over several types of networks. The system has been developed in the context of a Greek RTD project named PANORAMA. The main objectives of the project were the interoperability of databases and the availability of software products and services on networks for open multimedia access.
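To make the distinction between Descriptors and Description Schemes concrete, here is a minimal Python sketch under stated assumptions: a D is modeled as a named value, a DS as a named structure that composes Ds and nested DSs, and the resulting description is serialized to XML, in the spirit of adopting XML Schema as the DDL. The class names and the element names (Video, Shot, Title, CameraMotion) are hypothetical illustrations, not the normative MPEG-7 schema, and no schema validation is performed.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Descriptor:
    """A Descriptor (D): one named feature with a textual value."""
    name: str
    value: str


@dataclass
class DescriptionScheme:
    """A Description Scheme (DS): a named structure of Ds and nested DSs."""
    name: str
    descriptors: list[Descriptor] = field(default_factory=list)
    children: list["DescriptionScheme"] = field(default_factory=list)

    def to_xml(self) -> ET.Element:
        """Serialize this DS and everything it contains as an XML element."""
        elem = ET.Element(self.name)
        for d in self.descriptors:
            # Each D becomes a leaf element whose text is its value.
            ET.SubElement(elem, d.name).text = d.value
        for child in self.children:
            # Nested DSs are serialized recursively.
            elem.append(child.to_xml())
        return elem


# Hypothetical names for illustration only (not the normative MPEG-7 schema).
shot = DescriptionScheme("Shot", [Descriptor("CameraMotion", "pan-left")])
video = DescriptionScheme("Video", [Descriptor("Title", "Evening News")], [shot])
print(ET.tostring(video.to_xml(), encoding="unicode"))
```

Running it prints `<Video><Title>Evening News</Title><Shot><CameraMotion>pan-left</CameraMotion></Shot></Video>`. In a real MPEG-7 setting the allowed structure would be declared in the DDL (XML Schema) rather than implied by Python classes.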
Digitizing audiovisual archive assets makes information in all media (video, images, sound, text, etc.) available for both public and professional use. To deal with design, description and management issues, as well as with navigation and retrieval of multimedia entities, we have adopted concepts from MPEG-7, such as the Multimedia Description Schemes (MDS). The integrated architecture we have developed can thus be smoothly incorporated into MPEG-7-compatible multimedia database systems. Moreover, adopting MPEG-7 description schemes in all system modules permits efficient separation of database design, content description, business logic and presentation of results, without having to rearrange the employed schemes. In this paper we present the issues regarding the use of MPEG-7 as the information model.

2 Description Scheme Design

Even though the Descriptors (Ds) and Description Schemes (DSs) proposed by MPEG are sufficient for most systems, they can be extended to suit specific requirements or to match existing data and applications. In most cases our design was based on the standard DSs [6], with certain extensions that impose additional constraints or add extra functionality.

The hierarchical structure of the system is shown in UML notation in Figures 1, 2 and 3; UML is used here instead of the usual text-based DDL or XML Schema in order to illustrate the employed hierarchy and DSs graphically. In these figures, the diamond symbol denotes composition, the range attached to each element gives its multiplicity (how many times it may occur) in the composition, and the arrows denote class inheritance.

In general, the AudioVisualDS shown in Figure 1 represents an AV entity, e.g. a movie or a picture. AudioVisualDS contains whatever data is known about, or has been extracted from, an existing AV entity. Consequently, all the DSs it contains are optional: syntactic structure (SyntacticDS), semantics (SemanticDS), links between segments and semantic entities (SyntacticSemanticLinkDS), physical storage (), and verbal description (MetaInfoDS).

Figure 1. Representation of the AudioVisualDS hierarchy.

AudioVisualDS is modeled on the way content is typically organized in a written document, i.e. with a Table of Contents and an Index. In this analogy, the Table of Contents (syntactic information) defines the structure of the archive, as it does in a book, using a linear syntax that is independent of the internal organization of the material and of any links arising from its semantic content.

The Index (semantic information), on the other hand, does not describe the structure of the content but provides useful references into the actual material. Unlike the Table of Contents, which provides access to every piece of information in the archive, these references are usually not exhaustive: they are selected for their semantic value to humans, and the same item may be referenced more than once.

In our implementation, syntactic information is contained in the SyntacticDS, shown in Figure 2. The SyntacticDS contains information about the organization of the content at the physical level, as well as signal-based descriptors such as camera movement or the definition of shot groups. The inclusion of recurring (nested) segments allows the creation of hierarchical Tables of Contents, in which the actual material and the accompanying meta-information are presented at the required level of abstraction. In essence, the temporal structure and overall visual properties of a high-level object, e.g. a theme, are represented as a single node, which may be decomposed into shot groups or shorter, lower-level shots.

Figure 2. Structure of the audiovisual material in SyntacticDS.

A segment represents a part of the content that can be thought of as an entity; it therefore contains a verbal description (MetaInfoDS), a number of other segments that further subdivide the content, media information if needed, and possibly a number of summaries.

Several kinds of segments are defined through the inheritance mechanism, as shown in Figure 3: Video, which represents a segment in time (e.g. a shot); StillRegionDS, which represents a still object; MovingRegionDS, which represents a moving object; and Audio, which represents an audible segment. A Video can contain other Videos, MovingRegionDSs and KeyframeDSs. Two specializations of StillRegionDS are defined: KeyframeDS and FaceDS. Each segment also contains several low-level (usually automatically extracted) descriptors, such as color features, camera motion, shape and texture of moving regions, and human face or text areas.

Figure 3. Definition of segment types through the inheritance mechanism.

The semantic content is described through the SemanticDS, shown in Figure 4, which contains a hierarchy of objects (ObjectDS) and events (EventDS). The relationship between objects and events is somewhat similar to the relationship between nouns and verbs in natural language. Unlike the previous DSs, SemanticDS contains pointers (references) to objects and events instead of full instances. Therefore, many AV entities can share the same objects (e.g. persons or locations). A SemanticDS can also contain a number of EventObjectLinkDSs, which link events to objects.

Figure 4. Description of semantic content through a hierarchy of objects and events in SemanticDS.

A second kind of link, the SyntacticSemanticLinkDS shown in Figure 5, connects segments to semantic entities. It provides a means to relate high-level descriptions (e.g. a person) to low-level ones (e.g. a moving region). A SyntacticSemanticLinkDS contains all the SyntacticSemanticRelationDSs, each of which links a segment to an event-object link.

Figure 5. Linking between syntactic and semantic information.

3 Customization Issues

The description schemes discussed so far are mainly based on the standard MPEG-7 DSs [6]. However, describing specific information requires either deviations from the standard DSs or even the design of entirely custom DSs. The main reasons for customization were the heavy constraints imposed by the target user of the system (ERT) and the absence of certain information from the standard MPEG-7 DSs at the time the PANORAMA architecture was designed. A typical example is the meta-information of audiovisual programs, shown in Figure 6. In our system, MetaInfoDS contains all verbal information that is known about an AV entity or a segment. This includes strings (for example, names of actors), times (for example, time of production) and numbers (for example, number of episodes). Each variable can have an unlimited number of values, and new variables can be added dynamically. However, information such as program type or genre is restricted to predefined enumerations that the system user can customize.

Another example is the DS that contains information about physical storage, namely the various copies of the content (MediaProfileDS) and the media on which these copies are made (MediaProfile). This DS was customized to support a wide range of analog and digital media, such as films, VHS tapes, Digital Betacam, MPEG files, photographs and documents, as well as a large number of format and encoding parameters.

Figure 6. Media storage information and profiles in.

Additional restrictions were imposed on the syntactic structure of AV documents by enforcing a hierarchy of themes, shot groups and shots. Three types of Video are derived through inheritance, namely ThemeDS, ShotGroupDS and ShotDS. The SyntacticDS shown in Figure 2 is composed of ThemeDSs, which optionally contain a number of ShotGroupDSs, which in turn contain ShotDSs. Finally, sequential and hierarchical summarization is supported by embedding a SummarizationDS in the SyntacticDS and, optionally, in each individual segment, as shown in Figure 2. The SummarizationDS contains references to objects that together can be considered sufficient to represent the full set of segments.

4 User Interface

The expert who provides the verbal information about the content (the annotator) is supported by a dedicated application developed for the system. The application takes an XML description of the audiovisual content as input and outputs the same description, enriched by the annotator.
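The sketch below illustrates, under explicit assumptions, two operations implied by Sections 3 and 4: checking that a syntactic description respects the theme, shot group and shot nesting, and letting an annotator enrich a segment with multi-valued meta-information variables, taking an XML description in and returning it enriched. It is not the PANORAMA code: the XML layout (DS names used directly as element tags, a MetaInfoDS holding hypothetical Variable/Value elements) and the helper names `check_hierarchy` and `annotate` are invented for illustration. Python's standard `xml.etree.ElementTree` does not validate against XML Schema, so the nesting rules are checked by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical nesting rules modeled on Section 3: SyntacticDS -> ThemeDS ->
# ShotGroupDS -> ShotDS; MetaInfoDS and SummarizationDS may appear in segments.
ALLOWED_CHILDREN = {
    "SyntacticDS": {"ThemeDS", "SummarizationDS"},
    "ThemeDS": {"ShotGroupDS", "MetaInfoDS", "SummarizationDS"},
    "ShotGroupDS": {"ShotDS", "MetaInfoDS", "SummarizationDS"},
    "ShotDS": {"MetaInfoDS", "SummarizationDS"},
}


def check_hierarchy(elem: ET.Element) -> list[str]:
    """Return every violation of the theme/shot group/shot nesting rules."""
    allowed = ALLOWED_CHILDREN.get(elem.tag)
    if allowed is None:
        return []  # unconstrained element, e.g. the contents of a MetaInfoDS
    errors = []
    for child in elem:
        if child.tag not in allowed:
            errors.append(f"{child.tag} not allowed inside {elem.tag}")
        errors.extend(check_hierarchy(child))
    return errors


def annotate(segment: ET.Element, variable: str, value: str) -> None:
    """Add one value to a meta-information variable of a segment.

    Variables are created on first use and may hold any number of values,
    mirroring the MetaInfoDS behaviour described in Section 3.
    """
    meta = segment.find("MetaInfoDS")
    if meta is None:
        meta = ET.SubElement(segment, "MetaInfoDS")
    var = next((v for v in meta.findall("Variable") if v.get("name") == variable), None)
    if var is None:
        var = ET.SubElement(meta, "Variable", name=variable)
    ET.SubElement(var, "Value").text = value


# Annotator input: an automatically segmented description (hypothetical layout).
description = ET.fromstring(
    "<SyntacticDS><ThemeDS id='t1'><ShotGroupDS id='g1'>"
    "<ShotDS id='s1'/><ShotDS id='s2'/>"
    "</ShotGroupDS></ThemeDS></SyntacticDS>"
)
shot = description.find(".//ShotDS[@id='s2']")
annotate(shot, "Actor", "Jane Doe")
annotate(shot, "Actor", "John Roe")  # a second value for the same variable
print(check_hierarchy(description))  # []

bad = ET.fromstring("<SyntacticDS><ThemeDS><ShotDS/></ThemeDS></SyntacticDS>")
print(check_hierarchy(bad))  # ['ShotDS not allowed inside ThemeDS']
```

Keeping the nesting rules in a table makes the point that the theme/shot group/shot hierarchy is a customization layered on top of the standard DSs; in a schema-driven deployment the same constraints would more naturally be declared in the XML Schema itself.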
The basic features of the annotation application, shown in Figure 7, are the following:
- Automatic shot and keyframe detection
- Tree representation of the video segments (AV entity/theme/shot group/shot/keyframe, summarization and physical storage)
- Use of the MPEG-1 format for annotation
- Customizable annotation of each video segment
- Automatic extraction of several low-level features, such as camera motion, moving regions, faces and text

Following the principle that whatever is known and needed is included in the description, the annotator is free to choose how deeply to annotate. For example, he might annotate only the full AV entity, or some or all of the shots. He might even skip the automatic shot extraction feature and limit himself to annotation only.

Figure 7. User interface of the annotation tool.

5 Conclusion

An innovative system for handling audiovisual information and its associated metadata was presented. It illustrates the potential of the evolving MPEG-7 standard as a means of formalizing the description of audiovisual content. In particular, using MPEG-7 descriptions as the means of communication between system modules proved to be a powerful and efficient tool. The personalization of user queries and the semantic unification of individual archives with custom description schemes remain open issues for future research.

References:
[1] L. Chiariglione, MPEG and Multimedia Communications, IEEE Trans. Circuits and Systems for Video Technology, Vol. 7, Feb. 1997, pp. 5-18.
[2] ISO/IEC JTC1/SC29/WG11, MPEG-7 Overview (v. 1.0), Doc. N3158, Dec. 1999.
[3] ISO/IEC JTC1/SC29/WG11, MPEG-7: Context, Objectives and Technical Roadmap (v. 12), Doc. N2861, July 1999.
[4] W3C, XML Schema Part 0: Primer, W3C Working Draft, Sept. 2000 (http://www.w3.org/tr/xmlschema-0).
[5] ISO/IEC JTC1/SC29/WG11, Text of ISO/IEC CD 15938-2 Information Technology - Multimedia Content Description Interface - Part 2: Description Definition Language, Doc. N3702, Oct. 2000.
[6] ISO/IEC JTC1/SC29/WG11, Text of ISO/IEC 15938-5/CD Information Technology - Multimedia Content Description Interface - Part 5: Multimedia Description Schemes, Doc. N3705, Oct. 2000.
[7] G. Votsis, A. Drosopoulos, G. Akrivas, V. Tzouvaras and Y. Xirouhakis, An MPEG-7 Compliant Integrated System for Video Archiving, Characterization and Retrieval, IASTED International Conference on Signal and Image Processing (SIP2000), Las Vegas, Nevada, November 2000.