Towards a simplification of COMM-based multimedia annotations


Miroslav VACURA (a), Raphaël TRONCY (b)
(a) University of Economics, W. Churchill Sq. 4, 130 67 Prague 3, Czech Republic
(b) CWI, Kruislaan 413, 1098 SJ Amsterdam, The Netherlands

Abstract. Semantic descriptions of multimedia content are a necessary condition for their effective retrieval and presentation to an end-user. COMM is a core ontology for multimedia annotations, expressed in OWL, that provides a comprehensive model for generating fine-grained descriptions of multimedia assets. As with other multimedia ontology models, COMM annotations result in complex RDF graphs that can make the search for multimedia content inefficient and its presentation to an end-user inappropriate. On the other hand, the richness of the model allows the description of multiple points of view on an asset, distinguishing, for example, real-world objects from their digital representations. We discuss the advantages of having a rich formal model for expressing semantic annotations of multimedia content. We then show how the model can be simplified and optimized using logical rules and SPARQL queries, and we illustrate the usefulness of this approach in an exploratory environment for searching and browsing multimedia content.

Keywords. COMM, multimedia ontology, multimedia annotations, D&S, OIO, end-user interfaces

1. Introduction

Working with multimedia assets involves their capture, annotation, editing, authoring and/or transfer to other applications for publication and distribution. There is substantial support within the multimedia research community for the collection of machine-processable semantics during established media workflow practices. An essential aspect of these approaches is that a media asset gains value through the inclusion of information (i.e. metadata) about how or when it is created or used, what it represents, and how it is manipulated and organized.

For example, let us imagine that a photographer takes a set of photos during an event. S/he wants to publish low-resolution versions of these images with appropriate semantic metadata in order to maximise the chance that these photos are found and sold in high resolution to magazines. S/he thus adds the free-text description "French midfielder Zinedine Zidane converts a penalty kick for the game's first score during the semi-final World Cup football match between Portugal and France at Munich's World Cup Stadium, 05 July 2006." to the image depicted in Figure 1. Using keywords and terms from controlled vocabularies is already common practice in the image industry. Hence, instead of just tagging the photo, the photographer adds semantic metadata, expressed in RDF, describing the place and date of the event, the scene depicted, and properties regarding the image data such as its resolution.

Figure 1. Example of a photo taken by a news agency. Credits: AFP PHOTO / ARIS MESSINIS

This task can be partially supported with natural language processing tools, such as OpenCalais (http://www.opencalais.com/), that extract named entities from the caption. PhotoRDF (http://www.w3.org/tr/photo-rdf/) provides a minimal RDF schema combining EXIF for representing the camera settings and Dublin Core for the general scenery description. PhotoStuff (http://www.mindswap.org/2003/photostuff/) is a more expressive tool that allows the photographer to fully describe the scene using domain-specific ontologies and region-based annotations. These models, however, are specific to the image medium and cannot abstract the multimedia data from one particular representation (i.e. the algorithm and encoding parameters of the content). The photographer would like to propagate the semantic annotations to all manifestations of the images (thumbnail, full size, etc.) and link them to other descriptions of the event. Multimedia ontologies, such as those based on MPEG-7, provide the richness needed for such descriptions [6]. They generate, however, complex RDF graphs that are inefficient for search and inappropriate for presentation to an end-user.

In this paper, we investigate how to simplify these models while retaining all their expressivity. First, we show how multimedia content can be described using COMM [1], a core ontology for multimedia (Section 2). Second, we justify the complexity of the resulting annotations by showing the added expressivity gained (Section 3). We then present how to simplify these annotations using SPARQL queries and illustrate the usefulness of this approach in an exploratory environment for searching and browsing multimedia content (Section 4). Finally, we conclude and present future work in Section 5.

2. Using COMM for Describing Multimedia Content

The scenario presented in Section 1 requires the use of a rich and formal ontology model for describing the multimedia content, in order to: i) distinguish real-world objects from their digital representations, which are available in multiple resolutions; ii) make explicit the relationship between the annotations and the media, as well as the provenance of the annotations; and iii) handle fine-grained annotations and the decomposition of the media. The four MPEG-7-based ontologies proposed so far (http://www.w3.org/2005/incubator/mmsem/xgr-vocabularies/#formal-mpeg-7) are able to represent this information only partially.

The resulting annotations are, however, complex RDF graphs for all of them [6]. In addition, COMM also represents the context in which the annotations have been obtained.

Figure 2 shows the RDF graph corresponding to the COMM-based annotation of the image depicted in Figure 1. This description can be obtained using the open-source KAT annotation tool (https://launchpad.net/kat/), which makes use of the COMM Java API (http://multimedia.semanticweb.org/comm/). KAT allows the photographer to use any domain-specific ontologies for describing the semantic content of the image. COMM additionally provides the descriptors for decomposing the image into regions, for representing the properties of the image and the low-level descriptors that can be extracted, and for making explicit the link between the annotations and the image. For simplicity, we focus our example on image annotations, but both KAT and COMM have been designed for describing audio and video data as well.

Figure 2. RDF graph representing a COMM-based description

COMM-based annotations are based on the DOLCE Descriptions and Situations (D&S) ontology pattern [3]. The upper-right part of Figure 2 represents the media abstraction that is currently described. It extends the concept of MediaProfile defined in MPEG-7 [4]. The lower part details the segmentation of the image into three regions. The upper-left part corresponds to the semantic annotation of these three segments. We detail in the next section the added value of using COMM for each part.
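To give a concrete feel for the shape of such a graph before examining each part, the sketch below loads a deliberately simplified approximation of the semantic-annotation fragment of Figure 2 into a SPARQL 1.1 store. All prefixes, class names and property names (ex:, c:, dns:) are stand-ins invented for this illustration; they approximate the structure drawn in Figure 2 and are not the normative COMM or DOLCE URIs.

```sparql
# Illustrative only: stand-in URIs approximating Figure 2,
# not the actual COMM/DOLCE vocabulary.
PREFIX ex:      <http://example.org/annotation/>
PREFIX c:       <http://example.org/comm-approx#>
PREFIX dns:     <http://example.org/dns-approx#>
PREFIX dbpedia: <http://dbpedia.org/resource/>

INSERT DATA {
  # The Situation: one semantic annotation covering one image region
  ex:sa1 a c:semantic-annotation ;
         dns:satisfies   ex:m3 ;                      # the Description (annotation method)
         dns:setting-for ex:id2, dbpedia:zidane .

  # The Description: a manual annotation method that defines two roles
  ex:m3 a c:method ;
        dns:defines ex:slr1, ex:adr1 .

  # The DBpedia label plays the semantic-label-role,
  # the digital data of the region plays the annotated-data-role
  ex:slr1 a c:semantic-label-role .
  ex:adr1 a c:annotated-data-role .
  dbpedia:zidane dns:plays ex:slr1 .
  ex:id2 a c:image-data ;
         dns:plays ex:adr1 .
}
```

Even in this stripped-down form, reaching the label dbpedia:zidane from the region data ex:id2 requires walking through the method and both roles; this is the kind of complexity that the remainder of the paper addresses.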

3. COMM Added Value for Describing Multimedia Content

In this section, we explain the important features of the resulting RDF graph for the three parts mentioned above, namely: the abstraction of the media (3.1), the decomposition of the image (3.2) and the semantic annotation of the scene (3.3).

3.1. Modeling Multimedia Data Properties

The upper-right part of Figure 2 (part A) describes the media (i.e. the image) using the DOLCE Ontology of Information Objects (OIO) ontology pattern [2]. The central concept is the MediaProfile defined in MPEG-7, which describes the characteristics of the digital data that contains the multimedia content. For example, a digital image can be stored in different resolutions (thumbnail, full size) and encoded in different formats (jpeg, tiff, png, bmp), and each compression algorithm may have a particular set of parameters. The MediaProfile descriptor is used for representing these properties. The algorithm used to obtain a particular piece of digital data is specified by m1:method; in fact, it is interpreted as one run of this algorithm for a set of input parameters (not displayed in Figure 2 for simplification). The Situation is an abstract media annotation represented by the instance mia1:media-instance-annotation. One important property of this situation is the identification of the digital data itself (usually a file), provided by the entity mid1:media-instance-descriptor. The model handles multiple physical locations of a piece of digital data, represented by several instances of the media-uri property. The same (image) file can therefore be available through file://, http:// or ftp:// URIs. All these identical files are assigned the same unique string, represented by the instance uis1:unique-id-string. The instance id1:digital-data is then understood as an abstract representation of the digital content of a file regardless of its physical location. We may think of this representation as an input for further processing of the image, such as the segmentation detailed in the next section.

We observe that several other ontology models are able to represent this abstraction. For instance, the FRBR model defines the work, the expression, the manifestation and the item [5], while the VRA model distinguishes the work from the image [7]. Finally, the CRM model has the notions of information carrier and information object. COMM contains a list of axioms stating the equivalence of these concepts and can therefore be seen as a bridge for interoperability between rich formal ontology models.
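As a rough illustration of this part of the pattern, the SPARQL Update sketch below loads a deliberately flattened fragment: one abstract image-data item realized by two physical copies of the same file that share a single unique identifier. The prefixes, property names and file locations are all invented for this sketch; the actual COMM graph goes through media-instance-descriptor, media-locator-descriptor and unique-id-string instances, as shown in Figure 2.

```sparql
# Illustrative only: flattened stand-in vocabulary, not the actual COMM URIs.
PREFIX ex: <http://example.org/annotation/>
PREFIX c:  <http://example.org/comm-approx#>

INSERT DATA {
  # One abstract digital-data item (cf. id1 in Figure 2) ...
  ex:id1 a c:image-data ;
         c:realised-by ex:loc1, ex:loc2 .

  # ... available at two physical locations of the same file,
  # both carrying the same unique identifier string.
  ex:loc1 c:media-uri "http://photos.example.org/zidane-penalty.jpg" ;
          c:unique-id "img-2006-07-05-0042" .
  ex:loc2 c:media-uri "ftp://archive.example.org/2006/zidane-penalty.jpg" ;
          c:unique-id "img-2006-07-05-0042" .
}
```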

3.2. Modeling the Decomposition of the Image

The lower part of Figure 2 (part B) describes the segmentation of the image into three regions. We again use the D&S ontology design pattern for contextualizing the annotation. The Description is in this case the instance m2:method, which represents a segmentation method, or a run of a segmentation algorithm that can include some parameters. We have manually drawn three regions within the image using the KAT annotation tool. The Situation is then an abstract segmentation, or in other words a spatial decomposition, represented by the instance srsd1:still-region-spatial-decomposition. The two instances on the right side of the graph form the core of the D&S pattern. The three similar blocks in the middle each represent a region of the image. The abstract digital data of each region is represented by the instance id2:image-data (resp. id3:image-data and id4:image-data). Each of these instances plays a still-region role in the segmentation process. The image data of each region is obtained by applying a mask to the image data of the whole image. The mask role is played by region-locator instances (e.g. rld1:region-locator-descriptor) that are further specified by region boundaries of the appropriate polygon (in this case, rectangles). This approach makes it possible to define arbitrary segments of multimedia data using the richness of the MPEG-7 mask descriptor.

3.3. Modeling the Semantic Annotations

The upper-left part of Figure 2 (part C) describes the semantic annotations of each abstract region resulting from the segmentation process, again using the D&S ontology design pattern. The Situation is represented by the instance sa1:semantic-annotation (resp. sa2:semantic-annotation and sa3:semantic-annotation). The Description is the method or algorithm m3:method (resp. m4:method and m5:method), which corresponds to a manual annotation with no parameters in KAT. The design pattern allows multiple semantic annotations to be attached to a segment through a single method, i.e. a user can precisely describe the scene using several concepts and properties from domain ontologies. Several methods can also be used jointly for describing a segment. Hence, it is possible to represent collaborative annotations from several users, or the annotation of a user that completes the result of an automatic annotation algorithm such as a concept detector.

The left part of each semantic annotation graph contains the semantic label used in the annotation. In this example, we use DBpedia URIs for identifying the football players. These URIs are themselves dereferenceable and link to further descriptions of the players, such as resumes of their careers, following the Linked Data approach (http://linkeddata.org/). Although the overall annotation graph is complex at first sight, we observe that it is based on the repeated use of the Descriptions and Situations ontology design pattern, which provides a powerful approach for representing the context and provenance of the annotations.

4. Towards a Simplification of COMM-based Annotations

While having the complete COMM-based description of multimedia content stored in an RDF repository is useful for querying the context and provenance of different annotations, or for propagating the annotations through the various representations of the digital data, it is not always efficient for simpler search tasks or for presentation to an end-user. For example, searching for images depicting the football player Zinedine Zidane, uniquely identified by dbpedia:zidane, requires traversing an RDF graph of 8 nodes, even though most of this long path is actually fixed by the DOLCE ontology design patterns. We propose to use SPARQL queries for creating additional shortcuts in the complex annotation graph and thereby to optimize the search and presentation of multimedia content. For example, we have defined a generic SPARQL query template that retrieves all segments of an image identified by a URI and, for each segment, its semantic annotations (Figure 3). The whole image is considered as the root segment, for which it is possible to retrieve annotations in the same way.
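One possible way to realize such a shortcut, sketched here against the stand-in fragment loaded in Section 2 rather than the actual COMM vocabulary, is a SPARQL CONSTRUCT query that materializes a direct annotated-by link between a piece of image data and its semantic label. The simp: namespace is likewise invented for this sketch; the authors' query library operates on the real COMM URIs instead.

```sparql
# Sketch only: the stand-in c:/dns: vocabulary approximates Figure 2,
# it is not the normative COMM/DOLCE namespace.
PREFIX c:    <http://example.org/comm-approx#>
PREFIX dns:  <http://example.org/dns-approx#>
PREFIX simp: <http://example.org/comm-simplified#>

CONSTRUCT { ?data simp:annotated-by ?label . }
WHERE {
  ?annotation a c:semantic-annotation ;
              dns:satisfies ?method .
  ?method     dns:defines   ?labelRole, ?dataRole .
  ?labelRole  a c:semantic-label-role .
  ?dataRole   a c:annotated-data-role .
  ?label      dns:plays     ?labelRole .
  ?data       dns:plays     ?dataRole .
}
```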
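Once such shortcut triples are available, the Zidane search mentioned above collapses into two triple patterns. The following minimal sketch assumes the hypothetical simp: namespace and the paper's dbpedia:zidane shorthand, together with the decomposed-to and annotated-by properties discussed below:

```sparql
PREFIX simp:    <http://example.org/comm-simplified#>
PREFIX dbpedia: <http://dbpedia.org/resource/>

# Which images contain a segment depicting Zidane?
SELECT DISTINCT ?image
WHERE {
  ?image   simp:decomposed-to ?segment .
  ?segment simp:annotated-by  dbpedia:zidane .
}
```

Compared with the roughly eight-node path through the full COMM graph, only two joins remain.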

Figure 3. Simpler annotation obtained using SPARQL queries (image-uri, decomposed-to, polygon-segment, annotated-by, semantic-annotation)

The decomposed-to property is a relationship between the image URI and its segments, as they are obtained either manually (using, for example, the KAT tool) or by a segmentation algorithm. It replaces the longer path that comes from the original COMM description. Similarly, the annotated-by property links each abstract representation of an image segment directly to its multiple semantic descriptions. The definition of these properties, provided as a library of SPARQL queries, is available at http://nb.vse.cz/~vacuram/comm/pl/.

5. Conclusion

In this paper, we have shown the usefulness of COMM, a very rich formal ontology for describing multimedia content. Annotations can be obtained automatically using appropriate tools such as KAT. The downside of this model, however, is the complexity of the resulting RDF graphs, which often makes the search for content inefficient and the presentation of the metadata to an end-user inappropriate. We have defined a number of SPARQL queries as templates for creating shortcuts in the model. Our ultimate goal is to define a complete presentation layer on top of the ontology model, using Semantic Web query and rule languages.

Acknowledgments

The research leading to this paper was partially supported by the European Commission under contract FP6-027026, "Knowledge Space of semantic inference for automatic annotation and retrieval of multimedia content - K-Space".

References

[1] Richard Arndt, Raphaël Troncy, Steffen Staab, Lynda Hardman, and Miroslav Vacura. COMM: Designing a Well-Founded Multimedia Ontology for the Web. In 6th International Semantic Web Conference (ISWC), pages 30-43, 2007.
[2] Aldo Gangemi, Stefano Borgo, Carola Catenacci, and Jos Lehmann. Task Taxonomies for Knowledge Content. Technical report, Metokis Deliverable 7, 2004.
[3] Claudio Masolo, Laure Vieu, Emanuele Bottazzi, Carola Catenacci, Roberta Ferrario, Aldo Gangemi, and Nicola Guarino. Social Roles and their Descriptions. In 9th International Conference on Principles of Knowledge Representation and Reasoning, pages 266-277, 2004.
[4] MPEG-7. Multimedia Content Description Interface. ISO/IEC 15938, 2001.
[5] Marie-France Plassard, editor. Functional Requirements for Bibliographic Records. UBCIM Publications - New Series Vol. 19. International Federation of Library Associations and Institutions (IFLA), 1998.
[6] Raphaël Troncy, Óscar Celma, Suzanne Little, Roberto García, and Chrisa Tsinaraki. MPEG-7 based Multimedia Ontologies: Interoperability Support or Interoperability Issue? In 1st International Workshop on Multimedia Annotation and Retrieval enabled by Shared Ontologies (MAReSO), Genova, Italy, 2007.
[7] Visual Resources Association Data Standards Committee (VRA). VRA Core Categories, Version 3.0. http://www.vraweb.org/vracore3.htm, 2002.