Digital Presentation and Preservation of Cultural and Scientific Heritage International Conference

Similar documents
ICDEM 2010, July 2010

MULTIMEDIA RETRIEVAL

Features for Art Painting Classification Based on Vector Quantization of MPEG-7 Descriptors

Krassimira Ivanova, Milena Dobreva, Peter Stanchev, Koen Vanhoof

Museum Collections and the Semantic Web

Local Features in APICAS Analyzing of Added Value of the Descriptors Based on MPEG-7 Vector Quantization

Image retrieval based on bag of images

Fifth International Conference INFORMATION RESEARCH AND APPLICATIONS

Multimodal Information Spaces for Content-based Image Retrieval

Content Based Image Retrieval Using Color Quantizes, EDBTC and LBP Features

Analysis of Semantic Information Available in an Image Collection Augmented with Auxiliary Data

Iliya Mitov 1, Krassimira Ivanova 1, Benoit Depaire 2, Koen Vanhoof 2

An Enhanced Image Retrieval Using K-Mean Clustering Algorithm in Integrating Text and Visual Features

Survey of Existing Services in the Mathematical Digital Libraries and Repositories in the EuDML Project

Information mining and information retrieval : methods and applications

General Image Database Model

A Content Based Image Retrieval System Based on Color Features

Mobile Application for Conference Series

Content Based Image Retrieval with Semantic Features using Object Ontology

Features for Art Painting Classification based on Vector Quantization of MPEG-7 Descriptors

Automatic Image Annotation by Classification Using Mpeg-7 Features

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Binary Histogram in Image Classification for Retrieval Purposes

Performance Enhancement of an Image Retrieval by Integrating Text and Visual Features

International Journal of Modern Trends in Engineering and Research e-issn No.: , Date: 2-4 July, 2015

Efficient Content Based Image Retrieval System with Metadata Processing

PERFORMANCE EVALUATION OF ONTOLOGY AND FUZZYBASE CBIR

Improving Quality of Products in Hard Drive Manufacturing by Decision Tree Technique

Basic Concepts Weka Workbench and its terminology

An Efficient Semantic Image Retrieval based on Color and Texture Features and Data Mining Techniques

AN EFFICIENT BATIK IMAGE RETRIEVAL SYSTEM BASED ON COLOR AND TEXTURE FEATURES

Query Disambiguation from Web Search Logs

Image Retrieval System Based on Sketch

Structure of Association Rule Classifiers: a Review

COMPARISON OF SOME CONTENT-BASED IMAGE RETRIEVAL SYSTEMS WITH ROCK TEXTURE IMAGES

Data Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.

Query-Sensitive Similarity Measure for Content-Based Image Retrieval

Building OWL Ontology of Unique Bulgarian Bells Using Protégé Platform

Graph-based Automatic Suggestion of Relationships among Images of Illuminated Manuscripts

ONTOLOGY MODELLING OF MULTIMODALITY IMAGE RETRIEVAL SYSTEM FOR SPORT NEWS DOMAIN

Image Retrieval Based on its Contents Using Features Extraction

A Method for Construction of Orthogonal Arrays 1

EFFECTIVE METHODS FOR ORGANIZATION AND PRESENTATION OF DIGITIZED CULTURAL HERITAGE

Discovering the Association Rules in OLAP Data Cube with Daily Downloads of Folklore Materials *

An Object-Oriented HLA Simulation Study

Automatic Categorization of Image Regions using Dominant Color based Vector Quantization

Topic Classification in Social Media using Metadata from Hyperlinked Objects

An Empirical Study of Lazy Multilabel Classification Algorithms

Efficient Image Retrieval Using Indexing Technique

Using standards to make sense of data

A Framework for Source Code metrics

Consistent Line Clusters for Building Recognition in CBIR

Image Similarity Measurements Using Hmok- Simrank

FRACTAL DIMENSION BASED TECHNIQUE FOR DATABASE IMAGE RETRIEVAL

Graph Matching Iris Image Blocks with Local Binary Pattern

Optimal Manufacturing Scheduling for Dependent Details Processing

Improving the Efficiency of Fast Using Semantic Similarity Algorithm

Semantic Search in Heterogeneous Digital Repositories: Case Studies

Analysis of Image and Video Using Color, Texture and Shape Features for Object Identification

Programming Languages Research Programme

Semantic Video Indexing

ISSN: , (2015): DOI:

ImgSeek: Capturing User s Intent For Internet Image Search

70 IEEE TRANSACTIONS ON MULTIMEDIA, VOL. 6, NO. 1, FEBRUARY ClassView: Hierarchical Video Shot Classification, Indexing, and Accessing

Dynamic Optimization of Generalized SQL Queries with Horizontal Aggregations Using K-Means Clustering

SEMANTIC AND ABSTRACTION CONTENT OF ART IMAGES

Content Based Image Retrieval in Database of Segmented Images

Session 2 A virtual Observatory for TerraSAR-X data

A Comparative Study of Selected Classification Algorithms of Data Mining

An Efficient Methodology for Image Rich Information Retrieval

A Design-for-All Approach Towards Multimodal Accessibility of Mathematics

Text Extraction in Video

Multicriteria Image Thresholding Based on Multiobjective Particle Swarm Optimization

IMAGE RETRIEVAL SYSTEM: BASED ON USER REQUIREMENT AND INFERRING ANALYSIS TROUGH FEEDBACK

1. Introduction: Why Cultural Heritage and Archival Collections on the Semantic Web?

A Text-Based Approach to the ImageCLEF 2010 Photo Annotation Task

Image Classification Using Wavelet Coefficients in Low-pass Bands

LOAD BALANCING IN MOBILE INTELLIGENT AGENTS FRAMEWORK USING DATA MINING CLASSIFICATION TECHNIQUES

Holistic Correlation of Color Models, Color Features and Distance Metrics on Content-Based Image Retrieval

Improving 3D Shape Retrieval Methods based on Bag-of Feature Approach by using Local Codebooks

A Tactile/Haptic Interface Object Reference Model

Content-Based Image Retrieval of Web Surface Defects with PicSOM

This tutorial has been prepared for computer science graduates to help them understand the basic-to-advanced concepts related to data mining.

Offering Access to Personalized Interactive Video

Quantifying and Assessing the Merge of Cloned Web-Based System: An Exploratory Study

DIGITAL LIBRARY AND SEARCH ENGINE OF BULGARIAN FOLKLORE SONGS

Clustering Technique with Potter stemmer and Hypergraph Algorithms for Multi-featured Query Processing

New wavelet based ART network for texture classification

Content based Image Retrieval Using Multichannel Feature Extraction Techniques

A Study on Low Level Features and High

Content Based Image Retrieval (CBIR) Using Segmentation Process

A Method for the Identification of Inaccuracies in Pupil Segmentation

A Novel Image Retrieval Method Using Segmentation and Color Moments

EFFICIENT INTEGRATION OF SEMANTIC TECHNOLOGIES FOR PROFESSIONAL IMAGE ANNOTATION AND SEARCH

TEXT PREPROCESSING FOR TEXT MINING USING SIDE INFORMATION

Bipartite Graph Partitioning and Content-based Image Clustering

AUTOMATIC IMAGE ANNOTATION AND RETRIEVAL USING THE JOINT COMPOSITE DESCRIPTOR.

Tilburg University. Authoritative re-ranking of search results Bogers, A.M.; van den Bosch, A. Published in: Advances in Information Retrieval

XML Data in (Object-) Relational Databases

Extending the Facets concept by applying NLP tools to catalog records of scientific literature

Transcription:

Radoslav Pavlov Peter Stanchev Editors Digital Presentation and Preservation of Cultural and Scientific Heritage International Conference Veliko Tarnovo, Bulgaria September 18 21, 2013 Proceedings Volume III Institute of Mathematics and Informatics BAS, Sofia, 2013

Proceedings of the Third International Conference Digital Presentation and Preservation of Cultural and Scientific Heritage DiPP2013, Veliko Tarnovo, Bulgaria, September 18 21, 2013 (Volume III) Radoslav Pavlov, Peter Stanchev (Eds) Desislava Paneva-Marinova Copy Editor Lubomil Draganov Cover Design This work is subject to copyright. Permission to make digital or hard copies of portions of this work for personal or classroom use is granted without fee, provided that the copies are not made or distributed for profit or commercial advantage and that the copies bear this notice and the full citation on the first page. Copyright for components of this work owned by others must be honored. Abstracting with credit is permitted. To otherwise reproduce or transmit in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage retrieval system or in any other way requires written permission from the publisher. ISSN: 1314-4006 Editors, authors of papers, 2013 Published by Institute of Mathematics and Informatics BAS Printed in Bulgaria by Demetra Ltd. Sofia, 2013

Access via CEEOL NL Germany Query Enrichment for Image Collections by Reuse of Classification Rules Nicolas Spyratos 1, Peter Stanchev 2,3, Krasimira Ivanova 2, Iliya Mitov 2 1 Université Paris-Sud, France 2 Institute of Mathematics and Informatics, Bulgarian Academy of Sciences, Sofia, Bulgaria 3 Kettering University, Flint, USA Nicolas.Spyratos@lri.fr,pstanche@kettering.edu, kivanova@math.bas.bg,imitov@math.bas.bg Abstract. User queries over image collections, based on semantic similarity, can be processed in several ways. In this paper, we propose to reuse the rules produced by rule-based classifiers in their recognition models as query pattern definitions for searching image collections. Keywords: Query Reuse, Rule-Based Classification, Image Collections 1 Introduction Reuse is often described as "not reinventing the wheel" [6]. The process of reusing is inherent to informatics. The classical software reuse has grown out of the macros and subroutines libraries of the 1960s when the assemblers started to offer the possibilities for predefined macros to generate the call and return sequences, using the support for both in-line and separately assembled sequences of codes that could be linked together. The development of this idea has crystallized as a main principle of today's objectoriented programming. Current languages, based on this principle, are building vast collections of reusable software objects and components. Source code, components, development artifacts, patterns, templates all are possible to reuse [1]. In recent years, the process of information reuse has enlarged its scope from program code to data content, using existing content components to create new documents [8] and user interaction. 2 Features of Rule-based Classifier In the family of rule-based classifiers fall groups of decision trees, decision rules, association rules. One of the distinctive features of the rule-based classifiers is that they form a human comprehensive recognition model [9]. As a result of the learning phase of such classifiers we receive a set of rules that characterize class labels. In spite of the decision trees specifics their recognition model, based on split-andconquer techniques, can easily be transformed into a set of rules. In the decision rules Digital Presentation and Preservation of Cultural and Scientific Heritage, Vol. 3, 2013, ISSN: 1314-4006

the learned model is represented as a set of IF-THEN rules, produced on the basis of a depth-first induction strategy. Association rules show stable relations between attribute-value pairs that occur frequently in a given dataset, strong associations between frequent patterns (conjunctions of attribute-value pairs) and class labels. In many cases these classifiers give good recognition results and the produced rules can be used as profiles of corresponding class-labels. 3 Using the Rules from Recognition Model as Query Patterns for Searching Image Collections Image retrieval is an extension to traditional information retrieval. Approaches to image retrieval are somehow derived from conventional information retrieval and are designed to manage the more versatile and enormous amount of visual data which exist. The distinctive characteristic of image collections is that the images can be searched not only by textual metadata, but also on the basis of theirs content. The problem with Content-based Image Retrieval is the so called Semantic gap the lack of coincidence between the information that one can extract from the visual data and the interpretation that the same data has for a user in a given situation [7]. The semantic gap is larger in visual arts images than in natural images since artworks are most often not perfectly realistic. In simple terms, the semantic gap in contentbased retrieval stems from the fact that multimedia data is captured by devices in a format, which is optimized for storage and very simple retrieval, and cannot be used to understand what the object "mean". In addition to this, user queries are based on semantic similarity, but the computer usually processes similarity based on low-level feature similarity. To bridge this gap, text annotations by humans were used in conjunction with low-level features of the objects. Another way is to try to build higher level concepts that are comprehensive by humans, on the one hand, and based on the processing of low level features, on the other hand. In this process great importance have the categorization algorithms that allow the system "learning" how to make these decisions. The classification on a test dataset in an image collection using low-level attribute space using rule-based classifiers can produce quite good recognition results for some high-level semantic class-labels. Using MPEG-7 descriptors [4] one can achieve good recognition accuracy for indoors-outdoors, scene types, artists' practices, etc. In some image collection, we can use a set of produced rules in the recognition model as semantic profiles of corresponding class-labels and include these sets as patterns in the query module, using the set of rules as disjunctive-conjunctive sequence of conditions, and naming them with the name of class-label. In this way the user operates with well-known high-level concepts and this saves him the trouble of understanding and analysing the low-level features, captured by the image analysis. In practice, the low-level features tightly correspond to the "abstract space" of the image content [3], which is connected with the aspects that are specific to art images and reflect cultural influences, specific techniques as well as emotional responses evoked by an image [5]. This fact gives the possibilities to search patterns that define 122

more abstract concepts such as "like/dislike", "exciting/boring", or "relaxing/irritating". Though emotions can be affected by various factors like gender, age, culture, background, etc. and are considered as high-level cognitive processes, they still have certain stability and generality across different people and cultures, which enable researchers to generalize their proposed methodologies from limited samples given a sufficiently large number of observers [10]. If we have quite enough representative part of the observers that are tagged learning set of the images, the produced rules from recognition process can be used to form pattern's queries for the emotional concepts that can be searched as expression from the images. 4 Example The experiments are made for a small collection of 600 images representing different movements in West-European fine arts Renaissance, Baroque, Romanticism and Impressionism. The MPEG-7 descriptors for each image are calculated. We have used Dominant Color, Scalable Color, Color Layout, Color Structure, Edge Histogram and Homogeneous Texture MPEG-7 descriptors. The low-level visual information consists of 339 values named with A1 to A339. A part of the images are also labeled with different high level semantic information, such as indoor/outdoor, scene type, artists' name, movement. In the observed case 120 images are labeled with the movement in which their techniques belong. We provide 10-fold cross-validation over this learning dataset using BFTree Classifier [2] and as a result we receive 86.67% classification accuracy. The recognition model consists of the following tree: A64 < 9.5 A4 < -25.0: Romanticism A4 >= -25.0: Baroque A64 >= 9.5 A88 < 0.5 A23 < 2.5 A114 < 3.0: Romanticism A114 >= 3.0: Impressionism A23 >= 2.5 A206 < 1.5: Romanticism A206 >= 1.5: Renaissance A88 >= 0.5 A11 < -7.5: Renaissance A11 >= -7.5: Impressionism As query patterns we can put the following sets of rules with the assumption that the result can contain about 14% false answers, because of the fuzziness of the induction algorithms (Table 1). 123

Table 1. Set of rules using in our example Query Name Renaissance like Baroque like Romanticism like Impressionism like Search Pattern (A64>=9.5) and (A88<0.5) and (A23>=2.5) and (A206>=1.5) or (A64>=9.5) and (A88>=0.5) and (A11<-7.5) (A64<9.5) and (A4>=-25.0) (A64<9.5) and (A4<-25.0) or (A64>=9.5) and (A88<0.5) and (A23<2.5) and (A114<3.0) or (A64>=9.5) and (A88<0.5) and (A23>=2.5) and (A206<1.5) (A64>=9.5) and (A88<0.5) and (A23<2.5) and (A114>=3.0) or (A64>=9.5) and (A88>=0.5) and (A11>=-7.5) The user can operate with the queries in a high semantic level, for instance to ask "Which of the images in the collections seem to belong in the Renaissance period", choosing by the names of the input query patterns "Renaissance like". The system will process the low-level visual MPEG-7 features to decide which of the paintings answer the rules conditions, written on the right side of the corresponding query pattern. 5 Conclusion Current digital spaces contain huge amounts of information, with only 1% of the Web data being in textual form, while the rest is of multimedia/streaming nature. There is a clear need to extend the next-generation search tools to accommodate these heterogeneous media. The new search engines must combine search according to textual information or other attributes associated with the files, with the ability of extracting information from the content, which is the scope of action of Content- Based Image Retrieval (CBIR). The satisfaction of user queries, which are are based on semantic similarity can be achieved in at least three ways: (1) by supplying text annotations of the digital items by humans, which is a long and hard process and cannot be done for the huge amount of image information available on web; (2) by trying to annotate automatically with concepts that are comprehensive by humans, based on the processing of low level features using different categorization algorithms; or (3) by using some advantages of the previous step dynamically, i.e. by not making an annotation in advance and storing metadata, which are not sure that will be used, but by using the query patterns that are formed as a result of previous test annotation that shows enough accurate recognition and use them only when the user query affects the defined concept. 124

In this article we have discussed a variant of this last way showing the possibility to reuse the rules produced by rule-based classifiers in their recognition models as query pattern definitions for searching image collections. Acknowledgment This work was supported in part by the project "VISUAL: Semantic Retrieval in Art collections", No: 01/14 from 21.06.2013, funded by Bulgarian-French programme for scientific cooperation RILA. References 1. Ambler, S.: A Realistic Look at Object-Oriented Reuse. Jan., 1998, http://www.drdobbs.com/a-realistic-look-at-object-oriented-reus/184415594 2. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I: The WEKA Data Mining Software: An Update; SIGKDD Explorations, 11 (1), 2009 3. Hurtut, T.: 2D Artistic Images Analysis, a Content-based Survey, 2010, http://hal.archivesouvertes.fr/hal-00459401_v2/ 4. International Standard ISO/IEC 15938-3 Multimedia Content Description Interface Part 3: Visual, 5. http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=34230 6. Ivanova, K.: A Novel Method for Content-Based Image Retrieval in Art Image Collections Utilizing Colour Semantics. PhD Thesis, Hasselt University, Belgium, 2011. 7. Raymond, E.: The Art of UNIX Programming. Chapter 16 Reuse. Pearson Education, Inc., 2003, on-line version under the CCA available at: 8. http://homepage.cs.uri.edu/~thenry/resources/unix_art/index.html 9. Smeulders, A., Worring, M., Santini, S., Gupta, A., Jain, R.: Content based image retrieval at the end of the early years. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(12), 2000, pp. 1349 1380 10. Tuan, L.A.: Accessing and Using Complex Multimedia Documents in a Digital Library, Ph.D. thesis the Laboratoire de Recherche en Informatique, Université Paris-Sud, July 2013 11. Zaiane, O., Antonie, M.-L.: On pruning and tuning rules for associative classifiers. In Proc. of Int. Conf. on Knowledge-Based Intelligence Information & Engineering Systems, LNCS, Vol. 3683, 2005, pp. 966 973 12. Zhang, H., Augilius, E., Honkela, T., Laaksonen J., Gamper H., Alene H.: Analyzing Emotional Semantics of Abstract Art Using Low-Level Image Features. Advances in Intelligent Data Analysis X, Springer LNCS 7014, 2011, pp. 413-423 125