Computing Similarity between Cultural Heritage Items using Multimodal Features

Nikolaos Aletras and Mark Stevenson
Department of Computer Science, University of Sheffield

Could the combination of textual and image features assist in similarity estimation between Cultural Heritage items? Yes. We show that making use of text and image features produces better estimates of similarity than considering only one medium.

Talk Outline: Text Similarity; Image Similarity; Combining Text and Image Similarity; Evaluation; Results; Conclusion

There is a huge amount of digitised Cultural Heritage (CH) artefacts, e.g. in the Louvre, the British Museum and Europeana. Artefacts are usually associated with some text and an image. This information is diverse and unstructured, which makes exploring and navigating collections difficult.
Solution: identify similar items in collections using text similarity and image similarity.

Text Similarity
Corpus-based approaches rely on statistics learned from a given corpus; each CH item is treated as a document.
Word Overlap: the number of tokens common to the texts associated with two items, normalised by their combined total number of tokens.
N-gram Overlap: identify the n-grams that two texts have in common and increase the score by n² for each shared n-gram of length n.
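A minimal Python sketch of the two overlap measures follows. It assumes simple whitespace tokenisation of text that has already been preprocessed.

For word overlap, the sketch divides the number of shared distinct tokens by the sum of the two texts' distinct-token counts. This is one reading of "normalised by the combined total".

For n-gram overlap, the sketch counts every shared distinct n-gram up to a hypothetical `max_n` and leaves the score unnormalised. This may differ from the original method, which might, for example, count only maximal shared phrases.

```python
def tokens(text):
    """Lower-case whitespace tokenisation (an assumption; real preprocessing also stems)."""
    return text.lower().split()


def word_overlap(text_a, text_b):
    """Shared distinct tokens, normalised by the combined number of distinct tokens."""
    a, b = set(tokens(text_a)), set(tokens(text_b))
    if not a and not b:
        return 0.0
    return len(a & b) / (len(a) + len(b))


def ngrams(toks, n):
    """Set of contiguous n-grams (as tuples) in a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def ngram_overlap(text_a, text_b, max_n=4):
    """Add n**2 for every distinct n-gram of length n shared by both texts."""
    ta, tb = tokens(text_a), tokens(text_b)
    score = 0
    for n in range(1, max_n + 1):
        shared = ngrams(ta, n) & ngrams(tb, n)
        score += len(shared) * n ** 2
    return score


print(word_overlap("the cat sat", "the cat ran"))  # 2 shared / (3 + 3) distinct tokens
print(ngram_overlap("a b c", "a b d"))             # 2 unigrams * 1 + 1 bigram * 4
```

The n² weighting means that a single shared bigram contributes as much as four shared unigrams. Longer matching phrases therefore dominate the score.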

TF.IDF: term and document frequencies are computed over a corpus of CH items.
Latent Dirichlet Allocation (LDA) (Blei et al., 2003): summarises a collection of CH items into a predefined number of topics. Each document is represented as a probability distribution over topics, and each topic is a probability distribution over words learned from the corpus. The similarity of two items is the cosine similarity of their topic distributions, treated as vectors.
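The slide does not say how TF.IDF vectors are compared. The sketch below makes three assumptions:

- Vectors are compared with cosine similarity, so the same function also serves for the LDA topic distributions.
- tf is the raw term count.
- idf = log(N / df).

The topic distributions in the last lines are made-up numbers standing in for the output of a trained LDA model.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {feature: weight} dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def tfidf_vectors(docs):
    """TF.IDF vectors for tokenised documents: raw term frequency times log(N / df)."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


corpus = [
    "roman coin silver denarius".split(),
    "silver roman coin emperor".split(),
    "steam locomotive photograph".split(),
]
vecs = tfidf_vectors(corpus)
print(cosine(vecs[0], vecs[1]))  # shares "roman", "coin" and "silver"
print(cosine(vecs[0], vecs[2]))  # no shared terms

# LDA: each item is a distribution over K topics (hypothetical values, K = 3).
topics_a = {0: 0.7, 1: 0.2, 2: 0.1}
topics_b = {0: 0.6, 1: 0.3, 2: 0.1}
print(cosine(topics_a, topics_b))
```

A term that occurs in every document gets an idf of log(1) = 0, so it does not contribute to the similarity.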

Image Similarity
RGB Histogram Intersection (Swain and Ballard, 1991): a colour histogram records the number of pixels that fall within each of a set of predefined intervals (bins). Histogram intersection measures how much two histograms overlap by summing, over all bins, the smaller of the two counts. The similarity score is the average of the red, green and blue histogram intersection scores.
Image Querying Metric (Jacobs et al., 1995): uses colour and basic shape information; implemented in the imgseek API.
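The following is a minimal sketch of the RGB measure. It assumes:

- an image is given as a list of (r, g, b) tuples with 8-bit channels;
- a hypothetical choice of 8 bins per channel;
- histograms normalised to fractions, so that images of different sizes can be compared.

The intersection of each channel pair is the sum of per-bin minima. The three channel scores are then averaged, as on the slide. Decoding real thumbnails (for example with an imaging library) is left out.

```python
def channel_histogram(pixels, channel, bins=8):
    """Fraction of pixels falling into each of `bins` equal-width intervals of one channel."""
    counts = [0] * bins
    for pixel in pixels:
        counts[pixel[channel] * bins // 256] += 1
    total = len(pixels)
    return [c / total for c in counts]


def histogram_intersection(h1, h2):
    """Intersection of two normalised histograms: the sum of per-bin minima."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def rgb_similarity(pixels_a, pixels_b, bins=8):
    """Average of the red, green and blue histogram intersections, in [0, 1]."""
    scores = [
        histogram_intersection(
            channel_histogram(pixels_a, ch, bins), channel_histogram(pixels_b, ch, bins)
        )
        for ch in range(3)
    ]
    return sum(scores) / 3


black = [(0, 0, 0)] * 4
white = [(255, 255, 255)] * 4
mixed = [(0, 0, 0), (0, 0, 0), (255, 255, 255), (255, 255, 255)]
print(rgb_similarity(black, black))  # identical images
print(rgb_similarity(black, white))  # no overlapping bins
print(rgb_similarity(black, mixed))  # half of the pixels fall in matching bins
```

Because only colour counts are compared, two images with the same palette but entirely different layouts score as identical. That limitation is one reason to also consider a measure that captures shape, such as the image querying metric.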

Combining Text and Image Similarity
The similarity between two items A and B is a weighted linear combination of their text similarity, $\mathrm{sim}_t$, and image similarity, $\mathrm{sim}_{img}$:
$$\mathrm{sim}_{T+I}(A, B) = w_1\,\mathrm{sim}_t(A_t, B_t) + w_2\,\mathrm{sim}_{img}(A_i, B_i)$$
where $A_t, B_t$ are the texts and $A_i, B_i$ the images of the two items. The weights $w_1$ and $w_2$ are optimised using standard linear regression.
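The weights can be fitted by ordinary least squares on the gold-standard scores. The sketch below follows the equation literally: it has no intercept term, although a "standard" regression might also include one. It solves the 2x2 normal equations directly, and the similarity values are invented for illustration.

```python
def fit_weights(sim_text, sim_img, gold):
    """Least-squares w1, w2 minimising sum((w1*t + w2*i - g)**2), without an intercept."""
    stt = sum(t * t for t in sim_text)
    sii = sum(i * i for i in sim_img)
    sti = sum(t * i for t, i in zip(sim_text, sim_img))
    sty = sum(t * g for t, g in zip(sim_text, gold))
    siy = sum(i * g for i, g in zip(sim_img, gold))
    det = stt * sii - sti * sti
    if det == 0:
        raise ValueError("text and image scores are collinear; weights are not identifiable")
    w1 = (sty * sii - sti * siy) / det
    w2 = (stt * siy - sti * sty) / det
    return w1, w2


def combined_similarity(w1, w2, text_score, image_score):
    """sim_{T+I}(A, B) = w1 * sim_t(A_t, B_t) + w2 * sim_img(A_i, B_i)."""
    return w1 * text_score + w2 * image_score


# Hypothetical scores for four item pairs; gold = 0.6 * text + 0.4 * image exactly.
text = [1.0, 0.0, 1.0, 0.5]
image = [0.0, 1.0, 1.0, 0.5]
gold = [0.6, 0.4, 1.0, 0.5]
w1, w2 = fit_weights(text, image, gold)
print(w1, w2)  # recovers 0.6 and 0.4 (up to floating-point rounding)
print(combined_similarity(w1, w2, 0.8, 0.2))
```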

Evaluation: Europeana
Europeana is a web portal providing access to collections of CH items from 2,000 contributors throughout Europe. It holds 20M CH artefacts, e.g. paintings, photographs, sculpture and newspaper archives. The information about each artefact includes its title, description, subject, creator and provider, together with a thumbnail image; the metadata follow an XML schema.

Evaluation Data
The Europeana295 data set contains 295 pairs of items from Culture Grid and Scran.
Textual information: title, description and subject keywords, preprocessed with stemming and stop-word removal.
Visual information: a thumbnail image (average size 7,000-10,000 pixels).
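The preprocessing step can be sketched as follows: the title, description and subject are concatenated, then stop words are removed and the remaining words are stemmed. The sketch makes these assumptions:

- The stop-word list is a tiny illustrative one.
- The record's field names are assumed.
- `crude_stem` is a deliberately simple suffix stripper standing in for a real stemmer such as Porter's.

The slides do not say which stemmer or stop list was used.

```python
import re

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "at", "in", "of", "on", "the", "to", "with"}


def crude_stem(word):
    """Strip a few common English suffixes: a rough stand-in for a real stemmer."""
    for suffix in ("ings", "ing", "ies", "es", "s", "ed"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def item_text(record, fields=("title", "description", "subject")):
    """Concatenate the textual fields of one CH item (the field names are assumptions)."""
    return " ".join(record.get(field, "") for field in fields)


def preprocess(text):
    """Lower-case, tokenise on letters/digits, drop stop words, then stem."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(w) for w in words if w not in STOP_WORDS]


record = {
    "title": "Painting of the harbour",
    "description": "Fishing boats moored in the harbour at dawn",
    "subject": "paintings; boats",
}
print(preprocess(item_text(record)))
```

Stemming lets "painting" in the title and "paintings" in the subject keywords count as the same token in the overlap measures.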

Human Judgements of Similarity
Pairs were rated through Crowdflower on a scale from 0 (unrelated) to 4 (highly similar), yielding 3,261 annotations from 99 participants. The gold standard for each pair is the average of its human ratings, and it is used to train the linear regression model. Inter-annotator agreement is computed as the average of the Pearson correlations between the ratings of each participant and the average ratings of the other participants: ρ =

Experiments
Three types of experiments were carried out: (1) text similarity measures between pairs of items, (2) image similarity measures between pairs of items, and (3) a linear combination of text and image similarities. Performance is measured as Pearson's correlation coefficient with the gold-standard data.

Results
Table: Performance of similarity measures applied to the Europeana295 data set (Pearson's correlation coefficient). Rows: the image similarity measures (RGB, imgseek) and the text similarity measures (Word Overlap, tf.idf, N-gram overlap, LDA).

Word Overlap gives the best performance among the text similarity measures. For image similarity, imgseek outperforms RGB, but both image similarity measures perform worse than all of the text-based measures. Combining any of the text similarity measures with imgseek improves its performance, whereas combining them with RGB reduces performance.

Conclusion
Information from the text and images of CH artefacts can be combined to improve similarity estimation. We combined four corpus-based and two image-based similarity measures and evaluated them on a data set of 295 manually annotated pairs of items from Europeana. The results show that the imgseek similarity method consistently improves the performance of the text similarity methods.

Thank you. For more details about the PATHS project, please visit:
Questions?

References
David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 2003.
Charles E. Jacobs, Adam Finkelstein, and David H. Salesin. Fast multiresolution image querying. In Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH 95), New York, NY, USA, 1995.
Michael J. Swain and Dana H. Ballard. Color indexing. International Journal of Computer Vision, 7:11-32, 1991.

Exploring archives with probabilistic models: Topic modelling for the European Commission Archives

Simon Hengchen, Mathias Coeckelbergs, Seth van Hooland, Ruben Verborgh & Thomas Steiner Université libre

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London

Acknowledgements Mark Stevenson, Sheffield Tim Baldwin, Melbourne Jey Han Lau, IBM Research Talk Outline Introduction

Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library

Nuno Freire Chief data officer The European Library Pacific Neighbourhood Consortium 2014 Annual

Exploiting Conversation Structure in Unsupervised Topic Segmentation for Emails

Shafiq Joty, Giuseppe Carenini, Gabriel Murray, Raymond Ng University of British Columbia Vancouver, Canada EMNLP 2010

Multimodal Medical Image Retrieval based on Latent Topic Modeling

Mandikal Vikram 15it217.vikram@nitk.edu.in Suhas BS 15it110.suhas@nitk.edu.in Aditya Anantharaman 15it201.aditya.a@nitk.edu.in Sowmya Kamath

Image Similarity Based on Direct Human Judgment

Raul Guerra Dept. of Computer Science University of Maryland College Park, MD 20742 rguerra@cs.umd.edu Abstract Recently the field of human-based computation

Links, languages and semantics: linked data approaches in The European Library and Europeana. Valentine Charles, Nuno Freire & Antoine Isaac

14th August 2014, IFLA2014 satellite meeting, Paris The European

Ranking models in Information Retrieval: A Survey

R.Suganya Devi Research Scholar Department of Computer Science and Engineering College of Engineering, Guindy, Chennai, Tamilnadu, India Dr D Manjula Professor

Multimodal Information Spaces for Content-based Image Retrieval

Research Proposal. Abstract Currently, image retrieval by content is a research problem of great interest in academia and the industry, due

MINT METADATA INTEROPERABILITY SERVICES

DIGITAL HUMANITIES SUMMER SCHOOL LEUVEN 10/09/2014 Nikolaos Simou National Technical University of Athens What is MINT? Mint is a herb having hundreds of varieties

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search

Overview Indexes Query Indexing Ranking Results Application Documents User Information analysis Query processing

Company Search When Documents are only Second Class Citizens

Daniel Blank, Sebastian Boosz, and Andreas Henrich University of Bamberg, D-96047 Bamberg, Germany, firstname.lastname@uni-bamberg.de, WWW home

Bringing Europeana and CLARIN together: Dissemination and exploitation of cultural heritage data in a research infrastructure

Twan Goosen 1 (CLARIN ERIC), Nuno Freire 2, Clemens Neudecker 3, Maria Eskevich

Implementation of a High-Performance Distributed Web Crawler and Big Data Applications with Husky

The Chinese University of Hong Kong Abstract Husky is a distributed computing system, achieving outstanding

CLUSTER ANALYSIS APPLIED TO EUROPEANA DATA

by Esra Atescelik In partial fulfillment of the requirements for the degree of Master of Computer Science Department of Computer Science VU University Amsterdam

Information Retrieval. (M&S Ch 15)

Retrieval Models A retrieval model specifies the details of: Document representation Query representation Retrieval function Determines a notion of relevance. Notion

Efficient Indexing and Searching Framework for Unstructured Data

Kyar Nyo Aye, Ni Lar Thein University of Computer Studies, Yangon kyarnyoaye@gmail.com, nilarthein@gmail.com ABSTRACT The proliferation

Fondly Collisions: Archival hierarchy and the Europeana Data Model

Valentine Charles and Kerstin Arnold 8th October 2014, DCMI2014, Austin Overview The Archives Portal Europe - Introduction Projects and

Text Document Clustering Using DPM with Concept and Feature Analysis

Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 2, Issue. 10, October 2013,

Improved Query by Image Retrieval using Multi-feature Algorithms

International Journal of Scientific & Engineering Research, Volume 4, Issue 8, August 2013 Rani Saritha R, Varghese Paul, P. Ganesh Kumar Abstract

arxiv: v1 [cs.cl] 29 Mar 2019

Re-Ranking Words to Improve Interpretability of Automatically Generated Topics Areej Alokaili 1,2, Nikolaos Aletras 1 and Mark Stevenson 1 1 University of Sheffield, United Kingdom 2 King Saud University,

UKOLN involvement in the ARCO Project. Manjula Patel UKOLN, University of Bath

Overview Work Packages User Requirements Specification ARCO Data Model Types of Requirements Museum User Trials Metadata for

IMPROVING THE PERFORMANCE OF CONTENT-BASED IMAGE RETRIEVAL SYSTEMS WITH COLOR IMAGE PROCESSING TOOLS

Fabio Costa Advanced Technology & Strategy (CGISS) Motorola 8000 West Sunrise Blvd. Plantation, FL 33322

Informativeness for Adhoc IR Evaluation:

A measure that prevents assessing individual documents Romain Deveaud 1, Véronique Moriceau 2, Josiane Mothe 3, and Eric SanJuan 1 1 LIA, Univ. Avignon, France,

Integrating Image Content and its Associated Text in a Web Image Retrieval Agent

From: AAAI Technical Report SS-97-03. Compilation copyright 1997, AAAI (www.aaai.org). All rights reserved. Victoria Meza

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and

A Miniature-Based Image Retrieval System

Md. Saiful Islam 1 and Md. Haider Ali 2 Institute of Information Technology 1, Dept. of Computer Science and Engineering 2, University of Dhaka 1, 2, Dhaka-1000,

The Sunshine State Digital Network

Keila Zayas-Ruiz, Sunshine State Digital Network Coordinator May 10, 2018 What is DPLA? The Digital Public Library of America is a free online library that provides access

A Measurement Design for the Comparison of Expert Usability Evaluation and Mobile App User Reviews

Necmiye Genc-Nayebi and Alain Abran Department of Software Engineering and Information Technology, Ecole

MSRA Columbus at GeoCLEF2007

Zhisheng Li 1, Chong Wang 2, Xing Xie 2, Wei-Ying Ma 2 1 Department of Computer Science, University of Sci. & Tech. of China, Hefei, Anhui, 230026, P.R. China zsli@mail.ustc.edu.cn

Matching Cultural Heritage items to Wikipedia

Eneko Agirre, Ander Barrena, Oier Lopez de Lacalle, Aitor Soroa, Samuel Fernando, Mark Stevenson IXA NLP Group, University of the Basque Country, Donostia,

DHTK: The Digital Humanities ToolKit

Davide Picca, Mattia Egloff University of Lausanne Abstract. Digital Humanities have the merit of connecting two very different disciplines such as humanities and computer

Multimedia Project Presentation

Exploring Europe's Television Heritage in Changing Contexts Deliverable 7.1. Euscreen in a nutshell A Best Practice Network funded by the econtentplus programme of the EU.

Content-based Image Retrieval (CBIR)

Searching a large database for images that match a query: What kinds of databases? What kinds of queries? What constitutes a match?

EUROPEANA METADATA INGESTION 20.11.2012, Helsinki, Finland

As of now, Europeana has: 22.322.604 Metadata (related to a digital record) in CC0 3.698.807 are in the Public Domain 697.031 Digital Objects

Reducing Redundancy with Anchor Text and Spam Priors

Marijn Koolen 1 Jaap Kamps 1,2 1 Archives and Information Studies, Faculty of Humanities, University of Amsterdam 2 ISLA, Informatics Institute, University

Non-negative Matrix Factorization for Multimodal Image Retrieval

Fabio A. González PhD Machine Learning 2015-II Universidad Nacional de Colombia Outline 1 The

Europeana, the prototype EDLfoundation Europeana Network Europeana, vs. 1.0 ThoughtLab Technical requirements

Europeana European cultural heritage: united in its diversity Paul Doorenbosch KB - EDL Foundation 11th Special and University Libraries Conference, Opatija, 2 April 2009

EQUELLA. Searching User Guide. Version 6.4

Document History Document No. Reviewed Finalised Published 1 19/05/2015 20/05/2015 20/05/2015 May 2015 edition. Information in this document may change without

ECLAP Kick-off An Aggregator Project for EUROPEANA

Paolo Nesi, nesi@dsi.unifi.it Europeana: The Vision A digital library that is a single, direct and multilingual access point to the European

Where Should the Bugs Be Fixed?

More Accurate Information Retrieval-Based Bug Localization Based on Bug Reports Presented by: Chandani Shrestha For CS 6704 class About the Paper and the Authors Publication

Report on Image Processing (ECE 8741) Project. Fast Multiresolution Image Querying implementation of paper by Jacobs, Finkelstein, Salesin.

Author: Keywords: wavelet-signature, multiresolution, image-search,

Europeana and the Mediterranean Region

Dov Winer Israel MINERVA Network for Digitisation of Culture MAKASH Advancing CMC in Education, Culture and Science (IL) Scientific Manager, Judaica Europeana (EAJC,

Efficient Content Based Image Retrieval System with Metadata Processing

IJIRST International Journal for Innovative Research in Science & Technology Volume 1 Issue 10 March 2015 ISSN (online): 2349-6010

HUKB at NTCIR-12 IMine-2 task: Utilization of Query Analysis Results and Wikipedia Data for Subtopic Mining

Masaharu Yoshioka Graduate School of Information Science and Technology, Hokkaido University

COLOR AND SHAPE BASED IMAGE RETRIEVAL

International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR) ISSN 2249-6831 Vol.2, Issue 4, Dec 2012 39-44 TJPRC Pvt. Ltd.

Extraction of Color and Texture Features of an Image

International Journal of Engineering Research & Management Technology ISSN: 2348-4039 July-2015 Volume 2, Issue-4 Email: editor@ijermt.org www.ijermt.org

D 4.2. Final Prototype Interface Design

Grant Agreement No. Project Acronym Project full title ICT-2009-270082 PATHS Personalised Access To cultural Heritage Spaces Authors: Mark Hall (USFD), Paula Goodale

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

Dong Han and Kilian Stoffel Information Management Institute, University of Neuchâtel Pierre-à-Mazel 7, CH-2000 Neuchâtel,

The Europeana Data Model and Europeana Libraries Robina Clayphan

27 April 2012, The British Library, London Overview 1. How delighted I am to be here 2. The Europeana Data Model What is it for? What does

Classifying Images with Visual/Textual Cues. By Steven Kappes and Yan Cao

Motivation Image search Building large sets of classified images Robotics Background Object recognition is unsolved Deformable shaped

The Europeana Data Model, current status

Carlo Meghini Europeana v1.0 WP3 Meeting Berlin, January 25-26, 2010 Outline Part I Background Requirements Status Part II The general picture Classes Properties

COAR Interoperability Roadmap. Uppsala, May 21, 2012 COAR General Assembly

Background COAR WG2's main objective for 2011-2012 was to facilitate a discussion on interoperability among Open Access repositories.

Europeana DSI 2 Access to Digital Resources of European Heritage

MILESTONE Revision 1.0 Date of submission 28.04.2017 Author(s) Krystian Adamski, Tarek Alkhaeir, Marcin Heliński, Aleksandra Nowak, Marcin

The CARARE project: modeling for Linked Open Data

Kate Fernie, MDR Partners Fagdag om modellering (seminar day on modelling), 7 March 2014 CARARE: Bringing content for archaeology and historic buildings to Europeana users When:

Mining Web Data. Lijun Zhang

zlj@nju.edu.cn http://cs.nju.edu.cn/zlj Outline Introduction Web Crawling and Resource Discovery Search Engine Indexing and Query Processing Ranking Algorithms Recommender Systems

Developing Focused Crawlers for Genre Specific Search Engines

Nikhil Priyatam Thesis Advisor: Prof. Vasudeva Varma IIIT Hyderabad July 7, 2014 Examples of Genre Specific Search Engines MedlinePlus Naukri.com

Collection management systems migration report. ARTISTE, D9.4 # Rev. A April, 2002

ARTISTE, D9.4 #905-0004528 Rev. A April, 2002 2002-04-08 Project acronym ARTISTE Contract number IST 11.978 Deliverable number D9.4 Deliverable title Collection

An aggregation system for cultural heritage content

Nasos Drosopoulos, Vassilis Tzouvaras, Nikolaos Simou, Anna Christaki, Arne Stabenau, Kostas Pardalis, Fotis Xenikoudakis, Eleni Tsalapati and Stefanos

Welcome Back to Fundamental of Multimedia (MR412) Fall, ZHU Yongxin, Winson

2012 zhuyongxin@sjtu.edu.cn Content-Based Retrieval in Digital Libraries 18.1 How Should We Retrieve Images? 18.2 C-BIRD : A

METAINFORMATION INCORPORATION IN LIBRARY DIGITISATION PROJECTS

Michael Middleton QUT School of Information Systems, Brisbane, Australia. m.middleton@qut.edu.au This paper was accepted in Poster form and

From Passages into Elements in XML Retrieval

Kelly Y. Itakura David R. Cheriton School of Computer Science, University of Waterloo 200 Univ. Ave. W. Waterloo, ON, Canada yitakura@cs.uwaterloo.ca Charles

MultiMatch. D1.4 Functional Specification of the Second Prototype

Project no. 033104 MultiMatch Technology-enhanced Learning and Access to Cultural Heritage Instrument: Specific Targeted Research Project FP6-2005-IST-5

Ch. 1.4 Histograms & Stem-&-Leaf Plots

Learning Intentions: Create a histogram & stem-&-leaf plot of a data set. Given a list of data, use a calculator to graph a histogram. Interpret histograms & stem-&-leaf

Composite Heuristic Algorithm for Clustering Text Data Sets

Nikita Nikitinsky, Tamara Sokolova and Ekaterina Pshehotskaya InfoWatch Nikita.Nikitinsky@infowatch.com, Tamara.Sokolova@infowatch.com, Ekaterina.Pshehotskaya@infowatch.com

Evaluating an Associative Browsing Model for Personal Information

Jinyoung Kim, W. Bruce Croft, David A. Smith and Anton Bakalov Department of Computer Science University of Massachusetts Amherst {jykim,croft,dasmith,abakalov}@cs.umass.edu

IsiXhosa Search Engine Development Report DEVELOPING INFORMATION RETRIEVAL SYSTEMS FOR AFRICAN LANGUAGES MICHAEL KYEYUNE

2015 KYYMIC001@MYUCT.AC.ZA Table of Contents ABSTRACT... 3 1.INTRODUCTION... 4 2.PROJECT

Achieving interoperability between the CARARE schema for monuments and sites and the Europeana Data Model

Antoine Isaac, Valentine Charles, Kate Fernie, Costis Dallas, Dimitris Gavrilis, Stavros Angelis

Europeana update: aspects of the data

Robina Clayphan, Europeana Foundation European Film Gateway Workshop, 30 May 2011, Frankfurt/Main Overview The Europeana Data Model (EDM) Data enrichment activity

Table of Contents (As covered from textbook)

Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

When Semantics support Multilingual Access to Cultural Heritage The Europeana Case. Valentine Charles and Juliane Stiller

When Semantics support Multilingual Access to Cultural Heritage The Europeana Case. Valentine Charles and Juliane Stiller When Semantics support Multilingual Access to Cultural Heritage The Europeana Case Valentine Charles and Juliane Stiller SWIB 2014, Bonn, 2.12.2014 Our outline 1. Europeana 2. Multilinguality in digital

Rough Feature Selection for CBIR. Outline

Instructor: Dr. Wojciech Ziarko; presenter: Aifen Ye; 19th Nov., 2008. Outline: Motivation; Rough Feature Selection; Image Retrieval; Image Retrieval with Rough Feature Selection

Overview of 3D Object Representations

Thomas Funkhouser, Princeton University, COS 597D, Fall 2003. 3D Object Representations: What makes a good 3D object representation? Stanford and Hearn & Baker 1 3D Object

INTRO INTO WORKING WITH MINT

TOOLS TO MAKE YOUR COLLECTIONS WIDELY VISIBLE. Berlin, 16/02/2016. Nikolaos Simou, National Technical University of Athens. What is MINT? Mint is a herb having hundreds of varieties

Topic Model Visualization with IPython

Sergey Karpovich (1), Alexander Smirnov (2,3), Nikolay Teslya (2,3), Andrei Grigorev (3). (1) Mos.ru, Moscow, Russia; (2) SPIIRAS, St. Petersburg, Russia; (3) ITMO University, St. Petersburg,

A Content Based Image Retrieval System Based on Color Features

Irena Valova, University of Rousse "Angel Kanchev", Department of Computer Systems and Technologies, Rousse, Bulgaria, Irena@ecs.ru.acad.bg; Boris

Search Engines. Information Retrieval in Practice

All slides Addison Wesley, 2008. Beyond Bag of Words: in the bag-of-words model, a document is considered to be an unordered collection of words with no relationships. Extending

Interactive Visual Text Analytics for Decision Making. Shixia Liu Microsoft Research Asia

Text is Everywhere: We use documents as primary information artifact in our lives. Our access to documents has grown

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Naïve Implementation: Convert all documents in collection D to tf-idf weighted vectors, d_j, for keyword vocabulary V. Convert

Photoshop Introduction to The Shape Tool nigelbuckner This handout is an introduction to get you started using the Shape tool.

nigelbuckner, 2008. What is a shape in Photoshop? The Shape tool makes it possible to draw

PATHS: personalising access to cultural heritage spaces

Kate Fernie, Jillian Griffiths, MDR Partners, London, UK; Mark Stevenson, Paul Clough, Paula Goodale, Mark Hall, University of Sheffield, UK; Phil

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

Myomyo Thannaing (1), Ayenandar Hlaing (2). (1,2) University of Technology (Yadanarpon Cyber City), near Pyin Oo Lwin, Myanmar. ABSTRACT

HOW USEFUL ARE COLOUR INVARIANTS FOR IMAGE RETRIEVAL?

Gerald Schaefer, School of Computing and Technology, Nottingham Trent University, Nottingham, U.K., Gerald.Schaefer@ntu.ac.uk. Abstract Keywords: The images

Europeana: from inspirational idea to sustainable service. National Conference Romania, Cluj-Napoca, 16th June. Lizzy Komen, Europeana

National Conference Romania, Cluj-Napoca, 16th June 2010. Lizzy Komen, Europeana. Content: 1. Europeana Foundation and Europeana; 2. Content Strategy

SEMILAR API 1.0. User guide. Authors: Rajendra Banjade, Dan Stefanescu, Nobal Niraula, Mihai Lintean, and Vasile Rus

WWW.SEMANTICSIMILARITY.ORG. Contact: Rajendra Banjade at rbanjade@memphis.edu. 7/29/2013

Performing searches on Érudit

Table of Contents: 1. Simple Search (p. 3); 2. Advanced search: 2.1 Running a search (p. 4), 2.2 Operators and search fields (p. 5), 2.3 Filters (p. 7); 3. Search results: 3.1 Refining your search

Document Clustering: Comparison of Similarity Measures

Shouvik Sachdeva, Bhupendra Kastore, Indian Institute of Technology, Kanpur. CS365 Project, 2014. Outline: 1 Introduction: The Problem and the Motivation

AN EFFECTIVE INFORMATION RETRIEVAL FOR AMBIGUOUS QUERY

Asian Journal of Computer Science and Information Technology 2:3 (2012) 26-30. Contents lists available at www.innovativejournal.in

Joint UNECE/Eurostat/OECD work session on statistical metadata (METIS) (Geneva, 3-5 April 2006)

WP. 20. ENGLISH ONLY. UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE, CONFERENCE OF EUROPEAN STATISTICIANS; EUROPEAN COMMISSION, STATISTICAL OFFICE OF THE EUROPEAN COMMUNITIES (EUROSTAT)

The National Digital Library Finna Among Digital Research Infrastructures in Finland

Heli Kautonen, Head of Services, The National Library of Finland. 2 March 2013. Seminar: Epics, Digital Cultural Heritage

Mahout in Action MANNING ROBIN ANIL SEAN OWEN TED DUNNING ELLEN FRIEDMAN. Shelter Island

Contents: preface xvii; acknowledgments xix; about this book xx; about multimedia extras xxiii; about the cover illustration

Evaluating Topic Representations for Exploring Document Collections

Nikolaos Aletras (corresponding author), Computer Science, University College London, nikos.aletras@gmail.com; Timothy Baldwin, Computing

Prof. Ahmet Süerdem Istanbul Bilgi University London School of Economics

Media Intelligence. Business intelligence (BI) uses data mining techniques and tools for the transformation of raw data into meaningful

Using Statistical Properties of Text to Create Metadata. Computer Science and Electrical Engineering Department

Grace Crowder (crowder@cs.umbc.edu), Charles Nicholas (nicholas@cs.umbc.edu), Computer Science and Electrical Engineering Department, University of Maryland

Metadata Topic Harmonization and Semantic Search for Linked-Data-Driven Geoportals -- A Case Study Using ArcGIS Online

Yingjie Hu (1), Krzysztof Janowicz (1), Sathya Prasad (2), and Song Gao (1). (1) STKO Lab, Department

Integration of Heterogeneous Metadata in Europeana. Cesare Concordia Institute of Information Science and Technology-CNR

Cesare Concordia, cesare.concordia@isti.cnr.it. Outline: What is Europeana; The Europeana data model; The

What is this Song About?: Identification of Keywords in Bollywood Lyrics

By Drushti Apoorva G, Kritik Mathur, Priyansh Agrawal, Radhika Mamidi, in 19th International Conference on Computational Linguistics

A Comparison of Algorithms used to measure the Similarity between two documents

Khuat Thanh Tung, Nguyen Duc Hung, Le Thi My Hanh. Abstract: Nowadays, measuring the similarity of documents plays an important

Non-negative Matrix Factorization for Multimodal Image Retrieval

Fabio A. González, PhD, Bioingenium Research Group, Computer Systems and Industrial Engineering Department, Universidad Nacional de Colombia

From The European Library to The European Digital Library. Jill Cousins Inforum, Prague, May 2007

Timeline, Past to Present: Started as TEL, a project funded by the EU and led by The British Library, now fully
