SYSTEMS FOR MULTIMEDIA DATA SEARCH

1 SYSTEMS FOR MULTIMEDIA DATA SEARCH
Claudio Lucchese and Salvatore Orlando
Third Department Workshop, Dipartimento di Informatica, Università degli Studi di Venezia

2 I thank the following people, involved in the SAPIR project: Raffaele Perego, Fausto Rabitti, Paolo Bolettieri, Fabrizio Falchi, Tommaso Piccioli, Matteo Mordacchini.

3 WHAT AND, MORE IMPORTANTLY, WHY?
Multimedia (MM) objects:
  text and web pages (Google, Yahoo!, ...)
  images (Flickr, bought by Yahoo!, ...)
  video (YouTube, bought by Google, ...)
  audio, etc.
Why are we interested in MM objects?
There is 1 image for every 10 documents.
Multimedia content on the web is increasing.
Yet search in audio-visual content is limited to associated text and metadata annotations.

4 OK, BUT GOOGLE ALREADY DOES THE JOB...
We mean to search by content!
Google does text-based search (on annotations).
Do we have enough text? What is the quality of such text? Is it correlated with the actual content?
Look at this: the ISTI-CNR (Pisa) and Max Planck Kunsthistorisches Institut project on Florentine coats of arms.

5-6 CONTENT-BASED SEARCH
Figure panels: Shape Selection; Results.

7 WHAT IS THE ADDED VALUE?
Content-based search for disambiguation: what about searching for "sapphire" on Flickr?

8-12 CONTENT-BASED, I.E. SIMILARITY SEARCH
Objects are unknown (treated as black boxes); only the distances between objects are known.
Metric space assumption: symmetry, identity, triangle inequality.
Distance functions inducing a metric space: Minkowski distances, edit distance, Jaccard distance, ...
Typical queries: range or k-NN.
Applications: photo search, medical data search, 3D shape search (e.g. a query for a haemoglobin molecule), but also text, DNA, graphs, etc.
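The two query types on this slide can be illustrated with a brute-force scan. This is only a minimal sketch: the function names and the sample points are made up for illustration, and a real system would use an index instead of scanning every object.

```python
import heapq
import math

def euclidean(a, b):
    """Minkowski distance of order 2 between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |a & b| / |a | b|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)

def range_query(objects, dist, query, radius):
    """Return every object whose distance from `query` is at most `radius`."""
    return [o for o in objects if dist(query, o) <= radius]

def knn_query(objects, dist, query, k):
    """Return the k objects closest to `query`, nearest first."""
    return heapq.nsmallest(k, objects, key=lambda o: dist(query, o))

# Illustrative data only.
points = [(0, 0), (1, 1), (3, 4), (6, 8)]
print(range_query(points, euclidean, (0, 0), 5.0))
print(knn_query(points, euclidean, (0, 0), 2))
```

Both queries only ever call `dist`, which is the point of the metric-space view: the same code works for vectors, sets or strings as long as a metric is supplied.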

13 FEATURE-BASED APPROACH
From the object space to the feature space.
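As a toy illustration of moving from the object space to the feature space, the sketch below maps an image, represented here simply as a list of grey levels, to a normalised histogram and compares two histograms with the L1 distance. The histogram feature and the function names are assumptions chosen for brevity; they are not the features used in the talk.

```python
def gray_histogram(pixels, bins=4):
    """Map a list of 0-255 grey levels to a normalised histogram (the feature vector)."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts] if total else [0.0] * bins

def l1_distance(u, v):
    """Minkowski distance of order 1 between two feature vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Two tiny made-up "images": one dark, one bright.
dark = [10, 20, 30, 40]
bright = [200, 220, 240, 250]
print(l1_distance(gray_histogram(dark), gray_histogram(bright)))
```

Once every object is reduced to such a vector, similarity search operates on the vectors and never needs the original images again.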

14-15 DATA STRUCTURES
Vantage Point Tree and Generalized Hyper-plane Tree.
Multiple sub-trees may hold the solution.
They perform well with high-dimensional data.
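A minimal vantage point tree with range search might look as follows. This is a sketch under simple assumptions (random vantage points, median split, Euclidean points), not the implementation behind the slides. It shows why multiple sub-trees may hold the solution: the triangle inequality prunes a branch only when the query ball cannot reach it, otherwise both branches are visited.

```python
import math
import random

def euclidean(a, b):
    return math.dist(a, b)

class VPNode:
    def __init__(self, point, radius, inside, outside):
        self.point = point      # vantage point
        self.radius = radius    # median distance from the vantage point
        self.inside = inside    # sub-tree of points closer than `radius`
        self.outside = outside  # sub-tree of points at distance >= `radius`

def build(points, dist, rng=None):
    """Recursively split the points around a randomly chosen vantage point."""
    rng = rng or random.Random(0)
    if not points:
        return None
    points = list(points)
    vp = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vp, 0.0, None, None)
    dists = [dist(vp, p) for p in points]
    median = sorted(dists)[len(dists) // 2]
    inside = [p for p, d in zip(points, dists) if d < median]
    outside = [p for p, d in zip(points, dists) if d >= median]
    return VPNode(vp, median, build(inside, dist, rng), build(outside, dist, rng))

def range_search(node, dist, query, r, out=None):
    """Collect all points within distance r of query."""
    out = [] if out is None else out
    if node is None:
        return out
    d = dist(query, node.point)
    if d <= r:
        out.append(node.point)
    if d - r < node.radius:    # the query ball overlaps the inner region
        range_search(node.inside, dist, query, r, out)
    if d + r >= node.radius:   # the query ball overlaps the outer region
        range_search(node.outside, dist, query, r, out)
    return out

data = [(0.1, 0.1), (0.2, 0.8), (0.5, 0.5), (0.9, 0.2), (0.55, 0.45)]
tree = build(data, euclidean)
print(sorted(range_search(tree, euclidean, (0.5, 0.5), 0.1)))
```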

16-17 I HAVE NEVER SEEN ANYTHING LIKE THAT ON THE WEB!
Why are giants like Google and Yahoo! not using content-based search?
Recent studies confirm that centralized solutions are not scalable: a single standard PC would need about 12 years to process a collection of 100 million images.
Why?
Feature extraction is expensive (compare extracting features with reading the words in a web page).
Searching is expensive (compare similarity search with boolean search).
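The 12-year figure can be turned into a per-image cost with simple arithmetic. The sketch below assumes 365-day years and perfect parallel speed-up; the 30-day target is a hypothetical example, not a number from the talk.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600
IMAGES = 100_000_000
YEARS_ON_ONE_PC = 12

# Average processing time per image implied by the slide's figures.
seconds_per_image = YEARS_ON_ONE_PC * SECONDS_PER_YEAR / IMAGES

def machines_needed(target_days):
    """Machines required to finish in `target_days`, assuming ideal speed-up."""
    total_days = YEARS_ON_ONE_PC * 365
    return math.ceil(total_days / target_days)

print(f"{seconds_per_image:.2f} s per image")
print(machines_needed(30), "machines for a 30-day run")
```

Under these assumptions each image costs a little under 4 seconds, and finishing within a month would take well over a hundred machines.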

18 PARALLEL FEATURE EXTRACTION
We crawled Flickr to gather image IDs.
A set of clients was deployed on the EGEE Grid. Each client:
  retrieves 1000 image IDs
  downloads each image
  extracts MPEG-7 features
  parses the Flickr photo page to obtain additional metadata
  sends the metadata to our centralized repository
We have about 50 million images with features.
It seems that Flickr has at least 1 billion images and Yahoo! Images at least 2 billion, and they grow exponentially!
Our metadata collection is called CoPhIR. It is by far the largest publicly available collection.
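The client loop described above can be sketched with a thread pool standing in for the grid. Everything here is hypothetical: `download_image` and `extract_features` are stand-ins that fabricate bytes and numbers locally, and no real Flickr or MPEG-7 API is called. Only the batch size of 1000 comes from the slide.

```python
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 1000  # each client retrieves 1000 image IDs at a time

def batches(ids, size=BATCH_SIZE):
    """Split the crawled ID list into client-sized work units."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

def download_image(image_id):
    # Hypothetical stand-in for the real download step.
    return bytes([image_id % 256]) * 8

def extract_features(image_bytes):
    # Hypothetical stand-in for MPEG-7 feature extraction.
    return [b / 255 for b in image_bytes[:4]]

def process_batch(id_batch):
    """What one client does with its batch: download, extract, package."""
    records = []
    for image_id in id_batch:
        image = download_image(image_id)
        records.append({"id": image_id, "features": extract_features(image)})
    return records

def run_clients(all_ids, workers=4):
    """Run the clients in parallel and gather records in one repository."""
    repository = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for records in pool.map(process_batch, batches(all_ids)):
            repository.extend(records)
    return repository

repo = run_clients(list(range(2500)))
print(len(repo), "records collected")
```

The batches are independent of each other, which is what makes the extraction step easy to spread over many machines.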

19 PARALLEL SEARCH
The search space is mapped into a linear interval.
The interval is mapped onto a distributed network by means of a DHT-based algorithm.
Each node of the network holds an M-Tree.
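One common way to obtain such a linear mapping is a pivot-based key: each object is assigned to its closest pivot and its key is the pivot's offset plus the distance to that pivot. The sketch below follows that idea as an assumption; the slide does not say which mapping was actually used. The pivots, the constant `C`, the node boundaries and the function names are all hypothetical, a sorted list of boundaries stands in for the DHT lookup, and the per-node M-Tree is not modelled.

```python
import bisect
import math

PIVOTS = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical reference objects
C = 2.0                            # constant larger than any distance in the unit square
NODE_BOUNDS = [1.0, 2.0, 3.0]      # four nodes owning [0,1), [1,2), [2,3), [3,4)

def linear_key(obj):
    """Map an object to a key on the interval [0, len(PIVOTS) * C)."""
    dists = [math.dist(obj, p) for p in PIVOTS]
    i = min(range(len(PIVOTS)), key=dists.__getitem__)
    return i * C + dists[i]

def node_for(key):
    """Stand-in for the DHT lookup: which node owns this key."""
    return bisect.bisect_right(NODE_BOUNDS, key)

def nodes_for_range(query, radius):
    """Nodes that may hold objects within `radius` of `query`.

    By the triangle inequality, an object assigned to pivot p can match only
    if its distance to p lies in [d(query, p) - radius, d(query, p) + radius].
    """
    nodes = set()
    for i, p in enumerate(PIVOTS):
        d = math.dist(query, p)
        lo = i * C + max(0.0, d - radius)
        hi = i * C + min(C, d + radius)
        if lo <= hi:
            nodes.update(range(node_for(lo), node_for(hi) + 1))
    return nodes

print(node_for(linear_key((0.2, 0.1))), nodes_for_range((0.4, 0.6), 0.15))
```

Only the returned nodes need to be contacted; each of them would then run the range query on its local index.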

20 OUR PROPOSAL
Metric cache: caching is widely used in traditional search engines.
Trivial:
  store the results of previous queries
  return the results when the submitted query is already stored
Less trivial:
  store the results of previous queries
  best-effort query answering (with guarantees)
  use past queries to optimize database queries

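21 A CACHE WITH ONE ENTRY...
$q_i$ is the query in cache, stored with its k-NN results. $r_i$ is the radius of that query, i.e. the distance from $q_i$ to its k-th nearest neighbour. $q_j$ is a new query.
If $d(q_i, q_j) < r_i$, then the cached objects within distance
$R_{ji} = r_i - d(q_i, q_j)$
from $q_j$ are the top results of the new query. By the triangle inequality, the ball of radius $R_{ji}$ around $q_j$ lies entirely inside the ball of radius $r_i$ around $q_i$, whose content is fully known from the cache.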
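The one-entry case can be sketched in a few lines. This is an illustrative reading of the slide rather than the authors' code: the class and function names are invented, distances are Euclidean, and the database is a handful of made-up points. A strict comparison against the safe radius is used so that ties with uncached objects cannot affect the answer.

```python
import math

def knn(database, query, k):
    """Exact k-NN by brute force, nearest first."""
    return sorted(database, key=lambda o: math.dist(query, o))[:k]

class CachedQuery:
    def __init__(self, query, results):
        self.query = query                           # q_i
        self.results = results                       # its k-NN, nearest first
        self.radius = math.dist(query, results[-1])  # r_i

def answer_from_cache(entry, new_query):
    """Cached objects that are guaranteed nearest neighbours of new_query."""
    safe_radius = entry.radius - math.dist(entry.query, new_query)  # R_ji
    if safe_radius <= 0:
        return []  # the new query is too far away: no guarantee at all
    hits = [o for o in entry.results if math.dist(new_query, o) < safe_radius]
    return sorted(hits, key=lambda o: math.dist(new_query, o))

# Made-up database; the cached query asked for 4 neighbours of (0, 0).
db = [(0, 0), (1, 0), (0, 2), (3, 0), (0, 4), (5, 5)]
entry = CachedQuery((0, 0), knn(db, (0, 0), 4))
print(answer_from_cache(entry, (0.5, 0)))
```

The cache may return fewer than k objects, but every object it returns is a true nearest neighbour of the new query, obtained without touching the database.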

22 A CACHE WITH MANY ENTRIES
Given the new query $q_x$, find the largest $R_{xi}$ over the cached queries $q_i$.
The cached objects within distance $R_{xi}$ from $q_x$ are the most similar to the query.
Additionally, one could use other objects in the cache to provide an approximate answer.
Figure labels: cached queries $q_i$, $q_j$, $q_l$ with radii $r_i$, $r_j$, $r_l$, and the new query $q_x$.
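With several cached queries the same idea applies to the entry giving the largest safe radius, and the remaining cached objects can pad the answer approximately. The sketch below is one possible reading under stated assumptions: Euclidean distance, a dictionary per cache entry, and invented names throughout.

```python
import math

def knn(database, query, k):
    """Exact k-NN by brute force, nearest first."""
    return sorted(database, key=lambda o: math.dist(query, o))[:k]

def make_entry(database, query, k):
    """Build a cache entry: the query, its k-NN and the query radius."""
    results = knn(database, query, k)
    return {"query": query, "results": results,
            "radius": math.dist(query, results[-1])}

def answer(cache, new_query, k):
    """Return (exact, approximate) parts of a best-effort k-NN answer."""
    def to_query(o):
        return math.dist(new_query, o)

    # Largest safe radius over all cached queries.
    best_r = max(e["radius"] - math.dist(e["query"], new_query) for e in cache)
    # All distinct cached objects, nearest to the new query first.
    pool = sorted({o for e in cache for o in e["results"]}, key=to_query)
    exact = [o for o in pool if to_query(o) < best_r][:k]
    approximate = [o for o in pool if o not in exact][:k - len(exact)]
    return exact, approximate

# Made-up database and cache content.
db = [(0, 0), (1, 0), (0, 2), (3, 0), (0, 4), (5, 5), (4, 4)]
cache = [make_entry(db, (0, 0), 4), make_entry(db, (5, 5), 2)]
print(answer(cache, (0.5, 0), 4))
```

The first list carries the guarantee from the slide; the second is only a best guess built from whatever else the cache happens to contain.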

23-25 SOME RESULTS...

26 THE END. Thank you! =)


More information

Using the data in the archive

Using the data in the archive Using the data in the archive Jacquelijn Ringersma The Language Archive Max Planck Institute for Psycholinguistics DGfS-CNRS Summer School on Linguistic Typology A very rich archive A very rich archive

More information

AST initiative 3 AST principles and goals 4 Model problems 11

AST initiative 3 AST principles and goals 4 Model problems 11 Università degli Studi dell Aquila Henry Muccini Dipartimento di Informatica www.henrymuccini.com University of L Aquila - Italy henry.muccini@di.univaq.it AST 2011, 6th IEEE/ACM ICSE workshop on Automation

More information

Search and Time Series Databases

Search and Time Series Databases Università degli Studi di Roma Tor Vergata Dipartimento di Ingegneria Civile e Ingegneria Informatica Search and Time Series Databases Corso di Sistemi e Architetture per Big Data A.A. 2016/17 Valeria

More information

Dependent Variable Independent Variable dependent variable : independent variable function: domain ran ge

Dependent Variable Independent Variable dependent variable : independent variable function: domain ran ge FUNCTIONS The values of one variable often depend on the values for another: The temperature at which water boils depends on elevation (the boiling point drops as you go up). The amount by which your savings

More information

DL User Interfaces. Giuseppe Santucci Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza

DL User Interfaces. Giuseppe Santucci Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza DL User Interfaces Giuseppe Santucci Dipartimento di Informatica e Sistemistica Università di Roma La Sapienza Delos work on DL interfaces Delos Cluster 4: User interfaces and visualization Cluster s goals:

More information

Overview of Web Mining Techniques and its Application towards Web

Overview of Web Mining Techniques and its Application towards Web Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous

More information

Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems

Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems Differential Compression and Optimal Caching Methods for Content-Based Image Search Systems Di Zhong a, Shih-Fu Chang a, John R. Smith b a Department of Electrical Engineering, Columbia University, NY,

More information

A New Context Based Indexing in Search Engines Using Binary Search Tree

A New Context Based Indexing in Search Engines Using Binary Search Tree A New Context Based Indexing in Search Engines Using Binary Search Tree Aparna Humad Department of Computer science and Engineering Mangalayatan University, Aligarh, (U.P) Vikas Solanki Department of Computer

More information

Non-Bayesian Classifiers Part I: k-nearest Neighbor Classifier and Distance Functions

Non-Bayesian Classifiers Part I: k-nearest Neighbor Classifier and Distance Functions Non-Bayesian Classifiers Part I: k-nearest Neighbor Classifier and Distance Functions Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Fall 2017 CS 551,

More information

SEMANTIC WEB DATA MANAGEMENT. from Web 1.0 to Web 3.0

SEMANTIC WEB DATA MANAGEMENT. from Web 1.0 to Web 3.0 SEMANTIC WEB DATA MANAGEMENT from Web 1.0 to Web 3.0 CBD - 21/05/2009 Roberto De Virgilio MOTIVATIONS Web evolution Self-describing Data XML, DTD, XSD RDF, RDFS, OWL WEB 1.0, WEB 2.0, WEB 3.0 Web 1.0 is

More information

Clustering Billions of Images with Large Scale Nearest Neighbor Search

Clustering Billions of Images with Large Scale Nearest Neighbor Search Clustering Billions of Images with Large Scale Nearest Neighbor Search Ting Liu, Charles Rosenberg, Henry A. Rowley IEEE Workshop on Applications of Computer Vision February 2007 Presented by Dafna Bitton

More information

Lecture 8: Internet and Online Services. CS 598: Advanced Internetworking Matthew Caesar March 3, 2011

Lecture 8: Internet and Online Services. CS 598: Advanced Internetworking Matthew Caesar March 3, 2011 Lecture 8: Internet and Online Services CS 598: Advanced Internetworking Matthew Caesar March 3, 2011 Demands of modern networked services Old approach: run applications on local PC Now: major innovation

More information

Gene Clustering & Classification

Gene Clustering & Classification BINF, Introduction to Computational Biology Gene Clustering & Classification Young-Rae Cho Associate Professor Department of Computer Science Baylor University Overview Introduction to Gene Clustering

More information

Lecture 7: Decision Trees

Lecture 7: Decision Trees Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...

More information

OLAP Introduction and Overview

OLAP Introduction and Overview 1 CHAPTER 1 OLAP Introduction and Overview What Is OLAP? 1 Data Storage and Access 1 Benefits of OLAP 2 What Is a Cube? 2 Understanding the Cube Structure 3 What Is SAS OLAP Server? 3 About Cube Metadata

More information

Toward a Unified View of Data and Services

Toward a Unified View of Data and Services Semantic Data & Service Integration Workshop,Berlin, Toward a Unified View of Data and Services Carlo Batini Matteo Palmonari Andrea Maurino University of Milan-Bicocca Italy Sonia Bergamaschi Francesco

More information

Information Retrieval and Web Search

Information Retrieval and Web Search Information Retrieval and Web Search Course overview Instructor: Rada Mihalcea What is this course about? Processing Indexing Retrieving textual data (or audio, video, geo-spatial,, data) Fits in four

More information

The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System

The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System The Stanford/Technicolor/Fraunhofer HHI Video Semantic Indexing System Our first participation on the TRECVID workshop A. F. de Araujo 1, F. Silveira 2, H. Lakshman 3, J. Zepeda 2, A. Sheth 2, P. Perez

More information

Cluster Analysis. Angela Montanari and Laura Anderlucci

Cluster Analysis. Angela Montanari and Laura Anderlucci Cluster Analysis Angela Montanari and Laura Anderlucci 1 Introduction Clustering a set of n objects into k groups is usually moved by the aim of identifying internally homogenous groups according to a

More information

Constructing Overlay Networks through Gossip

Constructing Overlay Networks through Gossip Constructing Overlay Networks through Gossip Márk Jelasity Università di Bologna Project funded by the Future and Emerging Technologies arm of the IST Programme The Four Main Theses 1: Topology (network

More information

Peer-to-Peer Similarity Search in Metric Spaces

Peer-to-Peer Similarity Search in Metric Spaces Peer-to-Peer Similarity Search in Metric Spaces Christos Doulkeridis, Akrivi Vlachou, Yannis Kotidis, Michalis Vazirgiannis http://www.db-net.aueb.gr/cdoulk/ cdoulk@aueb.gr Department of Informatics Athens

More information

Shapes & Transformations and Angles & Measurements Spatial Visualization and Reflections a.) b.) c.) d.) a.) b.) c.)

Shapes & Transformations and Angles & Measurements Spatial Visualization and Reflections a.) b.) c.) d.) a.) b.) c.) Chapters 1 & 2 Team Number Name Shapes & Transformations and Angles & Measurements 1.2.1 Spatial Visualization and Reflections 1-47. d.) 1-48. 1-49. 1-50. 1-51. d.) 1-52. On the axes at right, graph the

More information