SYSTEMS FOR SEARCHING MULTIMEDIA DATA
1 SYSTEMS FOR SEARCHING MULTIMEDIA DATA Claudio Lucchese and Salvatore Orlando. Third Department Workshop, Dipartimento di Informatica, Università degli studi di Venezia
2 I thank the following people, involved in the SAPIR project: Raffaele Perego, Fausto Rabitti, Paolo Bolettieri, Fabrizio Falchi, Tommaso Piccioli, Matteo Mordacchini
3 WHAT AND, MORE IMPORTANTLY, WHY? Multimedia (MM) objects: text and web pages (Google, Yahoo!, ...); images (Flickr, bought by Yahoo!, ...); video (YouTube, bought by Google, ...); audio; etc. Why are we interested in MM objects? There is 1 image for every 10 documents, and multimedia content on the web keeps increasing. Yet search in audio-visual content is limited to associated text and metadata annotations.
4 OK, BUT GOOGLE ALREADY DOES THE JOB... We mean search by content! Google does text-based (annotation) search. Do we have enough text? What is the quality of such text? Is it correlated with the actual content? Look at this: the ISTI-CNR (Pisa) and Max Planck Kunsthistorisches Institut project on Florentine coats of arms.
5 CONTENT-BASED SEARCH Shape selection. Results.
6 CONTENT-BASED SEARCH Shape selection. Results.
7 WHAT IS THE ADDED VALUE? Content-based search for disambiguation: what about searching for "sapphire" on Flickr?
8 CONTENT-BASED, I.E. SIMILARITY SEARCH Objects are unknown; only the distances between objects are known. Metric space assumption: symmetry, identity, triangle inequality. Distance functions inducing a metric space: Minkowski distances, edit distance, Jaccard distance... Typical queries: range or k-NN.
9 SIMILARITY SEARCH Applications: Photo Search.
10 SIMILARITY SEARCH Applications: Photo Search, Medical Data Search.
11 SIMILARITY SEARCH Applications: Photo Search, Medical Data Search, 3D Shape Search (query: haemoglobin molecule).
12 SIMILARITY SEARCH Objects are unknown; only the distances between objects are known. Metric space assumption: symmetry, identity, triangle inequality. Distance functions inducing a metric space: Minkowski distances, edit distance, Jaccard distance... Typical queries: range or k-NN. Applications: photos, 3D shapes, medical images, but also text, DNA, graphs, etc.
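The two query types named on the slide can be illustrated with a brute-force scan. This is a minimal sketch, not the authors' system: the function names and the toy `photos` feature vectors are made up for illustration, and a real index would avoid comparing the query against every object.

```python
import heapq
from typing import Callable, Sequence


def minkowski(p: float) -> Callable[[Sequence[float], Sequence[float]], float]:
    """Return the Minkowski (L_p) distance for p >= 1; p = 2 is Euclidean."""
    def distance(x, y):
        return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
    return distance


def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|; two empty sets are at distance 0."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def range_query(data, q, r, d):
    """All objects within distance r of the query q (linear scan)."""
    return [o for o in data if d(q, o) <= r]


def knn_query(data, q, k, d):
    """The k objects closest to q, nearest first (linear scan)."""
    return heapq.nsmallest(k, data, key=lambda o: d(q, o))


# Hypothetical 2-D feature vectors standing in for image descriptors.
euclidean = minkowski(2)
photos = [(0, 0), (3, 4), (1, 1), (6, 8)]
in_range = range_query(photos, (0, 0), 5.5, euclidean)
nearest = knn_query(photos, (0, 0), 2, euclidean)
```

Both functions only ever call `d`, which is the point of the metric-space view: the objects themselves stay opaque, and any distance satisfying the three axioms can be plugged in.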
13 FEATURE-BASED APPROACH From the object space to the feature space.
14 DATA STRUCTURES Vantage Point Tree. Generalized Hyper-plane Tree.
15 DATA STRUCTURES Vantage Point Tree. Generalized Hyper-plane Tree. Multiple sub-trees may hold the solution. They perform well with high dimensionality data.
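A small vantage point tree shows why multiple sub-trees may hold the solution. The sketch below is a textbook-style variant written for illustration (random vantage point, median split); the class and function names are assumptions, not code from the presentation.

```python
import math
import random


class VPNode:
    """One node: a vantage object, a split radius mu, and two sub-trees."""
    def __init__(self, vantage, mu, inside, outside):
        self.vantage = vantage
        self.mu = mu
        self.inside = inside    # objects with d(vantage, o) <= mu
        self.outside = outside  # objects with d(vantage, o) > mu


def build_vp_tree(points, d, rng=None):
    """Recursively split the points around the median distance to a vantage."""
    rng = rng or random.Random(0)
    points = list(points)
    if not points:
        return None
    vantage = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(vantage, 0.0, None, None)
    pairs = [(d(vantage, p), p) for p in points]
    mu = sorted(dist for dist, _ in pairs)[len(pairs) // 2]
    inside = [p for dist, p in pairs if dist <= mu]
    outside = [p for dist, p in pairs if dist > mu]
    return VPNode(vantage, mu,
                  build_vp_tree(inside, d, rng),
                  build_vp_tree(outside, d, rng))


def vp_range_search(node, q, r, d, out=None):
    """Collect every object within distance r of q, pruning with the
    triangle inequality."""
    if out is None:
        out = []
    if node is None:
        return out
    dq = d(q, node.vantage)
    if dq <= r:
        out.append(node.vantage)
    # The query ball can reach objects with d(vantage, o) in [dq - r, dq + r].
    if dq - r <= node.mu:       # ball overlaps the inner region
        vp_range_search(node.inside, q, r, d, out)
    if dq + r > node.mu:        # ball overlaps the outer region
        vp_range_search(node.outside, q, r, d, out)
    return out
```

When the query ball straddles the split radius `mu`, both conditions hold and both sub-trees are visited, which is the case the slide warns about.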
16 I HAVE NEVER SEEN ANYTHING LIKE THAT ON THE WEB!!! Why are giants like Google and Yahoo! not using content-based search? Recent studies confirm that centralized solutions are not scalable! A single standard PC would need about 12 years to process a collection of 100 million images. Why? Feature extraction is expensive (feature extraction vs. words in a web page)! Searching is expensive (similarity search vs. boolean search)!
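The 12-year figure can be turned into a per-image cost with simple arithmetic. The sketch below assumes a 365-day year and, for the scaling estimate, perfectly linear speed-up across machines; neither assumption comes from the slide, which does not say what "process" includes.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year

images = 100_000_000   # collection size quoted on the slide
years_single_pc = 12   # processing time quoted on the slide

# Implied average cost of handling one image on a single PC.
seconds_per_image = years_single_pc * SECONDS_PER_YEAR / images


def years_with_machines(n_machines: int) -> float:
    """Idealised time with n machines, assuming perfectly linear scaling."""
    return years_single_pc / n_machines


print(f"{seconds_per_image:.2f} s per image on one PC")
print(f"{years_with_machines(100) * 12:.2f} months with 100 machines")
```

Under these assumptions the quoted figure corresponds to a few seconds per image, which is why the work has to be spread over many machines.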
18 PARALLEL FEATURE EXTRACTION We crawled Flickr to gather image IDs. A set of clients was deployed on the EGEE Grid; each client: retrieves 1000 image IDs; downloads each image; extracts MPEG-7 features; parses the Flickr photo page to obtain additional metadata; sends the metadata to our centralized repository. We have about 50 million images with features. It seems that Flickr has at least 1 billion images and Yahoo! Images at least 2 billion, and they grow exponentially! Our metadata collection is called CoPhIR. It is by far the largest publicly available collection.
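The client steps listed on the slide can be sketched as a small worker loop. Everything here is hypothetical: `download`, `extract_features`, `parse_metadata` and `send` are placeholders passed in by the caller, and a thread pool stands in for the Grid jobs actually used.

```python
from concurrent.futures import ThreadPoolExecutor


def process_image(image_id, download, extract_features, parse_metadata):
    """Run the per-image steps: download, extract features, gather metadata."""
    image = download(image_id)
    return {
        "id": image_id,
        "features": extract_features(image),   # e.g. MPEG-7 descriptors
        "metadata": parse_metadata(image_id),  # e.g. photo-page fields
    }


def run_client(image_ids, download, extract_features, parse_metadata,
               send, workers=4):
    """Process one batch of image IDs in parallel and ship each record
    to the central repository through the caller-supplied send()."""
    def job(image_id):
        return process_image(image_id, download, extract_features,
                             parse_metadata)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order of image_ids.
        for record in pool.map(job, image_ids):
            send(record)
```

Passing the four steps in as functions keeps the sketch independent of any particular crawler or feature library; a batch of 1000 IDs, as on the slide, would simply be the `image_ids` argument.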
19 PARALLEL SEARCH The search space is mapped into a linear interval. This is mapped onto a distributed network thanks to some DHT-based algorithm. Each node of the network holds an M-Tree.
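The slide does not say which mapping is used. One well-known way to map a metric space onto a line is a pivot-based (iDistance-style) key, sketched below purely as an assumption: each object is keyed by its closest pivot and its distance to that pivot, and the key range is then cut into contiguous intervals, one per node. The names `linear_key`, `node_for_key`, `pivots`, `c` and `boundaries` are illustrative.

```python
import bisect
import math


def linear_key(obj, pivots, d, c):
    """Map obj to a number: i * c + d(pivot_i, obj) for the closest pivot i.
    Assumes c is larger than any distance that can occur."""
    i, dist = min(((i, d(p, obj)) for i, p in enumerate(pivots)),
                  key=lambda t: t[1])
    return i * c + dist


def node_for_key(key, boundaries):
    """Index of the node owning the key, given sorted interval boundaries.
    Node 0 owns keys below boundaries[0], node 1 the next interval, etc."""
    return bisect.bisect_right(boundaries, key)


# Toy setting: two pivots, keys in [0, 40), four nodes.
pivots = [(0, 0), (10, 10)]
c = 20
boundaries = [10, 20, 30]
key_a = linear_key((1, 0), pivots, math.dist, c)
key_b = linear_key((10, 7), pivots, math.dist, c)
owner_a = node_for_key(key_a, boundaries)
owner_b = node_for_key(key_b, boundaries)
```

Objects close to the same pivot get nearby keys and therefore tend to land on the same or neighbouring nodes, where a local index such as the M-Tree named on the slide could answer the query.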
20 OUR PROPOSAL Metric cache: caching is widely used in traditional search engines. Trivial: store the results of previous queries; return the results when the submitted query is stored. Less trivial: store the results of previous queries; best-effort query answering (with guarantees); use past queries to optimize database queries.
21 A CACHE WITH ONE ENTRY... $q_i$ is the query in cache with its k-NN. $r_i$ is the radius of the query. $q_j$ is a new query. If $d(q_i, q_j) < r_i$, then the cached objects within distance $R_{ji} = r_i - d(q_i, q_j)$ from $q_j$ are guaranteed to be the nearest neighbours of the new query (its top-$k'$ results, with $k' \le k$).
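The one-entry rule can be sketched in a few lines. This is an illustration under assumptions, not the authors' implementation: `r_i` is taken to be the distance from the cached query to its farthest cached result, the function name is made up, and a strict comparison is used so that ties at distance exactly `r_i` cannot break the guarantee.

```python
def answer_from_cached_query(q_new, q_cached, cached_results, d):
    """Return the cached objects that are provably nearest neighbours of
    q_new, nearest first; an empty list means the cache cannot help."""
    r_i = max(d(q_cached, o) for o in cached_results)  # radius of cached k-NN
    gap = d(q_cached, q_new)
    if gap >= r_i:
        return []                    # q_new falls outside the cached ball
    safe_radius = r_i - gap          # R_ji on the slide
    # By the triangle inequality, any object closer than safe_radius to
    # q_new is closer than r_i to q_cached, hence already in the cache.
    hits = [o for o in cached_results if d(q_new, o) < safe_radius]
    return sorted(hits, key=lambda o: d(q_new, o))


# Toy 1-D example: the database is the integers 0..10 (as floats).
distance = lambda a, b: abs(a - b)
cached_knn_of_5 = [5.0, 4.0, 6.0, 3.0, 7.0]   # 5-NN of the cached query 5.0
partial = answer_from_cached_query(5.5, 5.0, cached_knn_of_5, distance)
miss = answer_from_cached_query(8.0, 5.0, cached_knn_of_5, distance)
```

In the toy example the cache can certify only two results for the new query; the remaining ones would have to come from the database or from an approximate answer.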
22 A CACHE WITH MANY ENTRIES Given the new query $q_x$, find the largest $R_{xi}$ corresponding to a cached query $q_i$. The cached objects within distance $R_{xi}$ from $q_x$ are the most similar to the query. Additionally, one could use other objects in the cache to provide an approximate answer.
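With several cached queries, the same idea picks the entry that gives the largest safe radius. The sketch below makes its own assumptions: the cache is a plain list of `(query, results)` pairs, objects are hashable, and the approximate answer is simply the `k` closest objects found anywhere in the cache.

```python
def best_safe_radius(q_x, cache, d):
    """Return (R_xi, results) for the cached query maximising
    R_xi = r_i - d(q_x, q_i), or None for an empty cache."""
    best = None
    for q_i, results in cache:
        r_i = max(d(q_i, o) for o in results)
        radius = r_i - d(q_x, q_i)
        if best is None or radius > best[0]:
            best = (radius, results)
    return best


def answer_from_cache(q_x, cache, d, k):
    """Return (exact, approx): results guaranteed correct, and a best-effort
    top-k built from every object stored in the cache."""
    by_distance = lambda o: (d(q_x, o), o)   # break ties deterministically
    best = best_safe_radius(q_x, cache, d)
    exact = []
    if best is not None and best[0] > 0:
        radius, results = best
        exact = sorted((o for o in results if d(q_x, o) < radius),
                       key=by_distance)[:k]
    pool = {o for _, results in cache for o in results}
    approx = sorted(pool, key=by_distance)[:k]
    return exact, approx


distance = lambda a, b: abs(a - b)
cache = [
    (5.0, [5.0, 4.0, 6.0, 3.0, 7.0]),
    (20.0, [20.0, 19.0, 21.0, 18.0, 22.0]),
]
exact, approx = answer_from_cache(5.5, cache, distance, k=3)
```

The exact part carries the guarantee from the safe radius; the approximate part may miss objects that were never cached, which is the best-effort trade-off described on the earlier slide.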
23 SOME RESULTS...
24 SOME RESULTS...
25 SOME RESULTS...
26 THE END. Thank you! =)