Hierarchical Link Analysis for Ranking Web Data
1 Hierarchical Link Analysis for Ranking Web Data Renaud Delbru, Nickolai Toupikov, Michele Catasta, Giovanni Tummarello, and Stefan Decker Digital Enterprise Research Institute, Galway June 1, 2010
2 Introduction Web of Data: There is a growing number of web data sources: the Linked Open Data cloud; the Open Graph protocol; e-commerce (GoodRelations), e-government, ... How to search and retrieve relevant information? A single query can return millions of entities, and users expect only the most relevant ones. Web data search engines (e.g., Sindice) need an effective way to rank entities. Partial solution: popularity-based entity ranking. 1 / 36
3 Link Analysis on the Web Link Analysis: Given a directed graph, determine the popularity of its nodes using link information. A link from a node i to a node j is considered evidence of the importance of node j. Link Analysis for Web Documents: PageRank considers exclusively the link structure; Hierarchical Link Analysis considers both the link structure and the hierarchical structure. Link Analysis for Web Data: Current approaches consider exclusively the link structure. Sindice: dataset/entity-centric view. 2 / 36
8 Outline: Web Data Model Web Data Model: Web Data Graph; Dataset Graph; Internal and External Node; Intra and Inter-Dataset Edge; Linkset; Two-Layer Model; Quantifying the Two-Layer Model. 3 / 36
9 Web Data Graph Figure: Web data graph 4 / 36
10 Dataset Graph Figure: Dataset graph 5 / 36
11 Internal and External Node Figure: Internal (red) and external nodes (blue) 6 / 36
12 Intra and Inter-Dataset Edge Figure: Inter-dataset (orange) and intra-dataset (black) edges 7 / 36
13 Linkset Figure: Linkset 8 / 36
14 Two-Layer Model Figure: Two-layer model of the Web of Data 9 / 36
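The step from the entity layer to the dataset layer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a linkset is taken to be the group of edges that share one label and connect the same ordered pair of datasets, and the entity identifiers, dataset names and the `build_linksets` helper are all hypothetical.

```python
from collections import Counter

def build_linksets(triples, dataset_of):
    """Aggregate entity-level edges into linksets.

    Returns a Counter keyed by (label, source_dataset, target_dataset)
    whose values are the number of edges in that linkset.
    """
    linksets = Counter()
    for subject, label, obj in triples:
        src, dst = dataset_of[subject], dataset_of[obj]
        if src != dst:  # intra-dataset edges stay in the bottom (entity) layer
            linksets[(label, src, dst)] += 1
    return linksets

# Hypothetical toy data: four entities spread over three datasets.
dataset_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
triples = [
    ("a1", "knows", "a2"),     # intra-dataset edge
    ("a1", "sameAs", "b1"),    # inter-dataset edge A -> B
    ("a2", "sameAs", "b1"),    # inter-dataset edge A -> B
    ("b1", "seeAlso", "c1"),   # inter-dataset edge B -> C
]
linksets = build_linksets(triples, dataset_of)
```

Here the two `sameAs` edges from dataset A to dataset B collapse into a single linkset of size 2, which becomes one edge of the dataset graph.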
15 Quantifying the two-layer model
Datasets:
DBpedia: 17.7 million entities
Citeseer (RKBExplorer): 2.48 million entities
Geonames: 13.8 million entities
Sindice: 60 million entities among datasets

Dataset    Intra           Inter
DBpedia    88M (93.2%)     6.4M (6.8%)
Citeseer   12.9M (77.7%)   3.7M (22.3%)
Geonames   59M (98.3%)     1M (1.7%)
Sindice    287M (78.8%)    77M (21.2%)
Table: Ratio of intra-dataset to inter-dataset links
10 / 36
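The percentages in the table follow directly from the link counts. The short check below recomputes them; the `link_ratio` helper is hypothetical and the counts are the rounded values (in millions) from the table.

```python
def link_ratio(intra, inter):
    """Return the (intra %, inter %) share of links, rounded to one decimal."""
    total = intra + inter
    return round(100 * intra / total, 1), round(100 * inter / total, 1)

# Link counts in millions, as listed in the table.
counts = {
    "DBpedia": (88, 6.4),
    "Citeseer": (12.9, 3.7),
    "Geonames": (59, 1),
    "Sindice": (287, 77),
}
ratios = {name: link_ratio(intra, inter) for name, (intra, inter) in counts.items()}
```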
16 Outline: The DING Model The DING Model: Overview; Unsupervised Link Weighting; Computing DatasetRank; Computing Local EntityRank; Combining Dataset Rank and Entity Rank. 11 / 36
17 The DING Model: Overview DING Principles: DING performs entity ranking in three steps: (1) dataset ranks are computed by performing link analysis on the top layer (i.e. the dataset graph); (2) for each dataset, entity ranks are computed by performing link analysis on the local entity collection; (3) the popularity of the dataset is propagated to its entities and combined with their local ranks to estimate a global entity rank. 12 / 36
21 Unsupervised Link Weighting
Intuition: TF-IDF applied to link labels, i.e. Link Frequency - Inverse Dataset Frequency (LF-IDF).
Link weighting factor $w_{\sigma,i,j}$: assigns a low weight to very common links, such as rdfs:seeAlso.
$$ w_{\sigma,i,j} = LF(L_{\sigma,i,j}) \times IDF(\sigma) = \frac{|L_{\sigma,i,j}|}{\sum_{L_{\tau,i,k}} |L_{\tau,i,k}|} \times \log \frac{N}{1 + freq(\sigma)} $$
13 / 36
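A minimal sketch of the LF-IDF weighting is given below. It assumes that N is the total number of datasets and that freq(σ) counts the datasets in which the label σ occurs on an outgoing linkset; the slide does not spell out either definition, so both are assumptions, as are the function name and the toy linksets.

```python
import math
from collections import defaultdict

def lf_idf_weights(linksets, num_datasets):
    """Compute w[(label, src, dst)] = LF * IDF for every linkset.

    linksets maps (label, source_dataset, target_dataset) to the linkset size.
    """
    outgoing = defaultdict(int)   # total size of all linksets leaving each dataset
    used_by = defaultdict(set)    # datasets using each label (assumed meaning of freq)
    for (label, src, _dst), size in linksets.items():
        outgoing[src] += size
        used_by[label].add(src)

    weights = {}
    for (label, src, dst), size in linksets.items():
        lf = size / outgoing[src]
        idf = math.log(num_datasets / (1 + len(used_by[label])))
        weights[(label, src, dst)] = lf * idf
    return weights

# Hypothetical linksets in a collection of 10 datasets.
example_linksets = {
    ("rdfs:seeAlso", "A", "B"): 8,
    ("dc:creator", "A", "C"): 2,
    ("rdfs:seeAlso", "B", "C"): 5,
    ("rdfs:seeAlso", "C", "A"): 4,
}
weights = lf_idf_weights(example_linksets, num_datasets=10)
```

In this toy collection `rdfs:seeAlso` is used by three datasets and `dc:creator` by one, so the IDF factor of `dc:creator` (log 5) is larger than that of `rdfs:seeAlso` (log 2.5). Note that with this formula the IDF factor becomes negative once a label occurs in nearly every dataset, which a real implementation would have to handle.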
27 Computing Dataset Rank
Assumption: dataset surfing behaviour is the same as the web page surfing behaviour in PageRank.
DatasetRank: weighted PageRank on the weighted dataset graph. The distribution factor $w_{\sigma,i,j}$ is defined by LF-IDF. The probability of a random jump is proportional to the size of a dataset.
$$ r^{k}(D_j) = \alpha \sum_{L_{\sigma,i,j}} r^{k-1}(D_i)\, w_{\sigma,i,j} + (1 - \alpha) \frac{|E_{D_j}|}{\sum_{D \subset G} |E_D|} $$
19 / 36
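The DatasetRank iteration can be sketched as a plain power iteration. The version below is an illustration rather than the authors' implementation: it assumes positive link weights, normalises the outgoing weights of each dataset so that they sum to 1, and renormalises the ranks after every step. None of these choices is stated on the slide, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict

def dataset_rank(weights, sizes, alpha=0.85, iterations=100):
    """Weighted PageRank over the dataset graph.

    weights maps (label, src, dst) to a positive link weight;
    sizes maps each dataset to its number of entities |E_D|.
    """
    total_entities = sum(sizes.values())
    # Random jump probability proportional to dataset size.
    jump = {d: n / total_entities for d, n in sizes.items()}

    out_total = defaultdict(float)
    for (_label, src, _dst), w in weights.items():
        out_total[src] += w

    rank = dict(jump)
    for _ in range(iterations):
        nxt = {d: (1 - alpha) * jump[d] for d in sizes}
        for (_label, src, dst), w in weights.items():
            if out_total[src] > 0:
                nxt[dst] += alpha * rank[src] * w / out_total[src]
        norm = sum(nxt.values())
        rank = {d: r / norm for d, r in nxt.items()}
    return rank

# Hypothetical graph: A and B link to C, C links back to A, nothing links to B.
sizes = {"A": 100, "B": 100, "C": 100}
toy_weights = {("p", "A", "C"): 1.0, ("p", "B", "C"): 1.0, ("p", "C", "A"): 1.0}
ranks = dataset_rank(toy_weights, sizes)
```

With equal dataset sizes the ordering is driven by the links alone: C receives rank from two datasets, A from one, and B only from random jumps.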
28 Computing Local EntityRank Generic Algorithms: Weighted EntityRank: weighted PageRank applied to the internal entities and intra-dataset links of a dataset. Weighted LinkCount: weighted in-degree, i.e. a count of incoming links, computed over the internal entities and intra-dataset links of a dataset. 20 / 36
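Weighted LinkCount is simple enough to show in full. The sketch below sums the weights of incoming intra-dataset links per entity and, as an assumption not stated on the slide, normalises the scores so that they sum to 1 within the dataset; the function name and the sample edges are hypothetical.

```python
def weighted_link_count(entities, intra_edges):
    """Weighted in-degree of each internal entity of one dataset.

    intra_edges is an iterable of (source, target, weight) tuples whose
    endpoints both belong to the dataset.
    """
    score = {entity: 0.0 for entity in entities}
    for _source, target, weight in intra_edges:
        score[target] += weight
    total = sum(score.values())
    if total == 0:
        return score
    return {entity: s / total for entity, s in score.items()}

local_ranks = weighted_link_count(
    ["a", "b", "c"],
    [("a", "b", 1.0), ("c", "b", 0.5), ("b", "a", 0.5)],
)
```

A single pass over the edges is enough, which is why this variant needs no iteration.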
29 Combining Dataset Rank and Entity Rank
Naive approach: purely probabilistic point of view, using the joint probability under the assumption of independent events.
Global score: $r^{g}(e) = P(e \cap D) = r(e)\, r(D)$
Problem: favours smaller datasets.
DING approach: add a local entity rank factor; normalise local ranks to the same average based on dataset size.
$$ r^{g}(e) = r(D)\, r(e)\, \frac{|E_D|}{\sum_{D' \subset G} |E_{D'}|} $$
21 / 36
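A small worked example shows why the naive product favours small datasets and how the size factor corrects it. The numbers are hypothetical: two datasets with the same dataset rank (0.5), one with 10 entities and one with 1000, each with uniform local ranks that sum to 1.

```python
def naive_global_rank(dataset_rank, local_rank):
    """Joint probability under the independence assumption."""
    return dataset_rank * local_rank

def ding_global_rank(dataset_rank, local_rank, dataset_size, total_entities):
    """Naive score rescaled by the dataset's share of all entities."""
    return dataset_rank * local_rank * dataset_size / total_entities

total = 10 + 1000
small_naive = naive_global_rank(0.5, 1 / 10)
large_naive = naive_global_rank(0.5, 1 / 1000)
small_ding = ding_global_rank(0.5, 1 / 10, 10, total)
large_ding = ding_global_rank(0.5, 1 / 1000, 1000, total)
```

Under the naive score an average entity of the small dataset outranks an average entity of the large one by a factor of 100, purely because its local rank is shared among fewer entities. With the size factor both average entities receive the same global score.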
30 Outline: Experimental Results Experimental Results: Overview; User Study; SemSearch 2010. 22 / 36
31 Experimental Results: Overview Link Analysis Methods: Global EntityRank (GER); Local LinkCount (LLC) and Local EntityRank (LER); local algorithms combined with DatasetRank (DR-LLC and DR-LER). Experiments: (1) user study to qualitatively evaluate each method; (2) Semantic Search challenge. 23 / 36
32 User Study: Design
Exp-A: Local entity ranking (LER & LLC) on the DBpedia dataset; 31 participants
Exp-B: DING (DR-LER & DR-LLC) on Sindice's page repository; 58 participants
Task: 10 queries (keyword and SPARQL queries); one result list (top-10) per algorithm; rate algorithms (W = worse, SW = slightly worse, S = similar, SB = slightly better, B = better) in relation to GER
24 / 36
33 User Study: Questionnaire Figure: One of the questionnaires given to the participants 25 / 36
34 User Study A: Results
Table: Chi-square test for Exp-A, for (a) LER and (b) LLC. Columns: Rate, O_i, E_i, %χ²; rows: B, SB, S, SW, W, Totals. The column %χ² gives, for each modality, its contribution to χ² (in relative value).
Conclusion: LER and LLC provide results similar to those of GER. However, a more significant proportion of the population considers LER, rather than LLC, similar to GER. 26 / 36
35 User Study B: Results
Table: Chi-square test for Exp-B, for (a) DR-LER and (b) DR-LLC. Columns: Rate, O_i, E_i, %χ²; rows: B, SB, S, SW, W, Totals. The column %χ² gives, for each modality, its contribution to χ² (in relative value).
Conclusion: DR-LLC appears to be more effective. A large proportion of the population finds it slightly better than GER, and this is reinforced by the small number of people who find it worse. 27 / 36
36 SemSearch 2010: Entity Search Track SemSearch 2010 First semantic search evaluation; Focus on entity search. Experiment Design Billion Triple Challenge 2009 dataset; 92 keyword queries; Relevance judgement on top 10 entities. 28 / 36
37 SemSearch 2010: Experiment Results Figure: SemSearch 2010 evaluation results 29 / 36
38 Scalability: Computing Dataset Rank
Graph      Nodes   Edges
Web Data   60M     364M
Dataset    50K     1.2M
Table: Graph size
DatasetRank: 1 iteration takes 200 ms; a good-quality rank is obtained in a few seconds. 30 / 36
39 Scalability: Dataset size distribution Power-law distribution; the majority of the datasets contain fewer than 1000 nodes. 31 / 36
40 Scalability: Computing Entity Rank EntityRank: 55 iterations of 1 minute each (for the DBpedia dataset). LinkCount: requires only 1 iteration; can be computed on the fly with an appropriate data index. 32 / 36
41 Dataset-Dependent Local EntityRank
Dataset-Specific Algorithms: there is no reason to have one generic algorithm for all datasets; we could choose an appropriate entity ranking algorithm for each dataset.
Graph Structure       Dataset                 Algorithm
Generic, Controlled   DBpedia                 LinkCount
Generic, Open         Social Communities      EntityRank
Hierarchical          Geonames, Taxonomies    DHC
Bipartite             DBLP                    CiteRank
Table: List of various graph structures with appropriate algorithms
33 / 36
44 Conclusion DING Method: hierarchical link analysis for web data; quality comparable to or even better than standard approaches; lower computational complexity; dataset-dependent local entity ranking. Future Work: investigate how to detect the appropriate local entity ranking method for a dataset; study query-dependent ranking and how it can be combined with DING ranking. 36 / 36
More informationIntroduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.
Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p. 6 What is Web Mining? p. 6 Summary of Chapters p. 8 How
More informationLink Analysis in the Cloud
Cloud Computing Link Analysis in the Cloud Dell Zhang Birkbeck, University of London 2017/18 Graph Problems & Representations What is a Graph? G = (V,E), where V represents the set of vertices (nodes)
More informationDiffusion and Clustering on Large Graphs
Diffusion and Clustering on Large Graphs Alexander Tsiatas Thesis Proposal / Advancement Exam 8 December 2011 Introduction Graphs are omnipresent in the real world both natural and man-made Examples of
More information60-538: Information Retrieval
60-538: Information Retrieval September 7, 2017 1 / 48 Outline 1 what is IR 2 3 2 / 48 Outline 1 what is IR 2 3 3 / 48 IR not long time ago 4 / 48 5 / 48 now IR is mostly about search engines there are
More informationAuthoritative K-Means for Clustering of Web Search Results
Authoritative K-Means for Clustering of Web Search Results Gaojie He Master in Information Systems Submission date: June 2010 Supervisor: Kjetil Nørvåg, IDI Co-supervisor: Robert Neumayer, IDI Norwegian
More informationA short introduction to the development and evaluation of Indexing systems
A short introduction to the development and evaluation of Indexing systems Danilo Croce croce@info.uniroma2.it Master of Big Data in Business SMARS LAB 3 June 2016 Outline An introduction to Lucene Main
More informationSig.ma: live views on the Web of Data
Sig.ma: live views on the Web of Data Giovanni Tummarello, Richard Cyganiak, Michele Catasta, Szymon Danielczyk, and Stefan Decker Digital Enterprise Research Institute National University of Ireland,
More informationSocial Networks 2015 Lecture 10: The structure of the web and link analysis
04198250 Social Networks 2015 Lecture 10: The structure of the web and link analysis The structure of the web Information networks Nodes: pieces of information Links: different relations between information
More informationEffective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar
Effective Keyword Search over (Semi)-Structured Big Data Mehdi Kargar School of Computer Science Faculty of Science University of Windsor How Big is this Big Data? 40 Billion Instagram Photos 300 Hours
More informationUniversity of Maryland. Tuesday, March 2, 2010
Data-Intensive Information Processing Applications Session #5 Graph Algorithms Jimmy Lin University of Maryland Tuesday, March 2, 2010 This work is licensed under a Creative Commons Attribution-Noncommercial-Share
More informationFinding Topic-centric Identified Experts based on Full Text Analysis
Finding Topic-centric Identified Experts based on Full Text Analysis Hanmin Jung, Mikyoung Lee, In-Su Kang, Seung-Woo Lee, Won-Kyung Sung Information Service Research Lab., KISTI, Korea jhm@kisti.re.kr
More informationQuery Decomposition: A Multiple Neighborhood Approach to Relevance Feedback Processing in Content-based Image Retrieval
Query Decomposition: A Multiple Neighborhood Approach to Relevance Feedback Processing in Content-based Image Retrieval Kien A. Hua, Ning Yu, Danzhou Liu School of Electrical Engineering and Computer Science
More informationUSC Viterbi School of Engineering
Introduction to Computational Thinking and Data Science USC Viterbi School of Engineering http://www.datascience4all.org Term: Fall 2016 Time: Tues- Thur 10am- 11:50am Location: Allan Hancock Foundation
More informationProf. Dr. Christian Bizer
STI Summit July 6 th, 2011, Riga, Latvia Global Data Integration and Global Data Mining Prof. Dr. Christian Bizer Freie Universität ität Berlin Germany Outline 1. Topology of the Web of Data What data
More informationSocial Search Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson
Social Search Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson The Anatomy of a Large-Scale Social Search Engine by Horowitz, Kamvar WWW2010 Web IR Input is a query of keywords
More informationInformation Retrieval
Natural Language Processing SoSe 2015 Information Retrieval Dr. Mariana Neves June 22nd, 2015 (based on the slides of Dr. Saeedeh Momtazi) Outline Introduction Indexing Block 2 Document Crawling Text Processing
More informationSOCIAL MEDIA MINING. Data Mining Essentials
SOCIAL MEDIA MINING Data Mining Essentials Dear instructors/users of these slides: Please feel free to include these slides in your own material, or modify them as you see fit. If you decide to incorporate
More informationWhat should I link to? Identifying relevant sources and classes for data linking
What should I link to? Identifying relevant sources and classes for data linking Andriy Nikolov, Mathieu d Aquin, Enrico Motta Knowledge Media Institute, The Open University, Milton Keynes, UK {a.nikolov,
More informationRanking Algorithms For Digital Forensic String Search Hits
DIGITAL FORENSIC RESEARCH CONFERENCE Ranking Algorithms For Digital Forensic String Search Hits By Nicole Beebe and Lishu Liu Presented At The Digital Forensic Research Conference DFRWS 2014 USA Denver,
More informationChapter 6: Information Retrieval and Web Search. An introduction
Chapter 6: Information Retrieval and Web Search An introduction Introduction n Text mining refers to data mining using text documents as data. n Most text mining tasks use Information Retrieval (IR) methods
More informationLink Analysis in Web Mining
Problem formulation (998) Link Analysis in Web Mining Hubs and Authorities Spam Detection Suppose we are given a collection of documents on some broad topic e.g., stanford, evolution, iraq perhaps obtained
More informationSOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES
SOURCERER: MINING AND SEARCHING INTERNET- SCALE SOFTWARE REPOSITORIES Introduction to Information Retrieval CS 150 Donald J. Patterson This content based on the paper located here: http://dx.doi.org/10.1007/s10618-008-0118-x
More informationSemantic Website Clustering
Semantic Website Clustering I-Hsuan Yang, Yu-tsun Huang, Yen-Ling Huang 1. Abstract We propose a new approach to cluster the web pages. Utilizing an iterative reinforced algorithm, the model extracts semantic
More informationIntroduction to Information Retrieval
Introduction to Information Retrieval Skiing Seminar Information Retrieval 2010/2011 Introduction to Information Retrieval Prof. Ulrich Müller-Funk, MScIS Andreas Baumgart and Kay Hildebrand Agenda 1 Boolean
More informationPart I: Data Mining Foundations
Table of Contents 1. Introduction 1 1.1. What is the World Wide Web? 1 1.2. A Brief History of the Web and the Internet 2 1.3. Web Data Mining 4 1.3.1. What is Data Mining? 6 1.3.2. What is Web Mining?
More information