Entity and Knowledge Base-oriented Information Retrieval

Similar documents
Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

End-to-End Neural Ad-hoc Ranking with Kernel Pooling

Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding

Knowledge Based Text Representations for Information Retrieval

Towards Better Text Understanding and Retrieval through Kernel Entity Salience Modeling

Outline. Morning program Preliminaries Semantic matching Learning to rank Entities

A Deep Relevance Matching Model for Ad-hoc Retrieval

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22

Attentive Neural Architecture for Ad-hoc Structured Document Retrieval

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Queripidia: Query-specific Wikipedia Construction

Search Engines. Information Retrieval in Practice

Improving Difficult Queries by Leveraging Clusters in Term Graph

Finding Relevant Relations in Relevant Documents

UMass at TREC 2017 Common Core Track

External Query Reformulation for Text-based Image Retrieval

Fall CS646: Information Retrieval. Lecture 2 - Introduction to Search Result Ranking. Jiepu Jiang University of Massachusetts Amherst 2016/09/12

Bi-directional Linkability From Wikipedia to Documents and Back Again: UMass at TREC 2012 Knowledge Base Acceleration Track

Chapter 6: Information Retrieval and Web Search. An introduction

Chapter 27 Introduction to Information Retrieval and Web Search

Learning to Reweight Terms with Distributed Representations

CS 6320 Natural Language Processing

Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation

Fractional Similarity : Cross-lingual Feature Selection for Search

WebSci and Learning to Rank for IR

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task

Is Brad Pitt Related to Backstreet Boys? Exploring Related Entities

CS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Information Retrieval. (M&S Ch 15)

James Mayfield! The Johns Hopkins University Applied Physics Laboratory The Human Language Technology Center of Excellence!

A Few Things to Know about Machine Learning for Web Search

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

An Axiomatic Approach to IR UIUC TREC 2005 Robust Track Experiments

Linking Entities in Chinese Queries to Knowledge Graph

Informativeness for Adhoc IR Evaluation:

Ranking with Query-Dependent Loss for Web Search

Deep and Reinforcement Learning for Information Retrieval

Papers for comprehensive viva-voce

Modern Retrieval Evaluations. Hongning Wang

Sponsored Search Advertising. George Trimponias, CSE

Information Retrieval (IR) Introduction to Information Retrieval. Lecture Overview. Why do we need IR? Basics of an IR system.

Efficient query processing

Introduction to Text Mining. Hongning Wang

Semantic Entity Retrieval using Web Queries over Structured RDF Data

arxiv: v2 [cs.ir] 27 Jul 2017

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

Basic techniques. Text processing; term weighting; vector space model; inverted index; Web Search

Jan Pedersen 22 July 2010

Information Retrieval

Semantic Matching by Non-Linear Word Transportation for Information Retrieval

Estimating Embedding Vectors for Queries

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

NUS-I2R: Learning a Combined System for Entity Linking

An Information Retrieval Approach for Source Code Plagiarism Detection

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

CS54701: Information Retrieval

Supervised Reranking for Web Image Search

From Neural Re-Ranking to Neural Ranking:

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

ENTITY CENTRIC INFORMATION RETRIEVAL. Xitong Liu

Semantic image search using queries

TriRank: Review-aware Explainable Recommendation by Modeling Aspects

University of Delaware at Diversity Task of Web Track 2010

Finding Topic-centric Identified Experts based on Full Text Analysis

Part I: Data Mining Foundations

Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

Query Expansion using Wikipedia and DBpedia

Improving Web Page Retrieval using Search Context from Clicked Domain Names. Rongmei Li

Text Analytics (Text Mining)

Developing Focused Crawlers for Genre Specific Search Engines

Fall Lecture 16: Learning-to-rank

TREC-10 Web Track Experiments at MSRA

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Information Retrieval: Retrieval Models

Entity-Based Information Retrieval System: A Graph-Based Document and Query Model

TREC OpenSearch Planning Session

Text Analytics (Text Mining)

NPRF: A Neural Pseudo Relevance Feedback Framework for Ad-hoc Information Retrieval

Information Retrieval. CS630 Representing and Accessing Digital Information. What is a Retrieval Model? Basic IR Processes

Automatic Domain Partitioning for Multi-Domain Learning

Table2Vec: Neural Word and Entity Embeddings for Table Population and Retrieval

Social Search Introduction to Information Retrieval INF 141/ CS 121 Donald J. Patterson

Exploiting Entity Linking in Queries for Entity Retrieval

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

Introduction to Information Retrieval

Representation/Indexing (fig 1.2) IR models - overview (fig 2.1) IR models - vector space. Weighting TF*IDF. U s e r. T a s k s

Retrieval and Feedback Models for Blog Distillation

Information Retrieval

Northeastern University in TREC 2009 Million Query Track

From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Word Embeddings in Search Engines, Quality Evaluation. Eneko Pinzolas

Word Embedding for Social Book Suggestion

for Searching Social Media Posts

Eagle Eye. Sommersemester 2017 Big Data Science Praktikum. Zhenyu Chen - Wentao Hua - Guoliang Xue - Bernhard Fabry - Daly

Classification of Text Documents Using B-Tree

Basic Tokenizing, Indexing, and Implementation of Vector-Space Retrieval

Transcription:

Entity and Knowledge Base-oriented Information Retrieval Presenter: Liuqing Li liuqing@vt.edu Digital Library Research Laboratory Virginia Polytechnic Institute and State University Blacksburg, VA 24061 Nov 08, 2018

Outline Paper List Background Approach Evaluation 1

Outline Paper List Background Approach Evaluation 2

Paper List State-of-the-art Major groups and members James Allan s group (UMA), Jeffrey Dalton Jamie Callan s group (CMU), Chenyan Xiong Paper Author Conference [1] Entity query feature expansion using knowledge base links Jeffrey Dalton, Laura Dietz, James Allan SIGIR 14 [2] Esdrank: Connecting query and documents through external semi-structured data Chenyan Xiong, Jamie Callan CIKM 15 [3] Query expansion with Freebase Chenyan Xiong, Jamie Callan ICTIR 15 [4] Bag-of-Entities representation for ranking Chenyan Xiong, Jamie Callan, Tie-Yan Liu ICTIR 16 3

Paper List Paper Author Conference [5] An empirical study of learning to rank for entity search Jing Chen, Chenyan Xiong, Jamie Callan SIGIR 16 [6] JointSem: Combining Query Entity Linking and Entity based Document Ranking Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Eduard Hovy CIKM 17 [7] Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding Chenyan Xiong, Russell Power, Jamie Callan WWW 17 [8] Word-entity duet representations for document ranking Chenyan Xiong, Jamie Callan, Tie-Yan Liu SIGIR 17 [9] DBpedia-entity v2: a test collection for entity search Faegheh Hasibi, et al. SIGIR 17 [10] End-to-end neural ad-hoc ranking with kernel pooling Chenyan Xiong, et al. SIGIR 17 [11] Convolutional neural networks for softmatching n-grams in ad-hoc search Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu WSDM 18 4

Outline Paper List Background Approach Evaluation 5

Background Document Retrieval Information Retrieval Pseudo Relevance Feedback & Query Expansion (PRF & QE) Learning to Rank (LeToR) Entity Retrieval Entity-related Knowledge Base Entity Linking Systems Entity and Knowledge Base-oriented Information Retrieval 6

Background Document Retrieval Document Files 7

Background Document Retrieval Inverted Indexing 8

Background PRF & QE 9

Background Learning to Rank 10

Background Learning to Rank Features are IMPORTANT! Query-independent features Depend only on documents (e.g., PageRank or doc length) Query-dependent features Depend both on document content and query Query level features Depend only on query (e.g., number of words in a query) 11

Background Entity Retrieval Goal Retrieve entities (e.g., Wikipedia pages) with a couple of queries Previous tasks TREC Entity Track (2009, 2010, 2011) INEX Entity Ranking Track (2009) INEX Linked Data Track (2012) Document retrieval with entity annotations 12

Background Knowledge Base 13

Background Knowledge Base Freebase Sub-graph [3] Chenyan Xiong, Jamie Callan. Query expansion with Freebase. ICTIR 15. 14

Background Entity Linking Systems FACC1 (Google) Freebase Annotations of the ClueWeb Corpora 11 billion entity annotations in 800 million documents TagMe (A3 Lab) Identify short-phrases and link them to Wikipedia Demo: https://tagme.d4science.org/tagme/ 15

Background Entity Linking Systems CMNS (Hasibi et al., Norwegian University) Entity Linking in Queries: Tasks and Evaluation ICTIR 15 16

Background Entity Linking Systems CMNS (Hasibi et al., Norwegian University) Entity Linking in Queries: Tasks and Evaluation ICTIR 15 17

Background Entity and KB-oriented IR Workflow Users Queries Document Collection Knowledge Base of Entity Representations and Relations URIs of Entity Real Entity 18

Outline Paper List Background Approach Evaluation 19

Paper Entity query feature expansion using knowledge base links. SIGIR 14. Strengths Most previous work focuses on text, mainly using unigram concepts This paper enriches queries with features from entities and their links to knowledge bases Knowledge Base (Wikipedia, Freebase) Entity Annotation (FACC1) 20

Entity Query Feature Expansion (EQFE) Focuses on words, entities, types, categories Entity linking the query provides very precise indicators but may also miss many of the relevant entities Entity expansion in PRF may make a query noisy [1] Jeffrey Dalton, Laura Dietz, James Allan. Entity query feature expansion using knowledge base links. SIGIR 14. 21

EQFE [1] Jeffrey Dalton, Laura Dietz, James Allan. Entity query feature expansion using knowledge base links. SIGIR 14. 22

Paper Esdrank: Connecting query and documents through external semi-structured data. CIKM 15. Strengths Treats the vocabularies, terms and entities from external data as objects Treats the external objects as latent layer between query and documents Knowledge Base (Freebase) Entity Annotation (TagMe, FACC1) 23

EsdRank Latent-ListMLE, a LeToR algorithm, models the objects as latent space between query and documents Query Objects Documents U = u $$,, u '(,, u )* V = v $,, v (,, v * How to find related objects? Query Annotation, Object Search, Document Annotation [2] Chenyan Xiong, Jamie Callan. Esdrank: Connecting query and documents through external semi-structured data. CIKM 15. 24

EsdRank Selected Features Query - Objects Object Selection Score Textual Similarity Ontology Overlap Object Frequency Objects - Documents Textual Similarity Ontology Overlap Graph Connection Document Quality Similarity with Other Objects [2] Chenyan Xiong, Jamie Callan. Esdrank: Connecting query and documents through external semi-structured data. CIKM 15. 25

Paper Query expansion with Freebase. ICTIR 15. Strengths Two methods of identifying the entities associated with a query Two methods of using those entities to perform query expansion Knowledge Base (Freebase) Entity Annotation (FACC1) 26

Unsupervised Expansion Linking Freebase Objects to the Query Link by Search Query q is issued to the Google Search API to get its ranking of objects O with ranking scores Link by FACC1 Select related objects from the FACC1 annotations in top retrieved documents How to rank objects? [3] Chenyan Xiong, Jamie Callan. Query expansion with Freebase. ICTIR 15. 27

Unsupervised Expansion Selecting Expansion Terms from Linked Objects Select by PRF Tf-idf based PRF on linked objects descriptions Select by Category Naïve Bayesian Classifier Probability of a term belonging to a category [3] Chenyan Xiong, Jamie Callan. Query expansion with Freebase. ICTIR 15. 28

Paper Bag-of-Entities representation for ranking. ICTIR 16. Strengths Evaluate three different entity linking systems Knowledge Base (Freebase, Wikipedia) Entity Annotation (FACC1, TagMe, CMNS) 29

Bag-of-Entities Representation Entity annotation (FACC1, TagMe, CMNS) Use a bag-of-entities to represent a query or document Apply two basic ranking models Coordinate Match (COOR) Entity Frequency (EF) [4] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Bag-of-Entities representation for ranking. ICTIR 16. 30

Bag-of-Entities Representation Pre-retrieved documents re-ranking Study different entity annotation methods Compare bag-of-words and bag-of-entities methods [4] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Bag-of-Entities representation for ranking. ICTIR 16. 31

Paper An empirical study of learning to rank for entity search. SIGIR 16. Strengths Learning to rank methods are as powerful for ranking entities as for ranking documents Knowledge Base (DBpedia with RDF triples) Entity Annotation (None) 32

LeToR for Entity Search How to represent entities? Name, Cat, Attr, RelEn, SimEn Features for query-entity pairs Language model BM25 Coordinate match Cosine similarity Sequential dependency model (SDM) Fielded sequential dependency model (FSDM) [5] Jing Chen, Chenyan Xiong, Jamie Callan. An empirical study of learning to rank for entity search. SIGIR 16. 33

Paper JointSem: Combining Query Entity Linking and Entity based Document Ranking. CIKM 17. Strengths Combines query entity linking and entity-based document ranking A joint learning-to-rank model Knowledge Base (Wikipedia) Entity Annotation (FACC1) 34

Joint Semantic Ranking Spotting, Linking and Ranking Surface forms and candidate entities Feature Generation Table 1: Spotting, linking, and ranking features. Surface Form Features are extracted for each spotted surface form. Entity Features are extracted for each candidate entity from each spot. Entity-Document Ranking Features are extracted for each entity-document pair. The number in brackets is the dimension of the corresponding feature group. [6] Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Eduard Hovy. JointSem: Combining Query Entity Linking and Entity based Document Ranking. CIKM 17. 35

Joint Semantic Ranking Joint Learning to Rank learns the importance of the surface form learns the importance of the aligned entity learns the ranking of document for the entity [6] Chenyan Xiong, Zhengzhong Liu, Jamie Callan, Eduard Hovy. JointSem: Combining Query Entity Linking and Entity based Document Ranking. CIKM 17. 36

Paper Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding. WWW 17. Strengths Represents queries and documents in the entity space and ranks them based on their semantic connections from their knowledge graph embedding Knowledge Base (Freebase) Entity Annotation (FACC1, CMNS) 37

Explicit Semantic Ranking Each element in the matrix max-pooling along the query dimension bin-pooling (histogram) [7] Chenyan Xiong, Russell Power, Jamie Callan. Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding. WWW 17. 38

Paper Word-entity duet representations for document ranking. SIGIR 17. Strengths Creates a word-entity duet framework Creates an attention-based ranking model Knowledge Base (Freebase) Entity Annotation (TagMe) 39

Word-Entity Duet Representation Words and entities in both query and documents [8] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Word-entity duet representations for document ranking. SIGIR 17. 40

Word-Entity Duet Representation Words and entities in both query and documents [8] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Word-entity duet representations for document ranking. SIGIR 17. 41

Word-Entity Duet Representation Words and entities in both query and documents [8] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Word-entity duet representations for document ranking. SIGIR 17. 42

Word-Entity Duet Representation Words and entities in both query and documents [8] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Word-entity duet representations for document ranking. SIGIR 17. 43

Word-Entity Duet Representation Attention-based ranking model [8] Chenyan Xiong, Jamie Callan, Tie-Yan Liu. Word-entity duet representations for document ranking. SIGIR 17. 44

Paper End-to-end neural ad-hoc ranking with kernel pooling. SIGIR 17. Convolutional neural networks for soft-matching n- grams in ad-hoc search. WSDM 18. Strengths Kernel pooling CNN Could be entity or knowledge base-based 45

Kernel Pooling [10] Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, Russell Power. End-to-end neural ad-hoc ranking with kernel pooling. SIGIR 17. 46

Kernel Pooling [10] Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, Russell Power. End-to-end neural ad-hoc ranking with kernel pooling. SIGIR 17. 47

Kernel Pooling [10] Chenyan Xiong, Zhuyun Dai, Jamie Callan, Zhiyuan Liu, Russell Power. End-to-end neural ad-hoc ranking with kernel pooling. SIGIR 17. 48

N-grams [11] Zhuyun Dai, Chenyan Xiong, Jamie Callan, Zhiyuan Liu. Convolutional neural networks for soft-matching n-grams in ad-hoc search. WSDM 18. 49

Outline Paper List Background Approach Evaluation 50

Evaluation Datasets ClueWeb09 https://lemurproject.org/clueweb09/ Entire Dataset: 4,780,950,903 Unique URLs ClueWeb09-B: first 50 million English pages (Popular) ClueWeb12 https://lemurproject.org/clueweb12/ Entire Dataset: 733 million pages ClueWeb12-B13: about 50 million pages (Popular) 51

Evaluation Queries TREC 2009-2012: 200 queries (ClueWeb09-B) TREC 2013-2014: 100 queries (ClueWeb12-B13) Ad hoc task and diversity task 52

Evaluation Metrics TREC Official: NDCG@20 and ERR@20 More: MAP@K, P@K Re-rank: Win / Tie / Loss 53

Questions? Liuqing Li 3 rd year PhD student in Digital Library Research Laboratory GTA in CS4984 / CS5984 (Big Data Text Summarization) Email: liuqing@vt.edu Website: http://liuqing.dlib.vt.edu/main