Re-contextualization and contextual Entity exploration. Sebastian Holzki

Similar documents
Time-aware Approaches to Information Retrieval

Linking Entities in Chinese Queries to Knowledge Graph

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

ANNUAL REPORT Visit us at project.eu Supported by. Mission

Promoting Ranking Diversity for Biomedical Information Retrieval based on LDA

NUS-I2R: Learning a Combined System for Entity Linking

Towards Summarizing the Web of Entities

Evaluating the Usefulness of Sentiment Information for Focused Crawlers

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Mining Human Trajectory Data: A Study on Check-in Sequences. Xin Zhao Renmin University of China,

Mining Web Data. Lijun Zhang

Effect of log-based Query Term Expansion on Retrieval Effectiveness in Patent Searching

Link Analysis. Hongning Wang

node2vec: Scalable Feature Learning for Networks

On Modeling Temporal Dynamics Forgetting and Remembering for Intelligent Information Access

CSCI 599: Applications of Natural Language Processing Information Retrieval Evaluation"

Overview of the INEX 2009 Link the Wiki Track

Predict Topic Trend in Blogosphere

Part I: Data Mining Foundations

PUTTING CONTEXT INTO SEARCH AND SEARCH INTO CONTEXT. Susan Dumais, Microsoft Research

Supervised Random Walks

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

A Content Based Image Retrieval System Based on Color Features

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Annotating Spatio-Temporal Information in Documents

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

COMP5331: Knowledge Discovery and Data Mining

Detecting Multilingual and Multi-Regional Query Intent in Web Search

Getting More from Segmentation Evaluation

Presenting a New Dataset for the Timeline Generation Problem

ResPubliQA 2010

Advanced Topics in Information Retrieval. Learning to Rank. ATIR July 14, 2016

Context-based Navigational Support in Hypermedia

Situated Conversational Interaction

Ryen W. White, Peter Bailey, Liwei Chen Microsoft Corporation

Query Sugges*ons. Debapriyo Majumdar Information Retrieval Spring 2015 Indian Statistical Institute Kolkata

Location-Based Web Services for Car Infotainment

Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Learning to Match. Jun Xu, Zhengdong Lu, Tianqi Chen, Hang Li

Semantic Clickstream Mining

Hybrid Acquisition of Temporal Scopes for RDF Data

Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids. Marek Lipczak Arash Koushkestani Evangelos Milios

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

ALEXANDRIA. Temporal Retrieval, Exploration and Analytics in Web Archives. Wolfgang Nejdl. L3S Research Center Hannover, Germany

Estimating Credibility of User Clicks with Mouse Movement and Eye-tracking Information

UMass at TREC 2017 Common Core Track

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

Unsupervised Rank Aggregation with Distance-Based Models

A B2B Search Engine. Abstract. Motivation. Challenges. Technical Report

An Introduction to PREMIS. Jenn Riley Metadata Librarian IU Digital Library Program

1. TABLE OF CONTENTS. 1. Table of Contents Guide Instructions Access Tririga Reset Your Password... 8

Overview of the 5th International Competition on Plagiarism Detection

Social Behavior Prediction Through Reality Mining

A BFS-BASED SIMILAR CONFERENCE RETRIEVAL FRAMEWORK

Recommendation System for Location-based Social Network CS224W Project Report

Temporal Information Access (Temporalia-2)

Papers for comprehensive viva-voce

CPSC 340: Machine Learning and Data Mining. Multi-Dimensional Scaling Fall 2017

University of Virginia Department of Computer Science. CS 4501: Information Retrieval Fall 2015

Sentiment Classification of Food Reviews

The PageRank Citation Ranking

Link Structure Analysis

Ranking web pages using machine learning approaches

PERSONALIZED MOBILE SEARCH ENGINE BASED ON MULTIPLE PREFERENCE, USER PROFILE AND ANDROID PLATFORM

Supervised Ranking for Plagiarism Source Retrieval

Microsoft FAST Search Server 2010 for SharePoint for Application Developers Course 10806A; 3 Days, Instructor-led

A Frame Study for Post-Processing Analysis on System Behavior: A Case Study of Deadline Miss Detection

My name is Brian Pottle. I will be your guide for the next 45 minutes of interactive lectures and review on this lesson.

Shrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India

Ontology based Model and Procedure Creation for Topic Analysis in Chinese Language

TREC 2016 Dynamic Domain Track: Exploiting Passage Representation for Retrieval and Relevance Feedback

Mining Web Data. Lijun Zhang

E6885 Network Science Lecture 11: Knowledge Graphs

Query Difficulty Prediction for Contextual Image Retrieval

Information Retrieval

An Ensemble Dialogue System for Facts-Based Sentence Generation

Ontology Based Prediction of Difficult Keyword Queries

Query Subtopic Mining Exploiting Word Embedding for Search Result Diversification

Keyword search in relational databases. By SO Tsz Yan Amanda & HON Ka Lam Ethan

Improving Difficult Queries by Leveraging Clusters in Term Graph

CADIAL Search Engine at INEX

The University of Amsterdam at the CLEF 2008 Domain Specific Track

TempWeb rd Temporal Web Analytics Workshop

Diversifying Query Suggestions Based on Query Documents

Information Retrieval and Web Search

Towards Efficient and Effective Semantic Table Interpretation Ziqi Zhang

University of Amsterdam at INEX 2010: Ad hoc and Book Tracks

Information Retrieval

Context Aware Computing

INTERNATIONAL JOURNAL OF PURE AND APPLIED RESEARCH IN ENGINEERING AND TECHNOLOGY

Sampling Large Graphs for Anticipatory Analysis

Mymory: Enhancing a Semantic Wiki with Context Annotations

Research at Google: With a Case of YouTube-8M. Joonseok Lee, Google Research

Automated Tagging for Online Q&A Forums

WEB SEARCH, FILTERING, AND TEXT MINING: TECHNOLOGY FOR A NEW ERA OF INFORMATION ACCESS

More Efficient Classification of Web Content Using Graph Sampling

Information Retrieval. Information Retrieval and Web Search

Wikulu: Information Management in Wikis Enhanced by Language Technologies

Extracting Information from Complex Networks

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

Transcription:

Re-contextualization and contextual Entity exploration Sebastian Holzki Sebastian Holzki June 7, 2016 1

Authors: Joonseok Lee, Ariel Fuxman, Bo Zhao, and Yuanhua Lv - PAPER PRESENTATION - LEVERAGING KNOWLEDGE BASES FOR CONTEXTUAL ENTITY EXPLORATION Sebastian Holzki June 7, 2016 2

Information needs Primary Sebastian Holzki June 7, 2016 3

Contextual entity exploration problem User selection Context Text to entity mapping Knowledge Base Knowledge Graph Sebastian Holzki June 7, 2016 4

Using knowledge bases effectively Build a focused subgraph of the knowledge graph User selection Context around user selection Explore the graph for other relevant entities Figure: Lee, Fuxman, Bo and Yuanhua [1], (adapted) Context: blue User-selection: gray Sebastian Holzki June 7, 2016 5

Creating a reasonable ranking Context Text to entity mapping Deliver a limited number of entities, ordered by relevance 1 2 3 4 Focused subgraph Sebastian Holzki June 7, 2016 6

How to measure relevance? Ability to connect user-selected entity and context entities Context-Selection Betweenness Overall relevance to user-selected entity Personalized Random Walk Figure: Lee, Fuxman, Bo and Yuanhua [1], (adapted) Sebastian Holzki June 7, 2016 7

Context-Selection betweenness (background) Employing Normalized Wikipedia Distance * I u and I v : Neighbours of u and v Measures the semantic distance of Wikipedia pages (u and v) More common neighbours less distant Normalized by corpus size * Already introduced within this course to measure entity coherence, namely: mw-coh. Sebastian Holzki June 7, 2016 8

Context-Selection betweenness θ = 0.5: Emphasizes more relevant contexts Weight relative to path-length l and number of all shortest paths k Normalized by weight of all shortest paths Z (Not only those including node v) Result: Nodes v connecting s and c through short paths have higher Context-Selection Betweenness Sebastian Holzki June 7, 2016 9

Personalized Random Walk Motivation: Consider the overall relevance of an entity Personalized means allowing arbitrary jumps: Probability for user selected entity: x s Probabilities for context entities: x c / C Result: emphasize on entities near user selection or context Reminder: Random Walk assigns stationary probabilities to nodes in a graph, by randomly walking through the graph. Sebastian Holzki June 7, 2016 10

Score Aggregation 2 1 3 1. Large graphs produce low probabilities 2. Context-Selection Betweenness is biased by number of context entities 3. Trust in Context-Selection Betweenness is affected by graph-size and number of context entities Sebastian Holzki June 7, 2016 11

Implementation details Wikipedia (2014) as knowledge base Entity linking through Wikipedia Hyperlinks Personalized Random Walk Parameters: x s = 0.05; x c = 0 Aggregated score used as ranking User selection generated by crowd workers Evaluation of ranking also done by crowd workers Sebastian Holzki June 7, 2016 12

Results: comparison to baselines Source: Lee, Fuxman, Bo and Yuanhua [1] Sebastian Holzki June 7, 2016 13

Results: scores and aggregation Source: Lee, Fuxman, Bo and Yuanhua [1] Sebastian Holzki June 7, 2016 14

Summary Switching between primary and secondary sources Contextual entity exploration Relevance in knowledge graphs Context-Selection Betweenness + Personalized Random Walk Comparison to baseline Future work (Source: [1]) Different contextualization systems have low degree of overlap Idea: combine them and re-rank Sebastian Holzki June 7, 2016 15

[Backup] Normalized Wikipedia Distance Example I u = 4 I v = 3 I u I v = 2 V = 9 NWD = (log(4) - log(2)) / (log(9) - log(3)) 0.631 Sebastian Holzki June 7, 2016 16

[Backup] Results: influence of entity linking Source: Lee, Fuxman, Bo and Yuanhua [1] Sebastian Holzki June 7, 2016 17

Authors: Nam Khanh Tran, Andrea Ceroni, Nattiya Kanhabua and Claudia Niederée - PAPER PRESENTATION - BACK TO THE PAST: SUPPORTING INTERPRETATIONS OF FORGOTTEN STORIES BY TIME-AWARE RE- CONTEXTUALIZATION Sebastian Holzki June 7, 2016 18

Context of documents from the past Information item Time-aware context Source: Course slides of Nam Khanh Tran [3] Sebastian Holzki June 7, 2016 19

Four goals of contextualizing the past 1. Relevance: Context c has to be relevant to information item d 2. Complementarity: Avoid displaying duplicate information 3. Time-awareness: Consider creation-time of d 4. Conciseness: Show brief context, not whole pages Sebastian Holzki June 7, 2016 20

Prerequisites Document with contextualization hooks Contextualization hook: parts of document, that require contextualization. [2] Entities, concept mentions, Knowledge base Facility to provide proper context for hooks (Proper: with regard to the four goals) Sebastian Holzki June 7, 2016 21

Main concept of the paper Document Hook identification Query formulation Ranked context 1 2 3 Context ranking Context retrieval Index of contextualization units Pre processed knowledge base Source: Course slides of Nam Khanh Tran [3] and original paper [2] Sebastian Holzki June 7, 2016 22

Query formulation 1. Document-based queries (title, lead paragraph) 2. Hook-based queries (all hooks, each hook on its own) 3. Learning to select promising queries out of power-set of hooks Trained on Recall@k Some prediction-features: Linguistic: entities, document length, number of nouns, Temporal scope (±2 years), temporal similarity, temporal document frequency Query Performance Prediction (QPP) Sebastian Holzki June 7, 2016 23

Context retrieval Query-likelihood language model: Rank contextualization units c according to the probability P(c q) => How likely is it, that c generates the given query q at random Assuming independent query terms Serves for relevance goal Sebastian Holzki June 7, 2016 24

Learning to re-rank context Contextualization units initially ranked by retrieval model Goal of reranking: Achieve high precision-measure Also consider a variety of features Trained by examples: [proper context => document] Incorporated features: Topic diversity Entity difference, text difference (complementarity) Temporal similarity of context and document Sebastian Holzki June 7, 2016 25

Evaluation: query formulation methods Highlighted: variations of machine-learned approach (Query Performance Prediction) Sebastian Holzki June 7, 2016 26

Evaluation: re-ranking context Baselines { RF: learning to rank approach using RankForest LM: standard query-likelihood language model LM-T: modification of LM including time-awareness Same query formulation method QPP@100 Learning to rank performs significantly better Complementarity plays an important role Sebastian Holzki June 7, 2016 27

Summary Contextualization need for documents from the past Usual entity contextualization falls short Time-awareness and complementarity improve context quality Query formulation optimized for recall Context retrieval and ranking optimized for precision Results show significant improvements Sebastian Holzki June 7, 2016 28

Topic: Contextualization overall summary Focus on user-selection/hooks Well structured and large knowledge bases are mandatory Some goals require more effort (time-awareness) Automated contextualization makes life a lot easier Thanks for your attention! Sebastian Holzki June 7, 2016 29

Sources [1] Lee, Joonseok and Fuxman, Ariel and Zhao, Bo and Lv, Yuanhua. Leveraging Knowledge Bases for Contextual Entity Exploration. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015. [2] Nam Khanh Tran, Andrea Ceroni, Nattiya Kanhabua and Claudia Niederée. Back to the Past: Supporting Interpretations of Forgotten Stories by Time-aware Re-Contextualization. Proceedings of the Eighth International Conference on Web Search and Data Mining (WSDM'15), Shanghai, China, February, 2015 [3] Nam Khanh Tran. Contextualization (Course slides). Lecture Web Science at Leibniz Universität Hannover, May 3, 2016. Sebastian Holzki June 7, 2016 30