Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation

Similar documents
Representation Learning using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

Papers for comprehensive viva-voce

Leveraging Knowledge Graphs for Web-Scale Unsupervised Semantic Parsing. Interspeech 2013

Conversational Knowledge Graphs. Larry Heck Microsoft Research

Linking Entities in Chinese Queries to Knowledge Graph

Entity and Knowledge Base-oriented Information Retrieval

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

Jianyong Wang Department of Computer Science and Technology Tsinghua University

Outline. Morning program Preliminaries Semantic matching Learning to rank Entities

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22

Normalizing Social Media Texts by Combining Word Embeddings and Edit Distances in a Random Forest Regressor

PTE : Predictive Text Embedding through Large-scale Heterogeneous Text Networks

NUS-I2R: Learning a Combined System for Entity Linking

Quality-Aware Entity-Level Semantic Representations for Short Texts

Entity Linking. David Soares Batista. November 11, Disciplina de Recuperação de Informação, Instituto Superior Técnico

A Study of MatchPyramid Models on Ad hoc Retrieval

A Deep Top-K Relevance Matching Model for Ad-hoc Retrieval

Is Brad Pitt Related to Backstreet Boys? Exploring Related Entities

Linking Entities in Tweets to Wikipedia Knowledge Base

A Deep Relevance Matching Model for Ad-hoc Retrieval

ELCO3: Entity Linking with Corpus Coherence Combining Open Source Annotators

Module 3: GATE and Social Media. Part 4. Named entities

Situated Conversational Interaction

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

August 2012 Daejeon, South Korea

Robust and Collective Entity Disambiguation through Semantic Embeddings

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London

A Korean Knowledge Extraction System for Enriching a KBox

Disambiguating Entities Referred by Web Endpoints using Tree Ensembles

Chinese Microblog Entity Linking System Combining Wikipedia and Search Engine Retrieval Results

Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids. Marek Lipczak Arash Koushkestani Evangelos Milios

The SMAPH System for Query Entity Recognition and Disambiguation

AIDArabic A Named-Entity Disambiguation Framework for Arabic Text

A Few Things to Know about Machine Learning for Web Search

Knowledge Graphs: In Theory and Practice

EREL: an Entity Recognition and Linking algorithm

Mining Social Media Users Interest

arxiv: v1 [cs.ir] 31 Jul 2017

Metadata. What is it? Why is it important? Marcia L. Zeng Kent State University. Dealing with Metadata: Content, Distribution, and Availability

THE amount of Web data has increased exponentially

arxiv: v1 [cs.ir] 2 Mar 2017

Markov Chains for Robust Graph-based Commonsense Information Extraction

Personalized Page Rank for Named Entity Disambiguation

Searching complex graphs

RPI INSIDE DEEPQA INTRODUCTION QUESTION ANALYSIS 11/26/2013. Watson is. IBM Watson. Inside Watson RPI WATSON RPI WATSON ??? ??? ???

Building Rich User Profiles for Personalized News Recommendation

YJTI at the NTCIR-13 STC Japanese Subtask

LSTM and its variants for visual recognition. Xiaodan Liang Sun Yat-sen University

Time-Aware Entity Linking

Improving Difficult Queries by Leveraging Clusters in Term Graph

Enhancing Named Entity Recognition in Twitter Messages Using Entity Linking

CS249: ADVANCED DATA MINING

Plagiarism Detection Using FP-Growth Algorithm

Distant Supervision via Prototype-Based. global representation learning

Attentive Neural Architecture for Ad-hoc Structured Document Retrieval

Effect of Enriched Ontology Structures on RDF Embedding-Based Entity Linking

Explicit Semantic Ranking for Academic Search via Knowledge Graph Embedding

Introduction p. 1 What is the World Wide Web? p. 1 A Brief History of the Web and the Internet p. 2 Web Data Mining p. 4 What is Data Mining? p.

Below, we will walk through the three main elements of the algorithm, which include Domain Attributes, On-Page and Off-Page factors.

SVD-based Universal DNN Modeling for Multiple Scenarios

Query Sugges*ons. Debapriyo Majumdar Information Retrieval Spring 2015 Indian Statistical Institute Kolkata

Entity Linking in Queries: Efficiency vs. Effectiveness

Image Similarity Measurements Using Hmok- Simrank

Knowledge Graphs: In Theory and Practice

Variable-Component Deep Neural Network for Robust Speech Recognition

Harnessing Relationships for Domain-specific Subgraph Extraction: A Recommendation Use Case

SEMANTIC INDEXING (ENTITY LINKING)

Query Intent Detection using Convolutional Neural Networks

Knowledge Based Systems Text Analysis

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

Top-N Recommendations from Implicit Feedback Leveraging Linked Open Data

Natural Language Processing with PoolParty

Entity Linking in Queries: Tasks and Evaluation

VALLIAMMAI ENGINEERING COLLEGE SRM Nagar, Kattankulathur DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING QUESTION BANK VII SEMESTER

Techreport for GERBIL V1

Context based Re-ranking of Web Documents (CReWD)

On Statistical Characteristics of Real-life Knowledge Graphs

Personalized Entity Recommendation: A Heterogeneous Information Network Approach

arxiv: v1 [cs.mm] 12 Jan 2016

Exploiting DBpedia for Graph-based Entity Linking to Wikipedia

Meta-path based Multi-Network Collective Link Prediction

Entity Linking for Queries by Searching Wikipedia Sentences

How to Find Your Most Cost-Effective Keywords

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

December 4, BigData 2017 Enterprise Knowledge Graphs for Large Scale Analytics 1/47

CS47300: Web Information Search and Management

Semantic image search using queries

arxiv: v3 [cs.cl] 25 Jan 2018

Modeling Interestingness with Deep Neural Networks

NEURAL MODELS FOR INFORMATION RETRIEVAL

SuperAgent: A Customer Service Chatbot for E-commerce Websites

Bing Liu. Web Data Mining. Exploring Hyperlinks, Contents, and Usage Data. With 177 Figures. Springer

Scholarly Big Data: Leverage for Science

CS425: Algorithms for Web Scale Data

This Report Distributed By:

A Hybrid Neural Model for Type Classification of Entity Mentions

A Novel deep learning models for Cold Start Product Recommendation using Micro blogging Information

Rapidly Building Domain-Specific Entity-Centric Language Models Using Semantic Web Knowledge Sources

Entity Linking on Philosophical Documents

Thursday, 26 January, 12. Web Site Design

Transcription:

Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation Hongzhao Huang 1 and Larry Heck 2 Computer Science Department, Rensselaer Polytechnic Institute 1 Microsoft Research 2 {huangh9@rpi.edu, Larry.Heck@microsoft.com} Specific Thanks Yelong Shen and Gustavo Abrego for the help on deep neural network related issues

Word Embeddings Standard word representation o One-hot representation Microsoft [0, 0, 0, 0,,0, 1, 0,,0] Neural word embeddings o Distributed representation Microsoft [0.453, -0.292, 0.732,, -0.243] o Represent a word by its contextual surrounding words You shall know a word by the company it keeps (J. R. Firth 1957: 11) government debt problems turning into banking crises as has happened in saying that Europe needs unified banking regulation to replace the hodgepodge Examples from (Socher et al, NAACL2013 turorial)

From Word Embeddings to Entity Embeddings How about entities? o Usually composed of multiple words Microsoft Research, James Cameron, Atlanta Hawks!= + o Entities play crucial role in many applications Entity Linking, Relation Extraction, Question & Answering Our goal o Learn task specific accurate semantic entity representations

How can we represent entities? How we learn about a new entity/concept? <James Cameron, film director, Titanic> <James Cameron, won awards, Academy Award for Best Picture>.

Semantic Knowledge Graphs (KGs) A graph composed of: o Nodes: uniquely identified entities or literals o Edges: semantic relations E.g., film director, film producer, CEO of Many rich and clean KGs o Satori, Google KG, Freebase, Dbpedia. Broad applications to natural language processing and spoken language understanding o E.g., Unsupervised semantic parsing (Heck et al, 2012) Use KG to guide automatic labeling of training instances This work: encode world knowledge from KG to assist deep understanding and accurate semantic representations of entities

Semantic Knowledge Graphs: An Example

Named Entity Disambiguation (NED): Task Definition Disambiguate linkable mentions from a specific context to their referent entities in a Knowledge Base o A mention: a phrase referring to something in the world Named entity (person, organization), object, event o An entity: a page in a Knowledge Base At a WH briefing here in Santiago, NSA spox Rhodes came with a litany of pushback on idea WH didn't consult with.

Entity Semantic Relatedness is Crucial for NED Stay up Hawk Fans. We are going through a slump, but we have to stay positive. Go Hawks! The most important feature used for NED o Non-collective approaches (Ferragina & Scaiella, 2010; Milne and Witten, 2008; Guo et.al., 2013) o Collective Approaches (Cucerzan, 2007; Milne and Witten, 2008b; Kulkarni et al., 2009; Pennacchiotti and Pantel, 2009; Ferragina and Scaiella, 2010; Cucerzan, 2011; Guo et al.,2011; Han and Sun, 2011; Han et al., 2011; Ratinovet al., 2011; Chen and Ji, 2011; Kozareva et al., 2011; Shen et al., 2013; Liu et al., 2013, Huang et al., 2014)

The State-of-the-art Approaches for Entity Semantic Relatedness Limitation I: Ingore the world knowledge from the rich Knowledge Graphs (Milne and Witten, 2008): Wikipedia Link-based unsupervised method C: the set of entities in Wikipedia o C i : the set of incoming links to c i Limitation II: what if we donot have anchor links? o Formulate as a learning-to-rank problem Supervised Method (Ceccarelli et.al., 2013) o Explore a set of link-based features

Our Approach Learn entity representations with supervised DNN and KG o Non-linear DNN proven to have more expressive power than the linear models o Directly to optimize parameters for semantic relatedness The DNN-based Semantic Similarity Model (DSSM) (Huang et al, 2013) Semantic space Deep non-linear projections 300 300 300 50K 300 300 300 50K Feature vector 500K Entity One 500K Entity Two Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck, Learning Deep Structured Semantic Models for Web Search using Clickthrough Data Proc. CIKM2013

Encoding Knowledge from Knowledge Graph Knowledge Representation Example Description Letter tri-gram vector dog = <#do, dog, og#> <0,,1,1,,0,1,,0> Entity Type 1-of-V vector <0,.,0,,1,,0, > Subgraph 1-of-V vector for relation Letter tri-gram for entities

Unsupervised Collective Disambiguation with Graph Regularization Perform collective disambiguation for a set of topically-related tweets simultaneously Accuracy = 0.25, tweets are short and noisy, can o Handle information shortage and noiseness problems o Easy to not collect provide a set of rich topically-related context tweets (e.g., via social network ) information Underlining concepts are referent concepts

Graph Construction Over Multiple Tweets Each node is a pair of mention and entity candidates o Entity candidates are retrieved based on anchor links in Wikipedia An edge is created for two nodes if o Two mentions are relevant Detect with meta path o And two entities are semantically related Cosine similarity over semantic entity embeddings Similarity is used as the edge weight hawks, Hawk gators, Florida Gators men's basketball 0.02 slump, Slump (geology) 0.325 0.252 slump, Slump (sports) kemba walker, Kemba Walker bucks, Milwaukee Bucks 0.625 0.245 0.524 0.821 0.578 hawks, Atlanta Hawks

Relevant Mention Detection: Meta Path A meta-path is a path defined over a network and composed of a sequence of relations between different object types (Sun et al., 2011) o Each meta path represent a semantic relation Meta paths between mention and mention o M-T-M o M-T-U-T-M-M o M-T-H-T-M o M-T-U-T-M-T-H-T-M o M-T-H-T-M-T-U-T-M M: mention, T: tweet, U: user, H: hashtag Schema of a Heterogeneous Information Network in Twitter Two mentions are considered as relevant if there exist at least one meta path between them

Unsupervised Graph Regularization The model (Adapted from Zhu et.al, 2003) o Initial ranking score 0.32 o prior popularity and context similarity hawks, Hawk 0.34 0.25 0.74 gators, Florida Gators men's basketball 0.02 slump, Slump (geology) 0.325 0.252 slump, Slump (sports) kemba walker, Kemba Walker bucks, Milwaukee Bucks 0.625 0.245 0.524 0.25 0.821 0.8 0.578 y i : the final ranking score of node i y i0 : the initial ranking score of node i W: weight matrix of the graph hawks, Atlanta Hawks 0.22

Data and Scoring Metric Data o A public data set includes 502 messages from 28 users (Meiji et al., 2012) o A Wikipedia dump on May 3, 2013 Scoring Metric o Accuracy on top ranked entity candidates

Models for Comparison TagMe: an unsupervised model based on prior popularity and semantic relatedness of a single message (Ferragina and Scaiella, 2010) Meij: the state-of-the-art supervised approach based on the random forest model (Meij et al., 2012) GraphRegu: our proposed unsupervised graph regularization model

Overall Performance Our methods are unsupervised Method Accuracy TagMe (unsupervised) 61.9% Meiji (5 fold cross-validation) 68.4% GraphRegu + (Milne and Witten, 2008) 64.3%

Overall Performance (con t) Encode Knowledge from contextual descriptions Method Accuracy TagMe (unsupervised) 61.9% Meiji (5 fold cross-validation) 68.4% GraphRegu + (Milne and Witten, 2008) 64.3% GraphRegu + DSSM + Description 71.8% 26% error rate reduction over TagMe 21% error rate reduction over the standard method to compute semantic relatedness (Milne and Witten, 2008)

Overall Performance Encode Knowledge from structured KG Method Accuracy TagMe (unsupervised) 61.9% Meiji (5 fold cross-validation) 68.4% GraphRegu + (Milne and Witten, 2008) 64.3% GraphRegu + DSSM + Subgraph (Entity) 68.2% GraphRegu + DSSM + Subgraph (Relation + Entity) 70.0% GraphRegu + DSSM + Subgraph (Relation + Entity) + Entity Type 70.9% 23.6% error rate reduction over TagMe 18.5% error rate reduction over the standard method to compute semantic relatedness (Milne and Witten, 2008)

Overall Performance Encode all Knowledge from KG Method Accuracy TagMe (unsupervised) 61.9% Meiji (5 fold cross-validation) 68.4% GraphRegu + (Milne and Witten, 2008) 64.3% GraphRegu + DSSM + Description 71.8% GraphRegu + DSSM + Subgraph (Entity) 68.2% GraphRegu + DSSM + Subgraph (Relation + Entity) 70.0% GraphRegu + DSSM + Subgraph (Relation + Entity) + Entity Type 70.9% GraphRegu + DSSM + Description + Subgraph (Relation + Entity) + Entity Type 71.9%

Conclusions and Future work We propose to learn deep semantic entity embeddings with supervised DNN and Knowledge Graph o Significantly outperform the standard approach for named entity disambiguation Future Work o Encode semantic meta-paths from Kowledge Graph into DNN To capture the semantic meaning of knowledge o Learn entity embedding with Knowledge Graph for other tasks E.g., Question & Answering

Thank You!!! Any Questions/Comments? We will release the embedding for the whole Wikipedia Concepts Soon!!!