Jianyong Wang Department of Computer Science and Technology Tsinghua University

Similar documents
THE amount of Web data has increased exponentially

Linking Entities in Chinese Queries to Knowledge Graph

LINDEN: Linking Named Entities with Knowledge Base via Semantic Knowledge

Introduction to Text Mining. Hongning Wang

Papers for comprehensive viva-voce

Linking Entities in Short Texts Based on a Chinese Semantic Knowledge Base

Entity and Knowledge Base-oriented Information Retrieval

CIRGDISCO at RepLab2012 Filtering Task: A Two-Pass Approach for Company Name Disambiguation in Tweets

August 2012 Daejeon, South Korea

NUS-I2R: Learning a Combined System for Entity Linking

On Statistical Characteristics of Real-life Knowledge Graphs

Presented by: Dimitri Galmanovich. Petros Venetis, Alon Halevy, Jayant Madhavan, Marius Paşca, Warren Shen, Gengxin Miao, Chung Wu

Quality-Aware Entity-Level Semantic Representations for Short Texts

Effective Latent Space Graph-based Re-ranking Model with Global Consistency

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task

Knowledge Based Systems Text Analysis

Interpreting Document Collections with Topic Models. Nikolaos Aletras University College London

Learning Semantic Entity Representations with Knowledge Graph and Deep Neural Networks and its Application to Named Entity Disambiguation

Understanding the Query: THCIB and THUIS at NTCIR-10 Intent Task. Junjun Wang 2013/4/22

SEMANTIC INDEXING (ENTITY LINKING)

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Learning Ontology-Based User Profiles: A Semantic Approach to Personalized Web Search

Tools and Infrastructure for Supporting Enterprise Knowledge Graphs

Is Brad Pitt Related to Backstreet Boys? Exploring Related Entities

TERM BASED WEIGHT MEASURE FOR INFORMATION FILTERING IN SEARCH ENGINES

<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany

Entity Linking in Web Tables with Multiple Linked Knowledge Bases

Lecture 24: NER & Entity Linking

Exploiting Semantics Where We Find Them

Enhanced Performance of Search Engine with Multitype Feature Co-Selection of Db-scan Clustering Algorithm

EXTRACT THE TARGET LIST WITH HIGH ACCURACY FROM TOP-K WEB PAGES

Text Analytics (Text Mining)

Image Similarity Measurements Using Hmok- Simrank

Cost-Effective Conceptual Design. over Taxonomies. Yodsawalai Chodpathumwan. University of Illinois at Urbana-Champaign.

Knowledge Graphs: In Theory and Practice

Enhanced retrieval using semantic technologies:

Topic Diversity Method for Image Re-Ranking

S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking

A Joint Model for Discovering and Linking Entities

TDT- An Efficient Clustering Algorithm for Large Database Ms. Kritika Maheshwari, Mr. M.Rajsekaran

December 4, BigData 2017 Enterprise Knowledge Graphs for Large Scale Analytics 1/47

Outline. Morning program Preliminaries Semantic matching Learning to rank Entities

Telling Experts from Spammers Expertise Ranking in Folksonomies

VisoLink: A User-Centric Social Relationship Mining

Clustering. Bruno Martins. 1 st Semester 2012/2013

A Hybrid Neural Model for Type Classification of Entity Mentions

Entity Linking. David Soares Batista. November 11, Disciplina de Recuperação de Informação, Instituto Superior Técnico

Text Analytics (Text Mining)

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

UBC Entity Discovery and Linking & Diagnostic Entity Linking at TAC-KBP 2014

A hybrid method to categorize HTML documents

NATURAL LANGUAGE PROCESSING

Towards Breaking the Quality Curse. AWebQuerying Web-Querying Approach to Web People Search.

SUPPORTING PRIVACY PROTECTION IN PERSONALIZED WEB SEARCH- A REVIEW Neha Dewangan 1, Rugraj 2

Holistic and Compact Selectivity Estimation for Hybrid Queries over RDF Graphs

Information Retrieval using Pattern Deploying and Pattern Evolving Method for Text Mining

High-Throughput and Language-Agnostic Entity Disambiguation and Linking on User Generated Data

Annotating Spatio-Temporal Information in Documents

Recommender Systems: Practical Aspects, Case Studies. Radek Pelánek

A Novel Categorized Search Strategy using Distributional Clustering Neenu Joseph. M 1, Sudheep Elayidom 2

A Survey on Postive and Unlabelled Learning

Construction and Querying of Large-scale Knowledge Bases. Part II: Schema-agnostic Knowledge Base Querying

Lecture Notes: Social Networks: Models, Algorithms, and Applications Lecture 28: Apr 26, 2012 Scribes: Mauricio Monsalve and Yamini Mule

ACM MM Dong Liu, Shuicheng Yan, Yong Rui and Hong-Jiang Zhang

Linking FRBR Entities to LOD through Semantic Matching

Query Independent Scholarly Article Ranking

From Dynamic to Unbalanced Ontology Matching

Replication on Affinity Propagation: Clustering by Passing Messages Between Data Points

IMPROVING INFORMATION RETRIEVAL BASED ON QUERY CLASSIFICATION ALGORITHM

Fraud Detection of Mobile Apps

CHAPTER 5 OPTIMAL CLUSTER-BASED RETRIEVAL

Focused crawling: a new approach to topic-specific Web resource discovery. Authors

PRISM: Concept-preserving Social Image Search Results Summarization

Learning Dense Models of Query Similarity from User Click Logs

PERSONALIZED MOBILE SEARCH ENGINE BASED ON MULTIPLE PREFERENCE, USER PROFILE AND ANDROID PLATFORM

TRENTINOMEDIA: Exploiting NLP and Background Knowledge to Browse a Large Multimedia News Store

PROVIDING PRIVACY AND PERSONALIZATION IN SEARCH

Chinese Microblog Entity Linking System Combining Wikipedia and Search Engine Retrieval Results

arxiv: v1 [cs.ir] 2 Mar 2017

Ontology-Based Web Query Classification for Research Paper Searching

Cost Effective Conceptual Design for Semantic Annotation

Overview of Web Mining Techniques and its Application towards Web

Developing Focused Crawlers for Genre Specific Search Engines

Text Mining: A Burgeoning technology for knowledge extraction

Approaches to Mining the Web

Ontological Topic Modeling to Extract Twitter users' Topics of Interest

Deep Web Crawling and Mining for Building Advanced Search Application

RiMOM Results for OAEI 2010

Meta-path based Multi-Network Collective Link Prediction

Tulip: Lightweight Entity Recognition and Disambiguation Using Wikipedia-Based Topic Centroids. Marek Lipczak Arash Koushkestani Evangelos Milios

SLIPO. Scalable Linking and Integration of Big POI data. Giorgos Giannopoulos IMIS/Athena RC

Hierarchical Link Analysis for Ranking Web Data

Query Difficulty Prediction for Contextual Image Retrieval

Improving Difficult Queries by Leveraging Clusters in Term Graph

Plagiarism Detection Using FP-Growth Algorithm

Natural Language Processing. SoSe Question Answering

Hidden-Web Databases: Classification and Search

A service based on Linked Data to classify Web resources using a Knowledge Organisation System

Mining Social Media Users Interest

ImgSeek: Capturing User s Intent For Internet Image Search

Transcription:

Jianyong Wang Department of Computer Science and Technology Tsinghua University jianyong@tsinghua.edu.cn Joint work with Wei Shen (Tsinghua), Ping Luo (HP), and Min Wang (HP)

Outline Introduction to entity linking with a knowledge base Motivation & definition Entity linking for unstructured Web documents Entity linking for structured Web lists/tables Entity linking for Tweets Conclusion 2015-11-12 2

Motivation Knowledge base construction from heterogeneous data Better user experience of information search and recommendation is always in great demand Semantic search on the Web, Deep Q/A in NL, 2015-11-12 3

Motivation Knowledge base construction from heterogeneous data Better user experience of information search and recommendation is always in great demand Semantic search on the Web, Deep Q/A in NL, Structured knowledge discovery from heterogeneous data Free texts, Tables, Lists, Twitter, Weibo, Entities, semantic categories, mutual relations, Freebase 68 million entities 1 billion facts Knowledge Graph (as of 2012) 570 million entities 18 billion facts DBpedia 3.64 million entities DBpedia Yago Wikipedia Over 10 million 结构化 entities 364 120 万个条目 million( relations 本体 ) 2015-11-12 4

Motivation Existing Knowledge Bases are far from perfect They are large, but their coverage is still low Popular or well-known person, place or thing E.g., Google s Knowledge graph As of 2012, its semantic network contained over 570 million objects and more than 18 billion facts about the relationships between these different objects which are used to understand the meaning of the keywords entered for the search Source: http://en.wikipedia.org/wiki/ Google_Knowledge_Graph 2015-11-12 5

Motivation Knowledge base population Automatically populating and enriching the existing KB with the newly extracted facts Why? Limited coverage for existing KBs As world evolves New facts come into existence Entity linking is inherently considered as an important subtask for knowledge base population Entity Relation Entity Michael Jordan isplayerof Bulls Michael Jordan : Michael Jordan (NBA); Michael Jordan (Professor); Michael Jordan (footballer);. Bulls : Chicago Bulls; Bulls, New Zealand; Bulls (rugby);. 2015-11-12 6

Web Text: German Chancellor Angela Merkel and her husband Joachim Sauer went to Ulm, Germany. Entity Linking Knowledge base: Bridging NIL Figure : An example of YAGO 2015-11-12 7

Challenges in Entity Linking Entity ambiguity problem Name variations: a named entity may have multiple names National Basketball Association NBA New York City Big Apple Entity ambiguity: a name may refer to several different named entities Michael J. Jordan (NBA player) Michael Jordan Michael I. Jordan (Berkeley professor) Michael W. Jordan (footballer) Michael Jordan (mycologist) 2015-11-12 8

Outline Introduction to entity linking with a knowledge base Motivation & definition Entity linking for unstructured Web documents Entities detected from unstructured Web documents: a common scenario The LINDEN framework (WWW 12) Entity linking for structured Web lists/tables Entities detected from structured Web lists/tables The LIEGE framework (KDD 12) Entity linking for Tweets Entities detected from short and noisy Tweets The KAURI framework (KDD 13) Conclusion 2015-11-12 9

Entity linking for unstructured Data Problem Definition Entity linking task Input: A textual named entity mention m, already recognized in the unstructured Web document Output: The corresponding real world entity e in the knowledge base If the matching entity e for entity mention m does not exist in the knowledge base, we should return a NIL for m 2015-11-12 10

Entity Linking for Unstructured Data Previous Methods Essential step of entity linking Define a similarity measure between the text around the entity mention and the document associated with the entity Bag of words model Represent the context as a term vector Measure the co-occurrence statistics of terms Cannot capture the semantic knowledge Example: Text: Michael Jordan wins NBA champion. 2015-11-12 11

Entity Linking for Unstructured Data Our solution: The LINDEN Framework Define four features Feature 1: Prior probability Based on the count information Semantic network based features Feature 2: Semantic associativity Based on the Wikipedia hyperlink structure Feature 3: Semantic similarity Derived from the taxonomy of YAGO Feature 4: Global coherence Global document-level topical coherence among entities To rank the candidates, we compute a score by a linear combination of the four features, where Entity mentions: Michael Jordan, NBA Use a max-margin technique to automatically learn the weights Minimize over ξ m 0 and the objective 2015-11-12 12

Entity Linking for Unstructured Data Our solution: Experimental Study 2015-11-12 13

Outline Introduction to entity linking with a knowledge base Motivation & definition Entity linking for unstructured Web documents Entities detected from unstructured Web documents: a common scenario The LINDEN framework (WWW 12) Entity linking for structured Web list/tables Entities detected from structured Web lists/tables The LIEGE framework (KDD 12) Entity linking for Tweets Entities detected from short and noisy Tweets The KAURI framework (KDD 13) Conclusion 2015-11-12 14

Entity Linking for Structured Data Motivation A list of famous football players A list of bestselling albums A list of famous artists A list of NBA players Web List NBA player Berkeley professor 2015-11-12 15

Input Entity Linking for Structured Data Problem Definition List linking task Link the entity mentions that appear in the Web lists with the corresponding real world entities in the knowledge base Output Figure: An illustration of the list linking task. The Web list enumerates some best-selling single volume books. Candidate mapping entities from knowledge base for each list item are shown on the right of the figure; true mapping entity for each list item is underlined. 2015-11-12 16

Entity Linking for Structured Data List Linking The list linking task is practically important Knowledge base population and table annotation Challenge No textual context Different from the task of linking entities in free text Assumption Entities mentioned in a Web list can be any collection of entities that have the same conceptual type 2015-11-12 17

Entity Linking for Structured Data Our solution: Linking Quality Metric Prior probability Define the popularity of an entity based on the link count information from Wikipedia Coherence The type of the candidate mapping entity should be coherent with the types of the other mapping entities in the same Web list Type hierarchy based similarity Distributional context similarity 2015-11-12 18

Entity Linking for Structured Data Our solution: Linking Quality Linking quality for candidate entity r i,j Prior probability Coherence Type hierarchy based similarity Weight vector Distributional context similarity We utilize the max-margin technique to automatically learn the weight vector which gives proper weights for different features (Details in paper) 2015-11-12 19

Entity Linking for Structured Data Our solution: Iterative Substitution Algorithm Initialization: Pick the candidate entity that has the maximum prior probability as the initial estimate of the mapping entity for the list item Iterative substitution: Iteratively refine the mapping entity list to improve its linking quality When the maximum improvement is smaller than zero, we stop the iteration. We prove that this algorithm is guaranteed to converge. 2015-11-12 20

Entity Linking for Structured Data Our solution: Experimental Study 2015-11-12 21

Outline Introduction to entity linking with a knowledge base Motivation & definition Entity linking for unstructured Web documents Entities detected from unstructured Web documents: a common scenario The LINDEN framework (WWW 12) Entity linking for structured Web list/tables Entities detected from structured Web lists/tables The LIEGE framework (KDD 12) Entity linking for Tweets Entities detected from short and noisy Tweets The KAURI framework (KDD 13) Conclusion 2015-11-12 22

Entity Linking for Tweets Motivation Twitter: important information source Beneficial for exploiting and understanding this huge corpus of valuable text data on the Web, and also helps populate and enrich the existing knowledge bases. 2015-11-12 23

Entity Linking for Tweets Problem Definition Tweet entity linking link the textual named entity mentions detected from tweets with their mapping entities existing in a knowledge base Input Output Figure: An illustration of the tweet entity linking task. Named entity mentions detected in tweets are in bold; candidate mapping entities for each entity mention are generated by a dictionary-based method and ranked by their prior probabilities in decreasing order; true mapping entities are underlined. 2015-11-12 24

Entity Linking for Tweets Challenge Challenge noisy, short, and informal nature of tweets Previous entity linking methods (EACL 06, EMNLP 07, KDD 09, SIGIR 11, EMNLP 11, and WWW 12) focus on linking entities in Web documents Context Similarity Topical Coherence Not work well 2015-11-12 25

Entity Linking for Tweets Our Solution: The KAURI Framework We can increase the linking accuracy, if we combine intra-tweet local information with inter-tweet user interest information Not work well 2015-11-12 26

Entity Linking for Tweets Our Solution: Intra-tweet Local Information The initial interest score Context similarity bag of words cosine similarity Prior probability entity frequency in the Wikipedia article corpus Topical coherence in tweet estimated using the iterative substitution algorithm proposed in our LIEGE model. Weight vector We utilize the max-margin technique to automatically learn the weight vector which gives proper weights for those three intra-tweet local features. W. Shen, J. Wang, P. Luo, and M. Wang. Liege: Link entities in web lists with knowledge base. In SIGKDD'12. 2015-11-12 27

Entity Linking for Tweets Our Solution: User interest propagation alg The final interest score vector The interest propagation strength matrix column-normalized The initial interest score vector Normalized Initialization: Then apply this formula iteratively until within some threshold stabilizes 2015-11-12 28

Entity Linking for Tweets Our Solution: Experimental Study Table: Experimental results over the data set LINDEN is our model proposed to address the task of linking entities in Web documents. W. Shen, J. Wang, P. Luo, and M. Wang. Linden: linking named entities with knowledge base via semantic knowledge. In WWW'12. 2015-11-12 29

Outline Introduction to entity linking with a knowledge base Motivation & definition Entity linking for unstructured Web documents Entity linking for structured Web lists/tables Entity linking for Tweets Conclusion 2015-11-12 30

Conclusion Entity linking is an interesting and challenging task Entity linking is very important for knowledge base population Recent progress Entity linking for Web documents (many existing work) Popularity + semantic knowledge Entity linking for Web lists (LIEGE is the first one) Coherence + iterative refining Entity linking for Tweets (a few papers) Global user interest propagation Future directions Efficient, large-scale entity linking Entity linking with domain-specific knowledge bases E.g., in the domains of computer science, biomedicine, entertainment, products, finance, tourism, etc. Crowdsourcing-based entity linking 2015-11-12 31

References L. Jiang, J. Wang, N. An, et al. GRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search. ICDM'09 X. Fan, J. Wang, X. Pu, L. Zhou, B. Lv. On Graph-based Name Disambiguation. ACM JDIQ, 2011 W. Shen, J. Wang, P. Luo, and M. Wang. Linden: linking named entities with knowledge base via semantic knowledge, WWW '12 W. Shen, J. Wang, P. Luo, and M. Wang. Liege: Link entities in web lists with knowledge base, SIGKDD '12 W. Shen, J. Wang, P. Luo, and M. Wang. A graph-based approach for ontology population with named entities, CIKM '12 W. Shen, J. Wang, P. Luo, and M. Wang. Linking Named Entities in Tweets with Knowledge Base via User Interest Modeling, SIGKDD'13 L. Jiang, P. Luo, J. Wang, et al. GRIAS: an Entity-Relation Graph based Framework for Discovering Entity Aliases. ICDM'13 W. Shen, J. Han, J. Wang. A Probabilistic Model for Linking Named Entities in Web Text with Heterogeneous Information Networks. SIGMOD'14 2015-11-12 32

Thanks for your attention! 2015-11-12 33