S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking


S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking Yi Yang * and Ming-Wei Chang # * Georgia Institute of Technology, Atlanta # Microsoft Research, Redmond

Traditional NLP Settings
High dimensional sparse features (e.g., lexical features): languages are naturally in high dimensional spaces. Powerful! Very expressive.
Linear models: Linear Support Vector Machine, Maximum Entropy model.
Sparse features + Linear models

Rise of Dense Features
Low dimensional embedding features.
Low dimensional statistics features: named mention statistics, click-through statistics.
Dense features + Non-linear models

Non-linear Models
Neural networks, kernel methods, tree-based models (e.g., Random Forest, Boosted Tree).
[Bar chart: # of papers in ACL 2015 by model family. Neural: 35, Kernel: 5, Tree: 1]

Tree-based Models
Empirical successes: information retrieval [LambdaMART; Burges, 2010], computer vision [Babenko et al., 2011], real world classification [Fernandez-Delgado et al., 2014].
Why tree-based models? They handle categorical features and count data better, and implicitly perform feature selection.

Contribution
We present S-MART (Structured Multiple Additive Regression Trees): a general class of tree-based structured learning algorithms, and a friend of problems with dense features.
We apply S-MART to entity linking on short and noisy texts; entity linking utilizes dense statistical features.
Experimental results show that S-MART significantly outperforms all alternative baselines.

Outline S-MART: A family of Tree-based Structured Learning Algorithms S-MART for Tweet Entity Linking Non-overlapping inference Experiments


Structured Learning
Model a joint scoring function S(x, y) over an input structure x and an output structure y.
Obtaining the prediction requires inference (e.g., dynamic programming): ŷ = argmax_{y ∈ gen(x)} S(x, y)
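As a minimal sketch of the prediction rule above (the names `gen` and `score` are illustrative stand-ins for the slide's gen(x) and S(x, y), not code from the paper), inference over a small candidate set is just an argmax:

```python
# Sketch of structured prediction as an argmax over candidate outputs.
# `gen` enumerates candidate structures for x; `score` plays the role of S(x, y).

def predict(x, gen, score):
    """Return the candidate structure y maximizing the joint score S(x, y)."""
    return max(gen(x), key=lambda y: score(x, y))

# Toy usage: candidates are label tuples; the score counts labels present in x.
gen = lambda x: [("A",), ("B",), ("A", "B")]
score = lambda x, y: sum(1 for t in y if t in x)
print(predict({"A", "B"}, gen, score))  # -> ('A', 'B')
```

Real structured models avoid this brute-force enumeration with dynamic programming, as the slide notes, but the objective being maximized is the same.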

Structured Multiple Additive Regression Trees (S-MART)
Assume a decomposition over factors.
Optimize with functional gradient descent.
Model functional gradients using regression trees h_m(x, y_k):
F(x, y_k) = F_M(x, y_k) = Σ_{m=1}^{M} β_m h_m(x, y_k)
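The additive ensemble F_M = Σ_m β_m h_m can be illustrated with a minimal, unstructured sketch: gradient boosting on a 1-D regression problem with depth-1 regression stumps fit to the negative gradient of a squared loss. This is an assumption-laden simplification (S-MART fits trees to functional gradients at each factor of a structured objective), kept small to show the mechanics only:

```python
import numpy as np

def fit_stump(x, residual):
    """Depth-1 regression tree: best threshold split minimizing squared error."""
    best = None
    for thr in np.unique(x):
        left, right = residual[x <= thr], residual[x > thr]
        if len(left) == 0 or len(right) == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lv, rv = best
    return lambda q: np.where(q <= thr, lv, rv)

def gbrt(x, y, n_trees=50, eta=0.3):
    """F_M(x) = sum_m eta * h_m(x); each h_m is fit to the current residual,
    which is the negative gradient of the squared loss at F_{m-1}."""
    F = np.zeros_like(y)
    trees = []
    for _ in range(n_trees):
        g = y - F                      # negative functional gradient
        h = fit_stump(x, g)
        trees.append(h)
        F = F + eta * h(x)
    return lambda q: sum(eta * h(q) for h in trees)

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
model = gbrt(x, y)
```

With a shrinkage of 0.3, the residual shrinks geometrically, so after 50 rounds the ensemble fits this toy step function almost exactly.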

Gradient Descent
Linear combination of parameters and feature functions: F(x, y_k) = wᵀ f(x, y_k)
Gradient descent in vector space: w_m = w_{m−1} − η_m ∂L/∂w_{m−1}
[Figure: descent trajectory w_0 → w_1 → w_2 → w_3 in parameter space]

Gradient Descent in Function Space
Initialize: F_0(x, y_k) = 0
Functional gradient (requires inference):
g_m(x, y_k) = [∂L(y, S(x, y_k)) / ∂F(x, y_k)]_{F(x, y_k) = F_{m−1}(x, y_k)}
Update: F_m(x, y_k) = F_{m−1}(x, y_k) − η_m g_m(x, y_k)

Model Functional Gradients
Pointwise functional gradients g_m(x, y_k) are approximated by a regression tree.
[Figure: example regression tree over dense features, with internal nodes "Is linkprob > 0.5?", "Is PER?", "Is clickprob > 0.1?" and leaf values −0.5, −0.1, −0.3]
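One plausible reading of the slide's example tree, written as nested conditionals. The feature names (`linkprob`, `is_per`, `clickprob`) follow the figure; the fourth leaf value is not legible in the transcript, so the 0.0 below is a placeholder, not a value from the talk:

```python
# Hypothetical rendering of the slide's regression tree: it maps a candidate's
# dense features to an approximate pointwise functional gradient.

def tree_gradient(feats):
    """Regression-tree lookup of a gradient estimate for one candidate."""
    if feats["linkprob"] > 0.5:
        return -0.5 if feats["is_per"] else -0.1
    if feats["clickprob"] > 0.1:
        return -0.3
    return 0.0  # placeholder: this leaf's value is missing from the transcript

print(tree_gradient({"linkprob": 0.8, "is_per": True, "clickprob": 0.0}))  # -> -0.5
```

The point of the figure is that a fitted tree is just such a piecewise-constant function of the dense features, learned to match the pointwise gradients.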

S-MART vs. TreeCRF

                   TreeCRF [Dietterich et al., 2004]   S-MART
Structure          Linear chain                        Various structures
Loss function      Logistic loss                       Various losses
Scoring function   F_{y_t}(x)                          F(x, y_t)

Outline S-MART: A family of Tree-based Structured Learning Algorithms S-MART for Tweet Entity Linking Non-overlapping inference Experiments

Entity Linking in Short Texts
Data explosion: noisy and short texts (Twitter messages, web queries).
Downstream applications: semantic parsing and question answering [Yih et al., 2015], relation extraction [Riedel et al., 2013].

Tweet Entity Linking


Entity Linking meets Dense Features
Shortage of labeled data: lack of context makes annotation more challenging, and as language changes, annotations may become stale and ill-suited for new spellings and words [Yang and Eisenstein, 2013].
Powerful statistical dense features [Guo et al., 2013]: the probability of a surface form being an entity, the view count of a Wikipedia page, textual similarity between a tweet and a Wikipedia page.

System Overview
Pipeline: Tokenized Message → Candidate Generation → Joint Recognition and Disambiguation → Entity Linking Results
Structured learning: select the best non-overlapping entity assignment.
Choose the top 20 entity candidates for each surface form; add a special NIL entity to represent that no entity should be fired here.
Example: "Eli Manning and the New York Giants are going to win the World Series" → Eli_Manning, New_York_Giants, World_Series (other surface forms map to NIL).

S-MART for Tweet Entity Linking
Logistic loss: L(y*, S(x, y)) = −log P(y* | x) = log Z(x) − S(x, y*)
Pointwise gradients: g_ku = ∂L / ∂F(x, y_k = u) = P(y_k = u | x) − 1[y*_k = u]
Computing the gradients requires non-overlapping inference.
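The gradient g_ku = P(y_k = u | x) − 1[y*_k = u] can be sketched for a single mention in isolation, with a softmax over its candidate scores. This is a deliberate simplification: in the actual model, P(y_k = u | x) is a marginal over non-overlapping assignments computed by the forward-backward inference, not an independent per-mention softmax. The candidate names below are illustrative:

```python
import math

def pointwise_gradients(scores, gold):
    """g_u = P(y = u | x) - 1[y* = u] for one mention, with P a softmax of
    the candidate scores F(x, y = u). `gold` is the gold entity (or NIL)."""
    z = sum(math.exp(s) for s in scores.values())
    return {u: math.exp(s) / z - (1.0 if u == gold else 0.0)
            for u, s in scores.items()}

g = pointwise_gradients({"Eli_Manning": 2.0, "NIL": 0.0}, gold="Eli_Manning")
```

Note the gradients over a mention's candidates always sum to zero: the probability mass pushed toward the gold entity is pulled from the others.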

Inference: Forward Algorithm
[Figure: forward pass over the lattice for "Eli Manning and the New York Giants are going to win the World Series", with candidate surface forms Eli, Eli Manning, Manning, New, New York, New York Giants, York, Giants, win, World, World Series, Series]
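The selection problem the forward pass solves can be sketched as a left-to-right dynamic program over token positions, choosing the highest-scoring set of non-overlapping mention candidates. The span encoding and scores are illustrative; the paper's forward-backward algorithm additionally computes the partition-function quantities needed for the training gradients:

```python
def best_nonoverlapping(n_tokens, candidates):
    """Pick the highest-scoring set of non-overlapping candidates, where each
    candidate is (start, end, entity, score) with a half-open token span."""
    # best[i] = (total score, chosen picks) for the prefix of the first i tokens
    best = [(0.0, [])]
    for i in range(1, n_tokens + 1):
        # Default: token i-1 belongs to no linked mention (implicit NIL).
        score, picks = best[i - 1]
        for (s, e, ent, w) in candidates:
            if e == i and w + best[s][0] > score:
                score, picks = w + best[s][0], best[s][1] + [(s, e, ent)]
        best.append((score, picks))
    return best[n_tokens]

# Toy phrase "New York Giants" (3 tokens) with overlapping candidates.
cands = [(0, 2, "New_York", 1.0), (0, 3, "New_York_Giants", 2.5), (2, 3, "Giants", 0.8)]
score, picks = best_nonoverlapping(3, cands)
```

Here the longer span New_York_Giants (2.5) beats the combination New_York + Giants (1.8), so the program keeps only the single non-overlapping mention.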

Inference: Backward Algorithm
[Figure: backward pass over the same lattice and candidate surface forms]

Outline S-MART: A family of Tree-based Structured Learning Algorithms S-MART for Tweet Entity Linking Non-overlapping inference Experiments

Data
Named Entity Extraction & Linking (NEEL) Challenge datasets [Cano et al., 2014]; TACL datasets [Fang & Chang, 2014].

Data         #Tweet   #Entity   Date
NEEL Train   2,340    2,202     Jul. ~ Aug. 11
NEEL Test    1,164    687       Jul. ~ Aug. 11
TACL-IE      500      300       Dec. 12
TACL-IR      980      -         Dec. 12

Evaluation Methodology
IE-driven evaluation [Guo et al., 2013]: standard evaluation of the system's ability to extract entities from tweets. Metric: macro F-score.
IR-driven evaluation [Fang & Chang, 2014]: evaluation of the system's ability to disambiguate the target entities in tweets. Metric: macro F-score on query entities.

Algorithms

Algorithm               Structured   Non-linear   Tree-based
Structured Perceptron       ✓
Linear SSVM*                ✓
Polynomial SSVM             ✓            ✓
LambdaRank                               ✓
MART#                                    ✓            ✓
S-MART                      ✓            ✓            ✓

* previous state-of-the-art system
# winning system of the NEEL 2014 challenge

IE-driven Evaluation
[Bar chart of NEEL Test F1. Linear models: SP 70.9, Linear SSVM 73.2. Non-linear models: Poly SSVM (kernel-based) 74.6, LambdaRank (neural) 75.5, MART (tree-based) 77.4. Tree-based structured model: S-MART 81.1.]

IR-driven Evaluation
[Bar chart of TACL-IR F1. Linear models: SP 58.0, Linear SSVM 62.2. Non-linear models: Poly SSVM 63.6, LambdaRank 56.8, MART 63.0, S-MART 67.4.]

Conclusion
A novel tree-based structured learning framework, S-MART: a generalization of TreeCRF.
A novel inference algorithm for the non-overlapping structure of the tweet entity linking task.
Application: knowledge base QA (outstanding paper of ACL 2015). Our system is a core component of the QA system.
Rise of non-linear models: we can try advanced neural structured algorithms, and it is worth trying different non-linear models.

Thank you!