S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking
1 S-MART: Novel Tree-based Structured Learning Algorithms Applied to Tweet Entity Linking. Yi Yang* and Ming-Wei Chang#. *Georgia Institute of Technology, Atlanta. #Microsoft Research, Redmond.
5 Traditional NLP Settings
High dimensional sparse features (e.g., lexical features): languages naturally live in high dimensional spaces. Powerful and very expressive.
Linear models: linear Support Vector Machine, Maximum Entropy model.
Sparse features + linear models.
9 Rise of Dense Features
Low dimensional embedding features.
Low dimensional statistics features: named mention statistics, click-through statistics.
Dense features + non-linear models.
14 Non-linear Models
Neural networks, kernel methods, tree-based models (e.g., Random Forest, Boosted Tree).
[Chart: number of papers in ACL using neural, kernel, and tree-based methods.]
17 Tree-based Models
Empirical successes: information retrieval [LambdaMART; Burges, 2010], computer vision [Babenko et al., 2011], real world classification [Fernandez-Delgado et al., 2014].
Why tree-based models? They handle categorical features and count data better, and they implicitly perform feature selection.
20 Contribution
We present S-MART: Structured Multiple Additive Regression Trees, a general class of tree-based structured learning algorithms and a natural fit for problems with dense features.
We apply S-MART to entity linking on short and noisy texts; entity linking relies on dense statistical features.
Experimental results show that S-MART significantly outperforms all alternative baselines.
21 Outline
- S-MART: a family of tree-based structured learning algorithms
- S-MART for tweet entity linking: non-overlapping inference
- Experiments
25 Structured Learning
Model a joint scoring function S(x, y) over an input structure x and an output structure y.
Obtaining a prediction requires inference (e.g., dynamic programming):
$\hat{y} = \arg\max_{y \in \mathrm{gen}(x)} S(x, y)$
29 Structured Multiple Additive Regression Trees (S-MART)
Assume a decomposition over factors.
Optimize with functional gradient descent.
Model functional gradients using regression trees $h_m(x, y_k)$:
$F(x, y_k) = F_M(x, y_k) = \sum_{m=1}^{M} \eta_m h_m(x, y_k)$
33 Gradient Descent
Linear combination of parameters and feature functions:
$F(x, y_k) = \mathbf{w}^{\top} \mathbf{f}(x, y_k)$
Gradient descent in vector space:
$\mathbf{w}_m = \mathbf{w}_{m-1} - \eta_m \nabla_{\mathbf{w}} L$
[Figure: gradient descent trajectory $w_0 \to w_1 \to w_2 \to \dots$ in parameter space.]
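Ordinary parameter-space gradient descent can be sketched on a toy least-squares problem; the data, step size, and iteration count here are illustrative:

```python
# Gradient descent in parameter (vector) space: w_m = w_{m-1} - eta * gradient,
# for the toy objective 0.5 * sum_i (w * x_i - y_i)^2.

def grad(w, xs, ys):
    # d/dw of 0.5 * sum((w*x - y)^2) = sum((w*x - y) * x)
    return sum((w * x - y) * x for x, y in zip(xs, ys))

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # data generated by y = 2x
w, eta = 0.0, 0.05
for _ in range(100):
    w -= eta * grad(w, xs, ys)

print(round(w, 3))  # converges toward the true slope 2.0
```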
40 Gradient Descent in Function Space
Start from $F_0(x, y_k) = 0$.
At iteration m, compute the functional gradient at the current function $F_{m-1}$ (this step requires inference):
$g_m(x, y_k) = \left. \dfrac{\partial L(y, S(x, y))}{\partial F(x, y_k)} \right|_{F(x, y_k) = F_{m-1}(x, y_k)}$
Update:
$F_m(x, y_k) = F_{m-1}(x, y_k) - \eta_m \, g_m(x, y_k)$
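The function-space updates above can be sketched as a miniature MART-style boosting loop. This is a minimal, unstructured sketch with squared loss; the dataset, tree depth, step size, and use of scikit-learn's DecisionTreeRegressor are illustrative choices, not the paper's implementation:

```python
# Gradient descent in function space: start from F_0 = 0, fit a regression
# tree h_m to the negative functional gradient at F_{m-1}, then set
# F_m = F_{m-1} + eta * h_m. Squared loss on a toy regression task.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = X[:, 0] ** 2  # toy target function

F = np.zeros(len(y))  # F_0 = 0
eta, trees = 0.5, []
for m in range(50):
    g = F - y                              # functional gradient of 0.5*(F - y)^2
    h = DecisionTreeRegressor(max_depth=2).fit(X, -g)
    F += eta * h.predict(X)                # approximates F_m = F_{m-1} - eta * g
    trees.append(h)

print(round(float(np.mean((F - y) ** 2)), 4))  # training MSE shrinks toward 0
```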
45 Model Functional Gradients
Pointwise functional gradients $g_m(x, y_k)$ are approximated by fitting a regression tree.
[Example tree: "Is linkprob > 0.5?" splits to "Is PER?" (leaves -0.5 and -0.1) and "Is clickprob > 0.1?" (leaf -0.3).]
50 S-MART vs. TreeCRF

                   TreeCRF [Dietterich et al., 2004]   S-MART
Structure:         Linear chain                        Various structures
Loss function:     Logistic loss                       Various losses
Scoring function:  $F_{y_t}(x)$                        $F(x, y_t)$
51 Outline S-MART: A family of Tree-based Structured Learning Algorithms S-MART for Tweet Entity Linking Non-overlapping inference Experiments
54 Entity Linking in Short Texts
Data explosion: noisy and short texts (Twitter messages, web queries).
Downstream applications: semantic parsing and question answering [Yih et al., 2015], relation extraction [Riedel et al., 2013].
55 Tweet Entity Linking
61 Entity Linking meets Dense Features
Shortage of labeled data: lack of context makes annotation more challenging, and as language changes, annotations may become stale and ill-suited for new spellings and words [Yang and Eisenstein, 2013].
Powerful dense statistical features [Guo et al., 2013]: the probability of a surface form being an entity, the view count of a Wikipedia page, textual similarity between a tweet and a Wikipedia page.
69 System Overview
Pipeline: Tokenized Message → Candidate Generation → Joint Recognition and Disambiguation → Entity Linking Results.
Structured learning selects the best non-overlapping entity assignment:
- Choose the top 20 entity candidates for each surface form.
- Add a special NIL entity to represent that no entity should be fired here.
Example: "Eli Manning and the New York Giants are going to win the World Series" → Eli_Manning, New_York_Giants, World_Series (all other surface forms map to NIL).
73 S-MART for Tweet Entity Linking
Logistic loss:
$L(y, S(x, y)) = -\log P(y \mid x) = \log Z(x) - S(x, y)$
Pointwise gradients (computed via non-overlapping inference):
$g(x, y_k = u_k) = P(y_k = u_k \mid x) - \mathbb{1}[y^{*}_k = u_k]$
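The pointwise gradient above can be illustrated for a single factor, assuming a plain softmax over a handful of candidate entities; the candidate names and scores are made up, and the real system computes the marginals P(y_k = u | x) via non-overlapping inference rather than a local softmax:

```python
# Pointwise functional gradient of the logistic (softmax) loss on one factor:
# g(x, y_k = u) = P(y_k = u | x) - 1[y*_k = u].
import math

def softmax(scores):
    mx = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - mx) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

candidates = ["Eli_Manning", "Eli_(given_name)", "NIL"]  # hypothetical candidates
scores = [2.0, 0.5, 0.1]                                  # illustrative F(x, y_k)
gold = "Eli_Manning"

probs = softmax(scores)
grads = [p - (1.0 if c == gold else 0.0) for c, p in zip(candidates, probs)]
print([round(g, 3) for g in grads])  # negative for the gold entity, positive otherwise
```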
79 Inference: Forward Algorithm
[Lattice over the tweet "Eli Manning and the New York Giants are going to win the World Series" with candidate surface forms: Eli, Eli Manning, Manning, New, New York, New York Giants, York, Giants, win, World, World Series, Series. The forward pass sweeps left to right over non-overlapping choices.]
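A minimal sketch of the non-overlapping inference idea, cast as a Viterbi-style dynamic program over candidate mention spans; the spans, entity names, and scores are toy values, not the paper's actual algorithm or features:

```python
# Forward DP for non-overlapping entity assignment: V[i] is the best total
# score over tokens [0, i); at each position, either leave the token unlinked
# (NIL) or end a candidate mention there.

def best_assignment(n_tokens, mentions):
    """mentions: list of (start, end, entity, score); end is exclusive."""
    V = [0.0] * (n_tokens + 1)
    back = [None] * (n_tokens + 1)
    for i in range(1, n_tokens + 1):
        V[i], back[i] = V[i - 1], None          # token i-1 maps to NIL
        for (s, e, ent, sc) in mentions:
            if e == i and V[s] + sc > V[i]:
                V[i], back[i] = V[s] + sc, (s, e, ent, sc)
    chosen, i = [], n_tokens                    # recover mentions backwards
    while i > 0:
        if back[i] is None:
            i -= 1
        else:
            s, e, ent, sc = back[i]
            chosen.append((s, e, ent))
            i = s
    return V[n_tokens], list(reversed(chosen))

mentions = [(0, 2, "Eli_Manning", 3.0), (1, 2, "Manning_(surname)", 1.0),
            (4, 7, "New_York_Giants", 4.0), (4, 6, "New_York", 2.5),
            (12, 14, "World_Series", 3.5)]
score, chosen = best_assignment(14, mentions)
print(score, chosen)  # the three non-overlapping, highest-scoring mentions
```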
84 Inference: Backward Algorithm
[The same lattice, swept right to left; combining forward and backward quantities yields the marginals needed for the pointwise gradients.]
85 Outline S-MART: A family of Tree-based Structured Learning Algorithms S-MART for Tweet Entity Linking Non-overlapping inference Experiments
87 Data
Named Entity Extraction & Linking (NEEL) Challenge datasets [Cano et al., 2014]; TACL datasets [Fang & Chang, 2014].

Data        #Tweet   #Entity   Date
NEEL Train  2,340    2,202     Jul. ~ Aug. 11
NEEL Test   1,                 Jul. ~ Aug. 11
TACL-IE                        Dec. 12
TACL-IR                        Dec. 12
89 Evaluation Methodology
IE-driven evaluation [Guo et al., 2013]: the standard evaluation of a system's ability to extract entities from tweets. Metric: macro F-score.
IR-driven evaluation [Fang & Chang, 2014]: evaluates a system's ability to disambiguate the target entities in tweets. Metric: macro F-score on query entities.
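A minimal sketch of the macro F-score computation, assuming it is averaged per tweet over gold and predicted entity sets; the sets below are illustrative, not from the datasets:

```python
# Macro F-score over tweets: compute an F1 per tweet from the gold and
# predicted entity sets, then average across tweets.

def f1(gold, pred):
    if not gold and not pred:
        return 1.0
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

tweets = [  # (gold entities, predicted entities) per tweet
    ({"Eli_Manning", "New_York_Giants"}, {"Eli_Manning", "New_York_Giants"}),
    ({"World_Series"}, {"World_Series", "New_York"}),
]
macro_f = sum(f1(g, p) for g, p in tweets) / len(tweets)
print(round(macro_f, 3))  # average of a perfect tweet and a partial one
```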
90 Algorithms

                        Structured   Non-linear   Tree-based
Structured Perceptron       x
Linear SSVM*                x
Polynomial SSVM             x            x
LambdaRank                               x
MART#                                    x            x
S-MART                      x            x            x

* previous state-of-the-art system   # winning system of the NEEL 2014 challenge
103 IE-driven Evaluation
[Bar chart: macro F-scores on the NEEL Test set for SP, Linear SSVM, Poly SSVM (kernel based model), LambdaRank (neural based model), MART (tree based model), and S-MART (tree based structured model). S-MART outperforms both the linear and the non-linear baselines.]
110 IR-driven Evaluation
[Bar chart: macro F-scores on TACL-IR for the same systems; S-MART again outperforms the linear and non-linear baselines.]
115 Conclusion
A novel tree-based structured learning framework, S-MART, which generalizes TreeCRF.
A novel inference algorithm for the non-overlapping structure of the tweet entity linking task.
Application: knowledge base QA (outstanding paper of ACL 2015); our system is a core component of the QA system.
Rise of non-linear models: we can try advanced neural structured algorithms, and it is worth trying different non-linear models.
116 Thank you!
More informationAn Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation
An Empirical Evaluation of Deep Architectures on Problems with Many Factors of Variation Hugo Larochelle, Dumitru Erhan, Aaron Courville, James Bergstra, and Yoshua Bengio Université de Montréal 13/06/2007
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationBackpropagating through Structured Argmax using a SPIGOT
Backpropagating through Structured Argmax using a SPIGOT Hao Peng, Sam Thomson, Noah A. Smith @ACL July 17, 2018 Overview arg max Parser Downstream task Loss L Overview arg max Parser Downstream task Head
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationInformation Retrieval
Information Retrieval Learning to Rank Ilya Markov i.markov@uva.nl University of Amsterdam Ilya Markov i.markov@uva.nl Information Retrieval 1 Course overview Offline Data Acquisition Data Processing Data
More informationImproving Latent Fingerprint Matching Performance by Orientation Field Estimation using Localized Dictionaries
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 3, Issue. 11, November 2014,
More informationCollective Entity Disambiguation with Structured Gradient Tree Boosting
Collective Entity Disambiguation with Structured Gradient Tree Boosting Yi Yang Ozan Irsoy Kazi Shefaet Rahman Bloomberg LP New York, NY 10022 {yyang464+oirsoy+krahman7}@bloomberg.net Abstract We present
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationExploiting Internal and External Semantics for the Clustering of Short Texts Using World Knowledge
Exploiting Internal and External Semantics for the Using World Knowledge, 1,2 Nan Sun, 1 Chao Zhang, 1 Tat-Seng Chua 1 1 School of Computing National University of Singapore 2 School of Computer Science
More informationarxiv: v1 [cs.cv] 20 Dec 2016
End-to-End Pedestrian Collision Warning System based on a Convolutional Neural Network with Semantic Segmentation arxiv:1612.06558v1 [cs.cv] 20 Dec 2016 Heechul Jung heechul@dgist.ac.kr Min-Kook Choi mkchoi@dgist.ac.kr
More informationDetection and Extraction of Events from s
Detection and Extraction of Events from Emails Shashank Senapaty Department of Computer Science Stanford University, Stanford CA senapaty@cs.stanford.edu December 12, 2008 Abstract I build a system to
More informationConstrained Convolutional Neural Networks for Weakly Supervised Segmentation. Deepak Pathak, Philipp Krähenbühl and Trevor Darrell
Constrained Convolutional Neural Networks for Weakly Supervised Segmentation Deepak Pathak, Philipp Krähenbühl and Trevor Darrell 1 Multi-class Image Segmentation Assign a class label to each pixel in
More informationAn Algorithm based on SURF and LBP approach for Facial Expression Recognition
ISSN: 2454-2377, An Algorithm based on SURF and LBP approach for Facial Expression Recognition Neha Sahu 1*, Chhavi Sharma 2, Hitesh Yadav 3 1 Assistant Professor, CSE/IT, The North Cap University, Gurgaon,
More informationUMass at TREC 2017 Common Core Track
UMass at TREC 2017 Common Core Track Qingyao Ai, Hamed Zamani, Stephen Harding, Shahrzad Naseri, James Allan and W. Bruce Croft Center for Intelligent Information Retrieval College of Information and Computer
More informationTri-modal Human Body Segmentation
Tri-modal Human Body Segmentation Master of Science Thesis Cristina Palmero Cantariño Advisor: Sergio Escalera Guerrero February 6, 2014 Outline 1 Introduction 2 Tri-modal dataset 3 Proposed baseline 4
More informationMachine Learning Classifiers and Boosting
Machine Learning Classifiers and Boosting Reading Ch 18.6-18.12, 20.1-20.3.2 Outline Different types of learning problems Different types of learning algorithms Supervised learning Decision trees Naïve
More informationAssessing the Quality of Natural Language Text
Assessing the Quality of Natural Language Text DC Research Ulm (RIC/AM) daniel.sonntag@dfki.de GI 2004 Agenda Introduction and Background to Text Quality Text Quality Dimensions Intrinsic Text Quality,
More informationModel Inference and Averaging. Baging, Stacking, Random Forest, Boosting
Model Inference and Averaging Baging, Stacking, Random Forest, Boosting Bagging Bootstrap Aggregating Bootstrap Repeatedly select n data samples with replacement Each dataset b=1:b is slightly different
More informationJoint Inference in Image Databases via Dense Correspondence. Michael Rubinstein MIT CSAIL (while interning at Microsoft Research)
Joint Inference in Image Databases via Dense Correspondence Michael Rubinstein MIT CSAIL (while interning at Microsoft Research) My work Throughout the year (and my PhD thesis): Temporal Video Analysis
More informationConditional Random Fields and beyond D A N I E L K H A S H A B I C S U I U C,
Conditional Random Fields and beyond D A N I E L K H A S H A B I C S 5 4 6 U I U C, 2 0 1 3 Outline Modeling Inference Training Applications Outline Modeling Problem definition Discriminative vs. Generative
More informationEntity and Knowledge Base-oriented Information Retrieval
Entity and Knowledge Base-oriented Information Retrieval Presenter: Liuqing Li liuqing@vt.edu Digital Library Research Laboratory Virginia Polytechnic Institute and State University Blacksburg, VA 24061
More informationBilevel Sparse Coding
Adobe Research 345 Park Ave, San Jose, CA Mar 15, 2013 Outline 1 2 The learning model The learning algorithm 3 4 Sparse Modeling Many types of sensory data, e.g., images and audio, are in high-dimensional
More information16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning. Spring 2018 Lecture 14. Image to Text
16-785: Integrated Intelligence in Robotics: Vision, Language, and Planning Spring 2018 Lecture 14. Image to Text Input Output Classification tasks 4/1/18 CMU 16-785: Integrated Intelligence in Robotics
More informationWeb-based Question Answering: Revisiting AskMSR
Web-based Question Answering: Revisiting AskMSR Chen-Tse Tsai University of Illinois Urbana, IL 61801, USA ctsai12@illinois.edu Wen-tau Yih Christopher J.C. Burges Microsoft Research Redmond, WA 98052,
More informationA study of classification algorithms using Rapidminer
Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja
More informationscikit-learn (Machine Learning in Python)
scikit-learn (Machine Learning in Python) (PB13007115) 2016-07-12 (PB13007115) scikit-learn (Machine Learning in Python) 2016-07-12 1 / 29 Outline 1 Introduction 2 scikit-learn examples 3 Captcha recognize
More informationPractical Methodology. Lecture slides for Chapter 11 of Deep Learning Ian Goodfellow
Practical Methodology Lecture slides for Chapter 11 of Deep Learning www.deeplearningbook.org Ian Goodfellow 2016-09-26 What drives success in ML? Arcane knowledge of dozens of obscure algorithms? Mountains
More informationCS6716 Pattern Recognition
CS6716 Pattern Recognition Aaron Bobick School of Interactive Computing Administrivia PS3 is out now, due April 8. Today chapter 12 of the Hastie book. Slides (and entertainment) from Moataz Al-Haj Three
More informationNatural Language Processing
Natural Language Processing Machine Learning Potsdam, 26 April 2012 Saeedeh Momtazi Information Systems Group Introduction 2 Machine Learning Field of study that gives computers the ability to learn without
More informationfor Searching Social Media Posts
Mining the Temporal Statistics of Query Terms for Searching Social Media Posts ICTIR 17 Amsterdam Oct. 1 st 2017 Jinfeng Rao Ferhan Ture Xing Niu Jimmy Lin Task: Ad-hoc Search on Social Media domain Stream
More informationAnnouncements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.
CS 188: Artificial Intelligence Spring 2011 Lecture 21: Perceptrons 4/13/2010 Announcements Project 4: due Friday. Final Contest: up and running! Project 5 out! Pieter Abbeel UC Berkeley Many slides adapted
More informationLayerwise Interweaving Convolutional LSTM
Layerwise Interweaving Convolutional LSTM Tiehang Duan and Sargur N. Srihari Department of Computer Science and Engineering The State University of New York at Buffalo Buffalo, NY 14260, United States
More informationCS6220: DATA MINING TECHNIQUES
CS6220: DATA MINING TECHNIQUES Image Data: Classification via Neural Networks Instructor: Yizhou Sun yzsun@ccs.neu.edu November 19, 2015 Methods to Learn Classification Clustering Frequent Pattern Mining
More informationEvaluation. Evaluate what? For really large amounts of data... A: Use a validation set.
Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?
More informationConditional Random Fields for Word Hyphenation
Conditional Random Fields for Word Hyphenation Tsung-Yi Lin and Chen-Yu Lee Department of Electrical and Computer Engineering University of California, San Diego {tsl008, chl260}@ucsd.edu February 12,
More informationLearning to Rank Aggregated Answers for Crossword Puzzles
Learning to Rank Aggregated Answers for Crossword Puzzles Massimo Nicosia 1,2, Gianni Barlacchi 2 and Alessandro Moschitti 1,2 1 Qatar Computing Research Institute 2 University of Trento m.nicosia@gmail.com,
More informationSupport Vector Machines 290N, 2015
Support Vector Machines 290N, 2015 Two Class Problem: Linear Separable Case with a Hyperplane Class 1 Class 2 Many decision boundaries can separate these two classes using a hyperplane. Which one should
More informationEffective Latent Space Graph-based Re-ranking Model with Global Consistency
Effective Latent Space Graph-based Re-ranking Model with Global Consistency Feb. 12, 2009 1 Outline Introduction Related work Methodology Graph-based re-ranking model Learning a latent space graph A case
More informationGenerative and discriminative classification techniques
Generative and discriminative classification techniques Machine Learning and Category Representation 013-014 Jakob Verbeek, December 13+0, 013 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.13.14
More informationLocally Adaptive Regression Kernels with (many) Applications
Locally Adaptive Regression Kernels with (many) Applications Peyman Milanfar EE Department University of California, Santa Cruz Joint work with Hiro Takeda, Hae Jong Seo, Xiang Zhu Outline Introduction/Motivation
More informationMachine Learning. The Breadth of ML Neural Networks & Deep Learning. Marc Toussaint. Duy Nguyen-Tuong. University of Stuttgart
Machine Learning The Breadth of ML Neural Networks & Deep Learning Marc Toussaint University of Stuttgart Duy Nguyen-Tuong Bosch Center for Artificial Intelligence Summer 2017 Neural Networks Consider
More informationLink Prediction for Social Network
Link Prediction for Social Network Ning Lin Computer Science and Engineering University of California, San Diego Email: nil016@eng.ucsd.edu Abstract Friendship recommendation has become an important issue
More informationAutomatic Domain Partitioning for Multi-Domain Learning
Automatic Domain Partitioning for Multi-Domain Learning Di Wang diwang@cs.cmu.edu Chenyan Xiong cx@cs.cmu.edu William Yang Wang ww@cmu.edu Abstract Multi-Domain learning (MDL) assumes that the domain labels
More information