Parallel Boosted Regression Trees for Web Search Ranking
Parallel Boosted Regression Trees for Web Search Ranking. Stephen Tyree, Kilian Q. Weinberger, Kunal Agrawal, Jennifer Paykin. Department of Computer Science and Engineering, Washington University in St. Louis, St. Louis, MO, USA; Wesleyan University, Middletown, CT, USA. WWW 2011, March 30, 2011, Hyderabad, India.
Overview: Search Query + Documents → Order by Relevance. Gradient Boosted Regression Trees: ensemble h_{t-1}(·) plus weak learner α g_t(·). Approximate parallel method: histogram bins with (p_j, m_j, r_j). [Plot: speedup and accuracy vs. number of processors.]
Web Ranking: Query + Documents → Feature Generator → Document/Query Features {x_i} → Ranker → Relevance Function h(·) → ordering from More Relevant to Less Relevant. Ranking by pointwise relevance: h : R^f → [0, 4], h(x_i) ≈ y_i.
Web Ranking as a supervised machine learning problem. Feature vectors (document/query pairs): x_i ∈ R^f. Labels (relevance): y_i ∈ {0, 1, 2, 3, 4}. Training data: D = {(x_i, y_i)}_{i=1}^n. Predictor: h : R^f → [0, 4], h(x_i) ≈ y_i.
Learning a Relevance Predictor: Yahoo! Labs Learning to Rank Challenge 2010, top 8 of 1055 submissions. Gradient Boosted Regression Trees: ensemble h_{t-1}(·) plus weak learner α g_t(·).
Gradient Boosted Regression Trees. Final predictor: h_t(x_i) = h_{t-1}(x_i) + α g_t(x_i). Weak learners: g_t(x_i) ≈ y_i − h_{t-1}(x_i). This is approximate gradient descent in predictor space.
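The update rule above can be sketched end to end; a minimal illustration (not the paper's C++/MPI implementation) using depth-1 regression stumps as weak learners under squared loss, with helper names invented here:

```python
import numpy as np

def fit_stump(X, residual):
    """Fit a depth-1 regression tree (stump) minimizing squared error."""
    best = None
    for f in range(X.shape[1]):
        order = np.argsort(X[:, f])
        xs, rs = X[order, f], residual[order]
        csum, total = np.cumsum(rs), rs.sum()
        n = len(rs)
        for i in range(1, n):
            if xs[i] == xs[i - 1]:
                continue  # no valid threshold between equal feature values
            # Reducing squared error is equivalent to maximizing
            # (left label sum)^2 / left count + (right label sum)^2 / right count.
            gain = csum[i - 1] ** 2 / i + (total - csum[i - 1]) ** 2 / (n - i)
            if best is None or gain > best[0]:
                best = (gain, f, (xs[i - 1] + xs[i]) / 2,
                        rs[:i].mean(), rs[i:].mean())
    _, f, s, left, right = best
    return lambda X: np.where(X[:, f] < s, left, right)

def gbrt(X, y, iterations=50, alpha=0.1):
    """h_t = h_{t-1} + alpha * g_t, where g_t fits the residual y - h_{t-1}."""
    h = np.zeros(len(y))
    trees = []
    for _ in range(iterations):
        g = fit_stump(X, y - h)   # weak learner approximates the residual
        h += alpha * g(X)
        trees.append(g)
    return lambda Xnew: alpha * sum(g(Xnew) for g in trees)
```

Each iteration fits only the residual of the current ensemble, which is why (as the next slides note) the training iterations themselves cannot be parallelized; only the per-tree split search can.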
Why parallelize GBRT? Large training datasets and numerous training iterations. But training is sequential: each weak learner is fit to the residuals of the previous ensemble.
Learning a Regression Tree. [Animation: recursively splitting training instances in a two-dimensional feature space (Feature 1 vs. Feature 2); a query point receives the leaf prediction 3.8.]
A split node stores a feature f, a split point s, and leaf labels ȳ_L and ȳ_R. CART algorithm: greedily minimize the cost at each split, L_s = Σ_{(x_i, y_i) ∈ L_s} (y_i − ȳ_L^s)² + Σ_{(x_i, y_i) ∈ R_s} (y_i − ȳ_R^s)². Optimal split: s* = argmin_s L_s.
Parallel Method: histograms with bins (p_j, m_j, r_j).
Learning a Regression Tree from sufficient statistics. To evaluate the loss of a split point s on feature f: argmin_s Σ_{(x_i, y_i) ∈ L_s} (y_i − ȳ_L^s)² + Σ_{(x_i, y_i) ∈ R_s} (y_i − ȳ_R^s)². Only two statistics are needed: m_s, the number of instances with feature f less than s, and r_s, the sum of labels for those instances. Best split: s* = argmin_s −[ r_s²/m_s + (r − r_s)²/(m − m_s) ], where m and r are the total instance count and label sum. So we can estimate m_s and r_s in parallel!
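That the split search needs only m_s and r_s follows from expanding the squared loss: the total loss equals Σ y_i² minus r_s²/m_s + (r − r_s)²/(m − m_s), so minimizing one is maximizing the other. A small numerical check of this equivalence (function names are illustrative, not from the paper):

```python
import numpy as np

def explicit_loss(x, y, s):
    """Squared loss of split s: sum of squared deviations from the
    left and right means."""
    loss = 0.0
    for part in (y[x < s], y[x >= s]):
        if len(part):
            loss += ((part - part.mean()) ** 2).sum()
    return loss

def stat_score(x, y, s):
    """Score from the sufficient statistics alone; minimizing the loss is
    equivalent to maximizing r_s^2/m_s + (r - r_s)^2/(m - m_s)."""
    m, r = len(y), y.sum()
    m_s = int((x < s).sum())   # count of instances with feature value below s
    r_s = y[x < s].sum()       # sum of their labels
    if m_s == 0 or m_s == m:
        return -np.inf
    return r_s ** 2 / m_s + (r - r_s) ** 2 / (m - m_s)

# Both criteria should pick the same candidate split.
rng = np.random.default_rng(0)
x = rng.random(50)
y = (x > 0.6).astype(float) + 0.1 * rng.standard_normal(50)
candidates = np.linspace(0.05, 0.95, 19)
best_by_loss = min(candidates, key=lambda s: explicit_loss(x, y, s))
best_by_stats = max(candidates, key=lambda s: stat_score(x, y, s))
```

Because the score depends on the data only through counts and label sums, those two quantities are exactly what the processors must estimate and communicate.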
Parallel Tree Construction. Ben-Haim and Yom-Tov (2010): parallel decision tree construction. Our work: adapted to support regression, optimized for low-depth trees, and provides an open-source C++/MPI implementation.
Setup: a master and processors 1 through p; the training data is distributed across the processors.
Parallel Algorithm. The master initializes the regression tree. For each feature, every processor compresses its local feature values into a histogram and sends it to the master.
This repeats for the other features.
The master selects a split point from the merged histograms, expands the tree, and distributes the updated tree to the processors.
Once the master completes the tree, it adds the tree to the ensemble and the processors update their residuals.
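The master/worker exchange above can be simulated in a single process; a sketch (plain Python standing in for MPI, helper names invented here) in which each worker summarizes its shard per candidate split and the master merges the summaries and selects:

```python
import numpy as np

def worker_summary(x_shard, y_shard, thresholds):
    """Each worker reports, per candidate split, its local count and label
    sum of instances below the threshold, plus its shard totals."""
    m_s = np.array([(x_shard < t).sum() for t in thresholds])
    r_s = np.array([y_shard[x_shard < t].sum() for t in thresholds])
    return m_s, r_s, len(y_shard), y_shard.sum()

def master_select(summaries, thresholds):
    """Merge worker summaries and pick the split maximizing
    r_s^2/m_s + (r - r_s)^2/(m - m_s)."""
    m_s = sum(s[0] for s in summaries)
    r_s = sum(s[1] for s in summaries)
    m = sum(s[2] for s in summaries)
    r = sum(s[3] for s in summaries)
    with np.errstate(divide="ignore", invalid="ignore"):
        score = r_s ** 2 / m_s + (r - r_s) ** 2 / (m - m_s)
    score[(m_s == 0) | (m_s == m)] = -np.inf  # disallow empty children
    return thresholds[int(np.argmax(score))]

# Simulate p = 4 workers holding shards of one feature.
rng = np.random.default_rng(1)
x = rng.random(400)
y = (x > 0.5).astype(float)
thresholds = np.linspace(0.1, 0.9, 17)
shards = zip(np.array_split(x, 4), np.array_split(y, 4))
summaries = [worker_summary(xs, ys, thresholds) for xs, ys in shards]
split = master_select(summaries, thresholds)
```

In the actual algorithm the workers send compressed histograms rather than exact per-threshold counts; the communication pattern (summarize, merge at the master, broadcast the chosen split) is the same.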
Histograms compress the distribution of feature values across instances. Dynamic bins with statistics: p_j, the bin center; m_j, the number of points; r_j, the sum of labels. Each histogram has a fixed maximum size.
Histogram functions: Merge(histogramA, histogramB), Uniform(histogram, n), InterpolateM(histogram, s), InterpolateR(histogram, s).
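A compact sketch of the bin operations, simplified from Ben-Haim and Yom-Tov's streaming histograms: Merge fuses the closest neighboring bins, and the interpolation here is a cruder step approximation that assigns each bin's mass to its center (the original interpolates within bins); Uniform, which would return n candidate split points at equal-mass quantiles, is omitted. All names are illustrative:

```python
import bisect

class Histogram:
    """Fixed-size histogram: each bin stores [p_j, m_j, r_j], i.e. the bin
    center, the number of points, and the sum of their labels."""
    def __init__(self, max_bins=8):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [p, m, r]

    def add(self, x, y):
        bisect.insort(self.bins, [x, 1, y])
        self._shrink()

    def merge(self, other):
        """Merge(histogramA, histogramB): combine all bins, then re-shrink."""
        self.bins = sorted(self.bins + other.bins)
        self._shrink()
        return self

    def _shrink(self):
        # While over capacity, fuse the two closest neighboring bins into
        # their weighted-average center, summing counts and label sums.
        while len(self.bins) > self.max_bins:
            i = min(range(len(self.bins) - 1),
                    key=lambda k: self.bins[k + 1][0] - self.bins[k][0])
            (p1, m1, r1), (p2, m2, r2) = self.bins[i], self.bins[i + 1]
            fused = [(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2, r1 + r2]
            self.bins[i:i + 2] = [fused]

    def interpolate_m(self, s):
        """InterpolateM(histogram, s): approximate count of points below s."""
        return sum(m for p, m, _ in self.bins if p < s)

    def interpolate_r(self, s):
        """InterpolateR(histogram, s): approximate label sum below s."""
        return sum(r for p, _, r in self.bins if p < s)
```

Because merging and shrinking preserve the total count and label sum, the master can recover approximate m_s and r_s for any candidate split s from the merged histograms alone.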
Why this setup works. Accuracy: under the weak learner assumption, approximate split points suffice. Speedup: tunable communication cost, limited-depth trees, and large data sets.
Results. [Plot: speedup vs. number of processors.]
Datasets. Yahoo LTRC Set 1: 473,134 training instances, 700 features. Set 2: 34,815 training instances, 700 features. Microsoft LETOR Fold 1: 723,412 training instances, 136 features.
Software. pgbrt (this work): approximate, parallel method. RT-Rank (Mohan, et al.): exact GBRT method.
Speedup. On a 48-core SMP machine: maximum speedup = 4 (on 48 processors). On a distributed memory cluster: maximum speedup = 5 (on 3 processors). [Plots: speedup vs. number of processors for pgbrt and RT-Rank.]
Accuracy: ERR and NDCG metrics. Yahoo LTRC Set 2 (34,815 training instances, 700 features): same accuracy after slightly more iterations.
Accuracy: ERR and NDCG metrics. Yahoo LTRC Set 1 (473,134 training instances, 700 features): same accuracy after slightly increased depth. [Plot: ERR vs. iterations for the exact and parallel methods.]
Accuracy: effects of the pgbrt approximation. It requires more iterations or slightly increased depth (permitted by the speedup). Same accuracy on Yahoo LTRC; Microsoft LETOR within 1%-2%.
Conclusions. A parallel, approximate GBRT implementation that achieves both speedup and accuracy, processing in hours what took the WashU LTRC team days.
Acknowledgements. Yahoo! Labs: Ananth Mohan, Zheng Chen. Weinberger lab: Minmin Chen, Eddie Xu, Dor Kedem, Yuzong Liu. Agrawal lab: David Ferry, Jordan Krage.
Machine Learning Duncan Anderson Managing Director, Willis Towers Watson 21 March 2018 GIRO 2016, Dublin - Response to machine learning Don t panic! We re doomed! 2 This is not all new Actuaries adopt
More informationEntity and Knowledge Base-oriented Information Retrieval
Entity and Knowledge Base-oriented Information Retrieval Presenter: Liuqing Li liuqing@vt.edu Digital Library Research Laboratory Virginia Polytechnic Institute and State University Blacksburg, VA 24061
More informationINTRODUCTION TO MACHINE LEARNING. Measuring model performance or error
INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationLecture 06 Decision Trees I
Lecture 06 Decision Trees I 08 February 2016 Taylor B. Arnold Yale Statistics STAT 365/665 1/33 Problem Set #2 Posted Due February 19th Piazza site https://piazza.com/ 2/33 Last time we starting fitting
More informationDynamic Resource Allocation for Distributed Dataflows. Lauritz Thamsen Technische Universität Berlin
Dynamic Resource Allocation for Distributed Dataflows Lauritz Thamsen Technische Universität Berlin 04.05.2018 Distributed Dataflows E.g. MapReduce, SCOPE, Spark, and Flink Used for scalable processing
More information3 Ways to Improve Your Regression
3 Ways to Improve Your Regression Introduction This tutorial will take you through the steps demonstrated in the 3 Ways to Improve Your Regression webinar. First, you will be introduced to a dataset about
More informationBoosting Simple Model Selection Cross Validation Regularization
Boosting: (Linked from class website) Schapire 01 Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 8 th,
More informationAutomatic Initialization of the TLD Object Tracker: Milestone Update
Automatic Initialization of the TLD Object Tracker: Milestone Update Louis Buck May 08, 2012 1 Background TLD is a long-term, real-time tracker designed to be robust to partial and complete occlusions
More informationModel Inference and Averaging. Baging, Stacking, Random Forest, Boosting
Model Inference and Averaging Baging, Stacking, Random Forest, Boosting Bagging Bootstrap Aggregating Bootstrap Repeatedly select n data samples with replacement Each dataset b=1:b is slightly different
More informationFeature-Cost Sensitive Learning with Submodular Trees of Classifiers
Feature-Cost Sensitive Learning with Submodular Trees of Classifiers Matt J. Kusner, Wenlin Chen, Quan Zhou, Zhixiang (Eddie) Xu, Kilian Q. Weinberger, Yixin Chen Washington University in St. Louis, 1
More informationLearning Temporal-Dependent Ranking Models
Learning Temporal-Dependent Ranking Models Miguel Costa, Francisco Couto, Mário Silva LaSIGE @ Faculty of Sciences, University of Lisbon IST/INESC-ID, University of Lisbon 37th Annual ACM SIGIR Conference,
More informationPerceptrons and Backpropagation. Fabio Zachert Cognitive Modelling WiSe 2014/15
Perceptrons and Backpropagation Fabio Zachert Cognitive Modelling WiSe 2014/15 Content History Mathematical View of Perceptrons Network Structures Gradient Descent Backpropagation (Single-Layer-, Multilayer-Networks)
More informationTagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Annotation
TagProp: Discriminative Metric Learning in Nearest Neighbor Models for Image Annotation Matthieu Guillaumin, Thomas Mensink, Jakob Verbeek, Cordelia Schmid LEAR team, INRIA Rhône-Alpes, Grenoble, France
More informationPractical Guidance for Machine Learning Applications
Practical Guidance for Machine Learning Applications Brett Wujek About the authors Material from SGF Paper SAS2360-2016 Brett Wujek Senior Data Scientist, Advanced Analytics R&D ~20 years developing engineering
More informationAccelerated Machine Learning Algorithms in Python
Accelerated Machine Learning Algorithms in Python Patrick Reilly, Leiming Yu, David Kaeli reilly.pa@husky.neu.edu Northeastern University Computer Architecture Research Lab Outline Motivation and Goals
More informationRanking with Query-Dependent Loss for Web Search
Ranking with Query-Dependent Loss for Web Search Jiang Bian 1, Tie-Yan Liu 2, Tao Qin 2, Hongyuan Zha 1 Georgia Institute of Technology 1 Microsoft Research Asia 2 Outline Motivation Incorporating Query
More informationCS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp
CS 229 Final Project - Using machine learning to enhance a collaborative filtering recommendation system for Yelp Chris Guthrie Abstract In this paper I present my investigation of machine learning as
More informationXGBoost: A Scalable Tree Boosting System
XGBoost: A Scalable Tree Boosting System Tianqi Chen University of Washington tqchen@cs.washington.edu Carlos Guestrin University of Washington guestrin@cs.washington.edu ABSTRACT Tree boosting is a highly
More informationPersonalized Web Search
Personalized Web Search Dhanraj Mavilodan (dhanrajm@stanford.edu), Kapil Jaisinghani (kjaising@stanford.edu), Radhika Bansal (radhika3@stanford.edu) Abstract: With the increase in the diversity of contents
More informationAssignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis
Assignment 3 ITCS-6010/8010: Cloud Computing for Data Analysis Due by 11:59:59pm on Tuesday, March 16, 2010 This assignment is based on a similar assignment developed at the University of Washington. Running
More informationInterpretable Machine Learning with Applications to Banking
Interpretable Machine Learning with Applications to Banking Linwei Hu Advanced Technologies for Modeling, Corporate Model Risk Wells Fargo October 26, 2018 2018 Wells Fargo Bank, N.A. All rights reserved.
More informationLearning Dense Models of Query Similarity from User Click Logs
Learning Dense Models of Query Similarity from User Click Logs Fabio De Bona, Stefan Riezler*, Keith Hall, Massi Ciaramita, Amac Herdagdelen, Maria Holmqvist Google Research, Zürich *Dept. of Computational
More informationSemi-supervised learning and active learning
Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners
More informationMulti-Task Learning for Boosting with Application to Web Search Ranking
Multi-Task Learning for Boosting with Application to Web Search Ranking Olivier Chapelle Yahoo! Labs Sunnyvale, CA chap@yahoo-inc.com Kilian Weinberger Washington University Saint Louis, MO kilian@wustl.edu
More informationAssignment No: 2. Assessment as per Schedule. Specifications Readability Assignments
Specifications Readability Assignments Assessment as per Schedule Oral Total 6 4 4 2 4 20 Date of Performance:... Expected Date of Completion:... Actual Date of Completion:... ----------------------------------------------------------------------------------------------------------------
More informationNonparametric Methods Recap
Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority
More informationA Comparative study of Clustering Algorithms using MapReduce in Hadoop
A Comparative study of Clustering Algorithms using MapReduce in Hadoop Dweepna Garg 1, Khushboo Trivedi 2, B.B.Panchal 3 1 Department of Computer Science and Engineering, Parul Institute of Engineering
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 10 - Classification trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey
More informationChapter 7: Numerical Prediction
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.
More informationPARALLELIZED IMPLEMENTATION OF LOGISTIC REGRESSION USING MPI
PARALLELIZED IMPLEMENTATION OF LOGISTIC REGRESSION USING MPI CSE 633 PARALLEL ALGORITHMS BY PAVAN G JOSHI What is machine learning? Machine learning is a type of artificial intelligence (AI) that provides
More informationMore on Learning. Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization
More on Learning Neural Nets Support Vectors Machines Unsupervised Learning (Clustering) K-Means Expectation-Maximization Neural Net Learning Motivated by studies of the brain. A network of artificial
More informationWeb Spam Challenge 2008
Web Spam Challenge 2008 Data Analysis School, Moscow, Russia K. Bauman, A. Brodskiy, S. Kacher, E. Kalimulina, R. Kovalev, M. Lebedev, D. Orlov, P. Sushin, P. Zryumov, D. Leshchiner, I. Muchnik The Data
More informationUniversity of Delaware at Diversity Task of Web Track 2010
University of Delaware at Diversity Task of Web Track 2010 Wei Zheng 1, Xuanhui Wang 2, and Hui Fang 1 1 Department of ECE, University of Delaware 2 Yahoo! Abstract We report our systems and experiments
More informationApplication of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set
Application of Additive Groves Application of Additive Groves Ensemble with Multiple Counts Feature Evaluation to KDD Cup 09 Small Data Set Daria Sorokina Carnegie Mellon University Pittsburgh PA 15213
More informationTutorial on Machine Learning Tools
Tutorial on Machine Learning Tools Yanbing Xue Milos Hauskrecht Why do we need these tools? Widely deployed classical models No need to code from scratch Easy-to-use GUI Outline Matlab Apps Weka 3 UI TensorFlow
More informationRandom Walk Inference and Learning. Carnegie Mellon University 7/28/2011 EMNLP 2011, Edinburgh, Scotland, UK
Random Walk Inference and Learning in A Large Scale Knowledge Base Ni Lao, Tom Mitchell, William W. Cohen Carnegie Mellon University 2011.7.28 1 Outline Motivation Inference in Knowledge Bases The NELL
More informationParallel learning of content recommendations using map- reduce
Parallel learning of content recommendations using map- reduce Michael Percy Stanford University Abstract In this paper, machine learning within the map- reduce paradigm for ranking
More informationStatistical foundations of machine learning
Statistical foundations of machine learning INFO-F-422 Gianluca Bontempi Machine Learning Group Computer Science Department mlg.ulb.ac.be Some algorithms for nonlinear modeling Feedforward neural network
More informationObject recognition. Methods for classification and image representation
Object recognition Methods for classification and image representation Credits Slides by Pete Barnum Slides by FeiFei Li Paul Viola, Michael Jones, Robust Realtime Object Detection, IJCV 04 Navneet Dalal
More informationMachine Learning / Jan 27, 2010
Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,
More information