Logarithmic Time Prediction


1 Logarithmic Time Prediction. John Langford, Microsoft Research. DIMACS Workshop on Big Data through the Lens of Sublinear Algorithms.

3 The Multiclass Prediction Problem
Repeatedly:
1. See $x$
2. Predict $\hat{y} \in \{1, \dots, K\}$
3. See $y$
Goal: Find $h(x)$ minimizing the error rate $\Pr_{(x,y) \sim D}(h(x) \neq y)$, with $h(x)$ fast.

4 Why?


6 Trick #1: K is small

8 Trick #2: A hierarchy exists. So use Trick #1 repeatedly.

10 Trick #3: Shared representation. Very helpful... but computation in the last layer can still blow up.

12 Trick #4: Structured Prediction. But what if the structure is unclear?

14 Trick #5: GPU. 4 Teraflops is great... yet still burns energy.

18 How fast can we hope to go?
Theorem: There exist multiclass classification problems where achieving 0 error rate requires $\Omega(\log K)$ time to train or test per example.
Proof: By construction. Pick $y \sim U(1, \dots, K)$. Any prediction algorithm outputting fewer than $\log_2 K$ bits loses with constant probability. Any training algorithm reading an example requires $\Omega(\log_2 K)$ time.
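The counting step in the proof can be checked numerically. The sketch below is an illustration, not part of the talk: it assumes y is uniform over K labels and that a predictor emitting b bits can name at most 2^b distinct labels, so it is right with probability at most 2^b / K. The function name `max_success_probability` is hypothetical.

```python
import math


def max_success_probability(num_classes: int, bits: int) -> float:
    """Upper bound on accuracy when the output has `bits` bits and y is uniform."""
    distinct_outputs = 2 ** bits  # at most this many labels can ever be named
    return min(1.0, distinct_outputs / num_classes)


K = 1024
needed_bits = math.ceil(math.log2(K))
for bits in (needed_bits - 2, needed_bits - 1, needed_bits):
    # One bit short of log2(K) already caps accuracy at 1/2.
    print(bits, max_success_probability(K, bits))
```

Dropping a single bit below log2(K) halves the reachable label set, which is the "loses with constant probability" part of the argument.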

19 Can we predict in time $O(\log_2 K)$? [Figure: Computational Advantage of Log Time. Benefit K / log(K) plotted against K.]
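To get a feel for the size of the advantage, the ratio K / log(K) can be tabulated directly. This is a rough sketch that assumes a base-2 logarithm and counts one unit of work per class for a linear-time method and one per tree level for a log-time method; the helper name `log_time_advantage` is hypothetical.

```python
import math


def log_time_advantage(num_classes: int) -> float:
    """Ratio of O(K) work to O(log2 K) work, ignoring constant factors."""
    return num_classes / math.log2(num_classes)


for K in (10, 1_000, 1_000_000):
    print(f"K={K:>9,d}  K/log2(K) = {log_time_advantage(K):,.1f}")
```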

23 Not it #1: Sparse Error Correcting Output Codes
1. Create $O(\log K)$ binary vectors $b_{iy}$ of length K.
2. Train $O(\log K)$ binary classifiers $h_i$ to minimize the error rate $\Pr_{x,y}(h_i(x) \neq b_{iy})$.
3. Predict by finding the y with minimal error.
Prediction is $\Omega(K)$.
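The decoding step is where the linear cost comes from. The following is a minimal sketch under simplifying assumptions: it uses dense random codes rather than sparse ones, it replaces the trained classifiers h_i with a stand-in that returns the true codeword, and the names `make_code_matrix` and `decode` are hypothetical.

```python
import math
import random


def make_code_matrix(num_classes, bits_per_log=4, seed=0):
    """Random binary code: one row per binary problem, one column per label."""
    rng = random.Random(seed)
    num_rows = bits_per_log * max(1, math.ceil(math.log2(num_classes)))
    return [[rng.randint(0, 1) for _ in range(num_classes)] for _ in range(num_rows)]


def decode(code_matrix, predicted_bits):
    """Return the label whose codeword is closest in Hamming distance.

    The loop visits every one of the K columns, so decoding is Omega(K).
    """
    num_classes = len(code_matrix[0])
    best_label, best_distance = 0, len(code_matrix) + 1
    for label in range(num_classes):
        distance = sum(row[label] != bit for row, bit in zip(code_matrix, predicted_bits))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


code = make_code_matrix(num_classes=1000)
true_label = 123
# Stand-in for the outputs h_i(x) of the trained binary classifiers.
bits = [row[true_label] for row in code]
decoded = decode(code, bits)
print(decoded)
```

Only O(log K) classifiers are evaluated, yet the scan over all K codewords dominates the prediction time.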

27 Not it #2: Hierarchy Construction
1. Build a confusion matrix of errors.
2. Recursively partition to create a hierarchy.
3. Apply the hierarchy solution.
Training is $\Omega(K)$ or worse.

30 Not it #3: Unnormalized learning
Train K regressors as follows. For each example (x, y):
1. Train regressor y with (x, 1).
2. Pick $y' \neq y$ uniformly at random.
3. Train regressor $y'$ with (x, -1).
Prediction is still $\Omega(K)$.
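A small sketch makes the asymmetry visible: each training step touches two regressors, while prediction still scores all K. The code assumes linear regressors trained with a squared-loss gradient step and a negative target of -1 for the sampled label; the class `UnnormalizedLearner` and its parameters are hypothetical.

```python
import random


class UnnormalizedLearner:
    def __init__(self, num_classes, num_features, learning_rate=0.1, seed=0):
        self.num_classes = num_classes
        self.learning_rate = learning_rate
        self.rng = random.Random(seed)
        self.weights = [[0.0] * num_features for _ in range(num_classes)]

    def _score(self, label, x):
        return sum(w * xi for w, xi in zip(self.weights[label], x))

    def _update(self, label, x, target):
        # One squared-loss gradient step for a single regressor.
        error = target - self._score(label, x)
        self.weights[label] = [
            w + self.learning_rate * error * xi for w, xi in zip(self.weights[label], x)
        ]

    def learn(self, x, y):
        """Train regressor y toward +1 and one random other regressor toward -1."""
        self._update(y, x, 1.0)
        other = self.rng.randrange(self.num_classes - 1)
        if other >= y:
            other += 1  # skip over y so that other != y
        self._update(other, x, -1.0)
        return other

    def predict(self, x):
        # Scores every regressor: Omega(K) work per prediction.
        return max(range(self.num_classes), key=lambda label: self._score(label, x))


learner = UnnormalizedLearner(num_classes=5, num_features=3)
learner.learn([1.0, 0.0, 0.0], 2)
print(learner.predict([1.0, 0.0, 0.0]))
```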

31 Can we predict in time $O(\log_2 K)$?

32 Is logarithmic time even possible?
P(y=1) = 0.4, P(y=2) = 0.3, P(y=3) = 0.3.
P({2, 3}) > P(1), so divide and conquer loses.
[Tree diagram: root 1 v {2,3}, with leaf 1 on one side and a 2 v 3 node, with leaves 2 and 3, on the other.]
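The failure can be worked through with the numbers on the slide. This sketch assumes x carries no information, so every node simply predicts its heavier side; the function name `divide_and_conquer_prediction` is hypothetical.

```python
prior = {1: 0.4, 2: 0.3, 3: 0.3}


def divide_and_conquer_prediction(prior):
    """Descend the tree 1 v {2,3}, then 2 v 3, taking the heavier side at each node."""
    if prior[1] >= prior[2] + prior[3]:
        return 1
    return 2 if prior[2] >= prior[3] else 3


tree_label = divide_and_conquer_prediction(prior)
best_label = max(prior, key=prior.get)
print("tree predicts", tree_label, "with accuracy", prior[tree_label])
print("best single label is", best_label, "with accuracy", prior[best_label])
```

The root prefers {2,3} because 0.6 > 0.4, which commits the tree to a label with accuracy 0.3 even though label 1 alone would reach 0.4.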

33 Filter Trees [BLR09]
P(y=1) = 0.4, P(y=2) = 0.3, P(y=3) = 0.3.
1. Learn 2 v 3 first.
2. Throw away all error examples.
3. Learn 1 v Survivors.
[Tree diagram: root 1 v {2,3} above a 2 v 3 node.]
Theorem: For all multiclass problems, for all binary classifiers, Multiclass Regret $\leq$ Average Binary Regret $\times \log(K)$.
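The three steps can be traced on the same distribution. The sketch below is a toy reading of the procedure, assuming x carries no information, so the 2 v 3 classifier always outputs its majority label and every example with the other label counts as an error and is filtered out; `filter_tree_prediction` is a hypothetical name.

```python
def filter_tree_prediction(prior):
    """Toy filter tree over labels 1, 2, 3 with a constant input x."""
    # Step 1: learn 2 v 3 first; with no features it outputs the majority label.
    lower_winner = 2 if prior[2] >= prior[3] else 3
    # Step 2: examples of the losing label are errors at this node and are dropped,
    # so only the winner's probability mass survives to the root.
    surviving_mass = prior[lower_winner]
    # Step 3: learn 1 v survivors on what is left.
    return 1 if prior[1] >= surviving_mass else lower_winner


print(filter_tree_prediction({1: 0.4, 2: 0.3, 3: 0.3}))
```

Because the root compares 0.4 against the surviving 0.3 rather than against the combined 0.6, it recovers label 1 in this example.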

37 Can you make it robust?
[Figure: tournament diagrams labeled "Winner" and "Winners".]
Theorem [BLR09]: For all multiclass problems, for all binary classifiers, a log(K)-correcting tournament satisfies: Multiclass Regret $\leq$ Average Binary Regret $\times 5.5$.
Determined the best paper prize for ICML 2012 (area chair decisions).

39 How do you learn structure? Not all partitions are equally difficult. Compare {1, 7} v {3, 8} to {1, 8} v {3, 7}. Which is better? [BWG10]: Better to confuse near leaves than near root. Intuition: the root predictor tends to be overconstrained while the leafward predictors are less constrained.

42 The Partitioning Problem [CL14]
Given a set of n examples each with one of K labels, find a partitioner h that maximizes:
$\mathbb{E}_{x,y}\,\bigl|\Pr(h(x) = 1, y) - \Pr(h(x) = 1)\Pr(y)\bigr|$
Second form:
$\mathbb{E}_{x} \sum_{y} \Pr(y)\,\bigl|\Pr(h(x) = 1 \mid x \in X_y) - \Pr(h(x) = 1)\bigr|$
where $X_y$ is the set of x associated with y.
Nonconvex for any symmetric hypothesis class (ouch).
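An empirical version of the objective is easy to compute from a node's decisions. The sketch assumes the per-label form, the sum over y of Pr(y) |Pr(h(x) = 1 | x in X_y) - Pr(h(x) = 1)|, estimated by counting; the function `partition_objective` and its argument names are hypothetical.

```python
from collections import Counter


def partition_objective(decisions, labels):
    """Estimate sum_y Pr(y) * |Pr(h=1 | y) - Pr(h=1)| from paired samples.

    decisions: 0/1 outputs of the partitioner h, one per example.
    labels:    the class label y of each example.
    """
    n = len(labels)
    p_one = sum(decisions) / n
    label_counts = Counter(labels)
    one_counts = Counter(y for d, y in zip(decisions, labels) if d == 1)
    total = 0.0
    for y, count in label_counts.items():
        p_y = count / n
        p_one_given_y = one_counts[y] / count
        total += p_y * abs(p_one_given_y - p_one)
    return total


# A pure, balanced split scores high; a split that ignores the label scores 0.
print(partition_objective([1, 1, 0, 0], ["a", "a", "b", "b"]))
print(partition_objective([1, 0, 1, 0], ["a", "a", "b", "b"]))
```

The objective rewards splits that are both balanced (Pr(h = 1) near 1/2) and pure (each label sent almost entirely to one side).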

45 Bottom Up doesn't work. Suppose you use linear representations. Suppose you first build a 1 v 3 predictor. Suppose you then build a 2 v {1 v 3} predictor. You lose.

46 Does partitioning recurse well?
Theorem: If at every node n,
$\mathbb{E}_{x,y}\,\bigl|\Pr(h(x) = 1, y) - \Pr(h(x) = 1)\Pr(y)\bigr| > \gamma$
then after
$\left(\frac{1}{\epsilon}\right)^{\frac{4(1-\gamma)^2 \ln K}{\gamma^2}}$
splits, the multiclass error is less than $\epsilon$.
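To see how the bound scales, it can be evaluated for a few settings. This sketch assumes the bound has the form (1/epsilon)^(4 (1 - gamma)^2 ln K / gamma^2); the helper `splits_bound` is hypothetical and only evaluates that expression.

```python
import math


def splits_bound(gamma: float, epsilon: float, num_classes: int) -> float:
    """Evaluate (1/epsilon) ** (4 * (1 - gamma)**2 * ln(K) / gamma**2)."""
    exponent = 4.0 * (1.0 - gamma) ** 2 * math.log(num_classes) / gamma ** 2
    return (1.0 / epsilon) ** exponent


for gamma in (0.5, 0.7, 0.9):
    print(gamma, splits_bound(gamma, epsilon=0.1, num_classes=1000))
```

Under this form the number of splits falls quickly as the per-node advantage gamma grows, and grows polynomially in 1/epsilon for fixed gamma and K.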

48 Online Partitioning
Relax the optimization criterion:
$\mathbb{E}_{x,y}\,\bigl|\mathbb{E}_{x \mid y}[\hat{y}(x)] - \mathbb{E}_{x}[\hat{y}(x)]\bigr|$
... and approximate with running averages.
Let $e = 0$ and, for all y, $e_y = 0$, $n_y = 0$.
For each example (x, y):
1. If $e_y < e$ then $b = -1$ else $b = 1$.
2. Update w using (x, b).
3. $n_y \leftarrow n_y + 1$
4. $e_y \leftarrow \frac{(n_y - 1)\,e_y}{n_y} + \frac{\hat{y}(x)}{n_y}$
5. $e \leftarrow \frac{(t - 1)\,e}{t} + \frac{\hat{y}(x)}{t}$
Apply recursively to construct a tree structure.
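The per-node update can be sketched in a few lines. Several details here are assumptions rather than statements from the talk: the predictor is linear with a squared-loss gradient step, the score for x is computed before w is updated, and a label whose running average is below the node average is pushed toward -1. The class `OnlineNodePartitioner` is a hypothetical name.

```python
from collections import defaultdict


class OnlineNodePartitioner:
    def __init__(self, num_features, learning_rate=0.1):
        self.w = [0.0] * num_features
        self.learning_rate = learning_rate
        self.t = 0                       # examples seen at this node
        self.e = 0.0                     # running average of y_hat over all examples
        self.e_y = defaultdict(float)    # running average of y_hat per label
        self.n_y = defaultdict(int)      # example count per label

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def learn(self, x, y):
        y_hat = self.score(x)
        # Step 1: choose the side that label y already leans toward.
        b = -1.0 if self.e_y[y] < self.e else 1.0
        # Step 2: update w using (x, b) with one squared-loss gradient step.
        error = b - y_hat
        self.w = [wi + self.learning_rate * error * xi for wi, xi in zip(self.w, x)]
        # Steps 3 and 4: per-label count and running average.
        self.n_y[y] += 1
        n = self.n_y[y]
        self.e_y[y] = (n - 1) * self.e_y[y] / n + y_hat / n
        # Step 5: overall running average.
        self.t += 1
        self.e = (self.t - 1) * self.e / self.t + y_hat / self.t
        return b, y_hat


node = OnlineNodePartitioner(num_features=2)
stream = [([1.0, 0.0], "a"), ([0.0, 1.0], "b"), ([1.0, 0.0], "a"), ([0.0, 1.0], "b")]
history = [node.learn(x, y) for x, y in stream]
print(history)
```

A full tree would hold one such object per internal node and route each example to the child selected by the sign of the node's score.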

49 Accuracy for a fixed training time [Figure: LOMtree vs one-against-all (OAA). Accuracy against number of classes for the isolet, sector, aloi, imagenet, and ODP datasets.]

50 Test Error %, optimized, no train-time constraint [Figure: Performance of log-time algorithms. Test error % for Rand, Filter, and LOM on Isolet, Sector, Aloi, Imagenet, and ODP.]

51 Test Error %, optimized, no train-time constraint [Figure: Compared to OAA. Test error % for Rand, Filter, LOM, and OAA on Isolet, Sector, Aloi, Imagenet, and ODP.]

52 Classes vs Test time ratio [Figure: LOMtree vs one-against-all. $\log_2$(time ratio) plotted against $\log_2$(number of classes).]

55 Can we predict in time $O(\log_2 K)$? What is the right way to achieve consistency and dynamic partition? How can you balance representation complexity and sample complexity?

56 Bibliography
[BLR09] Alina Beygelzimer, John Langford, Pradeep Ravikumar, Error-Correcting Tournaments.
[BWG10] Samy Bengio, Jason Weston, David Grangier, Label embedding trees for large multi-class tasks, NIPS.
[CL14] Anna Choromanska, John Langford, Logarithmic Time Online Multiclass prediction.


More information

Decision Tree CE-717 : Machine Learning Sharif University of Technology

Decision Tree CE-717 : Machine Learning Sharif University of Technology Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete

More information

More on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.

More on Neural Networks. Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5. More on Neural Networks Read Chapter 5 in the text by Bishop, except omit Sections 5.3.3, 5.3.4, 5.4, 5.5.4, 5.5.5, 5.5.6, 5.5.7, and 5.6 Recall the MLP Training Example From Last Lecture log likelihood

More information

Patterns for! Parallel Programming II!

Patterns for! Parallel Programming II! Lecture 4! Patterns for! Parallel Programming II! John Cavazos! Dept of Computer & Information Sciences! University of Delaware! www.cis.udel.edu/~cavazos/cisc879! Task Decomposition Also known as functional

More information

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19

Treaps. 1 Binary Search Trees (BSTs) CSE341T/CSE549T 11/05/2014. Lecture 19 CSE34T/CSE549T /05/04 Lecture 9 Treaps Binary Search Trees (BSTs) Search trees are tree-based data structures that can be used to store and search for items that satisfy a total order. There are many types

More information

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology

Spatial Data Structures and Speed-Up Techniques. Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial Data Structures and Speed-Up Techniques Tomas Akenine-Möller Department of Computer Engineering Chalmers University of Technology Spatial data structures What is it? Data structure that organizes

More information

Credit card Fraud Detection using Predictive Modeling: a Review

Credit card Fraud Detection using Predictive Modeling: a Review February 207 IJIRT Volume 3 Issue 9 ISSN: 2396002 Credit card Fraud Detection using Predictive Modeling: a Review Varre.Perantalu, K. BhargavKiran 2 PG Scholar, CSE, Vishnu Institute of Technology, Bhimavaram,

More information

6 Distributed data management I Hashing

6 Distributed data management I Hashing 6 Distributed data management I Hashing There are two major approaches for the management of data in distributed systems: hashing and caching. The hashing approach tries to minimize the use of communication

More information

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018

Kernels + K-Means Introduction to Machine Learning. Matt Gormley Lecture 29 April 25, 2018 10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University Kernels + K-Means Matt Gormley Lecture 29 April 25, 2018 1 Reminders Homework 8:

More information

Lecture 9: Support Vector Machines

Lecture 9: Support Vector Machines Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and

More information

Parallel Algorithms for Geometric Graph Problems Grigory Yaroslavtsev

Parallel Algorithms for Geometric Graph Problems Grigory Yaroslavtsev Parallel Algorithms for Geometric Graph Problems Grigory Yaroslavtsev http://grigory.us Appeared in STOC 2014, joint work with Alexandr Andoni, Krzysztof Onak and Aleksandar Nikolov. The Big Data Theory

More information

Boosting Simple Model Selection Cross Validation Regularization

Boosting Simple Model Selection Cross Validation Regularization Boosting: (Linked from class website) Schapire 01 Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 8 th,

More information

S1) It's another form of peak finder problem that we discussed in class, We exploit the idea used in binary search.

S1) It's another form of peak finder problem that we discussed in class, We exploit the idea used in binary search. Q1) Given an array A which stores 0 and 1, such that each entry containing 0 appears before all those entries containing 1. In other words, it is like {0, 0, 0,..., 0, 0, 1, 1,..., 111}. Design an algorithm

More information

CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting. Dan Grossman Fall 2013

CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting. Dan Grossman Fall 2013 CSE373: Data Structure & Algorithms Lecture 18: Comparison Sorting Dan Grossman Fall 2013 Introduction to Sorting Stacks, queues, priority queues, and dictionaries all focused on providing one element

More information

1 Minimum Cut Problem

1 Minimum Cut Problem CS 6 Lecture 6 Min Cut and Karger s Algorithm Scribes: Peng Hui How, Virginia Williams (05) Date: November 7, 07 Anthony Kim (06), Mary Wootters (07) Adapted from Virginia Williams lecture notes Minimum

More information

Classification. Instructor: Wei Ding

Classification. Instructor: Wei Ding Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute

More information

Metric Techniques and Approximation Algorithms. Anupam Gupta Carnegie Mellon University

Metric Techniques and Approximation Algorithms. Anupam Gupta Carnegie Mellon University Metric Techniques and Approximation Algorithms Anupam Gupta Carnegie Mellon University Metric space M = (V, d) set Vof points y z distances d(x,y) triangle inequality d(x,y) d(x,z) + d(z,y) x why metric

More information

Advances in Structured Prediction

Advances in Structured Prediction Advances in Structured Prediction John Langford Microsoft Research jl@hunch.net Hal Daumé III U Maryland me@hal3.name Slides and more: http://hunch.net/~l2s Examples of structured prediction joint The

More information

Globally Induced Forest: A Prepruning Compression Scheme

Globally Induced Forest: A Prepruning Compression Scheme Globally Induced Forest: A Prepruning Compression Scheme Jean-Michel Begon, Arnaud Joly, Pierre Geurts Systems and Modeling, Dept. of EE and CS, University of Liege, Belgium ICML 2017 Goal and motivation

More information

Lecture 7: Decision Trees

Lecture 7: Decision Trees Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...

More information

Computational Geometry

Computational Geometry Windowing queries Windowing Windowing queries Zoom in; re-center and zoom in; select by outlining Windowing Windowing queries Windowing Windowing queries Given a set of n axis-parallel line segments, preprocess

More information

Classification in a large number of categories

Classification in a large number of categories Classification in a large number of categories T. Artières, Joint work with the MLIA team at LIP6, the AMA team at LIG and Demokritos Lab. d informatique de Paris 6, France Lab. d Informatique de Grenoble

More information

Abusing a hypergraph partitioner for unweighted graph partitioning

Abusing a hypergraph partitioner for unweighted graph partitioning Abusing a hypergraph partitioner for unweighted graph partitioning B. O. Fagginger Auer R. H. Bisseling Utrecht University February 13, 2012 Fagginger Auer, Bisseling (UU) Mondriaan Graph Partitioning

More information