Logarithmic Time Prediction
Transcription
1 Logarithmic Time Prediction John Langford Microsoft Research DIMACS Workshop on Big Data through the Lens of Sublinear Algorithms
2 The Multiclass Prediction Problem Repeatedly: 1. See x. 2. Predict $\hat{y} \in \{1, \ldots, K\}$. 3. See y.
3 The Multiclass Prediction Problem Repeatedly: 1. See x. 2. Predict $\hat{y} \in \{1, \ldots, K\}$. 3. See y. Goal: Find h(x) minimizing the error rate $\Pr_{(x,y) \sim D}(h(x) \neq y)$, with h(x) fast.
4 Why?
5 Why?
6 Trick #1 K is small
7 Trick #2: A hierarchy exists
8 Trick #2: A hierarchy exists So use Trick #1 repeatedly.
9 Trick #3: Shared representation
10 Trick #3: Shared representation Very helpful... but computation in the last layer can still blow up.
11 Trick #4: Structured Prediction
12 Trick #4: Structured Prediction But what if the structure is unclear?
13 Trick #5: GPU
14 Trick #5: GPU 4 Teraflops is great... yet still burns energy.
15 How fast can we hope to go?
16 How fast can we hope to go? Theorem: There exist multiclass classification problems where achieving 0 error rate requires $\Omega(\log K)$ time to train or test per example.
17 How fast can we hope to go? Theorem: There exist multiclass classification problems where achieving 0 error rate requires $\Omega(\log K)$ time to train or test per example. Proof: By construction. Pick $y \sim U(1, \ldots, K)$.
18 How fast can we hope to go? Theorem: There exist multiclass classification problems where achieving 0 error rate requires $\Omega(\log K)$ time to train or test per example. Proof: By construction. Pick $y \sim U(1, \ldots, K)$. Any prediction algorithm outputting fewer than $\log_2 K$ bits loses with constant probability. Any training algorithm reading an example requires $\Omega(\log_2 K)$ time.
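The counting step in the proof can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes a predictor that emits a fixed number of bits, so that b bits can name at most 2^b of the K equally likely labels.

```python
def min_output_bits(num_classes: int) -> int:
    """Smallest number of bits that can name every one of num_classes labels.

    Equals ceil(log2(K)), computed with integers to avoid rounding issues.
    """
    return (num_classes - 1).bit_length()


def best_accuracy_with_bits(num_classes: int, bits: int) -> float:
    """Upper bound on accuracy when y ~ U(1..K) and only `bits` bits are output.

    With b bits there are at most 2**b distinct outputs, so at most 2**b
    labels can ever be predicted correctly.
    """
    return min(1.0, (2 ** bits) / num_classes)


if __name__ == "__main__":
    K = 1024
    print(min_output_bits(K))             # bits needed to name all labels
    print(best_accuracy_with_bits(K, 9))  # one bit short: error is constant
```

Dropping a single bit below log_2 K already caps accuracy at one half, which is the "loses with constant probability" part of the argument.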
19 Can we predict in time $O(\log_2 K)$? [Figure: Computational Advantage of Log Time. Benefit K / log(K) plotted against K.]
20 Not it #1: Sparse Error Correcting Output Codes 1. Create $O(\log K)$ binary vectors $b_{iy}$ of length K.
21 Not it #1: Sparse Error Correcting Output Codes 1. Create $O(\log K)$ binary vectors $b_{iy}$ of length K. 2. Train $O(\log K)$ binary classifiers $h_i$ to minimize error rate: $\Pr_{x,y}(h_i(x) \neq b_{iy})$.
22 Not it #1: Sparse Error Correcting Output Codes 1. Create $O(\log K)$ binary vectors $b_{iy}$ of length K. 2. Train $O(\log K)$ binary classifiers $h_i$ to minimize error rate: $\Pr_{x,y}(h_i(x) \neq b_{iy})$. 3. Predict by finding y with minimal error.
23 Not it #1: Sparse Error Correcting Output Codes 1. Create $O(\log K)$ binary vectors $b_{iy}$ of length K. 2. Train $O(\log K)$ binary classifiers $h_i$ to minimize error rate: $\Pr_{x,y}(h_i(x) \neq b_{iy})$. 3. Predict by finding y with minimal error. Prediction is $\Omega(K)$.
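A minimal sketch of why the decoding step is linear in K, assuming random codes and Hamming-distance decoding (both are illustrative choices, and the function names are hypothetical). The binary classifiers are left out; only the decoder that turns their outputs into a label is shown.

```python
import math
import random


def make_code_matrix(num_classes, bits_per_class=None, seed=0):
    """Random binary codes; codes[y][i] plays the role of b_iy.

    The number of bits grows as O(log K); the constant 4 is an arbitrary
    assumption made for this sketch.
    """
    rng = random.Random(seed)
    if bits_per_class is None:
        bits_per_class = 4 * max(1, math.ceil(math.log2(num_classes)))
    return [[rng.randint(0, 1) for _ in range(bits_per_class)]
            for _ in range(num_classes)]


def decode(codes, predicted_bits):
    """Return the label whose code disagrees least with the predicted bits.

    The loop visits every one of the K codes, so decoding costs Omega(K)
    even though only O(log K) classifiers were evaluated.
    """
    best_label, best_dist = None, None
    for label, code in enumerate(codes):
        dist = sum(c != p for c, p in zip(code, predicted_bits))
        if best_dist is None or dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    codes = make_code_matrix(16)
    print(decode(codes, codes[5]))
```

Evaluating the classifiers is cheap here; it is the search over labels that dominates.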
24 Not it #2: Hierarchy Construction 1 Build confusion matrix of errors.
25 Not it #2: Hierarchy Construction 1 Build confusion matrix of errors. 2 Recursive partition to create hierarchy.
26 Not it #2: Hierarchy Construction 1 Build confusion matrix of errors. 2 Recursive partition to create hierarchy. 3 Apply hierarchy solution.
27 Not it #2: Hierarchy Construction 1 Build confusion matrix of errors. 2 Recursive partition to create hierarchy. 3 Apply hierarchy solution. Training is Ω(K) or worse.
28 Not it #3: Unnormalized learning Train K regressors by: For each example (x, y): 1. Train regressor y with (x, 1).
29 Not it #3: Unnormalized learning Train K regressors by: For each example (x, y): 1. Train regressor y with (x, 1). 2. Pick $y' \neq y$ uniformly at random. 3. Train regressor $y'$ with (x, −1).
30 Not it #3: Unnormalized learning Train K regressors by: For each example (x, y): 1. Train regressor y with (x, 1). 2. Pick $y' \neq y$ uniformly at random. 3. Train regressor $y'$ with (x, −1). Prediction is still $\Omega(K)$.
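The three steps can be sketched as follows. This is an illustration under stated assumptions, not the method as implemented by the speaker: the regressors are linear and trained by squared-loss gradient steps, the negative target is taken to be −1 (the sign is not legible in the transcription), and the class name `UnnormalizedLearner` is hypothetical.

```python
import random


class UnnormalizedLearner:
    """K linear regressors; each example touches only two of them."""

    def __init__(self, num_classes, num_features, learning_rate=0.1, seed=0):
        self.weights = [[0.0] * num_features for _ in range(num_classes)]
        self.lr = learning_rate
        self.rng = random.Random(seed)

    def _score(self, label, x):
        return sum(w * v for w, v in zip(self.weights[label], x))

    def _sgd_step(self, label, x, target):
        # One squared-loss gradient step for a single regressor.
        error = self._score(label, x) - target
        w = self.weights[label]
        for j, v in enumerate(x):
            w[j] -= self.lr * error * v

    def train(self, x, y):
        # Step 1: positive update for the true label.
        self._sgd_step(y, x, 1.0)
        # Step 2: pick a different label uniformly at random.
        other = self.rng.randrange(len(self.weights) - 1)
        if other >= y:
            other += 1  # shift past y so that other != y
        # Step 3: negative update (target -1 is an assumption).
        self._sgd_step(other, x, -1.0)

    def predict(self, x):
        # Scores all K regressors, which is why prediction stays Omega(K).
        return max(range(len(self.weights)), key=lambda k: self._score(k, x))
```

Training cost per example is independent of K, but `predict` still has to score every regressor.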
31 Can we predict in time $O(\log_2 K)$?
32 Is logarithmic time even possible? P(y=1) = .4, P(y=2) = .3, P(y=3) = .3. P({2, 3}) > P(1), so divide and conquer loses. [Tree: root 1 v {2,3}, with leaf 1 and child node 2 v 3, which has leaves 2 and 3.]
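The failure can be worked through numerically. The sketch below assumes every node's binary classifier is ideal for its own subproblem and ignores x entirely; the function names are hypothetical.

```python
def divide_and_conquer_prediction(p):
    """Predict with the tree 1 v {2,3}, then 2 v 3, using ideal node classifiers."""
    # Root: compare the mass of label 1 against the mass of the set {2, 3}.
    if p[1] >= p[2] + p[3]:
        return 1
    # Child node: choose between 2 and 3.
    return 2 if p[2] >= p[3] else 3


def most_likely_label(p):
    """The label a direct multiclass predictor should output."""
    return max(p, key=p.get)


if __name__ == "__main__":
    probs = {1: 0.4, 2: 0.3, 3: 0.3}
    dc = divide_and_conquer_prediction(probs)
    best = most_likely_label(probs)
    print(dc, probs[dc])      # the tree ends at label 2, correct 30% of the time
    print(best, probs[best])  # label 1 is correct 40% of the time
```

Every node is individually optimal, yet the composed prediction is not, which is what motivates changing how the nodes are trained.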
33 Filter Trees [BLR09] P(y=1) = .4, P(y=2) = .3, P(y=3) = .3. 1. Learn 2 v 3 first. 2. Throw away all error examples. 3. Learn 1 v Survivors. [Tree: root 1 v {2,3} above node 2 v 3.] Theorem: For all multiclass problems, for all binary classifiers, Multiclass Regret $\leq$ Average Binary Regret $\times \log(K)$.
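On the same three-label distribution, the filtering steps can be traced at the level of probabilities. This is an idealized sketch: each binary classifier is assumed to be the best possible one for the examples that reach it, x is ignored, and `filter_tree_prediction` is a hypothetical name.

```python
def filter_tree_prediction(p):
    """Trace the three filter-tree steps on a three-label distribution."""
    # Step 1: learn 2 v 3 using only examples whose label is 2 or 3.
    winner_23 = 2 if p[2] >= p[3] else 3
    # Step 2: throw away the examples the 2 v 3 classifier gets wrong,
    # i.e. those whose true label is the loser of that comparison.
    # Step 3: learn 1 v Survivors. The survivors from the {2,3} side are
    # exactly the examples labeled winner_23, so the root compares
    # P(1) with P(winner_23) rather than with P({2,3}).
    return 1 if p[1] >= p[winner_23] else winner_23


if __name__ == "__main__":
    print(filter_tree_prediction({1: 0.4, 2: 0.3, 3: 0.3}))
```

Because the root no longer competes against the combined mass of {2, 3}, the tree outputs label 1 on this distribution.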
34 Can you make it robust? Winner
35 Can you make it robust? Winners
36 Can you make it robust? Winners
37 Can you make it robust? Winners Theorem: [BLR09] For all multiclass problems, for all binary classifiers, a log(K)-correcting tournament satisfies: Multiclass Regret $\leq$ Average Binary Regret $\times$ 5.5. Used to determine the best paper prize for ICML 2012 (area chair decisions).
38 How do you learn structure? Not all partitions are equally difficult. Compare {1, 7}v{3, 8} to {1, 8}v{3, 7} What is better?
39 How do you learn structure? Not all partitions are equally difficult. Compare {1, 7}v{3, 8} to {1, 8}v{3, 7} What is better? [BWG10]: Better to confuse near leaves than near root. Intuition: the root predictor tends to be overconstrained while the leafwards predictors are less constrained.
40 The Partitioning Problem [CL14] Given a set of n examples each with one of K labels, find a partitioner h that maximizes: $\mathbb{E}_{x,y} \left| \Pr(h(x) = 1, y) - \Pr(h(x) = 1) \Pr(y) \right|$
41 The Partitioning Problem [CL14] Given a set of n examples each with one of K labels, find a partitioner h that maximizes: $\mathbb{E}_{x} \sum_{y} \Pr(y) \left| \Pr(h(x) = 1 \mid x \in X_y) - \Pr(h(x) = 1) \right|$ where $X_y$ is the set of x associated with y.
42 The Partitioning Problem [CL14] Given a set of n examples each with one of K labels, find a partitioner h that maximizes: $\mathbb{E}_{x} \sum_{y} \Pr(y) \left| \Pr(h(x) = 1 \mid x \in X_y) - \Pr(h(x) = 1) \right|$ Nonconvex for any symmetric hypothesis class (ouch).
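An empirical version of the criterion can be computed directly from a labeled sample. The sketch below assumes the criterion reads as the sum over labels of Pr(y) · |Pr(h(x) = 1 | x in X_y) − Pr(h(x) = 1)|; the operators are not legible in the transcription, so this reading is an assumption, and `partition_objective` is a hypothetical name.

```python
from collections import Counter


def partition_objective(labels, sides):
    """Empirical partition criterion for one candidate partitioner h.

    labels[i] is the label y of example i; sides[i] is True when h(x_i) = 1.
    """
    n = len(labels)
    p_right = sum(sides) / n                      # Pr(h(x) = 1)
    count_y = Counter(labels)                     # examples per label
    right_y = Counter(y for y, s in zip(labels, sides) if s)
    return sum(
        (count_y[y] / n) * abs(right_y[y] / count_y[y] - p_right)
        for y in count_y
    )


if __name__ == "__main__":
    labels = [1, 1, 2, 2]
    print(partition_objective(labels, [True, True, False, False]))  # pure and balanced
    print(partition_objective(labels, [True, False, True, False]))  # splits every label
    print(partition_objective(labels, [True, True, True, True]))    # sends everything one way
```

Under this reading the score is high only when the split is both balanced and keeps each label on one side, and it drops to zero for a split that ignores the labels or routes everything the same way.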
43 Bottom Up doesn't work Suppose you use linear representations.
44 Bottom Up doesn't work Suppose you use linear representations. Suppose you first build a 1v3 predictor.
45 Bottom Up doesn't work Suppose you use linear representations. Suppose you first build a 1v3 predictor. Suppose you then build a 2v{1v3} predictor. You lose.
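46 Does partitioning recurse well? Theorem: If at every node n, $\mathbb{E}_{x,y} \left| \Pr(h(x) = 1, y) - \Pr(h(x) = 1) \Pr(y) \right| > \gamma$, then after $\left( \frac{1}{\epsilon} \right)^{\frac{4 (1 - \gamma)^2 \ln k}{\gamma^2}}$ splits, the multiclass error is less than $\epsilon$.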
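47 Online Partitioning Relax the optimization criteria: $\mathbb{E}_{x,y} \left| \mathbb{E}_{x \mid y}[\hat{y}(x)] - \mathbb{E}_{x}[\hat{y}(x)] \right|$ ... and approximate with running average
48 Online Partitioning Relax the optimization criteria: $\mathbb{E}_{x,y} \left| \mathbb{E}_{x \mid y}[\hat{y}(x)] - \mathbb{E}_{x}[\hat{y}(x)] \right|$ ... and approximate with running average
Let $e = 0$ and for all y, $e_y = 0$, $n_y = 0$.
For each example (x, y):
1. If $e_y < e$ then $b = -1$ else $b = 1$.
2. Update w using (x, b).
3. $n_y \leftarrow n_y + 1$
4. $e_y \leftarrow \frac{(n_y - 1) e_y}{n_y} + \frac{\hat{y}(x)}{n_y}$
5. $e \leftarrow \frac{(t - 1) e}{t} + \frac{\hat{y}(x)}{t}$
Apply recursively to construct a tree structure.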
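A single tree node running this loop might look like the sketch below. Several details are assumptions made for illustration: the predictor is linear with a squared-loss gradient step, the branch target is −1 when e_y < e and +1 otherwise (the sign is not legible in the transcription), ŷ(x) in the averages is evaluated after the weight update, and the class name `OnlinePartitionNode` is hypothetical.

```python
from collections import defaultdict


class OnlinePartitionNode:
    """One node of the tree: a linear router plus running averages."""

    def __init__(self, num_features, learning_rate=0.1):
        self.w = [0.0] * num_features
        self.lr = learning_rate
        self.t = 0                       # examples seen at this node
        self.e = 0.0                     # running average of y_hat over all x
        self.n = defaultdict(int)        # n_y: examples seen per label
        self.e_y = defaultdict(float)    # e_y: running average of y_hat per label

    def predict(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        # Step 1: push label y toward the side it already leans to.
        b = -1.0 if self.e_y[y] < self.e else 1.0
        # Step 2: update w using (x, b); squared loss is an assumption.
        error = self.predict(x) - b
        for j, xj in enumerate(x):
            self.w[j] -= self.lr * error * xj
        # Steps 3-5: refresh the counts and both running averages.
        y_hat = self.predict(x)
        self.n[y] += 1
        self.t += 1
        n_y = self.n[y]
        self.e_y[y] = (n_y - 1) * self.e_y[y] / n_y + y_hat / n_y
        self.e = (self.t - 1) * self.e / self.t + y_hat / self.t
        return b
```

To build a tree, each example would be routed by the sign of `predict(x)` to a child node that runs the same update, which is the recursive application the slide refers to.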
49 [Figure: Accuracy for a fixed training time, LOMtree vs one-against-all (OAA). Accuracy plotted against number of classes for Isolet, Sector, Aloi, Imagenet, and ODP.]
50 [Figure: Performance of log-time algorithms. Test error %, optimized, no train-time constraint, for Rand, Filter, and LOM on Isolet, Sector, Aloi, Imagenet, and ODP.]
51 [Figure: Compared to OAA. Test error %, optimized, no train-time constraint, for Rand, Filter, LOM, and OAA on Isolet, Sector, Aloi, Imagenet, and ODP.]
52 [Figure: Classes vs test time ratio, LOMtree vs one-against-all. $\log_2$(time ratio) plotted against $\log_2$(number of classes).]
53 Can we predict in time $O(\log_2 K)$?
54 Can we predict in time $O(\log_2 K)$? What is the right way to achieve consistency and dynamic partition?
55 Can we predict in time $O(\log_2 K)$? What is the right way to achieve consistency and dynamic partition? How can you balance representation complexity and sample complexity?
56 Bibliography
[BLR09] Alina Beygelzimer, John Langford, Pradeep Ravikumar, Error-Correcting Tournaments.
[BWG10] Samy Bengio, Jason Weston, David Grangier, Label embedding trees for large multi-class tasks, NIPS.
[CL14] Anna Choromanska, John Langford, Logarithmic Time Online Multiclass prediction.