Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle
1 Climbing the Kaggle Leaderboard by Exploiting the Log-Loss Oracle. Jacob Whitehill, Worcester Polytechnic Institute.
2 Machine learning competitions Data-mining competitions (Kaggle, KDDCup, DrivenData, etc.) have become a mainstay of machine learning practice. They help ensure comparability and reproducibility by providing a common test set and common rules.
3 Machine learning competitions They can help incentivize machine learning innovations in specific application domains: cervical cancer diagnosis from images, passenger security screening in airports, restaurant visitor forecasting, dog breed identification. Ancillary benefit: they provide credibility to new data scientists when searching for a job.
4 Machine learning competitions Step 1: The competition organizer assembles the training and testing data: images, with ground-truth labels $y_1, \ldots, y_n$ for the test set.
5 Machine learning competitions Step 2: The contestant obtains the training examples with their labels, and the testing examples without labels.
6 Machine learning competitions Step 3: The contestant uses machine learning to guess the test labels $\hat{y}_1, \ldots, \hat{y}_n$.
7 Machine learning competitions Step 4: The contestant submits the guesses to the organizer, who records the accuracy c.
8 Machine learning competitions When the competition is over, the organizer reports which contestant (A, B, or C) won the contest.
9 Machine learning competitions During the competition, the contestant can also send guesses $\hat{y}_1, \ldots, \hat{y}_n$ to an oracle.
10 Machine learning competitions The oracle reports the accuracy c during the competition.
11 Machine learning competitions Oracle feedback can help contestants identify more and less promising ML approaches.
12 Machine learning competitions Question: can the oracle be exploited to deduce the true labels illicitly?
13 Log-loss One of the most common error metrics for classification problems is the log-loss. If $Y_n$ and $\hat{Y}_n$ are the $(n \times c)$ ground-truth and guess matrices, respectively (each element in $[0,1]$), then the log-loss is: $\ell_n \doteq f(Y_n, \hat{Y}_n) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} y_{ij} \log \hat{y}_{ij}$
14 Example Suppose there are 2 examples and 3 classes, and the contestant submits the following guesses to the oracle: $\hat{Y}_2 = \begin{bmatrix} e^{-2} & e^{-1} & 1 - e^{-2} - e^{-1} \\ e^{-8} & e^{-4} & 1 - e^{-8} - e^{-4} \end{bmatrix}$ Suppose the oracle reports that $f(Y_2, \hat{Y}_2) = -\frac{1}{2} \sum_{i=1}^{2} \sum_{j=1}^{3} y_{ij} \log \hat{y}_{ij} = 3$. By iterating over all $3^2$ possible ground-truths, we can easily determine that $Y_2 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$ (per-example losses 2 and 4 are the only pair that averages to 3).
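To make this slide's example concrete, here is a small sketch (mine, not from the talk) that recovers $Y_2$ by brute force; the guess matrix and the reported loss of 3 are taken from the slide above.

```python
import itertools
import math
import numpy as np

def log_loss(Y, Y_hat):
    # Multiclass log-loss: -(1/n) * sum_i sum_j y_ij * log(yhat_ij)
    return -np.mean(np.sum(Y * np.log(Y_hat), axis=1))

# Guesses from the slide: each possible ground-truth yields a distinct loss.
Y_hat = np.array([
    [math.exp(-2), math.exp(-1), 1 - math.exp(-2) - math.exp(-1)],
    [math.exp(-8), math.exp(-4), 1 - math.exp(-8) - math.exp(-4)],
])

reported = 3.0  # log-loss value returned by the oracle

# Enumerate all 3^2 one-hot ground-truth matrices; keep the consistent ones.
matches = [labels
           for labels in itertools.product(range(3), repeat=2)
           if abs(log_loss(np.eye(3)[list(labels)], Y_hat) - reported) < 1e-9]
print(matches)  # the unique consistent labeling
```

Only one labeling survives, which is why the oracle's single scalar reply pins down both test labels.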
15 Exploiting the log-loss In general, there are $c^n$ possible ground-truths for $n$ examples, far too many to search exhaustively. However: because the log-loss decomposes across examples, we can iteratively apply brute-force search over small batches of size $m \ll n$. Over $\lceil n/m \rceil$ batches, we can deduce the ground-truth labels of the entire test set.
16-19 Exploiting the log-loss Split the test set into three groups: already inferred (examples $1, \ldots, k$), probed (examples $k+1, \ldots, k+m$), and unprobed (examples $k+m+1, \ldots, n$). Then: $\ell_n = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{c} y_{ij} \log \hat{y}_{ij} = -\frac{1}{n} \left[ \sum_{i=1}^{k} \sum_j y_{ij} \log \hat{y}_{ij} + \sum_{i=k+1}^{k+m} \sum_j y_{ij} \log \hat{y}_{ij} + \sum_{i=k+m+1}^{n} \sum_j y_{ij} \log \hat{y}_{ij} \right]$ For the already inferred examples, we submit their known labels, so their contribution to the log-loss is approximately 0.* For each unprobed example $i$, we set $\hat{y}_{ij} = 1/c$ for all $j$, so its per-example loss is exactly $-\log(1/c) = \log c$; the unprobed examples thus contribute $(n - m - k) \log c$ in total. *It's actually not exactly 0, because guesses are clipped to $\hat{y}_{ij} \in [\epsilon, 1 - (c-1)\epsilon]$.
20 Exploiting the log-loss Hence, the log-loss $\ell_m$ due to just the $m$ probed examples is: $\ell_m \doteq -\frac{1}{m} \sum_{i=k+1}^{k+m} \sum_j y_{ij} \log \hat{y}_{ij} = \frac{1}{m} \left( n \ell_n - (n - m - k) \log c \right)$ where $\ell_n$, the log-loss over all $n$ examples, is returned by the oracle.
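As a sanity check on the algebra above, a tiny sketch (the function name is mine, not from the talk): build $\ell_n$ from a known probed-batch loss, then invert it.

```python
import math

def probed_loss(ell_n, n, m, k, c):
    # Recover the loss on the m probed examples from the oracle's overall
    # loss ell_n, assuming the k inferred examples contribute ~0 and the
    # n-m-k unprobed examples each contribute log(c) (uniform 1/c guesses).
    return (n * ell_n - (n - m - k) * math.log(c)) / m

# Synthetic check with a known probed-batch loss of 2.5:
n, m, k, c = 100, 6, 10, 3
true_ell_m = 2.5
ell_n = (m * true_ell_m + (n - m - k) * math.log(c)) / n
print(probed_loss(ell_n, n, m, k, c))  # recovers 2.5 (up to rounding)
```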
21 Exploiting the log-loss Now that we know the log-loss on just the batch of probed examples, we can apply brute-force optimization over $Y_m$. For any possible ground-truth $Y_m$ of the probed examples, define the estimation error: $\epsilon \doteq \left| \ell_m - f(Y_m, \hat{Y}_m) \right|$ Select the $Y_m$ that minimizes $\epsilon$.
22 Choosing the guesses Ŷm But how should we choose the guesses $\hat{Y}_m$? If the oracle's floating-point precision were infinite, we could just choose a random real-valued matrix $\hat{Y}_m$: with probability 1, the log-loss $\ell_m$ would be different for every possible ground-truth, and we could recover $Y_m$ unambiguously.
23 Choosing the guesses Ŷm In practice, the oracle's floating-point precision is finite (e.g., rounded to the 5th decimal place), so collisions can occur: different $Y_m$ can result in very similar log-loss values $\ell_m$.
24 Choosing the guesses Ŷm Consider a set of guesses in which no two values are closer than some minimum gap apart. Even so, two possible ground-truth matrices can result in log-loss values that are very close to each other (within $10^{-4}$), which an oracle rounding to 5 decimal places cannot reliably distinguish.
25 Choosing the guesses Ŷm To avoid collisions, we want to choose $\hat{Y}_m$ so that no two distinct ground-truths $Y_m$ and $Y'_m$ result in a similar loss: $Q(\hat{Y}_m) \doteq \min_{Y_m \neq Y'_m} \left| f(Y_m, \hat{Y}_m) - f(Y'_m, \hat{Y}_m) \right|$ We want to maximize $Q(\hat{Y}_m)$: $\hat{Y}_m \doteq \arg\max_{\hat{Y}_m} Q(\hat{Y}_m)$ This is a constrained minimax (maximin) optimization problem.
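For small $m$ and $c$, $Q$ can be computed exactly by enumerating all ground-truths. A sketch (my own helper, not the talk's heuristic), evaluated on the guess matrix from the earlier 2-example, 3-class slide:

```python
import itertools
import numpy as np

def collision_margin(Y_hat):
    # Q(Y_hat): the smallest gap between the losses induced by two distinct
    # ground-truths. A larger Q tolerates coarser rounding by the oracle.
    m, c = Y_hat.shape
    losses = sorted(
        -np.mean(np.sum(np.eye(c)[list(labels)] * np.log(Y_hat), axis=1))
        for labels in itertools.product(range(c), repeat=m)
    )
    return min(b - a for a, b in zip(losses, losses[1:]))

# Guess matrix from the earlier example (m = 2 examples, c = 3 classes):
Y_hat = np.array([
    [np.exp(-2.0), np.exp(-1.0), 1 - np.exp(-2.0) - np.exp(-1.0)],
    [np.exp(-8.0), np.exp(-4.0), 1 - np.exp(-8.0) - np.exp(-4.0)],
])
print(collision_margin(Y_hat))  # ~0.15, comfortably above 5-decimal rounding
```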
26 Choosing the guesses Ŷm Special optimization algorithms do exist for constrained minimax problems. However, in our case they are impractical because the number of constraints grows exponentially with m. In practice, we employed an ad hoc heuristic to maximize Q w.r.t. Ŷm.
27 Upper bound on quality Note that m cannot be too large because the quality Q of the best Ŷm decreases exponentially with m.
28 Upper bound on quality For m=6, we found a Ŷm that worked well in practice.
29-30 Intel-MobileODT Kaggle competition We applied this algorithm to climb the Kaggle leaderboard for the Intel-MobileODT Kaggle (2017) competition. Topic: diagnosis of cervical cancer from medical images. Competition structure: 1st phase: test set of 512 examples (mostly for informational purposes); 2nd phase: test set of 4018 examples, including the original 512 (used to decide who wins $100K).
31 Example submission We submit guesses of the following form to the oracle (header: image_name,type_1,type_2,type_3). The already-inferred example 0.jpg gets its known one-hot label (0., 1., 0.); the probed examples 1.jpg through 4.jpg get the carefully chosen real-valued guesses $\hat{Y}_m$; the unprobed examples from 5.jpg onward each get the uniform guess (1/3, 1/3, 1/3).
32 Example submission 1. We receive the log-loss $\ell_n$ from the oracle. 2. We calculate the loss $\ell_m$ on just the probed examples. 3. We conduct brute-force optimization to identify $Y_m$: 1.jpg = (0, 1, 0), 2.jpg = (1, 0, 0), 3.jpg = (0, 0, 1), 4.jpg = (0, 0, 1).
33 Kaggle MobileODT competition (1st stage) We repeat this process, batch by batch, until we've inferred the labels of all 512 examples.
39 Kaggle MobileODT competition (1st stage) Eventually, we achieved a log-loss of 0 and climbed to rank #4 on the leaderboard without doing any real machine learning.
40 Kaggle MobileODT competition To be clear: the 2nd stage of the competition (which we did not win) was the basis for awarding the $100K prize. Our 2nd-stage rank was 225 out of 884 contestants (top 30%), since 512 of the 4018 examples were the same as in the 1st stage.
41 Kaggle MobileODT competition Some other data-mining competition sites, such as DrivenData, host competitions that have no 2nd stage. In light of the log-loss attack, this seems dangerous: there might be some ancillary value in performing well even without winning actual prize money (bragging rights, useful for getting a job interview?).
42 Inferring the subset of examples In the previous example (1st stage), the oracle's accuracy was reported on the entire test set. What if the oracle reports the log-loss on a subset of the test examples but doesn't say which ones?
43 Inferring the subset of examples This can also be done, as long as the evaluated subset E is fixed throughout the competition. High-level algorithm: 1. Find a single example that is a member of E. 2. Using this example, infer the size $s = |E|$. 3. When inferring $Y_m$ for each batch, we must also consider whether or not $i \in E$ for each $i = 1, \ldots, m$.
44 Inferring the subset of examples In a simulation in which we varied the floating-point precision p of the oracle, we found that even without knowing E, the attacker can infer $Y_n$ with high accuracy (baseline guess rate: 33.33%).
45 Foiling the log-loss attack
46 Adaptive data analysis The previous log-loss attack is an example of (malicious) adaptive data analysis: we use the performance of a previous analysis to inform the next one. Other forms of adaptive data analysis also exist, e.g., overfitting to the test data by tuning hyperparameters. Recent research on privacy-preserving machine learning and complexity theory has sought to find remedies.
47 Ladder algorithm (Blum & Hardt 2015) The problem (for the ML community) with the log-loss attack: the classifier does very well on the test set but is useless on the true data distribution.
48 Ladder algorithm (Blum & Hardt 2015) Ladder algorithm (Blum & Hardt 2015): goal is to ensure that leaderboard accuracies reflect each classifier s true loss w.r.t. entire data distribution, not just the empirical (test-set) loss. In essence: accept a new submission only if its accuracy is significantly better than the previous one.
49 Ladder algorithm (Blum & Hardt 2015) Algorithm: Let $R_0 := \infty$. For each classifier submission $t \in \{1, \ldots, k\}$: if $\mathrm{loss}(f_t) < R_{t-1} - \eta$, then $R_t := \mathrm{round}(\mathrm{loss}(f_t), \eta)$; else $R_t := R_{t-1}$. $R_1, \ldots, R_k$ are then reported as the leaderboard accuracies.
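A minimal sketch of this mechanism (my own rendering; round(loss, η) is interpreted here as rounding to the nearest integer multiple of η):

```python
import math

def ladder(losses, eta):
    # Report a new leaderboard score only when a submission beats the
    # best-so-far score R by more than eta; otherwise repeat the old score.
    R = math.inf
    reported = []
    for loss in losses:
        if loss < R - eta:
            R = round(loss / eta) * eta  # round to nearest multiple of eta
        reported.append(R)
    return reported

# A tiny improvement (0.85 after 0.90) is rejected; a real one (0.70) sticks.
print(ladder([0.90, 0.85, 0.70, 0.72], eta=0.1))
```

Because rejected submissions just echo the previous score, an attacker probing with tiny perturbations learns nothing new.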
50 Ladder algorithm (Blum & Hardt 2015) Indeed, in simulation we can verify that the Ladder mechanism foils our log-loss attack: the sequence of accuracies produced by the attack is not monotonic, and since the oracle just returns $R_{t-1}$ whenever we do not improve enough, we receive no new information from it.
51 Ladder algorithm (Blum & Hardt 2015) In practice, Ladder seems not (yet?) to be used widely (or at all?). It might be unpopular with contestants, since a small improvement in the true loss might be rejected. In any case, Ladder is designed to prevent sequential probing; what about one-shot attacks?
52 Exploiting an Oracle That Reports AUC Scores in Machine Learning Contests Whitehill (2016), AAAI.
53 AUC metric One of the most widely used accuracy metrics for binary classification problems is the Area Under the receiver operating characteristic Curve (AUC).
54-55 AUC metric The AUC metric has two equivalent definitions: 1. the area under the TPR vs. FPR (ROC) curve; 2. the probability of a correct response in a 2-alternative forced-choice task (one positive example, one negative example).
56 AUC attacks Since the AUC is a fraction of pairs, it is a rational number. Let AUC $c = p/q$. If the contestant knows $p/q$ exactly, what can she or he infer about the ground-truth labels?
57 Attack 1: Infer knowledge of n0, n1 Based on the AUC $c = p/q$ and the test set size $n$, we can infer the set S of possible values for $(n_0, n_1)$: $q$ must divide $n_0 n_1$, and $n_0 + n_1$ must equal $n$.
58 Example Suppose $n = 100$ and $c = 0.985 = p/q = 197/200$. Then, since $n_0 + n_1 = n$ and since $200 \mid n_0 n_1$, we know $S = \{(20, 80), (40, 60), (60, 40), (80, 20)\}$.
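This set is easy to enumerate. A quick sketch (function name is mine) for the slide's example of $n = 100$, $q = 200$:

```python
def possible_class_counts(n, q):
    # All (n0, n1) with n0 + n1 = n and q | n0*n1: since the AUC is a
    # fraction with denominator n0*n1, q (in lowest terms) must divide it.
    return [(n - n1, n1) for n1 in range(1, n) if (n - n1) * n1 % q == 0]

print(possible_class_counts(100, 200))
# the four class splits consistent with AUC = 197/200 on n = 100 examples
```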
59-61 Attack 2 Suppose the contestant knows that $n_1 \in S$, and that her guesses $\hat{y}_1, \ldots, \hat{y}_n$ obtain AUC $c$. Then, if $c$ is high enough for every $n_1 \in S$, the first $k$ examples according to the rank order of $\hat{y}_1, \ldots, \hat{y}_n$ must be negatively labeled. Intuition: these are the guesses the classifier is most confident are negatively labeled; if the classifier were wrong about any of them, the AUC would be much lower. An analogous result holds for the last few examples being positive.
62-63 Example Suppose $n = 100$, $c = 0.99$, and the contestant knows that between 25% and 75% of the examples are positive. Then the first* 5 examples must be negative, and the last* 5 examples must be positive. The contestant can thus deduce the labels of 10% of the test examples. *According to the rank order of $\hat{y}_1, \ldots, \hat{y}_n$.
64-66 Attack 2: Implications Knowing a few test labels is useful because: 1. Since you know them definitively, you can add them to the training set. 2. They might be re-used in subsequent contests. 3. You could collude with other contestants who have deduced a few labels.
67 Attack 3 Search for all possible ground-truths $y_1, \ldots, y_n$ for which the AUC of the guesses is some fixed value $c$.
68 Example Consider a tiny test set of just 4 examples. Suppose your guesses are $\hat{y}_1 = 0.5$, $\hat{y}_2 = 0.6$, $\hat{y}_3 = 0.9$, $\hat{y}_4 = 0.4$, and suppose the oracle says the accuracy (AUC) for these guesses is $c = 0.75$.
69-71 Example For the guesses $\hat{y}_1 = 0.5$, $\hat{y}_2 = 0.6$, $\hat{y}_3 = 0.9$, $\hat{y}_4 = 0.4$, tabulating the AUC for every possible labeling $y_1, y_2, y_3, y_4$ shows that the true labels must be $y_1 = 1$, $y_2 = 0$, $y_3 = 1$, $y_4 = 0$. The contestant can now re-submit and obtain a perfect score in one shot.
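The slide's table of AUC values per labeling can be regenerated with a short sketch (an oracle report of $c = 0.75$ is assumed here, consistent with the labeling above):

```python
import itertools

def auc(y, y_hat):
    # AUC as the fraction of (positive, negative) pairs ranked correctly,
    # counting ties as half-correct.
    pos = [s for s, label in zip(y_hat, y) if label == 1]
    neg = [s for s, label in zip(y_hat, y) if label == 0]
    pairs = [(p > q) + 0.5 * (p == q) for p in pos for q in neg]
    return sum(pairs) / len(pairs)

y_hat = [0.5, 0.6, 0.9, 0.4]
# Enumerate every labeling with at least one positive and one negative.
matches = [y for y in itertools.product([0, 1], repeat=4)
           if 0 < sum(y) < 4 and auc(y, y_hat) == 0.75]
print(matches)  # only one labeling is consistent with AUC = 0.75
```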
72 Attack 3 How many different ground-truth vectors are there such that the AUC of the guesses is some fixed number $c$?
73-74 Number of satisfying labelings grows exponentially in n for every AUC $c \in (0, 1)$ For every fixed AUC $c = p/q \in (0, 1)$ on a test set of size $n = 4q$, the number of different labelings $y_1, \ldots, y_n$ such that $f(y_{1:n}, \hat{y}_{1:n}) = c$ is at least: $\left(2 - 2\left|c - \tfrac{1}{2}\right|\right)^{n/4}$ What about for $n \neq 4q$? Open question: might there be some pathological combination of $p, q, n_0, n_1$ (for non-trivial $n$) such that the number of satisfying labelings is small?
75 Conclusions Given that Ladder is rarely implemented, ML practitioners (and job recruiters) should be aware of the danger of log-loss attacks in data-mining competitions. The AUC admits fundamentally different attacks from the log-loss: the log-loss decomposes across single examples, while the AUC decomposes across pairs (one positive, one negative) of examples. The greater goal of this work is to raise awareness of the potential for cheating in machine learning contests.
76 Thank you
Jacob Whitehill (jrwhitehill@wpi.edu), Worcester Polytechnic Institute. AAAI 2018 Workshop talk.
More informationLecture 9: Support Vector Machines
Lecture 9: Support Vector Machines William Webber (william@williamwebber.com) COMP90042, 2014, Semester 1, Lecture 8 What we ll learn in this lecture Support Vector Machines (SVMs) a highly robust and
More informationGradient Descent. Wed Sept 20th, James McInenrey Adapted from slides by Francisco J. R. Ruiz
Gradient Descent Wed Sept 20th, 2017 James McInenrey Adapted from slides by Francisco J. R. Ruiz Housekeeping A few clarifications of and adjustments to the course schedule: No more breaks at the midpoint
More informationIntroduction to Machine Learning
Introduction to Machine Learning Eric Medvet 16/3/2017 1/77 Outline Machine Learning: what and why? Motivating example Tree-based methods Regression trees Trees aggregation 2/77 Teachers Eric Medvet Dipartimento
More informationThe Further Mathematics Support Programme
Degree Topics in Mathematics Groups A group is a mathematical structure that satisfies certain rules, which are known as axioms. Before we look at the axioms, we will consider some terminology. Elements
More informationA Mathematical Proof. Zero Knowledge Protocols. Interactive Proof System. Other Kinds of Proofs. When referring to a proof in logic we usually mean:
A Mathematical Proof When referring to a proof in logic we usually mean: 1. A sequence of statements. 2. Based on axioms. Zero Knowledge Protocols 3. Each statement is derived via the derivation rules.
More informationZero Knowledge Protocols. c Eli Biham - May 3, Zero Knowledge Protocols (16)
Zero Knowledge Protocols c Eli Biham - May 3, 2005 442 Zero Knowledge Protocols (16) A Mathematical Proof When referring to a proof in logic we usually mean: 1. A sequence of statements. 2. Based on axioms.
More informationMedical images, segmentation and analysis
Medical images, segmentation and analysis ImageLab group http://imagelab.ing.unimo.it Università degli Studi di Modena e Reggio Emilia Medical Images Macroscopic Dermoscopic ELM enhance the features of
More informationMathematical and Algorithmic Foundations Linear Programming and Matchings
Adavnced Algorithms Lectures Mathematical and Algorithmic Foundations Linear Programming and Matchings Paul G. Spirakis Department of Computer Science University of Patras and Liverpool Paul G. Spirakis
More information(b) Linking and dynamic graph t=
1 (a) (b) (c) 2 2 2 1 1 1 6 3 4 5 6 3 4 5 6 3 4 5 7 7 7 Supplementary Figure 1: Controlling a directed tree of seven nodes. To control the whole network we need at least 3 driver nodes, which can be either
More informationWeighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract
Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD Abstract There are two common main approaches to ML recommender systems, feedback-based systems and content-based systems.
More information1 Overview Definitions (read this section carefully) 2
MLPerf User Guide Version 0.5 May 2nd, 2018 1 Overview 2 1.1 Definitions (read this section carefully) 2 2 General rules 3 2.1 Strive to be fair 3 2.2 System and framework must be consistent 4 2.3 System
More informationClassification: Feature Vectors
Classification: Feature Vectors Hello, Do you want free printr cartriges? Why pay more when you can get them ABSOLUTELY FREE! Just # free YOUR_NAME MISSPELLED FROM_FRIEND... : : : : 2 0 2 0 PIXEL 7,12
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running
More informationAdvanced Video Content Analysis and Video Compression (5LSH0), Module 8B
Advanced Video Content Analysis and Video Compression (5LSH0), Module 8B 1 Supervised learning Catogarized / labeled data Objects in a picture: chair, desk, person, 2 Classification Fons van der Sommen
More informationSemi-supervised learning and active learning
Semi-supervised learning and active learning Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Combining classifiers Ensemble learning: a machine learning paradigm where multiple learners
More informationSupport Vector Machines.
Support Vector Machines srihari@buffalo.edu SVM Discussion Overview 1. Overview of SVMs 2. Margin Geometry 3. SVM Optimization 4. Overlapping Distributions 5. Relationship to Logistic Regression 6. Dealing
More informationPropagate the Right Thing: How Preferences Can Speed-Up Constraint Solving
Propagate the Right Thing: How Preferences Can Speed-Up Constraint Solving Christian Bessiere Anais Fabre* LIRMM-CNRS (UMR 5506) 161, rue Ada F-34392 Montpellier Cedex 5 (bessiere,fabre}@lirmm.fr Ulrich
More informationEvaluating Classifiers
Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with
More information1. Lecture notes on bipartite matching
Massachusetts Institute of Technology 18.453: Combinatorial Optimization Michel X. Goemans February 5, 2017 1. Lecture notes on bipartite matching Matching problems are among the fundamental problems in
More informationCalibrating Random Forests
Calibrating Random Forests Henrik Boström Informatics Research Centre University of Skövde 541 28 Skövde, Sweden henrik.bostrom@his.se Abstract When using the output of classifiers to calculate the expected
More informationUnsupervised Learning
Unsupervised Learning Learning without Class Labels (or correct outputs) Density Estimation Learn P(X) given training data for X Clustering Partition data into clusters Dimensionality Reduction Discover
More informationRobust PDF Table Locator
Robust PDF Table Locator December 17, 2016 1 Introduction Data scientists rely on an abundance of tabular data stored in easy-to-machine-read formats like.csv files. Unfortunately, most government records
More informationMulti-label classification using rule-based classifier systems
Multi-label classification using rule-based classifier systems Shabnam Nazmi (PhD candidate) Department of electrical and computer engineering North Carolina A&T state university Advisor: Dr. A. Homaifar
More information6.856 Randomized Algorithms
6.856 Randomized Algorithms David Karger Handout #4, September 21, 2002 Homework 1 Solutions Problem 1 MR 1.8. (a) The min-cut algorithm given in class works because at each step it is very unlikely (probability
More informationKernel Methods & Support Vector Machines
& Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector
More informationCMPSCI 646, Information Retrieval (Fall 2003)
CMPSCI 646, Information Retrieval (Fall 2003) Midterm exam solutions Problem CO (compression) 1. The problem of text classification can be described as follows. Given a set of classes, C = {C i }, where
More informationPresentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, Dynamic Programming
Presentation for use with the textbook, Algorithm Design and Applications, by M. T. Goodrich and R. Tamassia, Wiley, 25 Dynamic Programming Terrible Fibonacci Computation Fibonacci sequence: f = f(n) 2
More informationVariables and Data Representation
You will recall that a computer program is a set of instructions that tell a computer how to transform a given set of input into a specific output. Any program, procedural, event driven or object oriented
More informationSolutions to Assignment# 4
Solutions to Assignment# 4 Liana Yepremyan 1 Nov.12: Text p. 651 problem 1 Solution: (a) One example is the following. Consider the instance K = 2 and W = {1, 2, 1, 2}. The greedy algorithm would load
More informationLinear combinations of simple classifiers for the PASCAL challenge
Linear combinations of simple classifiers for the PASCAL challenge Nik A. Melchior and David Lee 16 721 Advanced Perception The Robotics Institute Carnegie Mellon University Email: melchior@cmu.edu, dlee1@andrew.cmu.edu
More informationCSE 258. Web Mining and Recommender Systems. Advanced Recommender Systems
CSE 258 Web Mining and Recommender Systems Advanced Recommender Systems This week Methodological papers Bayesian Personalized Ranking Factorizing Personalized Markov Chains Personalized Ranking Metric
More informationCPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016
CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Admin Course add/drop deadline tomorrow. Assignment 1 is due Friday. Setup your CS undergrad account ASAP to use Handin: https://www.cs.ubc.ca/getacct
More information1 Non greedy algorithms (which we should have covered
1 Non greedy algorithms (which we should have covered earlier) 1.1 Floyd Warshall algorithm This algorithm solves the all-pairs shortest paths problem, which is a problem where we want to find the shortest
More informationUse of Synthetic Data in Testing Administrative Records Systems
Use of Synthetic Data in Testing Administrative Records Systems K. Bradley Paxton and Thomas Hager ADI, LLC 200 Canal View Boulevard, Rochester, NY 14623 brad.paxton@adillc.net, tom.hager@adillc.net Executive
More informationNaïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others
Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict
More informationComputational problems. Lecture 2: Combinatorial search and optimisation problems. Computational problems. Examples. Example
Lecture 2: Combinatorial search and optimisation problems Different types of computational problems Examples of computational problems Relationships between problems Computational properties of different
More informationHow Learning Differs from Optimization. Sargur N. Srihari
How Learning Differs from Optimization Sargur N. srihari@cedar.buffalo.edu 1 Topics in Optimization Optimization for Training Deep Models: Overview How learning differs from optimization Risk, empirical
More informationAlgorithms for Learning and Teaching. Sets of Vertices in Graphs. Patricia A. Evans and Michael R. Fellows. University of Victoria
Algorithms for Learning and Teaching Sets of Vertices in Graphs Patricia A. Evans and Michael R. Fellows Department of Computer Science University of Victoria Victoria, B.C. V8W 3P6, Canada Lane H. Clark
More informationNP-Complete Problems
1 / 34 NP-Complete Problems CS 584: Algorithm Design and Analysis Daniel Leblanc 1 1 Senior Adjunct Instructor Portland State University Maseeh College of Engineering and Computer Science Winter 2018 2
More information15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018
15-451/651: Design & Analysis of Algorithms October 11, 2018 Lecture #13: Linear Programming I last changed: October 9, 2018 In this lecture, we describe a very general problem called linear programming
More informationLearning Dense Models of Query Similarity from User Click Logs
Learning Dense Models of Query Similarity from User Click Logs Fabio De Bona, Stefan Riezler*, Keith Hall, Massi Ciaramita, Amac Herdagdelen, Maria Holmqvist Google Research, Zürich *Dept. of Computational
More informationnode2vec: Scalable Feature Learning for Networks
node2vec: Scalable Feature Learning for Networks A paper by Aditya Grover and Jure Leskovec, presented at Knowledge Discovery and Data Mining 16. 11/27/2018 Presented by: Dharvi Verma CS 848: Graph Database
More informationA Comparative Study of Locality Preserving Projection and Principle Component Analysis on Classification Performance Using Logistic Regression
Journal of Data Analysis and Information Processing, 2016, 4, 55-63 Published Online May 2016 in SciRes. http://www.scirp.org/journal/jdaip http://dx.doi.org/10.4236/jdaip.2016.42005 A Comparative Study
More information