Applying Machine Learning to Real Problems: Why is it Difficult? How Can Research Help?
Olivier Bousquet, Google, Zürich, June 4th, 2007

Outline
1. Introduction
2. Features
3. Minimax Revisited
4. Time Matters

Goal of this talk
- Entertain you (Gilles' request)
- Share experience; propose questions, not results
- Get feedback

Why this talk?
- The good thing about machine learning: many real-world problems can benefit from it almost directly
- So it should be easy to have a positive impact using ML algorithms
- Unfortunately, there are many obstacles when trying to do so
- So, what can be done?

Process Optimization
- Collect data on an industrial process
- Goal is to tune this process (reduce the scrap)
- Easy to put in sensors (collect a lot of possibly irrelevant data)
- Hard to make controlled tests (few examples, or poor exploration of the design space)
- Many practical constraints
- Requires a decision system (not just prediction)

Spam Filtering
- Consider incoming emails and classify them as spam or non-spam
- Not necessarily an absolute notion
- Large collection of instances (e.g. Gmail: tens of millions of users)
- Huge feature space (e.g. all possible n-grams)
- Training and testing time need to be small
- Should handle special cases as well as general ones

Lessons Learned
Real-world problems are often:
- of extreme scale (too few instances, or too many; high dimensional)
- structured (data does not come as vectors of numbers)
- complex (the ML component is only a tiny part of the system)
- ill-defined (the success criterion is not necessarily accuracy)
- mission-critical (require trust, validation, human intervention)
- buggy (data sources are often corrupted)

Understandability is Crucial
Introducing a system that can make decisions in an organization requires the system to be:
- understandable: the relationship between training data and the model/predictions should be clear
- readable/interpretable: the model should be readable and easy to interpret
- diagnosable: if something is wrong, e.g. in the data, it should be visible
- modifiable: it should be possible to modify the system (locally or globally) in a predictable way
- traceable: the decisions (e.g. for special cases) should be explainable
- predictable: the evolution over time should be explainable

What Does Matter?
- Experts should focus on their expertise and speak their own language
- No hidden assumptions, no meaningless parameters
- Take resource constraints into account
- Accuracy is good, understandability is better: understanding the behaviour of a system is more useful than being able to predict it

So, what would be helpful?
- Flexible ways to incorporate knowledge/expertise
  - Provide tools that allow prior knowledge to be formulated in a natural way
  - Look for other types of prior assumptions that occur in various problems (e.g. manifold structure, clusteredness, analogy, ...)
- Ability to understand what is found by the algorithm (need a language to interact with experts)
  - Investigate how to improve understandability (simpler models; separate models and language for interaction; ...)
  - Improve interaction (understand the user's intent)
- Computationally efficient algorithms
  - Scalability, anytime behaviour
  - Incorporate time complexity in the theoretical analysis (trade complexity for accuracy)

2. Features

Data and Features Matter More than Algorithms
- Statement: the time spent cleaning the data and engineering features may yield much larger improvements than the time spent fine-tuning the algorithm
- Example: spam filtering using the content of the message (humans are very good at it, but it would take a lot of data to learn this from scratch), versus using the fact that the sender IP is bad (this would filter 90% of the spam with no learning at all, just a lookup table)
- Issue: the choice and construction of features is an engineering problem and requires expertise

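The IP-reputation point above can be sketched in a few lines. The blocklist and the sample messages below are made-up illustration data, not anything from the talk:

```python
# A plain lookup table on one strong feature (sender IP reputation)
# can filter most spam with no learning at all.

BAD_IPS = {"203.0.113.7", "198.51.100.42"}  # hypothetical known-bad senders

def filter_by_ip(sender_ip: str) -> bool:
    """Return True if the message should be treated as spam."""
    return sender_ip in BAD_IPS

messages = [
    {"ip": "203.0.113.7", "text": "free printr cartriges!!!"},
    {"ip": "192.0.2.1",   "text": "meeting moved to 3pm"},
]
labels = [filter_by_ip(m["ip"]) for m in messages]
print(labels)  # [True, False]
```

No model, no training set: the engineering effort went entirely into finding (and maintaining) the right feature.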
Key question
- Instead of suggesting features, we can at least provide a way to assess their quality
- Given a feature X and a response Y, the question is: how are the two quantities X and Y related?
- Ideally: causality (an active research area); otherwise: correlation

Possible Approach
How can we quantify the relationship between two variables X and Y from a sample?
- First idea: estimate a quantity like the mutual information I(X; Y). However, this does not take the structure into account (two close X values corresponding to two close Y values).
- Second idea: consider inf_f E[l(f(X), Y)]. However, this works only if the class of functions is restricted.
- Third idea: choose a class F and consider inf_{f in F} E[l(f(X), Y)]. However, this does not take the sample size into account.
- Fourth idea: choose an algorithm producing f_n and consider E_n[E[l(f_n(X), Y)]]. Of course this highly depends on the algorithm, but this is exactly where we can specify prior assumptions.

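As a toy illustration of the fourth idea (scoring a feature by the held-out risk of a fixed algorithm), here is a sketch using 1-nearest-neighbour regression with squared loss. The choice of algorithm, the normalization by Var(Y), and the synthetic data are my own illustrative assumptions, not the speaker's:

```python
import numpy as np

rng = np.random.default_rng(0)

def dependence_score(x, y, train_frac=0.7):
    """Score how predictable Y is from X via the test risk of a fixed
    algorithm (1-nearest-neighbour regression, squared loss).
    Returns 1 - MSE/Var(Y): near 1 means a strong relationship,
    near 0 (or below) means none was detected."""
    idx = rng.permutation(len(x))
    cut = int(train_frac * len(x))
    tr, te = idx[:cut], idx[cut:]
    # predict each test point with the label of its closest training point
    nearest = np.abs(x[te][:, None] - x[tr][None, :]).argmin(axis=1)
    mse = np.mean((y[tr][nearest] - y[te]) ** 2)
    return 1.0 - mse / np.var(y[te])

x = rng.uniform(0, 1, 500)
y_related = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=500)
y_unrelated = rng.normal(size=500)

print(round(dependence_score(x, y_related), 2))    # high, close to 1
print(round(dependence_score(x, y_unrelated), 2))  # low, around 0 or below
```

Swapping in a different algorithm changes the score: that dependence on the algorithm is exactly where the prior assumption enters.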
Why look at subsets of features?
- Simplicity: quickly identify simple relationships
- Interpretability: combinations of few features are easier to interpret
- Exploration, correction of obvious mistakes, visualization
- Understanding correlations and, further, causality

3. Minimax Revisited

Priors
Algorithm design is composed of two steps:
- Choosing a preference. This first step is based on knowledge of the problem; this is where guidance (but no theory) is needed.
- Exploiting it for inference. This second step can possibly be formalized (optimality with respect to the assumptions). The main issue is computational cost.

Requirements
- Facing a problem, an expert should focus on his area of expertise
- Parameters should make sense or be self-tuned
- Choosing an algorithm should not be the expert's task (or the assumptions encoded into the algorithms should be made clear)

Key question
- We choose some objective measure of success and have some prior preference or expectation (e.g. we consider it preferable to use linear functions)
- Given this objective, which algorithm gets closest to it in all circumstances?
- Theory cannot tell us what the right assumption is, but it can tell us how to best exploit the assumptions

The Bayesian Way
- Assume something about how the data is generated
- Consider an algorithm specifically tuned to this property
- Prove that under this assumption the algorithm does well
- Most results go in this direction (sometimes in a subtle way): Bayesian algorithms, and most minimax results, which are of the form

    inf_{g_n} sup_{P in P} ( L(g_n) - inf_g L(g) )

- This seems reasonable and is useful for understanding, but it does not provide guarantees

The Worst Case Way
- Assume nothing about the data (distribution-free)
- Restrict your objectives
- Derive an algorithm that reaches this objective no matter how the data is generated:

    inf_{g_n} sup_P ( L(g_n) - inf_{g in G} L(g) )

- This gives guarantees

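For a finite class G, the kind of distribution-free guarantee meant here can be computed explicitly from Hoeffding's inequality plus a union bound. This is standard textbook material, not a formula from the slides:

```python
import math

def uniform_deviation_bound(class_size, n, delta=0.05):
    """With probability at least 1 - delta over n i.i.d. examples,
    simultaneously for every g in a finite class G of `class_size`
    classifiers, |L(g) - L_hat(g)| <= this value, whatever the
    distribution P (Hoeffding's inequality + union bound over G)."""
    return math.sqrt(math.log(2 * class_size / delta) / (2 * n))

# e.g. 1000 candidate classifiers, 10 000 examples, 95% confidence
print(round(uniform_deviation_bound(1000, 10_000), 3))  # 0.023
```

The bound holds no matter how the data is generated; the price is that the objective is restricted to competing with the best member of G.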
Going Further
- A prior preference can be more than a simple restriction
- Given a preference, what is the algorithm that does the best job?
- p(k) specifies how much regret one accepts when G_k turns out to be the best class:

    inf_{g_n} sup_P max_k ( L(g_n) - inf_{g in G_k} L(g) - p(k) )

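One minimal way to act on such a preference p(k) over nested classes is penalized empirical risk minimization. In the sketch below the classes G_k are polynomials of degree k; the data, the penalty shape, and its scale 0.05*sqrt(k+1) are illustrative assumptions, not choices from the talk:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-1, 1, n)
y = 1 + 2 * x - 1.5 * x**2 + rng.normal(0, 0.2, n)  # true degree is 2

def penalized_risk(k, pen_scale=0.05):
    """Empirical risk of the best degree-k polynomial plus the
    preference penalty p(k) = pen_scale * sqrt(k + 1)."""
    coefs = np.polyfit(x, y, k)
    mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    return mse + pen_scale * np.sqrt(k + 1)

best_k = min(range(6), key=penalized_risk)
print(best_k)  # 2, the true quadratic degree
```

Larger degrees fit the sample slightly better, but the penalty encodes the preference for simpler classes and the selection lands on the smallest class that explains the data.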
4. Time Matters

What are the constraints?
Real-world problems are resource-constrained (computation, memory, and data). Approaches to modelling this:
- Asymptotic results: no constraints
- PAC learning: polynomial time constraint (in n and 1/ε, 1/δ) to reach accuracy ε with probability 1 − δ, assuming the Bayes classifier is in the class
- Convergence rates: no computational constraint; best accuracy with a constrained sample size (data)
- Can we go further?

Key Question
For practical purposes, we need to answer the following question: given these computational resources, which algorithm gets closest to the objective under all circumstances?
- The question is not what accuracy you can reach with a given number of examples
- But rather what accuracy you can reach with a given set of resources (computation time/memory)

Possible Approach
- Decompose the error into three terms: approximation, estimation, optimization
- Assume an infinite supply of examples, but limited computation time
- An algorithm may choose to request more data or to further process the data it already has
- The goal is to be close to the Bayes classifier, not to the empirical minimizer

Formalization
With f* the Bayes classifier, f*_F the best function in class F, f_n the empirical minimizer, and f~_n the approximate solution found with optimization tolerance ρ:

    E(f~_n) - E(f*) = ( E(f*_F) - E(f*) )   (approximation)
                    + ( E(f_n) - E(f*_F) )  (estimation)
                    + ( E(f~_n) - E(f_n) )  (optimization)

    min_{F, ρ, n}  E_app + E_est + E_opt   subject to   T(F, ρ, n) <= T_max

Application
- Batch gradient: iteration cost N·d; iterations to reach optimization error ρ: O(log(1/ρ)); estimation error d/n; hence T = O( (d²/ε) log(1/ε) )
- Stochastic gradient: iteration cost d; iterations to reach optimization error ρ: O(1/ρ); hence T = O( d/ε² )
- Refinements depend on the loss function, noise conditions, ...

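The tradeoff can be seen empirically: under a fixed budget of single-example gradient evaluations, stochastic gradient typically ends much closer to the noise floor than batch gradient, even though each of its steps is far cruder. The least-squares instance and the step sizes below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

budget = 20 * n  # total single-example gradient evaluations allowed

def batch_gd(lr=0.1):
    w = np.zeros(d)
    for _ in range(budget // n):      # one full pass costs n evaluations
        w -= lr * X.T @ (X @ w - y) / n
    return w

def sgd(lr=0.02):
    w = np.zeros(d)
    for t in range(budget):           # one step costs 1 evaluation
        i = t % n
        w -= lr / (1 + 1e-4 * t) * (X[i] @ w - y[i]) * X[i]
    return w

mse = lambda w: float(np.mean((X @ w - y) ** 2))
print("batch:", round(mse(batch_gd()), 3))
print("sgd:  ", round(mse(sgd()), 3))  # much closer to the 0.01 noise floor
```

With the same budget, batch gradient affords only 20 full passes, while stochastic gradient takes 100 000 cheap steps: trading optimization precision per step for more steps pays off, which is the point of the complexity comparison above.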
Take Home Messages
- Make assumptions explicit when estimating the relationship between features and response
- Assumptions should concern how we evaluate, not how reality is
- Once the measure is clear: what is the best algorithm (independently of how reality is)?
- Furthermore: what is the best algorithm given resource constraints?
More information1. Lecture notes on bipartite matching February 4th,
1. Lecture notes on bipartite matching February 4th, 2015 6 1.1.1 Hall s Theorem Hall s theorem gives a necessary and sufficient condition for a bipartite graph to have a matching which saturates (or matches)
More informationA Brief Look at Optimization
A Brief Look at Optimization CSC 412/2506 Tutorial David Madras January 18, 2018 Slides adapted from last year s version Overview Introduction Classes of optimization problems Linear programming Steepest
More informationDetecting Network Intrusions
Detecting Network Intrusions Naveen Krishnamurthi, Kevin Miller Stanford University, Computer Science {naveenk1, kmiller4}@stanford.edu Abstract The purpose of this project is to create a predictive model
More informationMissing Data. Where did it go?
Missing Data Where did it go? 1 Learning Objectives High-level discussion of some techniques Identify type of missingness Single vs Multiple Imputation My favourite technique 2 Problem Uh data are missing
More informationDeep Learning. Architecture Design for. Sargur N. Srihari
Architecture Design for Deep Learning Sargur N. srihari@cedar.buffalo.edu 1 Topics Overview 1. Example: Learning XOR 2. Gradient-Based Learning 3. Hidden Units 4. Architecture Design 5. Backpropagation
More informationCSE 573: Artificial Intelligence Autumn 2010
CSE 573: Artificial Intelligence Autumn 2010 Lecture 16: Machine Learning Topics 12/7/2010 Luke Zettlemoyer Most slides over the course adapted from Dan Klein. 1 Announcements Syllabus revised Machine
More informationSimulation Calibration with Correlated Knowledge-Gradients
Simulation Calibration with Correlated Knowledge-Gradients Peter Frazier Warren Powell Hugo Simão Operations Research & Information Engineering, Cornell University Operations Research & Financial Engineering,
More informationChapter 2 Overview of the Design Methodology
Chapter 2 Overview of the Design Methodology This chapter presents an overview of the design methodology which is developed in this thesis, by identifying global abstraction levels at which a distributed
More informationJoint Entity Resolution
Joint Entity Resolution Steven Euijong Whang, Hector Garcia-Molina Computer Science Department, Stanford University 353 Serra Mall, Stanford, CA 94305, USA {swhang, hector}@cs.stanford.edu No Institute
More informationRobust Signal-Structure Reconstruction
Robust Signal-Structure Reconstruction V. Chetty 1, D. Hayden 2, J. Gonçalves 2, and S. Warnick 1 1 Information and Decision Algorithms Laboratories, Brigham Young University 2 Control Group, Department
More informationSparse & Redundant Representations and Their Applications in Signal and Image Processing
Sparse & Redundant Representations and Their Applications in Signal and Image Processing Sparseland: An Estimation Point of View Michael Elad The Computer Science Department The Technion Israel Institute
More informationUsing Subspace Constraints to Improve Feature Tracking Presented by Bryan Poling. Based on work by Bryan Poling, Gilad Lerman, and Arthur Szlam
Presented by Based on work by, Gilad Lerman, and Arthur Szlam What is Tracking? Broad Definition Tracking, or Object tracking, is a general term for following some thing through multiple frames of a video
More informationInteraction Design. Task Analysis & Modelling
Interaction Design Task Analysis & Modelling This Lecture Conducting task analysis Constructing task models Understanding the shortcomings of task analysis Task Analysis for Interaction Design Find out
More informationCSEP 573: Artificial Intelligence
CSEP 573: Artificial Intelligence Machine Learning: Perceptron Ali Farhadi Many slides over the course adapted from Luke Zettlemoyer and Dan Klein. 1 Generative vs. Discriminative Generative classifiers:
More informationAdministrivia. Added 20 more so far. Software Process. Only one TA so far. CS169 Lecture 2. Start thinking about project proposal
Administrivia Software Process CS169 Lecture 2 Added 20 more so far Will limit enrollment to ~65 students Only one TA so far Start thinking about project proposal Bonus points for proposals that will be
More information1 Counting triangles and cliques
ITCSC-INC Winter School 2015 26 January 2014 notes by Andrej Bogdanov Today we will talk about randomness and some of the surprising roles it plays in the theory of computing and in coding theory. Let
More informationEnsemble methods in machine learning. Example. Neural networks. Neural networks
Ensemble methods in machine learning Bootstrap aggregating (bagging) train an ensemble of models based on randomly resampled versions of the training set, then take a majority vote Example What if you
More informationRandom projection for non-gaussian mixture models
Random projection for non-gaussian mixture models Győző Gidófalvi Department of Computer Science and Engineering University of California, San Diego La Jolla, CA 92037 gyozo@cs.ucsd.edu Abstract Recently,
More informationCombine the PA Algorithm with a Proximal Classifier
Combine the Passive and Aggressive Algorithm with a Proximal Classifier Yuh-Jye Lee Joint work with Y.-C. Tseng Dept. of Computer Science & Information Engineering TaiwanTech. Dept. of Statistics@NCKU
More informationDM6 Support Vector Machines
DM6 Support Vector Machines Outline Large margin linear classifier Linear separable Nonlinear separable Creating nonlinear classifiers: kernel trick Discussion on SVM Conclusion SVM: LARGE MARGIN LINEAR
More informationAlgorithms for convex optimization
Algorithms for convex optimization Michal Kočvara Institute of Information Theory and Automation Academy of Sciences of the Czech Republic and Czech Technical University kocvara@utia.cas.cz http://www.utia.cas.cz/kocvara
More informationFMA901F: Machine Learning Lecture 3: Linear Models for Regression. Cristian Sminchisescu
FMA901F: Machine Learning Lecture 3: Linear Models for Regression Cristian Sminchisescu Machine Learning: Frequentist vs. Bayesian In the frequentist setting, we seek a fixed parameter (vector), with value(s)
More informationLecture 17. Lower bound for variable-length source codes with error. Coding a sequence of symbols: Rates and scheme (Arithmetic code)
Lecture 17 Agenda for the lecture Lower bound for variable-length source codes with error Coding a sequence of symbols: Rates and scheme (Arithmetic code) Introduction to universal codes 17.1 variable-length
More information/ Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang
600.469 / 600.669 Approximation Algorithms Lecturer: Michael Dinitz Topic: Linear Programming Date: 2/24/15 Scribe: Runze Tang 9.1 Linear Programming Suppose we are trying to approximate a minimization
More informationToday. Gradient descent for minimization of functions of real variables. Multi-dimensional scaling. Self-organizing maps
Today Gradient descent for minimization of functions of real variables. Multi-dimensional scaling Self-organizing maps Gradient Descent Derivatives Consider function f(x) : R R. The derivative w.r.t. x
More informationPrinciples of AI Planning. Principles of AI Planning. 7.1 How to obtain a heuristic. 7.2 Relaxed planning tasks. 7.1 How to obtain a heuristic
Principles of AI Planning June 8th, 2010 7. Planning as search: relaxed planning tasks Principles of AI Planning 7. Planning as search: relaxed planning tasks Malte Helmert and Bernhard Nebel 7.1 How to
More informationRobust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson
Robust Regression Robust Data Mining Techniques By Boonyakorn Jantaranuson Outline Introduction OLS and important terminology Least Median of Squares (LMedS) M-estimator Penalized least squares What is
More informationCPSC 340: Machine Learning and Data Mining. Probabilistic Classification Fall 2017
CPSC 340: Machine Learning and Data Mining Probabilistic Classification Fall 2017 Admin Assignment 0 is due tonight: you should be almost done. 1 late day to hand it in Monday, 2 late days for Wednesday.
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 22, 2016 Course Information Website: http://www.stat.ucdavis.edu/~chohsieh/teaching/ ECS289G_Fall2016/main.html My office: Mathematical Sciences
More informationIntroduction to Algorithms / Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14
600.363 Introduction to Algorithms / 600.463 Algorithms I Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/18/14 23.1 Introduction We spent last week proving that for certain problems,
More informationRepresentation Learning for Clustering: A Statistical Framework
Representation Learning for Clustering: A Statistical Framework Hassan Ashtiani School of Computer Science University of Waterloo mhzokaei@uwaterloo.ca Shai Ben-David School of Computer Science University
More informationThe Comparative Study of Machine Learning Algorithms in Text Data Classification*
The Comparative Study of Machine Learning Algorithms in Text Data Classification* Wang Xin School of Science, Beijing Information Science and Technology University Beijing, China Abstract Classification
More informationSafety verification for deep neural networks
Safety verification for deep neural networks Marta Kwiatkowska Department of Computer Science, University of Oxford UC Berkeley, 8 th November 2016 Setting the scene Deep neural networks have achieved
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More informationCPSC 340: Machine Learning and Data Mining. More Regularization Fall 2017
CPSC 340: Machine Learning and Data Mining More Regularization Fall 2017 Assignment 3: Admin Out soon, due Friday of next week. Midterm: You can view your exam during instructor office hours or after class
More informationDiscrete Optimization. Lecture Notes 2
Discrete Optimization. Lecture Notes 2 Disjunctive Constraints Defining variables and formulating linear constraints can be straightforward or more sophisticated, depending on the problem structure. The
More informationBoosting Simple Model Selection Cross Validation Regularization. October 3 rd, 2007 Carlos Guestrin [Schapire, 1989]
Boosting Simple Model Selection Cross Validation Regularization Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University October 3 rd, 2007 1 Boosting [Schapire, 1989] Idea: given a weak
More informationAnnouncements. CS 188: Artificial Intelligence Spring Generative vs. Discriminative. Classification: Feature Vectors. Project 4: due Friday.
CS 188: Artificial Intelligence Spring 2011 Lecture 21: Perceptrons 4/13/2010 Announcements Project 4: due Friday. Final Contest: up and running! Project 5 out! Pieter Abbeel UC Berkeley Many slides adapted
More informationMetaheuristic Development Methodology. Fall 2009 Instructor: Dr. Masoud Yaghini
Metaheuristic Development Methodology Fall 2009 Instructor: Dr. Masoud Yaghini Phases and Steps Phases and Steps Phase 1: Understanding Problem Step 1: State the Problem Step 2: Review of Existing Solution
More informationCPSC 340: Machine Learning and Data Mining. Feature Selection Fall 2016
CPSC 34: Machine Learning and Data Mining Feature Selection Fall 26 Assignment 3: Admin Solutions will be posted after class Wednesday. Extra office hours Thursday: :3-2 and 4:3-6 in X836. Midterm Friday:
More informationECE521 Lecture 18 Graphical Models Hidden Markov Models
ECE521 Lecture 18 Graphical Models Hidden Markov Models Outline Graphical models Conditional independence Conditional independence after marginalization Sequence models hidden Markov models 2 Graphical
More information1 Achieving IND-CPA security
ISA 562: Information Security, Theory and Practice Lecture 2 1 Achieving IND-CPA security 1.1 Pseudorandom numbers, and stateful encryption As we saw last time, the OTP is perfectly secure, but it forces
More informationChapter S:II. II. Search Space Representation
Chapter S:II II. Search Space Representation Systematic Search Encoding of Problems State-Space Representation Problem-Reduction Representation Choosing a Representation S:II-1 Search Space Representation
More informationLimitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and
Computer Language Theory Chapter 4: Decidability 1 Limitations of Algorithmic Solvability In this Chapter we investigate the power of algorithms to solve problems Some can be solved algorithmically and
More informationKernels and Clustering
Kernels and Clustering Robert Platt Northeastern University All slides in this file are adapted from CS188 UC Berkeley Case-Based Learning Non-Separable Data Case-Based Reasoning Classification from similarity
More informationThe Bizarre Truth! Automating the Automation. Complicated & Confusing taxonomy of Model Based Testing approach A CONFORMIQ WHITEPAPER
The Bizarre Truth! Complicated & Confusing taxonomy of Model Based Testing approach A CONFORMIQ WHITEPAPER By Kimmo Nupponen 1 TABLE OF CONTENTS 1. The context Introduction 2. The approach Know the difference
More informationClassification: Linear Discriminant Functions
Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Discriminant functions Linear Discriminant functions
More informationData Mining and Data Warehousing Classification-Lazy Learners
Motivation Data Mining and Data Warehousing Classification-Lazy Learners Lazy Learners are the most intuitive type of learners and are used in many practical scenarios. The reason of their popularity is
More informationWeighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD. Abstract
Weighted Alternating Least Squares (WALS) for Movie Recommendations) Drew Hodun SCPD Abstract There are two common main approaches to ML recommender systems, feedback-based systems and content-based systems.
More informationThe Encoding Complexity of Network Coding
The Encoding Complexity of Network Coding Michael Langberg Alexander Sprintson Jehoshua Bruck California Institute of Technology Email: mikel,spalex,bruck @caltech.edu Abstract In the multicast network
More informationStructured Models in. Dan Huttenlocher. June 2010
Structured Models in Computer Vision i Dan Huttenlocher June 2010 Structured Models Problems where output variables are mutually dependent or constrained E.g., spatial or temporal relations Such dependencies
More informationEmpirical risk minimization (ERM) A first model of learning. The excess risk. Getting a uniform guarantee
A first model of learning Let s restrict our attention to binary classification our labels belong to (or ) Empirical risk minimization (ERM) Recall the definitions of risk/empirical risk We observe the
More informationChapter 10. Conclusion Discussion
Chapter 10 Conclusion 10.1 Discussion Question 1: Usually a dynamic system has delays and feedback. Can OMEGA handle systems with infinite delays, and with elastic delays? OMEGA handles those systems with
More informationEfficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 13, NO. 5, SEPTEMBER 2002 1225 Efficient Tuning of SVM Hyperparameters Using Radius/Margin Bound and Iterative Algorithms S. Sathiya Keerthi Abstract This paper
More informationParallel Build Visualization Diagnosing and Troubleshooting Common Pitfalls of Parallel Builds
Parallel Build Visualization Diagnosing and Troubleshooting Common Pitfalls of Parallel Builds John Graham-Cumming Chief Scientist Electric Cloud, Inc. February, 2006 Contents Parallel Build Visualization...
More informationReddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011
Reddit Recommendation System Daniel Poon, Yu Wu, David (Qifan) Zhang CS229, Stanford University December 11 th, 2011 1. Introduction Reddit is one of the most popular online social news websites with millions
More informationHigh Performance Computing
The Need for Parallelism High Performance Computing David McCaughan, HPC Analyst SHARCNET, University of Guelph dbm@sharcnet.ca Scientific investigation traditionally takes two forms theoretical empirical
More informationSystem Configuration. Paul Anderson. publications/oslo-2008a-talk.pdf I V E R S I U N T Y T H
E U N I V E R S I System Configuration T H O T Y H F G Paul Anderson E D I N B U R dcspaul@ed.ac.uk http://homepages.inf.ed.ac.uk/dcspaul/ publications/oslo-2008a-talk.pdf System Configuration What is
More informationlow bias high variance high bias low variance error test set training set high low Model Complexity Typical Behaviour Lecture 11:
Lecture 11: Overfitting and Capacity Control high bias low variance Typical Behaviour low bias high variance Sam Roweis error test set training set November 23, 4 low Model Complexity high Generalization,
More informationUsing Arithmetic of Real Numbers to Explore Limits and Continuity
Using Arithmetic of Real Numbers to Explore Limits and Continuity by Maria Terrell Cornell University Problem Let a =.898989... and b =.000000... (a) Find a + b. (b) Use your ideas about how to add a and
More informationConfidence sharing: an economic strategy for efficient information flows in animal groups 1
1 / 35 Confidence sharing: an economic strategy for efficient information flows in animal groups 1 Amos Korman 2 CNRS and University Paris Diderot 1 Appears in PLoS Computational Biology, Oct. 2014 2 Joint
More informationThe Interaction. Using Norman s model. Donald Norman s model of interaction. Human error - slips and mistakes. Seven stages
The Interaction Interaction models Ergonomics Interaction styles Donald Norman s model of interaction Seven stages execution user establishes the goal formulates intention specifies actions at interface
More informationAnnouncements. CS 188: Artificial Intelligence Spring Classification: Feature Vectors. Classification: Weights. Learning: Binary Perceptron
CS 188: Artificial Intelligence Spring 2010 Lecture 24: Perceptrons and More! 4/20/2010 Announcements W7 due Thursday [that s your last written for the semester!] Project 5 out Thursday Contest running
More information/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18
601.433/633 Introduction to Algorithms Lecturer: Michael Dinitz Topic: Approximation algorithms Date: 11/27/18 22.1 Introduction We spent the last two lectures proving that for certain problems, we can
More informationOO Development and Maintenance Complexity. Daniel M. Berry based on a paper by Eliezer Kantorowitz
OO Development and Maintenance Complexity Daniel M. Berry based on a paper by Eliezer Kantorowitz Traditional Complexity Measures Traditionally, Time Complexity Space Complexity Both use theoretical measures,
More informationTheory and Algorithms Introduction: insertion sort, merge sort
Theory and Algorithms Introduction: insertion sort, merge sort Rafael Ramirez rafael@iua.upf.es Analysis of algorithms The theoretical study of computer-program performance and resource usage. What s also
More informationUser-Centered Design Data Entry
User-Centered Design Data Entry CS 4640 Programming Languages for Web Applications [The Design of Everyday Things, Don Norman, Ch 7] 1 Seven Principles for Making Hard Things Easy 1. Use knowledge in the
More informationA Computational Theory of Clustering
A Computational Theory of Clustering Avrim Blum Carnegie Mellon University Based on work joint with Nina Balcan, Anupam Gupta, and Santosh Vempala Point of this talk A new way to theoretically analyze
More information(Refer Slide Time: 1:27)
Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 1 Introduction to Data Structures and Algorithms Welcome to data
More informationKernel Methods & Support Vector Machines
& Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 5 Inference
More informationBayesian Methods in Vision: MAP Estimation, MRFs, Optimization
Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization CS 650: Computer Vision Bryan S. Morse Optimization Approaches to Vision / Image Processing Recurring theme: Cast vision problem as an optimization
More informationCMPT 882 Week 3 Summary
CMPT 882 Week 3 Summary! Artificial Neural Networks (ANNs) are networks of interconnected simple units that are based on a greatly simplified model of the brain. ANNs are useful learning tools by being
More informationApplication of Support Vector Machine Algorithm in Spam Filtering
Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification
More information14.1 Encoding for different models of computation
Lecture 14 Decidable languages In the previous lecture we discussed some examples of encoding schemes, through which various objects can be represented by strings over a given alphabet. We will begin this
More informationLecture #3: PageRank Algorithm The Mathematics of Google Search
Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,
More informationChapter 9. Software Testing
Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of
More informationNaïve Bayes for text classification
Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support
More informationCS 343: Artificial Intelligence
CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.
More information