Applying Machine Learning to Real Problems: Why is it Difficult? How Research Can Help?

Olivier Bousquet, Google, Zürich, June 4th, 2007

Outline
1 Introduction
2 Features
3 Minimax Revisited
4 Time Matters

Goal of this talk
- Entertain you (Gilles' request)
- Share experience, propose questions rather than results
- Get feedback

Why this talk?
- The good thing about machine learning: many real-world problems can benefit from it almost directly
- So it should be easy to have a positive impact using ML algorithms
- Unfortunately, there are many obstacles when trying to do so
- So, what can be done?

Process Optimization
- Collect data on an industrial process
- Goal is to tune this process (reduce the scrap rate)
- Easy to add sensors (collect a lot of possibly irrelevant data)
- Hard to run controlled tests (few examples, or poor exploration of the design space)
- Many practical constraints
- Requires a decision system (not just prediction)

Spam Filtering
- Consider incoming emails and classify them as spam or non-spam
- Not necessarily an absolute notion
- Large collection of instances (e.g. Gmail, with tens of millions of users)
- Huge feature space (e.g. all possible n-grams)
- Training and testing time need to be small
- Should handle special cases as well as general ones

Lessons Learned
Real-world problems are often:
- of extreme scale (too few instances, or too many; high dimensional)
- structured (data does not come as vectors of numbers)
- complex (the ML component is only a tiny part of the system)
- ill-defined (the success criterion is not necessarily accuracy)
- mission-critical (require trust, validation, human intervention)
- buggy (data sources are often corrupted)

Understandability is Crucial
Introducing a system that makes decisions in an organization requires the system to be:
- understandable: the relationship between the training data and the model/predictions should be clear
- readable/interpretable: the model should be readable and easy to interpret
- diagnosable: if something is wrong, e.g. in the data, it should be visible
- modifiable: it should be possible to modify the system (locally/globally) in a predictable way
- traceable: the decisions (e.g. for special cases) should be explainable
- predictable: the evolution over time should be explainable

What Does Matter?
- Experts should focus on their expertise and speak their own language
- No hidden assumptions, no meaningless parameters
- Take resource constraints into account
- Accuracy is good, understandability is better: understanding the behaviour of a system is more useful than being able to predict it

So, what would be helpful?
- Flexible ways to incorporate knowledge/expertise
  - Provide tools that allow prior knowledge to be formulated in a natural way
  - Look for other types of prior assumptions that occur across problems (e.g. manifold structure, clusteredness, analogy, ...)
- Ability to understand what is found by the algorithm (need a language to interact with experts)
  - Investigate how to improve understandability (simpler models; separate models and language for interaction, ...)
  - Improve interaction (understand the user's intent)
- Computationally efficient algorithms
  - Scalability, anytime algorithms
  - Incorporate time complexity in the theoretical analysis (trade complexity for accuracy)

Outline
1 Introduction
2 Features
3 Minimax Revisited
4 Time Matters

Data and Features Matter More than Algorithms
- Statement: the time spent cleaning the data and engineering features may lead to much larger improvements than the time spent fine-tuning the algorithm
- Example: spam filtering using the content of the message (humans are very good at it, but it would take a lot of data to learn this from scratch) versus using the fact that the sender IP is known to be bad (this would filter 90% of the spam without any learning, just a lookup table; see the sketch below)
- Issue: the choice and construction of features is an engineering problem and requires expertise

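To make the sender-IP example concrete, here is a minimal Python sketch; the blocklist contents, the message representation, and the learned_model interface are illustrative assumptions made for this example, not details from the talk.

```python
# Illustrative sketch only: the blocklist, message format and model interface
# below are assumptions made for the example, not details from the talk.

BAD_SENDER_IPS = {"203.0.113.7", "198.51.100.23"}   # lookup table, no learning needed

def ip_prefilter(message):
    """A single engineered feature: is the sender IP on a known-bad list?"""
    return message["sender_ip"] in BAD_SENDER_IPS

def content_features(message, vocabulary):
    """Bag-of-n-grams features: powerful, but needs a lot of labeled data."""
    tokens = message["body"].lower().split()
    return {ngram: tokens.count(ngram) for ngram in vocabulary if ngram in tokens}

def classify(message, learned_model, vocabulary):
    # The cheap lookup-table feature handles most of the spam volume by itself.
    if ip_prefilter(message):
        return "spam"
    # Only the remaining messages need the learned content model.
    return learned_model(content_features(message, vocabulary))
```

The point is not this particular split but the relative payoff: the lookup-table feature costs one set-membership test and no training data, while the content model is where all the data and tuning effort goes.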

Key question
- Instead of suggesting features, we can at least provide a way to assess their quality
- Given a feature X and a response Y, the question is: how are these two quantities X and Y related?
- Ideally: causality (an active research area); otherwise: correlation

Possible Approach
How to quantify the relationship between two variables X and Y from a sample?
- First idea: estimate some quantity like the mutual information $I(X; Y)$. However, this does not take the structure into account (two close X values should correspond to two close Y values).
- Second idea: consider $\inf_f \mathbb{E}|f(X) - Y|$. However, this works only if the class of functions is restricted.
- Third idea: choose a class $F$ and consider $\inf_{f \in F} \mathbb{E}|f(X) - Y|$. However, this does not take the sample size into account.
- Fourth idea: choose an algorithm producing $\hat f_n$ and consider $\mathbb{E}_n \mathbb{E}|\hat f_n(X) - Y|$. Of course this depends heavily on the algorithm, but this is precisely where we can specify a prior assumption.

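As one concrete reading of the fourth idea, the sketch below scores a single feature (or a small subset, by passing several columns) through the cross-validated absolute error of a fixed algorithm. The k-nearest-neighbour regressor and the cross-validation protocol are illustrative assumptions of this sketch, not recommendations from the talk; they are simply one way of encoding the prior that close X values should give close Y values.

```python
# A sketch of the fourth idea: quantify the X-Y relationship by the held-out
# error of a chosen algorithm f_n. The k-NN regressor encodes the prior that
# close X values should correspond to close Y values; it is an assumption made
# for this example.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_score

def dependence_score(x, y, n_neighbors=5, cv=5):
    """Cross-validated estimate of E_n E|f_n(X) - Y|; lower means a stronger relationship."""
    x = np.asarray(x)
    if x.ndim == 1:
        x = x.reshape(-1, 1)          # also works for small feature subsets (2-D x)
    errors = -cross_val_score(
        KNeighborsRegressor(n_neighbors=n_neighbors),
        x, np.asarray(y),
        scoring="neg_mean_absolute_error", cv=cv,
    )
    return errors.mean()

# Toy usage: rank candidate features by how well they predict the response.
rng = np.random.default_rng(0)
x_good = rng.normal(size=500)
x_bad = rng.normal(size=500)
y = np.sin(3 * x_good) + 0.1 * rng.normal(size=500)
print(dependence_score(x_good, y))   # small error: informative feature
print(dependence_score(x_bad, y))    # error comparable to the spread of y: uninformative
```

This also connects to the next slide: applying the same score to combinations of a few features keeps the result easy to interpret and visualize.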

Why look at subsets of features?
- Simplicity: quickly identify simple relationships
- Interpretability: combinations of few features are easier to interpret
- Exploration, correction of obvious mistakes, visualization
- Understanding correlations and, further, causality

Outline
1 Introduction
2 Features
3 Minimax Revisited
4 Time Matters

Priors
Algorithm design is composed of two steps:
- Choosing a preference. This first step is based on knowledge of the problem; this is where guidance (but no theory) is needed.
- Exploiting it for inference. This second step can potentially be formalized (optimality with respect to the assumptions). The main issue is computational cost.

Requirements
- Facing a problem, an expert should focus on his area of expertise
- Parameters should make sense or be self-tuned
- Choosing an algorithm should not be the expert's task (or the assumptions encoded into the algorithms should be made clear)

Key question
- We choose some objective measure of success and have some prior preference or expectation (e.g. we consider it preferable to use linear functions)
- Given this objective, which algorithm gets closest to it in all circumstances?
- Theory cannot tell us what the right assumption is
- But it can tell us how to best exploit the assumptions

The Bayesian Way
- Assume something about how the data is generated
- Consider an algorithm specifically tuned to this property
- Prove that under this assumption the algorithm does well
- Most results go in this direction (sometimes in a subtle way): Bayesian algorithms, and most minimax results, which are of the form
$$\inf_{\{g_n\}} \; \sup_{P \in \mathcal{P}} \Big( L(g_n) - \inf_{g} L(g) \Big)$$
- Seems reasonable and useful for understanding, but does not provide guarantees

The Worst Case Way
- Assume nothing about the data (distribution-free)
- Restrict your objectives
- Derive an algorithm that reaches this objective no matter how the data is generated:
$$\inf_{\{g_n\}} \; \sup_{P} \Big( L(g_n) - \inf_{g \in G} L(g) \Big)$$
- Gives guarantees

Going Further
- A prior preference can be more than a simple restriction
- Given a preference, what is the algorithm that does the best job?
- Here $p(k)$ specifies how much regret one accepts when $G_k$ turns out to be the best class:
$$\inf_{\{g_n\}} \; \sup_{P} \; \max_{k} \Big( L(g_n) - \inf_{g \in G_k} L(g) - p(k) \Big)$$
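
One familiar way to read this objective (my own illustrative instance; the slide does not commit to particular classes or penalties) is with nested classes and a complexity-style penalty, in which case driving the minimax quantity to a non-positive value is exactly an oracle inequality:

```latex
% Illustrative instance (an assumption for this example, not from the slide):
% nested classes G_1 \subset G_2 \subset ... and a penalty p(k) growing with k.
\[
  G_1 \subset G_2 \subset \cdots, \qquad p(k) = c\,\sqrt{\frac{\log k}{n}} .
\]
% Requiring the quantity to be at most 0 for the chosen algorithm is the same
% as an oracle inequality that holds for every distribution P:
\[
  \sup_{P}\,\max_{k}\Big( L(g_n) - \inf_{g \in G_k} L(g) - p(k) \Big) \le 0
  \quad\Longleftrightarrow\quad
  L(g_n) \le \inf_{k}\Big( \inf_{g \in G_k} L(g) + p(k) \Big) \ \text{for all } P .
\]
```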

Outline
1 Introduction
2 Features
3 Minimax Revisited
4 Time Matters

What are the constraints?
Real-world problems are resource-constrained (computation, memory, and data). Approaches to modelling this:
- Asymptotic results: no constraints
- PAC learning: polynomial time constraint (in 1/ε and 1/δ) to reach accuracy ε with probability 1 − δ, assuming the Bayes classifier is in the class (written out as a formula below)
- Convergence rates: no computational constraint, best accuracy with a constrained sample size (data)
Can we go further?

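Stated as a formula, the PAC requirement in the list above reads roughly as follows (my phrasing of the standard criterion; the slide itself only names it):

```latex
% PAC-style requirement (standard phrasing, assumed here for illustration):
% return g_n whose excess risk is at most eps with probability at least 1 - delta,
% using running time (and sample size) polynomial in 1/eps, 1/delta and the dimension d.
\[
  \Pr\Big( L(g_n) - \inf_{g \in G} L(g) \le \varepsilon \Big) \ge 1 - \delta ,
  \qquad
  \text{time} = \mathrm{poly}\!\left(\frac{1}{\varepsilon}, \frac{1}{\delta}, d\right).
\]
```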

Key Question
For practical purposes, we need to answer the following question: given these computational resources, which algorithm gets closest to the objective in all circumstances?
- The question is not what accuracy you can reach with a given number of examples
- But rather what accuracy you can reach with a given set of resources (computation time/memory)

Possible Approach
- Decompose the error into three terms: approximation, estimation, optimization
- Assume an infinite supply of examples, but limited computation time
- An algorithm may choose to request more data or to keep processing the data it already has
- The goal is to be close to the Bayes classifier, not to the empirical minimizer

Formalization
$$E(\tilde f_n) - E(f^*) = \big(E(f^*_F) - E(f^*)\big)\ \text{(Approximation)} \;+\; \big(E(f_n) - E(f^*_F)\big)\ \text{(Estimation)} \;+\; \big(E(\tilde f_n) - E(f_n)\big)\ \text{(Optimization)}$$
where $f^*$ is the Bayes classifier, $f^*_F$ the best function in the class $F$, $f_n$ the empirical risk minimizer over $F$, and $\tilde f_n$ the approximate minimizer obtained by optimizing to tolerance $\rho$.
$$\min_{F,\,\rho,\,n} \; E_{\text{app}} + E_{\text{est}} + E_{\text{opt}} \quad \text{subject to} \quad T(F, \rho, n) \le T_{\max}$$

Application
- Batch gradient: iteration cost $n d$; number of iterations to reach optimization error $\rho$: $O(\log \frac{1}{\rho})$; estimation error $d/n$; hence $T = O\big(\frac{d}{\varepsilon^2} \log \frac{1}{\varepsilon}\big)$
- Stochastic gradient: iteration cost $d$; number of iterations to reach optimization error $\rho$: $O(1/\rho)$; hence $T = O\big(\frac{d}{\varepsilon^2}\big)$
- Refinements depend on the loss function, noise conditions, ... (a small numerical sketch follows below)
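
To see the trade-off numerically, here is a small numpy sketch that gives batch gradient descent and stochastic gradient descent the same budget of per-example gradient evaluations on a synthetic least-squares problem. The problem, the budget, and the step sizes are illustrative choices of mine, not constants from the talk; which method wins depends on precisely such constants, which is what the formalization above is meant to capture.

```python
# Batch vs stochastic gradient under a fixed compute budget (counted in
# per-example gradient evaluations). Synthetic problem and step sizes are
# illustrative assumptions, not taken from the talk.
import numpy as np

rng = np.random.default_rng(0)
d = 20
w_true = rng.normal(size=d)

def sample(n):
    X = rng.normal(size=(n, d))
    return X, X @ w_true + 0.5 * rng.normal(size=n)

X_test, y_test = sample(10_000)
def test_err(w):
    return np.mean((X_test @ w - y_test) ** 2)

budget = 200_000                     # total per-example gradient evaluations

# Batch gradient: each pass costs n_batch evaluations, so a larger training
# set (smaller estimation error) leaves fewer passes (larger optimization error).
n_batch = 2_000
X, y = sample(n_batch)
w_gd = np.zeros(d)
for _ in range(budget // n_batch):   # 100 full passes over 2 000 examples
    w_gd -= 0.1 * X.T @ (X @ w_gd - y) / n_batch

# Stochastic gradient: one evaluation per step, so the same budget processes
# 200 000 fresh examples, trading optimization accuracy for estimation accuracy.
X, y = sample(budget)
w_sgd, w_avg = np.zeros(d), np.zeros(d)
for t in range(budget):
    g = (X[t] @ w_sgd - y[t]) * X[t]
    w_sgd -= 0.02 / (1 + t / 20_000) * g
    if t >= budget // 2:             # tail-average the iterates
        w_avg += w_sgd / (budget - budget // 2)

print("batch gradient test error:", test_err(w_gd))
print("averaged SGD test error:  ", test_err(w_avg))
```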

Take Home Messages
- Make assumptions explicit when estimating the relationship between features and response
- Assumptions should concern how we evaluate, not how reality is
- Once the measure is clear: what is the best algorithm (independently of how reality is)?
- Furthermore: what is the best algorithm given resource constraints?


More information

A Computational Theory of Clustering

A Computational Theory of Clustering A Computational Theory of Clustering Avrim Blum Carnegie Mellon University Based on work joint with Nina Balcan, Anupam Gupta, and Santosh Vempala Point of this talk A new way to theoretically analyze

More information

(Refer Slide Time: 1:27)

(Refer Slide Time: 1:27) Data Structures and Algorithms Dr. Naveen Garg Department of Computer Science and Engineering Indian Institute of Technology, Delhi Lecture 1 Introduction to Data Structures and Algorithms Welcome to data

More information

Kernel Methods & Support Vector Machines

Kernel Methods & Support Vector Machines & Support Vector Machines & Support Vector Machines Arvind Visvanathan CSCE 970 Pattern Recognition 1 & Support Vector Machines Question? Draw a single line to separate two classes? 2 & Support Vector

More information

STA 4273H: Statistical Machine Learning

STA 4273H: Statistical Machine Learning STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 5 Inference

More information

Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization

Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization Bayesian Methods in Vision: MAP Estimation, MRFs, Optimization CS 650: Computer Vision Bryan S. Morse Optimization Approaches to Vision / Image Processing Recurring theme: Cast vision problem as an optimization

More information

CMPT 882 Week 3 Summary

CMPT 882 Week 3 Summary CMPT 882 Week 3 Summary! Artificial Neural Networks (ANNs) are networks of interconnected simple units that are based on a greatly simplified model of the brain. ANNs are useful learning tools by being

More information

Application of Support Vector Machine Algorithm in Spam Filtering

Application of Support Vector Machine Algorithm in  Spam Filtering Application of Support Vector Machine Algorithm in E-Mail Spam Filtering Julia Bluszcz, Daria Fitisova, Alexander Hamann, Alexey Trifonov, Advisor: Patrick Jähnichen Abstract The problem of spam classification

More information

14.1 Encoding for different models of computation

14.1 Encoding for different models of computation Lecture 14 Decidable languages In the previous lecture we discussed some examples of encoding schemes, through which various objects can be represented by strings over a given alphabet. We will begin this

More information

Lecture #3: PageRank Algorithm The Mathematics of Google Search

Lecture #3: PageRank Algorithm The Mathematics of Google Search Lecture #3: PageRank Algorithm The Mathematics of Google Search We live in a computer era. Internet is part of our everyday lives and information is only a click away. Just open your favorite search engine,

More information

Chapter 9. Software Testing

Chapter 9. Software Testing Chapter 9. Software Testing Table of Contents Objectives... 1 Introduction to software testing... 1 The testers... 2 The developers... 2 An independent testing team... 2 The customer... 2 Principles of

More information

Naïve Bayes for text classification

Naïve Bayes for text classification Road Map Basic concepts Decision tree induction Evaluation of classifiers Rule induction Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Support

More information

CS 343: Artificial Intelligence

CS 343: Artificial Intelligence CS 343: Artificial Intelligence Kernels and Clustering Prof. Scott Niekum The University of Texas at Austin [These slides based on those of Dan Klein and Pieter Abbeel for CS188 Intro to AI at UC Berkeley.

More information