Supervised Learning with Neural Networks. We now look at how an agent might learn to solve a general problem by seeing examples.
1 Supervised Learning with Neural Networks We now look at how an agent might learn to solve a general problem by seeing examples. Aims: to present an outline of supervised learning as part of AI; to introduce much of the notation and terminology used; to introduce the classical perceptron, and to show how it can be applied more generally using kernels; to introduce multilayer perceptrons and the backpropagation algorithm for training them. Reading: Russell and Norvig, chapters 18 and 19. Copyright © Sean Holden.
2 An example A common source of problems in AI is medical diagnosis. Imagine that we want to automate the diagnosis of an embarrassing disease (call it $D$) by constructing a machine that takes measurements from the patient (e.g. heart rate, blood pressure, presence of green spots, etc.) and outputs $1$ if the patient suffers from $D$ and $0$ otherwise. Could we do this by explicitly writing a program that examines the measurements and outputs a diagnosis? Experience suggests that this is unlikely.
3 An example, continued... Let's look at an alternative approach. Each collection of measurements can be written as a vector $\mathbf{x} = (x_1\; x_2\; \cdots\; x_n)$ where, for example, $x_1$ = heart rate, $x_2$ = blood pressure, $x_3 = 1$ if the patient has green spots and $0$ otherwise, and so on.
4 An example, continued... A vector of this kind contains all the measurements for a single patient and is generally called a feature vector or instance. The measurements are usually referred to as attributes or features. (Technically, there is a difference between an attribute and a feature - but we won't have time to explore it.) Attributes or features generally appear as one of three basic types:
- continuous: $x_i \in [a, b]$ where $a, b \in \mathbb{R}$;
- binary: $x_i \in \{0, 1\}$ or $x_i \in \{-1, +1\}$;
- discrete: $x_i$ can take one of a finite number of values, say $x_i \in \{v_1, \ldots, v_p\}$.
5 An example, continued... Now imagine that we have a large collection of patient histories ($m$ in total) and for each of these we know whether or not the patient suffered from $D$. The $i$th patient history gives us an instance $\mathbf{x}_i$. This can be paired with a single bit, $y_i = 1$ or $y_i = 0$, denoting whether or not the $i$th patient suffers from $D$. The resulting pair $(\mathbf{x}_i, y_i)$ is called an example or a labelled example. Collecting all the examples together we obtain a training sequence $\mathbf{s} = ((\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_m, y_m))$.
6 An example, continued... In the form of machine learning to be emphasised here, we aim to design a learning algorithm $L$ which takes $\mathbf{s}$ and produces a hypothesis $h$. Intuitively, a hypothesis is something that lets us diagnose new patients.
7 But what's a hypothesis? Denote by $X$ the set of all possible instances. Then a hypothesis $h$ could simply be a function from $X$ to $\{0, 1\}$. In other words, $h$ can take any instance $\mathbf{x} \in X$ and produce a $0$ or a $1$, depending on whether (according to $h$) the patient the measurements were taken from is suffering from $D$.
8 But what's a hypothesis? There are some important issues here, which are effectively right at the heart of machine learning. Most importantly: the hypothesis $h$ can assign a $0$ or a $1$ to any $\mathbf{x} \in X$. This includes instances $\mathbf{x}$ that did not appear in the training sequence. The overall process therefore involves rather more than memorising the training sequence. Ideally, the aim is that $h$ should be able to generalize: it should be possible to use it to diagnose new patients.
9 But what's a hypothesis? In fact we need a slightly more flexible definition of a hypothesis. A hypothesis is a function from $X$ to some suitable set $Y$ because: there may be more than two classes, for example $Y = \{\text{no disease}, \text{disease } D_1, \text{disease } D_2, \text{disease } D_3\}$; or, we might want $h(\mathbf{x})$ to indicate how likely it is that the patient has disease $D$, with $Y = [0, 1]$, where $1$ denotes that the patient definitely does have the disease and $0$ denotes that they definitely do not have it. A value such as $h(\mathbf{x}) = 0.9$ might for example denote that the patient is reasonably certain to have the disease.
10 But what's a hypothesis? One way of thinking about the previous case is in terms of probabilities: $h(\mathbf{x}) = \Pr(\mathbf{x} \text{ is in class } D)$. We may also have $Y = \mathbb{R}$. For example, $\mathbf{x}$ might contain several recent measurements of a currency exchange rate, and we might want $h(\mathbf{x})$ to be a prediction of what the rate will be some minutes into the future. Such problems are generally known as regression problems.
11 Types of learning The form of machine learning described is called supervised learning. This introduction will concentrate on this kind of learning. The literature also discusses: 1. Unsupervised learning. 2. Learning using membership queries and equivalence queries. 3. Reinforcement learning.
12 Some further examples Speech recognition. Deciding whether or not to give credit. Detecting credit card fraud. Deciding whether to buy or sell a stock option. Deciding whether a tumour is benign. Data mining - that is, extracting interesting but hidden knowledge from existing, large databases, for example databases containing financial transactions or loan applications. Deciding whether driving conditions are dangerous. Automatic driving. (See Pomerleau, 1989, in which a car is driven for 90 miles at 70 miles per hour, on a public road with other cars present, but with no assistance from humans!) Playing games. (For example, see Tesauro, 1992 and 1995, where a world-class backgammon player is described.)
13 What have we got so far? Extracting what we have so far, we get the following central ideas: A collection $X$ of possible instances. A collection $C$ of classes to which any instance can belong; in some scenarios we might instead have $C = \mathbb{R}$. A training sequence $\mathbf{s}$ containing $m$ labelled examples $(\mathbf{x}_i, y_i)$, with $\mathbf{x}_i \in X$ and $y_i \in C$ for $i = 1, \ldots, m$. A learning algorithm $L$, which takes $\mathbf{s}$ and produces a hypothesis $h$. We can write $h = L(\mathbf{s})$.
14 What have we got so far? But we need some more ideas as well: We often need to state what kinds of hypotheses are available to $L$. The collection of hypotheses available to $L$ is called the hypothesis space and denoted $H$. Some learning algorithms do not always return the same $h$ for each run on a given training sequence $\mathbf{s}$. In this case $L(\mathbf{s})$ is a probability distribution on $H$.
15 What's the right answer? We may sometimes assume that there is a correct function that governs the relationship between the $\mathbf{x}_i$s and the labels. This is called the target concept and is denoted $f$. It can be regarded as the perfect hypothesis, and so we have $y_i = f(\mathbf{x}_i)$. This is not always a sufficient way of thinking about the problem...
16 Generalization The learning algorithm never gets to know exactly what the correct relationship between instances and classes is - it only ever gets to see a finite number of examples. So: generalization corresponds to the ability of $L$ to pick a hypothesis $h$ which is close in some sense to the best possible; however, we have to be careful about what close means here. For example, what if some instances are much more likely than others?
17 Generalization performance How can generalization performance be assessed? Model the generation of training examples using a probability distribution $P$ on $X \times C$. All examples are assumed to be independent and identically distributed (i.i.d.) according to $P$. Given a hypothesis $h$ and any example $(\mathbf{x}, y)$ we can introduce a measure $E(h(\mathbf{x}), y)$ of the error that $h$ makes in classifying that example. For example, the following definition might be appropriate for a classification problem: $E(h(\mathbf{x}), y) = 0$ when $h(\mathbf{x}) = y$ and $1$ otherwise; and for a regression problem: $E(h(\mathbf{x}), y) = (h(\mathbf{x}) - y)^2$.
18 Generalization performance A reasonable definition of generalization performance is then the expected error $\mathrm{er}(h) = \mathbb{E}_{(\mathbf{x}, y) \sim P}[E(h(\mathbf{x}), y)]$. In the case of the definition given on the previous slide for classification problems this gives $\mathrm{er}(h) = \Pr(h(\mathbf{x}) \neq y)$. In the case of the definition given for regression problems, $\mathrm{er}(h)$ is the expected square of the difference between true label and predicted label.
19 Problems encountered in practice In practice there are various problems that can arise: Measurements may be missing from the $\mathbf{x}$ vectors. There may be noise present. Classifications in $\mathbf{s}$ may be incorrect. The practical techniques to be presented have their own approaches to dealing with such problems. Similarly, problems arising in practice are addressed by the theory.
20 Examples [Diagram: an unknown target concept generates examples; some measurements may be missing and noise may corrupt the examples (noisy examples); these, together with prior knowledge, are supplied to the learning algorithm, which outputs a hypothesis.]
21 The perceptron: a review The initial development of linear discriminants was carried out by Fisher in 1936, and since then they have been central to supervised learning. Their influence continues to be felt in the recent and ongoing development of support vector machines, of which more later...
22 The perceptron: a review We have a two-class classification problem in $\mathbb{R}^n$. We output class $C_1$ if $\mathbf{w}^T\mathbf{x} + b > 0$, or class $C_2$ if $\mathbf{w}^T\mathbf{x} + b \leq 0$.
23 The perceptron: a review So the hypothesis is $h(\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + b)$ where $\sigma(z) = +1$ if $z > 0$ and $-1$ otherwise.
24 The primal perceptron algorithm
w ← 0; b ← 0; R ← max_i ||x_i||
do
  for (each example (x_i, y_i) in s)
    if (y_i (w^T x_i + b) ≤ 0)
      w ← w + η y_i x_i
      b ← b + η y_i R^2
while (mistakes are made in the for loop)
return (w, b).
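The following is a minimal NumPy sketch of the primal algorithm, assuming labels $y_i \in \{-1, +1\}$; the function name, the learning rate eta, and the epoch cap max_epochs are illustrative rather than part of the original slides.

```python
import numpy as np

def primal_perceptron(X, y, eta=0.1, max_epochs=1000):
    """Primal perceptron. X has shape (m, n); y holds labels in {-1, +1}."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))   # R = max_i ||x_i||
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(m):
            if y[i] * (w @ X[i] + b) <= 0:   # example i is misclassified
                w += eta * y[i] * X[i]
                b += eta * y[i] * R ** 2
                mistakes += 1
        if mistakes == 0:                    # a full pass with no mistakes
            break
    return w, b
```

If $\mathbf{s}$ is linearly separable the loop terminates with a separating $(\mathbf{w}, b)$; otherwise it runs until the epoch cap, consistent with the non-convergence noted on the next slide.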
25 Novikoff's theorem The perceptron algorithm does not converge if $\mathbf{s}$ is not linearly separable. However Novikoff proved the following: Theorem 1. If $\mathbf{s}$ is non-trivial and linearly separable, where there exists a hyperplane $(\mathbf{w}_{\mathrm{opt}}, b_{\mathrm{opt}})$ with $||\mathbf{w}_{\mathrm{opt}}|| = 1$ and $y_i(\mathbf{w}_{\mathrm{opt}}^T \mathbf{x}_i + b_{\mathrm{opt}}) \geq \gamma$ for $i = 1, \ldots, m$, then the perceptron algorithm makes at most $\left(\frac{2R}{\gamma}\right)^2$ mistakes, where $R = \max_i ||\mathbf{x}_i||$.
26 Dual form of the perceptron algorithm If we set $\eta = 1$ then the primal perceptron algorithm operates by adding and subtracting misclassified points $\mathbf{x}_i$ to an initial $\mathbf{w}$ at each step. As a result, when it stops we can represent the final $\mathbf{w}$ as $\mathbf{w} = \sum_{i=1}^m \alpha_i y_i \mathbf{x}_i$. Note: the values $\alpha_i$ are positive and proportional to the number of times $\mathbf{x}_i$ is misclassified; if $\mathbf{s}$ is fixed then the vector $\boldsymbol{\alpha} = (\alpha_1\; \cdots\; \alpha_m)$ is an alternative representation of $\mathbf{w}$.
27 Dual form of the perceptron algorithm Using these facts, the hypothesis can be re-written $h(\mathbf{x}) = \sigma(\mathbf{w}^T\mathbf{x} + b) = \sigma\left(\sum_{i=1}^m \alpha_i y_i \mathbf{x}_i^T \mathbf{x} + b\right)$.
28 Dual form of the perceptron algorithm
α ← 0; b ← 0; R ← max_i ||x_i||
do
  for (each example (x_i, y_i) in s)
    if (y_i (Σ_j α_j y_j x_j^T x_i + b) ≤ 0)
      α_i ← α_i + 1
      b ← b + y_i R^2
while (mistakes are made in the for loop)
return (α, b).
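Again as a hedged sketch (the names and epoch cap are illustrative): note that the dual algorithm only ever touches the training data through inner products, which is what makes the kernel substitution later possible.

```python
import numpy as np

def dual_perceptron(X, y, max_epochs=1000):
    """Dual-form perceptron. Learns one alpha per training example, plus b."""
    m = X.shape[0]
    alpha = np.zeros(m)
    b = 0.0
    R = np.max(np.linalg.norm(X, axis=1))
    G = X @ X.T                                # Gram matrix: G[j, i] = x_j . x_i
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(m):
            if y[i] * (np.sum(alpha * y * G[:, i]) + b) <= 0:
                alpha[i] += 1                  # alpha_i counts mistakes on x_i
                b += y[i] * R ** 2
                mistakes += 1
        if mistakes == 0:
            break
    return alpha, b
```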
29 Mapping to a bigger space There are many problems a perceptron can't solve. The classic example is the parity problem.
30 Mapping to a bigger space But what happens if we add another element to $\mathbf{x}$? For example we could use the product $x_1 x_2$ (see the sketch after this slide):
x_1  x_2  x_1 x_2  class
0    0    0        0
0    1    0        1
1    0    0        1
1    1    1        0
In $\mathbb{R}^3$ a perceptron can easily solve this problem.
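To make this concrete, here is a hedged sketch using the primal_perceptron function sketched earlier, with labels recoded to $\{-1, +1\}$; the parity points become linearly separable once the third coordinate $x_1 x_2$ is added.

```python
import numpy as np

# The four parity (XOR) points, labels in {-1, +1}: class +1 iff x1 != x2.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])

# Add the product x1*x2 as a third feature: Phi(x) = (x1, x2, x1*x2).
Phi = np.hstack([X, (X[:, 0] * X[:, 1]).reshape(-1, 1)])

w, b = primal_perceptron(Phi, y)   # now linearly separable in R^3
print(np.sign(Phi @ w + b))        # matches y after convergence
```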
31 Mapping to a bigger space In general we can map an instance to a bigger space using a collection of functions, $\boldsymbol{\Phi}(\mathbf{x}) = (\phi_1(\mathbf{x})\; \phi_2(\mathbf{x})\; \cdots\; \phi_k(\mathbf{x}))$, where each $\phi_i$ maps $\mathbf{x}$ to a single number. In the parity example, $\boldsymbol{\Phi}(\mathbf{x}) = (x_1\; x_2\; x_1 x_2)$.
32 Mapping to a bigger space This is an old trick, and the functions $\phi_i$ can be anything we like. Example: In a multilayer perceptron $\phi_i(\mathbf{x}) = \sigma(\mathbf{w}_i^T \mathbf{x} + b_i)$ where $\mathbf{w}_i$ is the vector of weights and $b_i$ the bias associated with hidden node $i$. Note however that for the time being the functions $\phi_i$ are fixed, whereas in a multilayer perceptron they are allowed to vary as a result of varying the $\mathbf{w}_i$ and $b_i$.
33 Mapping to a bigger space [Figure: a network diagram in which the input $\mathbf{x}$ is passed through the functions $\phi_1, \ldots, \phi_k$ and the results are fed to a linear perceptron.]
34 Mapping to a bigger space We can now use a perceptron in the high-dimensional space, obtaining a hypothesis of the form $h(\mathbf{x}) = \sigma\left(\sum_{i=1}^k w_i \phi_i(\mathbf{x}) + b\right)$ where the $w_i$ are the weights from the hidden nodes to the output node. So: start with $\mathbf{x}$; use the $\phi_i$ to map it to a bigger space; use a (linear) perceptron in the bigger space.
35 Mapping to a bigger space What happens if we use the dual form of the perceptron algorithm in this process? We end up with a hypothesis of the form $h(\mathbf{x}) = \sigma\left(\sum_{i=1}^m \alpha_i y_i \boldsymbol{\Phi}(\mathbf{x}_i)^T \boldsymbol{\Phi}(\mathbf{x}) + b\right)$ where $\boldsymbol{\Phi}(\mathbf{x})$ is the vector $(\phi_1(\mathbf{x})\; \cdots\; \phi_k(\mathbf{x}))$. Notice that this introduces the possibility of a tradeoff of $m$ and $k$: the sum has (potentially) become smaller. The cost associated with this is that we may have to calculate $\boldsymbol{\Phi}(\mathbf{x}_i)^T \boldsymbol{\Phi}(\mathbf{x})$ several times.
36 Kernels This suggests that it might be useful to be able to calculate $\boldsymbol{\Phi}(\mathbf{x})^T \boldsymbol{\Phi}(\mathbf{x}')$ easily. In fact such an observation has far-reaching consequences. Definition 2. A kernel is a function $K$ such that for all vectors $\mathbf{x}$ and $\mathbf{x}'$, $K(\mathbf{x}, \mathbf{x}') = \boldsymbol{\Phi}(\mathbf{x})^T \boldsymbol{\Phi}(\mathbf{x}')$. Note that a given kernel naturally corresponds to an underlying collection of functions $\phi_i$. Ideally, we want to make sure that the value of $k$ does not have a great effect on the cost of calculating $K$. If this is the case then $h$ is easy to evaluate.
37 Kernels: an example We can use $K(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}')^p$ or $K(\mathbf{x}, \mathbf{x}') = (\mathbf{x}^T\mathbf{x}' + c)^p$ to obtain polynomial kernels. In the latter case we have features that are monomials up to degree $p$, so the decision boundary obtained using $K$ as described above will be a polynomial curve of degree $p$.
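A hedged sketch of how the dual-form hypothesis is evaluated with a polynomial kernel; the function names and defaults are illustrative. Training amounts to running the dual perceptron above with the Gram-matrix entries replaced by kernel values $K(\mathbf{x}_j, \mathbf{x}_i)$.

```python
import numpy as np

def poly_kernel(x, z, p=2, c=1.0):
    """Polynomial kernel K(x, z) = (x.z + c)^p: monomial features up to degree p."""
    return (np.dot(x, z) + c) ** p

def kernel_hypothesis(x, X, y, alpha, b, kernel=poly_kernel):
    """Dual-form hypothesis h(x) = sign(sum_i alpha_i y_i K(x_i, x) + b)."""
    s = sum(alpha[i] * y[i] * kernel(X[i], x) for i in range(len(y)))
    return 1 if s + b > 0 else -1
```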
38 Kernels: an example [Figure: a kernel perceptron trained on example data with a polynomial kernel, $p = 10$, $c = 1$; one panel shows the resulting decision boundary in the $(x_1, x_2)$ plane, the other the weighted-sum value (truncated at 1000) as a surface over $(x_1, x_2)$.]
39 Gradient descent An alternative method for training a basic perceptron works as follows. We define a measure of error for a given collection of weights, for example $E(\mathbf{w}) = \sum_{i=1}^m (y_i - \mathbf{w}^T\mathbf{x}_i)^2$, where the threshold function $\sigma$ has not been used. Modifying our notation slightly, so that $b$ is included in $\mathbf{w}$ (by adding a corresponding element fixed at $1$ to each $\mathbf{x}$), gives an error that is a function of $\mathbf{w}$ alone.
40 Gradient descent $E(\mathbf{w})$ is parabolic and has a unique global minimum and no local minima. We therefore start with a random $\mathbf{w}_0$ and update it as follows: $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \left.\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}}\right|_{\mathbf{w}_t}$ where $\eta$ is some small positive number. The vector $-\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}}$ tells us the direction of the steepest decrease in $E(\mathbf{w})$.
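A minimal sketch of this update rule, assuming the squared error $E(\mathbf{w}) = \sum_i (y_i - \mathbf{w}^T\mathbf{x}_i)^2$ from the previous slide; the step size and iteration count are illustrative, not prescribed by the slides.

```python
import numpy as np

def gradient_descent(X, y, eta=0.01, steps=1000):
    """Minimise E(w) = sum_i (y_i - w.x_i)^2 by steepest descent.
    The bias b is absorbed into w by appending a constant-1 input."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # augment each x with a 1
    w = np.random.randn(Xa.shape[1]) * 0.01        # small random start
    for _ in range(steps):
        grad = -2 * Xa.T @ (y - Xa @ w)            # dE/dw for the squared error
        w -= eta * grad                            # step against the gradient
    return w
```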
41 Simple feedforward neural networks We continue using the same notation as previously. Usually, we think in terms of a training algorithm finding a hypothesis $h$ based on a training sequence $\mathbf{s}$, and then classifying new instances $\mathbf{x}$ by evaluating $h(\mathbf{x})$. With neural networks, the training algorithm usually provides a vector $\mathbf{w}$ of weights. We therefore have a hypothesis that depends on the weight vector. This is often made explicit by writing $h(\mathbf{w}; \mathbf{x})$ and representing the hypothesis as a mapping depending on both $\mathbf{w}$ and the new instance $\mathbf{x}$, so classification of $\mathbf{x}$ is done by computing $h(\mathbf{w}; \mathbf{x})$.
42 Backpropagation: the general case First, let's look at the general case. We have a completely unrestricted feedforward structure. For the time being, there may be several outputs, and no specific layering is assumed.
43 Backpropagation: the general case For each node: $w_{i \to j}$ is the weight on the connection from node $i$ to node $j$; $a_j = \sum_i w_{i \to j} z_i$ is the weighted sum or activation for node $j$; $\sigma$ is the activation function; and $z_j = \sigma(a_j)$ is the output of node $j$.
44 Backpropagation: the general case In addition, there is often a bias input for each node, which is always set to $1$. This is not always included explicitly; sometimes the bias is included by writing the weighted summation as $a_j = \theta_j + \sum_i w_{i \to j} z_i$ where $\theta_j$ is the bias for node $j$.
45 Backpropagation: the general case As usual we have: instances $\mathbf{x}$; a training sequence $\mathbf{s} = ((\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_m, y_m))$. We also define a measure $E(\mathbf{w})$ of the error of the network on the training sequence, where $\mathbf{w}$ is the vector of all the weights in the network. Our aim is to find a set of weights that minimises $E(\mathbf{w})$.
46 Backpropagation: the general case How can we find a set of weights that minimises $E(\mathbf{w})$? The approach used by the backpropagation algorithm is very simple: 1. begin at step $0$ with a randomly chosen collection of weights $\mathbf{w}_0$; 2. at the $t$th step, calculate the gradient $\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}}$ at the point $\mathbf{w}_t$; 3. update the weight vector by taking a small step in the direction of $-\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}}$, that is, $\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \frac{\partial E(\mathbf{w})}{\partial \mathbf{w}}$; 4. repeat this process until $E(\mathbf{w})$ is sufficiently small.
47 Backpropagation: the general case In order to do this we have to calculate $\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}}$. Often $E(\mathbf{w})$ is the sum of separate components, one for each example in $\mathbf{s}$: $E(\mathbf{w}) = \sum_{p=1}^m E_p(\mathbf{w})$, in which case $\frac{\partial E(\mathbf{w})}{\partial \mathbf{w}} = \sum_{p=1}^m \frac{\partial E_p(\mathbf{w})}{\partial \mathbf{w}}$. We can therefore consider examples individually.
48 Backpropagation: the general case Place example $\mathbf{x}_p$ at the inputs and calculate the values $a_j$ and $z_j$ for all the nodes. This is called forward propagation. We then have $\frac{\partial E_p(\mathbf{w})}{\partial w_{i \to j}} = \frac{\partial E_p(\mathbf{w})}{\partial a_j} \frac{\partial a_j}{\partial w_{i \to j}} = \delta_j z_i$, where we've defined $\delta_j = \frac{\partial E_p(\mathbf{w})}{\partial a_j}$ and used the fact that $a_j = \sum_i w_{i \to j} z_i$, so $\frac{\partial a_j}{\partial w_{i \to j}} = z_i$. So we now need to calculate the values $\delta_j$...
49 Backpropagation: the general case When $j$ is an output unit this is easy, as $\delta_j = \frac{\partial E_p(\mathbf{w})}{\partial a_j} = \frac{\partial E_p(\mathbf{w})}{\partial z_j} \sigma'(a_j)$ and the first term is in general easy to calculate for a given $E$.
50 Backpropagation: the general case When $j$ is not an output unit we have $\delta_j = \frac{\partial E_p(\mathbf{w})}{\partial a_j} = \sum_k \frac{\partial E_p(\mathbf{w})}{\partial a_k} \frac{\partial a_k}{\partial a_j}$ where the sum is over the nodes $k$ to which node $j$ sends a connection.
51 Backpropagation: the general case Then by definition $\delta_k = \frac{\partial E_p(\mathbf{w})}{\partial a_k}$, and $\frac{\partial a_k}{\partial a_j} = \frac{\partial a_k}{\partial z_j} \frac{\partial z_j}{\partial a_j} = w_{j \to k}\, \sigma'(a_j)$. So $\delta_j = \sigma'(a_j) \sum_k \delta_k w_{j \to k}$.
52 Backpropagation: the general case Summary: to calculate $\frac{\partial E_p(\mathbf{w})}{\partial \mathbf{w}}$ for one pattern: 1. Forward propagation: apply the pattern $\mathbf{x}_p$ and calculate $a_j$, $z_j$ etc. for all the nodes in the network. 2. Backpropagation 1: for outputs, $\delta_j = \frac{\partial E_p(\mathbf{w})}{\partial z_j} \sigma'(a_j)$. 3. Backpropagation 2: for other nodes, $\delta_j = \sigma'(a_j) \sum_k \delta_k w_{j \to k}$, where the $\delta_k$ were calculated at an earlier step.
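Since the original symbols were lost in transcription, the three steps restated in one place, in consistent notation:

```latex
\begin{align*}
\text{Forward propagation:}\quad & a_j = \sum_i w_{i\to j}\, z_i,
  \qquad z_j = \sigma(a_j),\\
\text{Outputs:}\quad & \delta_j = \frac{\partial E_p(\mathbf{w})}{\partial z_j}\,\sigma'(a_j),\\
\text{Other nodes:}\quad & \delta_j = \sigma'(a_j)\sum_k \delta_k\, w_{j\to k},
  \qquad \frac{\partial E_p(\mathbf{w})}{\partial w_{i\to j}} = \delta_j\, z_i.
\end{align*}
```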
53 Backpropagation: a specific example [Figure: a network with a single layer of hidden nodes and one output node; each hidden node receives the inputs, and the output node receives the hidden nodes' outputs.]
54 Backpropagation: a specific example For the output and the hidden nodes we use the sigmoid activation function $\sigma(a) = \frac{1}{1 + \exp(-a)}$, so $\sigma'(a) = \sigma(a)(1 - \sigma(a))$. Also, we take the error for pattern $p$ to be $E_p(\mathbf{w}) = (y_p - z_{\text{output}})^2$.
55 Backpropagation: a specific example For the output: we have $\delta_{\text{output}} = \frac{\partial E_p(\mathbf{w})}{\partial z_{\text{output}}}\, \sigma'(a_{\text{output}})$, and since $\frac{\partial E_p(\mathbf{w})}{\partial z_{\text{output}}} = -2(y_p - z_{\text{output}})$, this gives $\delta_{\text{output}} = -2(y_p - z_{\text{output}})\, \sigma(a_{\text{output}})(1 - \sigma(a_{\text{output}}))$.
56 Backpropagation: a specific example For the hidden nodes: we have $\delta_j = \sigma'(a_j) \sum_k \delta_k w_{j \to k}$, but there is only one output, so $\delta_j = \sigma'(a_j)\, \delta_{\text{output}}\, w_{j \to \text{output}}$, and we have a value for $\delta_{\text{output}}$, so $\delta_j = \sigma(a_j)(1 - \sigma(a_j))\, \delta_{\text{output}}\, w_{j \to \text{output}}$.
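A minimal sketch of these updates for the specific network above, assuming the sigmoid activation and squared error $E_p = (y_p - z_{\text{output}})^2$ reconstructed on the earlier slides; the function name, parameter names, and learning rate are illustrative.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, y, W1, b1, w2, b2, eta=0.5):
    """One sequential gradient step for a one-hidden-layer sigmoid network
    with a single sigmoid output and squared error E_p = (y - z_out)^2.
    W1: (h, n) input-to-hidden weights; b1: (h,) hidden biases;
    w2: (h,) hidden-to-output weights; b2: scalar output bias."""
    # Forward propagation.
    a1 = W1 @ x + b1                 # hidden activations a_j
    z1 = sigmoid(a1)                 # hidden outputs z_j
    a2 = w2 @ z1 + b2                # output activation
    z2 = sigmoid(a2)                 # network output

    # Backpropagation 1: delta for the output node.
    delta2 = -2.0 * (y - z2) * z2 * (1.0 - z2)
    # Backpropagation 2: deltas for the hidden nodes (only one output).
    delta1 = z1 * (1.0 - z1) * delta2 * w2

    # Gradient step, using dE_p/dw_{i->j} = delta_j * z_i.
    W1 -= eta * np.outer(delta1, x)
    b1 -= eta * delta1
    w2 -= eta * delta2 * z1
    b2 -= eta * delta2
    return W1, b1, w2, b2
```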
57 Putting it all together We can then use the derivatives in one of two basic ways: Batch: summing the derivatives over all patterns before making an update (as described previously). Sequential: using the derivative for just one pattern at a time, selecting patterns in sequence or at random.
58 Example: the classical parity problem As an example we show the result of training a network with: two inputs; one output; one hidden layer containing a small number of units; all other details as above. The problem is the classical parity problem, using a collection of noisy examples. The sequential approach is used, with a number of repetitions through the entire training sequence.
59 Example: the classical parity problem [Figure: the training data and the decision boundary in the $(x_1, x_2)$ plane, before and after training.]
60 Example: the classical parity problem [Figure: the network output as a function of $(x_1, x_2)$, before and after training.]
61 Example: the classical parity problem [Figure: the error during training.]
More information