AKA: Logistic Regression Implementation


1 AKA: Logistic Regression Implementation

2 Supervised classification is the problem of predicting to which category a new observation belongs. The category is chosen from a list of predefined categories. In unsupervised classification, the categories are unspecified and unknown; the machine learning algorithm discovers categories from an analysis of the training examples. Example cases in which the data analysis task is supervised classification: after having trained on a labeled set of training examples, the learner is presented with a new (unlabeled) e-mail, and must decide if it is spam or not spam (yes or no?); a credit card transaction, and must decide if it is fraudulent or not (yes or no?); an image of a cell, and must decide if it is cancerous or not (malignant or benign?). In the above examples a classifier is constructed to predict categorical labels. These labels are yes or no for e-mail and credit card transactions, and malignant or benign for cells.

3 Difference Between Classification and Regression. CLASSIFICATION: In classification, a classifier assigns a given input to one of a finite set of categories (typically a set of two categories). A classifier learns how to classify an input by undergoing a training phase, in which it trains on a set of training examples and constructs a model. After the learning phase, the classifier uses the learned model to classify new data presented to it. The output is a discrete value (0/1), which indicates its membership in a category. REGRESSION: In regression, the learner predicts a continuous value for each given input. The predictor learns how to predict values by fitting a curve to a set of training examples, thereby constructing a model. After the learning stage, the predictor predicts a value for a new input by applying the input to the learned curve and extracting the output value. The output is a continuous value on the fitted curve.

4 Typically, there are only two categories in supervised classification (i.e., two possible outputs of the classifier): 0: Negative Class (e.g., benign tumor); 1: Positive Class (e.g., malignant tumor). There can also be more than two categories, aka the multiclass classification problem.

5 Try Applying Linear Regression to a Classification Problem. Tumor size is labelled as either Yes (malignant) or No (benign) by the supervised training data. Linear regression produces a straight line to approximate the training data. New inputs would be fitted to the straight line, and generally are not mapped to the labels; some new inputs do not even fall between the labels 0 and 1. But we need to output 0 (benign) or 1 (malignant). Therefore, linear regression, in its unmodified standard form, does not make sense for classification problems.

6 Modifying Linear Regression for Classification Problems. [Figure: fitted line over tumor size, with a threshold at 0.5.] We can try modifying linear regression, for example, by modifying our interpretation of the result. We may interpret it as follows: for all tumor sizes such that $h_\theta(x) < 0.5$, the tumor is benign, and otherwise ($h_\theta(x) \ge 0.5$) it is malignant.
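To make the idea concrete, here is a minimal Octave sketch (not from the slides; the data values and variable names are assumptions) that fits a line to labeled tumor data and thresholds the prediction at 0.5:

tumorsize = [1 2 3 4 5 6 7 8]';   % hypothetical feature values
label     = [0 0 0 0 1 1 1 1]';   % 0 = benign, 1 = malignant
X = [ones(length(tumorsize), 1) tumorsize];
theta = X \ label;                % least-squares line fit
h = [1 4.5] * theta;              % predict for a new tumor of size 4.5
if h >= 0.5
  prediction = 1;                 % malignant
else
  prediction = 0;                 % benign
end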

7 Incorrectly Classified. Relearning with an additional training example, while still using the previous threshold value of 0.5, causes the linear regression model to incorrectly classify some of the training examples. The threshold could be changed, say to a different value. Note: a user sets the threshold value, but it may need to change for each new example added to the training set, especially in online training mode, and it would be inappropriate for a user to change the threshold each time a new input is added. We would rather prefer the threshold be learned by the machine in the process of training. Therefore, linear regression, and even a slight modification of linear regression, is not appropriate for classification problems.

8 Instead of using linear regression for classification problems, let us develop a classifier from scratch. First, let's look at typical linearly separable data for classification purposes, and then determine what function can be used to separate the data. In many real-world cases, the data is not totally separable, due to noise and other causes, so we need to modify the interpretation of the output of the classifier, e.g., in terms of the classifier's certainty or probability that an input belongs to a certain class. Our goal is to use the training data to learn the function that separates the data.

9 Linear Decision Boundary. Consider a classification system in which the data is represented by two features, $x_1$ and $x_2$. As shown, the training examples may be separated by a line embedded in 2-dimensional space; the line forms a decision boundary, $\theta_0 + \theta_1 x_1 + \theta_2 x_2 = 0$. For all feature points that lie above or on the line, classify them as the x class; otherwise as the o class. Note that for all feature points that lie above or on the decision boundary, $\theta_0 + \theta_1 x_1 + \theta_2 x_2 \ge 0$. Likewise, for all feature points below the line, $\theta_0 + \theta_1 x_1 + \theta_2 x_2 < 0$. Predict x if $\theta^T x \ge 0$; predict o if $\theta^T x < 0$.

10 Decision boundary. Note that the LHS of the decision boundary equation, $\theta_0 + \theta_1 x_1 + \theta_2 x_2$, is very similar to the hypothesis $h_\theta(x) = \theta^T x$ used for regression. We will also call $h_\theta(x)$ the hypothesis for this classification problem. To perform the classification task, we introduce a threshold function: if $h_\theta(x) \ge 0$, then $y = 1$; else if $h_\theta(x) < 0$, then $y = 0$. In regression, we used the hypothesis equation to fit a curve to the training data. In classification, we use the hypothesis equation to separate the training data into one of two labels (two, for this example). Let: predict $y = 1$ if $\theta^T x \ge 0$; predict $y = 0$ if $\theta^T x < 0$.
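As a small illustration (the weights here are assumptions, not values from the slides), an Octave sketch of the thresholded linear hypothesis:

theta = [-3; 1; 1];      % hypothetical weights: boundary is x1 + x2 = 3
x = [1; 2; 2.5];         % new input, with bias term x0 = 1
if theta' * x >= 0
  y = 1;                 % predict the "x" class
else
  y = 0;                 % predict the "o" class
end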

11 Non-linear Decision Boundary. Let the hypothesis include quadratic feature terms, e.g. $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_1^2 + \theta_4 x_2^2$. Predict $y = 1$ if $h_\theta(x) \ge 0$; predict $y = 0$ if $h_\theta(x) < 0$.

12 Non-linear Decision Boundary. Depending on the polynomials or other non-linear functions used in the hypothesis, other, more general, non-linear boundaries are possible.

13 Note that a step function was used to decide if a given input should be classified as class 1 or class 0, i.e., the classifier makes binary decisions. Input object, binary decision. Classifier: output 1 if $\theta^T x \ge 0$; output 0 if $\theta^T x < 0$. A problem with using a step function is that most real-world classification problems cannot be simply separated by a crisp boundary (i.e., a step function). The world is not black and white; there are different shades of grey. For data points very close to the decision boundary, the classifier may or may not classify them correctly. Real-world problems have incomplete, noisy, and possibly incorrect data, and so the classifier may not correctly classify those inputs.

14 A better approach would be to have the classifier output a measure of its level of certainty that the input data belongs to a given class. For example, consider the Logistic function (aka Sigmoid function): $g(z) = \frac{1}{1 + e^{-z}}$. As the value of $g(z)$ gets closer to the midpoint of its range (i.e., close to 0.5), the classifier becomes increasingly uncertain about its classification, while as the values get very far from the midpoint (i.e., close to 0.0 or 1.0), the classifier becomes very certain about its classification. Note that $0 < g(z) < 1$, i.e., $g(z) \in (0, 1)$. This is desired, because the class labels are $y \in \{0, 1\}$.
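A one-line Octave sketch of the logistic function, evaluated at a few sample points to show the certainty interpretation:

g = @(z) 1.0 ./ (1.0 + exp(-z));   % logistic (sigmoid) function
g(0)     % 0.5     -> maximally uncertain
g(6)     % ~0.9975 -> very certain the input is in class 1
g(-6)    % ~0.0025 -> very certain the input is NOT in class 1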

15 $\theta^T x$ is computed as before in regression, but, in classification, the result is input to a threshold function, in this case the Logistic function: $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$, which may be interpreted as the probability that $y = 1$, given input $x$. If the logistic classifier gives an output very close to 1.0, then it is very certain that the input training example belongs to the class label 1. If the logistic classifier gives an output near 0.5, then it is uncertain whether the input training example belongs to the class label 1. If the logistic classifier gives an output very close to 0.0, then it is very certain that the input training example does not belong to the class label 1.

16 Let $x = (x_0, x_1)^T = (1, \text{tumor size})^T$, and suppose $h_\theta(x) = 0.7$. Then, tell patient: 70% chance of tumor being malignant. $h_\theta(x) = P(y = 1 \mid x; \theta)$: the probability that $y = 1$, parametrized by $\theta$, given $x$. Usual probability properties: $P(y = 0 \mid x; \theta) + P(y = 1 \mid x; \theta) = 1$.

17 Predict $y = 1$ if $h_\theta(x) \ge 0.5$; predict $y = 0$ if $h_\theta(x) < 0.5$.

18 Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$; $m$ examples, with $y \in \{0, 1\}$ and $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$. How should we choose the values of the weights (parameters) $\theta$?

19 Linear regression: $J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$. Logistic regression: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$. For logistic regression, let us consider using the same cost function as in linear regression (we don't need the $\frac{1}{2}$ term). The problem is that the non-linear $h_\theta(x)$ in logistic regression will yield a non-convex cost function, which will have multiple minima. We would like to have a convex cost function, so that there will be only one minimum, i.e., the global minimum.

20 A function is convex if the line segment between any two points on its graph lies on or above the graph. Convex curve: has one minimum, a global minimum. Non-convex curve: has many local minima, as well as a global minimum.

21 Problem: Non-convex Cost Function. Using a linear hypothesis $h_\theta(x) = \theta^T x$, the following definition of $\mathrm{Cost}$ yields a convex cost function in linear regression: $\mathrm{Cost}(h_\theta(x), y) = \frac{1}{2}\left(h_\theta(x) - y\right)^2$. Using the same definition of $\mathrm{Cost}$ with the sigmoid hypothesis $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$ would yield a non-convex cost function in logistic regression.
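As an illustration (the toy data here is an assumption, not from the slides), an Octave sketch that sweeps a single weight and plots the squared-error cost of a sigmoid hypothesis; the flat plateaus at the extremes are what make this cost non-convex:

x = [-2 -1 1 2];  y = [1 1 0 0];          % assumed toy training set
thetas = linspace(-10, 10, 401);
J = zeros(size(thetas));
for k = 1:length(thetas)
  h = 1.0 ./ (1.0 + exp(-thetas(k) * x)); % sigmoid hypothesis
  J(k) = mean(0.5 * (h - y).^2);          % squared-error cost
end
plot(thetas, J);                          % note the flat, non-convex plateaus
xlabel('theta'); ylabel('J(theta)');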

22 Consider using the log function to obtain a convex cost function: $\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$ if $y = 1$; $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$ if $y = 0$.

23 For $y = 1$: $\mathrm{Cost}(h_\theta(x), y) = -\log(h_\theta(x))$. [Plot of $-\log(h_\theta(x))$ for $h_\theta(x) \in (0, 1)$.]

24 For $y = 1$: this captures the intuition that if $h_\theta(x) \to 0$ while $y = 1$, then we penalize the algorithm by a very large cost ($\mathrm{Cost} \to \infty$). Also, if $h_\theta(x) = 1$ while $y = 1$, then the cost is zero.

25 For $y = 0$: $\mathrm{Cost}(h_\theta(x), y) = -\log(1 - h_\theta(x))$. [Plot of $-\log(1 - h_\theta(x))$ for $h_\theta(x) \in (0, 1)$.]

26 For $y = 0$: this captures the intuition that if $h_\theta(x) \to 1$ while $y = 0$, then we penalize the algorithm by a very large cost ($\mathrm{Cost} \to \infty$). Also, if $h_\theta(x) = 0$ while $y = 0$, then the cost is zero.

27 The two cases can be combined into a single expression: $\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y)\log(1 - h_\theta(x))$, giving the cost function $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]$.
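This is exactly the per-example expression that appears in the implementation on slide 41. A minimal vectorized Octave sketch, with assumed variable names (X is m-by-(n+1) with a leading column of ones; y is an m-by-1 vector of 0/1 labels):

sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
costJ = @(theta, X, y) mean(-y .* log(sigmoid(X * theta)) ...
                            - (1 - y) .* log(1 - sigmoid(X * theta)));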

28 Apply gradient descent to minimize $J(\theta)$. Repeat { for each feature $j$: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$ }, updating all $\theta_j$ simultaneously.
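Anticipating the derivative worked out on slides 31 through 35, one vectorized gradient-descent step can be sketched in Octave as follows (the names X, y, theta, alpha, and m are assumptions):

h     = 1.0 ./ (1.0 + exp(-X * theta));   % m-by-1 vector of predictions
grad  = (X' * (h - y)) / m;               % partial derivatives dJ/dtheta_j
theta = theta - alpha * grad;             % simultaneous update of all theta_j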

29 Interpreting the Minimization of $J(\theta)$. For $y = 1$ training examples, the minimizer will attempt to find $\theta$ that will make $h_\theta(x) \to 1$. For $y = 0$ training examples, the minimizer will attempt to find $\theta$ that will make $h_\theta(x) \to 0$. The sweet spot will be somewhere in the middle. [Figure: cost curve $-\log(h_\theta(x))$ for $y = 1$ examples and cost curve $-\log(1 - h_\theta(x))$ for $y = 0$ examples.] The position of the global minimum depends on the training set. The shape of these curves depends on the training set.
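A small Octave sketch that reproduces the shape of these two cost curves as a function of the hypothesis output:

h = linspace(0.001, 0.999, 500);          % hypothesis output in (0, 1)
plot(h, -log(h), h, -log(1 - h));
xlabel('h_\theta(x)'); ylabel('Cost');
legend('-log(h)  (y = 1)', '-log(1-h)  (y = 0)');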

30 Example Minimization of $J(\theta)$. Consider training with 11 positive ($y = 1$) training examples only; their cost can be driven to 0 by a suitable $\theta$. Then add one negative ($y = 0$) example and train again. [Figure: cost curve of the 11 positive training examples (cost 0 at its minimizing $\theta$); cost curve of the 11 positive examples plus one negative example; and the global minimum of the combined cost.] The previous $\theta$ is not optimal, since if the minimizer moves away from it, then it is able to find a lower minimum.

31 Expression for $\frac{\partial}{\partial \theta_j} J(\theta)$: starting from $J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - (1 - y^{(i)})\log(1 - h_\theta(x^{(i)}))\right]$, differentiate with respect to $\theta_j$.

32 Expression for $\frac{\partial}{\partial \theta_j} J(\theta)$: apply the chain rule, using $\frac{\partial}{\partial \theta_j} h_\theta(x) = h_\theta(x)\left(1 - h_\theta(x)\right)x_j$, which follows from the derivative of the sigmoid. See slide 35.

33 Expression for $\frac{\partial}{\partial \theta_j} J(\theta)$: the terms simplify to $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$.

34 Linear regression and logistic regression appear to have the same expression for the derivatives: $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$. However, the expression for the predictor is different. Linear regression: $h_\theta(x) = \theta^T x$. Logistic regression: $h_\theta(x) = \frac{1}{1 + e^{-\theta^T x}}$.

35 Expression for $g'(z)$. Let $g(z) = \frac{1}{1 + e^{-z}}$. Then $g'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = g(z)\left(1 - g(z)\right)$, since $\frac{e^{-z}}{1 + e^{-z}} = 1 - g(z)$.
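Putting slides 31 through 35 together, a worked version of the derivation (a reconstruction from the definitions above, writing $h^{(i)} = h_\theta(x^{(i)})$):

\begin{aligned}
\frac{\partial h_\theta(x)}{\partial \theta_j}
  &= g'(\theta^T x)\, x_j = h_\theta(x)\bigl(1 - h_\theta(x)\bigr)\, x_j \\
\frac{\partial J(\theta)}{\partial \theta_j}
  &= -\frac{1}{m}\sum_{i=1}^{m}\left[\frac{y^{(i)}}{h^{(i)}}
     - \frac{1 - y^{(i)}}{1 - h^{(i)}}\right] h^{(i)}\bigl(1 - h^{(i)}\bigr)\, x_j^{(i)} \\
  &= -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\bigl(1 - h^{(i)}\bigr) - \bigl(1 - y^{(i)}\bigr)h^{(i)}\right] x_j^{(i)}
   = \frac{1}{m}\sum_{i=1}^{m}\left(h^{(i)} - y^{(i)}\right) x_j^{(i)}
\end{aligned}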

36 For each feature (j = 0; j <= n; j++) { $\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$ }

37 Training Phase: for each feature (j = 0; j <= n; j++) { update $\theta_j$ as above }. Classification Phase: to predict an output for a new input $x$, note that this consists of two steps. The first step is the computation of $z = \theta^T x$, which is exactly the same as for linear regression. In linear regression the predicted output is $\theta^T x$. In logistic classification, $\theta^T x = 0$ is the decision boundary, and the sign of $\theta^T x$ tells us on which side of the decision boundary the input falls: the positive (side) class or the negative (side) class. The second step passes $z$ through the sigmoid function to determine the class of the input and the probability that the new input belongs to the positive class. In logistic classification, the predicted output is the class assigned by the logistic function, with the probability that the new input belongs to the positive class.
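A minimal Octave sketch of the two-step classification phase (xnew, with a leading 1 for the bias term, and the learned theta are assumed names):

z = theta' * xnew;           % step 1: same computation as linear regression
p = 1.0 / (1.0 + exp(-z));   % step 2: probability of the positive class
ypred = (p >= 0.5);          % predicted class: 1 if p >= 0.5, else 0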

38 This example uses a very simple data set that represents scores on a test in the 1st column (x) and the pass/fail indicator in the 2nd column (y); 0 represents fail, and 1 represents pass. [Table of x and y values; the values were not recovered in the transcription.]

39 Opening and Plotting Data Files

clear all; close all; clc
t0 = cputime();
x = load('x.txt');
y = load('y.txt');
m = length(y);
figure; hold on
for i = 1:m
  if (y(i) == 1)
    plot(x(i), y(i), 's', 'color', 'g', 'markersize', 18, 'markerfacecolor', 'g');
  else
    plot(x(i), y(i), 'o', 'color', 'r', 'markersize', 18, 'markerfacecolor', 'r');
  endif
endfor
ylabel('Pass/Fail', 'fontsize', 18, 'fontname', 'Arial');
xlabel('Exam 1 Score', 'fontsize', 18, 'fontname', 'Arial');
title('Pass/Fail vs Exam Score', 'fontsize', 20, 'fontname', 'Arial');

40

x = [ones(m, 1) x];   % Add a column of ones to x
numfeatures = size(x, 2);
theta = zeros(numfeatures, 1);
prevtheta = theta;    % initialize previous weights (used by the training loop)
numtrainsam = size(x, 1);
maxiterations = 1000;
learningrate = 5.0;
errorperiteration = zeros(maxiterations, 1);

41 For each iteration { For each feature (j = 0; j <= n; j++) { ... } }

for t = 1:maxiterations
  toterror = 0;
  for j = 1:numfeatures
    totslope = 0;
    for i = 1:m
      z = 0;
      for jj = 1:numfeatures
        z = z + prevtheta(jj) * x(i, jj);
      end
      h = 1.0 / (1.0 + exp(-z));
      totslope = (totslope + (h - y(i)) * x(i, j));
      toterror = (toterror + -y(i)*log(h) - (1 - y(i))*log(1 - h));
    end
    toterror = toterror / numtrainsam;
    theta(j) = theta(j) - learningrate * (totslope / numtrainsam);
  end
  prevtheta = theta;
  errorperiteration(t) = toterror / numfeatures;
end

42

for i = 1:m
  z = 0;
  for jj = 1:numfeatures
    z = z + prevtheta(jj) * x(i, jj);
  end
  h = 1.0 / (1.0 + exp(-z));
  totslope = (totslope + (h - y(i)) * x(i, j));
  toterror = (toterror + -y(i)*log(h) - (1 - y(i))*log(1 - h));
end

43 For each feature (j = 0; j <= n; j++) { ... }

for t = 1:maxiterations
  toterror = 0;
  for j = 1:numfeatures
    totslope = 0;
    for i = 1:m
      z = 0;
      for jj = 1:numfeatures
        z = z + prevtheta(jj) * x(i, jj);   % h is computed from prevtheta
      end
      h = 1.0 / (1.0 + exp(-z));
      totslope = (totslope + (h - y(i)) * x(i, j));
      toterror = (toterror + -y(i)*log(h) - (1 - y(i))*log(1 - h));
    end
    toterror = toterror / numtrainsam;
    theta(j) = theta(j) - learningrate * (totslope / numtrainsam);
  end
  prevtheta = theta;   % update prevtheta only after all theta(j) are computed
  errorperiteration(t) = toterror / numfeatures;
end

The intermediate updated theta should not be used in the computation of h. After all theta(j) have been computed, then prevtheta should be updated. Note: in the batch update method, when computing each h, the previous theta (prevtheta) is used.
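One way to make that batch-update rule explicit is to compute every gradient from the same prevtheta before any update is applied; a sketch using the names above:

h = 1.0 ./ (1.0 + exp(-(x * prevtheta)));   % predictions use prevtheta only
for j = 1:numfeatures
  theta(j) = prevtheta(j) - learningrate * ((h - y)' * x(:, j)) / numtrainsam;
end
prevtheta = theta;                          % swap in the new weights afterwards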

44 Learning rate = 5.0. Number of iterations = 1000. [Figure: error per iteration.]

45 [Table: training examples x, their labeled outputs y, and the model's predictions; the values were not recovered in the transcription.]
