Function Algorithms: Linear Regression, Logistic Regression


CS 4510/9010: Applied Machine Learning
Function Algorithms: Linear Regression, Logistic Regression
Paula Matuszek, Fall 2016
Some of these slides originated from Andrew Moore's tutorials at http://www.cs.cmu.edu/~awm/tutorials.html

Linear models for classification
Any regression technique can be used for classification.
Binary classification: a line separates the two classes. The decision boundary defines where the decision changes from one class value to the other.
A prediction is made by plugging the observed attribute values into the regression expression: predict one class if the output >= cutoff, and the other class if the output < cutoff.
(Data Mining: Practical Machine Learning Tools and Techniques, Chapter 3)
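The cutoff idea above can be sketched in plain Python (not Weka): fit a one-attribute least-squares regression to 0/1 class labels, then classify with a cutoff of 0.5. The data points and the 0.5 cutoff are illustrative assumptions, not taken from the slides.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = w0 + w1*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx
    return w0, w1

# Made-up petal lengths, with the class coded 0 or 1
xs = [1.3, 1.4, 1.5, 1.6, 4.5, 4.7, 4.9, 5.1]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
w0, w1 = fit_simple_regression(xs, ys)

def classify(x, cutoff=0.5):
    # Predict class 1 if the regression output is at or above the cutoff
    return 1 if w0 + w1 * x >= cutoff else 0

print([classify(x) for x in [1.4, 5.0]])  # -> [0, 1]
```

Note that the regression output here is just a score on either side of the cutoff, not a probability; that limitation is exactly what motivates logistic regression below.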

Weka with a binary class
LinearRegression can only be run if the class is numeric. NominalToNumeric is an unsupervised attribute filter; filters don't run on the class attribute.
With irispetal.arff, the resulting predictions are close to 0 or 1.
There is a filter to add classifier results to the attributes: AddClassification, a supervised attribute filter. It does exactly the same thing as the classifier. We can then use the predicted nominal class (predictedClassification) to predict the original class, with something as simple as OneR.

Classification
For multiple classes, use a one-vs-many comparison.
Training: perform a regression for each class, setting the output to 1 for training instances that belong to the class and 0 for those that don't.
Prediction: predict the class corresponding to the model with the largest output value (the membership value).
This is sometimes called multi-response linear regression.
(Data Mining: Practical Machine Learning Tools and Techniques, Chapter 4)
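The prediction step of multi-response regression can be sketched as picking the class whose model produces the largest output. The per-class models below are hypothetical hard-coded linear expressions standing in for already-fitted regressions; the class names and weights are made up for illustration.

```python
def predict_multi_response(models, attributes):
    """models: dict mapping class name -> function(attributes) -> membership value.
    Returns the class whose model gives the largest output."""
    return max(models, key=lambda c: models[c](attributes))

# Hypothetical fitted per-class regressions on a single attribute
models = {
    "setosa":     lambda a: 1.2 - 0.30 * a[0],
    "versicolor": lambda a: 0.1 + 0.15 * a[0],
    "virginica":  lambda a: -0.6 + 0.28 * a[0],
}

print(predict_multi_response(models, [1.4]))  # -> setosa
print(predict_multi_response(models, [6.0]))  # -> virginica
```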

Logistic Regression
Linear regression methods predict a numeric attribute; we can use the result as a classifier by setting a cutoff score. But this is complicated: the numeric predictions are not probabilities, since they can be negative or greater than 1.
If our outcome attribute is nominal rather than numeric, we more typically use a logistic regression model instead, which predicts the probability that an instance falls into each class. Sometimes we would prefer to know the relative probabilities: Will it rain? Does this person have diabetes?

Logistic Regression
We still use a linear sum of weighted attributes to predict the class, but it is transformed using a logit transformation:
P(class1 | a1, a2, ..., ak) = 1 / (1 + e^-(w0 + w1*a1 + w2*a2 + ... + wk*ak))
This yields a value between 0 and 1. The weights are chosen to maximize the log-likelihood.
https://www.ling.upenn.edu/~joseff/rstudy/week7.html
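The logit transformation above is easy to check directly in Python. The weight values here are illustrative assumptions, not weights fitted to any real dataset; the point is only that the linear sum gets squashed into (0, 1).

```python
import math

def logistic_probability(w, a):
    """P(class1 | a) = 1 / (1 + e^-(w0 + w1*a1 + ... + wk*ak)).
    w[0] is the intercept; w[1:] pairs with the attribute values in a."""
    z = w[0] + sum(wi * ai for wi, ai in zip(w[1:], a))
    return 1.0 / (1.0 + math.exp(-z))

w = [-4.0, 1.5]  # illustrative weights: intercept and one attribute weight
print(round(logistic_probability(w, [1.0]), 3))  # -> 0.076 (low probability)
print(round(logistic_probability(w, [5.0]), 3))  # -> 0.971 (high probability)
```

Unlike the raw linear-regression score, these outputs can never fall below 0 or rise above 1, which is what lets us read them as class probabilities.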

Logistic Regression in Weka
Weka's Logistic classifier expects a nominal class, not a numeric one, and gives the usual evaluation statistics for a nominal class: accuracy, confusion matrix, precision, recall, etc.
It may be especially interesting to look at the individual instances. With vote.arff (democrat or republican) the class is binary, so there is a single model, giving the probability of democrat. Predictions on the test data give P(class) and the errors; most predictions are near 1, and values well below 1 are often errors.

Logistic Regression, Multi-class
glass.arff has seven possible class values, which gives six models, each asking: does this instance belong to this class? If more than one model says yes, take the one with the highest probability; if none of them says yes, predict the seventh class (headlamps).
We get a model for vehicle windows non-float, but there are no instances of it in the training data and no predicted instances, so that model is not meaningful.
The usual evaluation statistics apply.
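The decision rule just described can be sketched as follows: given per-class probabilities from six already-fitted models (the values below are made up), take the highest class that says "yes" (here assumed to mean probability >= 0.5), and fall back to the seventh class if none does.

```python
def predict_glass(probs, default_class="headlamps"):
    """probs: dict mapping class name -> probability from that class's model.
    Pick the highest-probability 'yes'; otherwise the default class."""
    yes = {c: p for c, p in probs.items() if p >= 0.5}
    if yes:
        return max(yes, key=yes.get)
    return default_class

# Hypothetical per-class probabilities for one glass instance
probs = {"building float": 0.7, "building non-float": 0.6,
         "vehicle float": 0.1, "vehicle non-float": 0.0,
         "containers": 0.2, "tableware": 0.1}

print(predict_glass(probs))                    # two models say yes; highest wins
print(predict_glass({c: 0.1 for c in probs}))  # none says yes -> headlamps
```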

Logistic Regression: summary
Most typically used when the class is nominal and the other attributes are numeric. Like linear regression, it is not especially sensitive to irrelevant attributes, but it still normally assumes linearity (of the log-odds in the attributes).