Manuel Oviedo de la Fuente and Manuel Febrero Bande


1 Supervised classification methods in R with the fda.usc package Manuel Oviedo de la Fuente and Manuel Febrero Bande Universidade de Santiago de Compostela CNTG (Centro de Novas Tecnoloxías de Galicia). Santiago de Compostela, 3 and 4 October 2014

2 Introduction Multivariate Real Data Example Iris data. This data set gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris. The species are Iris setosa, versicolor and virginica. (Figure: pairwise scatterplots of Sepal.Length, Sepal.Width, Petal.Length and Petal.Width, coloured by Species.)

3 Introduction Multivariate Simulation Data Example Scatterplot of two uniform samples.

4 Introduction Functional Real Data Example Tecator data. 215 spectrometric curves of meat with Fat, Water and Protein contents. Goal: explain the fat content through the spectrometric curves.

data(tecator)
par(mfrow = c(1, 3))
fat15 <- ifelse((y <- tecator$y$Fat) < 15, 2, 4)   # colour code for the two fat groups
boxplot(y, main = "Fat")
plot((x <- tecator$absorp.fdata), col = fat15, main = "Spectrometric: X")
plot((x.d <- fdata.deriv(tecator$absorp.fdata, 1)), col = fat15, main = "Derivative: X.d")

(Figure: boxplot of Fat; the spectrometric curves (Absorbances vs. Wavelength) and their first derivative, coloured by fat group.)

5 Introduction Functional Simulation Data Example (Figure: three panels, Model 1, Model 2 and Model 3.) A sample of 40 functions (Ornstein-Uhlenbeck process with Gaussian error) for every simulation model, along with the means of each sub-group (G1, black lines, and G2, red lines).
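Samples of this kind can be generated directly in fda.usc with rproc2fdata; the sketch below is not from the slides, and the sample size, grid and mean shift are arbitrary choices:

tt <- seq(0, 1, len = 101)
X1 <- rproc2fdata(20, t = tt, sigma = "OU")                  # group G1
X2 <- rproc2fdata(20, t = tt, mu = 0.5 * tt, sigma = "OU")   # group G2, shifted mean
plot(X1, col = 1, ylim = c(-2, 2))
lines(X2, col = 2)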

6 Introduction Some definitions of Functional Data Functional data analysis is a branch of statistics that analyzes data providing information about curves, surfaces or anything else varying over a continuum. The continuum is often time, but may also be spatial location, wavelength, probability, etc. Functional data analysis is a branch of statistics concerned with analysing data in the form of functions [Ferraty and Vieu, 2006].
Definition 1. A random variable X is called a functional variable if it takes values in a functional space E, a complete normed (or seminormed) space.
Definition 2. A functional dataset {X_1, ..., X_n} is the observation of n functional variables X_1, ..., X_n identically distributed as X.

7 Introduction Example of functional dataset in fda.usc Load the library fda.usc [Febrero-Bande and Oviedo de la Fuente, 2012]. An object of class fdata is a list with the following components:
data: typically a matrix of dimension (n x m) which contains a set of n curves discretized in m points or argvals.
argvals: locations of the discretization points, by default {t_1 = 1, ..., t_m = m}.
rangeval: range of the discretization points.
names: list with main, an overall title; xlab, a title for the x axis; and ylab, a title for the y axis.

library(fda.usc)
data(tecator)
class(tecator$absorp.fdata)
[1] "fdata"
names(tecator$absorp.fdata)
[1] "data" "argvals" "rangeval" "names"
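An fdata object can also be built directly from a matrix of discretized curves with the fdata() constructor; a minimal sketch (the simulated curves below are not part of the slides):

tt <- seq(0, 1, len = 51)
m  <- outer(1:10, tt, function(i, t) sin(2 * pi * t) + 0.1 * i)   # 10 shifted sine curves
fX <- fdata(m, argvals = tt, rangeval = c(0, 1),
            names = list(main = "Simulated curves", xlab = "t", ylab = "X(t)"))
class(fX)   # "fdata"
plot(fX)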

8 Introduction Example of image data: Yoga dataset The dataset was obtained by capturing two actors transitioning between yoga poses in front of a green screen. It has been shown recently that in many domains it can be useful to convert images into pseudo time series. Therefore we have converted the motion capture data into time series by a well-known technique, see [Keogh et al., 2011].

9 Introduction From image data to functional data The dataset was obtained by capturing two actors transitioning between yoga poses in front of a green screen. It has been shown recently that in many domains it can be useful to convert images into pseudo time series. Therefore we have converted the motion capture data into time series by a well-known technique, see [Keogh et al., 2011].

10 Supervised classification Functional supervised classification Aim: how to predict the class Y of a functional variable X.
Bayes rule: given a sample X, the aim is to estimate the posterior probability of belonging to each group g:
p_g(X) = P(Y = g | χ = X) = E(1_{Y=g} | χ = X)
The classification rule: Ŷ = arg max_g p̂_g(X)
The estimate of the posterior probability p_g(X) can be calculated using logistic regression or nonparametric regression. The package allows the estimation of the groups in a training set of functional data by:
k-Nearest Neighbour Classifier: classif.knn
Kernel Classifier: classif.kernel
Logistic Classifier (linear model): classif.glm
Logistic Classifier (additive model): classif.gsam and classif.gkam
Distance Classifier: classif.dist
DD classifier: classif.DD
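A hedged sketch of the nonparametric classifiers on the Tecator curves (the 15% fat threshold matches the later slides; the knn sequence is an arbitrary choice):

data(tecator)
fat15 <- factor(ifelse(tecator$y$Fat < 15, 0, 1))
res.knn    <- classif.knn(fat15, tecator$absorp.fdata, knn = c(3, 5, 7))
res.kernel <- classif.kernel(fat15, tecator$absorp.fdata)
res.knn$prob.classification      # proportion of correctly classified curves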

11 Classification via regression models Generalized Functional Linear Model The scalar response y is estimated from functional covariates {X^q(t)}_{q=1}^{Q} and non-functional covariates Z = {Z^j}_{j=1}^{J} by:
y_i = g( α + Z_i β + Σ_{q=1}^{Q} ⟨ X_i^q(t), β_q(t) ⟩ ) + ε_i
where g(·) is the inverse link function and the ε_i are random errors with mean zero and finite variance σ².
[Ramsay and Silverman, 2005] use a fixed basis representation of X(t) and β(t): B-spline, Fourier, wavelets; create.fdata.basis.
[Cardot et al., 1999] use so-called functional principal components regression (FPC); create.pc.basis.
[Preda et al., 2007] use so-called functional partial least squares regression (FPLS); create.pls.basis.

12 Classification via regression models Generalized Functional Linear Model

ldata <- list("df" = tecator$y,
              "absorp.fdata" = tecator$absorp.fdata,
              "absorp.d" = fdata.deriv(tecator$absorp.fdata))
ldata$df$fat15 <- factor(ifelse(tecator$y$Fat < 15, 0, 1))
res.glm <- classif.glm(fat15 ~ absorp.d, data = ldata)
res.glm

-Call: classif.glm(formula = fat15 ~ absorp.d, data = ldata)
-Probability of correct classification:

res.glm$fit[[1]]

Call: glm(formula = pf)
Coefficients: (Intercept) absorp.d.bspl4.1 absorp.d.bspl4.2 absorp.d.bspl4.3 absorp.d.bspl4.4 absorp.d.bspl4.5
Degrees of Freedom: 214 Total (i.e. Null); 209 Residual
Null Deviance: 298
Residual Deviance: 28.5   AIC: 40.5
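By default classif.glm represents the functional covariate in a B-spline basis; the sketch below (not from the slides) switches to a functional principal-component basis through the basis.x argument, with five components as an arbitrary choice:

basis.pc   <- create.pc.basis(ldata$absorp.d, l = 1:5)      # FPC basis of the derivative curves
res.glm.pc <- classif.glm(fat15 ~ absorp.d, data = ldata,
                          basis.x = list("absorp.d" = basis.pc))
res.glm.pc$prob.classification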

13 Classification via regression models Generalized Functional Additive Model Generalized Functional Spectral Additive Model (FGSAM), [Müller and Yao, 2008]:
y_i = g( α + Σ_{j=1}^{J} f_j(Z_i^j) + Σ_{q=1}^{Q} s_q(X_i^q(t)) ) + ε_i
where f(·), s(·) are smooth functions.

res.gsam <- classif.gsam(fat15 ~ s(absorp.d), data = ldata)
res.gsam

-Call: classif.gsam(formula = fat15 ~ s(absorp.d), data = ldata)
-Probability of correct classification:

res.gsam$fit[[1]]

Family: binomial
Link function: logit
Formula:
[1] "fat15 ~ +s(absorp.d.bspl4.1,k=-1)+s(absorp.d.bspl4.2,k=-1)+s(absorp.d.bspl4.3,k=-1)+s(absorp.d.bs
Estimated degrees of freedom: total = 8.5
UBRE score:
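Non-functional covariates from ldata$df can enter the same formula; a hedged sketch (not from the slides) mixing the Water content, a column of tecator$y, with the derivative curves:

res.gsam2 <- classif.gsam(fat15 ~ s(Water) + s(absorp.d), data = ldata)
res.gsam2$prob.classification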

14 Classification via regression models Generalized Functional Additive Model Generalized Functional Kernel Additive Model (FGKAM), [Febrero-Bande and González-Manteiga, 2013]:
y_i = g( α + Σ_{q=1}^{Q} K(X_i^q(t)) ) + ε_i
where K(·) is the kernel estimator (extends the knn classifier).

res.gkam <- classif.gkam(fat15 ~ absorp.d, data = ldata)
res.gkam

-Call: classif.gkam(formula = fat15 ~ absorp.d, data = ldata)
-Probability of correct classification:

res.gkam$fit[[1]]

Family: binomial
Link function: logit
alpha = -8.9   n = 215
****    ****    ****    ****    ****    ****
                h    cor(f(X), eta)    edf
f(absorp.d)
****    ****    ****    ****    ****    ****
edf: Equivalent degrees of freedom
AIC = 69.3   Deviance explained = 89.8 %   R-sq. = 0.93   R-sq.(adj) = 0.89

15 Classification by depth functions Classification by DD-classifier Make the group classification of a training dataset using the DD-classifier, estimated in the following steps.
Step 1. The function computes the selected depth measure of the points in x (multivariate data or multivariate functional data) w.r.t. a subsample of each of the G groups (levels).
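The depth measures used in Step 1 are also available as standalone functions; a minimal sketch, not from the slides, where the Fraiman-Muniz and modal depths are arbitrary choices:

d.FM   <- depth.FM(tecator$absorp.fdata)     # Fraiman-Muniz depth of each curve
d.mode <- depth.mode(tecator$absorp.fdata)   # h-modal depth of each curve
plot(d.FM$dep, d.mode$dep, xlab = "FM depth", ylab = "modal depth")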

16 Classification by depth functions DD plot Step 2. The function calculates the misclassification rate based on the data depth computed in Step 1, using the following classifiers:
"MaxD": maximum depth.
"DD1", "DD2", "DD3": search for the best separating polynomial of degree 1, 2 or 3.
(Figure: four panels, DD-plot(HS,DD1), DD-plot(HS,DD2), DD-plot(HS,DD3) and DD-plot(HS,MaxD).)
From left to right, top to bottom: DD-plot using the DD1, DD2, DD3 and Maximum Depth classifiers. The one-dimensional depth in all cases is the Tukey depth.
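These options are selected through the classif argument of classif.DD; a hedged sketch fitting a degree-2 separating polynomial to the Tecator curves (ldata as defined on slide 12):

out.dd2 <- classif.DD(ldata$df$fat15, ldata$absorp.fdata,
                      depth = "FM", classif = "DD2")
out.dd2$misclassification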

17 Classification by depth functions DD classifier for Multivariate Data
"glm", "gam": Generalized Linear (or Additive) Models.
"lda", "qda": Linear (or Quadratic) Discriminant Analysis.
"knn", "np": nonparametric k-nearest neighbour (or kernel) classifier.
(Figure: six panels, DD-plot(HS,lda), DD-plot(HS,qda), DD-plot(HS,knn), DD-plot(HS,glm), DD-plot(HS,gam) and DD-plot(HS,tree).)
From left to right, top to bottom: DD-plot using the LDA, QDA, knn, GLM, GAM and tree classifiers. The one-dimensional depth in all cases is the Tukey depth.
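classif.DD also covers the multivariate case directly; a hedged sketch on the iris measurements of slide 2 (the Mahalanobis depth and LDA on the DD-plot are arbitrary choices):

data(iris)
out.iris <- classif.DD(iris$Species, iris[, 1:4],
                       depth = "MhD", classif = "lda")
out.iris$misclassification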

18 Classification by depth functions DD classifier for functional data

par(mfrow = c(1, 2))
out1 <- classif.DD(ldata$df$fat15, ldata$absorp.fdata, classif = "gam")
out2 <- classif.DD(ldata$df$fat15, ldata$absorp.d, classif = "gam")

(Figure: DD-plot(FM,gam) for the original curves and for their first derivative.)

out1$misclassification; out2$misclassification
[1]
[1] 0.07
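Several functional covariates can contribute depths at once by passing a list of fdata objects; a hedged sketch combining the curves and their derivative (whether this exact call is accepted depends on the fda.usc version):

out.both <- classif.DD(ldata$df$fat15,
                       list(ldata$absorp.fdata, ldata$absorp.d),
                       depth = "FM", classif = "glm")
out.both$misclassification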

19 Conclusions Functional discrimination extends logistic regression (GLM, GAM) using a basis representation of the functional data (B-spline, Fourier, wavelets, PC, PLS, ...).
The DDG classifier converts the functional data into a multivariate dataset whose columns are constructed using depths, and the new classifiers are classical multivariate classifiers based on discrimination procedures (LDA, QDA) or regression ones (knn, NP, GLM, GAM). More classifiers could be considered here (SVM, neural networks, ...).
Several depth procedures can be taken into account at the same time, in order to improve the classification or to diagnose whether a depth contains useful information for the classification process.
The DDG classifier "trick" is especially interesting in a functional or high-dimensional framework because it changes the dimension of the classification problem from infinite or large dimension to G, where G depends only on the number of groups and the number of depths employed.
The functions needed to perform this presentation are freely available at CRAN in the fda.usc package ([Febrero-Bande and Oviedo de la Fuente, 2012]) for versions higher than 1.2.0.

20 Cardot, H., Ferraty, F. and Sarda, P. (1999). Functional linear model. Statistics and Probability Letters, 45(1), 11-22.
Cuesta-Albertos, J.A., Febrero-Bande, M. and Oviedo de la Fuente, M. The DDG-classifier in the functional setting. Submitted.
Cuevas, A., Febrero, M. and Fraiman, R. (2007). Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics, 22(3), 481-496.
Fraiman, R. and Muniz, G. (2001). Trimmed means for functional data. Test, 10(2), 419-440.
Febrero-Bande, M. and Oviedo de la Fuente, M. (2012). Statistical computing in functional data analysis: The R package fda.usc. Journal of Statistical Software, 51(4), 1-28.
Febrero-Bande, M. and González-Manteiga, W. (2013). Generalized additive models for functional data. TEST, 22(2), 278-292.
Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis. Springer Series in Statistics, New York.
Keogh, E., Zhu, Q., Hu, B., Hao, Y., Xi, X., Wei, L. and Ratanamahatana, C. A. (2011). The UCR Time Series Classification/Clustering Homepage: www.cs.ucr.edu/~eamonn/time_series_data/.
Müller, H.G. and Stadtmüller, U. (2005).

21 Generalized functional linear models. Annals of Statistics, 33(2), 774-805.
Müller, H.G. and Yao, F. (2008). Functional additive models. Journal of the American Statistical Association, 103(484), 1534-1544.
Preda, C., Saporta, G. and Lévéder, C. (2007). PLS classification of functional data. Computational Statistics, 22(2), 223-235.
Ramsay, J. and Silverman, B. (2005). Functional Data Analysis. Springer.
Ripley, B. (1996). Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge.
Wood, S. N. (2004). Stable and efficient multiple smoothing parameter estimation for generalized additive models. Journal of the American Statistical Association, 99(467), 673-686.
