Applied Statistics and Machine Learning
1 Applied Statistics and Machine Learning: Logistic Regression and Generalized Linear Models; Model Selection, Lasso, and Structured Sparsity. Bin Yu, IMA, June 19, 2013
2 Classification is supervised learning. The Y's are 0s and 1s: Challenger data, MISR data. Task: relate the predictors x with Y for (1) prediction (IT sector, banking, etc.) and (2) interpretation: what are the important predictors? They are suggestive of interventions, for causal inference later.
3 Challenger data. These data are from Table 1 of the article "Risk Analysis of the Space Shuttle: Pre-Challenger Prediction of Failure" by Dalal, Fowlkes and Hoadley, Journal of the American Statistical Association, Vol. 84, No. 408 (Dec. 1989), pp. I got them from Professor Stacey Shancock's website at Clark University. She has this Tukey quote at her site: "Far better an approximate answer to the right question, than the exact answer to the wrong question, which can always be made precise." - John Tukey
4 Challenger data. Temp: temperature at launch. Failure: number of O-rings that failed. Failure1: indicator of O-ring failure or not.
5 Challenger data (jittered)
6 Suppose you are the first person who ever thought about the classification problem. What would you do? A method to fit the data, to come up with a prediction rule for the next launch at temp*. Postulate a statistical model (e.g. a normal regression model) for uncertainty statements.
7 Suppose you are the first person who ever thought about the classification problem. Fitting methods: nearest neighbor (NN), LS, logistic regression. How to fit? What is the criterion to fit? MAXIMUM LIKELIHOOD. In the derivations to come, we use notation and some materials from Dobson (2001).
8 Logistic regression model. What does this model mean for the Challenger data? What assumptions might be violated? Which are reasonable?
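As a concrete sketch of the model (the temperatures and outcomes below are made up for illustration, not the actual Challenger table), the logistic model posits P(Y=1|x) = 1/(1+exp(-(β0+β1 x))), and fitting means maximizing the Bernoulli log-likelihood:

```python
import numpy as np

# Hypothetical Challenger-style data: launch temperature (F) and O-ring failure.
temp = np.array([53, 57, 63, 66, 70, 75, 78, 81], dtype=float)
fail = np.array([1, 1, 1, 0, 1, 0, 0, 0], dtype=float)

def logistic_loglik(beta0, beta1, x, y):
    """Log-likelihood of the logistic model P(Y=1|x) = 1/(1+exp(-(b0+b1*x)))."""
    eta = beta0 + beta1 * x              # linear predictor
    p = 1.0 / (1.0 + np.exp(-eta))      # success probability
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# A negative slope (failure more likely when cold) fits this toy data better
# than the constant-probability model with slope 0.
print(logistic_loglik(10.0, -0.16, temp, fail) > logistic_loglik(0.0, 0.0, temp, fail))
```

The MLE is the (β0, β1) maximizing this function; the slides below derive it via Newton's method (IRWLS).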
9 Logistic regression model
10 Logistic regression model
11 Logistic regression model. Note that a linear approximation to the score U is equivalent to a quadratic approximation to the log-likelihood ℓ(β).
12 Logistic regression model
13 Logistic regression model
14 Logistic regression model vs LS for Challenger data
15 Logistic regression vs LS for Challenger data
16 Generalized Linear Models (GLMs). GLMs are a statistical framework that unifies normal regression models (for continuous data), logistic (probit) regression models (for binary data), and log-linear models (for count data). Logistic (probit) regression models can also be used for multi-class classification. Original paper on GLMs: Nelder & Wedderburn (1972), GLMs, Journal of the Royal Statistical Society, Series A (JRSS-A). Books: A. J. Dobson (2001): An Introduction to GLMs (2nd ed); P. McCullagh & J. A. Nelder (1999): Generalized Linear Models (2nd ed).
17 Exponential families:
18 A sketch of proof. HW: derive the formula for V(a(Y)).
19 GLMs: the link function is the key to relating the response variable to β in a linear way.
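A minimal sketch of the idea: the link g carries the mean μ of Y to the linear scale η = xᵀβ, so g(μ) is modeled linearly. The logit and log links below are the canonical links for binary and count data:

```python
import numpy as np

# The link g maps the mean mu of Y to the linear predictor eta = x^T beta.
logit = lambda mu: np.log(mu / (1 - mu))       # canonical link for binary data
inv_logit = lambda eta: 1 / (1 + np.exp(-eta)) # its inverse (the mean function)

log_link = lambda mu: np.log(mu)               # canonical link for Poisson counts

# The link and its inverse round-trip the mean:
mu = 0.73
print(np.isclose(inv_logit(logit(mu)), mu))    # True
```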
20 Likelihood function of a GLM
21 Maximum Likelihood Estimation (MLE)
22 Parametric bootstrap for GLMs. Fit a GLM with MLE and then take n samples from the fitted distribution. Note that this is not the same as the bootstrap described for the regression model, where we sample from the residuals rather than from the fitted normal distribution of the residuals. The parametric bootstrap works for nice parametric families, typically when asymptotic normality holds. For example, it does not work for Unif(0, a) with a as the parameter. The non-parametric bootstrap samples directly from the observed data.
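The resampling step can be sketched for a logistic GLM: given fitted probabilities p̂ (the values below are made up; in practice they come from the MLE fit), draw bootstrap responses from the fitted Bernoulli distribution and recompute the statistic of interest:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose a logistic GLM has been fit, giving fitted probabilities p_hat
# (hypothetical values, standing in for an actual MLE fit).
p_hat = np.array([0.9, 0.7, 0.5, 0.3, 0.2, 0.1])

B = 2000
boot_stats = np.empty(B)
for b in range(B):
    y_star = rng.binomial(1, p_hat)   # resample responses from the FITTED distribution
    boot_stats[b] = y_star.mean()     # recompute the statistic of interest

# The bootstrap distribution of the statistic centers near the fitted mean rate.
print(abs(boot_stats.mean() - p_hat.mean()) < 0.02)
```

In a full analysis one would refit the GLM to each bootstrap sample y* and use the spread of the refitted coefficients for uncertainty statements.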
23 How to compute the MLE in GLMs: IRWLS
24 IRWLS algorithm for MLE in GLMs
25 Statistical interpretation of the IRWLS algorithm: each iterate is the solution to a weighted LS problem with a weight vector.
26 Statistical interpretation of IRWLS
27 IRWLS in words. IRWLS is an iterative algorithm. At each iteration, IRWLS solves a WLS problem that corresponds to the log-likelihood of a heteroscedastic linear regression model in the g-domain (where g(mu_i) is approximately linear in β), for which the variances of the g(y_i) are known from the previous iteration.
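For the logistic case (canonical logit link) the description above becomes concrete: the weights are w_i = p_i(1-p_i) and the working response is z_i = η_i + (y_i - p_i)/w_i, both computed from the previous iterate. A minimal numpy sketch, on hypothetical data:

```python
import numpy as np

def irwls_logistic(X, y, n_iter=25):
    """IRWLS (Newton-Raphson) for the logistic MLE.
    Each iteration solves a weighted LS problem with weights
    w_i = p_i(1-p_i) and working response z, known from the previous iterate."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta
        p = 1 / (1 + np.exp(-eta))
        w = p * (1 - p)                  # variance weights from previous iterate
        z = eta + (y - p) / w            # working (adjusted) response in the g-domain
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, X.T @ (w * z))   # one WLS solve
    return beta

# Tiny hypothetical example: intercept + one predictor, non-separable responses.
x = np.array([1., 2., 3., 4., 5., 6.])
y = np.array([0., 0., 1., 0., 1., 1.])
X = np.column_stack([np.ones_like(x), x])
beta_hat = irwls_logistic(X, y)
p_fit = 1 / (1 + np.exp(-(X @ beta_hat)))
# At the MLE the score equations X^T (y - p) = 0 hold.
print(np.allclose(X.T @ (y - p_fit), 0, atol=1e-5))
```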
28 Back to the Movie-fMRI Data (Nishimoto et al., 11). 7200s training (1 replicate) and 5400s test (10 replicates).
29 How do stimuli evoke brain signals? Quantitative models: both stimulus and response are high-dimensional. Encoding model and decoding model between the natural input (image or video/movie) and the fMRI of the brain. Dayan and Abbott (2005).
30 Movie reconstruction results for 3 subjects
31 Mind-Reading Computers in the Media: one of the 50 best inventions of 2011 by Time Magazine. Others: Economist, NPR, ...
32 What model is behind the movie reconstruction algorithm? Is the model interpretable and reliable?
33 Domain knowledge: key for big data discovery. Hubel and Wiesel (1959) discovered, in neuron cells of the primary visual cortex area V1, orientation and location selectivity, and excitatory and inhibitory regions.
34 Modern description of the Hubel-Wiesel work: early visual area V1. Preprocessing an image: Gabor filters corresponding to particular spatial frequencies, locations, and orientations (Hubel and Wiesel, 1959). Sparse representation after Gabor filters, static or dynamic.
35 2D Gabor Features
36 3D Gabor Features. Data split into 1-second movie clips. 3D Gabor filters applied to get features of a movie clip in 26K dimensions.
37 Regularization: key for big data discovery. Two regularization methods are behind the movie reconstruction algorithm, after tons of work on feature construction based on domain knowledge by humans: (1) Encoding through L1-penalized Least Squares (LS), or Lasso: a separate sparse linear model is fitted to the features for each voxel via Lasso (Chen and Donoho, 94; Tibshirani, 96) (cf. e-L2boosting). (2) Decoding through L2 or Tikhonov regularization of the sample covariance matrix of residuals across voxels (cf. kernel machines).
38 Given a voxel: n = 7K, p = 26K image 3D wavelet features (each of which has a location). For each frame, the response is the fMRI signal at the voxel. An underdetermined problem, since p is much larger than n.
39 Movie-fMRI: linear encoding model for a voxel. For each voxel and the i-th movie clip, we postulate a linear encoding (regression) model: Y_i = β_1 x_i1 + β_2 x_i2 + ... + β_p x_ip + ε_i = X_i^T β + ε_i, where X_i = (x_i1, x_i2, ..., x_ip)^T is the feature vector of the movie clip, β = (β_1, β_2, ..., β_p)^T is the weight vector that combines feature strengths into the mean fMRI response, ε_i is the disturbance or noise term, and Y_i is the fMRI response.
40 Movie-fMRI: finding the weight vector. Least Squares: find β_1, β_2, ..., β_p to minimize sum_{i=1}^n (Y_i - β_1 x_i1 - β_2 x_i2 - ... - β_p x_ip)^2 := ||Y - Xβ||^2. Since p = 26,000 >> n = 7,200, this LS problem has many solutions.
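A tiny numerical illustration of this non-uniqueness (simulated data, not the fMRI features): when p > n, any null-space direction of X can be added to a least-squares solution without changing the fit at all.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 5, 8                        # more coefficients than observations
X = rng.standard_normal((n, p))
Y = rng.standard_normal(n)

# One LS solution: the minimum-norm solution via the pseudoinverse.
beta_min = np.linalg.pinv(X) @ Y

# Add any element of the null space of X: the fitted values are unchanged.
_, _, Vt = np.linalg.svd(X)
null_dir = Vt[-1]                  # a direction with X @ null_dir = 0
beta_other = beta_min + 10.0 * null_dir

print(np.allclose(X @ beta_min, Y), np.allclose(X @ beta_other, Y))
```

Two very different coefficient vectors give identical predictions on the training data, so LS alone cannot identify the weights.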
41 Why doesn't LS work when p >> n? Reason: collinearity of the columns of X -- it also happens in low dimensions. Least squares function surfaces as a function of (β_1, β_2).
42 How to fix this problem? In general, impossible. However, in our case, for each voxel, Hubel and Wiesel's work suggests that only a small number of the 26,000 predictors are active -- sparsity! This prior information motivates a sparsity-enforced revision to LS: Lasso = Least Absolute Shrinkage and Selection Operator.
43 Modeling history at the Gallant Lab. Prediction on a validation set is the benchmark. Methods tried: neural nets, SVMs, Lasso, ... Among models with similar predictions, simpler (sparser) models by Lasso are preferred for interpretation. This practice reflects a general trend in statistical machine learning -- moving from prediction to simpler/sparser models for interpretation, faster computation, or data transmission.
44 Occam's Razor. 14th-century English logician and Franciscan friar, William of Ockham. Principle of Parsimony: "Entities must not be multiplied beyond necessity." (Wikipedia)
45 Occam's Razor via model selection in linear regression. Maximum likelihood (ML) is LS under the Gaussian assumption. There are 2^p submodels; ML goes for the largest submodel, with all predictors. The largest model often gives bad prediction when p is large.
46 Model selection criteria. Akaike (73, 74) and Mallows' (1973) Cp used estimated prediction errors to choose a model (assuming σ² is known). Schwarz (1978) used asymptotic approximations to negative log posterior probabilities to choose a model (assuming σ² is known). Both are penalized LS. Rissanen's Minimum Description Length (MDL) principle gives rise to many different criteria; the two-part code leads to BIC (see e.g. Rissanen (1978) and the review article by Hansen and Yu (2000)).
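Both criteria can be sketched in their common Gaussian-likelihood form, AIC = n log(RSS/n) + 2k and BIC = n log(RSS/n) + k log n, where k is the submodel size (simulated data below; only two of the four predictors truly matter):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
X = rng.standard_normal((n, 4))
Y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(n)  # only columns 0, 1 matter

def fit_rss(cols):
    """Residual sum of squares of the LS fit on a subset of columns."""
    Xs = X[:, cols]
    beta, *_ = np.linalg.lstsq(Xs, Y, rcond=None)
    r = Y - Xs @ beta
    return r @ r

def aic(cols):  # Akaike: n*log(RSS/n) + 2k
    return n * np.log(fit_rss(cols) / n) + 2 * len(cols)

def bic(cols):  # Schwarz: n*log(RSS/n) + k*log(n)
    return n * np.log(fit_rss(cols) / n) + len(cols) * np.log(n)

# RSS alone always favors the biggest model...
print(fit_rss([0, 1, 2, 3]) <= fit_rss([0, 1]))
# ...but the criteria penalize complexity, and both punish omitting a real predictor.
print(aic([0, 1]) < aic([0]), bic([0, 1]) < bic([0]))
```

The `cols` sets here stand in for the submodels; full best-subset selection would evaluate the criterion over all 2^p subsets.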
47 More details on AIC
48 More details on AIC
49 More details on AIC. PE = expected Prediction Error.
50 More details on AIC. When p increases, the prediction error PE increases, because a more complex model is being estimated, with an associated larger variance. How can RSS be used to estimate PE?
51 More details on AIC
52 More details on BIC
53 Model selection for the movie-fMRI problem. For the linear encoding model, the number of submodels is 2^p. Combinatorial search: too expensive, and often not necessary. A recent alternative: continuous embedding into a convex optimization problem through L1-penalized LS (Lasso).
54 Lasso: the L1 norm as a penalty on the L2 loss. The L1 penalty ||β||_1 is defined on the coefficients. Used initially with the L2 loss: signal processing: Basis Pursuit (Chen & Donoho, 1994); statistics: Non-Negative Garrote (Breiman, 1995); statistics: Lasso (Tibshirani, 1996). The estimate is β̂(λ) = argmin_β { ||Y - Xβ||² + λ||β||_1 }. The smoothing parameter λ is often selected by Cross-Validation (CV).
55 Lasso eases the instability problem of LS. Lasso function surfaces as a function of (β_1, β_2).
56 Recall: why doesn't LS work when p >> n? Reason: collinearity of the columns of X -- it also happens in low dimensions. Least squares function surfaces as a function of (β_1, β_2).
57 Lasso: computation. Initially: quadratic programming (QP), with one QP solved for each λ on a grid. Later: path-following algorithms, such as homotopy by Osborne et al. (2000) and LARS by Efron et al. (2004). Current: first-order or gradient-based algorithms for large p (see Mairal's lecture).
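A minimal sketch of a first-order Lasso solver (ISTA, proximal gradient): take a gradient step on the smooth LS part, then apply coordinatewise soft-thresholding, which is the proximal operator of the L1 penalty. Data below are simulated, not the fMRI features.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox operator of t*||.||_1: coordinatewise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(X, Y, lam, n_iter=1000):
    """Proximal-gradient (ISTA) solver for
    min_beta 0.5*||Y - X beta||^2 + lam*||beta||_1."""
    L = np.linalg.norm(X, 2) ** 2        # Lipschitz constant of the LS gradient
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - Y)      # gradient of the smooth part
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(0)
n, p = 50, 100                           # p > n: LS is underdetermined, Lasso is not
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]         # only 3 active predictors
Y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat = lasso_ista(X, Y, lam=10.0)
print(np.sum(beta_hat != 0) < p)         # sparse: many coefficients are exactly zero
```

The soft-thresholding step is what produces exact zeros, which a plain gradient method on the LS loss never does.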
58 Recent theory on Lasso (more in my last lecture). Under a sparse high-dimensional linear regression model and appropriate conditions: Lasso is model-selection consistent (irrepresentable condition); Lasso has the optimal L2 estimation rate (restricted eigenvalue condition). Selective references: Freund and Schapire (1996), Chen and Donoho (1994), Tibshirani (1996), Friedman (2001), Efron et al. (2004), Zhao and Yu (2006), Meinshausen and Buhlmann (2006), Wainwright (2009), Candes and Tao (2005), Meinshausen and Yu (2009), Huang and Zhang (2009), Bickel, Ritov and Tsybakov (2009), Raskutti, Wainwright, and Yu (2010), Negahban et al. (2012).
59 Encoding: energy-motion model necessary
60 Encoding: sparsity necessary. Sparse regression improves prediction over OLS. [Figure: prediction accuracy on full-brain data, linear regression vs. sparse regression]
61 Knowledge discovery: interpreting encoding models. Spatial locations of selected features are suggestive of the driving factors for brain activity at a voxel. [Figure: features selected by Lasso+CV (CV = cross-validation) for Voxels A, B, C; prediction scores on Voxels A-C are 0.72 (CV)]
62 ES-CV: Estimation Stability (ES) (Lim & Yu, 2013). Given a smoothing parameter λ, divide the data units into M blocks. Get a Lasso estimate β̂_m(λ) from the data with the m-th block deleted, and X β̂_m(λ) for the mean regression function. Let β̂(λ) = (1/M) Σ_m β̂_m(λ). Define the estimation stability (ES) measure as ES(λ) = (1/M) Σ_m ||X β̂_m(λ) - X β̂(λ)||² / ||X β̂(λ)||², which is the reciprocal of a test statistic for testing H_0: Xβ = 0.
63 ES-CV (or SSCV): Estimation Stability (ES) + CV (continued). ES-CV selection criterion for the smoothing parameter λ: choose the λ that minimizes ES(λ) and is not smaller than the CV selection. Related works: Shao (95), Breiman (96), Bach (08), Meinshausen and Buhlmann (2008), ...
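The ES(λ) formula above is easy to compute once the M block-deleted estimates are in hand. A sketch with hypothetical estimates (standing in for the Lasso fits β̂_m(λ)) shows that near-identical estimates give a small ES value and widely varying ones a large value:

```python
import numpy as np

def es_measure(X, betas):
    """Estimation-stability measure: average squared distance of the M
    block-deleted fits X @ beta_m from their average fit, normalized
    by the squared norm of the average fit."""
    fits = np.stack([X @ b for b in betas])   # M x n matrix of fitted values
    avg = fits.mean(axis=0)                   # equals X @ mean(beta_m) by linearity
    return np.mean(np.sum((fits - avg) ** 2, axis=1)) / np.sum(avg ** 2)

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))

# Hypothetical block-deleted estimates: nearly identical vs. highly variable.
b0 = np.array([1.0, -2.0, 0.0, 0.5, 0.0])
stable = [b0 + 0.01 * rng.standard_normal(5) for _ in range(5)]
unstable = [b0 + 1.0 * rng.standard_normal(5) for _ in range(5)]

print(es_measure(X, stable) < es_measure(X, unstable))   # smaller ES = more stable
```

ES-CV would evaluate this over a λ grid and pick the minimizer among λ values at or above the CV choice.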
64 Back to the fMRI problem: spatial locations of selected features. [Figure: features selected by CV and ES-CV for Voxels A, B, C.] Prediction on Voxels A-C: CV 0.72, ES-CV 0.7.
65 ESCV: sparsity gain (60%) with no prediction loss (-1.3%). [Table: prediction (correlation) and model size, SSCV vs. CV with % change; based on validation data for 2088 voxels]
66 ES-CV: desirable properties. CV (cross-validation) is widely used in practice. ES-CV is an effective improvement over CV in stability, and hence in the interpretability and reliability of results. Its computational cost is similar to CV's, and it is as easily parallelizable as CV. It is empirically sensible and nonparametric. Other forms of perturbation include: sub-sampling, bootstrap, variable permutation, ...
67 Structured sparsity: Composite Absolute Penalties (CAP) (Zhao, Rocha and Yu, 09). Motivations: side information available on the predictors and/or sparsity at the group level; extra regularization needed (p >> n) beyond what Lasso provides. Examples of groups: genes belonging to the same pathway; categorical variables represented by dummies; noisy measurements of the same variable. Examples of hierarchy: multi-resolution/wavelet models; interaction terms in factorial analysis (ANOVA); order selection in Markov chain models. Existing works can be seen as special cases of CAP: Elastic Net (Zou & Hastie, 05), GLasso (Yuan & Lin, 06), Blockwise Sparse Regression (Kim, Kim & Kim, 2006).
68 [Figure: unit balls of the L_γ (bridge) norm for bridge parameter γ ≥ 1]
69 Composite Absolute Penalties (CAP). The CAP parameter estimate is given by penalized LS with penalty T(β). The G_k's, k = 1, ..., K, are the indices of the k-th pre-defined group, and β_{G_k} is the corresponding vector of coefficients. The group L_{γ_k} norm is N_k = ||β_{G_k}||_{γ_k}, and the overall norm is T(β) = ||(N_1, ..., N_K)||_{γ_0}. Groups may overlap (hierarchical selection).
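The two-level norm T(β) can be sketched directly from the definition above; with non-overlapping groups, γ_k = 2 and γ_0 = 1 it reduces to the group-Lasso penalty (the β and groups below are an illustrative toy example):

```python
import numpy as np

def cap_penalty(beta, groups, gammas, gamma0=1.0):
    """Composite Absolute Penalty: T(beta) = ||(N_1, ..., N_K)||_{gamma0},
    where N_k = ||beta_{G_k}||_{gamma_k} is the k-th group norm."""
    norms = np.array([np.linalg.norm(beta[g], ord=gk)
                      for g, gk in zip(groups, gammas)])
    return np.linalg.norm(norms, ord=gamma0)

beta = np.array([3.0, 4.0, 0.0, 5.0])
groups = [[0, 1], [2, 3]]            # two non-overlapping pre-defined groups
# gamma_k = 2, gamma_0 = 1 recovers the group-Lasso penalty:
print(cap_penalty(beta, groups, [2, 2], 1.0))   # sqrt(9+16) + sqrt(0+25) = 10.0
```

Overlapping groups (for the hierarchical structures of the next slide) plug into the same function unchanged, since each N_k is computed from its own index set.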
70 CAP: group selection. Tailoring T(β) for group selection: define non-overlapping groups and set γ_k > 1. The group norm exponent γ_k tunes similarity within its group; γ_k > 1 encourages all variables in group k to be included or excluded together. Setting γ_0 = 1 yields grouped sparsity. γ_k = 2 has been studied by Yuan and Lin (Group Lasso, 2006).
71 CAP: hierarchical structures. Tailoring T(β) for hierarchical structure: set γ_0 = 1, set γ_k > 1 for all k, and let the groups overlap. If β_2 appears in all groups where β_1 is included, then X_2 is encouraged to enter the model after X_1.
72 CAP: a Bayesian interpretation. For non-overlapping groups: a prior on the group norms, combined with a prior on the individual coefficients.
Sparse Coding and Dictionary Learning for Image Analysis Part IV: Recent Advances in Computer Vision and New Models Francis Bach, Julien Mairal, Jean Ponce and Guillermo Sapiro CVPR 10 tutorial, San Francisco,
More informationMachine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums
Machine Learning Techniques for Detecting Hierarchical Interactions in GLM s for Insurance Premiums José Garrido Department of Mathematics and Statistics Concordia University, Montreal EAJ 2016 Lyon, September
More informationVoxel selection algorithms for fmri
Voxel selection algorithms for fmri Henryk Blasinski December 14, 2012 1 Introduction Functional Magnetic Resonance Imaging (fmri) is a technique to measure and image the Blood- Oxygen Level Dependent
More informationEffectiveness of Sparse Features: An Application of Sparse PCA
000 001 002 003 004 005 006 007 008 009 010 011 012 013 014 015 016 017 018 019 020 021 022 023 024 025 026 027 028 029 030 031 032 033 034 035 036 037 038 039 040 041 042 043 044 045 046 047 048 049 050
More informationData mining techniques for actuaries: an overview
Data mining techniques for actuaries: an overview Emiliano A. Valdez joint work with Banghee So and Guojun Gan University of Connecticut Advances in Predictive Analytics (APA) Conference University of
More informationNetwork Lasso: Clustering and Optimization in Large Graphs
Network Lasso: Clustering and Optimization in Large Graphs David Hallac, Jure Leskovec, Stephen Boyd Stanford University September 28, 2015 Convex optimization Convex optimization is everywhere Introduction
More informationDS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University
DS 4400 Machine Learning and Data Mining I Alina Oprea Associate Professor, CCIS Northeastern University January 24 2019 Logistics HW 1 is due on Friday 01/25 Project proposal: due Feb 21 1 page description
More informationLecture 16: High-dimensional regression, non-linear regression
Lecture 16: High-dimensional regression, non-linear regression Reading: Sections 6.4, 7.1 STATS 202: Data mining and analysis November 3, 2017 1 / 17 High-dimensional regression Most of the methods we
More informationMachine Learning: An Applied Econometric Approach Online Appendix
Machine Learning: An Applied Econometric Approach Online Appendix Sendhil Mullainathan mullain@fas.harvard.edu Jann Spiess jspiess@fas.harvard.edu April 2017 A How We Predict In this section, we detail
More informationAnalysis of Functional MRI Timeseries Data Using Signal Processing Techniques
Analysis of Functional MRI Timeseries Data Using Signal Processing Techniques Sea Chen Department of Biomedical Engineering Advisors: Dr. Charles A. Bouman and Dr. Mark J. Lowe S. Chen Final Exam October
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationGeneralized Additive Model
Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1
More informationLinear Methods for Regression and Shrinkage Methods
Linear Methods for Regression and Shrinkage Methods Reference: The Elements of Statistical Learning, by T. Hastie, R. Tibshirani, J. Friedman, Springer 1 Linear Regression Models Least Squares Input vectors
More information6 Model selection and kernels
6. Bias-Variance Dilemma Esercizio 6. While you fit a Linear Model to your data set. You are thinking about changing the Linear Model to a Quadratic one (i.e., a Linear Model with quadratic features φ(x)
More informationLeveling Up as a Data Scientist. ds/2014/10/level-up-ds.jpg
Model Optimization Leveling Up as a Data Scientist http://shorelinechurch.org/wp-content/uploa ds/2014/10/level-up-ds.jpg Bias and Variance Error = (expected loss of accuracy) 2 + flexibility of model
More informationELEG Compressive Sensing and Sparse Signal Representations
ELEG 867 - Compressive Sensing and Sparse Signal Representations Gonzalo R. Arce Depart. of Electrical and Computer Engineering University of Delaware Fall 211 Compressive Sensing G. Arce Fall, 211 1 /
More informationNonparametric Regression
Nonparametric Regression John Fox Department of Sociology McMaster University 1280 Main Street West Hamilton, Ontario Canada L8S 4M4 jfox@mcmaster.ca February 2004 Abstract Nonparametric regression analysis
More informationAdvanced and Predictive Analytics with JMP 12 PRO. JMP User Meeting 9. Juni Schwalbach
Advanced and Predictive Analytics with JMP 12 PRO JMP User Meeting 9. Juni 2016 -Schwalbach Definition Predictive Analytics encompasses a variety of statistical techniques from modeling, machine learning
More informationPart I. Hierarchical clustering. Hierarchical Clustering. Hierarchical clustering. Produces a set of nested clusters organized as a
Week 9 Based in part on slides from textbook, slides of Susan Holmes Part I December 2, 2012 Hierarchical Clustering 1 / 1 Produces a set of nested clusters organized as a Hierarchical hierarchical clustering
More informationLecture 21 : A Hybrid: Deep Learning and Graphical Models
10-708: Probabilistic Graphical Models, Spring 2018 Lecture 21 : A Hybrid: Deep Learning and Graphical Models Lecturer: Kayhan Batmanghelich Scribes: Paul Liang, Anirudha Rayasam 1 Introduction and Motivation
More informationCPSC 340: Machine Learning and Data Mining. Principal Component Analysis Fall 2016
CPSC 340: Machine Learning and Data Mining Principal Component Analysis Fall 2016 A2/Midterm: Admin Grades/solutions will be posted after class. Assignment 4: Posted, due November 14. Extra office hours:
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationClustering and The Expectation-Maximization Algorithm
Clustering and The Expectation-Maximization Algorithm Unsupervised Learning Marek Petrik 3/7 Some of the figures in this presentation are taken from An Introduction to Statistical Learning, with applications
More informationMachine Learning. Chao Lan
Machine Learning Chao Lan Machine Learning Prediction Models Regression Model - linear regression (least square, ridge regression, Lasso) Classification Model - naive Bayes, logistic regression, Gaussian
More informationNaïve Bayes, Gaussian Distributions, Practical Applications
Naïve Bayes, Gaussian Distributions, Practical Applications Required reading: Mitchell draft chapter, sections 1 and 2. (available on class website) Machine Learning 10-601 Tom M. Mitchell Machine Learning
More informationDATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane
DATA MINING AND MACHINE LEARNING Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Data preprocessing Feature normalization Missing
More informationThe picasso Package for High Dimensional Regularized Sparse Learning in R
The picasso Package for High Dimensional Regularized Sparse Learning in R X. Li, J. Ge, T. Zhang, M. Wang, H. Liu, and T. Zhao Abstract We introduce an R package named picasso, which implements a unified
More informationNeural Networks and Deep Learning
Neural Networks and Deep Learning Example Learning Problem Example Learning Problem Celebrity Faces in the Wild Machine Learning Pipeline Raw data Feature extract. Feature computation Inference: prediction,
More informationI How does the formulation (5) serve the purpose of the composite parameterization
Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)
More informationSparse & Functional Principal Components Analysis
Sparse & Functional Principal Components Analysis Genevera I. Allen Department of Statistics and Electrical and Computer Engineering, Rice University, Department of Pediatrics-Neurology, Baylor College
More informationChapter 7: Numerical Prediction
Ludwig-Maximilians-Universität München Institut für Informatik Lehr- und Forschungseinheit für Datenbanksysteme Knowledge Discovery in Databases SS 2016 Chapter 7: Numerical Prediction Lecture: Prof. Dr.
More informationA study of classification algorithms using Rapidminer
Volume 119 No. 12 2018, 15977-15988 ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu ijpam.eu A study of classification algorithms using Rapidminer Dr.J.Arunadevi 1, S.Ramya 2, M.Ramesh Raja
More informationInstance-based Learning
Instance-based Learning Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University February 19 th, 2007 2005-2007 Carlos Guestrin 1 Why not just use Linear Regression? 2005-2007 Carlos Guestrin
More informationSlides for Data Mining by I. H. Witten and E. Frank
Slides for Data Mining by I. H. Witten and E. Frank 7 Engineering the input and output Attribute selection Scheme-independent, scheme-specific Attribute discretization Unsupervised, supervised, error-
More information