SAS Workshop. Introduction to SAS Programming. Iowa State University DAY 3 SESSION I
1 SAS Workshop Introduction to SAS Programming DAY 3 SESSION I Iowa State University May 10, 2016
2 Sample Data: Prostate Data Set Example C9 further illustrates the use of all-subset selection options in proc reg. In this example, adjrsq is used instead of rsquare as the model selection criterion. The data used here come from a study that examined the correlation between the level of prostate-specific antigen and a number of clinical measures in men who were about to receive a radical prostatectomy. The goal was to predict log prostate-specific antigen (lpsa) from a number of measurements including log cancer volume (lcavol), log prostate weight (lweight), age, log benign prostatic hyperplasia amount (lbph), seminal vesicle invasion (svi), log capsular penetration (lcp), Gleason score (gleason), and percentage of Gleason scores 4 or 5 (pgg45).
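The difference between the two criteria can be made concrete: R² never decreases as predictors are added, while adjusted R² penalizes model size and can drop when a useless predictor enters. A minimal Python sketch of the two statistics (an illustration only, not part of the workshop's SAS code; the function names are hypothetical):

```python
def r_squared(y, y_hat):
    """Coefficient of determination: R^2 = 1 - SSE/SST."""
    ybar = sum(y) / len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

def adj_r_squared(y, y_hat, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1),
    where p is the number of predictors (excluding the intercept).
    The (n - 1)/(n - p - 1) factor is the penalty for model size."""
    n = len(y)
    return 1 - (1 - r_squared(y, y_hat)) * (n - 1) / (n - p - 1)
```

For the same fit, adjusted R² is always at most R², and the gap grows with the number of predictors p.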
3 SAS Example C9

data prostate;
  infile "u:\documents\sas_workshop_spring2016\data\prostate.txt";
  input case lcavol lweight age lbph svi lcp gleason pgg45 lpsa;
;
ods pdf file="u:\documents\sas_workshop_spring2016\c9out.pdf";
title 'Variable Subset Selection: Prostate Data';
proc reg data=prostate plots(only)=(cp(label) aic(label));
  model lpsa=lcavol lweight age lbph svi lcp gleason pgg45
        /selection=adjrsq start=2 stop=5 best=12 sse mse aic cp b;
run;
ods pdf close;
4 Validation and Cross-Validation

The average squared error (ASE) of prediction is used to estimate the prediction error of a model fitted using the training data:

    ASE = (1/N) Σ_i (ŷ_i − y_i)²

Here y_i is the i-th observation in the validation data set, ŷ_i is its predicted value from the fitted model, and N is the validation sample size.

An alternative approach is K-fold cross-validation. The original data are first randomly divided into K equal-sized partitions. One of these partitions (say, the k-th) is held out; the remaining K − 1 partitions put together form the training data. The selected model is fitted to the training data, and the fitted model is then applied to the hold-out data to obtain its prediction error (ASE).

The whole procedure is repeated with the k-th fold, for k = 1, …, K, serving as the hold-out data set and the remaining data as the training data, and the ASEs resulting from each partition are combined. For equal fold sample sizes, the ASE of the whole cross-validation procedure is

    ASE = (1/N) Σ_{k=1}^{K} Σ_i (ŷ_ki − y_ki)²

noting that the ŷ_ki for each k are calculated using prediction equations fit to the training data for the k-th fold.
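The two formulas above can be sketched in Python for a single-predictor least-squares model (a hypothetical, dependency-free illustration, not part of the workshop's SAS code):

```python
import random

def ase(y_hat, y):
    """Average squared error: (1/N) * sum((yhat_i - y_i)^2)."""
    return sum((yh - yi) ** 2 for yh, yi in zip(y_hat, y)) / len(y)

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def kfold_ase(x, y, K=5, seed=12345):
    """K-fold cross-validation ASE: each fold is held out once, the
    model is fit on the other K-1 folds, and the squared hold-out
    prediction errors from all folds are pooled and averaged."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::K] for k in range(K)]
    sq_errors = []
    for k in range(K):
        hold = set(folds[k])
        xtr = [x[i] for i in idx if i not in hold]
        ytr = [y[i] for i in idx if i not in hold]
        b0, b1 = fit_simple_ols(xtr, ytr)
        sq_errors += [(b0 + b1 * x[i] - y[i]) ** 2 for i in folds[k]]
    return sum(sq_errors) / len(sq_errors)
```

The pooled average in `kfold_ase` corresponds to the second formula on the slide (equal fold sizes, total divided by N).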
5 SAS Example C10: Validation

data prostate;
  infile "U:\Documents\SAS_Workshop_Spring2016\Data\prostate.txt";
  input case lcavol lweight age lbph svi lcp gleason pgg45 lpsa;
;
ods pdf file="u:\documents\sas_workshop_spring2016\c10out.pdf";
proc glmselect data=prostate seed=12345
     plots(stepaxis=number)=(criteria ASE);
  partition fraction(validate=.35);
  model lpsa=lcavol lweight age lbph svi lcp gleason pgg45
        /selection=stepwise(choose=validate select=sl) stb;
run;
ods pdf close;
6 Notes: Options Used in Example C10 SAS procedure glmselect is used to illustrate how to perform validation. The stepwise selection method with the p-value criterion is used to select a model of each size (selection=stepwise with the select=sl suboption). The choose=validate suboption then specifies that, among these, the model with the smallest validation ASE be chosen. When this option is used, glmselect produces tables and graphics that contain the validation ASE for the model at each step. By default, the stepwise option uses a significance level of 0.15 for both entry and deletion of variables. The best model is chosen using a validation data set obtained by randomly splitting the original prostate data set of 97 cases into a training data set of 69 cases and a validation set of 28 cases.
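The logic behind choose=validate — fit a candidate model of each size on the training portion, then keep the one with the smallest validation ASE — can be sketched in Python. This is a simplified, hypothetical re-implementation: glmselect's stepwise method uses entry/stay significance levels (select=sl), which this sketch replaces with a greedy training-error criterion, and all names here are illustrative.

```python
import random

def ols_fit(rows, y):
    """Least-squares coefficients (with intercept) via the normal
    equations, solved by Gaussian elimination with partial pivoting."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(len(X))) for b in range(p)]
         for a in range(p)]
    b = [sum(X[i][a] * y[i] for i in range(len(X))) for a in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv], b[c], b[piv] = A[piv], A[c], b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for cc in range(c, p):
                A[r][cc] -= f * A[c][cc]
            b[r] -= f * b[c]
    beta = [0.0] * p
    for r in reversed(range(p)):
        beta[r] = (b[r] - sum(A[r][c] * beta[c]
                              for c in range(r + 1, p))) / A[r][r]
    return beta

def predict(beta, row):
    return beta[0] + sum(bj * xj for bj, xj in zip(beta[1:], row))

def model_ase(beta, rows, y):
    return sum((predict(beta, r) - yi) ** 2 for r, yi in zip(rows, y)) / len(y)

def forward_select_by_validation(rows, y, frac_validate=0.35, seed=12345):
    """Greedy forward selection; the step whose model attains the
    smallest validation ASE is the chosen model (cf. choose=validate)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    n_val = round(frac_validate * len(y))
    val, tr = idx[:n_val], idx[n_val:]
    sub = lambda ids, cols: [[rows[i][c] for c in cols] for i in ids]
    remaining, chosen, best = list(range(len(rows[0]))), [], None
    while remaining:
        # add the predictor that most reduces training ASE at this step
        _, c = min((model_ase(ols_fit(sub(tr, chosen + [c]),
                                      [y[i] for i in tr]),
                              sub(tr, chosen + [c]), [y[i] for i in tr]), c)
                   for c in remaining)
        chosen.append(c)
        remaining.remove(c)
        beta = ols_fit(sub(tr, chosen), [y[i] for i in tr])
        v = model_ase(beta, sub(val, chosen), [y[i] for i in val])
        if best is None or v < best[0]:
            best = (v, list(chosen))
    return best  # (validation ASE, selected column indices)
```

On data where only some predictors carry signal, the chosen subset is the step of the greedy path that generalizes best to the held-out 35%, mirroring the train/validate split used in Example C10.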
7 SAS Example C11: K-fold Cross-Validation

data prostate;
  infile "U:\Documents\SAS_Workshop_Spring2016\Data\prostate.txt";
  input case lcavol lweight age lbph svi lcp gleason pgg45 lpsa;
;
ods pdf file="u:\documents\sas_workshop_spring2016\c11out.pdf";
proc glmselect data=prostate seed=12345
     plots(stepaxis=number)=(criteria coefficients);
  model lpsa=lcavol lweight age lbph svi lcp gleason pgg45
        /cvmethod=random(5) selection=stepwise(select=adjrsq choose=cv)
         stats=(cp aic sbc) stb;
run;
ods pdf close;
8 Comments on Example C11 SAS procedure glmselect is used to illustrate how to perform K-fold cross-validation. The model option cvmethod=random(5) specifies 5-fold cross-validation with the folds selected randomly (each of size approximately N/K, rounded down to an integer). The stepwise selection method here uses the adjusted R² to select a model of each size (select=adjrsq). The choose=cv suboption then specifies that, among the models selected at each step, the one with the smallest cross-validation prediction error be chosen; this statistic is called CV PRESS in SAS. The option stats=(cp aic sbc) specifies that glmselect produce tables and graphics that contain these statistics for the selected models. However, the CV PRESS values for Models 2 through 7 are similar. Thus a possible good model may be one of the other models, say Model 4, which has the smallest Cp value as well as several other good fit criteria. This model, with an intercept and the predictors lcavol, lweight, svi, and lbph, thus appears to have the smallest bias and a comparably small error in prediction.
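The relationship between CV PRESS and the cross-validation ASE of slide 4 is just sum versus mean of the same pooled squared prediction errors. A small hypothetical helper (not glmselect output) makes this concrete:

```python
def cv_press_and_ase(fold_sq_errors):
    """Combine per-fold lists of squared hold-out prediction errors
    into the two summary statistics: CV PRESS (their total) and the
    cross-validation ASE (the total divided by N)."""
    total = sum(sum(errs) for errs in fold_sq_errors)
    n = sum(len(errs) for errs in fold_sq_errors)
    return total, total / n
```

Because the two statistics differ only by the constant factor N, ranking candidate models by CV PRESS (as choose=cv does) and ranking them by cross-validation ASE select the same model.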
More information