Lab 10 Regression IV
- Rodney Hood
- 5 years ago
ggplot2 package:

Dave presented analysis of a data set on body fat, which I would like to use to show features I think are worth knowing about in the ggplot2 (and associated) packages. Look at the code and results below.

    # ggplot2 features
    Body_fat_complete <- read.csv("c:/users/user/desktop/body_fat_complete.csv")
    bf <- Body_fat_complete
    names(bf)

We have 14 variables we can relate (7 per plot), and we do it 2 at a time using the ggpairs() command located in the GGally package. Download the help file on the command from our Moodle page, if you want to investigate its properties.

    library(ggplot2)
    library(corrplot)
    library(GGally)   # package names are case sensitive
    ggpairs(bf[,1:7])
    ggpairs(bf[c(1,8:14)])
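The correlation values that ggpairs() prints can also be produced as a plain table with base R's cor(). A minimal sketch, assuming the built-in mtcars data as a stand-in for the body fat file (which is not included here); the names vars and cor.mat are my own:

```r
# Stand-in data: four numeric columns of the built-in mtcars data set
# (an assumption; the lab itself uses the body fat file).
vars <- datasets::mtcars[, c("mpg", "disp", "hp", "wt")]

# Pairwise Pearson correlations, rounded for reading
cor.mat <- round(cor(vars), 2)
cor.mat

# The matrix is symmetric, which is why a pairs display can put scatter
# plots on one side of the diagonal and the numbers on the other
isSymmetric(cor.mat)
```

Each off-diagonal entry of cor.mat corresponds to one scatter plot panel and its mirrored correlation panel.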
Note that the main diagonal has the shape of each variable's density (with row or column label). The scatter plots of each pair of variables, when folded along the main diagonal, lie on top of the corresponding correlation values. So, for example, the bottom left scatter plot has its correlation value listed in the top right, etc. The scatter plots visually show which pairs correlate well, and the numbers verify that observation.

A density plot that is cut off before it spans the full range of x-axis values indicates outliers. So, for example, the density plot of ankle only goes about halfway across the x axis, and that of hip only about 4/5 of the way across; these 2 variables have outliers, which can be seen relatively easily in the respective scatter plots. This visualization matrix is quite handy for getting initial impressions of relationships.

Below is some code showing various uses for ggplot2.

    # ggplot2 examples
    library(ggplot2)

    # create factors with value labels
    mtcars$gear <- factor(mtcars$gear, levels=c(3,4,5), labels=c("3gears","4gears","5gears"))
    mtcars$am <- factor(mtcars$am, levels=c(0,1), labels=c("automatic","manual"))
    mtcars$cyl <- factor(mtcars$cyl, levels=c(4,6,8), labels=c("4cyl","6cyl","8cyl"))

    # Kernel density plots for mpg
    # grouped by number of gears (indicated by color)
    qplot(mpg, data=mtcars, geom="density", fill=gear, alpha=I(.5),
          main="Distribution of Gas Mileage", xlab="Miles Per Gallon", ylab="Density")

    # Scatterplot of mpg vs. hp for each combination of gears and cylinders
    # in each facet, transmission type is represented by shape and color
    qplot(hp, mpg, data=mtcars, shape=am, color=am, facets=gear~cyl, size=I(3),
          xlab="Horsepower", ylab="Miles per Gallon")

    # Separate regressions of mpg on weight for each number of cylinders
    qplot(wt, mpg, data=mtcars, geom=c("point", "smooth"), method="lm", formula=y~x, color=cyl,
          main="Regression of MPG on Weight", xlab="Weight", ylab="Miles per Gallon")

    # Boxplots of mpg by number of gears
    # observations (points) are overlaid and jittered
    qplot(gear, mpg, data=mtcars, geom=c("boxplot", "jitter"), fill=gear,
          main="Mileage by Gear Number", xlab="", ylab="Miles per Gallon")
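qplot() is deprecated in recent ggplot2 releases (3.4.0 and later), so the calls above may print a warning. A hedged sketch of how the density plot and the per-cylinder regression could be written with ggplot() instead; the object names cars, p.dens and p.reg are my own, and the factor labels are copied from the code above:

```r
library(ggplot2)

# datasets::mtcars is the untouched original, even if a modified copy
# named mtcars already exists in the workspace
cars <- datasets::mtcars
cars$gear <- factor(cars$gear, levels = c(3, 4, 5),
                    labels = c("3gears", "4gears", "5gears"))
cars$cyl <- factor(cars$cyl, levels = c(4, 6, 8),
                   labels = c("4cyl", "6cyl", "8cyl"))

# Kernel density of mpg, one semi-transparent curve per gear group
p.dens <- ggplot(cars, aes(x = mpg, fill = gear)) +
  geom_density(alpha = 0.5) +
  labs(title = "Distribution of Gas Mileage",
       x = "Miles Per Gallon", y = "Density")

# Separate least-squares lines of mpg on weight for each cylinder group
p.reg <- ggplot(cars, aes(x = wt, y = mpg, color = cyl)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(title = "Regression of MPG on Weight",
       x = "Weight", y = "Miles per Gallon")

print(p.dens)
print(p.reg)
```

In ggplot(), constants such as alpha = 0.5 go outside aes(), which is what the I() wrapper achieves in qplot().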
Also, the following code gives the results shown below.

    library(ggplot2)
    p <- qplot(hp, mpg, data=mtcars, shape=am, color=am, facets=gear~cyl,
               main="Scatterplots of MPG vs. Horsepower", xlab="Horsepower", ylab="Miles per Gallon")

    # White background and black grid lines
    p + theme_bw()

    # Large brown bold italics labels
    # and legend placed at top of plot
    p + theme(axis.title=element_text(face="bold.italic", size=12, color="brown"),
              legend.position="top")

This just scratches the surface of what the ggplot2 package does, but it does illustrate the fancy graphing capabilities R can produce with this package.

More on model reduction:

Dave has illustrated other model criteria we should look at when trying to optimize a model. These include looking at the following model statistics:

- $R^2$
- adjusted $R^2$
- mean squared error: $MSE = SSE/(n-p)$, where $n$ = sample size and $p$ = number of estimated model parameters (regression coefficients, including the intercept)
- Akaike's Information Criterion (AIC)
- Bayesian Information Criterion (BIC)
- Predicted Error Sum of Squares (PRESS)
- K-fold cross-validation (CV)

As you reduce models automatically using the various step() functions in the various packages, or compute these values individually for each model in your reduction process, you should probably compute each of these listed items so that you can judge which model best suits your predictability concerns. Dave has discussed these things, with their strengths and weaknesses, in lecture. We will give an example of reducing a model, where we use these statistics and associated graphs.

sat.csv data example:

Below is a view of the sat.csv data set, containing various indicators of high school kids who took the SAT exam in the various states. Let us look at some criteria. The categories are:

- takers: percentage of eligible students who took the exam
- income: median income of families of test takers
- years: average number of years of study in social science, natural sciences, and humanities by test takers
- public: percentage of test takers in public schools
- expend: state expenditures in hundreds of dollars per student
- rank: median percentile ranking of test takers within their schools

Let us first look at the scatter plots and correlation matrix.

    # sat example
    sat <- read.csv("c:/users/user/desktop/sat.csv")
    satdata <- sat[, 2:8]
    library(corrplot)
    library(GGally)
    names(satdata)
    ggpairs(satdata)

We want a model where SAT scores (sat) are the response. Looking at the scatter plots below, I think we could get a better relationship if we had log(takers) (the green arrow) instead of takers (the red arrow) plotted against sat, since the scatter plot and correlations for log(takers) on sat look better.
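Why a log transform can tighten a relationship is easy to see on made-up numbers. The sketch below uses synthetic data (not the sat.csv values), constructed so that y falls off with log(x); the names x, y, r.raw and r.log are hypothetical.

```r
# Synthetic illustration only: y is built to be linear in log(x),
# plus a small deterministic wiggle so the fit is not perfect
x <- seq(2, 80, length.out = 50)
y <- 1100 - 60 * log(x) + 5 * sin(seq_along(x))

r.raw <- cor(x, y)        # correlation on the original scale
r.log <- cor(log(x), y)   # correlation after transforming x

c(raw = r.raw, log = r.log)

# Side-by-side scatter plots: curved on the left, nearly straight on the right
op <- par(mfrow = c(1, 2))
plot(x, y, main = "y vs x")
plot(log(x), y, main = "y vs log(x)")
par(op)
```

The same comparison on the real data would be cor(satdata$takers, satdata$sat) against cor(log(satdata$takers), satdata$sat), assuming satdata has been read in as above.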
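Output is shown below.

Our full model (model.full) will be

$\text{sat} = \beta_0 + \beta_1 \ln(\text{takers}) + \beta_2\,\text{income} + \beta_3\,\text{years} + \beta_4\,\text{public} + \beta_5\,\text{expend} + \beta_6\,\text{rank}$

We will want to look at the following reduced models:

model.1 will be $\text{sat} = \beta_0 + \beta_1 \ln(\text{takers}) + \beta_2\,\text{income} + \beta_3\,\text{years} + \beta_5\,\text{expend} + \beta_6\,\text{rank}$
model.2 will be $\text{sat} = \beta_0 + \beta_1 \ln(\text{takers}) + \beta_3\,\text{years} + \beta_5\,\text{expend} + \beta_6\,\text{rank}$
model.3 will be $\text{sat} = \beta_0 + \beta_1 \ln(\text{takers}) + \beta_3\,\text{years} + \beta_5\,\text{expend}$
model.4 will be $\text{sat} = \beta_0 + \beta_1 \ln(\text{takers}) + \beta_5\,\text{expend}$
model.5 will be $\text{sat} = \beta_0 + \beta_1 \ln(\text{takers})$

I am not saying that we would want to reduce in this order, necessarily. I want to reduce in this order to demonstrate how the various statistical values change.

    # assumed step: the listing never creates logtakers, so derive it from takers here
    satdata$logtakers <- log(satdata$takers)
    model.full <- lm(sat ~ logtakers + income + years + public + expend + rank, data=satdata)
    summary(model.full)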
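The formula above uses a column named logtakers. An alternative that avoids a separate column is to apply log() inside the formula itself. A minimal sketch on the built-in mtcars data (an assumption, standing in for sat.csv); the names fit.col, fit.new, cars and loghp are my own:

```r
# Stand-in example: log-transform one predictor directly in the formula
fit.col <- lm(mpg ~ log(hp) + wt, data = mtcars)
names(coef(fit.col))

# Same fit with an explicit column, as the lab does with logtakers
cars <- mtcars
cars$loghp <- log(cars$hp)
fit.new <- lm(mpg ~ loghp + wt, data = cars)

# The two approaches give identical coefficients (names aside)
all.equal(unname(coef(fit.col)), unname(coef(fit.new)))
```

With log() in the formula, predict() applies the transform automatically to new data supplied on the original scale.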
Below are the model summaries of the other reduced models.

    model.1 <- update(model.full, . ~ . - public)
    summary(model.1)

    model.2 <- update(model.1, . ~ . - income)
    summary(model.2)

    model.3 <- update(model.2, . ~ . - rank)
    summary(model.3)

    model.4 <- update(model.3, . ~ . - years)
    summary(model.4)

    model.5 <- update(model.4, . ~ . - expend)
    summary(model.5)
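update() refits a model after editing its formula; `. ~ . - x` reads as "same response, same predictors, minus x". A small sketch on mtcars (a stand-in, with my own object names full, red.1 and red.2) that also checks which terms remain after each step:

```r
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Drop one predictor at a time, each model nested in the previous one
red.1 <- update(full,  . ~ . - drat)
red.2 <- update(red.1, . ~ . - qsec)

# Term labels left in each model
attr(terms(full),  "term.labels")
attr(terms(red.1), "term.labels")
attr(terms(red.2), "term.labels")

# Nested models can be compared with a sequence of F tests
anova(red.2, red.1, full)
```

Checking the term labels after each update() is a cheap guard against dropping the wrong variable in a long reduction chain.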
The resulting $R^2$ and adjusted $R^2$ values from these models are stored in the vectors shown below.

    rsquare <- c(.8919, .8918, .8917, .8827, .8675, .8108)
    rsquare.adj <- c(.8769, .8795, .8821, .875, .8619, .8068)

Now, let us get the AIC and BIC of all models.

    n <- 50   # number of states (observational units) in data set
    aic.full <- extractAIC(model.full)
    aic.full
    bic.full <- extractAIC(model.full, k=log(n))
    bic.full
    aic.1 <- extractAIC(model.1)
    bic.1 <- extractAIC(model.1, k=log(n))
    aic.2 <- extractAIC(model.2)
    bic.2 <- extractAIC(model.2, k=log(n))
    aic.3 <- extractAIC(model.3)
    bic.3 <- extractAIC(model.3, k=log(n))
    aic.4 <- extractAIC(model.4)
    bic.4 <- extractAIC(model.4, k=log(n))
    aic.5 <- extractAIC(model.5)
    bic.5 <- extractAIC(model.5, k=log(n))

Let us store these values of AIC and BIC in vectors. extractAIC() returns the equivalent degrees of freedom first and the criterion second, so we keep element [2].

    vec.aic <- c(aic.full[2], aic.1[2], aic.2[2], aic.3[2], aic.4[2], aic.5[2])
    vec.bic <- c(bic.full[2], bic.1[2], bic.2[2], bic.3[2], bic.4[2], bic.5[2])
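Typing R^2 values by hand and repeating extractAIC() for each model invites copy errors. One possible way to collect the same statistics in a loop is sketched below on mtcars stand-in models; the list name models and the helper get.stats are hypothetical, not part of the lab code.

```r
# Stand-in nested models (the lab's would be model.full, model.1, ...)
models <- list(
  full  = lm(mpg ~ wt + hp + qsec, data = mtcars),
  red.1 = lm(mpg ~ wt + hp,        data = mtcars),
  red.2 = lm(mpg ~ wt,             data = mtcars)
)

get.stats <- function(m) {
  n <- nobs(m)                 # sample size taken from the fit itself
  s <- summary(m)
  c(R2    = s$r.squared,
    R2adj = s$adj.r.squared,
    MSE   = sum(resid(m)^2) / df.residual(m),  # SSE / (n - p)
    AIC   = extractAIC(m)[2],                  # penalty k = 2
    BIC   = extractAIC(m, k = log(n))[2])      # penalty k = log(n)
}

# One row per model, one column per statistic
stats.tab <- t(sapply(models, get.stats))
round(stats.tab, 3)
```

The values from extractAIC() differ from those of AIC() and BIC() by an additive constant that is the same for all models fitted to the same data, so either set ranks the models identically.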
Now, let us find the PRESS statistic for each model.

    library(DAAG)
    press.full <- press(model.full)
    press.full
    press.1 <- press(model.1)
    press.2 <- press(model.2)
    press.3 <- press(model.3)
    press.4 <- press(model.4)
    press.5 <- press(model.5)
    vec.press <- c(press.full, press.1, press.2, press.3, press.4, press.5)

Now, let us compute CV.

    library(cvTools)
    cv.full <- repCV(model.full, K=10, R=20, seed=723)
    cv.1 <- repCV(model.1, K=10, R=20, seed=723)
    cv.2 <- repCV(model.2, K=10, R=20, seed=723)
    cv.3 <- repCV(model.3, K=10, R=20, seed=723)
    cv.4 <- repCV(model.4, K=10, R=20, seed=723)
    cv.5 <- repCV(model.5, K=10, R=20, seed=723)
    cv.full; cv.1; cv.2; cv.3; cv.4; cv.5

The CV values printed by these calls are then collected into a vector.
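PRESS is the sum of squared leave-one-out prediction errors. For a linear model it can be computed without refitting, from the ordinary residuals e_i and leverages h_ii, as sum((e_i / (1 - h_ii))^2). A base-R sketch on a mtcars stand-in model, checked against an explicit leave-one-out loop; the function names press.hat and press.loop are my own, not from DAAG:

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Shortcut: residuals inflated by 1 / (1 - leverage)
press.hat <- function(m) sum((resid(m) / (1 - hatvalues(m)))^2)

# Brute force: refit n times, each time predicting the held-out row
press.loop <- function(formula, data) {
  response <- all.vars(formula)[1]
  errs <- sapply(seq_len(nrow(data)), function(i) {
    m.i <- lm(formula, data = data[-i, ])
    data[i, response] - predict(m.i, newdata = data[i, ])
  })
  sum(errs^2)
}

press.hat(fit)
press.loop(mpg ~ wt + hp, mtcars)
```

The two numbers agree up to rounding, which is why PRESS is cheap for lm() fits while K-fold CV with repeated random folds needs actual refitting.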
    # enter the six CV values printed by the repCV() calls above
    cv.vec <- c( , , , , , )

Now, let us construct a table with our results.

    vec.model <- c("full.model", "model.1", "model.2", "model.3", "model.4", "model.5")
    vec.title <- c("model", "R^2", "R^2adj", "AIC", "BIC", "PRESS", "CV")
    table1 <- cbind(vec.model, rsquare, rsquare.adj, vec.aic, vec.bic, vec.press, cv.vec)
    table.final <- rbind(vec.title, table1)
    table.final

The resulting table is below. Pondering the various statistics in the table, I would probably pick model.3 (or, if I wanted more simplicity, model.4) as my overall most efficient model among the 6 choices.

Homework [1]: In the basic data sets of R is the data set called state.x77. Find an optimum reduced model, using life expectancy (Life Exp) as the response variable. A partial picture of the data set information is shown below, and can be viewed with the following commands in R.

    ?state.x77
    state.x77

Pictures are shown below. Be sure to show descriptives and some (many) of the criteria Dave and I have shown you, as well as give about 50 words concluding/justifying your final model pick.

Homework [2]: Read very carefully the multregtutorial.pdf file, noting the author's use of the following R commands when doing multiple regression:

- glm() for generalized linear models (stats package)
- gam() for generalized additive models (gam package)
- lme() and lmer() for linear mixed-effects models (nlme and lme4 packages)
- nls() and nlme() for nonlinear models (stats and nlme packages)
- various data frame and labeling commands
- par() and lm() command uses
- notes on interactions
- update(. ~ . - factor) command
- anova() and aov() commands and contrasts
- model.int <- update(model5, . ~ .^3) use and meaning
- step() backward, forward, both: uses and weaknesses
- plot(model), plot(model, 1), plot(model, 2), plot(model, 3), etc., displays
- confint(model) displays
- information on extractor functions from the lm() command
- usage of predict(lm.out, list(Solar.R=200, Wind=11, Temp=80, Month=6), interval="conf")
- comments about partial correlations

There is nothing to report here, only things to learn from within this tutorial from W. B. King.
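For Homework [1], note that state.x77 is a matrix, and two of its column names contain spaces (Life Exp, HS Grad), which is awkward in model formulas. One possible starting point, a sketch rather than a solution, converts it to a data frame with syntactic names; the object names st and fit0 are hypothetical:

```r
# state.x77 is a numeric matrix; lm() wants a data frame
st <- as.data.frame(state.x77)

# "Life Exp" -> "Life.Exp", "HS Grad" -> "HS.Grad"
names(st) <- make.names(names(st))
names(st)

# Descriptives for every variable
summary(st)

# Full model with all remaining columns as predictors; reduce from here
fit0 <- lm(Life.Exp ~ ., data = st)
summary(fit0)
```

From fit0, the update() chain and the criteria used in the sat.csv example carry over unchanged.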
More informationSection 9: One Variable Statistics
The following Mathematics Florida Standards will be covered in this section: MAFS.912.S-ID.1.1 MAFS.912.S-ID.1.2 MAFS.912.S-ID.1.3 Represent data with plots on the real number line (dot plots, histograms,
More informationAlgebra II Notes Unit Two: Linear Equations and Functions
Syllabus Objectives:.1 The student will differentiate between a relation and a function.. The student will identify the domain and range of a relation or function.. The student will derive a function rule
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationChapter 2 Assignment (due Thursday, October 5)
(due Thursday, October 5) Introduction: The purpose of this assignment is to analyze data sets by creating histograms and scatterplots. You will use the STATDISK program for both. Therefore, you should
More informationGRADE CENTRE BEST PRACTICE FOR A4L
GRADE CENTRE BEST PRACTICE FOR A4L Overview A large number of reports use information from the Grade Centre to draw correlations between activity and student success (see appendix). This document serves
More informationProblem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA
ECL 290 Statistical Models in Ecology using R Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA Datasets in this problem set adapted from those provided
More informationData Science and Machine Learning Essentials
Data Science and Machine Learning Essentials Lab 3B Building Models in Azure ML By Stephen Elston and Graeme Malcolm Overview In this lab, you will learn how to use R or Python to engineer or construct
More informationUNIT 1A EXPLORING UNIVARIATE DATA
A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics
More informationDensity Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.
1.3 Density curves p50 Some times the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. It is easier to work with a smooth curve, because the histogram
More informationResting state network estimation in individual subjects
Resting state network estimation in individual subjects Data 3T NIL(21,17,10), Havard-MGH(692) Young adult fmri BOLD Method Machine learning algorithm MLP DR LDA Network image Correlation Spatial Temporal
More informationMultivariate Calibration Quick Guide
Last Updated: 06.06.2007 Table Of Contents 1. HOW TO CREATE CALIBRATION MODELS...1 1.1. Introduction into Multivariate Calibration Modelling... 1 1.1.1. Preparing Data... 1 1.2. Step 1: Calibration Wizard
More informationCS249: ADVANCED DATA MINING
CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal
More informationUsing Built-in Plotting Functions
Workshop: Graphics in R Katherine Thompson (katherine.thompson@uky.edu Department of Statistics, University of Kentucky September 15, 2016 Using Built-in Plotting Functions ## Plotting One Quantitative
More informationExploratory model analysis
Exploratory model analysis with R and GGobi Hadley Wickham 6--8 Introduction Why do we build models? There are two basic reasons: explanation or prediction [Ripley, 4]. Using large ensembles of models
More informationSTANDARDS OF LEARNING CONTENT REVIEW NOTES. ALGEBRA I Part II. 3 rd Nine Weeks,
STANDARDS OF LEARNING CONTENT REVIEW NOTES ALGEBRA I Part II 3 rd Nine Weeks, 2016-2017 1 OVERVIEW Algebra I Content Review Notes are designed by the High School Mathematics Steering Committee as a resource
More informationChapter 2: The Normal Distributions
Chapter 2: The Normal Distributions Measures of Relative Standing & Density Curves Z-scores (Measures of Relative Standing) Suppose there is one spot left in the University of Michigan class of 2014 and
More information