Introduction to RStudio



First, take the class through the processes of:

Signing in

Changing password: Tools -> Shell, then use the passwd command

Installing packages
Check that at least these are installed: MASS, ISLR, car, class, boot, leaps, glmnet, pls, splines, gam, akima, tree, randomForest, gbm, e1071, ROCR

Creating vectors

    v1 = c(1, 4, 2, 7)   # print result afterward
    v2 = -1:5
    v3 = seq(2, 8, .25)
    v4 = c('phineas','ferb','candace','jeremy','isabella','buford','baljeet')
    v5 = (1:7) == 3
    v6 = (1:5)^2
    v7 = 2*exp(-2*v1)
    v8 = sample(c(0,1), 12, replace=TRUE, prob=c(.25,.75))

Exercise 1. How might you simulate 10 rolls of dice, as in the game Monopoly?

Manipulating/accessing vectors

    5*v1
    v2
    v2[3]
    v2[4:6]
    v2[c(1,3,6)]
    v2[-c(2,5)]
    v2 > 3
    v2[v2 > 3]
    v4[v2 > 3]
    v4[v5]

Basic function plotting

    curve(x^2, 0, 1.5, ylab="y")
    curve(x^3, 0, 1.5, add=TRUE, col="blue")
    curve(x^4, 0, 1.5, add=TRUE, col="red")
    legend(0.1, 2.1, c(expression('x'^2),expression('x'^3),expression('x'^4)),
           lty=c(1,1,1), lwd=2.5*c(1,1,1), col=c("black","blue","red"))

[Figure: the curves x^2 (black), x^3 (blue) and x^4 (red) for x between 0 and 1.5; axes x and y.]

Distribution models (emphasis on normal family)

plotting distributions

We plot two normal curves: the black has mean µ = 2 and standard deviation σ = 1, and the blue has µ = 0, σ = 2. Both are symmetric and bell-shaped. In general the curve (probability density function, or pdf) of a normal distribution is

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}.$$

We have plotted a pdf from the exponential family of models in red. Its formula, for a given parameter λ > 0 (below I have taken λ = .4), is

$$f(x) = \begin{cases} 0, & \text{if } x < 0, \\ \lambda e^{-\lambda x}, & \text{if } x \ge 0. \end{cases}$$
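The two density formulas above can be checked numerically against R's built-in dnorm() and dexp(). The following is only a sketch: the helper names normal_pdf and exp_pdf are made up for this illustration and do not appear elsewhere in these notes.

```r
# Hand-written normal pdf, following the formula above (sigma is the standard deviation)
normal_pdf = function(x, mu, sigma) {
  1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Hand-written exponential pdf: 0 for x < 0, lambda * exp(-lambda * x) for x >= 0
exp_pdf = function(x, lambda) {
  ifelse(x < 0, 0, lambda * exp(-lambda * x))
}

x = seq(-6, 6, 0.5)                              # grid of test points
all.equal(normal_pdf(x, 2, 1), dnorm(x, 2, 1))   # black curve: mu = 2, sigma = 1
all.equal(normal_pdf(x, 0, 2), dnorm(x, 0, 2))   # blue curve: mu = 0, sigma = 2
all.equal(exp_pdf(x, 0.4), dexp(x, 0.4))         # red curve: lambda = 0.4
```

Each all.equal() call compares the hand-written values with the built-in ones up to a small numerical tolerance.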

    curve(dnorm(x, 2, 1), from=-6, to=6, ylab="y")
    curve(dnorm(x, 0, 2), add=TRUE, to=6, from=-6, col="blue")
    curve(dexp(x, .4), add=TRUE, to=6, from=-6, col="red")   # an exponential pdf

[Figure: the Norm(2, 1) pdf (black), the Norm(0, 2) pdf (blue) and the exponential pdf with λ = .4 (red) for x between -6 and 6; axes x and y.]

assessing probabilities from such (continuous) models: integration

If X has a pdf $f_X(x)$, then

$$P(a < X < b) := \int_a^b f_X(x)\,dx.$$

For a random variable X having the standard normal distribution, we can find P(0 < X < 1) via a command like

    integrate(dnorm, 0, 1)

Exercise 2. Find other probabilities, such as P(-1 < X < 1), P(-2 < X < 2), P(-3 < X < 3).

Try adapting the attempts so that, if X ~ Norm(1, 2), we obtain the probability P(1 < X < 3). We run into the problem that integrate() expects a function of a single variable, so the parameters (like µ, σ) cannot simply be typed into its first argument. One work-around is to write a user-defined function with these arguments hardcoded:

    tlsfn = function(x) {
      return( dnorm(x, 1, 2) )   # mu=1, sigma=2 is hardcoded in
    }

    integrate(tlsfn, 1, 3)

But by the FTC, if we had an antiderivative function, one such as

$$F_X(x) := \int_{-\infty}^{x} f_X(s)\,ds = P(X < x), \tag{1}$$

then we could obtain probabilities using it:

$$P(a < X < b) = \int_a^b f_X(x)\,dx = F_X(b) - F_X(a).$$

For a given pdf f(x), an antiderivative F(x) defined by means of (1) is called the corresponding cumulative distribution function, or cdf. In R,

    pnorm(1,0,1) - pnorm(0,0,1)
    pnorm(1) - pnorm(0)
    pnorm(1) - pnorm(-1)
    pnorm(2) - pnorm(-2)
    pnorm(3,1,2) - pnorm(1,1,2)   # same as integrate(tlsfn, 1, 3) above

Exercise 3. Assume IQ scores in the U.S. adult population are well modeled by Norm(100, 15). If you pick an adult at random, what is the chance that person's IQ exceeds 130?

Exercise 4. Suppose a part on an automobile has a lifetime X, measured in hours, modeled by an exponential distribution with parameter λ = …. What is the chance this part fails in the first 200 hours of use?

Notes: There are many different sorts of functions f which can serve as a pdf (a probability model). It must be the case, however, that

1. $\int_{-\infty}^{\infty} f(x)\,dx = 1$. That is, $F(+\infty) = P(-\infty < X < \infty) = 1$.
2. $f(x) \ge 0$ for all x.

Quantiles, Percentiles, Median

    qnorm(.3, 64, 3)          # Norm(64, 3) is model for female heights in inches
    qnorm((0:10)/10, 64, 3)
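As a sketch of how the pieces above fit together, the probability P(1 < X < 3) for X ~ Norm(1, 2) can be computed in three ways that should agree. The first relies on integrate() forwarding extra named arguments to the integrand, which is an alternative to hardcoding them; the variable names p_int, p_anon, p_cdf and q30 are chosen here only for illustration.

```r
# P(1 < X < 3) for X ~ Norm(1, 2), computed three ways

# (a) integrate() passes extra named arguments on to the integrand
p_int = integrate(dnorm, 1, 3, mean = 1, sd = 2)$value

# (b) an anonymous function with the parameters written in
p_anon = integrate(function(x) dnorm(x, 1, 2), 1, 3)$value

# (c) difference of cdf values, F(3) - F(1)
p_cdf = pnorm(3, 1, 2) - pnorm(1, 1, 2)

c(p_int, p_anon, p_cdf)   # the three values agree up to numerical error

# A pdf must integrate to 1 over the whole real line
total = integrate(dnorm, -Inf, Inf, mean = 1, sd = 2)$value
total

# qnorm() undoes pnorm(): 30% of the area lies below the 30th percentile
q30 = qnorm(.3, 64, 3)
pnorm(q30, 64, 3)
```

The `$value` component extracts the numerical estimate from the object that integrate() returns, which also carries an error bound.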

Means, Variances, Standard Deviations

definitions: If X has a pdf f(x), take the mean $\mu_X$ (expected value E(X)) and variance Var(X) to be

$$E(X) := \int_{-\infty}^{\infty} x\, f(x)\,dx, \tag{2}$$

$$\mathrm{Var}(X) = E\big((X - \mu_X)^2\big) := \int_{-\infty}^{\infty} (x - \mu_X)^2 f(x)\,dx. \tag{3}$$

The standard deviation, then, is $\sigma_X = \sqrt{\mathrm{Var}(X)}$, and hence Var(X) may sometimes be written as $\sigma_X^2$.

Explore R commands to compute mean, variance for Norm(µ, σ), Exp(λ).

Suppose you intend to flip a coin n times, and the coin (not necessarily fair) has probability p of being a head. Let X be the number of heads in the n flips. Use a probability tree to work out the probability mass function (pmf), then consider how one calculates things like the cumulative distribution P(X ≤ x), the mean and the standard deviation.

Sampling

large samples begin to take on the characteristics of the population from which they are drawn

    x1 = rnorm(50, 18, 4)    # draws sample of size 50 from Norm(18,4)
    x2 = rnorm(500, 18, 4)   # draws sample of size 500 from Norm(18,4)
    hist(x1, xlab="", ylab="density", freq=FALSE, main="Hist. of x1", col="gray90")
    hist(x2, xlab="", ylab="density", freq=FALSE, main="Hist. of x2", col="gray90")
    curve(dnorm(x,18,4), 5, 30, xlab="", ylab="", main="Norm(18,4) dist")

[Figure: three panels titled "Hist. of x1", "Hist. of x2" and "Norm(18,4) dist"; vertical axis of the histograms is density.]

using samples to estimate means, standard deviations, quantiles
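A minimal sketch of this last point, assuming the same Norm(18, 4) model as above: sample statistics from a small and a large sample are set beside the corresponding model values. The seed, the sample sizes, the choice of the 90th percentile and the names small, large and est are all arbitrary choices made for this illustration.

```r
set.seed(1)                     # arbitrary seed, only so the run is repeatable

small = rnorm(50, 18, 4)        # small sample from Norm(18, 4)
large = rnorm(5000, 18, 4)      # much larger sample from the same model

# One row per source: sample mean, sample sd, and sample 90th percentile,
# with the model's own values in the last row for comparison
est = rbind(
  small = c(mean = mean(small), sd = sd(small), q90 = unname(quantile(small, .9))),
  large = c(mean = mean(large), sd = sd(large), q90 = unname(quantile(large, .9))),
  model = c(mean = 18,          sd = 4,         q90 = qnorm(.9, 18, 4))
)
est
```

One would expect the row for the large sample to sit closer to the model row than the row for the small sample, though any single run can vary.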

Obtaining Data

Complex data sets involve many measurements (variables) taken on a collection of like objects variously called cases, subjects or units, depending on the context. The typical arrangement is to place these measurements in a table, which in R is known as a data frame. Each row of the table represents a unit studied, and the columns correspond to the variables.

Data in packages

    data()
    help(faithful)   # often gives details about the data
    head(faithful)   # displays the first few records in the data

Using delimited files

The file at the specified url is a comma separated value (csv) file, containing responses to a survey conducted in 2004 by students then enrolled in introductory statistics classes at Calvin. You may view the questions as they were posed at this link. Respondents were typically the students conducting the survey, along with other students with whom they came into contact. I could give the data frame any name I want, and have chosen ss for "student survey."

    ss = read.csv("...")
    names(ss)
    dim(ss)   # shows size of table/data frame

Viewing Data

Frequency tables are convenient for exploring univariate categorical data. While larger populations/samples result in larger counts/frequencies within the various values of the categorical variable, one might expect the proportions of occurrences of these values to be relatively stable.

    ss$selfhandedness                   # produces vector containing selfhandedness responses
    ss[,5]                              # produces vector corresponding to 5th column
    ss[31:35, c(3,5,8)]                 # one way you can pare down a data frame
    subset(ss, select=c(gender,cds))    # another way
    xtabs(~ selfhandedness, data=ss)    # Note the need for "cleaning" the data
    cleanedss = droplevels(subset(ss, selfhandedness!=""))
    xtabs(~ selfhandedness, data=cleanedss)
    prop.table(xtabs(~ selfhandedness, data=cleanedss))
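Since the commands above need the survey file, the cleaning step can also be tried on a small made-up data frame. The sketch below uses hypothetical values; the names toy, cleanedtoy and counts and all of the entries are invented for illustration and are not taken from the survey.

```r
# Hypothetical toy data standing in for the survey; not the real responses
toy = data.frame(
  selfhandedness = factor(c("Right", "Left", "", "Right", "Right", "", "Left", "Right")),
  cds = c(12, 40, 5, 0, 25, 60, 8, 30)
)

xtabs(~ selfhandedness, data = toy)   # the blank response shows up as its own level

# subset() drops the blank rows; droplevels() then removes the now-empty "" level
cleanedtoy = droplevels(subset(toy, selfhandedness != ""))
counts = xtabs(~ selfhandedness, data = cleanedtoy)
counts                                # only Left and Right remain
prop.table(counts)                    # the same table as proportions
```

Without droplevels(), the blank level would still be listed in the table, with a count of zero.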

Exercise 5. Look over the list of variables in this student survey dataset. Determine which are categorical and which are quantitative. Considering only those which are quantitative, further determine which are discrete and which are continuous.

Exercise 6. We might consider the students who took this survey as a population (as opposed to a sample from a larger population of students). Write a command that takes a sample (with replacement) of region values of size n = 10 from this population, and shows the proportion of respondents from the three region types (Rural, Suburban, Urban). Do these proportions look similar to those for the overall population (i.e., the dataset as a whole)? As you increase the size of your sample (note that, since we are sampling with replacement, n can be made larger than the actual number of cases in the dataset!), does the distribution of values for the region variable appear increasingly like the distribution for the population?

Exercise 7. Another variable in this student survey dataset is momhandedness. Create a frequency table of its values, noting that there are instances where survey participants did not respond to the corresponding question. Write a command, or sequence of commands, which produces a data frame that has been cleaned in the sense that the records where either selfhandedness or momhandedness is blank have been removed. Call your final data frame twicecleanedss.

When two categorical variables are of interest, the counterpart to the frequency tables above is a contingency table. The commands given next produce tables of various sorts, some of which are contingency tables (those giving actual frequencies rather than fractional values). Execute them and reflect on the results.

    xtabs(~selfhandedness + momhandedness, data=ss)
    prop.table(xtabs(~selfhandedness + momhandedness, data=ss))
    xtabs(~selfhandedness + momhandedness, data=twicecleanedss)
    prop.table(xtabs(~selfhandedness + momhandedness, data=twicecleanedss))
    prop.table(xtabs(~selfhandedness + momhandedness, data=twicecleanedss), margin=1)
    prop.table(xtabs(~selfhandedness + momhandedness, data=twicecleanedss), margin=2)

Exercise 8. The result of each command given above is, of course, closely related to that of the other commands. Describe the various contexts in which one might find a particular version of greater use than its counterparts. If you were imagining momhandedness in the role of explanatory variable and selfhandedness as response variable, which command(s) would seem most useful? Explain your choice.
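To see mechanically what the margin argument does, here is a sketch on a tiny made-up data frame. The name toy and all eight responses are hypothetical, and the sketch is not meant to settle which version suits which context.

```r
# Hypothetical toy responses, not the survey data
toy = data.frame(
  selfhandedness = c("Right", "Right", "Left", "Right", "Left",  "Right", "Right", "Left"),
  momhandedness  = c("Right", "Right", "Left", "Left",  "Right", "Right", "Right", "Right")
)

tab = xtabs(~ selfhandedness + momhandedness, data = toy)
tab                                    # contingency table of counts

prop.table(tab)                        # all cells together sum to 1
rowSums(prop.table(tab, margin = 1))   # margin=1: each row (a selfhandedness value) sums to 1
colSums(prop.table(tab, margin = 2))   # margin=2: each column (a momhandedness value) sums to 1
```

Rows of the table correspond to the first variable in the formula and columns to the second, so swapping the order in the formula swaps the roles of margin=1 and margin=2.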

Exercise 9. Do you think momhandedness is useful in the prediction of selfhandedness? Explain.

A mosaic plot is a visual depiction of the information in a contingency table. Try this command to view a mosaic plot depicting the information of the tables above.

    mosaicplot(xtabs(~selfhandedness + momhandedness, data=twicecleanedss))

For univariate quantitative data, the main type of plot is a histogram. We already used one R command, hist(), above. There is an alternate command, from the lattice package, which I tend to prefer. It is used below to produce a histogram of the number of cds owned (yes, that was "back in the day"!) by respondents in our survey, the one on the left. As you see, there is a value (one student who said he owned 601 cds) that is far removed from the others, an outlier. One gets a better view of the rest of the responses on the right, the result of removing this outlier.

    library(lattice)
    histogram(~cds, data=ss)

[Figure: two histograms with vertical axis "Percent of Total"; left, cds for all respondents; right, "CDs owned" with the outlier removed.]

Exercise 10. Write a command which reproduces the histogram on the right.

This course focuses on relationships between variables, which suggests we again turn our attention to displays of two variables at a time. We may wonder whether there is an association between gender, a categorical variable, and number of cds owned. If so, a model predicting the number of cds a person owns may be enhanced by taking gender into account. A plot giving separate histograms of cds owned by gender is likely the first step in investigating whether an association exists.

    histogram(~cds | gender, data=ss, n=20, layout=c(1,2))

[Figure: histograms of cds in two stacked panels, M and F; vertical axis "Percent of Total".]

Try leaving out the layout=c(1,2) part, and see if you think it helps to include it. Whatever you decide about that, it seems once again that we might be able to make a better comparison if we leave out "Mr. over 600 CDs".

Exercise 11. Write a command which produces histograms broken down by gender, as above, but does so without the outlier. What would you expect to see if there were no association between gender and numbers of cds owned? Is that what you see in this data?

Exercise 12. Is there a difference in cd ownership for students coming from different regions/living environments? Do the analysis and explain your results. Prepare a nice report, employing R Markdown, of your answer.

Of course, one may be interested in whether an association exists between two quantitative variables. For this purpose, we use a scatterplot. For each subject in the data set, we plot a single point whose x-coordinate comes from the explanatory variable and y-coordinate from the response. Below we produce a scatterplot between two quantitative variables, pulse rate and gpa, in the student survey data.

    xyplot(gpa ~ pulse, data=ss, pch=19, cex=.5)

[Figure: scatterplot of gpa (vertical axis) against pulse (horizontal axis).]

Exercise 13. Imagine the appearance of a scatterplot between two quantitative variables when they are not associated. Given the result of the command above, are you quite convinced that an association exists between pulse and gpa? Why or why not?

Tonight's Assignment:

Read Chapter 1 from the textbook.

Do the exercises that appear in this document, writing them up (preferably in R Markdown).

As per the examples we heard today in the video lecture, think of a challenging data question of particular interest to you. Do not use one you have heard of elsewhere. You need not limit your scope to problems you could, with training like that of the authors, tackle single-handedly. Write out what the problem is, and what response variable(s) and predictor variable(s) you would use.

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers HW 34. Sketch

More information


Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

IT 403 Practice Problems (1-2) Answers

IT 403 Practice Problems (1-2) Answers IT 403 Practice Problems (1-2) Answers #1. Using Tukey's Hinges method ('Inclusionary'), what is Q3 for this dataset? 2 3 5 7 11 13 17 a. 7 b. 11 c. 12 d. 15 c (12) #2. How do quartiles and percentiles

More information


8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information


L E A R N I N G O B JE C T I V E S 2.2 Measures of Central Location L E A R N I N G O B JE C T I V E S 1. To learn the concept of the center of a data set. 2. To learn the meaning of each of three measures of the center of a data set the

More information

Chapter 2: Modeling Distributions of Data

Chapter 2: Modeling Distributions of Data Chapter 2: Modeling Distributions of Data Section 2.2 The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE Chapter 2 Modeling Distributions of Data 2.1 Describing Location in a Distribution

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information


Chapter 6. THE NORMAL DISTRIBUTION Chapter 6. THE NORMAL DISTRIBUTION Introducing Normally Distributed Variables The distributions of some variables like thickness of the eggshell, serum cholesterol concentration in blood, white blood cells

More information

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

CHAPTER 2: Describing Location in a Distribution

CHAPTER 2: Describing Location in a Distribution CHAPTER 2: Describing Location in a Distribution 2.1 Goals: 1. Compute and use z-scores given the mean and sd 2. Compute and use the p th percentile of an observation 3. Intro to density curves 4. More

More information


6-1 THE STANDARD NORMAL DISTRIBUTION 6-1 THE STANDARD NORMAL DISTRIBUTION The major focus of this chapter is the concept of a normal probability distribution, but we begin with a uniform distribution so that we can see the following two very

More information

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal

More information be able to make comparisons possible, we need to compare them with their respective distributions. be able to make comparisons possible, we need to compare them with their respective distributions. Unit 3 ~ Modeling Distributions of Data 1 ***Section 2.1*** Measures of Relative Standing and Density Curves (ex) Suppose that a professional soccer team has the money to sign one additional player and

More information

Chapter 6 Normal Probability Distributions

Chapter 6 Normal Probability Distributions Chapter 6 Normal Probability Distributions 6-1 Review and Preview 6-2 The Standard Normal Distribution 6-3 Applications of Normal Distributions 6-4 Sampling Distributions and Estimators 6-5 The Central

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

CHAPTER 2 Modeling Distributions of Data

CHAPTER 2 Modeling Distributions of Data CHAPTER 2 Modeling Distributions of Data 2.2 Density Curves and Normal Distributions The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers Density Curves

More information

Chapter 2: The Normal Distributions

Chapter 2: The Normal Distributions Chapter 2: The Normal Distributions Measures of Relative Standing & Density Curves Z-scores (Measures of Relative Standing) Suppose there is one spot left in the University of Michigan class of 2014 and

More information

Topic 5 - Joint distributions and the CLT

Topic 5 - Joint distributions and the CLT Topic 5 - Joint distributions and the CLT Joint distributions Calculation of probabilities, mean and variance Expectations of functions based on joint distributions Central Limit Theorem Sampling distributions

More information

Distributions of Continuous Data

Distributions of Continuous Data C H A P T ER Distributions of Continuous Data New cars and trucks sold in the United States average about 28 highway miles per gallon (mpg) in 2010, up from about 24 mpg in 2004. Some of the improvement

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from - use the standard installation. Go to the course website;

More information

R Programming Basics - Useful Builtin Functions for Statistics

R Programming Basics - Useful Builtin Functions for Statistics R Programming Basics - Useful Builtin Functions for Statistics Vectorized Arithmetic - most arthimetic operations in R work on vectors. Here are a few commonly used summary statistics. testvect = c(1,3,5,2,9,10,7,8,6)

More information

MAT 102 Introduction to Statistics Chapter 6. Chapter 6 Continuous Probability Distributions and the Normal Distribution

MAT 102 Introduction to Statistics Chapter 6. Chapter 6 Continuous Probability Distributions and the Normal Distribution MAT 102 Introduction to Statistics Chapter 6 Chapter 6 Continuous Probability Distributions and the Normal Distribution 6.2 Continuous Probability Distributions Characteristics of a Continuous Probability

More information

Lecture 6: Chapter 6 Summary

Lecture 6: Chapter 6 Summary 1 Lecture 6: Chapter 6 Summary Z-score: Is the distance of each data value from the mean in standard deviation Standardizes data values Standardization changes the mean and the standard deviation: o Z

More information

Section 2.2 Normal Distributions. Normal Distributions

Section 2.2 Normal Distributions. Normal Distributions Section 2.2 Normal Distributions Normal Distributions One particularly important class of density curves are the Normal curves, which describe Normal distributions. All Normal curves are symmetric, single-peaked,

More information

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea

Chapter 3. Bootstrap. 3.1 Introduction. 3.2 The general idea Chapter 3 Bootstrap 3.1 Introduction The estimation of parameters in probability distributions is a basic problem in statistics that one tends to encounter already during the very first course on the subject.

More information

Section 2.2 Normal Distributions

Section 2.2 Normal Distributions Section 2.2 Mrs. Daniel AP Statistics We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ). Any particular Normal distribution is completely specified by two numbers: its

More information

Sections 4.3 and 4.4

Sections 4.3 and 4.4 Sections 4.3 and 4.4 Timothy Hanson Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 32 4.3 Areas under normal densities Every

More information

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 3: Distributions Regression III: Advanced Methods William G. Jacoby Michigan State University Goals of the lecture Examine data in graphical form Graphs for looking at univariate distributions

More information

Chapter 2. Frequency distribution. Summarizing and Graphing Data

Chapter 2. Frequency distribution. Summarizing and Graphing Data Frequency distribution Chapter 2 Summarizing and Graphing Data Shows how data are partitioned among several categories (or classes) by listing the categories along with the number (frequency) of data values

More information

AP Statistics Summer Assignment:

AP Statistics Summer Assignment: AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this

More information

Probability and Statistics for Final Year Engineering Students

Probability and Statistics for Final Year Engineering Students Probability and Statistics for Final Year Engineering Students By Yoni Nazarathy, Last Updated: April 11, 2011. Lecture 1: Introduction and Basic Terms Welcome to the course, time table, assessment, etc..

More information

Lab 4: Distributions of random variables

Lab 4: Distributions of random variables Lab 4: Distributions of random variables In this lab we ll investigate the probability distribution that is most central to statistics: the normal distribution If we are confident that our data are nearly

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3

Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3 Tutorial 3: Probability & Distributions Johannes Karreth RPOS 517, Day 3 This tutorial shows you: how to simulate a random process how to plot the distribution of a variable how to assess the distribution

More information

Measures of Dispersion

Measures of Dispersion Lesson 7.6 Objectives Find the variance of a set of data. Calculate standard deviation for a set of data. Read data from a normal curve. Estimate the area under a curve. Variance Measures of Dispersion

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using

More information

Package simed. November 27, 2017

Package simed. November 27, 2017 Version 1.0.3 Title Simulation Education Author Barry Lawson, Larry Leemis Package simed November 27, 2017 Maintainer Barry Lawson Imports graphics, grdevices, methods, stats, utils

More information

BIOL Gradation of a histogram (a) into the normal curve (b)

BIOL Gradation of a histogram (a) into the normal curve (b) (التوزيع الطبيعي ( Distribution Normal (Gaussian) One of the most important distributions in statistics is a continuous distribution called the normal distribution or Gaussian distribution. Consider the

More information

Chapter 5: The standard deviation as a ruler and the normal model p131

Chapter 5: The standard deviation as a ruler and the normal model p131 Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is

More information

4.3 The Normal Distribution

4.3 The Normal Distribution 4.3 The Normal Distribution Objectives. Definition of normal distribution. Standard normal distribution. Specialties of the graph of the standard normal distribution. Percentiles of the standard normal

More information

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Probability and Statistics. Copyright Cengage Learning. All rights reserved. Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.6 Descriptive Statistics (Graphical) Copyright Cengage Learning. All rights reserved. Objectives Data in Categories Histograms

More information

Math 14 Lecture Notes Ch. 6.1

Math 14 Lecture Notes Ch. 6.1 6.1 Normal Distribution What is normal? a 10-year old boy that is 4' tall? 5' tall? 6' tall? a 25-year old woman with a shoe size of 5? 7? 9? an adult alligator that weighs 200 pounds? 500 pounds? 800

More information


Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

More information

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable. 5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table

More information

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys Unit 7 Statistics AFM Mrs. Valentine 7.1 Samples and Surveys v Obj.: I will understand the different methods of sampling and studying data. I will be able to determine the type used in an example, and

More information

Fathom Dynamic Data TM Version 2 Specifications
