Week 2 Basic Statistical Concepts, Part II


2 Week 2 Objectives
1. Data presentation through numerical and graphical summaries using R: sample mean, variance and percentiles; the box plot, histogram, stem and leaf diagram, pie chart and bar graph; the scatter plot and scatter plot matrix.
2. The basics of comparative studies, including randomization, confounding and Simpson's paradox.
3. Statistical experiments vs observational studies, and their relevance for establishing causation.
4. Factorial design concepts: main effects and interactions.
5. Use of R for comparative graphics, the interaction plot, and for computing the main effects and interactions.

3 Outline
1. Basic Statistics and the Boxplot; Pie Charts, Bar Graphs, and Histograms; Scatterplots, Scatterplot Matrices and 3D Scatterplots
2. Randomization, Confounding and Simpson's Paradox
3. Factorial Experiments, Main and Interaction Effects

5 Mean, Variance and Standard Deviation
With the data set in the R object x, use:
    mean(x)          # for the mean
    var(x); sd(x)    # for the variance and standard deviation
If the data are categorical, table(x) and table(x)/length(x) return the sizes and proportions of the categories, respectively.
If v contains the whole statistical population, use var(v)*(length(v)-1)/length(v) for the population variance, because var divides by n - 1 rather than n.

6 Example
The productivity of each of the N = 10,000 employees of a company is rated on a scale from 1 to 5. Let the statistical population $v_1, v_2, \ldots, v_{10,000}$ be
    $v_i = 1$, $i = 1, \ldots, 300$,
    $v_i = 2$, $i = 301, \ldots, 1{,}000$,
    $v_i = 3$, $i = 1{,}001, \ldots, 5{,}000$,
    $v_i = 4$, $i = 5{,}001, \ldots, 9{,}000$,
    $v_i = 5$, $i = 9{,}001, \ldots, 10{,}000$.
Find the population proportions for each rating category, the average rating, and the population variance and standard deviation of the rating.

7 Solution.
Set the statistical population in v:
    v=c(rep(1,300),rep(2,700),rep(3,4000),rep(4,4000),rep(5,1000))
Compute the proportions:
    table(v)/10000
Compute the mean, variance and standard deviation:
    mean(v); var(v)*(length(v)-1)/length(v)
    sqrt(var(v)*(length(v)-1)/length(v))
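As a cross-check of this solution, the population mean and variance can also be computed directly from the category proportions. The following is a minimal sketch; the names p, ratings, mu, sigma2 and sigma are introduced here for illustration only.

```r
# Statistical population from the example: ratings 1-5 for N = 10,000 employees
v <- c(rep(1, 300), rep(2, 700), rep(3, 4000), rep(4, 4000), rep(5, 1000))

p <- table(v) / length(v)          # population proportion of each rating
ratings <- as.numeric(names(p))    # the distinct rating values 1, ..., 5

mu <- sum(ratings * p)                 # population mean as a weighted average
sigma2 <- sum((ratings - mu)^2 * p)    # population variance (divisor N, not N - 1)
sigma <- sqrt(sigma2)                  # population standard deviation

print(c(mean = mu, variance = sigma2, sd = sigma))
```

The mean is 3.47 and the variance is 0.7691, which should agree with mean(v) and var(v)*(length(v)-1)/length(v).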

8 Example
Take a simple random sample of size n = 500 from the population of employees in the previous example, and compute the sample proportions of the different ratings, the average rating, and the sample variance and standard deviation.
Solution.
    x=sample(v, size = 500)
    table(x)/500
    mean(x); var(x); sd(x)

9 Sample Percentiles
With the data set in the object x, the commands
    median(x)
    quantile(x,0.25)
    quantile(x,c(0.3,0.7,0.9))
    summary(x)
give, respectively, the median, the 25th percentile, the 30th, 70th and 90th percentiles, and a five number summary of the data consisting of $x_{(1)}$, $q_1$, $\tilde{x}$, $q_3$, and $x_{(n)}$. [summary(x) also gives the sample mean.]

10 Example
Scientists have been monitoring the ozone hole since [...]. See the images shown in [...]. The 14 ozone measurements (Dobson units) given in the OzoneData set were taken in 2002 from the lower stratosphere, between 9 and 12 miles altitude. Obtain the five number summary as well as the 70th, 80th and 90th percentiles.
Solution: Read the data into the R object oz using
    oz = read.table("http://media.pearsoncmg.com/cmg/pmmg_mml_shared/mathstatsresources/akritas/ozonedata.txt", header=T)
and use
    x=oz$ozonedata; summary(x); quantile(x, c(0.7, 0.8, 0.9))
NOTE: By typing the commands oz; x you can see the difference between the data frame oz and the data column x. Not all commands accept both: summary(oz) works but quantile(oz, c(0.7, 0.8, 0.9)) does not.

11 Example
Sort the ozone measurements in increasing order and determine the sample percentile to which each ordered observation corresponds.
Solution: The commands
    sort(x); 100*(1:length(x) - 0.5)/length(x)
return the order statistics and the percentile each order statistic estimates.
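To see what these two commands return, here is a small sketch on a made-up vector of five values (the data and the names ord and pct are hypothetical, chosen only so the result is easy to read).

```r
# Hypothetical data, used only to illustrate the commands
x <- c(23, 17, 31, 12, 28)
n <- length(x)

ord <- sort(x)                  # order statistics x_(1) <= ... <= x_(n)
pct <- 100 * (1:n - 0.5) / n    # percentile estimated by each order statistic

print(data.frame(order_statistic = ord, percentile = pct))
```

With n = 5, the i-th smallest observation is taken to estimate the 100(i - 0.5)/5 percentile, so the smallest value corresponds to the 10th percentile and the middle one to the 50th.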

12 The Boxplot
The five number summary given by the summary command is the basis for the boxplot.
A boxplot displays the central 50% of the data with a box: the lower and upper edges are at $q_1$ and $q_3$, respectively, and a line inside the box represents the median.
Extending from each edge of the box are whiskers: the lower (upper) whisker extends from $q_1$ ($q_3$) to the smallest (largest) observation within 1.5 interquartile ranges from $q_1$ ($q_3$).
Observations further from the box than the whisker ends (i.e., smaller than $q_1 - 1.5\,\mathrm{IQR}$ or larger than $q_3 + 1.5\,\mathrm{IQR}$) are called outliers, and are plotted individually.
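The whisker and outlier rule can be traced by hand. The sketch below uses a made-up data vector and illustrative variable names; note that R's boxplot works with hinges, which can differ slightly from quantile(x, 0.25) and quantile(x, 0.75) for some sample sizes, so the comparison with boxplot.stats is a check for this particular data set rather than a general identity.

```r
# Hypothetical data with one unusually large value
x <- c(10, 12, 13, 14, 15, 16, 17, 18, 40)

q1 <- quantile(x, 0.25, names = FALSE)   # lower quartile
q3 <- quantile(x, 0.75, names = FALSE)   # upper quartile
iqr <- q3 - q1                           # interquartile range

lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the fences
lower_whisker <- min(x[x >= lower_fence])
upper_whisker <- max(x[x <= upper_fence])

# Observations outside the fences are flagged as outliers
outliers <- x[x < lower_fence | x > upper_fence]

print(c(lower_whisker = lower_whisker, upper_whisker = upper_whisker))
print(outliers)
print(boxplot.stats(x)$out)   # outliers as identified by R's boxplot machinery
```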

13 The R command boxplot
Example
Construct the box plot for the ozone data. Are there any outliers?
Solution: The ozone data are already in the object x. Use the command
    boxplot(x, col="grey")
There are two outliers.

15 Pie Charts and Bar Graphs
Pie charts and bar graphs are used with count data; they display the percentage of each category in the sample. Examples include counts (or percentages or proportions) of different ethnic, education or income categories, the market share of different car companies, and so on.
The pie chart is popular in the mass media and is one of the most widely used statistical charts in the business world. It is a circular chart, where the sample is represented by a circle divided into sectors whose sizes represent proportions.

16 It has been pointed out that it is difficult to compare different sections of a given pie chart. According to Stevens' power law, length is a better scale to use than area.
The bar graph uses bars whose heights are proportional to the proportions they represent.
Remark: When the bars are arranged in decreasing order of height, the bar graph is also called a Pareto chart. The Pareto chart is one of the key tools for quality control, where it is often used to represent the most common sources of defects in a manufacturing process, the most frequent reasons for customer complaints, etc.
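A Pareto chart only requires sorting the counts before calling barplot. The following sketch uses invented complaint counts; the category names and numbers are hypothetical and serve only to show the ordering step.

```r
# Hypothetical counts of customer complaint reasons, for illustration only
complaints <- c(Billing = 12, Delivery = 45, Quality = 30, Other = 5)

# Arrange categories from most to least frequent
pareto <- sort(complaints, decreasing = TRUE)

# Bar heights are the proportions of each category
barplot(pareto / sum(pareto), ylab = "Proportion", col = "grey", las = 2)
```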

17 R commands for the pie chart and bar graph
Example
The MarketShareLightVeh data set displays the November 2011 light vehicle market share of car companies. Import the data set into the R data frame lv, and construct a pie chart and bar graph.
Solution: The data frame lv has two columns with labels Company and Percent, containing the names of the companies and their percent market share, respectively. (You can see that by typing the command lv after importing the data.) The R commands for the pie chart and bar graph are:
    attach(lv)
    pie(Percent, labels=Company, col=rainbow(length(Percent)))
    barplot(Percent, names.arg=Company, col=rainbow(length(Percent)), las=2)
    detach(lv)
The option las=2 in the barplot command causes the company names to be written vertically.

18 Histograms
In histograms the range of the data is divided into bins, and a box is constructed above each bin. The height of each box is the bin's frequency. Alternatively, the heights can be adjusted so that the histogram's total area is one.
R will automatically choose the number of bins, but it also allows user-specified intervals. Moreover, R offers the option of constructing a smooth histogram.
In stem and leaf plots each observation is split into its stem, which is the beginning digit(s), and its leaf, which is the first of the remaining digits. They retain more information about the original data but do not offer as much flexibility in selecting the bins.
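The two height conventions, and user-specified bins, can be inspected without drawing anything by asking hist for its computed values. This is a small sketch on the built-in faithful data; the break points passed to breaks are an arbitrary choice made here for illustration.

```r
x <- faithful$eruptions            # Old Faithful eruption durations (built-in data)

h <- hist(x, plot = FALSE)         # bins chosen automatically by R
print(h$breaks)                    # bin boundaries
print(h$counts)                    # frequency histogram: box height = bin count

# Density scaling: box height = count / (n * bin width), so the total area is one
area <- sum(h$density * diff(h$breaks))
print(area)

# User-specified bins of width 0.25 covering the range of the data
h2 <- hist(x, breaks = seq(1.5, 5.5, by = 0.25), plot = FALSE)
print(h2$counts)
```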

19 The R data set faithful
    x = faithful$eruptions                   # set the eruption duration data in x
    hist(x)                                  # basic frequency histogram
    hist(x, freq = FALSE)                    # histogram area = 1
    plot(density(x))                         # basic smooth histogram
    hist(x, freq = F); lines(density(x))     # superimposes the two
    stem(x)                                  # basic stem and leaf plot
    hist(x, freq = F, col="grey", main="Histogram of Old Faithful eruption durations", xlab="Eruption durations"); lines(density(x), col="red")

21 Scatterplot with gender identification
With the bear measurements data in the data frame br, a basic chest girth and weight scatterplot can be constructed by:
    attach(br); plot(chest.g, Weight)
An enhanced chest girth and weight scatterplot with gender differentiation can be constructed by:
    plot(chest.g, Weight, pch=21, bg=c("red","green")[unclass(sex)])
    legend(x=22, y=400, pch=c(21,21), col=c("red","green"), legend=c("Female","Male"))

22 Scatterplot matrix with gender identification
For more than two variables, a scatterplot matrix arranges all pairwise scatterplots in matrix form. With the bear measurements in the data frame br use the command:
    pairs(br[4:8], pch=21, bg=c("red","green")[unclass(sex)])   # br[4:8] is a data frame consisting of columns 4-8
For a variation, which gives histograms on the diagonal and additional information, use the commands:
    install.packages("psych")   # installs the package psych
    library(psych)              # activates the package
    pairs.panels(br[4:8], pch=21, bg=c("red","green")[unclass(sex)])

23 3D Scatterplots
With the bear measurements data in the data frame br (use install.packages("scatterplot3d") if not installed before) use:
    library(scatterplot3d); scatterplot3d(br[6:8])   # for the basic 3D scatterplot
    scatterplot3d(br[6:8], angle=35, col.axis="blue", col.grid="lightblue", color="red")   # angle and color controls
    scatterplot3d(br[6:8], angle=35, col.axis="blue", col.grid="lightblue", color="red", type="h", box=FALSE)   # vertical lines, no box
    scatterplot3d(br[6:8], pch=21, bg=c("red","green")[unclass(br$sex)])   # with gender differentiation
    detach(br)

24 Output options
The figure can be saved as pdf, jpg, etc. Alternatively:
    pdf("Desktop/HistOF.pdf")   # saves figure in Desktop/HistOF.pdf
    hist(x, freq = F, col="grey"); lines(density(x), col="red")
    dev.off()                   # this must be done before opening the pdf file
To save it as a jpg file, replace pdf("Desktop/HistOF.pdf") in the above set of commands by jpeg("Desktop/HistOF.jpg").
To save text output to a txt file, for example the stem and leaf plot, copy and paste, or use:
    sink("Desktop/StemOF.txt"); stem(x); sink(file=NULL)

25 Randomization, Confounding and Simpson's Paradox
Comparative studies aim at discerning and explaining differences between two or more populations. Examples include:
- the comparison of two methods of cloud seeding for hail and fog suppression at international airports,
- the comparison of the survival times of a type of root system under different watering regimens,
- the comparison of the effectiveness of three cleaning products in removing four different types of stains.

26 Some common terms used in comparative studies are:
- Experimental units: the subjects or objects on which measurements are made.
- Response variable: the variable being measured.
- One-factor studies: factor levels; treatments; populations.
- Multi-factor studies: factor level combinations; treatments; populations.
The notions of factor(s), factor levels and factor level combinations are explained in the two examples that follow.

27 Example
To compare the effect of four different watering regimens on the survival times of a type of root system:
- The roots are the experimental units.
- The response variable is the survival time.
- Watering is the factor.
- The different watering regimens are the factor levels or treatments.
- Treatments correspond to populations.

28 Example
In the same root survival time study as above, it is desired to also study the effect of depth on the survival of the root systems. Two different depths are to be considered. This is now a two-factor study:
- Factor A is depth, with two levels.
- Factor B is watering, with four levels.
- Treatments, or populations, are the different factor level combinations. There are 2 × 4 = 8 treatments.
As before, the root systems are the experimental units, and the survival time is the response variable.

29 The following table shows the eight factor level combinations of the above two-factor study:

                     Factor B
    Factor A     1       2       3       4
       1       Tr11    Tr12    Tr13    Tr14
       2       Tr21    Tr22    Tr23    Tr24
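The same list of factor level combinations can be generated in R with expand.grid. This is only a sketch; the column names depth, watering and label are chosen here for illustration.

```r
# All combinations of 2 depths (factor A) and 4 watering regimens (factor B)
treatments <- expand.grid(watering = 1:4, depth = 1:2)

# Label each combination as Tr<level of A><level of B>, as in the table
treatments$label <- paste0("Tr", treatments$depth, treatments$watering)

print(treatments)
print(nrow(treatments))   # 2 x 4 = 8 treatments
```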

30 Contrasts
Comparisons of treatments, or populations, typically focus on differences (e.g., of means, or proportions). Such differences are called contrasts. For example, the comparison of two different cloud seeding methods may focus on the simple contrast $\mu_1 - \mu_2$.
In one-factor studies where the factor has more than two levels, a number of different contrasts may be of interest. An example follows.
In multi-factor studies interest lies in more specialized contrasts, which are discussed in the section on Factorial Experiments.

31 Example
In a study to compare the mean tread life of four types of high performance tires, possible sets of contrasts of interest are
1. $\mu_1 - \mu_2$, $\mu_1 - \mu_3$, $\mu_1 - \mu_4$ (control vs treatment)
2. $\frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$ (brand A vs brand B)
3. $\mu_1 - \bar{\mu}$, $\mu_2 - \bar{\mu}$, $\mu_3 - \bar{\mu}$, $\mu_4 - \bar{\mu}$ (tire effects), where $\bar{\mu}$ is the average of the four means.
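Each of these sets of contrasts is a simple arithmetic combination of the four means. The sketch below evaluates them for made-up mean tread lives; the numbers are hypothetical, and tire 1 is assumed to play the role of the control, with tires 1-2 from brand A and tires 3-4 from brand B.

```r
# Hypothetical mean tread lives (in thousands of miles) for tires 1-4
mu <- c(40, 42, 45, 41)

# 1. Control (tire 1) versus each of the other tires
control_vs_treatment <- mu[1] - mu[2:4]

# 2. Brand A (tires 1 and 2) versus brand B (tires 3 and 4)
brand_A_vs_B <- mean(mu[1:2]) - mean(mu[3:4])

# 3. Tire effects: each mean minus the average of all four means
tire_effects <- mu - mean(mu)

print(control_vs_treatment)
print(brand_A_vs_B)
print(tire_effects)
```

By construction the tire effects sum to zero, which is a quick check that they were computed correctly.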

33 To avoid comparing apples with oranges, the experimental units for the different treatments must be homogeneous.
If fabric age affects the effectiveness of cleaning products then, unless the fabrics used in different treatments are age-homogeneous, the comparison of treatments will be distorted.
To mitigate the distorting effects, or confounding, of other possible factors, called lurking variables, it is recommended that the allocation of units to treatments be randomized.

34 Randomizing the allocation of fabric pieces to the different treatments (cleaning product and stain) avoids confounding with the factor age of fabric.
The distortion caused by lurking variables in the comparison of proportions is called Simpson's Paradox.
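A randomized allocation of this kind can be produced with sample. The sketch below assumes, purely for illustration, 24 fabric pieces, three cleaning products (P1-P3), four stains (S1-S4) and two pieces per treatment; none of these numbers or labels come from the original example.

```r
set.seed(2024)   # arbitrary seed, so the allocation can be reproduced

# The 3 x 4 = 12 treatments (cleaning product and stain combinations)
treatments <- expand.grid(product = paste0("P", 1:3), stain = paste0("S", 1:4))

n_rep <- 2                                     # pieces per treatment (assumed)
slots <- rep(seq_len(nrow(treatments)), each = n_rep)

# Randomly permute the treatment slots and hand them to pieces 1, ..., 24
assigned <- sample(slots)
allocation <- data.frame(piece = seq_along(assigned),
                         treatments[assigned, ],
                         row.names = NULL)

print(head(allocation))
print(table(allocation$product, allocation$stain))   # each cell should be 2
```

Because the order of the pieces is unrelated to the permutation, a piece's age cannot systematically line up with any particular treatment.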

35 Example
The success rates of two treatments, Treatments A and B, for kidney stones are:

    Treatment A       Treatment B
    78% (273/350)     83% (289/350)

The obvious conclusion is that Treatment B is more effective.
The lurking variable here is the size of the kidney stone.

36 Example (Kidney Stone Example Continued)
When the size of the treated kidney stone is taken into consideration, the success rates are as follows:

            Small             Large             Combined
    Tr.A    81/87 or .93      192/263 or .73    273/350 or .78
    Tr.B    234/270 or .87    55/80 or .69      289/350 or .83

Now we see that Treatment A has a higher success rate for both small and large stones.
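The reversal can be reproduced from the counts. In this sketch the count of 192 successes for Treatment A on large stones is obtained as 273 - 81 from the combined figure; the object names are illustrative.

```r
# Successes and numbers treated, by treatment (rows) and stone size (columns)
success <- matrix(c(81, 192,
                    234, 55), nrow = 2, byrow = TRUE,
                  dimnames = list(c("Tr.A", "Tr.B"), c("Small", "Large")))
treated <- matrix(c(87, 263,
                    270, 80), nrow = 2, byrow = TRUE,
                  dimnames = dimnames(success))

rates <- success / treated                        # success rate within each size
combined <- rowSums(success) / rowSums(treated)   # success rate ignoring size

print(round(cbind(rates, Combined = combined), 2))

# Treatment A was used mostly on large (harder) stones, B mostly on small ones
print(treated / rowSums(treated))
```

Treatment A wins within each stone size, yet loses in the combined column, because the two treatments were applied to very different mixes of small and large stones.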

37 Example (Batting Averages)
The overall batting averages of baseball players Derek Jeter and David Justice during the years 1995 and 1996 were 0.310 and 0.270, respectively. But looking at each year separately we get a different picture:

               1995               1996               Combined
    Jeter      12/48 or .250      183/582 or .314    195/630 or .310
    Justice    104/411 or .253    45/140 or .321     149/551 or .270

Justice had a higher batting average than Jeter in both 1995 and 1996.

38 Factorial Experiments, Main and Interaction Effects
Definition
A study is called a statistical experiment if the investigator controls the allocation of units to treatments or factor-level combinations, and this allocation is done in a randomized fashion. Otherwise the study is called observational.
Causation can only be established via a statistical experiment. Thus, a relation between salary increase and productivity does not imply that salary increases cause increased productivity.
Observational studies cannot establish causation, unless there is additional corroborating evidence. Thus, the link between smoking and health has been established through observational studies with the use of additional corroborating evidence.

40 A statistical experiment involving several factors is called a factorial experiment if all factor-level combinations are considered. Thus,

                     Factor B
    Factor A     1       2       3       4
       1       Tr11    Tr12    Tr13    Tr14
       2       Tr21    Tr22    Tr23    Tr24

is a factorial experiment if all 8 treatments are included in the study.

41 Main Effects and Interactions
In factorial experiments it is not enough to consider differences between the levels within each factor separately. Possible synergistic effects are also of interest.
Definition
If there are synergistic effects between two factors, i.e., when a change in the level of factor A has different effects on the response depending on the level of factor B, we say that there is interaction between the two factors. The absence of interaction is called additivity.

42 Example
An experiment considers two types of corn, used for bio-fuel, and two types of fertilizer. The following two tables give possible population mean yields for the four combinations of seed type and fertilizer type.

43
                           Fertilizer
                           I                         II                          Row averages                   Main row effects
    Seed A                 $\mu_{11} = 107$          $\mu_{12} = 111$            $\bar{\mu}_{1\cdot} = 109$     $\alpha_1 = -0.25$
    Seed B                 $\mu_{21} = 109$          $\mu_{22} = 110$            $\bar{\mu}_{2\cdot} = 109.5$   $\alpha_2 = 0.25$
    Column averages        $\bar{\mu}_{\cdot 1} = 108$   $\bar{\mu}_{\cdot 2} = 110.5$   $\bar{\mu}_{\cdot\cdot} = 109.25$
    Main column effects    $\beta_1 = -1.25$         $\beta_2 = 1.25$

Here the factors interact.

44
                           Fertilizer
                           I                         II                          Row averages                   Main row effects
    Seed A                 $\mu_{11} = 107$          $\mu_{12} = 111$            $\bar{\mu}_{1\cdot} = 109$     $\alpha_1 = -1$
    Seed B                 $\mu_{21} = 109$          $\mu_{22} = 113$            $\bar{\mu}_{2\cdot} = 111$     $\alpha_2 = 1$
    Column averages        $\bar{\mu}_{\cdot 1} = 108$   $\bar{\mu}_{\cdot 2} = 112$     $\bar{\mu}_{\cdot\cdot} = 110$
    Main column effects    $\beta_1 = -2$            $\beta_2 = 2$

Here the factors do not interact.

45 Under additivity:
- There is an indisputably best level for each factor, and
- the best factor level combination is that of the best level of factor A with the best level of factor B.
What is the best level of each factor in the above design?
Under additivity, the comparison of the levels within each factor is based on the factor's main effects:
    $\alpha_i = \bar{\mu}_{i\cdot} - \bar{\mu}_{\cdot\cdot}$,  $\beta_j = \bar{\mu}_{\cdot j} - \bar{\mu}_{\cdot\cdot}$
Under additivity,
    $\mu_{ij} = \bar{\mu}_{\cdot\cdot} + \alpha_i + \beta_j$

46 When the factors interact, the cell means are not given in terms of the main effects as above. The difference
    $\gamma_{ij} = \mu_{ij} - (\bar{\mu}_{\cdot\cdot} + \alpha_i + \beta_j)$
quantifies the interaction effect.
For example, in the above non-additive design,
    $\gamma_{11} = \mu_{11} - \bar{\mu}_{\cdot\cdot} - \alpha_1 - \beta_1 = 107 - 109.25 + 0.25 + 1.25 = -0.75$.
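The main effects and interaction effects of both corn tables can be computed directly from the cell means. The function effects_2way below is a helper written for this sketch, not something from the original slides.

```r
# Main and interaction effects from a matrix of cell means (illustrative helper)
effects_2way <- function(mu) {
  overall <- mean(mu)                    # overall average of the cell means
  alpha <- rowMeans(mu) - overall        # main row effects
  beta  <- colMeans(mu) - overall        # main column effects
  gamma <- mu - overall - outer(alpha, beta, "+")   # interaction effects
  list(overall = overall, alpha = alpha, beta = beta, gamma = gamma)
}

cell_names <- list(c("Seed A", "Seed B"), c("I", "II"))
nonadditive <- matrix(c(107, 111,
                        109, 110), nrow = 2, byrow = TRUE, dimnames = cell_names)
additive    <- matrix(c(107, 111,
                        109, 113), nrow = 2, byrow = TRUE, dimnames = cell_names)

e1 <- effects_2way(nonadditive)
e2 <- effects_2way(additive)

print(e1$gamma)   # nonzero entries: the factors interact
print(e2$gamma)   # all zeros: the design is additive
```

In the non-additive table every interaction effect equals plus or minus 0.75, with $\gamma_{11} = -0.75$; in the additive table all of them vanish.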

47 Data Versions of Main Effects and Interactions
Data from a two-factor factorial experiment use three subscripts:

                     Factor B
    Factor A     1                                   2                                   3
       1       $x_{11k}$, $k = 1, \ldots, n_{11}$    $x_{12k}$, $k = 1, \ldots, n_{12}$    $x_{13k}$, $k = 1, \ldots, n_{13}$
       2       $x_{21k}$, $k = 1, \ldots, n_{21}$    $x_{22k}$, $k = 1, \ldots, n_{22}$    $x_{23k}$, $k = 1, \ldots, n_{23}$

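48 Sample versions of main effects and interactions are defined using
    $\bar{x}_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk}$
instead of $\mu_{ij}$:
    $\hat{\alpha}_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}$,  $\hat{\beta}_j = \bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}$   (sample main row and column effects)
    $\hat{\gamma}_{ij} = \bar{x}_{ij} - (\bar{x}_{\cdot\cdot} + \hat{\alpha}_i + \hat{\beta}_j)$   (sample interaction effects)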

49 Sample versions of main effects and interactions estimate their population counterparts but, in general, they are not equal to them. Thus, even if the data have come from an additive design, the sample interaction effects will not be exactly zero.
The interaction plot is a graphical technique that can help assess whether the sample interaction effects are significantly different from zero.
For each level of, say, factor B, the interaction plot traces the cell means along the levels of factor A. See CloudSeedInterPlot.pdf for an example.
For data coming from additive designs, these traces (or profiles) should be approximately parallel.

50 In this unit we will see the comparative boxplot, the comparative bar graph, and the interaction plot, where
- The comparative boxplot consists of side-by-side individual boxplots for the data sets from each population. It provides a visual impression of differences in the median and percentiles of the levels in one-factor studies.
- The comparative bar graph provides a visual comparison of the category proportions for two populations.
- The interaction plot provides a visual aid for assessing the presence of interactions in two-factor studies.
R commands for computing the main effects and interactions for a two-factor design will also be given in this unit.

51 The Comparative Boxplot
Example
Iron concentration measurements from four ore formations are given in the FeData data set. Read this data into the R data frame fe, and construct a comparative boxplot.
Solution: With the data set read into the data frame fe, use the commands
    fe[1:3,]   # to see what the data frame looks like
    boxplot(fe$conc ~ fe$ind, col=rainbow(4))

52 The Notched Boxplot
Notched boxplots provide additional information through the notches: if the notches do not overlap we may, as an informal test, conclude that the population medians differ.
Import the steel strength (SteelStrengthData.txt) and the robot reaction times (RobotReactTime.txt) data into the data frames ss and rt, respectively, use attach(ss); attach(rt), and compare the notched boxplots produced by
    boxplot(value ~ Sample, col=rainbow(2), notch=TRUE)
    boxplot(time ~ Robot, col=rainbow(2), notch=TRUE); detach(ss); detach(rt)

53 The Comparative Bar Graph
Example
The light vehicle market share of car companies for the month of November in 2010 and 2011 is given in the data file MarketShareLightVehComp.txt. Construct a comparative bar graph.
Solution: With the data set read into the data frame lv2, use the commands (the two arguments of rbind are the 2010 and 2011 percent columns of lv2)
    m=rbind(lv2$percent 2010, lv2$percent 2011)
    barplot(m, names.arg=lv2$company, ylim=c(0,20), col=c("darkblue","red"), legend.text=c("2010","2011"), beside=TRUE, las=2)

54 The Interaction Plot The interaction plot is a useful graphical technique for assessing whether the sample interaction effects are sufficiently different from zero to imply a non-additive design. For each level of one factor, say factor B, the interaction plot traces the cell means along the levels of the other factor. If the design is additive, these traces (also called profiles) should be approximately parallel.
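To see what parallel and non-parallel profiles look like, the sketch below draws interaction plots for the two 2 x 2 tables of population mean yields from the corn and fertilizer example. The data frame cells and its column names are constructed here for illustration.

```r
# One row per seed-fertilizer combination; expand.grid varies seed fastest,
# so the rows are A-I, B-I, A-II, B-II
cells <- expand.grid(seed = c("A", "B"), fertilizer = c("I", "II"))
cells$additive    <- c(107, 109, 111, 113)   # means from the additive table
cells$nonadditive <- c(107, 109, 111, 110)   # means from the non-additive table

op <- par(mfrow = c(1, 2))                   # two plots side by side
with(cells, interaction.plot(fertilizer, seed, additive,
                             xlab = "Fertilizer", ylab = "Mean yield",
                             trace.label = "Seed"))
with(cells, interaction.plot(fertilizer, seed, nonadditive,
                             xlab = "Fertilizer", ylab = "Mean yield",
                             trace.label = "Seed"))
par(op)

# Numerical counterpart of "parallel": the gap between the two seed profiles
gap_additive    <- with(cells, additive[seed == "B"] - additive[seed == "A"])
gap_nonadditive <- with(cells, nonadditive[seed == "B"] - nonadditive[seed == "A"])
print(gap_additive)      # same gap at both fertilizers: parallel profiles
print(gap_nonadditive)   # gap changes with fertilizer: profiles cross
```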

55 Example (Cloud seeding in Tasmania)
The cloud seeding data (CloudSeed2w.txt) were collected to study the effect of the factors seed and season on rainfall (source: Miller, A.J., et al. (1979), Analyzing the results of a cloud-seeding experiment in Tasmania, Communications in Statistics - Theory & Methods, A8(10)). Construct the interaction plot.
Solution: Import the data set into the data frame cs and use attach(cs) and the command
    interaction.plot(season, seeded, rain, col=c(2,3), lty=1, xlab="Season", ylab="Cell Means of Rainfall", trace.label="Seeding")

56 Computation of the main and interaction effects of a two-factor design requires the use of the tapply command in R, and the commands for computing certain means of a matrix. These commands are presented first.

57 The R function tapply
The tapply function is useful when we need to break up a set of numbers into subgroups, which are defined by some classifying factor(s), compute a statistic on each subgroup, and return the results in a convenient form.
We will use the tapply function to compute the sample averages in all factor-level combinations in a × b designs, i.e., we will break up the set of values of the response variable into the subgroups defined by the factor level combinations, compute the sample mean for each subgroup, and return the results in matrix form. See the example that follows.

58 Example
Compute the sample averages for all factor-level combinations of the cloud seeding data set.
Solution: With the cloud seeding data set in the data frame cs, use the command
    mcm=tapply(cs$rain, cs[,c(2,3)], mean)   # matrix of cell means
    mcm                                      # to display the results
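Since the cloud seeding file may not be at hand, the same tapply call can be tried on a small made-up data frame. The data frame toy and its values are hypothetical; only the column names mimic those used for cs.

```r
# Hypothetical two-factor data: two observations in each of the 2 x 2 cells
toy <- data.frame(
  rain   = c(1.2, 0.8, 2.0, 2.4, 0.5, 0.7, 1.5, 1.9),
  seeded = rep(c("no", "yes"), each = 4),
  season = rep(rep(c("summer", "winter"), each = 2), times = 2)
)

# Split rain by the two classifying factors and average within each cell
cell_means <- tapply(toy$rain, toy[, c("seeded", "season")], mean)
print(cell_means)   # rows: levels of seeded; columns: levels of season
```

Selecting the factor columns by name, as here, is equivalent to the positional form cs[,c(2,3)] when those are columns 2 and 3 of the data frame.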

59 Means of a Matrix
The functions mean, rowMeans, and colMeans, when applied to a matrix, return the mean of all elements of the matrix, the vector of row means and the vector of column means, respectively.
    mean(mcm)       # returns the mean of all sample means
    rowMeans(mcm)   # returns the vector of row means
    colMeans(mcm)   # returns the vector of column means

60 Main Effects and Interactions in R
Example
Use R to compute the main and interaction effects for the factors seed and season of the cloud seeding data.
Solution. Use the following commands:
    alphas = rowMeans(mcm) - mean(mcm)              # main row effects
    betas = colMeans(mcm) - mean(mcm)               # main column effects
    gammas = t(t(mcm-mean(mcm)-alphas)-betas)       # matrix of interaction effects
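Two quick checks help confirm that effects computed this way are right: the interaction effects in every row and every column sum to zero, and the cell means are recovered from the overall mean plus the effects. The sketch below uses a hypothetical matrix of cell means and writes the interaction step with outer, which subtracts alphas[i] + betas[j] from cell (i, j) in one step.

```r
# Hypothetical matrix of cell means (rows: seeded, columns: season)
mcm <- matrix(c(1.0, 2.2,
                0.6, 1.7), nrow = 2, byrow = TRUE,
              dimnames = list(seeded = c("no", "yes"),
                              season = c("summer", "winter")))

alphas <- rowMeans(mcm) - mean(mcm)    # main row effects
betas  <- colMeans(mcm) - mean(mcm)    # main column effects
gammas <- mcm - mean(mcm) - outer(alphas, betas, "+")   # interaction effects

print(gammas)
print(rowSums(gammas))   # zero up to rounding error
print(colSums(gammas))   # zero up to rounding error

# Overall mean + row effect + column effect + interaction gives back the cell means
rebuilt <- mean(mcm) + outer(alphas, betas, "+") + gammas
print(all.equal(rebuilt, mcm, check.attributes = FALSE))
```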


More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc Section 2-2 Frequency Distributions Copyright 2010, 2007, 2004 Pearson Education, Inc. 2.1-1 Frequency Distribution Frequency Distribution (or Frequency Table) It shows how a data set is partitioned among

More information

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or  me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

More information

MATH11400 Statistics Homepage

MATH11400 Statistics Homepage MATH11400 Statistics 1 2010 11 Homepage http://www.stats.bris.ac.uk/%7emapjg/teach/stats1/ 1.1 A Framework for Statistical Problems Many statistical problems can be described by a simple framework in which

More information

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Chapter 2: Descriptive Statistics

Chapter 2: Descriptive Statistics Chapter 2: Descriptive Statistics Student Learning Outcomes By the end of this chapter, you should be able to: Display data graphically and interpret graphs: stemplots, histograms and boxplots. Recognize,

More information

B. Graphing Representation of Data

B. Graphing Representation of Data B Graphing Representation of Data The second way of displaying data is by use of graphs Although such visual aids are even easier to read than tables, they often do not give the same detail It is essential

More information

3. Data Analysis and Statistics

3. Data Analysis and Statistics 3. Data Analysis and Statistics 3.1 Visual Analysis of Data 3.2.1 Basic Statistics Examples 3.2.2 Basic Statistical Theory 3.3 Normal Distributions 3.4 Bivariate Data 3.1 Visual Analysis of Data Visual

More information

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

More information

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 152 Introduction When analyzing data, you often need to study the characteristics of a single group of numbers, observations, or measurements. You might want to know the center and the spread about

More information

Basic Statistical Terms and Definitions

Basic Statistical Terms and Definitions I. Basics Basic Statistical Terms and Definitions Statistics is a collection of methods for planning experiments, and obtaining data. The data is then organized and summarized so that professionals can

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

VCEasy VISUAL FURTHER MATHS. Overview

VCEasy VISUAL FURTHER MATHS. Overview VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that

More information

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs

At the end of the chapter, you will learn to: Present data in textual form. Construct different types of table and graphs DATA PRESENTATION At the end of the chapter, you will learn to: Present data in textual form Construct different types of table and graphs Identify the characteristics of a good table and graph Identify

More information

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- # Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series by Mario F. Triola Chapter 2 Summarizing and Graphing Data 2-1 Review and Preview 2-2 Frequency Distributions 2-3 Histograms

More information

1.3 Graphical Summaries of Data

1.3 Graphical Summaries of Data Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this

More information

Middle Years Data Analysis Display Methods

Middle Years Data Analysis Display Methods Middle Years Data Analysis Display Methods Double Bar Graph A double bar graph is an extension of a single bar graph. Any bar graph involves categories and counts of the number of people or things (frequency)

More information

Exploring and Understanding Data Using R.

Exploring and Understanding Data Using R. Exploring and Understanding Data Using R. Loading the data into an R data frame: variable

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram IAT 355 Visual Analytics Data and Statistical Models Lyn Bartram Exploring data Example: US Census People # of people in group Year # 1850 2000 (every decade) Age # 0 90+ Sex (Gender) # Male, female Marital

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Stat Day 6 Graphs in Minitab

Stat Day 6 Graphs in Minitab Stat 150 - Day 6 Graphs in Minitab Example 1: Pursuit of Happiness The General Social Survey (GSS) is a large-scale survey conducted in the U.S. every two years. One of the questions asked concerns how

More information

AP Statistics Summer Assignment:

AP Statistics Summer Assignment: AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this

More information

Brief Guide on Using SPSS 10.0

Brief Guide on Using SPSS 10.0 Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information

More information

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...

More information

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation Objectives: 1. Learn the meaning of descriptive versus inferential statistics 2. Identify bar graphs,

More information

2.1: Frequency Distributions

2.1: Frequency Distributions 2.1: Frequency Distributions Frequency Distribution: organization of data into groups called. A: Categorical Frequency Distribution used for and level qualitative data that can be put into categories.

More information

Univariate descriptives

Univariate descriptives Univariate descriptives Johan A. Elkink University College Dublin 18 September 2014 18 September 2014 1 / Outline 1 Graphs for categorical variables 2 Graphs for scale variables 3 Frequency tables 4 Central

More information

Chapter 2 - Frequency Distributions and Graphs

Chapter 2 - Frequency Distributions and Graphs 1. Which of the following does not need to be done when constructing a frequency distribution? A) select the number of classes desired B) find the range C) make the class width an even number D) use classes

More information

Univariate Data - 2. Numeric Summaries

Univariate Data - 2. Numeric Summaries Univariate Data - 2. Numeric Summaries Young W. Lim 2018-08-01 Mon Young W. Lim Univariate Data - 2. Numeric Summaries 2018-08-01 Mon 1 / 36 Outline 1 Univariate Data Based on Numerical Summaries R Numeric

More information

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 INTRODUCTION Graphs are one of the most important aspects of data analysis and presentation of your of data. They are visual representations

More information

Chapter 11. Worked-Out Solutions Explorations (p. 585) Chapter 11 Maintaining Mathematical Proficiency (p. 583)

Chapter 11. Worked-Out Solutions Explorations (p. 585) Chapter 11 Maintaining Mathematical Proficiency (p. 583) Maintaining Mathematical Proficiency (p. 3) 1. After School Activities. Pets Frequency 1 1 3 7 Number of activities 3. Students Favorite Subjects Math English Science History Frequency 1 1 1 3 Number of

More information

Data Visualization in R

Data Visualization in R Data Visualization in R L. Torgo ltorgo@fc.up.pt Faculdade de Ciências / LIAAD-INESC TEC, LA Universidade do Porto Oct, 216 Introduction Motivation for Data Visualization Humans are outstanding at detecting

More information

Chapter 2 - Graphical Summaries of Data

Chapter 2 - Graphical Summaries of Data Chapter 2 - Graphical Summaries of Data Data recorded in the sequence in which they are collected and before they are processed or ranked are called raw data. Raw data is often difficult to make sense

More information

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13. Minitab@Oneonta.Manual: Selected Introductory Statistical and Data Manipulation Procedures Gordon & Johnson 2002 Minitab version 13.0 Minitab@Oneonta.Manual: Selected Introductory Statistical and Data

More information

Chapter 1. Looking at Data-Distribution

Chapter 1. Looking at Data-Distribution Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and

More information

UNIT 1A EXPLORING UNIVARIATE DATA

UNIT 1A EXPLORING UNIVARIATE DATA A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

Exploratory Data Analysis

Exploratory Data Analysis Chapter 10 Exploratory Data Analysis Definition of Exploratory Data Analysis (page 410) Definition 12.1. Exploratory data analysis (EDA) is a subfield of applied statistics that is concerned with the investigation

More information

Chpt 2. Frequency Distributions and Graphs. 2-4 Pareto chart, time series graph, Pie chart / 35

Chpt 2. Frequency Distributions and Graphs. 2-4 Pareto chart, time series graph, Pie chart / 35 Chpt 2 Frequency Distributions and Graphs 2-4 Pareto chart, time series graph, Pie chart 1 Chpt 2 2-4 Read pages 63-77 p76 Applying the Concepts p77 1, 7, 9, 11, 13, 14, 15 Homework 2 Chpt 2 Objectives

More information

Chapter 2 Organizing and Graphing Data. 2.1 Organizing and Graphing Qualitative Data

Chapter 2 Organizing and Graphing Data. 2.1 Organizing and Graphing Qualitative Data Chapter 2 Organizing and Graphing Data 2.1 Organizing and Graphing Qualitative Data 2.2 Organizing and Graphing Quantitative Data 2.3 Stem-and-leaf Displays 2.4 Dotplots 2.1 Organizing and Graphing Qualitative

More information

Install RStudio from - use the standard installation.

Install RStudio from   - use the standard installation. Session 1: Reading in Data Before you begin: Install RStudio from http://www.rstudio.com/ide/download/ - use the standard installation. Go to the course website; http://faculty.washington.edu/kenrice/rintro/

More information

Basics of Plotting Data

Basics of Plotting Data Basics of Plotting Data Luke Chang Last Revised July 16, 2010 One of the strengths of R over other statistical analysis packages is its ability to easily render high quality graphs. R uses vector based

More information

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file 1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/

More information

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly.

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly. GRAPHING We have used statistics all our lives, what we intend to do now is formalize that knowledge. Statistics can best be defined as a collection and analysis of numerical information. Often times we

More information

An Introduction to R 2.2 Statistical graphics

An Introduction to R 2.2 Statistical graphics An Introduction to R 2.2 Statistical graphics Dan Navarro (daniel.navarro@adelaide.edu.au) School of Psychology, University of Adelaide ua.edu.au/ccs/people/dan DSTO R Workshop, 29-Apr-2015 Scatter plots

More information

AP Statistics Prerequisite Packet

AP Statistics Prerequisite Packet Types of Data Quantitative (or measurement) Data These are data that take on numerical values that actually represent a measurement such as size, weight, how many, how long, score on a test, etc. For these

More information

Enduring Understandings: Some basic math skills are required to be reviewed in preparation for the course.

Enduring Understandings: Some basic math skills are required to be reviewed in preparation for the course. Curriculum Map for Functions, Statistics and Trigonometry September 5 Days Targeted NJ Core Curriculum Content Standards: N-Q.1, N-Q.2, N-Q.3, A-CED.1, A-REI.1, A-REI.3 Enduring Understandings: Some basic

More information

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs.

Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 1 2 Things you ll know (or know better to watch out for!) when you leave in December: 1. What you can and cannot infer from graphs. 2. How to construct (in your head!) and interpret confidence intervals.

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

SAS Visual Analytics 8.2: Working with Report Content

SAS Visual Analytics 8.2: Working with Report Content SAS Visual Analytics 8.2: Working with Report Content About Objects After selecting your data source and data items, add one or more objects to display the results. SAS Visual Analytics provides objects

More information

Understanding Statistical Questions

Understanding Statistical Questions Unit 6: Statistics Standards, Checklist and Concept Map Common Core Georgia Performance Standards (CCGPS): MCC6.SP.1: Recognize a statistical question as one that anticipates variability in the data related

More information

An Introductory Guide to R

An Introductory Guide to R An Introductory Guide to R By Claudia Mahler 1 Contents Installing and Operating R 2 Basics 4 Importing Data 5 Types of Data 6 Basic Operations 8 Selecting and Specifying Data 9 Matrices 11 Simple Statistics

More information

GRAPHING BAYOUSIDE CLASSROOM DATA

GRAPHING BAYOUSIDE CLASSROOM DATA LUMCON S BAYOUSIDE CLASSROOM GRAPHING BAYOUSIDE CLASSROOM DATA Focus/Overview This activity allows students to answer questions about their environment using data collected during water sampling. Learning

More information

KANRI DISTANCE CALCULATOR. User Guide v2.4.9

KANRI DISTANCE CALCULATOR. User Guide v2.4.9 KANRI DISTANCE CALCULATOR User Guide v2.4.9 KANRI DISTANCE CALCULATORTM FLOW Participants Input File Correlation Distance Type? Generate Target Profile General Target Define Target Profile Calculate Off-Target

More information

10.4 Measures of Central Tendency and Variation

10.4 Measures of Central Tendency and Variation 10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode

More information

10.4 Measures of Central Tendency and Variation

10.4 Measures of Central Tendency and Variation 10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

Road Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary

Road Map. Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2. Data preprocessing Road Map Data types Measuring data Data cleaning Data integration Data transformation Data reduction Data discretization Summary 2 Data types Categorical vs. Numerical Scale types

More information

3.3 The Five-Number Summary Boxplots

3.3 The Five-Number Summary Boxplots 3.3 The Five-Number Summary Boxplots Tom Lewis Fall Term 2009 Tom Lewis () 3.3 The Five-Number Summary Boxplots Fall Term 2009 1 / 9 Outline 1 Quartiles 2 Terminology Tom Lewis () 3.3 The Five-Number Summary

More information

Basic Medical Statistics Course

Basic Medical Statistics Course Basic Medical Statistics Course S0 SPSS Intro November 2013 Wilma Heemsbergen w.heemsbergen@nki.nl 1 13.00 ~ 15.30 Database (20 min) SPSS (40 min) Short break Exercise (60 min) This Afternoon During the

More information

Descriptive Statistics

Descriptive Statistics Chapter 2 Descriptive Statistics 2.1 Descriptive Statistics 1 2.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Display data graphically and interpret graphs:

More information

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata Paul Dickman September 2003 1 A brief introduction to Stata Starting the Stata program

More information

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form.

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form. CHAPTER 2 Frequency Distributions and Graphs Objectives Organize data using frequency distributions. Represent data in frequency distributions graphically using histograms, frequency polygons, and ogives.

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

More information

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd Chapter 3: Data Description - Part 3 Read: Sections 1 through 5 pp 92-149 Work the following text examples: Section 3.2, 3-1 through 3-17 Section 3.3, 3-22 through 3.28, 3-42 through 3.82 Section 3.4,

More information

Tabular & Graphical Presentation of data

Tabular & Graphical Presentation of data Tabular & Graphical Presentation of data bjectives: To know how to make frequency distributions and its importance To know different terminology in frequency distribution table To learn different graphs/diagrams

More information

Organizing and Summarizing Data

Organizing and Summarizing Data 1 Organizing and Summarizing Data Key Definitions Frequency Distribution: This lists each category of data and how often they occur. : The percent of observations within the one of the categories. This

More information

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

LESSON 3: CENTRAL TENDENCY

LESSON 3: CENTRAL TENDENCY LESSON 3: CENTRAL TENDENCY Outline Arithmetic mean, median and mode Ungrouped data Grouped data Percentiles, fractiles, and quartiles Ungrouped data Grouped data 1 MEAN Mean is defined as follows: Sum

More information

DAY 52 BOX-AND-WHISKER

DAY 52 BOX-AND-WHISKER DAY 52 BOX-AND-WHISKER VOCABULARY The Median is the middle number of a set of data when the numbers are arranged in numerical order. The Range of a set of data is the difference between the highest and

More information

28 CHAPTER 2 Summarizing and Graphing Data

28 CHAPTER 2 Summarizing and Graphing Data 8 CHAPTER Summarizing and Graphing Data. The two requested histograms are given below. They give very different visual images of the shape of the distribution. An outlier can have a significant effect

More information