Lecture 1: Exploratory data analysis


1 Lecture 1: Exploratory data analysis Statistics 101 Mine Çetinkaya-Rundel January 17, 2012

2 Announcements
Any questions about the syllabus?
If you sent me your Gmail address, your RStudio account should be working. If you haven't, please do so ASAP.
Did you take the survey? If not, it's under Tests & Quizzes on Sakai.
Did you get a clicker? If you've made arrangements to get a used clicker using the list on Google Docs, please remove yourself from the list.
Once you have your clicker, register it at support/ registeryourclicker. Enter your NetID in the Student ID field.
If you missed lab last Wednesday, review it on your own. It's posted on the course webpage at stat.duke.edu/ courses/ Spring12/ sta
Statistics 101 (Mine Çetinkaya-Rundel) L1: Exploratory data analysis January 17, 2012

3 Introduction to data
Process of scientific inquiry:
1 Identify a question or problem
2 Collect relevant data on the topic
3 Analyze the data
4 Form a conclusion
Statistics focuses on making stages (2)-(4) objective, rigorous, and efficient.
Statistics is the study of how best to collect, analyze, and draw conclusions from data.

4 Outline
1 Case study
2 Data basics
3 Examining numerical data

5 Case study (1) Identify a question or problem
Is whether or not students identify with their parents' political beliefs associated with their class year?
How would we go about answering this question? Can you help design an appropriate study?

6 Case study (2) Collect relevant data on the topic
86 students took a survey where they were asked about their class year and whether or not they identify with their parents' political beliefs.
Student   class        parent politic
1         junior       no
2         first-year   yes
3         first-year   NA
...
          sophomore    yes

7 Case study (3) Analyze the data
[Table: class (first-year, sophomore, junior, senior, Total) by parent politic (no, yes, Total); the counts did not survive extraction]
Why is the total number of responses only 76 when 86 students took the survey?
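One plausible reading of the question above is that some students left the politics question blank, like the NA in the survey excerpt. The sketch below uses a made-up six-row data frame (the column name `parent_politic` and all values are assumptions, not the class data) to show how R's `table()` drops missing answers by default.

```r
# Hypothetical mini survey: 6 students, one skipped the politics question
survey <- data.frame(
  class = c("junior", "first-year", "first-year", "sophomore", "senior", "junior"),
  parent_politic = c("no", "yes", NA, "yes", "no", "yes")
)

# table() leaves out rows where either variable is NA
tab <- table(survey$class, survey$parent_politic)
sum(tab)      # counts 5 responses, not 6

# useNA = "ifany" keeps the missing answers visible as their own column
tab_na <- table(survey$class, survey$parent_politic, useNA = "ifany")
sum(tab_na)   # counts all 6 students

# addmargins() appends row and column totals to the contingency table
addmargins(tab)
```

Comparing `sum(tab)` with `nrow(survey)` is a quick way to see how many observations were left out of a contingency table.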

8 Case study (3) Analyze the data (cont.)
How can we answer the research question using these data?
[Table: class (first-year, sophomore, junior, senior, Total) by parent politic (no, yes, Total); the counts did not survive extraction]

9 Case study (4) Form a conclusion
Clicker question: What is an appropriate conclusion for this study?
(a) The proportion of graduate students who identify with their parents' political beliefs will be less than 33%.
(b) The proportion of high school students who identify with their parents' political beliefs will be higher than 97%.
(c) Staying in school longer causes students to not identify with their parents' political beliefs.
(d) While there is a difference in the proportion of students who identify with their parents' political beliefs among the different class years, we cannot determine from this analysis alone whether being in school for longer is the cause of having different political beliefs.

10 Outline
1 Case study
2 Data basics
   Observations and variables
   Types of variables
   Relationships among variables
   Associated and independent variables
3 Examining numerical data

11 Data basics Observations and variables
In a data matrix, each row is an observation and each column is a variable.
Stu.   gender   intro extra   dread
1      male     extravert     3
2      female   extravert     2
3      female   introvert     4
4      female   extravert     2
...
       male     extravert     3

12 Data basics Types of variables
all variables
  numerical
    continuous (measured)
    discrete (counted)
  categorical
    regular categorical (unordered categories)
    ordinal (ordered categories)
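The four leaf types in the tree above map fairly naturally onto R's storage types. The sketch below is only an illustration: the variable names echo the class survey, but the values, and the choice to treat `dread` as an ordered 1-to-5 scale, are assumptions.

```r
# Hypothetical values; variable names borrowed from the class survey
sleep     <- c(7.5, 6, 8.25)                      # numerical, continuous (measured)
countries <- c(3L, 0L, 12L)                       # numerical, discrete (counted)
gender    <- factor(c("male", "female", "female"))  # regular categorical (unordered)
dread     <- factor(c(3, 2, 4), levels = 1:5,
                    ordered = TRUE)               # ordinal (ordered categories)

is.numeric(sleep)      # TRUE
is.integer(countries)  # TRUE
is.ordered(gender)     # FALSE: categories have no ordering
is.ordered(dread)      # TRUE

# Ordered factors support comparisons; unordered ones do not
dread[1] > dread[2]    # TRUE, since level 3 ranks above level 2
```

Storing an ordinal variable as an ordered factor rather than a number keeps R from treating the gaps between levels as equal.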

13-23 Data basics Types of variables (cont.)
Classify each variable in the data below: gender, sleep, bedtime, countries, dread.
[Data table with columns gender, sleep, bedtime, countries, dread; only the gender column (male, female, female, female, female, female) survived extraction, and the answers revealed on the slides were lost]

24 Data basics Types of variables
Clicker question: What type of variable is a zip code?
(a) numerical, continuous
(b) numerical, discrete
(c) categorical
(d) categorical, ordinal

25 Data basics Relationships among variables
[Scatterplot: # of alcoholic drinks / week vs. age at first alcohol consumption]
Does there appear to be a relationship between number of alcoholic drinks consumed per week and age at first alcohol consumption?

26 Data basics Associated and independent variables
Clicker question: Based on the scatterplot, which of the following statements is correct about the head length and skull width of possums?
[Scatterplot: skull width (mm) vs. head length (mm)]
(a) There is no relationship between head length and skull width, i.e. the variables are independent.
(b) Head length and skull width are positively associated.
(c) Skull width and head length are negatively associated.
(d) A longer head causes the skull to be wider.
(e) A wider skull causes the head to be longer.

27 Data basics Associated vs. independent
When two variables show some connection with one another, they are called associated variables. Associated variables can also be called dependent variables and vice versa.
If two variables are not associated, i.e. there is no evident connection between the two, then they are said to be independent.
Observations, not just variables, can also be independent of one another.

28 Outline
1 Case study
2 Data basics
3 Examining numerical data
   Scatterplots for paired data
   Dot plots and the mean
   Histograms and shape
   Variance and standard deviation
   Box plots, quartiles, and the median
   Robust statistics
   Transforming data

29 Population to sample
It is usually not feasible to collect information on the entire population due to the high cost of data collection, so statisticians instead work with samples that are (hopefully) representative of the populations they come from.
[Diagram: population and sample]
We try to understand certain features of the population as a whole using summary statistics and graphs based on these samples.

30 Scatterplots for paired data Cars: ... vs. weight
From the cars data presented in the textbook:
[Two scatterplots: miles per gallon (city rating) vs. weight (pounds), and price ($1000s) vs. weight (pounds)]
What do these scatterplots reveal about the data? How might they be useful?

31 Scatterplots for paired data Life expectancy vs. income
Do life expectancy and income appear to be associated or independent?
Was the relationship the same throughout the years, or did it change?
world

32 Dot plots and the mean Dot plots
Useful for visualizing one numerical variable.
[Dot plot: gpa]
Do you see anything out of the ordinary?

33 Dot plots and the mean Dot plots (cont.)
Useful for visualizing one numerical variable. Darker colors represent areas where there are more observations.
[Dot plot: gpa]
How would you describe the distribution of GPAs in this data set? Make sure to say something about the center, shape, and spread of the distribution.

34 Dot plots and the mean Dot plots & mean
[Dot plot: gpa, with the mean marked by a triangle]
The mean, also called the average (marked with a triangle in the above plot), is one way to measure the center of a distribution of data.
The mean GPA is [value missing in source].

35 Dot plots and the mean Mean
The sample mean, denoted as $\bar{x}$, can be calculated as
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n},$$
where $x_1, x_2, \ldots, x_n$ represent the $n$ observed values.
The population mean is computed the same way but is denoted as $\mu$. It is often not possible to calculate $\mu$ since population data are rarely available.
The sample mean is a sample statistic, or a point estimate of the population mean. This estimate may not be perfect, but if the sample is good (representative of the population), it is usually a good guess.
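As a quick check of the formula above, the sketch below computes the sample mean by hand and compares it with R's built-in `mean()`. The five GPA values are made up for illustration and are not the class data.

```r
# Hypothetical GPAs (not the class data)
gpa <- c(3.2, 3.9, 3.5, 3.7, 3.45)

n     <- length(gpa)     # number of observed values
x_bar <- sum(gpa) / n    # (x_1 + x_2 + ... + x_n) / n

x_bar       # 3.55
mean(gpa)   # same value from the built-in function
```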

36 Dot plots and the mean Stacked dot plot
Higher bars represent areas where there are more observations, which makes it a little easier to judge the center and the shape of the distribution.
[Stacked dot plot: gpa]

37 Histograms and shape Histograms - GPA
Higher bars represent areas where there are more observations. Histograms are preferable when the sample size is large, but they hide finer details like individual observations.
[Histogram: frequency vs. gpa]

38 Histograms and shape Anatomy of a histogram
Order the data in ascending order with sort(d$gpa). [R output lost in extraction]
Make a frequency table by counting how many observations fall in each bin. Let's use a bin width of 0.1:
[Frequency table: GPA bins of width 0.1 running from 2.9 to 4, with a Count row; the counts did not survive extraction]
Note: The histogram is shown on the previous slide.
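The binning step described above can be sketched in R with `cut()` and `table()`. The GPA values here are hypothetical, and the choice of right-closed bins such as (2.9, 3] is an assumption, since the slide does not say which endpoint each bin includes.

```r
# Hypothetical GPAs (not the class data), chosen to avoid bin boundaries
gpa <- c(2.95, 3.05, 3.12, 3.25, 3.28, 3.41, 3.55, 3.58, 3.62, 3.75, 3.86, 3.97)

sort(gpa)                               # step 1: order the data

breaks <- seq(2.9, 4.0, by = 0.1)       # bin edges for a bin width of 0.1
bins   <- cut(gpa, breaks = breaks, include.lowest = TRUE)
freq   <- table(bins)                   # step 2: count observations per bin
freq

# hist() does the same counting internally before drawing the bars
h <- hist(gpa, breaks = breaks, plot = FALSE)
h$counts
```

Changing `by = 0.1` to a larger or smaller value is the quickest way to see how the bin width changes the story a histogram tells.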

39 Histograms and shape Histograms - Extracurricular hours
Histograms provide a view of the data density. Higher bars represent where the data are relatively more common.
Histograms are especially convenient for describing the shape of the data distribution.
The chosen bin width can alter the story the histogram is telling.
[Histogram: frequency vs. extracurricular hrs / week]

40 Histograms and shape Bin width
Which one(s) of these histograms are useful? Which reveal too much about the data? Which hide too much?
[Figure: four histograms of frequency vs. extracurricular hrs / week]

41 Histograms and shape Shape of a distribution: modality
The mode is defined as the most frequent observation in the data set.
Does the histogram have a single prominent peak (unimodal), several prominent peaks (bimodal/multimodal), or no apparent peaks (uniform)?
Note: To determine modality, it's best to step back and imagine a smooth curve over the histogram. Use the "limp spaghetti" method.

42 Histograms and shape Shape of a distribution: skewness
Is the histogram right skewed, left skewed, or symmetric?
Note: Histograms are said to be skewed to the side of the long tail.

43 Histograms and shape Shape of a distribution: unusual observations
Are there any unusual observations or potential outliers?

44 Histograms and shape
Clicker question: Which of these variables do you expect to be uniformly distributed?
(a) weights of adult females
(b) salaries of a random sample of people from North Carolina
(c) exam scores
(d) birthdays of classmates (day of the month)

45 Histograms and shape Ages of my FB friends
What would you guess my age is?
frienddata

46 Histograms and shape Are you typical?
watch?v=4b2xovkffz4
How useful are centers alone for conveying the true characteristics of a distribution?

47 Variance and standard deviation Variability in data
How would you describe the amount of variability in the number of hours of sleep students get per night?
[Histogram: frequency vs. sleep (hrs / night)]

48 Variance and standard deviation Deviation
The distance of an observation from the mean is its deviation: $x_i - \bar{x}$.
sort(d$sleep) [R output lost in extraction]
mean(d$sleep)
[1] 4.6
$x_1 - \bar{x} = -3.6$
$x_2 - \bar{x} = -3.6$
$x_3 - \bar{x} = -2.6$
$\vdots$
$x_{86} - \bar{x} = 4.4$

49-50 Variance and standard deviation Variance
Sample variance, $s^2$, is roughly the average squared deviation from the mean.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
Note: When calculating the sample variance we divide by $n - 1$ instead of $n$.
The variance of the amount of sleep students get per night can be calculated as:
$$s^2 = \frac{(-3.6)^2 + (-3.6)^2 + \cdots + (4.4)^2}{86 - 1} = 2.76$$
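The variance formula above can be traced step by step in R. The six sleep values below are hypothetical (the class data did not survive extraction), so the result differs from the 2.76 on the slide.

```r
# Hypothetical hours of sleep (not the class data)
sleep <- c(5, 6, 7, 7, 8, 9)

n          <- length(sleep)
deviations <- sleep - mean(sleep)     # x_i - x_bar for every observation
deviations                            # -2 -1 0 0 1 2

# Sum of squared deviations divided by n - 1 (not n)
s2 <- sum(deviations^2) / (n - 1)
s2                                    # 2

var(sleep)                            # built-in function gives the same value
```

Note that the raw deviations always sum to zero, which is one reason they are squared before averaging.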

51 Variance and standard deviation Variance (cont.)
Calculating variance by hand is rather tedious, especially when the data set is large.
var(d$sleep)
[1] 2.76

52-53 Variance and standard deviation Variance (cont.)
Why do we use the squared deviation in the calculation of variance?
To get rid of negatives, so that observations equally distant from the mean are weighted equally.
To weight larger deviations more heavily.

54-56 Variance and standard deviation Standard deviation
The sample standard deviation, $s$, is the square root of the variance.
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
The standard deviation of the amount of sleep students get per night can be calculated as:
$$s = \sqrt{2.76} = 1.66$$
sd(d$sleep)
[1] 1.66
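A short check of the arithmetic above: taking the square root of the reported variance reproduces the reported standard deviation. The second half uses a made-up vector (an assumption, not the class data) to confirm that `sd()` is simply the square root of `var()`.

```r
# Variance reported on the slide for hours of sleep
s2 <- 2.76
s  <- sqrt(s2)
round(s, 2)                 # 1.66

# Hypothetical data: sd() agrees with sqrt(var())
sleep <- c(5, 6, 7, 7, 8, 9)
sd(sleep)
sqrt(var(sleep))
```

Unlike the variance, the standard deviation is in the same units as the data (hours), which makes it easier to interpret.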

57 Variance and standard deviation Variability in amount of sleep
Sleep: $\bar{x} = 4.6$, $s_x = 1.66$
69 out of 86 students (80%) are within 1 SD of the mean.
80 out of 86 students (93%) are within 2 SDs of the mean.
86 out of 86 students (100%) are within 3 SDs of the mean.
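Counts like the ones above can be obtained by checking, for each observation, whether its distance from the mean is at most k standard deviations. The helper name `within_k_sd` and the data are hypothetical.

```r
# Proportion of observations within k standard deviations of the mean
# (hypothetical helper, not part of base R)
within_k_sd <- function(x, k) {
  mean(abs(x - mean(x)) <= k * sd(x))
}

# Hypothetical hours of sleep (not the class data)
sleep <- c(5, 6, 7, 7, 8, 9)

within_k_sd(sleep, 1)   # 4 of 6 observations
within_k_sd(sleep, 2)   # all observations
within_k_sd(sleep, 3)   # all observations
```

Taking the `mean()` of a logical vector gives the proportion of `TRUE` values; multiply by `length(x)` to recover the count.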

58 Variance and standard deviation Describing distributions
When describing distributions, make sure to talk about the shape, center, spread, and, if any, unusual observations.

59 Variance and standard deviation Notation recap
             mean        variance     SD
sample       $\bar{x}$   $s^2$        $s$
population   $\mu$       $\sigma^2$   $\sigma$
Do you see a trend in what types of letters are used for sample statistics vs. population parameters?

60 Box plots, quartiles, and the median Median
The median is the value that splits the data in half when ordered in ascending order.
0, 1, 2, 3, 4
If there is an even number of observations, then the median is the average of the two values in the middle.
0, 1, 2, 3, 4, [remaining values and result lost in extraction]
Since the median is the midpoint of the data, 50% of the values are below it. Hence, it is also the 50th percentile.
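The two cases described above (odd and even number of observations) can be written out as a small function. `median_by_hand` is a hypothetical name, and the even-length example is an assumed data set, since the slide's own example was partly lost.

```r
# Median computed directly from the definition (hypothetical helper)
median_by_hand <- function(x) {
  x <- sort(x)                        # order the data in ascending order
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]                    # odd n: the single middle value
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2     # even n: average of the two middle values
  }
}

odd_data  <- c(0, 1, 2, 3, 4)
even_data <- c(0, 1, 2, 3, 4, 5)      # assumed example with an even count

median_by_hand(odd_data)    # 2
median_by_hand(even_data)   # 2.5
```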

61 Box plots, quartiles, and the median Q1, Q3, and IQR
The 25th percentile is also called the first quartile, Q1.
The 50th percentile is also called the median.
The 75th percentile is also called the third quartile, Q3.
summary(d$study_hours) [R output with Min., 1st Qu., Median, Mean, 3rd Qu., Max., and NAs; values lost in extraction]
Between Q1 and Q3 is the middle 50% of the data. The range these data span is called the interquartile range, or the IQR.
IQR = Q3 - Q1 = 20 - 10 = 10
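The sketch below reproduces the quartile and IQR calculation on a made-up `study_hours` vector, chosen so that Q1 = 10 and Q3 = 20 under R's default quantile method (type 7). The values, including the single `NA`, are assumptions and not the class data.

```r
# Hypothetical study hours per week, with one missing response
study_hours <- c(2, 6, 10, 12, 15, 18, 20, 25, 40, NA)

summary(study_hours)      # five-number summary, mean, and NA count

# Quartiles; na.rm = TRUE drops the missing response first
q <- quantile(study_hours, probs = c(0.25, 0.5, 0.75), na.rm = TRUE)
q

# IQR is the distance between the third and first quartiles
iqr <- q[["75%"]] - q[["25%"]]
iqr                               # 10
IQR(study_hours, na.rm = TRUE)    # built-in equivalent
```

Other software may use a slightly different quantile definition, so quartiles computed by hand or elsewhere can differ a little from R's.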

62 Box plots, quartiles, and the median Box plot
The box in a box plot represents the middle 50% of the data, and the thick line in the box is the median.
[Box plot: # of study hours / week]

63 Box plots, quartiles, and the median Anatomy of a box plot
[Annotated box plot of # of study hours / week, labeled from top to bottom: suspected outliers, max whisker reach, upper whisker, Q3 (third quartile), median, Q1 (first quartile), lower whisker]

64 Box plots, quartiles, and the median Whiskers and outliers
Whiskers of a box plot can extend up to 1.5 × IQR away from the quartiles.
max upper whisker reach: Q3 + 1.5 × IQR = 20 + 1.5 × 10 = 35
max lower whisker reach: Q1 - 1.5 × IQR = 10 - 1.5 × 10 = -5
An outlier is defined as an observation beyond the maximum reach of the whiskers. It is an observation that appears extreme relative to the rest of the data.
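The whisker rule above translates directly into code. The `study_hours` values are hypothetical, picked so that Q1 = 10 and Q3 = 20 as in the calculation on the slide (assuming R's default quantile method).

```r
# Hypothetical study hours per week (not the class data)
study_hours <- c(2, 6, 10, 12, 15, 18, 20, 25, 40)

q1  <- quantile(study_hours, 0.25, names = FALSE)
q3  <- quantile(study_hours, 0.75, names = FALSE)
iqr <- q3 - q1

upper_reach <- q3 + 1.5 * iqr   # 35
lower_reach <- q1 - 1.5 * iqr   # -5

# Observations beyond the maximum whisker reach are flagged as outliers
outliers <- study_hours[study_hours > upper_reach | study_hours < lower_reach]
outliers                        # 40
```

A negative lower reach simply means no observation can be flagged on the low side when the variable cannot go below zero.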

65 Box plots, quartiles, and the median Outliers (cont.)
Why is it important to look for outliers?

66 Box plots, quartiles, and the median Household income
Clicker question: Which of the below is the most reasonable estimate for the median household income? (n = 48)
[Plot: household income ($ thousands)]
(a) $50K
(b) $150K
(c) $300K
(d) $400K
(e) $500K

67 Robust statistics Extreme observations
How would sample statistics such as the mean, median, SD, and IQR of household income be affected if the largest value were replaced with $10 million? What if the smallest value were replaced with $10 million?
[Plot: household income ($ thousands)]

68 Robust statistics Robust statistics
[Plot: household income ($ thousands)]
                               robust           not robust
scenario                       median   IQR     $\bar{x}$   $s$
original data                  165K     150K    211K        180K
move largest to $10 million    165K     150K    398K        1,422K
move smallest to $10 million   190K     163K    4,186K      1,424K
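The first two rows of the table above can be imitated on a small scale. The seven incomes below are invented (in $ thousands), so the numbers will not match the table; the point is only which statistics move when the largest value is replaced with $10 million.

```r
# Hypothetical household incomes in $ thousands (not the class data)
income <- c(40, 80, 120, 165, 200, 300, 572)

# Center and spread, robust and non-robust versions (hypothetical helper)
four_stats <- function(x) {
  c(median = median(x), IQR = IQR(x), mean = mean(x), sd = sd(x))
}

original <- four_stats(income)

# Replace the largest value with $10 million (10,000 thousand)
moved <- income
moved[which.max(moved)] <- 10000
after_move <- four_stats(moved)

rbind(original, after_move)   # median and IQR unchanged; mean and sd jump
```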

69 Robust statistics Robust statistics
Median and IQR are more robust to skewness and outliers than mean and SD. Therefore,
for skewed distributions it is more appropriate to use the median and IQR to describe the center and spread;
for symmetric distributions it is more appropriate to use the mean and SD to describe the center and spread.
If you would like to estimate the typical household income for a Duke student, would you be more interested in the mean or median income?

70 Robust statistics Recap: finding the mean and the median
Below are the final exam scores of 5 Stats 101 students: 79, 82, 94, 83, 92
Median: Put the values in increasing order: 79, 82, 83, 92, 94
Median is the value in the middle: 83
Mean is the arithmetic average: $\frac{79 + 82 + 83 + 92 + 94}{5} = 86$

71-73 Robust statistics Recap: finding the mean and the median (cont.)
Let's add an extremely low score to the data set: 3, 79, 82, 83, 92, 94
Clicker question: What is the new median?
(a) 72.2
(b) 82
(c) 82.5
(d) 83
What is the new mean?
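The effect of the added low score can be checked directly in R using the exam scores from the recap slides. Only the variable names are invented here.

```r
# Final exam scores from the recap slide
scores <- c(79, 82, 94, 83, 92)

median(scores)   # 83
mean(scores)     # 86

# Add the extremely low score
with_low <- c(3, scores)

median(with_low)           # 82.5, average of the two middle values 82 and 83
round(mean(with_low), 1)   # 72.2
```

The median moves by half a point while the mean drops by almost 14 points, which is what "robust" means in practice.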

74 Robust statistics Mean vs. median
If the distribution is symmetric, the center is best described by the mean. Symmetric: mean ≈ median.
If the distribution is skewed or has outliers, the center is best described by the median. Right-skewed: mean > median. Left-skewed: mean < median.
[Figure: right skewed, left skewed, and symmetric distributions, each with the mean and median marked]

75 Robust statistics
Clicker question: Which is true for the distribution of percentage of time actually spent taking notes in class versus on Facebook, Twitter, etc.?
[Histogram: frequency vs. % of time spent taking notes in class]
(a) mean is larger than median
(b) mean is smaller than median
(c) mean is roughly equal to the median
(d) impossible to tell

76 Robust statistics
Clicker question: Which is true for the distribution of number of schools accepted?
[Histogram: frequency vs. # of schools accepted]
(a) mean is much larger than median
(b) mean is much smaller than median
(c) mean is roughly equal to the median
(d) impossible to tell

77 Transforming data Duke basketball games attended
[Histogram: frequency vs. # of Duke basketball games attended]

78 Transforming data Extremely skewed data
When data are extremely skewed, transforming them might make modeling easier. A common transformation is the log transformation.
[Two histograms: frequency vs. # of Duke basketball games attended, and frequency vs. log(# of Duke basketball games attended)]
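A minimal sketch of the log transformation on made-up counts of games attended. The values are hypothetical and all positive, since `log(0)` is `-Inf`; how zeros were handled in the class data is not stated in the slides. The gap between mean and median, scaled by the SD, is used here as a rough and informal indicator of skew.

```r
# Hypothetical number of games attended (all positive; log(0) would be -Inf)
games <- c(1, 1, 2, 2, 3, 5, 8, 20, 60)

log_games <- log(games)     # natural log transformation

# Rough skew indicator: (mean - median) in units of SD (informal helper)
skew_gap <- function(x) (mean(x) - median(x)) / sd(x)

skew_gap(games)       # large: the long right tail pulls the mean up
skew_gap(log_games)   # smaller: the tail is compressed on the log scale

# exp() undoes the transformation when results need the original units
exp(log_games)
```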

79 Transforming data Pros and cons of transformations
Transformed data are easier to work with when applying statistical models because they are much less skewed and the outliers are much less extreme.
[Two plots: # of games and log(# of games)]
However, results of an analysis might be difficult to interpret because the log of a measured variable is usually meaningless. What does the log of the number of games mean, after all?
What other variables would you expect to be extremely skewed?
Note: We'll talk more about transformations when we get to inference.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set. Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean the sum of all data values divided by the number of values in

More information

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency Math 1 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency lowest value + highest value midrange The word average: is very ambiguous and can actually refer to the mean,

More information

AP Statistics Summer Assignment:

AP Statistics Summer Assignment: AP Statistics Summer Assignment: Read the following and use the information to help answer your summer assignment questions. You will be responsible for knowing all of the information contained in this

More information

Measures of Central Tendency

Measures of Central Tendency Page of 6 Measures of Central Tendency A measure of central tendency is a value used to represent the typical or average value in a data set. The Mean The sum of all data values divided by the number of

More information

UNIT 1A EXPLORING UNIVARIATE DATA

UNIT 1A EXPLORING UNIVARIATE DATA A.P. STATISTICS E. Villarreal Lincoln HS Math Department UNIT 1A EXPLORING UNIVARIATE DATA LESSON 1: TYPES OF DATA Here is a list of important terms that we must understand as we begin our study of statistics

More information

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative

More information

CHAPTER 2 DESCRIPTIVE STATISTICS

CHAPTER 2 DESCRIPTIVE STATISTICS CHAPTER 2 DESCRIPTIVE STATISTICS 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is how the data is spread or distributed over the range of the data values. This is one of

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

10.4 Measures of Central Tendency and Variation

10.4 Measures of Central Tendency and Variation 10.4 Measures of Central Tendency and Variation Mode-->The number that occurs most frequently; there can be more than one mode ; if each number appears equally often, then there is no mode at all. (mode

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or email me, I will answer promptly.

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or email me, I will answer promptly. Statistical Methods Instructor: Lingsong Zhang 1 Issues before Class Statistical Methods Lingsong Zhang Office: Math 544 Email: lingsong@purdue.edu Phone: 765-494-7913 Office Hour: Monday 1:00 pm - 2:00

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

How individual data points are positioned within a data set.

How individual data points are positioned within a data set. Section 3.4 Measures of Position Percentiles How individual data points are positioned within a data set. P k is the value such that k% of a data set is less than or equal to P k. For example if we said

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data MATH& 146 Lesson 10 Section 1.6 Graphing Numerical Data 1 Graphs of Numerical Data One major reason for constructing a graph of numerical data is to display its distribution, or the pattern of variability

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

Chapter 1. Looking at Data-Distribution

Chapter 1. Looking at Data-Distribution Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw

Univariate Statistics Summary

Univariate Statistics Summary Further Maths Univariate Statistics Summary Types of Data Data can be classified as categorical or numerical. Categorical data are observations or records that are arranged according to category. For example:

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable. 5-number summary 68-95-99.7 Rule Area principle Bar chart Bimodal Boxplot Case Categorical data Categorical variable Center Changing center and spread Conditional distribution Context Contingency table

Section 6.3: Measures of Position

Section 6.3: Measures of Position Section 6.3: Measures of Position Measures of position are numbers showing the location of data values relative to the other values within a data set. They can be used to compare values from different

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

Understanding and Comparing Distributions. Chapter 4

Understanding and Comparing Distributions. Chapter 4 Understanding and Comparing Distributions Chapter 4 Objectives: Boxplot Calculate Outliers Comparing Distributions Timeplot The Big Picture We can answer much more interesting questions about variables

Measures of Dispersion

Measures of Dispersion Measures of Dispersion 6-3 I Will... Find measures of dispersion of sets of data. Find standard deviation and analyze normal distribution. Day 1: Dispersion Vocabulary Measures of Variation (Dispersion

STA Module 4 The Normal Distribution

STA Module 4 The Normal Distribution STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

CHAPTER 3: Data Description

CHAPTER 3: Data Description CHAPTER 3: Data Description You ve tabulated and made pretty pictures. Now what numbers do you use to summarize your data? Ch3: Data Description Santorico Page 68 You ll find a link on our website to a

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis.

Density Curve (p52) Density curve is a curve that - is always on or above the horizontal axis. 1.3 Density curves p50 Some times the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve. It is easier to work with a smooth curve, because the histogram

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these

AND NUMERICAL SUMMARIES. Chapter 2

AND NUMERICAL SUMMARIES. Chapter 2 EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 What Are the Types of Data? 2.1 Objectives www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES Chapter covered organizing data into tables, and summarizing data with graphical displays. We will now use

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016) CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1 Daphne Skipper, Augusta University (2016) 1. Stem-and-Leaf Graphs, Line Graphs, and Bar Graphs The distribution of data is

MATH NATION SECTION 9 H.M.H. RESOURCES

MATH NATION SECTION 9 H.M.H. RESOURCES MATH NATION SECTION 9 H.M.H. RESOURCES SPECIAL NOTE: These resources were assembled to assist in student readiness for their upcoming Algebra 1 EOC. Although these resources have been compiled for your

No. of blue jelly beans No. of bags

No. of blue jelly beans No. of bags Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in

Chapter 3 Analyzing Normal Quantitative Data

Chapter 3 Analyzing Normal Quantitative Data Chapter 3 Analyzing Normal Quantitative Data Introduction: In chapters 1 and 2, we focused on analyzing categorical data and exploring relationships between categorical data sets. We will now be doing

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution

Name: Date: Period: Chapter 2. Section 1: Describing Location in a Distribution Name: Date: Period: Chapter 2 Section 1: Describing Location in a Distribution Suppose you earned an 86 on a statistics quiz. The question is: should you be satisfied with this score? What if it is the

appstats6.notebook September 27, 2016

appstats6.notebook September 27, 2016 Chapter 6 The Standard Deviation as a Ruler and the Normal Model Objectives: 1.Students will calculate and interpret z scores. 2.Students will compare/contrast values from different distributions using

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation Objectives: 1. Learn the meaning of descriptive versus inferential statistics 2. Identify bar graphs,

1.3 Graphical Summaries of Data

1.3 Graphical Summaries of Data Arkansas Tech University MATH 3513: Applied Statistics I Dr. Marcel B. Finan 1.3 Graphical Summaries of Data In the previous section we discussed numerical summaries of either a sample or a data. In this

To calculate the arithmetic mean, sum all the values and divide by n (equivalently, multiply by 1/n): 1 n. = 29 years.

To calculate the arithmetic mean, sum all the values and divide by n (equivalently, multiply by 1/n): 1 n. = 29 years. 3: Summary Statistics Notation Consider these 10 ages (in years): 1 4 5 11 30 50 8 7 4 5 The symbol n represents the sample size (n = 10). The capital letter X denotes the variable. x i represents the

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Library, Teaching & Learning 014 Summary of Basic data Analysis DATA Qualitative Quantitative Counted Measured Discrete Continuous 3 Main Measures of Interest Central Tendency Dispersion

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies. Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha

Data Preprocessing. S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha Data Preprocessing S1 Teknik Informatika Fakultas Teknologi Informasi Universitas Kristen Maranatha 1 Why Data Preprocessing? Data in the real world is dirty incomplete: lacking attribute values, lacking

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

Measures of Position

Measures of Position Measures of Position In this section, we will learn to use fractiles. Fractiles are numbers that partition, or divide, an ordered data set into equal parts (each part has the same number of data entries).

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use?

The main issue is that the mean and standard deviations are not accurate and should not be used in the analysis. Then what statistics should we use? Chapter 4 Analyzing Skewed Quantitative Data Introduction: In chapter 3, we focused on analyzing bell shaped (normal) data, but many data sets are not bell shaped. How do we analyze quantitative data when

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

IT 403 Practice Problems (1-2) Answers

IT 403 Practice Problems (1-2) Answers IT 403 Practice Problems (1-2) Answers #1. Using Tukey's Hinges method ('Inclusionary'), what is Q3 for this dataset? 2 3 5 7 11 13 17 a. 7 b. 11 c. 12 d. 15 c (12) #2. How do quartiles and percentiles
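The answer to problem #1 in this excerpt (Q3 = 12) can be checked with a short sketch of Tukey's hinges. This assumes the common "inclusive" convention, in which the median is counted in both halves when the number of values is odd; the function name `tukey_hinges` is hypothetical and not taken from the excerpted answer sheet.

```python
from statistics import median

def tukey_hinges(values):
    """Return (lower hinge, upper hinge) using the inclusive convention."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2       # size of each half; the median is shared when n is odd
    lower = data[:half]       # smallest half of the sorted data
    upper = data[n - half:]   # largest half of the sorted data
    return median(lower), median(upper)

q1, q3 = tukey_hinges([2, 3, 5, 7, 11, 13, 17])
print(q1, q3)  # prints "4.0 12.0"
```

The upper half is 7, 11, 13, 17, whose median is 12, matching option c in the excerpt.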

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

Descriptive Statistics

Descriptive Statistics Chapter 2 Descriptive Statistics 2.1 Descriptive Statistics 1 2.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Display data graphically and interpret graphs:

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

Chapter 5: The standard deviation as a ruler and the normal model p131

Chapter 5: The standard deviation as a ruler and the normal model p131 Chapter 5: The standard deviation as a ruler and the normal model p131 Which is the better exam score? 67 on an exam with mean 50 and SD 10 62 on an exam with mean 40 and SD 12? Is it fair to say: 67 is

Chapter 2: The Normal Distributions

Chapter 2: The Normal Distributions Chapter 2: The Normal Distributions Measures of Relative Standing & Density Curves Z-scores (Measures of Relative Standing) Suppose there is one spot left in the University of Michigan class of 2014 and

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc.

Chapter 5. Understanding and Comparing Distributions. Copyright 2012, 2008, 2005 Pearson Education, Inc. Chapter 5 Understanding and Comparing Distributions The Big Picture We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram

Week 4: Describing data and estimation

Week 4: Describing data and estimation Week 4: Describing data and estimation Goals Investigate sampling error; see that larger samples have less sampling error. Visualize confidence intervals. Calculate basic summary statistics using R. Calculate

LESSON 3: CENTRAL TENDENCY

LESSON 3: CENTRAL TENDENCY LESSON 3: CENTRAL TENDENCY Outline Arithmetic mean, median and mode Ungrouped data Grouped data Percentiles, fractiles, and quartiles Ungrouped data Grouped data 1 MEAN Mean is defined as follows: Sum

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 5 Understanding and Comparing Distributions The Big Picture We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly.

Using a percent or a letter grade allows us a very easy way to analyze our performance. Not a big deal, just something we do regularly. GRAPHING We have used statistics all our lives, what we intend to do now is formalize that knowledge. Statistics can best be defined as a collection and analysis of numerical information. Often times we

Section 9: One Variable Statistics

Section 9: One Variable Statistics The following Mathematics Florida Standards will be covered in this section: MAFS.912.S-ID.1.1 MAFS.912.S-ID.1.2 MAFS.912.S-ID.1.3 Represent data with plots on the real number line (dot plots, histograms,

+ Statistical Methods in

+ Statistical Methods in 9/4/013 Statistical Methods in Practice STA/MTH 379 Dr. A. B. W. Manage Associate Professor of Mathematics & Statistics Department of Mathematics & Statistics Sam Houston State University Discovering Statistics

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40 Chpt 3 Data Description 3-2 Measures of Central Tendency 1 /40 Chpt 3 Homework 3-2 Read pages 96-109 p109 Applying the Concepts p110 1, 8, 11, 15, 27, 33 2 /40 Chpt 3 3.2 Objectives l Summarize data using

Parents Names Mom Cell/Work # Dad Cell/Work # Parent List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th

Parents Names Mom Cell/Work # Dad Cell/Work # Parent   List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th Full Name Phone # Parents Names Birthday Mom Cell/Work # Dad Cell/Work # Parent email: Extracurricular Activities: List the Math Courses you have taken and the grade you received 1 st 2 nd 3 rd 4th Turn

Chapter 2: Descriptive Statistics

Chapter 2: Descriptive Statistics Chapter 2: Descriptive Statistics Student Learning Outcomes By the end of this chapter, you should be able to: Display data graphically and interpret graphs: stemplots, histograms and boxplots. Recognize,

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree.

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree. Two-way Frequency Tables two way frequency table- a table that divides responses into categories. Joint relative frequency- the number of times a specific response is given divided by the sample. Marginal

ECLT 5810 Data Preprocessing. Prof. Wai Lam

ECLT 5810 Data Preprocessing. Prof. Wai Lam ECLT 5810 Data Preprocessing Prof. Wai Lam Why Data Preprocessing? Data in the real world is imperfect incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate

Probability and Statistics. Copyright Cengage Learning. All rights reserved.

Probability and Statistics. Copyright Cengage Learning. All rights reserved. Probability and Statistics Copyright Cengage Learning. All rights reserved. 14.5 Descriptive Statistics (Numerical) Copyright Cengage Learning. All rights reserved. Objectives Measures of Central Tendency:

1.2. Pictorial and Tabular Methods in Descriptive Statistics

1.2. Pictorial and Tabular Methods in Descriptive Statistics 1.2. Pictorial and Tabular Methods in Descriptive Statistics Section Objectives. 1. Stem-and-Leaf displays. 2. Dotplots. 3. Histogram. Types of histogram shapes. Common notation. Sample size n : the number

Basic Statistical Terms and Definitions

Basic Statistical Terms and Definitions I. Basics Basic Statistical Terms and Definitions Statistics is a collection of methods for planning experiments, and obtaining data. The data is then organized and summarized so that professionals can

Exploratory Data Analysis

Exploratory Data Analysis Chapter 10 Exploratory Data Analysis Definition of Exploratory Data Analysis (page 410) Definition 12.1. Exploratory data analysis (EDA) is a subfield of applied statistics that is concerned with the investigation

Chapter 11. Worked-Out Solutions Explorations (p. 585) Chapter 11 Maintaining Mathematical Proficiency (p. 583)

Chapter 11. Worked-Out Solutions Explorations (p. 585) Chapter 11 Maintaining Mathematical Proficiency (p. 583) Maintaining Mathematical Proficiency (p. 3) 1. After School Activities. Pets Frequency 1 1 3 7 Number of activities 3. Students Favorite Subjects Math English Science History Frequency 1 1 1 3 Number of

L E A R N I N G O B JE C T I V E S

L E A R N I N G O B JE C T I V E S 2.2 Measures of Central Location L E A R N I N G O B JE C T I V E S 1. To learn the concept of the center of a data set. 2. To learn the meaning of each of three measures of the center of a data set the

2.1: Frequency Distributions and Their Graphs

2.1: Frequency Distributions and Their Graphs 2.1: Frequency Distributions and Their Graphs Frequency Distribution - way to display data that has many entries - table that shows classes or intervals of data entries and the number of entries in each

15 Wyner Statistics Fall 2013

15 Wyner Statistics Fall 2013 15 Wyner Statistics Fall 2013 CHAPTER THREE: CENTRAL TENDENCY AND VARIATION Summary, Terms, and Objectives The two most important aspects of a numerical data set are its central tendencies and its variation.

DAY 52 BOX-AND-WHISKER

DAY 52 BOX-AND-WHISKER DAY 52 BOX-AND-WHISKER VOCABULARY The Median is the middle number of a set of data when the numbers are arranged in numerical order. The Range of a set of data is the difference between the highest and

Sections 2.3 and 2.4

Sections 2.3 and 2.4 Sections 2.3 and 2.4 Shiwen Shen Department of Statistics University of South Carolina Elementary Statistics for the Biological and Life Sciences (STAT 205) 2 / 25 Descriptive statistics For continuous

WELCOME! Lecture 3 Thommy Perlinger

WELCOME! Lecture 3 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 3 Thommy Perlinger Program Lecture 3 Cleaning and transforming data Graphical examination of the data Missing Values Graphical examination of the data It is important

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution

Stat 528 (Autumn 2008) Density Curves and the Normal Distribution. Measures of center and spread. Features of the normal distribution Stat 528 (Autumn 2008) Density Curves and the Normal Distribution Reading: Section 1.3 Density curves An example: GRE scores Measures of center and spread The normal distribution Features of the normal

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques SEVENTH EDITION and EXPANDED SEVENTH EDITION Slide - Chapter Statistics. Sampling Techniques Statistics Statistics is the art and science of gathering, analyzing, and making inferences from numerical information

Learner Expectations UNIT 1: GRAPHICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

Learner Expectations UNIT 1: GRAPHICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display CURRICULUM MAP TEMPLATE Priority Standards = Approximately 70% Supporting Standards = Approximately 20% Additional Standards = Approximately 10% HONORS PROBABILITY AND STATISTICS Essential Questions &

Name: Stat 300: Intro to Probability & Statistics Textbook: Introduction to Statistical Investigations

Name: Stat 300: Intro to Probability & Statistics Textbook: Introduction to Statistical Investigations Stat 300: Intro to Probability & Statistics Textbook: Introduction to Statistical Investigations Name: Chapter P: Preliminaries Section P.2: Exploring Data Example 1: Think About It! What will it look

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student

The basic arrangement of numeric data is called an ARRAY. Array is the derived data from fundamental data Example :- To store marks of 50 student Organizing data Learning Outcome 1. make an array 2. divide the array into class intervals 3. describe the characteristics of a table 4. construct a frequency distribution table 5. constructing a composite

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA This lab will assist you in learning how to summarize and display categorical and quantitative data in StatCrunch. In particular, you will learn how to

AP Statistics Prerequisite Packet

AP Statistics Prerequisite Packet Types of Data Quantitative (or measurement) Data These are data that take on numerical values that actually represent a measurement such as size, weight, how many, how long, score on a test, etc. For these

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd

Chapter 3: Data Description - Part 3. Homework: Exercises 1-21 odd, odd, odd, 107, 109, 118, 119, 120, odd Chapter 3: Data Description - Part 3 Read: Sections 1 through 5 pp 92-149 Work the following text examples: Section 3.2, 3-1 through 3-17 Section 3.3, 3-22 through 3.28, 3-42 through 3.82 Section 3.4,

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

MATH& 146 Lesson 8. Section 1.6 Averages and Variation

MATH& 146 Lesson 8. Section 1.6 Averages and Variation MATH& 146 Lesson 8 Section 1.6 Averages and Variation 1 Summarizing Data The distribution of a variable is the overall pattern of how often the possible values occur. For numerical variables, three summary
