Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Similar documents
Univariate Statistics Summary

VCEasy VISUAL FURTHER MATHS. Overview

STA 570 Spring Lecture 5 Tuesday, Feb 1

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree.

Measures of Central Tendency. A measure of central tendency is a value used to represent the typical or average value in a data set.

Math 120 Introduction to Statistics Mr. Toner s Lecture Notes 3.1 Measures of Central Tendency

Measures of Central Tendency

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

1.3 Graphical Summaries of Data

2.1: Frequency Distributions and Their Graphs

SLStats.notebook. January 12, Statistics:

Frequency Distributions

Table of Contents (As covered from textbook)

CHAPTER 2 DESCRIPTIVE STATISTICS

Learning Log Title: CHAPTER 7: PROPORTIONS AND PERCENTS. Date: Lesson: Chapter 7: Proportions and Percents

10.4 Measures of Central Tendency and Variation

10.4 Measures of Central Tendency and Variation

Averages and Variation

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.

Chapter 2 Describing, Exploring, and Comparing Data

UNIT 1A EXPLORING UNIVARIATE DATA

3. Data Analysis and Statistics

Year 10 General Mathematics Unit 2

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

AND NUMERICAL SUMMARIES. Chapter 2

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Chapter 3 Analyzing Normal Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data

YEAR 12 Trial Exam Paper FURTHER MATHEMATICS. Written examination 1. Worked solutions

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

CHAPTER 3: Data Description

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

No. of blue jelly beans No. of bags

/4 Directions: Graph the functions, then answer the following question.

a. divided by the. 1) Always round!! a) Even if class width comes out to a, go up one.

Slide Copyright 2005 Pearson Education, Inc. SEVENTH EDITION and EXPANDED SEVENTH EDITION. Chapter 13. Statistics Sampling Techniques

Chapter 1. Looking at Data-Distribution

Algebra 1, 4th 4.5 weeks

Overview. Frequency Distributions. Chapter 2 Summarizing & Graphing Data. Descriptive Statistics. Inferential Statistics. Frequency Distribution

Chapter 6: DESCRIPTIVE STATISTICS

Basic Statistical Terms and Definitions

Maths Revision Worksheet: Algebra I Week 1 Revision 5 Problems per night

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: DESCRIPTIVE STATISTICS Lecture Notes for Introductory Statistics 1. Daphne Skipper, Augusta University (2016)

8: Statistics. Populations and Samples. Histograms and Frequency Polygons. Page 1 of 10

Chapter 2: Descriptive Statistics

Things to Know for the Algebra I Regents

DAY 52 BOX-AND-WHISKER

Name Date Types of Graphs and Creating Graphs Notes

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Lecture Slides. Elementary Statistics Twelfth Edition. by Mario F. Triola. and the Triola Statistics Series. Section 2.1- #

Statistics: Normal Distribution, Sampling, Function Fitting & Regression Analysis (Grade 12) *

15 Wyner Statistics Fall 2013

MATH NATION SECTION 9 H.M.H. RESOURCES

This chapter will show how to organize data and then construct appropriate graphs to represent the data in a concise, easy-to-understand form.

MATH& 146 Lesson 10. Section 1.6 Graphing Numerical Data

Chapter 11. Worked-Out Solutions Explorations (p. 585) Chapter 11 Maintaining Mathematical Proficiency (p. 583)

Ms Nurazrin Jupri. Frequency Distributions

Chpt 3. Data Description. 3-2 Measures of Central Tendency /40

Section 1.2. Displaying Quantitative Data with Graphs. Mrs. Daniel AP Stats 8/22/2013. Dotplots. How to Make a Dotplot. Mrs. Daniel AP Statistics

Descriptive Statistics

STANDARDS OF LEARNING CONTENT REVIEW NOTES. ALGEBRA I Part II. 3 rd Nine Weeks,

Statistics can best be defined as a collection and analysis of numerical information.

Bar Graphs and Dot Plots

Descriptive Statistics

STP 226 ELEMENTARY STATISTICS NOTES PART 2 - DESCRIPTIVE STATISTICS CHAPTER 3 DESCRIPTIVE MEASURES

Lecture Notes 3: Data summarization

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram

Middle School Math Course 3

Chapter 2 - Graphical Summaries of Data

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

Mean,Median, Mode Teacher Twins 2015

TMTH 3360 NOTES ON COMMON GRAPHS AND CHARTS

Section 9: One Variable Statistics

Chapter 2: The Normal Distribution

STANDARDS OF LEARNING CONTENT REVIEW NOTES ALGEBRA I. 4 th Nine Weeks,

Regression III: Advanced Methods

AP Statistics Summer Assignment:

WELCOME! Lecture 3 Thommy Perlinger

Lecture 3 Questions that we should be able to answer by the end of this lecture:

MAT 142 College Mathematics. Module ST. Statistics. Terri Miller revised July 14, 2015

Chapter 2 Modeling Distributions of Data

MAT 110 WORKSHOP. Updated Fall 2018

Chapter2 Description of samples and populations. 2.1 Introduction.

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Chapter 2: Understanding Data Distributions with Tables and Graphs

CCGPS UNIT 4 Semester 2 COORDINATE ALGEBRA Page 1 of 34. Describing Data

Multiple Regression White paper

WHOLE NUMBER AND DECIMAL OPERATIONS

NOTES TO CONSIDER BEFORE ATTEMPTING EX 1A TYPES OF DATA

Section 2-2 Frequency Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc

Statistical Methods. Instructor: Lingsong Zhang. Any questions, ask me during the office hour, or me, I will answer promptly.

ALGEBRA II A CURRICULUM OUTLINE

National 5 Mathematics Applications Notes with Examples

Box Plots. OpenStax College

Chapter 4: Analyzing Bivariate Data with Fathom

Transcription:

Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables Watch for a+bx and ax+b forms Remember to seasonalise data after predictions Always write Always check rounding requirements Never delete an answer Least squares gradient has S y on top Further Mathematics Units 3 and 4 Revision

Types of Data The main techniques used to organise data are: Frequency tables Dot plots Line Plots Stem-and-leaf plots Data can be.grouped before it is shown in any of the techniques above. There needs to be more than five and no more than fifteen groups. When displaying data, remember: Title Label axes Label important points Use the key. Always.use.a.ruler Core - Univariate Data Page 1

Core - Univariate Data Page 2 Skew Skew determines where.data is most spread out. Most distributions are skewed. Skewed data is usually a result of a barrier preventing data from going past a limit. Negatively skewed data has a 'tail' pointing to the left This means that: Most of the data is concentrated around the higher values of x Mean, median and mode will not be the same Mean, median and mode are located in the body of the curve (the right) Positively skewed data has a 'tail' pointing to the right This means that: Most of the data is concentrated around the lower values of x Mean, median and mode will not be the same Mean, median and mode are located in the body of the curve (the left) For skewed data, the median and IQR is usually used. Perfectly symmetrical data has an equal mean, median and mode. Frequency histograms and polygons show a bell-shaped curve. Bimodal data shows two equal modes. This usually indicates that there are two groups of data that need to be separated (eg. height of boys and girls)

Analysing data The following summary statistics are usually used together: Mean and standard deviation Median and IQR Mode and range Statistic Term number Q1 Medianth term. Q3 Mode The mode is the most commonly occurring value. There can be two or more modes. The mode is not usually an accurate measure of centre, and is only used when there is a high number of scores. Median The median is the middle value. 50% of data lies either side of the median. If there are two middle values, it is their average. To estimate the median of grouped data, use a cumulative frequency table. The median is not affected by extreme values. The median is usually used for skewed distributions. It may not be equal to one of the values in the data set. Mean The mean is the average value. Range The range is the difference between the smallest value and the largest. It is influenced by outliers. For grouped data, the range is the difference between the highest and lowest groups with values. Interquartile range The IQR is the range between which.the.middle 50% of values lie. It is not affected by outliers. It is best obtained using a calculator. IQR = Q3 - Q1 For ungrouped data: Outlier Lower limit = Q1 - (1.5 x IQR) Upper limit = Q3 + (1.5 x IQR) For grouped data: The mean is not necessarily equal to a value in the data set. The mean takes all data into account, and is therefore affected by extreme values. Core - Univariate Data Page 3

Core - Univariate Data Page 4 Standard Deviation and Variance The variance is the square of the standard deviation. Both values show the distribution of data around the mean. Both standard deviation and variance are always positive numbers. Both standard deviation and variance are affected by outliers. To estimate the standard deviation of a dataset, the following formula is used: This formula accounts for 95% of data being within two standard deviations of the mean. The 68-95-99.7 Rule 68% of values lie within one standard deviation of the mean. 95% of values lie within two standard deviations of the mean. 99.7% of values lie within three standard deviations of the mean. Z-Scores Z-scores are used to compare values between different sets of data. For example, the height of a girl can be compared with the height of a boy by comparing them with their gender and age. The formula for z-scores is: The following formulas can also be used to determine information: The z-scores then function like normal data distributions with a mean of zero and a standard deviation of one. The 68-95-99.7 rule can then be applied to the standardised data.

Types of Bivariate Data Bivariate data sets can be of three types: Categorical - Categorical Numeric - Categorical Numeric - Numeric or explanatory variable or response variable For any bivariate data set, one variable is usually.dependent and the other independent. The dependent variable responds to change in the independent variable. The independent variable explains the change in the dependent variable. The objective of bivariate analysis is to determine whether a relationship exists and if so, its: Strength r values - perfect, strong, moderate, weak or none r² values Form Linear or non-linear Direction Positive or negative y Dependent variable The dependent variable is always plotted on the y-axis, and the independent on the x-axis. Independent variable Core - Bivariate Data Page 5 x

Core - Bivariate Data Page 6 Correlation This diagram shows values of Pearson's Product-Moment Correlation Coefficient and what strength relationship they represent. Positive values always represent relationships with a positive gradient, and negative values always represent relationships with a negative gradient. Note that: It is designed for linear data only It should be used with caution if outliers are present The Coefficient of Determination is equal to r². It describes the influence the independent variable has on the dependent variable. It is usually expressed as a percentage. The standard analysis is: The coefficient of determination calculated to be [r²] shows that [r²%] of the variation in [dependent variable] can be explained by variation in [independent variable]. The other [(100-r²)]% of variation in [dependent variable] can be explained by other factors or influences.

Core - Bivariate Data Page 7 Regressions Line of best fit To calculate a line of best fit, first draw a line through the data so that approximately half the data points are on each side. Then, find two points and use the following equations to calculate the equation of the line: When calculating the equation of the line of best fit, it is sometimes possible to estimate the y -intercept, however before doing so ensure that both axes start at zero. In this graph, it is easy to estimate the y-intercept to be 60, however the axes are broken so that would be incorrect. 1. 2. 3. 4. Three Median Regression The three-median regression line is formed as follows: Plot the data on a scatterplot Divide the points symmetrically into three groups Find the median point of each group by finding the median of the x and y values Use the slope of the outside points to draw a line that is one third of the way towards the middle point. To calculate the equation of the three-median regression line, use the following formulae: Least Squares Regression The least squares regression line is formed using the following formulae: The Three-Median Regression line is usually used for data with outliers, and the Least Squares Regression line is usually used for data without outliers.

Residuals The residual for a data point is its vertical distance from a regression line. It is calculated using the following formula: actual y value - predicted y value A residual plot (shown below) shows important information about a relationship. A residual plot with a valley (left) or a crest (right) indicates that data is non-linear and a transformation should be applied to the data. A residual plot with a random pattern (centre) indicates that data is linear. WARNING: Residual plots are never 'scattered', they are always random. Transformations The following tables shows which transformations can be used on certain scatterplot shapes, and the effect of certain transformations. When a transformation is needed, use the values of r and r² to help determine which is best. Core - Bivariate Data Page 8

Core - Bivariate Data Page 9 Summarising Time Series Data Time series data is simply data with a timeframe as the independent variable. There are six ways in which it can be described: 1. Secular Trend Data displays a trend (or secular trend) when a consistent increase or decrease can be seen in the data over a significant period of time. A trend line can be fitted to such data. 2. Seasonality Seasonal data changes or fluctuates at given intervals, with given intensities. For example, sales of warm drinks might fluctuate in winter. This data can be deseasonalised. 3. Cyclic Cyclic data shows fluctuations, but not at consistent intervals, amplitudes or seasons. This includes data such as stock prices. 4. Random Random data shows no pattern. All fluctuations occur by chance and cannot be predicted.

Core - Bivariate Data Page 10 Smoothing Time Series Data Time series data can be smoothed in two ways - moving average smoothing and median smoothing. Moving Average Smoothing Moving average smoothing is simple. Each smoothed data point is equal to the average of a specified number of points surrounding it. For example, if a five point moving average is chosen, then each consecutive group of five points is averaged to give the points in the smoothed data set. Median Smoothing Median smoothing is very similar to moving average smoothing, however the median of the points is taken instead of the average. This method is generally preferred to moving average smoothing when outliers are present. The number of points to smooth to is generally equal to the number of data points in a time cycle. For example, five points would be used for data from each working week, and twelve points would be used for monthly data. Deseasonalising Data A procedure exists for deseasonalising data. It involves the following formulae: Seasonal indices can also be used to comment on the relationship between seasons, for example: A seasonal index of 1.3 means that the season is 30% above the average of the seasons A seasonal index of 1.0 means that the season is equal to the average of the seasons A seasonal index of 0.8 means that the season is 20% below the average of the seasons The sum of the seasonal indices is always equal to the number of seasons.