
Lecture 1

Q: Which month has the lowest sales?
Q: There are three consecutive months in which sales grew. What are they?
Q: Which month experienced the biggest drop in sales?

Q: Just above November there is a light blue bar. What is its (approximate) height?
Q: What is the interpretation of this number? (Answer in one short sentence at most.)
Q: Which month has historically had the highest rainfall?
Q: Which month experienced the largest drop in precipitation compared to the historical values?

(Chart labels: China 789, Saudi Arabia 152.)
Q: Which country has the highest number? Answer:
Q: Which country has the second highest number? Answer:
Q: Which countries have the numbers 35 and 31? Answer:

Q: What are these numbers? What do 789, 152, 35 and 31 represent?

Q: What is the slope of the line?
Q: What is the y-intercept?
Q: What is the approximate weight of a person who spends 50 minutes in the gym (weekly)?
Q: What about the weight of a person who spends zero minutes in the gym?
Q: If a person decides to spend an additional 20 minutes in the gym, what will happen to his weight? The person would lose about ___ pounds.
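
A minimal Python sketch of how such answers are read off a fitted line. The intercept (180 lb) and slope (-0.25 lb per weekly gym minute) are made-up values for illustration only; the slide's actual chart values are not given in the text.

```python
# Hypothetical line: the slide's real slope and intercept are not given in the text.
INTERCEPT = 180.0  # predicted weight (lb) at 0 gym minutes per week (hypothetical)
SLOPE = -0.25      # change in weight (lb) per extra weekly gym minute (hypothetical)

def predicted_weight(minutes: float) -> float:
    """Weight read off the fitted line y = intercept + slope * x."""
    return INTERCEPT + SLOPE * minutes

print(predicted_weight(50))  # 167.5  (50 minutes per week)
print(predicted_weight(0))   # 180.0  (zero minutes: the prediction equals the intercept)

# Adding 20 minutes changes the prediction by 20 * slope, whatever the starting point.
change = predicted_weight(70) - predicted_weight(50)
print(change)                # -5.0
```

With a negative slope, the change is a loss; its size is simply 20 times the slope.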

Types of Data. In statistics we distinguish data by their type: categorical or numerical. Numerical data are often further divided into discrete data (integers such as 1, 5, 14, ...) and continuous data (decimals such as 2.33 or 4.35). An example of discrete data would be the number of credits a student has taken; an example of continuous data would be the student's GPA. Categorical data are non-numerical. For example, imagine that someone wants to make a statistical analysis of students' names. The list of all these names would be categorical data. Typically, one would transform these names into an appropriate list of numbers, for example: there are 45 Roberts, 22 Tonys, etc.
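
A short Python sketch of turning categorical data into numbers by counting each category, as described above. The names, credit counts and GPAs are hypothetical examples, not data from the course.

```python
from collections import Counter

# Hypothetical list of first names: categorical (non-numerical) data.
names = ["Robert", "Tony", "Robert", "Maria", "Tony", "Robert"]

# Transform the categories into numbers: how many students have each name.
name_counts = Counter(names)
print(name_counts.most_common())  # [('Robert', 3), ('Tony', 2), ('Maria', 1)]

# Numerical data (hypothetical values):
credits_taken = [12, 15, 9]   # whole numbers -> discrete
gpas = [3.25, 2.8, 3.9]       # decimals -> continuous
```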

The charts here, related to Sales, Precipitation and International students, are based on some data. Answer the following questions:
Q1. Which of the three data files was numerical of the continuous type (i.e., decimals)?
Q2. One of the charts is based on categorical data. Which one?
Q3. What type of data was used in making the Precipitation chart in Task 2?

Lecture 2 (uploading data) Review Lab 1
Motivation: We cannot do statistics without data. Real data are often not given in Excel format but rather in plain text format (i.e., files with the extension .txt). These are easy to save and open using Notepad or WordPad.
Opening with Excel: Open Excel, click the icon in the top-left corner and choose Open.
Trick: Before browsing, at the bottom of the window, under Files of type, click on the circled region; a list will open, and you need to select text files. (Otherwise your file will not be visible.)
Browse through the computer until you find the folder where you saved the data file Height and Weight.txt, and click on your file.

Lecture 2 (uploading data) Preview for Lab 1
The formatting window will pop up. Under Choose the file type, click on Delimited; do not choose Fixed. Click Next, then Next again, and then Finish.
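
For comparison, a minimal Python sketch of what "Delimited" means: each line is split at a delimiter character instead of at fixed character positions. The file contents below are a hypothetical stand-in for Height and Weight.txt (tab-separated with a header row is an assumption); `csv.Sniffer` is used to guess the delimiter from a small set of candidates.

```python
import csv
import io

# Hypothetical stand-in for "Height and Weight.txt": the real file's columns
# and delimiter are not given in the text (tab-separated is assumed here).
sample_text = "Height\tWeight\n65.8\t112.9\n71.5\t136.5\n69.4\t153.0\n"

# Guess the delimiter, restricting the candidates to tab, comma and semicolon.
dialect = csv.Sniffer().sniff(sample_text, delimiters="\t,;")
rows = list(csv.reader(io.StringIO(sample_text), dialect))

header, data = rows[0], rows[1:]
heights = [float(r[0]) for r in data]
weights = [float(r[1]) for r in data]
print(header)   # ['Height', 'Weight']
print(heights)  # [65.8, 71.5, 69.4]
```

If the guess is wrong, passing `delimiter="\t"` (or another character) to `csv.reader` directly plays the same role as trying different delimiter options in Excel.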

Large data files: splitting the screen
a) UPLOADING: Use the Gender data. Save the data as a .txt file at any location on your computer. Next, open the data via Excel by following the steps from the previous slide.
b) SPLITTING: Click on any cell in the far left column (suggestion: somewhere halfway down, say row 10, column A). Next, on the top menu, click on View and then on Split (see chart).
c) SCROLLING: This operation splits the screen, and on your right you will find two scroll bars (see figure) for moving up and down the data list. Scroll the bottom pane to the bottom of the data file.

Uploading a very large data file. Use the data US_CRime. The file contains data on US crimes by state.
A) Save as text.
B) Open in Excel.
C) Cut and paste the Connecticut data.
The result is a new Excel document containing only the data related to Connecticut (see the table to the right).
The first trick has to do with delimiting. The file is in text format, but one has to experiment a bit with the delimiter settings (do not use Space or Tab). The second trick is splitting the screen. To extract the Connecticut data, one only needs to highlight the relevant portion and cut and paste it into a new document; splitting the screen helps a lot here.
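
The "highlight and cut" step amounts to filtering rows by state. A minimal Python sketch of the same idea follows; the column names, the comma delimiter and all values are hypothetical, since the text does not show the contents of US_CRime.

```python
import csv
import io

# Hypothetical stand-in for a few lines of the US_CRime text file: the real
# column names, delimiter and values are not given in the text.
crime_text = (
    "State,Year,Burglary\n"
    "Connecticut,2001,3000\n"
    "Delaware,2001,1500\n"
    "Connecticut,2002,2900\n"
    "Delaware,2002,1400\n"
)

# Read the delimited text; each row becomes a dict keyed by the header row.
reader = csv.DictReader(io.StringIO(crime_text), delimiter=",")

# Keep only the Connecticut rows (the "highlight and cut" step).
connecticut = [row for row in reader if row["State"] == "Connecticut"]

# Write them to a new in-memory "document" with the same columns.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(connecticut)
print(out.getvalue())
```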

Random Number Generator. Click on the Data tab and then on Data Analysis (top right). If Data Analysis is missing, you need to load the Analysis ToolPak add-in. Next, follow the instructions (see figures) and click the OK button when done. The result should be a column of 62 random numbers. The meaning and significance of the terms Normal distribution and Random Seed will be explained later. The main goal at this moment is to see that we can create, at will, random data with a desired structure and of a desired size.
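
The same idea in a minimal Python sketch: 62 normally distributed random numbers generated from a fixed seed. The mean, standard deviation and seed values are assumptions (the slide's settings are not given), and Python's generator will almost certainly produce different numbers than Excel for the same seed.

```python
import random

# Hypothetical settings: the slide's Mean, Standard deviation and Random Seed
# values are not given in the text.
MEAN, ST_DEV, SEED, COUNT = 0.0, 1.0, 123, 62

rng = random.Random(SEED)  # a fixed seed makes the "random" column reproducible
column = [rng.gauss(MEAN, ST_DEV) for _ in range(COUNT)]

print(len(column))  # 62

# Re-running with the same seed reproduces exactly the same numbers.
rng_again = random.Random(SEED)
print(column == [rng_again.gauss(MEAN, ST_DEV) for _ in range(COUNT)])  # True
```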

Data Collection: There are various ways in which data are collected: non-random (census-type) collection and random sampling.
Census data are typically collected from a site without using any random mechanism of selection. Example: collect all the students enrolled at a university and analyze their data related to the classes they have taken.
Sampling data: randomly pick a subset of the students from a university and then collect the relevant data.
Given the two data files uploaded in this lecture, the Height-Weight-Gender data and the US Crime data, answer the following questions:
Q1. Which, if any, of the two data files was collected using the random sampling method?
Q2. Which, if any, of the two data files was collected using the census method?
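
A minimal Python sketch contrasting the two collection methods, using a hypothetical roster of 20,000 student IDs and a hypothetical sample size of 100.

```python
import random

# Hypothetical university roster: student IDs 1..20000.
students = list(range(1, 20001))

# Census: take every student, with no random selection.
census = students

# Random sampling: pick a subset of students at random, without repetition.
rng = random.Random(2024)  # seed chosen only so the example is reproducible
sample = rng.sample(students, 100)

print(len(census), len(sample))  # 20000 100
```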

Q: What is the slope of the line?
Q: What is the y-intercept?
Q: For a person who takes more math classes, does the equation suggest that his salary will increase or decrease? Is this expected?
Q: Person A has taken X math credits during his college years, while person B has taken 10 more credits than A. What can you conclude regarding their respective salaries? Note: You are not asked what the salary will be; you are asked how much the salary will change if this person has taken 10 more math credits.
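
The reasoning behind the last question can be written out in a few lines. The notation is ours, not the slide's: b_0 is the y-intercept, b_1 the slope, and X the number of math credits. The intercept cancels, so the salary difference depends only on the slope.

```latex
\begin{aligned}
\hat{S}_A &= b_0 + b_1 X \\
\hat{S}_B &= b_0 + b_1 (X + 10) \\
\hat{S}_B - \hat{S}_A &= 10\, b_1
\end{aligned}
```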

Q: What is the approximate average weight of the subjects presented on this chart?
Q: What is the approximate lowest height among the subjects presented here?
Q: Does the data show a positive trend, a negative trend, or no trend at all?
Q: What is the approximate range of the X-axis data?
Q: What is the approximate number for the Y-axis?

Q: Which of the two data files has the higher average? Data A, Data B, or approximately equal?
Q: Which of the two data files has the higher variation? Data A, Data B, or approximately equal?

Q: Which of the two data files has the higher median? Data A, Data B, or approximately equal?
Q: Which of the two data files has the wider range? Data A, Data B, or approximately equal?

Q: Which of the two data files has the higher average? Data A, Data B, or approximately equal?
Q: Which of the two data files has the higher variation? Data A, Data B, or approximately equal?

Q: Which of the two data files has the higher median? Data A, Data B, or approximately equal?
Q: Which of the two data files has the wider range? Data A, Data B, or approximately equal?

Q: Which of the two data files has the higher average? Data A, Data B, or approximately equal?
Q: Which of the two data files has the higher variation? Data A, Data B, or approximately equal?

Q: Which of the two data files has the higher median? Data A, Data B, or approximately equal?
Q: Which of the two data files has the wider range? Data A, Data B, or approximately equal?
COMMENT: Be careful when you judge the charts! The y-axes were on different scales, which can fool readers.
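
One way to avoid being fooled by axis scales is to compute the statistics directly. In this minimal Python sketch the two data sets are hypothetical: Data B is Data A stretched ten times around 50, so with automatically scaled y-axes their line charts can look nearly identical, yet the numbers show that B varies ten times as much. The standard deviation is used here as the measure of variation.

```python
import statistics

def summary(data):
    """The four statistics the questions ask about."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # one common measure of variation
        "range": max(data) - min(data),
    }

# Hypothetical data: the slides' actual data sets are not given in the text.
data_a = [48, 50, 52, 49, 51]
data_b = [30, 50, 70, 40, 60]  # same pattern as data_a, stretched 10x around 50

for name, data in (("Data A", data_a), ("Data B", data_b)):
    print(name, summary(data))
```

Both sets have mean 50 and median 50, while Data B's range (40) is ten times Data A's (4).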

In modern statistics, two main numbers describe a data set: the Mean and the Standard Deviation. Imagine that you are given a file containing weekly sales for a few hundred locations your company supervises. The list seems like an endless, uninformative collection of numbers: $4453.9, $5263.9, $899.8, ..., and thousands more just like these. What would be the first thing to do here? What would be the first number to compute in order to somehow describe these sales? Clearly, one would compute the Average, which in statistics we often refer to as the Mean: the center of the data. Imagine, for a moment, that in this fictitious case the average is $3222.5. Now we have some idea about the data: across hundreds of weeks and hundreds of different shops, the average sale was slightly larger than $3000. This is rather informative, and intuitive. In statistics we refer to this as a center of the data, a number that captures the middle of the data set. Moreover, this number is trivially computed.

As informative as this number is, very often it is not sufficient. Why? A quick glance at the list reveals that the numbers are haphazard and somewhat random: $4453.9, $5263.9, $899.8. The average might be $3222.5, but the second number on our list, $5263.9, is considerably higher than the average, and the third number, $899.8, is just a fraction of the second. And who knows what the other thousands of numbers look like. Thus, we need to characterize this haphazardness, this variation, and for this we use the Standard Deviation. The Standard Deviation, or StDev as we will call it here, is computed by a specific formula designed to capture the variability of the data. The formula is somewhat complex, and the mathematical theory behind it is difficult as well. Nevertheless, modern software has a built-in function that computes this number for us, and in this course we are more concerned with interpretation than with computation. The intuition is as follows: StDev tells us, roughly, how much the data deviate from the average.
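
For the curious, a minimal Python sketch of the usual sample standard deviation formula (square root of the sum of squared deviations from the mean, divided by n - 1, the same convention as Excel's STDEV.S), checked against Python's built-in `statistics.stdev`. Only the three sales figures quoted above are used, since the rest of the list is not given.

```python
import math
import statistics

def sample_stdev(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))

# The three weekly sales figures quoted in the text.
sales = [4453.9, 5263.9, 899.8]
print(round(sample_stdev(sales), 2))
print(math.isclose(sample_stdev(sales), statistics.stdev(sales)))  # True
```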

For typical (roughly bell-shaped) data sets, the following rules of thumb hold:
About 68% of the data are within the interval [Average - StDev, Average + StDev].
About 95% of the data are within the interval [Average - 2*StDev, Average + 2*StDev].
This rule is best understood via an example. Imagine that for the above fictitious data set, the Standard Deviation is equal to $1200. The rule of thumb would then imply that about 68% of all data on this list are between [Average - StDev, Average + StDev]. Hence, about 68% of all data on this list are between [$3222.5 - $1200, $3222.5 + $1200] = [$2022.5, $4422.5]. Similarly, we can estimate that 95% of all data are between [$3222.5 - $2400, $3222.5 + $2400], which is the interval [$822.5, $5622.5]. This alone is rather striking! Without actually counting and comparing thousands of data points, we can say that roughly 95% of all the sales are likely to be between $822.5 and $5622.5!
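
The interval arithmetic of this example, as a small Python sketch using the fictitious Average ($3222.5) and StDev ($1200) from the text.

```python
def rule_of_thumb_intervals(mean, st_dev):
    """One- and two-standard-deviation intervals around the mean."""
    one_sd = (mean - st_dev, mean + st_dev)
    two_sd = (mean - 2 * st_dev, mean + 2 * st_dev)
    return one_sd, two_sd

one_sd, two_sd = rule_of_thumb_intervals(3222.5, 1200)
print(one_sd)  # (2022.5, 4422.5)
print(two_sd)  # (822.5, 5622.5)
```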

The Median. This is another statistical tool designed to characterize the center of the data. Its computation is straightforward: given a data set, say 5, 9, 3, 4, 5, 7, 11, one first sorts the data, 3, 4, 5, 5, 7, 9, 11, and then picks the middle point, which in this case is 5. Thus the Median = 5. If we have an even number of data points, we sort them, pick the middle two, and average them. Example: 4, 2, 6, 7, 12, 5, 4, 3 after sorting becomes 2, 3, 4, 4, 5, 6, 7, 12, and the Median = (4 + 5)/2 = 4.5. Why the Median? It turns out that some data sets have a center that is not well characterized by the Average. These are data that exhibit outliers, that is, a few observations that are much, much larger or smaller than the majority. A few examples will help.
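
The sort-then-pick-the-middle procedure above, written as a small Python function and applied to the two examples from the text. (Python's `statistics.median` follows the same rule.)

```python
def median(values):
    """Sort, then take the middle value (odd count) or average the middle two (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 9, 3, 4, 5, 7, 11]))     # 5
print(median([4, 2, 6, 7, 12, 5, 4, 3]))  # 4.5
```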

Examples: Annual income: the average annual income of a town or city would be greatly swayed if a billionaire moved into the neighborhood. However, the median income would barely change. House prices: a similar argument applies. Imagine a city where the vast majority of houses, say 95% of them, cost less than $500,000. A sale of a 50 million dollar mansion would inflate the average sale price and give a badly distorted picture of house prices in this city, but the median sale price would hardly be affected. For these reasons, in the literature, in newspapers and in business reports, house prices and incomes are typically characterized by their medians rather than their averages.
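
A minimal Python sketch of the house-price argument, using seven hypothetical sale prices (all under $500,000) plus one $50 million mansion.

```python
import statistics

# Hypothetical house sale prices in dollars, all under $500,000.
prices = [310_000, 275_000, 420_000, 365_000, 298_000, 450_000, 330_000]
with_mansion = prices + [50_000_000]  # one $50 million sale

print(statistics.mean(prices), statistics.median(prices))
print(statistics.mean(with_mansion), statistics.median(with_mansion))
```

Here the mean jumps from about $350,000 to $6,556,000, while the median moves only from $330,000 to $347,500.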