Exploratory data analysis with one and two variables

Instructions for Lab # 1
Statistics Probability and Statistical Inference
DUE DATE: Upload on Sakai on July 10

Lab Objective
To explore data with histograms and scatter plots.

Review
A few highlights to review from the previous lab:

1. The Working Directory
It is always important to know where the files you are using are saved on the computer, so that both you and Stata can access the correct files. Let's see how this works for this lab. Download the data set movies2012.dta from the course website (link next to the lab handout) and save it in the folder you are going to work in (e.g., C:\Users\John\Documents\Stats111\lab1). Now you need to point Stata to this folder, which will be the folder you work from. There are two ways to do this:
(a) Use the menu bar and navigate to File -> Change Working Directory. This opens a window that you can use to find the folder where you saved the data.
(b) Issue a command to change the working directory by typing cd followed by the path to the folder (e.g., cd C:\Users\John\Documents\Stats111\lab1).
Typing pwd in the command line will tell you the current directory, which you can use to verify that you are in the right place.

2. Do-files
All the commands you enter in the command line for the lab can (and should) be put into a do-file, so that your work can be replicated and revisited later. Save your do-file in the folder where the data is. You can download example.do from the previous lab session on the class website and use it as a model for your do-file. Its first three lines and its last two lines create a log file, example.log, that records all the commands that were executed along with their output. If you want to use example.do as a model for your lab script, save it as lab1.do in the correct directory, and change the name of the log file it creates (in the third line of the do-file, replace example.log with lab1.log). The comments and display commands in it are just a silly example; replace those lines with the commands you use during the lab. For example, after you have changed to the correct working directory and loaded the data (you can save this command in your do-file too), you may want to summarize a variable. In that case, you would type summarize varname in the do-file. To run commands one at a time, select the commands you want to run and click the Run Selection button (or press a shortcut). Feel free to add as many comments (lines starting with *) as you want; they will help you understand your script the next time you open it. At the end of the lab, save your do-file. It will allow you to recover all the analysis you did.

3. Fonts
Different fonts generally mean very specific things in the labs. This helps you distinguish Stata code from the rest of the text:
Typewriter text refers to Stata commands.
Italicized text, either in typewriter or in regular font, refers to variable names. Sometimes these are specific variables in the data set (Domestic), while at other times they are simply placeholders for any variable or name you may choose (cont_var).
Red text is for Data Analysis Tips.
Blue text generally marks links to datasets, examples, or places in the document.
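The working-directory and do-file steps reviewed above can be collected into a short script. The sketch below is a minimal example under stated assumptions, not the course's example.do: the folder path is the handout's example and should be replaced with your own, and the capture log close line is an added safeguard that closes any log left open by an earlier run, so that log using does not stop with a "log file already open" error.

```stata
* lab1.do: sketch of a lab script (replace the path with your own folder)
capture log close                 // close any log left open by a previous run
cd "C:\Users\John\Documents\Stats111\lab1"
pwd                               // confirm the working directory
log using lab1.log, replace text  // record commands and output in lab1.log

use movies2012.dta, clear         // load the data from the working directory

* Lab commands go here, for example:
summarize Domestic

log close
```

Running the whole file (or selected lines) from the Do-file Editor reproduces the analysis, and lab1.log keeps a plain-text record of it.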
Lab Procedures
Remember to write your answers in a report that will be graded. Along with your answers, you can include output, graphs, and commands in your report.

Let's go to the movies! What are the characteristics of U.S. movies that make the most money? Let's address this question with the data set movies2012.dta, which you can download from the course website. It comprises data on the 250 top domestic grossing movies of all time as of November. The variables are:

Variable       Description
Ranking        Ranking on domestic gross sales
Title          Movie title
Year           Release year
Domestic       Domestic gross sales
Foreign        Foreign gross sales
Worldwide      Worldwide gross sales
Budget         Budget
Rating         MPAA rating
Best_Picture   Academy Awards Best Picture (nominated or won)
Genre          Main/first genre of the movie
All_Genres     List of all genres the movie falls into
Director       Name of the director

There are missing data in this file. We will ignore them for simplicity. In general, when confronted with missing data, it is best to get the advice of a professional statistician before doing analyses.

Data Analysis Tip: The unit of measurement for the monetary variables is not stated. That is bad practice: always include a description of the units somewhere in the file. Based on knowledge of movie revenues, it is clear that the unit of measurement is $1,000,000.

Questions:
1. After reading in the data, describe the distributions of foreign and domestic grosses. That is, say where most values are, note any outliers, and say whether the distribution is tightly packed around its mean or spread out. Also, report the mean and standard deviation.
In addition to summarizing the data, you can use histograms to get a visual representation of the distribution of the data. The command is
histogram varname, name(graph1, replace)
where varname is the variable of interest. As with most commands, there are many options available with histogram. One of the most useful is name(graph1, replace). It stores the graph in memory under the name graph1 so that it can be used later, and it displays the graph in a window titled graph1. This allows multiple graph windows to be open at the same time (see footnote 1). The replace option inside the parentheses of name() overwrites any previous graph with the same name.
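For question 1, the numerical and graphical summaries can be sketched in do-file form as follows. The graph names graph1 and graph2 follow the handout's examples; the detail option of summarize (which adds percentiles and skewness) and the extra graph names graph1_frac and graph1_freq are illustrative additions, and the fraction and frequency options are the ones described in the Data Analysis Tip on histogram scales.

```stata
* Sketch for question 1 (assumes movies2012.dta is in the working directory)
use movies2012.dta, clear

* Mean and standard deviation of each gross (units: $1,000,000)
summarize Domestic Foreign

* Percentiles and skewness help judge where most values are and spot outliers
summarize Domestic, detail
summarize Foreign, detail

* One histogram per variable, each kept in its own named graph window
histogram Domestic, name(graph1, replace)
histogram Foreign, name(graph2, replace)

* Optional: bar heights summing to one, or bars showing counts
histogram Domestic, fraction name(graph1_frac, replace)
histogram Domestic, frequency name(graph1_freq, replace)
```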
Further, if you want to compare multiple histograms in one window, you can combine them by typing
graph combine graph1 graph2, name(graph_all, replace)
where, again, graph1, graph2, and graph_all are just example names for the graphs. You can also navigate to Graphics -> Histogram to get a wizard to help with the graph.

Footnote 1: The default is for each graph, regardless of type, to overwrite the previous graph.

Data Analysis Tip: The default histogram in Stata is a true (density) histogram, in which the areas of the bars sum to one. Often people want the heights to sum to one instead; this is accomplished with the fraction option. If you want the y-axis simply to count how many observations are in each bin, use the frequency option.

2. Which sentence best describes the distributions of domestic and foreign grosses? You can just write the letter of your choice in the lab report.
(a) Domestic and foreign grosses are very similar.
(b) Domestic and foreign grosses have similar distributional shapes, but foreign grosses tend to be larger than domestic grosses.
(c) Domestic and foreign grosses have similar distributional shapes, but domestic grosses tend to be larger than foreign grosses.
(d) The two distributions look nothing like each other, because one has a long left tail and the other has a long right tail.

3. What are the names of the two movies that are the largest outliers on all three gross variables (Domestic, Foreign, and Worldwide)?

4. We can examine the relationship between world-wide gross and movie genre using a box plot. Use the variable Genre for this analysis. The command for a box plot is
graph box cont_var, name(graph1, replace)
or
graph box cont_var, name(graph1, replace) over(cat_var)
where cont_var represents the continuous variable that you want to graph. The option over(cat_var) breaks the box plot down by the values of a categorical variable. Alternatively, you can navigate to Graphics -> Box plot, enter the continuous variable in the Main tab and the categorical variable in the Categories tab. If you want to clean up the graph, you can try some of the other tabs in the wizard.
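A sketch of question 4 in do-file form, with Worldwide as the continuous variable and Genre as the categorical one. The graph name graph3 is arbitrary (chosen so the box plot does not overwrite the histograms), and the tabstat line is an optional addition not required by the lab: it prints the count, median, quartiles, and interquartile range behind each box, which can help when comparing centers and spreads in parts (a) and (b).

```stata
* Sketch for question 4 (assumes movies2012.dta is in the working directory)
use movies2012.dta, clear

* World-wide gross by main genre, one box per genre
graph box Worldwide, over(Genre) name(graph3, replace)
* If the genre labels overlap, try: over(Genre, label(angle(45)))

* Optional: the numbers behind each box
tabstat Worldwide, by(Genre) statistics(n median p25 p75 iqr)
```

The box plot remains the basis for your answers; the table only makes the medians and spreads easier to read off precisely.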
Answer the three questions below.
(a) Of Comedy and Animated movies, which has a distribution of world-wide grosses most similar to the distribution of world-wide grosses for Action movies? Justify your choice in at most two sentences.
(b) Compare the distributions for Drama movies and Adventure movies. Do they have reasonably similar medians? Is one more spread out than the other (if so, say which one)?
(c) If you directed a movie and wanted to make lots of money worldwide, which type appears to give you the best chance of doing so? Base your answer on the results of the box plot.

5. Describe the relationship between domestic gross and foreign gross. To make a scatter plot, tell Stata that you want a twoway graph of type scatter:
graph twoway scatter varname1 varname2
Here varname1 is plotted on the y-axis and varname2 on the x-axis. Alternatively, you can navigate to Graphics -> Twoway graph (scatter, line, etc.), go to the Plots tab, and click the Create button. Choose Scatter and the Y and X variables. Items to include in your description are the general trend of the relationship (e.g., positive and linear, negative and linear, some other pattern, no clear pattern) and whether there are any outliers or points that do not fit the pattern.

6. Report the three pairwise correlations between Foreign, Domestic, and Worldwide gross. Also, graph the raw data for each pair. To find correlations in Stata, type
correlate varname1 varname2 ...
which displays a matrix of the correlations between all the variables listed (you can list as many as you like). Note that the diagonal is always 1. Make sure you know why that is. In addition, you can create a scatterplot matrix, which shows a scatter plot for every pair of variables, by typing
graph matrix varname1 varname2 ...
Alternatively, you can navigate to Graphics -> Scatterplot matrix. Again, you can list multiple variables. Do the correlations suggest strongly positive linear relationships, weakly positive linear relationships, no linear relationships, weakly negative linear relationships, or strongly negative linear relationships?

7. Why are the correlations between Domestic and Worldwide, and between Foreign and Worldwide, stronger than the correlation between Domestic and Foreign? The answer has to do with the definitions of the variables.

8. (extra) Outliers can have a strong effect on correlations.
Let's check whether excluding Avatar and Titanic changes the correlations substantially. To exclude Avatar and Titanic, let's again use Stata's if qualifier by adding
if Title != "Avatar" & Title != "Titanic"
to the previous commands (before the comma, if the command has options).
Data Analysis Tip: != means "not equal to" and & is the AND operator.
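Questions 5, 6, and 8 can be run as one do-file block, sketched below. It assumes the two titles are stored exactly as "Avatar" and "Titanic"; the count line checks that both are found (it should report 2) before you trust the restricted results. The graph names graph4 to graph6 are arbitrary.

```stata
* Sketch for questions 5, 6 and 8 (assumes movies2012.dta is in the working directory)
use movies2012.dta, clear

* Q5: foreign gross on the y-axis against domestic gross on the x-axis
graph twoway scatter Foreign Domestic, name(graph4, replace)

* Q6: correlation matrix and scatterplot matrix for the three grosses
correlate Domestic Foreign Worldwide
graph matrix Domestic Foreign Worldwide, name(graph5, replace)

* Q8: confirm both outliers are found under these exact titles (should report 2)
count if Title == "Avatar" | Title == "Titanic"

* Q8: the same analyses without Avatar and Titanic; if goes before the comma
correlate Domestic Foreign Worldwide if Title != "Avatar" & Title != "Titanic"
graph matrix Domestic Foreign Worldwide if Title != "Avatar" & Title != "Titanic", ///
    name(graph6, replace)
```

Note that correlate drops any observation missing on any of the listed variables. Because this file has missing data, pwcorr (which uses all observations available for each pair) can give slightly different pairwise values; either is fine for this lab as long as you say which you used.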

Now, re-calculate the correlations in (6). Did the correlations get stronger or weaker? Does the substance of your conclusions in (6) change very much when Avatar and Titanic are excluded?

Data Analysis Tip: It is not acceptable to exclude outliers from analyses unless you have a scientific reason to do so (e.g., a data entry error, or the outlying unit is not part of your target population). Hiding outliers is fudging data to get the results you want. That is dishonest and unethical. When you see outliers, do analyses with and without them. When the results do not change much, report the results based on the full data set, and tell your audience that the results were not sensitive to the outliers. When the results do change substantially, report both sets of analyses: one with and one without the outliers. This honestly informs people that your conclusions are not on very solid ground, because particular data points affect the results greatly.


More information

Lab 5, part b: Scatterplots and Correlation

Lab 5, part b: Scatterplots and Correlation Lab 5, part b: Scatterplots and Correlation Toews, Math 160, Fall 2014 November 21, 2014 Objectives: 1. Get more practice working with data frames 2. Start looking at relationships between two variables

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 1 What is data exploration? A preliminary exploration of the data to better understand its characteristics. Key motivations of data exploration include

More information

ECO375 Tutorial 1 Introduction to Stata

ECO375 Tutorial 1 Introduction to Stata ECO375 Tutorial 1 Introduction to Stata Matt Tudball University of Toronto Mississauga September 14, 2017 Matt Tudball (University of Toronto) ECO375H5 September 14, 2017 1 / 25 What Is Stata? Stata is

More information

Introduction to Stata First Session. I- Launching and Exiting Stata Launching Stata Exiting Stata..

Introduction to Stata First Session. I- Launching and Exiting Stata Launching Stata Exiting Stata.. Introduction to Stata 2016-17 01. First Session I- Launching and Exiting Stata... 1. Launching Stata... 2. Exiting Stata.. II - Toolbar, Menu bar and Windows.. 1. Toolbar Key.. 2. Menu bar Key..... 3.

More information

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 5. Understanding and Comparing Distributions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 5 Understanding and Comparing Distributions The Big Picture We can answer much more interesting questions about variables when we compare distributions for different groups. Below is a histogram

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

LAB 1: Graphical Descriptions of Data

LAB 1: Graphical Descriptions of Data LAB 1: Graphical Descriptions of Data Part I: Before Class 1) Read this assignment all the way through; 2) Know the terms and understand the concepts of: - scatterplots - stemplots - distributions - histograms

More information

MATH& 146 Lesson 8. Section 1.6 Averages and Variation

MATH& 146 Lesson 8. Section 1.6 Averages and Variation MATH& 146 Lesson 8 Section 1.6 Averages and Variation 1 Summarizing Data The distribution of a variable is the overall pattern of how often the possible values occur. For numerical variables, three summary

More information

Statistics Lecture 6. Looking at data one variable

Statistics Lecture 6. Looking at data one variable Statistics 111 - Lecture 6 Looking at data one variable Chapter 1.1 Moore, McCabe and Craig Probability vs. Statistics Probability 1. We know the distribution of the random variable (Normal, Binomial)

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3

Data Mining: Exploring Data. Lecture Notes for Chapter 3 Data Mining: Exploring Data Lecture Notes for Chapter 3 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web site. Topics Exploratory Data Analysis

More information

Introduction to BEST Viewpoints

Introduction to BEST Viewpoints Introduction to BEST Viewpoints This is not all but just one of the documentation files included in BEST Viewpoints. Introduction BEST Viewpoints is a user friendly data manipulation and analysis application

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things.

Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. + What is Data? Data is a collection of facts. Data can be in the form of numbers, words, measurements, observations or even just descriptions of things. In most cases, data needs to be interpreted and

More information

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

More information

Lecture 3 Questions that we should be able to answer by the end of this lecture:

Lecture 3 Questions that we should be able to answer by the end of this lecture: Lecture 3 Questions that we should be able to answer by the end of this lecture: Which is the better exam score? 67 on an exam with mean 50 and SD 10 or 62 on an exam with mean 40 and SD 12 Is it fair

More information

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree.

Ex.1 constructing tables. a) find the joint relative frequency of males who have a bachelors degree. Two-way Frequency Tables two way frequency table- a table that divides responses into categories. Joint relative frequency- the number of times a specific response is given divided by the sample. Marginal

More information

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Chapter 3. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Chapter 3 Introduction to Data Mining by Tan, Steinbach, Kumar What is data exploration? A preliminary exploration of the data to better understand its characteristics.

More information

No. of blue jelly beans No. of bags

No. of blue jelly beans No. of bags Math 167 Ch5 Review 1 (c) Janice Epstein CHAPTER 5 EXPLORING DATA DISTRIBUTIONS A sample of jelly bean bags is chosen and the number of blue jelly beans in each bag is counted. The results are shown in

More information

VCEasy VISUAL FURTHER MATHS. Overview

VCEasy VISUAL FURTHER MATHS. Overview VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that

More information

DSC 201: Data Analysis & Visualization

DSC 201: Data Analysis & Visualization DSC 201: Data Analysis & Visualization Exploratory Data Analysis Dr. David Koop What is Exploratory Data Analysis? "Detective work" to summarize and explore datasets Includes: - Data acquisition and input

More information

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining

Data Mining: Exploring Data. Lecture Notes for Data Exploration Chapter. Introduction to Data Mining Data Mining: Exploring Data Lecture Notes for Data Exploration Chapter Introduction to Data Mining by Tan, Steinbach, Karpatne, Kumar 02/03/2018 Introduction to Data Mining 1 What is data exploration?

More information

Date:.. /. / 20.. Remas Language Schools. Name :. Class : Second Term 5th Primary 1 Computer Department

Date:.. /. / 20.. Remas Language Schools. Name :. Class : Second Term 5th Primary 1 Computer Department Name :. Class : Second Term 5th Primary 1 Computer Department Table of contents of the (Second term) Chapter 3: continue the PowerPoint: Lesson 8: View show Lesson 9: Slide to slide transitions Lesson

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

How to Make Graphs with Excel 2007

How to Make Graphs with Excel 2007 Appendix A How to Make Graphs with Excel 2007 A.1 Introduction This is a quick-and-dirty tutorial to teach you the basics of graph creation and formatting in Microsoft Excel. Many of the tasks that you

More information

Bar Graphs and Dot Plots

Bar Graphs and Dot Plots CONDENSED LESSON 1.1 Bar Graphs and Dot Plots In this lesson you will interpret and create a variety of graphs find some summary values for a data set draw conclusions about a data set based on graphs

More information

Exploring and Understanding Data Using R.

Exploring and Understanding Data Using R. Exploring and Understanding Data Using R. Loading the data into an R data frame: variable

More information

STA Module 4 The Normal Distribution

STA Module 4 The Normal Distribution STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

To complete the computer assignments, you ll use the EViews software installed on the lab PCs in WMC 2502 and WMC 2506.

To complete the computer assignments, you ll use the EViews software installed on the lab PCs in WMC 2502 and WMC 2506. An Introduction to EViews The purpose of the computer assignments in BUEC 333 is to give you some experience using econometric software to analyse real-world data. Along the way, you ll become acquainted

More information

Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8

Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8 Glossary Common Core Curriculum Maps Math/Grade 6 Grade 8 Grade 6 Grade 8 absolute value Distance of a number (x) from zero on a number line. Because absolute value represents distance, the absolute value

More information

Introduction to GIS & Mapping: ArcGIS Desktop

Introduction to GIS & Mapping: ArcGIS Desktop Introduction to GIS & Mapping: ArcGIS Desktop Your task in this exercise is to determine the best place to build a mixed use facility in Hudson County, NJ. In order to revitalize the community and take

More information

DSCI 325: Handout 9 Sorting and Options for Printing Data in SAS Spring 2017

DSCI 325: Handout 9 Sorting and Options for Printing Data in SAS Spring 2017 DSCI 325: Handout 9 Sorting and Options for Printing Data in SAS Spring 2017 There are a handful of statements (TITLE, FOOTNOTE, WHERE, BY, etc.) that can be used in a wide variety of procedures. For example,

More information

Working with Charts Stratum.Viewer 6

Working with Charts Stratum.Viewer 6 Working with Charts Stratum.Viewer 6 Getting Started Tasks Additional Information Access to Charts Introduction to Charts Overview of Chart Types Quick Start - Adding a Chart to a View Create a Chart with

More information

Visual Analytics. Visualizing multivariate data:

Visual Analytics. Visualizing multivariate data: Visual Analytics 1 Visualizing multivariate data: High density time-series plots Scatterplot matrices Parallel coordinate plots Temporal and spectral correlation plots Box plots Wavelets Radar and /or

More information

Week 1: Introduction to Stata

Week 1: Introduction to Stata Week 1: Introduction to Stata Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ALL RIGHTS RESERVED 1 Outline Log

More information