EXPLORATORY DATA ANALYSIS. Introducing the data


1 EXPLORATORY DATA ANALYSIS Introducing the data

2 Email data set
> email
# A tibble: 3,921 x 21
   spam     to_multiple  from    cc sent_email time   image
   <fctr>         <dbl> <dbl> <int>      <dbl> <dttm> <dbl>
 1 not-spam ...
 ...
10 not-spam ...
# ... with 3,911 more rows, and 14 more variables: attach <dbl>,
#   dollar <dbl>, winner <fctr>, inherit <dbl>, viagra <dbl>,
#   password <dbl>, num_char <dbl>, line_breaks <int>, format <dbl>,
#   re_subj <dbl>, exclaim_subj <dbl>, urgent_subj <dbl>,
#   exclaim_mess <dbl>, number <fctr>

3 Histograms
> ggplot(data, aes(x = var1)) + geom_histogram()

4 Histograms
> ggplot(data, aes(x = var1)) + geom_histogram() + facet_wrap(~var2)
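Under the hood, a histogram counts observations per bin, and `facet_wrap(~var2)` repeats that count once for each level of `var2`. The base R sketch below reproduces the idea with `hist(..., plot = FALSE)` and `split()`. The vectors and the bin edges are hypothetical stand-ins for `var1` and `var2`; ggplot2 chooses its own bins (30 by default) rather than these.

```r
# Hypothetical stand-ins for var1 (numeric) and var2 (grouping variable)
var1 <- c(1.2, 2.5, 2.7, 3.1, 4.8, 5.0, 5.5, 7.9)
var2 <- c("a", "a", "b", "a", "b", "b", "a", "b")

# Assumed bin edges: [0, 2], (2, 4], (4, 6], (6, 8]
breaks <- seq(0, 8, by = 2)

# One histogram over all values: counts per bin
overall <- hist(var1, breaks = breaks, plot = FALSE)$counts

# One histogram per level of var2, as facet_wrap(~var2) would draw
by_group <- lapply(split(var1, var2), function(v) {
  hist(v, breaks = breaks, plot = FALSE)$counts
})

print(overall)
print(by_group)
```

Because every panel uses the same bins, the per-group counts add up to the overall counts, which is also why facets share a common x axis by default.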

5 Boxplots
> ggplot(data, aes(x = var2, y = var1)) + geom_boxplot()

6 Boxplots
> ggplot(data, aes(x = 1, y = var1)) + geom_boxplot()
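A box plot is drawn from a few summary numbers. The sketch below computes them by hand for a hypothetical vector `y`, assuming the conventions I understand `geom_boxplot()` to use by default: hinges at the 25th and 75th percentiles (R's default `quantile()` type), whiskers reaching the most extreme points within 1.5 times the IQR of the hinges, and anything beyond drawn as an outlier.

```r
# Hypothetical data with one large value
y <- c(2, 3, 3, 4, 5, 5, 6, 7, 20)

# Lower hinge, median, upper hinge (default quantile type 7)
q <- quantile(y, probs = c(0.25, 0.5, 0.75), names = FALSE)
iqr <- q[3] - q[1]

# Points more than 1.5 * IQR beyond the hinges are flagged as outliers
is_out <- y < q[1] - 1.5 * iqr | y > q[3] + 1.5 * iqr

# Whiskers extend to the most extreme non-outlying points
whiskers <- range(y[!is_out])
outliers <- y[is_out]

print(q)         # 3 5 6
print(whiskers)  # 2 7
print(outliers)  # 20
```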

7 Density plots
> ggplot(data, aes(x = var1)) + geom_density()

8 Density plots
> ggplot(data, aes(x = var1, fill = var2)) + geom_density(alpha = .3)
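A density plot replaces histogram bars with a smooth kernel density estimate, scaled so the area under each curve is 1. That scaling is what makes curves for groups of different sizes comparable when they are overlaid with `fill = var2`; `alpha = .3` only sets the transparency of the fill. The base R sketch below applies `density()` to simulated data and checks the area numerically. It illustrates the technique rather than reproducing ggplot2's code, although `stat_density()` also defaults to a Gaussian kernel.

```r
set.seed(42)
# Simulated bimodal data (hypothetical)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 4))

# Gaussian kernel density estimate; bandwidth chosen by bw.nrd0 by default
d <- density(x)

# Approximate the area under the curve with a Riemann sum on the evaluation grid
step <- d$x[2] - d$x[1]
area <- sum(d$y) * step
print(round(area, 3))
```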

9 EXPLORATORY DATA ANALYSIS Let's practice!

10 EXPLORATORY DATA ANALYSIS Check-in 1

11 Review

12 Zero inflation strategies
- Analyze the two components (the zeros and the nonzero values) separately
- Collapse into a two-level categorical variable (zero vs. nonzero)

13 Zero inflation strategies
> email %>%
    mutate(zero = exclaim_mess == 0) %>%
    ggplot(aes(x = zero)) +
    geom_bar() +
    facet_wrap(~spam)
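Collapsing `exclaim_mess` into a zero/nonzero indicator turns a zero-inflated count into a two-level variable, and the faceted bar chart then shows one set of counts per `spam` level. The same counts can be tabulated directly. The sketch below uses ten hypothetical emails as stand-ins for the `exclaim_mess` and `spam` columns.

```r
# Hypothetical stand-ins for the exclaim_mess and spam columns
exclaim_mess <- c(0, 0, 3, 0, 1, 0, 0, 7, 0, 2)
spam <- c("not-spam", "not-spam", "not-spam", "spam", "not-spam",
          "spam", "spam", "not-spam", "spam", "not-spam")

# Two-level version of the zero-inflated count
zero <- exclaim_mess == 0

# Rows: zero (FALSE/TRUE); columns: spam level.
# Each column holds the bar heights of one panel of facet_wrap(~spam).
counts <- table(zero, spam)
print(counts)
```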

14 Barchart options
> email %>%
    mutate(zero = exclaim_mess == 0) %>%
    ggplot(aes(x = zero, fill = spam)) +
    geom_bar()

15 Barchart options
> email %>%
    mutate(zero = exclaim_mess == 0) %>%
    ggplot(aes(x = zero, fill = spam)) +
    geom_bar(position = "fill")
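With `position = "fill"`, every bar is stretched to height 1, so the colored segments show proportions within each bar rather than counts: here, the share of spam among messages with and without exclamation points. Those are the row proportions of the zero-by-spam table, which `prop.table()` computes. The ten emails below are hypothetical stand-ins for the `exclaim_mess` and `spam` columns.

```r
# Hypothetical stand-ins for the exclaim_mess and spam columns
exclaim_mess <- c(0, 0, 3, 0, 1, 0, 0, 7, 0, 2)
spam <- c("not-spam", "not-spam", "not-spam", "spam", "not-spam",
          "spam", "spam", "not-spam", "spam", "not-spam")
zero <- exclaim_mess == 0

counts <- table(zero, spam)

# margin = 1 divides each row by its total: the proportions that
# geom_bar(position = "fill") stacks within each bar (x = zero)
props <- prop.table(counts, margin = 1)
print(round(props, 2))
```

The stacked chart without `position = "fill"` shows the raw `counts` instead, which makes the overall group sizes visible but the within-bar shares harder to compare.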

16 EXPLORATORY DATA ANALYSIS Let's practice!

17 EXPLORATORY DATA ANALYSIS Check-in 2

18 Spam and images
> email %>%
    mutate(has_image = image > 0) %>%
    ggplot(aes(x = as.factor(has_image), fill = spam)) +
    geom_bar(position = "fill")

19 Spam and images
> email %>%
    mutate(has_image = image > 0) %>%
    ggplot(aes(x = spam, fill = has_image)) +
    geom_bar(position = "fill")
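Swapping `x` and `fill` changes which proportion the filled bars show. With `x = has_image`, each bar gives the share of spam among emails with and without images; with `x = spam`, each bar gives the share of emails containing an image within spam and within non-spam. In table terms these are row proportions versus column proportions, sketched below with hypothetical stand-ins for the `image` and `spam` columns.

```r
# Hypothetical stand-ins: six non-spam and four spam emails
spam  <- rep(c("not-spam", "spam"), times = c(6, 4))
image <- c(0, 0, 0, 0, 0, 1, 0, 0, 3, 1)
has_image <- image > 0

tab <- table(has_image, spam)

# x = has_image, fill = spam: proportion of spam within each has_image level
by_image <- prop.table(tab, margin = 1)

# x = spam, fill = has_image: proportion with an image within each spam level
by_spam <- prop.table(tab, margin = 2)

print(round(by_image, 2))
print(round(by_spam, 2))
```

In this made-up sample, two thirds of the emails with images are spam, while only half of the spam emails contain an image: the two charts answer different questions.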

20 Ordering bars

21 Ordering bars
> email <- email %>%
    mutate(zero = exclaim_mess == 0)
> levels(email$zero)
NULL
> email$zero <- factor(email$zero, levels = c("TRUE", "FALSE"))
> email %>%
    ggplot(aes(x = zero)) +
    geom_bar() +
    facet_wrap(~spam)
The bars now appear in the order TRUE, then FALSE.

22 Ordering bars
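Bars for a factor are drawn in the order of its levels. A logical vector has no levels (hence `NULL`), and `factor()` applied to it sorts the values, giving FALSE before TRUE. Supplying `levels` reorders them, but the labels must match the character form of the values exactly, which for logicals is "TRUE" and "FALSE" in capitals. The sketch below uses a hypothetical four-value vector.

```r
# Hypothetical logical indicator, like zero = exclaim_mess == 0
zero <- c(5, 0, 2, 0) == 0    # FALSE TRUE FALSE TRUE

print(levels(zero))            # NULL: logical vectors have no levels

# Default: levels sorted, so FALSE comes first
default_order <- factor(zero)

# Explicit levels put TRUE first
true_first <- factor(zero, levels = c("TRUE", "FALSE"))

# Labels are matched against as.character(zero), so a lowercase "true"
# matches nothing and those values become NA
wrong_case <- factor(zero, levels = c("true", "FALSE"))

print(levels(default_order))
print(levels(true_first))
print(wrong_case)
```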

23 EXPLORATORY DATA ANALYSIS Let's practice!

24 EXPLORATORY DATA ANALYSIS Conclusion

25 Pie chart vs. bar chart

26 Faceting vs. stacking

27 Histogram
> ggplot(data, aes(x = var1)) + geom_histogram()

28 Density plot
> cars %>%
    filter(eng_size < 2.0) %>%
    ggplot(aes(x = hwy_mpg)) +
    geom_density()

29 Side-by-side box plots
> ggplot(common_cyl, aes(x = as.factor(ncyl), y = city_mpg)) +
    geom_boxplot()
Warning message:
Removed 11 rows containing non-finite values (stat_boxplot).
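The warning means ggplot2 dropped rows whose plotted value was missing (`NA`) or infinite before drawing the boxes. Counting those values beforehand with `is.finite()` shows how many rows will be removed. The vector below is a hypothetical stand-in for `city_mpg`.

```r
# Hypothetical stand-in for city_mpg with one missing and one infinite value
city_mpg <- c(18, 21, NA, 25, Inf, 19)

# is.finite() is FALSE for NA, NaN, Inf and -Inf
n_dropped <- sum(!is.finite(city_mpg))
print(n_dropped)  # 2

# Keeping only the finite values mirrors what the box plot actually uses
kept <- city_mpg[is.finite(city_mpg)]
print(kept)
```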

30 Center: mean, median, mode
> x
[1]
> table(x)
x
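The mean and median have built-in functions, but the statistical mode does not: R's `mode()` reports an object's storage mode (for example "numeric"), not its most common value. Tabulating with `table()` and picking the most frequent entry is one way to get it. The vector below is hypothetical.

```r
# Hypothetical data
x <- c(3, 7, 7, 2, 9, 7, 4, 2)

x_mean   <- mean(x)    # 41 / 8 = 5.125
x_median <- median(x)  # average of the two middle values 4 and 7 = 5.5

# Mode: the most frequent value(s); table() names are character, so convert back
counts <- table(x)
mode_value <- as.numeric(names(counts)[counts == max(counts)])

print(c(mean = x_mean, median = x_median, mode = mode_value))
```

If several values tie for the highest count, `mode_value` holds all of them.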

31 Shape of income
> ggplot(life, aes(x = income, fill = west_coast)) +
    geom_density(alpha = .3)
> ggplot(life, aes(x = log(income), fill = west_coast)) +
    geom_density(alpha = .3)
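Income is typically right-skewed: a few large values stretch the upper tail and pull the mean above the median. Taking logs compresses large values more than small ones, so the distribution becomes more symmetric. The sketch below uses hypothetical incomes and, as a rough asymmetry indicator of my own choosing, the gap between mean and median measured in standard deviations, which shrinks after the transformation.

```r
# Hypothetical incomes with one large value
income <- c(20000, 25000, 30000, 35000, 40000, 200000)

# Rough asymmetry indicator: (mean - median) / standard deviation
gap <- function(v) (mean(v) - median(v)) / sd(v)

raw_gap <- gap(income)
log_gap <- gap(log(income))
print(round(c(raw = raw_gap, log = log_gap), 2))
```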

32 With group_by()
> life %>%
    slice(240:247) %>%
    group_by(west_coast) %>%
    summarize(mean(expectancy))
# A tibble: 2 x 2
  west_coast mean(expectancy)
  <lgl>                 <dbl>
1 FALSE
2 TRUE

Rows 240 to 247 of life:
state      county    expectancy income west_coast
California Tuolumne                    TRUE
California Ventura                     TRUE
California Yolo                        TRUE
California Yuba                        TRUE
Colorado   Adams                       FALSE
Colorado   Alamosa                     FALSE
Colorado   Arapahoe                    FALSE
Colorado   Archuleta                   FALSE
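`group_by()` followed by `summarize()` splits the rows by `west_coast` and computes one mean per group. Base R's `tapply()` performs the same split-apply-combine in a single call. The sketch uses a small hypothetical data frame, not the actual `life` values.

```r
# Hypothetical data: four counties, two of them on the West Coast
life_sub <- data.frame(
  county     = c("A", "B", "C", "D"),
  expectancy = c(78, 80, 76, 77),
  west_coast = c(TRUE, TRUE, FALSE, FALSE)
)

# One mean per level of west_coast, like group_by() + summarize(mean(...))
group_means <- tapply(life_sub$expectancy, life_sub$west_coast, mean)
print(group_means)
```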

33 Spam and exclamation points
> email %>%
    mutate(zero = exclaim_mess == 0) %>%
    ggplot(aes(x = zero, fill = spam)) +
    geom_bar()

34 Spam and images
> email %>%
    mutate(has_image = image > 0) %>%
    ggplot(aes(x = as.factor(has_image), fill = spam)) +
    geom_bar(position = "fill")

35 EXPLORATORY DATA ANALYSIS Congratulations!


Getting started with ggplot2 Getting started with ggplot2 STAT 133 Gaston Sanchez Department of Statistics, UC Berkeley gastonsanchez.com github.com/gastonstat/stat133 Course web: gastonsanchez.com/stat133 ggplot2 2 Resources for

More information

PRACTICUM, day 1: R graphing: basic plotting and ggplot2 CRG Bioinformatics Unit, May 6th, 2016

PRACTICUM, day 1: R graphing: basic plotting and ggplot2 CRG Bioinformatics Unit, May 6th, 2016 PRACTICUM, day 1: R graphing: basic plotting and ggplot2 CRG Bioinformatics Unit, sarah.bonnin@crg.eu May 6th, 216 Contents Introduction 2 Packages................................................... 2

More information

Intro to R for Epidemiologists

Intro to R for Epidemiologists Lab 9 (3/19/15) Intro to R for Epidemiologists Part 1. MPG vs. Weight in mtcars dataset The mtcars dataset in the datasets package contains fuel consumption and 10 aspects of automobile design and performance

More information

Graphics in R. Jim Bentley. The following code creates a couple of sample data frames that we will use in our examples.

Graphics in R. Jim Bentley. The following code creates a couple of sample data frames that we will use in our examples. Graphics in R Jim Bentley 1 Sample Data The following code creates a couple of sample data frames that we will use in our examples. > sex = c(rep("female",12),rep("male",7)) > mass = c(36.1, 54.6, 48.5,

More information

Chapter 2 Exploring Data with Graphs and Numerical Summaries

Chapter 2 Exploring Data with Graphs and Numerical Summaries Chapter 2 Exploring Data with Graphs and Numerical Summaries Constructing a Histogram on the TI-83 Suppose we have a small class with the following scores on a quiz: 4.5, 5, 5, 6, 6, 7, 8, 8, 8, 8, 9,

More information

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES

2.1 Objectives. Math Chapter 2. Chapter 2. Variable. Categorical Variable EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES EXPLORING DATA WITH GRAPHS AND NUMERICAL SUMMARIES Chapter 2 2.1 Objectives 2.1 What Are the Types of Data? www.managementscientist.org 1. Know the definitions of a. Variable b. Categorical versus quantitative

More information

Knitr. Introduction to R for Public Health Researchers

Knitr. Introduction to R for Public Health Researchers Knitr Introduction to R for Public Health Researchers Introduction Exploratory Analysis Plots of bike length Multiple Facets Means by type Linear Models Grabbing coefficients Broom package Testing Nested

More information

Reports, Graphs and Queries 1

Reports, Graphs and Queries 1 Reports, Graphs and Queries A. Reports Reports produce tabular outputs of data contained in the database. Reports may be viewed, saved to file, printed or copied to the clipboard. Reports can also be viewed

More information

Mineração de Dados Aplicada

Mineração de Dados Aplicada Data Exploration August, 9 th 2017 DCC ICEx UFMG Summary of the last session Data mining Data mining is an empiricism; It can be seen as a generalization of querying; It lacks a unified theory; It implies

More information

STA 570 Spring Lecture 5 Tuesday, Feb 1

STA 570 Spring Lecture 5 Tuesday, Feb 1 STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row

More information

Multivariate Data More Overview

Multivariate Data More Overview Multivariate Data More Overview CS 4460 - Information Visualization Jim Foley Last Revision August 2016 Some Key Concepts Quick Review Data Types Data Marks Basic Data Types N-Nominal (categorical) Equal

More information

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file 1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/

More information