Transform Data! The Basics Part I continued!


1 Transform Data! The Basics Part I continued!

2 arrange()

3 arrange() Order rows from smallest to largest values.
    arrange(.data, ...)
  .data: data frame to transform
  ...: one or more columns to order by (additional columns will be used as tie breakers)

4 Common syntax: each function takes a data frame as the first argument and returns a data frame.
    arrange(.data, ...)
  arrange: dplyr function
  .data: data frame to transform
  ...: function-specific arguments

5 arrange() Order rows from smallest to largest values.
    arrange(babynames, n)
  [Table: sample babynames rows (columns year, sex, name, n, prop) for John, William, James, Lance, Charles; ordered by n they become Lance, Charles, James, William, John.]

6 Your Turn 3 Arrange babynames by n. Add prop as a second (tie breaking) variable to arrange on. Can you tell what the smallest value of n is? How does adding prop affect the arrangement?

7 arrange(babynames, n) arrange(babynames, n, prop)
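The effect of a tie-breaking column is easier to see on a few rows than on the full data set. The sketch below uses a hypothetical four-row data frame named toy; its names and numbers are invented for illustration and are not taken from babynames.

```r
library(dplyr)

# Toy stand-in for babynames; every value here is invented for illustration.
toy <- data.frame(
  year = c(2000, 2000, 2000, 2000),
  sex  = c("F", "F", "M", "M"),
  name = c("Ada", "Bea", "Cal", "Dov"),
  n    = c(5, 7, 5, 9),
  prop = c(0.0003, 0.0004, 0.0002, 0.0005)
)

# Order by n only: Ada and Cal tie at n = 5.
by_n <- arrange(toy, n)

# Add prop as a tie breaker: within n = 5, the smaller prop (Cal) comes first.
by_n_prop <- arrange(toy, n, prop)

by_n$n          # 5 5 7 9
by_n_prop$name  # "Cal" "Ada" "Bea" "Dov"
```

The second column only matters for rows that tie on the first; rows with distinct n keep the same relative order in both results.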

8 Helper function desc() Change ordering to go from largest to smallest.
    arrange(babynames, desc(n))
  [Table: the same sample rows (John, William, James, Lance, Charles); ordered by desc(n) they become John, William, James, Charles, Lance.]

9 Your Turn 4 Use desc() to find the names with the highest prop. Then, use desc() to find the names with the highest n.

10 arrange(babynames, desc(prop)) arrange(babynames, desc(n))
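As a small illustration of what desc() does inside arrange(), the sketch below sorts a hypothetical three-row data frame (invented values) from largest to smallest n. For a numeric vector, desc() simply negates the values, so an ascending sort of the result is a descending sort of the original.

```r
library(dplyr)

# Invented values, for illustration only.
toy <- data.frame(
  name = c("Ada", "Bea", "Cal"),
  n    = c(5, 9, 7)
)

# Largest n first.
largest_first <- arrange(toy, desc(n))
largest_first$name  # "Bea" "Cal" "Ada"

# On a numeric vector, desc() returns the negated values.
negated <- desc(c(5, 9, 7))
negated  # -5 -9 -7
```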

11 mutate()

12 mutate() Create new columns.
    mutate(.data, ...)
  .data: data frame to transform
  ...: one or more new columns to create

13 mutate() Create new columns.
    mutate(babynames, percent = round(prop * 100, 2))
  [Table: the sample rows (John, William, James, Lance, Charles) before and after; the result has a new percent column appended.]

14 mutate() Create new columns.
    mutate(babynames, percent = round(prop * 100, 2), nper = round(percent))
  [Table: the sample rows before and after; the result has new percent and nper columns appended.]
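The second mutate() example relies on the fact that a column created earlier in the same call (percent) can be used by a later one (nper). A minimal sketch on a hypothetical two-row data frame with invented prop values:

```r
library(dplyr)

# Invented values, for illustration only.
toy <- data.frame(
  name = c("Ada", "Bea"),
  prop = c(0.01234, 0.0456)
)

# percent is created first, then nper is computed from it in the same call.
out <- mutate(toy,
  percent = round(prop * 100, 2),
  nper    = round(percent)
)

out$percent  # 1.23 4.56
out$nper     # 1 5
```

mutate() returns a new data frame; toy itself is unchanged unless the result is assigned back to it.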


16 Vectorized function min_rank(): a popular ranking function (ties share the lowest rank).
    min_rank(c(50, 100, 100, 1000))        # [1] 1 2 2 4
    min_rank(desc(c(50, 100, 100, 1000)))  # [1] 4 2 2 1
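To make "ties share the lowest rank" concrete, the sketch below compares min_rank() with base R's rank() using ties.method = "min", which follows the same tie rule. Note the gap after a tie: two values share rank 2, so the next rank is 4, not 3.

```r
library(dplyr)

x <- c(50, 100, 100, 1000)

ascending  <- min_rank(x)        # 1 2 2 4
descending <- min_rank(desc(x))  # 4 2 2 1

# Base R equivalent of the ascending ranking.
base_equiv <- rank(x, ties.method = "min")  # 1 2 2 4
```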

17 Your Turn 5 Use min_rank() and mutate() to rank each row in babynames from largest prop to lowest prop

18 mutate(babynames, rank = min_rank(desc(prop)))

19 %>%

20 Multiple steps (composed functions)
    arrange(mutate(filter(babynames, year == 2015, sex == "M"), rank = min_rank(desc(prop))), rank)
  1. Filter babynames to just boys born in 2015
  2. Rank the names by proportion so that higher proportions have lower rank
  3. Arrange the names by rank

21 Multiple steps (intermediate data frames)
    boys_2015 <- filter(babynames, year == 2015, sex == "M")
    boys_2015 <- mutate(boys_2015, rank = min_rank(desc(prop)))
    boys_2015 <- arrange(boys_2015, rank)
    boys_2015


23 The pipe operator %>% passes the result on its left into the first argument of the function on its right. So, these two lines do the same thing. Try it!
    filter(babynames, n == 99680)
    babynames %>% filter(n == 99680)

24 Multiple steps (pipe operator)
    babynames %>%
      filter(year == 2015, sex == "M") %>%
      mutate(rank = min_rank(desc(prop))) %>%
      arrange(rank)
  1. Allows us to eliminate redundant code (assigning to the same data frame over and over) and/or unwanted intermediate data frames
  2. Allows us to write code in the same way we think about the problem

25 Shortcut to type %>% in RStudio: Cmd + Shift + M (Mac) or Ctrl + Shift + M (Windows)
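One way to check that the composed and piped forms really are interchangeable is to run both on the same input and compare the results. The sketch below does that on a hypothetical four-row data frame named toy (invented values standing in for babynames).

```r
library(dplyr)

# Invented values, for illustration only.
toy <- data.frame(
  year = c(2015, 2015, 2015, 2014),
  sex  = c("M", "M", "F", "M"),
  name = c("Ari", "Ben", "Cy", "Dan"),
  prop = c(0.002, 0.005, 0.004, 0.009)
)

# Composed functions: read from the inside out.
nested <- arrange(
  mutate(
    filter(toy, year == 2015, sex == "M"),
    rank = min_rank(desc(prop))
  ),
  rank
)

# Pipe operator: read from top to bottom.
piped <- toy %>%
  filter(year == 2015, sex == "M") %>%
  mutate(rank = min_rank(desc(prop))) %>%
  arrange(rank)

same <- isTRUE(all.equal(nested, piped))
piped$name  # "Ben" "Ari"
```

Only the two 2015 boys survive the filter, and the one with the larger prop gets rank 1.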

26 Your Turn 6 Use %>% to write a sequence of functions that:
  1. Filter babynames to just the girls born in 1977
  2. Mutate to make a percent column rounded to a whole number
  3. Arrange the results so that the most popular names, based on the percent column, appear first.

27 babynames %>% filter(year == 1977, sex == "F") %>% mutate(percent = round(prop * 100)) %>% arrange(desc(percent))

28 Your Turn 7 Write code to do the following: 1. Trim babynames to just the rows that contain your name and your sex 2. Plot the results as a line graph with year on the x-axis and prop on the y-axis

29 babynames %>% filter(name == "Lance", sex == "M") %>% ggplot() + geom_line(aes(year, prop))

30 What are the most popular names?

31 How should we define popularity? A name is popular if:
  1. Sums: a large number of children have the name when you sum across years
  2. Ranks: it consistently ranks among the top names from year to year

32 Question Do we have the right tools to: 1. Calculate the total number of children with each name? 2. Rank names within each year?

33 Deriving information
  mutate(): create new variables
  summarise(): summarise variables
  group_by(): group cases

34 summarise()

35 summarise() Compute table of summaries.
    babynames %>% summarise(total = sum(n), max = max(n))
  [Table: the sample rows (John, William, James, Lance, Charles) collapse to a single row with columns total and max.]

36 Your Turn 8 Use summarise() to compute three statistics about the data: 1. The first (minimum) year in the data set 2. The last (maximum) year in the data set 3. The total number of children represented in the data set

37 babynames %>% summarise(first = min(year), last = max(year), total = sum(n))
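The key property of summarise() is that it collapses all rows into a single row, with one column per summary. A minimal sketch on a hypothetical three-row data frame (invented values):

```r
library(dplyr)

# Invented values, for illustration only.
toy <- data.frame(
  year = c(1990, 1991, 1992),
  name = c("Ada", "Ada", "Ada"),
  n    = c(10, 20, 30)
)

stats <- toy %>%
  summarise(first = min(year), last = max(year), total = sum(n))

nrow(stats)   # 1
stats$first   # 1990
stats$last    # 1992
stats$total   # 60
```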

38 Your Turn 9 Extract the rows where name == "Khaleesi". Then use summarise() and summary functions to find:
  1. The first year Khaleesi appeared in the data
  2. The total number of children named Khaleesi

39 babynames %>% filter(name == "Khaleesi") %>% summarise(first = min(year), total = sum(n))

41 n() The number of rows in a data set.
    babynames %>% summarise(n = n())
  [Table: six sample rows (M: John, William, James, Lance, Charles; F: John) collapse to a single row with n = 6.]

42 n_distinct() The number of distinct values in a variable.
    babynames %>% summarise(n = n(), nname = n_distinct(name))
  [Table: the same six sample rows collapse to a single row with n = 6 and nname = 5.]
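The difference between n() and n_distinct() shows up as soon as a value repeats. The sketch below reuses the six names from the slide's sample rows, where John appears once as a boy's name and once as a girl's name; only the sex and name columns are reproduced, since they are all the example needs.

```r
library(dplyr)

# Six rows, five distinct names (John appears for both sexes).
toy <- data.frame(
  sex  = c("M", "M", "M", "M", "M", "F"),
  name = c("John", "William", "James", "Lance", "Charles", "John")
)

counts <- toy %>%
  summarise(n = n(), nname = n_distinct(name))

counts$n      # 6
counts$nname  # 5
```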

43 group_by()

44 group_by() Groups cases by common values of one or more columns.
    babynames %>% group_by(sex)

45 group_by()
    babynames %>% group_by(sex) %>% summarise(total = sum(n))
  [Table: sample rows for Anne, John, Mary (F) and John, Mary, Lance (M) collapse to one row per sex, with columns sex and total.]

46 group_by()
    babynames %>% group_by(year, sex) %>% summarise(total = sum(n))
  [Table: the same sample rows collapse to one row per year and sex combination, with columns year, sex and total.]
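Grouping changes summarise() from "one row for the whole table" to "one row per group". The sketch below shows both groupings on a hypothetical six-row data frame with invented values. The .groups = "drop" argument is an addition that is not on the slides: in dplyr 1.0.0 and later it returns an ungrouped result and avoids the informational message that summarise() otherwise prints when there are several grouping columns.

```r
library(dplyr)

# Toy stand-in for babynames; all values are invented for illustration.
toy <- data.frame(
  year = c(1990, 1990, 1990, 1991, 1991, 1991),
  sex  = c("F", "F", "M", "F", "M", "M"),
  name = c("Ada", "Bea", "Cal", "Ada", "Cal", "Dov"),
  n    = c(10, 20, 30, 40, 50, 60)
)

# One row per sex.
by_sex <- toy %>%
  group_by(sex) %>%
  summarise(total = sum(n))

# One row per (year, sex) combination; .groups = "drop" ungroups the result.
by_year_sex <- toy %>%
  group_by(year, sex) %>%
  summarise(total = sum(n), .groups = "drop")

by_sex$total       # 70 140  (F, M)
by_year_sex$total  # 30 30 40 110
```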

47 Your Turn 10 Use group_by(), summarise(), and arrange() to display the ten most popular names. Compute popularity as the total number of children of a single gender given a name.

48 babynames %>% group_by(name, sex) %>% summarise(total = sum(n)) %>% arrange(desc(total))

50 babynames %>%
     group_by(name, sex) %>%
     summarise(total = sum(n)) %>%
     arrange(desc(total)) %>%
     ungroup() %>%
     slice(1:10) %>%
     ggplot() +
       geom_col(aes(fct_reorder(name, desc(total)), total / 1000000, fill = sex)) +
       theme_bw() +
       scale_fill_brewer() +
       labs(x = "name", y = "total (in millions)")
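The second question raised earlier, ranking names within each year, can be approached by combining group_by() with mutate() instead of summarise(). The sketch below is one possible way to do it, not necessarily the one the deck goes on to use; the data frame toy and its values are invented for illustration.

```r
library(dplyr)

# Invented values, for illustration only.
toy <- data.frame(
  year = c(1990, 1990, 1990, 1991, 1991, 1991),
  name = c("Ada", "Bea", "Cal", "Ada", "Bea", "Cal"),
  prop = c(0.03, 0.01, 0.02, 0.01, 0.04, 0.02)
)

# With a grouped data frame, min_rank() restarts at 1 inside each year.
ranked <- toy %>%
  group_by(year) %>%
  mutate(rank = min_rank(desc(prop))) %>%
  ungroup()

ranked$rank  # 1 3 2 3 1 2
```

Unlike summarise(), a grouped mutate() keeps every row; it only changes the scope over which the ranking is computed.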


More information

COMP 250 Fall heaps 2 Nov. 3, 2017

COMP 250 Fall heaps 2 Nov. 3, 2017 At the end of last lecture, I showed how to represent a heap using an array. The idea is that an array representation defines a simple relationship between a tree node s index and its children s index.

More information

Writing Functions! Part I!

Writing Functions! Part I! Writing Functions! Part I! In your mat219_class project 1. Create a new R script or R notebook called wri7ng_func7ons 2. Include this code in your script or notebook: library(tidyverse) library(gapminder)

More information

Middle Years Data Analysis Display Methods

Middle Years Data Analysis Display Methods Middle Years Data Analysis Display Methods Double Bar Graph A double bar graph is an extension of a single bar graph. Any bar graph involves categories and counts of the number of people or things (frequency)

More information

B. Graphing Representation of Data

B. Graphing Representation of Data B Graphing Representation of Data The second way of displaying data is by use of graphs Although such visual aids are even easier to read than tables, they often do not give the same detail It is essential

More information

Pandas III: Grouping and Presenting Data

Pandas III: Grouping and Presenting Data Lab 8 Pandas III: Grouping and Presenting Data Lab Objective: Learn about Pivot tables, groupby, etc. Introduction Pandas originated as a wrapper for numpy that was developed for purposes of data analysis.

More information

SML 201 Week 3 John D. Storey Spring 2016

SML 201 Week 3 John D. Storey Spring 2016 SML 201 Week 3 John D. Storey Spring 2016 Contents Functions 4 Rationale................................. 4 Defining a New Function......................... 4 Example 1.................................

More information

HW3 Solutions. Answer: Let X be the random variable which denotes the number of packets lost.

HW3 Solutions. Answer: Let X be the random variable which denotes the number of packets lost. HW3 Solutions 1. (20 pts.) Packets Over the Internet n packets are sent over the Internet (n even). Consider the following probability models for the process: (a) Each packet is routed over a different

More information

Formulas and Functions

Formulas and Functions Conventions used in this document: Keyboard keys that must be pressed will be shown as Enter or Ctrl. Controls to be activated with the mouse will be shown as Start button > Settings > System > About.

More information

CHAPTER 2: SAMPLING AND DATA

CHAPTER 2: SAMPLING AND DATA CHAPTER 2: SAMPLING AND DATA This presentation is based on material and graphs from Open Stax and is copyrighted by Open Stax and Georgia Highlands College. OUTLINE 2.1 Stem-and-Leaf Graphs (Stemplots),

More information

Exploratory Data Analysis

Exploratory Data Analysis Chapter 10 Exploratory Data Analysis Definition of Exploratory Data Analysis (page 410) Definition 12.1. Exploratory data analysis (EDA) is a subfield of applied statistics that is concerned with the investigation

More information

TABLE OF CONTENTS. i Excel 2016 Basic

TABLE OF CONTENTS. i Excel 2016 Basic i TABLE OF CONTENTS TABLE OF CONTENTS I PREFACE VII 1 INTRODUCING EXCEL 1 1.1 Starting Excel 1 Starting Excel using the Start button in Windows 1 1.2 Screen components 2 Tooltips 3 Title bar 4 Window buttons

More information

Distributions of Continuous Data

Distributions of Continuous Data C H A P T ER Distributions of Continuous Data New cars and trucks sold in the United States average about 28 highway miles per gallon (mpg) in 2010, up from about 24 mpg in 2004. Some of the improvement

More information

Review Guide for Term Paper (Teza)

Review Guide for Term Paper (Teza) Review Guide for Term Paper (Teza) We will soon have a term paper over the material covered in Chapters 1, 2, 8, 9, 15, 22, 23, 16 and 3. In Chapter 1 we covered: Place value, multiplying and dividing

More information

Graphing Bivariate Relationships

Graphing Bivariate Relationships Graphing Bivariate Relationships Overview To fully explore the relationship between two variables both summary statistics and visualizations are important. For this assignment you will describe the relationship

More information

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation Objectives: 1. Learn the meaning of descriptive versus inferential statistics 2. Identify bar graphs,

More information

Vocabulary: Data Distributions

Vocabulary: Data Distributions Vocabulary: Data Distributions Concept Two Types of Data. I. Categorical data: is data that has been collected and recorded about some non-numerical attribute. For example: color is an attribute or variable

More information

Ten Great Reasons to Learn SAS Software's SQL Procedure

Ten Great Reasons to Learn SAS Software's SQL Procedure Ten Great Reasons to Learn SAS Software's SQL Procedure Kirk Paul Lafler, Software Intelligence Corporation ABSTRACT The SQL Procedure has so many great features for both end-users and programmers. It's

More information

Gateway Regional School District VERTICAL ALIGNMENT OF MATHEMATICS STANDARDS Grades 3-6

Gateway Regional School District VERTICAL ALIGNMENT OF MATHEMATICS STANDARDS Grades 3-6 NUMBER SENSE & OPERATIONS 3.N.1 Exhibit an understanding of the values of the digits in the base ten number system by reading, modeling, writing, comparing, and ordering whole numbers through 9,999. Our

More information

How many toothpicks are needed for her second pattern? How many toothpicks are needed for her third pattern?

How many toothpicks are needed for her second pattern? How many toothpicks are needed for her third pattern? Problem of the Month Tri - Triangles Level A: Lisa is making triangle patterns out of toothpicks all the same length. A triangle is made from three toothpicks. Her first pattern is a single triangle. Her

More information

Multiple-Subscripted Arrays

Multiple-Subscripted Arrays Arrays in C can have multiple subscripts. A common use of multiple-subscripted arrays (also called multidimensional arrays) is to represent tables of values consisting of information arranged in rows and

More information

STAT 20060: Statistics for Engineers. Statistical Programming with R

STAT 20060: Statistics for Engineers. Statistical Programming with R STAT 20060: Statistics for Engineers Statistical Programming with R Why R? Because it s free to download for everyone! Most statistical software is very, very expensive, so this is a big advantage. Statisticians

More information

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester

Summarising Data. Mark Lunt 09/10/2018. Arthritis Research UK Epidemiology Unit University of Manchester Summarising Data Mark Lunt Arthritis Research UK Epidemiology Unit University of Manchester 09/10/2018 Summarising Data Today we will consider Different types of data Appropriate ways to summarise these

More information

Spreadsheet Applications Test

Spreadsheet Applications Test Spreadsheet Applications Test 1. The expression returns the maximum value in the range A1:A100 and then divides the value by 100. a. =MAX(A1:A100/100) b. =MAXIMUM(A1:A100)/100 c. =MAX(A1:A100)/100 d. =MAX(100)/(A1:A100)

More information