Session 26 TS, Predictive Analytics: Moving Out of Square One. Moderator: Jean-Marc Fix, FSA, MAAA

1 Session 26 TS, Predictive Analytics: Moving Out of Square One
Moderator: Jean-Marc Fix, FSA, MAAA
Presenters: Jean-Marc Fix, FSA, MAAA; Jeffery Robert Huddleston, ASA, CERA, MAAA

2 Predictive Modeling: Getting out of …
Life and Annuity Symposium, Nashville, May 2016
Jean-Marc Fix, FSA, MAAA, Vice President, R&D, Optimum Re Insurance
Jeff Huddleston, ASA, MAAA, CERA, Senior Consultant, Deloitte Consulting LLP

3 On to Square 2: today's goal; things to have before you start; things to know before you start; starting; the view from Square 2.

4 Today's Goal: You have heard a lot about predictive modeling. Time to get your feet wet. Moving to Square 2.

5 Things to Have: R (free!); a basic understanding of key concepts; data; a question (for this lesson we reverse the logical order!); patience; willingness to ask questions.

6 Things to Know: basic R concepts; basic linear modeling concepts; basic statistical concepts.

7 Starting: Review the relevant ASOPs (Actuarial Standards of Practice). Document your work. Use a script; comment lines start with #. Keep the script clean.

8 What is R? For oldies: similarities to APL. Don't think of it as a programming language to start; think of it as a collection of functions drawn from useful packages. Easy to dabble in. Lots of online resources (e.g., Coursera).

9 Useful Packages and Functions: Get R. Finding functions: word of the net, RSeek, Quick-R, R-bloggers. Loading the package that has the functions you want: install.packages("packireallywant") once, then library(packireallywant) in each session.
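A minimal sketch of the install-once, load-every-session pattern. The helper name `load_or_install` is hypothetical, and the CRAN mirror URL is an assumption (any CRAN mirror works).

```r
# Hypothetical helper: install a package only if it is missing, then attach it.
load_or_install <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # One-time download from CRAN (mirror URL is an assumption)
    install.packages(pkg, repos = repos)
  }
  # Attach for this session; character.only lets us pass the name as a string
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

# "splines" ships with base R, so this example call never downloads anything;
# swap in e.g. "dplyr" or "tidyr" for the packages you actually want.
load_or_install("splines")
```

Installing is a one-time cost per machine, while `library()` has to be repeated in every new session, which is why the two steps are separated.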

10 R Concepts: data frames; categorical variables (factors); tidy data.
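A small sketch of a data frame with a categorical variable stored as a factor. The policy data is made up for illustration.

```r
# Hypothetical data frame: one row per policy, one column per variable
policies <- data.frame(
  policy_id = c("P1", "P2", "P3", "P4"),
  issue_age = c(35, 52, 41, 29),
  smoker    = c("N", "Y", "N", "N")
)

# Store the categorical variable as a factor (levels sorted alphabetically by default)
policies$smoker <- as.factor(policies$smoker)

str(policies)             # shows the type of each column
levels(policies$smoker)   # the categories: "N" "Y"
table(policies$smoker)    # count of policies in each category
```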

11 Concept of Tidy Data: Hadley Wickham; tidyr package. Tidy data: each row is an observation, each column is a variable.
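A sketch of turning a non-tidy "wide" table (one column per year) into tidy form (one row per policy-year). It uses base R's `reshape()` so it runs without extra packages; with tidyr installed, `tidyr::pivot_longer()` does the same job. The column names are hypothetical.

```r
# Hypothetical wide layout: the year is hidden in the column names, so not tidy
wide <- data.frame(
  policy_id   = c("P1", "P2"),
  lapses_2014 = c(0, 1),
  lapses_2015 = c(1, 0)
)

# Reshape to long/tidy form: one row per observation (policy x year)
long <- reshape(
  wide,
  direction = "long",
  varying   = c("lapses_2014", "lapses_2015"),  # columns to stack
  v.names   = "lapses",                         # name of the stacked value column
  timevar   = "year",                           # new column holding the year
  times     = c(2014, 2015),
  idvar     = "policy_id"
)
rownames(long) <- NULL
long
```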

12 Data Wrangling: Get the data. Do basic cleaning in Excel: the first row holds the headers (variable names; avoid blank spaces in names) and the first column holds the ID. dplyr package (also by Wickham).

13 Clean Data: Does the data look as expected? Remove quotation marks. Use a consistent date format. Clean trailing spaces. Fill blank values or NA values with NA. Save as a CSV file. Check the CSV file in Notepad.
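The slides do this cleaning in Excel; as an alternative (not from the slides), the same checks can be scripted in R so they are documented with the rest of the analysis. A sketch, with a made-up raw extract:

```r
# Hypothetical raw extract: stray spaces, blanks, "NA" typed as text,
# numbers stored as text
raw <- data.frame(
  policy_id  = c("P1 ", "P2", " P3"),
  issue_date = c("2015-01-31", "", "2014-07-01"),
  premium    = c("1200", "NA", " 950"),
  stringsAsFactors = FALSE
)

clean <- raw
clean[] <- lapply(clean, trimws)           # drop leading/trailing spaces
clean[clean == "" | clean == "NA"] <- NA   # blank or "NA" text -> real NA

# One consistent date format, then proper column types
clean$issue_date <- as.Date(clean$issue_date, format = "%Y-%m-%d")
clean$premium    <- as.numeric(clean$premium)

str(clean)              # does the data look as expected?
colSums(is.na(clean))   # missing values per column

# Save as CSV (a temporary path stands in for your own file name)
out_path <- tempfile(fileext = ".csv")
write.csv(clean, out_path, row.names = FALSE)
```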

14 Load Data: Open R. Load the libraries you will need with library(libraryname). Set the working directory to where your working file is: getwd() shows it, setwd() changes it. Use the read.csv() or read.csv2() functions. You can also read directly from Excel.
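A sketch of loading a CSV. A small temporary file stands in for your own data; the column names are hypothetical, and one header deliberately contains a space to show what R does with it.

```r
# Stand-in for your own file: write a small CSV to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "policy_id,issue age,gender,lapsed",
  "P1,35,F,0",
  "P2,52,M,1",
  "P3,,F,0"
), csv_path)

getwd()   # relative file names are resolved against this directory

# na.strings turns empty cells into NA; stringsAsFactors makes text columns factors
policies <- read.csv(csv_path, na.strings = c("", "NA"), stringsAsFactors = TRUE)

names(policies)   # "issue age" becomes "issue.age" (check.names = TRUE by default)
str(policies)
```

read.csv() expects commas as separators; read.csv2() is the variant for files that use semicolons as separators and commas as decimal marks.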

15 Useful Basic R Commands: c(v1, v2, v3) concatenates; x:y and seq(min, max, by = 5) generate sequences; ?fn opens the help page for fn and ??fn searches all help pages; x <- 5 assigns.

16 Useful Basic R Commands: ls() lists the objects in the workspace; rm(object) removes an object; rm(list = ls()) empties the workspace; cbind(v1, v2, v3) binds vectors as columns; rbind(v1, v2, v3) binds vectors as rows; rep(x, times) repeats x.

17 Useful Basic R Commands: unique(); runif(num, min, max); rnorm(num, mean, sd); as.Date(as.character(textdate), format); as.factor(data$var1); as.data.frame(matrix that looks like a data frame).
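The commands from the last three slides, tried out on small made-up values. The date string and its format pattern are illustrative assumptions.

```r
x <- 5                           # assignment
v <- c(2, 4, 4, 8)               # concatenate values into a vector
s1 <- 1:5                        # integer sequence 1 to 5
s2 <- seq(0, 20, by = 5)         # 0, 5, 10, 15, 20
r <- rep("A", times = 3)         # "A" "A" "A"

m_col <- cbind(s1, s1 * 2)       # 5 x 2 matrix: vectors as columns
m_row <- rbind(s1, s1 * 2)       # 2 x 5 matrix: vectors as rows

unique(v)                        # 2 4 8
set.seed(1)                      # make the random draws reproducible
u <- runif(3, min = 0, max = 1)  # 3 uniform draws
z <- rnorm(3, mean = 0, sd = 1)  # 3 standard normal draws

# Text to Date: the format pattern must match the text (here month/day/year)
d <- as.Date(as.character("03/15/2016"), format = "%m/%d/%Y")

f <- as.factor(c("M", "F", "F")) # categorical variable with levels "F", "M"
df <- as.data.frame(m_col)       # matrix -> data frame

ls()                             # objects now in the workspace
rm(x)                            # remove a single object
```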

18 Scripts ~ Programs: Run from scripts (File > New script). Select what you want to run and press Ctrl-R.

19 Basic Useful Packages: See the script. Ask around. See the R-bloggers community. For development, look into RStudio.

20 Get Data: a <- read.csv("filename.csv")

21 Explore Data: class(), names(), head(), tail(). dataset$varname selects a variable by name, dataset[, varnumber] selects a column by number, and dataset[obsnumber, ] selects a row.

22 Explore Data: str(), summary(), dim(), length(). For factors: unique(), levels(). Conditions: == (equal), ! (not).
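The exploration commands from the last two slides, run on R's built-in iris data set (the same data used for splitting later on):

```r
data(iris)                      # built-in example data: 150 flowers, 5 variables

class(iris)                     # "data.frame"
dim(iris)                       # 150 rows, 5 columns
names(iris)                     # variable names
head(iris, 3); tail(iris, 3)    # first and last rows
str(iris)                       # type of each column
summary(iris)                   # quartiles and means; counts for factors

iris$Sepal.Length[1:5]          # a variable by name
iris[, 5][1:5]                  # a column by number (Species)
iris[10, ]                      # one observation by row number

levels(iris$Species)            # "setosa" "versicolor" "virginica"
length(unique(iris$Species))    # 3

# Conditions: == is "equal", ! is "not"
not_setosa <- iris[!(iris$Species == "setosa"), ]
nrow(not_setosa)                # 100
```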

23 Data Craft: histograms; correlations. See Regression Diagnostics by John Fox.

24 Why Explore? Anscombe's Quartet
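Anscombe's quartet ships with R as the `anscombe` data frame. This sketch shows the point of the slide: the four data sets share almost the same summary statistics and fitted line, yet their scatter plots look nothing alike.

```r
data(anscombe)   # built-in: columns x1..x4 and y1..y4

quartet_stats <- sapply(1:4, function(i) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  c(mean_x = mean(x), mean_y = mean(y), cor_xy = cor(x, y),
    slope  = unname(coef(lm(y ~ x))[2]))
})
round(quartet_stats, 2)   # the four columns are nearly identical

# ...but the pictures are completely different
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  plot(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
       xlab = paste0("x", i), ylab = paste0("y", i), main = paste("Set", i))
}
par(op)
```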

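25 Basic Graphic Exploration: Histograms and charts: hist(), qplot(), boxplot(), ggplot(). Correlations and summaries: summarize(), aggregate(), cor(), pairs.panels(). (qplot() and ggplot() come from ggplot2, summarize() from dplyr, and pairs.panels() from psych.)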
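A base-R sketch of the same exploration on the built-in iris data, so it runs without installing ggplot2, dplyr, or psych. `pairs()` is used as the base-R stand-in for `pairs.panels()`.

```r
data(iris)

hist(iris$Petal.Length, main = "Petal length", xlab = "cm")  # one distribution
boxplot(Petal.Length ~ Species, data = iris)                  # distribution by group

# Group summary: mean petal length per species
by_species <- aggregate(Petal.Length ~ Species, data = iris, FUN = mean)
by_species

# Correlations among the numeric columns, as a matrix and as a picture
round(cor(iris[, 1:4]), 2)
pairs(iris[, 1:4])   # base-R scatterplot matrix
```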

26 Split Data in Two: a train set and a test set. Sizes vary from 50% train/50% test to 80/20. You want a decent size in the test set. The split can be purely random, random by groups, or by time.

27 Easy Splitting in R: iris[iris$Species %in% c("versicolor", "virginica"), ] and iris[iris[, 5] == "versicolor", ] select rows; ! is "not" and == is "equal". To split, use the dplyr library: look at the sample_n() and sample_frac() functions in dplyr.
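A base-R sketch of a purely random split and a split that is random by groups. The 70/30 proportion and the seed are assumptions; in recent dplyr versions, sample_n() and sample_frac() are superseded by slice_sample(), which does the same job.

```r
data(iris)
set.seed(2016)                   # reproducible split

n <- nrow(iris)

# Purely random: 70% of rows to train, the rest to test
train_rows <- sample(seq_len(n), size = round(0.7 * n))
train <- iris[train_rows, ]
test  <- iris[-train_rows, ]
c(train = nrow(train), test = nrow(test))   # 105 and 45

# Random by groups: sample 70% within each species so both sets keep the mix
rows_by_species <- split(seq_len(n), iris$Species)
train_strat <- unlist(lapply(rows_by_species, function(r) {
  r[sample.int(length(r), size = round(0.7 * length(r)))]
}))
table(iris$Species[train_strat])             # 35 of each species
test_strat <- iris[-train_strat, ]
```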

28 Why the split? Overfitting is the bane of the modeler!

29 Data

30 Model 1

31 Model 2

32 What's Next?

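33 Components of GLM: $E(Y) = g^{-1}(\beta x)$, or equivalently $g(E(Y)) = \beta x$. The components are a random component (the dependent variable $Y$ is generated by an exponential-family distribution, which supplies the random error), a link function $g$, and a linear predictor $\beta x$ built from the independent variables.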

34 Choose Model Parameters: Distribution for lapses: the Poisson distribution is commonly used. Link function: the default link for the Poisson is the log.

35 Modeling Group Data: use an offset to adjust for exposure.
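Why the offset adjusts for exposure, as a short derivation under the Poisson/log-link choice on the previous slide. Here $Y_i$ is the lapse count in group $i$, $\text{exposure}_i$ is its exposure (for example, policy-years), and the offset $\log(\text{exposure}_i)$ enters the linear predictor with its coefficient fixed at 1:

$$
\log E(Y_i) = \log(\text{exposure}_i) + \beta x_i
\;\Longleftrightarrow\;
E(Y_i) = \text{exposure}_i \, e^{\beta x_i}
\;\Longleftrightarrow\;
\frac{E(Y_i)}{\text{exposure}_i} = e^{\beta x_i}
$$

So $\beta$ describes the lapse rate per unit of exposure, and a group with twice the exposure is simply expected to produce twice as many lapses.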

36 Setting up the Model: glm(y ~ offset(log(exposure)) + var1 + var2, family = poisson(), data = yourdataframe). The log link is implied when family is poisson().
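A sketch of the slide's model on simulated data, so the true coefficients are known and you can check that glm() recovers them. The column names follow the slide; the data, the true coefficients (-2, 0.3, 0.5), and the exposure range are assumptions.

```r
set.seed(42)
n <- 5000

# Simulated grouped data: exposure (e.g. policy-years) and two explanatory variables
sim <- data.frame(
  exposure = runif(n, min = 1, max = 10),
  var1     = rnorm(n),
  var2     = factor(sample(c("A", "B"), n, replace = TRUE))
)

# True lapse rate per unit of exposure, then Poisson lapse counts
true_rate <- exp(-2 + 0.3 * sim$var1 + 0.5 * (sim$var2 == "B"))
sim$y <- rpois(n, lambda = sim$exposure * true_rate)

# Poisson GLM; the log link is the default, log(exposure) is the offset
fit <- glm(y ~ offset(log(exposure)) + var1 + var2,
           family = poisson(), data = sim)
summary(fit)
coef(fit)   # should be close to -2, 0.3 and 0.5
```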

37 Run First Iteration on Train: Look at the variables. Pick the one with the most lift. Run the model.

38 Run First Iteration on Train: Add a new variable. Evaluate the AIC (lower is better): is the added complexity worth it? Repeat. Variables can be interactions between variables, lagged variables, or powers of variables.
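A sketch of adding one term at a time and comparing AIC, using R's built-in `warpbreaks` count data as a stand-in for lapse data. The order of terms is an assumption for illustration; `add1()` scores all candidate terms in one call.

```r
data(warpbreaks)   # built-in: counts of yarn breaks by wool type and tension

m0 <- glm(breaks ~ 1, family = poisson(), data = warpbreaks)  # intercept only
m1 <- update(m0, . ~ . + tension)                             # add one variable
m2 <- update(m1, . ~ . + wool)                                # add another
m3 <- update(m2, . ~ . + wool:tension)                        # add an interaction

# Lower AIC is better; each step must earn its extra parameters
AIC(m0, m1, m2, m3)

# Score every candidate term against the current model at once
add1(m0, scope = ~ wool + tension, test = "Chisq")
```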

39 You Think Your Model Is Good? Look at the residuals. Any patterns?
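A sketch of a residual check for a Poisson GLM, again on the built-in `warpbreaks` data as a stand-in. `plot(fit)` also produces R's standard set of diagnostic plots.

```r
data(warpbreaks)
fit <- glm(breaks ~ wool + tension, family = poisson(), data = warpbreaks)

res <- residuals(fit, type = "deviance")   # "pearson" and "response" also exist
fitted_vals <- fitted(fit)

# Residuals against fitted values: look for trends, funnels, or clusters
plot(fitted_vals, res, xlab = "Fitted breaks", ylab = "Deviance residual")
abline(h = 0, lty = 2)

# Quick Poisson check: residual deviance / residual df should be near 1
dispersion_ratio <- deviance(fit) / df.residual(fit)
dispersion_ratio
```

A ratio well above 1 signals overdispersion; common responses are family = quasipoisson() or a negative binomial model (for example MASS::glm.nb()).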

40 The Proof Is in the (New) Pudding: Now run your model on the test set. Does it still look good?
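A sketch of scoring the held-out test set and comparing actual to expected counts (an A/E ratio), on simulated data. The simulated data, the 30% holdout, and the var1 bands are assumptions for illustration.

```r
set.seed(7)
n <- 4000
sim <- data.frame(exposure = runif(n, 1, 10), var1 = rnorm(n))
sim$y <- rpois(n, lambda = sim$exposure * exp(-2 + 0.3 * sim$var1))

# Hold out 30% of the rows as the test set
test_rows <- sample(seq_len(n), size = round(0.3 * n))
train <- sim[-test_rows, ]
test  <- sim[test_rows, ]

fit <- glm(y ~ offset(log(exposure)) + var1, family = poisson(), data = train)

# Expected counts on new data; the offset is taken from test$exposure
test$expected <- predict(fit, newdata = test, type = "response")

# Actual-to-expected ratio: near 1 if the model still fits out of sample
ae_ratio <- sum(test$y) / sum(test$expected)
ae_ratio

# A/E by bands of a variable shows where the fit breaks down
test$band <- cut(test$var1, breaks = c(-Inf, -1, 0, 1, Inf))
aggregate(cbind(y, expected) ~ band, data = test, FUN = sum)
```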

41 The View from Square 2: Pygmalion by Étienne Falconet

42 The End: Look at the final model developed by Richard Xu and his team for the report "Lapse Modeling for the Post-Level Period" on the SOA website. Pay special attention to the appendices. Go forth and multiply model.


More information

Dr. Barbara Morgan Quantitative Methods

Dr. Barbara Morgan Quantitative Methods Dr. Barbara Morgan Quantitative Methods 195.650 Basic Stata This is a brief guide to using the most basic operations in Stata. Stata also has an on-line tutorial. At the initial prompt type tutorial. In

More information

Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC)

Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC) Intro to R Fuzzy Rogers Research Computing Administrator Materials Research Laboratory (MRL) Center for Scientific Computing (CSC) fuz@mrl.ucsb.edu MRL 2066B Sharon Solis Paul Weakliem Research Computing

More information

Recap From Last Time: Today s Learning Goals BIMM 143. Data analysis with R Lecture 4. Barry Grant.

Recap From Last Time: Today s Learning Goals BIMM 143. Data analysis with R Lecture 4. Barry Grant. BIMM 143 Data analysis with R Lecture 4 Barry Grant http://thegrantlab.org/bimm143 Recap From Last Time: Substitution matrices: Where our alignment match and mis-match scores typically come from Comparing

More information

Financial Econometrics Practical

Financial Econometrics Practical Financial Econometrics Practical Practical 3: Plotting in R NF Katzke Table of Contents 1 Introduction 1 1.0.1 Install ggplot2................................................. 2 1.1 Get data Tidy.....................................................

More information

Learning SAS. Hadley Wickham

Learning SAS. Hadley Wickham Learning SAS Hadley Wickham Outline Intro & data manipulation basics Fitting models x2 Writing macros No graphics (see http://support.sas.com/ techsup/sample/sample_graph.html for why) Today s outline

More information

Introduction to Graphics with ggplot2

Introduction to Graphics with ggplot2 Introduction to Graphics with ggplot2 Reaction 2017 Flavio Santi Sept. 6, 2017 Flavio Santi Introduction to Graphics with ggplot2 Sept. 6, 2017 1 / 28 Graphics with ggplot2 ggplot2 [... ] allows you to

More information

R Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R

R Workshop Guide. 1 Some Programming Basics. 1.1 Writing and executing code in R R Workshop Guide This guide reviews the examples we will cover in today s workshop. It should be a helpful introduction to R, but for more details, you can access a more extensive user guide for R on the

More information

AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS

AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS AN INTRODUCTION TO R FOR MANAGEMENT SCHOLARS 24 January 2017 Stefan Breet breet@rsm.nl www.stefanbreet.com TODAY What is R? How to use R? The Basics How to use R? The Data Analysis Process WHAT IS R? AN

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

SQL Bits - the great data heist Manchester An R primer for SQL folks. Thomas Hütter

SQL Bits - the great data heist Manchester An R primer for SQL folks. Thomas Hütter SQL Bits - the great data heist Manchester 2019 An R primer for SQL folks Thomas Hütter An R primer for SQL folks Thomas Hütter, Diplom-Betriebswirt Application developer, consultant, accidental DBA, author

More information

Author: Leonore Findsen, Qi Wang, Sarah H. Sellke, Jeremy Troisi

Author: Leonore Findsen, Qi Wang, Sarah H. Sellke, Jeremy Troisi 0. Downloading Data from the Book Website 1. Go to http://bcs.whfreeman.com/ips8e 2. Click on Data Sets 3. Click on Data Sets: PC Text 4. Click on Click here to download. 5. Right Click PC Text and choose

More information

Intro to Stata for Political Scientists

Intro to Stata for Political Scientists Intro to Stata for Political Scientists Andrew S. Rosenberg Junior PRISM Fellow Department of Political Science Workshop Description This is an Introduction to Stata I will assume little/no prior knowledge

More information

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here:

Lab 1. Introduction to R & SAS. R is free, open-source software. Get it here: Lab 1. Introduction to R & SAS R is free, open-source software. Get it here: http://tinyurl.com/yfet8mj for your own computer. 1.1. Using R like a calculator Open R and type these commands into the R Console

More information

Data-informed collection decisions using R or, learning R using collection data

Data-informed collection decisions using R or, learning R using collection data Data-informed collection decisions using R or, learning R using collection data Heidi Tebbe Collections & Research Librarian for Engineering and Data Science NCSU Libraries Collections & Research Librarian

More information

Work through the sheet in any order you like. Skip the starred (*) bits in the first instance, unless you re fairly confident.

Work through the sheet in any order you like. Skip the starred (*) bits in the first instance, unless you re fairly confident. CDT R Review Sheet Work through the sheet in any order you like. Skip the starred (*) bits in the first instance, unless you re fairly confident. 1. Vectors (a) Generate 100 standard normal random variables,

More information