Beer, Farms, and Peas
- Elvin Lucas
Transcription
1 Beer, Farms, and Peas (Sir Francis Galton, Karl Pearson, William Sealy Gosset, Sir Ronald Fisher). Applied Statistics & (master's program) 이상열
2 Contents: 01 Challenge 02 Descriptive statistics 03 R graphics 04 FF / Bigmemory
Descriptive statistics is the discipline of quantitatively describing the main features of a collection of data, or the quantitative description itself. Descriptive statistics are distinguished from inferential statistics (Wikipedia).
1. Central tendency (arithmetic mean, median, mode, geometric mean)
2. Dispersion (sample standard deviation, range, variance, mean difference)
3. Minimum and maximum values
4. Kurtosis, skewness
R graphics: low-level graphics functions, high-level graphics functions (R packages ggplot2, igraph)
FF / Bigmemory: large data processing packages
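The four groups of measures listed above can be sketched in base R. This is a minimal illustration on a made-up vector `x` (not data from the slides); the skewness and kurtosis lines use the plain moment-based definitions, which is only one of several conventions in use.

```r
# Hypothetical sample (an assumption for illustration); any positive numeric vector works.
x <- c(2, 4, 4, 5, 7, 9, 12)
n <- length(x)

# 1. Central tendency
x_mean   <- mean(x)
x_median <- median(x)
x_mode   <- as.numeric(names(which.max(table(x))))  # most frequent value
x_gmean  <- exp(mean(log(x)))                       # geometric mean, requires x > 0

# 2. Dispersion
x_sd    <- sd(x)                                    # sample standard deviation
x_var   <- var(x)                                   # sample variance
x_range <- diff(range(x))                           # max - min
x_md    <- sum(abs(outer(x, x, "-"))) / (n * (n - 1))  # mean absolute difference over pairs i != j

# 3. Minimum and maximum values
x_min <- min(x)
x_max <- max(x)

# 4. Skewness and kurtosis (moment-based definitions)
m2 <- mean((x - x_mean)^2)
x_skew <- mean((x - x_mean)^3) / m2^1.5
x_kurt <- mean((x - x_mean)^4) / m2^2

print(c(mean = x_mean, median = x_median, mode = x_mode, gmean = x_gmean))
print(c(sd = x_sd, var = x_var, range = x_range, mean_diff = x_md))
print(c(min = x_min, max = x_max, skewness = x_skew, kurtosis = x_kurt))
```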
3 Challenge
4 Pareto distribution. Generate 51 random numbers and create a histogram of them: hist(rnorm(51, , )) hist(rpareto(51,1,1))
5 Random numbers: Pareto distribution code
# CDF
ppareto <- function(x, scale, shape) {
  ifelse(x > scale, 1 - (scale / x) ^ shape, 0)
}
# Inverse CDF (quantile function)
qpareto <- function(y, scale, shape) {
  ifelse(y >= 0 & y <= 1, scale * ((1 - y) ^ (-1 / shape)), NaN)
}
# Random draws by inverse transform sampling, restricted to [lower_bound, upper_bound]
rpareto <- function(n, scale, shape, lower_bound = scale, upper_bound = Inf) {
  quantiles <- ppareto(c(lower_bound, upper_bound), scale, shape)
  uniform_random_numbers <- runif(n, quantiles[1], quantiles[2])
  qpareto(uniform_random_numbers, scale, shape)
}
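One way to sanity-check the Pareto functions is sketched below. It restates them compactly (without the truncation bounds, a simplification made here), then checks that the quantile function inverts the CDF and that the sample median of many draws lands near the theoretical median scale * 2^(1/shape). The seed, sample size, and test points are arbitrary choices.

```r
# Compact restatement of the slide's functions (truncation bounds omitted for brevity).
ppareto <- function(x, scale, shape) ifelse(x > scale, 1 - (scale / x)^shape, 0)
qpareto <- function(y, scale, shape) ifelse(y >= 0 & y <= 1, scale * (1 - y)^(-1 / shape), NaN)
rpareto <- function(n, scale, shape) qpareto(runif(n), scale, shape)

set.seed(1)
draws <- rpareto(1e5, scale = 1, shape = 1)

# The quantile function should undo the CDF for x > scale.
test_points <- c(1.5, 2, 10)
round_trip  <- qpareto(ppareto(test_points, 1, 1), 1, 1)

# Theoretical median: scale * 2^(1 / shape), which is 2 for scale = shape = 1.
theoretical_median <- qpareto(0.5, scale = 1, shape = 1)
sample_median      <- median(draws)

print(round_trip)
print(c(theoretical = theoretical_median, sample = sample_median))

# Heavy tail: with 51 draws, as on the slide, a few very large values usually dominate the histogram.
if (!interactive()) pdf(NULL)   # draw to a null device when run as a script
hist(rpareto(51, 1, 1), main = "51 Pareto(1, 1) draws")
if (!interactive()) invisible(dev.off())
```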
6 Descriptive statistics
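7 1. Data acquisition: Seoul Open Data Plaza (서울열린데이터광장), daily necessities prices (생필품가격)
2. Descriptive statistics
Column names: 가격 = price, 품목. 이름 = item name, 년도. 월 = year-month, 시장유형. 구분. 시장. 마트.. 이름 = market type (market/mart) name. Items: 고등어 ( 생물, 국산 ) = mackerel (fresh, domestic), 사과 ( 부사, 300g) = apple (Fuji, 300 g). Market types: 대형마트 = large supermarket, 전통시장 = traditional market.
pp <- read.csv("c:/users/user/desktop/korea/2013 년 2 학기 /datascience/ 발표 1/ 생필품가격.csv", header=TRUE)
dim(pp)
[1]
ppta <- split(pp$ 가격, list(pp$ 품목. 이름, pp$ 년도. 월, pp$ 시장유형. 구분. 시장. 마트.. 이름 ))
ppta$` 고등어 ( 생물, 국산 ).Apr-13. 대형마트 `
[1] [24]
ppta$` 고등어 ( 생물, 국산 ).Apr-13. 전통시장 `
[1]
mean(ppta$` 사과 ( 부사, 300g).Sep-13. 전통시장 `, trim=0.05)
[1]
mean(ppta$` 사과 ( 부사, 300g).Sep-13. 대형마트 `, trim=0.05)
[1]
median(ppta$` 사과 ( 부사, 300g).Sep-13. 전통시장 `)
[1] 2000
median(ppta$` 사과 ( 부사, 300g).Sep-13. 대형마트 `)
[1] 2175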
8 2. Descriptive statistics
Items: 쇠고기 ( 한우, 불고기 ) = beef (Hanwoo, bulgogi cut), 배추 ( 중간 ) = napa cabbage (medium), 사과 = apple.
median(ppta$` 쇠고기 ( 한우, 불고기 ).Sep-13. 대형마트 `)
[1]
median(ppta$` 쇠고기 ( 한우, 불고기 ).Sep-13. 전통시장 `)
[1]
sd(ppta$` 쇠고기 ( 한우, 불고기 ).Sep-13. 전통시장 `)
[1]
sd(ppta$` 쇠고기 ( 한우, 불고기 ).Sep-13. 대형마트 `)
[1]
( 한우, 불고기 ).Sep-13. 전통시장 `)
[1]
lapply(ppta, summary)
$` 배추 ( 중간 ).Sep-13. 전통시장 `
Min. 1st Qu. Median Mean 3rd Qu. Max
$` 사과.Sep-13. 전통시장 `
Min. 1st Qu. Median Mean 3rd Qu. Max
$` 사과 ( 부사 ).Sep-13. 전통시장 `
Min. 1st Qu. Median Mean 3rd Qu. Max
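The split-then-summarise pattern used on these two slides can be tried without the Seoul price file. The sketch below uses an invented data frame with English column names (`item`, `market`, `price` are assumptions, as are all the numbers) and applies the same calls: `split()`, trimmed `mean()`, `median()`, `sd()`, and `lapply(..., summary)`.

```r
# Invented prices for illustration only; not taken from the Seoul Open Data Plaza file.
prices <- data.frame(
  item   = rep(c("apple", "beef"), each = 6),
  market = rep(rep(c("traditional", "supermarket"), each = 3), times = 2),
  price  = c(1900, 2000, 2300,  2100, 2200, 2600,
             30000, 32000, 31000,  35000, 36000, 34000)
)

# One numeric vector per item/market combination; names look like "apple.traditional".
groups <- split(prices$price, list(prices$item, prices$market))
print(names(groups))

# Same statistics as on the slides, for one group at a time.
apple_trad_trimmed <- mean(groups[["apple.traditional"]], trim = 0.05)
apple_trad_median  <- median(groups[["apple.traditional"]])
beef_trad_sd       <- sd(groups[["beef.traditional"]])

# Or every group at once.
all_medians   <- sapply(groups, median)
all_summaries <- lapply(groups, summary)

print(all_medians)
print(all_summaries[["beef.supermarket"]])
```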
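9 # Grouping methods other than the split function
1. by (plays a role similar to tapply, but works on objects such as data frames instead of vectors)
2. aggregate (calls tapply once for each variable within the groups)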
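A small comparison of the three grouping routes may help here. The sketch reuses an invented price table (column names and values are assumptions) and computes the group medians with `tapply()`, `by()`, and `aggregate()`; the results differ only in the shape of the returned object.

```r
# Invented prices for illustration only.
prices <- data.frame(
  item   = rep(c("apple", "beef"), each = 6),
  market = rep(rep(c("traditional", "supermarket"), each = 3), times = 2),
  price  = c(1900, 2000, 2300,  2100, 2200, 2600,
             30000, 32000, 31000,  35000, 36000, 34000)
)

# tapply: takes a vector plus grouping factors, returns an item x market matrix.
med_tapply <- tapply(prices$price, list(prices$item, prices$market), median)

# by: takes a data frame; the function receives the sub-data-frame of each group.
med_by <- by(prices, list(item = prices$item, market = prices$market),
             function(d) median(d$price))

# aggregate: formula interface, returns a data frame with one row per group.
med_agg <- aggregate(price ~ item + market, data = prices, FUN = median)

print(med_tapply)
print(med_by)
print(med_agg)
```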
10 3. Histogram 4. Boxplot
11 R graphics
12 1. High-level graphics functions: plot (generic graph function), boxplot, hist, qqnorm, curve
2. Low-level graphics functions: points, lines, abline, segments, polygon, text
Calling a high-level graphics function creates a new plot; low-level graphics functions are called afterwards to add to a plot that a high-level function has already drawn.
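The division of labour between the two kinds of function can be sketched with simulated data (the variables and parameters below are arbitrary assumptions). `plot()`, `hist()`, and `boxplot()` each start a new figure, while `abline()`, `points()`, and `text()` only draw on top of the figure that is already open.

```r
if (!interactive()) pdf(NULL)   # draw to a null device when run as a script

set.seed(42)
x <- rnorm(100, mean = 50, sd = 10)
y <- 2 * x + rnorm(100, sd = 15)

# High-level call: opens a new plot with axes, labels, and the points.
plot(x, y, main = "High-level plot with low-level additions")

# Low-level calls: add to the existing plot.
fit <- lm(y ~ x)
abline(fit, col = "red")                           # fitted regression line
points(mean(x), mean(y), pch = 19, col = "blue")   # mark the centroid
text(mean(x), mean(y), labels = "mean", pos = 4)   # annotate it

# Other high-level calls each start a fresh figure.
h <- hist(x, main = "Histogram of x")
b <- boxplot(x, horizontal = TRUE, main = "Boxplot of x")

if (!interactive()) invisible(dev.off())
```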
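13 3. September price differences for daily necessities (traditional markets vs. large supermarkets). Using the plyr and ggplot2 packages, we were able to see the price differences by item and by market type. (The code was adapted from an external reference.)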
14 4. ggplot2
An R-based graphics package, built on The Grammar of Graphics (Wilkinson, 2005).
Uses graph objects. Provides a flexible plotting environment and makes graphs programmable.
qplot: a function for quick plotting
ggplot: the grammar-based function, for detailed settings
ggplot example (iris data):
ggplot(iris, aes(Sepal.Length, Sepal.Width)) + geom_point(aes(colour=Species)) + geom_smooth(aes(colour=Species), method=lm)
15 5. Network graph
Use case: building a network graph from the correlation coefficients between time series.
library(igraph)
gd <- graph(c(1,2, 2,3, 2,4, 1,4, 5,5, 3,6))
plot(gd)
Names can be added between the labels, and distances, colors, and sizes can all be changed. A detailed explanation of network graphs is available in the R Graphics Cookbook.
(Figure legend: individual, grouped)
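The use case mentioned on this slide, a network built from correlations between time series, could look roughly like the sketch below. The series are simulated, the 0.8 threshold is an arbitrary assumption, and the plotting step runs only if igraph is installed; it uses `graph_from_adjacency_matrix()` rather than the `graph()` call shown on the slide.

```r
set.seed(7)
n <- 200
a <- rnorm(n)
b <- rnorm(n)

# Four simulated series: s1/s2 move together, s3/s4 move together.
series <- cbind(s1 = a, s2 = a + rnorm(n, sd = 0.1),
                s3 = b, s4 = b + rnorm(n, sd = 0.1))

# Connect two series when their absolute correlation exceeds the threshold.
corr <- cor(series)
adj  <- (abs(corr) > 0.8) * 1
diag(adj) <- 0

# Edge list from the upper triangle of the adjacency matrix.
idx <- which(adj == 1 & upper.tri(adj), arr.ind = TRUE)
edge_list <- data.frame(from = rownames(adj)[idx[, "row"]],
                        to   = colnames(adj)[idx[, "col"]])
print(edge_list)

# Optional: draw the network if igraph is available.
if (requireNamespace("igraph", quietly = TRUE)) {
  g <- igraph::graph_from_adjacency_matrix(adj, mode = "undirected")
  if (!interactive()) pdf(NULL)
  plot(g)
  if (!interactive()) invisible(dev.off())
}
```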
16 FF / Bigmemory
17 Big data at the limits of R
R works only in RAM (R limits the size of an object in bytes, because objects are held in memory).
File.txt (big data)
Data = read.table("File.txt", )
1. Even if the file loads, loading takes a long time, or
2. the file is too large to fit in memory.
Solutions
1. Chunking
2. Memory-management R packages (ff, bigmemory, RevoScaleR)
3. Parallel execution
18 1. Chunking
read.table(file=, header = FALSE, nrow =?, skip =?, )
File.txt
Data1 = read.table("File.txt", nrow = , skip = 0, )
Stat1 = summary(Data1)
Data2 = read.table("File.txt", nrow = , skip = , )
Stat2 = summary(Data2)
Data3 = read.table("File.txt", nrow = , skip = , )
Stat3 = summary(Data3)
Data4 = read.table("File.txt", nrow = , skip = , )
Stat4 = summary(Data4)
Aggregate(Stat1, Stat2, Stat3, Stat4)
The data can be analyzed piece by piece, but if even a single chunk is too large it cannot be processed, and the splitting and sequential computation can slow the calculation down.
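A runnable variant of the chunking idea is sketched below. Instead of repeated `read.table(..., skip = )` calls it keeps one file connection open and reads a fixed number of lines per pass, which avoids rescanning the file from the top each time; that substitution, the temporary file, the chunk size, and the choice of column means as the statistic to combine are all assumptions made for the example.

```r
# Write a small stand-in for "File.txt" to a temporary location.
set.seed(3)
path <- tempfile(fileext = ".txt")
full <- data.frame(V1 = round(rnorm(1000), 3), V2 = round(runif(1000), 3))
write.table(full, path, sep = ",", row.names = FALSE, col.names = FALSE)

chunk_size <- 250
con <- file(path, open = "r")

# Running totals: enough to recover the column means without holding all rows.
total  <- c(V1 = 0, V2 = 0)
n_rows <- 0

repeat {
  lines <- readLines(con, n = chunk_size)   # continues where the last read stopped
  if (length(lines) == 0) break             # end of file
  chunk  <- read.table(text = lines, header = FALSE, sep = ",")
  total  <- total + colSums(chunk)
  n_rows <- n_rows + nrow(chunk)
}
close(con)

chunked_means <- total / n_rows
print(chunked_means)
```

Statistics such as sums, counts, minima, and maxima combine across chunks in this way; quantiles such as the median do not, so they need a different approach.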
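19 2. Memory-management packages
ff
  Description: stores the data on disk
  Advantages: clean system
  Disadvantages: few examples
Bigmemory
  Description: stores the data in main memory as well as on disk
  Advantages: easy to use; more widely used than ff; works with parallel packages (SNOW)
  Disadvantages: cannot be used on Windows; data containing characters needs preprocessing; the data must have a single uniform type (like double, integer, short, char)
RevoScaleR
  Description: part of the commercial Revolution R Enterprise; overcomes the memory limit by using the XDF data format
  Advantages: provides an extensible framework that lets C++ programmers write external-memory algorithms; fast computation
  Disadvantages: commercial product, so only the academic version is available (students)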
20 2. ff package (related packages: ff, ffbase, biglm, biglars, speedglm)
21 2. ff example
library(ff)
rff <- read.table.ffdf(file="e:/dataset/3d_spatial_network.txt", sep=",", header=FALSE)
rff
Data (3D Road Network Data set)
V1 : ID
V2 : LONGITUDE
V3 : LATITUDE
V4 : ALTITUDE
Data source: UCI Machine Learning Repository
22 2. ff example
library(ffbase) - Basic statistical functions for ff
dim(rff)
[1]
sum(rff$V2)
[1]
mean(rff$V2)
[1]
min(rff$V2)
[1]
max(rff$V2)
[1]
range(rff$V2)
[1]
quantile(rff$V2)
0% 25% 50% 75% 100%
hist(rff$V2)
23 2. ff example
library(biglm) - Bounded memory linear regression
dim(rff)
[1]
rfflm <- biglm(V1 ~ V2 + V3 + V4, data=rff)
rfflm
Large data regression model: biglm(V1 ~ V2 + V3 + V4, data = rff)
Sample size =
summary(rfflm)
Large data regression model: biglm(V1 ~ V2 + V3 + V4, data = rff)
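To see why a linear regression can run in bounded memory, the sketch below fits the same kind of model chunk by chunk in base R. It accumulates the cross-products X'X and X'y and solves the normal equations at the end. This is a simplified stand-in chosen for clarity, not the algorithm inside biglm (which updates a QR decomposition incrementally), and the data are simulated.

```r
set.seed(11)
n <- 10000
dat <- data.frame(V2 = rnorm(n), V3 = rnorm(n), V4 = rnorm(n))
dat$V1 <- 1 + 2 * dat$V2 - 0.5 * dat$V3 + 0.25 * dat$V4 + rnorm(n)

# Pretend only 1000 rows fit in memory at a time.
chunk_ids <- split(seq_len(n), ceiling(seq_len(n) / 1000))

xtx <- matrix(0, 4, 4)   # running X'X (intercept + 3 predictors)
xty <- matrix(0, 4, 1)   # running X'y

for (idx in chunk_ids) {
  chunk <- dat[idx, ]
  X   <- model.matrix(V1 ~ V2 + V3 + V4, data = chunk)
  xtx <- xtx + crossprod(X)
  xty <- xty + crossprod(X, chunk$V1)
}

# Solve the normal equations (X'X) beta = X'y.
beta_chunked <- drop(solve(xtx, xty))

# Reference fit on the full data, for comparison.
beta_full <- coef(lm(V1 ~ V2 + V3 + V4, data = dat))

print(rbind(chunked = unname(beta_chunked), full = unname(beta_full)))
```

Only the 4 x 4 and 4 x 1 accumulators persist between chunks, so memory use does not grow with the number of rows.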
24 2. Bigmemory package
bigalgebra (linear algebra functions)
biganalytics (analytics for big.matrix, such as GLM and bigkmeans)
bigmemory
bigtabulate (provides functions such as table, tapply, and split)
synchronicity (data streaming, shared-memory capabilities)
library(bigmemory)
x <- read.big.matrix("/media/604e93df4e93ac72/dataset/3d_spatial_network.txt", header = FALSE, sep=",", type = "double", backingfile ="BigMem.bin", descriptorfile = "BigMem.desc", shared = TRUE)
x
An object of class "big.matrix"
Slot "address": <pointer: 0xad5c470>
> summary(x)
Length Class Mode
big.matrix S4
Data source: UCI Machine Learning Repository
25 2. Bigmemory example
> head(x)
y x1 x2 x3
[1,] [2,] [3,] [4,] [5,] [6,]
> dim(x)
[1]
library(biganalytics)
> sum(x)
[1] e+13
> mean(x)
[1]
> colmax(x)
y x1 x2 x e e e e+02
> colsd(x)
y x1 x2 x e e e e+01
> colrange(x)
min max
y e e+08
x e e+01
x e e+01
x e e+02
> colsum(x)
y x1 x2 x e e e e+06
26 2. Bigmemory example
library(biganalytics)
xlm <- biglm.big.matrix(y ~ x1 + x2 + x3, data=x)
xlm
Large data regression model: biglm(formula = formula, data = data,...)
Sample size =
summary(xlm)
Large data regression model: biglm(formula = formula, data = data,...)
Sample size =
Coef (95% CI) SE p
(Intercept) x x x
bigkmeans(x, 3, iter.max=0, nstart=1)
K-means clustering with 3 clusters of sizes , ,
Cluster means:
[,1] [,2] [,3] [,4]
[1,] [2,] [3,]
Clustering vector:
[1] [67] [133]
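For readers without bigmemory installed, the shape of the k-means output above can be reproduced on a small in-memory matrix with base R's `kmeans()`. This is a stand-in for `bigkmeans()`, not a demonstration of it; the simulated clusters, `iter.max`, and `nstart` values are assumptions.

```r
set.seed(5)

# Three well-separated groups of 50 points each in two dimensions.
x <- rbind(matrix(rnorm(100, mean = 0,  sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 10, sd = 0.5), ncol = 2),
           matrix(rnorm(100, mean = 20, sd = 0.5), ncol = 2))

# Several random starts make a poor local optimum unlikely.
fit <- kmeans(x, centers = 3, iter.max = 10, nstart = 25)

sizes <- sort(fit$size)
print(sizes)          # cluster sizes
print(fit$centers)    # cluster means, one row per cluster
print(head(fit$cluster))  # start of the clustering vector
```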