Kernel Density Estimation (KDE)

Size: px
Start display at page:

Download "Kernel Density Estimation (KDE)"

Transcription

1 Kernel Density Estimation (KDE) Previously, we ve seen how to use the histogram method to infer the probability density function (PDF) of a random variable (population) using a finite data sample. In this tutorial, we ll carry on the problem of probability density function inference, but using another method: Kernel density estimation. Kernel density estimates (KDE) are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel. Why do we care? One of the main problems in practical applications is that the needed probability distribution is usually not readily available, but rather it must be derived from other existing information (e.g. sample data). KDEs are similar to histograms in terms of being non parametric method, so there are no restrictive assumptions about the shape of the density function, but KDE is far more superior to histograms as far as accuracy and continuity. Overview Let s consider a finite data sample {x 1, x 2,..., x} observed from a stochastic (i.e. continuous and random) process. We wish to infer the population probability density function. In the histogram method, we select the left bound of the histogram ( x o ), the bin s width ( h ), and then compute the bin k probability estimator f ˆ h( k ): Bin k represents the following interval [ x ( k 1) h, x k h) fˆ ( k ) h i1 I{( k1) h x x kh} i o I{} is an event function that returns 1 (one) if the condition is true, 0 (zero) otherwise. The choice of bins, especially the bin width ( h ), has a substantial effect on the shape and other properties of f ˆ h( k ). Finally, we can think of the histogram method as follows: o o Each observation (event) is statistically independent of all others, and its occurrence probability is equal to 1. fˆ ( k ) is simply the integral (sum) of the event probabilities in each bin. h Kernel Density Estimation (KDE) Tutorial 1 Spider Financial Corp, 2013

2 What is a kernel? A kernel is a non negative, real valued, integrable function K (.) satisfying the following two requirements: Kudu ( ) 1 Ku ( ) K( u) And, as a result, the scaled function K * ( u ), where K * ( u) K( u), is a kernel as well. ow, place a scaled kernel function at each observation in the sample and compute the new probability estimators fˆ ( x ) for a value x (compared to an earlier bin in the histogram). h ˆ 1 f ( x ) K ( xx ) h h i i1 1 u Kh(u) K( ) h h ˆ 1 x x ( ) ( i fh x K ) h h As an example, let K(.) be the standardized Gaussian density function. The KDE looks like the sum of Gaussian curves, each centered on one observation. i1 ote: For Gaussian kernel, the bandwidth h is the same as the standard deviation of ( x x ). i Kernel Density Estimation (KDE) Tutorial 2 Spider Financial Corp, 2013

3 1/ x{ x1, x2,.., x} The KDE method replaces the discrete probability P(x) with a kernel 0 x{ x1, x2,.., x} function. This permits overlap between kernels, thus promoting continuity in the probability estimator. Why KDE? Due to our data sampling, we are left with a finite set of values for continuous random variables. Using a kernel instead of discrete probabilities, we promote the continuity nature in the underlying random variable. To proceed with KDE, you ll need to decide on two key parameters: Kernel function and bandwidth. Which kernel should I use? A range of kernel functions are commonly used: uniform, triangular, biweight, triweight and Epanechnikov. The Gaussian kernel is often used; K(.) (.), where is the standard normal density function. How do I properly compute kernel bandwidth? Intuitively, one wants to choose an h as small as the data allows, but there is a trade off between the bias of the estimator and its variance. Selection of the bandwidth of a kernel estimator is a subject of considerable research. We will outline two popular methods: 1. Subjective selection One can experiment by using different bandwidths and simply selecting one that looks right for the type of data under investigation. 2. Selection with reference to some given distribution Here one selects the bandwidth that would be optimal for a particular PDF. Keep in mind that you are not assuming that f( x ) is normal, but rather selecting an h which would be optimal if the PDF were normal. Using a Gaussian kernel, the optimal bandwidth h opt is defined as follows: h opt The normal distribution is not a wiggly distribution; it is uni modal and bell shaped. It is therefore to be expected that h opt will be too large for multimodal distributions. Furthermore, 2 2 the sample variance ( s ) is not a robust estimator of ; it overestimates if some outliers (extreme observations) are present. To overcome these problems, Silverman proposed the following bandwidth estimator: Kernel Density Estimation (KDE) Tutorial 3 Spider Financial Corp, 2013

4 0.9 ˆ h opt 5 R ˆ min(s, ) 1.34 R IQR Q Q 3 1 Where IQR is the interquartile range and s is the sample standard deviation. 3. Data driven estimation this is an area of current research using several different methods: Fourier transform, diffusion based, etc. Process Using the umxl add in for Excel, you can compute the KDE values for different kernel functions (e.g. Gaussian, uniform, triangular, etc.) and (optionally) with a bandwidth value. For our sample data, we are using 50 randomly generated values of the normal distribution (using the random generator in the Excel Analysis Pack). We plotted the histogram for our reference: ow we are ready to construct our KDE plot. First, select the empty cell in your worksheet where you wish the output table to be generated, then locate and click on the Descriptive Statistics icon in the umxl tab (or toolbar). Then, select the Kernel density estimation item from the drop down menu. Kernel Density Estimation (KDE) Tutorial 4 Spider Financial Corp, 2013

5 The KDE wizard appears. Select the cells range for the values of the input variable. otes: 1. The cells range includes (optional) the heading ( Label ) cell, which would be used in the output tables where it references those variables. 2. By default, the output table cells range is set to the current selected cell in your worksheet. 3. By default, the output graph cells range is set to the 7 cells to the right of the currently selected cell in your worksheet. Finally, once we select the input data (X) cells range, the Options and Missing Values tabs become available (enabled). ext, select the Options tab: Kernel Density Estimation (KDE) Tutorial 5 Spider Financial Corp, 2013

6 otes: 1. By default, the Gaussian kernel function is selected. Let s leave this option unchanged. 2. By default, the optimal bandwidth option is checked. The KDE function will use the Silverman estimate for the bandwidth. Leave it checked. 3. By default, the output table size is set to 5. Leave it unchanged. 4. Overlay ormal distribution is checked. This option in effect instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes. Leave this option checked. ow, click on the Missing Values tab. In this tab, you can select an approach to handle missing values in the data set (X s). By default, any observation with missing value would be excluded from the analysis. This treatment is a good approach for our analysis, so let s leave it unchanged. ow, click OK to generate the output tables. otes: 1. The values of all X are sorted in ascending order. Kernel Density Estimation (KDE) Tutorial 6 Spider Financial Corp, 2013

7 2. The summary statistics in the 1 st row are computed merely to facilitate the creation of the table or computing the overlay Gaussian distribution function. The generated plot of the KDE is shown below: ote that the KDE curve (blue) tracks very closely with the Gaussian density (orange) curve. Case 2 ow let s try a non normal sample data set. We generated 50 random values of a uniform distribution between 3 and 3. Following similar steps, we plotted the histogram and the KDE: ote that the KDE curve (blue) tracks much more closely with the underlying distribution (i.e. uniform) than the histogram. Case 3 For our 3 rd case, we generated 50 random values of a binomial distribution (p=0.2 and batch size=20). Following similar steps, we plotted the histogram and the KDE. Kernel Density Estimation (KDE) Tutorial 7 Spider Financial Corp, 2013

8 ote that KDE curve (blue) tracks much more closely with the underlying distribution (i.e. uniform) than the histogram. Conclusion In this tutorial, we demonstrated the process to generate a kernel density estimation in Excel using umxl s add in functions. The KDE method is a major improvement for inferring the probability density function of the population, in terms of accuracy and continuity of the function. evertheless, it introduce a new challenge: selecting a proper bandwidth. In the majority of cases, the Silverman estimator for the bandwidth proves to be satisfactory, but is it optimal? Do we care? Where do we go from here? First, to answer the question of optimality, we need to introduce additional algorithms to estimate its values. For example, in Annals of Statistics, Volume 38, umber 5, pages , Z. I. Botev, J. F. Grotowski, and D. P. Kroese described a numerical sample data driven method for finding the optimal bandwidth using a Kernel density estimation via the diffusion approach. Second, in cases where the range of values that the random number can take are known to be constrained from one side (e.g. prices, binomial data, etc.), or in a range (e.g. survival rate, default rate, etc.), then how do we adapt the KDE to factor in those constraints? Finally, we defined the KDE probability estimator using a fixed bandwidth ( h ) for all observations. If the bandwidth is not held fixed, but is varied depending upon the location of either the estimate (balloon estimator) or the samples (point wise estimator), this produces a particularly powerful method known as adaptive or variable bandwidth kernel density estimation. Kernel Density Estimation (KDE) Tutorial 8 Spider Financial Corp, 2013

Section 4 Matching Estimator

Section 4 Matching Estimator Section 4 Matching Estimator Matching Estimators Key Idea: The matching method compares the outcomes of program participants with those of matched nonparticipants, where matches are chosen on the basis

More information

University of North Dakota PeopleSoft Finance Tip Sheets. Utilizing the Query Download Feature

University of North Dakota PeopleSoft Finance Tip Sheets. Utilizing the Query Download Feature There is a custom feature available in Query Viewer that allows files to be created from queries and copied to a user s PC. This feature doesn t have the same size limitations as running a query to HTML

More information

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data. Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting

More information

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys

Unit 7 Statistics. AFM Mrs. Valentine. 7.1 Samples and Surveys Unit 7 Statistics AFM Mrs. Valentine 7.1 Samples and Surveys v Obj.: I will understand the different methods of sampling and studying data. I will be able to determine the type used in an example, and

More information

Chapter 6: DESCRIPTIVE STATISTICS

Chapter 6: DESCRIPTIVE STATISTICS Chapter 6: DESCRIPTIVE STATISTICS Random Sampling Numerical Summaries Stem-n-Leaf plots Histograms, and Box plots Time Sequence Plots Normal Probability Plots Sections 6-1 to 6-5, and 6-7 Random Sampling

More information

On Kernel Density Estimation with Univariate Application. SILOKO, Israel Uzuazor

On Kernel Density Estimation with Univariate Application. SILOKO, Israel Uzuazor On Kernel Density Estimation with Univariate Application BY SILOKO, Israel Uzuazor Department of Mathematics/ICT, Edo University Iyamho, Edo State, Nigeria. A Seminar Presented at Faculty of Science, Edo

More information

Master Reports Guide. Table of Contents

Master Reports Guide. Table of Contents Table of Contents Welcome to Master Reports... 2 Report Basics... 2 Access Reports... 2 Download or Print a Report... 2 Report Help... 2 Save or Share a Report... 2 Master Report Formulas... 3 Filter a

More information

Math 227 EXCEL / MEGASTAT Guide

Math 227 EXCEL / MEGASTAT Guide Math 227 EXCEL / MEGASTAT Guide Introduction Introduction: Ch2: Frequency Distributions and Graphs Construct Frequency Distributions and various types of graphs: Histograms, Polygons, Pie Charts, Stem-and-Leaf

More information

Nonparametric Density Estimation

Nonparametric Density Estimation Nonparametric Estimation Data: X 1,..., X n iid P where P is a distribution with density f(x). Aim: Estimation of density f(x) Parametric density estimation: Fit parametric model {f(x θ) θ Θ} to data parameter

More information

Topic 5 - Joint distributions and the CLT

Topic 5 - Joint distributions and the CLT Topic 5 - Joint distributions and the CLT Joint distributions Calculation of probabilities, mean and variance Expectations of functions based on joint distributions Central Limit Theorem Sampling distributions

More information

COMPUTING AND DATA ANALYSIS WITH EXCEL. Numerical integration techniques

COMPUTING AND DATA ANALYSIS WITH EXCEL. Numerical integration techniques COMPUTING AND DATA ANALYSIS WITH EXCEL Numerical integration techniques Outline 1 Quadrature in one dimension Mid-point method Trapezium method Simpson s methods Uniform random number generation in Excel,

More information

Nonparametric Estimation of Distribution Function using Bezier Curve

Nonparametric Estimation of Distribution Function using Bezier Curve Communications for Statistical Applications and Methods 2014, Vol. 21, No. 1, 105 114 DOI: http://dx.doi.org/10.5351/csam.2014.21.1.105 ISSN 2287-7843 Nonparametric Estimation of Distribution Function

More information

Non-Parametric Modeling

Non-Parametric Modeling Non-Parametric Modeling CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Non-Parametric Density Estimation Parzen Windows Kn-Nearest Neighbor

More information

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 1st, 2018

Data Mining. CS57300 Purdue University. Bruno Ribeiro. February 1st, 2018 Data Mining CS57300 Purdue University Bruno Ribeiro February 1st, 2018 1 Exploratory Data Analysis & Feature Construction How to explore a dataset Understanding the variables (values, ranges, and empirical

More information

Economics Nonparametric Econometrics

Economics Nonparametric Econometrics Economics 217 - Nonparametric Econometrics Topics covered in this lecture Introduction to the nonparametric model The role of bandwidth Choice of smoothing function R commands for nonparametric models

More information

height VUD x = x 1 + x x N N 2 + (x 2 x) 2 + (x N x) 2. N

height VUD x = x 1 + x x N N 2 + (x 2 x) 2 + (x N x) 2. N Math 3: CSM Tutorial: Probability, Statistics, and Navels Fall 2 In this worksheet, we look at navel ratios, means, standard deviations, relative frequency density histograms, and probability density functions.

More information

Data Management Project Using Software to Carry Out Data Analysis Tasks

Data Management Project Using Software to Carry Out Data Analysis Tasks Data Management Project Using Software to Carry Out Data Analysis Tasks This activity involves two parts: Part A deals with finding values for: Mean, Median, Mode, Range, Standard Deviation, Max and Min

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Introduction to CS graphs and plots in Excel Jacek Wiślicki, Laurent Babout,

Introduction to CS graphs and plots in Excel Jacek Wiślicki, Laurent Babout, MS Excel 2010 offers a large set of graphs and plots for data visualization. For those who are familiar with older version of Excel, the layout is completely different. The following exercises demonstrate

More information

Clustering. Discover groups such that samples within a group are more similar to each other than samples across groups.

Clustering. Discover groups such that samples within a group are more similar to each other than samples across groups. Clustering 1 Clustering Discover groups such that samples within a group are more similar to each other than samples across groups. 2 Clustering Discover groups such that samples within a group are more

More information

Statistics with a Hemacytometer

Statistics with a Hemacytometer Statistics with a Hemacytometer Overview This exercise incorporates several different statistical analyses. Data gathered from cell counts with a hemacytometer is used to explore frequency distributions

More information

BIOL Gradation of a histogram (a) into the normal curve (b)

BIOL Gradation of a histogram (a) into the normal curve (b) (التوزيع الطبيعي ( Distribution Normal (Gaussian) One of the most important distributions in statistics is a continuous distribution called the normal distribution or Gaussian distribution. Consider the

More information

Chapter 3 - Displaying and Summarizing Quantitative Data

Chapter 3 - Displaying and Summarizing Quantitative Data Chapter 3 - Displaying and Summarizing Quantitative Data 3.1 Graphs for Quantitative Data (LABEL GRAPHS) August 25, 2014 Histogram (p. 44) - Graph that uses bars to represent different frequencies or relative

More information

Access: Printing Data with Reports

Access: Printing Data with Reports Access: Printing Data with Reports Reports are a means for displaying and summarizing data from tables or queries. While forms are primarily for on-screen viewing, reports are for presenting your data

More information

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data.

CHAPTER 1. Introduction. Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. 1 CHAPTER 1 Introduction Statistics: Statistics is the science of collecting, organizing, analyzing, presenting and interpreting data. Variable: Any characteristic of a person or thing that can be expressed

More information

Tracking Computer Vision Spring 2018, Lecture 24

Tracking Computer Vision Spring 2018, Lecture 24 Tracking http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 24 Course announcements Homework 6 has been posted and is due on April 20 th. - Any questions about the homework? - How

More information

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order.

Prepare a stem-and-leaf graph for the following data. In your final display, you should arrange the leaves for each stem in increasing order. Chapter 2 2.1 Descriptive Statistics A stem-and-leaf graph, also called a stemplot, allows for a nice overview of quantitative data without losing information on individual observations. It can be a good

More information

Decision Support Risk handout. Simulating Spreadsheet models

Decision Support Risk handout. Simulating Spreadsheet models Decision Support Models @ Risk handout Simulating Spreadsheet models using @RISK 1. Step 1 1.1. Open Excel and @RISK enabling any macros if prompted 1.2. There are four on line help options available.

More information

Lab 7 Statistics I LAB 7 QUICK VIEW

Lab 7 Statistics I LAB 7 QUICK VIEW Lab 7 Statistics I This lab will cover how to do statistical calculations in excel using formulas. (Note that your version of excel may have additional formulas to calculate statistics, but these formulas

More information

ICT & MATHS. Excel 2003 in Mathematics Teaching

ICT & MATHS. Excel 2003 in Mathematics Teaching ICT & MATHS Excel 2003 in Mathematics Teaching Published by The National Centre for Technology in Education in association with the Project Maths Development Team. Permission granted to reproduce for educational

More information

Chapter 2: The Normal Distribution

Chapter 2: The Normal Distribution Chapter 2: The Normal Distribution 2.1 Density Curves and the Normal Distributions 2.2 Standard Normal Calculations 1 2 Histogram for Strength of Yarn Bobbins 15.60 16.10 16.60 17.10 17.60 18.10 18.60

More information

Geostatistics 2D GMS 7.0 TUTORIALS. 1 Introduction. 1.1 Contents

Geostatistics 2D GMS 7.0 TUTORIALS. 1 Introduction. 1.1 Contents GMS 7.0 TUTORIALS 1 Introduction Two-dimensional geostatistics (interpolation) can be performed in GMS using the 2D Scatter Point module. The module is used to interpolate from sets of 2D scatter points

More information

What s New in Oracle Crystal Ball? What s New in Version Browse to:

What s New in Oracle Crystal Ball? What s New in Version Browse to: What s New in Oracle Crystal Ball? Browse to: - What s new in version 11.1.1.0.00 - What s new in version 7.3 - What s new in version 7.2 - What s new in version 7.1 - What s new in version 7.0 - What

More information

Use of Extreme Value Statistics in Modeling Biometric Systems

Use of Extreme Value Statistics in Modeling Biometric Systems Use of Extreme Value Statistics in Modeling Biometric Systems Similarity Scores Two types of matching: Genuine sample Imposter sample Matching scores Enrolled sample 0.95 0.32 Probability Density Decision

More information

Discrete Fourier Transform

Discrete Fourier Transform Discrete Fourier Transfor This is the first tutorial in our ongoing series on tie series spectral analysis. In this entry, we will closely exaine the discrete Fourier transfor (aa DFT) and its inverse,

More information

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data

Chapter 2. Descriptive Statistics: Organizing, Displaying and Summarizing Data Chapter 2 Descriptive Statistics: Organizing, Displaying and Summarizing Data Objectives Student should be able to Organize data Tabulate data into frequency/relative frequency tables Display data graphically

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

SPSS. (Statistical Packages for the Social Sciences)

SPSS. (Statistical Packages for the Social Sciences) Inger Persson SPSS (Statistical Packages for the Social Sciences) SHORT INSTRUCTIONS This presentation contains only relatively short instructions on how to perform basic statistical calculations in SPSS.

More information

COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING

COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING COMPUTATIONAL STATISTICS UNSUPERVISED LEARNING Luca Bortolussi Department of Mathematics and Geosciences University of Trieste Office 238, third floor, H2bis luca@dmi.units.it Trieste, Winter Semester

More information

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA Learning objectives: Getting data ready for analysis: 1) Learn several methods of exploring the

More information

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies.

The first few questions on this worksheet will deal with measures of central tendency. These data types tell us where the center of the data set lies. Instructions: You are given the following data below these instructions. Your client (Courtney) wants you to statistically analyze the data to help her reach conclusions about how well she is teaching.

More information

Chapter 2 Describing, Exploring, and Comparing Data

Chapter 2 Describing, Exploring, and Comparing Data Slide 1 Chapter 2 Describing, Exploring, and Comparing Data Slide 2 2-1 Overview 2-2 Frequency Distributions 2-3 Visualizing Data 2-4 Measures of Center 2-5 Measures of Variation 2-6 Measures of Relative

More information

University of Wisconsin-Madison Spring 2018 BMI/CS 776: Advanced Bioinformatics Homework #2

University of Wisconsin-Madison Spring 2018 BMI/CS 776: Advanced Bioinformatics Homework #2 Assignment goals Use mutual information to reconstruct gene expression networks Evaluate classifier predictions Examine Gibbs sampling for a Markov random field Control for multiple hypothesis testing

More information

Assignment 3 due Thursday Oct. 11

Assignment 3 due Thursday Oct. 11 Instructor Linda C. Stephenson due Thursday Oct. 11 GENERAL NOTE: These assignments often build on each other what you learn in one assignment may be carried over to subsequent assignments. If I have already

More information

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures Part I, Chapters 4 & 5 Data Tables and Data Analysis Statistics and Figures Descriptive Statistics 1 Are data points clumped? (order variable / exp. variable) Concentrated around one value? Concentrated

More information

Nonparametric regression using kernel and spline methods

Nonparametric regression using kernel and spline methods Nonparametric regression using kernel and spline methods Jean D. Opsomer F. Jay Breidt March 3, 016 1 The statistical model When applying nonparametric regression methods, the researcher is interested

More information

Kernel Density Estimation

Kernel Density Estimation Kernel Density Estimation An Introduction Justus H. Piater, Université de Liège Overview 1. Densities and their Estimation 2. Basic Estimators for Univariate KDE 3. Remarks 4. Methods for Particular Domains

More information

Visualizing Data: Freq. Tables, Histograms

Visualizing Data: Freq. Tables, Histograms Visualizing Data: Freq. Tables, Histograms Engineering Statistics Section 1.2 Josh Engwer TTU 25 January 2016 Josh Engwer (TTU) Visualizing Data: Freq. Tables, Histograms 25 January 2016 1 / 23 Descriptive

More information

CHAPTER 3: Data Description

CHAPTER 3: Data Description CHAPTER 3: Data Description You ve tabulated and made pretty pictures. Now what numbers do you use to summarize your data? Ch3: Data Description Santorico Page 68 You ll find a link on our website to a

More information

Table of Contents (As covered from textbook)

Table of Contents (As covered from textbook) Table of Contents (As covered from textbook) Ch 1 Data and Decisions Ch 2 Displaying and Describing Categorical Data Ch 3 Displaying and Describing Quantitative Data Ch 4 Correlation and Linear Regression

More information

Chapter2 Description of samples and populations. 2.1 Introduction.

Chapter2 Description of samples and populations. 2.1 Introduction. Chapter2 Description of samples and populations. 2.1 Introduction. Statistics=science of analyzing data. Information collected (data) is gathered in terms of variables (characteristics of a subject that

More information

The MKLE Package. July 24, Kdensity... 1 Kernels... 3 MKLE-package... 3 checkparms... 4 Klik... 5 MKLE... 6 mklewarp... 7 state... 8.

The MKLE Package. July 24, Kdensity... 1 Kernels... 3 MKLE-package... 3 checkparms... 4 Klik... 5 MKLE... 6 mklewarp... 7 state... 8. The MKLE Package July 24, 2006 Type Package Title Maximum kernel likelihood estimation Version 0.02 Date 2006-07-12 Author Maintainer Package to compute the maximum kernel likelihood

More information

3 Graphical Displays of Data

3 Graphical Displays of Data 3 Graphical Displays of Data Reading: SW Chapter 2, Sections 1-6 Summarizing and Displaying Qualitative Data The data below are from a study of thyroid cancer, using NMTR data. The investigators looked

More information

SPSS Basics for Probability Distributions

SPSS Basics for Probability Distributions Built-in Statistical Functions in SPSS Begin by defining some variables in the Variable View of a data file, save this file as Probability_Distributions.sav and save the corresponding output file as Probability_Distributions.spo.

More information

SAS Visual Analytics 8.2: Working with Report Content

SAS Visual Analytics 8.2: Working with Report Content SAS Visual Analytics 8.2: Working with Report Content About Objects After selecting your data source and data items, add one or more objects to display the results. SAS Visual Analytics provides objects

More information

QuickLoad-Central. User Guide

QuickLoad-Central. User Guide QuickLoad-Central User Guide Contents Introduction... 4 Navigating QuickLoad Central... 4 Viewing QuickLoad-Central Information... 6 Registering a License... 6 Managing File Stores... 6 Adding File Stores...

More information

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool.

Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. Tina Memo No. 2014-004 Internal Report Tutorial: Using Tina Vision s Quantitative Pattern Recognition Tool. P.D.Tar. Last updated 07 / 06 / 2014 ISBE, Medical School, University of Manchester, Stopford

More information

Quantitative - One Population

Quantitative - One Population Quantitative - One Population The Quantitative One Population VISA procedures allow the user to perform descriptive and inferential procedures for problems involving one population with quantitative (interval)

More information

Multivariate Calibration Quick Guide

Multivariate Calibration Quick Guide Last Updated: 06.06.2007 Table Of Contents 1. HOW TO CREATE CALIBRATION MODELS...1 1.1. Introduction into Multivariate Calibration Modelling... 1 1.1.1. Preparing Data... 1 1.2. Step 1: Calibration Wizard

More information

Area and Perimeter EXPERIMENT. How are the area and perimeter of a rectangle related? You probably know the formulas by heart:

Area and Perimeter EXPERIMENT. How are the area and perimeter of a rectangle related? You probably know the formulas by heart: Area and Perimeter How are the area and perimeter of a rectangle related? You probably know the formulas by heart: Area Length Width Perimeter (Length Width) But if you look at data for many different

More information

Data 100 Lecture 5: Data Cleaning & Exploratory Data Analysis

Data 100 Lecture 5: Data Cleaning & Exploratory Data Analysis OrderNum ProdID Name OrderId Cust Name Date 1 42 Gum 1 Joe 8/21/2017 2 999 NullFood 2 Arthur 8/14/2017 2 42 Towel 2 Arthur 8/14/2017 1/31/18 Data 100 Lecture 5: Data Cleaning & Exploratory Data Analysis

More information

CALCULATION OF OPERATIONAL LOSSES WITH NON- PARAMETRIC APPROACH: DERAILMENT LOSSES

CALCULATION OF OPERATIONAL LOSSES WITH NON- PARAMETRIC APPROACH: DERAILMENT LOSSES 2. Uluslar arası Raylı Sistemler Mühendisliği Sempozyumu (ISERSE 13), 9-11 Ekim 2013, Karabük, Türkiye CALCULATION OF OPERATIONAL LOSSES WITH NON- PARAMETRIC APPROACH: DERAILMENT LOSSES Zübeyde Öztürk

More information

Survey Design, Distribution & Analysis Software. professional quest. Whitepaper Extracting Data into Microsoft Excel

Survey Design, Distribution & Analysis Software. professional quest. Whitepaper Extracting Data into Microsoft Excel Survey Design, Distribution & Analysis Software professional quest Whitepaper Extracting Data into Microsoft Excel WHITEPAPER Extracting Scoring Data into Microsoft Excel INTRODUCTION... 1 KEY FEATURES

More information

Data 100. Lecture 5: Data Cleaning & Exploratory Data Analysis

Data 100. Lecture 5: Data Cleaning & Exploratory Data Analysis Data 100 Lecture 5: Data Cleaning & Exploratory Data Analysis Slides by: Joseph E. Gonzalez, Deb Nolan, & Joe Hellerstein jegonzal@berkeley.edu deborah_nolan@berkeley.edu hellerstein@berkeley.edu? Last

More information

Quick Start Guide. Version R94. English

Quick Start Guide. Version R94. English Custom Reports Quick Start Guide Version R94 English December 12, 2016 Copyright Agreement The purchase and use of all Software and Services is subject to the Agreement as defined in Kaseya s Click-Accept

More information

Three-Dimensional (Surface) Plots

Three-Dimensional (Surface) Plots Three-Dimensional (Surface) Plots Creating a Data Array 3-Dimensional plots (surface plots) are often useful for visualizing the behavior of functions and identifying important mathematical/physical features

More information

Light Speed with Excel

Light Speed with Excel Work @ Light Speed with Excel 2018 Excel University, Inc. All Rights Reserved. http://beacon.by/magazine/v4/94012/pdf?type=print 1/64 Table of Contents Cover Table of Contents PivotTable from Many CSV

More information

NCSS Statistical Software

NCSS Statistical Software Chapter 245 Introduction This procedure generates R control charts for variables. The format of the control charts is fully customizable. The data for the subgroups can be in a single column or in multiple

More information

Parametric & Hone User Guide

Parametric & Hone User Guide Parametric & Hone User Guide IES Virtual Environment Copyright 2017 Integrated Environmental Solutions Limited. All rights reserved. No part of the manual is to be copied or reproduced in any Contents

More information

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures

STA Rev. F Learning Objectives. Learning Objectives (Cont.) Module 3 Descriptive Measures STA 2023 Module 3 Descriptive Measures Learning Objectives Upon completing this module, you should be able to: 1. Explain the purpose of a measure of center. 2. Obtain and interpret the mean, median, and

More information

Microsoft Excel Using Excel in the Science Classroom

Microsoft Excel Using Excel in the Science Classroom Microsoft Excel Using Excel in the Science Classroom OBJECTIVE Students will take data and use an Excel spreadsheet to manipulate the information. This will include creating graphs, manipulating data,

More information

MS Excel Advanced Level

MS Excel Advanced Level MS Excel Advanced Level Trainer : Etech Global Solution Contents Conditional Formatting... 1 Remove Duplicates... 4 Sorting... 5 Filtering... 6 Charts Column... 7 Charts Line... 10 Charts Bar... 10 Charts

More information

Name Date Types of Graphs and Creating Graphs Notes

Name Date Types of Graphs and Creating Graphs Notes Name Date Types of Graphs and Creating Graphs Notes Graphs are helpful visual representations of data. Different graphs display data in different ways. Some graphs show individual data, but many do not.

More information

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below. Graphing in Excel featuring Excel 2007 1 A spreadsheet can be a powerful tool for analyzing and graphing data, but it works completely differently from the graphing calculator that you re used to. If you

More information

Section 2 Comparing distributions - Worksheet

Section 2 Comparing distributions - Worksheet The data are from the paper: Exploring Relationships in Body Dimensions Grete Heinz and Louis J. Peterson San José State University Roger W. Johnson and Carter J. Kerk South Dakota School of Mines and

More information

Numerical Descriptive Measures

Numerical Descriptive Measures Chapter 3 Numerical Descriptive Measures 1 Numerical Descriptive Measures Chapter 3 Measures of Central Tendency and Measures of Dispersion A sample of 40 students at a university was randomly selected,

More information

Lecture 6: Chapter 6 Summary

Lecture 6: Chapter 6 Summary 1 Lecture 6: Chapter 6 Summary Z-score: Is the distance of each data value from the mean in standard deviation Standardizes data values Standardization changes the mean and the standard deviation: o Z

More information

Chapter 2 Modeling Distributions of Data

Chapter 2 Modeling Distributions of Data Chapter 2 Modeling Distributions of Data Section 2.1 Describing Location in a Distribution Describing Location in a Distribution Learning Objectives After this section, you should be able to: FIND and

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

Model Based Symbolic Description for Big Data Analysis

Model Based Symbolic Description for Big Data Analysis Model Based Symbolic Description for Big Data Analysis 1 Model Based Symbolic Description for Big Data Analysis *Carlo Drago, **Carlo Lauro and **Germana Scepi *University of Rome Niccolo Cusano, **University

More information

Page 1. Graphical and Numerical Statistics

Page 1. Graphical and Numerical Statistics TOPIC: Description Statistics In this tutorial, we show how to use MINITAB to produce descriptive statistics, both graphical and numerical, for an existing MINITAB dataset. The example data come from Exercise

More information

Charts in Excel 2003

Charts in Excel 2003 Charts in Excel 2003 Contents Introduction Charts in Excel 2003...1 Part 1: Generating a Basic Chart...1 Part 2: Adding Another Data Series...3 Part 3: Other Handy Options...5 Introduction Charts in Excel

More information

Chapter 1. Looking at Data-Distribution

Chapter 1. Looking at Data-Distribution Chapter 1. Looking at Data-Distribution Statistics is the scientific discipline that provides methods to draw right conclusions: 1)Collecting the data 2)Describing the data 3)Drawing the conclusions Raw

More information

Candy is Dandy Project (Project #12)

Candy is Dandy Project (Project #12) Candy is Dandy Project (Project #12) You have been hired to conduct some market research about M&M's. First, you had your team purchase 4 large bags and the results are given for the contents of those

More information

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015 Instance-based Learning CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2015 Outline Non-parametric approach Unsupervised: Non-parametric density estimation Parzen Windows K-Nearest

More information

Averages and Variation

Averages and Variation Averages and Variation 3 Copyright Cengage Learning. All rights reserved. 3.1-1 Section 3.1 Measures of Central Tendency: Mode, Median, and Mean Copyright Cengage Learning. All rights reserved. 3.1-2 Focus

More information

Advanced Excel Skills

Advanced Excel Skills Advanced Excel Skills Note : This tutorial is based upon MSExcel 2000. If you are using MSExcel 2002, there may be some operations which look slightly different (e.g. pivot tables), but the same principles

More information

Week 4: Describing data and estimation

Week 4: Describing data and estimation Week 4: Describing data and estimation Goals Investigate sampling error; see that larger samples have less sampling error. Visualize confidence intervals. Calculate basic summary statistics using R. Calculate

More information

Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints

Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints IEEE SIGNAL PROCESSING LETTERS 1 Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints Alexander Suhre, Orhan Arikan, Member, IEEE, and A. Enis Cetin,

More information

Quality and Six Sigma Tools using MINITAB Statistical Software: A complete Guide to Six Sigma DMAIC Tools using MINITAB

Quality and Six Sigma Tools using MINITAB Statistical Software: A complete Guide to Six Sigma DMAIC Tools using MINITAB Samples from MINITAB Book Quality and Six Sigma Tools using MINITAB Statistical Software A complete Guide to Six Sigma DMAIC Tools using MINITAB Prof. Amar Sahay, Ph.D. One of the major objectives of this

More information

STA Module 4 The Normal Distribution

STA Module 4 The Normal Distribution STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves

STA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally

More information

Access 2013 Introduction to Forms and Reports

Access 2013 Introduction to Forms and Reports Forms Overview You can create forms to present data in a more attractive and easier to use format They can be used for viewing, editing and printing data and in advanced cases, used to automate the database

More information

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function.

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function. Today Function The today function: =TODAY() It has no arguments, and returns the date that the computer is set to. It is volatile, so if you save it and reopen the file one month later the new, updated

More information

Frequency tables Create a new Frequency Table

Frequency tables Create a new Frequency Table Frequency tables Create a new Frequency Table Contents FREQUENCY TABLES CREATE A NEW FREQUENCY TABLE... 1 Results Table... 2 Calculate Descriptive Statistics for Frequency Tables... 6 Transfer Results

More information

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop Machine Learning Algorithms (IFT6266 A7) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop

More information

Pre-Lab Excel Problem

Pre-Lab Excel Problem Pre-Lab Excel Problem Read and follow the instructions carefully! Below you are given a problem which you are to solve using Excel. If you have not used the Excel spreadsheet a limited tutorial is given

More information

SAS (Statistical Analysis Software/System)

SAS (Statistical Analysis Software/System) SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:

More information

STA Module 2B Organizing Data and Comparing Distributions (Part II)

STA Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)

STA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II) STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and

More information