Kernel Density Estimation (KDE)
- Gavin Simmons
Kernel Density Estimation (KDE) Tutorial, Spider Financial Corp, 2013

Previously, we saw how to use the histogram method to infer the probability density function (PDF) of a random variable (population) from a finite data sample. In this tutorial, we continue with the problem of PDF inference, but using another method: kernel density estimation. Kernel density estimates (KDE) are closely related to histograms, but can be endowed with properties such as smoothness or continuity by using a suitable kernel.

Why do we care?

One of the main problems in practical applications is that the needed probability distribution is usually not readily available; it must be derived from other existing information (e.g. sample data). Like histograms, KDEs are non-parametric, so there are no restrictive assumptions about the shape of the density function, but KDE is superior to histograms in terms of accuracy and continuity.

Overview

Let's consider a finite data sample $\{x_1, x_2, \ldots, x_N\}$ observed from a stochastic (i.e. continuous and random) process. We wish to infer the population probability density function.

In the histogram method, we select the left bound of the histogram ($x_o$) and the bin width ($h$), and then compute the probability estimator $\hat{f}_h(k)$ for bin $k$. Bin $k$ represents the interval $[x_o + (k-1)h,\; x_o + kh)$, and

$$\hat{f}_h(k) = \frac{1}{N}\sum_{i=1}^{N} I\{(k-1)h \le x_i - x_o < kh\}$$

where $I\{\cdot\}$ is an event (indicator) function that returns 1 (one) if the condition is true and 0 (zero) otherwise.

The choice of bins, especially the bin width ($h$), has a substantial effect on the shape and other properties of $\hat{f}_h(k)$.

Finally, we can think of the histogram method as follows:
- Each observation (event) is statistically independent of all others, and its occurrence probability is equal to $1/N$.
- $\hat{f}_h(k)$ is simply the integral (sum) of the event probabilities in each bin.
What is a kernel?

A kernel is a non-negative, real-valued, integrable function $K(\cdot)$ satisfying the following two requirements:

$$\int_{-\infty}^{\infty} K(u)\,du = 1, \qquad K(-u) = K(u)$$

As a result, the scaled function $K^*(u)$, where $K^*(u) = \lambda K(\lambda u)$ for $\lambda > 0$, is a kernel as well.

Now, place a scaled kernel function at each observation in the sample and compute the new probability estimator $\hat{f}_h(x)$ for a value $x$ (rather than for a bin, as in the histogram):

$$\hat{f}_h(x) = \frac{1}{N}\sum_{i=1}^{N} K_h(x - x_i), \qquad K_h(u) = \frac{1}{h}K\!\left(\frac{u}{h}\right)$$

$$\hat{f}_h(x) = \frac{1}{Nh}\sum_{i=1}^{N} K\!\left(\frac{x - x_i}{h}\right)$$

As an example, let $K(\cdot)$ be the standardized Gaussian density function. The KDE then looks like the sum of Gaussian curves, each centered on one observation.

Note: For the Gaussian kernel, the bandwidth $h$ is the same as the standard deviation of $(x - x_i)$.
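To make the kernel definition and the estimator concrete, here is a minimal Python sketch with a Gaussian kernel. The function names (`gaussian_kernel`, `kde`), the toy sample, the bandwidth `h=0.5` and the scaling factor `lam` are illustrative assumptions, not values from the tutorial. The sketch checks the two kernel requirements numerically and then evaluates the estimator on a grid.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde(x_eval, sample, h, kernel=gaussian_kernel):
    """f_h(x) = 1/(N*h) * sum_i K((x - x_i) / h)."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    sample = np.asarray(sample, dtype=float)
    # One row per evaluation point, one column per observation
    u = (x_eval[:, None] - sample[None, :]) / h
    return kernel(u).sum(axis=1) / (sample.size * h)

grid = np.linspace(-8.0, 8.0, 3201)
dx = grid[1] - grid[0]

# Requirement 1: the kernel integrates to one (Riemann-sum approximation)
kernel_area = gaussian_kernel(grid).sum() * dx
# Requirement 2: the kernel is symmetric, K(-u) = K(u)
is_symmetric = np.allclose(gaussian_kernel(-grid), gaussian_kernel(grid))
# The scaled function lam * K(lam * u) still integrates to one
lam = 2.5
scaled_area = (lam * gaussian_kernel(lam * grid)).sum() * dx

# Hypothetical sample and bandwidth
sample = np.array([-1.2, -0.4, 0.1, 0.3, 1.5])
density = kde(grid, sample, h=0.5)
density_area = density.sum() * dx

print(kernel_area, is_symmetric, scaled_area, density_area)
```

Because every observation contributes a full kernel with unit area and weight 1/N, the resulting estimate also integrates to (approximately) one on a sufficiently wide grid.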
The KDE method replaces the discrete probability

$$P(x) = \begin{cases} 1/N & x \in \{x_1, x_2, \ldots, x_N\} \\ 0 & x \notin \{x_1, x_2, \ldots, x_N\} \end{cases}$$

with a kernel function. This permits overlap between kernels, thus promoting continuity in the probability estimator.

Why KDE?

Due to our data sampling, we are left with a finite set of values for a continuous random variable. Using a kernel instead of discrete probabilities, we restore the continuity of the underlying random variable.

To proceed with KDE, you'll need to decide on two key parameters: the kernel function and the bandwidth.

Which kernel should I use?

A range of kernel functions are commonly used: uniform, triangular, biweight, triweight and Epanechnikov. The Gaussian kernel is often used: $K(\cdot) = \phi(\cdot)$, where $\phi$ is the standard normal density function.

How do I properly compute kernel bandwidth?

Intuitively, one wants to choose an $h$ as small as the data allows, but there is a trade-off between the bias of the estimator and its variance. Selection of the bandwidth of a kernel estimator is a subject of considerable research. We will outline the most common approaches:

1. Subjective selection. One can experiment with different bandwidths and simply select the one that looks right for the type of data under investigation.

2. Selection with reference to some given distribution. Here one selects the bandwidth that would be optimal for a particular PDF. Keep in mind that you are not assuming that $f(x)$ is normal, but rather selecting an $h$ which would be optimal if the PDF were normal. Using a Gaussian kernel, the optimal bandwidth $h_{opt}$ is defined as follows:

$$h_{opt} = 1.06\,\sigma\,N^{-1/5}$$

The normal distribution is not a wiggly distribution; it is unimodal and bell-shaped. It is therefore to be expected that $h_{opt}$ will be too large for multimodal distributions. Furthermore, the sample variance ($s^2$) is not a robust estimator of $\sigma^2$; it overestimates $\sigma^2$ if some outliers (extreme observations) are present.
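The sensitivity of the normal-reference bandwidth to outliers can be illustrated with a short Python sketch. It assumes the commonly quoted form of the rule, h = 1.06 s N^(-1/5), with the sample standard deviation s standing in for the unknown σ. The function name, the random seed and the injected outlier value of 15 are all hypothetical choices for illustration.

```python
import numpy as np

def normal_reference_bandwidth(sample):
    """h = 1.06 * s * N**(-1/5), with s the sample standard deviation."""
    sample = np.asarray(sample, dtype=float)
    s = sample.std(ddof=1)  # ddof=1 gives the sample (not population) std
    return 1.06 * s * sample.size ** (-0.2)

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, 50)

# Hypothetical contamination: a single extreme observation
contaminated = np.append(clean, [15.0])

h_clean = normal_reference_bandwidth(clean)
h_contaminated = normal_reference_bandwidth(contaminated)

print(f"bandwidth without outlier: {h_clean:.3f}")
print(f"bandwidth with outlier:    {h_contaminated:.3f}")
```

A single extreme value inflates s, and with it the bandwidth, so the resulting density estimate is smoothed more than the bulk of the data warrants.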
To overcome these problems, Silverman proposed the following bandwidth estimator:

$$h_{opt} = \frac{0.9\,\hat{\sigma}}{\sqrt[5]{N}}, \qquad \hat{\sigma} = \min\!\left(s, \frac{R}{1.34}\right), \qquad R = IQR = Q_3 - Q_1$$

where IQR is the interquartile range and $s$ is the sample standard deviation.

3. Data-driven estimation. This is an area of current research using several different methods: Fourier transform, diffusion-based, etc.

Process

Using the NumXL add-in for Excel, you can compute the KDE values for different kernel functions (e.g. Gaussian, uniform, triangular, etc.) and, optionally, with a specified bandwidth value.

For our sample data, we are using 50 randomly generated values from the normal distribution (using the random generator in the Excel Analysis Pack). We plotted the histogram for reference.

Now we are ready to construct our KDE plot. First, select the empty cell in your worksheet where you wish the output table to be generated, then locate and click on the Descriptive Statistics icon in the NumXL tab (or toolbar). Then, select the Kernel density estimation item from the drop-down menu.
The KDE wizard appears. Select the cells range for the values of the input variable.

Notes:
1. The cells range may optionally include the heading (Label) cell, which is then used in the output tables wherever they reference that variable.
2. By default, the output table cells range is set to the currently selected cell in your worksheet.
3. By default, the output graph cells range is set to the 7 cells to the right of the currently selected cell in your worksheet.

Finally, once we select the input data (X) cells range, the Options and Missing Values tabs become available (enabled). Next, select the Options tab.
Notes:
1. By default, the Gaussian kernel function is selected. Let's leave this option unchanged.
2. By default, the optimal bandwidth option is checked. The KDE function will use the Silverman estimate for the bandwidth. Leave it checked.
3. By default, the output table size is set to 5. Leave it unchanged.
4. Overlay Normal distribution is checked. This option instructs the wizard to generate a second curve for the Gaussian distribution for comparison purposes. Leave this option checked.

Now, click on the Missing Values tab. In this tab, you can select an approach to handle missing values in the data set (X's). By default, any observation with a missing value is excluded from the analysis. This treatment is a good approach for our analysis, so let's leave it unchanged.

Now, click OK to generate the output tables.

Notes:
1. The values of all X are sorted in ascending order.
2. The summary statistics in the first row are computed merely to facilitate the creation of the table or the computation of the overlay Gaussian distribution function.

In the generated plot of the KDE, note that the KDE curve (blue) tracks very closely with the Gaussian density (orange) curve.

Case 2

Now let's try a non-normal sample data set. We generated 50 random values from a uniform distribution between -3 and 3. Following similar steps, we plotted the histogram and the KDE. Note that the KDE curve (blue) tracks much more closely with the underlying distribution (i.e. uniform) than the histogram.

Case 3

For our third case, we generated 50 random values from a binomial distribution (p = 0.2 and batch size = 20). Following similar steps, we plotted the histogram and the KDE.
Note that the KDE curve (blue) tracks much more closely with the underlying distribution (i.e. binomial) than the histogram.

Conclusion

In this tutorial, we demonstrated the process of generating a kernel density estimate in Excel using NumXL's add-in functions.

The KDE method is a major improvement for inferring the probability density function of the population, in terms of accuracy and continuity of the function. Nevertheless, it introduces a new challenge: selecting a proper bandwidth. In the majority of cases, the Silverman estimator for the bandwidth proves to be satisfactory, but is it optimal? Do we care?

Where do we go from here?

First, to answer the question of optimality, we need to introduce additional algorithms to estimate the bandwidth. For example, in Annals of Statistics, Volume 38, Number 5, Z. I. Botev, J. F. Grotowski, and D. P. Kroese described a numerical, data-driven method for finding the optimal bandwidth using kernel density estimation via the diffusion approach.

Second, in cases where the range of values that the random variable can take is known to be constrained from one side (e.g. prices, binomial data, etc.) or to an interval (e.g. survival rate, default rate, etc.), how do we adapt the KDE to factor in those constraints?

Finally, we defined the KDE probability estimator using a fixed bandwidth ($h$) for all observations. If the bandwidth is not held fixed, but is varied depending on the location of either the estimate (balloon estimator) or the samples (pointwise estimator), this produces a particularly powerful method known as adaptive or variable bandwidth kernel density estimation.
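For readers who want to cross-check the workflow outside Excel, the first case (50 normal values, Gaussian kernel, Silverman bandwidth, overlay normal curve) can be approximated with the Python sketch below. It does not reproduce NumXL's implementation: the function names, the random seed and the evaluation grid are assumptions made for illustration, and the quartiles come from NumPy's default percentile interpolation, which may differ from Excel's.

```python
import numpy as np

def silverman_bandwidth(sample):
    """h = 0.9 * min(s, IQR / 1.34) * N**(-1/5)."""
    sample = np.asarray(sample, dtype=float)
    s = sample.std(ddof=1)
    q1, q3 = np.percentile(sample, [25, 75])
    sigma_hat = min(s, (q3 - q1) / 1.34)
    return 0.9 * sigma_hat * sample.size ** (-0.2)

def kde_gaussian(x_eval, sample, h):
    """Fixed-bandwidth KDE with a standard normal kernel."""
    x_eval = np.asarray(x_eval, dtype=float)
    sample = np.asarray(sample, dtype=float)
    u = (x_eval[:, None] - sample[None, :]) / h
    kernel_values = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel_values.sum(axis=1) / (sample.size * h)

# Hypothetical stand-in for the 50 values generated in Excel
rng = np.random.default_rng(42)
sample = rng.normal(0.0, 1.0, 50)

h = silverman_bandwidth(sample)
grid = np.linspace(-6.0, 6.0, 1201)
kde_curve = kde_gaussian(grid, sample, h)

# Overlay: normal density using the sample mean and standard deviation
m, s = sample.mean(), sample.std(ddof=1)
overlay = np.exp(-0.5 * ((grid - m) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))

area = kde_curve.sum() * (grid[1] - grid[0])
max_gap = np.abs(kde_curve - overlay).max()
print(f"Silverman bandwidth: {h:.3f}")
print(f"area under KDE:      {area:.4f}")
print(f"max |KDE - overlay|: {max_gap:.4f}")
```

Replacing the `rng.normal` call with `rng.uniform(-3.0, 3.0, 50)` gives a rough analogue of the second case; the fixed bandwidth is kept throughout, so the boundary and adaptive-bandwidth questions raised above are not addressed by this sketch.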
More informationAdvanced Excel Skills
Advanced Excel Skills Note : This tutorial is based upon MSExcel 2000. If you are using MSExcel 2002, there may be some operations which look slightly different (e.g. pivot tables), but the same principles
More informationWeek 4: Describing data and estimation
Week 4: Describing data and estimation Goals Investigate sampling error; see that larger samples have less sampling error. Visualize confidence intervals. Calculate basic summary statistics using R. Calculate
More informationBandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints
IEEE SIGNAL PROCESSING LETTERS 1 Bandwidth Selection for Kernel Density Estimation Using Total Variation with Fourier Domain Constraints Alexander Suhre, Orhan Arikan, Member, IEEE, and A. Enis Cetin,
More informationQuality and Six Sigma Tools using MINITAB Statistical Software: A complete Guide to Six Sigma DMAIC Tools using MINITAB
Samples from MINITAB Book Quality and Six Sigma Tools using MINITAB Statistical Software A complete Guide to Six Sigma DMAIC Tools using MINITAB Prof. Amar Sahay, Ph.D. One of the major objectives of this
More informationSTA Module 4 The Normal Distribution
STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally
More informationSTA /25/12. Module 4 The Normal Distribution. Learning Objectives. Let s Look at Some Examples of Normal Curves
STA 2023 Module 4 The Normal Distribution Learning Objectives Upon completing this module, you should be able to 1. Explain what it means for a variable to be normally distributed or approximately normally
More informationAccess 2013 Introduction to Forms and Reports
Forms Overview You can create forms to present data in a more attractive and easier to use format They can be used for viewing, editing and printing data and in advanced cases, used to automate the database
More informationToday Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function.
Today Function The today function: =TODAY() It has no arguments, and returns the date that the computer is set to. It is volatile, so if you save it and reopen the file one month later the new, updated
More informationFrequency tables Create a new Frequency Table
Frequency tables Create a new Frequency Table Contents FREQUENCY TABLES CREATE A NEW FREQUENCY TABLE... 1 Results Table... 2 Calculate Descriptive Statistics for Frequency Tables... 6 Transfer Results
More informationThese slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Machine Learning Algorithms (IFT6266 A7) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
More informationPre-Lab Excel Problem
Pre-Lab Excel Problem Read and follow the instructions carefully! Below you are given a problem which you are to solve using Excel. If you have not used the Excel spreadsheet a limited tutorial is given
More informationSAS (Statistical Analysis Software/System)
SAS (Statistical Analysis Software/System) SAS Adv. Analytics or Predictive Modelling:- Class Room: Training Fee & Duration : 30K & 3 Months Online Training Fee & Duration : 33K & 3 Months Learning SAS:
More informationSTA Module 2B Organizing Data and Comparing Distributions (Part II)
STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and
More informationSTA Learning Objectives. Learning Objectives (cont.) Module 2B Organizing Data and Comparing Distributions (Part II)
STA 2023 Module 2B Organizing Data and Comparing Distributions (Part II) Learning Objectives Upon completing this module, you should be able to 1 Explain the purpose of a measure of center 2 Obtain and
More information