CALCULATION OF OPERATIONAL LOSSES WITH NON-PARAMETRIC APPROACH: DERAILMENT LOSSES


2nd International Symposium on Rail Systems Engineering (ISERSE 13), 9-11 October 2013, Karabük, Turkey

Zübeyde Öztürk a and Ö. Emre Özcan b*
a ITU Faculty of Civil Engineering, Transportation Department, Istanbul, Turkey
b* Vitsan Lloyd's Agent, Istanbul, Turkey, emre.ozcan@ttmail.com

Abstract

In railway transportation systems, the probability density functions of expected operational losses should be defined in order to control risk and reduce probable operational losses. If the independent variables are not known, the probability distributions of operational losses can be calculated by using a histogram and a parametric distribution family. However, if the functional form of the population distribution is not known and the probability density function has multiple peaks, the results obtained from parametric distribution families can be far from the actual values. In this case, the univariate probability density functions of operational losses can be derived with the kernel estimation method, which is a non-parametric approach. The reliability of kernel density estimation depends on the choice of the smoothing parameter. The optimal smoothing parameter, which determines the accuracy of a density estimator, is obtained by minimizing the mean integrated square error. In this study, the univariate probability density function of derailment hazard events, which are one type of operational loss, is calculated by using different kernel density functions. The severity and the frequency of the events are obtained from the derailment hazard events that occurred in Turkish State Railways Region I between 2000 and 2010. Additionally, aggregate losses for the derailment hazard events are calculated with a parametric distribution family and the results are compared.
Keywords: Kernel estimation, smoothing parameter, operational losses, mean integrated square error, derailment.

1. Introduction

The probability distribution of a random variable is described in terms of its probability density function (PDF) or cumulative distribution function (CDF). Density estimation of operational losses deals with the problem of estimating the PDF from historical data sampled from that PDF. The parametric approach assumes a parametric family of distributions and estimates its parameters from the data. This approach has advantages as long as the distributional family is correct and the distribution of the random variable does not have an irregular shape [1]. To avoid these restrictive assumptions, the non-parametric approach is useful for estimating the form of the distribution. In kernel density estimation (KDE), determining the bandwidth (also called the smoothing parameter) is essential. The kernel estimator approximates the PDF by averaging the kernel functions centered on all data points [2], and the bias of kernel density estimators depends on the smoothing parameter [3,4]. In this study, we examine the expected derailment losses based on KDE and on a parametric distribution family, and compare the results.

2. Kernel Density Estimator

The non-parametric approach avoids restrictive assumptions about the form of the PDF and estimates probability values from the data. The general formula for the kernel density estimator is:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right) \qquad (1.1)$$

where $h$ is the smoothing parameter, $n$ is the number of observations and $K$ is a kernel function that is symmetric around zero and integrates to 1. Various KDE functions have been proposed. Some of them are shown in Table 1.

Table 1. KDE functions.

Kernel          K(u)                                      Bounds
Epanechnikov    $\frac{3}{4}(1 - u^2)$                    $|u| \le 1$
Gaussian        $\frac{1}{\sqrt{2\pi}} e^{-u^2/2}$        $-\infty < u < \infty$
Triweight       $\frac{35}{32}(1 - u^2)^3$                $|u| \le 1$
Quartic         $\frac{15}{16}(1 - u^2)^2$                $|u| \le 1$

Expression (1.2) provides the cumulative distribution function of the KDE functions.

$$\hat{F}(x) = \int_{-\infty}^{x} \hat{f}(t)\,dt = \frac{1}{n} \sum_{i=1}^{n} \int_{-\infty}^{(x - x_i)/h} K(u)\,du \qquad (1.2)$$

One simple way of choosing the smoothing parameter $h$ is to compare visually the density estimates obtained for arbitrary choices of $h$. Among several principles for optimizing the smoothing parameter, this study applies the mean integrated square error (MISE), which quantifies the accuracy of a density estimator, to the derailment density estimators. The optimal smoothing parameter yields an estimated density that is as close as possible to the true density at all points.

$$\mathrm{MISE}(h) = \int \left( E[\hat{f}(x)] - f(x) \right)^2 dx + \int \mathrm{Var}[\hat{f}(x)]\,dx \qquad (1.3)$$

The first term of the MISE criterion depends on the true (unknown) density $f(x)$. Therefore, approximations to this criterion are used (i.e. the asymptotic approach). The problem of automatic choice of the smoothing parameter has been widely studied. By minimizing the asymptotic approximation of expression (1.3) with respect to $h$, the optimal smoothing parameter is obtained ($n$ is the number of data points):

$$h_{opt} = \left( \frac{R(K)}{\mu_2(K)^2 \, R(f'') \, n} \right)^{1/5} \qquad (1.4)$$

where $R(g) = \int g(u)^2\,du$ and $\mu_2(K) = \int u^2 K(u)\,du$. Substituting expression (1.4) into expression (1.3) gives the minimal MISE. However, expression (1.4) depends on the unknown $f''$. With data-driven techniques (plug-in, cross-validation, bootstrap estimators, etc.), the unknown quantity can be estimated [4]. If $f(x)$ is normal with standard deviation $\sigma$, the unknown term can be expressed by (1.5):

$$R(f'') = \int f''(x)^2\,dx = \frac{3}{8\sqrt{\pi}\,\sigma^5} \qquad (1.5)$$

Practical recommendations for the choice of the optimal smoothing parameter, which minimizes the MISE criterion, are given in [5] as expressions (1.6) and (1.7).

(1.6) [expression not legible in the transcription]
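The estimator (1.1), the kernels of Table 1 and a normal-reference smoothing parameter can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `KERNELS`, `kde`, `integrate` and `normal_reference_bandwidth` and the `losses` values are hypothetical, and the bandwidth assumes the true density is normal, so that the integral of the squared second derivative equals 3 / (8 sqrt(pi) sigma^5).

```python
import math

# Kernel functions K(u) from Table 1; all but the Gaussian vanish for |u| > 1.
KERNELS = {
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "triweight": lambda u: (35 / 32) * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "quartic": lambda u: (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
}


def kde(x, data, h, kernel="quartic"):
    """Kernel density estimate (1.1): (1 / (n h)) * sum_i K((x - x_i) / h)."""
    k = KERNELS[kernel]
    return sum(k((x - xi) / h) for xi in data) / (len(data) * h)


def integrate(f, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return width * sum(f(a + (i + 0.5) * width) for i in range(steps))


def normal_reference_bandwidth(data, kernel="quartic"):
    """Asymptotically optimal h, assuming the true density is normal."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((xi - mean) ** 2 for xi in data) / (n - 1))
    k = KERNELS[kernel]
    lo, hi = (-8.0, 8.0) if kernel == "gaussian" else (-1.0, 1.0)
    r_k = integrate(lambda u: k(u) ** 2, lo, hi)       # integral of K(u)^2
    mu2 = integrate(lambda u: u * u * k(u), lo, hi)    # second moment of K
    r_f2 = 3 / (8 * math.sqrt(math.pi) * sigma ** 5)   # normal-density assumption
    return (r_k / (mu2 ** 2 * r_f2 * n)) ** 0.2


# Hypothetical loss severities with two clusters (not the paper's data).
losses = [1.2, 1.9, 2.3, 2.4, 3.1, 7.8, 8.4, 9.0, 9.3, 10.1]
h = normal_reference_bandwidth(losses, "quartic")
density_at_2 = kde(2.0, losses, h, "quartic")
```

For the Gaussian kernel this bandwidth reduces to roughly 1.06 times the sample standard deviation times n^(-1/5). Because the normal assumption tends to oversmooth densities with several peaks, it is best read as a starting value rather than a final choice.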

(1.7) [expression not legible in the transcription]

In these expressions, one term is the standard deviation of the data and the other denotes points taken at regular intervals from the cumulative distribution function (CDF) of the data.

3. Derailment Losses

Derailment is one of the railway accident risks, also called an operational loss, and takes place when a railway car (rolling stock) runs off its track. By improving safety measures or applying risk control, derailment accidents have been brought to a lower level in some countries (in the US, derailments dropped from 1,000 in 1986 to 500 in 2010, and in the UK, derailments dropped from 26 in 2006 to 16 in 2012 [6]). In Turkey, the severity and the frequency of derailment hazard events were obtained from archive research in Turkish State Railways Region I (the railway lines in Turkey are divided into seven regions) between 2000 and 2010.

The derailment risks to the railway undertaking are treated separately in the figures below. The numbers of derailment events (derailment accident frequencies) are illustrated in Figure 1. As Figure 1 shows, station derailments are the most frequent derailment type. The observed total costs of station derailments in Figure 2 are USD 414,520.0, USD 367,548.0 and USD 419,899.0 for vehicle, superstructure and train delay costs, respectively (these figures are actualized; fatality and injury compensation for the workforce, passengers and members of the public is not included). There were 6 other derailments (branch lines, depots, ports) during 2000/10. These are classified under other derailment groups, account for around 1.2% of the total number of derailment events and contribute around USD 45,340.0 (vehicle, superstructure and delay costs) to the total derailment event costs.

Figure 1. Derailment events.

Figure 2. Derailment costs at stations.

Figure 3 shows the derailment costs that occurred at switches.
At switches, the total derailment cost from train delays is USD 169,928.0, whereas the vehicle and superstructure costs are USD 236,390.0 and USD 104,013.0, respectively.

Most of the derailment risk to the railway undertaking arises from derailment accidents on main lines, which account for USD 4,940,216.0 in total between 2000 and 2010 (Figure 4).

Figure 3. Derailment costs at switches.

Figure 4. Derailment costs on main lines.

Non-Parametric PDF Calculation: Derailment Losses

Non-parametric methods do not rely on assumptions about the shape or parameters of the underlying population distribution. If the data deviate strongly from the assumptions of a parametric procedure, using the parametric procedure could lead to incorrect conclusions. Based on the historical data for the derailment hazard events between 2000 and 2010, the severity PDF is fitted best by the Quartic kernel function with its optimum smoothing parameter (Figure 5; the numerical values of the smoothing parameters are not legible in the transcription). Table 2 lists the aggregate errors of the kernel functions with respect to the true PDF values.

Figure 5. Non-parametric PDF estimators. (Plot title: Non-Parametric Probability Density Functions (Derailments); curves: True, Epanechnikov, Gaussian, Triweight, Quartic, each kernel labeled with its smoothing parameter; axes: x, f(x).)

Table 2. Non-parametric PDF errors with respect to the true density. (Columns: Epanechnikov, Gaussian, Triweight, Quartic; rows: Aggregate Error, Average Error; the numerical entries are not legible in the transcription.)
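A comparison of this kind can be sketched as follows. The text does not state how the aggregate and average errors are computed, so this sketch assumes the sum and the mean of absolute differences between each kernel estimate and a reference density on a regular grid. The reference density `true_pdf` (a two-peaked normal mixture), the simulated `sample`, the grid and the candidate bandwidths are all hypothetical.

```python
import math
import random

KERNELS = {
    "epanechnikov": lambda u: 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0,
    "gaussian": lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi),
    "triweight": lambda u: (35 / 32) * (1 - u * u) ** 3 if abs(u) <= 1 else 0.0,
    "quartic": lambda u: (15 / 16) * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0,
}


def kde(x, data, h, kernel):
    """Kernel density estimate at x for bandwidth h and kernel function `kernel`."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def true_pdf(x):
    """Hypothetical two-peaked density: equal mixture of N(2, 0.5^2) and N(6, 1^2)."""
    return 0.5 * normal_pdf(x, 2.0, 0.5) + 0.5 * normal_pdf(x, 6.0, 1.0)


random.seed(0)
sample = [
    random.gauss(2.0, 0.5) if random.random() < 0.5 else random.gauss(6.0, 1.0)
    for _ in range(200)
]
grid = [i * 0.1 for i in range(101)]             # evaluation points 0.0 .. 10.0
candidates = [0.2 + 0.1 * i for i in range(10)]  # trial bandwidths 0.2 .. 1.1


def errors(name, h):
    """Aggregate (sum) and average absolute error of one kernel on the grid."""
    diffs = [abs(kde(x, sample, h, KERNELS[name]) - true_pdf(x)) for x in grid]
    return sum(diffs), sum(diffs) / len(diffs)


results = {}
for name in KERNELS:
    # Pick the candidate bandwidth with the smallest aggregate error.
    best_h = min(candidates, key=lambda h: errors(name, h)[0])
    results[name] = (best_h, *errors(name, best_h))

best_kernel = min(results, key=lambda name: results[name][1])
```

Here `results` maps each kernel to its selected bandwidth, aggregate error and average error, and `best_kernel` names the kernel with the smallest aggregate error. With real loss data the true density is unknown, so a reference such as a fine histogram would have to stand in for `true_pdf`.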

Parametric PDF Calculation: Derailment Losses

Parametric calculations are based on assumptions about the distribution of the underlying population from which the sample was taken. This approach has advantages as long as the theoretical distribution is correct. The main disadvantage of the parametric approach is its lack of flexibility (i.e. restrictions on the shape of $f(x)$). The parametric distributions that perform best in the goodness-of-fit tests are listed with their parameters in Table 3.

Table 3. Parametric distributions for derailment hazard events between 2000 and 2010. (K-S: Kolmogorov-Smirnov, A-D: Anderson-Darling, C-S: Chi-Squared. The original columns are the test statistic, the P-value and the critical values at 95%, 98% and 99%; only the entries below are legible in the transcription.)

Distribution    Tests            Parameters
Wakeby          K-S, A-D, C-S    α=4.8665; β, γ, δ, ξ not legible
Gen. Gamma      K-S, A-D, C-S    k=4.8382; α, β, γ not legible
Dagum 4P        K-S, A-D, C-S    k, α, β, γ not legible

Table 4 lists the aggregate errors of the theoretical distributions with respect to the true PDF values. Figure 6 shows the PDF values of the best-fitting parametric distributions.

Table 4. PDF errors of parametric distributions. (Columns: GenGamma, Wakeby, Dagum 4P; rows: Aggregate Error, Average Error; the numerical entries are not legible in the transcription.)

Figure 6. Parametric PDF estimators. (Plot title: Parametric Probability Density Functions (Derailments); curves: True, Wakeby, Dagum4P, GenGamma; axes: x, f(x).)
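The fit-then-test workflow behind such a table can be illustrated with the Kolmogorov-Smirnov statistic. The sketch below swaps in a lognormal severity model, because the Wakeby, generalized gamma and Dagum families are not available in the Python standard library; it therefore shows the procedure, not the distributions used in the study. The simulated `losses`, the function names and the asymptotic 95% critical value 1.358 / sqrt(n) are assumptions of this example, and that critical value is only a rough guide when the parameters are estimated from the same data.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def fit_lognormal(data):
    """Maximum-likelihood lognormal fit: mean and std of the log-losses."""
    logs = [math.log(x) for x in data]
    return fmean(logs), pstdev(logs)


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal variable with log-mean mu and log-std sigma."""
    return NormalDist(mu, sigma).cdf(math.log(x))


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF of data and the model CDF."""
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs)
    )


random.seed(1)
# Hypothetical loss severities in USD (not the paper's data).
losses = [random.lognormvariate(9.0, 1.2) for _ in range(200)]

mu, sigma = fit_lognormal(losses)
d = ks_statistic(losses, lambda x: lognormal_cdf(x, mu, sigma))
critical_95 = 1.358 / math.sqrt(len(losses))  # asymptotic 5% critical value
not_rejected = d < critical_95
```

Repeating the same steps for each candidate family and ranking them by the statistic gives a comparison in the spirit of Table 3.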

Expected Derailment Operational Losses

Figure 7 illustrates the best parametric and non-parametric PDF values for the derailment costs in 2000/2010, during which a total of 501 accidents occurred in Region I. Calculating the expected operational losses for the derailment hazard events requires estimating the PDFs of the severity and frequency random variables under sound assumptions. The severity and the frequency of risks are modeled as separate processes. In these models, the severity and the frequency of losses are combined into the aggregate losses, and the distribution of aggregate losses can be obtained [7].

Figure 7. The best parametric and non-parametric PDF estimators. (Plot title: Parametric & Non-Parametric Probability Density Functions (Derailments); curves: True, Wakeby, Quartic; axes: x, f(x).)

In this study, according to the goodness-of-fit tests on the historical data, the frequency distribution is fitted best by the Poisson distribution.

Table 5. Parametric and non-parametric approaches for aggregate PDF and expected losses. (The table entries are not legible in the transcription.)

The expected losses from derailment risks in 2013 are calculated at USD 165,528.91 with the non-parametric modeling, whereas the parametric modeling gives roughly USD 122 thousand, a margin of error of 26% relative to the non-parametric modeling.

4. Conclusion

Modeling derailment risks through severity and frequency distributions yields the expected losses, provided that the assumed parametric distribution family is appropriate. However, when the form of the distribution fluctuates, non-parametric density estimators are more accurate than parametric modeling. In kernel density estimation, the error of the PDF approximation is controlled by minimizing the mean integrated square error (MISE) between the true density and the approximation, which gives the optimum smoothing parameter. In this study, expected losses for derailment hazard events are investigated with parametric and non-parametric modeling. Based on the historical data, since the true density function has an irregular shape, the Quartic PDF estimator gives a better result than the parametric model (Wakeby), which estimates the PDF with a 16% aggregate error rate relative to the kernel estimator (Quartic kernel). Although the aggregate error rate of the parametric distribution (Wakeby) is 16%, this figure increases to a total error rate of 26% in terms of aggregate expected loss.

References

[1] Zucchini, W., Applied Smoothing Techniques, Part 1: Kernel Density Estimation, Temple University Book, October.
[2] Roberts, S.J., Parametric and Non-parametric Unsupervised Cluster Analysis, Pattern Recognition, 30(2).
[3] Loader, C., 1999. Bandwidth Selection: Classical or Plug-in?, The Annals of Statistics, 27(2).
[4] Wand, M.P. and Jones, M.C., Kernel Smoothing, Chapman and Hall, London.
[5] Cameron, A.C. and Trivedi, P.K., Microeconometrics: Methods and Applications, Cambridge University Press, New York, 2005.
[6] RSSB, Annual Safety Performance Report 2012/2013, Rail Safety and Standards Board.
[7] Öztürk, Z. and Özcan, Ö.E., Calculation of Aggregate Losses with Collective Risk Model in Public Transport Systems, Transist2012, November 2012.


More information

GPUML: Graphical processors for speeding up kernel machines

GPUML: Graphical processors for speeding up kernel machines GPUML: Graphical processors for speeding up kernel machines http://www.umiacs.umd.edu/~balajiv/gpuml.htm Balaji Vasan Srinivasan, Qi Hu, Ramani Duraiswami Department of Computer Science, University of

More information

Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling

Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling Nonparametric Survey Regression Estimation in Two-Stage Spatial Sampling Siobhan Everson-Stewart, F. Jay Breidt, Jean D. Opsomer January 20, 2004 Key Words: auxiliary information, environmental surveys,

More information

Frequency Distributions

Frequency Distributions Displaying Data Frequency Distributions After collecting data, the first task for a researcher is to organize and summarize the data so that it is possible to get a general overview of the results. Remember,

More information

Tracking Computer Vision Spring 2018, Lecture 24

Tracking Computer Vision Spring 2018, Lecture 24 Tracking http://www.cs.cmu.edu/~16385/ 16-385 Computer Vision Spring 2018, Lecture 24 Course announcements Homework 6 has been posted and is due on April 20 th. - Any questions about the homework? - How

More information

Tsutomu Morimura Central Japan Railway Company

Tsutomu Morimura Central Japan Railway Company 1 Introduction of the N700-I Bullet Train The World s Safest and Most Efficient High Speed Rail System January 13, 2012 Tsutomu Morimura Central Japan Railway Company 2 Contents of Presentation - Overview

More information

A Modified Weibull Distribution

A Modified Weibull Distribution IEEE TRANSACTIONS ON RELIABILITY, VOL. 52, NO. 1, MARCH 2003 33 A Modified Weibull Distribution C. D. Lai, Min Xie, Senior Member, IEEE, D. N. P. Murthy, Member, IEEE Abstract A new lifetime distribution

More information

COPULA MODELS FOR BIG DATA USING DATA SHUFFLING

COPULA MODELS FOR BIG DATA USING DATA SHUFFLING COPULA MODELS FOR BIG DATA USING DATA SHUFFLING Krish Muralidhar, Rathindra Sarathy Department of Marketing & Supply Chain Management, Price College of Business, University of Oklahoma, Norman OK 73019

More information

Meta-model based optimization of spot-welded crash box using differential evolution algorithm

Meta-model based optimization of spot-welded crash box using differential evolution algorithm Meta-model based optimization of spot-welded crash box using differential evolution algorithm Abstract Ahmet Serdar Önal 1, Necmettin Kaya 2 1 Beyçelik Gestamp Kalip ve Oto Yan San. Paz. ve Tic. A.Ş, Bursa,

More information

Generalized Additive Model

Generalized Additive Model Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1

More information

Unsupervised Learning and Clustering

Unsupervised Learning and Clustering Unsupervised Learning and Clustering Selim Aksoy Department of Computer Engineering Bilkent University saksoy@cs.bilkent.edu.tr CS 551, Spring 2008 CS 551, Spring 2008 c 2008, Selim Aksoy (Bilkent University)

More information

Self-consistent density estimation

Self-consistent density estimation Self-consistent density estimation Joerg Luedicke Alberto Bernacchia Manuscript currently under review by The Stata Journal 5 April 2013 Contact: joerg.luedicke@ufl.edu The Stata Journal (yyyy) vv, Number

More information

Nonparametric Methods Recap

Nonparametric Methods Recap Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority

More information

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani

Feature Selection. CE-725: Statistical Pattern Recognition Sharif University of Technology Spring Soleymani Feature Selection CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Dimensionality reduction Feature selection vs. feature extraction Filter univariate

More information

Generative and discriminative classification techniques

Generative and discriminative classification techniques Generative and discriminative classification techniques Machine Learning and Category Representation 2014-2015 Jakob Verbeek, November 28, 2014 Course website: http://lear.inrialpes.fr/~verbeek/mlcr.14.15

More information

Programs for MDE Modeling and Conditional Distribution Calculation

Programs for MDE Modeling and Conditional Distribution Calculation Programs for MDE Modeling and Conditional Distribution Calculation Sahyun Hong and Clayton V. Deutsch Improved numerical reservoir models are constructed when all available diverse data sources are accounted

More information

Active learning for visual object recognition

Active learning for visual object recognition Active learning for visual object recognition Written by Yotam Abramson and Yoav Freund Presented by Ben Laxton Outline Motivation and procedure How this works: adaboost and feature details Why this works:

More information

Target Tracking Based on Mean Shift and KALMAN Filter with Kernel Histogram Filtering

Target Tracking Based on Mean Shift and KALMAN Filter with Kernel Histogram Filtering Target Tracking Based on Mean Shift and KALMAN Filter with Kernel Histogram Filtering Sara Qazvini Abhari (Corresponding author) Faculty of Electrical, Computer and IT Engineering Islamic Azad University

More information

SYMMETRIZED NEAREST NEIGHBOR REGRESSION ESTIMATES

SYMMETRIZED NEAREST NEIGHBOR REGRESSION ESTIMATES SYMMETRZED NEAREST NEGHBOR REGRESSON ESTMATES R. J. CarrolP W. Hardle 2 1 Department of Statistics, Texas A&M University, College Station, TX 77843 (USA). Research supported by the Air Force Office of

More information

GLM II. Basic Modeling Strategy CAS Ratemaking and Product Management Seminar by Paul Bailey. March 10, 2015

GLM II. Basic Modeling Strategy CAS Ratemaking and Product Management Seminar by Paul Bailey. March 10, 2015 GLM II Basic Modeling Strategy 2015 CAS Ratemaking and Product Management Seminar by Paul Bailey March 10, 2015 Building predictive models is a multi-step process Set project goals and review background

More information

An Introduction to Machine Learning

An Introduction to Machine Learning TRIPODS Summer Boorcamp: Topology and Machine Learning August 6, 2018 General Set-up Introduction Set-up and Goal Suppose we have X 1,X 2,...,X n data samples. Can we predict properites about any given

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Many slides adapted from B. Schiele Machine Learning Lecture 3 Probability Density Estimation II 26.04.2016 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course

More information

What is machine learning?

What is machine learning? Machine learning, pattern recognition and statistical data modelling Lecture 12. The last lecture Coryn Bailer-Jones 1 What is machine learning? Data description and interpretation finding simpler relationship

More information

Truss structural configuration optimization using the linear extended interior penalty function method

Truss structural configuration optimization using the linear extended interior penalty function method ANZIAM J. 46 (E) pp.c1311 C1326, 2006 C1311 Truss structural configuration optimization using the linear extended interior penalty function method Wahyu Kuntjoro Jamaluddin Mahmud (Received 25 October

More information

Machine Learning Lecture 3

Machine Learning Lecture 3 Course Outline Machine Learning Lecture 3 Fundamentals (2 weeks) Bayes Decision Theory Probability Density Estimation Probability Density Estimation II 26.04.206 Discriminative Approaches (5 weeks) Linear

More information

Chapter 12: Statistics

Chapter 12: Statistics Chapter 12: Statistics Once you have imported your data or created a geospatial model, you may wish to calculate some simple statistics, run some simple tests, or see some traditional plots. On the main

More information

Nokia Services Current State and Future Direction Niklas Savander

Nokia Services Current State and Future Direction Niklas Savander Nokia Services Current State and Future Direction Niklas Savander 1 2008 Nokia Disclaimer It should be noted that certain statements herein which are not historical facts, including, without limitation,

More information

Discount curve estimation by monotonizing McCulloch Splines

Discount curve estimation by monotonizing McCulloch Splines Discount curve estimation by monotonizing McCulloch Splines H.Dette, D.Ziggel Ruhr-Universität Bochum Fakultät für Mathematik 44780 Bochum, Germany e-mail: holger.dette@ruhr-uni-bochum.de Februar 2006

More information

Performance and cost effectiveness of caching in mobile access networks

Performance and cost effectiveness of caching in mobile access networks Performance and cost effectiveness of caching in mobile access networks Jim Roberts (IRT-SystemX) joint work with Salah Eddine Elayoubi (Orange Labs) ICN 2015 October 2015 The memory-bandwidth tradeoff

More information

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Bootstrapping Method for  14 June 2016 R. Russell Rhinehart. Bootstrapping Bootstrapping Method for www.r3eda.com 14 June 2016 R. Russell Rhinehart Bootstrapping This is extracted from the book, Nonlinear Regression Modeling for Engineering Applications: Modeling, Model Validation,

More information

Constructing Statistical Tolerance Limits for Non-Normal Data. Presented by Dr. Neil W. Polhemus

Constructing Statistical Tolerance Limits for Non-Normal Data. Presented by Dr. Neil W. Polhemus Constructing Statistical Tolerance Limits for Non-Normal Data Presented by Dr. Neil W. Polhemus Statistical Tolerance Limits Consider a sample of n observations taken from a continuous population. {X 1,

More information

Approximation of 3D-Parametric Functions by Bicubic B-spline Functions

Approximation of 3D-Parametric Functions by Bicubic B-spline Functions International Journal of Mathematical Modelling & Computations Vol. 02, No. 03, 2012, 211-220 Approximation of 3D-Parametric Functions by Bicubic B-spline Functions M. Amirfakhrian a, a Department of Mathematics,

More information

Effective probabilistic stopping rules for randomized metaheuristics: GRASP implementations

Effective probabilistic stopping rules for randomized metaheuristics: GRASP implementations Effective probabilistic stopping rules for randomized metaheuristics: GRASP implementations Celso C. Ribeiro Isabel Rosseti Reinaldo C. Souza Universidade Federal Fluminense, Brazil July 2012 1/45 Contents

More information

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni

Nonparametric Risk Attribution for Factor Models of Portfolios. October 3, 2017 Kellie Ottoboni Nonparametric Risk Attribution for Factor Models of Portfolios October 3, 2017 Kellie Ottoboni Outline The problem Page 3 Additive model of returns Page 7 Euler s formula for risk decomposition Page 11

More information

Continuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit.

Continuous Improvement Toolkit. Normal Distribution. Continuous Improvement Toolkit. Continuous Improvement Toolkit Normal Distribution The Continuous Improvement Map Managing Risk FMEA Understanding Performance** Check Sheets Data Collection PDPC RAID Log* Risk Analysis* Benchmarking***

More information

Model Based Symbolic Description for Big Data Analysis

Model Based Symbolic Description for Big Data Analysis Model Based Symbolic Description for Big Data Analysis 1 Model Based Symbolic Description for Big Data Analysis *Carlo Drago, **Carlo Lauro and **Germana Scepi *University of Rome Niccolo Cusano, **University

More information

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning

Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Online Pattern Recognition in Multivariate Data Streams using Unsupervised Learning Devina Desai ddevina1@csee.umbc.edu Tim Oates oates@csee.umbc.edu Vishal Shanbhag vshan1@csee.umbc.edu Machine Learning

More information

Package r2d2. February 20, 2015

Package r2d2. February 20, 2015 Package r2d2 February 20, 2015 Version 1.0-0 Date 2014-03-31 Title Bivariate (Two-Dimensional) Confidence Region and Frequency Distribution Author Arni Magnusson [aut], Julian Burgos [aut, cre], Gregory

More information

Optimization and Simulation

Optimization and Simulation Optimization and Simulation Statistical analysis and bootstrapping Michel Bierlaire Transport and Mobility Laboratory School of Architecture, Civil and Environmental Engineering Ecole Polytechnique Fédérale

More information

Automatic basis selection for RBF networks using Stein s unbiased risk estimator

Automatic basis selection for RBF networks using Stein s unbiased risk estimator Automatic basis selection for RBF networks using Stein s unbiased risk estimator Ali Ghodsi School of omputer Science University of Waterloo University Avenue West NL G anada Email: aghodsib@cs.uwaterloo.ca

More information

Chapter 6: Examples 6.A Introduction

Chapter 6: Examples 6.A Introduction Chapter 6: Examples 6.A Introduction In Chapter 4, several approaches to the dual model regression problem were described and Chapter 5 provided expressions enabling one to compute the MSE of the mean

More information

Notes and Announcements

Notes and Announcements Notes and Announcements Midterm exam: Oct 20, Wednesday, In Class Late Homeworks Turn in hardcopies to Michelle. DO NOT ask Michelle for extensions. Note down the date and time of submission. If submitting

More information