CALCULATION OF OPERATIONAL LOSSES WITH NON-PARAMETRIC APPROACH: DERAILMENT LOSSES

2nd International Symposium on Railway Systems Engineering (ISERSE 13), 9-11 October 2013, Karabük, Turkey

Zübeyde Öztürk a and Ö. Emre Özcan b*
a ITU Faculty of Civil Engineering, Transportation Department, Istanbul, Turkey
b* Vitsan, Lloyd's Agent, Istanbul, Turkey, emre.ozcan@ttmail.com

Abstract
In railway transportation systems, the probability density functions of expected operational losses should be defined in order to control risk and reduce probable operational losses. If the independent variables are not known, the probability distributions of operational losses can be calculated by using histograms and parametric distribution families. However, if the functional form of the population distribution is not known and the probability density function has multiple peaks, the results obtained from parametric distribution families can be far from the actual values. In this case, the univariate probability density functions of operational losses can be derived by kernel density estimation, a non-parametric method. The reliability of kernel density estimation, however, depends on the choice of the smoothing parameter. The optimal smoothing parameter is obtained by minimizing the mean integrated squared error, which measures the accuracy of a density estimator. In this study, the univariate probability density function of derailment hazard events, one type of operational loss, is calculated with different kernel functions. The severity and frequency of the events are obtained from the derailment hazard events that occurred in Turkish State Railways Region I between 2000 and 2010. Additionally, aggregate losses for the derailment hazard events are calculated with parametric distribution families and the results are compared.
Keywords: Kernel estimation, smoothing parameter, operational losses, mean integrated square error, derailment.

1. Introduction
The probability distribution of a random variable is described by its probability density function (PDF) or its cumulative distribution function (CDF). Density estimation of operational losses deals with the problem of estimating the PDF from historical data sampled from it. The parametric approach assumes a family of distributions and estimates the parameters of the random variable. This approach works well as long as the assumed distributional family is correct and the distribution of the random variable does not have an irregular shape [1]. To avoid these restrictive assumptions, a non-parametric approach can be used to estimate the form of the distribution. In kernel density estimation (KDE), the choice of the bandwidth (also called the smoothing parameter) is essential. The kernel estimator approximates the PDF by averaging the kernel functions centered at all data points [2], and the bias of kernel density estimators depends on the smoothing parameter [3,4]. In this study, we examine the expected derailment losses based on KDE and on parametric distribution families, and compare the results.

2. Kernel Density Estimator
The non-parametric approach avoids restrictive assumptions about the form of the PDF and estimates probability values directly from the data. The general formula for the kernel density estimator is:

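$$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \qquad (1.1)$$

where $h$ is the smoothing parameter, $n$ is the number of observations and $K$ is a kernel function that is symmetric around zero and integrates to 1. Various kernel functions have been proposed; some of them are shown in Table 1.

Table 1. KDE functions.
Kernel | K(u) | Bounds
Epanechnikov | $\frac{3}{4}(1-u^2)$ | $-1 \le u \le 1$
Gaussian | $\frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}$ | $-\infty < u < \infty$
Triweight | $\frac{35}{32}(1-u^2)^3$ | $-1 \le u \le 1$
Quartic | $\frac{15}{16}(1-u^2)^2$ | $-1 \le u \le 1$

Expression (1.2) gives the cumulative distribution function of the kernel density estimator:

$$\hat F_h(x) = \frac{1}{n}\sum_{i=1}^{n} \mathcal{K}\left(\frac{x - X_i}{h}\right), \qquad \mathcal{K}(u) = \int_{-\infty}^{u} K(t)\,dt \qquad (1.2)$$

One simple way of choosing the smoothing parameter $h$ is to compare visually the density estimates obtained with arbitrary choices of $h$. Among the several principles for optimizing the smoothing parameter, this study applies the mean integrated squared error (MISE), which quantifies the accuracy of a density estimator, to the derailment density estimators. The optimal smoothing parameter yields, over all points, an estimated density as close as possible to the true density:

$$\mathrm{MISE}(\hat f_h) = \int \left(E[\hat f_h(x)] - f(x)\right)^2 dx + \int \mathrm{Var}[\hat f_h(x)]\,dx \qquad (1.3)$$

The first term of the MISE criterion (the integrated squared bias) depends on the true, unknown density $f(x)$. Therefore, approximations to this criterion are used (i.e. an asymptotic approach). The problem of automatic selection of the smoothing parameter has been widely studied. By minimizing the asymptotic form of expression (1.3) with respect to $h$, the optimal smoothing parameter is obtained ($n$ is the number of data points):

$$h_{opt} = \left(\frac{\int K(u)^2\,du}{\left(\int u^2 K(u)\,du\right)^2 \int f''(x)^2\,dx}\right)^{1/5} n^{-1/5} \qquad (1.4)$$

Substituting expression (1.4) into expression (1.3) gives the minimal MISE. However, expression (1.4) depends on the unknown $f''$. With data-driven techniques (plug-in, cross-validation, bootstrap and other estimators), this unknown quantity can be estimated [4]. If $f(x)$ is normal with standard deviation $\sigma$, the unknown term can be expressed by (1.5):

$$\int f''(x)^2\,dx = \frac{3}{8\sqrt{\pi}\,\sigma^5} \qquad (1.5)$$

The practical recommendations for the choice of the optimal smoothing parameter that minimizes the MISE criterion (1.6 and 1.7) are [5]:

$$h = \left(\frac{8\sqrt{\pi}\int K(u)^2\,du}{3\left(\int u^2 K(u)\,du\right)^2}\right)^{1/5} \sigma\, n^{-1/5} \quad \text{or, for the Gaussian kernel,} \quad h \approx 1.06\,\sigma\, n^{-1/5} \qquad (1.6)$$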

$$h = 0.9\,\min\left\{\sigma,\ \frac{Q_3 - Q_1}{1.349}\right\} n^{-1/5} \qquad (1.7)$$

where $Q_1 = \hat F^{-1}(0.25)$ and $Q_3 = \hat F^{-1}(0.75)$ are points taken at regular intervals from the cumulative distribution function (CDF) of $X$, and $\sigma$ is the standard deviation of $X$.

3. Derailment Losses
Derailment is one of the railway accident risks, also called operational losses, and occurs when a railway vehicle (rolling stock) runs off its track. Improved safety measures and risk control have brought derailment accidents down to a lower level in some countries: in the US, derailments dropped from 1,000 in 1986 to 500 in 2010, and in the UK from 26 in 2006 to 16 in 2012 [6]. In Turkey, the severity and frequency of derailment hazard events were obtained from archive research in Turkish State Railways Region I (the railway lines in Turkey are divided into seven regions) between 2000 and 2010.

The derailment risks to the railway undertaking are broken down in the figures below. The numbers of derailment events (derailment accident frequencies) are shown in Figure 1; station derailments have the highest frequency of all derailment types. From Figure 2, the observed total costs of station derailments are USD 414,520.0, USD 367,548.0 and USD 419,899.0 for vehicle, superstructure and train delay costs, respectively (these figures are actualized; fatality and injury compensations for the workforce, passengers and members of the public are not included). There were 6 other derailments (branch lines, depots, ports) during 2000-2010. These are classified under the other-derailment group, account for around 1.2% of all derailment events and contribute around USD 45,340.0 (vehicle, superstructure and delay costs) to the total derailment costs.

Figure 1. Derailment events.
Figure 2. Derailment costs at stations.

Figure 3 shows the derailment costs that occurred at switches.
At switches, the derailment costs amount to USD 236,390.0 for vehicles, USD 169,928.0 for train delays and USD 104,013.0 for superstructure.
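The cost figures quoted in this section can be cross-checked with a few lines of Python. The sketch below only re-adds the component costs given above and recomputes the share of the 6 "other" derailments, assuming that the 501 accidents reported for Region I in 2000-2010 are the total number of derailment events; the dictionary and function names are illustrative, not taken from the paper.

```python
# Derailment cost components (USD) as quoted in Section 3.
STATIONS = {"vehicle": 414_520.0, "superstructure": 367_548.0, "train delay": 419_899.0}
SWITCHES = {"vehicle": 236_390.0, "superstructure": 104_013.0, "train delay": 169_928.0}

# Assumption: the 501 Region I accidents (2000-2010) are all derailment events.
TOTAL_EVENTS = 501
OTHER_EVENTS = 6  # branch lines, depots, ports


def total(costs):
    """Sum of vehicle, superstructure and train delay costs."""
    return sum(costs.values())


def largest_component(costs):
    """Name of the cost component with the highest amount."""
    return max(costs, key=costs.get)


if __name__ == "__main__":
    print("stations:", total(STATIONS))
    print("switches:", total(SWITCHES), "largest:", largest_component(SWITCHES))
    print(f"share of other derailments: {OTHER_EVENTS / TOTAL_EVENTS:.1%}")
```

With these inputs the station costs add up to USD 1,201,967.0, the switch costs to USD 510,331.0 (vehicle damage being the largest component), and 6 of 501 events is about 1.2%, matching the share stated in the text.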

Most of the derailment risk to the railway undertaking arises from derailment accidents on main lines, which account for USD 4,940,216.0 in total between 2000 and 2010 (Figure 4).

Figure 3. Derailment costs at switches.
Figure 4. Derailment costs on main lines.

Non-Parametric PDF Calculation: Derailment Losses
Non-parametric methods do not rely on assumptions about the shape or parameters of the underlying population distribution. If the data deviate strongly from the assumptions of a parametric procedure, using that procedure could lead to incorrect conclusions. For the historical derailment hazard data between 2000 and 2010, the severity PDF is best fitted by the Quartic kernel function with the optimum smoothing parameter h given in Figure 5. Table 2 lists the aggregate errors of the kernel functions with respect to the true PDF values.

[Figure 5 plot: Non-Parametric Probability Density Functions (Derailments); true density and Epanechnikov, Gaussian, Triweight and Quartic estimates, each labelled with its smoothing parameter h; f(x) against x]
Figure 5. Non-parametric PDF estimators.

Table 2. Non-parametric PDF errors with respect to the true density (aggregate and average errors for the Epanechnikov, Gaussian, Triweight and Quartic kernels).
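To make equations (1.1), (1.4) and (1.5) concrete, the following is a minimal Python sketch (standard library only) of the four kernels of Table 1, the estimator (1.1), the normal-reference bandwidth obtained by substituting (1.5) into (1.4), and the integrated squared error between an estimate and a reference density. It is not the authors' implementation: the derailment severities are not reproduced in the paper, so the demo draws a hypothetical bimodal sample, and the normal-reference rule stands in for the MISE-optimal values shown in Figure 5. All function names are illustrative.

```python
import math
from statistics import stdev


# Kernels of Table 1 (standard forms); each is symmetric and integrates to 1.
def epanechnikov(u):
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def gaussian(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def triweight(u):
    return 35.0 / 32.0 * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def quartic(u):
    return 15.0 / 16.0 * (1.0 - u * u) ** 2 if abs(u) <= 1.0 else 0.0


# Kernel -> interval outside which it is (numerically) zero.
KERNELS = {
    "Epanechnikov": (epanechnikov, (-1.0, 1.0)),
    "Gaussian": (gaussian, (-8.0, 8.0)),
    "Triweight": (triweight, (-1.0, 1.0)),
    "Quartic": (quartic, (-1.0, 1.0)),
}


def integrate(f, a, b, steps=2000):
    """Composite trapezoidal rule on [a, b]."""
    dx = (b - a) / steps
    inner = sum(f(a + i * dx) for i in range(1, steps))
    return (0.5 * (f(a) + f(b)) + inner) * dx


def kde(x, data, h, kernel):
    """Equation (1.1): f_h(x) = 1/(n h) * sum K((x - X_i) / h)."""
    return sum(kernel((x - xi) / h) for xi in data) / (len(data) * h)


def normal_reference_bandwidth(data, kernel, support):
    """Equation (1.4) with the normal value (1.5) plugged in:
    h = [8 sqrt(pi) R(K) / (3 mu2(K)^2)]^(1/5) * sigma * n^(-1/5)."""
    a, b = support
    r_k = integrate(lambda u: kernel(u) ** 2, a, b)     # R(K) = int K(u)^2 du
    mu2 = integrate(lambda u: u * u * kernel(u), a, b)  # mu2(K) = int u^2 K(u) du
    const = (8.0 * math.sqrt(math.pi) * r_k / (3.0 * mu2 ** 2)) ** 0.2
    return const * stdev(data) * len(data) ** -0.2


def integrated_squared_error(f_hat, f_ref, a, b, steps=1000):
    """ISE = int (f_hat - f_ref)^2 dx on [a, b]; MISE is its expectation."""
    return integrate(lambda x: (f_hat(x) - f_ref(x)) ** 2, a, b, steps)


if __name__ == "__main__":
    import random
    random.seed(42)
    # Hypothetical bimodal sample (not the paper's derailment data).
    sample = [random.gauss(2.0, 0.5) if random.random() < 0.6 else random.gauss(5.0, 1.0)
              for _ in range(150)]

    def true_pdf(x):
        """Density of the two-component normal mixture used above."""
        def norm(x, m, s):
            return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        return 0.6 * norm(x, 2.0, 0.5) + 0.4 * norm(x, 5.0, 1.0)

    for name, (k, support) in KERNELS.items():
        h = normal_reference_bandwidth(sample, k, support)
        ise = integrated_squared_error(lambda x: kde(x, sample, h, k), true_pdf, -2.0, 10.0)
        print(f"{name:<12} h = {h:.3f}  ISE = {ise:.5f}")
```

For the Gaussian kernel the bandwidth constant reduces to (4/3)^(1/5), about 1.06, as in (1.6); for the Quartic kernel it is about 2.78. Because the normal-reference rule assumes a unimodal normal shape, it tends to oversmooth multimodal data such as this sample, which is where a data-driven MISE criterion, as used in the paper, matters.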

Parametric PDF Calculation: Derailment Losses
Parametric calculations are based on assumptions about the distribution of the underlying population from which the sample was taken. This approach has advantages as long as the assumed theoretical distribution is correct. Its main disadvantage is a lack of flexibility (i.e. restrictions on the shape of f(x)). Table 3 lists the best-fitting parametric distributions and their parameters according to goodness-of-fit tests.

Table 3. Parametric distributions for derailment hazard events between 2000 and 2010 (for each test: statistic, p-value and critical values at 95%, 98% and 99%).
Wakeby: parameters α = 4.8665, β, γ, δ, ξ; tests K-S, A-D, C-S
Gen. Gamma: parameters k = 4.8382, α, β, γ; tests K-S, A-D, C-S
Dagum 4P: parameters k, α, β, γ; tests K-S, A-D, C-S
K-S: Kolmogorov-Smirnov, A-D: Anderson-Darling, C-S: Chi-Squared.

Table 4 lists the aggregate errors of the theoretical distributions with respect to the true PDF values, and Figure 6 shows the PDFs of the best-fitting parametric distributions.

Table 4. PDF errors of parametric distributions (aggregate and average errors for GenGamma, Wakeby and Dagum 4P).
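Table 3 reports Kolmogorov-Smirnov statistics alongside critical values at 95%, 98% and 99%. The sketch below shows how such a statistic and its asymptotic critical values can be computed, using a Dagum 4P candidate as the example. The CDF form F(x) = (1 + ((x - γ)/β)^(-α))^(-k) is the parametrization assumed here and may differ from the one used in the paper; the parameter values and the synthetic sample are hypothetical, and the function names are illustrative.

```python
import math
import random


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic D = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


def ks_critical_value(n, confidence):
    """Asymptotic critical value sqrt(-ln(a/2) / 2) / sqrt(n), a = 1 - confidence."""
    a = 1.0 - confidence
    return math.sqrt(-0.5 * math.log(a / 2.0)) / math.sqrt(n)


def dagum4p_cdf(x, k, alpha, beta, gamma):
    """Assumed Dagum (4P) CDF: (1 + ((x - gamma) / beta)^(-alpha))^(-k) for x > gamma."""
    if x <= gamma:
        return 0.0
    return (1.0 + ((x - gamma) / beta) ** (-alpha)) ** (-k)


def dagum4p_ppf(p, k, alpha, beta, gamma):
    """Inverse of dagum4p_cdf, used here only to draw a synthetic sample."""
    return gamma + beta * (p ** (-1.0 / k) - 1.0) ** (-1.0 / alpha)


if __name__ == "__main__":
    random.seed(7)
    params = dict(k=0.8, alpha=3.0, beta=2.0, gamma=0.0)  # hypothetical values
    sample = [dagum4p_ppf(random.uniform(1e-9, 1 - 1e-9), **params) for _ in range(200)]
    d = ks_statistic(sample, lambda x: dagum4p_cdf(x, **params))
    for conf in (0.95, 0.98, 0.99):
        crit = ks_critical_value(len(sample), conf)
        print(f"D = {d:.4f}, critical value at {conf:.0%} = {crit:.4f}, reject = {d > crit}")
```

When the distribution parameters are estimated from the same data, as in Table 3, these asymptotic critical values are only approximate (the test becomes conservative).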

Figure 6. Parametric PDF estimators (true density, Wakeby, Dagum 4P and GenGamma; f(x) against x).

Expected Derailment Operational Losses
Figure 7 shows the best parametric and non-parametric PDFs of derailment costs in 2000-2010, a period in which a total of 501 accidents occurred in the vicinity of Region I. Calculating the expected operational losses for derailment hazard events requires estimating the PDFs of the severity and frequency random variables under sound assumptions. Severity and frequency are modeled as separate processes. These models then combine the severity and frequency of losses into aggregate losses, from which the distribution of aggregate losses can be obtained [7].

[Figure 7 plot: Parametric & Non-Parametric Probability Density Functions (Derailments); true density, Wakeby and Quartic; f(x) against x]
Figure 7. The best parametric and non-parametric PDF estimators.

According to goodness-of-fit tests on the historical data, the frequency of derailments is best fitted by a Poisson distribution.

Table 5. Parametric and non-parametric approaches for aggregate PDF and expected losses.

Expected losses from derailment risks in 2013 are calculated as USD 165,528.91 with the non-parametric model, whereas the parametric model gives USD 122, reflecting a margin of error of 26% compared to the non-parametric model.

4. Conclusion
Modeling derailment risks through their severity and frequency distributions yields expected losses, provided that the assumed parametric distribution families are appropriate. However, when the form of

the distribution is irregular, non-parametric density estimators perform better than parametric models. In kernel density estimation, the accuracy of the PDF approximation is measured by the mean integrated squared error (MISE) between the true density and the approximation; minimizing it gives the optimum smoothing parameter. In this study, expected losses for derailment hazard events are investigated with parametric and non-parametric models. Because the true density function of the historical data has an irregular shape, the Quartic kernel estimator gives better results than the best parametric model (Wakeby), whose PDF has a 16% aggregate error relative to the Quartic kernel estimator. This 16% aggregate error of the parametric distribution (Wakeby) grows to a total error of 26% in terms of aggregate expected loss.

References
[1] Zucchini, W., Applied Smoothing Techniques, Part 1: Kernel Density Estimation, Temple University Book, October.
[2] Roberts, S.J., Parametric and Non-parametric Unsupervised Cluster Analysis, Pattern Recognition, 30(2).
[3] Loader, C., Bandwidth Selection: Classical or Plug-in?, The Annals of Statistics, 27(2), 1999.
[4] Wand, M.P. and Jones, M.C., Kernel Smoothing, Chapman and Hall, London.
[5] Cameron, A.C. and Trivedi, P.K., Microeconometrics: Methods and Applications, Cambridge University Press, New York, 2005.
[6] RSSB, Annual Safety Performance Report 2012/2013, Rail Safety and Standards Board.
[7] Öztürk, Z. and Özcan, Ö.E., Calculation of Aggregate Losses with Collective Risk Model in Public Transport Systems, Transist 2012, November 2012.